Speed, Scalability, and Flexibility All at Once with New High Performance Analytics

by Andrew Ross | SAPinsider

October 1, 2005

Imagine billions of data records at your fingertips, organized to host complex queries and deliver aggregated data within seconds. Learn how high performance analytics, a feature of SAP NetWeaver 2004s, can bring relevant data to the desktops of those who need it faster than ever before.
 

SAP NetWeaver 2004s¹ Business Intelligence (BI) is faster than ever before, accelerated by a massively boosted capability for high performance analytics (HPA) humming behind the familiar BI frontends.

Before HPA, most approaches to accessing BI data confronted IT staff with a maintenance challenge. Administrators had to study user behavior to target frequently asked queries, then build summary tables and database indexes for those queries. This was skilled work and hence a major cost driver. Response times were improved only for the targeted queries, so good judgment was required to satisfy users. To keep this work within bounds, access to the data was often restricted by means of predefined reports for selected users. Now, HPA enables companies to relax such restrictions.

Running HPA, users can call up aggregated sales data for their company's operations, slice it and dice it, drill down anywhere for details, and expect exact responses in seconds or less every time, no matter how unusual their request — even with billions of records filling terabyte volumes.

For the busy executive, the key benefits of SAP NetWeaver BI with HPA are:

  • It's fast. Average query response times are 10 to 100 times faster than traditional approaches.

  • It's flexible. HPA's on-the-fly aggregation greatly reduces administrative overhead, so there's no technical need to restrict user freedom.

  • It's inexpensive. State-of-the-art hardware can be scaled exactly to your requirements, and falling hardware prices ensure that HPA's total cost of ownership (TCO) advantage over other tuning approaches will only grow in the future.

Orders-of-Magnitude Improvements

To realize the boosted BI analytics capability, SAP developed a new 64-bit HPA engine. The new engine is based on an existing SAP search engine, but features a number of radical innovations at the technical level.

The results are scorching. With the new HPA engine, you can expect speeds that are not just incremental improvements, but orders of magnitude better than what you could get from most other analytics software.²

Aggregation on the Fly

The HPA engine can aggregate key figures on the fly, during query runtime. It uses powerful and dedicated hardware resources to do all the work in memory, with no disk accesses to slow it down (see Figure 1). The engine takes the query entered by the user, computes a plan for answering it, joins the relevant column indexes to create a join path from each view attribute to the BI fact table, performs the required aggregations, and merges the results for return to the user.

Figure 1
HPA Indexes Are Held in Memory and the HPA Engine Aggregates on the Fly

This ability to aggregate during runtime is the decisive benefit of HPA. All the chores of prebuilding indexes and aggregates and realigning them regularly fall away. The realignment slots in nightly or weekly load windows are freed up, and administrators who previously did these chores can spend their time more productively. Users benefit from predictable response times, which encourage them to slice and dice their data in new ways and gain new insights. Freed from the old bottlenecks, companies can empower more users.
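
To make the idea of runtime aggregation concrete, here is a minimal sketch in Python. It is not the HPA engine's code: the column names, the sample values, and the `aggregate` helper are all assumptions made up for illustration. What it shows is that, with the data held in memory as columns, any grouping or filter is answered by a single scan at query time, so no summary table has to be built in advance.

```python
from collections import defaultdict

# Hypothetical in-memory fact columns (one list per column, same length).
region = ["EMEA", "APJ", "EMEA", "AMER", "APJ", "EMEA"]
year = [2004, 2004, 2005, 2005, 2005, 2005]
revenue = [100, 80, 120, 200, 90, 60]


def aggregate(group_by, key_figure, row_filter=None):
    """Sum key_figure per group_by value at query time; nothing is precomputed."""
    totals = defaultdict(int)
    for row, key in enumerate(group_by):
        if row_filter is None or row_filter(row):
            totals[key] += key_figure[row]
    return dict(totals)


# Two different questions, both answered by scanning the same columns.
by_region = aggregate(region, revenue)
by_region_2005 = aggregate(region, revenue, lambda row: year[row] == 2005)
print(by_region)
print(by_region_2005)
```

An unusual request, such as a new filter or a different grouping column, costs the same kind of scan as a common one, which is why response times stay predictable.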

Vertical Decomposition of Tables

The HPA engine decomposes table data vertically, into columns that are stored separately (see Figure 2). This makes more efficient use of memory space than row-based storage, since the engine needs to load only the data for relevant attributes or characteristics into memory.

Figure 2
Vertical Decomposition of Tables Enables Columns To Be Stored Separately

This is a good idea for analytics, where most users want to see only a selection of data. In a conventional database, all the data in the table is loaded together, in complete rows, whereas the new engine touches only relevant data columns. The engine can also sort the columns individually to bring specific entries to the top. The column indexes are written to memory and cached as flat files. Efficiency is improved because both the memory footprint of the data and the input-output flows are smaller.
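
The difference between row storage and column storage can be sketched in a few lines of Python. The table, its column names, and the `total` helper are hypothetical and chosen only for illustration; the real engine's storage format is not described here. The sketch splits a small row-based table into separately stored columns, answers a query by touching one column only, and sorts a single column on its own.

```python
# Hypothetical sales table in row form: (customer, product, quantity, revenue).
rows = [
    ("Acme", "Bolt", 10, 50.0),
    ("Birch", "Nut", 4, 8.0),
    ("Acme", "Nut", 6, 12.0),
]
column_names = ("customer", "product", "quantity", "revenue")

# Vertical decomposition: one separately stored list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def total(column_store, name):
    """Touch only the one column the query needs."""
    return sum(column_store[name])


total_revenue = total(columns, "revenue")

# A column can be sorted individually; row positions are kept alongside
# the values so each entry can still be traced back to its record.
sorted_quantity = sorted(
    enumerate(columns["quantity"]), key=lambda pair: pair[1], reverse=True
)
print(total_revenue)
print(sorted_quantity)
```

A query on revenue never reads the customer or product columns, which is where the savings in memory footprint and input-output come from.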

Data Compression

Data for HPA is compressed using integer coding and dictionary lookup. Integers represent the text or other values in table cells, and the dictionaries are used to replace integers by their values during post-processing. In particular, each record in a table has a document ID, and each value of a characteristic or an attribute in a record has a value ID. An index for an attribute is simply an ordered list of value IDs paired with sets of IDs for the records containing the corresponding values (see Figure 3).

Figure 3
Data Compression Is Based on Integers and Dictionaries

To compress the data, a variety of methods are employed, some of them highly innovative and covered by recent patent applications. Integer compression greatly reduces the average volumes of processed and cached data. This allows more efficient numerical processing and smart caching strategies, which reduce the data volumes and flows by an average factor of 10. Hence all query processing can be performed in main memory.
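
The basic scheme of value IDs, document IDs, and dictionaries can be sketched as follows. This is plain dictionary encoding only; the further compression methods mentioned above are not public detail and are not attempted here. The function name `encode_column` and the sample city values are assumptions for illustration.

```python
def encode_column(values):
    """Return (dictionary, value_ids, index) for one column.

    dictionary: sorted distinct values; a value ID is a position in it.
    value_ids:  one value ID per record, in record order.
    index:      value ID -> ordered list of document (record) IDs.
    """
    dictionary = sorted(set(values))
    id_of = {value: value_id for value_id, value in enumerate(dictionary)}
    value_ids = [id_of[value] for value in values]
    index = {value_id: [] for value_id in range(len(dictionary))}
    for doc_id, value_id in enumerate(value_ids):
        index[value_id].append(doc_id)
    return dictionary, value_ids, index


cities = ["Walldorf", "Boston", "Walldorf", "London", "Boston", "Walldorf"]
dictionary, value_ids, index = encode_column(cities)

# Post-processing: replace integers by their values via the dictionary.
decoded = [dictionary[value_id] for value_id in value_ids]
print(dictionary, value_ids, index)
```

During query processing only the small integers are compared and moved around; the text values are looked up once, at the end, for the rows that are actually returned.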

Horizontal Partitioning of Indexes

The HPA engine can partition large tables horizontally for parallel processing on multiple machines in distributed landscapes (see Figure 4). This enables it to handle huge data volumes yet stay within the limits of installed memory. Formerly, large data volumes were kept on disk. Now the volumes can be split over multiple hosts by a round-robin procedure to build up parts of equal size, so that they can be processed fast and in parallel. A logical index server distributes join operations over partial indexes and merges partial results, all so smoothly that the index looks the same as ever to BI.

Figure 4
Large Tables Can Be Split Horizontally for Parallel Processing

This scalability enables HPA to run on advanced computing infrastructures, such as blade servers. HPA runs on 64-bit platforms, where memory address space limitations are finally a thing of the past.
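
A rough sketch of the partition-and-merge pattern, in Python: records are dealt out round-robin into parts of nearly equal size, each part is aggregated independently, and the partial results are merged. Threads stand in for separate hosts here, and all function names and sample records are assumptions for illustration rather than anything taken from the HPA engine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, parts):
    """Deal records out in turn so that the parts differ in size by at most one."""
    return [records[i::parts] for i in range(parts)]


def partial_sum(partition):
    """Aggregate one partition on its own, as a single host would."""
    totals = Counter()
    for key, amount in partition:
        totals[key] += amount
    return totals


def distributed_sum(records, parts):
    """Aggregate the partitions in parallel, then merge the partial results."""
    partitions = partition_round_robin(records, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(partial_sum, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums
    return dict(merged)


records = [
    ("EMEA", 100), ("APJ", 80), ("EMEA", 120), ("AMER", 200),
    ("APJ", 90), ("EMEA", 60), ("AMER", 40),
]
partitions = partition_round_robin(records, 3)
result = distributed_sum(records, 3)
print([len(p) for p in partitions], result)
```

The caller sees one result for one logical index, regardless of how many parts did the work.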

Scalable Multiserver Architecture

The use of scalable and distributed search technology enables investment in hardware and network resources to be continually optimized to reflect changing availability requirements and load levels (see Figure 5). In each landscape, a name server maintains a list of active services, switches to backups where necessary, and balances load over active services.

Figure 5
The High Performance Server Architecture Is Highly Scalable

Although the first customers will implement HPA on preconfigured hardware that they simply plug into their existing landscape, future customers will be able to configure new hardware dynamically. If the capacity is available, HPA will then run on existing hardware and scale rapidly to suit changing load. In an adaptive landscape, HPA instances will simply be replicated as required, by cloning services, and form groups with master and backup servers and additional cohort servers for handling query load. Such groups can be optimized for both high availability and good load balancing, with the overall goal of requiring zero administration.
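
The name server's role of tracking active services, skipping failed ones, and spreading load can be pictured with a small Python sketch. The `NameServer` class, its methods, and the service names are hypothetical; the sketch uses simple round-robin routing as a stand-in, since the actual load-balancing policy is not described here.

```python
class NameServer:
    """Hypothetical registry: tracks active services and balances load over them."""

    def __init__(self, services):
        self.active = {name: True for name in services}
        self._next = 0

    def mark_down(self, name):
        self.active[name] = False

    def mark_up(self, name):
        self.active[name] = True

    def route(self):
        """Return the next active service, round-robin, skipping failed ones."""
        names = list(self.active)
        for _ in range(len(names)):
            candidate = names[self._next % len(names)]
            self._next += 1
            if self.active[candidate]:
                return candidate
        raise RuntimeError("no active service available")


ns = NameServer(["index-1", "index-2", "index-3"])
first_round = [ns.route() for _ in range(3)]
ns.mark_down("index-2")  # simulate a failed service
after_failure = [ns.route() for _ in range(4)]
print(first_round, after_failure)
```

Once a service is marked down, requests flow to the remaining services without any change on the caller's side.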

InfoCubes as Join Graphs

A BI InfoCube is a star schema for representing structured data (see Figure 6). A large fact table is surrounded by dimension tables (D) and sometimes also X and Y tables. The S tables spell out the values of the integer IDs used in the other tables.

Figure 6
The Standard BI InfoCube Is Represented as a Join Graph

HPA models and algorithms are tailored and optimized to work well with InfoCubes. The HPA metamodel is designed to represent a star schema logically as a join graph, where joins between the tables forming the star schema are predefined in the model and materialized at runtime by the new query engine. The HPA metamodel bridges the gap between InfoCubes, which contain structured data, and search engine technology, which was originally developed to work with unstructured data.
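
As a rough illustration of a star schema treated as a join graph, the Python sketch below defines a tiny fact table, one dimension table, and one S table, together with a predefined join path that is followed only when a query runs. The table contents, the `JOIN_PATHS` mapping, and the function names are assumptions for illustration and do not reflect the actual HPA metamodel.

```python
# Hypothetical miniature star schema; everything is an integer ID
# except the values held in the S table.
fact = [  # (customer dimension ID, revenue)
    (1, 100),
    (2, 80),
    (1, 120),
    (3, 200),
]
dim_customer = {1: 10, 2: 11, 3: 10}  # dimension ID -> country ID
s_country = {10: "DE", 11: "US"}      # country ID -> value

# Join graph: the path from a view attribute to the fact table is
# predefined in the model and only followed (materialized) at query time.
JOIN_PATHS = {"country": [dim_customer, s_country]}


def resolve(attribute, dim_id):
    """Walk the predefined join path from a fact-table ID to the attribute value."""
    value = dim_id
    for table in JOIN_PATHS[attribute]:
        value = table[value]
    return value


def revenue_by(attribute):
    """Join and aggregate in one pass over the fact table."""
    totals = {}
    for dim_id, revenue in fact:
        key = resolve(attribute, dim_id)
        totals[key] = totals.get(key, 0) + revenue
    return totals


revenue_by_country = revenue_by("country")
print(revenue_by_country)
```

Adding a new view attribute means adding one more path to the graph, not building a new aggregate.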

Plug and Play

In summary, the technical benefits of SAP NetWeaver BI with the HPA capability are:

  • Speed: The use of advanced compression and aggregation algorithms enables the new engine to achieve query response times that are on average about 10 times faster than previous approaches.

  • Flexibility: The new approach greatly reduces administrative overhead. With HPA, there are no hunts for frequently asked queries, no pre-aggregations, and no realignment runs. So there is no technical need to restrict user freedom.

  • Ease of use: To deploy HPA, customers can simply plug in the new hardware (available preconfigured in sizes from XS to L) and use it through their SAP NetWeaver 2004s BI frontends.³

For more information on HPA and other new or enhanced capabilities in SAP NetWeaver BI, just contact your local SAP sales office, and check the latest news at www.sap.com/solutions/netweaver/businessintelligence/index.epx.

Analytics Demo Wows Audience

At SAPPHIRE 2005 in Boston, SAP Executive Board Member Shai Agassi showed off the new high performance analytics (HPA) capability delivered as part of SAP NetWeaver 2004s Business Intelligence (BI). From a BI analytic dashboard (sample screen shown in Figure 7), Agassi launched complex queries against a billion data records and got results back in seconds. He then showed how to compose new dashboards in minutes from a set of services with the help of a drag-and-drop tool called Visual Composer.⁴

Figure 7
Sample BI Analytic Application Screen

For the demonstration, the HPA engine ran under 64-bit Linux on eight blades, each with dual Intel Xeon processors, mounted above a server running SAP NetWeaver 2004s BI, and a filer, all in a standalone cabinet.


1 - SAP NetWeaver 2004s is the mySAP Business Suite edition of SAP NetWeaver 2004. It is a minor release that delivers on specific needs of the mySAP and xApps solutions delivered by SAP in 2005. An implementation of SAP NetWeaver 2004s is recommended only to customers requiring it for those solutions. However, some customers may wish to implement SAP NetWeaver 2004s in order to benefit from specific enhancements, such as high performance analytics.

2 - My team first demonstrated these performance improvements in late 2003, and later work has increasingly confirmed our expectations. For a high-level overview, see "Business Intelligence Best (and Worst!) Practices" by Peter Graf in the January-March 2005 issue of SAP Insider (www.SAPinsider.com).

3 - During the Ramp-Up phase (six months starting October 2005), HPA will only be available on preconfigured hardware (as a set of blade servers that can be plugged into an existing rack or cabinet in the customer landscape). Unrestricted shipment begins in Q2 2006.

4 - For more information on this drag-and-drop tool, see "Visual Composer — A Model-Driven Development Tool for Enterprise Portal iViews" by Karl Kessler in the October-December 2004 issue of SAP Insider (www.SAPinsider.com).


Andrew Ross is a developer in the SAP NetWeaver AS TREX team, which developed the HPA capability in collaboration with SAP NetWeaver BI. He became part of the TREX team in 2003. He joined SAP in 1999, working at first in SAP Active Global Support. Earlier he was a computer science editor at a publisher in Heidelberg. He is British and holds four degrees in technical philosophy, three from Oxford and one from London.
