In a 'scale-out' environment of SAP HANA, there may be many servers hosting InfoCubes and DSOs. Some of these objects can be so large and accessed so frequently that they lead to bottlenecks, even for HANA. To fix this, you can partition large tables and distribute the partitions across multiple servers to provide load balancing. In this blog, we look at some of the basic HANA partitioning options and at new tools under development specifically for BW on HANA.
By Dr. Berg
When monitoring the system, you can sometimes see tables that have grown so large in HANA that it makes sense to split them ‘horizontally’ into smaller partitions. This is possible for column tables in HANA by default, and it is a nice way to manage high data volumes. SQL statements and data manipulation language (DML) statements do not need to know that the data is partitioned; HANA manages the partitions behind the scenes automatically. This simplifies access and front-end development, and it also gives administrators a key tool for managing disks, memory, and large column stores.
In a distributed (scale-out) HANA system, you can also place the partitions on different nodes and thereby increase performance even more since there will be more processors available for the users. In fact, this may become the standard deployment method for extremely large systems with tens of thousands of users.
Currently, HANA supports up to 2 billion rows in a single column table. In a partitioned schema, you can have 2 billion rows per partition, and there is virtually no limit on how many partitions you can add. The ceiling becomes a hardware and landscape architecture issue, not a database limitation. From an admin standpoint, there are three ways to create partitions in HANA. These include the range, hash, and round-robin methods. While more complex schemas are possible with multi-level partitioning, these three options cover the basics used in the higher-level options. So let’s take a look at the fundamental partitioning choices.
<h2>Option 1: Partitioning Column Tables by Range</h2>
If you know your data really well, you can partition the data by any range in your table. While date ranges are the most common, you can also partition by material numbers, postal codes, customer numbers, or anything else.
A partition by date makes sense if you want to increase query speed and keep current data on a single node. A partition by customer number makes sense if you are trying to increase the speed of delta merges, since multiple nodes can be used at the same time during data loads. So you have to spend some time thinking about what benefits you want to achieve before undertaking any partitioning scheme. It should be noted that the maintenance effort for range partitions is somewhat higher than for the other options, since you will have to keep adding new partitions as data outside the existing ranges emerges (i.e., next year’s data if you partition by year now). Partitioning is done in SQL, and the syntax for range partitioning is simply:
<pre>
CREATE COLUMN TABLE SALES (sales_order INT, customer_number INT, quantity INT, PRIMARY KEY (sales_order))
PARTITION BY RANGE (sales_order)
  (PARTITION 1 <= VALUES < 100000000,
   PARTITION 100000000 <= VALUES < 200000000,
   PARTITION OTHERS)
</pre>
This creates a table with three partitions. The first two can each hold up to 100 million sales orders, and the last one holds all remaining records. There are some basic rules, though. First, the field we are partitioning on has to be part of the primary key (i.e., sales_order). Second, the field has to be defined as a string, date, or integer. Finally, we can only partition column stores, not row stores.
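Because range partitions need ongoing maintenance, you will periodically add new ranges as data grows past the existing ones. A minimal sketch of that maintenance step, assuming the SALES table above (the range values are illustrative):

<pre>
ALTER TABLE SALES ADD PARTITION 200000000 <= VALUES < 300000000
</pre>

Ranges that are no longer needed can be dropped in the same style with ALTER TABLE ... DROP PARTITION, which is one way to retire old data along partition boundaries.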
<h2>Option 2: Partitioning Column Tables by Hash</h2>
Unlike partitioning by ranges, partitioning column stores by hash does not require any in-depth knowledge of the data. Instead, partitions are created by an internal algorithm that the system applies to one or more fields in the table. This is known as a hash. The records are then assigned to the partitions based on this internal hash value. The partitions are created in SQL, and the syntax is:
<pre>
CREATE COLUMN TABLE SALES (sales_order INT, customer_number INT, quantity INT, PRIMARY KEY (sales_order, customer_number))
PARTITION BY HASH (sales_order, customer_number) PARTITIONS 6
</pre>
In this example we are creating six partitions by sales order and customer number. There are some rules, though. If the table has a primary key, it must be included in the hash. If you partition on more than one column and your table has a primary key, all of the partitioning fields have to be part of the primary key as well. If you let the system determine the number of partitions instead of specifying one, it will pick the optimal number based on your configuration. That is therefore the recommended setting for most hash partitions.
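To let the landscape drive the partition count, the hash example can be written with GET_NUM_SERVERS() in place of a fixed number. A sketch, assuming the same illustrative SALES table:

<pre>
CREATE COLUMN TABLE SALES (sales_order INT, customer_number INT, quantity INT, PRIMARY KEY (sales_order, customer_number))
PARTITION BY HASH (sales_order, customer_number) PARTITIONS GET_NUM_SERVERS()
</pre>

This keeps the partition count aligned with the number of hosts if the landscape changes, rather than hard-coding a value like 6.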
<h2>Option 3: Partitioning Column Tables by Round-Robin</h2>
In a ‘round-robin’ partition, the system assigns records to the partitions on a rotating basis. While this makes for efficient assignment and requires no knowledge of the data, it also means that removing partitions in the future will be harder, as new and old data will be mixed in the same partitions. The partitions are created in SQL, and the syntax is:
<pre>
CREATE COLUMN TABLE SALES (sales_order INT, customer_number INT, quantity INT)
PARTITION BY ROUNDROBIN PARTITIONS 6
</pre>
Here, we are creating six partitions and assigning records on a rotating basis. If you change the last clause to PARTITIONS GET_NUM_SERVERS(), the system will assign the optimal number of partitions based on your system landscape. The only requirement here is that the table must not have a primary key.
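Putting that together, the landscape-driven round-robin variant looks like this (same illustrative SALES table as above):

<pre>
CREATE COLUMN TABLE SALES (sales_order INT, customer_number INT, quantity INT)
PARTITION BY ROUNDROBIN PARTITIONS GET_NUM_SERVERS()
</pre>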
<h2>Moving Tables and Partitions for Load Balancing</h2>
You can periodically move tables and table partitions to achieve better load balancing across hosts. Redistributions are particularly useful if you are adding or removing a node, creating new partitions, or load balancing existing ones that have grown very large. However, before you start, make sure you save your current distribution so that you can recover in case you make a mistake. If you have the system privilege Resource Admin, you can open the administration editor in SAP HANA, choose Landscape --> Redistribution, and click ‘Save’. Then select ‘Next’ and ‘Execute’. You have now saved the current distribution and can recover if anything goes wrong.
Once this is done, you can go to the Navigator pane in Studio and select the Table Distribution Editor. From here, you can see the catalog, schemas, and tables. Select the object you want to display, and choose Show Table Distribution. You can also filter to a single host as needed. This will display the first 1,000 tables in the area you selected. If more are available, you’ll see a message box.
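As an alternative to the graphical editor, you can also inspect where partitions live and how large they are by querying the monitoring views in SQL. A sketch against the M_CS_TABLES view (column availability can vary by HANA revision, so verify against your release):

<pre>
SELECT host, schema_name, table_name, part_id, record_count, memory_size_in_total
FROM M_CS_TABLES
WHERE table_name = 'SALES'
ORDER BY part_id
</pre>

One row per partition is returned, which makes it easy to spot partitions that have grown disproportionately large on a single host.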
In the overview lists, you can now select any table you want to analyze, and the details are displayed in the Table Partition Details area. You can move the table to another host by right-clicking it and selecting Move table. If you want to move a partition instead of a table, you can select the partition instead and do the same. This may be very useful if you want to load-balance large tables across multiple hosts or consolidate the partitions to single hosts. For detailed recommendations on load balancing, see SAP Note 1650394 for large table management.
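The same moves can be scripted in SQL instead of clicking through Studio, which is handy for repeatable load-balancing tasks. A sketch, where the host and port are placeholders you must replace with values from your own landscape:

<pre>
-- move the whole table to another host
ALTER TABLE SALES MOVE TO '<host>:<port>'
-- or move just one partition (here, partition 2)
ALTER TABLE SALES MOVE PARTITION 2 TO '<host>:<port>'
</pre>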
<h2>A New HANA Partitioning Tool for SAP BW</h2>
SAP is also working on a tool to help you automate the partitioning and merge tasks for SAP BW. It is currently planned to be released to non-pilot customers in SP1.x for 7.3. The first part of the new partitioning tool for BW on HANA allows you to repartition DSOs and InfoCubes.
The second, more advanced part of the tool also allows you to merge, split, and move partitions in an admin interface. You can also schedule these tasks to run as background jobs with your own parameters, and even pick among the partitioning schemes we discussed in options 1, 2, and 3 above.
Both of these capabilities are still under development at SAP and may change before being released to non-pilot customers. But it is great to see SAP taking the capabilities outlined in the general partitioning discussion above and placing them into a BW tool that makes partitioning, and managing partitions, much more manageable.
SAP HANA's capabilities keep evolving, and new tools are being developed at a high pace. In this blog we looked at the core capabilities for managing very large tables, and at what new tools may be available in the very near future to help you do this with BW on HANA.
Next time, we will explore more of the admin features of SAP HANA.