What's Degrading Your SAP Business Scenario Performance? 4 Steps to Find — and Fix — the Problem

by Armin Hechler-Stark and Swapan Saha | SAPinsider

December 14, 2010

When a large SAP customer experienced a complex performance and scalability problem, SAP used its development processes and tools to identify and fix performance issues in that customer’s landscape. Learn how you can use these same tools to diagnose problems in your own landscape and get an introduction to SAP’s method for testing these fixes to make sure that the production landscape can handle the business’s needs.
 

Performance problems can cripple your company's productivity. Almost every business has at least one IT job it dreads running, knowing that it eats up system resources and can even raise a solution's TCO. So what can you do about it?

First, you need to find out exactly what is slowing you down. This can sound daunting, but we at SAP were able to resolve a complex performance and scalability problem at a large SAP enterprise customer site by following a few simple steps. In this article, we’ll explain how we used SAP development processes and tools to identify and fix performance issues in the customer’s landscape, and we’ll introduce our method for testing these fixes to ensure the production landscape would be able to handle what the business required of it.

Throughout this article, we will follow this particular example. But the steps and actions we recommend here are universally applicable to help diagnose and fix a wide array of performance problems. The example presented here is from a real SAP customer that uses SAP BusinessObjects Access Control to manage user access and prevent fraud throughout the enterprise. In this business scenario, the customer runs a scheduled background job every hour to collect activity logs from various resources, including the change document system and the system’s statistics records. The customer reported that this log collection job was performing poorly, and our job was to figure out why. Here are the steps we followed.

Step #1: Understand the Business Scenario and the Production Landscape

Before you start any project, it’s important to get a solid understanding of its expected goals and outcomes from the business users. You’ll want to learn the users’ business processes, what the current and desired response times are, and how much data they are dealing with. For example, you’ll want to find out:

  • What the production landscape is like and what kind of load it deals with. In our example case, the SAP BusinessObjects Access Control job collected all relevant information and data into corresponding database tables. The customer had more than 110 million change documents in its ERP production landscape from logging changes in business data and activities. In addition, the ERP system was the central system formed by consolidating multiple ERP systems in various locations, meaning thousands of users around the globe accessed it.
  • How users run the process (the sequence of steps) and the concurrency of the users. In our example case, we were dealing with an overuse of “emergency” access privileges. SAP BusinessObjects Access Control enables administrators to grant users super-user emergency access to the ERP system. The original intention was to grant this privilege to only a few users, but in the production system, it was actually granted to 100-400 users. As the number of active users with this privilege grew, so did the volume of activities and documents they created, and with them the resource consumption and processing time of the business scenario.
  • Any restrictions the business has for this landscape. In our example, since the server running the ERP system was shared by multiple applications, only one CPU was allocated to the log collection job, and only for a limited time (two to five minutes each hour). This meant that when the job ran past its allotted time, it forced other jobs to share resources.

Based on this information, our goals for this project were to:

  • Minimize the amount of CPU and memory consumed for a job that accesses a large volume of data
  • Ensure proper hardware sizing and improve the solution’s TCO

Step #2: Use SAP Processes and Tools to Identify the Root Cause of Performance Problems

Once the business goals are understood, the next step is to analyze and identify relevant performance bottlenecks using SAP processes and tools (see sidebar).

 

Here you’ll need to:

  • Set up a test system with a proportional amount of data so that the test system mimics the customer production landscape as closely as possible. Of course, this data must be of a reasonable size, the definition of which depends on how long it takes to create data in the system and what the minimum data requirement is. For example, if the customer system has more than 100 million records in the key tables, you’ll need to have at least several hundred thousand, or even one million, records in the test system.
  • Run the test and collect information about CPU and memory consumption using transaction STAD (see Figure 1). To ensure accuracy, you should run this transaction multiple times. In our example scenario, when we ran this test for a certain test load, the job took 346,086 milliseconds to complete; of that, 297,036 milliseconds was database access time, as shown in Figure 1.
  • Use STAD to drill further into the database access information. In our example, nearly 300,000 milliseconds of database access time was relatively high, so we drilled down to understand why. In Figure 2, we can see that the high database access time (around 85% of the job's total runtime) was caused by an unusually large number of database calls: 7,518 in our test run. When it comes to database calls, the fewer, the better; the quick check after this list puts these figures in perspective.
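
As a quick sanity check before drilling deeper, you can relate these STAD figures to one another. A minimal sketch in Python, using only the numbers reported above:

    # Back-of-the-envelope check using the STAD figures from our test run.
    total_time_ms = 346_086   # total job runtime reported by STAD
    db_time_ms = 297_036      # portion spent in database access
    db_calls = 7_518          # number of database calls in the trace

    db_share = db_time_ms / total_time_ms     # fraction of runtime spent in the DB
    avg_ms_per_call = db_time_ms / db_calls   # average cost of a single call

    print(f"DB share of runtime: {db_share:.1%}")                 # ~85.8%
    print(f"Average time per DB call: {avg_ms_per_call:.1f} ms")  # ~39.5 ms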

 

Figure 1 — Information about CPU and memory consumption from the STAD transaction

Figure 2 — Using STAD to drill down into the results from Figure 1 shows the reason for the relatively high database access time: an unexpectedly large number of database calls

Once you’ve completed these steps to start to identify the cause of a problem, you’ll need to further analyze the issue with SAP’s other tools. Here’s how we proceeded in our example case:

We used SAT and ST05 to identify the root cause of the high number of database calls: nested and unnecessary loops. The system was designed to call the CHANGEDOCUMENT_READ function module once for each user in a loop. When the number of users grew from a few to a few hundred, the time needed to read user activities with this function grew as well; with more than 100 million change documents to read through, the job ran beyond its allotted window, causing the performance and operational problem. Once we identified and fixed this issue, the database access time dropped drastically, from almost 300,000 milliseconds to just 4,827 milliseconds (see Figure 3).
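
The fix itself was in ABAP code that we can't reproduce here, so the following Python sketch is a hypothetical illustration of the before-and-after access pattern; the read functions are placeholders standing in for the per-user CHANGEDOCUMENT_READ round trips:

    # Hypothetical sketch of the access pattern behind the nested-loop issue.
    # The "database" is simulated; in the real system, each per-user read
    # was a separate CHANGEDOCUMENT_READ round trip.
    SIMULATED_LOG_STORE = {f"user{i}": [f"doc{i}-{j}" for j in range(3)]
                           for i in range(400)}

    def read_change_documents(user):
        """Simulates one database round trip for a single user."""
        return SIMULATED_LOG_STORE.get(user, [])

    def read_change_documents_bulk(users):
        """Simulates a single set-based read covering all users at once."""
        wanted = set(users)
        return [doc for user, docs in SIMULATED_LOG_STORE.items()
                if user in wanted for doc in docs]

    # Before: one call per user inside a loop, so the number of database
    # calls grows linearly with the number of active users.
    def collect_logs_per_user(users):
        logs = []
        for user in users:
            logs.extend(read_change_documents(user))
        return logs

    # After: one bulk read, regardless of how many users are active.
    def collect_logs_bulk(users):
        return read_change_documents_bulk(users)

    users = [f"user{i}" for i in range(300)]
    assert sorted(collect_logs_per_user(users)) == sorted(collect_logs_bulk(users))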

 

Figure 3 — After fixing the nested loop issue, STAD shows that the database access time is dramatically lower

We used an ST05 trace to identify a second issue that degraded performance. The software our customer was using wrote application-specific results to the database one user at a time; since the system had a very large number of users, it executed a correspondingly large number of individual database insert statements, leading to slower response times. We remedied this by changing the code to execute a single batched database insert for all user records instead of an individual insert per user, as the sketch below illustrates.
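
In generic database terms, the before-and-after looks like this; a minimal sketch using Python's built-in sqlite3 module, not the customer's actual ABAP code:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_results (user_id TEXT, result TEXT)")
    rows = [(f"user{i}", f"result{i}") for i in range(1_000)]

    def insert_per_user(conn, rows):
        # Before the fix: one INSERT statement per user record,
        # i.e., one database round trip for every active user.
        for row in rows:
            conn.execute("INSERT INTO user_results VALUES (?, ?)", row)

    def insert_batched(conn, rows):
        # After the fix: a single batched insert covering all user records
        # (in ABAP terms, something like INSERT ... FROM TABLE in place of
        # an INSERT inside a loop).
        conn.executemany("INSERT INTO user_results VALUES (?, ?)", rows)

    insert_batched(conn, rows)
    conn.commit()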

You’ll need to continue using the SAP transactions to identify and fix any performance hot spots until you are confident that you’ve gotten to the root of the performance issues. After that, to test that your fixes have truly solved your performance problems — and to see if the changes might cause any new problems — you should attempt to predict and replicate the behavior of the improved software in the production landscape.

Step #3: Use Non-Linear Regression Testing to Predict Behavior in the Production Landscape

To get a good idea of how your improved coding will now handle requests made to it, you’ll have to test it, being sure to replicate as closely as possible the conditions and user behavior that the solution is likely to see. Of course, this model will still need to be verified with testing in the production landscape. In our example, we wanted to ensure that the customer’s goal of completing a log collector job in five minutes (300,000 milliseconds) was feasible.

In this example case, as in so many others, it’s nearly impossible to replicate the amount of data — millions of change documents, for instance — that would actually be generated, making it difficult to accurately predict how that system would handle high loads. So we set up a statistical regression test, measuring how the system handled four progressively higher amounts of load, so that we could predict how it would handle an even larger load.

In mathematics, nonlinear regression is a form of regression analysis in which observational data is modeled by a nonlinear function. In our case, this function represents the relation between the log collection time and the number of change documents, so we can write it as LCT = f (noCDs), with LCT being the log collection time and noCDs being the number of change documents.

To help us determine the constants (A and B) of this function, we chose a set number of users (600) and used four measurement points (taken in four different SAP NetWeaver ABAP clients) of the same instance with varying amounts of dependent data — that is, four different databases, each with a different amount of data (see Figure 4).

 

Figure 4 — The points on the graph represent the four measurement points that were tested

This dependent data was then entered into a set of change document tables and user-activity related tables. Each table for each of the four clients contains different change document data — for 100,000, 200,000, 400,000, and 1,000,000 change documents respectively — and all other tables are populated proportionately.1

Then we modeled the results obtained from the four measurement points and entered them into the graph you see in Figure 4. Using the data from this model, we were able to come up with a logarithmic function:

            y = A log(x) + B

Here, y is the output (the log collection time) and x is the input (the number of change documents); A and B are constants. In our scenario, the fit produced the following values of A and B (shown here as an example):

            y = 4.1447 log(x) + 41.939
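
To make the fitting step concrete, here is a minimal sketch in Python. The raw measurements and the base of the logarithm are not published in this article, so the sample times below are back-computed from the example equation above (assuming a base-10 logarithm) purely for illustration:

    import numpy as np

    # The four test loads from Figure 4 (number of change documents per client).
    no_cds = np.array([100_000, 200_000, 400_000, 1_000_000])

    # Illustrative log collection times for those loads, back-computed from
    # y = 4.1447 log(x) + 41.939; these are not real measurements.
    lct = np.array([62.66, 63.91, 65.16, 66.81])

    # Fitting y = A*log10(x) + B is an ordinary least-squares problem
    # once x is transformed to log10(x).
    A, B = np.polyfit(np.log10(no_cds), lct, 1)
    print(f"y = {A:.4f} log(x) + {B:.3f}")   # recovers ~4.1447 and ~41.939

    # The fitted model can then extrapolate to production-sized loads, such
    # as the 100-million-plus change documents in the customer's landscape.
    lct_100m = A * np.log10(100_000_000) + B
    print(f"Predicted log collection time at 100M documents: {lct_100m:.1f}")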

This allowed us to estimate the time required to complete a log collection job. We repeated this same process using different numbers of users — from 600 to 4,800. For each load level, we predicted the minimum and maximum amount of time required to complete the job (see Figure 5).

 

 

# of users per day    Estimated log collection time (computed seconds)
                      Minimum             Maximum
600                   34.8                996.23
1,200                 67.91               1,881.96
2,400                 141.54              3,879.69
4,800                 425.6               10,784.6


Figure 5 — A summary of the results of our log collector regression models

Step #4: Validate the Regression Model

Once your regression model indicates whether the system can support the business and help it meet its goals, you'll need to validate the model against a real test system. So, in our example, once our internal measurements and regression model predicted that the collection job could be completed within the five-minute goal, we participated in an onsite engagement with the customer to run similar tests in its test system with data copied from the production system.

We found that, for user levels up to 400 users, the log collection completed within 120 seconds. This matched our regression model and confirmed that the solution would now be able to perform log collections in a few minutes, rather than several hours.

Summary

Following the steps outlined in this article, SAP helped one of its customers enhance the performance of its SAP BusinessObjects Access Control solution. As a result of this work, the customer was able to optimize its hardware use and lower the solution's TCO through reduced CPU and memory consumption.

This four-step method can be applied to analyze system performance in your organization as well. To learn more, visit www.sdn.sap.com/irj/sdn/performance-analysis.

 1 Learn more at http://help.sap.com/saphelp_nw70/helpdata/En/2a/fa0168493111d182b70000e829fbfe/frameset.htm.
