SAP Analytics Cloud (formerly SAP Cloud for Analytics) is the newest analytical toolset (only about two years old) offered by SAP. It is part of SAP’s overall strategy to move more products to the cloud. It was critical to this effort that SAP offer a cloud-based analytics tool to support its business applications that run in the cloud, such as SAP S/4HANA, SAP SuccessFactors, Fieldglass, and Concur.
SAP Analytics Cloud is only offered as a cloud-based solution, but it is not limited to reporting against cloud-based sources of data. Rather, it can access and report against SAP’s on-premise and cloud-based products and access data in comma-separated value (CSV) and Microsoft Excel files as well as other data sources.
A Brief Overview of SAP Analytics Cloud’s Predictive Analytics Offerings
Figure 1 shows various tools in the SAP arsenal that support predictive analytics, along with the different types of users these tools target. A common thread is that all of the tools can access algorithms running in memory in SAP HANA. In the case of SAP Analytics Cloud, SAP HANA is also the database on which the product itself runs. The differences lie in the power of each tool and in the flexibility each offers to use non-SAP HANA sources and algorithms.
SAP’s HANA-based Predictive Analytics toolsets
Starting from the right of the figure, developers can call the SAP HANA algorithms (APL and PAL) directly with code to run a web application that incorporates predictive features. In the middle, the on-premise solution is the SAP Predictive Analytics toolset. In addition to having a user-friendly user interface (UI), it can access custom R code (used by statisticians), PAL, APL, and internal algorithms. This feature makes SAP Predictive Analytics well suited for statisticians and advanced business users.
The left side of Figure 1 includes SAP Analytics Cloud. In one web-based cloud offering, this toolset provides planning and analysis features that are nearly equal to those of many other SAP BI products (such as Web Intelligence and SAP Lumira). It does not have the power or features of the other options mentioned above, but there is more and more talk of integration and features on the horizon. That said, there are two features that bring predictive analytics to the normal business analyst:
- Guided machine discovery
- Time-series forecasting
The Predictive Analytics Features of SAP Analytics Cloud
Now let’s dive into the predictive features offered in SAP Analytics Cloud. As mentioned earlier, these are a subset of the predictive features that the underlying SAP HANA database provides. The main difference I see is that with this do-it-all tool for the masses you cannot achieve the complexity that usually is expected with predictive analytics. This is because including all the features of the dedicated predictive analytics tool would increase the complexity of this tool to the point that the target audience of CEOs and business analysts would be overwhelmed. That said, however, I look for more integration with SAP’s full-blown predictive analytics product, SAP Predictive Analytics, as time goes on.
The Guided Machine Discovery Feature
The first feature I want to discuss is the guided machine discovery tool. This feature uses the APL regression functions to show relationships or influencing factors between fields (dimensions or measures). For example, you can use this feature to identify that weld failures are influenced, in that order, by temperature, who did the welding, and the metal used in the weld.
Another example might be analyzing data about former employees to identify the main drivers for their decisions to leave. By running the guided machine discovery tool, you could discover, for example, that the number-one influencing factor for leaving is the plant where they work. More specifically, you could learn that employees who work in the Rotterdam plant are 80 percent more likely to leave than those who work in any other plant. A finding like this, grounded in the data, could contradict anecdotal evidence suggesting that pay and commute time were the reasons for employee attrition. This type of enlightenment is an example of the benefit that guided machine discovery can yield.
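To make the idea of ranking influencers concrete, here is a minimal Python sketch. It is not the APL regression that SAP Analytics Cloud runs. It is a deliberately crude stand-in that scores each column by how much the share of leavers differs between its categories. The employee records, the column names (plant, pay_band, left), and the helper functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical employee records, invented for illustration only.
employees = [
    {"plant": "Rotterdam", "pay_band": "low",  "left": True},
    {"plant": "Rotterdam", "pay_band": "high", "left": True},
    {"plant": "Rotterdam", "pay_band": "low",  "left": True},
    {"plant": "Rotterdam", "pay_band": "high", "left": False},
    {"plant": "Hamburg",   "pay_band": "low",  "left": False},
    {"plant": "Hamburg",   "pay_band": "high", "left": True},
    {"plant": "Hamburg",   "pay_band": "low",  "left": False},
    {"plant": "Hamburg",   "pay_band": "high", "left": False},
]


def leave_rates(rows, column):
    """Return {category: share of employees in that category who left}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        if row["left"]:
            leavers[row[column]] += 1
    return {value: leavers[value] / totals[value] for value in totals}


def influence_score(rows, column):
    """Crude score: gap between the highest and lowest leave rate."""
    rates = leave_rates(rows, column).values()
    return max(rates) - min(rates)


# Rank the candidate columns from strongest to weakest influence.
candidates = ["plant", "pay_band"]
ranking = sorted(candidates,
                 key=lambda col: influence_score(employees, col),
                 reverse=True)

for column in ranking:
    print(column, influence_score(employees, column),
          leave_rates(employees, column))
```

In this made-up data set the plant separates leavers from stayers far better than the pay band does, which mirrors the kind of result described above: the data can point to a different driver than the anecdotes do.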
To perform a guided machine discovery using SAP Analytics Cloud, the first thing you need is access to the raw data view, as shown in Figure 2.
Access the data view to see the raw data
The Data View functionality is designed for analysts who want to look at the data from the model with their own visualizations and calculations and not rely on the person who designed the story to help them analyze the data. This is important because analysts want to see the raw data and manipulate it themselves, whereas the story designer might be targeting CEOs, who often prefer a streamlined presentation through graphical visualizations. The Data View is therefore the appropriate place for accessing this discovery feature.
When you click the Data View button in Figure 2, the screen in Figure 3 opens. Here you can click the New Machine Discovery link to start performing the guided machine discovery task. Note that you can have more than one guided discovery open at the same time, as you might want to selectively restrict which variables are used in the discovery. For example, a variable such as race or religion might exist in the data, but it might be illegal to use it or expensive to take action based on it.
Access the New Machine Discovery link
This action opens the panel on the right, as shown in Figure 4. (Note that there are two example panels in the figure.) Your task is to specify two things: the target column whose influencers you want to identify, and the columns (measures/dimensions) that should be excluded from the analysis (because they’re irrelevant).
The guided machine discovery settings to figure out what influences the discount percentage number
For example, in a hypothetical scenario you are trying to determine what factors influence a buy/no buy decision. You might remove a column called “how much money spent” as it is obvious that if they spent even one dollar, they made a buy decision. This is sometimes called a leaker in data-mining speak.
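One way to spot a leaker before running the discovery is to test whether a simple rule on a candidate column reproduces the target for every record. The following Python sketch illustrates that check. The orders, the column names (amount_spent, visits, bought), and the perfectly_predicts helper are hypothetical and are not part of SAP Analytics Cloud.

```python
# Hypothetical customer records, invented for illustration only.
orders = [
    {"amount_spent": 0.0,   "visits": 3, "bought": False},
    {"amount_spent": 19.99, "visits": 1, "bought": True},
    {"amount_spent": 0.0,   "visits": 5, "bought": False},
    {"amount_spent": 1.00,  "visits": 2, "bought": True},
]


def perfectly_predicts(rows, rule, target):
    """True if the rule reproduces the target value for every row."""
    return all(rule(row) == row[target] for row in rows)


# "Spent anything at all" gives away the buy decision: a leaker.
spent_leaks = perfectly_predicts(
    orders, lambda row: row["amount_spent"] > 0, "bought")

# A rule on the number of visits does not give the answer away.
visits_leaks = perfectly_predicts(
    orders, lambda row: row["visits"] > 2, "bought")

print("amount_spent is a leaker:", spent_leaks)
print("visits is a leaker:", visits_leaks)
```

A column that passes this test is a candidate for exclusion, because it describes the outcome rather than explaining it.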
After making your required settings in Figure 4, click the Run button and the results appear as shown in Figure 5.
The results after the guided machine discovery tool has been run
In this example, after analysis it looks as if the larger the order revenue, the more likely it is that the customer will receive a bigger percent discount. You would expect to see a larger total discount on bigger deals, but not necessarily a larger percentage discount. However, your analysis does not need to stop here.
Select one of the other bars in the graph in Figure 5, for example, Store_ID. The analysis shows that the store ID is important in determining the percent discount. When you double-click the Store_ID bar, the system breaks down the analysis by comparing the discount percentage by store, and the graph expands with the details for that metric, as shown in the chart in Figure 6.
Detailed analysis by store
Here, the chart in the middle of Figure 6 shows the stores with higher-than-average discounts. The details on the right are automatically generated and add details for the data included in the chart.
You can even select more than one column (bar) on the chart to create a chart that shows the details for more than one column at the same time. Figure 7 is the result of clicking the Product_ID and Store_ID bars in Figure 6.
Compare the details of two columns (Product_ID and Store_ID) of data against the target (percent discount)
This is a small detail, but in the new chart (the lower left section of Figure 7) that is generated by selecting the two bars (Product_ID and Store_ID), you can see the products by store on the left, with colors identifying the percent discount. This clearly shows the relationships among these factors. However, to make it even clearer, the tool auto-generates wording on the right to explain more details.
Another feature is the option to discover unexpected trends. When you click the Unexpected Values button (boxed in Figure 6), the screen expands at the bottom (Figure 8) and shows you where the values of the influencing factors were not consistent with the actual values of the discount percentage.
The sales represented by these entries might be worth investigating because they are not normal—that is, they are not the expected result. Once you flag these kinds of discrepancies, you can do more research and find out the cause, maybe with further discussions with the sales rep.
Information about the actual (existing) values compared with the predicted discount percentages
Note that there are two bars on the right of Figure 8. These are the actual values versus the predicted values of discount. When you click one, the information in that bar is detailed on the left (highlighted by the red arrow in Figure 8). In this case, the algorithm predicted that the record would have a discount rate of 0.28, but the actual record shows an existing discount rate of 0.17. This information would be helpful in identifying the sales orders to investigate that have unusually high or low discounts.
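The logic behind an unexpected value can be sketched in a few lines of Python: compare each record's actual discount with the predicted one and flag the records where the gap is large. The first record uses the 0.28 and 0.17 values from the example above. The other records, the order IDs, and the 0.05 threshold are assumptions made for illustration, and how SAP Analytics Cloud actually decides what counts as unexpected is not documented here.

```python
# Actual versus predicted discount rates. Only the first pair comes
# from the example in the text; the rest is invented for illustration.
sales = [
    {"order_id": "A-100", "actual": 0.17, "predicted": 0.28},
    {"order_id": "A-101", "actual": 0.10, "predicted": 0.11},
    {"order_id": "A-102", "actual": 0.30, "predicted": 0.12},
]


def gap(row):
    """Absolute difference between actual and predicted discount."""
    return abs(row["actual"] - row["predicted"])


def unexpected_values(rows, threshold=0.05):
    """Return rows whose gap exceeds the (assumed) threshold,
    largest gap first."""
    flagged = [row for row in rows if gap(row) > threshold]
    return sorted(flagged, key=gap, reverse=True)


for row in unexpected_values(sales):
    direction = "lower" if row["actual"] < row["predicted"] else "higher"
    print(f'{row["order_id"]}: actual discount is {direction} than predicted')
```

Sorting by the size of the gap puts the most surprising orders at the top of the list, which is a reasonable order in which to investigate them.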
Let me briefly explain the other buttons at the top of the screen in Figure 6, and what information they provide (going from left to right).
- Insight Quality: This button shows a measure of how statistically valid and repeatable the underlying math is.
- Correlated Columns: This button is used to identify relationships between factors, not factors related to the target (in this case, percent discount). For example, if one column is for Marital Status and another is In a Relationship (y/n), these would be highly correlated to each other. This is important to know statistically (but beyond the scope of this article). This feature corrects for this statistical problem automatically, thus giving more accurate results. Correcting for this is normally a very tedious preparation phase done by statisticians.
- Key Influencers: The model’s key influencers are the factors that have the strongest effect on the target. For example, you might suspect that a column named Last Name of SalesRep has nothing to do with the percent discount and you would most likely be correct, as it would not be included in the list of key influencers.
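The Correlated Columns idea can be illustrated with a plain Pearson correlation between two columns. The Python sketch below encodes the Marital Status and In a Relationship columns as 0/1 values and measures how closely they move together. The data, the 0/1 encoding, and the 0.7 cutoff are assumptions for illustration, and the sketch says nothing about how SAP Analytics Cloud performs its own correction.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical columns, one entry per customer (1 = yes, 0 = no).
married         = [1, 1, 0, 0, 1, 0, 1, 0]
in_relationship = [1, 1, 0, 1, 1, 0, 1, 0]

r = pearson(married, in_relationship)

# Assumed cutoff: treat columns above it as carrying overlapping information.
CUTOFF = 0.7
print(f"correlation = {r:.2f}, highly correlated: {r > CUTOFF}")
```

When two columns are this closely related, counting both as separate influencers would overstate their combined importance, which is why the correction matters.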
The Predictive Time-Series Forecasting Feature
Now that you know a bit about the guided machine discovery tool, the next feature to explore is the time-series forecasting feature. This data-mining algorithm performs a predictive forecast under the covers (again using an algorithm in SAP HANA’s APL as shown in Figure 1). This time it is an APL time-series algorithm.
Time-series algorithms analyze a measure over time and create a forecast for this measure considering seasonal and trend factors. A prerequisite for using this technique is to have a planning model in place with a private forecast version.
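To show what considering seasonal and trend factors means in practice, here is a minimal Python sketch. It fits a straight-line trend by least squares, averages what is left over for each position in the seasonal cycle, and adds the two together to project future periods. This is a textbook-style simplification and not the APL time-series algorithm; the history values, the cycle length of four periods, and the function names are assumptions for illustration.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...
    Returns (slope, intercept)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((t - mean_t) * (y - mean_y)
                    for t, y in enumerate(values))
    denominator = sum((t - mean_t) ** 2 for t in range(n))
    slope = numerator / denominator
    return slope, mean_y - slope * mean_t


def forecast(values, season_length, horizon):
    """Project `horizon` future periods as trend plus seasonal offset."""
    slope, intercept = fit_trend(values)
    # What remains after removing the trend is treated as seasonality.
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    seasonal = []
    for position in range(season_length):
        bucket = residuals[position::season_length]
        seasonal.append(sum(bucket) / len(bucket))
    start = len(values)
    return [intercept + slope * t + seasonal[t % season_length]
            for t in range(start, start + horizon)]


# Hypothetical measure over eight past periods (two cycles of four).
history = [11, 11, 13, 17, 19, 19, 21, 25]

# Forecast the next three periods.
predicted = forecast(history, season_length=4, horizon=3)
print(predicted)
```

The sketch needs at least one full seasonal cycle of history to work, which hints at why a forecasting tool asks for a range of past periods before it produces future values.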
In this example, I clicked the version-management icon (boxed in red in Figure 9) and copied a public forecast version to meet these prerequisites. To do this, after you click the version-management icon, click the copy icon next to the forecast version (highlighted by the top red arrow). This action opens the pop-up box in the figure in which you provide a name for your new private version that stores the results of the time-series forecast. Click the OK button in the pop-up, and the resulting new version appears on the bottom right (in this case, TIME_SERIES_FORECAST).
(Note: Private versions of the data are not visible to others unless they are later shared publicly.)
Create a private version of a time-series forecast
Now that you have set up the private version of your time-series forecast, the next step is to create a table with new settings, as shown in Figure 10. The table is needed to see the details of the generated forecast values. In addition, you can see these forecast values in relationship to the actual values. Thus, the options need to be set as identified with the arrows pointing to the right in Figure 10.
Table settings for the private version time-series forecast
The next step is optional, but I explain it so that you can compare your expected values of sales in the future versus values that the forecast algorithm generates later. To do this, you need to do some guesswork. Manually enter some estimated forecast values in the table, such as the 40.00 value shown in Figure 11. If you save the data, you submit (save) an estimate that is visible to anyone else who views this story.
Manually enter the new time-series forecast number for the private version
Now let’s see if using a predictive time-series algorithm generates a more scientific estimate. Click the forecast icon in Figure 12 to execute the predictive forecast. In the Set Predictive Forecast pop-up screen that opens, make your settings for the forecast (by Month, Day, Year, and so forth) and select the From and To dates. Enter any past periods or reference periods, if required. Once you’ve made your settings, click the Preview button to see the results.
Predictive forecast settings (accessed via the time series icon)
The screen in Figure 12 shows the basic settings that control the level of granularity used to generate the forecast, in this case, which months should be predicted (2016.10 to 2016.12). In addition, the data back to 2015.01 is used as the basis for the calculation. The Reference Period shows you the date of the last actual data that is used to forecast the future values.
Once you’ve made your settings in Figure 12 and clicked the Preview button, the screen in Figure 13 opens. Here you can review what the algorithm suggests as better values, or at least more scientific ones.
The preview of the forecast values
As you can see, the algorithm predicted a much lower value in November than the manually entered values suggested, and a higher value in October. If you like these numbers better than the ones you entered manually in Figure 11, click the OK button to use these. This action opens the new table of data in Figure 14 that includes the results of these forecast values in the story.
The new data is added from the approved forecast preview