Data likely enters your organization through many channels. For example, customers place orders through online interfaces, RFID readers record bulk shipment information, and clerks manually enter product details into a database. As a result, inaccurate and redundant information can end up stored in multiple sources. Business users rely on this data to make critical decisions, yet inconsistent data often leads to inefficiencies and poor decision-making.
A business intelligence (BI) system is designed to help users make informed decisions by providing relevant, accurate information at the right time. To reap these benefits, however, your BI application must be founded on reliable, clean data. Because companies’ information management (IM) infrastructures span heterogeneous data sources, data integration and consolidation become critical: these activities help ensure the quality and consistency of information extracted from different origins.
With SAP BusinessObjects Data Services 3.2 and the upcoming release of SAP NetWeaver Business Warehouse (SAP NetWeaver BW) 7.2, SAP has taken a substantial step toward a complete, integrated infrastructure that can incorporate any data source (structured or unstructured data from SAP or non-SAP systems) and maximize the accuracy and quality of the data reported to end users.
SAP BusinessObjects Data Services 3.2, part of the SAP BusinessObjects IM portfolio (see sidebar below), is currently available. SAP NetWeaver BW 7.2 will enter ramp-up by mid-February 2010.
This article explains how the latest version of SAP BusinessObjects Data Services can be used to cleanse master data stored in SAP NetWeaver BW and to enhance data quality, for example by enriching master data with geocoding information. Once the cleansed, geocoded data is available in SAP NetWeaver BW, reports can use the geographic information to display data on a map (see Figure 1). Consider a marketing team that wants to roll out a campaign to customers in certain locations: with the data displayed graphically, users can easily see their target areas.
Figure 1: Cleansed, enriched data displayed on a map
How the Integration of SAP NetWeaver BW and SAP BusinessObjects Data Services Works: A Detailed Example
In this article, I’ll walk through an example using customer data to illustrate how the integration of SAP BusinessObjects Data Services and SAP NetWeaver BW enhances the steps of extracting, cleansing and enriching, and loading master data (see Figure 2).
Figure 2: An overview of a processing job in SAP BusinessObjects Data Services 3.2 that extracts, cleanses and enriches, and loads data stored in SAP NetWeaver BW
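The three-stage flow in Figure 2 can be sketched in plain Python to show how each stage hands its output to the next. This is a minimal, hypothetical model, not the Data Services job engine or any SAP API: `CustomerRecord`, `run_job`, and the in-memory source and target are illustrative stand-ins for the Open Hub source, the data quality transformations, and the SAP BW target.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical flattened view of "Customer" master data; real field
    # names depend on the InfoObject and Open Hub destination definitions.
    customer_id: str
    street: str
    city: str
    region: str
    postcode: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


# A transform takes one record and returns a (possibly changed) new record.
Transform = Callable[[CustomerRecord], CustomerRecord]


def run_job(extract: Callable[[], Iterable[CustomerRecord]],
            transforms: List[Transform],
            load: Callable[[List[CustomerRecord]], None]) -> int:
    """Extract all records, apply each transform in order, then load.

    Returns the number of records handed to the load step.
    """
    records = list(extract())
    for transform in transforms:
        # The output of one step becomes the input of the next step.
        records = [transform(r) for r in records]
    load(records)
    return len(records)


if __name__ == "__main__":
    source = [CustomerRecord("C001", "1001 Sixth Av.", "Manhattan", "N.Y.", "10011")]
    target: List[CustomerRecord] = []
    # Toy "cleansing" step: uppercase the city name.
    upper_city = lambda r: replace(r, city=r.city.upper())
    count = run_job(lambda: source, [upper_city], target.extend)
    print(count, target[0].city)
```

Because records are immutable, each step produces new records, so the extracted originals stay available for comparison after the job runs.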
Step #1: Extract Master Data from SAP NetWeaver BW
First, you’ll extract the master data of the InfoObject “Customer” from SAP NetWeaver BW through the Open Hub interface via SAP BusinessObjects Data Services 3.2. These data extraction jobs are created in the Designer of SAP BusinessObjects Data Integrator (see sidebar for important prerequisites).
To transfer the data from the SAP NetWeaver BW Open Hub destination, you need to:
- Create a new datastore in SAP BusinessObjects Data Services (type: SAP BW source) that connects to the SAP NetWeaver BW system in which the Open Hub destination is located. Here, you maintain the connection parameters for the SAP NetWeaver BW system.
- Import the metadata from the source system to the datastore in SAP BusinessObjects Data Services.
- Create a data flow object as a folder where the job to process the data will be located.
- From the datastore’s imported metadata, select your Open Hub destination (created in SAP NetWeaver BW) under the node “Open Hub Tables” and drag it into the workspace of your data flow object.
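The connection parameters from the first step above are ordinary SAP logon settings. As a minimal sketch, the hypothetical Python class below models such a parameter set and applies two format checks that hold for SAP logons in general (a two-digit system number and a three-digit client). The class and field names are assumptions for illustration, not the labels of the Data Services datastore editor, and nothing here connects to a real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BwSourceDatastore:
    # Hypothetical model of an "SAP BW source" datastore definition.
    # The password is deliberately left out of this sketch.
    name: str
    application_server: str
    system_number: str  # SAP instance number, two digits, e.g. "00"
    client: str         # SAP client, three digits, e.g. "100"
    user: str

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the basic checks passed."""
        problems = []
        if not self.application_server:
            problems.append("application server is missing")
        if not (len(self.system_number) == 2 and self.system_number.isdigit()):
            problems.append("system number must be two digits")
        if not (len(self.client) == 3 and self.client.isdigit()):
            problems.append("client must be three digits")
        if not self.user:
            problems.append("user is missing")
        return problems


if __name__ == "__main__":
    ds = BwSourceDatastore("BW_SOURCE", "bwhost.example.com", "00", "100", "DS_USER")
    print(ds.validate())  # -> []
```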
Step #2: Cleanse the Data and Enrich It with Geocoding Information
After extracting the data, use SAP BusinessObjects Data Services 3.2 to improve its quality so that users can make decisions based on accurate, meaningful information.1 Data quality is enhanced primarily through four groups of transformations: address cleansing, data cleansing, geocoding, and matching.
For our example, let’s correct the addresses in the customer data using address cleansing, and organize the data by geographic context through geocoding.
Suppose our data contains this address:
1001 Sixth Av.
Manhattan, N.Y. 10011
To ensure that this address is accurate, we can use the “USA Regulatory Address Cleanse” transformation, which identifies, parses, validates, and corrects US address data according to the US Coding Accuracy Support System (CASS). Many more address cleansing transformations are available to meet the specific requirements of other regions around the world.
To correct addresses and assign postal codes, the address cleansing transformations rely on reference databases called postal directories. SAP BusinessObjects Data Services compares the extracted data with the real-world addresses in these directories and uses the matches to fix errors. Different directories are available for different requirements, such as basic data cleansing, standardization, and data enrichment.
You can customize the USA Regulatory Address Cleanse transformation by mapping the fields of its input structure (“Schema In”) to its output structure (“Schema Out”) within SAP BusinessObjects Data Services (see Figure 3). The input structure is determined by the output structure of the preceding job step. As discussed below, the latitude and longitude fields needed for geocoding have already been added to the Schema Out of the address cleansing transformation.
Figure 3: Basic maintenance of the “USA Regulatory Address Cleanse” transformation
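Conceptually, the Schema In to Schema Out mapping is a field-by-field rename plus any new output fields that a transformation populates. The following is a minimal sketch under that assumption; the field names (`CUSTOMER`, `STREET`, `CITY`, `REGION`, `POSTCODE`, the `ADDR_*` outputs, and `LATITUDE`/`LONGITUDE`) are hypothetical and do not reproduce the transformation’s actual field list.

```python
from typing import Dict, Iterable, Optional


def map_schema(record: Dict[str, str],
               field_mapping: Dict[str, str],
               added_outputs: Iterable[str] = ()) -> Dict[str, Optional[str]]:
    """Build a Schema Out record from a Schema In record.

    field_mapping: input field name -> output field name.
    added_outputs: output fields with no input counterpart (filled by a later step).
    """
    missing = [f for f in field_mapping if f not in record]
    if missing:
        raise KeyError(f"input record lacks mapped fields: {missing}")
    out: Dict[str, Optional[str]] = {dst: record[src] for src, dst in field_mapping.items()}
    for field in added_outputs:
        out.setdefault(field, None)  # placeholder until geocoding fills it
    return out


# Hypothetical mapping for the customer example; CUSTOMER is passed through unchanged.
MAPPING = {"CUSTOMER": "CUSTOMER", "STREET": "ADDR_LINE1", "CITY": "ADDR_CITY",
           "REGION": "ADDR_REGION", "POSTCODE": "ADDR_POSTCODE"}

if __name__ == "__main__":
    schema_in = {"CUSTOMER": "C001", "STREET": "1001 Sixth Av.", "CITY": "Manhattan",
                 "REGION": "N.Y.", "POSTCODE": "10011"}
    print(map_schema(schema_in, MAPPING, added_outputs=["LATITUDE", "LONGITUDE"]))
```

Unmapped input fields are dropped in this sketch, which is why the customer key is mapped explicitly to itself.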
You must specify the location of all required postal directories in the “Reference Files” option group. The cleansing transformation works more precisely if each address component is a separate field in the source structure, but you can also pass a single line containing multiple address components and let SAP BusinessObjects Data Services determine the record’s composition and cleanse it accordingly.
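To illustrate what determining a record’s composition involves for a one-line address, here is a deliberately simple, regex-based parser for US-style single-line addresses. It is an assumption-laden toy, not how Data Services parses addresses: it only handles the pattern `house street, city, state ZIP` and returns `None` for anything else, which shows why discrete input fields cleanse more precisely.

```python
import re
from typing import Dict, Optional

# Toy pattern (hypothetical): "<house number> <street>, <city>, <state> <ZIP or ZIP+4>"
_ONE_LINE = re.compile(
    r"^\s*(?P<house>\d+)\s+(?P<street>[^,]+?)\s*,\s*"
    r"(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z.]+)\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def parse_one_line(address: str) -> Optional[Dict[str, str]]:
    """Split a single-line address into components, or return None if it does not fit."""
    match = _ONE_LINE.match(address)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_one_line("1001 Sixth Av., Manhattan, N.Y. 10011"))
    print(parse_one_line("Sixth Avenue New York NY"))  # no commas or ZIP: not parsed
```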
Using the USA Regulatory Address Cleanse transformation, you can obtain the corrected address:
1001 Avenue of the Americas
New York, NY 10013-1933
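The correction from “1001 Sixth Av.” to “1001 Avenue of the Americas” can be mimicked with a toy directory lookup: normalize the input, expand abbreviations, resolve aliases (Sixth Avenue is officially named Avenue of the Americas, and Manhattan addresses use the postal city New York), and then fetch the official form and ZIP+4 from the directory. This is a hypothetical sketch of the idea only; the real transformation works against licensed postal directories and CASS rules, and the single directory entry below simply copies the article’s example.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins for postal directory content (not real reference data).
ABBREVIATIONS = {"AV": "AVENUE", "AVE": "AVENUE", "ST": "STREET"}
STREET_ALIASES = {"SIXTH AVENUE": "AVENUE OF THE AMERICAS",
                  "6TH AVENUE": "AVENUE OF THE AMERICAS"}
CITY_ALIASES = {"MANHATTAN": "NEW YORK"}
# (house number, street, city, state) -> (official street, postal city, ZIP+4)
DIRECTORY: Dict[Tuple[str, str, str, str], Tuple[str, str, str]] = {
    ("1001", "AVENUE OF THE AMERICAS", "NEW YORK", "NY"):
        ("Avenue of the Americas", "New York", "10013-1933"),
}


def _norm(text: str) -> str:
    # Uppercase, drop periods and commas, collapse whitespace.
    return " ".join(re.sub(r"[.,]", "", text.upper()).split())


def cleanse(street_line: str, city: str, state: str) -> Optional[Tuple[str, str]]:
    """Return corrected (line 1, line 2), or None if the address is not in the directory.

    No input ZIP is taken: in this toy, the directory supplies the corrected ZIP+4.
    """
    tokens = _norm(street_line).split()
    if len(tokens) < 2:
        return None
    house = tokens[0]
    street = " ".join(ABBREVIATIONS.get(t, t) for t in tokens[1:])
    street = STREET_ALIASES.get(street, street)
    city_key = CITY_ALIASES.get(_norm(city), _norm(city))
    state_key = _norm(state)
    hit = DIRECTORY.get((house, street, city_key, state_key))
    if hit is None:
        return None
    official_street, postal_city, zip4 = hit
    return f"{house} {official_street}", f"{postal_city}, {state_key} {zip4}"


if __name__ == "__main__":
    print(cleanse("1001 Sixth Av.", "Manhattan", "N.Y."))
```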
After you cleanse the customer data, you can geocode it, that is, assign geographic coordinates to each address, and then display the data graphically.
The configuration of the “Geocoding” transformation follows the same principle as that of the address cleansing transformation. Its Schema In input structure is taken from the Schema Out structure of the USA Regulatory Address Cleanse transformation, which is the preceding job step. (As Figure 3 shows, the latitude and longitude fields have already been added to the Schema Out of the USA Regulatory Address Cleanse transformation.) On output, your data contains latitude and longitude information, which your organization can combine with population figures and other regional data to target specific marketing campaigns, for example. SAP NetWeaver BW can store this information in the master data, allowing you to create reports or maps that show data in a geographic context.
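Once each customer carries latitude and longitude, the campaign-targeting scenario from the introduction becomes a distance query. The sketch below is an illustration under stated assumptions, not part of Data Services: `geocode` fills coordinates from a hypothetical address-to-coordinate lookup standing in for the Geocoding transformation, and `within_radius` selects customers near a campaign location using the haversine great-circle formula, which treats the Earth as a sphere with a mean radius of 6,371 km.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


@dataclass
class GeoCustomer:
    customer_id: str
    address: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def geocode(customers: List[GeoCustomer],
            lookup: Dict[str, Tuple[float, float]]) -> None:
    """Fill latitude/longitude in place from a hypothetical address -> (lat, lon) lookup."""
    for c in customers:
        coords = lookup.get(c.address)
        if coords is not None:
            c.latitude, c.longitude = coords


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(customers: List[GeoCustomer], center: Tuple[float, float],
                  radius_km: float) -> List[str]:
    """IDs of geocoded customers no farther than radius_km from center."""
    lat0, lon0 = center
    return [c.customer_id for c in customers
            if c.latitude is not None and c.longitude is not None
            and haversine_km(lat0, lon0, c.latitude, c.longitude) <= radius_km]
```

For example, `within_radius(customers, (lat, lon), 25.0)` returns the IDs of geocoded customers within 25 km of the given point; customers whose addresses could not be geocoded are skipped rather than guessed.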
Step #3: Load Cleansed and Enriched Data Back into SAP NetWeaver BW
To close the loop, and to work with the enriched data, you must transfer the data back to SAP NetWeaver BW. To do so, hand the data over to an SAP NetWeaver BW 7.x DataSource using an InfoPackage.2 This procedure is similar to the one for connecting SAP NetWeaver BW as a source to SAP BusinessObjects Data Services; the difference is that you now create a datastore in SAP BusinessObjects Data Services of type SAP BW target.
After you import the metadata, the SAP NetWeaver BW DataSource appears under the node “Master Transfer Structures.” SAP BusinessObjects Data Services automatically creates the InfoPackage in SAP NetWeaver BW that is needed to load the cleansed data.
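Before the load, each cleansed record has to line up with the fields of the target DataSource. The helper below is a hypothetical pre-load check, sketched under the assumption that records arrive as dictionaries keyed by field name; it does not call any SAP API, and the field names used with it are illustrative. The design choice is to fail fast on a missing field rather than pass incomplete rows on to the load.

```python
from typing import Dict, List, Sequence, Tuple


def to_datasource_rows(records: List[Dict[str, object]],
                       datasource_fields: Sequence[str]) -> List[Tuple[object, ...]]:
    """Order each cleansed record's values to match the target DataSource's fields.

    Extra fields in a record are ignored; a missing field raises ValueError.
    """
    rows = []
    for i, rec in enumerate(records):
        missing = [f for f in datasource_fields if f not in rec]
        if missing:
            raise ValueError(f"record {i} is missing DataSource fields: {missing}")
        rows.append(tuple(rec[f] for f in datasource_fields))
    return rows


if __name__ == "__main__":
    # Hypothetical field list for the customer example.
    fields = ["CUSTOMER", "STREET", "CITY", "POSTCODE", "LATITUDE", "LONGITUDE"]
    record = {"CUSTOMER": "C001", "STREET": "1001 Avenue of the Americas",
              "CITY": "New York", "POSTCODE": "10013-1933",
              "LATITUDE": None, "LONGITUDE": None}
    print(to_datasource_rows([record], fields))
```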
A Powerful Combination
The enhanced integration between SAP BusinessObjects Data Services 3.2 and SAP NetWeaver BW 7.2 is an exciting development. Organizations will be able to integrate the data cleansing and enrichment functions of SAP BusinessObjects Data Services into their SAP NetWeaver BW data flows. The combined power of these solutions will result in more accurate, meaningful data that business users can leverage to make more informed decisions more efficiently.
For more information on SAP BusinessObjects Data Services, please visit www.sap.com/usa/solutions/sapbusinessobjects/large/information-management/data-services.
Heiko Schneider is the Solution Manager for SAP NetWeaver Business Warehouse (SAP NetWeaver BW). He joined SAP 10 years ago and worked as a Senior Consultant for SAP NetWeaver BW in various areas and positions at customer sites. Currently, Heiko is working on SAP’s partnerships with Teradata and HP (around HP Neoview) and is involved in rollout activities for SAP NetWeaver BW 7.2.
1 At this point, you could also merge non-SAP data with the customer master data.
2 The DataSource does not have to have a flat structure; segmented DataSources are also possible.