I recently moderated a web forum with SAP's Ginger Gatling, Corrie Brague, and Werner Daehn on improving and monitoring data quality with SAP Data Services. Ginger, Corrie, and Werner took questions on data cleansing support for European standards, using SAP NetWeaver BW vs. SAP Data Services, integration with other SAP applications, and other topics.
For the full Q&A, you can view the questions and Ginger, Corrie, and Werner's responses in the IT Forum, or read excerpts from the transcript of the Q&A below.
Scott Priest (moderator): Welcome to today's Forum on data quality and SAP Data Services.
I'm happy to have SAP's Ginger Gatling here, joined by two SAP Data Services gurus -- Corrie Brague and Werner Daehn -- from the new Enterprise Information Management for SAP book to answer your questions during the next hour.
Ginger is an author for SAP Professional Journal and SAPexperts, and has been a driving force behind this new EIM book. She's joined today by Corrie Brague, Director of Information Quality and Insight (IQI) Product Management for SAP, and Werner Daehn, Product Manager for SAP Data Services and manager of the Data Services wiki on SCN.
Welcome, Ginger, Corrie & Werner, and thanks for joining us today! There are already plenty of questions waiting, so we'll get started right away.
MaureenCadden: Has SAP come out yet with a version of Data Services that works with Internet Explorer 9? If not, is there an expected date?
Corrie Brague: The Data Services (and Information Steward) 4.1 releases, now in Ramp Up, support Internet Explorer version 9.
For more information on the Data Services 4.1 Ramp Up program, please visit here.
eccard-von-bentivegni: How well are legal forms in company names supported?
The desire is to have the most common legal forms of Europe in a table with two or three columns: the abbreviation of the legal form, the whole name (long name) of the legal form, and maybe country of usage (a so-called synonym table for companies).
I could basically use the search_replace function in a data flow to standardize the legal form of company names, like 'AG' for 'Aktiengesellschaft' in Germany, but I have to build my own table with all the commonly used variants of that entry, like 'AG' for 'Aktienges.', 'AG' for 'Aktiengesellsch.', 'AG' for 'Aktiengesell.', etc.
Further, if I want to standardize legal forms with several words, like 'PLC' for 'Public Limited Company' or 'LLP' for 'Limited Liability Partnership', then I have to use the comparison flag 'SR_STRING' instead of 'SR_WORD'. The negative effect is that it will also change parts of words, which is a problem mainly for German with its sometimes very long words; e.g., 'Baumschulenoberaufseherschuhbendel' is changed to 'BSnoberaufseherschuhbendel' if the change from 'Baumschule' to 'BS' is activated. What I want to say is that Data Services often works very well for the US but not for some European markets, as in this case. Which data quality support do we have for special requirements from European markets like France, Spain, Germany, Italy, etc.?
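The word-versus-substring problem described here can be illustrated outside Data Services. The following is a minimal Python sketch, not Data Services code: the synonym table, its entries, and both function names are assumptions made for illustration. It contrasts replacing a variant anywhere in the string with replacing it only when it stands as a whole word or phrase, which also covers multi-word legal forms.

```python
import re

# Hypothetical synonym table: (variant, standard abbreviation, country).
# The entries are illustrative only, not SAP-supplied content.
LEGAL_FORMS = [
    ("Aktiengesellschaft", "AG", "DE"),
    ("Aktiengesellsch.", "AG", "DE"),
    ("Aktienges.", "AG", "DE"),
    ("Public Limited Company", "PLC", "GB"),
    ("Limited Liability Partnership", "LLP", "GB"),
    ("Baumschule", "BS", "DE"),
]


def replace_substring(name: str) -> str:
    """Naive replacement anywhere in the string (also changes parts of words)."""
    for variant, abbreviation, _country in LEGAL_FORMS:
        name = name.replace(variant, abbreviation)
    return name


def replace_whole_phrase(name: str) -> str:
    """Replace a variant only when it is not embedded in a longer word."""
    # Longest variants first, so a longer phrase is tried before any
    # shorter variant that might be contained in it.
    for variant, abbreviation, _country in sorted(
        LEGAL_FORMS, key=lambda row: len(row[0]), reverse=True
    ):
        # (?<!\w) and (?!\w) require a non-word character (or the string
        # boundary) on both sides; this also works for multi-word variants.
        pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
        name = re.sub(pattern, abbreviation, name, flags=re.IGNORECASE)
    return name


print(replace_substring("Baumschulenoberaufseherschuhbendel"))
print(replace_whole_phrase("Baumschulenoberaufseherschuhbendel"))
print(replace_whole_phrase("Acme Public Limited Company"))
```

In this sketch the long compound word is left untouched by the whole-phrase variant, while the multi-word legal form is still abbreviated. The variants remain entries in a table, so extending coverage is a matter of adding rows rather than changing logic.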
Maybe I can cover 80% of the requirements with the easy-to-use search-replace functionality, which is often acceptable as a first step. But we need a scalable way to extend the solution without throwing away all the old work and rebuilding it in a completely different, much more complicated way. We need an acceptable way to reach the 100% solution if the customer wants it, maybe after the first step of 80%.
What about DQM for SAP solutions, especially regarding the demand for standardizing company names, also with a view to getting better matches later in the process? Currently we leave the customer alone with this subject in SAP CRM or SAP ERP. The customer is responsible for saving the company name in his own way. Later on, we don't find the customer in the matching process because the company names are not similar enough, only because the same legal form of the same company is written in different ways. Of course I could build a special solution for each company, but such fundamental problems should be covered by the standard solution. The customer is not always willing to pay so much for products if fundamental parts are not covered and must first be developed. It is much better to say that something must be configured, like adding some entries to a table; saying that it must be programmed is a negative point when selling this software.
Corrie Brague: Great questions. I will try to respond to your questions in order.
The shipped Name, Title, and Firm Cleansing Package for use with the Data Cleanse transform of Data Services includes standards (along with variations of those standards) for Firm or Company names. Our Global Name, Title and Firm Cleansing Package includes support for countries around the globe, including European countries. However, the Cleansing Package does not necessarily standardize on the legal forms of company names. You can, however, review the SAP-supplied standards and extend them to better meet your business needs.
The Data Cleanse transform goes further than simple search and replace in terms of its logic for parsing and standardization. There are out of the box capabilities for global name, title and firm parsing and standardization as mentioned, along with emails, dates and phone numbers. Additionally, there is the capability to build custom Cleansing Packages using Cleansing Package Builder (check out tutorials here for more information).
Additionally, we have global address cleansing and geocoding capabilities that meet unique requirements and local standards for European countries (e.g. France, Spain, Germany, Italy, etc). With our latest Data Services 4.1 release (now in Ramp Up), we achieved La Poste Ratification of our address cleansing capabilities and made several other EMEA focused enhancements to better meet local standards for the UK, Netherlands, Germany, Austria, Liechtenstein and Switzerland.
On the topic of DQM for SAP, currently firm standardization capabilities are not a part of that solution (out of the box). However, this is an item that is being considered for a future release.
eccard-von-bentivegni: Question: When is the HANA search functionality available in a match-transform in Data Services? Actually the match transform does not cover the requirements of a lot of customers in Europe.
Use case: I have a local customer (an insurance company) that would like to get better quality with Data Services. Over 90% of the insurer's 2 million customers live in the same region, and its call center first checks whether the caller is already a customer. In over 50% of cases the customer doesn't know all digits of their post code by heart, only the first 2 digits.
With the current Match transform, BODS will not find a match within a couple of seconds, and if the call center enters the wrong post code there is no match at all. The consequence is that a new, incorrect data record is created; ergo, with Data Services the customer only gets better quality if the input is already very good and complete. Yes, it's possible to configure it to be more tolerant, but mostly at a performance cost that the customer/end user will not tolerate.
We need a better (no break keys) and more user friendly transform for matching cases. When will it be available in the future?
Corrie Brague: Another great post. The Match transform in SAP Data Services does favor batch match scenarios for duplicate/relationship detection and consolidation scenarios versus a transactional search scenario. You are also correct that you can configure Data Services to be more error tolerant, but the cost can be performance. We are currently exploring both simplification and improved support for search scenarios for our Match transform.
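The trade-off between break keys and error tolerance can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not a description of how the Match transform works internally: the records, field names, the two-digit post code prefix used as break key, and the 0.85 threshold are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical customer records; the field names are illustrative.
CUSTOMERS = [
    {"id": 1, "name": "Anna Schmidt", "postcode": "69115"},
    {"id": 2, "name": "Anna Schmitt", "postcode": "69117"},
    {"id": 3, "name": "Jonas Weber", "postcode": "69115"},
    {"id": 4, "name": "Anna Schmidt", "postcode": "10115"},
]


def build_index(records, prefix_len=2):
    """Group records by a break key: the first digits of the post code."""
    index = defaultdict(list)
    for record in records:
        index[record["postcode"][:prefix_len]].append(record)
    return index


def search(index, name, postcode_prefix, threshold=0.85):
    """Compare the name only against records that share the break key."""
    hits = []
    for record in index.get(postcode_prefix, []):
        score = SequenceMatcher(None, name.lower(), record["name"].lower()).ratio()
        if score >= threshold:
            hits.append((record["id"], round(score, 2)))
    # Best-scoring candidates first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


index = build_index(CUSTOMERS)
print(search(index, "Anna Schmidt", "69"))
```

The break key keeps each lookup cheap because only one group is compared, but a wrong prefix means the true record is never considered, which is the failure mode described in the question. Widening or dropping the key makes the search more tolerant at the cost of more comparisons per lookup.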
And, did you mention HANA… Would love to validate some concepts with you. If you are interested, please contact me at email@example.com.
bradschroeter: We use ECC DQM, which interfaces with DS for customer address standardization and duplicate check. We've recently run into an issue where we're having to do ECC custom code or DS custom code to solve for an issue where DQM cannot control the type of records created in break key table nor control the records sent to DS (see here). Are there any future plans to solve for this?
Corrie Brague: Brad, we are looking into the possibility of addressing this item in an upcoming support pack. We are currently working with the SAP BAS team to identify the best solution.
I would encourage everyone to review/vote on ideas and submit their own product ideas at SAP Idea Place.
Here is the direct link for Data Services and the link for Information Steward is here.
bradschroeter: What about a DS address cleanse which can calculate time zone? Reference here.
Corrie Brague: At this time, there are no plans to enhance DS Address Cleanse to calculate a time zone. However, we appreciate the idea submission and will actively track it for consideration in future releases based on community interest.
Brad, it would help to understand the use case for the time zone calculation (what business problem does this feature solve?) so that we can better gauge its value from your perspective. Please communicate here or on the original Idea Place post. Thanks!
bradschroeter: Where can I get details on Match Review (new business user UI to review match results, provide resolution, and set master record) in Information Steward 4.1?
Corrie Brague: Match Review is available in the Information Steward 4.1 release currently in Ramp Up (for more information visit the SAP Information Steward 4.1 Ramp-Up Info Page).
To find out more information on Match Review in SAP Information Steward 4.1, check out the 4.1 Documentation (see instructions on how to access here).
I am a BW/BO consultant. In my previous project, I worked together with a group of BODS developers on retrieving G/L transaction data from BW, calculating some highly complex KPIs based on the data, and putting the transformed data back into BW via UD Connect for reporting in BEx. We thought of BODS as little more than an ETL/programming tool. In this case, can you let me know whether I have overlooked any advanced aspects of BODS that could have been put to better use? Besides, I wonder whether you have any real-world examples where BW and the EIM tool suite, particularly BODS, co-exist or complement each other?
Thanks and Regards,
Werner Daehn: That's an unpleasant one because you have two solutions - the ETL part of BW and DataServices as ETL tool - which both have their own set of advantages. BW-ETL comes for free and is deeply integrated, while DataServices is way more powerful.
Obviously we want to consolidate that but the current distinction is:
1. Source is ERP & Target is BW? No need for DataServices.
2. You have just a few external sources? Still no need to invest in DataServices.
3. You have to cleanse the data and standardize on the values for SAP data and non-SAP data? You have to use DataServices then.
4. You have many external sources to be loaded into BW, preferably with lots of changes? DataServices is just perfect.
Taking data out of BW, doing calculations outside, and putting it back is possible, but it depends a lot on the calculations. If it is statistics, you won't be happy with DataServices; if it is lookups and the like, you would be.
Ginger Gatling: Also - here is a link that describes the support in Data Services for the BW extractors.
Have you looked at the data quality capabilities and text data processing features within Data Services? These are both very robust and add further capabilities to Data Services. The EIM book has a chapter on the data quality features and a chapter on text data processing. Corrie, who wrote the data quality chapter, is here online. Here is the link to the book.
sankolluribobj: Question: When I try to use a Persistent Data store table in a lookup function, I am not able to select a table from Persistent Data store using Lookup_EXT function. I am using Data services 3.2.
Werner Daehn: One second, let me try with my version....
Okay, I think I figured out what you missed - you have to save the dataflow to convert the template table to a permanent table and then you can use it.
In a persistent cache datastore you cannot import tables - there are none - you can add new template tables only. But as template tables might change their structure constantly, we do not allow lookup_ext to use template tables, just permanent ones.
With persistent cache, when you save the dataflow, the table is instantiated as a regular object, even as a permanent table.
Scott Priest (moderator): What SAP Data Services applications support data cleansing and transformation?
Corrie Brague: SAP Data Services includes data integration, data quality, and text data processing capabilities. Data Services allows you to integrate, transform, improve, and deliver trusted data to critical business processes across the enterprise for both SAP and non-SAP systems.
SAP Data Services data cleansing capabilities include: parsing, standardization, correction, enrichment, matching (de-duplication) and best record consolidation. There are multiple transforms available that support these capabilities, including Global Address Cleanse, US Regulatory Address Cleanse, Data Cleanse (non-address data cleansing), Match and Associate transforms. Basic data transformation capabilities are also available via functions like search and replace or by leveraging look up tables.
Additionally, there is an integration provided for data quality management in SAP applications, specifically in SAP ERP, SAP CRM, and SAP Master Data Governance. The embedded solution is called SAP Data Quality Management, version for SAP solutions, and it provides a transparent, out-of-the-box data quality solution within SAP applications. The key capabilities include address cleansing, address enhancements, and duplicate checking. As you enter a new business partner in the SAP application, an immediate check is done to parse names, cleanse the address, and check for duplicates.
Ginger Gatling: You can learn more about the transforms in the EIM book and also on the SCN page for Data Services.
Scott Priest (moderator): Werner, I know you do a lot of work on SCN, could you post a few links to resources you're involved with there?
Werner Daehn: Sure, there are three areas I would like to highlight.
The root page of all is this.
If you are new to DataServices, we have put together a getting started page; the video sub-page in particular is well received. So much so, actually, that we should redo it with somebody better looking and in better quality. Anyway, different topic.
The EIM Use Case page is there to give you ideas on when to use which EIM product, what else can be done, and how.
Scott Priest (moderator): What's the timeline to get the data quality piece working, and what skills does the team need?
Corrie Brague: The timeline to implement data quality is dependent on the type and complexity of your data. We have blueprints and wizards available to get you started quickly on common cleansing, matching, consolidation and enrichment use cases. We have seen data quality implementations (production ready for real-time, point-of-entry data cleansing via the web) occur in just over a week. To leverage the data quality transformations in Data Services, the team should be familiar with the core concepts of data quality. Chapter 8 of the Enterprise Information Management with SAP book will help to achieve this.
The good news is that Data Quality routines for cleansing and matching are provided out of the box via transformations within Data Services. No more custom programs/scripts for data cleansing/matching. For each transform, there are also transform configurations, which are preconfigured best practices in terms of input fields, output fields, and option configuration for a particular use case or type of data.
In addition, Data Services Blueprints are available. A blueprint is a sample SAP Data Services job that is configured with best practices to solve a specific scenario. Each blueprint is an end-to-end job that includes sample data and may be run in the customer environment with only a few modifications. Some jobs include batch data flows and some include real-time data flows; some jobs include party data and some include product data. Data quality jobs include structured data and text data processing jobs include unstructured data.
PaulRouthier: Is there a 'Preferred' approach to the use of Data Services when integrating with NW MDM or Master Data Governance (MDG) for Vendors and/or Customers re: Cleansing and de-duping?
Corrie Brague: Why yes there is, Paul.
There is an integration provided for data quality management in SAP applications, specifically in SAP ERP, SAP CRM, NW MDM and SAP Master Data Governance. The embedded solution is called SAP Data Quality Management, version for SAP solutions, and it provides a transparent, out-of-the-box data quality solution within SAP applications.
The key capabilities include address cleansing, address enhancements, and duplicate checking for customer and vendor data domains. For example, as you enter a new business partner, vendor, or customer in the SAP application, an immediate check is done to parse names, cleanse the address, and check for duplicates.
For more information, check out these product tutorials.
When I try to save Data Flow in my Designer, I am getting the following error: "Cannot Save the current Model". I am using SAP Data Services 3.2.
Werner Daehn: The first answer to that is, it should never happen. Never ever. It means that the textual representation of what you have drawn cannot be parsed again, so it could not read what it wants to save.
I haven't seen this error recently, not even in 3.2, but in the past there were some hotspots: lookup_ext() and other elements called as a new function call were not escaped properly or put in quotes, which they need because they are reserved words.
As it seems you can reproduce it, please send this info to support so we can fix that class of problem entirely.
Scott Priest (moderator): Could you provide a little more information about data matching and enrichment? How does that work?
Corrie Brague: In terms of data matching, duplicate records often exist in one or more source systems. The goal of matching is to determine whether records refer to the same entity. This involves evaluating how well the individual fields, or attributes, of the records match each other. Data Services Data Quality Management matching capabilities allow you to use various fields for comparisons, to compare records in multiple directions, and then join the intersections. Data quality management employs powerful matching algorithms to account for data entry errors, character transposition, and other data inaccuracies to match records.
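The idea of scoring records field by field while tolerating typos and transposed characters can be sketched as follows. This is a minimal Python illustration, not the algorithm Data Services uses: it swaps in the optimal string alignment distance (an edit distance that counts an adjacent transposition as one edit), and the field names, weights, and 0.9 threshold are hypothetical values chosen for the example.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance that counts an adjacent transposition as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # Two adjacent characters swapped.
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[rows - 1][cols - 1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


# Hypothetical field weights; real match rules need tuning per data set.
WEIGHTS = {"name": 0.5, "street": 0.3, "city": 0.2}


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of the per-field similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Treat two records as the same entity when the score is high enough."""
    return match_score(rec_a, rec_b) >= threshold


a = {"name": "John Smith", "street": "12 Main Street", "city": "Boston"}
b = {"name": "Jonh Smith", "street": "12 Main Street", "city": "Boston"}
print(is_match(a, b))
```

Weighting fields separately lets a small error in one attribute be outweighed by exact agreement in the others, which is the general intuition behind comparing individual fields rather than whole records.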
In terms of data enrichment or enhancement, the out of the box enhancement options in Data Services include:
- Providing full international postal information
- Assigning longitude and latitude information to records
- Assigning geospatial information such as tax jurisdiction identifiers
- Assigning gender and prenames
Data Services also supports using 3rd party referential data (e.g., demographic data) to enhance records via our matching capabilities. Blueprints are available for some common data content providers.
PaulRouthier: Are there any plans to 'beautify' the Match Results achieved in a Data Services Batch job to allow a 'User' to see the match results coming from the Defined Match Rules?
Corrie Brague: With the Information Steward 4.1 release, there is a new business user UI to review match conflicts, to provide resolution on those conflicts, and make changes to the master record within a match group. The new Match Review capabilities include task management and status reporting capabilities.
Match Review is available in the Information Steward 4.1 release currently in Ramp Up (for more information visit the SAP Information Steward 4.1 Ramp-Up Info Page). To find out more information on Match Review in SAP Information Steward 4.1, check out the 4.1 Documentation (see instructions on how to access here).
Scott Priest (moderator): Thanks to all who joined us in the Forum today. A digest of all the questions will be available here in the IT Forum and in Insider Learning Network's IT Group.
You can find Ginger's contributions to SAPexperts here.
Although this Forum is now closed, you can post your own questions at any time in the IT Forum. Simply log in to Insider Learning Network, go to the IT Forum and select the "New Thread" button at the top of the page.
For more on data management and SAP's EIM roadmap, you can also listen in on our conversations with Ginger Gatling and EIM for SAP authors, with podcasts from Ginger Gatling on an overview of EIM, Ina Felsheim on data management and EIM, and Andreas Engel on SAP's approach to ECM - all topics covered in detail in the EIM for SAP book.
And once again, thanks to SAP's Ginger Gatling, Corrie Brague, and Werner Daehn for taking these questions today. Thank you all for joining us!