Authored by Ashvinder Rana, Data Migration Lead, Utopia, Inc.
Data Cleansing – an interesting term… what does it actually mean? In simple terms, it means the “clean-up” or scrubbing of legacy data to ensure that accurate, consistent and usable data is migrated to a new system (e.g. SAP CRM&B).
To identify the data elements that need to be “cleansed” in the legacy system, “data profiling” is executed on various sets of legacy data files, which are analyzed with the help of tools like SAP® BusinessObjects™. In this process, various checks are performed on the data –
- Pattern Analysis – helps identify the various data patterns that exist in the current legacy system for a particular field. For example, the Driver's License field showed more than 95 patterns of data that reside in legacy today. To overcome this, business rules will be defined on how to interpret and migrate the data accurately.
- Completeness Check – some elements are required in applications like SAP®; therefore, a completeness analysis helps identify where key data may be missing and needs to be updated in legacy.
- Duplicate Check – identifies duplicate data, such as duplicate meters. If any are found, de-duplication rules can be defined prior to conversion so that the target system contains clean data.
- Accuracy Check – for example, a simple accuracy check on the email-id field is to evaluate whether the value contains an “@” sign. If it does not, the data needs to be cleansed in the legacy system so that the information is accurate and will not cause errors during conversion to the new system. These inaccuracies could arise from typos, spelling errors or a lack of naming standards, for example.
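To make the four checks concrete, here is a minimal sketch in plain Python. It is only an illustration, not how SAP® BusinessObjects™ performs profiling. The record layout, the field names (`meter_id`, `license`, `email`) and all function names are assumptions invented for this example.

```python
from collections import Counter


def pattern_of(value):
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    shape = []
    for ch in value:
        if ch.isalpha():
            shape.append("A")
        elif ch.isdigit():
            shape.append("9")
        else:
            shape.append(ch)  # keep punctuation and spaces as-is
    return "".join(shape)


def pattern_analysis(records, field):
    """Count how many non-empty values share each shape."""
    values = ((r.get(field) or "").strip() for r in records)
    return Counter(pattern_of(v) for v in values if v)


def completeness_check(records, required_fields):
    """Return, per required field, the indexes of records missing a value."""
    return {
        f: [i for i, r in enumerate(records) if not (r.get(f) or "").strip()]
        for f in required_fields
    }


def duplicate_check(records, key_field):
    """Group record indexes by normalized key and keep groups larger than one."""
    seen = {}
    for i, r in enumerate(records):
        key = (r.get(key_field) or "").strip().upper()
        if key:
            seen.setdefault(key, []).append(i)
    return {k: idx for k, idx in seen.items() if len(idx) > 1}


def accuracy_check_email(records, field):
    """Flag non-empty email values that contain no '@' sign."""
    flagged = []
    for i, r in enumerate(records):
        value = (r.get(field) or "").strip()
        if value and "@" not in value:
            flagged.append(i)
    return flagged


# Hypothetical legacy extract, made up for illustration only.
records = [
    {"meter_id": "M-1001", "license": "D1234567", "email": "ann@example.com"},
    {"meter_id": "m-1001", "license": "D-123-4567", "email": "bob.example.com"},
    {"meter_id": "M-1002", "license": "", "email": "cy@example.com"},
]

print(pattern_analysis(records, "license"))
print(completeness_check(records, ["license"]))
print(duplicate_check(records, "meter_id"))
print(accuracy_check_email(records, "email"))
```

In this sketch, the duplicate check normalizes case and surrounding whitespace before comparing keys. That normalization is itself a simple de-duplication rule, and a real project would agree such rules with the business before conversion.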
As an ongoing task, depending on the volume and nature of the cleansing activity, the data elements identified for cleansing are handled in one of three ways: cleaned up manually by the business folks, fixed programmatically during the extract process, or managed via business rules applied during the conversion process.