As a software consultant specializing in data migrations, the idea of addressing and improving data quality during migration frequently is brought up. I mean, we are migrating core business objects with business rules at the field level… isn’t this a golden opportunity to improve on our data quality?
The answer quite simply is yes. But just how far can you go? A true data quality project is…well…a project. With a data migration project, DQ is not front and center. Moving the data to the target system is the priority, while trying to improve the quality at the same time will certainly add work.
You know what? The truth is this is a golden opportunity for the business to address and improve data quality. The truth is that without much added ETL effort, you can definitely improve your data quality during the migration process. The business owners of the data will have an increased level of effort by enhancing business rules, but the delta work pays huge dividends. I’ve adopted three basic rules and reinforce them with the business as we embark on ETL specifications and mappings.
First, each data mapping session
is always preceded by data profiling of the object. Show the business what data lives and breathes (or died) in the legacy system. Second, be specific and identify areas for improvement. A good data migration consultant can identify and point these out easily with the results of the data profiling always available. Lastly, be clear about what cleansing can be done via the transformation effort vs. a cleansing process outside the ETL loop.
For example, take a material master object. Product types, product groups, and other ‘code value’ fields can easily be cleaned up with a cross-reference table deployed during the transformation. Whether this cross reference table has tens of entries to simply map value A to value B, or hundreds of entries to consolidate or further delineate values is no additional work during transformations, and a powerful data quality weapon.
Cleansing outside the ETL loop, for example, would be standardizing text descriptions, or validating city, state, zip code combinations. Good software can cover many of the ‘outside the ETL loop’ cleansing. I‘ve been using SAP’s BusinessObjects software as an ETL/DQ tool and it handles all of the examples above inside the ETL loop. Now that’s DQ power!
An email has been sent to: