Data Profiling - A Quick Primer on the What and the Why of Data Integration
Published Date: Apr 12, 2011
Abstract
Databases in most companies have evolved in an ad hoc manner, which has resulted in information silos. Companies therefore do not have a unified view of their customers, resulting in missed business opportunities or increased cost of operations. Data integration addresses those issues but poses data verification challenges, since the source data are spread across diverse databases. Most data integration and migration projects overshoot their time and cost estimates because of the effort expended to understand the source data. Data profiling automates the identification of problematic source data, inconsistencies, redundancies, and inaccuracies. Data profiling also provides a factual foundation on which data can be cleansed and then consolidated before integration.
Excerpts from the Paper
A company’s database contains information that touches most aspects of its business activity: market data, customer information, accounting information, production details, sales records, billing details, collection details, personnel records, salary records, and so on. The company uses this data for various business decisions, so it is imperative that the data in the database be consistent, accurate, and reliable. Since the costs of poor data quality are high, companies are increasingly “profiling data” to check its quality and suitability for business. Data profiling uses “analytical techniques to discover the true content, structure, and quality of data.”
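To make the idea of profiling concrete, the following is a minimal sketch in Python, not a method taken from the paper. It assumes the source data has already been loaded as a list of records, and the names `profile_column`, `profile_table`, and the sample `customers` data are hypothetical. For each column it counts missing values, distinct values, and the data types actually present, which are three of the simplest signals of inconsistency and redundancy.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, missing values, distinct values, value types."""
    # Assumption for this sketch: None and blank strings both count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    type_counts = Counter(type(v).__name__ for v in present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(type_counts),
    }


def profile_table(rows):
    """Profile every column found in a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical customer records, as if merged from two source systems.
customers = [
    {"id": 1, "email": "a@example.com", "zip": "02139"},
    {"id": 2, "email": "", "zip": 2139},
    {"id": 2, "email": "b@example.com", "zip": None},
]

report = profile_table(customers)
for column, stats in report.items():
    print(column, stats)
```

In this sample, the report would flag a repeated `id` (fewer distinct values than rows), a blank `email`, and a `zip` column that mixes text and numeric values. These are the kinds of findings that would be reviewed and cleansed before consolidation.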

