Ever since data governance gained prominence, banks have realized that personalizing products requires not only faster access to dynamic data but also accurate and reliable data. Therefore, banks are increasingly directing their efforts toward building sound data management practices.
One of the key objectives banks are aiming to achieve is establishing lineage and traceability of data elements. Currently, data programs are predominantly led by regulatory requirements such as BCBS 239, CPG 235, MiFID, and GDPR. These mandated guidelines are further nudging banks to prioritize data lineage across business lines. In the process, banks are spending significant time and effort on mapping end-to-end data flows.
What is data lineage?
As per the DAMA* dictionary, here are a few definitions of data lineage:
- The transfer of data between systems, applications, or data sets.
- A description of the pathway from the data source to their current location and the alterations made to the data along the pathway.
*DAMA is a leading organization for data professionals involved in conceptualizing a complete manual on data management standards and practices. (DAMA stands for the Data Management Association)
The above definitions make it amply evident that data lineage is a cradle-to-grave journey of data. This is where there is an opportunity for banks to explore different facets of data lineage for identified data elements.
Note: Data Elements (DEs) are critical data points that are published to regulators as part of periodic reporting. DEs are also used by the business for strategic decision-making and oversight.
One of the foundational objectives of establishing sound data governance is data lineage mapping. Identifying data elements and documenting their lineage are key inputs to understanding the data quality and data controls landscape.
Banks are now focusing on creating a clearly defined scope for data lineage. This scope runs from the upstream channels at the first point of data capture, through the system of record (i.e., the system that masters and maintains the data), to the final point of data consumption. It entails documenting the underlying business processes, functionalities, business rules, and integration layers that dovetail into the overall data flow.
HCLTech has participated in multiple such engagements where different dimensions of lineage were explored. What started as a simple technical lineage activity evolved into a full-fledged discovery of data flow from a business perspective (refer to the figure below).

The above diagram illustrates the wide range of areas banks can explore. The decision on the extent of data lineage coverage is therefore driven largely by the bank’s risk appetite and regulatory needs.
Given below is a summarized view of different kinds of data lineage:
Key takeaways
Channel lineage: This is aimed at understanding the genesis of data capture and the relevant application screens where users enter data. Typical data points analyzed are systems, screens, UI fields, data validations, and data submission rules. The different channels (self-service vs guided) are understood in detail from a business perspective, along with their impact on the data flow.
Business domain lineage: This involves mapping the business processes attached to the data elements. Typical information captured as part of the lineage covers the lifecycle of the business process, its sub-processes, and the business rules. Business processes and sub-processes are crucial to understanding which systems are impacted as part of the data flow.
Functional lineage: This captures the system functionalities that impact the data elements. A key component is analyzing the functional triggers that cause data elements to move to the next target system as part of the workflow/business process. Typical systems covered are origination systems and servicing systems (deposits, loans, etc.). The other dimension of functional lineage is the data flow from core systems to ancillary systems and its potential impact on the data element (e.g., pricing, decisioning, third-party verifications).
Integration lineage: This is primarily aimed at understanding and documenting the various communication mechanisms between systems. Examples include APIs, traditional web services, FTP (File Transfer Protocol) transfers, batch jobs, etc. It offers insight into real-time vs batch-driven data movement.
Technical lineage: This is a visual representation of data flow between systems that store and master data. It is also called the critical data path, and it enables an understanding of where the data is stored. Typically, the data flow encompasses a system of record, data lake, data warehouse, reporting solution, and data streaming layers. This activity covers documenting information such as databases, tables, fields, scripts/file exchanges, transformation rules, and applicable technology controls as data moves between different systems.
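To make the technical and integration dimensions more concrete, here is a minimal Python sketch of how documented lineage could be recorded and traced. It is an illustration under stated assumptions, not a description of any specific lineage tool: the `Hop` record, the `trace_upstream` function, and all system, table, and field names are hypothetical, and the sketch assumes each field has a single upstream source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One documented movement of a data element (hypothetical record layout)."""
    source: str     # system.table.field the data comes from
    target: str     # system.table.field the data lands in
    transport: str  # integration lineage: API, batch job, file transfer
    rule: str       # transformation rule applied on this hop


def trace_upstream(target: str, hops: list[Hop]) -> list[Hop]:
    """Walk back from a consumed field to its first point of capture.

    Assumes every target field has exactly one upstream source.
    """
    by_target = {hop.target: hop for hop in hops}
    path: list[Hop] = []
    seen: set[str] = set()
    current = target
    # Follow hops backwards until no upstream hop is documented.
    while current in by_target and current not in seen:
        seen.add(current)  # guards against accidental cycles in the records
        hop = by_target[current]
        path.append(hop)
        current = hop.source
    path.reverse()  # present the path from capture to consumption
    return path


# Illustrative records only; every name below is made up.
HOPS = [
    Hop("origination.application.annual_income", "core.customer.income",
        "API", "copy as entered"),
    Hop("core.customer.income", "lake.customer_raw.income",
        "batch job", "nightly full extract"),
    Hop("lake.customer_raw.income", "warehouse.dim_customer.income_band",
        "batch job", "bucket into income bands"),
    Hop("warehouse.dim_customer.income_band", "reporting.regulatory.income_band",
        "file transfer", "aggregate by band"),
]

path = trace_upstream("reporting.regulatory.income_band", HOPS)
for hop in path:
    print(f"{hop.source} -> {hop.target} [{hop.transport}: {hop.rule}]")
```

Each printed line is one hop on the critical data path, showing where the data sits, how it moved, and which rule was applied. A real program would also need to handle fields fed by several sources and attach the applicable controls to each hop.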
Benefits of data lineage
Domain-driven data lineage has many benefits which are crucial to unlocking the full potential of reporting and data management processes:
- An understanding of underlying business processes, business rules, and acceptable values of data elements, together with a visual representation of data lineage, puts greater visibility and control in the hands of business stakeholders. It also demystifies how data is produced within upstream systems/processes and what kind of data feeds into downstream layers
- It provides a glimpse into how the data is handled by different systems at different stages of the business process lifecycle, and shows how the final reported numbers are arrived at, with a fine-grained understanding of data sources
- Enhanced collaboration and consistent understanding of the data landscape with a business perspective across stakeholders starting from data stewards, technology architects, business analysts, business SMEs, and risk SMEs
- Enables understanding and clarity around discovery of data controls, data pipelines, data ownership, and strategic data assets
Conclusion
This is a pivotal moment in which data, domain, and technology are coming together and presenting innumerable opportunities. Against this backdrop, data lineage has a significant role to play in laying the foundational blocks for data controls and a data quality ecosystem.
Therefore, it is imperative for banks to embrace data lineage as a business-driven program to truly become a data-driven organization.