Summary of the Solution Framework
In a traditional ETL or data warehouse solution, data must be ingested into the data lake from various source systems and cleansed before it can be processed further by downstream applications. In addition, migrating data from on-premises systems to the cloud is becoming increasingly common. The idea is to use a solution built in Talend to ingest data from any structured or semi-structured source into any destination simply by adding some metadata information to a metadata file or table.
This Talend-based accelerator provides a customized framework for data processing.
Business Benefits
- This data ingestion framework reduces development and data migration costs compared to traditional ETL methods and directly improves the productivity of the IT team (ETL development and support)
- One-step recovery from the point of failure using ETL job load statistics; issues are easier to troubleshoot and fix than with the traditional approach
- The configurable metadata is easy to maintain because this approach requires far fewer ETL jobs
- Supports schema evolution: a change in the schema of an existing feed has no impact on the solution framework and requires no code change, reducing the effort needed for data migration and change management
- Supports data ingestion as well as data cleansing, with configurable rules kept in the data store or in the Drools engine depending on rule complexity
- Job run statistics are captured for monitoring and auditing and can be used to build dashboards for reporting and visualization (see the sketch after this list)
- No dependency on a Talend SME to use this accelerator
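To illustrate the kind of job run statistics referenced above, the sketch below shows a hypothetical audit record. The class and field names (JobRunAudit, rowsRead, rowsRejected, and so on) are assumptions for illustration only; the accelerator's actual audit schema may differ.

```java
import java.time.Instant;

// Minimal sketch of a job-run audit record the framework might capture per feed.
// All field names are illustrative; the actual audit (ABaC) schema may differ.
public class JobRunAudit {
    String jobName;       // Talend job that ran
    String sourceName;    // e.g. employee
    String targetName;    // e.g. employee_cleansed
    long rowsRead;
    long rowsWritten;
    long rowsRejected;
    String status;        // SUCCESS / FAILED
    Instant startTime;
    Instant endTime;

    // Render the record as a simple JSON string, suitable for an audit table
    // column or for publishing to a Kafka topic when the Logs to Kafka Flag is Y.
    String toJson() {
        return String.format(
            "{\"job\":\"%s\",\"source\":\"%s\",\"target\":\"%s\","
          + "\"rowsRead\":%d,\"rowsWritten\":%d,\"rowsRejected\":%d,"
          + "\"status\":\"%s\",\"start\":\"%s\",\"end\":\"%s\"}",
            jobName, sourceName, targetName,
            rowsRead, rowsWritten, rowsRejected,
            status, startTime, endTime);
    }

    public static void main(String[] args) {
        JobRunAudit audit = new JobRunAudit();
        audit.jobName = "mysql_to_snowflake_ingest";
        audit.sourceName = "employee";
        audit.targetName = "employee_cleansed";
        audit.rowsRead = 1000;
        audit.rowsWritten = 990;
        audit.rowsRejected = 10;
        audit.status = "SUCCESS";
        audit.startTime = Instant.now();
        audit.endTime = Instant.now();
        System.out.println(audit.toJson());
    }
}
```

A record like this could be written to the audit tables after every feed run and, when the Logs to Kafka Flag is set, published to a Kafka topic as well.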
Solution Framework Overview

Sample Metadata File Contents and Description
Here is what a metadata file looks like for ingesting a MySQL table into a Snowflake DB (a sketch of how these fields could drive the extraction query follows the table):
| Field Name | Description | Sample Value | Optional |
| --- | --- | --- | --- |
| Source DB | DB name of the source | mydb | |
| Source Name | Source table name | employee | |
| Source Type | Type of source to be ingested | MySQL | |
| Target Type | Type of destination | Snowflake | |
| Target DB | DB name of the target | Ingestion | |
| Target Name | Target table name | employee_cleansed | |
| Extract Query | Source extraction query | | Y |
| Ingestion Flag | Set to Y for the ingestion to happen | Y | |
| Logs to Kafka Flag | Set to Y to send the audit logs to a Kafka topic | N | |
| Parm File | Contains the source and target connection strings | C:/config/mysql_snowflake_ingestion_parm.txt | |
| Delta Flag | Set to Y for incremental data extraction | N | |
| Comparison Field | Required if Delta Flag is set to Y | last_update_dt\|date\|YYYYMMDD | Y |
| Curate Flag | Set to Y if data cleansing is to be done | Y | |
| Curate Type | Required if Curate Flag is set to Y | Basic | Y |
| Rule File | File holding the curate rules | C:/data/emp_data_schema_mysql.txt | Y |
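As a rough sketch of how the metadata fields above could drive extraction, the following Java snippet derives the source query from the Source DB and Source Name fields and appends an incremental filter when Delta Flag is Y, using the Comparison Field encoding field|type|format. The property names and the inline values are assumptions for illustration; the framework may equally read the same fields from a metadata file or table.

```java
import java.util.Properties;

public class MetadataReader {
    // Build the source extraction query from the metadata fields described above.
    // If Extract Query is not supplied, fall back to a full-table select; if
    // Delta Flag is Y, append an incremental filter on the Comparison Field,
    // which is encoded as field|type|format (e.g. last_update_dt|date|YYYYMMDD).
    public static String buildExtractQuery(Properties meta, String lastRunValue) {
        String query = meta.getProperty("ExtractQuery", "").trim();
        if (query.isEmpty()) {
            query = "SELECT * FROM " + meta.getProperty("SourceDB") + "."
                  + meta.getProperty("SourceName");
        }
        if ("Y".equalsIgnoreCase(meta.getProperty("DeltaFlag", "N"))) {
            String comparisonField = meta.getProperty("ComparisonField", "").split("\\|")[0];
            query += " WHERE " + comparisonField + " > '" + lastRunValue + "'";
        }
        return query;
    }

    public static void main(String[] args) {
        // In the framework these values would be loaded from the metadata
        // file/table (e.g. with Properties.load); they are set inline here
        // so the sketch runs on its own.
        Properties meta = new Properties();
        meta.setProperty("SourceDB", "mydb");
        meta.setProperty("SourceName", "employee");
        meta.setProperty("DeltaFlag", "Y");
        meta.setProperty("ComparisonField", "last_update_dt|date|YYYYMMDD");

        // The last run value would normally come from the job run statistics tables.
        System.out.println(buildExtractQuery(meta, "20240101"));
    }
}
```

With Delta Flag set to Y for demonstration, the sketch prints SELECT * FROM mydb.employee WHERE last_update_dt > '20240101'.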
Monitoring Dashboard

Conclusion
This framework can be extended to ingest data from any structured data source (RDBMS such as Oracle or MySQL, local files, FTP server pulls, etc.) and to store data in any destination (AWS S3, Azure ADLS, RDS, etc.). The cleansing rules can also be created on a more advanced rule engine such as Drools, which can express any rule the requirement calls for. The data in the ABaC tables can be used to build dashboards in any BI tool, such as Power BI or Tableau, to understand and monitor the KPIs of each daily batch run. The objective of this write-up is to help create an ingestion and cleansing framework in Talend that can be leveraged to on-board any source system feed with minimal or no development changes, reducing the overall time to market (TTM).
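To make the "Basic" curate type more concrete, here is a minimal sketch of how configurable cleansing rules read from a rule file might be applied to a row. The column|rule format (emp_name|trim, email|not_null), the class name, and the method names are assumptions for illustration; more complex rules would be handed off to the Drools engine as described above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of how "Basic" curate rules from a rule file could be applied.
// The rule format column|rule (e.g. email|not_null, emp_name|trim) is an
// assumption for illustration; complex rules would instead be delegated
// to a Drools session.
public class BasicCurator {

    public static boolean applyRules(Map<String, String> row, List<String> rules) {
        for (String rule : rules) {
            String[] parts = rule.split("\\|");
            String column = parts[0];
            String action = parts[1];
            String value = row.get(column);
            switch (action) {
                case "trim":
                    row.put(column, value == null ? null : value.trim());
                    break;
                case "upper":
                    row.put(column, value == null ? null : value.toUpperCase());
                    break;
                case "not_null":
                    if (value == null || value.isEmpty()) {
                        return false; // reject the row (counted in the rejected-row statistics)
                    }
                    break;
            }
        }
        return true; // row passes and is written to the curated target
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("emp_name", "  alice ");
        row.put("email", "alice@example.com");

        // In practice these rules would be read from the Rule File named in the metadata.
        List<String> rules = List.of("emp_name|trim", "emp_name|upper", "email|not_null");

        System.out.println(applyRules(row, rules) + " -> " + row);
    }
}
```

Rows that fail a rule such as not_null would be rejected and reflected in the job run statistics used for the monitoring dashboards.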