Large Scale Data Migration
April 15, 2021

Co-authored by: Nibedan Sahu

Introduction

Large enterprises still run on age-old platforms such as the Mainframe and face numerous issues with data integrity, data reconciliation, availability of IT skills, and implementing ever-changing business requirements, especially when dealing with legacy applications. The customer’s biggest concern in digital platform modernization is large-scale data migration. In a typical Oracle ERP implementation, the data migration volume is around a few million transactions. When the number of transactions grows to one or two billion, however, an Oracle Application ERP implementation requires strong technical expertise, careful attention to data integrity, creativity, and a different approach to data migration.

Data Migration for Digital Enablement

We initiated a transformation program to “digitally enable” the customer by moving legacy applications to a modern stack. As part of this digital enablement program, 2.5 billion transactions were migrated to Oracle. Data migration is tightly coupled with business windows, so the prime ask from the customer was a flawless execution of the data migration with zero impact on daily transactions.

Expectation

  • 150 million transactions needed to be migrated per conversion run
  • Data migration window of 10 to 15 hours per conversion

Challenges

  • A typical data migration architecture has multiple nodes, but each node serves a different purpose and the import programs run without any parallelization
  • Loading data files into the staging table takes around 4 to 5 hours per 100 million transactions
  • Validation and loading into the base tables takes around 15 to 20 hours per 100 million transactions
  • The overall conversion time needed to be reduced from 2 days to 10 hours (see the rough estimate below)
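
A rough back-of-the-envelope estimate, assuming the rates above scale linearly, shows why a purely sequential run cannot fit the window for a 150-million-transaction conversion:

    Staging load:          150M at 4 to 5 hours per 100M   = about 6 to 7.5 hours
    Validation and import: 150M at 15 to 20 hours per 100M = about 22.5 to 30 hours
    Sequential total:      roughly 29 to 37.5 hours, against a 10 to 15 hour window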

Approach

The following approach was used to migrate over 2 billion transactions efficiently within the cutover window:

  • The file load process was improved by running the load on multiple nodes at the same time
  • The transaction creation process was improved by running multiple instances of the import process in parallel
  • Database tuning was done to extract maximum performance

The diagram below shows how billions of customer transactions flow from Mainframe to Oracle; the highlighted processes are the time-consuming steps where improvement was required:

Figure 1: Flow of Customer Transactions from Mainframe to Oracle

HCL’s digital approach enables enterprises to implement these architectural changes and parallelize the import programs to improve performance during the cutover window.

HCL Digital Platform implemented these architectural changes and the parallelization process at a leading logistics and transport enterprise to migrate 2.5 billion transactions during the cutover window.

We successfully migrated a mammoth 2.5 billion transactions from Mainframe to Oracle ERP within the customer-provided cutover window. The migrated data was ready and available to the business, while regular transactions continued to run independently.

  1. SQL*Loader Processes: Single Node vs. Multi Node

    SQL*Loader is a capable tool for loading data, but with larger volumes (for example, 100 million transactions) we must understand its limitations and the rate at which data can be loaded in a given time. Reading one large data file and loading it through a single loader service takes a long time because only one process does all the work.

    Our approach breaks the large file into multiple smaller files (10 files, each holding 10 million transactions) and invokes multiple loader services that load those files in parallel. This speeds up the process and drastically reduces the load time, as sketched below.
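
    A minimal sketch of this split-and-load idea in Python follows. The file names, the 10-million-line chunk size, the invoices.ctl control file, and the DB_USERID environment variable are illustrative assumptions, not the actual project artifacts; only standard SQL*Loader command-line arguments are used.

      # Split a large extract into ~10-million-line chunks and load them with
      # parallel SQL*Loader sessions. Sketch only: names and sizes are assumptions.
      import os
      import subprocess
      from concurrent.futures import ThreadPoolExecutor

      CHUNK_LINES = 10_000_000          # ~10 million records per split file
      SOURCE_FILE = "transactions.dat"  # hypothetical mainframe extract
      USERID = os.environ["DB_USERID"]  # e.g. "apps/****@ERPDB" (assumed to be set)

      def write_chunk(lines, idx):
          """Write one chunk to disk and return its file name."""
          name = f"transactions_{idx:02d}.dat"
          with open(name, "w") as out:
              out.writelines(lines)
          return name

      def split_file(path, chunk_lines):
          """Write chunk_lines-sized pieces of `path` and return their names."""
          chunks, buf, idx = [], [], 0
          with open(path) as src:
              for line in src:
                  buf.append(line)
                  if len(buf) == chunk_lines:
                      chunks.append(write_chunk(buf, idx))
                      buf, idx = [], idx + 1
              if buf:
                  chunks.append(write_chunk(buf, idx))
          return chunks

      def run_loader(chunk):
          """Invoke one SQL*Loader session for a single chunk file."""
          return subprocess.run(
              ["sqlldr", f"userid={USERID}", "control=invoices.ctl",
               f"data={chunk}", f"log={chunk}.log"],
              check=True)

      if __name__ == "__main__":
          chunks = split_file(SOURCE_FILE, CHUNK_LINES)
          # One loader per chunk; the pool size caps how many run concurrently.
          with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
              list(pool.map(run_loader, chunks))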

  2. Server-Side Configuration

    During the loader process, we must configure the server and use it effectively; otherwise the server takes more time to run the loader. The best option is to use multiple nodes for the loader service: the split files are shared across the different nodes so that each node processes its file quickly, since each file holds only 10 million transactions.

    A scheduling tool (1 Automation) calls the multiple nodes across which the data files are distributed; each node processes its data in parallel and loads it into the Oracle staging tables. The Master is then triggered, which in turn triggers multiple Workers that create the invoices and lines in Oracle. A sketch of the multi-node distribution follows.
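
    The sketch below distributes the split files across loader nodes, assuming hypothetical host names (erpnode01 to erpnode04), passwordless SSH between the nodes, a shared /data path, and the same invoices.ctl control file as above; in the actual setup the scheduling tool plays this dispatching role.

      # Distribute split data files round-robin across loader nodes and run the
      # loads in parallel, one batch of files per node. Sketch only: hosts, paths,
      # and SSH access are assumptions standing in for the scheduling tool.
      import subprocess
      from concurrent.futures import ThreadPoolExecutor

      NODES = ["erpnode01", "erpnode02", "erpnode03", "erpnode04"]  # hypothetical
      FILES = [f"transactions_{i:02d}.dat" for i in range(10)]      # from the split step

      def load_on_node(node, data_file):
          """Copy one chunk to its node and run sqlldr there over SSH."""
          subprocess.run(["scp", data_file, f"{node}:/data/{data_file}"], check=True)
          remote_cmd = (
              "sqlldr userid=$DB_USERID control=/data/invoices.ctl "
              f"data=/data/{data_file} log=/data/{data_file}.log"
          )  # assumes DB_USERID is set in the remote environment
          subprocess.run(["ssh", node, remote_cmd], check=True)

      def load_node_batch(node, files):
          """Run this node's files one after another (nodes run in parallel)."""
          for f in files:
              load_on_node(node, f)

      if __name__ == "__main__":
          # Round-robin assignment: file i goes to node i % len(NODES).
          batches = {node: [] for node in NODES}
          for i, f in enumerate(FILES):
              batches[NODES[i % len(NODES)]].append(f)
          with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
              jobs = [pool.submit(load_node_batch, n, fs) for n, fs in batches.items()]
              for job in jobs:
                  job.result()  # propagate any failure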

  3. Parallelization: Master/Worker

    Once the data is available in the Oracle staging table, we use multiple instances of the API to load it into the base tables. Multiple Master and Worker processes are triggered: the Master decides the number of Workers based on the data volume, while each Worker holds the validation logic for the transactions and controls the commit size. For instance, assume we have 100,000 (1 lakh) transactions, a commit size of 5,000, and 20 workers. The Master divides the work across multiple processes running concurrently so that more work gets done in less time: it triggers 20 Workers in parallel, and each Worker commits after every 5,000 records it processes. This drastically reduces the processing time, as the sketch below illustrates.
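
    A minimal Master/Worker sketch in Python, using the numbers from the example above; the stand-in validation and the in-memory record list are assumptions that replace the actual Oracle API calls and database commits.

      # Master/Worker sketch: the Master splits the staged records into
      # commit-size batches and fans them out to a fixed pool of Workers.
      # Illustrative only: the validation and "commit" are stand-ins for the
      # real invoice API insert and database commit.
      from concurrent.futures import ProcessPoolExecutor

      COMMIT_SIZE = 5_000   # records committed per batch, as in the example above
      NUM_WORKERS = 20      # workers triggered by the Master

      def worker(batch):
          """Validate one batch of staged records and commit it as a unit."""
          valid = [rec for rec in batch if rec.get("amount", 0) > 0]  # stand-in validation
          # Real flow: call the invoice/line creation API here, then COMMIT.
          return len(valid)

      def master(records):
          """Split records into commit-size batches and run them on the worker pool."""
          batches = [records[i:i + COMMIT_SIZE]
                     for i in range(0, len(records), COMMIT_SIZE)]
          with ProcessPoolExecutor(max_workers=NUM_WORKERS) as pool:
              return sum(pool.map(worker, batches))

      if __name__ == "__main__":
          staged = [{"invoice_id": i, "amount": 100.0} for i in range(100_000)]  # 1 lakh rows
          print(f"Processed {master(staged)} records in {NUM_WORKERS}-way parallel batches")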

We must use multiprocessor architectures that break a large task into smaller tasks which can be executed concurrently by independent processes. In a critical conversion with very limited time, we should also consider the number of processes available, the number of users logged in, and the memory available, and hold all other non-conversion processes so that all resources are dedicated to the conversion.