Introduction
Today’s enterprises face unprecedented growth in the volume and velocity of both structured and unstructured data. HCL understands the need for a robust archiving solution that can store enormous amounts of data quickly and scale to meet business requirements for data retention and retrieval.
Data archiving secures and stores data for long-term retention, whether on premises or in the cloud. Archiving solutions keep important information in safe locations so that it can be retrieved whenever required. The storage process generally applies policies that depend on the information in question; for example, a policy might define the retention time frame and the data’s security sensitivity.
The need to improve enterprise application performance, adhere to regulatory compliance guidelines for audit and electronic discovery (e-discovery), and reduce operational costs is pressing enterprises to deploy effective and efficient data archiving solutions. Over the next couple of years, the key enterprise expectations for next-generation archiving are data reusability, scalability, and extensibility.
Key Business Needs for Data Archival
Electronic archiving, or eArchiving-as-a-Service (AaaS), is becoming an increasingly popular option for organizations looking to reduce overheads and streamline their IT operations. Active archiving enables you to store data securely off-site and retrieve it quickly and easily for compliance or disaster recovery. It removes data from more expensive primary storage, reduces costs associated with capacity-based backup software, shrinks your backup windows, and improves your recovery time objectives.
Some of the key benefits of eArchiving-as-a-Service are as follows:
- Reduced capital expenditure
- Ease of implementation
- Reduced operational and support overheads
- Predictable costs and scalability
- Data reusability, flexibility, and security
Typical Archiving Strategy Considerations

| Consideration | Driving Factors | Requirements |
| --- | --- | --- |
| What | Business needs | Data sources |
| | Value to Business | Volume of Data |
| | Systems | Data Classification |
| | Business Area | Functional domain |
| | Data Assets | Global auditing needs |
| | Completeness and Appropriateness | Key Stakeholders’ Buy-in |
| | Standard Regulatory Reports | Business/End User Reporting |
| When | Frequency | Manual vs. Automation |
| Where | On Premise | Disaster Recovery Policy |
| | Cloud | Security Policy |
| | Media | Data Integrity |
| | Compliance | GxP/SOX, etc. |
| How Long | Country Regulations | Retention Policy |
| | Legal Requirements | Legal Hold Policy |
| | Technology Obsolescence | Technology Refresh |
| How | Accessibility | Static vs. Active |
| | Retire | Decommissioning legacy systems |
| | Management and Controls | Roles and Responsibilities |
Archiving Solution Parameters
- Archive Data Identification
- Archival Factory Model
- Reporting Mechanism
- Data Integrity/Reference
- Verification and Validation
- Deployment and Release Management
Data Archiving - Reference Architecture
Sample e-Archiving Operating Model
e-Archiving - Service Catalogue
Classification of Small/Medium/Large Archiving Projects
| Complexity Definition | Simple | Medium | Large |
| --- | --- | --- | --- |
| Database Type | Standard RDBMS | RDBMS and Flat Files | RDBMS and Flat Files |
| Data Type | Structured Data | Structured Data | Structured Data |
| No. of Data Objects | < 100 | 100–1000 | 1000+ |
| Data Complexity and Referential Data Integrity | Simple relationships/dependencies between objects | Complex relationships/dependencies between objects | Complex relationships/dependencies between objects |
| Application Type | Single Function | Multifunction | Complex Functionality |
| Data Size | < 50 GB | 50–500 GB | > 500 GB |
| Data Model | Simple | Medium | Complex |
| Selection Criteria | No Filters | 1–5 Filters | 5–10 Filters |
| Data Integration Points | 1 | 2 | 3+ |
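Where a programmatic sizing check is useful, the thresholds above can be expressed directly in code. The Java sketch below is illustrative only: the `ArchivingProjectSizer` helper is hypothetical, and it assumes the most demanding dimension sets the overall classification, which the table does not state explicitly.

```java
/** Illustrative sizing helper based on the classification table; names are hypothetical. */
public class ArchivingProjectSizer {

    public enum Complexity { SIMPLE, MEDIUM, LARGE }

    /** Assumes the most demanding dimension determines the overall size. */
    public static Complexity classify(int dataObjects, double dataSizeGb,
                                      int filters, int integrationPoints) {
        // Any dimension past the Medium thresholds makes the project Large.
        if (dataObjects > 1000 || dataSizeGb > 500 || filters > 5 || integrationPoints >= 3) {
            return Complexity.LARGE;
        }
        // Any dimension past the Simple thresholds makes the project Medium.
        if (dataObjects >= 100 || dataSizeGb > 50 || filters >= 1 || integrationPoints == 2) {
            return Complexity.MEDIUM;
        }
        return Complexity.SIMPLE;
    }

    public static void main(String[] args) {
        // A 300 GB application with 400 objects, 3 filters, and 2 integration points sizes as MEDIUM.
        System.out.println(classify(400, 300, 3, 2));
    }
}
```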
Comparison Across Solution Approaches
Solution Approach One: COTS Product-based Solution Using Informatica ILM
HCL proposes to provision the Informatica ILM tool in the targeted landscape and to develop the necessary customizations (outside the ILM tool) to support various other organizational requirements.
Data Vault (built into the ILM package) will be the target archival repository; it compresses data by up to 85% and provides easy retrieval access at a record level.
Solution Approach Two: Custom-based Solution Using Hadoop
Hadoop is a modern open-source framework designed for storing and processing large amounts of data. A Hadoop NoSQL-based WORM (write once, read many) archive service for enterprises has the following high-level advantages:
- Unlimited scale to store petabytes of data
- Fast data ingestion due to schema-less writes
- API-based data retrieval and search
- Compliance management with complete security
- Ability to run analytics; well integrated into the Hadoop ecosystem
HCL recommends using its Data Movement Framework (DMF) to ingest data into the archival store: an HBase NoSQL store that provides record-level WORM storage, with Solr indexing hot data for full-text search. DMF is a data flow lifecycle management application that brings process to the typical information lifecycle management (ILM) activities in a Hadoop-based archival storage system.
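To make the ingestion path concrete, the sketch below writes an archived record to HBase and indexes its searchable fields in Solr using the standard client APIs. It is a minimal illustration, not DMF itself: the table name `archive_records`, column family `d`, Solr collection `archive_index`, and field names are all assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ArchiveIngestor {

    /** Writes one archived record to HBase and indexes it in Solr. */
    public static void archiveRecord(String rowKey, String payload, String searchableText)
            throws IOException, SolrServerException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("archive_records"));
             SolrClient solr = new HttpSolrClient.Builder(
                     "http://solr-host:8983/solr/archive_index").build()) {

            // Store the full record in HBase; the row key enables record-level retrieval.
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(payload));
            table.put(put);

            // Index only the searchable fields in Solr; "id" links back to the HBase row key.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rowKey);
            doc.addField("content", searchableText);
            solr.add(doc);
            solr.commit();
        }
    }
}
```

Note that HBase does not enforce WORM semantics on its own; in practice, the archival layer restricts update/delete operations through access controls.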
| Parameter | Informatica ILM (Product-based Approach) | Hadoop (Custom-based Approach) |
| --- | --- | --- |
| Integration and ETL | Gartner's top vendor in the archival space, with effective integration and ETL configurations. The ILM tool has built-in basic data validation functionality. | DMF supports file and DB sources. Data enrichment and validation need to be added for the archival process. |
| Store Optimization | The target repository, Data Vault, stores data in a highly compressed columnar format for easy retrieval. The tool does not have update/delete functionality at the record level. | HBase-based storage supports record-level CRUD and field-level security requirements. |
| Security Compliance | The tool is compliant with the SEC Rule 17a-4 security requirement and has built-in legal hold functionality. | Partially compliant with the SEC Rule 17a-4 security requirement when using Hadoop on Isilon. |
| Seamless Access to Retrieve/Search – API Support | Data discovery portal with SQL-like search criteria on archived records. Customization must be developed to support API-based ingestion and retrieval. | HBase and Solr APIs are available. |
| Authentication | Maintains archived data authenticity, leverages existing security, and supports folder-level, role-based, and database-level security. | Supports Kerberos; third-party SSO may need to be added. |
| Performance | Large set of built-in accelerators with proven results, used by a wide set of customers across the globe. | Effective and fast, but requires more time and effort to implement the solution. |
| Implementation | Project timeline would scale up to four months with seven resources. | Project timeline would scale up to six months with 10 resources. |
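As a follow-on to the "Seamless Access to Retrieve/Search" row, the sketch below shows how the Hadoop approach can serve retrieval: a Solr full-text query returns matching row keys, which are then fetched from HBase. It reuses the hypothetical table, column family, and field names from the ingestion sketch above.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocument;

public class ArchiveRetriever {

    /** Finds archived records by full-text search, then fetches each one from HBase. */
    public static void search(SolrClient solr, Connection hbase, String queryText)
            throws IOException, SolrServerException {
        // Solr answers "which records match?"; HBase holds the authoritative payload.
        SolrQuery query = new SolrQuery("content:" + queryText);
        try (Table table = hbase.getTable(TableName.valueOf("archive_records"))) {
            for (SolrDocument doc : solr.query(query).getResults()) {
                String rowKey = (String) doc.getFieldValue("id");
                Result row = table.get(new Get(Bytes.toBytes(rowKey)));
                byte[] payload = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("payload"));
                System.out.println(rowKey + " -> " + Bytes.toString(payload));
            }
        }
    }
}
```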