
e-Archiving as a Service
April 17, 2019

Introduction

Today’s enterprises face unprecedented growth in both the volume and velocity of structured and unstructured data. HCL understands the need for a strong and robust archiving solution that can store enormous amounts of data quickly and scale to meet business requirements for data retention and retrieval.

Data archiving secures and stores data for long-term retention, whether on premise or in the cloud. Data archiving solutions keep important information in safe locations so that it can be retrieved whenever required. The storage process generally has policies applied to it depending on the information in question; for example, policies might specify the retention time frame and security sensitivity.
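For illustration, such a policy can be captured as plain configuration data. The sketch below is a minimal, hypothetical example in Python; the policy names, fields, and thresholds are ours, not any specific product's schema:

```python
# A minimal, illustrative retention policy expressed as configuration data.
# Policy names and fields below are hypothetical examples only.
from datetime import date, timedelta
from typing import Optional

POLICIES = {
    "invoices":   {"retention_years": 7,  "sensitivity": "internal"},
    "hr_records": {"retention_years": 10, "sensitivity": "confidential"},
}

def past_retention(record_date: date, policy_name: str,
                   today: Optional[date] = None) -> bool:
    """True when a record's retention window has elapsed and it is
    eligible for purge (or for moving to a cheaper storage tier)."""
    today = today or date.today()
    years = POLICIES[policy_name]["retention_years"]
    return record_date + timedelta(days=365 * years) < today

# Example: an invoice dated 2010-01-15 checked against the 7-year policy.
print(past_retention(date(2010, 1, 15), "invoices"))  # True
```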


The need to improve enterprise application performance, adhere to regulatory compliance guidelines for audit and electronic discovery (e-discovery), and reduce operational costs is pressing enterprises to deploy effective and efficient data archiving solutions. Over the next couple of years, the key enterprise expectations for next-generation archiving are data reusability, scalability, and extensibility.

Key Business Needs for Data Archival

  • Impact on performance
  • Unutilized data
  • Operational inefficiencies
  • Data breach

Electronic archiving, or e-Archiving as a Service (AaaS), is becoming an increasingly popular option for organizations looking to reduce overheads and streamline their IT operations. Active archiving enables you to store data securely off-site and retrieve it quickly and easily for compliance or disaster recovery. It removes data from more expensive primary storage, reduces costs associated with capacity-based backup software, shrinks your backup windows, and improves your recovery time objectives.

Some of the key benefits of e-Archiving as a Service are as follows:

  • Reduced capital expenditure
  • Ease of implementation
  • Reduced operational and support overheads
  • Predictable costs and scalability
  • Data reusability, flexibility, and security

Typical Archiving Strategy Considerations

Strategy Consideration | Driving Factors | Requirements
What | Business needs | Data sources
What | Value to business | Volume of data
What | Systems | Data classification
What | Business area | Functional domain
What | Data assets | Global auditing needs
What | Completeness and appropriateness | Key stakeholders' buy-in
What | Standard regulatory reports | Business/end-user reporting
When | Frequency | Manual vs. automation
Where | On premise | Disaster recovery policy
Where | Cloud | Security policy
Where | Media | Data integrity
Where | Compliance | GxP, SOX, etc.
How Long | Country regulations | Retention policy
How Long | Legal requirements | Legal hold policy
How Long | Technology obsolescence | Technology refresh
How | Accessibility | Static vs. active
Retire | Decommissioning legacy applications | Management and controls; roles and responsibilities

Archiving Solution Parameters

Archive Data Identification

  • Pre-archival identification of the actual business data to be archived (requirements filtering)
  • Simplification of report requirements (complex application screens/reports simplified with simple view definitions)

Archival Factory Model

  • Design of an optimized archival solution that runs the archival of multiple retiring applications in parallel

Reporting Mechanism

  • Integration of a centralized reporting tool for all archival-related reporting needs
  • Performance optimization to handle the bulk data required for reporting

Data Integrity/Reference

  • Structured and unstructured data archival
  • Maintenance of data integrity/referential integrity during the archiving and de-archiving process

Verification and Validation

  • Data verification and validation post archival (see the sketch after this list)

Deployment and Release Management

  • Production, User Acceptance Testing (UAT), and test environments (archival usually runs directly from the source to the production archival environment)
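As a rough illustration of the verification step, the sketch below compares row counts and key-column checksums between source and archive. It assumes both systems are reachable through standard Python DB-API connections; the table and column names are placeholders, and a real validation would also cover column-level content:

```python
# Minimal post-archival verification sketch. Assumption: both the source
# system and the archive expose Python DB-API connections; table and
# key-column names are illustrative placeholders.
import hashlib

def table_fingerprint(conn, table, key_column):
    """Row count plus a SHA-256 digest over the ordered key column."""
    cur = conn.cursor()
    cur.execute(f"SELECT {key_column} FROM {table} ORDER BY {key_column}")
    digest = hashlib.sha256()
    rows = 0
    for (key,) in cur:                      # DB-API cursors are iterable
        digest.update(str(key).encode("utf-8"))
        rows += 1
    return rows, digest.hexdigest()

def verify_archival(source_conn, archive_conn, table, key_column):
    """Compare source and archive fingerprints; raise on any mismatch."""
    src = table_fingerprint(source_conn, table, key_column)
    arc = table_fingerprint(archive_conn, table, key_column)
    if src != arc:
        raise ValueError(f"{table}: source {src} != archive {arc}")
    return src[0]  # number of verified rows
```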

Data Archiving - Reference Architecture


Sample e-Archiving Operating Model


e-Archiving - Service Catalogue


Classification of Small/Medium/Large Archiving Projects

Complexity Definition | Simple | Medium | Large
Database Type | Standard RDBMS | RDBMS and flat files | RDBMS and flat files
Data Type | Structured data | Structured and semi-structured data | Structured and unstructured data
No. of Data Objects | Up to 100 | 100 – 1,000 | 1,000+
Data Complexity and Referential Data Integrity | Simple relationships/dependencies between objects | Complex relationships/dependencies between objects | Complex relationships and dependencies between objects; data in BLOB/CLOB attributes; junk characters/multilingual data; attachments (e.g., images, PDFs)
Application Type | Single function | Multifunction | Complex functionality
Data Size | Up to 50 GB | 50 – 500 GB | >500 GB
Data Model | Simple | Medium | Complex
Selection Criteria | No filters | 1 – 5 filters | 5 – 10 filters
Data Integration Points | 1 | 2 | 3+
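To illustrate how these dimensions might combine in practice, the sketch below scores a candidate project against the table's thresholds. The majority-vote scheme is our illustrative assumption, not part of the classification itself:

```python
# Hedged sketch: scoring a candidate archiving project against the
# classification table above. Thresholds mirror the table; the
# majority-vote combination is an illustrative choice, not the source's.
def classify_project(data_objects: int, data_size_gb: float,
                     filters: int, integration_points: int) -> str:
    votes = [
        "Simple" if data_objects <= 100 else
        "Medium" if data_objects <= 1000 else "Large",
        "Simple" if data_size_gb <= 50 else
        "Medium" if data_size_gb <= 500 else "Large",
        "Simple" if filters == 0 else
        "Medium" if filters <= 5 else "Large",
        "Simple" if integration_points == 1 else
        "Medium" if integration_points == 2 else "Large",
    ]
    # Majority vote; ties resolve toward higher complexity.
    for level in ("Large", "Medium", "Simple"):
        if votes.count(level) >= 2:
            return level
    return "Medium"

print(classify_project(data_objects=800, data_size_gb=120,
                       filters=3, integration_points=2))  # Medium
```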

Comparison Across Solution Approaches

Solution Approach One: COTS Product-based Solution Using Informatica ILM

HCL proposes to provision and use the Informatica ILM tool in the targeted landscape and to develop the necessary customizations (outside the ILM tool) to support various other organizational requirements.

Data Vault (built into the ILM package) will be the target archival repository; it compresses data by up to 85% and provides easy retrieval access at the record level. At 85% compression, for example, 10 TB of source data would occupy roughly 1.5 TB in the Data Vault.

Solution Approach Two: Custom-based Solution Using Hadoop

Hadoop is a modern open-source framework designed for storing and processing large amounts of data. A Hadoop NoSQL-based WORM (write once, read many) archive service for enterprises has the following high-level advantages:

  • Unlimited scale to store petabytes of data
  • Fast data ingestion due to schema-less writes
  • API-based data retrieval and search
  • Compliance management with complete security
  • Ability to run analytics, well integrated into the Hadoop ecosystem

HCL recommends using HCL's Data Movement Framework (DMF) to ingest data into the archival store, an HBase NoSQL store that serves as the record-level WORM storage, along with SOLR for indexing hot data for full-text search. DMF is a data flow lifecycle management application that brings process to the typical information lifecycle management (ILM) activities in a Hadoop-based archival storage system.
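Since DMF itself is an HCL-internal framework, the sketch below shows only the generic open-source pattern it builds on: a record-level write into an HBase archive table plus a SOLR index update for full-text search, using the happybase and pysolr client libraries. All host names, table names, column families, and fields are placeholders:

```python
# Generic HBase-write + Solr-index pattern for record archival with
# full-text search. This is NOT the DMF API; it is a sketch of the
# underlying open-source pattern. Hosts, tables, and fields are placeholders.
import happybase   # Thrift client for HBase
import pysolr      # HTTP client for Solr

def archive_record(record_id: str, payload: dict) -> None:
    # 1. Write the full record into the HBase archive table (the WORM store).
    #    Assumes a table "archive_records" with a column family "d" exists.
    hbase = happybase.Connection("hbase-host")        # placeholder host
    table = hbase.table("archive_records")            # placeholder table
    table.put(
        record_id.encode("utf-8"),
        {f"d:{k}".encode("utf-8"): str(v).encode("utf-8")
         for k, v in payload.items()},
    )
    hbase.close()

    # 2. Index the searchable fields in Solr so hot data is full-text searchable.
    solr = pysolr.Solr("http://solr-host:8983/solr/archive", always_commit=True)
    solr.add([{"id": record_id, **payload}])

archive_record("INV-0001", {"customer": "ACME", "amount": 1200, "status": "closed"})
```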

Parameter | Informatica ILM (product-based approach) | Hadoop (custom-based approach)
Integration and ETL | Gartner's top vendor in the archival space, with effective integration and ETL configurations; the ILM tool has built-in basic data validation functionality. | DMF supports file and database sources; data enrichment and validation for the archival process must be added.
Store Optimization | The target repository, Data Vault, stores data in a highly compressed columnar format for easy retrieval; the tool lacks update/delete functionality at the record level. | HBase-based storage supports record-level CRUD and field-level security requirements.
Security Compliance | The tool is compliant with the SEC Rule 17a-4 security requirement and has built-in legal hold functionality. | Partially compliant with the SEC Rule 17a-4 security requirement when running Hadoop on Isilon.
Seamless Access to Retrieve/Search (API Support) | Data discovery portal with SQL-like search criteria on archived records; customization must be developed to support API-based ingestion and retrieval. | HBase and SOLR APIs are available.
Authentication | Maintains archived data authenticity, leverages existing security, and supports folder-level, role-based, and database-level security. | Supports Kerberos; third-party SSO may need to be added.
Performance | Large set of built-in accelerators with proven results, used by a wide set of customers across the globe. | Effective and fast, but requires more time and effort to implement the solution.
Implementation | Project timeline would scale up to four months with seven resources. | Project timeline would scale up to six months with 10 resources.