The business landscape as it stands
Architecting an enterprise platform is a complex process, requiring considerable investment by the organization. Additionally, the acquisition of MPP appliances such as Teradata, Exadata, Netezza, and Greenplum, which help process voluminous data and expedite query performance, involves further outlays. The BI layer is one of the key components of an enterprise platform. The traditional BI layer comprises data sources (involving a robust RDBMS and an ETL tool) and reporting tool(s).
With the emergence of Big Data, vendors in the BI space are scrambling to utilize its capabilities and align their tools accordingly. At the same time, customers are intent on ensuring that their existing investments are repurposed without impacting business objectives.
Big Data is clearly a priority for adoption, but despite positive POCs, organizations are concerned about certain possible impediments:
- The absence of in-house expertise
- The lack of trust in the implementation partner’s commitment to design and development
- The rising pressure from MPP vendors to adopt their versions of the offering
A road ahead – solutions built on the specifics
Both Gartner and Forrester suggest that over the next few years, a hybrid environment - involving both on-premise and cloud deployments - will take center stage.
Existing on-premise investments must continue to be supported within this hybrid environment. While defining and designing a hybrid architecture, the following elements demand consideration:
- Leverage investments in middleware, MPP databases, data modelling, metadata capture, and ETL tools
- Empower end-users to conduct data exploration through tools like SAS/SPSS/Infinite Insight
- Provide end-users with sand-box capability to test use cases before pushing solutions to production
- Ensure end-users can obtain data without any challenges (data abstraction layer)
- Enable reporting tools to access the hybrid architecture when required
Unraveling the process flowchart
- Enterprises usually operate an ERP system – SAP, Oracle Apps, or PeopleSoft. Multiple applications serve business processes, and the data they generate is important for analytical purposes. Traditionally, a variety of tools, such as Tibco, were deployed to capture data from these sources. The raw data layer (the first layer) involved connecting these tools to the sources. Canonical models, alternatively known as the Common Flat File format (CFF), were created and leveraged to obtain data in an easy-to-use format.
- The data in the first layer is a replication of source data and needs validation in the second stage – the processed data layer.
- In the following layer, the transformation of the processed data takes place. There are two options to initiate the transformation: an ETL tool of choice, or the creation of re-usable transformations in the Hadoop layer. Look-ups, joiners, splitters, and aggregators are the most common types of transformations used. The data model needed to populate the Hadoop transformation layer differs from that of the RDBMS.
- Data abstraction is the fourth layer, and helps distribute the data obtained from the preceding one. Also identified as the “Managed Data Services Layer”, it comprises:
- Data dispatch
- Data exploration
- Data labs
- Semantic views
- Data as a service
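Two of the re-usable transformations named in the third layer - a look-up and an aggregator - can be sketched in plain Python. This is a minimal illustration with hypothetical field names and sample records; a production implementation would typically live in an ETL tool or a Hadoop-layer framework such as Spark:

```python
from collections import defaultdict

# Hypothetical sample records replicating a raw sales feed.
sales = [
    {"order_id": 1, "region_code": "NA", "amount": 120.0},
    {"order_id": 2, "region_code": "EU", "amount": 80.0},
    {"order_id": 3, "region_code": "NA", "amount": 50.0},
]

# Reference table for the look-up transformation (illustrative values).
region_lookup = {"NA": "North America", "EU": "Europe"}

def lookup(records, key, table, target):
    """Look-up: enrich each record with a column derived from a reference table."""
    return [{**r, target: table.get(r[key], "Unknown")} for r in records]

def aggregate(records, group_key, value_key):
    """Aggregator: sum value_key for each distinct group_key."""
    totals = defaultdict(float)
    for r in records:
        totals[r[group_key]] += r[value_key]
    return dict(totals)

enriched = lookup(sales, "region_code", region_lookup, "region_name")
totals = aggregate(enriched, "region_name", "amount")
# totals == {"North America": 170.0, "Europe": 80.0}
```

Joiners and splitters follow the same pattern: small, composable functions (or mapping jobs) that can be re-used across pipelines instead of being re-implemented per report.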
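The intent of the Managed Data Services Layer - letting end-users request data without knowing which underlying store serves it - can be sketched as a thin facade. The class and dataset names below are hypothetical, standing in for real services fronting an MPP database or the Hadoop layer:

```python
class ManagedDataService:
    """Minimal sketch of a data abstraction layer: consumers ask for
    named datasets; registered providers hide the underlying stores."""

    def __init__(self):
        self._providers = {}  # dataset name -> callable returning records

    def register(self, name, provider):
        """Register a provider (e.g. an MPP query or a Hadoop job) under a name."""
        self._providers[name] = provider

    def fetch(self, name, **filters):
        """Return the named dataset, optionally filtered on exact field matches."""
        if name not in self._providers:
            raise KeyError(f"No provider registered for dataset '{name}'")
        records = self._providers[name]()
        return [r for r in records
                if all(r.get(k) == v for k, v in filters.items())]

# Hypothetical provider standing in for a real back-end source.
service = ManagedDataService()
service.register("orders", lambda: [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "open"},
])

open_orders = service.fetch("orders", status="open")
# open_orders == [{"id": 2, "status": "open"}]
```

Because consumers only see dataset names, the back-end serving a dataset (MPP, Hadoop, or a data lab sandbox) can change without breaking reports - which is the point of the abstraction layer.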