I am sure fellow practitioners have had to deal with the question of ‘Time to Market’ at various points during pre-sales, solutioning, implementation and support. Typically, Agile methodologies are used to handle the time-to-market issue by delivering MVPs (Minimum Viable Products), where a set of MVPs makes up a VP (Viable Product). In other words, if a solution would take, say, 6 months to build under a traditional SDLC, we break it down into 6 sprints of 4 weeks each and deliver the solution incrementally over those 6 months using an agile implementation approach. The advantage in this scenario is that the Business Users stay engaged throughout the 6 months, and any change of course can be taken up immediately instead of waiting until the solution goes live.
However, a welcome trend now being observed is a move away from the traditional SDLC practices of:
It is a welcome trend for the following reasons:
- Business Users have a wide variety of data to consider (Enterprise, Social, 3rd party data)
- Most of the data available today was not available in the past and the efficacy of the available data is also questionable
- Ability to relate Enterprise, Social and 3rd party data needs to be proven from an Operational and descriptive analytics perspective
The current trend is to harmonize Enterprise, Social and 3rd Party data without setting anything in stone, while still ensuring that MVPs or VPs are delivered quickly. Data Virtualization helps harmonize Enterprise, Social and 3rd Party data quickly enough to prove out the use cases, and in many situations the POCs are making it into production.
Business Issue: Unreliable manual Costing BI report to be replaced
At a Retail Customer, the need was to harmonize Point of Sale (POS) transactional data with Costing data (from an OLTP application for Costing Assortment) to meet the following requirements:
- The Costing Analysis is to be done in Real Time, replacing a manual method which is slow and unreliable
- Costing Reports (standard and ad-hoc) to be refreshed every 5 minutes
- Costing BI Report needed at Consumer Choice level. Hierarchy being: Brand → Market → Channel → Division → Department → Class → Sub-Class → Style → Style CC (Style Consumer Choice)
- History covers the current season and the next 3 seasons (basically current POS data and future forecasts)
- OLTP is the main source of data, and pulls from OLTP (at the lowest grain, Style CC) have to be non-intrusive, meaning they cannot affect the OLTP users' use of the application.
- Created a Staging Table to pull data from the OLTP application at the lowest level of granularity (Style CC level)
- De-normalized the data from all the sources at the lowest level of granularity namely Style CC level
- Used the full refresh method to pull data, as incremental pulls were taking longer and were complicated to deal with.
- Virtualization views were created with parameters for date, season and brand so that all queries on the virtualization layer get a prompt response.
- Ensured that referential integrity is maintained across the de-normalized layer within the access layer, with proper indexing and associations in the virtualization layer for faster query response.
- Before blending the data into the access layer, converted all the nested complex arrays into a flattened collection (JSON) so that query retrieval is faster, and created an index on it.
- Data Volumes range from 6 to 10 Million records for each of the sources.
- Query time achieved: 2 minutes to refresh the data in the virtualization layer, with reports refreshed in less than 30 seconds.
- Hortonworks (HDP 2.4)
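
To make the full refresh step above a little more concrete, here is a minimal sketch in Python. It is not the implementation used at the customer: SQLite stands in for the actual staging store, and the table name `stg_costing`, its columns and the sample rows are all hypothetical. The idea it illustrates is that the whole staging table is replaced inside a single transaction, so readers see either the old extract or the new one, never a half-loaded table.

```python
import sqlite3


def full_refresh(conn, rows):
    """Replace the entire staging table with a fresh extract.

    Runs as one transaction: 'with conn' commits on success and
    rolls back if any statement raises.
    """
    with conn:
        conn.execute("DELETE FROM stg_costing")
        conn.executemany(
            "INSERT INTO stg_costing (style_cc, brand, season, cost) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


# Hypothetical staging table at the lowest grain (Style CC).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_costing ("
    "style_cc TEXT PRIMARY KEY, brand TEXT, season TEXT, cost REAL)"
)

# First pull from the (simulated) OLTP source.
full_refresh(conn, [
    ("S1-RED", "BrandA", "SS24", 12.5),
    ("S2-BLU", "BrandA", "SS24", 9.0),
])

# Next pull: one style was dropped and one cost changed.
# A full refresh needs no change detection to handle either case.
full_refresh(conn, [("S1-RED", "BrandA", "SS24", 13.0)])

row_count = conn.execute("SELECT COUNT(*) FROM stg_costing").fetchone()[0]
latest_cost = conn.execute(
    "SELECT cost FROM stg_costing WHERE style_cc = 'S1-RED'"
).fetchone()[0]
```

The trade-off is that every refresh moves all rows, which is acceptable only while the volumes and the refresh window allow it. In exchange, deletes and updates in the source need no special handling.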
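
The flattening and indexing steps above can be sketched in the same spirit. This is an illustration under stated assumptions, not the project's code: the nested structure (`consumer_choices`, `costs`) and all field names are hypothetical, and a plain Python dictionary stands in for whatever index the access layer actually provides. It shows nested arrays being exploded into one flat record per Style CC and season, then grouped by the same parameters the virtualization views filter on (brand and season).

```python
from collections import defaultdict


def flatten_styles(styles):
    """Explode nested arrays into one flat record per Style CC and season."""
    flat = []
    for style in styles:
        for cc in style.get("consumer_choices", []):
            for cost in cc.get("costs", []):
                flat.append({
                    "brand": style["brand"],
                    "style": style["style"],
                    "style_cc": f'{style["style"]}-{cc["color"]}',
                    "season": cost["season"],
                    "unit_cost": cost["unit_cost"],
                })
    return flat


def build_index(rows, keys=("brand", "season")):
    """Group flat rows by the filter parameters for direct lookup."""
    index = defaultdict(list)
    for row in rows:
        index[tuple(row[k] for k in keys)].append(row)
    return index


# Hypothetical nested source document.
nested = [{
    "style": "S1",
    "brand": "BrandA",
    "consumer_choices": [
        {"color": "RED", "costs": [
            {"season": "SS24", "unit_cost": 12.5},
            {"season": "FW24", "unit_cost": 13.0},
        ]},
        {"color": "BLU", "costs": [
            {"season": "SS24", "unit_cost": 9.0},
        ]},
    ],
}]

flat_rows = flatten_styles(nested)
index = build_index(flat_rows)

# A parameterized query (brand, season) becomes a single lookup.
ss24_rows = index[("BrandA", "SS24")]
```

Once the records are flat, a query for a given brand and season no longer has to walk nested arrays, which is the effect the flattened collection and its index are meant to achieve.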