July 5, 2013


Setting the stage for Big Data Analytics

The well-known futurist Gerd Leonhard has said that data is the “new oil”. Unlike the prized hydrocarbons, though, data is plentifully available all around us. Thanks to cheaper storage, it is now considered acceptable to err on the side of storing too much. As a result, a lot of noise is being stored along with the useful stuff. The information highway is abuzz too: the amount of traffic flowing annually over the Internet is set to surpass 667 exabytes this year (one exabyte equals roughly 250 million DVDs). The massively parallel and distributed database technologies available today can process much of this data, at rest and in motion, and provide insights in real time or near real time. Enterprises can use this stream of timely, relevant insights to become not only more operationally efficient but also more responsive to business changes. With such gains to be had, the future certainly looks data driven and, as with every major change, it pays to adapt well in advance to “get it right”. Let’s quickly look at some of the factors that play an important role in the success of big data projects.

Era of Data Governance

In its “Predicts 2013: Big Data and Information Architecture” report, Gartner noted that by 2015, 20% of Global 1000 organizations will have established a strategic focus on information infrastructure equal to that of application management. This reflects a significant shift in the scheme of things. Until now, applications were in the spotlight. Data served primarily as the applications’ infrastructure and remained fragmented across business processes and IT systems. So even as we try to push ahead with data analytics initiatives, an immature information management practice can stall projects midway. Implementing an information governance program can preempt such problems and create an environment in which big data initiatives can proceed smoothly.

Context is the Key

In an enterprise, analytics initiatives are usually directed at well-identified decision points and scenarios such as predicting customer behavior, improving supply chain efficiency or monitoring financial transactions. The existing analytical models bring only partial clarity to these scenarios. Such systems can mostly deal with only standard, factual queries such as “which customers are leaving?” and “what is the impact?” With more exploratory queries, such as “why are they leaving?” or “is this trend related to the recent launch of a similar service by competitor A?”, it is easy to hit a wall. To get more out of all the existing datasets and, ultimately, to make better and faster decisions, we need to move on from this myopic paradigm of analytics, which is severely restricted by its focus on structured data.

The confluence of traditional data and data from social media, content management and other enterprise systems uncovers multiple relationships and dependencies inherent in them. Application of relevant statistical models and algorithms to this broader data universe helps in answering a diverse range of predictive and investigative queries more conclusively. Thus, putting the data in proper context is a key step not only in visualizing the problem domain but also in combining the analysis of unstructured data with the existing analytics, to arrive at the kernel of truth that exists for the topic of inquiry.

Consumer Privacy Debate

The upcoming version of Google Maps will soon be customized to each consumer’s preferences (read: their big data!). This illustrates how the use of big data by enterprises is assuming new dimensions, even as pervasive data collection by wearable sensors such as Google Glass pushes the debate about consumer privacy to the forefront. As a result, we can expect growing scrutiny of how well consumer data is pruned or masked to protect the personally identifiable information (PII) in it. In short, adhering to a privacy policy and checking data quality will be important when sourcing data from third parties.
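To make the idea of masking concrete, here is a minimal Python sketch of what pruning PII from a record might look like before it is handed to an analytics pipeline. It is only an illustration: the field names, the `mask_record` and `pseudonymize` helpers, the salt value and the simple email pattern are all assumptions made for this example, not part of any particular product or regulation, and a real program would need a far more thorough approach.

```python
import hashlib
import re

# Simplified email pattern, assumed for illustration; it will not catch every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, salt: str) -> str:
    """Replace an identifier with a salted SHA-256 token, truncated for readability."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_record(record: dict, pii_fields: set, salt: str) -> dict:
    """Return a copy of the record with PII fields tokenized and emails in free text redacted."""
    masked = {}
    for key, value in record.items():
        if key in pii_fields:
            # Known PII column: swap the value for a stable token so joins still work.
            masked[key] = pseudonymize(str(value), salt)
        elif isinstance(value, str):
            # Free-text column: redact any email address embedded in the text.
            masked[key] = EMAIL_RE.sub("[EMAIL]", value)
        else:
            # Non-PII, non-text values pass through unchanged.
            masked[key] = value
    return masked


# Hypothetical customer record used only to demonstrate the helpers.
record = {
    "customer_id": "C-1001",
    "email": "jane@example.com",
    "note": "Contact jane@example.com about renewal",
    "spend": 129.5,
}
masked = mask_record(record, {"customer_id", "email"}, salt="demo-salt")
print(masked)
```

Because the same salt always yields the same token, masked records can still be joined on `customer_id` for analysis. That is also why the salt must be kept secret, and why salted hashing on its own should be treated as pseudonymization rather than full anonymization.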

Perfect Postcards for Business IT alignment

In an enterprise, big data initiatives will invariably be the last frontier of automated analysis, be it to drive operational efficiencies or to resolve boardroom dilemmas. As such, the most important “user acceptance tests” will be based on real-time business utility and applicability. For example, financial trading systems today are reeling under the imperative of processing an ever-growing deluge of data within razor-thin time intervals. Performance, scalability and reliability are simply non-negotiable in such highly automated transactional systems. So while it is generally true that any technological intervention must be closely aligned with business goals, big data initiatives should strive to be nothing short of perfect postcards for Business IT alignment.

Crunch in data science skills

Data science is an emerging field that draws on statistics, machine learning, mathematics and probability. A good mix of business process knowledge and data science is required to define the business requirements, build the analytical models and decipher the insights for decision making. However, data analysis and data science skills are scarce and therefore command a premium. Here the best approach is to be prepared to work in virtual team setups initially while ramping up these capabilities in-house over time.

To navigate the tricky terrains, use better roadmaps

Data discovery, platform selection, capacity planning, storage management, data management and visualization are some of the key hotspots of detailed technical analysis and design. A key challenge is to incorporate big data solutions while leveraging existing investments in data and analytics. Running clusters in data centers and maintaining the specialized technical skills required for big data amount to big investments. As such, high-performance cloud computing solutions such as AWS’s Elastic MapReduce, which offer scalability and a subscription model with ease of management, bring the promise of big data within the reach of many mid-segment and SMB enterprises. The cloud model has proved to be a good fit not only for prototyping, but also for many production scenarios. On the other hand, greenfield deployments of Hadoop platforms and appliances are decidedly nontrivial engagements, and usually require the service-intensive intervention of a trusted technology partner.

The state of information management maturity, the existing data platforms and the business scenario will all affect big data initiatives. In some industries the use case for big data is not yet established. In such cases, the need to experiment extensively, prototype and adapt from the successes achieved in other domains will be high. With big data hype reaching an all-time high this year, it will be important to check the tendency to rush forward to ride the wave. To adopt big data profitably and forestall the risks inherent in such innovative initiatives, enterprises should focus first on developing a big data strategy and roadmap, while factoring in the unique data and business model challenges facing them.