Artificial intelligence (AI), machine learning (ML), deep learning, the Internet of Things (IoT), and edge computing: all the buzzing technologies of today have one thing in common, and that is data as their underlying foundation in one way or another. Data that is humongous in size, generated at a great pace, and arriving from various sources is commonly known as Big Data. These characteristics are recognized as volume, velocity, and variety, respectively, in Big Data parlance.
Big Data requires massive processing power and huge amounts of storage. Walmart analyzes some 2.5 petabytes of data every hour. A single sensor on a turbine blade generates 520 GB per day, and each turbine carries 20 such sensors (source: GE). The amount of data being generated is increasing at an unprecedented pace: it was projected to reach 44 zettabytes by 2020, with worldwide Big Data and business analytics revenue expected to grow to more than $210 billion in 2020 at a compound annual growth rate (CAGR) of 11.9%, per IDC.
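To make the turbine figures quoted above concrete, a quick back-of-envelope calculation shows how a single machine becomes a Big Data source (the per-year extrapolation is my own arithmetic, not a GE figure):

```python
# Sensor data volume from the figures quoted above:
# 520 GB per sensor per day, 20 sensors per turbine.
GB_PER_SENSOR_PER_DAY = 520
SENSORS_PER_TURBINE = 20

daily_gb = GB_PER_SENSOR_PER_DAY * SENSORS_PER_TURBINE   # 10,400 GB
daily_tb = daily_gb / 1000                               # ~10.4 TB per turbine per day
yearly_pb = daily_tb * 365 / 1000                        # ~3.8 PB per turbine per year

print(f"Per turbine: {daily_tb:.1f} TB/day, {yearly_pb:.2f} PB/year")
```

At roughly 10 TB per day from one turbine, a modest wind farm quickly outgrows anything a single server can store or process, which is exactly the gap distributed systems such as Hadoop were built to fill.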
A little over a decade ago, such high levels of processing and storage were only possible through the use of supercomputers and, thus, not only required huge upfront investments but also made management of massive amounts of data a very challenging proposition. Soon companies like Google, Yahoo, and Facebook realized that in order to monetize the huge volumes of data they were collecting, a cost-efficient distributed computing system was required for data management. This eventually led to the advent of the Hadoop ecosystem, which is now widely used across organizations for storing, processing, and analyzing Big Data. By the start of 2011, Facebook was accessing and analyzing more than 30 petabytes of customer data using Hadoop clusters.
Apart from keeping all the data that is generated internally, organizations are also sourcing data from external channels like news networks, social media, online forums, sensors, etc., in order to generate actionable insights and make data-driven business decisions. More recently, the Big Data Lake, or simply Data Lake, is being employed as a storage repository for all internal and external data sources in their native format. It is schema-agnostic and allows storing both structured and unstructured data. Most Data Lake offerings employ the Hadoop Distributed File System (HDFS) for holding vast amounts of data. Business analytics and data mining tools can then be applied independently to the Data Lake to analyze the data. Recently, organizations have also started looking beyond Hadoop for their Data Lakes, investing in alternative data stores ranging from graph databases to cloud object storage such as AWS S3.
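A minimal sketch can illustrate what "native format" and "schema-agnostic" mean in practice. Here a local temporary directory stands in for HDFS or S3, and all paths, file names, and record fields are invented for illustration:

```python
import json
import tempfile
from pathlib import Path

# A local directory standing in for a real lake on HDFS or AWS S3.
# The layout below (raw/internal, raw/external) is a common but
# hypothetical convention, not a fixed standard.
lake = Path(tempfile.mkdtemp()) / "datalake"

# Structured data: a JSON record from an internal sales system,
# stored as-is with no upfront schema enforcement.
sales = lake / "raw/internal/sales/2024-01-15/orders.json"
sales.parent.mkdir(parents=True)
sales.write_text(json.dumps({"order_id": 1, "amount": 99.5}))

# Unstructured data: a social media post from an external feed,
# kept in its native text form alongside the structured files.
post = lake / "raw/external/social/2024-01-15/post-001.txt"
post.parent.mkdir(parents=True)
post.write_text("Loving the new product line!")

# Structure is imposed only when the data is read ("schema on read"):
record = json.loads(sales.read_text())
print(record["amount"])
```

The key design choice is that nothing is rejected or transformed on the way in; each consuming tool decides for itself how to interpret the files it reads.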
With each passing day, Big Data Hadoop is becoming faster and more approachable. The emphasis is on empowering all employees, and even end customers, with the power of Big Data technologies. AI-driven analytics, natural language processing (NLP) for querying data, and real-time visualizations are set to become standard features of business analytics applications. A Forrester research report predicts that 25% of enterprises will supplement point-and-click analytics with conversational interfaces in 2018, laying the groundwork for ordinary users to perform highly complex analysis without knowing how to code.
Big Data and analytics in the hands of everyone will certainly augment decision-making capabilities for driving business outcomes at every level of an organization, not just among executives. It will also affect the everyday decisions of ordinary people: millions of users of Google Maps and similar apps already leverage Big Data technologies to choose their routes to a destination using near real-time information about traffic conditions. Such initiatives put the power of data-driven decision-making in the hands of a greater section of society rather than a privileged few.
In the near future, the focus of enterprises will be on developing Big Data capabilities that deliver true streaming analytics, processing and analyzing data on the fly as it is generated and providing insights that are up to the second. This, combined with AI augmenting human decisions and providing instructions in real time, can form the foundation of heightened innovation for an organization. Those on this path are set to drive business disruption and capture business value at scale, gaining significant competitive advantage.