January 13, 2015


We need Deep Data, not Big Data

In an article titled “How big data can revolutionize pharmaceutical R&D”, the McKinsey Global Institute estimates that applying big-data strategies to better inform decision making could generate up to $100 billion in value annually across the U.S. health-care system. The value would come from optimizing innovation, improving the efficiency of research and clinical trials, and building new tools for physicians, consumers, insurers and regulators to meet the promise of more individualized approaches.

Much has been written about the pharma and healthcare industries lagging behind retail and financial services in the use of big data. Our premise is that deep data, not big data, is what healthcare needs. Deep data means combining the relevant data streams with domain knowledge and analysis, rather than merely acquiring data and hoping that “insights” emerge from correlation. New data streams are now available, including clinical trial performance, climate, adverse event reporting, personal medical devices, and more. The trick is to identify the value in these streams and merge the appropriate ones to gain new insights. Determining that value, and knowing which streams to merge, requires the domain expertise to envision the use cases.

When improving clinical trial efficiency, the key focus areas are site selection, investigator selection, and patient recruiting. ClinicalTrials.gov is an important data stream that provides a rich view of trial sites, study parameters, investigators and site performance. A map of competitive trial activity can be built from it to identify white spaces for site selection. Fig 1 below shows the trial locations for various solid tumor studies being conducted in California by a Sponsor and its competitors. While the Sponsor seems to be active in Southern California, the competition seems to have a higher density of locations in Northern California. Drilling down further to the number of solid tumor studies being conducted per location (Fig 2) presents a very different picture for the Sponsor. Competitors are conducting more studies at the same or similar locations as the Sponsor, which could lead to challenges in patient retention.

Source: Clinicaltrials.gov, HCUP California 2011
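
The drill-down from locations to studies per location can be sketched in a few lines of Python. This is only an illustration of the idea, not the analysis behind the figures: the record layout, the sponsor names, the cities and the study identifiers are all hypothetical, and a real extract from ClinicalTrials.gov would need cleaning and geocoding first.

```python
from collections import defaultdict

# Hypothetical records: (study_id, sponsor, city).
# Illustrative values only, not real registry data.
TRIALS = [
    ("S-001", "Sponsor", "San Diego"),
    ("S-002", "Sponsor", "Los Angeles"),
    ("C-101", "Competitor A", "Los Angeles"),
    ("C-102", "Competitor B", "Los Angeles"),
    ("C-103", "Competitor A", "San Francisco"),
    ("C-104", "Competitor B", "San Francisco"),
]


def studies_per_location(trials, sponsor_name):
    """Count distinct studies per city, split into sponsor vs. competition."""
    seen = defaultdict(lambda: {"sponsor": set(), "competition": set()})
    for study_id, sponsor, city in trials:
        group = "sponsor" if sponsor == sponsor_name else "competition"
        seen[city][group].add(study_id)
    return {
        city: {group: len(ids) for group, ids in groups.items()}
        for city, groups in seen.items()
    }


def contested_locations(counts):
    """Cities where the sponsor is present but competitors run more studies."""
    return sorted(
        city
        for city, n in counts.items()
        if n["sponsor"] > 0 and n["competition"] > n["sponsor"]
    )


counts = studies_per_location(TRIALS, "Sponsor")
contested = contested_locations(counts)
print(counts)
print("Contested locations:", contested)
```

Counting distinct studies rather than sites is what separates the second view from the first: a city can look lightly covered on a map of locations and still be crowded once studies are tallied.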

The next data stream, Hospital Discharge, can help identify patients with tumors at various hospitals (Fig 3). When this information is overlaid on the trial sites, we can evaluate potential sites with lower competitive density.
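
One way to picture the overlay is to rank hospitals by patient volume relative to competing trial activity. The sketch below is an assumption-laden illustration: the hospital names and counts are made up, and the score (patients divided by competing studies plus one) is a simple stand-in chosen for this example, not a measure taken from the analysis above.

```python
# Hypothetical inputs, for illustration only.
# Tumor-related discharges per hospital (from a discharge data stream).
DISCHARGES = {"Hospital A": 400, "Hospital B": 350, "Hospital C": 120}
# Competing studies already running at each hospital (from a trial registry).
COMPETING_STUDIES = {"Hospital A": 7, "Hospital B": 1}


def rank_sites(discharges, competing_studies):
    """Rank hospitals by patients per competing study.

    The score patients / (studies + 1) is an assumed heuristic; the +1
    keeps hospitals with no competing studies from dividing by zero.
    """
    ranked = []
    for hospital, patients in discharges.items():
        studies = competing_studies.get(hospital, 0)
        score = patients / (studies + 1)
        ranked.append((hospital, patients, studies, score))
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


ranked = rank_sites(DISCHARGES, COMPETING_STUDIES)
for hospital, patients, studies, score in ranked:
    print(f"{hospital}: {patients} patients, {studies} competing studies, score {score:.1f}")
```

In this made-up data the hospital with the most patients ranks last because it is also the most crowded, which is the kind of trade-off the overlay is meant to expose. Domain expertise would still decide how to weight patient volume against competition.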

Investigator performance can now be evaluated based on study parameters and trials across competitors. When data streams from ClinicalTrials.gov and social media are combined, we can get a fair measure of investigator experience (Fig 4) and any compliance or regulatory issues (Fig 5).

Fig 4: Investigator Experience 

Fig 5: Compliance/Regulatory Issues

Domain knowledge is critical in creating the problem statement and evaluating data streams for incremental gain. Deep data offers a disciplined way to structure problems, analyze data and evaluate results, as opposed to the big data approach of hoarding data and running analytics in the hope of finding insights. As seen in the example above, while ClinicalTrials.gov is a thin data stream, its content allows for rich evaluation of the competitive and investigator landscapes for trials. Layering on additional data streams like Hospital Discharge, the FDA site and social media sites allows us to build a nuanced view of potential sites and investigators for trials. Similarly, deep data techniques could be used for better targeting of physicians, hospital key account management and patient therapy adherence.