Next-Gen Clinical Data Platform | HCLTech

November 24, 2022

The COVID-19 pandemic disrupted clinical research perhaps more than any other domain. Drug pipelines across all phases were abruptly halted and trials were put on hold, creating a business continuity crisis for life sciences enterprises. Programs and business requirements that had generally been categorized as ‘good to have’, such as virtual trials, decentralized trials, and telehealth, became ‘must haves’.

Efficient clinical trial data analysis is essential to prove the efficacy and safety of new investigational products and therapies. For clinical data management to stay efficient, platforms must keep pace with evolving study protocols, tightening regulatory constraints, and globally distributed data management teams.

Today, commercially available clinical data management platforms offer a containerized model with varying degrees of extensibility and scalability and minimal or no artificial intelligence or machine learning (AI/ML) capabilities. This blog post highlights the salient features of knowledge graph-based clinical data platforms and compares them with the data lake and Spark-based cluster approaches.

Multiple technologies and standards are used to collect clinical trial data 

Business requirements such as monitoring and reporting depend on integrating, consolidating, transforming, and managing clinical data. Processing voluminous data in inconsistent formats, under changing business requirements, brings significant challenges. Data collected outside electronic data capture (EDC) systems, categorized as ‘laboratory’, ‘biomarker’, ‘imaging’, and ‘patient-reported’, accounts for most of the information collected during a clinical trial.

Typical clinical data platforms are designed around traditional ways of working. For example, the EDC platform collects data according to the electronic case report form (eCRF); the clinical trial management system (CTMS) defines visits and tracks progress; the eCRF designer configures questionnaires and edit checks; and the clinical data warehouse enables holistic analysis of clinical trial data.

Current clinical data platforms rely on manual or semi-automated data management and transformation processes.

With the surge in wearables and mobile apps, along with 5G connectivity, the amount of patient data being generated is multiplying. This makes clinical data management even more challenging, because enterprises must handle an unprecedented number of data sets from diverse sources: eCRFs, wearables, apps, devices, and more.

Question to ponder: does the current technology platform support the transition to clinical data science?

Traditional data management techniques rely on manual effort coupled with some programmable routines offered within the platform, typically scoped to a single data domain. Protocols are configured from libraries maintained at the therapeutic area, domain, or questionnaire level.

Clinicians, statisticians, and medical monitors, along with safety and other business groups, expect the ‘clean’ version of clinical data to be available as soon as it is collected. With data surging from various sources (more than 70% is generated outside of EDC), meeting the expectation of clean data at the earliest opportunity is an uphill task, and at times it does not seem feasible. It may become achievable if the platform supports deploying newer technologies such as AI, ML, and natural language processing (NLP) across the business processes that manage large data sets.

Can next-generation technologies be deployed on the current implementation of clinical data platforms?

The answer is most likely yes, if all data is consolidated into a data lake. However, a data lake has its own challenges: it struggles with structured query language (SQL) concurrency, and in the rush to ingest incoming data, data is often stored without domain context or precise metadata, which hinders meaningful retrieval.

The other approach, Apache Spark-based clusters, entails substantial data engineering effort, leading to higher cost and complexity and a lack of data lineage; the resulting processing is often not repeatable.

To achieve the desired clinical data management (CDM) paradigm, a new avatar of the clinical data platform needs to be envisioned. It would need to ingest a variety of data from several sources and give the business a longitudinal view of patient data for data management, safety reviews, and other activities in real time, with minimal manual intervention. It would also need to leverage technological advances such as AI/ML/NLP algorithms.

Clinical data management paradigms 


Volume
  • Current CDM paradigm: 10s to 100s of data points per patient per week (i.e., a few data points per CRF)
  • Desired CDM paradigm: Thousands to millions of data points per patient per week

Variety
  • Current CDM paradigm: EDC-centric, including local labs and pharmacokinetics (PK); external data mostly limited to IxRS, central labs, and eCOA
  • Desired CDM paradigm: Scope expanded to real-world data (RWD), biomarkers, genomics, imaging, video, sensors, and wearables (i.e., sequenced data), both structured and unstructured

Velocity
  • Current CDM paradigm: Days, weeks, and months; data entered in the eCRF days after the patient’s visit
  • Desired CDM paradigm: Near real time, with RESTful APIs providing interoperability between computer systems

Veracity
  • Current CDM paradigm: Exact copy of source (ALCOA), mainly confirmed through source data verification (SDV) and queries; perfect (100% error-free) data; manual/scientific reviews
  • Desired CDM paradigm: Focused on what matters (i.e., critical-to-quality factors); risk-based data strategies, early signal detection, and cross-domain automation using AI/ML; NLP interface for users

Value
  • Current CDM paradigm: Focused on regulatory submissions
  • Desired CDM paradigm: Continuous data insights on patients (e.g., safety, behavior) that inform study design by improving measurement sensitivity and understanding of the disease being treated; broader secondary use (synthetic arms, patient engagement, machine learning training sets, etc.)

Table 1: To support the 5Vs and be future-proof, CDM needs a source- and technology-agnostic data collection, consolidation, and management strategy that looks beyond transferring source data to EDC/DMW.

Introducing the knowledge graph 

Relationships, or connectivity, are the most important characteristic of today’s data and entities, from power grids, retail, and supply chains to patient care data in electronic medical records (EMRs) and patient data in EDC. As the ecosystem becomes more interconnected and complex, technologies that leverage relationships and their characteristics become increasingly important. Besides scalability, performance, and ease of change, the single most differentiating factor of a graph database is its ability to define and store relationships as part of the data itself.

Graph database technologies provide flexible ways to extend study data models to new definitions, preserving the contextual meaning of the original data.
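To make the idea of relationships as first-class data concrete, here is a minimal sketch of an in-memory property graph in Python. It is not a graph database and not any vendor’s API; the class, node IDs, and relationship names (EXPERIENCED, MEASURED_BY) are hypothetical. It illustrates two points made above: each relationship carries its own properties, and the model can be extended with a new relationship type without altering anything already stored.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal in-memory property graph (hypothetical, for illustration only).

    Nodes and relationships both carry properties, so a relationship is
    stored as data rather than reconstructed by a join at query time.
    """

    def __init__(self):
        self.nodes = {}               # node_id -> {"label": ..., **properties}
        self.out = defaultdict(list)  # node_id -> [(rel_type, target_id, properties)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, rel_type, dst, **props):
        # The relationship itself, with its own properties, is stored as data.
        self.out[src].append((rel_type, dst, props))

    def related(self, src, rel_type):
        """Return (target node, relationship properties) pairs for one relationship type."""
        return [(self.nodes[dst], props)
                for rel, dst, props in self.out[src] if rel == rel_type]


g = PropertyGraph()
g.add_node("SUBJ-001", "Subject", sex="F")
g.add_node("AE-1", "AdverseEvent", term="Headache", severity="SEVERE")
g.add_edge("SUBJ-001", "EXPERIENCED", "AE-1", onset="2022-03-01")

# Later, the model is extended with a new node type and relationship type
# (a wearable blood-pressure reading). Existing nodes and edges are untouched,
# so no schema migration is needed and the original context is preserved.
g.add_node("BP-1", "VitalSign", test="SYSBP", value=165, unit="mmHg")
g.add_edge("SUBJ-001", "MEASURED_BY", "BP-1", source="wearable", time="2022-03-01T14:05")

print(g.related("SUBJ-001", "EXPERIENCED"))
print(g.related("SUBJ-001", "MEASURED_BY"))
```

A production graph database adds persistence, indexing, and a query language on top of this idea, but the core design choice is the same: the edge is a stored record, not an inference.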

Storing relationships in a graph helps uncover connections among clinical and broader R&D data and makes it possible to answer questions that were not anticipated when the data was collected. A few examples are given below:

  1. The relationship between concomitant medications and adverse events (AEs)
  2. In studies where subjects were dosed with compound 1 and compound 2, what other adverse events have been reported for patients with elevated liver values who received compound 1?
  3. Which male/female patients who were given a dose of drug A had high blood pressure measurements during episodes of severe headache?
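The second question above can be sketched as a graph traversal: start from subjects who received compound 1, keep those with an elevated liver lab value, then follow their adverse-event relationships. The toy data, node and relationship names, and the definition of “elevated” (ALT above 3× the upper limit of normal) are all assumptions for illustration; a graph database would typically express the same traversal declaratively in its own query language.

```python
from collections import defaultdict

# Hypothetical toy data for one study in which subjects received compound 1 or compound 2.
nodes = {
    "S1": {"label": "Subject"},
    "S2": {"label": "Subject"},
    "S3": {"label": "Subject"},
    "CMP1": {"label": "Compound", "name": "compound 1"},
    "CMP2": {"label": "Compound", "name": "compound 2"},
    "LB1": {"label": "Lab", "test": "ALT", "value": 150.0, "uln": 40.0},
    "LB2": {"label": "Lab", "test": "ALT", "value": 35.0, "uln": 40.0},
    "LB3": {"label": "Lab", "test": "ALT", "value": 200.0, "uln": 40.0},
    "AE1": {"label": "AdverseEvent", "term": "Nausea"},
    "AE2": {"label": "AdverseEvent", "term": "Rash"},
    "AE3": {"label": "AdverseEvent", "term": "ALT increased"},
    "AE4": {"label": "AdverseEvent", "term": "Headache"},
    "AE5": {"label": "AdverseEvent", "term": "Fatigue"},
}
edges = [
    ("S1", "RECEIVED", "CMP1"), ("S1", "HAS_LAB", "LB1"),
    ("S1", "EXPERIENCED", "AE1"), ("S1", "EXPERIENCED", "AE2"), ("S1", "EXPERIENCED", "AE3"),
    ("S2", "RECEIVED", "CMP1"), ("S2", "HAS_LAB", "LB2"), ("S2", "EXPERIENCED", "AE4"),
    ("S3", "RECEIVED", "CMP2"), ("S3", "HAS_LAB", "LB3"), ("S3", "EXPERIENCED", "AE5"),
]

# Index outgoing relationships by (source node, relationship type) for traversal.
out = defaultdict(list)
for src, rel, dst in edges:
    out[(src, rel)].append(dst)


def liver_elevated(lab, factor=3.0):
    """Assumed definition: ALT above factor x upper limit of normal (ULN)."""
    return lab["test"] == "ALT" and lab["value"] > factor * lab["uln"]


def other_aes(compound_id, exclude_terms=frozenset({"ALT increased"})):
    """Map each subject who received compound_id and had an elevated liver lab
    to the sorted list of their other adverse-event terms."""
    result = {}
    for subj, props in nodes.items():
        if props["label"] != "Subject" or compound_id not in out[(subj, "RECEIVED")]:
            continue
        if not any(liver_elevated(nodes[lab]) for lab in out[(subj, "HAS_LAB")]):
            continue
        result[subj] = sorted(
            nodes[ae]["term"] for ae in out[(subj, "EXPERIENCED")]
            if nodes[ae]["term"] not in exclude_terms
        )
    return result


print(other_aes("CMP1"))  # {'S1': ['Nausea', 'Rash']}
```

S2 is excluded because its ALT is within range, and S3 because it received compound 2; the liver finding itself is filtered out so that only the “other” events remain.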

A next-generation, graph-powered clinical data management platform enables rapid analysis of clinical trial patient data, both within individual trials and across multiple trials for meta-analyses. A Clinical Data Interchange Standards Consortium (CDISC)-based clinical data ontology enables quick integration of patient data from EDC into a submission-ready format. The ontology also seamlessly enables pooled analyses and the combination of real-world data, which helps guide future clinical trial designs and adaptations.
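A CDISC-based ontology essentially encodes how source concepts map to standard ones. As a simplified, hypothetical sketch, the mapping below converts raw EDC adverse-event records into rows shaped like the SDTM AE domain (STUDYID, DOMAIN, USUBJID, AESEQ, AETERM, AESTDTC, AESEV). The EDC field names, the numeric severity coding, the date format, and the USUBJID construction are assumptions; a real implementation would follow the SDTM Implementation Guide and CDISC Controlled Terminology, and in a graph platform the mapping would live as ontology relationships rather than a Python dict.

```python
from datetime import datetime

# Hypothetical mapping from EDC form fields to SDTM AE domain variables.
EDC_TO_SDTM_AE = {
    "ae_term": "AETERM",      # reported term for the adverse event
    "ae_start": "AESTDTC",    # start date, converted to ISO 8601
    "ae_severity": "AESEV",   # severity, converted to controlled terminology
}

# Assumed EDC numeric coding for severity, mapped to CDISC-style terms.
SEVERITY_CODES = {"1": "MILD", "2": "MODERATE", "3": "SEVERE"}


def to_sdtm_ae(study_id, records):
    """Convert raw EDC adverse-event records into SDTM-shaped AE rows (simplified)."""
    rows = []
    seq = {}  # running AESEQ per subject
    for rec in records:
        # Assumed convention: USUBJID = STUDYID-SITEID-SUBJID.
        usubjid = f"{study_id}-{rec['site']}-{rec['subject']}"
        seq[usubjid] = seq.get(usubjid, 0) + 1
        row = {"STUDYID": study_id, "DOMAIN": "AE", "USUBJID": usubjid, "AESEQ": seq[usubjid]}
        for edc_field, sdtm_var in EDC_TO_SDTM_AE.items():
            value = rec[edc_field]
            if sdtm_var == "AESTDTC":
                # EDC is assumed to store dates as DD-Mon-YYYY; SDTM uses ISO 8601.
                value = datetime.strptime(value, "%d-%b-%Y").date().isoformat()
            elif sdtm_var == "AESEV":
                value = SEVERITY_CODES[value]
            row[sdtm_var] = value
        rows.append(row)
    return rows


edc_records = [
    {"site": "101", "subject": "0001", "ae_term": "Headache", "ae_start": "01-Mar-2022", "ae_severity": "3"},
    {"site": "101", "subject": "0001", "ae_term": "Nausea", "ae_start": "05-Mar-2022", "ae_severity": "1"},
]
for row in to_sdtm_ae("STUDY01", edc_records):
    print(row)
```

Keeping the mapping as data, separate from the conversion loop, is what allows new sources or standard versions to be added by extending the mapping rather than rewriting the pipeline.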

Adaptive design
  • Requires fully integrated e-clinical technologies
  • Must be able to adapt to rapid changes within the study
  • Accounts for potential changes (sample size, dosing arms, regimen, patient population, endpoint, duration, schedule)
  • Needs real-time data for statistical modeling, thus aiding adaptations

Master protocols
  • Intended to improve the probability of matching the right treatment to the right patient with the right disease type to maximize a positive outcome
  • Requires robust technology that supports a comprehensive data management plan
  • Clinical data scientists need to consider more than the basic study parameters

Study design leveraging synthetic arms
  • One or more study arms are replaced by previously collected data from either clinical studies or real-world evidence
  • Helps expedite studies and potentially saves costs by reducing physical enrollment


Table 2: Clinical Data Platform: Key IT expectations to support clinical research 2.0

The road ahead for clinical trial data management

Efficient clinical trial data management is crucial to the continued success of biopharmaceutical enterprises. The complexity of the clinical data management landscape, and the speed at which it changes, require new sets of tools to keep pace with business initiatives. Graph technology provides a strong foundation for next-generation clinical data platforms, with the ability to develop and deploy AI/ML solutions that leverage inherent graph characteristics: metadata stored together with data, lower sensitivity to structural changes, and scalable, robust performance.


Steve Chartier, Patrick Nadolny, Richard Young (March 2022). The 5Vs of Clinical Data
