Are We Ready to Embrace Precision Medicine?

Dr Pitluk is Vice President of Life Sciences, Paradigm4.
July 09, 2021

Precision medicine is one of the most talked about trends in healthcare. The promise is improved health outcomes and patient satisfaction, gained through an understanding of individual variability in genes, environment, and lifestyle. Recent commentators suggest that many of the technologies required to realize this goal already exist—and predict much wider application can be achieved by 2030.

At the heart of a precision medicine approach is data—huge amounts of data. Data from large patient cohorts that have been followed for years. Data from patient records, from routine genomics, from biochemical and molecular analyses, and from phenomics and environmental exposures, for example.

Here lies the central challenge. To extract the value embedded in all this data requires IT and computing solutions that offer researchers flexible, scalable, and easy-to-use tools to interrogate and make connections across massive heterogeneous datasets.

A population-scale problem

National cohorts such as the UK Biobank, the Million Veteran Program, FinnGen, and the All of Us Research Program have already amassed huge amounts of patient data, and now the International Hundred Thousand Plus Cohort Consortium (IHCC) is bringing together more than 100 cohorts comprising more than 50 million individuals from 43 countries.

Consequently, groups such as the Global Alliance for Genomics and Health (GA4GH) are working to develop and coordinate common data models and file formats to facilitate collaboration and interoperability. Likewise, the Allotrope Data Format (ADF)—a vendor-, platform-, and method-agnostic file format for storing data, multi-dimensional arrays, contextual metadata, and ancillary files—is gaining traction as scientists look for better ways to find, access, and share essential data.

Single-cell ‘omics turns up the dial

As the number of patients in studies increases, the data challenge is compounded by the growing adoption of single-cell ‘omics (genomics, transcriptomics, epigenomics, and proteomics), which extends the mantra of precision medicine from right patient and right therapy to include right cell target. With this new information, researchers and clinicians can, for example, better explore the transition from “healthy” to “disease” states, study potential biomarkers, understand the mechanics of disease pathways, and more accurately assess responses to drug targets or available therapeutic regimens over time.

However, single-cell datasets have so far been generated from small numbers of individuals, and the statistical confidence of any conclusion rests on the number of patients studied rather than the number of cells. This is because cells from the same patient are “siblings” and not true biological replicates.
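
To make the replication point concrete, a common workaround is to collapse each donor’s cells into a single “pseudobulk” profile before any between-group comparison, so that the effective sample size is the number of donors rather than the number of cells. The sketch below is illustrative only: it assumes a hypothetical AnnData file with a per-cell donor annotation and is not a description of any particular platform.

```python
import pandas as pd
import scanpy as sc

# Hypothetical single-cell dataset with a per-cell "donor" annotation.
adata = sc.read_h5ad("cohort_cells.h5ad")

# Densify for the sketch (real atlases would stay sparse / out-of-core).
X = adata.X.toarray() if hasattr(adata.X, "toarray") else adata.X
expr = pd.DataFrame(X, index=adata.obs_names, columns=adata.var_names)
expr["donor"] = adata.obs["donor"].values

# Collapse "sibling" cells into one pseudobulk profile per donor, so that
# donors, not cells, are the independent replicates in downstream tests.
pseudobulk = expr.groupby("donor").mean()
print(f"Effective sample size: {pseudobulk.shape[0]} donors")
```

This is also the aggregation step discussed later in this article: it buys statistically valid comparisons at the cost of cell-type and cell-state resolution.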

The Immune Cell Survey in the Human Cell Atlas (HCA consortium) currently contains 780,000 cells from just 16 individuals, so with 50 million participants the number of cells the IHCC could be managing is staggering. Adding further scale and complexity, data from imaging, wearables, and biochemical assays are increasingly being added to the mix.
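
For a rough sense of scale, here is a back-of-the-envelope extrapolation from the figures above, assuming (purely for illustration) comparable per-donor cell counts:

```python
# Back-of-the-envelope extrapolation from the figures quoted above.
cells_hca = 780_000            # Immune Cell Survey cells
donors_hca = 16                # individuals profiled
participants_ihcc = 50_000_000

cells_per_donor = cells_hca / donors_hca               # ~48,750 cells per person
projected_cells = cells_per_donor * participants_ihcc
print(f"~{projected_cells:.1e} cells")                 # ~2.4e+12, i.e. trillions
```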

The reality is that interrogating heterogeneous datasets spanning the hundreds of thousands of patients and treatment conditions needed to progress toward precision medicine, and to inform pharmaceutical development, demands software tools that can manage billions of cells. Many of the tools in use today already struggle with the volume of data being generated, and incumbent data-management tools will soon be simply unable to cope with what is required of them. A new approach is urgently needed: an improved analytics platform for robust, reliable scientific data modeling, storage, and large-scale computation.

The struggle is real

There is a growing requirement to juxtapose datasets from different assay modalities (such as imaging data with RNAseq data), but current software and computing approaches cannot do this, and physiological context is lost as a result. In fact, with most current tools, even the simpler task of comparing the same modality across different studies is poorly supported.
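
As a minimal sketch of what such a juxtaposition looks like once the relevant data have already been extracted into analyzable tables (the very step the next paragraph identifies as the bottleneck), the example below joins hypothetical per-sample imaging features with per-sample expression summaries on a shared patient identifier. The file and column names are assumptions.

```python
import pandas as pd

# Hypothetical per-sample tables from two assay modalities.
imaging = pd.read_csv("imaging_features.csv")   # patient_id, lesion_volume, ...
rnaseq = pd.read_csv("rnaseq_summary.csv")      # patient_id, ACE2_tpm, ...

# Align the modalities on the shared patient identifier so downstream
# analysis retains the physiological context of each sample.
combined = imaging.merge(rnaseq, on="patient_id", how="inner")

# A question that spans modalities: does expression track an imaging feature?
print(combined[["lesion_volume", "ACE2_tpm"]].corr())
```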

Moreover, current methods require repetitive extract/transform/load operations to move data from silos, or files, into an analyzable format, adding time and costly computational overhead to every question asked of the dataset. What is missing is a solution to “load/QA once—interrogate often”, and to run analyses at large scale on cost-effective hardware.
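
One generic illustration of the “load/QA once—interrogate often” pattern, using the open-source anndata library rather than any specific commercial platform: the curated matrix is written to an indexed on-disk format once, then sliced on demand without repeating the extract/transform/load step. The annotation column names here are assumptions.

```python
import anndata as ad

# One-time load/QA step (run once): write the curated matrix to disk, e.g.
#   adata.write_h5ad("atlas.h5ad")

# Later sessions open the file in backed (out-of-core) mode, so questions can
# be asked repeatedly without reloading or re-transforming the full dataset.
atlas = ad.read_h5ad("atlas.h5ad", backed="r")

# Example interrogation: only the cells and genes needed for one question.
# ("tissue" and "cell_type" are assumed per-cell annotation columns.)
subset = atlas[
    (atlas.obs["tissue"] == "lung") & (atlas.obs["cell_type"] == "T cell"),
    ["ACE2", "TMPRSS2"],
]
print(subset.to_memory().X)
```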

By creating silos of data, legacy systems stretch out analysis time and, at worst, will not run at all without significant modification. In fact, some commonly used algorithms do not scale beyond low patient counts (in some cases, fewer than 20). As a result, researchers are often forced to use less accurate methods that compromise the decision-making power of the data, aggregating cells before differential gene expression analysis and losing the ability to precisely define cell types and states.

A new way

A recent publication illustrates how rapidly insight and understanding can be gained using a scalable platform designed for sparse data, with integrated math functions that routinely work with billions of cells.

Faced with a large collection of single-cell analyses from various tissues relevant to SARS-CoV-2 infection, we created and curated a dataset populated with COVID-19 Cell Atlas data and Human Cell Atlas data. The data—2.2 million cells across 32 projects—were all normalized in order to establish the cellular and tissue distribution of the disease targets, and to allow searches for the distribution of transcripts for ACE2 and TMPRSS2—the receptor and priming protease that SARS-CoV-2 uses to enter cells.

In this case, searches completed within seconds, a timescale conducive to rapid iteration and timely decision-making.
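
A minimal sketch of the kind of search described above, using generic open-source tools rather than the platform used in the publication; the file name and the tissue and cell-type annotation columns are assumptions.

```python
import pandas as pd
import scanpy as sc

# Hypothetical curated atlas combining the normalized projects described above.
atlas = sc.read_h5ad("covid_atlas.h5ad")

# Pull just the two entry-factor genes from the sparse cell-by-gene matrix.
genes = ["ACE2", "TMPRSS2"]
sub = atlas[:, genes]
X = sub.X.toarray() if hasattr(sub.X, "toarray") else sub.X

expr = pd.DataFrame(X, columns=genes, index=atlas.obs_names)
expr[["tissue", "cell_type"]] = atlas.obs[["tissue", "cell_type"]].values

# Mean expression per tissue and cell type: where are the targets expressed?
summary = expr.groupby(["tissue", "cell_type"])[genes].mean()
print(summary.sort_values("ACE2", ascending=False).head())
```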

Looking ahead

If the future of healthcare is precision medicine, then the future of precision medicine hinges on the ability of data systems to make the vast amounts of data that will be generated “science-ready”. Extracting value from the depth of multi-‘omics data being produced and assembled into cell atlases will play a critical role in the exciting but challenging road ahead for researchers, clinicians, and, ultimately, individual patients.

About the Author

Dr Pitluk is Vice President of Life Sciences, Paradigm4.
