Jamie Kasuboski, Partner at Luma Group, and Rob Plasschaert, Senior Director of Biology at Stealth Newco
Innovation in biotechnology is driven by uncovering novel biological insights and translating them into life-saving therapeutics, diagnostics and medical devices. Over the past two decades, breakthroughs have largely stemmed from analyzing vast biological datasets, such as those generated by human genome projects.
Today, advancements in artificial intelligence (AI) and machine learning (ML) have significantly enhanced our ability to systematically analyze massive datasets, identifying complex relationships across genomic, proteomic, transcriptomic, metabolomic and other data simultaneously. The intersection of all of these “-omics” is what we define as multi-omics, which represents a large untapped domain for future biotech innovation.
The convergence of affordable, sophisticated AI/ML analytics and large-scale multi-omics data collection has marked a pivotal shift within biotechnology, from single-omics approaches to integrated multi-omics innovations.
June 2025
Introduction: Multi-Omics and the Era of Big Data in Biotechnology
More than 20 years after the first human genome was published, biotechnology is now firmly a discipline of big data. Dissecting a disease mechanism means building a logical path of cause and effect that moves from origin (e.g., a genetic mutation) to disrupted biological process (e.g., a non-functional protein or pathway) to clinical presentation (e.g., cancer). Traditionally, the scope of this work has been limited by scale, and progress has been consistent but slow. Advances in methodology have transformed full-coverage “-ome” profiling—genome, transcriptome, epigenome, metabolome, and beyond—from bleeding-edge novelty into standard practice with increasingly rigorous quality control. We can now routinely generate terabytes of molecular measurements from a single study. We, and many others, believe that the connections between these large datasets will define the next era of multi-omics drug development. This emerging field focuses on the integration and analysis of large-scale molecular and clinical data, enabling the systematic dissection of biological cause and effect at scale. By viewing molecular disease through this multi-faceted lens, researchers can identify novel insights and translate them into new therapeutics, diagnostics and medical devices.
Looking Through the Compound Lens of Multi-Omics
By combining large datasets that characterize disease etiology with clinically meaningful endpoints, multi-omic analysis is poised to deliver transformative insights. High-throughput profiling of the central dogma—DNA → RNA → protein—is now routine, and assays for modulators (e.g., epigenetic marks) and downstream effectors (e.g., metabolites) have dramatically decreased in cost while increasing in sensitivity, rendering them nearly run-of-the-mill. Less than two decades ago, whole-genome sequencing cost over $10 million per genome, RNA-seq was low-throughput, proteomics relied on 2D gels, and electronic health records (EHRs) remained siloed; today, sequencing runs under $500 per sample, single-cell multi-omic kits can simultaneously profile chromatin accessibility and gene expression, and modern proteomics platforms quantify thousands of proteins in a single day.
Clinically, routine laboratory tests, digital histopathology, and imaging now feed into AI-enabled pipelines that extract multi-scale features, while EHR data—once trapped behind Epic or Cerner—are routinely exported as de-identified OMOP/FHIR–formatted datasets via research data warehouses. Public resources such as MIMIC-IV, NIH’s All of Us Research Program, and the UK Biobank exemplify how ICU telemetry, standardized lab values, and de-identified clinical notes can be linked to genomics and metabolomics under strict governance. What once required bespoke protocols, custom ETL pipelines, and extensive manual annotation has evolved into a streamlined, plug-and-play ecosystem, enabling researchers to integrate multi-omic and clinical data seamlessly and uncover biological insights that were impossible to detect a decade ago. All of these sources provide a rich array of data spanning numerous dimensions, from molecular-level insights to comprehensive patient health journeys.
Table 1. Omic Modalities and Their Captured Insights
| Category | Data Modality | What it captures and/or quantifies |
| --- | --- | --- |
| Molecular Mechanisms | Genomics | The genetic blueprint of an organism’s genome |
| Molecular Mechanisms | Epigenomics | Reversible chemical marks (e.g., DNA methylation, histone modifications) that modulate gene expression |
| Molecular Mechanisms | Transcriptomics | Dynamic gene-expression programs (mRNA abundance and isoforms) |
| Molecular Mechanisms | Proteomics | Proteins, their splice variants, and post-translational modifications |
| Molecular Mechanisms | Metabolomics | Small-molecule metabolites whose levels change with cellular activity and stress |
| Disease & Clinical Outcomes | Radiology & Functional Imaging | MRI, CT, PET, and ultrasound imaging that quantify disease states over time in various organs |
| Disease & Clinical Outcomes | Digital Pathology & Spatial Slides | Whole-slide histology, multiplex immunofluorescence, and spatial transcriptomics mapping cellular phenotypes to anatomical context |
| Disease & Clinical Outcomes | Electronic Health Records (EHR) | Structured labs, vitals, medications, and procedures, plus unstructured clinical notes collected across years of care |
| Disease & Clinical Outcomes | Longitudinal Laboratory Panels | Serial hematology, chemistry, and biomarker tests (e.g., HbA1c, troponin) tracking disease progression or therapeutic response |
Why is Multi-Omics Poised to Make an Impact Now?
Multi-omics approaches use multidimensional datasets whose complexity often surpasses the capabilities of classical statistical methods. Although earlier computational approaches were effective, recent advances in AI/ML have not only dramatically increased computational power but also enabled seamless integration across multiple datasets, each with its own unique architecture. By applying neural networks—particularly advanced deep-learning architectures, graph neural networks, and probabilistic causal frameworks—researchers can now uncover insights and identify connections that were too subtle for earlier computational methods, turning analytical challenges into strengths and offering a powerful means to decipher biological complexity. AI models integrate heterogeneous data modalities—such as DNA variants, RNA expression counts, protein abundances and metabolite concentrations—into unified latent representations, preserving critical biological interactions across layers. Alignment models, for example, facilitate a comprehensive understanding of complex systems by embedding different data types into shared latent spaces, maintaining biological coherence across diverse “-omic” layers.1 Researchers have begun to apply these techniques to profile immune cells directly from patient samples: Dominguez Conde et al. (2022) took early steps toward characterizing immune cells in both healthy individuals and diseased patients, aiming to understand how their multi-omic profiles shape the immune system’s adaptation and function in different tissue environments.2
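To make the shared-latent-space idea concrete, the sketch below shows, in simplified form, how two “-omic” modalities measured on the same cells can be encoded into a common latent space with an alignment objective. The architecture, layer sizes and synthetic data are illustrative assumptions, not a reproduction of any published model.

```python
# Minimal sketch: embed paired RNA and protein measurements from the same cells
# into one shared latent space. All sizes and data below are illustrative.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Maps one omic layer into the common latent space."""
    def __init__(self, n_features: int, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class SharedLatentModel(nn.Module):
    """Encodes both modalities and pulls paired embeddings together,
    so the latent space is modality-agnostic."""
    def __init__(self, n_rna: int, n_protein: int, latent_dim: int = 32):
        super().__init__()
        self.rna_enc = ModalityEncoder(n_rna, latent_dim)
        self.prot_enc = ModalityEncoder(n_protein, latent_dim)

    def forward(self, rna, protein):
        z_rna = self.rna_enc(rna)
        z_prot = self.prot_enc(protein)
        # Alignment loss: measurements from the same cell should land nearby.
        align_loss = nn.functional.mse_loss(z_rna, z_prot)
        return z_rna, z_prot, align_loss

# Toy usage with synthetic data standing in for a paired single-cell assay.
rna = torch.randn(256, 2000)      # 256 cells x 2,000 genes
protein = torch.randn(256, 100)   # same 256 cells x 100 surface proteins
model = SharedLatentModel(n_rna=2000, n_protein=100)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(10):
    optimizer.zero_grad()
    _, _, loss = model(rna, protein)
    loss.backward()
    optimizer.step()
```

In practice such models add reconstruction or contrastive terms on top of the alignment objective, but the core design choice is the same: separate encoders per modality, one shared coordinate system for downstream analysis.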
Moreover, AI-based methods significantly improve data quality through noise reduction and imputation. Techniques such as autoencoders and diffusion models reconstruct missing values, correct batch effects, and enhance the signal-to-noise ratio of noisy assays. Variational autoencoders, for instance, have been successfully employed to impute missing data in single-cell multi-omics, dramatically enhancing analytical robustness.3 Additionally, supervised deep-learning models trained on clinical endpoints—including patient survival, relapse rates and therapeutic response—can accurately link complex molecular patterns to clinically relevant outcomes. These models distill intricate biological signatures into actionable insights, thereby accelerating precision medicine initiatives and facilitating personalized therapies (Lee et al., 2020).4
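The sketch below illustrates the masking-and-reconstruction idea behind autoencoder-based imputation: hide a fraction of observed entries, train the network to recover them, and then use the trained model to fill in values that were never measured. The matrix sizes, masking rate and data are illustrative assumptions rather than parameters from the cited studies.

```python
# Minimal sketch of autoencoder-based imputation for an omics matrix.
# Entries are randomly masked during training; the network learns to
# reconstruct them from the remaining signal.
import torch
import torch.nn as nn

class ImputingAutoencoder(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(512, 200)                      # 512 samples x 200 analytes (synthetic)
model = ImputingAutoencoder(n_features=200)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(20):
    mask = (torch.rand_like(x) > 0.2).float()  # hide ~20% of entries each step
    recon = model(x * mask)
    # Score the model only on the entries it could not see.
    loss = ((recon - x) ** 2 * (1 - mask)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference time, observed values are kept as-is and the model's
# reconstruction fills the positions that were never measured.
```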
Foundation models trained on extensive multi-omics data are increasingly valuable for generating testable biological hypotheses. These models can predict causal interactions and protein structures, and even simulate the effects of targeted genetic or pharmacological interventions. For instance, AlphaFold and similar AI systems demonstrate how computational predictions can effectively precede laboratory validation, dramatically shortening the cycle from data collection to biological discovery.5 AI is a critical translator, converting dense, molecular-level information into meaningful clinical insights and actionable therapeutic strategies, thereby bridging the gap between complex multi-omics data and tangible patient benefits.
Table 2. Standard Omic Workflow

A Case Study in Low-Throughput Translation: γ-Secretase in AD
Consider γ-secretase inhibition in Alzheimer’s disease. Decades of biochemistry showed that γ-secretase generates amyloid-β peptides that aggregate into neurotoxic plaques; blocking the enzyme seemed like a slam-dunk therapeutic strategy. Yet Lilly’s semagacestat—a potent γ-secretase inhibitor—failed spectacularly in Phase III trials. Cognition worsened faster than with placebo, and adverse events spiked. One explanation is that amyloid processing is only one facet of a vast neurodegenerative network; γ-secretase also cleaves Notch receptors and other crucial substrates. If researchers had had a multi-omic, systems-level view of neuronal biology—linking genomic risk alleles, transcriptomic stress responses, proteomic pathway crosstalk and metabolic dysfunction—they might have predicted these liabilities before thousands of patients were exposed. The lesson is clear: single-node interventions based on incomplete models can backfire.
Challenges That Remain
Although early precision medicine successes emerged from single-omic approaches, multi-omics strategies—despite their promise—face critical implementation hurdles. Foremost is data quality and standardization: unlike genomic sequencing’s unified formats, multi-omics suffers from inconsistent sample collection, processing methods and metadata curation, limiting cross-study comparability. A further obstacle is interpretability; predictive models often function as “black boxes,” failing to provide the transparent mechanistic insights regulators and clinicians require for trust and actionable decisions. Lastly, the scalability of experimental validation remains constrained, with wet-lab confirmation via phenotypic screening, organoid systems and CRISPR perturbations lagging far behind computationally generated hypotheses.
These bottlenecks—poor data standardization, opaque models and limited validation—represent significant barriers to realizing multi-omics’ clinical potential. Even so, research continues to chip away at them, pairing new predictive models with more efficient and standardized wet-lab data collection. The AI/ML approaches helping to close these gaps are explored in the next section.
Table 3. Current Omic Bottlenecks and Pain Points

2. AI & Machine Learning Make Multi-Omics Possible
AI tools are rapidly reshaping the landscape of biomedical research by solving longstanding challenges in data analysis, interpretation and utilization. Traditional methods in biology and medicine frequently face bottlenecks related to scale, accuracy and speed, limiting discovery and clinical translation. Today’s AI-powered tools offer unprecedented precision, automation and analytical depth, poised to resolve critical choke points throughout the multi-omics workflow—from initial raw signal cleanup to sophisticated drug design. Below, we highlight specific examples of the impact these AI tools are already making, along with a selection of emerging use cases. (For a deeper dive into our philosophy—more data isn’t better data; curated data is better data—see our prior AI white paper.)
For instance, one of AI’s transformative capabilities lies in converting biological sequences into accurate three-dimensional protein structures. Traditionally, researchers relied on experimental techniques such as protein crystallography, a process that could take months per protein and left most proteins structurally unresolved. AI-based solutions, exemplified by AlphaFold, have dramatically changed this reality. AlphaFold has generated readily accessible structural models for nearly 200 million proteins, empowering vaccine developers and enzyme engineers to obtain atomic-level detail in seconds instead of months.6
AI also significantly improves the quality and usability of genomic data. While advanced sequencing technologies such as long-read sequencers deliver critical insights, they often come with higher error rates compared to short-read counterparts. AI-driven models, including Google’s DeepVariant, address this issue by effectively “cleaning” raw genomic reads and boosting variant-calling accuracy to near-clinical standards.7 Such tools dramatically reduce the manual quality control burden and time—shaving weeks off the analysis pipeline and enabling faster translation from genomic discovery to clinical action. Additionally, AI facilitates the annotation and interpretation of complex single-cell datasets, a process that is notoriously labor-intensive and prone to subjectivity. Traditional manual annotation of million-cell datasets is both slow and variable across annotators. AI-driven solutions, such as the open-source popV ensemble, systematically assign cell-type annotations along with confidence scores. This automated process highlights only the ambiguous 10–15% of cells for expert review, significantly accelerating workflows and ensuring higher consistency and reproducibility across analyses.8
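As a conceptual illustration of this confidence-gated workflow (not the popV API itself), the snippet below combines votes from several hypothetical annotation models into consensus labels with agreement-based confidence scores, so that only ambiguous cells are routed to an expert.

```python
# Conceptual illustration of ensemble cell-type annotation with confidence
# gating. The "model" predictions are synthetic placeholders, not outputs
# of any real annotation tool.
import numpy as np

cell_types = np.array(["T cell", "B cell", "NK cell", "Monocyte"])
n_cells, n_models = 1000, 5
rng = np.random.default_rng(0)

# Simulate five annotation models that each call the true label ~85% of the time.
true_labels = rng.choice(cell_types, size=n_cells)
votes = np.where(rng.random((n_models, n_cells)) < 0.85,
                 true_labels,
                 rng.choice(cell_types, size=(n_models, n_cells)))

def consensus_with_confidence(votes):
    """Majority vote per cell; confidence is the fraction of models that agree."""
    labels, confidences = [], []
    for col in votes.T:
        values, counts = np.unique(col, return_counts=True)
        labels.append(values[counts.argmax()])
        confidences.append(counts.max() / len(col))
    return np.array(labels), np.array(confidences)

labels, confidence = consensus_with_confidence(votes)
needs_review = confidence < 0.8        # ambiguous cells go to an expert
print(f"{needs_review.mean():.0%} of cells flagged for manual review")
```

With these toy settings only a small minority of cells falls below the agreement threshold, mirroring the workflow in which automated annotation handles most of the dataset and experts focus on the ambiguous remainder.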
AI excels at integrating multi-omic data streams—such as DNA, RNA, and digital pathology—to create comprehensive predictive models. While individual biomarkers often fail to capture the complexity of diseases, AI-based multi-omic fusion models achieve remarkable accuracy. Recent pan-cancer studies employing deep-learning techniques have successfully combined diverse data types into unified survival risk scores. These AI-derived scores have consistently outperformed traditional stage-based predictions, delivering superior prognostic accuracy across studies involving more than 15,000 patients.9
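The sketch below shows one common fusion pattern, late fusion: per-modality models are trained separately and a second-stage model combines their outputs into a single risk score. The features, outcome and cohort size are synthetic assumptions, not data from the cited pan-cancer studies, and a production pipeline would use out-of-fold stage-one predictions and proper survival models rather than a binary outcome.

```python
# Minimal sketch of late-fusion multi-omic risk modeling on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n_patients = 600
genomics = rng.normal(size=(n_patients, 50))       # e.g., mutation-derived features
expression = rng.normal(size=(n_patients, 200))    # e.g., RNA signature scores
pathology = rng.normal(size=(n_patients, 30))      # e.g., image-derived features

# Synthetic outcome loosely driven by all three layers (stand-in for survival status).
risk = genomics[:, 0] + 0.5 * expression[:, :5].sum(axis=1) + pathology[:, 0]
outcome = (risk + rng.normal(scale=1.0, size=n_patients) > 0).astype(int)

idx_train, idx_test = train_test_split(np.arange(n_patients),
                                       test_size=0.25, random_state=0)

def modality_score(X):
    """Train a per-modality model and return its predicted risk for all patients.
    (A real pipeline would use out-of-fold predictions to avoid leakage.)"""
    clf = LogisticRegression(max_iter=1000).fit(X[idx_train], outcome[idx_train])
    return clf.predict_proba(X)[:, 1]

# Stage 1: one score per modality. Stage 2: fuse them into a single risk score.
stacked = np.column_stack([modality_score(m) for m in (genomics, expression, pathology)])
fusion = LogisticRegression().fit(stacked[idx_train], outcome[idx_train])
fused_risk = fusion.predict_proba(stacked[idx_test])[:, 1]
print("Fused risk AUC on held-out patients:", round(roc_auc_score(outcome[idx_test], fused_risk), 3))
```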
Collectively, these advancements demonstrate AI’s potential to transform biomedical research, delivering faster, more precise and clinically relevant insights at unprecedented scales.
Table 4: Real-World Challenges and Examples of AI-enablement
| Pain Point | What AI Can Do | Everyday Example |
| --- | --- | --- |
| Too much data, not enough insight | Spots hidden warning-sign patterns that clinicians would never have time to sift out manually. | A machine-learning screen of newborn blood samples uncovered a 4-gene “early-warning” fingerprint for sepsis—flagging babies days before symptoms appeared.10 |
| Different hospitals use incompatible equipment | Lines up images or lab results from many sites so they can be compared as if they came from one scanner or one lab. | An AI harmonization tool lets researchers combine breast-MRI databases into one study, boosting the accuracy of tumor-detection software across both hospitals.11 |
| Important signals are buried in noise | Cleans and sharpens data, filtering out scanner glitches or stray measurements. | In lung-cancer screening, an AI system that de-noised CT scans spotted malignant nodules earlier and with fewer false alarms than expert radiologists.12 |
| Key test results are missing | Predicts likely values or tells clinicians which single test would add the most value, cutting down on repeat blood draws. | A study showed that AI imputation could reliably fill in missing lab results in electronic health records for stroke and heart-failure patients, improving risk models without extra testing.13 |
| Translating big data into clinical outcomes is difficult | Converts continuous streams from wearables into medically meaningful alerts. | The 400,000-participant Apple Heart Study used an AI algorithm in a smartwatch to flag atrial-fibrillation episodes with 84% accuracy, prompting users to seek timely care.14 |
3. Factors Driving Growth in Multi-Omics & AI
Three forces—capital, capability and clinical pull-through—are reinforcing one another and accelerating adoption of multi-omics platforms and solutions in today’s market environment.
I) Table 5: Plentiful capital for big data bioscience
| What’s Happening | Why it Matters | Example |
| --- | --- | --- |
| Capital for AI platforms is abundant | Investors see multi-omics + AI as the next big moment for biotech | Global multi-omics platform revenue is expected to nearly double—from ≈ $2.7bn (2024) to $5bn (2029)15 |
| Big rounds for data-centric start-ups | Larger war-chests let companies build both wet-lab and compute infrastructure | The 20 biggest biotech start-ups raised $2.9bn in Q1 2024, many with AI/multi-omics pitches16 |
| Generative-AI boom spills into life-sciences | General-purpose GenAI tools lower the barrier to sophisticated modeling | Venture funding for GenAI hit $45bn in 2024, up ~2× YoY17 |
| Strategic pharma partnerships | Pharma licenses data access and co-develops AI platforms instead of building in-house | UK Biobank’s new AI-ready proteomics program launched with 14 pharma partners |
II) Capabilities for data generation and analysis continue to improve
The cost and economics of generating multi-omics data are changing rapidly: the price of whole-genome sequencing, once counted in the thousands of dollars, is now approaching the USD 200 threshold promised by the latest high-throughput instruments, effectively removing cost as the principal barrier to large-scale genomic studies.18 Parallel progress in proteomics has reduced mid-plex panel costs to well under USD 20 per sample, broadening access to routine protein profiling. At the same time, data resources have expanded in both depth and breadth. The UK Biobank has released metabolomic measurements for approximately 121,000 participants—and complementary panels quantifying roughly 3,000 plasma proteins—thereby creating an unprecedented reference for population-scale multi-omics analyses.19,20 These volumes would be unmanageable without a concurrent maturation of cloud-computing infrastructure. On-demand GPUs and browser-based “auto-ML” notebooks now allow investigators to execute multi-omic workflows that once required institutional high-performance clusters, placing advanced analytics within reach of modestly resourced laboratories. Finally, the regulatory climate is becoming markedly more receptive. Recent FDA guidance on the use of real-world evidence and tissue-agnostic companion diagnostics explicitly acknowledges integrated molecular signatures as acceptable decision-making inputs, thereby creating a clearer path from multi-omic discovery to clinical implementation.
III) Table 6: Early efforts point to the potential for major successes in the space

4. Luma Group’s Position and Vision
Scientific progress hinges on more than just data—it depends on the ability to make sense of it to improve outcomes for patients. At Luma Group, we invest in companies that are redefining how data is used to shape the future of drug discovery and development. While modern research tools can now produce unprecedented volumes of biological information—from single-cell sequencing to proteomic and metabolomic profiling—sheer quantity doesn’t guarantee clarity. The true advantage lies in the ability to connect disparate data streams, uncover hidden patterns and translate them into actionable insights.
For complex diseases, a holistic understanding of interrelated datasets is crucial to deciphering disease biology. Ultimately, these data function as an interconnected system, and forward-thinking companies increasingly rely on AI and ML to analyze these enormous datasets, revealing the complexities of many unmet medical needs and opening the door to new breakthroughs. We believe we are transitioning out of the genomics era and into the multi-omics era—one where integrated datasets and advanced analytical tools will transform the way we discover, develop and deliver the next generation of therapeutics.
Luma Group continues to champion this new era of innovation by focusing on companies that harness large multi-omics datasets alongside advanced AI/ML approaches. One of our earliest investments, Rome Therapeutics, built its discovery engine around the often overlooked repeatome, the repetitive, non-protein-coding portion of the human genome. By mining large patient datasets, the Rome team pinpointed a key correlation between repeatome activity and pathways involved in autoimmunity, ultimately uncovering LINE1 as a novel target with broad therapeutic potential across multiple autoimmune indications.
We invested in Curve Bio because it applies a similar multi-omics approach to diagnostics. Its AI/ML-powered platform sifts through massive datasets to detect subtle changes in the methylation patterns of cell-free DNA—changes that strongly correlate with disease progression, particularly in liver tissue. These insights exceed standard-of-care detection and other methods in both sensitivity and specificity, offering significant promise for earlier and more accurate diagnoses.
Our investment in Character Biosciences represents an archetype of Luma’s multi-omics investment strategy. As we recently detailed in a separate piece, Character Bio integrates ocular imaging, genetic profiles, metabolomics and patient natural histories to study dry age-related macular degeneration (AMD). The company uncovered novel biological pathways and therapeutic targets by applying AI/ML techniques to these massive datasets. With two lead assets set to enter the clinic within the next 12 months, Character Bio is on course to become one of the first to deliver approved therapies built on a multi-omics foundation, powered by advanced AI/ML analysis.
Our fund is optimistic that innovation within our sector will continue to grow as we leverage large multi-omics datasets with advanced AI/ML. At the heart of this growth are new, innovative tools and approaches that will lower the cost of generating these extensive datasets beyond just single-omics. We have begun to see a shift in how large consortia are expanding their “-omics” footprint beyond genomics alone. One prominent initiative that has embraced the power of multi-omics is the UK Biobank. This program has enrolled over 500,000 volunteers who will donate information—including biological samples, physical measurements, body and brain imaging data, bone density data, activity tracking and lifestyle questionnaire data—over the span of 30 years. Beyond genomics, the Biobank collects proteomic, metabolomic, MRI, natural-history and other key datasets to holistically understand how these historically disparate data types interact. Its goal is to translate these insights into novel findings that can inform the development of new therapeutics and diagnostics.
Over the last decade, we have seen other private and public initiatives, such as the All of Us initiative and Project Baseline, set out to gather similarly large multi-metric data for the same purpose. The key to capitalizing on these datasets lies in understanding subtle and often hidden connections within them—an approach made possible through AI/ML methods that can identify insights too complex or minute for human intuition alone. In our portfolio companies, we have observed how crucial AI/ML approaches are for extracting valuable information and maximizing the potential of these datasets. This trend—integrating AI/ML with the collection of massive datasets through new and innovative tools—will likely continue and, in doing so, provide patients with a new generation of therapeutics and diagnostics.
5. Conclusion and Outlook
We have witnessed how impactful single-omics analyses, particularly genomics, have been in understanding disease pathology and progression, leading to dozens of approved drugs. However, single-dimensional “-omic” data inherently has limitations given the complexity of diseases and disorders we aim to treat. The initial wave of omics-based medicine was primarily driven by advances in genomic sequencing technologies and substantial reductions in sequencing costs, fueling significant innovation over the past two decades. Now, we see a similar technological advancement unfolding in other “-omics” domains—including proteomics, metabolomics, glycomics and beyond—with costs starting to decline in a manner comparable to genomics, setting the stage for further innovation. Yet, aggregating, managing and analyzing massive multi-omics datasets consisting of billions of data points poses unique challenges, necessitating more sophisticated artificial intelligence and machine learning methods. Both private and public sector initiatives are already emerging to address these challenges.
In the coming decade, we anticipate a new wave of innovation in multi-omics medicine, potentially surpassing the transformative impact initially driven by genomics. Luma Group aims to actively invest in and nurture this exciting frontier, positioning itself at the cusp of this transformative era in multi-omics medicine. If you’re building in the space, please reach out to us.
1. Argelaguet, R., et al. (2021). Multi-omics integration approaches for disease modeling. Nature Reviews Genetics, 22(6), 345–362.
2. https://www.science.org/doi/abs/10.1126/science.abl5197
3. Lotfollahi, M., et al. (2022). Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology, 40(1), 121–130.
4. Jumper, J., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.
5. Jumper, J., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.
6. https://www.nature.com/articles/d41586-022-02083-2
7. https://pmc.ncbi.nlm.nih.gov/articles/PMC11466455/
8. https://www.nature.com/articles/s41588-024-01993-3
9. https://www.nature.com/articles/s43018-024-00891-1
10. https://www.niaid.nih.gov/news-events/gene-signature-at-birth-predicts-neonatal-sepsis-before-signs-appear
11. https://pmc.ncbi.nlm.nih.gov/articles/PMC8508003/
12. https://news.northwestern.edu/stories/2019/05/artificial-intelligence-system-spots-lung-cancer-before-radiologists/
13. https://www.nature.com/articles/s41746-021-00518-0
14. https://www.nejm.org/doi/full/10.1056/NEJMoa1901183
15. https://www.bccresearch.com/market-research/biotechnology/multiomics-market.html?srsltid=AfmBOordStMDYvNp3BPq_s_wZgT3nGfnzBzGpzJrUp4Wd1VNObRVqVz1
16. https://www.drugdiscoverytrends.com/20-biotech-startups-attracted-almost-3b-in-q1-2024-funding/
17. https://www.mintz.com/insights-center/viewpoints/2166/2025-03-10-state-funding-market-ai-companies-2024-2025-outlook
18. https://www.biostate.ai/blog/genome-sequencing-cost-future-predictions
19. https://www.nature.com/articles/s41597-023-01949-y
20. https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/past-data-releases