First, there was genomics—recall how, just 15 years ago, the sequencing of the human genome seemed nigh-on miraculous? Then came transcriptomics (the high-throughput study of the entire set of RNA molecules produced by the genome) and proteomics (the large-scale study of the proteins made by the protein-coding set of RNAs). Now, as advances in technology have made the complete sequencing of genomes and transcriptomes a fairly routine and affordable matter while, at the same time, the mind-boggling complexity of the human proteome has become clear, a new omics is taking the lead: proteogenomics.

Proteogenomics operates at the intersection of these other three omics fields, and aims to fill in the gaps between them. Proteins cannot be amplified using PCR as DNA can, and generic proteomic databases lack sample-specific variation, a gap compounded by the vast diversity of post-translational modifications (such as phosphorylation, glycosylation, and acetylation) that take place in most proteins. But by building customized databases of protein sequences from a sample's genomic and transcriptomic data, and then searching tandem mass spectrometry data against them, proteogenomics can detect variant peptides that are missed by classical proteomic approaches. The same peptide evidence can also be used to refine the annotation of protein-coding genes.
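
The customized-database idea can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the pipeline used by any group mentioned here: the protein sequence and the single missense variant are invented, the function names are hypothetical, and the trypsin rule is simplified (cleave after K or R unless followed by P, no missed cleavages). It applies a variant to a reference protein, digests both versions in silico, and keeps only the peptides that a generic reference database would not contain.

```python
# Toy illustration only: the sequence and the variant below are made up.

def tryptic_digest(protein, min_len=6):
    """Simplified trypsin rule: cleave after K or R unless the next residue is P.
    No missed cleavages; peptides shorter than min_len are dropped."""
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        cleave = aa in "KR" and not at_end and protein[i + 1] != "P"
        if cleave or at_end:
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.add(peptide)
            start = i + 1
    return peptides


def apply_substitution(protein, position, ref, alt):
    """Apply a missense substitution at a 1-based position, checking the reference residue."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return protein[:position - 1] + alt + protein[position:]


def variant_peptides(reference, variants):
    """Return peptides that appear only after applying each variant to the reference."""
    reference_peptides = tryptic_digest(reference)
    custom = set()
    for position, ref, alt in variants:
        mutated = apply_substitution(reference, position, ref, alt)
        custom |= tryptic_digest(mutated) - reference_peptides
    return custom


REFERENCE = "MSTAGKVLQEDRPLNGTEAKWSHFYDIR"   # hypothetical protein
VARIANTS = [(16, "G", "D")]                  # hypothetical variant: G16D

custom_db = variant_peptides(REFERENCE, VARIANTS)
print(sorted(custom_db))
```

In this toy case the only sample-specific entry is the peptide spanning the substituted residue. A real workflow would append such entries to the search database so that spectra from the variant peptide have something to match against.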

Transforming cancer therapy

Proteogenomics has the potential to transform cancer therapy by enabling true precision oncology, says Karin Rodland, chief scientist for biomedical research and director of biomedical research partnerships at Pacific Northwest National Laboratory. Dr. Rodland is the principal investigator for one of three Proteogenomic Translational Research Centers that are part of the NCI-sponsored Clinical Proteomic Tumor Analysis Consortium (CPTAC). These Centers use proteogenomics in the context of NCI-sponsored clinical trials to understand drug response and resistance to current therapies.

“We are using clinical samples from SMART [sequential multiple assignment randomized trial] studies to develop precision medicine approaches that combine genomics, proteomics, and phosphoproteomics to predict responsiveness before prescribing a targeted therapy,” she says. “Success on this front would mean no more stumbling around, no more trial and error, trying to figure out which targeted therapy a particular patient will respond to. The FDA is clearly moving toward focusing its approvals on mutations rather than a tumor type of origin. We would also like to see them moving toward approvals based on which specific pathway is upregulated.”

Over the past two years, CPTAC has released a series of large-scale proteogenomic studies of ovarian, colorectal, and breast cancer, linking DNA mutations to protein signaling. These key reports offer new insights into these cancer types, including proteomic-centric subtypes, key driver mutations, and post-translational modifications involved in cancer-relevant pathways. The breast cancer study, for example, identified novel protein markers and signaling pathways for breast cancer subtypes and for tumors carrying mutations in frequently mutated genes such as PIK3CA and TP53. By correlating copy number alterations in some genes with protein levels, the researchers were able to identify 10 new candidate regulators, two of them for a particularly aggressive breast cancer subtype known as “basal-like” tumors.
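
To give a feel for what correlating copy number with protein levels involves, here is a minimal Python sketch. It is not the analysis from the CPTAC papers: the gene names, the values, and the 0.8 cutoff are all invented for illustration. It computes a Pearson correlation per gene across tumors and keeps the genes whose protein abundance tracks their own copy number.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical data: one value per tumor, same tumor order in every list.
COPY_NUMBER = {
    "GENE_A": [-1, 0, 0, 1, 2],
    "GENE_B": [-1, 0, 0, 1, 2],
}
PROTEIN_LEVEL = {
    "GENE_A": [-0.9, 0.1, -0.1, 1.0, 2.1],   # follows copy number closely
    "GENE_B": [0.5, -0.4, 0.6, -0.5, 0.0],   # does not follow copy number
}

THRESHOLD = 0.8  # arbitrary cutoff chosen for this toy example

cis_correlated = [
    gene
    for gene in COPY_NUMBER
    if pearson(COPY_NUMBER[gene], PROTEIN_LEVEL[gene]) > THRESHOLD
]
print(cis_correlated)
```

Only the first hypothetical gene passes the cutoff. A real study would work with thousands of genes and would need to correct for multiple testing, which this sketch leaves out.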

Dr. Rodland’s group recently finished the proteomic analysis of another 100 samples of prospectively collected ovarian tumors. Details of the findings remain embargoed while the paper is in the submission process, but she notes that all important findings from the previous reports were confirmed.

In its next phase, CPTAC aims to conduct similar proteogenomic analyses of six additional tumor types: pancreatic ductal adenocarcinoma, lung adenocarcinoma, lung squamous cell carcinoma, uterine corpus endometrial carcinoma, clear cell renal cell carcinoma, and glioblastoma multiforme.

Drug repurposing

Kelly Ruggles, Ph.D., assistant professor of medicine at NYU Langone Health, leads a laboratory focused on multi-omic data integration and co-chairs CPTAC’s data analysis working group. “I’m really interested specifically in druggability and repurposing drugs by building integrative analysis focused on signaling pathways and drug databases,” she says. “In the breast cancer CPTAC paper, for example, we did an outlier analysis looking for extreme values in our data. With that, we found kinase phosphorylation sites that were expressed at high levels in specific subtypes of breast cancer. When we enriched for kinase phosphorylation status by subtypes, we found multiple potential new targets for druggability. We will now extend that to the new tumor types.” She predicts major new findings will be released within the next year.
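
An outlier analysis of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the method from the CPTAC paper: the sample IDs, abundance values, and subtype assignments are invented, and the cutoff (median plus three median absolute deviations) is one common choice among several. The sketch flags samples with an extreme value for a single phosphorylation site, then tallies the flagged samples by subtype.

```python
from collections import Counter
from statistics import median


def flag_outliers(values, k=3.0):
    """Return sample IDs whose value exceeds median + k * MAD (median absolute deviation)."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    threshold = center + k * mad
    return sorted(s for s, v in values.items() if v > threshold)


# Hypothetical abundance of one kinase phosphorylation site across eight tumors.
SITE_ABUNDANCE = {
    "s1": 5.0, "s2": 4.8, "s3": 1.0, "s4": 1.2,
    "s5": 0.9, "s6": 1.1, "s7": 1.3, "s8": 0.8,
}
# Hypothetical subtype labels for the same tumors.
SUBTYPE = {
    "s1": "basal-like", "s2": "basal-like",
    "s3": "luminal", "s4": "luminal", "s5": "luminal",
    "s6": "luminal", "s7": "luminal", "s8": "luminal",
}

outliers = flag_outliers(SITE_ABUNDANCE)
outliers_by_subtype = dict(Counter(SUBTYPE[s] for s in outliers))
print(outliers, outliers_by_subtype)
```

Here both flagged samples fall in one subtype, which is the pattern an enrichment test would then assess formally. The statistical test itself is omitted from this sketch.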

Key technological challenges for the CPTAC teams and for proteogenomics in general include sample size and spatial resolution. “With both RNA and DNA, you can do single-cell sequencing. It’s very easy to laser capture a microdissection of a tumor and look at the epithelial vs. the adjacent stromal cells,” says Dr. Rodland. “But as we still lack a protein amplification method, you already have to know what you’re looking at. Then you can amplify in a nucleic acid linker—but you’re still amplifying the signal, and not the target.”

Improvements in mass spectrometry sensitivity, along with enhancements to upstream sample handling, should begin to solve spatial resolution issues by driving down the minimum amount of sample that can be analyzed reliably. “Right now, our lowest limit is about 1.5 mm, which is good enough to get you highly epithelial and stromal regions,” explains Dr. Rodland. “Both the Broad Institute and our group here at PNNL are working very hard to drive that limit down even further.” CPTAC has also benchmarked its mass spectrometry platforms against a defined set of samples to ensure reproducibility and robustness of the MS measurements across multiple laboratories, and has set rigorous standards for proteomic experiments.

“The beauty of proteogenomics is that it allows us to look at exactly what is happening within the tumor cell,” Dr. Ruggles enthuses. “We see the status of the proteins at that moment and can then trace it back to the genome level. And from the other side, we can test hypotheses we develop from genomic observations at the protein level. By taking these big studies that look at the protein and the genome together, we can identify genetic markers that are associated with specific clinical developments. That’s the real strength of proteogenomics for translational research.”