The Long and Short of Targeted Sequencing

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

Nobody wants to hear the word “cancer.” But when it comes to tumors of the prostate, the emotional response often is exacerbated by confusion: Many individuals have tumors that won’t spread and are candidates for active surveillance; those who aren’t so lucky require surgery or radiation and chemotherapy. The ongoing challenge for physicians is early and proper diagnosis, and matching that to the correct treatment. 

Carlos S. Moreno, associate professor of pathology and laboratory medicine at the Winship Cancer Institute of Emory University, and colleagues decided to look for biomarkers that could help sort that out. Their assay of choice: RNA Sequencing (RNA-seq). Moreno and his team extracted the RNAs from 106 prostatectomy samples, removed the ribosomal RNA, synthesized cDNA from the remaining material and prepared sequencing libraries. Nearly a half-trillion bases later, the team hit upon a gene-expression signature that correlated strongly with tumor progression [1].

“We identified a set of 24 genes that were very accurate at identifying patients that would recur following surgery,” Moreno says. “It works better than some commercially available panels, and it has some promise at being able to detect aggressive prostate cancer.”

Such a strategy makes sense when researchers are on the hunt for biomarkers and don’t know where to look. But whole-transcriptome RNA-seq experiments are both financially and computationally expensive, so researchers sometimes reduce complexity by focusing on a relatively small subset of the transcriptome instead. Here’s how.

Why target?

Not long ago, researchers interested in quantifying gene expression used DNA microarrays or quantitative real-time PCR (qPCR). Others used proteomics or metabolomics to look for signs of gene-expression changes. But in this age of “next-generation DNA sequencing” (NGS), more and more researchers are turning to sequencing instead.

Unlike microarrays and qPCR, NGS is “unbiased”—it doesn’t require researchers to know in advance what they’re looking for. It also reveals how many molecules of each sequence are present and shows such details as transcript architecture and sequence variants. But all that depends on how the experiment is run.

Illumina's approach breaks long DNAs into fragments and reads tens of millions or even billions of them, each a couple hundred bases long. That sequencing depth means researchers can often identify rare sequences, but the short reads complicate transcript assembly, that is, figuring out which exons were on the same transcript.

Pacific Biosciences produces fewer, longer reads—about 75,000 reads per four-hour run, with an average length of 10,000 bases each. That allows researchers to read transcripts from end to end, which is especially useful for identifying novel transcript isoforms.

Given the complementary strengths of these two approaches, many researchers combine PacBio and Illumina sequencing for transcriptome analysis.

Trouble is, the benefits of NGS accrue with sequencing depth. So although it is true that RNA-seq can detect rare, novel transcripts and isoforms, “in practice, most people don’t sequence deep enough” to find them, says Anthony Schweitzer, principal scientist and head of bioinformatics in the Expression Business Unit at microarray vendor Affymetrix. In trying to balance budgetary concerns and population sizes, researchers often squeeze multiple samples per sequencing lane, reducing costs as well as sequencing depth per sample. Highly abundant transcripts can disproportionately consume sequencing reads, meaning rare transcripts may never be seen.
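The dilution effect can be put in rough numbers. The sketch below is illustrative only: the lane yield, the sample counts and the rare transcript's share of the library are assumed values, not figures from Schweitzer or any vendor, and reads are modeled as independent draws (a Poisson approximation).

```python
import math


def reads_per_sample(lane_reads: int, n_samples: int) -> float:
    """Idealized even split of one lane's reads across multiplexed samples."""
    return lane_reads / n_samples


def prob_detect(depth: float, transcript_fraction: float, min_reads: int = 1) -> float:
    """Probability of observing at least `min_reads` reads from a transcript
    that makes up `transcript_fraction` of the library, at a given depth."""
    lam = depth * transcript_fraction  # expected number of reads from the transcript
    # P(X >= k) = 1 - sum_{i < k} e^(-lam) * lam^i / i!
    p_below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_reads))
    return 1.0 - p_below


LANE_READS = 200_000_000   # assumed lane yield, for illustration only
RARE_FRACTION = 1e-7       # assumed: 1 fragment in 10 million comes from the rare transcript

for n_samples in (1, 4, 16):
    depth = reads_per_sample(LANE_READS, n_samples)
    p = prob_detect(depth, RARE_FRACTION, min_reads=5)
    print(f"{n_samples:>2} samples/lane: {depth:,.0f} reads each, P(>=5 reads) = {p:.3f}")
```

Under these assumed numbers, the chance of collecting at least five reads from the rare transcript falls from near certainty with one sample per lane to roughly 1 percent or less with 16.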

In September 2014, the Sequencing Quality Control (SEQC) project published “a multisite, cross-platform analysis of RNA-seq measurement performance” [2]. Two reference RNA samples (alone and in combination) were sequenced at 10 laboratories using sequencing technologies from Illumina, Life Technologies (now Thermo Fisher Scientific) and Roche. Those data—more than 100 billion reads totaling 10 terabases—were compared against both microarray and qPCR datasets and then analyzed for their ability to drive “junction discovery and differential expression profiling” studies.

Among other findings, the team documented just how hard it is to comprehensively capture the richness of the human transcriptome. For basic expression profiling, Illumina recommends 10 million reads on its NextSeq sequencer, according to Kevin Meldrum, senior director of product marketing at Illumina, and 50 million reads for discovery work. Yet in poring through some “12 billion mapped HiSeq 2000 RNA-seq fragments,” the SEQC researchers were still detecting known transcripts even at one billion reads, and exon-exon junctions at 10 billion reads [2].

“Standard RNA-seq only gives the tip of the iceberg,” says Ji Wu, international marketing director at Roche NimbleGen.

Get strategic!

Still, you have to start somewhere. Meldrum notes that researchers can combine Illumina’s whole-transcriptome workflows and its BaseSpace cloud-based informatics platform to identify candidate biomarkers that vary between conditions.

“At that point, you would like to explore those [genes] more broadly, and that’s where you pivot from something that is more transcriptome[-level] to something more targeted,” he says.

Targeting enables researchers to make the most of their sequencing runs by concentrating their efforts on sequences of interest—say, those that have been flagged in first-pass whole-transcriptome studies. Moreno, for instance, is now testing a 109-gene panel, including the 24 genes he previously identified, to see how it performs using exosomal RNAs found in urine. 

To acquire those data, Moreno used Precise™ Molecular Indexing™ assays from Cellular Research to capture the specific sequences of interest from 16 prostatectomy and nine urine samples; he presented his findings at the 2015 Advances in Genome Biology and Technology meeting.

Precise assays allow researchers to introduce up to 6,500 unique-sequence barcodes into their samples during cDNA synthesis and amplification to overcome amplification bias and produce accurate estimates of transcript abundance, says Martin Pieprzyk, Cellular Research’s director of marketing. By counting barcodes instead of reads, researchers can determine how many original copies of each transcript are represented, in panels of tens to thousands of amplicons.

“It’s like a movie ticket,” Pieprzyk explains. “Even if I make 10 copies of the ticket, you know they are copies of the original because each has a unique number.”
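The ticket analogy maps directly onto how barcode counting works in software. What follows is a minimal sketch of the general idea, not Cellular Research's actual pipeline: the `(gene, barcode)` read tuples, the gene names and the function name are invented for illustration, and real tools also correct for sequencing errors within the barcodes, which is omitted here.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return {gene: (read_count, molecule_count)}.

    Reads sharing a gene and a barcode are treated as PCR copies of one
    original molecule, so molecules are counted as distinct barcodes per gene.
    """
    barcodes_by_gene = defaultdict(set)
    read_counts = defaultdict(int)
    for gene, barcode in reads:
        barcodes_by_gene[gene].add(barcode)
        read_counts[gene] += 1
    return {
        gene: (read_counts[gene], len(barcodes))
        for gene, barcodes in barcodes_by_gene.items()
    }


# Hypothetical reads: GENE_A was amplified unevenly, GENE_B was not.
reads = [
    ("GENE_A", "ACGT"), ("GENE_A", "ACGT"), ("GENE_A", "ACGT"), ("GENE_A", "TTAG"),
    ("GENE_B", "GGCA"), ("GENE_B", "CATG"),
]

counts = count_molecules(reads)
for gene, (n_reads, n_molecules) in sorted(counts.items()):
    print(f"{gene}: {n_reads} reads, {n_molecules} molecules")
```

In this toy input, GENE_A has twice as many reads as GENE_B, yet both trace back to two original molecules; counting reads alone would overstate GENE_A's abundance.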

Moreno says he chose the Precise approach because the assay—which originally was designed to work with single cells—can handle very low inputs of RNA, a critical factor in exosome research.

The technology also should reduce experimental costs, Moreno adds. “Because you’re sequencing only a small number of genes, then you can multiplex many patient samples into a single lane,” he says. As a result, he estimates the savings at “close to an order of magnitude reduction in cost and maybe even more.”

Other companies have commercialized alternative targeting strategies. Illumina offers predesigned TruSeq Targeted RNA Expression gene-panel designs covering such signaling pathways as p53 and Wnt, as well as custom designs for 12 to 1,000 genes, Meldrum says. Both are compatible with the company’s MiSeq and NextSeq 500 sequencers and are based on PCR amplification of targeted sequences.

Roche NimbleGen offers four kits under its oligonucleotide capture-based SeqCap RNA brand: a predesigned kit to enrich long noncoding RNAs, and custom kits capable of capturing up to 100 Mb of user-defined targets in human DNA or up to 200 Mb in nonhuman or nonconventional transcriptomes. (Agilent also offers an oligonucleotide capture-based approach under its SureSelect brand; the company was not available for comment.)

Tradition has its place

Of course, researchers also can use traditional PCR for targeting. In one recent study, researchers at Uppsala University in Sweden used that approach to sniff out BCR-ABL1 mutations (which affect therapeutic response) in chronic myelogenous leukemia (CML). The team amplified a 1.6-kb BCR-ABL1 cDNA from blood or bone-marrow samples from six CML patients and sequenced the resulting material on a PacBio sequencer, generating some 32,000 reads per sample [3].

Among other things, the team was able to identify mutations that had been missed by Sanger sequencing in these patients and to differentiate mutations that co-occurred on the same transcript from those that were present on different molecules. They also detected novel transcript isoforms. 
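Because each long read spans the whole amplicon, deciding whether two mutations sit on the same molecule reduces to checking which reads carry both. Here is a minimal sketch, assuming each read has already been reduced to the set of mutations it carries; the read data and the `mut_1`-style labels are made up for illustration and are not results from the Uppsala study.

```python
from collections import Counter
from itertools import combinations


def phase_pairs(reads):
    """Tally how many reads carry each mutation, and how many carry each pair.

    `reads` is an iterable of sets; each set holds the mutations seen on one
    full-length read, i.e. on one transcript molecule.
    """
    single = Counter()
    together = Counter()
    for mutations in reads:
        single.update(mutations)
        # Sort so that each pair is always stored in the same order.
        together.update(combinations(sorted(mutations), 2))
    return single, together


# Hypothetical reads from one sample.
reads = [
    {"mut_1", "mut_2"},
    {"mut_1", "mut_2"},
    {"mut_1"},
    {"mut_3"},
    set(),  # a read with no mutations
]

single, together = phase_pairs(reads)
print(single["mut_1"], single["mut_2"], single["mut_3"])
print(together[("mut_1", "mut_2")], together[("mut_1", "mut_3")])
```

Here `mut_1` and `mut_2` share two reads, so they co-occur on the same transcript, while `mut_3` never appears alongside `mut_1` and therefore sits on a different molecule. Short reads that cover only one site at a time cannot make that distinction.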

“That’s amazing, because the gene fusion has been known for 20 years,” notes PacBio chief scientific officer Jonas Korlach. “You would think everything [about it] is known, but with new technology, we see new things.”

References

[1] Long, Q, et al., “Global transcriptome analysis of formalin-fixed prostate cancer specimens identifies biomarkers of disease recurrence,” Cancer Res, 74:3228-37, 2014. [PubMed ID: 24713434]

[2] SEQC/MAQC-III Consortium, “A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium,” Nat Biotechnol, 32:903-14, 2014. [PubMed ID: 25150838]

[3] Cavelier, L, et al., “Clonal distribution of BCR-ABL1 mutations and splice isoforms by single-molecule long-read RNA sequencing,” BMC Cancer, 15:45, 2015. [PubMed ID: 25880391]

 

