It’s no overstatement to say that next-generation sequencing (NGS) has revolutionized biomedical research. What once took years of painstaking genetics work, if it could be done at all, can now be performed many times over in the virtual blink of an eye, at lower cost and finer resolution, with the ability to process enormous sample sizes and obtain orders of magnitude more data. Yet even with the cost of sequencing continuing to drop, it’s not always necessary (and is sometimes counterproductive) to sequence an entire genome. Sometimes sequencing a small portion of the genome serves the researcher’s purpose.

By using a targeted enrichment strategy, it’s possible to pare down the amount of sequencing required for a given project, saving time and sequencing costs and, of course, avoiding the bioinformatic torrent that comes with all that data.

The exome—the protein-coding regions of the genome—for example, is only 1% to 2% of the genome, yet it includes about 85% of known disease-causing mutations.

Thus, sequencing only the exome (whole exome sequencing, or WES) saves 98% to 99% of the sequencing effort compared with whole genome sequencing (WGS). More targeted sequencing panels can winnow that down even further.
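The arithmetic behind that saving can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the genome size, the 30x depth and the function names are assumptions made here, and the sketch pretends every read lands on target. In practice some reads fall off target and exomes are often sequenced more deeply than genomes, so real-world savings are smaller.

```python
# Rough sketch: raw bases needed to reach a mean depth over a target.
# Genome size, depth and exome fractions are illustrative assumptions.

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def bases_required(target_bp: float, mean_depth: float) -> float:
    """Bases that must land on the target to reach the given mean depth."""
    return target_bp * mean_depth


def fraction_saved(target_fraction: float) -> float:
    """Share of sequencing avoided if only target_fraction of the genome is read."""
    return 1.0 - target_fraction


for exome_fraction in (0.01, 0.02):
    exome_bp = HUMAN_GENOME_BP * exome_fraction
    wgs = bases_required(HUMAN_GENOME_BP, 30)  # 30x is an assumed depth
    wes = bases_required(exome_bp, 30)
    print(
        f"exome = {exome_fraction:.0%} of genome: "
        f"WGS {wgs:.2e} bases, WES {wes:.2e} bases, "
        f"saved {fraction_saved(exome_fraction):.0%}"
    )
```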

Here we look at popular ways in which researchers reduce their sequencing burden using targeted enrichment and weigh relevant considerations, including signs that it’s time to bite the bullet and go whole-hog WGS.

Two bins

Target enrichment itself is typically accomplished either by capture or by amplification (although the capture method also includes an amplification step). That is, the desired DNA is fished out of a pool of (usually genomic) DNA, or it’s selectively synthesized from such a pool. There are a variety of iterations of the technologies that fall within these paradigms—in some cases combining them—the specifics of which fall beyond the scope of this article.

A capture target-enrichment method usually starts with NGS library preparation—shearing the DNA, attaching adapters to the fragments and PCR amplifying (and perhaps attaching indexes in the process). The DNA is then hybridized to a pool of biotinylated probes complementary to the segments of interest and then pulled out with streptavidin. The complementary strands are released by another round of PCR, and the now-enriched fragments can be sequenced.

NEBNext Direct uses a similar approach but hybridizes its probes directly to genomic DNA, followed by an enzymatic digestion to remove off-target sequence and a subsequent NGS library-preparation step.

Hybridization for enrichment used to be done in a microarray-like format, but “it’s all in a solution now—solution is much more convenient,” explains Mike Leous, group marketing manager of Roche’s sequencing business. “I don’t know anyone who still does it in a solid state.”

Target enrichment can also be accomplished using multiplex PCR-like strategies, generically termed amplicon-based methods. The DNA may be enzymatically digested and probes used to capture the ends to make circular DNA templates, or genomic or randomly sheared DNA can be used, but the principle is the same: Primers specific to the regions of interest selectively amplify only the DNA from the regions of interest, leading to a pool of DNA that is highly enriched for those regions.

How much?

The idea of only sequencing what is potentially relevant—thereby avoiding the time, expense and data deluge of WGS—is very appealing.

By enriching for the exome, for example, researchers can concentrate their efforts on only sequences known to code for proteins.

Different vendors offer variations on exome enrichment, with some products including the 3’ or 5’ untranslated regions, or extras like certain types of noncoding RNAs. Probe length and density, and the algorithms used to generate them, vary as well, and each product gives incomplete but overlapping coverage—so it’s always a good idea to do some library research before choosing a product for the bench.

“If your project is exome-based, then it is better to use a method that targets exomes,” says Dimitrios Monos, professor of pathology and laboratory medicine at Children’s Hospital of Philadelphia. But he cautions that if you are looking at cancers and need to know the sequence of a region where an inversion or a deletion is suspected, for example, “you want to target genomic regions” that include nontranslated DNA, as well.

Roche’s NimbleGen and others offer a wide range of DNA panels. “We have an exome designed to enrich the coding regions of the human genome and offer more complete coverage of approximately 4,000 medically relevant genes. Or you can tell us you want only to look at the prostate cancer genes. We have some comprehensive cancer panels, or a mitochondrial panel, where a researcher can just say, ‘I want this design that you have already optimized,’” explains Leous. “Or they can say, ‘I just want these 10 genes.’ It’s fully customizable.”

“In-solution hybridization is more appropriate for very large panels—from a megabase of sequence content that you’re trying to capture, up to exomes,” which tend in humans to be between 30 and 60 megabases, says Andrew Barry, product marketing manager for target enrichment at New England Biolabs (NEB). “That’s the sweet spot. As you drop below that, you tend to see some loss of specificity, so you start to bring along a lot of off-target reads.” Multiplex-PCR approaches “work really well for very focused panels—they tend to be faster, and they tend to have lower input requirements than in-solution hybridization,” he explains. But “it becomes very hard to scale content.” Barry sees “novel” and sometimes hybrid approaches like Agilent’s HaloPlex, Illumina’s TruSeq Amplicon and NEB’s NEBNext Direct as attempts to bridge the gap.

Which is not to say that vendors don’t offer products outside of Barry’s sweet spots: Small, in-solution hybridization panel offerings are plentiful. Meanwhile, Thermo Fisher Scientific’s Ion AmpliSeq Exome RDY S5 Kit uses “ultrahigh multiplex-PCR” to enrich an exome for sequencing on its Ion S5 System.

The rare cases

Enriching the genome to look for a common hereditary mutation in a coding region is one thing. But trying to find a rare somatic aberration, one that occurs in repetitive DNA, or even genotyping a non-human, non-model species, can present different challenges.

In WGS, with a 20x to 30x coverage of the genome, “you’re really only [going] to be able to find germline mutations that are 50% or higher,” points out Barry. But “when you start to think about the things that are present in liquid biopsies, cell-free DNA, we’re talking about getting very high depths of coverage in order to find variants that are present at only 0.1%.” And that, he claims, is not “tractable from a cost perspective” without first paring down the amount of DNA being sequenced.
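Barry’s point can be made concrete with a rough calculation. The sketch below is a simplification under stated assumptions: it ignores sequencing error and sampling noise, the requirement of five supporting reads is an arbitrary threshold chosen for illustration, and the 50-kilobase panel size is hypothetical.

```python
import math

HUMAN_GENOME_BP = 3.1e9  # approximate genome size (assumption)
PANEL_BP = 50_000        # hypothetical small targeted panel


def expected_variant_reads(depth: float, vaf: float) -> float:
    """Average number of reads carrying a variant at allele fraction vaf."""
    return depth * vaf


def depth_for_min_variant_reads(vaf: float, min_reads: int) -> int:
    """Depth at which the expected number of variant reads reaches min_reads."""
    # The small tolerance guards against floating-point round-up in the division.
    return math.ceil(min_reads / vaf - 1e-9)


# At 30x, a heterozygous germline variant (50%) is well supported,
# while a 0.1% variant is expected in far fewer than one read.
print(expected_variant_reads(30, 0.5))
print(expected_variant_reads(30, 0.001))

# Depth needed to expect five variant reads at 0.1% (illustrative threshold).
needed = depth_for_min_variant_reads(0.001, 5)
print(f"depth needed: {needed}x")
print(f"whole genome at that depth: {HUMAN_GENOME_BP * needed:.2e} bases")
print(f"hypothetical panel at that depth: {PANEL_BP * needed:.2e} bases")
```

Under these assumptions the whole genome would need tens of thousands of times more sequence than the panel to reach the same depth, which is the cost gap Barry describes.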

Yet Janine Meienberg, a post-doc at the Foundation for People with Rare Diseases, no longer performs target enrichment, because it “fails to achieve complete coverage even of the targeted region … whereas these regions can be completely covered by 60x PCR-free WGS,” she reports [1]. Insufficient targeted enrichment and PCR steps during library preparation inevitably introduce bias—the higher the GC content, the lower the coverage with WES. With a mean coverage of 60x, the minimal coverage of high-confidence coding exons with PCR-free WGS was 13x, “which is still a read-depth you can work with; we found that for the WES there are coding regions which are below that or even not covered at all.” A downside of PCR-free WGS is that it requires more starting material.

As for the analysis of all that data, Meienberg sets most of it aside (using what’s sometimes termed in silico panels or bioinformatic masking), “so we have the WGS data, but then we filtered them to look at just the regions of interest.”
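An in silico panel of this kind amounts to an interval filter applied after variant calling. The following is a minimal sketch, not Meienberg’s actual pipeline: the `Region` and `Variant` types, the region names and all coordinates are hypothetical, and real workflows would typically use established tools working on BED and VCF files.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED-style)
    end: int    # 0-based, exclusive
    name: str


class Variant(NamedTuple):
    chrom: str
    pos: int    # 1-based (VCF-style)
    ref: str
    alt: str


def in_panel(variant: Variant, panel: List[Region]) -> bool:
    """True if the variant position falls inside any panel region."""
    zero_based = variant.pos - 1
    return any(
        r.chrom == variant.chrom and r.start <= zero_based < r.end
        for r in panel
    )


def apply_in_silico_panel(variants: List[Variant], panel: List[Region]) -> List[Variant]:
    """Keep only the genome-wide calls that lie in the regions of interest."""
    return [v for v in variants if in_panel(v, panel)]


# Hypothetical panel and genome-wide calls.
panel = [Region("chr1", 1000, 2000, "GENE_A"), Region("chr2", 500, 800, "GENE_B")]
calls = [
    Variant("chr1", 1500, "A", "G"),  # inside GENE_A
    Variant("chr1", 5000, "C", "T"),  # outside the panel
    Variant("chr2", 800, "G", "A"),   # 1-based 800 is 0-based 799: inside GENE_B
    Variant("chr3", 600, "T", "C"),   # chromosome not in the panel
]
kept = apply_in_silico_panel(calls, panel)
print(kept)
```

Because the full WGS data are retained, the panel can be widened later without resequencing; only the filter changes.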

Sometimes the issue is finding something that doesn’t map directly to a reference genome—such as would be the case with a structural defect like a translocation or a highly polymorphic region. Monos used his Region-Specific Extraction (RSE) method to obtain selected segments in excess of 20 kilobases in length, coupled to long-read sequencing, to map a four-megabase stretch of the major histocompatibility complex (MHC) [2].

Or sometimes there is no comprehensive reference genome to start with, as is the case with many agriculturally (or scientifically) important organisms. In this case, a custom-designed panel “of upwards of 200,000 different regions of the genome, if that’s what’s required,” may be a good alternative to WGS, says Orin McCormick, sales manager at RAPiD Genomics. “WGS is not really a cost-effective option when you’re looking at genotyping a large population of something like pine, for example, which can be upwards of 20 gigabases.”

To pare, or not to pare, that is the question. And if so, how to choose? Questions such as “What is my target territory?”, “How many regions of interest am I trying to capture?”, “How much starting material do I have?” and “How many samples do I have?” should be weighed along with budgetary and workflow considerations and the need for performance metrics such as specificity, uniformity and sensitivity. It’s also important to check out the commercial options, look into contract suppliers and see what your colleagues are doing for similar projects.
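For readers comparing products, the performance metrics mentioned above can be computed in several ways, and vendors do not all use the same definitions. The sketch below shows one plausible set, offered as assumptions rather than a standard: specificity as the on-target fraction of sequenced bases, uniformity as the share of target positions covered at 0.2 times the mean depth or better, and sensitivity as the fraction of known variants that were detected. All numbers are made up for illustration.

```python
from statistics import mean
from typing import Sequence, Set


def on_target_rate(on_target_bases: int, total_bases: int) -> float:
    """Specificity proxy: fraction of sequenced bases that hit the target."""
    return on_target_bases / total_bases


def uniformity(depths: Sequence[int], fraction_of_mean: float = 0.2) -> float:
    """Share of target positions with depth >= fraction_of_mean * mean depth."""
    threshold = fraction_of_mean * mean(depths)
    return sum(d >= threshold for d in depths) / len(depths)


def sensitivity(called: Set[str], truth: Set[str]) -> float:
    """Fraction of known (truth-set) variants that were actually called."""
    return len(called & truth) / len(truth)


# Illustrative values only.
depths = [10, 10, 10, 1]  # per-position depth over a tiny target
print(on_target_rate(80, 100))
print(uniformity(depths))
print(sensitivity({"var1", "var2"}, {"var1", "var2", "var3", "var4"}))
print(min(depths))  # minimum depth, the figure Meienberg cites for WGS
```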

References

[1] Meienberg, J, et al., “Clinical sequencing: is WGS the better WES?” Hum Genet, 135(3):359-62, 2016. [PMID: 26742503]

[2] Dapprich, J, et al., “The next generation of target capture technologies—large DNA fragment enrichment and sequencing determines regional genomic variation of high complexity,” BMC Genomics, 17:486, 2016. [PMID: 27393338]
