Beginning in 1977, with the introduction of the capillary electrophoresis-based Sanger method, gene sequencing has undergone improvements and cost reductions reminiscent of the semiconductor industry. Building on the milestone Human Genome Project, which by its completion in 2003 had taken 13 years and cost about $3 billion, progress came quickly.

Illumina’s Genome Analyzer, released in 2005, improved sequencing throughput from about 84 kilobases to 1 gigabase per run. From that point, improvements in next-generation sequencing (NGS) have easily outpaced Moore’s law by more than doubling throughput each year, with costs simultaneously falling. Illumina’s HiSeq X Ten System, introduced in 2014, sequenced 45 human genomes per day at less than $1,000 per genome. Just three years later, the debut of the Illumina NovaSeq 6000 system ushered in the capability of the $100 genome. Readers can find a thorough but accessible explanation of NGS on the Illumina website.

Five manufacturers dominate NGS instrumentation: Illumina, Ion Torrent (a Thermo Fisher Scientific company), Pacific Biosciences, Roche, and SOLiD (also a Thermo Fisher Scientific company). Genohub has compiled an exhaustive table of instruments and their capabilities.

NGS can be used to sequence the entire genome or specific regions of interest. Whole-genome sequencing is primarily a discovery or research activity, whereas targeted sequencing is more widely used in clinical research and diagnostics. Targeted NGS always involves some type of target enrichment.

Andrew Barry, product marketing manager for target enrichment at New England Biolabs, divides the two main NGS user groups into research and clinical. “Research is more about discovery, and running experiments for which you don’t know the answers. Clinical genomics seeks to identify or check if a gene variant is present.” As we will see, the type and extent of NGS that users employ is critically related to the type of target enrichment they select.

What is target enrichment?

Target enrichment is a generic term for techniques that increase the abundance of target molecules derived from the whole genome, before sequencing. Enrichment is highly desirable when the objective is to sequence only part of the genome, for example the exome or one specific gene or locus. Since library prep fragments the entire genome, regions of interest will comprise a tiny fraction of available DNA. To conserve resources, investigators therefore enhance the signal from genes of interest. The idea is the same as for any other type of analytical sample prep: increase signal, reduce noise.

Targeted sequencing can be broadly categorized into two main types based on the technique. Hybrid capture methods use complementary probes to capture the target of interest, while amplicon-based approaches rely on multiplexed polymerase chain reaction (PCR). A third technique, selective circularization (also called molecular inversion probes, MIPs), uses gap filling plus ligation to create circular gene constructs consisting of a universal sequence and the sequence of interest. Unlike hybridization and PCR-based amplification, the MIP approach does not enjoy the support of a commercial instrument platform.

“The large number of options for target enrichment can be overwhelming,” adds Barry. “Depending on the application and need, some methods work better than others.”

Hybrid capture is well-suited for targeting large regions such as the whole exome, comprising all coding regions of the genome. This requires a large amount of input DNA for library preparation. When sample input is limited, as in somatic mutation detection in tissue biopsy specimens, amplicon-based approaches have an advantage because they deliver deep, even coverage and are compatible with low sample input. Amplicon-based NGS libraries are also faster and easier to prepare. “Panel size may be small when the clinical phenotype is clearly defined. Amplicon-based methods may be more attractive for these applications as well,” says Arvind Kothandaraman, director, NGS product portfolio, applied genomics, Revvity.

In his review of enrichment methodology, Florian Mertes of the Max Planck Institute enumerates salient features such as enrichment factor, specificity (ratio of sequence reads on to off the target region), coverage or read depth, evenness of coverage across the target region, method reproducibility, required amount of input DNA, and overall cost per target base of useful sequence data. These attributes occur in unique combinations among commercial products. Suppliers will be happy to provide data on any of these performance factors and how they relate to your application.
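As a rough illustration of how some of these figures relate, the sketch below computes the on-target fraction, the on:off-target read ratio, and a fold-enrichment value. The function name, the example read counts, the 3.1 Gb genome size, and the definition of enrichment factor used here (observed on-target fraction divided by the fraction expected with no enrichment) are assumptions made for illustration; suppliers may define these metrics differently.

```python
def enrichment_metrics(on_target_reads, total_reads, target_bp, genome_bp):
    """Return (on-target fraction, on:off read ratio, fold enrichment).

    Fold enrichment is taken here as the observed on-target fraction divided
    by the fraction of the genome the target occupies (an assumed definition).
    """
    if total_reads <= 0 or not 0 <= on_target_reads <= total_reads:
        raise ValueError("read counts must satisfy 0 <= on_target <= total, total > 0")
    if not 0 < target_bp < genome_bp:
        raise ValueError("target size must be positive and smaller than the genome")

    on_fraction = on_target_reads / total_reads
    off_target_reads = total_reads - on_target_reads
    on_off_ratio = (on_target_reads / off_target_reads
                    if off_target_reads else float("inf"))
    expected_fraction = target_bp / genome_bp  # on-target share with no enrichment
    fold_enrichment = on_fraction / expected_fraction
    return on_fraction, on_off_ratio, fold_enrichment


# Hypothetical run: 8 million of 10 million reads land on a 2 Mb panel.
fraction, ratio, fold = enrichment_metrics(
    on_target_reads=8_000_000,
    total_reads=10_000_000,
    target_bp=2_000_000,
    genome_bp=3_100_000_000,  # assumed approximate human genome size
)
print(f"on-target fraction: {fraction:.2f}")
print(f"on:off ratio: {ratio:.1f}")
print(f"fold enrichment: {fold:.0f}")
```

Evenness of coverage, reproducibility, and cost per base need per-position or per-run data and are not captured by this simple calculation.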

In any sequencing run, coverage refers to the number of reads that cover a particular region of interest. Identifying single-nucleotide polymorphisms (SNPs), mutations, and rearrangements generally requires coverage of 10- to 30-fold, whereas chromatin immunoprecipitation requires about 100 reads. These numbers are based on achieving statistical validity. Whole-genome sequencing on modern instrumentation provides coverage of between 30- and 50-fold, whereas targeted approaches achieve coverage levels up to 1,000-fold higher, providing a much higher degree of certainty that results are valid.
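The relationship between read count and coverage can be sketched with the usual back-of-the-envelope estimate: mean coverage is roughly the number of sequenced bases that land on the region divided by the region's size. The function names, the 150-base read length, the 3.1 Gb genome size, and the 80% on-target rate below are illustrative assumptions, not figures from any particular instrument or kit.

```python
import math


def mean_coverage(n_reads, read_length_bp, region_bp, on_target_fraction=1.0):
    """Estimate mean fold coverage of a region from read count and read length."""
    if region_bp <= 0 or read_length_bp <= 0:
        raise ValueError("region and read length must be positive")
    return n_reads * read_length_bp * on_target_fraction / region_bp


def reads_needed(target_coverage, read_length_bp, region_bp, on_target_fraction=1.0):
    """Estimate how many reads are needed to reach a desired mean coverage."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    usable_bases_per_read = read_length_bp * on_target_fraction
    return math.ceil(target_coverage * region_bp / usable_bases_per_read)


GENOME_BP = 3_100_000_000  # assumed approximate human genome size
PANEL_BP = 2_000_000       # hypothetical 2 Mb targeted panel

# Reads for 30-fold whole-genome coverage with 150-base reads.
wgs_reads = reads_needed(30, 150, GENOME_BP)
# Reads for 1,000-fold coverage of the panel, assuming 80% of reads are on target.
panel_reads = reads_needed(1000, 150, PANEL_BP, on_target_fraction=0.8)

print(f"whole genome at 30x: {wgs_reads:,} reads")
print(f"2 Mb panel at 1,000x: {panel_reads:,} reads")
```

Under these assumptions the targeted panel reaches far deeper coverage with a small fraction of the reads a whole genome would need, which is the trade-off the paragraph above describes.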

Since target enrichment connects library preparation to the actual sequencing step, it must be compatible with both operations. Before selecting an enrichment method, users need to check that the library preparation under consideration fragments DNA in a way that facilitates capture of the desired fragments and does not introduce fragment biases that favor enrichment of one species over another. Similarly, the output of sample prep (library plus enrichment) must be compatible with the sequencing platform and protocol.

Library preparation kits for major NGS platforms are available from Agilent (ClearSeq and SureSelect Capture), Archer (FusionPlex, also for Ion Torrent platforms), Bioo Scientific (NEXTflex), IDT (xGen), Illumina (TruSight), Nimblegen (SeqCap), Qiagen (GeneRead), New England Biolabs, and others. Again, before undertaking a sequencing project, check with the supplier that the library prep is compatible with your preferred enrichment technique (and vice versa).

Kits for targeted enrichment are available for known regions of interest (e.g. specific diseases) or may be custom-designed. Scientists who prefer off-the-shelf panels (assuming they apply to their research) can purchase them from more than a dozen companies (see sidebar below).

Illumina’s Sequencing Assay Designer software allows users to create their own enrichment panels suitable for microarray-based experiments.

A typical hybridization workflow

Capture-Seq, a genotyping platform commercialized by RAPiD Genomics, involves a workflow that is typical of hybridization-based enrichment. Capture-Seq can cover the genome evenly or selectively for genome-wide association studies, or can home in on single nucleotide polymorphisms (SNPs), individual whole genes, quantitative trait loci, or genomic selection models. Targeted regions range from 1 kilobase to 100 megabases within the genome.

The Capture-Seq workflow involves the design of probes to capture regions of interest, library construction, enrichment, and sequencing. Target selection uses biotinylated probes reversibly bound to an enzyme fixed to a magnetic bead. After enrichment, off-target DNA is washed away, leaving only bound target, the only species remaining that is compatible with Illumina sequencing.

Orin McCormick, sales manager at RAPiD Genomics, explains enrichment succinctly: “If you’re sequencing unenriched samples and you’re interested in a panel of 500 genes, that total base pair amount, assuming a generous 4 kb gene size, may be just 0.1% of the genome. If you sequence unenriched samples, you’ll throw away 99.9% of your data.” Enrichment eliminates waste by enhancing the signal by orders of magnitude.
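The arithmetic behind that estimate is easy to check. The short sketch below assumes a human genome of roughly 3.1 billion bases (a figure not given in the quote) and uses a hypothetical helper name; with those assumptions, 500 genes of 4 kb each come to 2 Mb, or about 0.06% of the genome, the same order of magnitude as the 0.1% quoted.

```python
def target_fraction(n_genes, gene_size_bp, genome_bp):
    """Fraction of the genome covered by a panel of equally sized genes."""
    if n_genes < 0 or gene_size_bp < 0 or genome_bp <= 0:
        raise ValueError("sizes must be non-negative and genome size positive")
    return n_genes * gene_size_bp / genome_bp


GENOME_BP = 3_100_000_000  # assumed approximate human genome size

fraction = target_fraction(n_genes=500, gene_size_bp=4_000, genome_bp=GENOME_BP)
wasted = 1 - fraction        # share of unenriched reads falling off target
max_fold = 1 / fraction      # upper bound on enrichment if every read hit the panel

print(f"panel share of genome: {fraction:.4%}")
print(f"off-target share without enrichment: {wasted:.2%}")
print(f"theoretical maximum fold enrichment: {max_fold:,.0f}")
```

The last figure is only a ceiling: it shows why a well-performing enrichment can improve the on-target signal by roughly three orders of magnitude for a panel of this size.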

“The only reason to invest in target enrichment is to increase the sequencing efficiency,” McCormick says. “Otherwise there would be no point and you would just whole-genome sequence everything.”

Amplicon-based methods provide exponential amplification of target sequences only. Just a few copies of a PCR target are often sufficient to amplify that target to levels that far exceed any non-target sequences that may have survived the initial sample preparation steps. PCR products also tend to be much smaller than the background DNA fragments, so these enrichment methods often involve a purification step to select smaller fragments. “Signal to noise issues tend not to be a problem with PCR-based enrichment,” Barry says.
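A simple model shows why a handful of starting copies is enough. In an idealized reaction, each cycle multiplies the target by (1 + efficiency), while non-target DNA without primer sites is not amplified. The function name, the per-cycle efficiency parameter, and the copy numbers below are illustrative assumptions; real reactions lose efficiency and plateau in later cycles.

```python
def amplified_copies(start_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: copies multiply by (1 + efficiency) each cycle."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return start_copies * (1.0 + efficiency) ** cycles


# Hypothetical sample: 10 target molecules among 1,000,000 background fragments.
target_start = 10
background = 1_000_000  # assumed to lack primer sites, so it stays constant

for cycles in (0, 10, 20, 25):
    target = amplified_copies(target_start, cycles)
    print(f"{cycles:>2} cycles: target/background = {target / background:.6f}")
```

With perfect doubling, the target overtakes the background somewhere between 10 and 20 cycles in this toy example; lower efficiencies shift that point later but do not change the exponential trend.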

That being said, the importance of the connection between libraries and enrichment cannot be overstated. McCormick stresses that designing probe sets and protocols is a critical component of target enrichment. “Selecting good regions to target and designing probe sequences is not trivial.”

Hybridization-based enrichment typically involves preparing a library, adding universal adapters to molecules of interest, hybridizing, washing, and amplifying the target. NEB’s technique, NEBNext Direct®, hybridizes baits directly to genomic DNA without the need to construct a library first. NEB claims this saves significant time, allowing an entire NGS workflow to be completed in a single day.

The difference is that NEB’s protocol enriches before PCR amplification instead of after. PCR introduces biases favoring AT content at the expense of GC constructs, meaning that GC regions are harder to enrich and amplify. Since conventional target enrichment amplifies all fragments containing the universal sequence, this bias carries over into the pool of gene fragments that is enriched and eventually sequenced.

Within the larger context of NGS, target enrichment will continue to evolve, but until the next big thing in sequencing, improvements are likely to be incremental. Barry mentions using barcodes to identify the origins of amplification products as one possible future improvement. Barcoding involves adding a separate, completely random sequence to each universal sequence used to tag target genes. This ensures that each adapter, even though it is universal, contains a unique identifier that connects it to a specific hybridization.

“When you do PCR amplification you’re making lots of copies of an original starting number of molecules, but it’s difficult to tell from data whether the sequence reads are PCR copies of one another, or arose from true biological replicate molecules,” Barry explains. “This is important to know when you’re trying to understand variant frequency. Adding these unique ‘barcodes’ to DNA before amplification will enable experimenters to determine the genesis of molecules they sequence, and know how many were present in the original sample.”
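The idea Barry describes can be sketched in a few lines: reads that share the same position and the same random barcode are treated as PCR copies of one original molecule and collapsed before variant frequency is calculated. The read representation, function names, and example data below are hypothetical, and the sketch assumes barcodes are read without error; production tools typically also tolerate sequencing errors within the barcode.

```python
from collections import Counter, defaultdict


def collapse_by_barcode(reads):
    """Collapse reads into one consensus allele per (position, barcode) family.

    `reads` is an iterable of (position, barcode, allele) tuples.
    """
    families = defaultdict(Counter)
    for position, barcode, allele in reads:
        families[(position, barcode)][allele] += 1
    # Take the most frequent allele in each family as that molecule's call.
    return {key: counts.most_common(1)[0][0] for key, counts in families.items()}


def allele_fraction(alleles, variant):
    """Fraction of calls that carry the variant allele."""
    alleles = list(alleles)
    if not alleles:
        return 0.0
    return sum(allele == variant for allele in alleles) / len(alleles)


# Hypothetical data: one variant molecule was amplified more than two others.
reads = (
    [(100, "AAC", "T")] * 6    # six PCR copies of a single variant molecule
    + [(100, "GTT", "C")] * 2  # two copies of a reference molecule
    + [(100, "CCA", "C")] * 2  # two copies of another reference molecule
)

raw = allele_fraction((allele for _, _, allele in reads), "T")
molecules = collapse_by_barcode(reads)
corrected = allele_fraction(molecules.values(), "T")

print(f"reads: {len(reads)}, original molecules: {len(molecules)}")
print(f"variant fraction by reads: {raw:.2f}")
print(f"variant fraction by molecules: {corrected:.2f}")
```

In this toy case, counting reads suggests the variant is present in 60% of the sample, while counting original molecules puts it at one in three, which is the kind of distortion barcoding is meant to remove.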

Applications will also broaden as more instruments and test platforms are approved for clinical use. NGS has already become indispensable in cancer research, due to its ability to discern specific, regional, or global genetic abnormalities. “The application of targeted NGS for studying mutations in liquid biopsy samples is particularly promising,” says Revvity’s Kothandaraman. “By eliminating the need for expensive, painful, invasive tissue biopsy, targeted NGS of liquid biopsy samples can potentially enable timely diagnosis, monitoring of therapy, and detection of cancer recurrence. This marks a major shift in cancer research.”

Companies Offering Target Enrichment Kits and Reagents
Agilent
Arbor Biosciences
Boreal Genomics
Illumina
Integrated DNA Technologies
New England Biolabs
Nugen
Pacific Biosciences
Paragon Genomics
Revvity
Roche
Rubicon Genomics
Thermo Fisher Scientific