Maximize Your NGS with Sequence Capture

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

For all that genome-sequencing costs have fallen, they’re still relatively high. For many researchers, at least those with large sample sizes, they are prohibitively so.

Fortunately, there is a solution. Sequence-capture technologies, as their name suggests, allow users to specifically pull out a desired fraction of a sample for sequencing, thereby concentrating the sequencer’s energy on the As, Cs, Gs, and Ts of interest while ignoring portions that are irrelevant.

Researchers can use sequence-capture technologies to concentrate on, and sequence to higher coverage, custom gene sets (for instance, genes implicated in cancer) or genomic regions. They can also use them for exome sequencing, in which the 1–2% of the genome that actually codes for protein is specifically read out.

And it isn’t just small labs doing this work either. Even the 1000 Genomes Project Consortium, including sequencing facilities at BGI-Shenzhen, the Broad Institute, and the Wellcome Trust Sanger Institute, among others, takes advantage of sequence capture technologies. As they detail in a November 1 Nature paper on nearly 1,100 human genomes, “Primary data generated for each sample consist of low-coverage (average 5x) whole-genome and high-coverage (average 80x across a consensus target of 24Mb spanning more than 15,000 genes) exome sequence data, and high density SNP array information.” [1]
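To put those coverage figures in perspective, here is a back-of-the-envelope sketch in Python of the raw sequencing output each strategy would need. The 5x, 80x, and 24 Mb values come from the quote above. The 3.1 Gb genome size and the 60% on-target fraction are assumptions made purely for illustration, and the function name is hypothetical.

```python
# Rough estimate of raw bases needed to reach a mean coverage over a target.
# GENOME_SIZE_BP and the 0.6 on-target fraction are illustrative assumptions;
# the coverage values and the 24 Mb target are the figures quoted in the text.

GENOME_SIZE_BP = 3.1e9   # assumed approximate size of the human genome
EXOME_TARGET_BP = 24e6   # consensus exome target quoted in the paper (24 Mb)


def bases_required(target_bp, mean_coverage, on_target_fraction=1.0):
    """Return the raw bases needed to reach mean_coverage over target_bp."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_bp * mean_coverage / on_target_fraction


wgs_low = bases_required(GENOME_SIZE_BP, 5)
exome_high = bases_required(EXOME_TARGET_BP, 80, on_target_fraction=0.6)
wgs_high = bases_required(GENOME_SIZE_BP, 80)

print(f"5x whole genome:  {wgs_low / 1e9:.1f} Gb")
print(f"80x exome:        {exome_high / 1e9:.1f} Gb")
print(f"80x whole genome: {wgs_high / 1e9:.1f} Gb")
```

Under these assumptions, an 80x exome needs only a few gigabases of raw sequence, while 80x across the whole genome would need a couple of hundred. That gap is the economic argument for capture.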

There are two basic approaches to sequence capture: PCR-based and hybridization-based, the latter of which can be done either in solution or on the surface of an oligonucleotide microarray. In practice, says Baiju Parikh, marketing manager for sequencing solutions at Roche Applied Science, few researchers use the array-based method anymore, and none of the major manufacturers still offers such tools off-the-shelf.

Hybridization-based tools

Solution-based capture tools are available from Agilent Technologies, Roche NimbleGen, Illumina, and Life Technologies, the first two of which were used in the 1000 Genomes Project Consortium paper. All are based on solution hybridization selection (SHS), in which a pool of biotinylated oligonucleotides is used to capture some subset of a sequencing library of interest. Following pull-down with streptavidin beads and washing, the hybridized fragments are released and ready to sequence.

Agilent’s SureSelect technology is based on 120-mer RNA oligonucleotides. The company’s SureSelect Human All Exon V4 exome-capture kits come in two varieties, an exon-only version capturing 51 Mb of genomic DNA, and an exon plus untranslated regions kit covering 71 Mb. The protocol is simple enough: shear DNA, ligate sequencing adaptors, PCR amplify, capture, elute, and amplify again.

Agilent also offers a SureSelect Methyl-Seq kit for specific capture of potentially methylated regions, such as promoters and CpG islands, totaling 80 Mb. Unlike the All Exon SureSelect kit, the Methyl-Seq kit skips the pre-capture PCR amplification step to preserve methylation of the genomic DNA. That way, following capture, the DNA can be bisulfite treated to identify methylated bases.

“We spent a lot of time optimizing the library preparation to work without PCR,” says Olle Ericsson, Marketing Director for DNA sequencing applications, who adds that the company is unique in offering such a product. “The only other way to get single-base-pair methylation analysis is whole-genome sequencing, and that’s very inefficient,” he says.

Roche NimbleGen’s solution-based SeqCap EZ products use 2.1 million biotinylated DNA oligonucleotides ranging from 55 to 105 bases in length to effect sequence capture.

“That’s the big differentiator for us,” says Parikh, of the product’s high probe density. That density allows the company to design multiple redundant probes over each region, reducing the likelihood that mutations or SNPs will negatively impact capture efficiency. At the same time, Parikh adds, the probe density also gives the company design maneuverability to optimize and increase capture uniformity.
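The idea of redundant probe coverage can be illustrated with a toy tiling routine. This is only a sketch of overlapping probe placement, not NimbleGen's design algorithm, which the article does not describe. The function names, the fixed 75-base probe length, and the 25-base step are all assumptions chosen for illustration.

```python
def tile_probes(start, end, probe_len=75, step=25):
    """Return half-open (start, end) probe intervals that overlap across a region.

    probe_len and step are hypothetical values chosen for illustration only.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    if end <= start:
        raise ValueError("region must have positive length")
    if end - start <= probe_len:
        # Short region: one probe anchored at the region start,
        # extending past the region end if necessary.
        return [(start, start + probe_len)]
    probes = []
    pos = start
    while pos + probe_len < end:
        probes.append((pos, pos + probe_len))
        pos += step
    # Final probe sits flush with the region end so no bases are left uncovered.
    probes.append((end - probe_len, end))
    return probes


def depth_at(probes, position):
    """Count how many probes cover a given position."""
    return sum(1 for s, e in probes if s <= position < e)


probes = tile_probes(1000, 1300)
print(len(probes), "probes; depth at 1150:", depth_at(probes, 1150))
```

With a step smaller than the probe length, interior positions are covered by several probes, so a variant that weakens hybridization to one probe need not abolish capture of that position.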

NimbleGen’s SeqCap EZ Human Exome Library v3.0 captures 64 Mb, and its newly released SeqCap EZ Exome + UTR captures 96 Mb. NimbleGen also offers more focused libraries, including panels for neurological diseases (256 genes, 1.5 Mb) and cancer (578 genes, 4 Mb).

Other hybridization-based solutions include Illumina’s TruSeq® Exome Enrichment Kit (62 Mb with 340,000 95-mer DNA oligonucleotides) and Life Technologies’ TargetSeq™ Exome Enrichment Kit (45.1 Mb, 2 million probes).

Agilent, NimbleGen, and Life Technologies also offer custom configurations for their solution hybridization kits. Users simply specify regions of interest, for instance, using NimbleGen’s NimbleDesign tool, click submit, and wait. According to Parikh, NimbleGen will send a proposed design within 24 hours, which the user can either accept or modify. Agilent’s answer to NimbleDesign is SureDesign (which replaces eArray for SureSelect products); Life Technologies’ TargetSeq Custom Enrichment Kits must be ordered by submitting a form and genome coordinates to the company directly. 

PCR-based capture tools

Agilent and Life Technologies also offer PCR-based enrichment solutions. Life Technologies’ version is called AmpliSeq, while Agilent’s is HaloPlex. HaloPlex, says Ericsson, is intended mostly for clinical labs or small research labs using benchtop sequencers like Illumina’s MiSeq or Life Technologies’ Ion Torrent PGM.

In HaloPlex, a DNA sample is fragmented with restriction enzymes. A double-stranded universal probe flanked by single-stranded sequence-specific capture ends captures and circularizes desired targets, which are then amplified using universal primers. This design, says Ericsson, gives HaloPlex effectively infinite multiplexing capability. “You can do up to 250,000 PCRs in one single tube,” he says. 

Automating the process

For those doing more traditional PCR-based capture, RainDance Technologies and Fluidigm have devised automated solutions to simplify the process.

Fluidigm’s Access Array System uses a microfluidic plate to set up and run up to 2,304 individual PCR reactions (48 x 48). The plate and accompanying hardware automate the process of mixing sample and primer for each of 48 samples and 48 primer sets.
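The 48 x 48 layout is simply every sample paired with every primer set. A minimal Python sketch of that combinatorial layout follows. The sample and primer-set labels are placeholders invented for illustration and do not reflect any Fluidigm naming scheme or software.

```python
from itertools import product

# Placeholder labels; real experiments would use actual sample and assay IDs.
samples = [f"sample_{i:02d}" for i in range(1, 49)]
primer_sets = [f"primers_{j:02d}" for j in range(1, 49)]

# Every sample is combined with every primer set: 48 x 48 reactions.
reactions = list(product(samples, primer_sets))

print(len(reactions))  # 2304
```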

According to Julian Walker, Senior Product Manager at Fluidigm, the system can set up and run those reactions in 4 hours, including just 20 minutes of hands-on time.

Each 48.48 microfluidic plate costs a few hundred dollars, Walker says, on top of the Fluidigm instrumentation itself. But, he says, the result is reproducibility and data uniformity, including greater than 85% of coverage within 5x of the mean, greater than 90% of reads that map to the genome (meaning “you’re not producing a lot of garbage”), and more than 95% of reads mapping to the target (specificity).
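The three figures Walker cites can each be computed from basic read counts and per-target depths. The sketch below shows one plausible way to do so. The exact definitions are assumptions: "within 5x of the mean" is read here as a depth between one-fifth of the mean and five times the mean, and specificity is taken as on-target reads divided by mapped reads. The function name and the input numbers are hypothetical.

```python
def capture_metrics(total_reads, mapped_reads, on_target_reads, target_depths, fold=5):
    """Summarize mapping rate, specificity, and coverage uniformity.

    Assumed definitions (not taken from any vendor documentation):
      mapping_rate = mapped reads / total reads
      specificity  = on-target reads / mapped reads
      uniformity   = fraction of targets with depth in [mean / fold, mean * fold]
    """
    if total_reads <= 0 or mapped_reads <= 0 or not target_depths:
        raise ValueError("read counts must be positive and depths non-empty")
    mean_depth = sum(target_depths) / len(target_depths)
    low, high = mean_depth / fold, mean_depth * fold
    within = sum(1 for depth in target_depths if low <= depth <= high)
    return {
        "mapping_rate": mapped_reads / total_reads,
        "specificity": on_target_reads / mapped_reads,
        "uniformity": within / len(target_depths),
    }


# Made-up numbers purely to exercise the function.
depths = [100, 120, 80, 10, 300, 95, 110, 105, 90, 100]
metrics = capture_metrics(1000, 920, 880, depths)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In this toy input, the single target at depth 10 falls below one-fifth of the mean depth, so nine of ten targets count as uniformly covered.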

RainDance Technologies’ RDT 1000 system simplifies the process of PCR multiplexing by mixing template, buffer, reagents, and primer pairs in individual picoliter reaction droplets within an oil emulsion. As a result, researchers can perform thousands of individual reactions in a single tube. In late 2011, the company launched a high-throughput version of the platform, the ThunderStorm™ system, for up to 96 samples per run.

Intuitively, as sequencing prices continue to fall, it may seem as if the need for sequence capture tools will wane. For his part, Parikh doubts that will happen; instead, more and more researchers could end up turning to these tools to squeeze ever more data from every sequencer run, and to ease the load on computational pipelines.

“As the cost of sequencing drops and bioinformatics pipelines get more savvy, the need for sequence capture will actually grow,” he says.

That conclusion is bolstered somewhat by a 2011 study in which Michael Snyder’s lab at Stanford University performed a detailed comparison of the Agilent, Illumina, and NimbleGen platforms. [2] Among other things, they found that exome sequencing is sometimes better able to identify nucleotide variants than whole-genome sequencing, by virtue of deeper sequence coverage.

Long story short: Your investment in a sequence-capture kit today is likely to be useful for some time to come.

 

References 

[1] 1000 Genomes Project Consortium, “An integrated map of genetic variation from 1,092 human genomes,” Nature, 491:56–65, 2012.

[2] M.J. Clark et al., “Performance comparison of exome DNA sequencing technologies,” Nat Biotechnol, 29:908–12, 2011. 

The image at the top of the page is from Roche NimbleGen.
