Exome Sequencing Comes of Age

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

With genomes, as with haute cuisine, less sometimes is more. Sure, researchers using next-generation DNA sequencers can whip out several human genomes’ worth of sequence data in a single run. But interpreting those data—assembling the reads, identifying where the genome differs from the reference sequence and determining which of those variants, if any, might underlie some interesting biology—that’s another story.

Geneticists can make their lives easier by concentrating their efforts on that small fraction of the genome that encodes protein—the so-called “exome.” Representing just 1% or so of the overall sequence, an exome is like a Reader’s Digest condensed version of the genome—short, to the point, and less expensive than the full-length original.

Exome sequencing represents an “effective compromise between the competing goals of genome-wide comprehensiveness and cost-control,” wrote University of Washington genomicist Jay Shendure in a 2011 editorial to a special issue of Genome Biology on the topic of exome sequencing. [1] Basically, because a sequencer can only push out a finite number of bases, the more samples that can be combined per run through sample “barcoding,” the less each sample costs, and the more deeply each sample can be read.
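That pooling arithmetic can be sketched in a few lines of Python. The run yield, target sizes, depths and on-target fraction below are illustrative assumptions chosen for round numbers, not vendor specifications, and `samples_per_run` is a hypothetical helper.

```python
def samples_per_run(run_yield_gb, target_mb, mean_depth, on_target_fraction=1.0):
    """Estimate how many samples can share one run at a desired mean depth."""
    # Bases needed per sample: target size times depth, inflated because
    # only a fraction of the sequenced bases land on the target.
    bases_per_sample = target_mb * 1e6 * mean_depth / on_target_fraction
    return int(run_yield_gb * 1e9 // bases_per_sample)


RUN_YIELD_GB = 600  # hypothetical output of a single run

# A 64 Mb exome capture at 100-fold depth, assuming 75% of bases on target.
exomes = samples_per_run(RUN_YIELD_GB, target_mb=64, mean_depth=100,
                         on_target_fraction=0.75)

# A whole human genome (roughly 3,100 Mb) at 30-fold depth, no capture step.
genomes = samples_per_run(RUN_YIELD_GB, target_mb=3100, mean_depth=30)

print(exomes, genomes)
```

Under these assumptions the same run holds about ten times as many exomes as genomes, even though each exome is read more than three times as deeply.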

Another benefit of exome sequencing is interpretability. Whole-genome sequencing generates a lot of data, but for the majority of base pairs, researchers don’t know their relationship to disease, explains Yaping Yang, laboratory director for the Whole Genome Laboratory at Baylor College of Medicine, a clinical laboratory that offers an exome-sequencing service. “Therefore, they are not helpful in making molecular diagnoses.”

In effect, exome sequencing is just a special form of targeted sequencing, an application in which researchers capture specific genomic segments for sequencing analysis. The difference is that instead of capturing, say, all the genes implicated in a particular cellular pathway, exome sequencing selects every exon of every protein-coding gene, and sometimes 5’ and 3’ untranslated sequences, as well—a collection numbering in the tens of megabases.

If PubMed is any guide, getting at those megabases has become very popular indeed: The database lists nearly 1,300 references including the term ‘exome,’ almost all of them published since 2011. Commercial vendors now offer tools to help researchers get in on the act. If you’ve been thinking about spicing up your genetics work with, as Genome Biology called it, “that special ‘exome factor’,” read on. [2] We’ll help you identify a solution that meets your needs.

Solution-based hybridization

Commercial options for exome capture fall into two basic categories, solution-based hybridization and PCR. Kits for the former are available from Agilent Technologies, Illumina and Roche NimbleGen. Clinically focused PCR-based kits can be obtained from Agilent. (Roche NimbleGen used to offer hybridization capture on planar microarrays but has discontinued that line.)

Solution hybridization-based techniques all follow the same basic protocol: Mix fragmented genomic DNA with biotinylated capture oligonucleotides, hybridize, capture hybrids on streptavidin-conjugated microbeads, wash away the unbound material and release what’s left. The differences lie largely in the details.

Agilent’s SureSelect Human All Exon V5 and V5+UTRs kits use pools of several hundred-thousand 120-mer biotinylated RNA capture probes to enrich exonic sequences (as well as, optionally, 5’ and 3’ untranslated regions and up to 6 Mb of custom sequence).

According to Olle Ericsson, Agilent’s marketing director for DNA Sequencing, RNA-DNA hybrids are stronger than comparably sized DNA-DNA hybrids, leading to stronger and more efficient capture. In addition, the system’s use of very long oligos means that even sequences containing short insertions and deletions (or “indels”) can be captured efficiently.

Version 5, the newest iteration of the SureSelect exome line, was launched last fall at the American Society of Human Genetics national meeting. According to Ericsson, V5 features updated content as well as a new, streamlined workflow that reduces sample preparation time by about a half-day, largely thanks to a shorter hybridization step: “If you start a prep on day one, you will have [the samples] ready to sequence on day two” (as opposed to the following morning).

The SureSelect system is available in two forms. SureSelect XT enables barcoding and mixing of samples after capture, and SureSelect XT2 barcodes (or “indexes”) samples before capture. The choice of which to use “depends on how you prefer to set up your workflow,” Ericsson says. To do pre-capture indexing, all the samples must be available at the same time. If samples tend to dribble in one or two at a time, post-capture indexing might make more sense. (Pre-capture pooling approaches also typically perform slightly worse than post-capture pooling, he adds.)
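Whichever point the indexes are added at, the pooled reads have to be sorted back into per-sample bins after sequencing. The sketch below shows the idea under a simplifying assumption: the index is treated as the first six bases of each read, whereas real instruments typically report it in a separate index read. The index sequences and sample names are hypothetical.

```python
from collections import defaultdict

# Hypothetical 6-base index sequences; real kits define their own sets.
INDEX_TO_SAMPLE = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}


def demultiplex(reads, index_to_sample, index_len=6):
    """Group pooled reads by the index found at the start of each read."""
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        # Reads whose index matches no known sample are set aside.
        sample = index_to_sample.get(index, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


pooled = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTGGG", "NNNNNNACGT"]
by_sample = demultiplex(pooled, INDEX_TO_SAMPLE)
print(by_sample)
```

Exact matching keeps the example short; production tools usually tolerate a sequencing error or two in the index.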

Also based on solution capture is Roche NimbleGen’s SeqCap EZ Exome Library v3.0, which uses a capture library comprising some 2.1 million DNA oligonucleotides averaging 80 bases in length. That design, says Thomas Albert, global head of technology innovation at Roche Applied Science, gives the company considerable flexibility in terms of how it positions its capture probes.

“We can put more probes in some places than in others, or make them longer, or shift them around in different ways—these are all things we can do because we have a larger number of smaller probes,” Albert says.

The reason that is necessary, he explains, is non-uniformity—variations in melting-temperature, secondary structure and cross-hybridization efficiency across the genome such that, though an exome might be sequenced to, say, 100-fold coverage, some regions will be over-represented and others possibly skipped altogether. That means crucial variants could be overlooked or misinterpreted, leading to more sequencing and rising costs.
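A toy calculation shows why mean coverage alone can mislead. The per-base depths below are invented for illustration, and the 20-fold threshold is simply a commonly quoted cutoff; `coverage_summary` is a hypothetical helper, not part of any vendor's software.

```python
def coverage_summary(depths, min_depth=20):
    """Return (mean depth, fraction of bases covered at >= min_depth)."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


# Two invented 10-base targets, both averaging 100-fold coverage.
uniform = [100] * 10
skewed = [250, 250, 250, 250, 0, 0, 0, 0, 0, 0]

print(coverage_summary(uniform))
print(coverage_summary(skewed))
```

Both targets report the same mean, but in the skewed case more than half the bases are never read at all, which is exactly where a variant would be missed.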

According to Albert, the SeqCap EZ Exome Library v3.0 was released about a year ago and captures some 64 Mb of genome sequence. More recently, the company has added the ability to capture 32 Mb of 5’ and 3’ untranslated region (UTR) sequences (SeqCap EZ Exome +UTR Library) or up to 50 Mb of custom content (SeqCap EZ Exome Plus).

Illumina offers two solution-capture systems, the stand-alone TruSeq™ Exome Enrichment Kit, which captures 62 Mb of genomic sequence using more than 340,000 95-mer probes, and the Nextera® Exome Enrichment Kit, which integrates TruSeq into a “streamlined, automation-friendly workflow [that] combines [enzyme-based fragmentation and] library preparation and exome enrichment steps, and can be easily completed in 2.5 days with minimum hands-on time,” according to product literature.

PCR-based kits

On the PCR front, Agilent launched its PCR-based HaloPlex Exome kit in February 2013, coinciding with the annual Advances in Genome Biology & Technology (AGBT) 2013 conference. Aimed mostly at the clinical market, the kit offers a simpler workflow and less input DNA (200 ng vs. 1 to 3 µg) than SureSelect, says Ericsson. In particular, he says, the HaloPlex protocol eliminates the need for mechanical shearing of the genomic template, integrating the library-preparation step into the PCR process itself.

Agilent also launched at AGBT a dedicated software package called SureCall. Unlike the more flexible (and powerful) GeneSpring software used with SureSelect, SureCall converts raw sequence data directly into mutation lists “that are classified according to industry guidelines,” Ericsson says.

In the clinic

Exomes might be simpler than whole genomes, but data interpretation is still a challenge in exome analysis, especially when medical decisions depend on the outcome. Such is the case at the growing number of clinical laboratories now offering exome-sequencing services.

At the Whole Genome Laboratory at Baylor College of Medicine, turnaround time on the lab’s exome-sequencing service is about 15 weeks, says Yang, “because the analyses and interpretation are so complicated.” The lab must consider a patient’s clinical presentation, prior testing and exome-sequencing data before it can issue a report.

Baylor’s service is used mainly to diagnose patients who are suspected of having genetic disorders that the referring physicians cannot pin down. Most are pediatric patients with neurological deficits, Yang says, and the service’s “pick-up” rate—the fraction of cases in which a likely causative genetic mutation can be identified—ranges from 25% to 30%.

“If the clinical phenotype is not caused by a genetic defect, no matter how hard we try, we are not going to find mutations,” she says. And, of course, some mutations fall outside of the exome.

Yang’s lab sequences those exomes using three Illumina HiSeq sequencers, typically combining three samples per lane, 48 samples per run, for about 13 or 14 gigabases, or 150-fold mean coverage, per exome on average. For exome capture, the lab uses a custom Roche NimbleGen solution-hybridization design named VCRome, developed at Baylor’s Human Genome Sequencing Center. According to Yang, that system covers more than 95% of desired bases at 20-fold coverage or higher, with a capture specificity of 70% to 80%.
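Capture specificity is, roughly, the share of sequence that lands on the intended targets. Definitions vary (some count reads, others count bases), so the sketch below assumes the simplest version: a read counts as on target if it overlaps any target interval. The coordinates are invented, and this is not a description of Baylor’s pipeline.

```python
def overlaps(read, target):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return read[0] == target[0] and read[1] < target[2] and target[1] < read[2]


def capture_specificity(reads, targets):
    """Fraction of aligned reads that overlap at least one target interval."""
    if not reads:
        raise ValueError("reads must not be empty")
    on_target = sum(1 for r in reads if any(overlaps(r, t) for t in targets))
    return on_target / len(reads)


# Invented target intervals and aligned read positions.
targets = [("chr1", 100, 200), ("chr2", 500, 650)]
reads = [
    ("chr1", 90, 140),   # overlaps the first target
    ("chr1", 180, 230),  # overlaps the first target
    ("chr1", 300, 350),  # off target
    ("chr2", 640, 690),  # overlaps the second target
]

specificity = capture_specificity(reads, targets)
print(specificity)
```

The nested scan over targets is fine for a handful of intervals; a real exome with hundreds of thousands of targets would call for sorted intervals or an interval tree.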

Which method to choose

On the face of it, any solution-based capture approach should work equally well for exome analysis. But do they?

To find out, Michael Snyder, professor and chair of genetics at Stanford University and director of the Stanford Center for Genomics and Personalized Medicine, and his team in 2011 compared the performance of all three commercial approaches. [3] The results, they report:

… suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. [3]

(Today, UTR coverage is an option on all three platforms.)

In short, says Snyder, “They all worked pretty well.” But his lab prefers SureSelect, he says, “because it has a nice balance.”

Snyder says exome sequencing offers two advantages over whole-genome sequencing, even in this age of falling whole-genome prices. First, the lower cost means larger populations can be studied than might otherwise be possible. “Most projects tend to be budget-driven,” he says. But exomes also enable deeper sequencing than whole genomes—a consequence of the fact that, again, a sequencer can only push out so many bases. Snyder’s 2011 study routinely identified several thousand variants in exomes that were missed in the corresponding whole-genome sequences. “And because they’re in exomes, they tend to be things that you care about, because they’re coding,” he says.

On the other hand, whole-genome sequencing captures everything and can more effectively identify structural rearrangements that exome sequencing might miss. As a result, when dealing with clinical samples, Snyder’s lab tends to err on the side of caution and capture both an exome and a whole genome. “That’s to get us the extra coverage,” Snyder says. “We feel more confident about our calls.”

References

[1] Shendure, J, “Next-generation human genetics,” Genome Biology, 12:408, 2011.
[2] Stower, H, “The exome factor,” Genome Biology, 12:407, 2011.
[3] Clark, MJ, et al., “Performance comparison of exome DNA sequencing technologies,” Nature Biotechnology, 29:908-914, 2011.
