Next-Gen DNA Sequencing: 2015 Update

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

This past week, the U.S. National Institutes of Health announced the release of a huge bolus of data as part of the Common Fund’s Roadmap Epigenomics Program.

The flagship publication of the release—which the researchers detailed in more than 20 papers—describes the epigenomes of 111 human cell types, a collection of data including histone modifications, DNA methylation, chromatin structure, gene-expression data and more [1]. Each of these datasets required different assays, of course. But all, the authors noted, were read out the same way: “Massively parallel short-read sequencing.”

In total, the Roadmap Epigenomics Consortium produced 2,805 genomic datasets comprising 150 billion sequence reads—a dataset that would have been unthinkable just a decade ago.

Indeed, scientific news feeds these days are full of such previously unimaginable studies, from the single-cell transcriptomics of the mouse brain to the metagenomics of the New York City subway, to the point that they are almost routine. The reason, of course, is the ever-increasing output, speed and reliability of today’s DNA sequencing instrumentation, whose relentless march is advancing both clinical medicine and basic research.

Still, sequencing vendors have not been content to rest on their laurels. Here, we review the latest developments in the world of next-generation DNA sequencing.

Illumina

Short-read sequencing firm Illumina announced several additions to its sequencing line on January 12, including the HiSeq X Five, the HiSeq 3000/4000 systems and the NextSeq 550.

The HiSeq X Five is a scaled-down version of the company’s top-of-the-line HiSeq X Ten. Costing $10 million and comprising 10 sequencing devices operating in parallel, the HiSeq X Ten can sequence up to 18,000 whole human genomes per year at $1,000 apiece. The HiSeq X Five (priced at $6 million, according to GenomeWeb) includes five instruments capable of sequencing 9,000 human genomes per year for about $1,400 apiece. The higher per-genome price reflects consumables costs: $1,200 per genome for the X Five, compared with $800 for the X Ten.
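The per-genome figures quoted here can be checked with a little arithmetic. The Python sketch below uses only the prices and capacities given in this article; the four-year amortization period and the class and method names are illustrative assumptions, not Illumina figures.

```python
from dataclasses import dataclass


@dataclass
class SequencingSystem:
    name: str
    price_usd: int          # system price quoted in the article
    genomes_per_year: int   # quoted annual capacity
    consumables_usd: int    # quoted consumables cost per genome
    total_usd: int          # quoted overall cost per genome

    def non_consumable_usd(self) -> int:
        """Part of the per-genome cost not explained by consumables."""
        return self.total_usd - self.consumables_usd

    def capital_per_genome(self, years: float) -> float:
        """System price spread over every genome sequenced in `years`
        years at full capacity (the period is a hypothetical input)."""
        return self.price_usd / (self.genomes_per_year * years)


x_ten = SequencingSystem("HiSeq X Ten", 10_000_000, 18_000, 800, 1_000)
x_five = SequencingSystem("HiSeq X Five", 6_000_000, 9_000, 1_200, 1_400)

for system in (x_ten, x_five):
    print(
        f"{system.name}: non-consumable share "
        f"${system.non_consumable_usd()} per genome; "
        f"capital over 4 years ${system.capital_per_genome(4):.0f} per genome"
    )
```

On these numbers the non-consumable share is the same $200 per genome for both systems, so the $400 gap between them comes entirely from reagent pricing. The article does not say what that $200 covers.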

“When we launched the X Ten, the demand was amazing,” says Joel Fellis, senior market manager for high-throughput systems, “much beyond our most aggressive assumptions.” Some sequencing centers, though, could justify neither the system’s price tag nor its throughput. “That was the heart of releasing the X Five—to empower those users,” he says.

The only difference between the two systems is the number of sequencers, Fellis says. Users thus can upgrade their X Five with additional instruments, and after they acquire 10, the reagent pricing “resets” to allow for a $1,000 genome.

The dual flow-cell HiSeq 4000 and single flow-cell HiSeq 3000 update the company’s earlier HiSeq 2500 and 1000 instruments, albeit without the 2500’s 2x250-bp “rapid run mode.” The HiSeq 4000 ($900,000, according to GenomeWeb) can generate up to 1.5 terabases of 2x150-bp reads (750 gigabases for the $740,000 HiSeq 3000) in 3.5 days, amounting to “up to 12 genomes, 100 whole transcriptome samples, or 180 exomes,” according to a press release. (By comparison, the HiSeq X instruments produce up to 1.8 terabases in 2.7 days.)
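To put those run figures on a common footing, the sketch below converts each quoted maximum output into gigabases per day and works out how many bases the press release's sample counts imply per sample. It assumes the sample counts refer to one full HiSeq 4000 run and uses a human genome size of roughly 3.1 gigabases; neither assumption comes from the article, and the function names are illustrative.

```python
# Assumption (not from the article): approximate human genome size in bases.
HUMAN_GENOME_BASES = 3.1e9

# Maximum output in bases and run time in days, as quoted in the article.
RUNS = {
    "HiSeq 4000": (1.5e12, 3.5),
    "HiSeq 3000": (0.75e12, 3.5),
    "HiSeq X": (1.8e12, 2.7),
}


def gigabases_per_day(total_bases: float, run_days: float) -> float:
    """Average daily output over one run, in gigabases."""
    return total_bases / run_days / 1e9


def bases_per_sample(total_bases: float, samples: int) -> float:
    """Bases available to each sample if a run is split evenly."""
    return total_bases / samples


for name, (bases, days) in RUNS.items():
    print(f"{name}: {gigabases_per_day(bases, days):.0f} Gb/day")

per_genome = bases_per_sample(1.5e12, 12)
coverage = per_genome / HUMAN_GENOME_BASES
print(f"12 genomes per run: {per_genome / 1e9:.0f} Gb each, about {coverage:.0f}x coverage")

per_exome = bases_per_sample(1.5e12, 180)
print(f"180 exomes per run: {per_exome / 1e9:.1f} Gb each")
```

Under those assumptions, 12 genomes per run corresponds to about 125 gigabases, or roughly 40-fold coverage, per genome.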

According to Fellis, one significant difference between the HiSeq 4000/3000 and earlier models is their flow-cell design. Earlier HiSeq instruments used a non-patterned flow cell in which sequencing clusters could form anywhere on the surface. As a result, sequencing clusters varied in size, shape and spacing, and one of the first steps in data analysis was identifying where they were. The new instruments (and the HiSeq X series) instead use a patterned flow cell in which sequencing clusters are restricted to 400-nm wells spaced 700 nm apart. Now, Fellis explains, “you know exactly where to look [for clusters], and you look only within those wells. That enables accurate resolution of flow cells clustered at very high densities, providing tremendous increases in throughput.”
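The well spacing gives a rough sense of what “very high densities” means. The article states the 700-nm spacing but not the layout of the wells, so the sketch below treats 700 nm as a center-to-center pitch (an assumption) and compares two hypothetical layouts, a square grid and a hexagonal one.

```python
import math

PITCH_NM = 700.0          # well spacing quoted in the article
NM_PER_MM = 1_000_000.0   # unit conversion


def wells_per_mm2_square(pitch_nm: float) -> float:
    """Wells per square millimeter on a square grid with the given pitch."""
    pitch_mm = pitch_nm / NM_PER_MM
    return 1.0 / pitch_mm**2


def wells_per_mm2_hexagonal(pitch_nm: float) -> float:
    """Wells per square millimeter on a hexagonal grid, where each well
    occupies an area of (sqrt(3) / 2) * pitch**2."""
    pitch_mm = pitch_nm / NM_PER_MM
    return 2.0 / (math.sqrt(3.0) * pitch_mm**2)


print(f"square grid:    {wells_per_mm2_square(PITCH_NM):,.0f} wells per mm^2")
print(f"hexagonal grid: {wells_per_mm2_hexagonal(PITCH_NM):,.0f} wells per mm^2")
```

Either layout works out to roughly two million wells per square millimeter, each a known location for the analysis software to inspect.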

The NextSeq 550 ($275,000, again per GenomeWeb) updates the Illumina NextSeq 500 with the addition of an array scanner, making the system ideal for labs that need both sequencing and array-reading services, especially for structural analyses such as cytogenetics.

Oxford Nanopore Technologies

Where Illumina’s technology excels at data volume, other companies focus on sequence length. One such company is Oxford Nanopore Technologies.

In nanopore sequencing, which also drives Oxford Nanopore’s in-development GridION™ and PromethION™ systems, single DNA molecules are pulled through a nanometer-scale hole in a membrane, across which an electrical field has been established. Each base causes a characteristic disruption in that field, producing a signal. The company has made its USB key-sized MinION™ sequencer available through the beta-like MinION™ Access Program (MAP), and several publications have emerged in the past few months to document user experiences. (The company declined to comment for this article.)

Last fall, two researchers published a highly critical study based on their experiences with an early chemistry and system design. But with Oxford Nanopore steadily updating its tools, subsequent studies have been more positive, especially regarding so-called “2D” reads, in which the instrument reads both strands of a piece of DNA, yielding a more reliable sequence.

One study, for instance, used the MinION to sequence PCR amplicons of three pharmacogenomically relevant human loci—two HLA genes and a cytochrome P450. The resulting reads, measuring 4 kb to 5 kb, were long enough to assign specific genetic variants to individual chromosomal copies of the genes (a process called “haplotyping” or “phasing”) [2].

“The MinION device produced sufficiently long mappable reads to phase all variants in the loci examined,” the authors wrote. “As error rates on the MinION decrease, we can expect to deconvolute these data into more accurate diplotypes with less noise and will be able to measure how much multi-sample multiplexing can be supported by a single run.”

Another study, describing a computational pipeline to analyze and extract single-nucleotide polymorphisms from MinION data, generated reads out to 42 kb and used them to resolve the structure of a 50-kb gap in the human X chromosome annotation [3].

The authors of that second study report an “average identity” (a measure of how closely the read matches a reference) of 85% over some 29,000 reads. “[W]e have shown that the MinION has sufficient accuracy to resolve important biological questions by sequencing long, native DNA strands,” they wrote. “This accuracy is improving rapidly.”
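As an illustration of what an identity figure measures, the sketch below scores gapped pairwise alignments by the share of alignment columns in which read and reference agree, then averages over reads. This is one common definition and is assumed here; the study's exact calculation may differ. The function names and the toy alignment are hypothetical.

```python
from typing import Iterable, Tuple

GAP = "-"


def percent_identity(aligned_read: str, aligned_ref: str) -> float:
    """Percentage of alignment columns where read and reference carry
    the same base. Both strings must be the same length, with '-'
    marking insertions and deletions."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must be the same length")
    if not aligned_read:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for read_base, ref_base in zip(aligned_read, aligned_ref)
        if read_base == ref_base and read_base != GAP
    )
    return 100.0 * matches / len(aligned_read)


def average_identity(alignments: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-read percent identity."""
    scores = [percent_identity(read, ref) for read, ref in alignments]
    if not scores:
        raise ValueError("no alignments given")
    return sum(scores) / len(scores)


# Toy alignment: one deletion in the read and one mismatch in 10 columns.
example = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(f"{example:.0f}% identity")
```

The toy read agrees with its reference in eight of 10 columns, for 80% identity. An 85% average therefore means that about 15 columns in every 100 are mismatches, insertions or deletions.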

Several software packages now are available for analyzing nanopore data, including third-party tools poRe and Poretools and Oxford Nanopore’s cloud-based Metrichor.

Pacific Biosciences

Also focusing on read length is Pacific Biosciences. As we noted in our last NGS roundup, PacBio launched its current sequencing chemistry, P6-C4, last fall, bringing average read lengths to between 10,000 and 15,000 bases and with some reads as long as 50,000. Total sequencing yield per run averages 0.5 to 1 gigabase.

In a blog post earlier this year, the company’s chief scientific officer, Jonas Korlach, wrote that he anticipates a four-fold improvement in those numbers by the end of 2015: 2 to 4 gigabases per run, with an overall cost reduction per base and read lengths averaging 15,000 to 20,000 bases.

According to Korlach, those gains will stem from improvements in sample preparation, sequencing chemistry and data analysis. For instance, he tells Biocompare, current protocols load only about 30% to 40% of the zero-mode waveguides in each SMRT Cell. “We’re hoping for a factor-of-two improvement [in loading efficiency], which will automatically improve the number of reads per run.”
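A back-of-the-envelope model shows why loading efficiency feeds directly into yield. The sketch below assumes one read per loaded waveguide and a SMRT Cell with 150,000 zero-mode waveguides; that count is an assumption that does not appear in this article, and the function name is illustrative.

```python
# Assumption (not stated in the article): waveguides per SMRT Cell.
ZMWS_PER_SMRT_CELL = 150_000


def expected_yield_bases(
    zmw_count: int, loading_fraction: float, mean_read_length: float
) -> float:
    """Simplified per-run yield: loaded waveguides times mean read length.
    Treats every loaded waveguide as producing exactly one read."""
    if not 0.0 <= loading_fraction <= 1.0:
        raise ValueError("loading_fraction must be between 0 and 1")
    return zmw_count * loading_fraction * mean_read_length


# Midpoints of the ranges quoted in the article: 30% to 40% loading,
# 10,000 to 15,000 bases average read length.
current = expected_yield_bases(ZMWS_PER_SMRT_CELL, 0.35, 12_500)
doubled = expected_yield_bases(ZMWS_PER_SMRT_CELL, 0.70, 12_500)

print(f"current loading: {current / 1e9:.2f} Gb per run")
print(f"doubled loading: {doubled / 1e9:.2f} Gb per run")
```

With these inputs the model gives about 0.66 gigabases per run, inside the 0.5-to-1-gigabase range quoted for P6-C4. Doubling the loading fraction doubles the yield with no change in chemistry or read length.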

Another improvement involves the company’s FALCON assembler, which can keep track of haplotypes—that is, which chromosomal copy is which—to provide “a much better understanding of the true nature of the genome.”

Among other things, Korlach says, long-read sequencing technology provides crucial structural details that often are lost with shorter reads. The company’s IsoSeq technology, for instance, enables sequencing of full-length cDNAs, from which researchers can determine mRNA structure and splicing.

Researchers also can use long reads to close genomic annotation gaps. In one recent study, University of Washington researcher Evan Eichler and colleagues used PacBio’s P5-C3 chemistry to probe the structural details of a human haploid genome. The analysis identified more than 26,000 structural variants, 85% of which had never been seen before, the authors noted, and closed 50 previous annotation gaps [4].

Thermo Fisher Scientific/Life Technologies

Thermo Fisher Scientific/Life Technologies had no new updates to report since our last survey in November 2014 and declined to speak with Biocompare for this article.

References

[1] Roadmap Epigenomics Consortium, et al., “Integrative analysis of 111 reference human epigenomes,” Nature, 518:317-30, 2015. [PubMed ID: 25693563]

[2] Ammar, R, et al., “Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes,” F1000Research, 4:17, 2015. [v1; ref status: approved 1]

[3] Jain, M, et al., “Improved data analysis for the MinION nanopore sequencer,” Nat Methods, doi:10.1038/nmeth.3290, 2015. [PubMed ID: 25686389]

[4] Chaisson, MJP, et al., “Resolving the complexity of the human genome using single-molecule sequencing,” Nature, 517:608-11, 2015. [PubMed ID: 25383537]

 

Updated 24 Feb 2015 to update information on Oxford Nanopore Technologies.

