Next Generation Sequencing 2013: Looking Into Genomes


On January 29, 2013, Indian, Chinese, and American researchers reported the draft sequence of the chickpea, Cicer arietinum, “the third crop legume plant to have its genome unraveled and published so far,” according to GenomeWeb.

The week before, it was meningioma, a kind of central nervous system tumor, and the week before that, the diamondback moth.

These remarkable reports have to some extent lost their ability to shock. Where once new genomes were greeted with awe in the scientific community—not to mention the mainstream press—today the sense is, “another day, another genome.” Indeed, with human genome sequencing approaching the $1,000 mark, and low-cost sequencers like the Illumina MiSeq and Life Technologies’ Ion Torrent PGM putting genome technology in the hands of scientific have-nots, it might seem that DNA sequencing has plateaued. Become commoditized, even.

On one level, perhaps. But that doesn’t mean the technology is stagnant. Sequencing firms continue to optimize their hardware, software and reagents to squeeze ever more bases from their instruments ever more quickly. Meanwhile, researchers are expanding the application space by pushing sequencing technology in new directions.

Long fragment reads

Harvard University geneticist George Church cites three recent papers that illustrate exciting new developments in the sequencing arena. The first, published by Complete Genomics in July 2012 in Nature (and co-authored by Church), demonstrates the company’s ability to produce “phased” genome sequences—that is, genomes in which polymorphisms can be unambiguously assigned to either the maternal or paternal chromosome—using its new “long fragment read” (LFR) technology. [1]

In the LFR approach, pools of long DNA fragments are diluted out until, on average, each well of a plate contains either one or no DNA fragments. These individual fragments are then fragmented, amplified and barcoded to produce essentially a unique sequencing library for each long piece. Those pools are then recombined and sequenced en masse; the barcodes enable the researchers to then assign individual fragments to one parental chromosome or the other.

Typical genome sequences lack such information, which can be clinically vital, Church says. Suppose, for instance, that an individual harbors two different mutations in a particular gene. Are both mutations present in the same copy, or is there one mutation in each? “The difference is night and day,” he says; in the former case, the individual still harbors a working copy of the gene, but not in the latter case. “I don’t know any other way to get at that very important fact.”
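
The same-copy-or-different-copies question can be illustrated with a toy sketch in Python. This is not Complete Genomics’ algorithm; it only assumes, as described above, that each barcode marks reads from a single long fragment, and further assumes that every fragment spans both mutation sites. The input format and the names classify_phase, well_07 and so on are hypothetical.

```python
from collections import defaultdict


def classify_phase(observations, variant_a, variant_b):
    """Guess whether two mutations sit on the same chromosome copy.

    observations: iterable of (barcode, variant_id) pairs, each meaning a
    read carrying that barcode showed the mutant allele of variant_id.
    Hypothetical input format, for illustration only.
    """
    variants_by_barcode = defaultdict(set)
    for barcode, variant_id in observations:
        variants_by_barcode[barcode].add(variant_id)

    # Barcodes whose fragment carries both mutations support "cis";
    # barcodes carrying exactly one of the two support "trans".
    both = sum(1 for seen in variants_by_barcode.values()
               if variant_a in seen and variant_b in seen)
    only_one = sum(1 for seen in variants_by_barcode.values()
                   if (variant_a in seen) != (variant_b in seen))

    if both > only_one:
        return "cis"
    if only_one > both:
        return "trans"
    return "undetermined"


reads = [("well_07", "mutation_1"), ("well_07", "mutation_2"),
         ("well_31", "mutation_1"), ("well_31", "mutation_2")]
print(classify_phase(reads, "mutation_1", "mutation_2"))  # prints "cis"
```

Here “cis” corresponds to the first case Church describes, in which both mutations share one copy and the other copy still works; “trans” is the second.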

Nanopore sequencing

The second paper describes a novel nanopore-based sequencing approach. [2] Several companies are pursuing nanopore sequencing, including Genia and Oxford Nanopore Technologies, which made a splash in February 2012 with its announcement of a nanopore-based sequencer that's about the size of a USB thumb drive.

In nanopore sequencing, the basic idea is that passage of either a single-stranded piece of DNA or individual nucleotides through a pore disrupts the flow of electrical current across the pore by some characteristic amount for each base. But nucleotides are very chemically similar to one another, making their signals difficult to distinguish. And it’s hard to get DNA to flow through a pore in such a way that each base can register.

The new paper describes a different approach. Led by Jingyue Ju of Columbia University and John Kasianowicz of the National Institute of Standards and Technology, the team produced a series of tagged nucleotide triphosphates, with unique tags for each of the four nucleotides. As a DNA polymerase positioned near the pore extends the primer-template complex, the tags are released into the pore, where they produce characteristic signatures.

“This is the newest flavor [of nanopore sequencing],” Church says. It is a sequencing-by-synthesis approach and “a new way of doing things. That’s why this is exciting.”

Still, he notes, the paper does not actually demonstrate sequencing per se. Rather, it shows that a nanopore can distinguish the four tags in solution.

Genia, which is collaborating with Ju, Kasianowicz and Church to develop this technology, has announced plans to have a beta test instrument available by the end of 2013 and a commercial product by 2014. The company fabricates its sequencers out of disposable computer chips, building a massively-parallel nanopore array automatically at run time.

According to Stefan Roever, Genia’s chief executive officer, the array consists of thousands of nanoliter-sized sensor wells. Each well has a membrane stretched over the top “like a drum skin covers a drum,” into which a single protein pore is inserted. Ultimately, point-of-care devices could be as small as a cell phone, he says.

The current “alpha” version of the Genia consumable contains 264 sensors. The anticipated beta version will contain 100,000 sensors, and the commercial version, about one million. Roever says the device will use a DNA polymerase and a sequencing-by-synthesis approach, as in the current paper, but it will use different chemistries. Still, he anticipates running at about 10 bases per second per sensor.
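
To put those sensor counts in perspective, here is a back-of-the-envelope sketch in Python. It takes Roever’s 10 bases per second per sensor at face value and adds assumptions that are not from Genia: every sensor is active for the whole run, there is no setup or analysis overhead, and the target is a human genome of roughly 3.2 billion bases at 30-fold coverage.

```python
SECONDS_PER_HOUR = 3600


def raw_bases_per_second(sensors, bases_per_sensor_per_second=10):
    """Aggregate raw output, assuming every sensor reads continuously."""
    return sensors * bases_per_sensor_per_second


def hours_to_coverage(sensors, genome_bases=3.2e9, coverage=30,
                      bases_per_sensor_per_second=10):
    """Idealized run time to reach a given fold coverage.

    genome_bases and coverage are illustrative assumptions, not Genia figures.
    """
    total_bases = genome_bases * coverage
    rate = raw_bases_per_second(sensors, bases_per_sensor_per_second)
    return total_bases / rate / SECONDS_PER_HOUR


for label, sensors in [("alpha", 264), ("beta", 100_000),
                       ("commercial", 1_000_000)]:
    print(label, raw_bases_per_second(sensors),
          round(hours_to_coverage(sensors), 1))
```

Under these idealized assumptions, the figures are upper bounds on throughput; real runs would be slower.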

Genome sequencing

The third paper, from Sunney Xie’s lab at Harvard University, demonstrates genome sequencing and variant calling from a single human cell. [3] The method relies on a new amplification procedure called MALBAC – multiple annealing and looping-based amplification cycles – which combines a linear “preamplification” step and PCR to uniformly amplify the DNA from a single cell.

An additional paper, published in October 2012, is also generating interest in the sequencing community. Developed by Stephen Kingsmore’s lab at Children’s Mercy Hospital in Kansas City, Mo., the method, called STAT-Seq, leverages the approximately 24-hour run time of Illumina’s new HiSeq 2500 and some advanced bioinformatics algorithms to identify genetic mutations in newborns in just 50 hours. [4]

According to Abizar Lakdawalla, associate director of technical marketing at Illumina, the HiSeq 2500, an upgrade of the company’s older HiSeq 2000, began shipping in the third quarter of 2012. The instrument produces 2x150-base paired-end reads, which will increase to 2x250-base reads in the second half of 2013. “That will give you around 300 gigabases [of sequence] in approximately 60 hours,” Lakdawalla says. “So pretty massive output in a short amount of time.”

Illumina’s personal-sized MiSeq will shortly support 2x300 paired-end reads, to generate about 15 Gb of sequencing data. That puts the MiSeq read lengths on par with Sanger-based sequencing. But for really long reads, the company will soon offer straightforward library preparation kits and analysis software from a recently acquired reagent company called Moleculo (named after a “Saturday Night Live” sketch); the kits promise synthetic reads of approximately 10,000 bases on Illumina's sequencers.

“The consensus accuracy for the 10,000 base reads is Q50,” Lakdawalla says. “No existing technology gives this combination of incredibly long reads and super-high-accuracy. So you can now easily do de novo sequencing of large genomes, analyze complex metagenomes, and most importantly, effortlessly produce phased genomes.” The long read technology will be introduced as a service delivering phased genomes followed by kits for users to produce the long reads on existing Illumina sequencers.
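
For readers unfamiliar with quality scores, a short Python sketch shows what Q50 would mean, assuming Lakdawalla is using the standard Phred scale, on which a score Q corresponds to an error probability of 10^(-Q/10). The function names are illustrative, and treating the error rate as uniform along the read is a simplification.

```python
def phred_to_error_probability(q):
    """Convert a Phred-scaled quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors(read_length, q):
    """Expected number of wrong bases in a read, assuming uniform quality."""
    return read_length * phred_to_error_probability(q)


print(phred_to_error_probability(50))  # about 1 error in 100,000 bases
print(expected_errors(10_000, 50))     # about 0.1 errors per 10,000-base read
```

On that reading, roughly one synthetic 10,000-base read in ten would be expected to contain a single error.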

According to Lakdawalla, Illumina’s sequencers are finding increasing use in high-resolution genomics and transcriptomics – that is, single-cell ‘omics. For instance, researchers are using Illumina hardware to characterize the complete transcriptome of circulating tumor cells at single cell resolution, and to define cellular heterogeneity in tissues and tumors.

“It will help us answer some profound questions, like how does a single cell create the incredible complexity of a whole organism,” he says.

Also offering long reads is Roche subsidiary 454 Life Sciences, whose GS FLX+ system, coupled with its newest software (release 2.8), produces reads of “up to 1,000 bp and beyond,” according to a company spokesperson. “These extra-long reads are particularly useful for de novo genome and transcriptome sequencing projects.” An update to version 2.9 is expected in the first half of 2013, enabling “extra-long read amplicon sequencing.”

Pacific Biosciences offers the longest currently available reads on its PacBio RS. The company’s new XL chemistry produces reads averaging 5,000 bases apiece, though about 5% of those exceed 10,000 bases. “By year’s end, both of those numbers will double again,” says Steve Turner, the company's chief technology officer.

The PacBio RS consumable has some 75,000 wells (zero-mode waveguides, or ZMWs) that can be simultaneously monitored, each of which can hold a single polymerization reaction. On average, 30% to 50% of the ZMWs are productive, Turner says, so under ideal conditions, a single run produces about 250 megabases of sequence. (A hardware upgrade planned for the second quarter of 2013 will double the number of ZMWs that can be simultaneously monitored to 150,000.)

Life Technologies’ Ion Torrent division launched its Ion Proton in September 2012. A follow-up to the Ion Torrent PGM, the Ion Proton can sequence “a human exome in a few hours,” according to The Scientist magazine, which named the machine one of the top 10 innovations of 2012. (Life Technologies could not be reached for comment.) An upgraded chip, anticipated early in 2013, is “designed to handle an entire human genome, from sample prep to full sequence in 8 hours.”

With hardware like that available, can you imagine what the next year will bring?


[1] B.A. Peters et al., “Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells,” Nature, 487:190–5, 2012.

[2] S. Kumar et al., “PEG-labeled nucleotides and nanopore detection for single molecule DNA sequencing by synthesis,” Scientific Reports, 2:684, DOI:10.1038/srep00684, 2012.

[3] C. Zong et al., “Genome-wide detection of single-nucleotide and copy-number variations of a single human cell,” Science, 338:1622–6, 2012.

[4] C.J. Saunders et al., “Rapid whole-genome sequencing for genetic disease diagnosis in neonatal intensive care units,” Science Translational Medicine, 4:154ra135, 2012.

The image at the top of this page is from Roche 454 Sequencing.
