by Jeffrey M. Perkel
It’s been a year since Biocompare last rounded up the state of the state in the next-gen sequencing (NGS) market, and oh, what a year it’s been.
New low-cost instruments, including Life Technologies’ Ion Torrent PGM and Illumina’s MiSeq, have made next-gen sequencing a possibility for ordinary labs. Pacific Biosciences garnered worldwide attention by decoding the Vibrio cholerae pathogen that rampaged across Haiti. Meanwhile, in the clinic, human genomes were unraveled with nearly boring regularity. As Erika Check Hayden reported in Nature, Richard Gibbs, of the Human Genome Sequencing Center at Baylor College of Medicine, estimated that “roughly 5,000 human genomes will be sequenced this year, with some 30,000 expected next year.”
“There’s been a switch this year from celebrity genomics to clinical genomics,” notes Daniel MacArthur, a United Kingdom-based postdoctoral fellow working in genomics. Last year, it was rocker Ozzy Osbourne who had his genome sequenced. This year, Gibbs and colleagues reported that whole genome sequencing of a pair of 14-year-old fraternal twins with dopa-responsive dystonia identified the molecular cause of their disease, yielding an improved treatment regimen. The Milwaukee Journal Sentinel won a Pulitzer Prize for the moving story of a child with an unexplained gastrointestinal disorder whose medically unique genetic abnormality was finally deciphered by exome sequencing, a $75,000 gamble. And we read in Walter Isaacson’s biography of Apple founder Steve Jobs that following his diagnosis with pancreatic cancer, Jobs had his cancer’s genome “partially sequenced” in an effort to develop “targeted therapies.”
Increasing throughput, decreasing cost
Jobs’ sequencing effort cost about $100,000, Isaacson wrote. Since then, the price has plummeted by more than an order of magnitude. In February, Kevin Davies reported in Bio-IT World that whole genome sequencing service provider Complete Genomics Inc. was sequencing human genomes for $9,500 apiece, with an eight-genome minimum. Now, the company is charging $4,000 per genome and has the capacity to crank out some 750 genomes per month, moving towards 1,000 per month by early 2012, according to chief executive officer Clifford Reid.
At 750-plus genomes per month, Complete Genomics can crank out about 10,000 genomes per year. But, says Reid, the company is “about to roll out” a new instrument that will push the company’s capacity up to 100,000 per year, thanks to more cameras and more efficient use of DNA chips. And the next-generation upgrade beyond that—powered both by better optics and by denser sample spacing—should up the ante another order of magnitude, to one million genomes per year by 2015, Reid says.
Longer reads will enable strand mapping
At the same time, Complete Genomics is preparing a new technology called LFR, or long-fragment reads, which will enable the company to bioinformatically construct massive reads of up to 100,000 bases or so. At that length, Reid says, the company can solve a “lurking problem” in genetics, the ability to identify which strand of DNA is which.
Humans, of course, are diploid; they carry two sets of chromosomes. The question is, if an individual has two mutations in one gene, are they both in one chromosomal copy, or is there one error in each copy? “The doctor absolutely has to have that information to make the correct diagnosis,” Reid says. Existing technologies don’t help, he explains, because they produce reassembled genomes in which it is impossible to assign a particular mutation to a specific chromosome. In such a system, data on “phase” (that is, which genomic segments are physically linked to which) is lost.
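The same-copy versus one-per-copy question can be made concrete with a small sketch. The Python below is illustrative only: the function name `phase_two_variants` and the input format (one set of observed mutations per source DNA molecule, restricted to molecules that cover both mutation sites) are assumptions made for this example, not a description of any vendor's pipeline. It presumes each observation can be attributed to a single molecule, which is exactly the information a reassembled short-read genome lacks.

```python
def phase_two_variants(molecules, var_a, var_b):
    """Classify two mutations in one gene as 'cis' or 'trans'.

    `molecules` is a list of sets; each set holds the mutations seen on one
    source DNA molecule that covers both sites (hypothetical input format).
    'cis'   -> both mutations sit on the same chromosomal copy
    'trans' -> one mutation on each copy
    """
    # Molecules carrying both mutations support the "same copy" explanation.
    together = sum(1 for m in molecules if var_a in m and var_b in m)
    # Molecules carrying exactly one mutation support "one per copy".
    apart = sum(1 for m in molecules if (var_a in m) != (var_b in m))

    if together and not apart:
        return "cis"
    if apart and not together:
        return "trans"
    # No evidence, or conflicting evidence: do not guess.
    return "unresolved"


# One copy carries both mutations, the other copy carries neither.
print(phase_two_variants([{"mutA", "mutB"}, set()], "mutA", "mutB"))
# Each copy carries one mutation.
print(phase_two_variants([{"mutA"}, {"mutB"}], "mutA", "mutB"))
```

Without the per-molecule grouping, both scenarios collapse to the same unphased result: two heterozygous mutations in one gene.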
Complete Genomics does use short reads, about 70 bases at a time, generated with a ligation-based chemistry. But with LFR, it can circumvent that problem by tagging the fragments to mark them as coming from one discrete molecule or another. As a bonus, Reid adds, LFR should cut the company’s error rate from 10⁻⁵ to 10⁻⁷ per base, or from 60,000 errors per diploid genome to 600.
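The error counts Reid quotes follow from multiplying a per-base error rate by the size of a diploid genome. Here is a minimal sketch of that arithmetic, reading the two rates as 10⁻⁵ and 10⁻⁷ and assuming a round figure of about 3 billion bases per haploid human genome. That size is an approximation chosen for illustration, and `expected_errors` is a hypothetical helper name.

```python
# Approximate haploid human genome size in bases (assumed round number).
HAPLOID_GENOME_BASES = 3_000_000_000
# A diploid genome carries two copies of each chromosome.
DIPLOID_GENOME_BASES = 2 * HAPLOID_GENOME_BASES


def expected_errors(per_base_error_rate, genome_bases=DIPLOID_GENOME_BASES):
    """Expected number of miscalled bases across one genome."""
    return per_base_error_rate * genome_bases


before_lfr = round(expected_errors(1e-5))
after_lfr = round(expected_errors(1e-7))
print(before_lfr, after_lfr)  # 60000 600
```

Under these assumptions, a hundredfold drop in the per-base error rate translates directly into a hundredfold drop in errors per genome.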
According to a company spokesperson, Complete Genomics will initiate a pilot LFR service by the end of this year and formally launch it in 2012. “I predict that this will be a big deal,” says Harvard Medical School geneticist George Church (who also sits on the company’s Scientific Advisory Board).
Personalized NGS systems for megabase and gigabase runs
Other new developments in the NGS sector benefit researchers working outside the whole human genome space. For instance, Illumina launched MiSeq, a kind of junior sequencing instrument for situations in which the company’s high-end HiSeq sequencers simply don’t make sense.
“There are all sorts of applications that require just megabases or a gigabase,” says Jordan Stockton, associate director of informatics product marketing at Illumina—applications like bacterial genome sequencing and targeted resequencing of specific human alleles, for instance.
The MiSeq uses the same sequencing-by-synthesis fluorescent chemistry as the HiSeq instruments, meaning researchers can seamlessly migrate from one to the other without learning a whole new set of protocols. Costing $125,000, the MiSeq instrument produces less than two gigabases per run, with up to 150 bp paired-end reads, and the Broad Institute has shown that even single-end 300 base runs are possible. That, says MacArthur, suggests that with a paired-end approach it could be possible to achieve reads of more than 500 bases, including a region of overlap in the middle, “which is pretty impressive.” (The HiSeq specification currently maxes out at 125 bp paired-end reads, largely because of differences in cycling times between the two devices.)
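MacArthur’s estimate can be checked with simple arithmetic: two reads taken from opposite ends of one fragment, overlapping in the middle, cover a span equal to twice the read length minus the overlap. The sketch below assumes 300-base reads at each end; the overlap values are illustrative assumptions, since the article does not state one, and `merged_fragment_length` is a hypothetical helper name.

```python
def merged_fragment_length(read_length, overlap):
    """Span covered by two equal-length paired-end reads that overlap in the middle."""
    if not 0 <= overlap <= read_length:
        raise ValueError("overlap must be between 0 and the read length")
    # Each read contributes read_length bases; the shared bases are counted once.
    return 2 * read_length - overlap


# With 300-base reads, any overlap shorter than 100 bases yields a span above 500.
for overlap in (100, 75, 50):
    print(overlap, merged_fragment_length(300, overlap))
```

The overlap is what lets software stitch the two reads into one continuous sequence, so in practice there is a trade-off between total span and confidence in the join.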
According to Stockton, Illumina recently announced three new tools for the NGS space. The first is the TruSeq Amplicon Kit, a targeted amplification kit to sequence up to 384 targets per sample in each of 96 samples. The second is a library-preparation kit that shortens the protocol from “a day or two” to 90 minutes. And finally, the company launched a cloud-based bioinformatics system called BaseSpace.
The company inaugurated BaseSpace with five workflows, Stockton says: whole bacterial genome sequencing, targeted resequencing, metagenomics, small RNA profiling and library quality control (for those who want to test library quality prior to HiSeq runs). But “the beauty of the cloud is that it’s really easy to roll out new functionality to users,” he says. In fact, Illumina plans to launch a BaseSpace “App Store” so users can fold new algorithms into their pipelines.
Life Technologies entered the low-cost benchtop personal sequencer market in the past year with its $50,000 Ion Torrent PGM, which essentially turns semiconductor chip-based consumables into ultra-high density pH meters. At launch, the PGM used so-called “314” chips, which produced 10 megabases’ worth of 100-base reads in just two hours for about $500. The company then rolled out its “316” chips, which produce some 100 megabases per run. A new “318” chip, currently available to “early access” customers, ups the output again, to nearly a gigabase per run.
That gigabase, explains Mark Gardner, vice president and general manager for advanced genomic systems at Life Technologies, stems both from an increased number of on-chip sensors—there are about 11.3 million on the chip (compared to 1.4 million on the 314), of which at least 5 million will produce reads—and the read length, which has now doubled to 200 bases. At some point in 2012, Gardner says, read length will expand to 400 bases, meaning the 318 chip can theoretically yield upwards of two gigabases per run, all for about $500. “We’ve shown a 525-base pair ‘perfect read’ at ASHG [American Society of Human Genetics' annual meeting],” he says.
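Gardner’s figures reduce to a one-line calculation: run output is roughly the number of read-producing sensors multiplied by the read length. The sketch below uses the numbers quoted above and makes the simplifying assumption that every productive sensor yields exactly one full-length read, so the results are idealized. `run_yield_bases` is a hypothetical helper name.

```python
def run_yield_bases(productive_sensors, read_length):
    """Idealized bases per run: one full-length read per productive sensor."""
    return productive_sensors * read_length


PRODUCTIVE_SENSORS_318 = 5_000_000  # "at least 5 million will produce reads"

# Current 200-base reads, expressed in gigabases.
print(run_yield_bases(PRODUCTIVE_SENSORS_318, 200) / 1e9)  # 1.0
# Projected 400-base reads, expressed in gigabases.
print(run_yield_bases(PRODUCTIVE_SENSORS_318, 400) / 1e9)  # 2.0
```

Doubling the read length on the same chip doubles the output, which is why the projected move to 400-base reads takes the 318 from about one gigabase to about two.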
Life Technologies’ 5500 series Genetic Analysis Systems also are slated for an upgrade in 2012, says Gardner. The systems will be able to run Wildfire chemistry, which offloads template preparation (including the long and costly emulsion PCR steps) from the lab bench to the instrument. This reduces the cost of sample preparation, increases read density and improves consistency, Gardner says. By packing sample spots more tightly and enabling reads on both the top and bottom of the plates, Wildfire increases the number of imaged features up to five times, Gardner says, which is enough for “up to 10 genomes per run.” (The 5500xl Genetic Analysis System currently produces about two human genomes per run.)
454 Life Sciences
454 Life Sciences (part of Roche Life Sciences) has updated its sequencing platform as well. The 454 GS FLX+, an upgrade to the GS FLX, combined with the company’s new GS FLX Titanium Sequencing Kit XL+, enables read lengths of up to 1,000 bases. “That allows you to get high-quality, high-throughput, Sanger-length sequencing from the system,” says senior marketing manager Mike Catalano, who adds that runs can yield up to 700 megabases apiece.
454 also has launched primer sets for targeted sequencing. These include kits for amplifying the human HLA regions and an assay for leukemia-associated genes (to be launched in 2012).
The past year also saw action in so-called third-generation sequencing technologies. For one thing, there was the official launch of Pacific Biosciences’ PacBio RS, a single-molecule fluorescent sequencing technology that played roles in untangling the Haitian cholera epidemic and a German E. coli outbreak.
According to Pacific Biosciences’ chief technology officer, Stephen Turner, the PacBio RS’ read length—about 2,500 to 3,000 bases, on average, with some as long as 22,000 bases—makes the instrument especially adept at solving genome structure, albeit with a per-read accuracy (85% to 89% or so) far lower than that of competing technologies.
Initially, says Turner, PacBio is focusing on a small set of familiar applications, including de novo bacterial sequencing, pathogen sequencing, viral sequencing and targeted gene resequencing. But, he adds, the technology also enables some novel applications, including direct identification of DNA modifications like 5-methylcytosine and 5-hydroxymethylcytosine based on how they affect the kinetics of the polymerase used to read the sequence.
“These chemical modifications are like speed bumps on the highway, and as the polymerase drives over them, it slows down, or sometimes speeds up, depending on the particular context. And those patterns produce a characteristic signature that can allow us to detect them,” Turner says.
Oxford Nanopore, which is developing two nanopore-based sequencing approaches (“strand” and “exonuclease,” for the latter of which the company has a commercialization agreement with Illumina), declined to comment on the status of its ongoing projects. It pointed instead to online resources indicating that its GridION system will consist of intercommunicating networked sequencer appliances (nodes) coupled with consumable cartridges. Each GridION node can function independently or communicate through the network with other nodes in a larger sequencing effort, an architecture designed to deliver versatile workflows.
However, given that the company on October 20, 2011, opened a job listing for “Early Access Collaborations Managers, DNA Sequencing,” whose responsibilities include “managing technology development collaborations with key customers at leading genomics institutions,” Oxford could be emerging from stealth mode and may initiate early-access programs in the near future.
Another nanopore-focused company, Genia, also is taking its first steps towards commercialization. According to chief executive officer Stefan Roever, Genia is developing a nanopore-based system that uses biological pores and integrated circuits—the same technologies driving the computer industry. The system, Roever explains, “is taking state-of-the-art electronics and applying it to a problem that’s on the very hairy edge of what can be done in signal processing,” namely, capturing the incredibly weak signals, on the order of thousands of electrons, that arise as single DNA bases slip through a nanoscale hole.
At the heart of Genia’s system are active chips, Roever says, which can read and control each pore independently. By “powering up” those chips, researchers can dynamically assemble a nanopore array, load it with DNA and then sequence, rewind and eject those molecules on demand. At the moment, the technology is in development, though tests suggest it can distinguish all four bases and control DNA movement, Roever says. “We expect an alpha version of the chip with several hundred sensors in the first quarter of 2012,” he adds.
How that chip competes with Oxford and other sequencing players remains to be seen. One player that may not be in competition by then, though, is third-gen, single-molecule sequencing firm Helicos BioSciences. According to GenomeWeb Daily News, Helicos announced on November 14 increased revenues for the third quarter of 2011 “but cautioned that absent a cash infusion it could be forced to close its doors by the end of the year.” 
 Davies, K, “Break out: Pacific Biosciences team identifies Asian origin for Haitian cholera bug,” Bio-IT World, December 9, 2010.
 Hayden, EC, “Secrets of the human genome disclosed,” Nature, 478:17, 2011.
 Bainbridge, MN, “Whole-genome sequencing for optimized patient management,” Sci Transl Med, 3:87re3, 2011.
 Johnson, M, and Gallagher, K, “One in a billion: A boy’s life, a medical mystery,” Journal Sentinel, December 18, 21, 25, 2010.
 Isaacson, W, Steve Jobs, New York: Simon & Schuster, 2011.
 Davies, K, “The $10,000 genome and counting: The complete picture for 2011,” Bio-IT World, February 7, 2011.
 Hadfield, J, “Illumina generates 300bp reads on MiSeq at Broad,” November 2, 2011.
 Davies, K, “Genia’s nanopore/microchip technology gains Life Technologies’ support,” Bio-IT World, October 21, 2011.
 “Helicos Q3 revenues rise 14 percent, but raises questions about its survival past December,” GenomeWeb Daily News, November 15, 2011.
The image at the top of this page is Life Technologies' Ion Semiconductor Sequencing Chip.