Prior to the mid-1970s, researchers knew what DNA looked like, but the “message being expressed” wasn’t clear. That all changed in 1977, when Frederick Sanger developed the sequencing method that bears his name.

For nearly 30 years, Sanger’s dideoxy-sequencing method—so named because it uses modified DNA nucleotides that lack the 3’-hydroxyl group required to extend DNA chains—ruled the genetics landscape. Commercialized, automated and parallelized, the technology drove the sequencing of a raft of model organisms, from bacteria to yeast to mouse to man.

In its earliest incarnations, the Sanger method used radioisotopes and polyacrylamide gel electrophoresis to read out DNA sequence.

By the time of the Human Genome Project, 32P had given way to fluorescent dyes, and arrays of hair-thin capillaries had replaced cumbersome acrylamide gels. Core facilities could churn out sequence at the rate of 20 raw bases per second per instrument, by one estimate [1].

Still, the technology suffered significant limitations, especially vis-à-vis scale-up. DNA segments had to be cloned before they could be sequenced, requiring the creation and maintenance of DNA libraries. The technology was still expensive, and though throughput had steadily improved over the years, it remained too slow to practically apply to large numbers of multi-gigabase genomes.

Then, in September 2005, the genetics world was upended once again by a pair of papers that fundamentally redefined the nature of DNA sequencing. In the journal Science, Harvard University geneticist George Church and colleagues described a “cyclic array” method based on multiple rounds of DNA ligation and denaturation that could produce 30 Mb in a single 60-hour run [1]. In a Nature article, 454 Life Sciences founder Jonathan Rothberg and colleagues described a “sequencing by synthesis” approach with which they generated about 34 million bases in just four hours [2]. Though the two methods differed, both relied on the idea of repeatedly interrogating multiple DNA templates arrayed on a solid surface, reading out DNA sequence by the resulting signal patterns.

Since then, “next-generation DNA sequencing” (NGS) has exploded, with its accessibility skyrocketing and prices plummeting.

Here we review the various NGS technologies used today by most researchers as well as the existing and emerging applications that are driving their use.

*Editor's note: Sample preparation prior to NGS analysis can strongly influence the experimental plan and results. Due to space constraints for this article, we will not discuss the reagents used for sample preparation. We strongly encourage you to visit the Biocompare product directory to learn about the latest reagents available for NGS researchers.*

Sequencing technologies

Though 454 was the first to market with a commercial sequencer, its technology (which was subsequently sold to Roche) today represents just a fraction of the sequencing market; Roche announced in 2013 it would be closing the 454 sequencing business.

Current NGS systems adopt one of a few basic approaches. Illumina and Ion Torrent have commercialized sequencing-by-synthesis variants on the “cyclic array” strategy, in which pools of identical molecules (sometimes called “polonies”) at a fixed location on a solid surface are repeatedly interrogated by a DNA polymerase to determine the order of bases added. Complete Genomics’ instruments use sequencing by ligation, and Pacific Biosciences and Oxford Nanopore Technologies have commercialized single-molecule methods that read out long stretches of sequence in real time. (Thermo Fisher Scientific’s SOLiD sequencers also were based on the sequencing-by-ligation approach, but they were discontinued as of May 1, 2016.)

Complete Genomics

Complete Genomics, a Mountain View, California-based human-genome sequencing service provider that was acquired by the Chinese sequencing giant BGI in 2013, sequences DNA using a form of sequencing by ligation based on tangled concatemeric structures called “DNA nanoballs.”

Complete Genomics’ sequencing-by-ligation strategy, called “combinatorial probe-anchor ligation” (cPAL), is a cyclic array approach in which the arrayed nanoballs, each containing four known “anchor” sequences, are interrogated using successive rounds of oligonucleotide annealing, ligation to fluorescently labeled oligonucleotides and detection. An oligonucleotide complementary to one of the anchor points is annealed to the DNA, followed by a pool of probe molecules whose sequence is random at every position save one. That known base is coded to the fluorescent tag used to label the molecule, and ligation of this molecule to the anchor oligonucleotide reveals the identity of the specified base. The DNA is then denatured to remove the ligated probe, and the process repeats. Paired-end reads of up to 100 bases can be assembled in this way [3].
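Conceptually, each cPAL cycle yields one (position, color) call relative to an anchor, and assembling the read is just a matter of ordering those calls and translating colors back to bases. A toy decoder makes the idea concrete (the color-to-base mapping here is made up for illustration and is not Complete Genomics' actual encoding):

```python
# Hypothetical color-to-base code; the real cPAL encoding is proprietary.
COLOR_TO_BASE = {"green": "A", "blue": "G", "red": "C", "yellow": "T"}

def decode_cpal(calls):
    """Decode a read from cPAL-style cycle calls.

    calls: list of (position_from_anchor, color) tuples, one per
    ligation cycle. Cycles may interrogate positions in any order,
    so we sort by position before translating colors to bases.
    """
    ordered = sorted(calls)
    return "".join(COLOR_TO_BASE[color] for _, color in ordered)

# Four cycles querying positions 0-3 downstream of one anchor:
calls = [(2, "red"), (0, "green"), (1, "blue"), (3, "yellow")]
print(decode_cpal(calls))  # AGCT
```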

In October 2015, BGI announced a desktop sequencer called the BGISEQ-500, available only in China, based upon a variation of this sequencing strategy. According to Bio-IT World, BGI “clearly intends for the sequencer to compete with Illumina’s line of NextSeq instruments, as a benchtop machine with high-throughput capabilities.”

In June 2015, Complete Genomics announced another instrument based on its sequencing technology, the Revolocity, targeting the high-throughput human-genome sequencing market currently dominated by Illumina’s HiSeq X systems. Three Revolocity systems were sold for $12 million each, but they have not yet been delivered. In November 2015, however, the company announced it was putting Revolocity on hold and refocusing on assays for the BGISEQ-500 while also downsizing its California operations, Bio-IT World reported.

Illumina

Illumina sequencers use a sequencing-by-synthesis approach in which template polonies on a flow-cell surface are incubated with DNA polymerase and all four nucleotides, each tagged with a different fluorescent color. The nucleotides contain a chain terminator, as in Sanger sequencing, but these terminators can be removed following imaging, allowing successive rounds of base addition to take place. The number of cycles determines the length of the read, and the instrument monitors the emission wavelength in conjunction with the signal intensity to identify each base.

Illumina instruments generate large numbers of very accurate, relatively short reads, though the precise specifications vary from instrument to instrument and from experiment to experiment. This makes them especially useful for such applications as transcriptome analysis and rare variant calling, in which exceptionally deep sequencing is required.

The product line spans a continuum from the entry-level MiniSeq to the ultra-high-end HiSeq X Ten. The MiniSeq can produce 7.5 Gb’s worth of 2 x 150 bp paired-end reads in 24 hours (about 50 million reads), and the HiSeq 4000 can produce 1.5 Tb (5 billion reads) in 3.5 days. The HiSeq X Ten, on the other hand, can produce 1.8 Tb in three days for each of 10 arrayed sequencers—some 18,000 human genomes’ worth of sequences per year at about $1,000 apiece. The company’s longest reads are on the popular benchtop-sized MiSeq, which can generate up to 15 Gb’s worth of 2 x 300 bp paired-end reads in 56 hours.
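The output figures above follow directly from read count times read length, so they can be sanity-checked with simple arithmetic. In the sketch below, reads are counted individually (each mate of a 2 x 150 bp pair is one 150-base read); the MiSeq read count is an assumption chosen for illustration, consistent with its stated 15 Gb output, not a figure from the article:

```python
def run_output_gb(num_reads, read_length_bp):
    """Total bases per sequencing run, in gigabases.

    num_reads counts each read individually, so a paired-end
    2 x 150 bp run contributes two 150-base reads per fragment.
    """
    return num_reads * read_length_bp / 1e9

# MiniSeq: ~50 million reads at 150 bases (2 x 150 bp paired-end)
print(run_output_gb(50_000_000, 150))  # 7.5

# MiSeq: assuming ~50 million reads at 300 bases (2 x 300 bp paired-end)
print(run_output_gb(50_000_000, 300))  # 15.0
```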

Ion Torrent

Ion Torrent was founded by Jonathan Rothberg, of 454 Life Sciences, and the two chemistries have much in common. 454’s chemistry exploits the pyrophosphate that is released as the polymerase in the reaction adds a nucleotide to a growing chain. By converting that molecule back into ATP, the system drives a luciferase reaction whose light output is proportional to the number of nucleotides added. (The system reads the resulting sequence by feeding in nucleotides one at a time—e.g., first A, then C, then G, then T—and recording the signal each time.)

Also released during polymerase nucleotide addition is a proton, which is what the Ion Torrent system detects. Indeed, according to the company, Ion Torrent is “essentially the world's smallest solid-state pH meter.” As with 454’s chemistry, these nucleotides are added one by one, with readings taken at each step.
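The flow-based readout shared by the 454 and Ion Torrent chemistries can be sketched in a few lines. The toy simulation below assumes a hypothetical TACG flow order and an idealized, noise-free signal; it also shows why homopolymer runs are challenging for these platforms, since a run of identical bases must be inferred from signal magnitude in a single flow:

```python
def simulate_flowgram(read_seq, flow_order="TACG", n_flows=12):
    """Toy model of flow-based sequencing (454 / Ion Torrent style).

    Nucleotides are flowed one at a time; each flow's signal is
    proportional to how many bases of that type are incorporated
    consecutively. read_seq is the sequence being synthesized.
    """
    pos, signals = 0, []
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        while pos < len(read_seq) and read_seq[pos] == base:
            count += 1
            pos += 1
        signals.append(count)  # 0 = no incorporation; 2 = homopolymer of two
    return signals

# "TT" yields a single double-strength signal in the first T flow:
print(simulate_flowgram("TTACGG"))  # [2, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]
```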

Now part of Thermo Fisher Scientific, Ion Torrent offers four sequencing systems: the Ion PGM™, Ion Proton™, Ion S5™ and Ion S5 XL. Each system can produce a range of outputs per run, depending upon the consumable (“chip”) used. But in general, the PGM can produce up to 5 million reads, averaging 200 bases (2 Gb); the Proton can produce up to 80 million reads (10 Gb); and the S5 lies in the middle (from 5 million reads, totaling 0.6 Gb, to 80 million reads, totaling 15 Gb).

Oxford Nanopore Technologies

Oxford Nanopore’s sequencers are based on “nanopore sequencing.” Individual DNA molecules are fed through nanometer-sized holes (or pores) by the action of a molecular motor. As each base occludes the pore, it produces a characteristic disruption in the current flowing across it, generating an electrical signal that corresponds to the sequence.

As with PacBio’s instruments, the technology can produce exceptionally long reads, though length is a function of sample preparation. In a 2015 study, the authors generated reads as long as 230,000 bases, though most averaged about 10 kb, with a median of 60,000 reads per run [4]. The technology also can be used to sequence RNA directly—including detecting modified bases—without a cDNA-synthesis step [10].

Oxford Nanopore has developed two systems, the USB key-sized MinION, with a single flow cell, and the higher-throughput, 48-cell PromethION (currently available only through an early-access program). In July of this year, the National Aeronautics and Space Administration (NASA) sent a MinION sequencer to the International Space Station, where it will be used to monitor the quality of drinking water and identify microbes. A newer, smartphone-coupled design called the SmidgION was announced earlier this year and is currently in development.

Pacific Biosciences

Pacific Biosciences’ Single Molecule Real Time (SMRT) sequencing strategy immobilizes individual DNA polymerases at the bottom of attoliter-scale wells, called zero-mode waveguides (ZMWs). Each polymerase copies a single DNA template molecule using a set of fluorescently tagged nucleotide building blocks, with each base labeled a different color. By illuminating only the bottom of the well, the system can read the sequence of the new DNA strand based on the pattern of fluorescent signals that it produces in real time. Certain epigenetic variants, such as methylated bases, can be directly detected during the process based on the kinetics of nucleotide incorporation, as opposed to requiring chemical treatment (e.g., with bisulfite).

PacBio offers two sequencing systems, the PacBio RS II and the newly released PacBio Sequel, which is smaller, less expensive and higher-throughput than its predecessor. Reads on both systems average 10,000 bases, though some can be as long as 40 to 50 kb. The RS II generates some 55,000 reads, and the Sequel produces about six times more. Run times range from 30 minutes to six hours. Thus, where Illumina generates large numbers of short reads, PacBio generates lower numbers of very long reads. And, with “little to no GC bias,” according to chief scientific officer Jonas Korlach, the technology can be used to read traditionally problematic sequences, permitting such applications as microbial genome and epigenome sequencing, de novo genome assembly and full-length 16S metagenome sequencing.

“Synthetic” long-read sequencing

Long-read data, such as those produced by Oxford Nanopore and PacBio, offer several advantages. They are easier to assemble than short-read sequences and can reveal structural variations that short reads might miss. The data also can be more easily “phased”—that is, assembled such that it is possible to distinguish the maternal and paternal chromosomes.

Given these benefits, researchers also have developed strategies to extract long-read information from short-read data. According to a recent review, two such strategies currently are available (from Illumina and 10X Genomics), and both employ a similar principle: the isolation of relatively long DNA fragments, which are then barcoded such that it later is possible to work out which fragments were physically linked on the same strand of DNA [3]. Earlier this year, 10X Genomics unveiled the Chromium™ system, based upon the company's GemCode™ technology, which uses very small amounts of starting material (as little as 1 ng) to prepare sequencing libraries partitioned via unique barcodes. The platform also provides software for detailed analysis and visualization of the products. The Chromium system is compatible with researchers’ existing sequencing systems and workflows, enabling scientists to identify single-nucleotide and structural variants, as well as dynamic gene expression in individual cells.
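At its core, the linked-read principle is a grouping problem: short reads carry a partition barcode, and collecting reads by barcode recovers which of them originated from the same long DNA fragment, so they can be assembled or phased together downstream. A minimal sketch (the barcode names and read sequences are invented, and real pipelines must also handle barcode errors and multiple fragments per partition):

```python
from collections import defaultdict

def group_by_barcode(tagged_reads):
    """Group short reads by their partition barcode.

    tagged_reads: iterable of (barcode, read_sequence) pairs.
    Returns a dict mapping each barcode to its list of reads,
    i.e., the reads presumed to derive from one long fragment.
    """
    groups = defaultdict(list)
    for barcode, read in tagged_reads:
        groups[barcode].append(read)
    return dict(groups)

reads = [("BC01", "ACGT"), ("BC02", "TTGA"), ("BC01", "GGCA")]
print(group_by_barcode(reads))
# {'BC01': ['ACGT', 'GGCA'], 'BC02': ['TTGA']}
```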

Existing applications

NGS can be used for everything from de novo genome sequencing and epigenetics to transcriptomics and microbiology. 

It can be applied to uniform cell populations or individual cells, and even to complex microbial communities. Here, we review a few of the more popular uses for the technology.

Monitoring human variation

One popular application of human-genome sequencing is “resequencing,” a strategy in which human genomes or exomes are sequenced and the reads aligned to a reference genome, to identify benign and possibly pathogenic variants. Typically this is performed with short-read data. In one recent blockbuster example, a research team led by Daniel MacArthur, co-director of medical and population genetics at the Broad Institute of Harvard and MIT, reported the exome sequences of more than 60,000 human individuals, identifying more than 7.4 million variants [5].
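In its simplest form, resequencing amounts to comparing each aligned read against the reference, position by position. The toy single-nucleotide-variant caller below illustrates that comparison only; real pipelines use aligners such as BWA and dedicated variant callers, and must account for base quality, sequencing errors, indels and diploid genotypes:

```python
def call_snvs(reference, read, offset):
    """Report single-base mismatches between a read and the reference.

    reference: reference sequence string.
    read: read sequence, already aligned at `offset` (0-based)
    within the reference, with no indels.
    Returns (1-based position, reference base, alternate base) tuples.
    """
    variants = []
    for i, base in enumerate(read):
        ref_base = reference[offset + i]
        if base != ref_base:
            variants.append((offset + i + 1, ref_base, base))
    return variants

ref = "ACGTACGTAC"
read = "TACCT"  # aligned starting at reference position 4 (offset 3)
print(call_snvs(ref, read, 3))  # [(7, 'G', 'C')]
```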

Clinical sequencing

Clinical researchers are focused on identifying genetic changes that are potentially linked to disease, with the goal of obtaining better diagnoses for patients. This is where the power of NGS can excel. A number of laboratories now offer whole-genome or whole-exome sequencing services, as well as more targeted gene-panel sequencing.

But it’s likely nobody does whole-genome sequencing faster than Stephen Kingsmore, president and CEO of the Rady Children’s Institute for Genomic Medicine. Kingsmore holds a Guinness World Records title for “fastest genetic diagnosis”: 26 hours. According to Kingsmore, that strategy relies on a modified HiSeq 2500 whose high-speed runs have been pared from 26 hours to 18 hours. The protocol allows six hours for DNA sample preparation, 15 minutes each for sequence alignment and variant calling and 1.5 hours for data interpretation [6].

As of today, Kingsmore’s team has sequenced about 100 genomes on that fast protocol, with a diagnostic yield of about 57%, he says. But not every patient requires such speedy delivery; ultra-high-speed sequencing and diagnosis are only required for neonates in emergency situations, where time is of the essence. For other patients—those who have undiagnosed neurodevelopmental or cardiac difficulties, but who are not in immediate clinical danger, for instance—the team can afford a slower turnaround time. The researchers extract everything they can from every sequenced base. “Each genome is expensive to generate,” Kingsmore says. “It makes sense to get as much information as possible.”

Transcriptome sequencing

Another popular application uses NGS to monitor RNA abundance and structure.

Though generally dubbed RNA-seq, transcriptome analysis involves a number of distinct approaches. These include differential gene-expression analysis, in which large numbers of short reads are compared to identify differences in transcript abundance between samples; transcript structure analysis, in which longer reads are used to work out the precise exon and splice site usage in different RNAs; and de novo transcriptome assembly, whereby RNA reads are assembled from scratch into whole transcripts, rather than aligned to a reference genome. 
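As a minimal sketch of the first of these approaches, differential gene-expression analysis boils down to comparing per-gene read counts between samples after normalizing for library size. The gene names and counts below are invented, and real analyses use dedicated tools such as DESeq2 or edgeR, with proper normalization, replicates and statistical testing:

```python
import math

def log2_fold_changes(counts_a, counts_b):
    """Toy per-gene log2 fold change between two samples.

    counts_a, counts_b: dicts mapping gene name to raw read count.
    Counts are scaled by total library size, with a pseudocount of 1
    to avoid taking the log of zero.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    result = {}
    for gene in counts_a:
        a = (counts_a[gene] + 1) / total_a
        b = (counts_b[gene] + 1) / total_b
        result[gene] = math.log2(b / a)
    return result

a = {"geneX": 100, "geneY": 400}
b = {"geneX": 400, "geneY": 100}
fc = log2_fold_changes(a, b)
print(fc)  # geneX up ~2 log2 units in sample b; geneY down by the same
```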

Epigenetics

Researchers also can use NGS to highlight multiple epigenetic features—changes in an organism caused by modification of gene expression as opposed to alterations of the genetic code. Examples include chromatin immunoprecipitation (ChIP)-seq, in which protein-DNA complexes are crosslinked, immunoprecipitated using specific antibodies and then sequenced to identify regions of protein-DNA contact; Hi-C, a strategy for mapping 3D chromosome architecture; bisulfite sequencing, for mapping DNA methylation; RIP-Seq, which identifies the transcripts associated with specific RNA-binding proteins; and DamID-seq, which maps DNA contact with the nuclear lamina.

Single-cell and metagenome sequencing

Though most DNA sequencing studies represent cell populations, researchers also are working to sequence the DNA and RNA of individual cells to gain a window into cell-to-cell variation. One area where this is proving particularly valuable is microbiology, where the vast majority of microbes are as yet uncultivable.

According to Tanja Woyke, microbial genomics program lead at the U.S. Department of Energy (DOE) Joint Genome Institute (JGI), single-cell genomics works hand in hand with the sequencing approach known as metagenomics to reveal the metabolic capabilities of microbes and their communities. Metagenomics, she explains, reveals the genetic makeup of the community overall. But it struggles with assembling the resulting DNA fragments into completed genomes without creating mixtures of different strains, making it difficult to determine which organisms provide which metabolic capabilities. Single-cell sequencing, on the other hand, “is tedious, expensive and biased”—not every cell can be sequenced or amplified, for instance—but it does at least provide the knowledge that every piece of DNA arose from a single cell. Thus, single-cell sequencing also enables researchers to associate viruses with their hosts, and plasmids with chromosomes.

Emerging applications

The McDonnell Genome Institute, at Washington University in St. Louis, has in its collection one Illumina HiSeq X Ten, one HiSeq 4000, four HiSeq 2500s, four MiSeqs, two PacBio RS IIs and an Ion Torrent PGM. And those instruments aren’t gathering dust. “As much firepower as we have in theory, per week I think we’re probably at maybe 50 to 75% total capacity,” says Elaine Mardis, the Robert E. and Louise F. Dunn Distinguished Professor of Medicine, who co-directs the institute.

For the most part, the McDonnell’s sequencers are dedicated to three tasks: whole human genomes, whole exomes and transcriptomes, with the latter two largely related to cancer genomics research. Yet new applications are constantly emerging, Mardis says. “It’s overwhelming how much is going on in this space.”

One particularly exciting application area, she adds, is cancer immunogenomics: using exome sequences from cancer patients to identify tumor-specific alterations that, when considered alongside the patients’ human leukocyte antigen (HLA) haplotypes, indicate the most immune-stimulatory peptides in the cancer proteome. This information is then used to develop personalized, tumor-specific vaccines. In one recent study, for instance, Mardis and colleagues sequenced the exomes of three patients with advanced melanoma, identifying peptides likely to bind and activate the patient’s dendritic cells and thereby enhance the immune system’s ability to attack the tumor. After vaccinating each patient with his or her personalized dendritic-cell vaccine, the team monitored the patients to identify immune-stimulatory peptides [7].

Though Mardis’ team ran its sequencing experiments using Illumina technology, HLA typing is also a key emerging application of Pacific Biosciences’ long-read technology, according to chief scientific officer Jonas Korlach. The HLA genes, he explains, are a family of long, highly polymorphic genes several kilobases in length. Accurate typing and phasing of these genes is required by many research and clinical applications, including organ transplantation. But traditionally, he says, researchers have struggled to accurately and quickly type the HLA locus. In 2015, researchers at PacBio and in the United Kingdom demonstrated the use of PacBio sequencing to deconvolve the HLA locus [8].

Indeed, many researchers see exciting possibilities in the growing availability of long-read sequences. Kingsmore, for instance, is excited about the opportunity to easily detect structural variants in the clinic—something that traditionally has been performed using microarrays—and to combine that with the SNP-detecting power of Illumina-read sequencing.

Another trend Kingsmore sees is the growing use of sequence data in the clinic to answer questions beyond mere diagnosis. For instance, it should be possible to extract pharmacogenomic information (which can inform drug selection and dosing based on the sequence of the cytochrome p450 genes) from whole-genome or whole-exome data. Though such questions can be answered in a laboratory setting already, he says, integrating those data into the healthcare system will involve educating physicians and genetic counselors and providing them with tools to make sense of the data.

At the JGI, which accepts applications from scientists to leverage the facility’s sequencing capabilities for both metagenomic and single-cell applications through its Community Science Program, Woyke sees new applications in the increasing integration of short- and long-read sequence data, especially vis-à-vis metagenomics data. At the moment, she explains, metagenomics data are easier to collect but harder to interpret, as it is exceptionally difficult to assemble metagenomics fragments into complete bacterial genomes. But long-read sequences simplify that problem. “I would give single-cell genomics another five years of popularity, at least in the microbial world,” she says. “After that, who knows?”

According to Harvard University geneticist and sequencing pioneer George Church, researchers today are becoming increasingly clever in their use of DNA sequencing, finding ways to use sequencing to answer biological questions by turning them into changes in DNA.

For instance, Church’s lab recently developed a strategy to create evolving genetic barcodes in cells to track their lineages during division in situ. The readout of such an assay: in situ RNA sequencing [9].

NGS, Church says, is a bit like automation. “It used to be that when you wanted to automate something, you would buy a general-purpose robot, study how a person does [a task] and try to turn it into a set of commands,” he explains. “Now, you try to turn it into a sequencing experiment, because a sequencer is a highly multiplexed, automated instrument in its own right.”

Looking to the future with NGS

With the advent of exciting tools and reagents and improved instruments, NGS continues to revolutionize how researchers explore the genome.

The most subtle nuances, even at the single-cell level, are being dissected with NGS.

As with any evolving technology, there are many challenges that NGS researchers and tool providers face—such as experimental costs, data storage and analysis and the technology’s gradual adoption in key research sectors such as the clinic.

Cost per experimental run is a leading factor, affecting all parties involved in genome research. Although experimental costs are decreasing, careful experimental planning is needed to maximize the return on investment in an NGS run.

Storing the petabytes’ worth of information generated, and moreover analyzing and interpreting those data, can be a major hurdle. Cloud storage has become the repository solution of choice, corralling the data generated while enabling access and sharing among research labs. Tool providers, as well as third-party software makers, continue to work with researchers to create solutions that help them gain a better grasp of research results.

Lastly, clinical adoption of NGS will require regulated and approved processes and methodologies that enable clinical researchers to interpret results quickly, reliably and accurately, which will in turn enable accurate patient diagnoses.

Stay tuned for more exciting NGS breakthroughs from both the research and tool-provider communities, as they work together to advance a more comprehensive understanding of the diverse array of genomes.

Visit the Biocompare product directory to explore and learn about the latest NGS technologies and reagents available to scientists performing genomic research and characterization.


References

[1] Shendure, J, et al., “Accurate multiplex polony sequencing of an evolved bacterial genome,” Science, 309:1728-32, 2005. [PMID: 16081699]

[2] Margulies, M, et al., “Genome sequencing in microfabricated high-density picolitre reactors,” Nature, 437:376-80, 2005. [PMID: 16056220]

[3] Goodwin, S, McPherson, JD, and McCombie, WR, “Coming of age: Ten years of next-generation sequencing technologies,” Nat Rev Genetics, 17:333-51, 2016. [PMID: 27184599]

[4] Ip, CLC, et al., “MinION Analysis and Reference Consortium: Phase 1 data release and analysis,” F1000Research, 4:1075, 2015. [PMID: 26834992]

[5] Lek, M, et al., “Analysis of protein-coding genetic variation in 60,706 humans,” Nature, 536:285-91, 2016. [PMID: 27535533]

[6] Miller, NA, et al., “A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases,” Genome Medicine, 7:100, 2015. [PMID: 26419432]

[7] Carreno, BM, et al., “A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells,” Science, 348:803-8, 2015. [PMID: 25837513]

[8] Mayor, NP, et al., “HLA typing for the next generation,” PLOS ONE, 10:e0127153, 2015. [PMID: 26018555]

[9] Kalhor, R, et al., “Rapidly evolving homing CRISPR barcodes,” BioRxiv, July 26, 2016.

[10] Garalde, DR, et al., “Highly parallel direct RNA sequencing on an array of nanopores,” BioRxiv, Aug. 12, 2016.