The focused efforts by scientists to obtain a better understanding of the genome ultimately boil down to DNA and understanding what it has to say. As genome sequencing continues to become easier, faster and less costly, researchers must still choose between sequencing specific regions and sequencing the entire genome to get their answers.

Next-generation sequencing (NGS), in particular, works faster, provides longer reads and delivers potentially more information than many other molecular technologies.

Nonetheless, without sequencing the entire genome, known as whole genome sequencing (WGS), important genes might not be included. If not all of the genes are sequenced, ones in low abundance or with rare mutations might be completely missed, and some of these could be involved in deadly or debilitating diseases. Consequently, some researchers select WGS to get a complete picture of the genome.

According to Jonas Korlach, chief scientific officer at Pacific Biosciences, the “development of assembly algorithms for high-quality genomes that capture the diploid nature of the genome, that is, representing the genome by the two alleles,” is one of the most exciting recent advances in WGS. He adds, “Previous whole genome sequencing approaches have ignored the fact that all higher organisms have at least two copies of the genome.”

As technical advances continue, the applications for WGS keep broadening. For example, Fiona Stewart, portfolio manager, next-generation sequencing, at New England Biolabs, says that one of the most exciting recent advances is “the ability to obtain high-quality sequence data and improved mutation detection sensitivity from small amounts of low-quality, FFPE DNA samples.” FFPE refers to formalin-fixed, paraffin-embedded tissue.

Overall, WGS can now take on new sample types and return deeper data, revealing more information about genes than was possible in the past.

Getting SMRT

To capture the diploid information in a genome, Jason (Chen-Shan) Chin, senior director of bioinformatics at Pacific Biosciences, and colleagues from the U.S. Department of Energy’s Joint Genome Institute and a collection of universities used Pacific Biosciences’ open-source FALCON and FALCON-Unzip algorithms to assemble Single Molecule Real-Time (SMRT) sequencing data [1].

Chin and his colleagues wrote: “We demonstrate the quality of this approach by assembling new reference sequences for three heterozygous samples, including an F1 hybrid of the model species Arabidopsis thaliana, the widely cultivated V. vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata that have challenged short-read assembly approaches. The FALCON-based assemblies were substantially more contiguous and complete than alternate short or long-read approaches.” Moreover, this technique allowed the scientists to study “haplotype structures and heterozygosities between the homologous chromosomes, including identifying widespread heterozygous structural variations within the coding sequences.”

As Korlach points out, the SMRT technology provides “long sequence reads, lack of GC-bias and high consensus accuracy.” He adds, “All three characteristics are needed in a sequencing technology to allow for this new gold standard in high-quality, phased de novo genome assemblies.”

In addition, this technology works in a straightforward way. Korlach explains: “You apply SMRT sequencing to your sample of interest, run FALCON-Unzip and produce a high-quality, phased assembly that has high accuracy, completeness and contiguity.”

Combining company strengths

In 2016, Pacific Biosciences and Dovetail Genomics formed a collaboration. In particular, Dovetail added Pacific Biosciences’ SMRT to its genome assembly services.

According to Brandon Rice, chief operating officer at Dovetail Genomics, “Genome assemblies constructed with short-read NGS sequencing alone are often fragmented, leaving highly repetitive regions of the genome unresolved—gaps in the assembly.” He adds, “PacBio’s SMRT technology allows sequencing across repeat regions of up to about 10 to 15 kilobases, providing base-level resolution of these regions, which in some instances can be gene rich and key to the organism’s biology—for example, repressor genes in plants.”

This added technology provides more options for Dovetail Genomics’ customers. “With our new service offering,” says Rice, “Dovetail will construct highly contiguous de novo assemblies produced with PacBio’s SMRT technology and then further scaffold the assemblies with our proprietary Chicago libraries and HiRise software.” He adds, “Together, these technologies offer customers the best possible genome assemblies available today, with contigs often exceeding 5 million base pairs and super-scaffolds often exceeding 30 million base pairs.”
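Contiguity figures like these are commonly summarized with a statistic called N50, which the article does not name. As a rough illustration in Python, the sketch below computes N50 from a list of contig lengths. The function name and the sample lengths are assumptions made up for this example, not Dovetail or PacBio data.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in base pairs (invented for illustration).
fragmented = [1_000_000] * 10
contiguous = [6_000_000, 3_000_000, 1_000_000]

print(n50(fragmented))
print(n50(contiguous))
```

Both toy assemblies total 10 million base pairs, yet the second has an N50 of 6 million versus 1 million for the first, which is the kind of difference a more contiguous assembly shows.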

Get more from less

Many samples for sequencing can be hard to come by, which means scientists would like to use as little as possible to preserve this precious material.

In other cases, scientists want to sequence very specific samples. “I would say single-cell sequencing is all the rage, but low-input, linked-read whole genome sequencing is also pretty nifty,” says Shana McDevitt, facility director at the QB3 Vincent J. Coates Genomics Sequencing Laboratory (GSL) at the University of California, Berkeley.

For those technologies, McDevitt mentions 10X Genomics, whose Chromium platform “can do linked-read WGS with very low genomic DNA inputs, adding a molecular barcode to library molecules through a molecular partitioning process with subsequent libraries sequenceable on robust and affordable Illumina platforms.” Scientists can use linked-read sequencing in various ways. “You can use the technology to detect copy number and structural variations that are hard to characterize with standard shotgun library preparations and Illumina short reads,” McDevitt explains. “You can also phase variants to haplotypes and get long, single-molecule type read data without microgram-scale inputs required for true long-read, single-molecule sequencing.”
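The idea behind linked reads can be sketched in a few lines of Python: short reads that share a barcode and map close together are assumed to come from the same long DNA molecule. This is a toy model under stated assumptions, not 10X Genomics' software; the read tuples, the barcodes and the `max_gap` threshold are all hypothetical.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group (barcode, chrom, pos) reads into inferred long molecules.

    Reads sharing a barcode and chromosome are merged into one molecule
    as long as consecutive positions are no more than max_gap apart.
    Returns (barcode, chrom, start, end, read_count) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Hypothetical mapped reads: (barcode, chromosome, position).
reads = [
    ("AAAC", "chr1", 1_000),
    ("AAAC", "chr1", 20_000),
    ("AAAC", "chr1", 45_000),
    ("AAAC", "chr1", 900_000),
    ("GGTT", "chr1", 5_000),
]

for molecule in infer_molecules(reads):
    print(molecule)
```

In this example the three nearby AAAC reads collapse into one inferred molecule spanning positions 1,000 to 45,000, which is how short reads can carry long-range information for phasing and structural variant detection.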

Overall, McDevitt notes that the 10X Genomics platform provides long-range sequencing information on a single molecule without high DNA inputs, and all with “Illumina costs and sequencing accuracy.” She points out that this new technology still needs further validation from the research community, concluding that “it will most likely not replace the need for true single-molecule technologies for all applications, but the technology has very exciting promise for the large body of researchers fighting to get by with nanograms of input material.”

Fueling FFPE findings

The ability to get the desired sequencing data from FFPE samples, says Stewart, arose from a combination of advances in various technologies. One, she says, is “significant improvements in library construction efficiency at each step of the workflow—for example, in the NEBNext Ultra II DNA Library Prep Kit for Illumina—thereby enabling higher yields of high-quality libraries from lower input amounts.” The second advance is “repair of the types of DNA damage found in FFPE DNA by use of the NEBNext FFPE DNA Repair Mix,” Stewart says. This repair kit is a mixture of enzymes formulated to repair various kinds of DNA damage, including deamination of cytosine to uracil, nicks and gaps, oxidized bases and blocked 3' ends. “Treatment of FFPE DNA with this mix ahead of NGS library preparation results in both increased quantity and quality of libraries, enabling higher-quality sequence data and an improved ability to detect low-frequency mutations,” Stewart says.
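One way to see why repairing deamination matters: uracil is read as thymine during amplification, so unrepaired cytosine deamination tends to appear as spurious C>T calls (or G>A when reported on the opposite strand). The Python sketch below measures what share of substitution calls fit that pattern. It is only an illustration; the function name and both variant lists are invented, and it does not reflect how the NEBNext products are evaluated.

```python
# Substitutions expected when deaminated cytosine (uracil) is read as thymine:
# C>T on the damaged strand, seen as G>A when reported on the opposite strand.
DEAMINATION_LIKE = {("C", "T"), ("G", "A")}


def deamination_fraction(variants):
    """Return the share of single-base substitutions that are C>T or G>A.

    `variants` is an iterable of (ref_base, alt_base) pairs.
    """
    variants = [(ref.upper(), alt.upper()) for ref, alt in variants]
    if not variants:
        return 0.0
    hits = sum(1 for pair in variants if pair in DEAMINATION_LIKE)
    return hits / len(variants)


# Hypothetical low-frequency calls from an unrepaired and a repaired library.
unrepaired = [("C", "T"), ("G", "A"), ("C", "T"), ("A", "G")]
repaired = [("C", "T"), ("A", "G"), ("T", "C"), ("G", "C")]

print(deamination_fraction(unrepaired))
print(deamination_fraction(repaired))
```

With these made-up inputs the unrepaired library shows 75 percent deamination-like calls and the repaired one 25 percent. A skew of this sort among low-frequency variants would suggest damage artifacts rather than true mutations.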

These technologies allow many samples to be analyzed by WGS, whereas in the past it may have been difficult to perform any detailed analysis. “These advances enable straightforward access to the very large number of FFPE DNA samples whose DNA was previously too compromised to be able to obtain useful sequence data in a cost-effective manner,” says Stewart.

WGS technologies continue to advance and are enabling researchers to study a wider variety of samples. This in turn is teaching scientists more about the genome’s biology and regulatory events and ultimately helping to uncover new disease treatments.

Reference

[1] Chin, CS, et al., “Phased Diploid Genome Assembly with Single Molecule Real-time Sequencing,” bioRxiv, 2016.
