Sequencing is already integral to life science research and human health, but as sequencing costs drop and novel technology improves the speed and simplicity of gathering genetic information, it is poised to become even more valuable.

Until recently, whole genome sequencing (WGS) has lingered in the background of the present excitement over genomics. WGS is the most comprehensive approach to sequencing, delivering the least bias and the highest coverage and reproducibility while generating the most data. However, it can be incredibly time-consuming and extremely costly, and it generates so much data that finding the right tools to analyze it can be challenging.

Of course, such a beneficial approach to genomics cannot be ignored. With advancements in sequencing technology as applied to throughput and specificity, improvements in library preparation efficiency for increased sample input, and new data collection and analysis tools for all-inclusive outlooks, whole genome sequencing is making its way to the forefront of the genomics world and at a continually decreasing cost.

And as WGS costs become more comparable to exome sequencing, “the question becomes, why not?” explains Adriana Geldart, sample prep support manager at Roche. “Why not get a more comprehensive view of a genome through WGS when new technologies in instrumentation and sample prep make this much more achievable?”

Enabling technologies

Michael Sismour, NGS lead at Beckman Coulter, has been involved with genomics for quite some time, having started in George Church’s lab at Harvard University. He is thrilled to see that sequencing technology has come so far in the past 10 years. Not only have read lengths evolved from 25 base pairs to the ability to read over 10Kb at a time, but “new technologies are encouraging even more in-depth comprehensive coverage of hard-to-read regions in the genome at a faster pace and it is exciting to see where we can take things,” he says. “At Beckman we are working to develop enabling technologies using automation that allow sequencing to become higher throughput.”

While earlier short-read technologies work for large parts of the human genome and are typically used for panel and targeted sequencing, other regions are not amenable to this type of analysis because of the widespread presence of repeats and paralogs. In these cases, longer reads are needed to piece together large structural variants, including those in non-coding regions, and to allow complete analysis of the genome.
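To make the repeat problem concrete, here is a minimal sketch with a toy reference sequence. The sequence, the reads, and the helper `find_all` are all invented for illustration; real aligners use indexed, error-tolerant matching rather than exact string search.

```python
def find_all(reference, read):
    """Return every start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical reference: unique flanks around a short "CA" repeat.
reference = "GATTCACACACACAGGTC"

# A short read that lies entirely inside the repeat (hypothetical).
short_read = "CACA"
# A longer read that spans the repeat and reaches both unique flanks (hypothetical).
long_read = "TTCACACACACAGG"

short_hits = find_all(reference, short_read)
long_hits = find_all(reference, long_read)

print("short read placements:", short_hits)
print("long read placements:", long_hits)
```

The short read fits at several positions inside the repeat, so its true origin is ambiguous, while the longer read is anchored by the unique sequence on either side and has a single placement.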

Companies like 10x Genomics are making long-range sequencing more accessible with their Linked-Read technology, included in their Chromium™ Genome, Exome, and de novo Assembly Solutions. Linked-Reads are generated from short-read sequences created with an in-line barcode, where reads that share a barcode can be grouped as originating from a single long input molecule. Thus, long-range information can be assembled from short reads, allowing the placement of short-read material within the context of the whole genome. This approach provides access to haplotype information that reveals the diploid nature of individual human genomes and gives insight into large structural variants such as inversions and translocations.
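The barcode-grouping idea can be sketched in a few lines. This is a simplified illustration, not 10x Genomics' actual pipeline: the read tuples, barcode names, and the functions `group_reads_by_barcode` and `molecule_span` are hypothetical, positions are assumed to be already-aligned reference coordinates, and each barcode is assumed to tag exactly one input molecule (in practice a barcode can be shared by several molecules that must be separated by genomic distance).

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group (barcode, position, sequence) reads by barcode, sorted by position."""
    groups = defaultdict(list)
    for barcode, position, sequence in reads:
        groups[barcode].append((position, sequence))
    # Sort each group so the reads are ordered along the inferred molecule.
    return {barcode: sorted(group) for barcode, group in groups.items()}


def molecule_span(group):
    """Return the (start, end) reference interval covered by one barcode group."""
    start = min(position for position, _ in group)
    end = max(position + len(sequence) for position, sequence in group)
    return start, end


# Hypothetical short reads: (barcode, aligned start position, sequence).
reads = [
    ("BC01", 100, "ACGT"),
    ("BC02", 5000, "GGCA"),
    ("BC01", 9000, "TTAG"),
    ("BC01", 4200, "CCGA"),
]

groups = group_reads_by_barcode(reads)
spans = {barcode: molecule_span(group) for barcode, group in groups.items()}

for barcode, (start, end) in spans.items():
    print(f"{barcode}: {len(groups[barcode])} reads spanning {start}-{end}")
```

Although each read is only a few bases long, the three reads that share `BC01` together imply a source molecule spanning roughly 9 kb, which is the long-range information that short reads alone cannot provide.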

In a recent press release from 10x Genomics, CEO Serge Saxonov explains that “access to long-range sequence information is becoming the standard for obtaining the most comprehensive understanding of disease, and Linked-Reads technology can help enable labs to gain valuable insights that have previously not been possible.”

Sequencers themselves have also been getting an overhaul, specifically with WGS in mind. Companies are recognizing the need for instrumentation that can generate the kind of throughput required for WGS at a reasonable cost and faster timeframe. Illumina’s newest release, NovaSeq, caters to WGS, surpassing previous iterations including the HiSeq X system in quality. While the HiSeq X can sequence over 1,800 human genomes per year and was the first platform to provide whole human genome sequencing for under $1,000, the NovaSeq platform was made to offer researchers more flexibility, and could eventually lead to whole human genome sequencing for much less.

Joel Fellis, director of product marketing for the NovaSeq, explains, “NovaSeq 6000 provides the most powerful, flexible, and cost-effective solution for human WGS studies. The NovaSeq S1 flow cell allows users to sequence a trio in a single day, while the S4 flow cell allows users to sequence up to 48 genomes in less than two days.”

These instruments use Illumina’s sequencing by synthesis (SBS) chemistry. While this chemistry has become a standard for sequencing, other approaches, such as single-molecule sequencing, sequencing by ligation, or sequencing based on atomic force microscopy, also have the potential to change the WGS landscape.

However, even if sequencing itself becomes routine and cheap, other bottlenecks such as library preparation still exist. Sismour offers that WGS technology might eventually reach a level of being able to essentially eliminate this step, speeding up and simplifying the process even more.

In fact, Roche has been fine-tuning library prep kits to do just that. Working to advance microbial WGS, they have applied their KAPA HyperPlus kit to single-colony whole genome sequencing of crude bacterial isolates. Applying a crude extraction protocol prior to library preparation streamlines the sequencing workflow while still remaining robust enough to tolerate inhibitory molecules present in crude lysates.

A study was performed in collaboration with the University of California, Davis validating the use of the KAPA HyperPlus Kit in the preparation of genomic DNA (gDNA) libraries directly from crude cell lysates of both gram-negative and gram-positive bacteria, bypassing the need for liquid culture and DNA extraction. Subsequent sequencing and data analysis confirmed that the resulting genome assemblies are comparable to those generated from purified bacterial gDNA.

Applying what we built

All of these evolving technologies have paved the way for grand-scale studies, allowing more information than ever to be garnered from each WGS run. From large multi-consortium studies to burgeoning services aimed at both companies and consumers, WGS might prove to be a solid foundation on which companies and research institutes alike can build.

WGS has already proven to be a useful tool for extensive studies in which multiple genomes are sequenced and compared, for example the genomes of patients with a specific disease and those of their family members. Comparisons can reveal causal variants, which can then guide the development of potential treatments. These large-scale investigations combine patient data from participating parties to maximize the number of genomes that can be analyzed, improving the chances that all variations, even rare ones, are detected. “We need a high level of granularity to discover how genetics affects health. The more people we sequence, the easier it is going to be to unravel these mysteries that are very complex to figure out,” Sismour says.

In line with this approach, several nascent companies spun out of research institutions and academic labs aim to gather information from WGS. One of these companies, Veritas Genetics, founded by scientists from Harvard Medical School’s Personal Genome Project, has developed the myGenome project. By sequencing whole genomes, Veritas can identify the genetic variants associated with disease risks and guide dietary modifications to mitigate these risks as well as potential therapeutic interventions. Once a genome has been fully sequenced through one of these programs, its data can be revisited as new treatments are found and new knowledge is gained. Subsequently, health monitoring and screening can easily be done, assisting in liquid biopsy analysis, microbiome evaluation, and cancer diagnosis.

Another WGS application is in agrigenomics, a growing field that focuses on the development of genetically improved seeds for healthier and more productive crops. Rapid Genomics takes advantage of the flexibility of WGS, using it to develop quick resources for other targeted sequencing applications, and for SNP discovery, detection of structural variants, loci mapping, genome assembly, haplotype phasing, QTL mapping, phylogenomics, and genomic selection. “These methods can utilize WGS more as the costs of generating data decreases, especially in genomic selection and prediction,” explains Orin McCormick, sales manager at Rapid Genomics.

Easing data analysis

Researchers prepare tissue samples for whole genome sequencing at The Rockefeller University, where clinical researcher Robert Darnell, M.D., Ph.D., led a study with the New York Genome Center and IBM to analyze complex genomic data from state-of-the-art DNA sequencing of whole genomes. Image courtesy of Epic Creative.

As new technologies continue to improve the quality of WGS, new approaches combining these technologies can pull the most out of sequencing. “Using a combination of instruments and technologies leverages the strengths and benefits of each sequencing technology, gathering more and better data than just using one alone,” explains Sismour. Other fields, such as genome editing and synthetic biology, can use WGS not only to confirm products and flag off-target effects, but also to verify genome representation and for de novo assembly.

Once WGS is ready to be a routine analytical tool, the challenge then moves downstream to computational advancements in data collection and analysis. IBM’s Watson for Genomics has been shown to analyze whole human genome sequences in less than one-tenth the time of current methods. In a study conducted by the New York Genome Center and IBM, Watson was able to provide a report of potential clinically actionable insights within 10 minutes, compared to 160 hours of human analysis and curation required for similar conclusions. Regular advancements in all aspects of the WGS method pipeline, including sample prep, sequencing, and data analysis, aid in ongoing efforts to standardize the use of WGS in research and in the clinic.

Image: Structural variant illustration. Image courtesy of 10x Genomics.