Sequencing Emerging Diseases

Mike May earned an M.S. in biological engineering from the University of Connecticut and a Ph.D. in neurobiology and behavior from Cornell University. He worked as an associate editor at American Scientist, and he is the author of hundreds of articles for clients that include Nature, Science, Scientific American and many others.

According to the World Health Organization (WHO): “An emerging disease is one that has appeared in a population for the first time, or that may have existed previously but is rapidly increasing in incidence or geographic range.” One of the most recent concerns is the Zika virus, which spreads mainly through the bite of an infected mosquito of the genus Aedes. Infected adults might not feel sick enough to even suspect an infection, and death is extremely rare. If a pregnant woman is infected, however, the virus can cause defects in the brain of her fetus, including microcephaly. Although scientists discovered the Zika virus in 1947, it has spread widely only in the past few years. For example, the U.S. Centers for Disease Control and Prevention (CDC) states on its website: “On February 1, 2016, the World Health Organization (WHO) declared Zika virus a Public Health Emergency of International Concern (PHEIC). Local transmission has been reported in many other countries and territories. Zika virus will likely continue to spread to new areas.” With this emerging disease and others, scientists and clinicians benefit from learning as much as possible about the disease-causing agent, and genetic sequencing can reveal how that agent causes disease and guide the development of treatments.

As Jeremy Foster, staff scientist at New England Biolabs (NEB), explains, “Unlocking the secrets of a pathogen’s genome paves the way for potential development of new diagnostic tools, vaccine approaches and chemotherapeutic approaches.” He adds, “In addition, sequencing of multiple strains of pathogen from different geographic locations can address questions of genetic variation in populations and how the organism responds to factors such as drug pressure.”

According to Jonas Korlach, chief scientific officer at Pacific Biosciences, “One of the key benefits is that sequencing is a hypothesis-free method. You don’t need to know what you’re looking for.” For a pathogen, the genetic sequence provides a complete blueprint of the organism behind the disease.

In short, the more scientists learn about a disease, the better their chances of finding an effective treatment.

Sequencing selections

Sequencing itself is not a new technology. In 1977, British biochemist Frederick Sanger developed a method, now known as Sanger sequencing, that typically reads the genetic “letters” of strands up to about 1,000 bases long. Analytical software then assembles these overlapping pieces into an entire genome. Many scientists still use this technique, but newer methods offer far higher throughput, and some read much longer strands of DNA.
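To illustrate what “assembling the pieces” means, here is a toy greedy assembler in Python. It repeatedly merges the two reads with the longest suffix-prefix overlap until no overlaps remain. This is a simplified sketch under stated assumptions (error-free reads, no repeated regions, a hypothetical minimum overlap of 3 bases); production assemblers use overlap graphs or de Bruijn graphs and must cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` bases long), or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):         # the whole suffix of `a` matches `b`
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly: merge the best-overlapping pair until none overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left; the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical reads sampled from a short made-up sequence.
reads = ["ATTAGACC", "ACCTGCCG", "CCGGAATAC"]
print(greedy_assemble(reads))  # -> ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can mis-join reads in repetitive genomes, which is one reason longer reads help.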

Next-generation sequencing (NGS) runs huge numbers of sequencing reactions in parallel, which makes it much faster, and numerous NGS methods exist. Many scientists use NGS systems from Illumina, so Serge Saxonov, CEO and cofounder of 10x Genomics, and his colleagues developed a microfluidic platform that works upstream of an Illumina sequencer. “It generates short bits of DNA that the Illumina sequencer can read, and we embed information about where the pieces came from. And then our software puts it all back together after the sequencing,” Saxonov explains. As a result, anyone who knows how to use an Illumina system needs little extra training to add the 10x Genomics technology. “We wanted to have minimal disruption of existing workflows,” Saxonov says.
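Saxonov does not describe how the location information is encoded, so the following is only a sketch of the general idea, assuming that each short read begins with a fixed-length barcode identifying the long DNA molecule it came from. The names `group_by_barcode` and `BARCODE_LEN`, and the barcode length itself, are hypothetical; this is not the 10x Genomics software.

```python
from collections import defaultdict

# Hypothetical barcode length; the real read layout is not described in the text.
BARCODE_LEN = 16


def group_by_barcode(reads, barcode_len=BARCODE_LEN):
    """Strip an assumed leading barcode from each read and group the remaining
    sequence by barcode, so reads from the same long molecule end up together."""
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            continue  # too short to hold a barcode plus any insert; skip it
        barcode, insert = read[:barcode_len], read[barcode_len:]
        groups[barcode].append(insert)
    return dict(groups)


# A 4-base barcode keeps the made-up example readable.
reads = ["AAAAGATTACA", "CCCCTTGACG", "AAAACAGGTA"]
print(group_by_barcode(reads, barcode_len=4))
# -> {'AAAA': ['GATTACA', 'CAGGTA'], 'CCCC': ['TTGACG']}
```

Each group can then be assembled separately, which is the sense in which barcodes let software “put it all back together.”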

Some techniques read even longer sequences. Technology from Pacific Biosciences produces reads that average more than 10,000 base pairs. In some cases, that is long enough to read a complete viral-genome DNA molecule in one pass. In October 2015, Pacific Biosciences launched its Sequel System, which offers higher throughput and is easier to use. “Scientists from the University of California, San Francisco, used our technology to sequence a cell line from mosquitoes to investigate how the Zika virus infects cells,” Korlach says. “The long reads presented a much more complete and much less fragmented picture of the mosquito cell-line genome.”
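Genome-assembly fragmentation is commonly summarized with the N50 statistic, a standard metric not mentioned by Korlach: the contig length L such that contigs of length L or longer together cover at least half of the assembled bases. The minimal Python sketch below uses hypothetical contig lengths to show how an assembly with fewer, longer contigs (as long reads tend to produce) has a higher N50.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("need at least one contig")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # this contig pushes coverage past 50%
            return length


# Hypothetical assemblies of the same 6,000 bases.
fragmented = [2000, 1500, 1000, 800, 500, 200]
contiguous = [5000, 800, 200]
print(n50(fragmented), n50(contiguous))  # -> 1500 5000
```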

Issues to address

As with any powerful and informative technology, NGS is not immune to its own challenges. “In certain cases, sample availability and quality is an issue,” Korlach says. “Areas where you have emerging diseases can be hard to get to, so getting the DNA can be complicated.” He adds, “The equipment to process the DNA might not be available, and the climate can be such that the samples degrade very quickly.”

In addition, pathogen DNA is often swamped by host DNA, so the signal-to-noise ratio can be low. As Foster says, “Challenges in sequencing emerging diseases are many, and include access to typically very limited amounts of pathogen DNA in a background of mammalian DNA.”
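A back-of-the-envelope calculation shows why a small pathogen fraction is costly. Mean coverage is C = N × L / G for N reads of length L over a genome of size G; if only a fraction f of all reads comes from the pathogen, the total number of reads needed is roughly C × G / (L × f). The Python sketch below assumes uniform, error-free sampling, and all numbers in the example are hypothetical.

```python
import math
from fractions import Fraction


def total_reads_needed(genome_size, read_length, target_coverage, pathogen_fraction):
    """Estimate the total reads to sequence so that pathogen reads reach
    `target_coverage` mean coverage (C = N * L / G), when only
    `pathogen_fraction` of all reads come from the pathogen.

    genome_size, read_length and target_coverage are integers; pass
    pathogen_fraction as a string such as "0.001" (or a Fraction) for
    exact arithmetic.
    """
    frac = Fraction(pathogen_fraction)
    if not 0 < frac <= 1:
        raise ValueError("pathogen_fraction must be in (0, 1]")
    pathogen_reads = Fraction(target_coverage * genome_size, read_length)
    return math.ceil(pathogen_reads / frac)


# Hypothetical 1 Mb pathogen genome, 150-base reads, 30x target coverage.
print(total_reads_needed(1_000_000, 150, 30, "0.001"))  # -> 200000000
print(total_reads_needed(1_000_000, 150, 30, "0.1"))    # -> 2000000
```

In this example, raising the pathogen share from 0.1% to 10% cuts the required reads a hundredfold, which is the motivation for the enrichment methods described below.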

As Barton Slatko, senior scientist at NEB, points out, there are other challenges to keep in mind. These include “the cost of sequencing and analysis, having trained individuals in appropriate locations, the accurate transfer of coded material to relevant labs, the availability of bioinformatics pipelines and ensuring that the information is robust and relevant.” Simply having high-tech sequencing tools is not enough to identify emerging diseases.

Calling all nucleic acids

Both DNA and RNA can be sequenced to understand and fight emerging diseases. “RNA sequencing is especially useful in helping to understand genetic mechanisms,” says Dalia Daujotyte, global scientific liaison manager at Lexogen in Vienna, Austria. “It shows the status of the organism at a given moment and therefore is invaluable in analyzing etiology and course of a disease.”

Lexogen focuses on RNA sequencing, known as RNAseq. “Only RNAseq has the power to discover and annotate new transcripts, fusion genes, mutations on the transcript level and, of course, detect gene expression,” says Daujotyte. “The latter has been studied using microarrays; however, broader choices of reliable RNAseq library preparation protocols, new sequencing instruments and reduced cost are making RNAseq the first option for differential gene-expression studies.”

In particular, differential gene expression can reveal aspects of a disease’s mechanism of action, and that can expose potential weak points to attack with treatments. Ultimately, a combination of DNA and RNA sequencing provides more information than either dataset alone.
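As a simplified illustration of differential gene expression, the Python sketch below normalizes raw RNAseq read counts to counts per million (CPM), so samples sequenced to different depths can be compared, and then computes a log2 fold change per gene. The gene names and counts are hypothetical, and a pseudocount of 1 is an assumption to avoid dividing by zero. Real differential-expression analyses use statistical models and biological replicates, not fold change alone.

```python
import math


def cpm(counts):
    """Counts per million: scale each gene's read count by library size."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(control, infected, pseudocount=1.0):
    """log2(infected / control) per gene after CPM normalization.
    Assumes both samples report the same set of genes."""
    ctrl, inf = cpm(control), cpm(infected)
    return {gene: math.log2((inf[gene] + pseudocount) / (ctrl[gene] + pseudocount))
            for gene in ctrl}


# Hypothetical counts; the infected library was sequenced twice as deeply.
control = {"geneA": 100, "geneB": 900}
infected = {"geneA": 800, "geneB": 1200}
lfc = log2_fold_change(control, infected)
print({gene: round(v, 3) for gene, v in lfc.items()})
# -> {'geneA': 2.0, 'geneB': -0.585}
```

Note that geneB has more raw reads in the infected sample but turns out to be relatively down-regulated once library size is taken into account.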

Using segments of a genome

For emerging-disease identification, full-genome sequencing is not always required. According to Slatko, “Finding specific nucleic acid sequences for diagnostic probes or techniques such as the polymerase chain reaction [PCR], loop-mediated isothermal amplification [LAMP], rolling circle amplification [RCA] or helicase-dependent amplification [HDA] can rapidly survey individuals and populations at risk and provide biological and demographic information for rapid mobilization of resources.” These techniques depend on knowing in advance the DNA sequences that will serve as identifying biomarkers. Full-genome sequencing provides more information, especially where genomic sequence differences distinguish pathogenic strains from nonpathogenic ones.
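One simple way to think about finding such biomarker sequences is to look for short subsequences (k-mers) shared by every pathogenic strain but absent from nonpathogenic ones. The Python sketch below does exactly that on made-up sequences with a hypothetical k of 4. It deliberately ignores reverse complements, primer melting temperature and host background, all of which a real probe or primer design would have to consider.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def diagnostic_kmers(pathogenic, nonpathogenic, k):
    """k-mers present in every pathogenic sequence and in no nonpathogenic one.
    `pathogenic` must contain at least one sequence."""
    shared = set.intersection(*(kmers(s, k) for s in pathogenic))
    background = set().union(*(kmers(s, k) for s in nonpathogenic))
    return shared - background


# Hypothetical strain sequences.
pathogenic = ["ACGTTGCA", "TTACGTTGCC"]
nonpathogenic = ["ACGTAGCA"]
print(sorted(diagnostic_kmers(pathogenic, nonpathogenic, k=4)))
# -> ['CGTT', 'GTTG', 'TTGC']
```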

Scientists also benefit from tools that work with small or selected samples. “At New England Biolabs,” says Foster, “we continually strive to streamline sequencing protocols and enable production of quality libraries from small amounts of nucleic acid input.” As an example, he says, “Our NEBNext Microbiome DNA Enrichment Kit and EpiMark Methylated DNA Enrichment Kit both enable enrichment of the unmethylated DNA typical of most infectious disease agents from mammalian methyl-CpG DNA prior to library construction.” Similarly, the company’s methylation-dependent restriction endonucleases, Foster explains, “degrade a high proportion of mammalian DNA to fragments that are about 32 base pairs long, which can be removed from library construction by size selection.” Removing that host background enriches the sample for pathogen DNA before analysis.
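Size selection itself happens at the bench, but its effect is easy to model: fragments shorter than a cutoff are discarded. In the Python sketch below, the 32-base fragments stand in for digested host DNA, the 300-base fragment stands in for intact pathogen DNA, and the 100-base cutoff is a hypothetical value chosen for illustration, not a recommended protocol setting.

```python
def size_select(fragments, min_length):
    """Keep fragments at least `min_length` bases long and report how many
    bases were discarded, mimicking removal of short digestion products."""
    kept = [f for f in fragments if len(f) >= min_length]
    removed_bases = sum(len(f) for f in fragments) - sum(len(f) for f in kept)
    return kept, removed_bases


# Hypothetical mix: three ~32-base host fragments and one 300-base pathogen fragment.
fragments = ["A" * 32, "T" * 32, "G" * 32, "ACGT" * 75]
kept, removed = size_select(fragments, min_length=100)
print(len(kept), removed)  # -> 1 96
```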

Slatko adds that target-enrichment protocols, such as SeqCap from Roche NimbleGen, SureSelect from Agilent and NEBNext Direct from New England Biolabs, “will be also very useful for eliminating unwanted DNA for more efficient library preparation of only the DNA one is interested in sequencing.”
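The kits named above work in the wet lab, before sequencing. As a purely computational analogy (not a description of how any of these products work), the Python sketch below keeps only reads that share at least one k-mer with a set of target probe sequences. The probe, reads and k of 5 are all hypothetical.

```python
def kmer_set(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def enrich_reads(reads, probes, k=5):
    """Keep reads sharing at least one k-mer with any probe sequence."""
    target = set().union(*(kmer_set(p, k) for p in probes))
    return [r for r in reads if kmer_set(r, k) & target]


# Hypothetical probe and reads.
probes = ["GATTACAGG"]
reads = ["TTGATTACA", "CCCCCCCCC", "ACAGGTTT"]
print(enrich_reads(reads, probes))  # -> ['TTGATTACA', 'ACAGGTTT']
```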

Overall, the key is to get as much information as possible about the disease-causing organism and then use that information to understand and defeat it. Few tools serve that purpose better than sequence information from DNA and RNA.

Image: Shutterstock Images
