First, there was the genome. Then there was the proteome, and the microbiome. Today, one of the most promising emerging “omics” fields is metagenomics—the study of the genes that exist in whole microbial communities found in samples such as soil, water, and stool.

It’s a tool that has the potential to unlock novel solutions in fields ranging from environmental cleanup to combating antibiotic resistance. For example, one recent study in China used metagenomic analysis of urban sewage to identify 381 different antibiotic-resistance genes that are widely shared across the country. The finding suggests that monitoring sewage systems for antibiotic-resistance genes could support antibiotic stewardship efforts by providing a real-time estimate of antibiotic-resistance threats in specific areas.

In another cutting-edge application of metagenomics, Charles Chiu, an infectious disease physician at the University of California, San Francisco, developed a metagenomics-based assay called SURPI (Sequence-based Ultra-Rapid Pathogen Identification) that can identify all potential pathogens in a sample of cerebrospinal fluid (viruses, bacteria, fungi, and parasites) in a single test.

Metagenomics is complicated by the fact that, unlike an individual human, animal, or plant genome or proteome, microbial communities are so vast and complex that they can never be fully sequenced, but only sampled. (Metagenomics pioneer Rob Knight, founding director of the Center for Microbiome Innovation and professor of pediatrics and computer science & engineering at the University of California, San Diego, noted in a recent analysis in Nature Biotechnology that the Earth is home to more than 10³⁰ microbial cells, “a figure that exceeds the number of known stars in the universe by nine orders of magnitude.”)

Sample collection and documentation

“One of the primary challenges we have in the field is standardization of all the processes for sample collection and documentation,” says Toby Richardson, vice president for bioinformatics at Synthetic Genomics, which has an integrated suite of tools for metagenomic discovery. “The typical environmental sample in humans is fecal matter, for example. When does the sample get taken? Is it one day old, two days old? How is it transferred from the clinical setting to the place where it is processed?”


Metagenomics pioneer Rita Colwell, a Distinguished University Professor at both the University of Maryland at College Park and the Johns Hopkins University Bloomberg School of Public Health, senior advisor and chairman emeritus at Canon US Life Sciences, and president and CEO of CosmosID, agrees that standardizing sampling and extraction is a key issue that must be addressed for the field to grow.

“In metagenomics, there is a tendency to forget the need for statistically significant or relevant samples from which you can draw conclusions. Getting a single sample and drawing great conclusions from it has inherent danger, and designing experiments so that a sample that you take can be meaningful and have relevancy is critical,” she says. “Let’s talk about water. Do you filter a milliliter, ten liters, a hundred liters, a thousand?”

DNA/RNA isolation

Standardization around the isolation of DNA poses additional challenges, Richardson says. “Every sample has its quirks. Soil and fecal material are very different in terms of how you might process them, and you have to explore different kits to work out the most efficient method of extraction.”

For example, Germany-based Macherey-Nagel provides specialized RNA isolation kits based on anion exchange, which yield a cleaner sample for amplification than the standard silica membrane approach.

“Our standard silica spin kits for environmental samples include specific patterns of washing buffers that wash away the humic acids,” says Andreas Hecker, one of Macherey-Nagel’s product managers for bioanalysis. “In our stool kits, for example, we have an included inhibitor removal column; the entire lysate is applied onto that column and spun through it. This removes the inhibitory substances and also filters non-lysed components. It’s a one-time, one-minute centrifugation step in the protocol.”


Homogenization also complicates the preparation of soil and stool samples, which contain a significant amount of difficult-to-disrupt material. Most of Macherey-Nagel’s environmental kits include NucleoSpin Bead Tubes. These 2 mL tubes contain glass, steel, or ceramic beads (or mixtures of these for even more challenging samples) that mechanically disrupt the sample. “You fill one of these tubes with your sample, add lysis buffer, vortex for a maximum of 20 minutes, and then you’re done with homogenization,” says Hecker. “Glass beads make sense for bacteria, whereas for soil and stool samples ceramic beads are the best, and plant material is homogenized best using steel beads.”

Once the sample is homogenized, the most common method of analysis at present involves 16S ribosomal RNA (rRNA) gene amplicons. “The 16S rRNA gene is highly conserved in bacteria, so it’s an obvious choice to use as a proxy to find out what species are in the sample,” says Richardson. “Up until now, I would say that 80–90% of all the metagenomic data in published papers has used 16S. But there are certain inherent biases. For example, you have to pick primers to come up with a universal amplicon. And every organism may have multiple copies of 16S, so it is difficult to get quantitation unless normalization is done on the data.”
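Richardson’s point about copy number can be made concrete. The minimal Python sketch below assumes you already have per-taxon 16S read counts and an estimate of how many 16S copies each taxon’s genome carries; the function name normalize_by_copy_number and the taxa and copy numbers in the example are hypothetical, not values from any real database. Dividing each count by its assumed copy number and rescaling yields relative abundances that track cell numbers more closely than raw read counts do.

```python
from typing import Dict


def normalize_by_copy_number(
    read_counts: Dict[str, int],
    copy_numbers: Dict[str, int],
) -> Dict[str, float]:
    """Turn raw 16S read counts into copy-number-corrected relative abundances.

    Each taxon's read count is divided by the number of 16S gene copies
    assumed for its genome, and the corrected values are rescaled to sum to 1.
    """
    corrected: Dict[str, float] = {}
    for taxon, reads in read_counts.items():
        # Assumption: a taxon with no known copy number is treated as single-copy.
        copies = copy_numbers.get(taxon, 1)
        if copies < 1:
            raise ValueError(f"copy number for {taxon} must be at least 1")
        corrected[taxon] = reads / copies
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical counts: equal reads, but taxon_A is assumed to carry 7 copies of 16S.
reads = {"taxon_A": 700, "taxon_B": 700}
copies = {"taxon_A": 7, "taxon_B": 1}
print(normalize_by_copy_number(reads, copies))
# {'taxon_A': 0.125, 'taxon_B': 0.875}
```

Here the two taxa contribute equal numbers of reads, but because taxon_A is assumed to carry seven copies of the gene, it makes up only one-eighth of the corrected community. Copy-number estimates are themselves uncertain, so a correction like this may reduce the bias rather than remove it.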

Next-generation sequencing

At some point, experts predict, 16S is likely to be supplanted by next-generation sequencing—non-Sanger-based, high-throughput, massively parallel sequencing. “Initially most people were delighted with the 16S results, but now the results from NGS are very exciting,” says Colwell. “We are still in a kind of beginning stage, however, and once again, standardization around things such as minimum standard deviation must be developed, which I think is best coming from a kind of an inter-laboratory consensus procedure. This would expedite FDA approval of next-generation sequencing as a tool.”

For now, a number of open-source tools are available to facilitate analysis of high-throughput 16S rRNA sequencing studies. The commonly used MG-RAST server provides annotation of sequence fragments, their phylogenetic classification, functional classification of samples, and comparison between multiple metagenomes. Dr. Knight’s QIIME (pronounced “chime”) lets users turn raw sequencing data generated on platforms from Illumina or other leading vendors into publication-quality graphics and statistics.
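As an illustration of the kind of summary statistics such pipelines report, the sketch below computes two standard measures from per-taxon counts: the Shannon diversity index within a single sample, and the Bray-Curtis dissimilarity between two samples. It is a generic, hand-written Python example that assumes reads have already been assigned to taxa; it is not code from QIIME or MG-RAST, and the sample and taxon names are hypothetical.

```python
import math
from typing import Dict


def shannon_diversity(counts: Dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 when no taxa are shared."""
    taxa = set(a) | set(b)
    # Sum of the smaller count for each taxon across the two samples.
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical per-taxon read counts for two samples.
sample_1 = {"taxon_A": 50, "taxon_B": 50}
sample_2 = {"taxon_A": 90, "taxon_C": 10}
print(round(shannon_diversity(sample_1), 4))  # 0.6931, i.e. ln(2)
print(bray_curtis(sample_1, sample_2))        # 0.5
```

Bray-Curtis values near 0 indicate similar community profiles, and values near 1 indicate profiles with few shared taxa. Because raw counts depend on how deeply each sample was sequenced, samples are usually rarefied or otherwise normalized to comparable depth before comparisons like this.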

But all of this takes time. “Sequencing and assembly are cheap, but annotation becomes a bottleneck,” Richardson says. He reports that Synthetic Genomics has technology that will speed up these processes. “It’s focused on deep learning and AI, and we believe it’s going to change the way people do annotation. SGI recently assembled a microbiome dataset that was 47 Gbp, larger than any that we’re aware of being published to date. To annotate that on existing systems would take weeks to months, depending on a given institution’s compute infrastructure. With deep learning, we’ve found we can achieve that annotation in an hour for a couple of hundred dollars.”

Field work

Another key need in metagenomics is tools that can take it from the specialist’s laboratory to the field. “We need rapid, user-friendly tools that allow general biologists or other scientists without advanced metagenomic training to do large analyses, for example, in state public health laboratories when there is an outbreak. Say they have a mystery isolate from someone in the clinic and they need to find out if it’s coming from a food source, a water source, or a hospital-based transmission,” says Nur Hasan, vice president of CosmosID, which is developing such solutions. “You need rapid subtyping, but right now it takes forever, or you have to hire a bioinformatician and have the computational infrastructure. We are working to create an extensively curated metagenomic database and a validated algorithm that will provide an easy-to-use tool for rapid molecular epidemiological subtyping.”
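Many rapid classification tools compare short subsequences of fixed length (k-mers) in a query against a reference collection. The Python sketch below shows that general idea under simplifying assumptions: a tiny in-memory dictionary stands in for a curated database, sequences are compared on one strand only, and each reference is scored by the fraction of the query’s k-mers it contains. It is not CosmosID’s algorithm; the function names, sequences, and source labels are all hypothetical.

```python
from typing import Dict, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (one strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, references: Dict[str, str], k: int = 8) -> Tuple[str, float]:
    """Return the reference containing the largest fraction of the query's k-mers."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        raise ValueError("query is shorter than k")
    best_name, best_score = "", -1.0
    for name, ref_seq in references.items():
        # Containment score: share of query k-mers also present in this reference.
        score = len(query_kmers & kmers(ref_seq, k)) / len(query_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical stand-in for a curated reference database of source isolates.
references = {
    "food_isolate": "ATGGCGTACGTTAGCCTAGGCTAACGTTAGC",
    "water_isolate": "TTGACCGGTAACCGTTGACCATGGTTACGAT",
}
query = "GCGTACGTTAGCCTAGGCTAA"  # hypothetical sequence from a mystery isolate
print(best_match(query, references))  # ('food_isolate', 1.0)
```

Because the query here is copied directly from the hypothetical food_isolate sequence, every one of its 8-mers is found there. Production tools such as Kraken and Mash use longer k-mers, handle reverse complements, and rely on indexing or sketching so that large numbers of reads can be checked against large databases quickly.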

Dr. Colwell predicts that within as little as five years, hand-held metagenomic sequencing devices will be available for office or hospital-based use to facilitate immediate diagnosis. “Illumina is certainly on that path. I think back on how tedious it was when I developed the first computer program to analyze microbial taxonomic data 40 or 50 years ago. We had to grow things in test tubes and tediously code them, and now we sit with a laptop computer and bingo. The future is extraordinary.”