No Culture, No Problem! Probe Uncultivable Microbes with Metagenomics

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

It’s easy to forget, in laboratories flush with microbe-dotted Petri plates, that most bacteria are not like E. coli. The vast majority of microorganisms will grow neither on agar plates nor in liquid cultures, making it nearly impossible to study their biology, biochemistry and contributions to ecology, health and disease.

“More than 99% of prokaryotes in the environment cannot be cultured in the laboratory, a phenomenon that limits our understanding of microbial physiology, genetics, and community ecology,” wrote Patrick Schloss and Jo Handelsman in a 2005 review on the topic [1].

Likening the problem of unculturable microbes to the unsolvable “Gordian knot” of Greek myth, Schloss and Handelsman note that there are two fundamental solutions. One is to figure out how to culture the uncultivable. The other is, metaphorically, to cut the knot—a strategy called metagenomics.

Dealing with uncultivable microbes

Metagenomics, sometimes called community genomics, reduces microbes to their genetic material, using DNA sequencing to determine which species are present and what they can (theoretically) do. It is one of a pair of strategies used today to study the genetic makeup of uncultivated organisms, the other being single-cell sequencing.

Metagenomics and single-cell genomics are complementary techniques, says Tanja Woyke, microbial genomics program lead at the DOE Joint Genome Institute (JGI), which funds and performs metagenomics studies, and indeed, most JGI applicants request funds to perform both analyses. Single-cell genomics is costly, tedious and can be difficult to apply to large cell numbers, she says. But it provides a definitive link between phylogeny and function, establishing that a particular cell contains a specific set of genetic instructions.

Metagenomics studies are technically simpler. “Almost any sample is amenable to metagenomics, whereas that’s not necessarily the case with single-cell genomics,” Woyke notes. But the data analysis problem is incredibly complicated: sequence fragments arise from every member of the community, and very often those fragments are just a few hundred bases in length. It’s therefore difficult to assemble metagenomics data into contigs of any substantial size, or even to associate any particular piece of DNA with the organism it came from. Harder still is assembling those pieces into whole genomes.

Soil communities are particularly complicated, Woyke notes, and in some cases “less than 10% of reads assemble into contigs.” Lower-complexity samples, such as those from hot-spring environments, are more tractable. “Here, sometimes 50% to 90% of reads assemble [into contigs],” she says.

Metagenomics itself comprises two different sequencing strategies. Targeted metagenomics focuses on specific gene sequences, usually the 16S ribosomal RNA gene, which provides a phylogenetic “barcode” that can be used to survey a community’s composition. Shotgun metagenomics sequences everything, revealing not just which microbes are present in an ecosystem but also their functional coding potential.

For example, explains Mark Driscoll, international product manager at 454 Life Sciences, a Roche company, “If you know microbes in a mine can digest metals and toxins, identifying the organisms [that are present] is nice, but you may want to identify the genes that help those organisms digest the metals.”

One recent study from Handelsman’s lab cloned bulk DNA fragments from bacteria in cow manure and used those libraries to identify clones containing antibiotic-resistance genes, which then were sequenced on a Pacific Biosciences instrument [2].

At ACGT Inc., a genomic analysis service provider in Wheeling, Ill., customers can request either the targeted or shotgun metagenomics approach, says scientific director Semyon Rubinchik. “So far the majority of requests have been for targeted analysis,” he says. In either case, samples are sequenced by short-read, paired-end sequencing on Illumina platforms, producing reads from 100 to 300 bases. For the targeted approach and using the Illumina MiSeq, fragments of up to 550 bp are possible. The longest contigs ACGT has produced after de novo assembly of shotgun sequencing data, Rubinchik says, are in the 100-kbp range.

A matter of read length

Every study is different, but at JGI the typical metagenome analysis consumes one Illumina HiSeq 2000 channel, producing 50 to 60 GB of sequence. With that level of data, “highly abundant organisms will start to assemble, while low-abundance organisms tend to remain fragmented,” Woyke says.

The HiSeq is a short-read sequencer: It produces billions of reads, but all are just a few hundred bases long. Longer-read sequencing technologies, such as those provided by 454 Life Sciences and Pacific Biosciences, provide far fewer reads per run (about 1 million for 454 and 50,000 for PacBio), but each read is longer (800 to 1,000 bp and >8,500 bp per read, respectively), making longer assemblies easier. Many researchers, in fact, blend strategies, says Driscoll, using short-read technology to collect as much data as possible and long-read approaches to scaffold and bolster their assemblies.
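As a rough back-of-the-envelope check, the read counts and read lengths quoted above can be multiplied to estimate how many bases each long-read platform yields per run. The Python sketch below assumes the quoted lengths apply to individual reads and treats the PacBio figure as a lower bound; the function and variable names are illustrative only.

```python
def run_yield_bp(reads_per_run: int, read_length_bp: int) -> int:
    """Approximate bases per run as (number of reads) x (read length)."""
    return reads_per_run * read_length_bp


# Figures quoted in the text. The PacBio length is a lower bound (">8,500 bp"),
# so its yield here is a lower bound too.
platforms = {
    "454 (800 bp reads)": (1_000_000, 800),
    "454 (1,000 bp reads)": (1_000_000, 1_000),
    "PacBio (8,500 bp reads)": (50_000, 8_500),
}

# Convert to megabases for readability.
yields_mb = {
    name: run_yield_bp(n_reads, length) / 1e6
    for name, (n_reads, length) in platforms.items()
}

for name, megabases in yields_mb.items():
    print(f"{name}: ~{megabases:,.0f} Mb per run")
```

On these numbers, each long-read run delivers somewhere between a few hundred megabases and about a gigabase, which helps explain why researchers pair long reads with a short-read platform for depth.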

Jonas Korlach, chief scientific officer at Pacific Biosciences, presented data at this February’s Advances in Genome Biology and Technology meeting applying his company’s technology to a mock microbial community comprising some 21 bacterial strains. “We were able to assemble more than half the species into finished chromosomes,” he says, as well as a number of extrachromosomal plasmids. In contrast, using a short-read chemistry, “there was not a single complete genome, and typically each genome was in 50 to 100 different pieces.”
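One widely used way to put a number on that kind of fragmentation is the N50 statistic: the contig length at which contigs of that size or larger hold at least half of the assembled bases. Korlach does not cite N50, and the genome size and piece counts below are hypothetical, so the following Python sketch is only an illustration of the idea.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Hypothetical 4 Mb genome: one finished chromosome versus 80 equal pieces.
finished = [4_000_000]
fragmented = [50_000] * 80

print(n50(finished))    # a single contig is its own N50
print(n50(fragmented))  # every piece is 50,000 bp, so N50 is 50,000
```

The same total sequence gives a very different N50 depending on how many pieces it is split into, which is the contrast the mock-community comparison draws.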

Perhaps more significantly, he notes, PacBio’s SMRT chemistry is a single-molecule method and so avoids the biases (against, say, sequences with higher or lower GC content) that can plague amplification-based technologies. “We captured more than 90% of the genome of all bacteria combined,” he says, while a competing short-read approach “missed approximately 25% of the genome content.”
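GC content, the property Korlach names as a source of bias, is simply the fraction of bases in a sequence that are G or C. The short Python sketch below shows the calculation; the function name and the example sequences are made up for illustration and are not drawn from the study.

```python
def gc_content(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


# Made-up fragments at the two extremes and in the middle.
print(gc_content("ATATATAT"))  # AT-only fragment
print(gc_content("GCGCGCGC"))  # GC-only fragment
print(gc_content("ATGCNN"))    # ambiguous N bases are ignored
```

Regions whose GC fraction sits far from the middle are the ones an amplification step would be expected to under-represent.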

Even 16S rRNA surveys can be complicated with short-read technologies. Weighing in at about 1,500 bp, the 16S gene contains multiple variable regions, Driscoll says; sequence only a few of them, and the best you can hope for is family- or order-level classification. “The more of the 16S gene you sequence, the more likely you are to be able to classify an organism precisely [to the genus/species level].”

Tools and tools

There’s nothing magic about metagenome sequencing. The trick is to efficiently extract genomic DNA from all or at least most microbes in the sample. “There is no single extraction method,” Woyke says. “Every sample is different and the taxa are different, and you can take one sample and five methods and get five different results.”

That said, many methods use mechanical disruption (bead beating) to break up a majority of cells, Woyke notes. The standard operating procedures for the Earth Microbiome Project, “a systematic attempt to characterize the global, microbial taxonomic and functional diversity,” use that approach. Yet bead beating can be a double-edged sword, Woyke warns. Tougher cells eventually break, but “cells that lyse easier, lyse first, and you really shear the DNA.”

But as with other ‘omics disciplines, it’s not the data collection that is so difficult in metagenomics, but data analysis. And as with other ‘omics disciplines, analytical tools to make sense of the data have sprung up. Woyke recommends a JGI tool called IMG/M, which allows users to annotate and compare their data to other public metagenome, single-cell and bacterial isolate datasets. (JGI offers workshops to train users in IMG/M, available even to those who do not collaborate with JGI.) ACGT researchers tend to use Illumina’s MiSeq Reporter Metagenomics Workflow or MetAMOS, a de novo assembly and variant-analysis tool, says Rubinchik.

Driscoll recommends that, whatever approach and tools you use, you start with clear experimental goals. 16S rRNA surveys require a different workflow than shotgun metagenomics studies and far smaller volumes of data.

“Begin with the end in mind,” he says. “If you know what you need out of the experiment, it will follow out of that where to begin.”

References

[1] Schloss, PD, Handelsman, J, “Metagenomics for studying unculturable microorganisms: Cutting the Gordian knot,” Genome Biology, 6:229, 2005. [PubMed ID: 16086859]

[2] Wichmann, F, et al., “Diverse antibiotic resistance genes in dairy cow manure,” mBio, 5(2):e01017-13, 2014. [PubMed ID: 24757214]
