NGS Flowers Brightly in AgBio

Josh P. Roberts has an M.A. in the history and philosophy of science, and he also went through the Ph.D. program in molecular, cellular, developmental biology, and genetics at the University of Minnesota, with dissertation research in ocular immunology.

The average farmer probably doesn’t have the latest HiSeq sequencing system in his or her barn. But that doesn’t mean next-generation sequencing (NGS) isn’t as important in the agronomics arena as it is in medical research.

For example, Andrew Paterson, distinguished research professor at the University of Georgia, says he uses NGS to capture the diversity of a population by comparing genotypes against a reference genome. The Plant Genome Mapping Laboratory, which he directs, also uses NGS “to obtain lots and lots of bits of sequence from different individuals, so that we can determine who is related to whom and sort out which pieces of chromosome came from Mom and which came from Dad.”

Many of the techniques from traditional genetics, molecular biology and biochemistry, bioinformatics and related disciplines are the same whether working with maize or mice. Yet plants present researchers with some challenges and opportunities of their own.

Ploidy

Plants may carry multiple copies of their genome: bread wheat, for example, is hexaploid, and sugarcane has been observed with as many as 200 copies [1]. “We have very few high-quality references of tetraploids, much less hexaploids,” explains Jeremy Schmutz, a faculty investigator at the HudsonAlpha Institute for Biotechnology who also manages the Department of Energy Joint Genome Institute (JGI) plant genome program. Admittedly, the subgenomes may be differentiated enough that most genes, though not all, can be assigned to a particular chromosome by its distinctive characteristics. Schmutz says that with “a lot of stuff, you just can’t tell where it is and where it goes, because you have so many copies. And don’t even ask about sugarcane!”

Oftentimes genetics and genomics are carried out on parental or (putative) ancestral species or on model organisms. Brachypodium, for example, is an organism with an extremely compact genome; it is often used as a stand-in for wheat and other grasses. “It takes half a flow cell on an [Illumina HiSeq] X Ten to resequence one cultivar of wheat—that’s equivalent to five human genomes’ worth. In Brachypodium, which is 350 megabases, you can resequence to your heart’s content,” states Schmutz. “You can do crosses, induce mutations, measure phenotypes, and then if you find something interesting ask what happens in wheat with the same genes.”
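To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. The wheat figure (five human genomes' worth) and the 350-megabase Brachypodium genome come from Schmutz's quote; the human genome size of roughly 3.1 gigabases is an assumption added here, and the calculation ignores coverage depth and library overhead.

```python
# Back-of-the-envelope comparison of resequencing effort.
# Assumption (not from the article): a human genome is about 3.1 billion bases.
HUMAN_GENOME_BP = 3.1e9

# From the quote: one wheat cultivar costs about five human genomes' worth.
WHEAT_IN_HUMAN_GENOMES = 5

# From the quote: the Brachypodium genome is about 350 megabases.
BRACHYPODIUM_BP = 350e6


def genomes_per_budget(budget_bp: float, genome_bp: float) -> float:
    """How many genomes of size genome_bp fit into a sequencing budget,
    assuming the same depth of coverage for every genome."""
    return budget_bp / genome_bp


wheat_budget_bp = WHEAT_IN_HUMAN_GENOMES * HUMAN_GENOME_BP
brachy_per_wheat = genomes_per_budget(wheat_budget_bp, BRACHYPODIUM_BP)

print(f"Brachypodium genomes per wheat-sized budget: {brachy_per_wheat:.1f}")
```

Under these assumptions, the sequencing output spent on a single wheat cultivar would cover about 44 Brachypodium lines at the same depth.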

Reference, please

The highest-quality plant genome that exists today is the diminutive rice genome, largely because “we did this using a very, very careful BAC [bacterial artificial chromosome]-by-BAC approach. And then we sequenced every BAC individually with Sanger sequencing, and we finished everything,” says Rod Wing, Bud Antle endowed chair professor at the Arizona Genomics Institute. That genome was published in 2005 [2].

Wing was also involved in sequencing the human-genome-sized maize genome, which was also done BAC by BAC [3]. But in that case, “we didn’t have enough money to finish every BAC, so the only thing that’s finished in that genome is really gene space; everything else is draft,” Wing explains. And more recently, genomes have been sequenced using short-read technologies, yielding gene assemblies that lack most of the noncoding regions. “I’ve talked to scientists who do GWAS [genome-wide association studies], and they say the majority of the hits they get are in these draft regions,” meaning that fine mapping of these potential markers is not easy to accomplish.

The good news is that “we’re moving back to an era where we will have super-high-quality genomes, like gold standards,” Wing says. “Now some of the long-read chemistry is making it so that our genomes are more complete, and you won’t require BAC libraries as much.”

One of the biggest problems with NGS is that its short reads don’t cope well with long repeats, which can make up perhaps half of a genome. One technique that does handle repeats well, says David Lightfoot, professor of plant, soil and agricultural systems at Southern Illinois University, is the long-read chemistry of PacBio, which is often able to traverse the repeat regions and anchor the read. Lightfoot’s team is about to publish the olive-tree genome, sequenced using a mixture of short- and long-read techniques.
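The repeat problem can be illustrated with a toy example. The sketch below is purely hypothetical: the "genome" is a made-up string with one repeated element, and exact string matching stands in for a real read aligner. It shows why a read that falls entirely inside a repeat has more than one possible home, while a longer read that reaches into unique flanking sequence can be placed unambiguously.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position where the read matches the genome exactly
    (overlapping matches included)."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


# Hypothetical toy genome: the same repeat element appears twice,
# separated and flanked by unique sequence.
REPEAT = "TTAGGCATCGAT"
toy_genome = "AAACCC" + REPEAT + "GGGTTTCC" + REPEAT + "CCAAGG"

# A short read that lies entirely within the repeat.
short_read = REPEAT[2:10]

# A longer read that spans the first repeat copy plus unique flanks.
long_read = "CCC" + REPEAT + "GGG"

print("short read placements:", count_placements(toy_genome, short_read))
print("long read placements:", count_placements(toy_genome, long_read))
```

Here the short read matches both copies of the repeat, while the long read matches only one location because its flanking bases are unique. Real repeats are far longer and real aligners tolerate mismatches, but the ambiguity arises in the same way.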

High-MW DNA

BACs, long-read sequencing, mate-pair library construction for NGS and “some of the linking methodologies like 10X or BioNano” require relatively high-molecular-weight DNA, says Schmutz. “That’s pretty easy to get out of a blood sample, but with plants it’s a little bit more difficult, because they have carbohydrate buildup and other kinds of floating metabolites that degrade the DNA.”

In addition, chloroplasts can contribute perhaps 10% of total DNA. It’s best to get rid of them, “so that you’re not wasting your sequencing money on re-sequencing the chloroplast again and again and again,” says Lightfoot. He uses several tricks of the trade, including growing seedlings in the dark (which also helps to get rid of starch), taking samples from the roots or simply using an off-the-shelf kit to isolate the nuclei (which should rid the prep of mitochondrial DNA).

Wing’s group doesn’t rely on kits but instead lyses and spins down the cells and then resuspends the pellets in a buffer with mild detergent like Triton™ X-100 to lyse the chloroplasts and mitochondria but leave the nuclei intact. “Then you can take [those] nuclei and purify high-molecular-weight DNA out of that into an aqueous solution that’s actually good enough for PacBio or long-read sequencing,” he says. To make BAC libraries, his group embeds the nuclei in low-melt agarose, creating a porous bag of DNA through which all the enzymes and reagents for purifying the DNA can diffuse. “The DNA never gets subjected to any shearing force,” Wing says.

NGS

Most of the funding for plant genomics has so far gone toward making reference genomes. The next steps are looking at diversity within the crops themselves and looking at their wild relatives, says Wing. “During the domestication process, you go through the bottleneck in which you eliminate all this variation, but it turns out that this variation has the potential to solve a lot of agricultural problems.”

Rebecca Grumet, a professor of horticulture at Michigan State University, and her team are looking for genetic markers (quantitative trait loci, or QTLs) of disease resistance in cucurbits, specifically watermelon, melon, cucumber and squash. Those markers can ultimately be translated into something “that can be easily screened for by a breeder who is trying to breed for resistance.” Genetic markers can help a breeder combine numerous traits: resistance not just to one disease but to several, plus everything else needed for a robust plant variety, like yield and fruit quality.

Her team is doing this by applying genotyping by sequencing (GBS), a restriction-enzyme-based, inexpensive, low-coverage NGS technique [4], in two ways. The first is based on bi-parental mating: crossing a resistant with a susceptible strain and looking at the progeny. Because they have whole-genome references, they can figure out where the trait maps: essentially Mendelian genetics with 21st-century tools. The second is GWAS; they plan to sequence about 1,000 accessions of each crop from collections derived from all over the world, especially the regions where the species originated. They will “use that information to get a sense of how much genetic diversity is out there and to organize it to establish ‘core collections,’ so that [future researchers] can potentially work with a smaller number,” Grumet says. Sequence data and seed from accessions in the core collection will be publicly available, so that down the road someone looking for resistance to another disease, or some other trait, can ask if there is variation for that in the collection and use the sequence data to map the trait.
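The logic of the bi-parental approach can be sketched in a few lines of Python. This is a simplified illustration, not the Grumet lab's pipeline: the marker names, genotype calls and phenotypes are invented toy data, and real QTL mapping relies on statistical models rather than a simple match count. The idea is that a marker sitting close to the resistance locus will be inherited together with the trait in nearly all progeny, while an unlinked marker will agree with the trait only about half the time.

```python
# Hypothetical toy data for eight progeny of a resistant x susceptible cross.
# Phenotypes: 'R' = resistant, 'S' = susceptible.
phenotypes = "RRSSRSRS"

# Genotype calls per marker, one character per progeny:
# 'R' = allele inherited from the resistant parent,
# 'S' = allele inherited from the susceptible parent.
marker_calls = {
    "m1": "RSSRRSSR",  # invented: behaves like an unlinked marker
    "m2": "RRSSRSRR",  # invented: one recombinant progeny
    "m3": "RRSSRSRS",  # invented: co-segregates perfectly with the trait
    "m4": "RRSSRSSR",  # invented: two recombinant progeny
}


def concordance(calls: str, traits: str) -> float:
    """Fraction of progeny whose marker allele matches their phenotype."""
    matches = sum(1 for call, trait in zip(calls, traits) if call == trait)
    return matches / len(traits)


scores = {name: concordance(calls, phenotypes)
          for name, calls in marker_calls.items()}

# The marker with the highest concordance is the best candidate for
# lying near the resistance locus.
best_marker = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("closest marker:", best_marker)
```

With a reference genome in hand, the position of the best-scoring marker points to the chromosomal region where the trait maps.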

Many other NGS techniques, RNA-Seq for example, are being put to great use by the plant-research community as well. Sequencing has certainly taken root, and as costs continue to drop and new generations of researchers are trained, its applications will continue to grow and bear fruit.

References

[1] Premachandran, MN, et al., “Sugarcane and polyploidy—A review,” Journal of Sugarcane Research, 1:1-15, 2011.

[2] International Rice Genome Sequencing Project, “The map-based sequence of the rice genome,” Nature, 436:793-800, 2005. [PMID: 16100779]

[3] Schnable, PS, et al., “The B73 maize genome: Complexity, diversity, and dynamics,” Science, 326:1112-1115, 2009. [PMID: 19965430]

[4] Elshire, RJ, et al., “A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species,” PLOS One, 6:e19379, 2011. [PMID: 21573248]
