Mining Genetic Variation: The Latest SNP Analysis Tools

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

If there’s one thing the world learned from the 1000 Genomes Project, it is this: For all our similarities, humans are remarkably unique. By analyzing the sequences of 1,092 human genomes, the Project identified nearly 40 million variable sites, including 1.4 million short insertion/deletion (indel) differences, 14,000-plus “larger deletions,” and some 38 million single nucleotide polymorphisms. [1]

That means that humans differ from one another about every 75 bases, on average, and that density will surely increase. By the end of the 1000 Genomes Project’s Phase 3, which involves more than 2,500 subjects in total, the SNP tally may reach anywhere from 60 million to 80 million variants, says Lisa Brooks, Director of the Genetic Variation Program at the National Human Genome Research Institute. “It’s hard to predict.”
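The “every 75 bases” figure is simple division, and a rough sketch makes the Phase 3 projection concrete. The genome length used here (a round 3 billion bases) is an assumption not stated in the article, and the function name is illustrative only.

```python
# Rough spacing between variable sites, assuming a ~3-billion-base genome.
# GENOME_LENGTH_BASES is an assumed round figure, not a value from the article.
GENOME_LENGTH_BASES = 3_000_000_000


def bases_per_variant(variant_count, genome_length=GENOME_LENGTH_BASES):
    """Return the average number of bases between variable sites."""
    if variant_count <= 0:
        raise ValueError("variant_count must be positive")
    return genome_length / variant_count


# Tallies quoted in the article: ~40 million sites today,
# and a possible 60 to 80 million SNPs after Phase 3.
for tally in (40_000_000, 60_000_000, 80_000_000):
    print(f"{tally:>11,} variants -> one every {bases_per_variant(tally):.1f} bases")
```

Under that assumption, 40 million sites works out to one every 75 bases, and a tally of 60 million to 80 million would tighten the spacing to roughly one every 38 to 50 bases.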

As their name suggests, single nucleotide polymorphisms (SNPs) are single-base sites of variation between people’s genomes, and they are, by far, the most common form of genetic polymorphism. At a given location on a specific chromosome, some fraction of a population contains, say, an A, whereas others have a G. Identifying such differences is critical for several reasons. For one thing, they may alter protein-coding sequences or their associated regulatory control elements. But they may also serve as genetic navigational aids, helping researchers home in on the genetic loci underlying diseases or phenotypic traits.
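To make the A-versus-G example concrete, here is a minimal sketch that tallies allele frequencies at one site. The genotype list is made up for illustration and the function name is hypothetical; real pipelines read genotypes from variant-call files rather than hand-typed strings.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return each allele's share of all chromosomes sampled at one site.

    Each genotype is a two-letter string such as "AG", one letter per
    chromosome copy (this assumes a diploid organism).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


# Five hypothetical individuals genotyped at a single A/G SNP.
sample = ["AA", "AG", "GG", "AG", "AA"]
print(allele_frequencies(sample))
```

Counting letters rather than people matters here: each individual contributes two chromosome copies, so a heterozygote adds one to each allele’s tally.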

SNPs are just one of a number of variant types researchers have identified, including indels, copy number variation, repeat elements, and more. “[SNPs] are the most common, but scientifically, conceptually, if you are trying to understand why one person is at risk for a disease, there’s no essential difference between SNPs and [other kinds of] variants,” Brooks says.

Be that as it may, there does exist a richer toolset for SNP analysis than for other classes of genetic variants. Whether your goal is SNP identification, genotyping tens of thousands of variants in a small sample population simultaneously, or validating a handful of SNPs in a large sample set, there’s an analytical approach that’s right for you.

SNP analysis technologies

Stephen Chanock, Chief of the Laboratory of Translational Genomics and Director of the Cancer Genomics Research Laboratory at the National Cancer Institute, says his facility has SNP analysis platforms from Affymetrix, Illumina, and Life Technologies at its disposal.

For high-end genotyping work, the lab mostly relies on Illumina microarrays – off-the-shelf tools like the Infinium® HumanOmniExpress and Infinium HumanOmni5 BeadChip arrays, for instance, as well as custom iSelect microarray designs.

The four-sample Infinium HumanOmni5 BeadChip microarray covers the human genome with more than 4.3 million “tag SNPs” and “>240,000 exonic markers,” according to product literature. The eight-sample Infinium HumanOmniExpress BeadChip includes more than 700,000 genomic markers.

For smaller numbers of polymorphisms on larger populations, Chanock’s lab uses Life Technologies’ TaqMan® qPCR-based assays. According to Dennis Fantin, Director of Genetic Variation, TaqMan Assays, and the OpenArray Platforms at Life Technologies, the company now has over 7 million SNP assays in its portfolio, mostly human variants, but also including mouse and other model organisms.

Users can run TaqMan assays in a variety of formats, Fantin says, from standalone assays in single tubes or microtiter plates, to TaqMan Array Cards, to 3,072-“well” OpenArray® nanofluidic arrays. (All of these formats are supported on Life Technologies’ QuantStudio™ 12K Flex Real-Time PCR System.) The company recently entered into a comarketing agreement with Douglas Scientific to provide TaqMan assays for Douglas Scientific’s Array Tape™ architecture. Array Tape performs the assays in nanoliter-scale microwells on a polypropylene strip that unrolls like a roll of tape, allowing labs to run more than 110,000 genotypes per day, says Fantin.

Chanock’s lab runs its TaqMan assays on Fluidigm’s Dynamic Array™ integrated fluidic circuits, which can multiplex up to 96 assays for each of 96 samples, for a total of up to 9,216 parallel reactions.

Ioannis Ragoussis, Head of Genome Sciences at McGill University and Genome Quebec Innovation Centre in Montreal, also uses off-the-shelf and custom Infinium arrays for most of his genotyping work. But for lower SNP numbers, Ragoussis typically opts for Sequenom’s MassARRAY, a mass spectrometry-based platform that can handle up to about 40 variants in parallel (though TaqMan-capable ABI 7900 instruments are also available).

“The world is divided, in a way, between Sequenom users and TaqMan users,” Ragoussis says. But the platforms really are comparable, he stresses. MassARRAY requires an upfront capital investment to buy the dedicated MALDI-TOF mass spectrometer required to perform the assays, but the individual assays are relatively inexpensive, he says. TaqMan assays are pricier, but most labs already have access to a compatible real-time PCR instrument.

Ragoussis has also tested the Affymetrix Axiom® platform for SNP analysis, with “positive results.” The Axiom Genotyping Solution is a microplate-based microarray architecture in which 96 samples are analyzed simultaneously on 96 discrete microarrays. Both custom and catalog arrays are available, including the Axiom Biobank Genotyping Array with “over 600,000 total variants including the ability to add 150,000 variants of your choosing.”

Axiom Array Plates currently are available in a 96-well format with 800,000 SNPs per array, but a 384-well design (50,000 variants per array) will be released soon, says Shantanu Kaushikkar, Product Manager for Agrigenomics products at Affymetrix, enabling throughput of “a little more than 3,000 samples per week.”

According to Kaushikkar, agrigenomics SNP genotyping represents a growing application area for Affymetrix, in part because of the platform’s customizability (a key consideration when studying the genetics of non-standard organisms such as salmon and wheat), but especially because of the company’s software, which is capable of automated genotype calling.

“We are the only vendor that can provide automated analysis for the agrigenomics species,” says Kaushikkar. For instance, he says, wheat “is a particularly difficult organism.” It has a “huge” 17-gigabase, hexaploid genome (in other words, each cell contains six copies of each chromosome). “That makes [calling] SNPs challenging,” he says.

The rise of sequencing

Of course, there is another method for SNP genotyping that is gaining wider use. The proliferation of next-generation DNA sequencers and sequence-enrichment chemistries, coupled with falling per-run and hardware pricing, means it is becoming reasonable for researchers to use the technology not just for SNP discovery, but also validation and analysis.

“When sequencing is as cheap as water, why genotype anybody?” says Brooks. “It depends on your question.” For many applications, she says, things like microarrays work perfectly well. But for rare variants and variant discovery, sequencing is really the only way to go.

Sequencing offers several theoretical benefits over technologies like microarrays. Most obviously, it can be used both to identify new variants (discovery) and to validate and genotype already known variants. In contrast, arrays can only score variants they were designed to detect. Sequencing can be used to concentrate on specific regions (via sequence enrichment) or to scan the entire genome, and it produces tons upon tons of data that can be re-examined as new sequence features and analytic tools come to light.

On the other hand, says Chanock, the bioinformatics are not quite as settled for sequencing as they are for microarrays. Different labs use different SNP-calling algorithms and pipelines, meaning they can produce different outcomes. The datasets are also much larger and harder to handle, especially for labs not used to processing such data. “Calling [SNPs from sequence data] is a much dirtier and more difficult process,” he says.

Nevertheless, Chanock’s lab has been testing the Life Technologies Ion PGM™ and Ion Proton™ platforms for variant validation. But though prices are falling, he says, the day when sequencing is cost-competitive with arrays is still “in the distance.”

“Overall, I think … one direction in the future is to genotype through sequencing,” says Ragoussis, whose Centre’s validation platform is also testing the Ion PGM as a validation or targeted SNP discovery platform. That’s especially true for validation or diagnostic studies, he says, in which targeted genomic regions are sequenced on low-cost sequencers like the Ion PGM or Illumina MiSeq.

The black hole

There are, of course, other genotyping tools out there. And they can cover as much or as little of the genome as a researcher might want. Yet according to Chanock, there does remain a “large, gaping hole” in the world of genotyping.

“If you want to analyze between 100 and 10,000 SNPs [per sample], there really is no good, nimble, low-cost technology available,” he says.

TaqMan and Sequenom, for instance, enable researchers to tackle small numbers of SNPs on a large number of samples. Microarrays profile SNPs genome-wide, generally on smaller numbers of samples.

“But if you want [to analyze] 476 SNPs, that’s hard to get at,” says Chanock. Custom arrays, he explains, are not cost-effective without large sample sets and relatively large numbers of variants, whereas the only way to use technologies like TaqMan or Sequenom for that number of polymorphisms is to run multiple panels per sample.
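A quick calculation shows why a mid-sized panel is awkward. The sketch below uses the per-run capacities quoted earlier in this article (about 40 variants for MassARRAY, 96 assays on a Fluidigm Dynamic Array) and counts how many panels one sample would need; the function name is illustrative.

```python
import math


def panels_needed(snp_count, assays_per_panel):
    """Return how many multiplexed panels one sample needs to cover snp_count SNPs."""
    if assays_per_panel <= 0:
        raise ValueError("assays_per_panel must be positive")
    return math.ceil(snp_count / assays_per_panel)


# Capacities as quoted in the article; 476 is Chanock's example.
for platform, capacity in (("MassARRAY (~40-plex)", 40), ("Dynamic Array (96 assays)", 96)):
    print(f"{platform}: {panels_needed(476, capacity)} panels per sample")
```

For 476 SNPs that comes to 12 MassARRAY panels, or 5 rounds of 96 assays, for every sample in the study.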

“It’s very hard for arrays to come down into the area where you’re doing several hundred SNPs,” agrees Fantin. “And it’s difficult for these technologies [like TaqMan] to move up into that thousand-SNP area as well.”

Ragoussis also echoes Chanock’s assessment, though he notes that some platforms, like Illumina’s GoldenGate® assays, do support that SNP range. Discoveries tend to be made on high-density arrays, which now can include upwards of four million SNP assays, sometimes supplemented by low-coverage (e.g., 4x or lower) sequencing reads, as shown recently by Broad Institute researcher David Altshuler. [2] The panels are then whittled down somewhat to between 10,000 and 200,000 variants, which can be tested on less-expensive, focused or custom microarray designs.

“That’s why it doesn’t make sense any more to do, say, 1,000 SNPs,” Ragoussis says. Either researchers make do with fewer polymorphisms, or they take the plunge and grab the extraneous data that comes from analyzing tens of thousands of variants on an array. “Once a collection of individual assays goes over a cost of $10 per sample, then for large sample numbers a jump to custom SNP array technology may be preferable,” he says.
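Ragoussis’s rule of thumb can be written as a simple check. In this sketch the per-genotype price and the cutoff for a “large” sample set are hypothetical placeholders, since the article gives only the $10-per-sample threshold.

```python
def consider_custom_array(snp_count, cost_per_genotype, sample_count,
                          threshold_per_sample=10.0, large_sample_cutoff=1000):
    """Return True when individual assays exceed the per-sample threshold
    on a large sample set, the case where a custom array may be preferable.

    threshold_per_sample comes from the article ($10); cost_per_genotype and
    large_sample_cutoff are assumed values supplied by the caller.
    """
    per_sample_cost = snp_count * cost_per_genotype
    return per_sample_cost > threshold_per_sample and sample_count >= large_sample_cutoff


# Hypothetical pricing: $0.05 per genotype.
print(consider_custom_array(snp_count=40, cost_per_genotype=0.05, sample_count=5000))    # $2 per sample
print(consider_custom_array(snp_count=1000, cost_per_genotype=0.05, sample_count=5000))  # $50 per sample
```

The check only flags when a custom array is worth pricing out; actual array costs depend on design fees and volume, which the article does not quantify.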

References

[1] The 1000 Genomes Project Consortium, “An integrated map of genetic variation from 1,092 human genomes,” Nature 491, 56–65, 2012.

[2] J. Flannick et al., “Efficiency and power as a function of sequence coverage, SNP array density, and imputation,” PLoS Comput Biol, 8(7): e1002604, 2012.

The image at the top of the page is from Fluidigm.
