by Caitlin Smith
The holy grail of genotyping is to analyze the genetic profile of an individual patient in the clinic and use the results to guide diagnosis and select the most effective treatment. Though the routine clinical use of genome-wide genotyping is still a dream, we are closer to making it a reality. Researchers continue to wrestle with challenges such as increasing throughput, decreasing cost, and organizing and storing growing mounds of data. But technology is moving them past these hurdles and closer to the clinic. According to Andro Hsu, product manager at NextBio, “we are developing the kind of comprehensive, integrated tool that clinical researchers and others will need when understanding a patient-specific constellation of data through the lens of the world’s genomic information.”
Customizable SNP detection
The importance of single-nucleotide polymorphism (SNP) detection is receiving greater attention at Roche, according to Xinmin Zhang, senior product manager for Roche NimbleGen. “We [are expanding] the capabilities of our long oligonucleotide array platform to include a novel enzyme-based assay to detect SNPs with high accuracy,” says Zhang. “This new technology, in combination with the flexibility of the NimbleGen array platform, will deliver an ultra-flexible SNP microarray that will allow researchers to quickly validate and follow up on SNPs identified through genome-wide genotyping or whole genome/exome sequencing experiments.” The platform promises a new degree of flexibility in SNP investigations, letting researchers quickly tailor their SNP arrays to their needs at the time. “With our customization capabilities, it will then be easy to create various iterations of the design to fine tune the performance and update the content of the array based on the latest results,” says Zhang. “The result will be a platform that complements sequencing as a second stage tool for validating the initial sequencing data and investigating additional samples.”
Optimization is also the goal of a new tool from Affymetrix, the Affymetrix™ Axiom™ Genotyping Solution, which helps researchers build high-density, customized genotyping panels on the company's Axiom™ myDesign™ Array Plates with 50,000 to 2.6 million SNPs. The Axiom Genomic Database includes more than seven million SNPs, of which 5.4 million are validated for researchers to use in custom arrays. “Given the high error rates associated with sequencing, particularly at low depths of coverage, I would contend that the value of using validated SNPs is somewhat underappreciated today,” says Jay Kaufman, VP of genotyping at Affymetrix. “Having the knowledge that the SNPs that you choose to include on your array are truly polymorphic variants, and the confidence that they will work in the assay you are using, accomplishes several things: it reduces risk in generating high-quality data, it increases the likelihood of success in an association study, and lastly, it maximizes the scientific returns for the money you spend on an experiment or project.” Ongoing validation of newly discovered SNPs is particularly helpful to “researchers studying unique sample populations or focusing on specific ethnic groups that may not be well represented on so-called ‘cosmopolitan’ genome-wide arrays,” says Kaufman.
Kaufman notes an emerging trend in genotyping research toward studying a wide variety of ethnic populations, “such as Latinos, Africans, or Chinese,” he says. “Currently, our Axiom Genomic Database spans four major populations: Caucasian, Japanese, Chinese and Yoruba (African). This trend lends itself very well to our myDesign custom offering with respect to being able to design a custom array to focus on relevant SNPs in a specific ethnic group.” He also points out that combining Affymetrix’s flexible custom arrays with validated markers is valuable to researchers studying rare and common diseases. “[They can] focus more of their scientific effort on defining disease phenotypes and experimental approach, instead of expending their energies on assay development,” he says.
Rare variants
An intriguing observation has recently emerged from genome-wide genotyping: “rare variants account for a significant portion of the heritability of human genetic diseases,” says Zhang. “This has led to new-generation genotyping arrays with expanded content to include rare variants identified from sequencing projects such as the 1000 Genomes Project. However, due to the fact that rare variants can occur anywhere in the genome, an array-based approach has limited power in genome-wide discovery.”
One of the biggest challenges today for genotyping researchers is how to study rare variants most effectively. Zhang notes that while whole-genome sequencing is preferred by some for studying rare variants, often it is too slow and too expensive. “Many researchers have opted to take an intermediate approach to sequencing all the coding regions (exome) of the genome, which is an efficient way to detect all variants in the 1% of the coding region in many samples,” says Zhang. “This technique provides a cost-effective solution to allow researchers to focus in on the most functionally relevant portions of the genome for their research studies. NimbleGen SeqCap EZ Exome Library v2.0, our in-solution sequence capture technology, provides an efficient capture of the exome with 2.1 million empirically rebalanced probes for optimal uniformity to reduce sequencing needs of exome capture to ~3 Gb.”
While microarrays are more cost-effective and allow for higher throughput, even the newest arrays cover less than 1% of the 3 Gb genome. Zhang therefore believes that a two-stage approach combining sequencing and arrays will offer both higher throughput and lower cost. “The first stage is comprised of whole-genome or whole-exome sequencing of select samples where the candidate SNPs (both common and rare) will be identified,” explains Zhang. “The second stage consists of using a custom SNP array that focuses on the candidate SNPs to screen additional samples from the same or separate cohorts, and the additional data will be used to select the true signals from all of the candidates.”
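The two-stage design Zhang describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SNP identifiers, p-values, and thresholds are invented for the example, and the function names (`array_fraction`, `select_candidates`, `confirm_candidates`) are hypothetical rather than part of any vendor's software. The coverage calculation also assumes, for simplicity, that each SNP on an array corresponds to a single base position in a 3 Gb genome.

```python
# Illustrative sketch of a two-stage genotyping design.
# All SNP IDs, p-values, and thresholds are made up for demonstration.

GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb human genome, as cited in the article


def array_fraction(n_snps, genome_size=GENOME_SIZE_BP):
    """Fraction of genome positions directly assayed by an array of n_snps
    (assumes one base position per SNP)."""
    return n_snps / genome_size


def select_candidates(stage1_pvalues, threshold):
    """Stage 1: keep SNPs whose sequencing-based association p-value
    falls below a lenient discovery threshold."""
    return {snp for snp, p in stage1_pvalues.items() if p < threshold}


def confirm_candidates(candidates, stage2_pvalues, threshold):
    """Stage 2: of the candidates placed on the custom array, keep those
    that also pass a stricter threshold in the additional samples.
    Candidates with no stage 2 result are treated as unconfirmed."""
    return sorted(
        snp for snp in candidates
        if stage2_pvalues.get(snp, 1.0) < threshold
    )


# Hypothetical stage 1 results from sequencing a few select samples.
stage1 = {"snp_A": 1e-4, "snp_B": 0.2, "snp_C": 5e-3, "snp_D": 8e-4}
# Hypothetical stage 2 results from a custom array run on a larger cohort.
stage2 = {"snp_A": 1e-7, "snp_C": 0.3, "snp_D": 2e-6}

candidates = select_candidates(stage1, threshold=1e-2)
confirmed = confirm_candidates(candidates, stage2, threshold=1e-5)

print(sorted(candidates))                   # candidates carried to the array
print(confirmed)                            # signals surviving both stages
print(f"{array_fraction(2_600_000):.4%}")   # share of the genome on a 2.6M-SNP array
```

With these made-up numbers, three of the four stage 1 SNPs become candidates and two survive the second stage. Under the one-base-per-SNP assumption, an array of 2.6 million SNPs (the upper end of the panel sizes mentioned earlier) touches well under 1% of the genome, which is why discovery is left to sequencing and the array is reserved for follow-up.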
Using public datasets in your work
A common lament among genotyping researchers is that the amount of data will soon grow to the point of being unmanageable. NextBio integrates and mines publicly available genomic data, along with results from its own “correlation engine.” Its results include correlations drawn from genome-wide association study (GWAS) data available from the National Institutes of Health and the Wellcome Trust Case-Control Consortium. “Researchers can explore results derived from public GWAS studies or import their own private list of SNPs, novel variants, or genes identified in a GWAS or sequencing study,” says Hsu. “These private data can be used to query against thousands of publicly available datasets to see if RNA expression, epigenetic, or other data can suggest a common mechanism linking or explaining the GWAS results. Using NextBio, researchers can place novel genetic variants into context by using public data to understand what is going on in an associated genomic region.”
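As a rough illustration of what querying a private list against public datasets can mean in practice, the sketch below ranks datasets by how many genes they share with a researcher's own list. It is not NextBio's method or API, which the article does not describe. The function name `rank_by_overlap`, the dataset names, and the gene symbols are all hypothetical placeholders, and a real system would use statistical scoring rather than a simple overlap count.

```python
# Hypothetical sketch: rank public datasets by overlap with a private gene list.
# Dataset names and gene symbols are placeholders, not real NextBio content.


def rank_by_overlap(private_genes, public_datasets):
    """Return (dataset_name, shared_genes) pairs, largest overlap first.
    Datasets that share no genes with the private list are dropped."""
    private = set(private_genes)
    hits = []
    for name, genes in public_datasets.items():
        shared = sorted(private & set(genes))
        if shared:
            hits.append((name, shared))
    # Sort by overlap size (descending), then by name for a stable order.
    hits.sort(key=lambda item: (-len(item[1]), item[0]))
    return hits


# A private list of genes near GWAS hits (placeholder symbols).
private_genes = ["GENE1", "GENE2", "GENE3"]

# Public datasets reduced to the genes each one flags (placeholders).
public_datasets = {
    "expression_study_X": ["GENE2", "GENE3", "GENE9"],
    "methylation_study_Y": ["GENE3"],
    "expression_study_Z": ["GENE7", "GENE8"],
}

for name, shared in rank_by_overlap(private_genes, public_datasets):
    print(name, shared)
```

Here the expression dataset sharing two genes ranks first, the methylation dataset sharing one gene ranks second, and the dataset with no shared genes is omitted.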
A unique feature of NextBio is that it gives researchers access to thousands of datasets that would otherwise be unknown. “These data often contain ‘unpublished’ experimental findings not captured in an abstract’s limited space and focus,” says Hsu. “But because extensive processing, normalization, and curation is required, much of these data remain inaccessible to average researchers. NextBio’s service allows researchers to explore thousands of datasets and billions of correlations with a single click of the mouse.”
Hsu believes that as next-generation sequencing technology matures, so will our abilities to care for individual patients with personalized medicine performed at the molecular level. “This wealth of data can help to identify private genetic mutations or unique expression signatures that may underlie a patient’s particular condition,” says Hsu. “But all this information can’t sit in a vacuum; to contextualize these discoveries requires a ready ability to tap into all the genome-wide data the world has already generated. We do see far by standing on the shoulders of giants—but we can see much farther by climbing the mountains of data they’ve produced.”