Featured Article
Wednesday, March 17, 2010
by Jeffrey M. Perkel
There's change in the air in the world of transcriptome analysis.
Once the domain of microarrays—the previous decade's hot technology—transcriptome analysis (that is, gene expression monitoring on a genome-wide scale) is now associated with the current "it" technology: Next-generation sequencing.
"We definitely see sequencing playing an increasingly important role in transcriptome analysis," says Jason Liu, senior director for the SOLiD PI product line at Life Technologies.
Shawn Baker, market manager for expression and epigenetics at Illumina, which markets both gene expression microarrays and the popular Genome Analyzer II (GAII) DNA sequencer, echoes that sentiment, saying, "We definitely see a transition starting to happen."
One recent example: Earlier this month, two independent research teams (one based at the University of Chicago, the other in Spain, Switzerland, and the UK) published back-to-back letters in Nature describing the sequencing of 69 Nigerian and 60 European transcriptomes, respectively [1,2].
"Our goal was to understand how genetic variation influences different aspects of gene regulation," says Joseph Pickrell, a graduate student and lead author of the University of Chicago publication.
Though such data can be collected using arrays (sequencing was previously used mostly for transcript discovery), with sequencing "[y]ou are not limited to what you probe," he explains. "You can discover new things." In other words, sequencing approaches, unlike microarrays, are unbiased—"hypothesis-neutral," in Baker's words—meaning you can find things you weren't looking for, and reanalyze the data later as new discoveries come to light.
Pickrell's analysis, involving some 1.2 billion single-end reads from a GAII, each 35 or 46 base pairs long, identified novel splice sites and nearly 4,000 previously unknown exons, all in six months.
"The way the technology is progressing in this field is pretty incredible," Pickrell says. "Even from start to end of the experiment, we probably doubled the amount of output per [sequencing] run, just because of upgrades from Illumina or minor tweaks to the protocol."
That trend will likely continue, as both Illumina and Life Technologies (which offers the SOLiD next-gen sequencing system) continually update and improve their sequencing portfolios.
Both companies recently released new instrumentation. The Illumina HiSeq 2000, released in January, can crank out over a billion high-quality single reads (or over two billion paired-end reads) per run (~200 Gb total) using the same sequencing-by-synthesis approach as the GAIIx but with approximately six times the output, says Baker—enough for between 20 and 200 complete transcriptomes, depending on the desired read depth.
"Traditionally arrays have had the advantage of lower price and higher throughput, but with the HiSeq 2000 that is starting to merge," says Baker. "You can profile nearly 200 expression samples per run taking about two days, with less than $200 per sample," he estimates.
Life Technologies' new SOLiD PI instrument—a "lower-throughput" sibling to the company's flagship SOLiD 4 sequencer—"can deliver up to 800 million unique 50-base pair tags in a full-capacity run," says Liu, or enough for up to 16 complete transcriptomes (without barcoding).
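The transcriptome counts quoted above follow from a simple division: reads per run over reads per sample. The Python sketch below illustrates that arithmetic. The per-sample read depths (5 million and 50 million) are not stated by either company; they are back-calculated here from the quoted figures and should be treated as assumptions. The function name is hypothetical, and "over a billion" reads is rounded down to one billion.

```python
def samples_per_run(reads_per_run: int, reads_per_sample: int) -> int:
    """Whole number of samples that fit in one run at a given read depth."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return reads_per_run // reads_per_sample


# Figures quoted in the article (the HiSeq number is a floor: "over a billion").
HISEQ_2000_READS = 1_000_000_000
SOLID_PI_TAGS = 800_000_000

# Assumed read depths, inferred from the quoted sample counts rather than
# taken from any vendor specification.
SHALLOW_DEPTH = 5_000_000
DEEP_DEPTH = 50_000_000

if __name__ == "__main__":
    print("HiSeq 2000, shallow:", samples_per_run(HISEQ_2000_READS, SHALLOW_DEPTH))
    print("HiSeq 2000, deep:   ", samples_per_run(HISEQ_2000_READS, DEEP_DEPTH))
    print("SOLiD PI, deep:     ", samples_per_run(SOLID_PI_TAGS, DEEP_DEPTH))
```

Under these assumed depths the sketch reproduces the ranges in the text: 200 and 20 samples for the HiSeq 2000, and 16 for the SOLiD PI. Actual capacity would also depend on barcoding, read quality, and how many reads map usefully.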
According to Liu, the SOLiD PI represents a new class of next-gen sequencers, designed not for genome centers, but for core labs, individual researchers and clinical research labs (the "PI" stands for "principal investigator"). "The pace of market decentralization in next-gen has picked up," he says. "We see more and more researchers outside of traditional sequencing centers adopting next-gen platforms."
At $230,000 (instrument list price), the PI costs less than half as much as the SOLiD 4. With a faster turnaround time and an automated, streamlined workflow, it was designed from the ground up to help those researchers do the work themselves, he says, rather than farming it out. "The design focus has moved from [one focused on] high-throughput to more on quality of the system, speed and ease of use," he says. (Illumina's GAIIe is also a "personal" sequencer; it costs about half as much as the core facility-ready GAIIx.)
Sequencing offers other advantages over microarrays, as well, including reduced background, higher sensitivity, improved dynamic range, and greater information density. Another advantage, adds Baker, is that sequencing is more flexible: "Any DNA or RNA molecule you can capture in a library format, you can sequence."
That's not to say gene-expression microarrays are going the way of the dodo.
For one thing, not everyone has access to a next-gen DNA sequencer, and even those who do may not be able to get instrument time when they need it, says Alicia Burt, director of microarray applications at Agilent Technologies.
More critically, sequencing is generally time-consuming (one to two weeks per run, typically), expensive, and bioinformatically challenging. All of this presents a problem for independent researchers outside of major sequencing centers, and especially for diagnostics developers.
"I think there will be some basal usage in research labs for a very long time, and in the diagnostics area, there is a need for arrays for the foreseeable future," says Burt.
A number of companies, including Affymetrix, Agilent, Illumina, and Roche NimbleGen, offer off-the-shelf "catalog" arrays; some, such as Agilent and Roche NimbleGen, offer custom designs, as well.
Agilent offers some 25 different catalog arrays, Burt says, most in a 4x44K format with 60-mer oligonucleotides and costing from $250 to $270 apiece (lower for bulk orders); custom arrays can be designed using the company's eArray software. Roche NimbleGen offers more than 200 different prokaryote-focused designs (the largest selection in the industry, according to the company), as well as key eukaryotic designs in 1x385K, 4x72K, and 12x135K formats (also using 60-mer oligos), says Tsetska Takova, director of array and reagents.
Illumina has taken a slightly different approach with its off-the-shelf designs, says Baker: "The arrays we think will last longer are the ones that offer a price advantage and a throughput advantage [relative to sequencing]." The company's flagship HumanHT12 genome array contains 12 subarrays (each comprising ~48,000 probe sequences), at a cost of $75/sample. (Illumina doesn't offer custom gene-expression arrays.)
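To make the price gap concrete, the per-sample figures quoted in this article can be scaled to a whole study. The Python sketch below does so under loud assumptions: it uses the $75/sample array price cited above and treats Baker's "less than $200 per sample" HiSeq 2000 estimate as an upper bound. It ignores instrument purchase, labor, and data analysis, and the function name is hypothetical.

```python
def study_cost(n_samples: int, cost_per_sample: float) -> float:
    """Data-acquisition cost for a study, ignoring instruments, labor, and analysis."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return n_samples * cost_per_sample


# Per-sample figures quoted in the article.
ARRAY_COST = 75.0        # HumanHT12 array, per sample
SEQ_COST_UPPER = 200.0   # assumed upper bound from "less than $200 per sample"

if __name__ == "__main__":
    for n in (100, 1000):
        array_total = study_cost(n, ARRAY_COST)
        seq_total = study_cost(n, SEQ_COST_UPPER)
        print(f"{n} samples: arrays ${array_total:,.0f}, "
              f"sequencing up to ${seq_total:,.0f}")
```

On these numbers alone, a 1,000-sample study would run about $75,000 on arrays versus up to $200,000 by sequencing. The real difference depends heavily on costs the sketch leaves out.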
Though the information may not be as rich as sequencing provides, microarray data still can come fast and furious. Using 12x135K NimbleGen arrays (12 arrays per slide) and a 12-bay hybridization station, for instance, researchers can process 144 samples simultaneously to obtain high-quality data in a fraction of the time, and at a fraction of the cost, of sequencing, says Takova.
Service provider LC Sciences, which focuses specifically on the microRNA component of the transcriptome, splits the difference between the microarray and sequencing paradigms with a workflow it calls Seq-Array. Basically, customers who are working on, say, a poorly studied organism, can have its transcriptome sequenced. Then, following data analysis to identify interesting non-coding transcripts, the company designs custom microarrays with up to 30,000 probes apiece based on its microfluidic uParaflo technology, for analysis of large sample sets.
"So we take the best of both technologies and put them together," explains Chris Hebel, the company's Vice President of Business Development.
LC Sciences is also using its uParaflo technology to address a problem unique to microarray researchers. Arrays tend to look at either mRNA or miRNA, but not both. Yet more and more researchers would like both datasets, so they can, for instance, correlate miRNA expression with changes in mRNAs. Hebel says a combined mRNA/miRNA microarray is "still a couple of months off."
Independently, Agilent is addressing the same problem, using software. It also is developing transcriptome arrays using one probe per exon, rather than one per gene. This, says Burt, will enable researchers to study exon utilization, which cannot be done using current array designs. "That is something we hope to market within this calendar year," she says.
For the moment, both sequencing and microarray technologies have a place in the transcriptome marketplace; if nothing else, the current cost of data acquisition and analysis makes sequencing prohibitively expensive for most researchers, especially for large studies requiring hundreds or thousands of samples, says Takova. That said, given the pace of technology development and price changes in the sequencing market, evolution is inevitable; it's a sure bet that where and how researchers use these technologies will change as well.
References:
1. JK Pickrell et al., "Understanding mechanisms underlying human gene expression variation with RNA sequencing," Nature, published online 10 March 2010.
2. SB Montgomery et al., "Transcriptome genetics using second generation sequencing in a Caucasian population," Nature, published online 10 March 2010.