by Caitlin Smith
Real-time RT-PCR remains the method of choice for analyzing changes in gene expression, especially for quantifying small amounts of mRNA. Most analyses involve quantifying the signal of the PCR products. Ironically, as technological improvements push the method toward detecting single transcripts, researchers who gather ever-larger sets of gene expression data at ever-faster paces with newer high-throughput methods face growing challenges in data analysis. One basic dilemma, for example, is what to use as a control when measuring the expression of many genes over a large dynamic range.
Absolute and relative quantification
The two most common methods to analyze real-time RT-PCR data are absolute quantification and relative quantification. Absolute quantification estimates the starting copy number by relating the PCR signal to a standard curve. The reliability and accuracy of this method depend on having identical amplification efficiencies for both the target and the calibration curve in both the RT reaction and the subsequent real-time PCR reaction. Thus the calibration curve model must be thoroughly validated. Potential problems include determining exact standard concentrations, standard design and production, and standard stability over time. Standards can be recombinant plasmid DNA, genomic DNA, RT-PCR product, or oligonucleotides, but each has its own associated caveats.
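To make the standard-curve idea concrete, here is a minimal sketch in Python. It assumes the usual linear relationship between Ct and the log10 of the starting copy number; the dilution series and Ct values are hypothetical, and the function names are illustrative rather than taken from any instrument software.

```python
import math

def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 corresponds to doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def copies_from_ct(ct, slope, intercept):
    """Invert the fitted curve to estimate starting copies for an unknown."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical tenfold dilution series of a standard (copies per reaction)
standard_copies = [1e6, 1e5, 1e4, 1e3, 1e2]
standard_cts = [15.00, 18.32, 21.64, 24.96, 28.28]

slope, intercept = fit_standard_curve(standard_copies, standard_cts)
efficiency = amplification_efficiency(slope)
unknown_copies = copies_from_ct(20.0, slope, intercept)

print(f"slope={slope:.2f}, efficiency={efficiency:.1%}")
print(f"estimated copies at Ct 20: {unknown_copies:.0f}")
```

A slope near -3.32 corresponds to roughly 100% efficiency. The estimate for the unknown is only as good as the assumption that the standard and the target amplify with the same efficiency.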
Relative quantification estimates the change in mRNA expression levels by comparing the PCR signal of the target transcript to that of a control (a so-called housekeeping or reference gene). It is easier to use than absolute quantification because it does not require a calibration curve, and it is useful for most situations that investigate physiological changes in gene expression. Normalizing the target gene to an endogenous standard is a good idea but is also fraught with complications: for example, housekeeping genes are known to be regulated, and their expression can vary under different experimental conditions.
Various mathematical models exist to carry out relative quantification, and some have been incorporated into different software packages for data analysis (see below). “All users apply a mathematical quantification model that transforms processed raw instrument data (Ct values) into normalized relative quantities,” explains Jo Vandesompele, a professor at the Center for Medical Genetics at Ghent University Hospital in Belgium. “The main differences lie in the assumption that the user/software makes. The first model (Livak and Schmittgen, 2001¹) assumes optimal (100%) PCR efficiency for both the target of interest and the reference targets.”
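The first model is the 2^(-ΔΔCt) calculation named in the Livak and Schmittgen title. A minimal sketch, assuming one target, one reference gene, and perfect doubling per cycle; the Ct values and the function name are hypothetical:

```python
def livak_fold_change(ct_target_treated, ct_ref_treated,
                      ct_target_control, ct_ref_control):
    """Relative expression by 2^(-ddCt), assuming 100% PCR efficiency."""
    # Normalize the target to the reference gene within each sample
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control (calibrator) sample
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the treated sample while the reference gene is unchanged.
fold_change = livak_fold_change(22.0, 18.0, 24.0, 18.0)
print(fold_change)  # 4.0
```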
The problem with assuming 100% efficiency is that PCR efficiency is not always optimal. Furthermore, the PCR efficiencies of the target and reference are not always equal. So Michael Pfaffl and colleagues² improved the model by correcting for differences in PCR efficiency. “The main disadvantage of both models is that these fail to incorporate more than one reference gene for more accurate normalization,” says Vandesompele, who soon after proposed that more accurate normalization can be achieved by using more than one reference gene³.
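The efficiency-corrected ratio can be sketched as follows. This assumes each efficiency is expressed as an amplification factor per cycle (2.0 for perfect doubling) and has been measured separately, for example from a dilution series; the numbers and the function name are hypothetical.

```python
def pfaffl_ratio(e_target, e_ref,
                 ct_target_control, ct_target_sample,
                 ct_ref_control, ct_ref_sample):
    """Efficiency-corrected expression ratio of sample versus control.

    e_target and e_ref are amplification factors per cycle
    (2.0 = 100% efficiency), assumed to be known in advance.
    """
    target_term = e_target ** (ct_target_control - ct_target_sample)
    ref_term = e_ref ** (ct_ref_control - ct_ref_sample)
    return target_term / ref_term

# Hypothetical case: the target assay amplifies at 1.9-fold per cycle
corrected = pfaffl_ratio(1.9, 2.0, 24.0, 22.0, 18.0, 18.0)
# With both factors set to 2.0, the result matches the 2^(-ddCt) value
uncorrected = pfaffl_ratio(2.0, 2.0, 24.0, 22.0, 18.0, 18.0)
print(round(corrected, 2), uncorrected)  # 3.61 4.0
```

In this example, a modest drop in target efficiency changes the estimated fold change from 4.0 to about 3.6, which is the kind of discrepancy the correction is meant to catch.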
Improving accuracy using multiple reference genes
Until Vandesompele’s work³, there had been no systematic determination of the errors incurred by the common practice of using only one reference gene, let alone solutions to the problems those errors posed. He and his co-workers identified strategies to find the minimum number of genes needed to calculate a reliable normalization factor, as well as ways to identify the most stably expressed reference genes in the tissues of interest; they developed free software, called geNorm, to do this. After analyzing ten different types of reference genes, they found that normalizing with only one reference gene led to large errors in many of the samples. By analyzing microarray data, they found that accurate normalization could be achieved using the geometric mean of several carefully selected reference genes.
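The two ideas in that work, a per-sample normalization factor taken as a geometric mean and a stability measure for ranking candidate reference genes, can be sketched in a few lines. This is a simplified illustration and not the geNorm software itself: the stability value here is the average standard deviation of the log2 ratios between one gene and each other candidate, the gene names are placeholders, and the quantities are invented.

```python
import math
import statistics

def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalization_factors(ref_quantities):
    """One normalization factor per sample from several reference genes.

    ref_quantities maps gene name -> list of relative quantities,
    with one entry per sample in the same order for every gene.
    """
    genes = list(ref_quantities)
    n_samples = len(ref_quantities[genes[0]])
    return [geometric_mean([ref_quantities[g][i] for g in genes])
            for i in range(n_samples)]

def stability_values(ref_quantities):
    """Average pairwise variation of each gene against the others.

    Lower values suggest more stable expression across the samples.
    """
    result = {}
    for gene_j, q_j in ref_quantities.items():
        pairwise = []
        for gene_k, q_k in ref_quantities.items():
            if gene_k == gene_j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(q_j, q_k)]
            pairwise.append(statistics.stdev(log_ratios))
        result[gene_j] = sum(pairwise) / len(pairwise)
    return result

# Hypothetical relative quantities for three candidate genes in three samples
candidates = {
    "REF_A": [1.0, 2.0, 4.0],
    "REF_B": [1.0, 2.0, 4.0],
    "REF_C": [1.0, 4.0, 1.0],
}
m_values = stability_values(candidates)
factors = normalization_factors(candidates)
least_stable = max(m_values, key=m_values.get)
print(least_stable, [round(f, 3) for f in factors])
```

Here REF_A and REF_B track each other across samples while REF_C does not, so REF_C receives the highest value and would be the first candidate to drop.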
Reducing errors in real-time RT-PCR by normalizing to several reference genes is especially important for estimating very small changes in expression. Vandesompele has been developing his model further. “Together with my colleague and Biogazelle co-founder Jan Hellemans, I developed a general quantification model that corrects for possible PCR efficiency differences and employs multiple reference genes for normalization⁴.” This general model is at the heart of a software package called qBasePlus, developed and distributed by Biogazelle. “Our quantification model is the only one that knows how to use multiple reference genes for improved normalization,” says Vandesompele. “I genuinely believe our qBasePlus model is the best, because it is compatible with all known quantification models, and accurately propagates all errors during the calculations (errors on replicate measurements, and errors on estimated parameters such as PCR efficiency). This results in powerful analyses with high confidence.” Another benefit of qBasePlus is that it reads exported files from all major real-time PCR instruments.
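A generalized calculation of this kind, combining gene-specific efficiencies with several reference genes, might look like the sketch below. It is an illustration of the idea rather than the qBasePlus implementation: it omits error propagation entirely, and the tuple layout, function names, and Ct values are assumptions made for the example.

```python
import math

def relative_quantity(efficiency, ct_calibrator, ct_sample):
    """Quantity relative to a calibrator, given an amplification factor."""
    return efficiency ** (ct_calibrator - ct_sample)

def normalized_relative_quantity(target, references):
    """Target quantity divided by the geometric mean of reference quantities.

    target and each item in references are tuples of
    (efficiency, ct_calibrator, ct_sample); efficiency is the
    amplification factor per cycle (2.0 = 100%).
    """
    rq_target = relative_quantity(*target)
    rq_refs = [relative_quantity(*ref) for ref in references]
    norm_factor = math.exp(sum(math.log(r) for r in rq_refs) / len(rq_refs))
    return rq_target / norm_factor

# Hypothetical target and two reference genes, all at 100% efficiency
nrq = normalized_relative_quantity(
    (2.0, 24.0, 22.0),
    [(2.0, 18.0, 17.0), (2.0, 20.0, 20.0)],
)
# With one reference gene and perfect efficiency, this is 2^(-ddCt)
single_ref = normalized_relative_quantity((2.0, 24.0, 22.0), [(2.0, 18.0, 18.0)])
print(round(nrq, 4), single_ref)
```

With a single reference gene and amplification factors of 2.0, the calculation reduces to the first model; with unequal factors and one reference it reduces to the efficiency-corrected ratio.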
Software to crunch your numbers
Several companies offer their own software for analyzing data from gene expression studies. Stratagene, now part of Agilent Technologies, sells its Mx3005P QPCR system and MxPro QPCR software, “which can distinguish between 5,000 and 10,000 template copies with a 99.7% confidence level and can detect down to single copy equivalents of target,” says Michael Jessen, a product manager for instrumentation at Stratagene.
Bio-Rad has tried to make its software user-friendly so that researchers can “edit on the fly to change their analysis at any time, such as assigning alternate reference targets for normalization, assigning different control samples, or customizing the graphing options built into the software,” says Richard Kurtz, the company’s senior product manager in amplification. “In our software, target sequence levels can be normalized to multiple reference targets using the Vandesompele method, and reaction efficiencies of the targets can be taken into account in the calculations using the Pfaffl method.”
Ease of use has also been a concern for Applied Biosystems, according to James Lee, a product manager for real-time PCR systems. “The Applied Biosystems 7500 and 7500 Fast Real-time PCR systems are verified to distinguish between samples containing 5,000-10,000 DNA template copies, with a statistical confidence level of 99.7%,” says Lee. “For normalization, our Gene Expression Study package provides the utility of using one or more endogenous control genes, as well as comparing the Gene Expression Study with various analysis parameters that include: reference sample, endogenous controls, PCR efficiency, and automatic versus manual Ct.” Another Applied Biosystems product manager for real-time PCR systems, Laurel Nelson, says that “our new Gene Expression Study package for the 7500 and 7500 Fast Real-time PCR systems allows our customers to import an unlimited number of plates to a gene expression study, normalize their results to multiple endogenous controls, adjust the RQ values for the known efficiency of the assays, and group samples into both technical and biological replicate groups for streamlined analysis.”
Part of a whole workflow
It is important to remember that the analysis is the last leg of the quantitative PCR workflow. “Certainly, it is the step most people are struggling with, but I always emphasize that all previous steps are equally important to achieve good quality and reliable data,” Vandesompele points out. “Previous steps include template quality (mRNA integrity and purity), PCR assay quality (extensive in silico and experimental validation and assessment of specificity and efficiency), and normalization strategy.”
References
1. Livak KJ and Schmittgen TD. “Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) method.” Methods 25(4):402-8, 2001.
2. Pfaffl MW. “A new mathematical model for relative quantification in real-time RT-PCR.” Nucleic Acids Res. 29(9):e45, 2001.
3. Vandesompele J, et al. “Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes.” Genome Biol. 3(7):1-12, 2002.
4. Hellemans J, et al. “qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data.” Genome Biol. 8(2):R19, 2007.