The transcriptome—all the RNA expressed within a cell—is a valuable snapshot in time that reveals which genes a cell is actively expressing. Transcriptome analysis helps researchers to understand how cells regulate gene expression. Not surprisingly, the techniques that scientists use to study RNA expression have been changing rapidly over the past decade or two. Here’s a look at today’s developing methods for analyzing RNA expression.

Common methods of transcriptome analysis

The three main tools used to study RNA are real-time quantitative PCR (qPCR), microarrays, and RNA-sequencing (RNA-seq). Today, the first two are in danger of being eclipsed by RNA-seq, though they are still used in particular applications. For example, qPCR is inexpensive and highly accurate, which makes it a great choice if you only need to quantify the expression of a few known genes. But it is impractical if you want to measure the expression of hundreds or thousands of genes.

Microarrays—oligomeric probes regularly arrayed on a chip or slide—are used to test whether RNA molecules in a sample will bind to known probes. These are inexpensive and easy to use, but a disadvantage is that microarrays can only test for known RNAs. Increasingly, microarrays are being replaced by RNA-seq, says Suvarna Gandlur, associate director of NGS marketing at Takara Bio. “Microarrays have a place, especially in cases where they are well tested, such as in diagnostic arrays, for example, because it’s not easy to remove them from the SOP,” she says. “But microarrays are used much less now, because you get a lot more data from sequencing; that, and reduced sequencing costs, have been the primary drivers for people adopting next-gen sequencing over microarrays.”

RNA-seq applies next-generation sequencing (NGS) to cDNA generated from the RNA. It is more expensive than the other methods but returns much more information, including full sequences. Another option is “third-generation” single-molecule sequencing, which generates longer sequence reads than NGS does.

Difficult samples and single cells

Takara Bio offers a range of transcriptome analysis tools, with a focus on single cells and on low-input or otherwise challenging samples, such as degraded, FFPE, plasma, or cell-free nucleic acid samples. The SMART-Seq Single Cell line is tailored for analysis of transcriptomes of single cells using Illumina or Ion Torrent sequencers. Additionally, the SMART Stranded line is optimized for generating cDNA from the poor-quality RNA that can exist in biologically derived or otherwise challenging samples.

Many single-cell studies use droplet-based methods that analyze hundreds or thousands of cells simultaneously. “Those studies are good at the very high level, but you still want to look at a few cells at a much deeper level,” says Gandlur, so Takara Bio introduced the SMART-Seq Single Cell kit with high sensitivity chemistry to detect transcripts at very low levels from a single cell. “The SMART-Seq Single Cell kit is a good supplement to enrich the data collected using droplet-based methods,” she adds.

Third-generation transcriptome sequencing

Pacific Biosciences (PacBio) uses single molecule, real-time (SMRT) sequencing technology to generate full-length cDNA sequences from RNA samples. This method is called Iso-Seq. Because SMRT sequencing generates reads that are longer than most mRNA transcripts (which are usually 1–10 kb), the sequence of an entire transcript can be obtained in a single read with no computational assembly required. This can be particularly valuable for studying isoforms produced by alternative splicing.

Researchers are also applying Iso-Seq in other fields, such as genome annotation in plants and animals, and studying genes related to neurological diseases and cancer. A recent paper used Iso-Seq analysis to study the PIK3CA gene, the most frequently mutated oncogene, which is involved in several types of human cancer. “The researchers were interested in looking at mutations on the gene with respect to drug response,” says Elizabeth Tseng, principal staff scientist at Pacific Biosciences. “They were able to show that double mutations on the same allele are associated with patients’ having better drug responsivity.”

Iso-Seq still has room to grow. “A lot of the existing bioinformatics tools don’t work well on long reads,” explains Tseng. Meanwhile, PacBio is continuing to work on making Iso-Seq more affordable for RNA researchers.

Synthetic long-read sequencing

Loop Genomics developed a method that CEO Tuval Ben-Yehezkel calls “synthetic long-read sequencing” to distinguish it from traditional long-read sequencing methods from PacBio and Oxford Nanopore Technologies. The value of using long reads in transcriptome sequencing lies in the way genes are expressed. “Using short-read sequence data, you can primarily tell which genes are sequenced, but not which isoforms of the gene are expressed,” says Ben-Yehezkel. “This is because short reads usually span just one exon, so you can't tell which combination of other exons exists on the same RNA molecule.” Loop Genomics’ method uses proprietary barcoding technology and an Illumina sequencer to reconstruct the sequence of long reads from barcoded short reads.
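
The isoform ambiguity Ben-Yehezkel describes can be illustrated with a toy example. The sketch below is not Loop Genomics' software; it assumes, as a simplification, that every short read covers exactly one exon, and the exon names and the two mixtures are made up for illustration. Two different isoform mixtures then produce identical per-exon counts, while reads that span the whole molecule tell them apart.

```python
from collections import Counter

def exon_counts(molecules):
    """Per-exon read counts, as if every short read covered exactly one exon."""
    return Counter(exon for molecule in molecules for exon in molecule)

def isoform_counts(molecules):
    """Full exon chains, as a read spanning the whole molecule would report them."""
    return Counter(tuple(molecule) for molecule in molecules)

# Two hypothetical mixtures of transcripts from the same four-exon gene.
mix_x = [["E1", "E2", "E4"], ["E1", "E3", "E4"]]
mix_y = [["E1", "E2", "E3", "E4"], ["E1", "E4"]]

# Single-exon reads cannot separate the mixtures.
print(exon_counts(mix_x) == exon_counts(mix_y))
# Full-length reads can.
print(isoform_counts(mix_x) == isoform_counts(mix_y))
```

In real data some short reads do cross exon junctions, which recovers part of this information, but usually not the full combination of exons on one molecule.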

The method involves generating modified first-strand cDNA for sequencing, which includes unique molecular identifiers. Further, “proprietary barcoding technology distributes an enzymatic reaction that is essentially an intramolecular barcode copy-paste operation,” says Ben-Yehezkel. “We copy and paste each unique barcode into a random position within the same molecule.” When they subsequently make a short-read library, each short read contains the same barcode indicating which original mRNA it came from. “Then we're able to look for short reads that share the same barcode and assemble them into a long-read sequence by de novo assembly, to reconstruct the full-length mRNA molecules,” he says. Because they barcode each mRNA, they can also determine the relative abundance of each isoform, with no PCR duplicates in the data.
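
The bookkeeping behind this description can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the company's pipeline: the barcodes, read sequences, and isoform labels are invented, and the de novo assembly step that would turn each barcode group into a full-length sequence is skipped. The sketch only shows how short reads are grouped by shared barcode and how counting each barcode once yields relative isoform abundance without PCR duplicates.

```python
from collections import Counter, defaultdict

def group_by_barcode(short_reads):
    """Collect short reads that carry the same barcode (same original molecule)."""
    groups = defaultdict(list)
    for barcode, sequence in short_reads:
        groups[barcode].append(sequence)
    return dict(groups)

def isoform_abundance(molecule_isoforms):
    """Relative abundance of each isoform, counting every barcode exactly once."""
    counts = Counter(molecule_isoforms.values())
    total = sum(counts.values())
    return {isoform: n / total for isoform, n in counts.items()}

# Hypothetical (barcode, sequence) pairs from a short-read library.
short_reads = [
    ("BC1", "ACGT"), ("BC2", "TTGA"), ("BC1", "GTCA"),
    ("BC3", "ACGT"), ("BC2", "GACC"),
]
groups = group_by_barcode(short_reads)

# Assumed result of assembling each group and assigning it to an isoform.
molecule_isoforms = {"BC1": "isoform_A", "BC2": "isoform_B", "BC3": "isoform_A"}
abundance = isoform_abundance(molecule_isoforms)
print(groups)
print(abundance)
```

Because abundance is computed per barcode rather than per read, a molecule that happens to yield many short reads still counts as one molecule.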

Another feature of Loop Genomics’ method is that it uses consensus error correction to achieve an extremely low error rate. “Consensus error correction is when you have multiple short reads covering the same position, so you can choose the consensus from the short reads to correct your sequencing errors, which is something that is very hard to do with other methods of long-read sequencing,” points out Ben-Yehezkel. The low error rate makes the method valuable for studying RNA editing and point mutations in mRNA. “To tell the difference between a true RNA edit or low frequency mutations, and just a sequencing error, you need a really low error rate,” he says. Researchers are also using Loop Genomics’ technology to study fusion genes and to sequence isoforms from single cells.
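
The consensus idea in that quote can be shown with a simple majority vote. The following is a sketch under assumptions, not the vendor's algorithm: it presumes the short reads have already been aligned to shared coordinates, and the reads themselves are made up, with one simulated sequencing error.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-vote consensus over reads aligned to the same coordinates.

    Each read is a (start, sequence) pair; uncovered positions become 'N'.
    """
    if not aligned_reads:
        return ""
    length = max(start + len(seq) for start, seq in aligned_reads)
    columns = [Counter() for _ in range(length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)

# Hypothetical aligned short reads from one barcoded molecule.
reads = [
    (0, "ACGTAC"),
    (2, "GTACGT"),
    (0, "ACGAAC"),  # simulated sequencing error at position 3 (T read as A)
    (3, "TACGTA"),
]
print(consensus(reads))  # ACGTACGTA
```

Position 3 is covered by four reads, three of which agree, so the single erroneous base is outvoted. The more reads cover a position, the less likely it is that an error survives into the consensus.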

RNA-seq has evolved quickly to meet the needs of RNA researchers, and it will undoubtedly continue to develop. Gandlur notes an area of potential improvement for today’s RNA-seq technology, drawing an analogy to whole-genome sequencing, which narrowed to whole-exome sequencing and then to targeted sequencing. “I think transcriptome analysis right now is still whole-transcriptome, which returns information that you don’t need,” she says. “It’s getting there, but it needs to become more streamlined, like in immune profiling with RNA-seq that uses specific targets.” Stay tuned for development of the next milestones in this exciting field.