by Jeffrey M. Perkel
In the classic model of gene regulation, transcription reflects the combined influences of a battery of DNA elements such as promoters and enhancer and repressor sequences. That model makes for pretty diagrams, but the real picture is far more nuanced.
DNA can be modified by methylation, and it is wrapped around histone proteins to form nucleosomes; those histones also can be modified with acetyl, phosphoryl and methyl groups. Those modifications dictate, among other things, how tightly DNA and histones interact, and thus the DNA’s accessibility to regulatory proteins that directly influence gene expression patterns.
These chemical changes, called epigenetic modifications, are clearly targeted to specific sequences. There’s just one problem: The chromatin remodeling factors and epigenetic players that produce those modifications (methyltransferases and histone deacetylases, for instance) are not sequence-specific. That is, they don’t target specific gene locations as, say, restriction enzymes do. So how does the cell direct those epigenetic modifications to where they are needed?
The answer seems to involve a class of RNAs called long non-coding RNAs (lncRNAs); those transcribed from regions between genes are called long intergenic non-coding RNAs, or lincRNAs.
lncRNAs: What they are and what they do
The cell is, of course, rife with non-coding RNAs. There are transfer RNAs, ribosomal RNAs, small nucleolar RNAs and more. A sizable collection of specialized tools exists to study microRNAs in particular. Long non-coding RNAs, though, are different. These transcripts are typically at least 200 nucleotides long and are generally capped, polyadenylated and spliced just as messenger RNAs are. They just happen not to encode protein (for the most part, that is; some, called transcripts of uncertain coding potential (TUCP), may encode relatively short polypeptides).
But lncRNAs can, apparently, direct protein complexes to genomic targets, says Kevin Morris, associate professor of molecular medicine at the Scripps Research Institute in La Jolla, Calif., who studies this form of regulation.
Long non-coding RNAs, Morris says, “appear to be involved in bringing in payloads to particular sites [in the genome] and inducing histone modifications.”
One recent analysis discovered more than 8,000 lncRNAs in a variety of cell types. Naturally, figuring out what these RNAs do is an area of active research, and researchers who want to study this relatively novel class of molecules can turn to a growing set of tools.
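The working definition above (a transcript of at least 200 nucleotides that does not encode protein) can be turned into a crude computational screen. The Python sketch below is a hypothetical illustration, not a published annotation pipeline: it flags long transcripts whose longest open reading frame (ORF) is short. The 100-codon cutoff and all function names are assumptions; real annotation efforts weigh additional evidence, such as evolutionary conservation of any ORF.

```python
# Hypothetical, simplified screen for candidate non-coding transcripts.
# Assumption: a 100-codon ORF cutoff; real pipelines use richer evidence.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-initiated ORF
    on the transcript's own strand, checking all three reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG in the current open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
        # ORFs that run off the 3' end without a stop codon are ignored.
    return best


def is_noncoding_candidate(seq: str, min_len: int = 200,
                           max_orf_codons: int = 100) -> bool:
    """Long enough to count as a lncRNA, but with no long ORF."""
    return len(seq) >= min_len and longest_orf_codons(seq) < max_orf_codons


print(is_noncoding_candidate("A" * 250))                    # True: no ATG at all
print(is_noncoding_candidate("ATG" + "GCT" * 150 + "TAA"))  # False: 151-codon ORF
```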
Working with lncRNAs
Fortunately, if you can work with RNA, you generally are well positioned to study lncRNAs. As long non-coding RNAs are, well, RNAs, they can be studied in total RNA preparations using the standard suite of gene-expression tools, including cDNA preparation systems, deep sequencing, microarray analysis, quantitative PCR and so on. The only caveat, says John Rossi, who studies microRNAs at the Beckman Research Institute of the City of Hope in Duarte, Calif., is sensitivity: “Those RNAs don’t really require special tools in terms of isolating them and studying them; they just require ways of detecting them, because they are very low-abundance usually.”
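Rossi’s point about sensitivity can be made concrete with a back-of-the-envelope model for sequencing. Assuming reads are sampled independently, the number of reads from one transcript is roughly Poisson-distributed, with a mean equal to sequencing depth times that transcript’s share of the library. The Python sketch below estimates the chance of seeing a rare transcript at all; the function names and example abundance are hypothetical, and transcript-length effects are ignored.

```python
import math


def expected_reads(total_reads: float, fraction_of_library: float) -> float:
    """Mean read count for a transcript making up `fraction_of_library` of the
    sequenced molecules (transcript length is ignored for simplicity)."""
    return total_reads * fraction_of_library


def detection_probability(total_reads: float, fraction_of_library: float,
                          min_reads: int = 1) -> float:
    """P(count >= min_reads), treating the read count as Poisson."""
    lam = expected_reads(total_reads, fraction_of_library)
    # P(X >= k) = 1 - sum_{i < k} e^(-lam) * lam^i / i!
    p_below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                  for i in range(min_reads))
    return 1.0 - p_below


for depth in (10_000_000, 50_000_000):
    p = detection_probability(depth, 1e-7)
    print(f"{depth:,} reads: P(seen at least once) = {p:.2f}")
```

For a transcript that accounts for one in 10 million sequenced molecules, 10 million reads give a mean of one read and only about a 63% chance of observing it even once; 50 million reads raise that to about 99%.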
Like other RNAs, lncRNAs are directional: each is transcribed from one particular DNA strand. Sometimes researchers specifically want to find transcripts that are antisense to a particular mRNA. In that case, they can use directional, or strand-specific, techniques.
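Directional techniques rely on base pairing. A primer with the same sequence as a gene’s sense strand (a forward primer) can only anneal to RNA complementary to that strand, that is, to an antisense transcript, so reverse transcription from it yields antisense cDNA; a reverse primer does the opposite. The Python sketch below encodes that rule for perfect matches only. The sequences and function names are hypothetical, and real primer design also weighs mismatches, melting temperature and off-target binding.

```python
# Complement table for DNA output; U (RNA) is treated like T.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement, written in DNA letters."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def rt_product(primer: str, sense_region: str) -> str:
    """Which transcript a single gene-specific RT primer can copy.

    `sense_region` is the locus written 5'->3' on the gene's sense strand.
    A primer identical to the sense strand (forward primer) pairs with
    antisense RNA; a primer identical to the antisense strand (reverse
    primer) pairs with sense RNA. Only perfect matches are considered.
    """
    primer = primer.upper().replace("U", "T")
    region = sense_region.upper().replace("U", "T")
    if primer in region:
        return "antisense"  # forward primer -> cDNA of the antisense transcript
    if reverse_complement(primer) in region:
        return "sense"      # reverse primer -> cDNA of the sense transcript
    return "no perfect match"


region = "ATGCGTACCGGTTAGC"  # hypothetical locus, sense strand
print(rt_product("ATGCGTAC", region))                       # forward primer -> antisense
print(rt_product(reverse_complement("CCGGTTAGC"), region))  # reverse primer -> sense
```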
A myriad of molecular tools
In 2010, for instance, Morris’s lab published data demonstrating that the gene for Oct4, a transcription factor key to producing induced pluripotent stem cells, is regulated by antisense lncRNAs produced by a “pseudogene” called Oct4-pseudogene 5.[1] This pseudogene-5 transcript, which unlike most lncRNAs is not polyadenylated, nucleates a complex of protein factors that targets and silences the Oct4 promoter.
Morris’s team used “strand-specific reverse transcription PCR” to look for antisense transcripts in poly-A-depleted RNA samples that could regulate Oct4. Strand-specific RT-PCR uses a “gene specific forward or reverse primer alone, thereby generating cDNA of specifically the antisense or sense strand of the targeted region respectively,” the authors explain.[1]
The team quantified the resulting cDNAs using quantitative PCR and used short interfering RNAs (siRNAs) targeting the pseudogene to block its function and dysregulate Oct4. They also used RACE (rapid amplification of cDNA ends) to pinpoint transcript boundaries and nuclear run-on assays to quantify the rate of transcription directly.
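The article does not say how the team normalized its qPCR data, but one widely used approach for a knockdown experiment like this is the comparative Ct, or 2^(-ΔΔCt), method. It compares the target’s threshold cycle (Ct) with a reference gene’s in treated and control samples, and it assumes roughly 100% amplification efficiency, so each cycle doubles the product. The Python sketch below is illustrative only; the reference gene and all Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (treated vs. control) by the 2^(-ΔΔCt) method.

    Assumes both assays amplify with ~100% efficiency, so each PCR cycle
    doubles the product and one Ct unit equals a two-fold difference.
    """
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical: target Ct 32 with a targeting siRNA vs. 30 with a control siRNA;
# reference gene Ct 20 in both samples.
print(fold_change_ddct(32, 20, 30, 20))  # 0.25
```

Here ΔΔCt = 2, so the targeted transcript is at one quarter of its control level, roughly a 75% knockdown.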
Custom and pre-designed real-time PCR assays
Many companies, Exiqon for instance, can produce custom qRT-PCR assays based on user-defined sequences. For pre-designed assays, one source is Life Technologies. According to Iain Russell, senior product manager for Applied Biosystems miRNA Assays at Life Technologies, the company offers nearly 25,000 TaqMan® assays targeting lncRNAs in human, mouse and rat.
“Very often researchers are looking in the vicinity of a particular coding RNA that is playing a role in whatever biological process, so we have built into the search functionality the ability to look at non-coding RNAs that juxtapose coding RNAs,” Russell says. “And we have also built in a genome viewer that allows them to view the different sequences within the context of the genomic framework.”
Researchers can search specifically for ncRNA assays using Life Technologies’ TaqMan search tool, which includes the option to restrict searches to “Noncoding RNA Only.”
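Russell describes searching for non-coding RNAs that sit near a coding gene of interest. The vendor tool’s internals are not described here; the Python sketch below shows one hypothetical way to run that kind of proximity query over an in-memory annotation. The `Gene` record, its fields, the gene names and the 10-kb window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    coding: bool


def gap(a: Gene, b: Gene) -> int:
    """Distance between the nearer ends of two same-chromosome features (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def noncoding_neighbors(anchor: Gene, genes: list[Gene],
                        window: int = 10_000) -> list[Gene]:
    """Non-coding genes on the anchor's chromosome within `window` bp, nearest first."""
    hits = [g for g in genes
            if not g.coding and g.chrom == anchor.chrom
            and gap(anchor, g) <= window]
    return sorted(hits, key=lambda g: gap(anchor, g))


genes = [
    Gene("CODING1", "chr1", 1_000, 2_000, True),
    Gene("LNC_A", "chr1", 2_500, 3_000, False),
    Gene("LNC_B", "chr1", 50_000, 51_000, False),  # too far away
    Gene("LNC_C", "chr2", 1_500, 1_800, False),    # other chromosome
    Gene("LNC_D", "chr1", 500, 900, False),
]
print([g.name for g in noncoding_neighbors(genes[0], genes)])  # ['LNC_D', 'LNC_A']
```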
Another tool for studying lncRNAs is the DNA microarray, and several commercial arrays are available.
Life Technologies’ NCode™ Human Non-coding RNA Microarray, for instance, is a two-plex, 105,000-probe microarray that can detect some 17,112 non-coding transcripts as well as 22,074 mRNAs, thereby “allowing discovery of coordinated expression with associated protein-coding genes,” according to product documentation.
Agilent Technologies’ SurePrint G3 Gene Expression Microarray also blends coding and non-coding transcripts, with probes for 7,419 lincRNAs and nearly 28,000 mRNAs in an 8x60K format. According to Anne Bergstrom Lucas, senior research scientist for Genomics R&D at Agilent, the company’s arrays can detect transcripts across a dynamic range of more than five orders of magnitude.
The Human SurePrint G3 8x60K microarray, released two years ago, currently is undergoing an update to better reflect current knowledge of lincRNA expression and structure, Lucas says. In fall 2011, Moran Cabili and colleagues in John Rinn’s lab at the Broad Institute probed RNA-Seq datasets from “24 human tissues and cell lines” to identify nearly 8,200 putative lincRNAs, including 4,662 “stringent” lincRNAs: those identified in more than one tissue or using more than one assembler application.[2] (These lincRNAs are now visible as an annotation “track” in the UCSC Genome Browser, which illustrates known transcript start and stop sites; to display them, enable lincRNAs under “Genes and Gene Prediction Tracks” in the browser’s configuration section.)
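The “stringent” criterion described above is a simple rule that can be applied to a table of candidate calls. The Python sketch below is a hypothetical illustration of that rule as stated in this article, not a reproduction of the Cabili et al. pipeline; the record layout, transcript IDs, tissue labels and assembler names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class LincCall:
    transcript_id: str
    tissues: set[str] = field(default_factory=set)     # samples where it was assembled
    assemblers: set[str] = field(default_factory=set)  # programs that reconstructed it


def is_stringent(call: LincCall) -> bool:
    """Rule as described in the text: more than one tissue OR more than one assembler."""
    return len(call.tissues) > 1 or len(call.assemblers) > 1


def stringent_subset(calls: list[LincCall]) -> list[LincCall]:
    return [c for c in calls if is_stringent(c)]


calls = [
    LincCall("t1", {"liver"}, {"asmA"}),           # one tissue, one assembler
    LincCall("t2", {"liver", "brain"}, {"asmA"}),  # seen in two tissues
    LincCall("t3", {"lung"}, {"asmA", "asmB"}),    # seen by two assemblers
]
print([c.transcript_id for c in stringent_subset(calls)])  # ['t2', 't3']
```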
Probes for the first SurePrint G3 lincRNA array were designed based on ChIP-Seq data. That array’s probe complement and the Cabili database overlap by “only about 15%,” Lucas says. The in-development version 2 array will better mirror the Cabili dataset, she says, with more than 10,000 newly designed lincRNA probes, as well as a refresh of the coding content. (The old probes will continue to be available for custom microarray designs through Agilent’s eArray tool. Of the old probes that do not map to the Cabili set, Lucas says, “I don’t necessarily think they’re junk. …We just can’t give confidently an idea of what they’re mapping to at this time.”)
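An overlap figure like the one Lucas cites can be estimated by asking what fraction of probes land inside any transcript in a reference set. Agilent’s actual comparison method is not described in the source; the Python sketch below is a hypothetical illustration for a single chromosome, using half-open [start, end) coordinates and invented intervals.

```python
import bisect


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_overlapping(probes, reference):
    """Fraction of probes (start, end) sharing at least one base with any reference interval."""
    merged = merge_intervals(reference)
    starts = [s for s, _ in merged]
    hits = 0
    for p_start, p_end in probes:
        # Last merged interval that starts before the probe ends; because the
        # merged intervals are disjoint and sorted, it also reaches furthest right.
        i = bisect.bisect_left(starts, p_end) - 1
        if i >= 0 and merged[i][1] > p_start:
            hits += 1
    return hits / len(probes) if probes else 0.0


reference = [(100, 200), (150, 300), (1_000, 1_100)]  # hypothetical lincRNA exons
probes = [(120, 180), (290, 350), (400, 460), (1_090, 1_150), (2_000, 2_060)]
print(fraction_overlapping(probes, reference))  # 0.6
```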
Roche NimbleGen currently does not offer a catalog product specifically for ncRNA-focused experiments, says Lance Brown, director of international product management at the company. But its high-density genome tiling arrays, with up to 4.2 million probes per slide, can be used for ncRNA discovery, Brown says, and users can follow that up with custom arrays (such as the company’s 12x135K format microarrays) to quantify known transcripts.
“Many of our customers have their own unique set of non-coding RNAs that they have discovered,” Brown explains. “We allow customers to dictate the content for their own experiments through our custom design and manufacturing capabilities.”
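Brown does not describe a specific analysis, but a common first pass at discovering transcripts on a tiling array is to look for runs of adjacent probes whose signal rises above background. The Python sketch below is a hypothetical, simplified version of that idea; the threshold, minimum run length and example data are assumptions, and it ignores gaps between probes.

```python
def transcribed_segments(positions, signals, threshold, min_probes=3):
    """Return (first_pos, last_pos) for runs of consecutive probes whose
    signal is >= threshold and that span at least `min_probes` probes.

    `positions` are sorted probe coordinates along the tiled region and
    `signals` the matching (e.g., log2) intensities.
    """
    segments = []
    run_start = None
    for i, s in enumerate(signals):
        if s >= threshold:
            if run_start is None:
                run_start = i  # a new run of "on" probes begins
        else:
            if run_start is not None and i - run_start >= min_probes:
                segments.append((positions[run_start], positions[i - 1]))
            run_start = None
    # Close a run that extends to the last probe.
    if run_start is not None and len(signals) - run_start >= min_probes:
        segments.append((positions[run_start], positions[-1]))
    return segments


positions = [i * 50 for i in range(12)]           # hypothetical 50-bp tiling
signals = [1, 5, 6, 7, 1, 8, 9, 1, 6, 6, 6, 6]    # hypothetical intensities
print(transcribed_segments(positions, signals, threshold=5))  # [(50, 150), (400, 550)]
```

The two-probe run at positions 250 to 300 is discarded because it falls short of the minimum length, a simple guard against isolated noisy probes.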
Of course, researchers also can study non-coding RNA expression using next-gen DNA sequencing technologies from Illumina, 454 Life Sciences (Roche) and Life Technologies. “In the discovery phase, next-gen sequencing is certainly a platform that people are looking to, and there [are] a lot of novel discoveries coming off that,” says Russell.
Cabili’s recent study used RNA-seq datasets collected on Illumina HiSeq and Genome Analyzer II instruments, for instance.
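Once RNA-seq reads are aligned, expression is usually reported in length-normalized units, which matters when comparing short or rare lncRNAs with long mRNAs. The article does not say which unit Cabili’s team used; the Python sketch below computes one common choice, transcripts per million (TPM), from hypothetical read counts and transcript lengths.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: divide reads by transcript length (in kb),
    then rescale so that all values in the sample sum to one million."""
    rates = {t: counts[t] / (lengths_bp[t] / 1_000) for t in counts}
    total = sum(rates.values())
    return {t: r / total * 1_000_000 for t, r in rates.items()}


counts = {"mRNA_X": 1_000, "lnc_Y": 50}      # hypothetical read counts
lengths = {"mRNA_X": 2_000, "lnc_Y": 500}    # hypothetical lengths (bp)
for name, value in tpm(counts, lengths).items():
    print(f"{name}: {value:,.0f} TPM")
```

Although the lncRNA draws 20 times fewer reads than the mRNA here, its shorter length means its TPM is only five-fold lower.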
A number of dedicated library-preparation tools are available, including Life Technologies’ Ion Total RNA-Seq Kit and Illumina’s TruSeq RNA Sample Preparation Kit v2. To make rare lncRNAs easier to find, Illumina offers a protocol for “DSN normalization” using a duplex-specific thermostable nuclease (DSN) from Evrogen, which eliminates many of the more abundant species in total RNA preparations (such as tRNA, rRNA and housekeeping-gene transcripts).
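Why DSN treatment favors rare transcripts comes down to reassociation kinetics. In DSN normalization, the denatured cDNA library is allowed to re-anneal; abundant species find their partners quickly and become double-stranded, the nuclease digests those duplexes, and rare species stay mostly single-stranded. The Python sketch below models this with idealized second-order kinetics, C = C0 / (1 + k·C0·t), in arbitrary units. The species names, starting amounts and rate constant are hypothetical, and real libraries deviate from this ideal.

```python
def remaining_single_stranded(c0: float, k: float, t: float) -> float:
    """Single-stranded amount of one species after re-annealing for time t,
    assuming ideal second-order kinetics: C = C0 / (1 + k * C0 * t)."""
    return c0 / (1.0 + k * c0 * t)


def dsn_normalize(abundances: dict[str, float], k: float, t: float) -> dict[str, float]:
    """Model DSN digestion as removing every re-annealed (double-stranded)
    molecule, so only each species' single-stranded remainder survives."""
    return {name: remaining_single_stranded(c0, k, t)
            for name, c0 in abundances.items()}


before = {"rRNA_like": 10_000.0, "housekeeping": 100.0, "rare_lncRNA": 1.0}
after = dsn_normalize(before, k=1.0, t=1.0)  # arbitrary units
for name in before:
    print(f"{name}: {before[name]:g} -> {after[name]:.3f}")
```

With k·t = 1, starting amounts of 10,000, 100 and 1 leave roughly 1.0, 0.99 and 0.5 units single-stranded, so a 10,000-fold spread collapses to about two-fold.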
Surprise and intrigue
Non-coding RNAs, says Morris, are full of surprises. For instance, he is studying a gene regulatory system in which a pseudogene of the PTEN tumor-suppressor gene regulates both the transcription and translation of PTEN itself.
Morris explains, “It’s exceedingly complex, because not only do you have non-coding RNAs that control chromatin states in epigenetics, you also have non-coding RNAs that can compete with the coding RNAs and soak up microRNAs. You also have non-coding RNAs that can bind to one another and form higher-order, non-coding RNA structures.”
Morris studies those “higher-order” structures with deep sequencing designed specifically to identify RNA-RNA hybrids: before cDNA synthesis, the samples are treated with RNase to degrade all single-stranded transcripts, so that mainly double-stranded RNA remains to be sequenced.
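Once reads from RNase-resistant duplexes are sequenced, a natural follow-up question is which two transcripts could have paired. The Python sketch below is a hypothetical illustration, not Morris’s pipeline: it finds the longest stretch of one RNA that can base-pair perfectly and antiparallel with another, by searching for the longest common substring between the first RNA and the reverse complement of the second. It counts Watson-Crick pairs only, so G·U wobble pairs and bulges are deliberately ignored; the sequences are invented.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement in RNA letters (T is read as U)."""
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[b] for b in reversed(seq.upper().replace("T", "U")))


def longest_duplex(a: str, b: str) -> tuple[int, str]:
    """Longest stretch of `a` that can pair perfectly (antiparallel,
    Watson-Crick only) with some stretch of `b`.

    Returns (length, segment of a), via dynamic programming for the longest
    common substring of `a` and the reverse complement of `b`.
    """
    a = a.upper().replace("T", "U")
    target = reverse_complement_rna(b)
    best_len, best_end = 0, 0
    prev = [0] * (len(target) + 1)  # common-suffix lengths for the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if a[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, a[best_end - best_len:best_end]


print(longest_duplex("GGGACUGAUCCAAA", "CCCGGAUCAGUCCC"))  # (11, 'GGGACUGAUCC')
```

The quadratic dynamic program is fine for a pair of transcripts; screening whole transcriptomes for partners would call for an indexed search instead.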
As these new paradigms emerge, expect new tools and techniques to follow. “Really, it’s a brand new field,” says Brown. “It’s an emerging field and there are a variety of mechanisms currently being pursued.”
[1] Hawkins, PG and Morris, KV, “Transcriptional regulation of Oct4 by a long non-coding RNA antisense to Oct4-pseudogene 5,” Transcription, 1:3, 1-11, 2010.
[2] Cabili, MN, et al., “Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses,” Genes Dev, 25:1915-27, 2011.
The image at the top of the page is from Agilent Technologies.