Josh P. Roberts has an M.A. in the history and philosophy of science, and he also went through the Ph.D. program in molecular, cellular, developmental biology, and genetics at the University of Minnesota, with dissertation research in ocular immunology.
Not much is known about tree-frog and blueberry genomics, and that means that expression analyses of these non-model organisms are likely to yield novel information. But because, well, not much is known about tree-frog and blueberry genomics, there also are far fewer tools available to study them. That’s in contrast to organisms like mouse and Arabidopsis, for which fully sequenced genomes and transcriptomes have allowed off-the-shelf microarrays, qPCR primer and probe sets, and a host of other reagents to be created.
In such a situation, what’s a researcher to do? And when expressed sequences are found, how is anything to be known about them? These hurdles are not insurmountable. Using next-generation DNA sequencing (NGS), cloud computing and in silico design tools, anyone can become expert in the gene expression of organisms outside the mainstream.
Seq and ye shall find
In the old days, a tree-frog researcher might go through the long, laborious process of creating expressed sequence tag (EST) or cDNA microarrays, using these to capture mRNA, and then begin the process of expression analysis. These days, RNA-seq has pretty well displaced all that, notes Callum Bell, vice president for research at the National Center for Genome Resources (NCGR) in Santa Fe, N.M.
Preparing an RNA-seq experiment is very much like making a cDNA library, only at the end, “you have somebody sequence it for you. It’s that easy,” says Ann Loraine, associate professor of bioinformatics and genomics at the University of North Carolina, Charlotte. But it also can be quite costly, not least because the cost of the informatics can dwarf the cost of the sequencing itself.
As a result, Loraine, who studies bioactive compounds found in blueberries, advises researchers to think carefully about which part of the plant or animal and which developmental stages to focus on.
After you have the transcriptome sequenced, reads are assembled into cognate transcripts and then annotated. “You don’t want to know just that you have these transcripts, you want to know what they are,” explains Bell.
With poorly characterized genomes, annotation can be a challenge, “but there are well-established ways of trying to assign function to these transcripts,” says Bell. One strategy NCGR uses is to search hidden Markov model databases of protein motifs (such as TIGRFAMs). Another is to find an open reading frame, translate the sequence into amino acids, and use that translation to look for homologues in protein databases (such as SWISS-PROT) with tools such as BLASTP.
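The open-reading-frame step can be illustrated with a few lines of Python. What follows is a minimal sketch, not NCGR’s pipeline: it assumes the standard genetic code, scans all six reading frames of an assembled transcript, and keeps only the longest ATG-to-stop stretch. The function names and the toy sequence are hypothetical, and production annotation tools also handle partial ORFs and alternative genetic codes.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(transcript):
    """Return the longest ATG-to-stop protein found in any of six frames."""
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Drop the last piece: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


# Hypothetical toy transcript; a real one would be hundreds of bases long.
protein = longest_orf("CCATGGCTAAATGACC")
print(protein)
```

The resulting amino-acid string is the kind of query that would then be handed to a protein-homology search such as BLASTP.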
From here, says University of British Columbia computational biologist Paul Pavlidis, you can “bootstrap your way to better data next time” by using the data to inform your subsequent experiments. In days gone by the transcripts might have been used to create a microarray to query levels and patterns of expression. But these days, says Pavlidis, “you might as well just stick with RNA-seq” to garner that information.
Loraine’s lab makes use of a large computational toolbox to help understand the transcripts the researchers encounter. They built CressExpress to find other genes in the same genome (in publicly available data sources such as the Gene Expression Omnibus database) whose expression patterns are similar to those of query genes. They use Pathway Tools to predict metabolic pathways that may contain query genes. Another tool, called Integrated Genome Browser (also from Loraine’s lab), lets the team visualize how read alignments match with their gene model.
The farm team
Not every lab specializes in NGS or bioinformatics. “It’s unusual for an independent research laboratory to own the kind of equipment that’s necessary to do these experiments,” says Bell, adding that the introduction of desktop sequencing instruments like Illumina’s MiSeq and Life Technologies’ Ion Torrent instruments is just beginning to change the landscape. Core facilities (like NCGR), on the other hand, “have this accumulated experience in doing these complex experiments that individual investigators can’t really match,” says Bell.
As for data analysis, an increasing variety of products use cloud-computing infrastructure—both free and on a paid subscription basis. NCGR itself is “developing such a private cloud infrastructure to offer scientists this kind of ability,” he notes.
Many for-profit contract research organizations (CROs) and not-for-profit core facilities provide bioinformatics services for a fee. It’s important to know what they’re capable of, and what you’re contracting for. One university sequencing core may not be able to deliver analysis, for example, and may thus send its customers to the cloud, while another may have a crack bioinformatics team on staff. Similarly, CROs may specialize in one or the other, or may offer a combined sequencing and bioinformatics analysis package.
But don’t expect a third party to just hand you a final biological answer, cautions Bell. “We can give them reports and help them understand the content of their data. But interpretation with respect to the domain of expertise they have—some biological or clinical question—is up to them.”
Validation
A preliminary analysis using read-counting technologies will give an indication of which genes might be differentially expressed under some scenario—in ripe vs. unripe fruit, for example, or from skin vs. intestinal tissue. These data typically are then validated by qPCR or some similar approach.
Indeed, notes Pavlidis, after the sequences to be studied have been established, subsequent small-scale experiments (say, fewer than 200 genes) can be performed by qPCR much more cheaply than by additional RNA-seq.
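To make the read-counting step concrete, here is a minimal sketch that assumes you already have per-transcript read counts for two conditions. It normalizes each library to counts per million (CPM) and reports a log2 fold change. The transcript names, counts and pseudocount are hypothetical, and the sketch does no statistical testing; with real data, replicates and a dedicated differential-expression tool would be needed before choosing genes for qPCR validation.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """Return log2(B / A) on CPM values for genes present in both samples.

    The pseudocount (an assumed value) avoids division by zero for
    transcripts with no reads in one condition.
    """
    cpm_a = counts_per_million(counts_a)
    cpm_b = counts_per_million(counts_b)
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
        if gene in cpm_b
    }


# Hypothetical read counts for three assembled transcripts.
unripe = {"transcript_1": 100, "transcript_2": 400, "transcript_3": 500}
ripe = {"transcript_1": 800, "transcript_2": 800, "transcript_3": 400}

changes = log2_fold_changes(unripe, ripe)
# Rank candidates by the size of the change, largest first.
for gene, lfc in sorted(changes.items(), key=lambda item: -abs(item[1])):
    print(f"{gene}\t{lfc:+.2f}")
```

Note that transcript_2 doubles in raw count but shows no change after normalization, because the ripe library is twice as deep. That is the reason for scaling by library size before comparing samples.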
Sometimes, users can find assays that are “homologous enough” from among the more than 1.3 million pre-designed TaqMan assays in Life Technologies’ catalog, says senior product manager Sundiep Phanse. “If not, then we can work with you to put together a custom assay.”
The problem with using any assay not wet lab-tested for the purpose—including those designed in silico using something like the free online Primer3 design engine—is that “you want to make sure it’s not only detecting what’s important, but that it’s not detecting anything else, as well,” notes Sam Ropp, marketing manager for Bio-Rad Laboratories’ gene-expression division. Variables such as single nucleotide polymorphisms, secondary structure and splice variants can dictate where primer pairs need to be placed, for example. It is also helpful to design them to work together under the same running conditions—something that’s not so easy to do for more than a handful of assays.
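One part of the “same running conditions” requirement is that the primers in a panel have similar melting temperatures. The sketch below is a rough first-pass screen and not a substitute for a design engine or wet-lab testing: it estimates melting temperature with the simple Wallace rule (2 °C per A or T, 4 °C per G or C, intended only for short oligos) and flags a set of primers whose estimates spread too widely. The primer sequences and the 5 °C tolerance are hypothetical, and the check says nothing about specificity, SNPs, secondary structure or splice variants.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def compatible(primers, max_tm_spread=5):
    """True if all estimated Tm values fall within max_tm_spread degrees."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms) <= max_tm_spread


# Hypothetical 20-mers, for illustration only.
panel = [
    "ATGCATGCATGCATGCATGC",
    "TACGTACGTACGTACGTACG",
    "GGGCCCGGGCCCGGGCCCGG",
]
for primer in panel:
    print(primer, f"GC={gc_fraction(primer):.2f}", f"Tm~{wallace_tm(primer)}")
print("Panel compatible:", compatible(panel))
```

Here the third primer, which is entirely G and C, pushes the panel outside the tolerance and would be a candidate for redesign.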
Bottom line: Blazing a new trail with non-model organisms has its challenges, but tools exist to smooth the way. With the right spirit and resources, researchers can reap the rewards hidden among organisms that few have dared to scrutinize.