Accurate annotation of genes and their transcripts is of critical importance, but currently available annotation techniques are not always up to par. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more genes remain uncataloged, particularly long noncoding RNAs (lncRNAs).
In a paper published today in Nature Genetics, an international team of scientists led by researchers at the Centre for Genomic Regulation (CRG), in collaboration with researchers at Cold Spring Harbor, the Wellcome Trust Sanger Institute, and qGenomics, describes a methodology they developed that improves on the throughput and accuracy of current methods.
"98% of our DNA does not encode for proteins. These DNA regions contain thousands of uncharacterised non-protein-coding genes, but there is still a long way to go until we fully understand their functions and their roles in disease. Reaching this goal will require complete gene maps. Our method represents an important step in this direction," explained Rory Johnson, a CRG alumnus who is currently a principal investigator at the University of Bern and co-leader of this study.
The key feature of the new method, named RNA Capture Long Seq (CLS), is that it focuses specifically on the non-coding regions of the genome, which are amplified and analyzed using the most advanced sequencing. "In this way we could produce a detailed map of 3,500 long non-coding RNAs in human and mouse (about 20% of all those known so far). As a result, we characterized the genomic features of long non-coding RNAs to better understand how these genes work," stated Julien Lagarde and Barbara Uszczynska, co-first authors and CRG researchers.
The team then used CLS to reannotate lncRNA populations in GENCODE and found that CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques.

"Scientists around the globe are using GENCODE for their research projects as reference sets, so improving it means contributing to biomedical research worldwide," said Roderic Guigó, coordinator of the Bioinformatics and Genomics Programme at the CRG and co-leader of this work. "We have found a cheaper, faster and more accurate method that results in an improved catalogue and that will first benefit scientists worldwide and, ultimately, society," concluded Guigó.