Dr. Maria-Celeste Ramirez: Hello, everyone, and a good day to you all. Today, I would like to speak about RNA Seq and how important quality control is for this highly sensitive assay and how the 2100 Bioanalyzer is a key instrument for these steps, leading to the generation of accurate data you can have confidence in.
In this talk, I would like to go over the following. First, I would like to say a few words regarding NGS and RNA Seq. I will then provide you with a quick overview of the 2100 Bioanalyzer, the workflow, and the very flexible portfolio, which will address a number of biological QC needs.
After this, I will be going over the fundamental QC steps essential in the RNA Seq workflow, what you should be looking for when assessing your samples, as well as the impact of suboptimal nucleic acid quality at each of these steps on the workflow and on data quality.
And finally, I will give a brief summary on the topics that were touched upon during this talk.
Genetic research has evolved a lot in the last 35 years. In the '70s, researchers were interested in analyzing large pieces of DNA or RNA using Southern or Northern blots.
Then came the '90s, when researchers then started looking at the smaller details within the genome and transcriptome using molecular techniques, such as Sanger sequencing, QPCR, and microarrays.
Nowadays, researchers are more interested in the low-frequency changes that happen within the genome. And because of its highly sensitive nature, this is really where next-generation sequencing plays a huge role, allowing for the investigation of biological questions with single nucleotide resolution.
Here's a glimpse of sequencing history. In the '70s, both the Maxam-Gilbert and Sanger sequencing techniques were developed. Both of these protocols were very laborious and involved a lot of radioactive tagging, making it very hard to process multiple samples at a time.
In the '80s, Sanger, because of efforts to make this protocol more user friendly, became the preferred method of sequencing. The process shifted from radioactive to fluorescent tagging, an important change in the field, making sequencing more applicable as a routine method of testing.
To streamline the process, slab gels gave way to capillaries. And the demand for higher-throughput methods steadily expanded the number of capillaries made available on these sequencers.
It is with these capillary sequencers that the human genome sequence was catalogued. The accomplishment of this monumental project really pushed the demand for more automated and higher-throughput technologies to allow scientists to accelerate their research.
This demand was addressed by next-generation sequencers, each with its own chemistry, enabling massively parallel sequencing.
Since they came onto the market, these next-generation sequencers have been incrementally increasing their output and quality with the launch of newer versions, with the greatest output currently pegged at hundreds of gigabases per complete run.
Now, why the need for next-gen sequencing? The major advantage of this technology is its scalability and speed, allowing for highly parallel reactions to take place and thereby bringing down the cost of sequencing.
In terms of cost, for the same output compared to traditional Sanger sequencing, this technology is hundreds of fold cheaper for Illumina or SOLiD sequencers and tens of fold cheaper for Roche.
Another major point is the sample preparation process, which is much less labor intensive compared to Sanger. With the decreased sample preparation time coupled with increased throughput and decreased turnaround time, this technology really streamlines the whole genome sequencing or resequencing processes, thereby allowing for the advancement of research at a much more accelerated speed.
As I mentioned earlier, the advent of the Human Genome Project has really changed the way genomic analysis is being done. It facilitated the identification of a number of disease-causing genes for both Mendelian and complex disorders.
Arm in arm with next-gen sequencing, this process was accelerated even further. Abundant information with increased sensitivity and accuracy is obtained, making even the identification of rare low-frequency alleles within a pool relatively easy.
NGS, as you all know, has an expansive landscape of applications. It can be used for studying DNA and RNA as well as variants at the nucleic acid level, studying differential gene expression, splicing, as well as protein-DNA interaction.
At this point, I would like to talk briefly about RNA Seq. RNA Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of the transcriptome.
Because of its sensitivity, this technology provides a far more precise measurement of levels of transcript and their isoforms, more so than any other method.
This technology allows you to perform variant detection, comparison of relative expression levels between genes or transcripts of interest, and in addition, this technology also enables one to study splicing patterns. And again, because of its sensitivity, it also enables discovery of low-expressing transcripts within the transcriptome.
With RNA Seq, what you gain is a potentially unlimited dynamic range, greater sensitivity, the improved ability to discriminate regions of high-sequence identity, and the ability to profile transcription without prior assumptions of which genomic regions are being expressed.
And the advantage really to this is the high sensitivity with single-molecule resolution.
Similar to preparing DNA libraries for NGS, the typical RNA Seq workflow is as follows. Input RNA, whether total, ribosomal-RNA depleted, or enriched small RNA, is used for the workflow.
The input RNA is purified in order to enrich for the fraction of interest within the pool and then fragmented by chemical means in order to generate insert sizes amenable for sequencing.
cDNA synthesis of both first and second strand is performed, resulting in a double-stranded cDNA library. This library is then taken through the routine steps of NGS library preparation, which includes repairing the ends, generating five-prime phosphorylated ends.
After this, A-tailing takes place, generating a three-prime A-overhang. These two nonidentical ends allow for directional ligation of the universal adapters, which facilitate parallel amplification of all fragments within the pool.
Once the adapters are ligated, there's an optional gel-based size selection step to isolate only the 200-base fragment population, making the library amenable for short reads that are less than 150 bases or so.
The caveat of having a size selection step is the possibility of a good percentage of the sample being lost, thereby possibly compromising the final library complexity.
The final step in library preparation is the PCR-enrichment step, where one could add in the indexing tags if not already included within the adapters, allowing for various samples to be pooled and the sequencing output to be maximized.
Now, how important is quality control in the RNA Seq workflow? Always remember that, for assays that are highly sensitive, it will always be a garbage-in/garbage-out kind of a deal.
Because NGS provides single-base resolution, if the quality of the input sample is compromised, this can result in the identification of false positives.
A few notes to remember--with respect to the generation of the libraries themselves, the quality of the input RNA will largely determine the success of the workflow, as evidenced by the final library yield as well as the range of the fragment distribution.
As far as sequencing goes, the quality of the reads that are the output of the sequencers highly depends on the quality of the inserts themselves.
Furthermore, the better the quality of the inserts and the library as a whole, the better you can maximize your sequencing output, allowing for the highest number of mappable reads as possible per library.
On data analysis, RNA integrity will help assure accurate determination of expression, support for existence of variants, transcripts, and splice form, especially the low-frequency ones.
If the input RNA is degraded in any way, one cannot rule out that some transcripts have been lost or compromised, thereby preventing them from being sequenced.
Within the RNA Seq workflow, here are the steps wherein quality control can be implemented. First off would be the assessment of the input RNA itself. At this step, what you are looking for is RNA with good integrity. And this is evidenced by the RIN (RNA Integrity Number) score, which I will speak about in a few slides.
The next step is assessment of the adapter-ligated library. At this point, you would like to see that the adapters have been properly ligated, the concentration of the library at this point in the workflow is good, and that the distribution is still bell shaped, indicating the absence of any bias in the process.
After adapter ligation, as I mentioned earlier, there is that optional size selection step which aims to narrow down the size range of the fragments.
Traditionally, this is done by cutting the desired-sized fragments out of an agarose gel. At this step, there is a danger of potentially losing samples during the extraction and purification process. And therefore, after selecting, it would be very important to assess the size distribution of your library as well as your yield.
Lastly, the assessment of the final library prior to sequencing--at this step, same as for the previous one, what you would like to see is the yield and distribution of the library. The final concentration of the library will determine the optimal dilution for cluster generation.
Not loading the right amount of library on the flow cell during cluster generation can result in suboptimal clustering and therefore suboptimal sequencing.
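To make the dilution step concrete, here is a minimal sketch of the underlying arithmetic, assuming the standard average mass of 660 g/mol per base pair for double-stranded DNA; any specific loading target is hypothetical, since the optimal value depends on your sequencer and protocol.

```python
def library_molarity_nm(conc_ng_per_ul: float, avg_size_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to nanomolar,
    using the standard average mass of 660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * avg_size_bp)


def dilution_factor(stock_nm: float, target_nm: float) -> float:
    """Fold dilution needed to bring the stock down to the loading target."""
    if target_nm <= 0 or stock_nm < target_nm:
        raise ValueError("target must be positive and no higher than the stock")
    return stock_nm / target_nm
```

For example, a 10 ng/uL library averaging 300 bp works out to roughly 50.5 nM, so about a 5-fold dilution would reach a hypothetical 10 nM loading target.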
For all these steps, current NGS workflows, not only for RNA Seq, prefer the use of the 2100 Bioanalyzer, a unique instrument that has the sensitivity that complements the sensitivity of the technology.
This instrument allows you to not only determine concentrations of your library or any nucleic acid or protein for that matter, but also allows you to perform a visual assessment of the quality of the libraries, as evidenced by distribution.
Now, I would like to transition into the discussion of the 2100 Bioanalyzer. The 2100 Bioanalyzer is the first commercially available lab-on-chip product. Introduced in 1999, this instrument allows the sensitive detection and analysis of DNA, RNA, proteins, and cells, and has been the industry's gold standard for RNA analysis.
The bioanalyzer technology allows for separation, cleaning, and detection of samples, enabling both qualitative and quantitative analysis in a single step with the use of one microliter for nucleic acids and four microliters for protein.
With one chip, you can assay 10 to 12 samples and obtain results in as little as five minutes for one sample and 30 minutes for 12. With this technology, the quality and speed of electrophoresis-based analysis is improved.
The workflow itself consists of four easy steps. Step one is to set up the priming station, checking both the priming station base plate and syringe clip setting. These settings are all indicated in the quick start guides.
Step two is to prime your chip for 30 seconds or one minute, depending on the assay, after which you load your samples into the sample wells.
Step three is to vortex your chip and then place it on the bioanalyzer and, lastly, start your 2100 expert software, select your assay, and press start.
The portfolio offered by the bioanalyzer is very flexible and expansive, allowing for the analysis of various types of samples. There are cell assays that enable two-color detection, facilitating analysis of protein expression within cells.
RNA assays that come as either nano or pico, depending on the detection level required, as well as small RNA assays to allow for resolution of these much smaller species.
For RNA assays, both total RNA and mRNA analysis is enabled, allowing for analysis of 11 to 12 samples in 30 minutes.
DNA assays, on the other hand, come in many forms, depending on the level of detection required as well as the size of the sample that needs to be assayed. These assays allow for the assessment of PCR products, restriction digests, as well as larger fragments of up to 12 KB.
Similar to RNA assays, these assays can run 11 to 12 samples in roughly 30 minutes.
Last but not least, the protein assays, which again come in various forms, depending on the molecular weight of the protein that you are interested in--these assays can be used to assess cell lysates, purified fractions, and such, and run about 10 samples in 40 minutes.
For the RNA Seq workflow specifically, we would like to focus on the QC steps that use the DNA assays as well as the RNA assays.
After having given you an overview of the bioanalyzer and its accompanying assays, I would now like to transition into a discussion of the RNA Seq QC workflow.
As I touched upon earlier, this is the workflow. And indicated in the red box are the steps that are important in the quality control of your RNA Seq libraries.
What I would like to highlight at this point is that, in green are the points in the workflow wherein you are dealing with RNA. And in blue are the points you are dealing with DNA.
Starting off with the RNA assays, since we are at the beginning of the workflow, here's a table that shows the specifications of each of the kits. Note that, as I mentioned earlier, for most of these, you can analyze both total RNA and messenger RNA for both eukaryotes and prokaryotes.
In addition, the total RNA assays also enable small RNA analysis. Noted here in the red box are the various ranges or thresholds for quantitation and qualitative analysis that each of these assays can work with. As you embark on your RNA Seq experiments, be guided by these thresholds to allow for more accurate detection and assessment of your sample.
I've earlier mentioned that the bioanalyzer is the industry's gold standard when it comes to RNA analysis. This is all because of Agilent's proprietary RIN score, which enables scientists to make an overall assessment of the quality and integrity of their RNA.
The calculation of the RIN score is based on a neural-network-trained algorithm that enables the recognition and assessment of various zones within the electropherogram that are involved in the determination of RNA quality.
First and foremost, we have the ribosomal-RNA peaks, the 18S and 28S, whose heights are direct indicators of degradation; the inter and fast regions, wherein the degradation products from the ribosomal-RNA peaks will be found; and the precursor and post regions, where secondary structures or larger molecular-weight contaminants can be found.
All of these are assessed and taken as a whole in order to calculate the RIN score.
Now, just a few notes on the RIN score calculation. RIN scores are based on a scale from one to 10, wherein 10 is the highest quality. Anywhere from seven to 10 would be good-quality RNA, and anywhere between one and five would be poor-quality RNA. So, these would be very important considerations when embarking on the RNA Seq library prep workflow.
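The thresholds just described can be captured in a small helper. A minimal sketch in Python; note that the talk does not name the band between five and seven, so labeling it "marginal" is an assumption of this example.

```python
def classify_rin(rin: float) -> str:
    """Bucket a RIN score using the thresholds from the talk.

    7-10 is good quality and 1-5 is poor; the 5-7 band is not named
    in the talk, so it is labeled "marginal" here as an assumption.
    """
    if not 1.0 <= rin <= 10.0:
        raise ValueError("RIN scores range from 1 to 10")
    if rin >= 7.0:
        return "good"
    if rin <= 5.0:
        return "poor"
    return "marginal"
```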
In detail, I would now like to go over each of these QC steps. QC step one, looking at the quality of your input RNA--at this point, as I mentioned earlier, what is very important is the RNA integrity. You may be dealing with total RNA or messenger RNA at this point, after which fragmentation is applied to the sample in order to generate fragments that are suitable for sequencing.
For this QC step, what is important is to assess the quality and integrity of the input RNA after isolation and before launching into the library prep protocol. This is done with the RNA 6000 Nano total RNA assay. And as you can see here on this sample electropherogram, I'm pointing out the 18S and the 28S peaks, which are your ribosomal-RNA peaks.
Now, what are you looking for when looking at this electropherogram? What you're looking for are RIN scores of greater than seven or eight, the 28S peak at 4.5 KB, the 18S peak at 1.9 KB, the 28S peak at twice the fluorescence intensity of the 18S, no significant degradation product and a flat baseline, a well-resolved lower marker with good fluorescence intensity, and all ladder peaks well resolved, recognized, and of good intensity.
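A few of the quantifiable checks in that list can be sketched programmatically. This illustrative helper assumes RIN and peak areas have already been extracted from the trace; the 1.8 cutoff is a hypothetical tolerance around the expected ~2:1 28S:18S signal, not an Agilent specification.

```python
def total_rna_qc(rin, area_18s, area_28s, min_rin=7.0, min_ratio=1.8):
    """Flag common total-RNA QC failures from extracted trace metrics.

    min_rin follows the talk's greater-than-seven guideline; min_ratio
    is an illustrative tolerance around the expected ~2:1 28S:18S signal.
    """
    issues = []
    if rin < min_rin:
        issues.append(f"RIN {rin} is below {min_rin}")
    if area_18s <= 0 or area_28s / area_18s < min_ratio:
        issues.append("28S:18S ratio below expected ~2:1")
    return issues
```

An empty list means the sample passed both checks.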
Now, if you're dealing with messenger RNA as the input sample for the workflow, what you're looking for is a smear from 0.5 to 12 KB. You're also looking for a flat baseline, a well-resolved lower marker with good fluorescence intensity, and again, that all ladder peaks are well resolved, recognized, and of good intensity.
Another possible input sample would be ribosomal-RNA-depleted RNA. What are you looking for if you're dealing with such a sample? What you're looking for is the disappearance of the ribosomal-RNA peaks after the depletion step.
Furthermore, what you may see as well would be the appearance of degradation products within the inter region or the fast region.
As you can see here on this sample electropherogram on the left side, you can see that prior to depletion, the 18S and the 28S peaks are clearly visible and are well resolved. But, after depletion, the 28S peak as well as the 18S peak have now dissolved into this hump that you can see in blue.
This indicates that your ribosomal-RNA depletion has worked. And this will be very important in terms of looking at your sequencing data, because ribosomal-RNA transcripts tend to take up a lot of sequencing reads, as they are very prevalent within the pool.
Aside from total RNA, mRNA, and rRNA-depleted total RNA, another type of input sample is miRNA--short RNA molecules that bind complementary sequences on messenger RNAs, repressing translation. For this type of sample, you can assess the quality and amount of miRNA, typically isolated from a total RNA population, by using the small RNA assay.

Typically, what you are looking for would be the enrichment of the miRNA fraction compared to the total, unenriched sample. On the left is an example electropherogram showing the unenriched and miRNA-enriched samples. As you can see in the bottom electropherogram, a very visible peak within the 5-40 nt region is seen, whereas in the top electropherogram, which represents the unenriched sample, only a slight bump is seen within this same region for the same sample. This kind of comparison will tell you how well the miRNA isolation protocol has worked.

Other things that you are looking for would be a well-resolved lower marker, good fluorescence intensity of the lower marker, which is used for alignment, and finally, that all ladder peaks are well resolved, recognized, and have good fluorescence intensity.
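The enrichment comparison described above can be quantified simply, for instance as the fraction of total trace signal falling in the 5-40 nt window. A minimal sketch, assuming the trace is available as (size, intensity) points:

```python
def small_rna_fraction(trace, lo_nt=5, hi_nt=40):
    """Fraction of total electropherogram signal in the small-RNA window.

    `trace` is a list of (size_nt, intensity) points; the default window
    bounds follow the 5-40 nt region mentioned in the talk.
    """
    total = sum(i for _, i in trace)
    if total == 0:
        return 0.0
    window = sum(i for s, i in trace if lo_nt <= s <= hi_nt)
    return window / total
```

Comparing this fraction before and after enrichment gives a rough measure of how well the miRNA isolation worked.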
Now, what are the implications if you go ahead and use poor-quality RNA in your workflow? Using RNA with poor quality and low RIN scores will result in any of the following: one, low library yields; two, failure to create libraries; three, overrepresentation of the five-prime ends of the RNA molecules.
And as you can see here on the left side, I have included a representative trace of the stepwise degradation of RNA, starting off with intact RNA with a RIN score of 10 at the top and partially degraded RNA with a RIN score of five in the middle. And as you can see, degradation products are starting to appear within the inter region and the fast region. Then, at the bottom, is strongly degraded RNA with a RIN score of three, wherein all of your fragments are shifted to the left.
Now, we go on to the DNA part of the library. After the mRNA has been fragmented and both first-strand and second-strand cDNA synthesis are complete, you end up with a double-stranded cDNA library. From this point onwards, you are dealing with the DNA assays for the bioanalyzer, as indicated here in blue.
Here are the specifications for the various kits that are included in the bioanalyzer portfolio. What you can see here is that there are various size ranges specific to the various assays, as well as various quantitative ranges applicable to each of these kits. Please be guided by these specifications in order to ensure accurate quantitation and assessment of your libraries.
Now, for QC step two, which is the assessment of the adapter-ligated library, again, what we are looking for here would be the library yield and good library distribution. And this is with the use of the DNA 1000 assay.
Now, what are you looking for when assessing these libraries? What you are looking for is a Gaussian distribution of fragments, good intensity of your library peak, well-resolved lower and upper markers, good fluorescence intensity of both markers, no peak overlapping with either marker, as well as a flat baseline.
Furthermore, what you would also like to see is minimal to none of the excess adapter peak, which you will find anywhere from 100 bases and below.
Running the adapter-ligated cDNA library on the DNA chip ensures that the library was properly converted to cDNA, that adapters were sufficiently ligated. Inefficient adapter ligation will result in poor library complexity and high duplication rate.
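The duplication rate mentioned here is straightforward to compute once the library has been sequenced and deduplicated, from the total and unique read counts reported by a deduplication tool. A minimal sketch:

```python
def duplication_rate(total_reads: int, unique_reads: int) -> float:
    """Fraction of reads that duplicate an already-observed fragment."""
    if not 0 < unique_reads <= total_reads:
        raise ValueError("need 0 < unique_reads <= total_reads")
    return 1.0 - unique_reads / total_reads
```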
As for the optional size selection step, what we are looking for again would be the library yield and distribution. And again, this can be done with the use of the DNA 1000 assay.
What are you looking for when assessing this library? Again, very important here would be the distribution of fragments as well as the concentration of the libraries because of the potential to lose sample during the extraction and purification process.
You want a good intensity of your library peak and narrow distribution that corresponds with the gel-selected size, well-resolved lower and upper marker, good fluorescence intensity of your lower marker, no peak overlap with either marker, as well as the flat baseline.
Running the size-selected cDNA library ensures that the size-selected fragments are of the correct size and that sample loss due to gel-based size selection is minimal. Incorrect selection of fragments leads to potential incompatibility of library with a preferred read length, wherein reads will run into the adapter region. Too much sample loss will also compromise library complexity, leading to high duplication rates.
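The read-length compatibility check described here boils down to a single comparison: if the read is longer than the shortest inserts in the library, it will run past them into the adapter. An illustrative helper:

```python
def adapter_readthrough(read_length_bp: int, min_insert_bp: int) -> bool:
    """True if reads of this length would run past the shortest inserts
    and into the adapter sequence on the far end."""
    return read_length_bp > min_insert_bp
```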
And finally, for QC step three, this is the quality control for the prepped RNA Seq library. And again, we are looking for good library yield as well as distribution. This is done still with the DNA 1000 assay.
What we are looking for here is the Gaussian distribution of fragments with good intensity of the library peak, a narrow distribution if size selection was implemented, well-resolved lower and upper markers, good fluorescence intensity of both markers, no peaks overlapping with either marker, as well as a flat baseline.
Assessment of the amplified library allows for estimation of the success of the library preparation, determination of yield to calculate the required dilution for optimal cluster generation, and finally, the selection of the proper read length for sequencing based on the approximate insert sizes.
At this point, I would like to summarize all these topics that were touched upon during this talk. As a summary, RNA Seq is a highly sensitive deep sequencing technology that allows for single-base resolution in identifying nucleotide variants on the transcript level as well as enabling study of differential gene expression and alternative splicing.
Quality of input RNA and the libraries as a whole impact the success of the workflow, quality of sequencing, and quality of the RNA Seq data that is generated, and finally, that the 2100 Bioanalyzer is a key instrument in assessing both the input RNA and the resulting cDNA libraries with a simple and easy workflow, assays that are tailored to specific needs, and a user-friendly software that generates easily understandable results.
Thank you very much, and I hope you have a good day.