Next-generation sequencing (NGS) has seen explosive growth in the options available for sample and library prep, sequencing, and software for data analysis and visualization. This has led to wider use of NGS across a variety of applications. However, it has also become evident that there is no one-size-fits-all solution. Careful thought must be given to the type of sample to be analyzed, the amount of sample, the biological question to be addressed, and the costs and resources on hand before deciding which NGS platform to use.

“One of the big issues to improve the quality of sequencing data is to find the right platform and protocol to process and sequence the sample for a specific application,” says Jianjun Shen, Ph.D., Director of the NGS Core at MD Anderson Cancer Center, who consults and collaborates with nearly 40 faculty members utilizing NGS for various research projects. He works with them to understand the type of sample they are looking to analyze and the types of information that they are looking to get from sequencing.

How much sample do you have for sequencing?

Jack Lepine, Manager of the NGS Core Facility at the University of Massachusetts at Lowell, explains that it is best to extract more samples, and to put more of the extracted samples through library prep, than are strictly necessary, so that enough samples remain to sequence at the end. “I typically tell my clients that they need to consider replicates to be able to show statistical significance in their downstream analysis, taking into account that DNA/RNA extraction and library preparation are multistep complex processes that can cause sample failures.”

“After a client pitches a project to me, I very often find myself explaining the project back to them in reverse order,” says Lepine. “I start by saying, ‘okay so you need to have 20 million sequencing reads at paired-end 150 cycle length per sample for 40 samples? This means you will need to do two NextSeq 500 300-cycle high-output runs as each generates 400 million read pairs. To get here we need to library prep 45–50 samples, to account for some sample failure, so we are guaranteed to have 40 successful samples from each of our experimental groups in replicate. To get here we need to do RNA extraction of around 60 samples, and we need an extraction procedure that isn’t biased toward long RNA since you are expecting that your study may be related to smaller RNAs.’”
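
Lepine's reverse-order planning can be sketched in a few lines of Python. The read counts and run capacity below come from his example. The success rates (85% for library prep, 80% for extraction) are illustrative assumptions, chosen only because they land within his quoted ranges, and the function names are hypothetical. The sketch also assumes that "20 million reads" per sample means 20 million read pairs, as his arithmetic implies.

```python
import math


def runs_needed(samples, read_pairs_per_sample, read_pairs_per_run):
    """Number of sequencing runs required to cover every sample."""
    total_read_pairs = samples * read_pairs_per_sample
    return math.ceil(total_read_pairs / read_pairs_per_run)


def samples_to_start(target_successes, success_pct):
    """Samples to begin a step with so that, at the given percent
    success rate, at least `target_successes` survive it.
    Integer ceiling division avoids floating-point rounding surprises."""
    return -(-target_successes * 100 // success_pct)


# Figures taken from the example in the text.
SAMPLES = 40
READ_PAIRS_PER_SAMPLE = 20_000_000
READ_PAIRS_PER_RUN = 400_000_000  # one high-output run, per the quote

# Assumed success rates (NOT from the source; for illustration only).
LIBRARY_PREP_SUCCESS_PCT = 85
EXTRACTION_SUCCESS_PCT = 80

runs = runs_needed(SAMPLES, READ_PAIRS_PER_SAMPLE, READ_PAIRS_PER_RUN)
libraries = samples_to_start(SAMPLES, LIBRARY_PREP_SUCCESS_PCT)
extractions = samples_to_start(libraries, EXTRACTION_SUCCESS_PCT)

print(runs)         # 2 runs
print(libraries)    # 48 libraries to prep
print(extractions)  # 60 samples to extract
```

Working backward this way makes each buffer explicit: changing either assumed success rate immediately shows how many extra samples need to be collected at the start.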

What type of sample do you need sequenced?

“It is not as simple as getting samples from users and handing them the sequencing data,” says Shen. “The type and amount of sample leads me to recommend NGS platforms and protocols that we can use.” For instance, he says, a paraffin-embedded sample requires specific extraction protocols. Similarly, when very few cells are available for sequencing, the core has to use ultra-low-input extraction protocols or recommend single-cell analysis, which can be quite expensive. Knowing the sample source is also critical, adds Shen. “For instance, if we use a regular extraction protocol for samples that are extracted from blood, then we will not be able to get rid of highly abundant genes. The sequencing data that you get will be dominated by these highly expressed genes and you will spend a lot of time and money to sift through the data to find the genes that you are looking for.”

For certain types of samples, the current methodologies and protocols have to be carefully evaluated or modified to meet the experimental goals. Applications such as transcriptomics and epigenomics require different methodologies for sequencing. “There is no commercial kit available today to sequence a non-coding RNA (ncRNA),” says Shen. A paper published in 2019 by the MD Anderson team describes using a total RNA kit rather than an mRNA kit for extraction, because the mRNA kit removes all ncRNA from the sample during the enrichment process. “It’s important to figure out which protocol will work best for which sample. Unless you do a thorough comparison you will not know which kit is best for the sample.” The team has recently submitted another manuscript on a ChIP-seq kit evaluation comparing different targets, commercial kits, and sample amounts. “The conclusion again is that depending on the sample and the target to be analyzed, the kits give different results,” says Shen.

Matthias Meyer, Ph.D., evolutionary geneticist and Group Leader of Advanced DNA Sequencing Techniques at the Max Planck Institute for Evolutionary Anthropology, works primarily with ancient DNA. Because his group works with trace amounts of ancient human DNA from bones and sediments, Meyer hopes that companies will address the index-hopping problem that exists on their highest-throughput platforms. “Even with dual-indexing, low level cross-contamination can occur, which may be problematic if a handful of sequences are considered evidence for the presence of ancient DNA,” says Meyer. He also wishes that there were a highly sensitive and very fast sequencing-based method for screening samples for the presence of trace amounts of DNA from specific genomic targets (e.g., human mitochondrial DNA). “This is something we are also working on in my lab, and it is great that so much powerful technology is available that can be used and combined to develop new methods on the 'customer' side.”

What are the costs of sequencing?

Sequencing costs encompass many variables, some of which are often left out of commonly presented estimates of cost per base. Those estimates usually exclude labor and the bioinformatics analysis done at the end of the process. Reagent costs also vary with the volume ordered. For instance, core facilities and sequencing centers that order in larger quantities often obtain discounted pricing. “Clients are inclined to save money right out of the gate and jump straight to targeted sequencing, exome sequencing, or mRNA sequencing and forgo more expensive whole-genome sequencing or total RNA sequencing,” says Lepine.

The problem with that approach is that zooming in on the data can miss part of the story. For instance, an observed phenotype may have something to do with ncRNA, and if you are only sequencing mRNA, that will be missed. To save time and costs, Lepine recommends doing a smaller pilot project with a shallow sequencing run to be confident that the samples will work on a larger scale. “In general, it’s best to start NGS/genome-based projects by sequencing more, early, and sequencing less, later (you can even turn an NGS study into simple qPCR depending on early sequencing results).”

How good is the data quality?

Data quality depends heavily on the quality of the sample being analyzed. According to Lepine, there are various chemistries for library prep, and sometimes clients want to repeat the same library prep chemistry experiment after experiment without regard for dwindling RNA quality. “This can be a problem particularly with mRNA-seq projects.” Many mRNA sequencing kits require a poly-A tail to do a bead-based mRNA pull-down/enrichment. If the mRNA has been degraded by freeze-thaw cycles, or if transcripts lack poly-A tails, they will be discarded and not sequenced, which can skew the measured mRNA expression levels.

Pradipta Ray, Ph.D., a Research Scientist in the Laboratory of Dr. Theodore Price in the Center for Advanced Pain Sciences at the University of Texas at Dallas, works on Bayesian models of NGS data for comparative transcriptomics of pain. He says that although spatial RNA sequencing software is now available with built-in imaging analysis for identifying cell boundaries and identifying neurons from cell shape, it is not integrated into most out-of-the-box microscopy or spatial RNA-seq software. Integrating those capabilities would be very useful, he says, as would commercial kits that deliver spatial RNA-seq data at sub-cellular resolution.

“One piece of advice I give people is don’t just hand over a list of genes after sequencing, as most labs don’t know what to do with that huge amount of data,” says Shen. Instead, it’s best to work with a bioinformatician, even before submitting the samples, who can help plan the experiment and help with data analysis and interpretation. Before embarking on an NGS experiment, it is therefore critical to know what resources are available to help analyze, prioritize, and interpret the sequencing data. “Otherwise, it will be a huge waste of time, money, and effort,” says Shen.

Key considerations for evaluating NGS methodology options

  • What is the biological question/application?
  • What is the sample type and quality?
  • How much sample is available for sequencing?
  • Do you need DNA or RNA sequencing?
  • Do you need whole genome or targeted sequencing?
  • Do you need short-read or long-read sequencing?
  • What are the limitations on cost and resources?
  • What are the demands on data analysis?