In sequencing nucleic acids, platforms produce reads of different lengths. At first glance, it might seem that longer reads would always be better, turning a sample into a puzzle with fewer pieces, but it really all depends on what a scientist wants to accomplish.

“The benefit of short reads isn’t that they’re short, but that they can be very inexpensive and can be generated in a massively parallel manner with billions being generated in a single run,” says Shawn Baker, genomics advisor/consultant at SanDiegOmics and co-founder of AllSeq. “There are also some applications that simply cannot utilize long reads, such as fragmented DNA.” So, where cost needs to be kept down and read count must be high, short-read technology makes the most sense.


Long reads, on the other hand, “can resolve regions of the genome that are inaccessible to short reads due to repeat sequences,” Baker explains. “They can also read through the entire lengths of RNA transcripts, allowing for precisely determining the specific isoform.”

When it comes to long-read sequencing, many scientists think of Pacific Biosciences (PacBio). Its Single Molecule, Real-Time (SMRT) Sequencing generates reads that average more than 15,000 bases, and some exceed 100,000. Oxford Nanopore Technologies’ platform goes further still: under the right conditions, with high-quality DNA, it can produce reads of up to 1,000,000 bases. “That could reset the definition of ‘long’,” says Baker, “with PacBio reads being considered ‘medium length’.”

Staying SMRT

Whether it counts as long or medium, PacBio’s SMRT Sequencing offers many advantages. At Laval University, systems biologist Antony Vincent and his colleagues used it to study Pseudomonas aeruginosa, a bacterium that causes severe infections in patients with cystic fibrosis.

When asked why he selected SMRT Sequencing, Vincent says, “Second-generation sequencing technologies have made it possible to effectively explore microbial diversity at the genomic level, but for technical reasons, these technologies produce short sequencing reads of a few hundred base pairs.” That length, he points out, limits de novo assembly of these sequences. “Indeed, when the genomes contain longer repeats than the sequencing reads, it is impossible to determine the orientation and order of the fragments,” Vincent notes. Those sequences are left unassembled.
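To make Vincent’s point concrete, here is a toy Python sketch (not from his study). It builds two hypothetical genomes that differ only in the order of the unique segments lying between three copies of a repeat, then compares the complete sets of error-free reads each one yields. As long as reads are no longer than the repeat plus one base, no read can bridge a repeat copy from one unique segment to the next, so the two read sets are identical and no assembler could tell the genomes apart. One base longer, and a read that spans a whole repeat copy reveals the order. The sequences, the repeat `GGGGG`, and the assumption of perfect reads starting at every position are all made up for illustration.

```python
from collections import Counter


def all_reads(genome: str, read_len: int) -> Counter:
    """Multiset of error-free reads of length read_len, one starting at every position."""
    return Counter(genome[i:i + read_len] for i in range(len(genome) - read_len + 1))


REPEAT = "GGGGG"  # hypothetical 5-base repeat present in three copies

# Layout a-R-b-R-c-R-d versus a-R-c-R-b-R-d: segments b ("C") and c ("T") swapped.
g1 = REPEAT.join(["AA", "C", "T", "AA"])
g2 = REPEAT.join(["AA", "T", "C", "AA"])

for read_len in (4, 6, 7):
    same = all_reads(g1, read_len) == all_reads(g2, read_len)
    # Reads up to len(REPEAT) + 1 = 6 bases cannot distinguish the two genomes.
    print(read_len, "indistinguishable" if same else "distinguishable")
```

Running it should report the 4- and 6-base reads as indistinguishable and the 7-base reads as distinguishable, because only a 7-base read can contain an entire repeat copy plus one unique base on each side.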

Vincent and his colleagues focused on repeated sequences that short-read technology couldn’t resolve. In particular, they analyzed mobile genetic elements in a strain of Pseudomonas aeruginosa (PPF-1), and SMRT technology, says Vincent, allowed them to “assemble the complete chromosome sequence of the PPF-1 strain and thus shed light on several repeated elements—insertion sequences, prophages, genomic islands and introns—that increase the genomic diversity of this bacterium.”

Longer isn’t automatically better, though. Long-read approaches, says Vincent, create bioinformatics challenges. For one thing, fewer bioinformatics tools are available for newer sequencing technologies. For another, the reads themselves tend to be error-prone. “It is well known that reads from technologies generating long sequences often contain more errors in the allocation of bases,” Vincent says. “However, several improvements—among others in terms of chemistry and bioinformatics treatment—have significantly reduced the problem.”

Benefiting from both

Sequencing is far from a one-or-the-other world; some problems need both short- and long-read technologies. At the University of Ferrara in Italy, assistant professor of genetics Silvia Fuselli and her colleagues ran into just such a situation with the DRB region of the major histocompatibility complex (MHC) class II. Fuselli describes this immune-related region as “extremely variable.” It includes numerous point substitutions, short tandem repeats, and rearrangements. This, says Fuselli, makes “the assembly of short reads almost impossible, especially in non-model species with no genomic reference available.”


So the scientists turned to MinION technology from Oxford Nanopore Technologies, which provided “long reads covering the whole region without interruption,” Fuselli explains. “However, the error rate of the long reads—or at least of those produced with the MinION technology that we used, now considerably improved—was too high for a reliable variant calling.” To compensate, the team added short-read sequencing where needed, for example re-sequencing the region to correct errors in the long reads.
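The idea of using accurate short reads to correct a noisy long read can be illustrated with a deliberately simplified Python sketch. This is not the Ferrara team’s pipeline. It assumes, hypothetically, that the long read contains only substitution errors and that each short read has already been placed at a known offset along it. Each position then takes the majority base among the short reads covering it, and positions without short-read coverage keep the long-read call. Real polishing tools work from alignments and must also handle insertions and deletions, a major error type in nanopore reads.

```python
from collections import Counter


def polish(long_read: str, placed_short_reads: list[tuple[int, str]]) -> str:
    """Correct substitution errors in a long read by per-position majority vote.

    placed_short_reads holds hypothetical (offset, sequence) pairs giving where
    each short read sits along the long read; the alignment is assumed done.
    """
    votes = [Counter() for _ in long_read]
    for offset, seq in placed_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):  # ignore bases hanging off either end
                votes[pos][base] += 1

    corrected = []
    for original, counts in zip(long_read, votes):
        if counts:
            best_base, _ = counts.most_common(1)[0]
            corrected.append(best_base)  # short-read consensus wins
        else:
            corrected.append(original)  # no short-read coverage: keep long-read call
    return "".join(corrected)
```

For example, a long read `"ACGAACGTTC"` carrying two substitution errors is restored to `"ACGTACGTAC"` by two overlapping accurate short reads, `(0, "ACGTAC")` and `(4, "ACGTAC")`.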

When asked if combinations of technologies are often beneficial, Fuselli says, “Especially in this case, definitively yes.” She adds, “In our specific experiment, the two technologies compensate each other and using both dramatically increased the efficiency of our approach in terms of time and produced information.”

As Fuselli’s work shows, short- and long-read sequencing methods each come with pros and cons. In some cases, the best results will come from combining the two in ways that exploit their strengths and sidestep their weaknesses as far as possible. As Fuselli says, “In general, I believe that if the costs are not too high, combining different technologies allows you to cover more aspects and overcome limits.”

Linking sequences

At 10x Genomics, scientists extend the reach of short reads with a data type termed Linked-Reads. Linked-Reads are generated by partitioning long input DNA molecules at the haplotype level, generating barcoded short reads within those partitions, and then performing short-read sequencing in bulk. The information encoded in the barcodes allows short reads to be grouped according to the long molecule of input DNA from which they originated, providing long-range genomic context and overcoming many of the limitations of short-read sequencing. Novel bioinformatics pipelines can use Linked-Reads to assemble sequences across long repeats in the genome and to resolve heterozygous loci into individual haplotypes, enabling diploid de novo assembly and the simultaneous detection of small and large variants from a single Linked-Read library.
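How barcodes turn short reads back into long-range information can be sketched in a few lines of Python. This is a hypothetical illustration, not 10x Genomics’ software. It assumes each short read has already been aligned to a reference and tagged with its partition barcode. Reads are grouped by barcode and chromosome, sorted by position, and split into separate inferred molecules wherever the gap between consecutive reads exceeds a made-up `max_gap` threshold, since a single partition can contain more than one input molecule.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str  # partition barcode shared by reads from the same partition
    chrom: str    # reference chromosome the read aligned to
    pos: int      # alignment start on the reference
    length: int   # aligned length of the short read


def infer_molecules(reads, max_gap=50_000):
    """Group barcoded short reads into inferred long input molecules.

    Returns (barcode, chrom, start, end, n_reads) tuples. max_gap is a
    hypothetical cutoff: larger gaps are treated as separate molecules.
    """
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r)

    molecules = []
    for (barcode, chrom), group in by_key.items():
        group.sort(key=lambda r: r.pos)
        start, end, n = group[0].pos, group[0].pos + group[0].length, 1
        for r in group[1:]:
            if r.pos - end > max_gap:
                # Gap too large: close the current molecule and start a new one.
                molecules.append((barcode, chrom, start, end, n))
                start, end, n = r.pos, r.pos + r.length, 0
            end = max(end, r.pos + r.length)
            n += 1
        molecules.append((barcode, chrom, start, end, n))
    return molecules
```

Three 150-base reads sharing one barcode at positions 1,000, 9,000 and 20,000 would be reported as a single molecule spanning about 19 kb, which is the kind of long-range context a downstream assembler or phasing tool could then exploit.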

Using Linked-Read sequencing, researchers have demonstrated the ability to reconstruct megabase-scale haplotypes, resolve tandem gene duplications, and identify complex structural variations in cancer and inherited diseases.

For sequencing, accuracy is everything, but it is complicated to maintain. Even as ‘long’ gets longer, short-read methods remain valuable, even invaluable in some situations. To dig as deeply as possible into a particular sequence, scientists need to know when to go long and when to stay short, and staying on top of the best available technology keeps everyone busy.
