Looking for CNVs by NGS: Pare Me Down

Josh P. Roberts has an M.A. in the history and philosophy of science, and he also went through the Ph.D. program in molecular, cellular, developmental biology, and genetics at the University of Minnesota, with dissertation research in ocular immunology.

Copy number variants (CNVs), duplications and deletions of large swaths of the genome, are known to play a role in disease states from cancers to neurological disorders such as autism and cognitive deficiencies. In the clinic, microarrays are currently the first-line means of detecting CNVs, but for discovery and translational research they can be limiting. For one thing, arrays will tell you only that sequences are duplicated (or missing), and not much more. To understand everything that is going on, next-generation sequencing (NGS) is the way to go: it has much greater resolution and can at the same time provide information about inversions, translocations and other structural abnormalities, as well as single nucleotide polymorphisms (SNPs) and small indels.

Yet sequencing the entire genome may be overkill. By enriching samples for just those sequences potentially of interest (the 1% to 3% of the genome that codes for proteins, for example, or even just those genes known to be associated with a particular disease or syndrome), researchers can eliminate much of the bench time, sequencing, data storage and analysis costs associated with NGS experiments. Here we look at some ways to pare down sample DNA before looking for structural variations by NGS.

Why not use microarrays?

To look only for chromosomal duplications and deletions from about 1 kilobase to 3.5 megabases—that is, CNVs—researchers have traditionally turned to microarrays. These are still very cost-effective and very accurate, and they cover the genome more densely than approaches such as exome capture, notes Sameek Roychowdhury, assistant professor of medical oncology at Ohio State University.

Microarrays have their shortcomings. Although they can resolve much more fine-grained abnormalities than G-banded chromosome analysis (karyotyping), they cannot detect balanced rearrangements, such as inversions and translocations, nor can they determine the chromosomal location of insertions or deletions [1].

And with NGS—whether whole genome sequencing (WGS) or a targeted version, such as whole exome sequencing (WES)—“you’re paying more, but you’re also getting more information. You’re going to have SNPs and indels, and potentially structural rearrangements outside of the large CNVs” that you don’t get with arrays, says Roychowdhury.

The exome

By all accounts, as sequencing costs fall, the incentive to enrich genomic samples for NGS is lessening; it may be only a matter of time before WGS replaces targeted sequencing for most applications. But that time is not yet upon us, and so researchers using NGS for CNV detection need to weigh the relative advantages of different approaches to reducing the sequencing burden.

Selecting for the coding regions reduces the human genome to roughly 35 to 70 megabases, depending on whether the 3’ and 5’ untranslated regions (UTRs) are included; this is the approximately 2% in which about 85% of known disease-causing mutations have been found [2]. A variety of commercially available kits allow researchers to prepare a whole exome library for sequencing.
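To put those target sizes in perspective, here is a rough back-of-the-envelope sketch. It assumes a haploid human genome of about 3,100 megabases; the depths and the on-target rate are hypothetical illustration values, not figures from the sources cited here.

```python
# Assumption: approximate haploid human genome size in megabases.
GENOME_MB = 3100.0


def target_fraction(target_mb, genome_mb=GENOME_MB):
    """Fraction of the genome covered by a capture target of the given size."""
    return target_mb / genome_mb


def raw_gigabases(target_mb, mean_depth, on_target_rate=1.0):
    """Raw sequence (in gigabases) needed to reach a mean depth over the target.

    on_target_rate is the share of sequenced bases that land on the target;
    1.0 means no off-target reads (as in whole genome sequencing).
    """
    return target_mb * mean_depth / on_target_rate / 1000.0


for size_mb in (35.0, 70.0):
    print(f"{size_mb:.0f} Mb target = {target_fraction(size_mb):.1%} of the genome")

# Hypothetical comparison: 30x whole genome vs. 100x exome at 70% on-target.
print(f"WGS at 30x:    {raw_gigabases(GENOME_MB, 30):.1f} Gb")
print(f"Exome at 100x: {raw_gigabases(70.0, 100, on_target_rate=0.7):.1f} Gb")
```

Even at a much higher depth, the enriched library in this toy comparison needs roughly a tenth of the raw sequence, which is where the savings in sequencing, storage and analysis come from.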

Most, including Agilent’s SureSelect, Roche NimbleGen’s SeqCap and Illumina’s Nextera, are based on hybridization of sequence-specific probes followed by PCR amplification of the captured sequences. The kits differ slightly in how the DNA is fragmented (by sonication or enzyme treatment, for example) and in the exact regions that are targeted by the probes. Thus “the definition of the exome is different,” points out Janine Meienberg, who is completing her Ph.D. at the Center for Cardiovascular Genetics and Gene Diagnostics in Switzerland. None of the platforms, she found, capture all the known coding exons, either alone or in combination.

Another strategy relies on a variation of highly multiplexed PCR. Instead of using complementary oligonucleotides to capture selected sequences, such kits use those sequences as PCR primers to then generate the sequencing libraries. Most of these, like Thermo Fisher Scientific’s Ion AmpliSeq™, rely on carefully designed sets of primers to achieve a PCR-amplified version of the exome. Others, such as Agilent’s HaloPlex, use a dual-specificity primer (called a molecular inversion probe or padlock probe) that binds the ends of the desired genomic fragments to form a circle, with any leftover fragments being digested, prior to amplification.

Each method has its pluses and minuses. Hybridization capture methods may have difficulty with sequences from repetitive and GC-rich regions, while the PCR-based methods are less susceptible to such biases but work best on targeted regions of less than 5 megabases. At least one group, finding that the hybridization method missed approximately 7% to 10% of all protein-coding sequence regions, opted for a combination of the two [3].

Roychowdhury and his colleagues similarly found differences among the different exome-enrichment platforms in terms of on-target rates, uniformity and SNP variant calling, yet “all methods demonstrated effective copy-number variant calling” compared with a microarray [4].

Under-representation or over-representation of sequences can come into play with any technique that uses amplification, and this is a potential pitfall when looking for CNVs by NGS. One way to achieve more uniform amplification, and thus minimize such bias, is to divide the multiplex reactions into a large number of individual singleplex reactions by using partitioning technologies developed for digital PCR. In individual droplets there is less local competition for reagents, and less interaction between primers, than in a bulk reaction, notes Svilen Tzonev, director of business development at Bio-Rad.
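To illustrate why uneven amplification is a pitfall, here is a minimal sketch assuming a read-depth approach to CNV calling, in which each target's share of reads in a sample is compared with its share in a reference. The target names, read counts and gain/loss thresholds are all hypothetical, and real callers add further normalization (GC content, pooled references, segmentation) that is omitted here.

```python
import math


def log2_ratios(sample_counts, reference_counts):
    """Per-target log2 ratio of library-size-normalized read counts.

    Assumes every target has a positive count in both libraries.
    """
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    ratios = {}
    for target, reference_reads in reference_counts.items():
        sample_share = sample_counts[target] / sample_total
        reference_share = reference_reads / reference_total
        ratios[target] = math.log2(sample_share / reference_share)
    return ratios


def call(ratio, loss=-0.4, gain=0.3):
    """Label a log2 ratio using hypothetical thresholds."""
    if ratio <= loss:
        return "loss"
    if ratio >= gain:
        return "gain"
    return "neutral"


# Toy data: exon_C has a true one-copy deletion (half the reads), while
# exon_D has normal copy number but was over-amplified by 50%.
reference = {"exon_A": 1000, "exon_B": 1000, "exon_C": 1000,
             "exon_D": 1000, "exon_E": 1000}
sample = {"exon_A": 1000, "exon_B": 1000, "exon_C": 500,
          "exon_D": 1500, "exon_E": 1000}

for target, ratio in log2_ratios(sample, reference).items():
    print(f"{target}: log2 ratio {ratio:+.2f} -> {call(ratio)}")
```

In this toy example the over-amplified exon_D is labeled a gain just as a real duplication would be, because read counts alone cannot separate amplification bias from a true copy-number change.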

Sequencing only what you need

In addition, a host of off-the-shelf targeted-enrichment panels (specific for genes associated with particular types of cancer, for example) use similar hybridization or PCR-based technologies. Many vendors allow customers to construct custom panels using the company’s design software tools. These might do well when searching for known genes that have translocated outside of the known exome.

Agilent has recently introduced its OneSeq kits, which are essentially a combination of the broadly spaced probes used for exon enrichment and the denser, more targeted probes that would be found in a SNP array, says Roychowdhury. This “comprehensive, all-in-one target enrichment assay … enables cytogeneticists and clinical researchers to study/discover the role of CNVs and the underlying causal mutations in many genetic disorders,” explains Charmian Cher, associate director of Agilent’s diagnostics and genomics group. The kits are available in off-the-shelf and custom versions.

Arrays, exome sequencing and targeted panels are great for finding CNVs, especially when looking at a particular set of genes and perhaps specific loci, and there are many ways to get to that same finish line. For discovery, on the other hand, “if it’s a new case, we’d prefer to do WGS so that you can actually see these things across each of the chromosomes,” says Vincent Magrini, research assistant professor of genetics at the McDonnell Genome Institute at Washington University, St. Louis.

References

[1] Martin, CL, Warburton, D, “Detection of chromosomal aberrations in clinical practice: From karyotype to genome sequence,” Annu Rev Genomics Hum Genet, 2015. [PubMed ID: 26077817]

[2] Meienberg, J, et al., “New insights into the performance of human whole-exome capture platforms,” Nucleic Acids Res, 43:e76, 2015. [PubMed ID: 25820422]

[3] Miya, F, et al., “A combination of targeted enrichment methodologies for whole-exome sequencing reveals novel pathogenic mutations,” Sci Rep, 5: 9331, 2015. [PubMed ID: 25786579]

[4] Samorodnitsky, E, et al., “Evaluation of Hybridization Capture Versus Amplicon-Based Methods for Whole-Exome Sequencing,” Hum Mutat, 2015. [PubMed ID: 26110913]

 

