Genetic variation in the human genome is diverse and underlies many diseases. Measuring genetic variants is key to precision medicine, in which each patient is considered individually and prescribed a therapy chosen in part on the basis of their genetic analysis. In cancer, relevant genetic variation includes small variants such as single-nucleotide substitutions and insertions/deletions, focal changes in gene copy number caused by gene amplification, partial or whole gene deletions, duplications or losses of large chromosomal regions, and chromosomal translocations and other rearrangements.

The HER2 gene, amplified in a subset of breast cancers, was one of the first therapeutic targets of cancer precision medicine and has transformed the treatment of HER2-positive breast cancer. Since HER2 amplification was discovered and adopted as a therapeutic target, the technology used to measure copy number changes has evolved. Recent improvements in high-throughput, genome-wide copy number detection, including array-based and next-generation sequencing (NGS)-based workflows and their underlying algorithms, are enabling better characterization of these important variants and yielding further insight into their relation to disease.

Overcoming copy number variant calling challenges

Initial approaches to genome-wide copy number variant (CNV) detection used comparative genomic hybridization (CGH) and genotyping arrays. CGH hybridizes differentially labeled target and reference DNAs to the same probes, selected to give an even genomic distribution of non-repetitive regions, and compares their fluorescence ratios to infer relative copy number. Genotyping arrays, in contrast, capture DNA fragments with short oligonucleotide probes and deduce copy number from variations in hybridization intensity. Although array-based approaches offer efficient, sensitive, large-scale CNV profiling, they face several challenges, including hybridization noise, low resolution, limited genome coverage, and difficulty detecting novel and small, focal CNVs.
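The ratio comparison at the heart of CGH can be illustrated with a minimal Python sketch. It assumes background-corrected probe intensities for the target and reference channels, median-centers the per-probe log2 ratios on the assumption that most probes lie in copy-neutral regions, and applies hypothetical cutoffs of ±0.3; real pipelines also correct for dye bias and segment neighboring probes before calling. For a pure diploid sample, a single-copy gain gives a theoretical log2 ratio of log2(3/2) ≈ 0.58 and a heterozygous loss gives log2(1/2) = -1.

```python
import math
from statistics import median


def cgh_log2_ratios(target_intensity, reference_intensity):
    """Median-centered log2(target/reference) fluorescence ratio for each probe."""
    raw = [math.log2(t / r) for t, r in zip(target_intensity, reference_intensity)]
    center = median(raw)  # assumes most probes lie in copy-neutral regions
    return [x - center for x in raw]


def call_probe(log2_ratio, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Classify one probe using hypothetical, illustrative cutoffs."""
    if log2_ratio >= gain_cutoff:
        return "gain"
    if log2_ratio <= loss_cutoff:
        return "loss"
    return "neutral"


# Hypothetical intensities for five probes
target_channel = [1000, 1000, 2000, 500, 1000]
reference_channel = [1000] * 5
calls = [call_probe(r) for r in cgh_log2_ratios(target_channel, reference_channel)]
```

Here the third probe is called as a gain and the fourth as a loss; median-centering keeps an overall brightness difference between the two channels from being mistaken for a genome-wide gain or loss.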

As NGS platforms have been widely adopted, delivering high-throughput read generation and improved base-calling accuracy for faster, cheaper genome sequencing, NGS-based methods have been developed for comprehensive CNV characterization. Relative to array-based methods, NGS offers higher coverage and resolution, more accurate copy number assessment, and the ability to detect other variant types, including small variants and chromosomal translocations.

In principle, adapting NGS methods to CNV detection is straightforward. “Detection of copy number alteration is a read counting application with a simple premise: if a gene has undergone amplification, there will be more copies of the gene relative to other genes in the genome. Thus, when amplicons are sequenced, there will be proportionally more sequencing reads to the amplified gene relative to other genes,” explains Seth Sadis, director of oncology R&D at Thermo Fisher Scientific. One of the major challenges in adapting NGS approaches to CNV detection, however, is making the measurement robust and repeatable, which means accounting for both process variability and noise. “Since it is a counting application, read count variability introduced from external sources such as different thermocyclers, reagent mix batches, or manual sample handling as well as intrinsic factors such as amplicon length or GC content can influence results,” explains Jim Veitch, senior scientist in bioinformatics at Thermo Fisher Scientific.
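Sadis's read-counting premise reduces to a ratio of read fractions. The minimal sketch below assumes a copy-neutral, diploid reference sample sequenced with the same panel, and that most reads come from copy-neutral genes so that total read count is a fair normalizer; the function name and parameters are illustrative and not part of any Thermo Fisher software.

```python
def copy_number_from_reads(gene_reads, total_reads, ref_gene_reads, ref_total_reads, ploidy=2):
    """Estimate one gene's copy number relative to a copy-neutral reference sample.

    Each sample's gene count is divided by that sample's total reads
    (library-size normalization) so that samples sequenced to different
    depths can be compared directly.
    """
    sample_fraction = gene_reads / total_reads
    reference_fraction = ref_gene_reads / ref_total_reads
    return ploidy * sample_fraction / reference_fraction


# Hypothetical counts: the tumor was sequenced to twice the depth of the reference,
# yet its gene fraction is still four times higher, giving roughly 8 copies.
estimate = copy_number_from_reads(8_000, 2_000_000, 1_000, 1_000_000)
```

Without the division by total reads, the deeper-sequenced tumor would appear to carry twice as many copies of every gene.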

For example, the PCR amplification performed during sample pre-processing increases the number of copies sent for sequencing exponentially. However, this exponential growth also magnifies small differences in amplification efficiency between amplicons, which, if not properly accounted for, introduce errors. To confirm a copy number change robustly and accurately, read counts for each targeted amplicon must be normalized against both the intrinsic and external factors that cause read count efficiency to differ between amplicons. “Errors in read count estimates for particular amplicons can be characterized and managed by adding in more amplicons per target gene and averaging the result,” offers Veitch. Samples with low tumor content pose a particular challenge, because DNA from normal cells dilutes the copy number signal. Thermo Fisher’s Ion Torrent technology uses a machine learning approach to remove amplification biases and normalize read counts across different sample preparations, improving the accuracy of copy number detection in samples with low tumor content.
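One simple way to picture the per-amplicon normalization and averaging Veitch describes is a baseline-ratio scheme. The sketch below is a hypothetical illustration, not the Ion Torrent machine learning method: it assumes a per-amplicon baseline read fraction (for example, averaged from copy-neutral normal samples processed the same way), which absorbs amplicon-specific effects such as length and GC content. It median-centers the log2 ratios on the assumption that most amplicons are copy-neutral, then averages the ratios of all amplicons targeting a gene; the standard error of that mean shrinks roughly with the square root of the number of amplicons.

```python
import math
from statistics import mean, median, stdev


def gene_copy_number(sample_counts, baseline_fractions, gene_of, ploidy=2):
    """Estimate per-gene copy number from amplicon read counts.

    sample_counts: amplicon -> read count in the test sample
    baseline_fractions: amplicon -> expected read fraction in copy-neutral
        samples (hypothetical input that absorbs per-amplicon efficiency)
    gene_of: amplicon -> gene name
    Returns gene -> (copy number estimate, standard error of mean log2 ratio)
    """
    total = sum(sample_counts.values())
    # Per-amplicon log2 ratio of observed to expected read fraction
    log2_ratios = {
        amp: math.log2((count / total) / baseline_fractions[amp])
        for amp, count in sample_counts.items()
    }
    # Re-center on the median, assuming most amplicons are copy-neutral
    center = median(log2_ratios.values())
    per_gene = {}
    for amp, ratio in log2_ratios.items():
        per_gene.setdefault(gene_of[amp], []).append(ratio - center)
    result = {}
    for gene, ratios in per_gene.items():
        avg = mean(ratios)  # averaging across amplicons damps per-amplicon error
        se = stdev(ratios) / math.sqrt(len(ratios)) if len(ratios) > 1 else float("nan")
        result[gene] = (ploidy * 2 ** avg, se)
    return result


# Hypothetical panel: gene A is amplified, genes B, C and D are copy-neutral
counts = {"A1": 400, "A2": 400, "B1": 100, "B2": 100,
          "C1": 100, "C2": 100, "D1": 100, "D2": 100}
baseline = {amp: 1 / 8 for amp in counts}
genes = {amp: amp[0] for amp in counts}
estimates = gene_copy_number(counts, baseline, genes)
```

Median-centering, rather than relying on total reads alone, keeps a strongly amplified gene from depressing the apparent copy number of every other gene in the panel.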

Novel approaches lead to new tools

New, robust NGS-based methods incorporate read normalization and minimize bias. One example is Thermo Fisher’s Ion Torrent platform, which supports a growing range of NGS panels targeted to specific diseases. Sadis notes that the Ion Torrent solution for copy number analysis is validated for comprehensive CNV analysis even with low sample input and with formalin-fixed, paraffin-embedded (FFPE) samples, cytology specimens, or fine needle aspirates. The system offers rapid sequencing on a semiconductor-based platform, which is inherently faster than dye-based methods. Integrated, pre-validated software workflows support variant interpretation for selected cancer panels and automatically link the detected variants to public-domain sources that inform users of their relevance.

The recent introduction of focal copy number detection in cancer panels has helped highlight CNVs as a significant feature of cancer mutation analysis. “Thermo Fisher’s oncology panels are all targeted panels that use AmpliSeq technology, allowing multiplex PCR in a single tube. This is the core amplification methodology for all cancer panels, though the new version, AmpliSeq HD, also has molecular tagging. Molecular tagging decreases the limit of detection for small sequence variants and is currently used for cfDNA samples, but the technology will soon be applicable to large FFPE panels,” explains Veitch. Pan-cancer solid tumor panels such as the Cancer Hotspot Panel were among the first and most successful NGS panels, focusing on small alterations, mainly single-nucleotide variants (SNVs) and indels. This scope has since been expanded to include copy number variants and gene fusions in the Oncomine® cancer panels.
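Molecular tagging generally works by attaching a unique tag to each original DNA molecule before amplification, so that reads descending from the same molecule can be grouped into a family. A PCR or sequencing error typically appears in only some reads of a family and is outvoted, whereas a true variant is present in all of them, which is why tagging lowers the limit of detection. The Python sketch below illustrates that general idea at a single position; the `(tag, base)` input format, the family-size cutoff, and the majority rule are assumptions, and the article does not describe how AmpliSeq HD implements tagging.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads, min_family_size=3):
    """Collapse reads sharing a molecular tag into one consensus base per molecule.

    reads: iterable of (tag, base) tuples for a single genomic position.
    Families smaller than min_family_size, or without a strict majority base,
    are discarded.
    """
    families = defaultdict(list)
    for tag, base in reads:
        families[tag].append(base)
    consensus = {}
    for tag, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        # A strict majority keeps a minority error read from defining the molecule
        if count > len(bases) / 2:
            consensus[tag] = base
    return consensus


def allele_fraction(consensus, alt_base):
    """Fraction of consensus molecules (not raw reads) carrying alt_base."""
    if not consensus:
        return 0.0
    return sum(base == alt_base for base in consensus.values()) / len(consensus)


# Hypothetical reads: molecule AAA carries one error read (T) among its C reads
reads = [("AAA", "C"), ("AAA", "C"), ("AAA", "C"), ("AAA", "T"),
         ("BBB", "T"), ("BBB", "T"), ("BBB", "T"),
         ("CCC", "C"), ("CCC", "C")]
molecules = consensus_by_tag(reads)
```

The error read in family AAA no longer counts as evidence for T, and the undersized family CCC is dropped, so the allele fraction is computed over reliably observed molecules.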

Yosuke Hirotsu of Yamanashi Central Hospital and colleagues at the University of Tokyo recently showed that Thermo Fisher Scientific’s newly developed Oncomine® BRCA1/2 Panel is an accurate, rapid assay that simultaneously detects pathogenic variants and copy number alterations. Hirotsu studies germline mutations in the BRCA1 and BRCA2 genes that predispose carriers to hereditary breast and ovarian cancer syndrome (HBOC). Detecting pathogenic BRCA1/2 variants, including exon losses, is essential for diagnosing and preventing HBOC and for guiding treatment decisions. The team detected 21 pathogenic germline variants among 147 patients with breast and/or ovarian cancer, supporting the panel as an alternative assay for investigating BRCA1/2 germline and somatic mutations.

Application explosion

Applications are moving beyond simple focal copy number detection to other classes of chromosomal alteration that are also relevant to disease, such as arm-level gains and losses and chromothripsis, in which chromosomal regions are shattered and erratically rearranged. New tools can also estimate the tumor content of a cancer sample more precisely, allowing copy number estimates to be corrected to the values actually present in the tumor cells at a given time.
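The tumor content correction can be written as a simple mixing model. Assuming the normal cells are diploid and the tumor cell population is homogeneous, the observed average copy number C_obs of a sample with tumor fraction p satisfies C_obs = p × C_tumor + (1 − p) × 2, which can be solved for C_tumor. The hypothetical helpers below show both directions; the second also illustrates why low tumor content is difficult, since a homozygous deletion in a sample with 20% tumor content lowers the observed copy number only from 2 to 1.6.

```python
def tumor_copy_number(observed_cn, purity, normal_cn=2.0):
    """Solve C_obs = p * C_tumor + (1 - p) * normal_cn for C_tumor.

    Assumes diploid normal cells and a single, homogeneous tumor population.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return (observed_cn - (1 - purity) * normal_cn) / purity


def observed_copy_number(tumor_cn, purity, normal_cn=2.0):
    """Average copy number expected from a mixture of tumor and normal cells."""
    return purity * tumor_cn + (1 - purity) * normal_cn


# Hypothetical sample with 40% tumor content and an observed copy number of 3.2
corrected = tumor_copy_number(3.2, 0.4)
```

In this example the tumor cells carry about 5 copies, a gain that the bulk measurement of 3.2 substantially understates.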

Jan Smida and colleagues at the German Research Center for Environmental Health, for example, recently used whole-genome Affymetrix CytoScan High Density arrays to assess somatic copy number alterations (SCNAs) in 160 osteosarcoma (OS) samples. Osteosarcoma, the most common primary malignant bone tumor in children and adolescents, is characterized by structural and numerical chromosomal alterations that create complex breakage patterns such as chromothripsis. The team identified genes and regions frequently targeted by SCNAs and revealed OS-specific unstable regions involving well-known OS tumor suppressor genes. Their results confirm OS-specific fragility patterns and provide new clues to the complex biology of OS.

“Specific to cancer,” Veitch notes, “one of the challenges is to detect very small copy number losses associated with tumor suppressor inactivation. If a tumor suppressor gene is inactivated, one of the mechanisms of inactivation is to lose a piece of it, but a lot of these are de novo to each cancer sample instead of gain-of-function cancer drivers where the same mutation is common across many cancer samples.” Tumor suppressor genes can be inactivated in many ways: whole-gene loss, partial gene loss, or sequence alterations that cause loss of function (for example, by prematurely truncating the protein product). As more is learned about these processes, researchers are increasingly interested in understanding when and how they occur. Sadis adds, “Improving resolution of methods to reliably detect these smaller changes is happening now and is already implemented for BRCA 1 and 2. We are looking to extend this to all tumor suppressors.”
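Detecting a small loss, such as an exon-level deletion, comes down to finding short runs of adjacent targets whose normalized copy number drops. The sketch below assumes per-amplicon copy number estimates that have already been normalized against a copy-neutral baseline and are listed in genomic order; the 1.4 threshold (between the expected 1 copy of a heterozygous loss and 2 copies for a neutral region) and the two-amplicon minimum run are hypothetical parameters chosen for illustration.

```python
def find_loss_runs(copy_numbers, threshold=1.4, min_run=2):
    """Return (start, end) index pairs, inclusive, of consecutive amplicons below threshold.

    copy_numbers: per-amplicon copy number estimates in genomic order.
    Runs shorter than min_run amplicons are ignored as likely noise.
    """
    runs, start = [], None
    for i, cn in enumerate(copy_numbers):
        if cn < threshold:
            if start is None:
                start = i  # a candidate loss begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last amplicon
    if start is not None and len(copy_numbers) - start >= min_run:
        runs.append((start, len(copy_numbers) - 1))
    return runs


# Hypothetical gene with eight amplicons: a three-amplicon loss and one noisy amplicon
losses = find_loss_runs([2.0, 2.1, 1.0, 0.9, 1.1, 2.0, 1.2, 1.9])
```

Requiring consecutive low amplicons trades sensitivity for robustness: a single noisy amplicon is not called, but neither is a genuine loss covered by only one amplicon, which is one reason denser amplicon coverage improves the resolution of small-loss detection.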

Copy number variation has long been linked to changes in gene expression and the development of disease. Combining this knowledge with new array and NGS approaches that expose the mechanisms behind this influence is enabling more in-depth variant analysis and screening, and further expanding applications for detecting these causal mutations.