Switching just one base in a gene can sometimes cause big changes. The consequences of such a single nucleotide variant (SNV) depend on the gene, the location within the gene, and the base swap. One SNV can be linked to cancer, another to sickle cell anemia, and so on. Although SNVs arise frequently, important ones can be present at very low levels, sometimes at the limits of what current technology can detect. Advances in sequencing and data analysis give scientists better ways to understand the biology of SNVs and to use the results in precision medicine.

“SNVs could be either germline or somatic,” says Ramana V. Davuluri, professor of preventive medicine at Northwestern University’s Feinberg School of Medicine. “While germline SNVs are due to natural variation—person to person variation in genotype at each locus—in the populations, somatic SNVs are the result of mutations in the patient’s tumor or disease cell.”

In precision medicine, SNVs can be used in various ways. For one thing, “SNVs are useful markers for determining loss of protein function and risk of a disease,” says Davuluri. Plus, SNVs “can be used to identify patients who respond to specific or targeted therapy.”

Some applications of SNVs help clinicians predict the likely course of a person’s disease, especially in cancer. “Knowledge of somatic single nucleotide variants in tumor tissues provides prognostic information: knowledge of how rapidly disease is likely to progress,” says Jeffrey Townsend, Elihu Professor of Biostatistics and Ecology & Evolutionary Biology at the Yale School of Public Health. “Moreover, it provides the ability to target precision therapies to an individual’s tumor.” As an example, Townsend points to vemurafenib, which is prescribed to melanoma patients whose tumors carry a specific valine-to-glutamate substitution at amino acid site 600 of the protein encoded by the gene BRAF. “Such targeted therapies alone usually provide a brief respite from disease, but most do not currently cure many cancers or drive them into longstanding remission,” he says. “However, there is reason to be optimistic that building up a constellation of combination precision treatments may enable oncologists to outflank evolving tumors and cure cancer patients.”

That would be an amazing application of SNVs, and ongoing studies could bring that optimism to clinics.

Identify to apply

To make the most of SNVs in clinical uses, scientists must be able to identify these gene changes and what they do. As Stefano Lise, bioinformatics core leader at The Institute of Cancer Research, states, “Depending on the approach you are taking, you can identify SNVs that have a frequency of just 0.1%.” So, finding an SNV can be easier than assessing its function.

In terms of what SNVs do, Davuluri points out that one challenge is the “functional annotation of impact of SNVs on loss of gene/protein function and risk of disease.” His team is developing ways to identify functional SNVs and their target genes in prostate cancer. “The proposed informatics methodology integrates multi-omics data from prostate cancer tumor samples and prostate cancer cell-lines,” he explains. “We identified 38 regulatory SNVs and their targeted genes.” [1] Further research showed that some of these SNVs could enhance the molecular pathways involved in prostate cancer.

To apply SNVs to precision medicine more broadly, scientists need to reduce the odds of identifying an SNV as clinically significant when it’s not. As Townsend says, “The issue of false-positive variant calls still remains problematic.” Using very strict criteria to connect an SNV to a clinical result will reduce false positives, but it could also increase false negatives. “Now that we are moving toward clinical applications of next-generation sequencing and quantitation of the effects of variants, instead of just discovery of variants, false-negatives can be equally problematic to false-positives,” Townsend says. “So, there is no easy way out.”
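The trade-off Townsend describes can be seen in a few lines of Python. This is only a toy sketch: the confidence scores, the truth labels, and the `count_errors` helper are all invented for illustration and do not come from any real variant caller or dataset.

```python
from typing import List, Tuple

# Invented toy data: (confidence score of a variant call, is the variant truly present?)
CALLS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False), (0.10, False),
]


def count_errors(calls: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when calls scoring >= threshold are accepted."""
    false_pos = sum(1 for score, real in calls if score >= threshold and not real)
    false_neg = sum(1 for score, real in calls if score < threshold and real)
    return false_pos, false_neg


# Sweep from lenient to strict acceptance criteria.
for threshold in (0.2, 0.5, 0.85):
    fp, fn = count_errors(CALLS, threshold)
    print(f"threshold={threshold:.2f}  false positives={fp}  false negatives={fn}")
```

With these made-up numbers, raising the threshold removes false positives one by one, but real variants start to be rejected along the way, which is the balance the article describes.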

Assessing the effect size

With the right data, scientists can determine which mutated genes drive cancer and which ones don’t. Sequencing data can also reveal the prevalence of the cancer-driving mutations. What scientists haven’t been able to do is compute “the cancer effect size—how important one mutation is compared to another,” as Townsend describes it.

Townsend and his colleagues identified the cancer effect size of specific SNVs for 22 kinds of cancer. [2] That is, they determined “how much each somatic variant contributes to proliferation and survival of a cancer lineage,” Townsend explains.

So, how could this information help a clinician? Imagine that a patient has a tumor with mutations in two genes related to that cancer, and there are two drugs known to target those mutations, but no head-to-head comparison of these drugs. What should a clinician do? Give the drug that targets the mutation with the larger cancer effect size. That’s just one way that the cancer effect size can be used in the clinic.
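The decision rule in that scenario is simple enough to sketch in Python. Everything named here is hypothetical: the `TargetableMutation` class, the `pick_therapy` function, the gene and drug labels, and the effect-size values are placeholders, not figures from Townsend’s study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetableMutation:
    gene: str           # placeholder gene label
    effect_size: float  # hypothetical cancer effect size (larger = bigger contribution)
    drug: str           # placeholder name of a drug that targets this mutation


def pick_therapy(mutations: List[TargetableMutation]) -> TargetableMutation:
    """Return the targetable mutation with the largest cancer effect size."""
    if not mutations:
        raise ValueError("no targetable mutations supplied")
    return max(mutations, key=lambda m: m.effect_size)


# Illustrative tumor with two targetable mutations (values are made up).
tumor = [
    TargetableMutation(gene="GENE_A", effect_size=120.0, drug="drug_A"),
    TargetableMutation(gene="GENE_B", effect_size=45.0, drug="drug_B"),
]

choice = pick_therapy(tumor)
print(f"Prioritize {choice.drug} (targets {choice.gene}, effect size {choice.effect_size})")
```

A real clinical decision would weigh much more than one number, so this shows only the ranking step described in the text.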

Subtracting the noise

Despite great improvements in sequencing technology, the results aren’t perfect, and the biology makes sequencing challenging. At The Institute of Cancer Research in London, scientists often study SNVs that are very rare, and samples can include both tumor and healthy tissue. Given some level of sequencing error, a scientist can’t tell whether a very rare SNV is real or an artefact. Lise and his colleagues came up with a way to tell the difference.

They developed AmpliSolve, a bioinformatics tool. [3] Lise and his colleagues started with germline samples, where the variation is less complicated: the expected variant fraction is 50% for a heterozygous variant and 100% for a homozygous one. “We developed a model that established the location-specific error in sequencing,” Lise explains. The nucleotide composition of a region of a gene affects the odds of a sequencing error on some platforms. For example, some sequencing devices have trouble reading a stretch of, say, seven G nucleotides, and may miss one or get one wrong.

So, Lise and his colleagues determined the typical sequencing-error rate for locations on genes, and AmpliSolve uses that to distinguish an artefact from an SNV. Say that a location’s typical sequencing-error rate is 1%. Then any variation of 1% or less is considered an artefact, and any variation greater than 1% is considered real. As Lise explains it, “We get rid of the artefact, so that we can spot what is a potential effect.”
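The thresholding idea can be sketched in Python. This is a simplified reading of the description above, not AmpliSolve’s actual statistical model: the function names, the position labels, and the read counts are all assumptions made for illustration, and the error rate is estimated here as a plain pooled fraction of non-reference reads in control samples.

```python
from typing import Dict, List, Tuple

# Hypothetical input: for each position, (non-reference reads, total reads)
# observed in several control samples with no true variant at that position.
ControlCounts = Dict[str, List[Tuple[int, int]]]


def estimate_error_rates(control_counts: ControlCounts) -> Dict[str, float]:
    """Estimate a per-position error rate as pooled non-reference reads / pooled depth."""
    rates = {}
    for position, samples in control_counts.items():
        alt = sum(a for a, _ in samples)
        depth = sum(d for _, d in samples)
        rates[position] = alt / depth if depth else 0.0
    return rates


def classify(alt_reads: int, depth: int, error_rate: float) -> str:
    """Call a variant 'real' only if its read fraction exceeds the position's error rate."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    fraction = alt_reads / depth
    return "real" if fraction > error_rate else "artefact"


# Made-up counts giving a 1% error rate at one position and 0.1% at another.
controls: ControlCounts = {
    "pos_1": [(10, 1000), (12, 1000), (8, 1000)],
    "pos_2": [(1, 1000), (0, 1000), (2, 1000)],
}
rates = estimate_error_rates(controls)

print(classify(5, 1000, rates["pos_1"]))   # 0.5% at a 1%-error position
print(classify(30, 1000, rates["pos_1"]))  # 3% at the same position
print(classify(5, 1000, rates["pos_2"]))   # 0.5% at a 0.1%-error position
```

The same 0.5% signal is rejected at the noisy position and accepted at the quiet one, which is the point of estimating the error separately for each location.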

Ultimately, scientists want to understand the biology of SNVs, and clinicians want to use the information to treat diseases. It all goes to show what just a single change can do—bad and good.

References

1. Jin, HJ; Jung, S; DebRoy, AR; et al. Identification and validation of regulatory SNPs that modulate transcription factor chromatin binding and gene expression in prostate cancer. Oncotarget 2016. 7(34):54616-54626. [PMID: 27409348]

2. Cannataro, VL; Gaffney, SG; Townsend, JP. Effect sizes of somatic mutations in cancer. J. Natl. Cancer Inst. 2018. 110(11):1171-1177. [PMID: 30365005]

3. Jayaram, A; Sandhu, S; Wong, SQ; et al. Identification of single nucleotide variants using position-specific error estimation in deep sequencing data. bioRxiv. 2018. doi: https://doi.org/10.1101/475947