Biologists at Cold Spring Harbor Laboratory have developed a method for estimating probability distributions in computational biology that could help unravel the complexities of gene expression and the chromosomal mutations associated with cancer.

“This is one of the things that’s really fascinating about mathematical research, is sometimes you can see connections between topics, which on the surface they seem so different, but at a mathematical level, they might be using some of the same technical ideas,” explains David McCandlish, senior author of a paper published in PNAS today.

The questions the team explored involve mapping the likelihood of different variations on a biological theme: which combinations of mutations are most likely to arise in a particular protein, for example, or which chromosome mutations are most often found together in the same cancer cell. McCandlish explains that these are problems of density estimation, a statistical technique for estimating how likely each possible outcome is. Density estimation can be relatively straightforward, such as charting the distribution of heights within a group of people. But when dealing with complex biological sequences, such as the hundreds or thousands of amino acids that are strung together to build a protein, estimating the probability of each potential sequence becomes astonishingly complex.
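
To make the contrast concrete, here is a minimal Python sketch, not the team's method. It estimates a density by simple counting, which works for a handful of height values, and then shows how quickly the number of possible protein sequences grows. The height data and the function names `empirical_density` and `sequence_space_size` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def empirical_density(observations):
    """Estimate a probability for each observed value by counting (illustrative only)."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical heights in centimetres: few distinct values, so counting works well.
heights_cm = [160, 165, 165, 170, 170, 170, 175, 175, 180]
height_density = empirical_density(heights_cm)

# Proteins are built from 20 standard amino acids, so a sequence of a given
# length has 20 ** length possible variants.
AMINO_ACIDS = 20

def sequence_space_size(length):
    """Number of distinct amino acid sequences of the given length."""
    return AMINO_ACIDS ** length

print(height_density[170])
print(sequence_space_size(100))
```

Even at a length of 100 amino acids, the number of possible sequences is far larger than any experiment could sample, so most sequences would never be observed even once. Simple counting therefore cannot assign them sensible probabilities.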

McCandlish explains the fundamental problem his team is using math to address: “Sometimes if you make, say one mutation to a protein sequence, it doesn’t do anything. The protein works fine. And if you make a second mutation, it still works fine, but then if you put the two of them together, now you’ve got a broken protein. We’ve been trying to come up with methods to model not just interactions between pairs of mutations, but between three or four or any number of mutations.”
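
The two-mutation scenario in the quote can be illustrated with a small Python sketch. It uses a common textbook definition of pairwise interaction (the gap between the measured double mutant and what the two single mutants would predict if their effects simply added), which is an assumption here and not necessarily the model used in the paper. The function name `pairwise_epistasis` and all scores are hypothetical.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    f_wt: score of the unmutated protein
    f_a, f_b: scores of the two single mutants
    f_ab: score of the double mutant
    """
    # If the mutations did not interact, their individual effects would add up.
    expected_ab = f_a + f_b - f_wt
    return f_ab - expected_ab

# Hypothetical functional scores (1.0 = works fine, 0.0 = broken).
wild_type = 1.0
mutant_a = 1.0       # first mutation alone: no effect
mutant_b = 1.0       # second mutation alone: no effect
double_mutant = 0.0  # both together: broken protein

epsilon = pairwise_epistasis(wild_type, mutant_a, mutant_b, double_mutant)
print(epsilon)
```

A value of zero would mean the two mutations act independently. The negative value here reflects the case McCandlish describes, in which neither mutation matters alone but the pair breaks the protein. Interactions among three or more mutations require correspondingly higher-order terms.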

The methods they have developed can be used to interpret data from experiments that measure how hundreds of thousands of different combinations of mutations impact the function of a protein.