EMBL-EBI researchers have created the largest reference phosphoproteome to date, with almost 120,000 human phosphosites. To sort through these phosphosites, the team used a machine learning approach capable of ranking them according to functional importance. This research, published Monday in Nature Biotechnology, provides a freely accessible resource that can be used by researchers to better understand which proteins are phosphorylated and which phosphosites have functional relevance.

Proteins can be regulated by chemical modifications. Protein phosphorylation is one such modification: it can alter the structural conformation of a protein, causing it to become activated, deactivated, or shifted in function. Despite decades of work, the total number of these modifications, and which of them are truly critical for life, remain unknown.

“This new resource would not have been possible if scientists around the world didn’t share their research data and results,” says senior author Pedro Beltrao. “It would take a single machine over 500 consecutive days to run all the mass spectrometry experiments used to create this database. By applying machine learning to this huge dataset, we created a scoring system that will hopefully help researchers to determine which lesser-known phosphosites to explore next.”


The researchers at EMBL-EBI curated more than 100 publicly available phospho-enriched human datasets, comprising more than 6,000 mass spectrometry experiments, from EMBL-EBI’s PRoteomics IDEntifications (PRIDE) database. This large-scale project has generated the largest open-access reference phosphoproteome to date.

To identify the phosphosites most critical to human cells, the team used machine learning to integrate diverse annotations for each site, such as its degree of conservation, into a single phosphosite functional score. The score can help other scientists learn more about their proteins of interest: it ranks known phosphosites so that those functionally relevant for molecular processes and disease can be distinguished from the rest. For example, the researchers demonstrated the practicality of their functional score model by identifying two high-scoring phosphosites that play a role in regulating neuronal differentiation.
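As a rough illustration of the idea of combining per-site annotations into one score and ranking by it, here is a minimal Python sketch. It is not the model from the study: the site names, the `n_experiments` annotation, the weights, and the logistic form are all hypothetical, chosen only to show the shape of the approach. A real model would learn its weights from phosphosites with known function.

```python
import math

# Hypothetical annotation values per phosphosite (illustrative, not from the study).
SITES = {
    "site_A": {"conservation": 0.9, "n_experiments": 40},
    "site_B": {"conservation": 0.2, "n_experiments": 3},
    "site_C": {"conservation": 0.6, "n_experiments": 15},
}

# Hypothetical weights; a trained model would estimate these from labeled sites.
WEIGHTS = {"conservation": 3.0, "n_experiments": 0.05}
BIAS = -2.0


def functional_score(features):
    """Combine annotations into a score between 0 and 1 (logistic function)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_sites(sites):
    """Return site names ordered from highest to lowest score."""
    return sorted(sites, key=lambda name: functional_score(sites[name]), reverse=True)


if __name__ == "__main__":
    for name in rank_sites(SITES):
        print(f"{name}\t{functional_score(SITES[name]):.3f}")
```

In this toy setup, a site that is both highly conserved and observed in many experiments rises to the top of the list, which mirrors how a ranked score can point researchers toward lesser-known sites worth exploring next.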


“The functional score model created from this study can be used to uncover an abundance of new, functional phosphosites that may play crucial roles in disease,” says first author David Ochoa. “We already know of several groups who are using the scoring model, so we would like to encourage researchers everywhere to explore the resource and make use of it.”

Image: An artist's impression of phosphosites and machine learning. Image courtesy of Spencer Phillips.