The protein therapeutic market already includes more than 70 medicines, and global sales are expected to reach $125 billion by 2020.1 However, making a therapeutic protein is challenging. Producing a biologic, whether based on a monoclonal antibody, an enzyme, or another protein product, can take thousands of steps. One of the most complicated steps in developing biologics is interpreting data from mass spectrometry (MS).

A state-of-the-art technology often used to characterize a biologic is liquid chromatography (LC) coupled with tandem MS, or LC-MS/MS. This technology, however, creates a variety of obstacles. “For therapeutic drugs, we want to address individual sites of a protein, but LC-MS/MS works with peptides,” says Olga Vitek, the Sy and Laurie Sternberg Interdisciplinary Associate Professor at Northeastern University. “So, you have to design experiments carefully to make sure that you have the full extent of the evidence.”

The molecules themselves also impact the process. “Individual peptides have different technological properties,” Vitek explains. “Some are easier to obtain by digestion than others, and some ionize better than others.” It is important to distinguish changes in MS signals due to technological limitations from true changes in the protein structure.

Many of these limitations can be addressed by applying a statistical and computational mindset. In fact, devising computational and statistical methods is central to evaluating MS data on biologics.

Indirect indications

To analyze a protein for potential as a medicine, scientists must characterize it as completely as possible. With MS, the information is indirect: scientists seek knowledge about a protein by probing its peptides. In addition to dealing with indirect evidence, scientists now gather ever-larger datasets, which makes the data harder to interpret. “You need statistical models to get insight into what you want to see through this indirect evidence,” Vitek says. “Proteomics has achieved a lot here, but therapeutic-drug characterization requires a different level of detail and precision, and has a way to go.”
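One simple way to picture this peptide-to-protein inference: summarize a protein's abundance in each LC-MS/MS run from the intensities of its peptides, using a robust statistic so that peptide-specific effects (digestion efficiency, ionization) do not dominate. The sketch below is purely illustrative, with hypothetical peptide names and intensities; production tools use more sophisticated models.

```python
from statistics import median

# Hypothetical log2 peptide intensities: one list of values per peptide,
# one value per LC-MS/MS run (three runs here)
peptide_intensities = {
    "PEPTIDEA": [20.1, 20.4, 19.8],
    "PEPTIDEB": [18.5, 18.9, 18.2],  # consistently lower, e.g., poor ionization
    "PEPTIDEC": [21.0, 21.3, 20.7],
}

# Summarize protein-level abundance per run as the median over its peptides,
# which damps peptide-specific technological effects
runs = list(zip(*peptide_intensities.values()))
protein_profile = [median(run) for run in runs]
print(protein_profile)  # one summary value per run
```

The median here stands in for the more elaborate summarization used in practice, but it shows the core idea: the protein-level quantity is never observed directly and must be modeled from peptide evidence.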


The characterization can indicate various features, such as oxidation or glycosylation of an amino acid in a peptide. Beyond the presence or absence of a feature, the field increasingly demands quantification. “It’s not enough to see if the modification is there or not,” Vitek notes, “but you need to determine if the abundance changes, say with or without a small molecule conjugated to the protein.” So, scientists should design experiments to provide quantitative evidence.
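To make the quantification point concrete, consider a minimal sketch of the comparison Vitek describes: testing whether a modification's occupancy differs with versus without a conjugated small molecule. The replicate values below are hypothetical, and a plain Welch t statistic is a deliberate simplification of the dedicated statistical models used in real characterization workflows.

```python
import math
from statistics import mean, variance

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical site-occupancy fractions, three replicates per condition
without_conjugate = [0.21, 0.18, 0.24]
with_conjugate = [0.35, 0.31, 0.38]

t = welch_t(with_conjugate, without_conjugate)
print(f"Welch t = {t:.2f}")
```

A large t statistic (here roughly 5) suggests the abundance change is unlikely to be replicate-to-replicate noise, which is exactly the kind of quantitative evidence the experiment must be designed to supply.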

Regulatory requirements also come into play. A therapeutic protein is expected to have specific features, and the analysis must be designed to reveal those properties. That requires careful experimental design and statistical analysis.

Visualizing the variety

When asked about the most common bottlenecks in using MS in the development of a therapeutic protein, Steve Madden, software product manager at Agilent Technologies, says that one challenge is “quickly processing the large amount of data that is created in diverse workflows, such as intact protein analysis, peptide mapping, and released glycans analysis.”

Once scientists collect all that data, Madden points out that it can be difficult to “visualize the results of the data analysis to make sure that it is accurate and answers the questions at hand—for example, sequence coverage or the location of post-translational modifications.”

In three data files, the Sequence Coverage Map indicates the currently selected peptide (dark green), the covered sequence (light green), and the uncovered sequence (gray). Image courtesy of Agilent Technologies.

Some existing options help with these challenges. As Madden notes, “Our new Sequence Coverage Map in MassHunter BioConfirm allows the user to visualize up to 10 LC/Q-TOF protein digest data files at the same time.” This helps an investigator examine the results of using different enzyme digests, such as trypsin. “It also is useful when using Iterative MS/MS, a technique where the Agilent 6545XT AdvanceBio LC/Q-TOF will perform MS/MS on lower abundance precursor ions with each injection,” adds Madden. “This allows the user to drill further down to see lower levels of post-translational modifications or protein contaminants in a simple approach minimizing sample preparation.”
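The underlying computation in a coverage map is straightforward: mark each residue of the protein that falls inside any identified peptide, then report the fraction covered. The sketch below uses a made-up sequence and peptide list; it is not Agilent's implementation, just an illustration of the idea.

```python
def coverage_map(protein: str, peptides: list[str]) -> tuple[list[bool], float]:
    """Mark which residues of `protein` are covered by any identified peptide
    and return the fraction of the sequence covered."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # a peptide may match at more than one position
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    fraction = sum(covered) / len(protein) if protein else 0.0
    return covered, fraction

# Hypothetical protein sequence and identified tryptic peptides
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
peps = ["MKTAYIAK", "QISFVK", "LGLIEVQ"]

mask, frac = coverage_map(seq, peps)
print(f"coverage: {frac:.0%}")
```

Overlaying such masks from several digests (or from iterative MS/MS runs) shows at a glance which regions of the protein still lack evidence.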


When a scientist clicks a peptide in the table of results, the MS and MS/MS spectra appear. The software also indicates the peptide’s location in the protein sequence.

For any MS analysis of a biologic, adding a statistician to the team—as a working member, rather than an aside—makes a big difference. Plus, this should be done at the start. “Some scientists think that if they have a computer, they don’t need statistical expertise on a team,” Vitek states, “but that is wrong.”

Working with her colleagues, Vitek examined post-translational modifications of amino acids at specific locations. In particular, the team analyzed site occupancy, the proportion of a protein that carries a particular type of modification at a given site. This measure can be used to quantify the results of a specific process for making a biologic. With this approach, Vitek explains, scientists can figure out “how to make something reproducible across replicates.”
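In its simplest form, site occupancy can be estimated from the intensities of the modified and unmodified versions of a peptide, and its spread across replicates then speaks directly to reproducibility. The numbers below are hypothetical, and real analyses account for many technological effects this sketch ignores.

```python
def site_occupancy(modified: float, unmodified: float) -> float:
    """Fraction of the protein carrying the modification at this site,
    estimated from summed peptide intensities."""
    total = modified + unmodified
    return modified / total if total else 0.0

# Hypothetical intensities (arbitrary units) for one modification site,
# measured in three process replicates
replicates = [
    {"mod": 2.1e6, "unmod": 7.9e6},
    {"mod": 1.8e6, "unmod": 8.3e6},
    {"mod": 2.4e6, "unmod": 7.4e6},
]

occ = [site_occupancy(r["mod"], r["unmod"]) for r in replicates]
mean_occ = sum(occ) / len(occ)
spread = max(occ) - min(occ)  # replicate-to-replicate variation
print(f"mean occupancy {mean_occ:.1%}, spread {spread:.1%}")
```

A small spread across replicates is evidence that the manufacturing process yields the modification reproducibly; a large one flags exactly the kind of problem Vitek warns about.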

Without this kind of analysis of a biologic, “people cannot reproduce their results and they waste their money at best, or at worst they make misleading claims, and it takes a long time to realize it,” Vitek says.

Uncertain aspects

With advanced statistical analysis, MS can tell scientists much more about biologics. But much work lies ahead. “The problem now is that there are limited standards established on the regulatory side for statistical analysis of mass spectrometry characterization of biologics,” Vitek explains.

To establish those standards, the regulatory agencies need to collaborate with experts. There will be no single standard, though, that works for all biologics in all cases. “The guidelines could vary from one experiment to another,” Vitek states. “So, it’s important to involve statisticians, and statistical arguments, in this type of MS-based research.”

Even statistical experts might not always know the best way to test every biologic. Crafting the best approach takes experts on the technology side of MS working with a statistician. Then, “they need to work very closely—communicating, understanding the technology and data, and how it can be modeled,” Vitek notes.

The complexity of building and analyzing biologics requires more than an advanced MS platform. The scientists must also understand how to design an experiment to address the crucial questions and then analyze the data in a way that reveals the actual outcome.

References

1. Tsai, T.H., et al. “Statistical characterization of therapeutic protein modifications.” Sci. Rep. 7:7896 [PMID: 28801661].
