Get the Big Picture with Top-Down Proteomics

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

In the world of mass spectrometry-based proteomics, researchers have two options. They can digest protein mixtures into peptides and catalog what they see. Or they can start with intact proteins, isolate individual forms and characterize them one at a time.

The former strategy is called “bottom-up proteomics,” and the latter, “top-down.” Though bottom-up proteomics is by far the more popular approach, in part because it is technically easier, top-down offers a number of advantages, says Neil Kelleher, a top-down proponent at Northwestern University and member of the organizing committee of the Consortium for Top Down Proteomics.

In particular, top-down proteomics neatly sidesteps the so-called “inference problem” that is inherent in the bottom-up workflow. The inference problem, Kelleher explains, stems from trying to identify and characterize proteins precisely based only on a few diagnostic peptides.

That’s because the concept of one gene, one protein no longer works in most cases. Proteins can be post-translationally modified. They can represent different splice variants and arise from very similar genes in gene families. Given those variables, it can be nearly impossible to unambiguously determine precisely which protein isoform—the top-down community now advocates the term “proteoform”—a given peptide represents [1].

Where bottom-up smashes everything in a protein mix and sifts through the rubble, top-down starts with an intact proteoform, isolates it in the mass spectrometer and hammers at it until it gives up its secrets—polymorphisms and modifications alike. “It’s one reason why top-down offers real clarity about protein variability,” Kelleher says.

Mass spectrometers

If you’re interested in going with the top-down (so to speak), you’ll need more than your run-of-the-mill mass spec. Most top-down labs use high-end Orbitrap (Thermo Fisher Scientific) or Fourier transform ion-cyclotron resonance (FT-ICR) mass spectrometers, though some labs choose quadrupole-time-of-flight (qTOF) hybrid mass specs, which have lower mass resolution but excellent ion optics.

What all these instruments have in common is high mass resolution and mass accuracy. Proteomics relies on the ability to distinguish one protein from another based on the proteins’ mass-to-charge ratios (m/z). Intact proteins carry many charges and appear in multiple charge states, and each of those peaks competes for spectral space with the peaks of every other protein in the mix, each with its own charge states and proteoforms. Needless to say, the spectra can get crowded.
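To see how quickly intact-protein spectra crowd, here is a minimal Python sketch (not from the article) that computes the m/z of protonated [M + zH]z+ ions across a charge-state envelope. The two protein masses are hypothetical round numbers chosen for illustration.

```python
PROTON_MASS = 1.007276  # mass of a proton, in Da


def mz(neutral_mass: float, z: int) -> float:
    """m/z of a protein carrying z extra protons, i.e. an [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON_MASS) / z


def charge_envelope(neutral_mass: float, z_min: int, z_max: int) -> dict[int, float]:
    """Map each charge state from z_min to z_max (inclusive) to its m/z value."""
    return {z: mz(neutral_mass, z) for z in range(z_min, z_max + 1)}


if __name__ == "__main__":
    # Two hypothetical proteins of 10,000 Da and 12,000 Da
    for mass in (10_000.0, 12_000.0):
        envelope = charge_envelope(mass, 8, 14)
        print(mass, {z: round(v, 3) for z, v in envelope.items()})
```

Because m/z = M/z + 1.007276, the 10+ ion of the 10,000-Da protein and the 12+ ion of the 12,000-Da protein land at the same m/z (about 1001.007), so the position of that one peak cannot tell the two species apart.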

High mass resolution enables researchers to differentiate otherwise indistinguishable proteoforms and to select them for deeper analysis. “High resolution buys you a substantial amount of peak capacity and also buys you confidence in your peak assignments, because you have higher mass accuracy,” says Nick Young, director of biological applications for the Ion Cyclotron Resonance Program at the National High Magnetic Field Laboratory.
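Young’s point about confident peak assignments can be made concrete with isotope peaks. For an ion of charge z, adjacent isotope peaks (mostly from 13C) sit about 1.00336/z apart on the m/z axis, so resolving them reveals the charge and therefore the mass. The sketch below estimates the resolving power (m/Δm) at which peak width equals that spacing; using that as the threshold is a rough assumption, since clean separation needs somewhat more.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C, in Da


def isotope_spacing(z: int) -> float:
    """m/z spacing between adjacent isotope peaks of an ion with charge z."""
    return C13_C12_DELTA / z


def resolving_power_needed(mz_value: float, z: int) -> float:
    """Rough resolving power (m / delta-m) at which peak width equals the isotope spacing."""
    return mz_value / isotope_spacing(z)


if __name__ == "__main__":
    # Higher charge states pack isotope peaks closer together
    for z in (10, 12, 20, 40):
        print(z, round(isotope_spacing(z), 4), round(resolving_power_needed(1001.0, z)))
```

The required resolving power grows in proportion to the charge, which is why heavily charged intact proteins push instruments toward the high end of their specifications.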

The “magnet lab’s” ICR program features four FT-ICR systems, including one of two 21-Tesla FT-ICRs currently under construction. (The other will be at the Pacific Northwest National Laboratory.) When complete, these will be the most powerful FT-ICR mass spectrometers in the world.

Ljiljana Paša-Tolić, lead scientist for mass spectrometry at the Environmental Molecular Sciences Laboratory at the PNNL, which already has 12-T and 15-T instruments, explains that higher magnetic field strength in FT-ICR mass spectrometers improves just about every facet of their operation. “Everything scales linearly or quadratically with field strength,” she says, including mass resolving power, accuracy, sensitivity and speed.
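The textbook relation for the unperturbed ion cyclotron frequency, f = zeB/(2πm), is linear in the field B. The sketch below computes that frequency for an ion at m/z 1,000 and the generic gain factors for a linear or quadratic dependence on B. The article does not say which instrument properties follow which law, and real ICR cells shift the frequency slightly through their trapping fields, so treat these numbers as idealized.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per Da


def cyclotron_frequency_hz(mz_value: float, b_tesla: float) -> float:
    """Unperturbed cyclotron frequency f = zeB / (2*pi*m), expressed per unit m/z."""
    return ELEMENTARY_CHARGE * b_tesla / (2 * math.pi * mz_value * DALTON)


def field_gain(b_new: float, b_ref: float) -> tuple[float, float]:
    """Improvement factors for a property scaling linearly and quadratically with B."""
    ratio = b_new / b_ref
    return ratio, ratio ** 2


if __name__ == "__main__":
    for b in (12.0, 15.0, 21.0):
        print(f"{b:>4} T: f(m/z 1000) = {cyclotron_frequency_hz(1000.0, b) / 1e3:.1f} kHz")
    linear, quadratic = field_gain(21.0, 15.0)
    print(f"15 T -> 21 T: linear x{linear:.2f}, quadratic x{quadratic:.2f}")
```

Going from 15 T to 21 T buys a factor of 1.4 for linearly scaling properties and nearly 2 for quadratically scaling ones.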

That latter benefit is important. Features that take long scans to resolve at 15 T will separate more easily and quickly at 21 T. “All the experiments that are heroic today become routine when you do them at high field,” Paša-Tolić says, reiterating a comment she attributes to Alan Marshall at Florida State University.
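One way to see the speed gain: for a given detection scheme, FT-ICR resolving power is roughly proportional to cyclotron frequency times transient (acquisition) time, and frequency is proportional to B. The sketch below assumes exactly that proportionality. The 1.5-s transient and the 100,000 resolving power are hypothetical numbers, not values from the article.

```python
def resolving_power_at_field(r_ref: float, b_ref: float, b_new: float) -> float:
    """Resolving power for the same transient length at a new field, assuming R is proportional to B."""
    return r_ref * b_new / b_ref


def transient_for_same_resolution(t_ref_s: float, b_ref: float, b_new: float) -> float:
    """Transient length giving the same resolving power at a new field, assuming R ~ f*T and f ~ B."""
    return t_ref_s * b_ref / b_new


if __name__ == "__main__":
    t_15 = 1.5  # hypothetical transient length (s) at 15 T
    t_21 = transient_for_same_resolution(t_15, 15.0, 21.0)
    print(f"A {t_15} s scan at 15 T needs about {t_21:.2f} s at 21 T")
    print(f"Same scan length: {resolving_power_at_field(100_000, 15.0, 21.0):,.0f} vs 100,000")
```

Shorter transients mean more scans per chromatographic peak, which is the practical sense in which “heroic” experiments become routine.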

Paša-Tolić also has Orbitraps in her lab; she used one (an LTQ Orbitrap Velos) in a recent study of Salmonella typhimurium that identified “563 unique proteins including 1,665 proteoforms generated by posttranslational modifications (PTMs), representing the largest microbial top-down dataset reported to date” [2]. Among the study’s findings was the observation that certain Salmonella proteins alternate between glutathionylated and cysteinylated states in response to environmental conditions.
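Because the two S-thiolations add different, fixed masses to a cysteine, an intact-mass measurement can indicate which state a protein is in. The sketch below assumes the monoisotopic mass shifts listed in the Unimod database (about +305.068 Da for glutathione and +119.004 Da for cysteine). The 0.02-Da tolerance is an arbitrary choice, and the article does not describe how the study itself assigned these states.

```python
from typing import Optional

# Monoisotopic mass shifts for cysteine S-thiolation (Unimod values)
GLUTATHIONYLATION = 305.068156  # glutathione, disulfide-linked
CYSTEINYLATION = 119.004099     # free cysteine, disulfide-linked

SHIFTS = {
    "glutathionylation": GLUTATHIONYLATION,
    "cysteinylation": CYSTEINYLATION,
}


def classify_shift(observed_shift_da: float, tol_da: float = 0.02) -> Optional[str]:
    """Return the S-thiolation whose mass shift matches the observed delta, or None."""
    for name, delta in SHIFTS.items():
        if abs(observed_shift_da - delta) <= tol_da:
            return name
    return None


if __name__ == "__main__":
    print(f"Switching between the two states: {GLUTATHIONYLATION - CYSTEINYLATION:.4f} Da")
    print(classify_shift(305.07), classify_shift(119.01), classify_shift(42.01))
```

A switch between the two states therefore shows up as a mass difference of about 186.064 Da in the intact proteoform.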

The newest incarnation of Orbitrap is the Orbitrap Fusion™. David Horn, biosoftware product manager at Thermo Fisher Scientific, says the Fusion instrument offers the highest field strength yet in the Orbitrap line, as well as the highest resolution—450,000 at m/z 200 vs. 330,000 for the Orbitrap Elite.
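Orbitrap resolution is quoted at a reference m/z because, for a fixed transient length, resolving power falls with the square root of m/z: the ions’ axial oscillation frequency is proportional to (m/z)^-1/2. Intact proteins often appear at m/z 1,000 or higher, so it helps to rescale the quoted figures. The sketch below assumes the same transient length at every m/z.

```python
import math


def orbitrap_resolution(r_ref: float, mz_ref: float, mz_value: float) -> float:
    """Scale Orbitrap resolving power to another m/z, assuming R is proportional to (m/z)**-0.5."""
    return r_ref * math.sqrt(mz_ref / mz_value)


if __name__ == "__main__":
    specs = {"Orbitrap Fusion": 450_000, "Orbitrap Elite": 330_000}  # quoted at m/z 200
    for name, r200 in specs.items():
        for target in (400.0, 1000.0, 2000.0):
            print(f"{name}: ~{orbitrap_resolution(r200, 200.0, target):,.0f} at m/z {target:.0f}")
```

At m/z 1,000 the quoted 450,000 drops to roughly 200,000, still well above what isotope resolution of typical intact-protein ions requires.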

Another option for top-down studies is the qTOF, such as Waters’ Synapt G2-Si. Synapt mass spectrometers, with resolution up to 50,000, combine front-end ion mobility separation, which provides another level of fractionation by molecular charge and shape, with a time-of-flight analyzer on the back end.

According to Jim Langridge, director of discovery and life sciences at Waters, time-of-flight analyzers provide excellent transmission of high-m/z ions, a key requirement for top-down proteomics, and thus offer a “good compromise” between resolution and technical accessibility. “You can have the best resolution in the world, but if you can’t see an ion, you can’t resolve it.”

Fragmentation techniques

According to Langridge, the primary challenge in top-down proteomics lies on the front end, with sample preparation and separation. “Dealing with intact proteins isn’t trivial,” he says—keeping them soluble, making sure they don’t denature and resolving them chromatographically, for instance. Proteoforms, especially, can be difficult to separate, as they often are so similar in size and charge.

Top-down researchers frequently go to extremes to resolve their protein mixtures. In one 2011 study, Kelleher and his team used a four-dimensional separation strategy to dive into the human proteome and still identified “only” 3,000 proteoforms mapping to 1,043 genes, most of which were 50,000 Da or less [3]. (According to Paša-Tolić, most top-down studies concentrate on relatively small proteins up to about 40,000 Da; the new 21-T FT-ICR could push that past 100,000 Da, thereby providing access “to the majority of the proteins for many or most organisms.”)

Another issue is fragmentation. Although it’s relatively easy to fragment a peptide into subfragments in a mass spectrometer, intact proteins are tougher nuts to crack. According to Langridge, one common approach for peptides, called CID (collision-induced dissociation), is relatively ineffective on proteins. “The protein is just so big that you can’t get enough energy in, and you don’t see any fragmentation.”

One popular alternative for top-down work is ETD (electron-transfer dissociation), in which reagent anions introduced into the mass spec transfer an electron to the multiply charged protein, breaking its backbone. Another option, available on the Orbitrap, is HCD (higher-energy collisional dissociation).
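The two approaches break the backbone at different bonds and so produce different ion series: collisional methods (CID, HCD) mainly yield b and y ions, while ETD mainly cleaves N-Cα bonds to yield c and z• ions. The sketch below computes singly charged ladders for a toy peptide from standard monoisotopic masses. It is an illustration only; real intact-protein fragments carry many charges, and a practical tool would also handle modifications and isotopes.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276    # proton mass
WATER = 18.010565    # H2O
AMMONIA = 17.026549  # NH3
HYDROGEN = 1.007825  # H atom


def fragment_ladders(seq: str) -> dict[str, list[float]]:
    """Singly charged fragment m/z values for every backbone cleavage of seq.

    b/y: typical of collisional methods (CID, HCD); c/z-dot: typical of ETD.
    Index i of each list is the fragment containing i + 1 residues.
    """
    masses = [RESIDUE[aa] for aa in seq]
    prefix = list(accumulate(masses))[:-1]            # N-terminal pieces, 1..n-1 residues
    suffix = list(accumulate(reversed(masses)))[:-1]  # C-terminal pieces, 1..n-1 residues
    b = [m + PROTON for m in prefix]
    y = [m + WATER + PROTON for m in suffix]
    c = [m + AMMONIA for m in b]                      # c = b + NH3
    z_dot = [m - AMMONIA + HYDROGEN for m in y]       # z-dot = y - NH3 + H
    return {"b": b, "y": y, "c": c, "z_dot": z_dot}


if __name__ == "__main__":
    for series, values in fragment_ladders("PEPTIDE").items():
        print(series, [round(v, 3) for v in values])
```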

According to Horn, the Orbitrap Fusion offers users considerable flexibility in mixing and matching fragmentation techniques at will, which is required for complete protein characterization. For instance, the instrument enables a hybrid method called EThCD—ETD followed by HCD analysis of the resulting fragments.

“The more fragmentation techniques you have available and the more combinations you have, the more likely you are to get complete sequence coverage to confidently distinguish one proteoform from another,” Horn says.
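Horn’s point about combining methods can be illustrated with a simple coverage metric: the fraction of inter-residue bonds for which at least one method produced a fragment. Methods that cleave at complementary positions add up. The sketch below uses that bond-level definition (one common way to express sequence coverage, not necessarily what Thermo Fisher’s software reports), and the cleavage sites are invented.

```python
def bond_coverage(n_residues: int, *cleavage_sets: set[int]) -> float:
    """Fraction of the n-1 backbone bonds explained by at least one fragmentation method.

    Bond i is the bond between residue i and residue i + 1 (1-based).
    """
    total_bonds = n_residues - 1
    observed = set().union(*cleavage_sets)
    return len({i for i in observed if 1 <= i <= total_bonds}) / total_bonds


if __name__ == "__main__":
    # Hypothetical cleavage sites on a 20-residue stretch, for illustration only
    etd_sites = {1, 2, 3, 4, 5, 6, 9, 10, 11, 12}
    hcd_sites = {5, 6, 7, 8, 13, 14, 15}
    print(f"ETD alone: {bond_coverage(20, etd_sites):.0%}")
    print(f"HCD alone: {bond_coverage(20, hcd_sites):.0%}")
    print(f"ETD + HCD: {bond_coverage(20, etd_sites, hcd_sites):.0%}")
```

In this made-up case neither method alone explains more than about half the bonds, but together they reach roughly 79%.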

New fragmentation techniques also are in development. One, a UV-laser-induced photodissociation strategy being developed by Jennifer Brodbelt at the University of Texas at Austin, “looks very promising,” says Paša-Tolić. Another promising approach, says Langridge, is surface-induced dissociation (SID), from Vicki Wysocki at Ohio State University. SID accelerates an ion toward a surface, causing a “glancing blow” that fragments the protein.

Bioinformatics issues

The top-down challenges don’t stop when data collection ends. Data analysis, too, is difficult. In particular, says Paša-Tolić, computational tools exist to identify proteins, but proteoforms are another matter. “The tools for identifying the proteoform and confidence metrics for that are pretty much nonexistent at this point,” she says.

Currently there are two popular options for analysis of top-down data. ProSightPC (developed in Kelleher’s lab and commercialized by Thermo Fisher Scientific) matches spectra in an “error-tolerant fashion” against a database of candidate proteoforms created by combining sets of known and possible modifications, splice events and polymorphisms, Kelleher says. MS-Align+, developed by Pavel Pevzner of the University of California, San Diego, and his then-postdoc, Xiaowen Liu, uses a spectral alignment algorithm to match spectra against proteoforms that may contain unexpected post-translational modifications, an approach that is more flexible but also more computationally demanding.
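The candidate-database idea can be sketched in a few lines: enumerate combinations of optional modifications on a known sequence, then keep the candidates whose mass falls within a tolerance of the observed intact mass. This is a hypothetical toy, not ProSightPC’s actual algorithm or scoring, which also uses fragment ions and covers splice events and polymorphisms. The base mass, modification list and 10-ppm tolerance are assumptions for illustration.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da) for a few common modifications
MODS = {"acetyl": 42.010565, "phospho": 79.966331, "oxidation": 15.994915}


def candidate_proteoforms(base_mass: float, optional_mods: list[str]) -> list[tuple[tuple[str, ...], float]]:
    """Enumerate every subset of optional modifications and its resulting neutral mass."""
    candidates = []
    for k in range(len(optional_mods) + 1):
        for combo in combinations(optional_mods, k):
            candidates.append((combo, base_mass + sum(MODS[m] for m in combo)))
    return candidates


def match(observed_mass: float, candidates, tol_ppm: float = 10.0):
    """Return the candidates whose mass lies within tol_ppm of the observed mass."""
    return [
        (mods, mass) for mods, mass in candidates
        if abs(observed_mass - mass) / mass * 1e6 <= tol_ppm
    ]


if __name__ == "__main__":
    # Hypothetical 11,000-Da protein with three optional modifications
    cands = candidate_proteoforms(11_000.0, ["acetyl", "phospho", "oxidation"])
    observed = 11_121.977  # hypothetical measured neutral mass
    print(f"{len(cands)} candidates; matches: {match(observed, cands)}")
```

The candidate list doubles with every optional modification added, which is why the size of the search space quickly becomes the limiting factor.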

Recently, Pevzner and Liu (now at Indiana University-Purdue University Indianapolis) described a new version of their software [4]. MS-Align-E allows researchers to tailor their search by indicating, for instance, that they know or suspect a given protein is phosphorylated.

“Because we know the type of post-translational modifications, it reduces the search space of the algorithm, so we can identify proteoforms with multiple modifications,” Liu says.
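Liu’s point about search space is combinatorial. As a rough illustration (not how MS-Align-E actually counts or searches), the sketch counts the ways to place up to k modifications on n candidate residues when each modification may be any of T types, versus a single known type. The 30 sites, the limit of three modifications and the 20 types are hypothetical, and the model ignores residue specificity.

```python
from math import comb


def search_space(n_sites: int, max_mods: int, n_mod_types: int) -> int:
    """Count ways to place up to max_mods modifications on n_sites residues,
    at most one per site, each drawn from n_mod_types possible types."""
    return sum(comb(n_sites, k) * n_mod_types ** k for k in range(max_mods + 1))


if __name__ == "__main__":
    # Hypothetical protein with 30 modifiable residues and up to 3 modifications
    print("any of 20 types:", search_space(30, 3, 20))
    print("phosphorylation only:", search_space(30, 3, 1))
```

Fixing the modification type cuts the count in this toy from about 33 million arrangements to 4,526, which is the kind of reduction that makes multiply modified proteoforms tractable.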

Ultimately, says Kelleher, it makes little sense to expect top-down strategies to replace bottom-up; they serve different functions. In cell biology or biomarker discovery, for instance, bottom-up helps researchers home in on proteins of interest, while top-down excels at doing a deep dive into the biology of a specific protein or class of proteins. It isn’t easy, he concedes. But if you have the means, says Kelleher, and especially if you’re interested in biomarker discovery and validation below 40 kDa or so, it “becomes more of a serious question every year: Why would you not employ top-down proteomics?”

References

[1] Smith, LM, Kelleher, NL, Consortium for Top Down Proteomics, “Proteoform: A single term describing protein complexity,” Nat Methods, 10:186-7, 2013. [PubMed ID: 23443629]

[2] Ansong, C, et al., “Top-down proteomics reveals a unique protein S-thiolation switch in Salmonella typhimurium in response to infection-like conditions,” PNAS, 110:10153-8, 2013. [PubMed ID: 23720318]

[3] Tran, JC, et al., “Mapping intact protein isoforms in discovery mode using top-down proteomics,” Nature, 480:254-8, 2011. [PubMed ID: 22037311]

[4] Liu, X, et al., “Identification of ultramodified proteins using top-down tandem mass spectra,” J Proteome Res, 12:5830-8, 2013. [PubMed ID: 24188097]
