Break the Metabolomics Bottleneck with These Data Analysis Tools

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the University of Pennsylvania and at Harvard Medical School.

Researchers increasingly recognize that to really understand cell behavior, you need to look at metabolites. Genes encode proteins, and proteins operate on small molecules. The presence and abundance of these molecules, collectively called the metabolome, reflect and influence health, nutrition, the immune system and more.

As with other ‘omics disciplines, data collection is only one of the problems metabolomics practitioners face, and certainly not the greatest. The bigger problem, most agree, is figuring out what metabolomics datasets really mean.

“Data analysis is still a huge bottleneck,” says Yingying Huang, metabolomics marketing manager at Thermo Fisher Scientific.

Fortunately, an ever-growing set of analytical tools is helping to crack that bottleneck open.

Spectral libraries

Fundamentally, metabolomics data analysis has two parts: peak picking and peak identification. Peak picking is the process of sifting through multiple datasets representing different conditions—healthy and diseased patients, say—and identifying the spectral features that differ statistically between them. After those peaks are found, the compounds those features represent must be identified.
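To make the statistical step concrete, here is a minimal Python sketch of the comparison at the heart of peak picking. It does not reflect how any of the packages named in this article work internally. The feature labels, intensities and cutoff are invented for illustration, and Welch's t statistic is just one reasonable choice of test.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def differing_features(healthy, diseased, t_cutoff=4.0):
    """Return feature labels whose |t| meets a hypothetical cutoff, largest first."""
    scores = {f: welch_t(diseased[f], healthy[f]) for f in healthy}
    hits = [f for f, t in scores.items() if abs(t) >= t_cutoff]
    return sorted(hits, key=lambda f: -abs(scores[f]))

# Invented intensities; each label encodes an m/z and a retention time.
healthy = {
    "mz181.07_rt2.1": [1000, 1100, 950, 1050],
    "mz261.04_rt3.4": [200, 220, 210, 190],
}
diseased = {
    "mz181.07_rt2.1": [1020, 980, 1080, 1010],
    "mz261.04_rt3.4": [420, 450, 400, 430],
}

print(differing_features(healthy, diseased))
```

A real analysis would also align peaks across runs, convert the statistic to a p-value and correct for testing thousands of features at once.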

Multiple software packages can handle the first half of that problem, including both commercial tools (e.g., Agilent Technologies’ MassHunter Profinder, Bruker’s ProfileAnalysis, Thermo Scientific’s SIEVE™ and Waters’ Progenesis QI) and free options (e.g., MZmine and XCMS Online). A handful of libraries are available or in development to deal with the second.

Gary Siuzdak, director of the Center for Metabolomics and Mass Spectrometry at the Scripps Research Institute in La Jolla, Calif., says his popular METLIN database (available free online or from Agilent Technologies) currently lists more than 240,000 compounds, including 11,600 with MS/MS spectral data. The literature-curated Human Metabolome Database (HMDB) has nearly 42,000 compounds, including 1,164 with MS and MS/MS data. And there are other options, including ChemSpider and MassBank.

Thermo Scientific has launched a library of its own, called mzCloud, which should contain the curated mass spectra from some 2,000 compounds by next month’s American Society for Mass Spectrometry meeting, says Huang. A key distinguishing feature, she adds, is the kind of spectral data mzCloud will include: Unlike METLIN and HMDB, which have MS/MS data, mzCloud will include MSn datasets, “which [are] crucial for de novo structure identification and required for unknown structure determination.”

For those using nuclear magnetic resonance to drive metabolomics studies, Chenomx’s NMR Suite features “a very detailed library” of more than 300 spectral signatures, according to president Neil Taylor.

Still, it likely will never be possible to collect experimental data on every metabolite, says Oliver Fiehn, director of the West Coast Metabolomics Center at the University of California, Davis—there are simply too many of them, and not all are available in purified form for use as a standard. “At some point, you have to predict how MS/MS spectra will look,” he says.

Fiehn’s approach to that problem is LipidBlast, a library of 200,000 predicted lipid spectra modeled on the virtual peptide libraries proteomics practitioners have used for years. “That’s harder [to do],” Fiehn concedes, because unlike peptides, “metabolites come in all shapes and forms and sizes.” With LipidBlast, users can compare their unknown spectra against the library to see if they get a hit, just as DNA jockeys compare gene sequences to GenBank using BLAST. (Thermo offers a similar search engine, LipidSearch™, with some 1.5 million parent and fragment ions.)
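The idea of searching an unknown spectrum against a library can be sketched in a few lines of Python. This is not LipidBlast's or LipidSearch's actual scoring method; cosine similarity on binned peaks is a common stand-in, and the library entries, peak lists and bin width below are all invented.

```python
from math import sqrt

def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into integer bins so spectra can be compared."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_score(query, reference, bin_width=1.0):
    """Cosine similarity of two binned spectra: 0 means no shared peaks, 1 the same shape."""
    q, r = bin_spectrum(query, bin_width), bin_spectrum(reference, bin_width)
    dot = sum(q[k] * r[k] for k in q if k in r)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0

def best_hit(query, library):
    """Return (name, score) for the library entry most similar to the query."""
    scored = ((name, cosine_score(query, ref)) for name, ref in library.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical library entries and query spectrum, for illustration only.
library = {
    "hypothetical_lipid_A": [(184.1, 100.0), (496.3, 20.0), (522.4, 5.0)],
    "hypothetical_lipid_B": [(141.0, 80.0), (255.2, 100.0), (281.2, 40.0)],
}
query = [(184.1, 95.0), (496.3, 25.0), (255.2, 3.0)]

print(best_hit(query, library))
```

Production search engines typically also filter on precursor mass and weight peaks by m/z, which this sketch omits.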

Liang Li, professor of chemistry at the University of Alberta and co-PI of HMDB, recently launched a similar project called MyCompoundID.org to expand the utility of HMDB. MyCompoundID was built by taking some 8,000 metabolites from HMDB and calculating the masses and predicted spectral features they would have after undergoing any of 76 possible biological transformations, such as phosphorylation, methylation or D-ribosylation. With some 375,809 records, the end result helps researchers narrow down the possible identity of unknown spectral features. “This is a way to get structural candidates to work with,” he says.
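The arithmetic behind a transformation-expanded library is simple enough to sketch in Python. This is an illustration of the general idea, not MyCompoundID's implementation: it uses two metabolites and three of the transformations named above, works with neutral monoisotopic masses only (no adducts), and the 10-ppm tolerance is an assumed value.

```python
# Monoisotopic masses, in daltons, of the elements used below.
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "P": 30.9737620}

def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(ELEMENT_MASS[el] * n for el, n in formula.items())

# Net elemental gain for three transformations.
TRANSFORMATIONS = {
    "methylation": {"C": 1, "H": 2},               # +CH2
    "phosphorylation": {"H": 1, "P": 1, "O": 3},   # +HPO3
    "D-ribosylation": {"C": 5, "H": 8, "O": 4},    # +C5H8O4
}

# A two-entry stand-in for the starting metabolite list.
METABOLITES = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "adenine": {"C": 5, "H": 5, "N": 5},
}

def candidates(observed_mass, ppm=10.0):
    """List (metabolite, transformation, predicted mass) within ppm of observed_mass."""
    hits = []
    for name, formula in METABOLITES.items():
        base = mono_mass(formula)
        for label, delta in TRANSFORMATIONS.items():
            predicted = base + mono_mass(delta)
            if abs(predicted - observed_mass) / predicted * 1e6 <= ppm:
                hits.append((name, label, round(predicted, 4)))
    return hits

print(candidates(260.0297))  # a mass consistent with phosphorylated glucose
print(candidates(267.0968))  # a mass consistent with ribosylated adenine
```

A mass match only yields candidates. Isomers share the same mass, which is why the predicted spectral features matter for narrowing the list further.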

Cutting a SWATH…

Metabolomics studies can be either targeted or untargeted. In the former, researchers program their instruments (generally triple-quadrupole mass spectrometers) to scan for specific metabolites. In the latter, the instrument scans everything within a given mass range but collects MS/MS fragmentation data on only the most abundant ions.

That so-called data-dependent workflow is designed for convenience. But it falls short, Fiehn says, when researchers discover a particular ion varies significantly between samples but wasn’t selected for fragmentation because of its low abundance.
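A toy Python example shows how that can happen. The survey scan below is invented, and real instruments apply further rules (dynamic exclusion, intensity thresholds) that this sketch ignores; it simply keeps the top few ions by intensity.

```python
def select_for_msms(survey_scan, top_n=3):
    """Data-dependent selection: keep the top_n most intense precursor m/z values."""
    ranked = sorted(survey_scan, key=lambda ion: ion[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]

# Invented survey scan as (m/z, intensity) pairs.
survey_scan = [
    (181.07, 9.0e5),
    (261.04, 4.0e3),   # low abundance, but possibly the ion that differs between samples
    (303.23, 6.5e5),
    (496.34, 1.2e6),
    (524.37, 3.0e5),
]

selected = select_for_msms(survey_scan)
missed = [mz for mz, _ in survey_scan if mz not in selected]

print("fragmented:", selected)
print("no MS/MS data:", missed)
```

Whatever the value of top_n, an ion that never ranks among the most intense in any scan yields no fragmentation data, no matter how interesting it later proves to be.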

Recently, Ruedi Aebersold described a solution to this problem for proteomics studies, which has been commercialized by AB SCIEX [1]. Called “SWATH™ MS,” the strategy eschews data-dependent processing in favor of a data-independent approach where all the ions entering the mass spectrometer are fragmented and analyzed. The method covers a broad mass range by stepping through user-defined isolation windows 25 m/z wide, repeating, and then sorting out the resulting fragments computationally. (Those 25-m/z bins are called “swaths,” hence the name.)
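The windowing scheme can be sketched in Python as follows. The 25-m/z width comes from the description above; the 100-to-1,000 m/z range is an assumption chosen for illustration, and the sketch uses simple non-overlapping windows rather than reproducing any vendor's acquisition method.

```python
def swath_windows(mz_start, mz_end, width=25.0):
    """Split [mz_start, mz_end) into consecutive isolation windows of the given width."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows

def window_for(precursor_mz, windows):
    """Return the window containing the precursor, or None if it is out of range."""
    for low, high in windows:
        if low <= precursor_mz < high:
            return (low, high)
    return None

windows = swath_windows(100.0, 1000.0)  # assumed mass range
print(len(windows), "windows per cycle")
print(window_for(261.04, windows))
```

Because every precursor in range falls into some window, every ion is fragmented on each cycle regardless of abundance. The trade-off is that each window's fragments come from a mixture of precursors, which is the part that has to be sorted out computationally.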

In 2013, Washington University chemist Gary Patti applied a similar approach to metabolites using an Agilent 6520 Q-TOF mass spectrometer and a custom R package [2]. AB SCIEX is now applying the SWATH technique to metabolomics, says Fadi Abdi, senior marketing manager for lipidomics, metabolomics and imaging.

Users collect high-speed spectral data on the TripleTOF® mass spectrometer and interpret it using MS/MS spectral libraries, similar to the proteomics approach. “In data-dependent analysis, if you didn’t trigger on your molecule, you cannot identify it,” Abdi says. “SWATH allows you to collect data on all detectable species in the sample, providing more comprehensive quantitative coverage.”

Fiehn and colleagues in Japan have also developed a custom SWATH analysis tool, called MS DIAL, which they plan to launch at a conference next month. Although it was only just completed, Fiehn says, “we are somewhere between 40% and 50% better than before, in terms of compounds we can identify” in untargeted metabolite scans—though many unidentified compounds still remain.

Pathway analysis

After researchers identify interesting metabolites, they need to work out their role in the biological system they’re studying. This is where pathway-analysis tools come in.

Pathway analysis enables researchers to map metabolites onto known biochemical pathways to provide clues to possible genetic players as well as other metabolites to investigate.
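In its simplest form, that mapping is a set intersection followed by a test of whether a pathway has more hits than chance would predict. The Python sketch below is not how MetaMapp or any commercial package works; the pathway table is a heavily abbreviated toy, and the background size of 50 measured metabolites is an assumed value.

```python
from math import comb

# Toy, abbreviated pathway table for illustration; real tools draw on databases such as KEGG.
PATHWAYS = {
    "glycolysis": {"glucose", "glucose 6-phosphate", "fructose 6-phosphate",
                   "pyruvate", "lactate"},
    "TCA cycle": {"citrate", "succinate", "fumarate", "malate", "pyruvate"},
}

def map_to_pathways(metabolites, pathways=PATHWAYS):
    """Return {pathway: sorted list of the given metabolites found in it}."""
    found = set(metabolites)
    return {name: sorted(members & found) for name, members in pathways.items()}

def enrichment_p(k, pathway_size, n_hits, background):
    """Hypergeometric probability of seeing k or more pathway members among n_hits."""
    total = comb(background, n_hits)
    upper = min(pathway_size, n_hits)
    return sum(comb(pathway_size, i) * comb(background - pathway_size, n_hits - i)
               for i in range(k, upper + 1)) / total

hits = ["glucose", "glucose 6-phosphate", "pyruvate", "malate"]
BACKGROUND = 50  # assumed number of metabolites measured in the experiment

mapped = map_to_pathways(hits)
for name, members in mapped.items():
    p = enrichment_p(len(members), len(PATHWAYS[name]), len(hits), BACKGROUND)
    print(f"{name}: {members} (p = {p:.4f})")
```

The other members of a highly ranked pathway, those not among the hits, are natural candidates for the "other metabolites to investigate."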

Fiehn’s lab has written a pathway-analysis tool called MetaMapp, and most commercial metabolomics data-analysis packages now include pathway analysis, too. Thermo Scientific has such a module in its SIEVE data-analysis package that ties into the KEGG pathway database, for instance, and Bruker Daltonics will launch its Compass PathwayScreener tool at ASMS in June. The MetPath module in Biocrates Life Sciences’ MetIDQ™ software and Agilent’s Mass Profiler Professional also offer this feature.

But simply mapping metabolites onto known pathways isn’t always enough to see the big picture, warns Mike Milburn, chief scientific officer at Metabolon, a metabolomics service provider in North Carolina.

Metabolon has some 3,000 metabolomics studies under its belt and performs 600 to 700 studies annually, half with academic clients, Milburn says. That experience gives the company a bird’s-eye view of metabolism that less-experienced researchers may struggle to attain, says chief technology officer Steve Watkins.

“We can take a much more global view and also a very quantitative and focused view of how pathways react to stimuli and disease and how everything is coordinated,” Watkins says.

For many researchers, the skill, expertise and cost required to initiate a metabolomics workflow make outsourcing to companies like Metabolon a no-brainer. But those who feel they have what it takes to do the job themselves will find no shortage of computational tools to help.

Either way, says Aiko Barsch, market manager for metabolomics at Bruker Daltonics, “I would encourage new customers to get started into metabolomics, because there’s so much information contained in there. And so much new to discover.”

References

[1] Gillet, LC, et al., “Targeted data extraction of the MS/MS spectra generated by data independent acquisition: A new concept for consistent and accurate proteome analysis,” Mol Cell Proteomics, 11:O111.016717, 2012. [PubMed ID: 22261725]

[2] Nikolskiy, I, et al., “An untargeted metabolomics workflow to improve structural characterization of metabolites,” Anal Chem, 85:7713-9, 2013. [PubMed ID: 23829391]

Image: Detail from KEGG metabolic pathways database.
