Metabolomics Software

It’s no surprise that not all colon cancers are alike. But there is a tumor localization aspect to the disease that is perhaps unexpected.

As it turns out, patients with tumors on the right (ascending) side of the colon tend to fare worse than those with tumors on the left (descending) side. To find out why that might be, Gary Siuzdak, senior director of the Center for Metabolomics at the Scripps Research Institute in La Jolla, Calif., and Cynthia Sears, professor of medicine at Johns Hopkins University School of Medicine, turned to untargeted metabolomics.

The team processed biopsied tumor and matched normal tissues from both sides of the colon, looking for metabolic differences, particularly between samples associated with a bacterial biofilm (which is common in right-side tumors). They observed strong variation in the abundance of polyamine metabolites, especially N1,N12-diacetylspermine, which could be as high as 62-fold more abundant in some cases [1].

That metabolite’s precursor, they found, was actually made in host cells, then acetylated in the bacteria that colonize the tissue, which feed on the molecule. That, in turn, induces host cells to proliferate, thereby potentially enhancing tumor aggressiveness. “It’s a symbiotic relationship,” Siuzdak explains.

And it is one that could not have been discovered without metabolomics. Or, more specifically, metabolomics data analysis.

When it comes to untargeted (that is, discovery-mode) metabolomics, says Mike Milburn, chief scientific officer at metabolomics service provider Metabolon, researchers typically want answers to a few related questions: what are the metabolic differences between different conditions, what is the identity of those metabolites, and how do those metabolites fit into the global metabolic pathways of the cell.

That last question is perhaps most useful for researchers interested in translating metabolomics data into biological insight, says Jason Lu, associate professor of biomedical informatics at the Cincinnati Children’s Hospital Medical Center, who wrote a recent review on metabolomics data analysis [2]. Indeed, researchers can identify both compounds and biomarkers without pathway analysis.

But to understand how those biomarkers relate back to biology, he says, “you should put these metabolites in a pathway context and try to understand how the process works in the context of these metabolites with other proteins and enzymes.”

Metabolon has answered such questions in some 5,000 studies to date, Milburn says, identifying up to 1,000 metabolites per sample for its clients. In one recent case, he says, the company helped a biotech firm identify the mechanism of action of a drug that company was developing. But researchers can also answer those questions on their own – provided they have the proper software. Here are a few of your options.

XCMS Online

Siuzdak’s tool of choice is XCMS Online, a cloud-based tool first developed in his lab in 2012. With some 11,000 users, XCMS Online allows researchers to upload LC- or GC-MS data, identify spectral features that differ across samples or conditions, and assign a putative identification to those features. Those data can be displayed in a so-called “CloudPlot” – an integrated, interactive graphic highlighting the most statistically interesting features in any dataset. Or, they can be assigned to metabolic pathways and modules using a built-in implementation of the pathway tool Mummichog.

XCMS Online users can also cross-correlate those data with proteomic or genomic datasets, Siuzdak says. And because the system is in the cloud, users can share their data and analyses with remote colleagues, thereby facilitating collaboration. New features include the ability to stream spectral data to the cloud as it is collected (as opposed to waiting until data collection is complete), and the ability to perform pairwise and multigroup analyses (as opposed to one-to-one comparisons).

To make its metabolite identifications, XCMS Online uses the online metabolite database, METLIN. Also created in Siuzdak’s lab, METLIN includes records for some 220,000 molecules, mostly based upon intact mass (that is, MS data), and XCMS Online bases its initial metabolite identifications on intact mass alone. But as multiple metabolites may share the same mass, many metabolomicists collect tandem mass (MS/MS) spectra, as well.
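
The ambiguity of mass-only identification can be seen in a minimal Python sketch. The four-compound library, the function names and the 10 ppm tolerance are illustrative assumptions, not METLIN’s contents or XCMS Online’s actual matching code. The masses are neutral monoisotopic values; a real workflow would also have to account for adducts and charge.

```python
# Hypothetical mini-library of neutral monoisotopic masses (daltons).
# Leucine and isoleucine are isomers, so their masses are identical.
LIBRARY = {
    "leucine": 131.09463,
    "isoleucine": 131.09463,
    "creatine": 131.06948,
    "spermine": 202.21575,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_by_mass(observed_mass, library, tol_ppm=10.0):
    """Return every library entry within tol_ppm of the observed mass."""
    return sorted(
        name
        for name, mass in library.items()
        if abs(ppm_error(observed_mass, mass)) <= tol_ppm
    )


# One observed mass, two equally good candidates.
hits = match_by_mass(131.0946, LIBRARY)
print(hits)
```

Here a single observed mass matches both leucine and isoleucine, which is the situation tandem spectra are meant to resolve. Loosening the tolerance to several hundred ppm would pull in creatine as well.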

Siuzdak’s team has generated MS/MS spectra on over 14,000 metabolites to date. And as of mid-August, they have fleshed out that collection with predicted MS/MS spectra on the remaining 200,000+ molecules based on machine-learning analysis of the experimental data already collected. Though not perfect, he concedes, “we have what we believe is a reasonably good approximation pattern of the fragmentation pattern of the molecule” – data that should help in compound identification using MS/MS spectra. And those predictions should improve in the months ahead, he adds, as the machine-learning approach evolves.

A desktop version of XCMS Online called XCMSPlus, developed in collaboration with mass spec vendor SCIEX, is also available. According to Baljit Ubhi, global staff scientist for metabolomics and lipidomics applications development at SCIEX, the company has its own manually curated library of 600-odd metabolites’ MS/MS data (the Accurate Mass Metabolite Spectral Library, created in collaboration with the University of Geneva), to aid in metabolite identification.

MS-DIAL

Another free analysis option is MS-DIAL, developed in the UC Davis lab of Oliver Fiehn. MS-DIAL accepts LC- or GC-MS spectra, selects and quantifies spectral peaks, and identifies them based on MS/MS data, exporting a list of chemical IDs and abundance data for analysis by external statistical tools such as MetaboAnalyst (see below).

According to Fiehn, MS-DIAL bases its identifications on MS/MS data, rather than intact masses. That, he says, allows for more accurate identifications. “If you have only the ion for an intact molecule, the best you can get is some elemental formula,” he explains. “And with an elemental formula, you can have multiple isomers.” The software also “disentangles” the multiple spectral features that can represent a single compound into one feature, simplifying the resulting data.
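
One generic way to compare an MS/MS spectrum against library candidates is a cosine similarity score over binned fragment peaks. The Python sketch below illustrates that general idea only; it is not assumed to be MS-DIAL’s scoring scheme, and the spectra, bin width and function names are made up for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks that fall in the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = unrelated, 1 = identical shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fragment spectra: two candidates that could share one formula.
query = [(86.10, 100.0), (44.05, 30.0), (69.07, 10.0)]
candidate_a = [(86.10, 90.0), (44.05, 35.0), (69.07, 12.0)]
candidate_b = [(86.10, 20.0), (57.06, 100.0), (41.04, 60.0)]

score_a = cosine_similarity(query, candidate_a)
score_b = cosine_similarity(query, candidate_b)
print(round(score_a, 3), round(score_b, 3))
```

In this toy case the query resembles candidate A far more closely than candidate B, even though an intact-mass comparison could not tell the two apart.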

But perhaps most significantly, MS-DIAL was built to support “data-independent analysis,” a shotgun MS/MS strategy (akin to SCIEX’s SWATH method for proteomics) in which the mass spectrometer collects tandem spectra on as many ions as possible without regard to abundance. In this way, Fiehn explains, researchers can delve deeper into low-abundance molecules (which are often overlooked in data-dependent strategies) that turn out, during subsequent data analysis, to vary significantly between conditions.

In one example of the utility of this approach, Fiehn’s team used MS-DIAL and a public lipid database of some 220,000 MS/MS spectra (LipidBlast) to identify more than 1,000 lipids in nine algal strains used in biofuels research – data they used to work out the likely identity of one misidentified strain [3].

MetaboAnalyst

Both MS-DIAL and XCMS Online (as well as another related tool, MZmine 2) excel at spectral peak picking, compound identification and quantitation. To assess the importance of the identified features, however, many researchers export those data into another tool, MetaboAnalyst.

According to developer Jianguo Xia of McGill University in Montreal, MetaboAnalyst helps researchers identify patterns and important metabolites in their data, using multivariate statistical methods such as principal component analysis (PCA). Users can also map those compounds onto pathways, assess their relationship to known metabolic signatures of disease, and perform biomarker analysis. “It’s a very comprehensive tool suite addressing a variety of needs.”
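
To give a sense of what PCA does with a metabolite table, here is a minimal pure-Python sketch that finds the first principal component by power iteration. It is an illustration under stated assumptions, not MetaboAnalyst’s implementation: the six-sample, three-metabolite table and all function names are hypothetical.

```python
import math


def center(rows):
    """Subtract each column (metabolite) mean from every sample."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical log-abundances: rows are samples, columns are metabolites.
# The first three rows are controls, the last three are tumors.
data = [
    [1.0, 5.0, 3.0],
    [1.2, 5.1, 2.9],
    [0.9, 4.9, 3.1],
    [5.0, 5.0, 3.0],
    [5.2, 5.2, 3.1],
    [4.9, 4.8, 2.9],
]

centered = center(data)
pc1 = first_component(covariance(centered))
# Project each sample onto the first component to get its PC1 score.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
print([round(w, 3) for w in pc1])
print([round(s, 2) for s in scores])
```

In this toy table nearly all of the variance comes from the first metabolite, so the first component loads almost entirely on it and the PC1 scores split the controls from the tumors.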

For instance, the software can perform “power analyses,” Xia says, to help researchers estimate how many samples will be needed in a study to answer a specific question.
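
For a rough idea of what such an estimate involves, the Python sketch below applies the textbook normal-approximation formula for comparing two group means on a single metabolite. This is a simplification and not assumed to be MetaboAnalyst’s method; the function name and default values are illustrative.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison.

    effect_size is the difference in means divided by the standard
    deviation. Uses n = 2 * ((z_(1 - alpha/2) + z_power) / effect_size)^2.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


print(samples_per_group(1.0))  # large effect: 16 per group
print(samples_per_group(0.5))  # moderate effect: 63 per group
```

Halving the expected effect size roughly quadruples the number of samples needed. An untargeted study that tests hundreds of metabolites at once would also need to correct for multiple comparisons, which this single-metabolite sketch ignores.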

That’s not the extent of the metabolomics tool set, of course (for a recent review, see [2]). For one thing, researchers comfortable eschewing pretty graphical user interfaces can probe their datasets from the command line using Bioconductor, R and/or the MATLAB programming language instead. But for those who lack those skills, or who simply prefer a user-friendlier experience, there’s no shortage of tools. And still more are coming. SCIEX, for instance, has developed with Illumina a cloud application called OneOmics for integrating sequencing and proteomics data into one analysis, says Ubhi, and metabolomics data integration is coming.
“Watch this space,” says Ubhi.


References

[1] Johnson, C.H., et al., “Metabolism links bacterial biofilms and colon carcinogenesis,” Cell Metabolism, 21:891–7, 2015. [PMID: 25959674]

[2] Ren, S., et al., “Computational and statistical analysis of metabolomics data,” Metabolomics, 11:1492–1513, 2015.

[3] Tsugawa, H., et al., “MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis,” Nature Methods, 12:523–6, 2015. [PMID: 25938372]
