Gene Pattern Software from Broad Institute of MIT and Harvard

Research Fellow
BIDMC, Harvard University
Medicine/ Rheumatology
Hierarchical clustering output from Gene Pattern. Gene expression levels of 3 different genes on a total of 96 samples (65 SLE, 27 control and 4 RA) are plotted.

  • Company: Broad Institute of MIT and Harvard
  • Product Name: Gene Pattern Software
  • Catalog Number: N/A

Over the last couple of years we have been working on potential biomarkers for systemic lupus erythematosus (SLE). We are measuring the expression levels of a number of genes using real-time PCR to see whether these can be used for disease diagnosis or for monitoring disease activity. To investigate the value of a large number of parameters (in our case, gene expression levels) on a large number of samples, novel mathematical techniques like classification trees and hierarchical clustering are needed. These techniques are complicated for the average researcher to learn, so graphical user interface software that can assist with these tasks is essential.

GenePattern is online software that provides access to a broad array of computational methods that can be used to analyze large datasets. Compared to other software with similar functionality, it is relatively easy to understand and learn how to use.

Experimental Design and Results Summary

Application

Gene expression analysis

Starting Material

We looked into 40 genes linked to the pathogenesis of SLE and tested whether they can be used as potential biomarkers for these patients. Blood was obtained from 65 SLE patients, 4 rheumatoid arthritis (RA) patients and 27 healthy subjects. T cells were isolated from each subject and RNA was extracted using a column-based method. After reverse-transcribing the RNA into cDNA, quantitative real-time PCR was performed to measure the expression levels of these 40 genes.

Protocol Overview

GenePattern was accessed through Broad Institute’s webpage: http://www.broadinstitute.org/cancer/software/genepattern. Analyses on GenePattern can be done either with the online software or by downloading and installing the whole suite on your own computer. After creating a username you can select one of the numerous analysis modules available. To use CART (an implementation of classification trees), comparative marker selection and hierarchical clustering, we had to choose the Prediction, Differential Expression Analysis and Clustering categories, respectively. Data had to be converted into .gct or .res files (expression values) and .cls files (class labels) for GenePattern to recognize them. The conversion is not very difficult, and full details on how to do it are found on the GenePattern website. Once all the necessary data are uploaded, the analysis itself takes only a couple of seconds to complete.
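For readers who would rather script the conversion than do it by hand, here is a minimal Python sketch of the two plain-text formats as I understand them from the GenePattern file format documentation. A .gct file holds a tab-delimited genes-by-samples table with two header lines, and a .cls file holds one class label per sample. The function names, gene names, sample names and values are all hypothetical, and the "na" description is just a placeholder; check the output against the official format guide before uploading.

```python
def format_gct(gene_names, sample_names, matrix):
    """Return GCT (version 1.2) text for a genes-by-samples expression matrix."""
    if len(matrix) != len(gene_names):
        raise ValueError("one row of values is needed per gene")
    # Line 1: version tag. Line 2: number of rows (genes) and columns (samples).
    lines = ["#1.2", f"{len(gene_names)}\t{len(sample_names)}"]
    lines.append("\t".join(["Name", "Description"] + list(sample_names)))
    for gene, row in zip(gene_names, matrix):
        if len(row) != len(sample_names):
            raise ValueError(f"row for {gene} has the wrong number of values")
        # A Description column is expected; "na" is only a placeholder here.
        lines.append("\t".join([gene, "na"] + [str(v) for v in row]))
    return "\n".join(lines) + "\n"


def format_cls(labels):
    """Return categorical CLS text for one class label per sample."""
    classes = []
    for label in labels:  # keep classes in order of first appearance
        if label not in classes:
            classes.append(label)
    header = f"{len(labels)} {len(classes)} 1"
    names = "# " + " ".join(classes)
    indices = " ".join(str(classes.index(label)) for label in labels)
    return "\n".join([header, names, indices]) + "\n"


# Hypothetical toy data: 2 genes measured in 3 samples.
genes = ["GENE_A", "GENE_B"]
samples = ["S1", "S2", "S3"]
values = [[1.5, 2.0, 0.7], [3.1, 2.9, 4.2]]
classes = ["SLE", "SLE", "Control"]

gct_text = format_gct(genes, samples, values)
cls_text = format_cls(classes)
```

Writing `gct_text` and `cls_text` to files ending in .gct and .cls gives a matched pair, as long as the samples are listed in the same order in both.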

Tips

Most GenePattern modules require data in .cls, .gct or .res format, so once you have converted your data into these formats you are ready to analyze them with a large array of different methods. GenePattern includes modules for a large number of bioinformatics analyses, from allelic discrimination and gene expression to flow cytometry and proteomic studies.

Results Summary

Based solely on gene expression levels and using CART, the implementation of the classification tree algorithm found on GenePattern, SLE versus control identity of samples was predicted correctly in 75% of cases. To find which of the 40 genes used in the analysis contributed the most to differentiating SLE patients from controls, we applied comparative marker selection (CMS), another module found on GenePattern. CMS determines which variables (gene expression levels) display the strongest correlation with a class, based on a test statistic (here, the t-test) used to assess differential expression. When we kept just the three genes that were best at discriminating SLE from controls, the classifier’s accuracy increased to 83%. We then used an unsupervised clustering technique to organize samples into groups that are similar based on gene expression patterns. Using the hierarchical clustering module on GenePattern and assessing similarity based solely on the expression levels of these 3 genes, we were able to establish that both control and RA samples cluster separately from SLE samples (see figure).
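To make the marker selection idea concrete, here is a small Python sketch that ranks genes by a two-sample t statistic between two classes. It is not GenePattern's implementation, whose exact statistic and options may differ; I chose Welch's unequal-variance form for illustration. The function names, gene names and values are made up and are not data from this study.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, labels, class_a, class_b):
    """Sort genes by |t| between two classes, strongest first."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up values for illustration only; not data from this study.
labels = ["SLE", "SLE", "SLE", "Control", "Control", "Control"]
expression = {
    "GENE_A": [5.1, 4.8, 5.3, 2.0, 2.2, 1.9],  # clearly higher in SLE
    "GENE_B": [3.0, 3.4, 2.9, 3.1, 3.3, 3.0],  # little difference
    "GENE_C": [1.0, 1.4, 1.2, 2.1, 1.8, 2.4],  # lower in SLE
}

ranking = rank_genes(expression, labels, "SLE", "Control")
top_two = [gene for gene, _ in ranking[:2]]
```

Ranking by the absolute value of t keeps genes that are lower in one class (like the hypothetical GENE_C) alongside genes that are higher, since both can help separate the groups.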

Features Summary

Overall, GenePattern offers numerous capabilities for biological data analysis and can substitute well for most commercial software. It is, however, a bit less user friendly than some of these software packages and might take some time to get used to.

Additional Notes

GenePattern is very good for the average user, although it still requires some time to master. For anyone planning to work long-term on genomic data analyses, R (free from http://www.r-project.org) and MATLAB (commercial, MathWorks) are two other options that you may wish to consider. These are command-based software and are much more difficult to learn, but once you have learned them they offer much greater flexibility for large-data analysis.

Summary

The Good

GenePattern is probably the best and most comprehensive free online software package for biomarker discovery and genomic data analysis.

The Bad

Although it is one of the easiest, it may still take some time for the average user to learn how to use. Also, if you plan to work on multiple data analyses, it can be a bit time-consuming.

The Bottom Line

After looking into many free biomarker discovery software packages, GenePattern seems to be the easiest to use and the one that offers the greatest array of functions. If you have more time to spend, you may want to try more specialized alternatives, like R and MATLAB.
