Proteomics Platform Development Hinges on Biomarker Discovery Study Design

Proteome profiling can identify new biomarkers for many different applications, including disease progression and the efficacy/toxicity of therapeutic intervention. Developing a proteomics platform is understandably challenging. Overall success is highly dependent upon experimental design, regardless of the platform used. Highlighted below are important considerations in the biomarker research process.

Collaborate with specialists

For proteomics-based clinical research, study design entails defining a clinical question, selecting patients and samples for analysis and carefully planning a series of experiments that generate reproducible results. Because these considerations require expertise in a number of fields, it’s advantageous to collaborate with other specialists, such as clinicians, additional proteomics researchers and mass-spectrometry technicians. It’s worth noting that there’s no clearly preferred platform for discovery of biomarkers and no comprehensive platform. (Initial and subsequent comparisons of different platforms, for plasma proteomics especially, indicate that each approach detects a slightly different set of proteins and is thereby capable of discovering a different subset of candidate biomarkers.) Focusing efforts on one platform may not uncover all (or even the best) biomarkers of a given disease, making collaboration especially valuable.

Minimize bias early

Throughput, cost and the need for reproducible, quantitative results are concerns in every biomarker discovery project. To maximize reproducibility, potential sources of bias need to be identified early. Pre-analytical bias can include systematic differences in patient populations or sample characteristics. Animal models display inter-animal variability, which affects the number of samples recommended to attain statistical significance. For example, transgenic models are homogeneous and therefore require a low number of animals (eight to 10, for instance) in each treatment group. Biomarker discovery in human samples, however, is far more challenging. The human proteome is extremely complex, and proteomic profiles are subject to environmental variability (e.g., time of day and diet) and pre-analytical bias (e.g., site of collection and storage of samples). With human samples, a much larger sample set is therefore required. The minimum recommended sample size is approximately 30 per classification group (for instance, treated vs. untreated subjects). In validation studies, the recommended sample size is larger still (100 to 1,000 samples, depending on the panel of biomarkers).
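The link between variability and the number of samples per group can be illustrated with a textbook power calculation. The sketch below is only a rough guide, not a replacement for a biostatistician's analysis. It assumes a simple two-group comparison of a single feature and uses the normal approximation; the function name and the effect sizes, significance level and power shown are illustrative assumptions, not values from any particular study.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison.

    effect_size is the standardized difference between groups: the
    difference in means divided by the common standard deviation.
    Uses the normal approximation, which slightly underestimates the
    n required by an exact t-test calculation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: a larger standardized effect corresponds to
# lower variability relative to the difference between groups.
for d in (1.5, 0.8, 0.5):
    print(f"effect size {d}: about {samples_per_group(d)} samples per group")
```

Under these assumptions, a homogeneous model with a large standardized effect needs only a handful of samples per group, while noisier human samples need many more. Testing many features at once calls for a stricter significance level, which raises the required number further.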

Establish data analysis strategies

Involve a biostatistician before acquiring any data, for both study design and data analysis. Mass spectrometry-based proteomic profiling techniques generate many peak intensity features per sample, significantly more than the total number of samples in a study. This results in high-dimensional data, which carry a higher risk of false discovery and overfitting of multivariate models. A biostatistician can help calculate the number of samples required for statistical significance and plan data analysis strategies that minimize the effects of overfitting and random solutions, two common statistical pitfalls. (Overfitting occurs when a model is so tightly tuned to classify the training data correctly that it performs poorly on broader data sets. Random solutions occur when high-dimensional data produce apparently statistically significant biomarkers by chance.) A biostatistician can also help develop sound statistical assumptions and apply conservative feature selection and statistical cross-validation within a sample set. When possible, the analysis should also be tested using an independent validation data set.
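The problem of random solutions can be made concrete with a small simulation. The sketch below generates p-values for 1,000 hypothetical peak features that carry no real signal, counts how many pass a naive 0.05 threshold, and then applies the Benjamini-Hochberg false discovery rate procedure. That procedure is one widely used form of conservative feature selection; it is chosen here for illustration and is not named in the text above. The feature count and thresholds are assumptions.

```python
import random


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of features kept at the given false discovery rate.

    Features are ranked by p-value; the largest rank k whose p-value is
    at most fdr * k / m sets the cutoff, and the k smallest are kept.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Simulated null data: with no true signal, p-values are uniform on [0, 1].
rng = random.Random(0)
null_p = [rng.random() for _ in range(1000)]

naive_hits = [i for i, p in enumerate(null_p) if p <= 0.05]
fdr_hits = benjamini_hochberg(null_p, fdr=0.05)

print(f"features passing p <= 0.05 by chance: {len(naive_hits)}")
print(f"features kept after FDR control:      {len(fdr_hits)}")
```

With uniform p-values, roughly 5% of pure-noise features are expected to pass the naive threshold, which is how high-dimensional data yield "significant" candidates by chance. Correction of this kind complements, rather than replaces, cross-validation and testing on an independent validation set.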

Standardize sample prep

Sample preparation is also a potential source of analytical bias. After a platform is selected, standardize sample-processing protocols early on. Generally, include protease inhibitors in cell or tissue lysates to minimize artifacts generated by proteolysis; this step does not apply when working with serum or plasma samples. Test the effects of sample denaturation prior to analysis, because denatured proteins often give better results than native proteins for several fractionation techniques. When using tissue lysates, determine the total protein concentration of each sample and adjust all samples to the same concentration with extraction buffer before diluting the sample into binding buffer. Always avoid repeated freeze-thaw cycles and use special caution with any liquid-handling steps. For instance, transfer larger volumes first into an appropriate volume of dilution buffer before transferring smaller volumes of the diluted sample. Process all samples in parallel, at the same time if feasible, and run quality control samples alongside experimental samples to monitor expected profiles and to calculate coefficients of variation.
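Two of the calculations mentioned above are simple enough to sketch: the coefficient of variation across replicate quality control runs, and the volume of extraction buffer needed to bring lysates to a common protein concentration. The function names, sample labels and numbers below are hypothetical and serve only as a worked example; units must be kept consistent.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Percent CV: sample standard deviation divided by the mean, times 100."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / avg * 100


def buffer_to_add(concentration, volume, target_concentration):
    """Volume of extraction buffer that dilutes a lysate to the target.

    From C1 * V1 = C2 * V2, the added volume is V1 * (C1 / C2 - 1).
    """
    if target_concentration <= 0 or target_concentration > concentration:
        raise ValueError("target must be positive and no higher than the sample")
    return volume * (concentration / target_concentration - 1)


# Hypothetical intensities of one peak across replicate QC sample runs.
qc_peak_intensities = [100.0, 110.0, 90.0]
print(f"QC peak CV: {coefficient_of_variation(qc_peak_intensities):.1f}%")

# Hypothetical lysate concentrations in mg/mL, each in a 50 uL aliquot.
lysates = {"sample_A": 4.0, "sample_B": 2.5, "sample_C": 2.0}
target = min(lysates.values())  # dilute everything to the weakest sample
for name, conc in lysates.items():
    added = buffer_to_add(conc, 50.0, target)
    print(f"{name}: add {added:.1f} uL of extraction buffer")
```

Diluting to the lowest measured concentration is one simple choice of target; a fixed target below every sample's concentration works equally well.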

The human proteome, estimated at more than 1 million proteins, is complex, diverse and dynamic, and it is these proteins that maintain physiological homeostasis in any cell or tissue. The choice of platform for large-scale proteomic studies is often based on personal preference or on so-called "figures of merit," such as dynamic range, resolution and the limit of detection. Regardless of the technology selected, careful study design is essential and helps expedite the translation from biomarker discovery to demonstrable impact.