Quantitative trait loci (QTLs) are genetic markers—such as single nucleotide polymorphisms (SNPs), restriction fragment length polymorphisms (RFLPs), tandem repeats, and the positions of transposable elements—associated with a specific variable trait within a population. Those traits typically vary continuously (quantitatively) rather than in a binary fashion, and are likely polygenic—that is, they likely result from multiple genetic factors rather than being controlled by a single gene. Typical examples include rust resistance in rice, fruit flies’ longevity, and human height.

QTL mapping is a statistical analysis allowing researchers to associate the genotype (markers) with the phenotype (trait), most often with the goal of determining the genetic bases for the trait. The definition of a trait can be broadened to include things like RNA expression (called eQTL), methylation patterns (meQTL), and protein expression (pQTL).

This article looks at pQTLs a little more closely, outlining some of the benefits of pQTL mapping and discussing how that information can be used to identify protein biomarkers.

What are pQTLs?

QTLs are not the same thing as genes. They are molecular markers that segregate more frequently with the traits in question. In the case of pQTLs, the traits are the appearance or quantity of protein variants under examination. These regions identified by pQTLs may contain one or many linked genes or their associated regulatory sequences that influence production of the proteins.
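To make the idea of a marker "segregating with" a protein trait concrete, here is a minimal sketch of a single-marker association test. It assumes an additive coding of genotype (0, 1, or 2 copies of an allele) and a simple linear regression; the function name and the toy numbers are hypothetical, and real pQTL pipelines typically add covariates, normalization, and multiple-testing correction.

```python
def marker_association(dosages, protein_levels):
    """Regress protein level on allele dosage (0, 1, or 2 copies).

    Returns (slope, r_squared): the estimated change in protein level per
    allele copy, and the fraction of trait variance the marker explains.
    """
    n = len(dosages)
    if n != len(protein_levels) or n < 3:
        raise ValueError("need matched genotype/phenotype lists with at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(protein_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in protein_levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, protein_levels))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or trait has no variation in this sample")
    slope = sxy / sxx                      # ordinary least-squares slope
    r_squared = sxy ** 2 / (sxx * syy)     # squared Pearson correlation
    return slope, r_squared


# Hypothetical toy data: six individuals, one marker, one protein.
dosages = [0, 0, 1, 1, 2, 2]
protein_levels = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r_squared = marker_association(dosages, protein_levels)
print(f"effect per allele: {slope:.2f}, variance explained: {r_squared:.2f}")
```

A marker whose slope is reliably different from zero across a large cohort is a candidate pQTL for that protein.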

Both QTL mapping and genome-wide association studies (GWAS) rely on very large sample sizes in order to achieve the statistical power necessary for their analyses. But they differ in that GWAS scan part or all of a genome to find the loci related to the relevant phenotype. QTL mapping may start with GWAS results or a sequenced genome as a reference.

pQTLs can be divided into two categories: cis and trans. As the names imply, cis-pQTLs are those markers that are proximal to the gene encoding the protein of interest. They are helpful in distinguishing whether a protein biomarker is causal for a disease, for example, or whether its up- or down-regulation is a consequence of the disease. Meanwhile, trans-pQTLs are distal to that gene and are more likely involved in its regulation.
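The cis/trans distinction can be sketched as a simple distance rule. The example below assumes a 1 Mb window around the gene as the cutoff for "proximal"; that threshold is an illustrative assumption, not something specified here, and studies choose their own. The class names, function name, and coordinates are all hypothetical.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # assumed cutoff for "proximal"; studies vary


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int
    end: int


def classify_pqtl(variant, gene, window=CIS_WINDOW_BP):
    """Label a pQTL 'cis' if it lies within `window` bp of the gene, else 'trans'."""
    if variant.chrom != gene.chrom:
        return "trans"  # a different chromosome is always distal
    if gene.start - window <= variant.pos <= gene.end + window:
        return "cis"
    return "trans"


# Hypothetical coordinates for the gene encoding the measured protein.
gene = Gene("chr1", 5_000_000, 5_050_000)
print(classify_pqtl(Variant("chr1", 5_500_000), gene))  # within the window
print(classify_pqtl(Variant("chr1", 9_000_000), gene))  # same chromosome, far away
print(classify_pqtl(Variant("chr2", 5_010_000), gene))  # different chromosome
```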

The Systematic and Combined AnaLysis of Olink Proteins (SCALLOP) consortium, for example, is currently mapping pQTLs for hundreds of proteins from nearly 70,000 people (patients and controls) in 45 cohort studies, “which will yield much deeper insights into the trans-regulation of plasma proteins than what has been possible to date,” according to its website.

Why not just do eQTL mapping?

pQTL analysis makes use of various proteomics platforms to understand which proteins are present in a sample. SCALLOP, for example, uses Proximity Extension Assay (PEA) technology from Olink. It and other multiplex technologies, such as the SomaScan Assay by SomaLogic and Luminex, can query a sample for a large number of known proteins simultaneously.

The SomaScan Assay has been used in several studies leading to the discovery of new pQTLs and improved understanding of protein dynamics in human health and disease. The technology enables the measurement of 7,000 proteins from a single 55 μL plasma sample. The SomaScan Assay provides high specificity with a dynamic range of 10 logs, detecting proteins in the fmol to μmol range, with coefficient of variation (CV) values below 5%, making it highly reproducible.
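For readers unfamiliar with the reproducibility figure, the percent coefficient of variation is the standard deviation of repeated measurements divided by their mean, times 100. The sketch below shows the calculation on hypothetical replicate readings; the numbers are invented for illustration and are not SomaScan data.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Percent coefficient of variation: 100 * sample standard deviation / mean."""
    m = mean(replicates)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(replicates) / m


# Hypothetical repeated measurements of one protein in the same sample.
replicates = [100, 102, 98, 101, 99]
cv = percent_cv(replicates)
print(f"%CV = {cv:.2f}")
```

The lower the value, the more closely repeated measurements of the same sample agree.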

Unbiased tools such as various iterations of mass spectrometry can also be used to detect proteins not previously identified.

But proteomics generally entails more difficult and time-consuming processes, and its tools and databases lag behind those of genomics. So why not just do RNA-seq to determine protein expression levels and variation instead?

While protein levels generally correlate with RNA levels, many factors beyond transcription and even post-transcriptional regulation can affect protein expression. Differences in either the coding or non-coding regions of the transcript may cause it to interact differently with the cell’s translational machinery, for example, while other differences may affect the stability of the protein itself. So while eQTL mapping may complement pQTL mapping by indicating how much variation in protein is due to transcriptional pathways, it cannot substitute for it. This is especially the case when looking at trans-pQTLs, which are likely to point to yet-to-be-discovered pathways.

Why care about pQTLs?

Knowing which alleles of which genes contribute to a particular trait offers an inroad into understanding which proteins are encoded, how they are regulated, how they function individually, and where they fit into specific pathways and the interactome as a whole. pQTL mapping is often used as a step to determine the different contributions that particular alleles make to a particular phenotype, especially in more applied fields such as medicine and agrigenomics. Yet that is certainly not the only goal, and sometimes not even the primary one.

Identifying the loci that contribute to variation affords insights into the hereditary basis of the phenotype. Which loci, for example, are the major contributors to a trait, and which make more minor contributions? Are the contributions additive, or subtractive? Which are dominant over which others? Are the contributions of a particular locus augmented or negated by the presence of a different, seemingly unrelated, locus, or by certain environmental factors? How do those answers change depending on where in the genome they are located?
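One common way to put numbers on the additive and dominance questions is to compare the mean trait value across the three genotype classes at a locus. The sketch below uses the textbook quantitative-genetics definitions (the additive effect is half the difference between the two homozygote means; the dominance deviation is how far the heterozygote mean sits from their midpoint). The genotype labels, function name, and values are hypothetical, and this is only one of several ways such effects can be estimated.

```python
from statistics import mean


def additive_and_dominance(levels_by_genotype):
    """Estimate additive effect and dominance deviation at one biallelic locus.

    `levels_by_genotype` maps 'AA', 'AB', and 'BB' to lists of trait values.
    """
    m_aa = mean(levels_by_genotype["AA"])
    m_ab = mean(levels_by_genotype["AB"])
    m_bb = mean(levels_by_genotype["BB"])
    midpoint = (m_aa + m_bb) / 2
    additive = (m_bb - m_aa) / 2   # half the homozygote difference
    dominance = m_ab - midpoint    # heterozygote departure from the midpoint
    return additive, dominance


# Hypothetical protein levels grouped by genotype.
levels = {"AA": [1.0, 1.2], "AB": [2.9, 3.1], "BB": [3.0, 3.2]}
additive, dominance = additive_and_dominance(levels)
print(f"additive effect: {additive:.2f}, dominance deviation: {dominance:.2f}")
```

A dominance deviation near zero suggests a purely additive locus; a value close to the additive effect, as in this toy data, suggests the B allele is nearly fully dominant.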

pQTLs and protein biomarkers

pQTLs represent places in the genome linked to variation in protein expression. Combining pQTL mapping with clinical data, and perhaps functional data, can help to identify proteins that may be causal for disease. Such proteins are thus tantalizing as potential drug targets.

But even when such proteins cannot be demonstrated to be the cause of a disease, they can still be useful as protein biomarkers if their presence or expression level correlates with disease in any way: they could be diagnostic for a disease, say, or indicate a potential for disease, or signal progression of or remission from disease. Protein biomarkers can be used as surrogate endpoints in a clinical trial, for example, to indicate that a given treatment has succeeded even when the desired result cannot be directly measured, or to fine-tune a protocol.

pQTL mapping is not new. But recent explosions in proteomic and genomic technologies and analytics have allowed for an easier and more powerful workflow than ever before—as exemplified by the spate of recent papers highlighted on the SCALLOP homepage.