Charting the Methylome Base by Base (with Bisulfite Sequencing)

Josh P. Roberts has an M.A. in the history and philosophy of science, and he also went through the Ph.D. program in molecular, cellular, developmental biology, and genetics at the University of Minnesota, with dissertation research in ocular immunology.

The epigenome—those sequence-independent processes and alterations that help modulate the way DNA is expressed, including modifications of histones and of DNA itself—is a major focus of biomedical research.

Methylation of cytosine residues (5mC), typically (but not always) found at CpG dinucleotides and often in clusters called CpG islands, is a key modification that has been linked to regulating cell growth, differentiation, proliferation and disease states. DNA hypermethylation in promoter and other regulatory regions is strongly correlated with gene silencing, for example, while the bodies of actively transcribed genes are themselves often hypermethylated. Mapping the epigenome has enabled researchers to identify potential functional sequences, annotating parts of the genome until recently considered “junk” [1]. Meanwhile, abnormal patterns of DNA methylation have been strongly associated with cancer and with developmental disorders.

Bisulfite sequencing, essentially bisulfite conversion coupled with (now) next-generation sequencing (NGS), is essential for mapping DNA methylation at base-pair resolution on a genome-wide scale, and it is the most popular technique for studying methylation, notes Yi Zhang, Fred Rosen Professor of genetics and pediatrics at Harvard Medical School. Here we explore ways that NGS is being used to query the methylome.

Conversion factor

A 2014 article in Nature Reviews Genetics described 29 extant combinations of fragmentation, library preparation, treatment, amplification and analysis techniques for examining the methylome [2]. “Almost any flavor combination is possible,” says Willard Freeman, the Donald W. Reynolds Chair of Aging Research at the University of Oklahoma Health Sciences Center. “It’s a bit of a confusing landscape.”

Some of these techniques immunoprecipitate methylated fragments, whereas others cut the DNA with methylation-sensitive (or -insensitive) restriction enzymes and select appropriately sized fragments before sequencing. Yet when performed without an accompanying bisulfite conversion, they indicate only the region of the DNA that is methylated, not the exact location, notes Keith Booher, epigenetics services project manager at Zymo Research. Similarly, they don’t distinguish a fragment with all its cytosines methylated from one that is only partially methylated. “But by using a bisulfite sequence approach, you can calculate the methylation ratio as precisely as you want.”

Bisulfite treatment converts any unmethylated cytosines to uracils, leaving the methylated cytosines unscathed. The uracils are read as thymines during sequencing, so comparison to a reference sequence reveals which thymines in the dataset were originally unmethylated cytosines, while reference cytosines that still read as cytosine were methylated.
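The logic of conversion and read-out can be sketched in a few lines of Python. This is a toy illustration only, not any lab's or vendor's pipeline: the function names, the 12-base reference and the four "molecules" are invented for the example, and it assumes error-free, full-length reads from the top strand with complete conversion.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as it would be read after bisulfite treatment:
    unmethylated C becomes T (via U); methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_ratios(reference, reads):
    """For every C in the reference, report the fraction of reads that
    still show C at that position (C = methylated, T = unmethylated)."""
    ratios = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        calls = [read[i] for read in reads if read[i] in "CT"]
        if calls:
            ratios[i] = calls.count("C") / len(calls)
    return ratios


# Hypothetical reference with cytosines at positions 1, 5, 8 and 9.
reference = "ACGTTCGACCGT"

# Hypothetical methylation state of four DNA molecules from one sample.
molecules = [{1, 5, 9}, {1, 9}, {1, 5}, {1}]

reads = [bisulfite_convert(reference, m) for m in molecules]
ratios = methylation_ratios(reference, reads)

for position, ratio in sorted(ratios.items()):
    print(f"C at position {position}: {ratio:.2f} methylated")
```

With more reads covering a site, the ratio at that site is estimated more finely, which is the sense in which the methylation ratio can be calculated "as precisely as you want."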

Bisulfite sequencing as an enabling technology has “completely transformed” DNA-methylation analysis “from a study that was imprecise and largely ambiguous to now digital nucleotide resolution,” says Bing Ren, professor of cellular and molecular medicine at University of California, San Diego, School of Medicine.

Pare it down

More and more people are doing genome-wide sequencing “just because the results are extremely valuable and informative,” Ren says. But “it’s still very expensive.”

“The whole genome is going to be about 3 billion base pairs. Sequencing that at 30x coverage, you’re spending tens of thousands of dollars per sample in just sequencing costs. I don’t need to look at the whole 3 billion bases—we have 150 [to] 200 megabases of the genome that are most pertinent to us (at least right now): every gene promoter, every CpG island, shore, and shelf and things like that,” says Freeman. His group has developed an oligo-capture approach, similar in concept to what is done with exome sequencing, that enables them to select only those sequences of interest based on prior knowledge of where they are found in the genome and thus reduce the amount of sequencing by almost 97%. Agilent’s SureSelectXT MethylSeq and Roche NimbleGen’s SeqCap EZ kits take a similar tack, with the latter “allow[ing] researchers to interrogate >5.5 million methylation sites per sample at single-nucleotide resolution,” according to the product website.

There is a host of ways to enrich the genome. Capture approaches, as well as amplification methods, are examples of “targeted” enrichment. The latter are best for smaller, hypothesis-driven and focused studies or for confirmation of genome-wide studies, notes Freeman. When using PCR or a similar method, it’s important to first treat with bisulfite, because amplification copies methylated and unmethylated cytosines alike and so destroys any base-specific methylation information, and then to use amplification primers specific for the converted sequence.
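What "primers specific for the converted sequence" means can be illustrated with a small Python sketch. It is a simplified assumption-laden example, not a primer-design tool: it considers only the top strand, treats every non-CpG cytosine as unmethylated (so it reads as T after conversion), marks CpG cytosines with the IUPAC code Y (C or T) because their state is the unknown being measured, and follows the common practice of rejecting primer sites that overlap a CpG. The function names and the sequence are hypothetical.

```python
def in_silico_convert(seq):
    """Predict the bisulfite-converted top strand.
    Non-CpG C -> T (assumed unmethylated); CpG C -> Y (C or T, unknown)."""
    converted = []
    for i, base in enumerate(seq):
        if base != "C":
            converted.append(base)
        elif seq[i + 1:i + 2] == "G":
            converted.append("Y")  # methylation state to be measured
        else:
            converted.append("T")
    return "".join(converted)


def forward_primer_site(seq, start, length):
    """Return the converted sequence a forward primer would need to match,
    or None if the site overlaps a CpG and would bias amplification."""
    site = in_silico_convert(seq)[start:start + length]
    return None if "Y" in site else site


# Hypothetical genomic sequence (top strand, 5' to 3').
genomic = "ATCCATGCTACGTTAGC"

print(in_silico_convert(genomic))           # converted template
print(forward_primer_site(genomic, 0, 8))   # CpG-free site: usable
print(forward_primer_site(genomic, 6, 8))   # overlaps a CpG: rejected
```

A primer written against the untreated genomic sequence ("ATCCATGC") would no longer match the converted template, which is why the conversion has to be taken into account at the design stage.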

Another option in this space is Illumina’s “cost-effective” Infinium HumanMethylation450 BeadChip Kit, notes Ren. Unlike an NGS-based solution, this array-based solution cannot distinguish nearby polymorphisms or detect copy number variations (capabilities of great interest in oncology), but “if you have a large number of samples and want to use methylation as a tool for classifying them, I think a 450K array is a pretty good choice.”

Researchers also turn to “nontargeted” strategies, such as using an antibody (MeDIP) or methyl-binding protein (MBD) to pull down CpG-rich or methylation-rich sites prior to sequencing. “We’ve found Reduced Representation Bisulfite Sequencing (RRBS) to be the best method,” says Booher. “Instead of using antibodies or affinity binding proteins, you process your sample using restriction endonucleases that will recognize a CG-dense motif. It cuts the genome into fragments such that when you do a size selection and build your libraries around those enzyme-treated samples, you end up enriching for those gene promoters and gene bodies and avoiding the gene deserts and repetitive elements.” The amount of genome coverage can be controlled by judicious enzyme combinations.
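The digest-and-size-select idea behind RRBS can be mimicked in silico. The sketch below assumes the enzyme is MspI (recognition site CCGG, cutting after the first C), which is widely used for RRBS but is not named in the quote above; the toy sequence and the 8 to 12 base size window are invented so the example stays readable, whereas real protocols select fragments of tens to a few hundred base pairs.

```python
import re


def digest(seq, site="CCGG", cut_offset=1):
    """Cut seq at every occurrence of the recognition site.
    cut_offset=1 models an enzyme that cuts C^CGG (assumed: MspI)."""
    cuts = [m.start() + cut_offset for m in re.finditer(re.escape(site), seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical toy "genome": a CpG-rich stretch flanked by closely spaced
# CCGG sites, followed by a CpG-poor run with a distant site.
toy_genome = "AAAT" + "CCGG" + "ACGCGT" + "CCGG" + "T" * 12 + "CCGG" + "A"

fragments = digest(toy_genome)
selected = size_select(fragments, min_len=8, max_len=12)

print([len(f) for f in fragments])  # fragment lengths after digestion
print(selected)                     # fragments retained for the library
```

Because cut sites cluster where CG content is high, short fragments come disproportionately from CpG-dense regions, and the size window keeps those while discarding the long fragments from CpG-poor sequence. Changing `site` or combining digests with different enzymes shifts which fragments fall inside the window.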

On your mark

It’s not just 5mC that researchers are looking at, either. There has been a lot of interest of late in the cycle of oxidation that takes cytosine through a series of reversible “epigenetic marks,” from 5mC to 5-hydroxymethylcytosine (5hmC) to 5-formylcytosine (5-fC) to 5-carboxycytosine (5-caC) and back to unmethylated cytosine, and in what these marks mean. There were some early challenges to the chemistry needed to detect these alternative states, recalls Freeman, but “methods have evolved and matured from there to where they are more reliable, and I think more people will be looking at that in the future.”

For now, the average lab using methylation sequencing as a tool (without a benchtop sequencer, such as the Illumina MiSeq, or a bioinformatician on staff) often relies on a core facility, and “there are a number of companies stepping into this space for more specialized epigenetics work, such as Zymo Research, Epigentek and EpigenDx,” Freeman says. “They’re saying, ‘We’ll do that as a full-fledged service: you just send us DNA, and we’ll send you analyzed data.’” Whether DIY’ing or outsourcing, researchers are using multiple NGS approaches to better define and understand the role of the methylome in cellular regulation, development and disease.

References

[1] Rivera, CM, Ren, B, “Mapping Human Epigenomes,” Cell, 155(1):39-55, 2013. [PMID: 24074860]
[2] Laird, P, “Principles and challenges of genome-wide DNA methylation analysis,” Nat Rev Genet, 11(3):191-203, 2010. [PMID: 20125086]

Image: Zymo Research Website
