Agilent SurePrint G3 Microarrays for the Detection of Single Exon Deletions, Uniparental Disomy, and Consanguinity: Recorded at ESHG 2010

Dr. James Lupski: Thank you very much, Natalie

Obviously, I'm not Art Beaudet, who's in the program. I want to make it clear that Art put together this presentation but unfortunately had back surgery this week. He's recovering well, but his doctor insists he can't travel

So, I'll be presenting some of this work, which is a combined Baylor and Agilent effort to develop a new array

So, the Baylor team--that shows them in the background--consists of over 15 faculty members and many others involved in trying to bring high-resolution human genome analysis to the clinic

And this is an effort that has been ongoing for seven or eight years. And about 30,000 arrays actually have been signed out to date

If we look at the history of human genome analysis just briefly, we know that the correct chromosome count wasn't even known until 1956. And then abnormalities in chromosome number were discovered in 1959 and the early '60s

Banding came in the '70s. And in '81, the first microdeletion syndrome was described at the Kleberg Cytogenetics Lab at Baylor, Prader-Willi syndrome by David Ledbetter and the group there

In the '80s, we had fluorescence in situ hybridization that limited us to small genomic intervals. And the physician needed to know or tell a laboratory what area to look at

Telomere FISH came in the '90s. And that changed the focus of our thinking clinically from locus specific to genome wide. And I don't think we'll ever go back. We have to think genome wide when it comes to traits that we see in humans

The 21st century has brought in the array to scan the entire genome as well as personal genome sequencing. And I'll talk about the first one

And I don't need to talk to this audience about arrays, the fact that we can get over 1,000-fold greater resolution than conventional cytogenetics

Suffice to say we want to continue to push this technology to understand how changes in the human genome with respect to copy number can be associated with traits

And certainly, we have an over 20-year history at Baylor of copy number variation as it relates to human disease, starting with the elucidation of the CMT1A duplication in 1991

So, the story has been one of increasing pixels. And if we look at the first arrays offered clinically in February of 2004, there were 366 BACs. And as of June last year, one used 180,000 oligonucleotides

So, we see increasing pixels as being an important part of resolution of the human genome. And that has translated into finding clinically relevant copy number variations, as one can see on this graph, by each version of the array, the percent yield that we have had of finding something that has been useful

But, I think it's also important to note the design improvement that comes from a mechanism-based design. As we learn how copy number variation occurs, whether it's by nonallelic homologous recombination, nonhomologous end joining, or replication-based mechanisms, and as we learn what the regions of genomic instability are, we can target our approach to better find those areas

As an example, doing a bioinformatic analysis for low-copy repeats, I think Evan Eichler pioneered the idea of genome first to define new genomic disorders

And in his original paper with Andy Sharp, they showed 130 regions. Our bioinformatic analysis shows 250 regions, actually more than 250 because we don't use repeat maskers

And these have all been implemented in our array that has gone into clinic

Addition of the mitochondrial genome has gotten us out of the pure focus on nuclear and Mendelian genetics. And the exon coverage was an important new level of resolution that is yielding a lot in the clinic

The problem is that Baylor currently offers this high-resolution array, but we also offer an Illumina SNP array to detect uniparental disomy and consanguinity

You need to choose between these two or order both. No one array can do it all. And there will be different reasons why you might want to order different arrays

So, why are UPD and consanguinity important? Better detection of CNVs associated with Angelman, Prader-Willi, and other UPD syndromes. I think a lot of this is also motivated by the fact that UPD was described at Baylor by Art Beaudet in the late '80s

Detection of regions of identity by descent contributing to recessive disorders in first-cousin matings is important in small rural towns and other populations. So is detection of incest, especially in cases of sexual abuse of young females in the home

So, the objective was to develop an array platform that preserves the resolution, signal to background, and ease of custom design of the current Agilent array, but allows detection of significant blocks of absence of heterozygosity related either to uniparental disomy of a whole chromosome, a chromosome arm, or even potentially some regional uniparental disomy, or to consanguinity

The system developed by Agilent, the combined CGH and SNP assay, works by measuring SNPs using the AluI and RsaI restriction enzymes

If you have a cut, you get a low signal. If you don't have a cut, you get a high signal on your array. But, it's incorporated directly into the same protocol, so you don't lose any of your dynamic range for the copy number

So, it measures about 50,000 SNPs with an identical CGH protocol. And the current system is looking at about 300,000 CGH probes and 50,000 SNP probes

And when we worked with them and then did side-by-side experiments in our laboratory at the MGL, it was shown that, if you run the Agilent against the Illumina platform on specific samples, in this case four different HapMap samples, the concordance is really quite remarkable. So, we're talking greater than 98 percent for these data
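A minimal sketch of how a platform-to-platform concordance figure like this could be computed, assuming each platform yields one genotype call per SNP and that no-calls are excluded. The function name, the SNP labels, and the toy calls are illustrative assumptions, not data from the talk.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of SNPs with identical genotype calls on two platforms.

    calls_a, calls_b: dicts mapping a SNP label to a genotype string
    ("AA", "AB", "BB") or None for a no-call. Only SNPs called on both
    platforms are compared.
    """
    shared = [
        snp for snp in calls_a
        if snp in calls_b
        and calls_a[snp] is not None
        and calls_b[snp] is not None
    ]
    if not shared:
        raise ValueError("no SNPs were called on both platforms")
    matches = sum(calls_a[snp] == calls_b[snp] for snp in shared)
    return matches / len(shared)


# Toy calls (hypothetical): four SNPs are comparable, three agree.
platform_a = {"snp1": "AA", "snp2": "AB", "snp3": "BB", "snp4": "AB", "snp5": None}
platform_b = {"snp1": "AA", "snp2": "AB", "snp3": "BB", "snp4": "BB", "snp5": "AA"}

concordance = genotype_concordance(platform_a, platform_b)
print(f"concordance: {concordance:.2%}")
```

Excluding no-calls from the denominator is one convention among several; counting them as discordant would give a lower, more conservative figure.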

Comparing the technologies, the aCGH uses single sequence oligos of 60 base pairs, two DNAs per hybridization, resolution down to the size of an oligo, no detection of UPD or consanguinity

The SNP part uses multiple oligo sequences, one DNA per hybridization. Resolution is limited by the density of SNPs and the signal to background. It detects UPD and consanguinity, so we can get both kinds of information from the same array

So, what do we mean by absence of heterozygosity? This is distinct from loss of heterozygosity in tumors. It occurs with consanguinity and the associated identity by descent, and it occurs with uniparental disomy. It is present in 100 percent of monosomy rescue, one of the four mechanisms Art proposed in the '80s by which you get UPD, and AOH is present in many, though not all, of the trisomy rescue cases

For this audience, I probably don't have to go into it. But, just to remind you of what UPD is, you can have two different homologous chromosomes from one parent, heterodisomy, or you can have two copies of a single chromosome from one parent, the so-called isodisomy

Examples of syndromes actually include Prader-Willi and Angelman, initially shown to be due to UPD by Rob Nicholls. There are some other conditions that you know about, and I was kind of curious to see this one put there, maternal UPD14, because, actually, we first described that in the early 1990s

But, suffice it to say there are several syndromes that the clinician would want to investigate by high-resolution genome analysis that you're going to get from this SNP information, not from the copy number information

So, this actually shows you a normal chromosome 15. And we're now comparing the actual array that was designed and implemented with the Agilent team. And we're comparing it to the clinical use or the use of Illumina arrays

And what you can see is, with normal chromosome 15, you have the B allele frequency that has a lot of heterozygotes. And we can clearly see that in the Agilent array, and we also see normal copy number across chromosome 15

The Prader-Willi syndrome deletion, on the Illumina array, of course, you can see the absence of the B allele frequency or the absence of heterozygosity. And you can see copy number deletion

But, note the dynamic range of the copy number information on the SNP array. It's not as robust. And it's not as resolving in its capability

So, to give you an example of the current clinical array that we use, you can see here this is a patient with Prader-Willi syndrome

There's a tremendous dynamic range of the copy number here, as we see by these oligos. We put positive controls there in orange, which are highly variable, high-copy-number genes, like amylase or the defensin or opsin loci, lipoprotein A exon two, etc., and use gender-matched controls

In the current clinical system, we drill down. So, now, we've gone from a genome view to a chromosome view. And you can see the dynamic range is much different from what I just showed you on the SNP array

We have much more robust dynamic range. And we can drill down to the exon. So, this is all the same data from that original genome-wide array I showed you. We're just clicking down in our informatics to get to exon-level resolution

Now, if we find a patient with Angelman isodisomy, we can see on the Illumina array the absence of heterozygosity in the B allele frequency. We can see that here now on this combined array, which is very, very nice. It shows the patient has the normal copy number but has the same chromosome from one individual parent

We can also see for the first time consanguinity. And the clinicians have started to use this idea as a way to decide what test to use in possible consanguineous matings to look at genes that might be of interest

And this just shows you what coefficients of consanguinity occur with different types of matings and then the expected absence of heterozygosity in megabases

So, here, it shows you what it looks like in a case of incest, where we can get detection. On an Illumina array, it just sticks out at you that, pow, there's lots of areas of absence of heterozygosity, much more than you would expect by chance, in this case 700 megabases worth
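The relationship between degree of relatedness and expected absence of heterozygosity can be sketched with simple arithmetic: the expected autozygous fraction of the autosomes equals the inbreeding coefficient F of the child. The F values below are the standard textbook ones. The autosomal length of 2,800 Mb is an assumed round figure, and real cases scatter widely around these expectations.

```python
# Assumed round figure for the total length of the autosomes, in megabases.
AUTOSOMAL_GENOME_MB = 2800

# Standard inbreeding coefficients (F) for the child of each type of mating.
INBREEDING_COEFFICIENT = {
    "parent-child or full siblings": 1 / 4,
    "half siblings or uncle-niece": 1 / 8,
    "first cousins": 1 / 16,
    "second cousins": 1 / 64,
}


def expected_aoh_mb(f, genome_mb=AUTOSOMAL_GENOME_MB):
    """Expected total absence of heterozygosity (Mb) for inbreeding coefficient f."""
    return f * genome_mb


for relationship, f in INBREEDING_COEFFICIENT.items():
    print(f"{relationship}: F = {f:.4f}, expected AOH about {expected_aoh_mb(f):.0f} Mb")
```

Under these assumptions a first-degree relationship gives an expectation of about 700 Mb, the same order as the figure quoted for the incest case above, while first cousins give about 175 Mb.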

And now, we can see the same thing on this combined array but then also get copy number information out of it. But, I wanted to give you--I'm sorry--a direct comparison

So, just to get the global picture here, it's really quite remarkable that you find the same regions of the human genome by this combined array that you found specifically by the SNP array

You can find every single region around the genome for absence of heterozygosity. And this is chromosomes one through nine

And now, if we look at chromosomes 10 through 11, we see remarkable concordance between the two assays that look at the genome SNP information. But, we still maintain the copy number information

Here's a case, another case of incest, suspected brother-sister mating. And clearly, we can see the absence of heterozygosity on the original Illumina array, but we can see the same thing now on this combined array

And now, if we do the head-to-head comparison, where you see the Agilent data of the combined array compared to the Illumina SNP array, we're seeing the same regions of absence of heterozygosity, chromosomes one through nine and 10 through the sex chromosomes

And here the data are quantitated, where in fact, the R squared is 0.995 for the SNP information. There's a couple of regions that were not detected by the Agilent system. But, there's a couple of regions of absence of heterozygosity not detected by the Illumina system also. So, this was very important for us to see

So, the conclusion from this part of the talk is that, with this Agilent SurePrint G3 microarray, the calls for absence of heterozygosity are essentially identical to calls with established SNP array platforms

AOH calls are typically quite large, as determined by recombination breakpoints

Now, we wanted to further look at, well, what are some of the specifications. I think we need to put in these specifications, the new combined arrays

And then they're not intended to provide genotype calls at arbitrary SNP sites. That's not the idea

The boundary of the copy-neutral absence of heterozygosity is limited by SNP density. If you have less SNP density, you're going to have less ability to find the boundary. Copy number is determined entirely by non-SNP oligos, which maximizes the signal-to-noise ratio
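To illustrate why boundary resolution depends on SNP density, here is a minimal sketch of calling AOH blocks as long runs of consecutive homozygous SNPs. It is not the algorithm used by either platform. The 5 Mb and 50-SNP thresholds are hypothetical, and the 60 kb spacing is only the rough average implied by about 50,000 SNPs across about 3,000 Mb.

```python
def _close_run(blocks, run, prev_het, next_het, min_size_bp, min_snps):
    """Record a homozygous run as an AOH block if it is long enough."""
    if len(run) >= min_snps and run[-1] - run[0] >= min_size_bp:
        blocks.append({
            "inner_start": run[0],    # first homozygous SNP in the block
            "inner_end": run[-1],     # last homozygous SNP in the block
            "outer_start": prev_het,  # nearest heterozygous SNP before (or None)
            "outer_end": next_het,    # nearest heterozygous SNP after (or None)
        })


def find_aoh_blocks(snps, min_size_bp=5_000_000, min_snps=50):
    """Find runs of homozygous SNPs on one chromosome.

    snps: list of (position_bp, is_heterozygous), sorted by position.
    The true block boundary lies somewhere between outer_start and
    inner_start (and between inner_end and outer_end), so the uncertainty
    is set by the local SNP spacing.
    """
    blocks, run, prev_het = [], [], None
    for pos, is_het in snps:
        if is_het:
            _close_run(blocks, run, prev_het, pos, min_size_bp, min_snps)
            run, prev_het = [], pos
        else:
            run.append(pos)
    _close_run(blocks, run, prev_het, None, min_size_bp, min_snps)
    return blocks


# Synthetic chromosome: 500 SNPs at 60 kb spacing. SNPs 100..299 are all
# homozygous; elsewhere every third SNP is heterozygous.
SPACING_BP = 60_000
toy_snps = [
    (i * SPACING_BP, False if 100 <= i < 300 else i % 3 == 0)
    for i in range(500)
]
aoh_blocks = find_aoh_blocks(toy_snps)
for block in aoh_blocks:
    print(block)
```

In this toy case each boundary is known only to within one SNP interval (60 kb); halving the SNP density would double that uncertainty.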

So, for what we are interested in, we're very excited about the implementation of this new array

So, you want to know, will the SurePrint microarrays with SNPs still detect exon deletions that are on the current Baylor array

That's a very, very important question to us because we don't want to go backwards in our thinking of resolution of the human genome. We want to go up in pixels, but we also want to get parental inheritance

So, we tested this also in a blinded fashion

Exon-by-exon coverage is far different from gene coverage. It requires custom design. We spent two years implementing this by several different experiments. It's impossible with BAC arrays, and it's impossible with SNP arrays in our hands

You can detect deletion or duplication of one or more exons. And we have a paper in review of about 40 families right now, many of them actually initially thought to have a condition and sent for gene sequencing that came back normal. But, you could find the single exon dropout by this specific array

Here, it gives you some examples of those. So, this is Rubinstein-Taybi syndrome, which can be caused by point mutations in either CREBBP or the EP300 gene

You can see deletions as shown by those red dots. And to the right is the genome browser view showing, in the case of CREBBP, that specifically exon 27 is deleted. In red are the individual oligos interrogating that region. In EP300, exons 24 to 27 are deleted
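A minimal sketch of how exon-level copy number calls could be derived from per-probe log2 ratios, assuming a two-copy reference and several oligos per exon. A heterozygous deletion gives an expected log2 ratio of log2(1/2) = -1, and a single-copy gain gives log2(3/2), about 0.58. The cutoffs, the minimum probe count, and the toy values are hypothetical, not taken from the talk or from any vendor software.

```python
import math
from statistics import median


def expected_log2_ratio(copies, reference_copies=2):
    """Theoretical log2 ratio for a given sample copy number."""
    return math.log2(copies / reference_copies)


def call_exons(probes, loss_cutoff=-0.5, gain_cutoff=0.35, min_probes=3):
    """Call each exon from the median log2 ratio of its probes.

    probes: list of (exon_label, log2_ratio) pairs.
    Returns a dict mapping exon_label to "loss", "gain", "normal",
    or "too few probes".
    """
    by_exon = {}
    for exon, ratio in probes:
        by_exon.setdefault(exon, []).append(ratio)

    calls = {}
    for exon, ratios in by_exon.items():
        if len(ratios) < min_probes:
            calls[exon] = "too few probes"
            continue
        mid = median(ratios)
        if mid <= loss_cutoff:
            calls[exon] = "loss"
        elif mid >= gain_cutoff:
            calls[exon] = "gain"
        else:
            calls[exon] = "normal"
    return calls


# Toy probe values (hypothetical): one deleted exon flanked by normal exons.
toy_probes = [
    ("exon 26", 0.05), ("exon 26", -0.08), ("exon 26", 0.02),
    ("exon 27", -0.95), ("exon 27", -1.10), ("exon 27", -0.88),
    ("exon 28", 0.10), ("exon 28", -0.03), ("exon 28", 0.04),
]
exon_calls = call_exons(toy_probes)
print(exon_calls)
```

Using the median across several probes per exon makes the call robust to one noisy oligo, which is one reason dense exon-by-exon probe coverage matters for single-exon events.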

Interestingly, conditions like the STAR syndrome just described two years ago, we can pick up exon deletions never reported before. Just point mutations have been reported in these specific conditions

Now, with EP300, here, we want to show you that, clearly, we confirm the array data by MLPA in this instance, and then here is a blow-up of the original data showing that just those exons are deleted. The surrounding exons are not deleted

And we always do an independent molecular confirmation by either MLPA for the copy number or breakpoint sequence analysis across small specific deletions

Now, using the combined array, we can actually find the same exon dropouts. So, we have not compromised the CNV resolution of the human genome by virtue of adding these SNPs onto this array. And this was extremely important to us to move forward in human genome analysis

Same thing for the PTEN exon deletion. I just show you that, using the new array, we can really detect it without a specific problem. Here it shows you the three levels of resolution, from chromosome to chromosome segment down to the individual gene, to try to convince you that, indeed, we still do have that ability to see the exon dropouts

So, our conclusion is that this Agilent SurePrint G3 microarray maintains the resolution and signal to noise of previous Agilent arrays

The current BCM high-resolution array with exon-by-exon coverage for 1700 OMIM genes and SNPs added is being validated and will be launched very soon

A single array can now provide high-resolution array CGH for copy number detection, copy number variation detection and also detection of UPD, consanguinity, and absence of heterozygosity in the human genome of the individual that you are examining

I want to thank the Agilent team for its close association and work with the Baylor investigators to develop this research array that we use toward understanding the causes of different human diseases

Thank you for listening. I'm happy to take questions
