Dr. James Lupski: Thank you very much, Natalie
Obviously, I'm not Art Beaudet, who's in the program. And I want to make it clear that Art put together this presentation but unfortunately had back surgery this week. He's recovering well, but his doctor insists he can't travel
So, I'll be presenting some of this work, which is a combined Baylor and Agilent effort to develop a new array
So, the Baylor team--that shows them in the background--consists of over 15 faculty members and many others involved in trying to bring high-resolution human genome analysis to the clinic
And this is an effort that has been ongoing for seven or eight years. And about 30,000 arrays actually have been signed out to date
If we look at the history of human genome analysis just briefly, we know that the right chromosome count wasn't even known until 1956. And then abnormalities in chromosome number were discovered in 1959 and the early '60s
Banding came in the '70s. And in '81, the first microdeletion syndrome was described at the Kleberg Cytogenetics Lab at Baylor, Prader-Willi syndrome by David Ledbetter and the group there
In the '80s, we had fluorescence in situ hybridization that limited us to small genomic intervals. And the physician needed to know or tell a laboratory what area to look at
Telomere FISH came in the '90s. And that changed the focus of our thinking clinically from locus specific to genome wide. And I don't think we'll ever go back. We have to think genome wide when it comes to traits that we see in humans
The 21st century has brought in the array to scan the entire genome as well as personal genome sequencing. And I'll talk about the first one
And I don't need to talk to this audience about arrays, the fact that we can get over 1,000-fold greater resolution than conventional cytogenetics
Suffice it to say we want to continue to push this technology to understand how changes in the human genome with respect to copy number can be associated with traits
And certainly, we have an over 20-year history at Baylor of copy number variation as it relates to human disease, starting with the elucidation of the CMT1A duplication in 1991
So, the story has been one of increasing pixels. And if we look at the first arrays offered clinically in February of 2004, there were 366 BACs. And as of June last year, one used 180,000 oligonucleotides
So, we see increasing pixels as being an important part of resolution of the human genome. And that has translated into finding clinically relevant copy number variations, as one can see on this graph, by each version of the array, the percent yield that we have had of finding something that has been useful
But, I think it is also important to note the design improvement that comes from a mechanism-based design. As we learn how copy number variation occurs, whether it's by nonallelic homologous recombination, nonhomologous end joining, or replication-based mechanisms, and we learn what are the regions of genomic instability, we can target our approach to better find those areas
As an example, doing a bioinformatic analysis for low-copy repeats, I think Evan Eichler pioneered the idea of genome first to define new genomic disorders
And in his original paper with Andy Sharp, they showed 130 regions. Our bioinformatic analysis shows 250 regions, actually more than 250 because we don't use repeat maskers
And these have all been implemented in our array that has gone into clinic
Addition of the mitochondrial genome has gotten us out of the pure focus on nuclear and Mendelian genetics. And the exon coverage was an important new level of resolution that is yielding a lot in the clinic
The problem is that Baylor currently offers this high-resolution array. But, we also offer an Illumina SNP array to detect uniparental disomy and consanguinity
One needs to choose between these two or order both. No one array can do it all. And there will be different reasons why you might want to order different arrays
So, why are UPD and consanguinity important? Better detection of CNVs associated with Angelman, Prader-Willi, and other UPD syndromes. I think a lot of this is also motivated by the fact that UPD was described at Baylor by Art Beaudet in the late '80s
Detection of regions of identity by descent contributing to recessive disorders in first-cousin matings, which is important in small rural towns and other populations, and detection of incest, especially sexual abuse of young females in the home
So, the objective was to develop an array platform that preserves the resolution, signal to background, and ease of custom design of the current Agilent array, but allows detection of significant blocks of absence of heterozygosity related either to uniparental disomy of a whole chromosome, a chromosome arm, or even potentially some regional uniparental disomy, and also to consanguinity
The system developed by Agilent, the combined CGH and SNP assay, works by measuring SNPs using AluI/RsaI restriction enzyme digestion
If you have a cut, you get a low signal. If you don't have a cut, you get a high signal on your array. But, it's incorporated directly into the same protocol, so you don't lose any of your dynamic range for the copy number
So, it measures about 50,000 SNPs with an identical CGH protocol. And the current system is looking at about 300,000 CGH probes and 50,000 SNP probes
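As a rough illustration of the cut/no-cut readout described above, the following sketch assigns each SNP probe an uncut-allele count (0, 1, or 2) from a normalized signal. The cluster centers, the function names, and the example signals are illustrative assumptions, not Agilent's actual calling algorithm.

```python
# Hypothetical sketch: map a normalized SNP-probe signal to the number of
# uncut alleles (0, 1, or 2). The cluster centers are assumed values chosen
# for illustration only; a real assay would estimate them from the data.
ASSUMED_CENTERS = {0: 0.1, 1: 0.5, 2: 1.0}


def call_uncut_copies(signal, centers=ASSUMED_CENTERS):
    """Return the uncut-allele count whose center is nearest to the signal."""
    return min(centers, key=lambda copies: abs(signal - centers[copies]))


def is_heterozygous(signal):
    """One cut and one uncut allele means the site is heterozygous."""
    return call_uncut_copies(signal) == 1


# Made-up signals: both alleles cut, one allele cut, neither allele cut.
signals = [0.08, 0.47, 0.95]
calls = [call_uncut_copies(s) for s in signals]
```

Only the heterozygous/homozygous distinction matters for absence of heterozygosity, which is why the sketch exposes `is_heterozygous` separately.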
And when we worked with them and then did side-by-side experiments in our laboratory at the MGL, it was shown that, if you run the Agilent against the Illumina platform on specific samples, in this case four different HapMap samples, the concordance is really quite remarkable. So, we're talking greater than 98 percent for these data
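A concordance figure of this kind can be computed as the fraction of SNPs that receive the same call on both platforms. The sketch below is a minimal version under assumptions: calls are encoded as uncut-allele counts, `None` marks a no-call, and the example values are made up rather than taken from the HapMap comparison.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of SNPs with identical calls, skipping no-calls (None)."""
    compared = [(a, b) for a, b in zip(calls_a, calls_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no SNPs were called on both platforms")
    matches = sum(1 for a, b in compared if a == b)
    return matches / len(compared)


# Made-up calls for five SNPs; the last SNP is a no-call on platform A.
platform_a = [0, 1, 2, 1, None]
platform_b = [0, 1, 2, 2, 1]
concordance = genotype_concordance(platform_a, platform_b)
```

Excluding no-calls from the denominator is one possible convention; counting them as discordant would give a lower number.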
Comparing the technologies, the aCGH uses single sequence oligos of 60 base pairs, two DNAs per hybridization, resolution down to the size of an oligo, no detection of UPD or consanguinity
The SNP part uses multiple oligo sequences, one DNA per hybridization. Resolution is limited by the density of SNPs and the signal to background. It detects UPD and consanguinity, so we can get both kinds of information from the same array
So, what do we mean by absence of heterozygosity? This is distinct from loss of heterozygosity in tumors. It occurs with consanguinity and the associated identity by descent, and it occurs with uniparental disomy. It is present in 100 percent of monosomy rescue, one of the four mechanisms that Art proposed in the '80s by which you get UPD, and AOH is present in the majority of trisomy rescue cases
For this audience, I probably don't have to go into it. But, just to remind you of what UPD is, you can have two of the different chromosomes from one parent, heterodisomy, or you can have two copies of only one chromosome from one parent, the so-called isodisomy
Examples of syndromes actually include Prader-Willi and Angelman, initially shown to be due to UPD by Rob Nicholls. There are some other conditions that you know about, and I was kind of curious to see this one put there, maternal UPD14, because, actually, we first described that in the early 1990s
But, suffice it to say there are several syndromes that the clinician would want to investigate by high-resolution genome analysis that you're going to get from this SNP information, not from the copy number information
So, here actually shows you normal chromosome 15. And we're now comparing the actual array that was designed and implemented with the Agilent team. And we're comparing it to the clinical use or the use of Illumina arrays
And what you can see is, with normal chromosome 15, you have the B allele frequency that has a lot of heterozygotes. And we can clearly see that in the Agilent array, as well as normal copy number across chromosome 15
The Prader-Willi syndrome deletion, on the Illumina array, of course, you can see the absence of the B allele frequency or the absence of heterozygosity. And you can see copy number deletion
But, note the dynamic range of the copy number information on the SNP array. It's not as robust. And it's not as resolving in its capability
So, to give you an example of the current clinical array that we use, you can see here this is a patient with Prader-Willi syndrome
There's a tremendous dynamic range of the copy number here, as we see by these oligos. We put positive controls there in orange, which are highly variable, high-copy-number genes, like amylase, the defensin and opsin loci, lipoprotein(a) exon two, etc., and we use gender-matched controls
In the current clinical system, we drill down. So, now, we've gone from a genome view to a chromosome view. And you can see the dynamic range is much different from what I just showed you on the SNP array
We have much more robust dynamic range. And we can drill down to the exon. So, this is all the same data from that original genome-wide array I showed you. We're just clicking down on our informatics to get to exon-involved resolution
Now, if we find a patient with Angelman isodisomy, we can see on the Illumina array the absence of heterozygosity in the B allele frequency. We can see that here now on this combined array, which is very, very nice. It shows the patient has the normal copy number but has the same chromosome from one individual parent
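The contrast between the deletion case and the isodisomy case comes down to reading two signals together: copy number and heterozygosity. A minimal sketch of that logic follows. The cutoffs and labels are assumptions for illustration, not validated clinical thresholds, and copy-number gains are not handled.

```python
def classify_region(mean_log2_ratio, het_fraction,
                    loss_cutoff=-0.5, het_cutoff=0.05):
    """Classify a region from its mean log2 ratio and heterozygous SNP fraction.

    Both cutoffs are assumed illustrative values, not validated thresholds.
    """
    has_loss = mean_log2_ratio <= loss_cutoff
    has_aoh = het_fraction <= het_cutoff
    if has_loss and has_aoh:
        return "deletion"
    if has_aoh:
        return "copy-neutral AOH"
    if has_loss:
        return "copy-number loss with heterozygosity (check data)"
    return "normal"


# Made-up summary values for three regions.
deletion_case = classify_region(mean_log2_ratio=-1.0, het_fraction=0.0)
isodisomy_case = classify_region(mean_log2_ratio=0.0, het_fraction=0.01)
normal_case = classify_region(mean_log2_ratio=0.02, het_fraction=0.30)
```

Copy-neutral AOH on its own does not say whether the cause is isodisomy or identity by descent; that depends on how the blocks are distributed across the genome.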
We can also see for the first time consanguinity. And the clinicians have started to use this idea as a way to decide what test to use in possible consanguineous matings to look at genes that might be of interest
And this just shows you what coefficients of consanguinity occur with different types of matings and then the expected absence of heterozygosity in megabases
So, here, it shows you what it looks like in a case of incest, where we can get detection. And an Illumina array just sticks out at you that, pow, there's lots of areas of absence of heterozygosity, much more than you would expect by chance, in this case 700 megabases worth
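The relationship between the type of mating and the expected amount of absence of heterozygosity can be approximated as the child's inbreeding coefficient multiplied by the genome length. The sketch below uses the standard pedigree coefficients; the autosomal length of about 2,900 megabases is an assumed round figure, and the numbers are not taken from the slide.

```python
from fractions import Fraction

# Approximate autosomal genome length in megabases (an assumed round figure).
ASSUMED_AUTOSOMAL_MB = 2900

# Standard inbreeding coefficients for the child of each type of mating.
INBREEDING_COEFFICIENT = {
    "first-degree relatives (parent-child or full siblings)": Fraction(1, 4),
    "second-degree relatives (e.g. uncle-niece, half siblings)": Fraction(1, 8),
    "first cousins": Fraction(1, 16),
    "second cousins": Fraction(1, 64),
}


def expected_aoh_mb(coefficient, genome_mb=ASSUMED_AUTOSOMAL_MB):
    """Expected megabases of AOH: inbreeding coefficient times genome length."""
    return float(coefficient * genome_mb)


expected = {mating: expected_aoh_mb(f)
            for mating, f in INBREEDING_COEFFICIENT.items()}
```

Under these assumptions a first-degree mating gives an expectation of roughly 725 megabases, which is in line with the 700 megabases mentioned for the incest case, while first cousins come out near 180 megabases. Observed values scatter around the expectation because recombination is random.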
And now, we can see the same thing on this combined array but then also get copy number information out of it. But, I wanted to give you--I'm sorry--a direct comparison
So, just to get the global picture here, it's really quite remarkable that you find the same regions of the human genome by this combined array that you found specifically by the SNP array
You can find every single region around the genome for absence of heterozygosity. And this is chromosomes one through nine
And now, if we look at chromosomes 10 through 11, we see remarkable concordance between the two assays that look at the genome SNP information. But, we still maintain the copy number information
Here's a case, another case of incest, suspected brother-sister mating. And clearly, we can see the absence of heterozygosity on the original Illumina array, but we can see the same thing now on this combined array
And now, if we do the head-to-head comparison, where you see the Agilent data of the combined array compared to the Illumina SNP array, we're seeing the same regions of absence of heterozygosity, chromosomes one through nine and 10 through the sex chromosomes
And this quantitates the data, where in fact, the R squared is 0.995 for the SNP information. There's a couple of SNPs that were not detected by the Agilent system. But, there's a couple of regions of absence of heterozygosity not detected by the Illumina system also. So, this was very important for us to see
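For reference, an R squared between paired measurements from two platforms can be computed as the squared Pearson correlation. The sketch below is generic: the talk does not specify which quantity was plotted, so the paired values here are made-up placeholders.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_x * var_y)


# Made-up paired values, e.g. AOH block sizes in Mb from each platform.
platform_a_mb = [1.0, 2.0, 3.0, 4.0]
platform_b_mb = [2.0, 4.0, 6.0, 8.0]
perfect = r_squared(platform_a_mb, platform_b_mb)
```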
So, the conclusion from this part of the talk is that the Agilent SurePrint G3 microarray calls for absence of heterozygosity are essentially identical to calls with established SNP array platforms
AOH regions are typically quite large, as their boundaries are determined by recombination breakpoints
Now, we wanted to look further at what some of the specifications are. I think we need to spell out these specifications for the new combined arrays
They're not intended to provide genotype calls at arbitrary SNP sites. That's not the idea
The boundary of the copy-neutral absence of heterozygosity is limited by SNP density. And if you have less SNP density, you're going to have less ability to find the boundary. Copy number is determined entirely by non-SNP oligos, which maximizes the signal-to-noise ratio
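The point about SNP density limiting the boundary can be made concrete with a simple run-finding sketch. It scans sorted SNP calls on one chromosome, reports runs of homozygous SNPs above a size threshold, and records both the inner bounds (first and last homozygous SNP) and the outer bounds (the flanking heterozygous SNPs). The true boundary lies somewhere between the two. The function name, the 5 Mb threshold, and the input data are assumptions; real callers also tolerate occasional genotyping errors, which this sketch does not.

```python
def find_aoh_blocks(snps, min_size_bp=5_000_000):
    """Find runs of homozygous SNPs on one chromosome.

    snps: list of (position_bp, is_heterozygous), sorted by position.
    Returns one dict per run whose inner span is at least min_size_bp.
    """
    blocks = []
    run_start = None  # index of the first homozygous SNP in the current run
    # A sentinel heterozygous entry closes a run that reaches the last SNP.
    for i, (pos, is_het) in enumerate(snps + [(None, True)]):
        if not is_het:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            first, last = snps[run_start][0], snps[i - 1][0]
            if last - first >= min_size_bp:
                blocks.append({
                    "inner_start": first,
                    "inner_end": last,
                    # None means the run touches the end of the SNP list.
                    "outer_start": snps[run_start - 1][0] if run_start > 0 else None,
                    "outer_end": pos,
                })
            run_start = None
    return blocks


# Made-up calls: one 7 Mb homozygous run flanked by heterozygous SNPs.
example_snps = [
    (1_000_000, True), (2_000_000, False), (5_000_000, False),
    (9_000_000, False), (9_500_000, True), (10_000_000, False),
    (11_000_000, True),
]
blocks = find_aoh_blocks(example_snps)
```

With sparser SNPs the gap between the inner and outer bounds widens, which is the loss of boundary precision described above.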
So, for what we are interested in, we're very excited about the implementation of this new array
So, you want to know, will the SurePrint microarrays with SNPs still detect exon deletions that are on the current Baylor array
That's a very, very important question to us because we don't want to go backwards in our thinking of resolution of the human genome. We want to go up in pixels, but we also want to get parental inheritance
So, we tested this also in a blinded fashion
Exon-by-exon coverage is far different from gene coverage. It requires custom design. We spent two years implementing this by several different experiments. It's impossible with BAC arrays, and it's impossible with SNP arrays in our hands
You can detect deletion or duplication of one or more exons. And we have a paper in review on about 40 families right now, many of them actually initially thought to have a condition and sent for gene sequencing that came back normal. But, you could find the single-exon dropout with this specific array
Here, it gives you some examples of those. So, this is Rubinstein-Taybi syndrome, which can be caused by point mutations in either CREBBP or the EP300 gene
You can see deletions as shown by those red dots. And to the right is the genome browser view showing, in the case of CREBBP, that specifically exon 27 is deleted, with the individual oligos interrogating that region shown in red. In EP300, exons 24 to 27 are deleted
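Exon-level calling of this kind can be pictured as averaging the log2 ratios of the oligos that fall within each exon and flagging exons whose average indicates a loss. The sketch below is a simplified illustration: a heterozygous deletion ideally gives log2(1/2) = -1, but the cutoff of -0.6, the minimum probe count, and the probe values are assumptions, not the laboratory's actual criteria.

```python
from statistics import mean


def flag_deleted_exons(probe_log2_by_exon, loss_cutoff=-0.6, min_probes=3):
    """Return exons whose mean probe log2 ratio suggests a deletion.

    probe_log2_by_exon maps an exon label to the log2 ratios of its probes.
    Exons with fewer than min_probes probes are skipped as unreliable.
    """
    flagged = []
    for exon, ratios in probe_log2_by_exon.items():
        if len(ratios) >= min_probes and mean(ratios) <= loss_cutoff:
            flagged.append(exon)
    return flagged


# Made-up log2 ratios for three neighboring exons of a hypothetical gene.
example_gene = {
    "exon 26": [0.02, -0.05, 0.04],
    "exon 27": [-0.90, -1.10, -0.95],
    "exon 28": [0.01, 0.03, -0.02],
}
deleted = flag_deleted_exons(example_gene)
```

A call like this would still need independent confirmation before being reported.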
Interestingly, conditions like the STAR syndrome just described two years ago, we can pick up exon deletions never reported before. Just point mutations have been reported in these specific conditions
Now, with EP300, here, we want to show you that, clearly, we confirm the array data by MLPA in this instance, and then here is a blow-up of the original data showing that just those exons are deleted. The surrounding exons are not deleted
And we always do an independent molecular confirmation by either MLPA for the copy number or breakpoint sequence analysis across small specific deletions
Now, using the combined array, we can actually find the same exon dropouts. So, we have not compromised the CNV resolution of the human genome by virtue of adding these SNPs onto this array. And this was extremely important to us to move forward in human genome analysis
Same thing for the PTEN exon deletion. I'm just showing you that, using the new array, we can detect it without any particular problem. This shows you the three levels of resolution, from chromosome to chromosome segment down to the individual gene, to try to convince you that, indeed, we still do have that ability to see the exon dropouts
So, our conclusion is that this Agilent SurePrint G3 microarray maintains the resolution and signal to noise of previous Agilent arrays
The current BCM high-resolution array with exon-by-exon coverage for 1,700 OMIM genes and SNPs added is being validated and will be launched very soon
A single array can now provide high-resolution array CGH for copy number variation detection and also detection of UPD, consanguinity, and absence of heterozygosity in the human genome of the individual that you are examining
I want to thank the Agilent team for its close association and work with the Baylor investigators to develop this research array that we use toward understanding the causes of different human diseases
Thank you for listening. I'm happy to take questions