Today, I'd like to talk about our recent method, cDNA capture, which can simultaneously identify point mutations, [unintelligible] deletions and gene fusions.
The target sequence capture method, usually exon capture from the human genome, has been extensively used to identify point mutations or [unintelligible] deletions that may be the cause of cancer as well as neuronal disorders.
And it's been quite successful. However, there is a problem with that system: the technology cannot identify gene fusions, because gene fusions usually take place in intronic regions, and intronic regions are the targets for breakage and ligation in gene fusions.
And since the information for those intronic regions is discarded in exon capture, the conventional capture method cannot identify gene fusions.
It was not a problem, say, 10 years ago, because it was widely believed that gene fusions were rather specific to hematological malignancies or sarcomas, and not to epithelial tumors.
But we discovered, in 2007, that a subset of lung cancers has a fusion-type tyrosine kinase, EML4-ALK.
And ALK is a receptor-type protein tyrosine kinase in normal cells, but the gene fusion between EML4 and ALK produces a fusion tyrosine kinase that is highly oncogenic.
And we reported the discovery in 2007, and by 2011 the first ALK inhibitor had been approved by the US FDA, and it's clearly one of the most effective anti-cancer drugs for any epithelial tumor.
And in addition to EML4-ALK, Doctor [unintelligible] and his group also independently discovered ETS fusions in prostate cancer. And the frequency of those ETS fusions is very high, almost 30 to 40 percent of all prostate cancer patients. So, those two discoveries made it essential to identify gene fusions in addition to point mutations and [unintelligible] regions.
So, there are many reports trying to identify novel gene fusions by [unintelligible], by sequencing both ends of cDNA fragments, and it works. But the caveat is, it's not so efficient.
So, this slide shows you how many [unintelligible] are assigned to each gene. And here, the genes are ordered according to their [unintelligible] number. I show them here as red bars, and as you see, the bars are extraordinarily enriched in a very small number of genes.
For example, in this satellite analysis, more than 21,000 genes, or more than 21,000 cDNAs, were identified. However, only 56 genes, in other words only 0.26 percent of the total cDNAs, occupy more than half of the total [unintelligible], which indicates that, in our cells, only a small fraction of genes accounts for almost half of all messenger RNAs.
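As a rough illustration of that skew, here is a minimal Python sketch that counts how many top-ranked genes are needed to cover more than half of all reads. It assumes the per-gene quantity on the slide is a read count; the function name and the toy counts are hypothetical and are not data from the talk.

```python
def genes_covering_fraction(read_counts, fraction=0.5):
    """Smallest number of top-ranked genes whose reads exceed `fraction` of all reads."""
    total = sum(read_counts.values())
    running = 0
    # Walk through genes from the most to the least abundant.
    for rank, count in enumerate(sorted(read_counts.values(), reverse=True), start=1):
        running += count
        if running > fraction * total:
            return rank
    return len(read_counts)

# Hypothetical toy counts, only to show the shape of the calculation.
toy_counts = {"GENE_A": 600, "GENE_B": 250, "GENE_C": 100, "GENE_D": 30, "GENE_E": 20}
top = genes_covering_fraction(toy_counts)

# The share quoted in the talk: 56 genes out of roughly 21,000 cDNAs.
talk_share = 100 * 56 / 21000  # about 0.27 percent
```

With "more than 21,000" cDNAs in the denominator, the share comes out slightly lower, which is in line with the 0.26 percent quoted.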
So, if you simply conduct RNA [unintelligible] on unselected messenger RNA or oligo-dT cDNA, it's very time consuming and laborious to comprehensively screen for meaningful fusion genes. So you have to have some targeted selection.
And more than that, there are some reports trying to isolate gene fusions by RNA [unintelligible] or just paired-end sequencing of DNA fragments. One paper showed that screening of 24 breast cancer specimens, as well as breast cancer cell lines, discovered more than 2,100 independent genomic rearrangements. Another report also found that by screening cDNA with RNA [unintelligible], almost 1 percent of cDNAs turn out to be rearranged cDNAs.
So, these data clearly indicate that in cancer cells there is huge chromosomal instability. There are many fusion transcripts, and I personally believe that most of them are passenger fusion transcripts, and only a very small fraction of them may be directly related to carcinogenesis.
So, you need to have some selection to enrich [unintelligible] or to identify cancer-causing gene rearrangements in an efficient way. And since we discovered EML4-ALK, it was essential for us to have the ability to identify gene fusions in addition to point mutations and [unintelligible].
So, we asked ourselves, "How about capturing not only target genomic fragments, but target cancer-related cDNAs?" And if you isolate a cDNA and sequence both ends, you should be able to identify the gene fusions.
For example, in [unintelligible] leukemia cells, all those cells contain [unintelligible] oncogenic fusions. Let's assume that you isolate only [unintelligible] cDNA containing [unintelligible] cDNAs and you fragment those [unintelligible] cDNAs. Even if you don't know whether [unintelligible] is fused to something, in this case BCR, if you have an [unintelligible] isolation probe, you will isolate [unintelligible] cDNAs. And some of them should contain not only the [unintelligible] but also the fragment carrying the fusion partner.
So, theoretically, if you isolate cDNAs and do deep sequencing, you can identify the gene fusions. And in this case, by using the SureSelect system, you don't need to do a huge amount of sequencing. You can just focus on, say, 500 human kinases or maybe several thousand cancer-related genes.
So, we tried to develop a cDNA capture system in which selected cDNAs are isolated and subjected to deep sequencing with a next-generation sequencing system. But we didn't know how efficiently cDNAs could be isolated by the SureSelect system, because every paper was dealing with genomic fragment isolation.
So, we first tried to confirm whether SureSelect could efficiently isolate cDNAs in addition to genomic DNA. We used the SureSelect human X chromosome demo kit to isolate X chromosome fragments. From the same cell line, we isolated genomic DNA in addition to cDNA, and both sets of fragments were subjected to this X chromosome purification kit.
And the bottom line of this slide is shown here. If you isolate cDNAs and map them to the human X chromosome reference sequence, you can expect more than 80 percent purification efficiency. And it's no less than the 75 percent purification [unintelligible] for genomic DNA SureSelect capturing.
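The talk does not define "purification efficiency" precisely. One common reading, and it is only an assumption here, is the percentage of mapped reads that land on the targeted chromosome. A minimal Python sketch under that assumption, with a hypothetical function name and made-up toy reads:

```python
def on_target_percent(read_chromosomes, target="chrX"):
    """Percentage of mapped reads that fall on the target chromosome.

    `read_chromosomes` holds one chromosome name per read, or None if unmapped.
    Treating this ratio as the "purification efficiency" is an assumption.
    """
    mapped = [chrom for chrom in read_chromosomes if chrom is not None]
    if not mapped:
        raise ValueError("no mapped reads")
    on_target = sum(1 for chrom in mapped if chrom == target)
    return 100.0 * on_target / len(mapped)

# Hypothetical toy data: 8 reads on chrX, 2 elsewhere, 1 unmapped.
toy_reads = ["chrX"] * 8 + ["chr1", "chr7"] + [None]
efficiency = on_target_percent(toy_reads)
```

The same calculation can be run on the cDNA library and on the genomic DNA library to compare the two captures side by side.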
So, unexpectedly, the SureSelect-mediated purification with cDNA is very efficient. So, at that time, we determined that we could go forward to do large-scale resequencing with target-captured cancer cDNAs.
And there is another good point for cDNA capturing. This slide shows you how the GC percentage of each purification probe affects the capturing efficiency. And it's well known that a high GC percent significantly affects the capturing efficiency in genomic DNA isolation.
And here, the blue line shows you the read number for each probe, and the probes are sorted according to their GC percent. You can see a very sharp peak at 50 percent, and it goes down on both sides.
However, in cDNAs, the capturing efficiency is less affected by the GC percent of each RNA probe, which may not be surprising, because in genomic isolation there are some exons which have a very high GC content, and those exons may be lost in the purification step.
But in cDNAs, maybe the next exon is not so GC-high. cDNA fragments are randomly generated, and some cDNA fragments may contain not only the high-GC exon but also the adjacent exons, which have an intermediate GC percent.
And [unintelligible] the probes for those exons will efficiently isolate the fragments, which also contain the high-GC-content exons. So, as expected, cDNA capturing is less affected by the GC percent of the purification probe compared to genomic DNA.
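To make the GC argument concrete, here is a small Python sketch that computes GC percent for a GC-rich exon, for its neighbor, and for a cDNA fragment spanning both. The sequences are invented for illustration and the function name is hypothetical; nothing here comes from the actual probe set.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)

# Invented toy sequences: one GC-rich exon and one adjacent, intermediate exon.
high_gc_exon = "GCGCGGCCGC"
adjacent_exon = "ATGCATGCAT"

# In cDNA the introns are gone, so a single fragment can span both exons.
spanning_fragment = high_gc_exon + adjacent_exon

exon_gc = gc_percent(high_gc_exon)
neighbor_gc = gc_percent(adjacent_exon)
fragment_gc = gc_percent(spanning_fragment)
```

The point of the sketch is only that a probe against the intermediate-GC neighbor can pull down a fragment that also carries the GC-rich exon, which a genomic fragment containing that exon alone would not allow.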
And we also asked whether the [unintelligible] number obtained for cDNA capturing may be dependent on the expression level of each messenger RNA. Here, chronic [unintelligible] leukemia cells were subjected to cDNA capturing for 9,013 genes, and you can see this curve of the [unintelligible] number for each gene.
And more than 90 percent of genes were shown to have some [unintelligible]. And the mean [unintelligible] number was more than 38,000 [unintelligible] per gene. We also subjected the same cDNA to gene expression profiling, and it shows you that, as expected, the read number profile is highly concordant with the expression level of each gene. Here, Pearson's correlation coefficient was 0.73, so it's very concordant with the expression level.
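For reference, Pearson's correlation coefficient between per-gene read numbers and expression levels can be computed as in the following Python sketch. The talk does not say whether the values were log-transformed first, so this version works on whatever paired values it is given; the function name and the example vectors are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired values: reads per gene and expression level per gene.
reads_per_gene = [1, 2, 3]
expression_level = [2, 4, 6]
r = pearson_r(reads_per_gene, expression_level)
```

A value near 1 means read number tracks expression closely; the 0.73 reported in the talk indicates a strong but not perfect relationship.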
So, one drawback of our technology is that it may not be able to identify the mutations or [unintelligible] for genes with a very low expression level. That may be a very important drawback, but if we conduct significantly deeper sequencing, maybe we can analyze intermediate or [unintelligible] low-expressed genes in addition to the intermediate or highly expressed genes.
And so, we next asked whether our technology can indeed identify gene fusions. As a positive control, we used a chronic [unintelligible] leukemia cell line. From this KCL-22 cell line, we conducted cDNA capturing coupled with deep sequencing. And from those [unintelligible] sequences, we checked whether there are some [unintelligible] covering the [unintelligible] fusion point.
The purification probes do not contain any BCR affinity probes; the RNA probe set only contains the [unintelligible] isolation probes. But, as expected, you can identify the [unintelligible] sequences which cover the fusion points. Even though there are no BCR probes, the [unintelligible] probes could successfully identify the fragments which cover the direct fusion points.
And you can discover those sequences by making a pipeline which will look at the 5' and 3' [unintelligible] independently and then look for [unintelligible] which map to different genes on the two ends.
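The core of such a pipeline might look like the following Python sketch. It is not the speaker's actual pipeline: it assumes each sequenced fragment has already had its two ends mapped independently to gene names by some aligner, and the function name, the `min_support` threshold and the toy data are all hypothetical.

```python
from collections import Counter

def find_fusion_candidates(end_pairs, min_support=2):
    """Count fragments whose two ends map to different genes.

    `end_pairs` is an iterable of (gene_at_one_end, gene_at_other_end),
    with None for an end that did not map. Gene pairs are treated as
    unordered, and only pairs seen at least `min_support` times are kept.
    """
    counts = Counter()
    for gene_a, gene_b in end_pairs:
        if gene_a is None or gene_b is None:
            continue  # one end did not map, so no evidence either way
        if gene_a != gene_b:
            counts[tuple(sorted((gene_a, gene_b)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical toy input with placeholder gene names.
toy_pairs = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_A"),
    ("GENE_A", "GENE_A"),  # both ends in the same gene: not a candidate
    ("GENE_C", None),      # one end unmapped: skipped
    ("GENE_C", "GENE_D"),  # only one supporting fragment: filtered out
]
candidates = find_fusion_candidates(toy_pairs)
```

Requiring more than one supporting fragment is a design choice in this sketch to suppress chimeric artifacts; a real pipeline would also need reads that span the exact fusion point to confirm a candidate.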
And not only gene fusions, we can of course identify point mutations. Here, from the same cell line, the p53 gene has one point mutation, glycine 266 to arginine. And not only that, we can identify insertions by sequencing those captured [unintelligible], or we can identify internal truncations by analysis of those captured [unintelligible].
And so, our technology can identify gene fusions in addition to point mutations and [unintelligible]. And by applying that technology, we can isolate captured [unintelligible] which have novel gene fusions.
So, this is a relatively simple technology. Of course, we have to set up a custom pipeline, but the technology is rather straightforward. And if you target maybe 3,000 or 5,000 genes, you can simultaneously screen for point mutations, [unintelligible] and gene fusions.
And so, we are now conducting very large-scale screening of various cancer specimens, especially epithelial tumors, to look for cancer-causing gene mutations.
Thanks.