Next Generation Sequencing Roundtable

1. How, in your mind, has pharma thus far handled the challenge of increased NGS data mass as it applies to informatics and computing hardware?

JS: The industry clearly has the ability to work with large and sophisticated datasets – both storage and analysis. My impression from attending a few pharma-related meetings is that these folks are struggling, as is the genomics community, with both the scale and the nature of these data. Assuming that samples aren't limiting (rarely the case), and given rapid decreases in cost, one can amass rapidly growing datasets; just when you thought you had it under control, a new machine comes out that gives higher throughput! Improvements in chemistry and software mean that bioinformatics must continually be updated to accommodate changes in the data/error models. Exacerbating the storage problem, users want to store data off the machine so it can be "called" again as algorithms improve. Identifying the variants is challenging because the reference genome data against which to compare experimental samples aren't yet good enough, though this is improving rapidly with 1000 Genomes and the expanding number of publicly available individual genome datasets. The available analytical software is improving, but it, too, hasn't "settled down." My impression is that pharma is working hard to understand and grapple with these challenges, but they're not solved here or in any other group of active users.
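
To make the storage pressure JS describes concrete, here is a minimal back-of-envelope sketch comparing the cost of keeping read-level data (which is what allows variants to be "called" again as algorithms improve) against keeping only the final variant calls. All per-genome sizes are rough assumptions chosen for illustration, not figures from the discussion.

```python
# Back-of-envelope storage estimate: keeping read-level data so variants
# can be re-called later vs. keeping only the final calls. All sizes
# are assumed round numbers for a deeply sequenced human genome.

FASTQ_GB_PER_GENOME = 100   # assumption: raw reads at ~30x coverage
BAM_GB_PER_GENOME = 80      # assumption: compressed alignments
VCF_GB_PER_GENOME = 0.5     # assumption: called variants only

def cohort_storage_gb(n_genomes, keep_reads=True):
    """Estimated storage for a cohort, with or without read-level data."""
    per_genome = VCF_GB_PER_GENOME
    if keep_reads:
        # Retaining FASTQ + BAM is what makes re-calling possible
        # when algorithms and reference data improve.
        per_genome += FASTQ_GB_PER_GENOME + BAM_GB_PER_GENOME
    return n_genomes * per_genome

if __name__ == "__main__":
    for n in (10, 100, 1000):
        with_reads = cohort_storage_gb(n, keep_reads=True)
        calls_only = cohort_storage_gb(n, keep_reads=False)
        print(f"{n:>5} genomes: {with_reads:>9,.0f} GB with reads, "
              f"{calls_only:>7,.1f} GB calls only")
```

Under these assumptions, a thousand genomes means roughly 180 TB if reads are retained versus about half a terabyte of calls alone, which is why the "store it so we can re-call it" habit dominates the storage budget.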

JV: Even though the pharmaceutical industry was initially slow to adopt NGS technology, it has now made great leaps forward, with NGS being used in all aspects of the drug discovery pipeline, including target discovery, safety assessment, compound screening, clinical platforms, and diagnostics. In terms of data, increased familiarity, the aid of external partners, and the availability of analysis tools have helped with data transfer and storage solutions and with decisions regarding the long-term management of raw and processed data.

GC: I believe that pharma is still working through the growing pains of working with NGS data. Computing and, especially, data storage resources represent a significant hidden cost that comes along with NGS. I think all large institutions are trying to figure out how much of these resources they need and what they are willing to spend to get them. Computing capabilities have increased exponentially with time, but so now have the capabilities and requirements of NGS.

As we have gotten used to doing with microarrays and other more mature platforms, scientists would like to be able to run analyses across data sets generated at different times, by different groups, and with different technologies. We would like to quickly pull up browsers showing high-level results and then drill down into and export the underlying data. The software to do all this for NGS is either very young or non-existent. These software demands are another challenge that pharma is still working to meet.

AB: By and large, pharma has been extremely hesitant to test the waters of NGS. The sheer amount of data, and the infrastructure required to handle it, has been prohibitive given the current economic climate. Most NGS performed by pharma is outsourced. The success of NGS in a traditional research setting can be attributed largely, if not solely, to economic stimulus money that pharma has not had the luxury of leveraging for the capital equipment required to support NGS systems.

2. In your opinion, has NGS replaced alternative techniques such as the use of genome and expression microarrays? If not, what still needs to happen before it does?

JS: Microarrays continue to cost less, and arguably their data quality can be better. With the ability to design custom arrays, it may make sense, at least for the near future, to analyze an initial set of samples by NGS, identify the variants likely to be of interest, and then build and use an array for analyzing a large number of samples. The knowledge base for how best to prepare RNA samples and enrich for a desired fraction is improving. NGS costs still need to decrease because very deep sequencing is needed. Longer reads would be of high value for assessing allele-specific splice variant expression. The role of RNA modification is very hard to assess today; new technologies may make this aspect of the data accessible.
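
As a minimal illustration of the two-stage workflow JS outlines (sequence a pilot set by NGS, then carry only the interesting variants onto a custom array), the sketch below filters a pilot variant table down to array candidates. The file name, column names, and thresholds are hypothetical, invented purely for illustration.

```python
# Hypothetical triage step: filter a pilot NGS variant table down to the
# candidates worth genotyping at scale on a custom array.

import csv

def select_array_candidates(variant_table, min_quality=30.0,
                            min_carrier_count=2):
    """Yield variants seen confidently in enough pilot samples to be
    worth placing on a custom genotyping array."""
    with open(variant_table, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Assumed tab-separated columns: chrom, pos, ref, alt,
            # qual (call confidence), n_carriers (pilot samples
            # carrying the variant).
            if (float(row["qual"]) >= min_quality and
                    int(row["n_carriers"]) >= min_carrier_count):
                yield row["chrom"], int(row["pos"]), row["ref"], row["alt"]

# Usage: candidates = list(select_array_candidates("pilot_variants.tsv"))
```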

JV: Within the last few years, the use of NGS for genome sequencing seems to have rapidly replaced traditional sequencing methods, and the application of ChIP-seq technologies has tremendously advanced the field of epigenetics. Most remarkably, there is confidence in the methods for data processing and analysis. However, RNA-seq has yet to widely replace microarray technology, with the major limitation being the availability of software algorithms that can reliably analyze the data. Another hurdle to be overcome is the need for NGS technology to be FDA approved before it can be routinely applied to clinical and vaccine development and molecular diagnostics as an alternative to current processes.

GC: NGS is starting to replace these techniques, and there is little reason why NGS will not eventually supplant them completely. However, I believe that point is still some years off. The bottleneck of computing resources, the lack of mature database and data manipulation tools, and the still-higher cost per sample are significant obstacles that must be overcome. Pharma has invested greatly in building reliable microarray infrastructure over the past decade, and it will be some time before NGS infrastructure is built up to a similar point.

There is also the issue of what questions scientists are asking. If all someone wants to know is how the expression levels of a handful of genes change with some treatment, why go to all the extra cost and effort of an NGS experiment when a microarray experiment gives you mostly the same data? Scientists looking at things like alternative exon usage or rare genomic mutations have already switched to NGS. Other researchers will not switch until the practical differences between NGS and arrays disappear.

AB: In some cases NGS has displaced, or is getting ready to displace, these technologies; however, the biggest factor affecting the decision has much less to do with cost and more to do with sensitivity. NGS is an attractive solution when sensitivity is needed at the single-base-pair level. The argument for NGS is that while this sensitivity may not be immediately required, the investment in NGS data will prove useful if the need for a more sensitive analysis arises.

3. Genomics is undoubtedly the field that has benefitted the most from the progression of NGS. In the past year, what other areas have benefitted and how have they done so?

JS: A handful of cases of clinical utility – supporting critical therapeutic decisions for individual patients – have been published, and we expect steady growth. That said, these success cases represent a fraction of the attempts. Additionally, NGS technologies have recently been used to understand the spread of infectious diseases, such as the E. coli outbreak that caused hemolytic-uremic syndrome in Europe this summer and the Haitian cholera outbreak. NGS is also being proposed as a quality control step for drug formulations (e.g., the discovery of viral DNA in vaccines). And there's certainly interest, though I'm not aware of practical use, in monitoring for bio-threats.

JV: NGS has commonly been applied to areas such as exon sequencing, epigenetics, and gene expression profiling. Areas that have emerged more recently include the use of NGS to elucidate DNA methylation patterns, microRNA functionality, copy number variation, and biomarkers. These can now be combined effectively and applied to patient stratification and personalized medicine, providing more effective and targeted therapies alongside a more complete picture of the processes within the model system.

GC: Genomics is a pretty broad term, and I believe it still covers most of the compelling uses of NGS. For example, there have been significant efforts to sequence the genomes of microbial communities in different niches of the human body as well as in the environment. NGS is starting to be used in the clinic for designing personalized treatment plans and enrolling patients in clinical trials. NGS continues to be used for mapping rare Mendelian mutations and common tumor mutations. The other major use for NGS, besides genome sequencing, is transcriptome sequencing. For the reasons described earlier, its uptake as a replacement for expression microarray analysis will be slower and will depend on improving infrastructure, decreasing costs, and compelling biological rationales.

AB: NGS has changed the way in which folks tackle scientific questions. Initially, the direct access to genomic information was so attractive that researchers did away with more complex systems for interrogating this information and replaced them with sequencing, a much more direct approach. More recently, the cost-effectiveness and accessibility of these data have taken this a step further, with scientists developing complex systems to leverage the power of next-generation sequencing. This is seen in the field of transcriptomics, which converts RNA to cDNA for sequencing, and in proteomics, where immunoprecipitation is used to study protein-DNA interactions. As sequencing throughput rises and associated costs continue to plummet, this trend will likely continue.

4. Can you comment on the role external partners are playing with regard to the use of NGS in drug discovery?

JV: External partners offer everything from tools for efficient data management, transfer, and storage to complex all-in-one packages for data analysis and annotation. The impact is that drug-discovery researchers have time to find answers to the questions at hand without spending significant amounts of time assembling and processing raw data. The availability of public tools has also contributed greatly to the ease and throughput with which NGS data are handled.

GC: In a field this new, which requires very specific resources and expertise to handle well, external partners are playing a major role. The NGS technology companies themselves are not only selling the sequencers but are also performing sequencing and analysis as a service. Third parties such as Complete Genomics and the Beijing Genomics Institute are also major players in both sequencing and analysis. In addition, there are companies that provide analysis software and database solutions, both for in-house installation and for cloud storage/computing. This technology is still changing quickly, and it does not make sense for pharma to invest too heavily in something that may soon be obsolete.

5. What impact has the emergence of cloud computing had on the application of NGS?

JS: My impression is that NGS hasn't yet moved in very substantive ways to cloud computing, but the potential here is substantial. This is of particular interest for public datasets, which could reside in one (or a few) centralized locations so that analysis could be brought to the data, instead of large numbers of labs needing to download (the size of the pipes is limiting!) and store the same very large datasets. But even for private data, private clouds could reduce resource needs when there are multiple users. Few of the tools used today for genomic analysis are adapted to the cloud, but this will be rectified. Data security and access are also concerns that are being addressed.

JV: Cloud computing is having a great impact on the way bioinformatics is done, with applications not only to NGS but also to other tools such as literature mining and translational biology. Cloud computing has made data more portable between data centers, customers, and partners, thereby facilitating more collaborative research.

GC: Cloud computing is just a trendy name for the outsourcing of computing resources. As with all other outsourced functionality, there are both pros and cons. Given the resources required for working with NGS data, today it often makes sense, especially for smaller groups, to rent these resources in the cloud rather than make the substantial investment to bring them in-house. However, given the difficulties in transferring such large data sets over the network, heavy users of NGS may find the latency that comes with cloud computing unacceptable over the long run.
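
To put GC's point about network transfer in rough numbers, the sketch below estimates how long one run's worth of data would take to move to a cloud provider at different link speeds. The dataset size, link speeds, and efficiency factor are assumptions chosen purely for illustration.

```python
# Rough transfer-time arithmetic for moving NGS data to the cloud.
# The dataset size, link speeds, and efficiency factor are assumptions.

DATASET_GB = 200  # assumed output of a single sequencing run

def transfer_hours(size_gb, link_mbps, efficiency=0.7):
    """Hours to move size_gb over a link_mbps connection, assuming only
    `efficiency` of nominal bandwidth is achieved in practice."""
    bits = size_gb * 8e9
    return bits / (link_mbps * 1e6 * efficiency) / 3600

if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):  # fast Ethernet, GigE, 10 GigE
        print(f"{DATASET_GB} GB over {mbps:>6} Mbps: "
              f"~{transfer_hours(DATASET_GB, mbps):.1f} hours")
```

Under these assumptions, a single 200 GB run takes over six hours on a 100 Mbps link, which is the kind of latency a heavy user would face on every run.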

AB: Cloud computing has decreased, and will certainly continue to decrease, the capital equipment costs for organizations interested in setting up an NGS lab. This makes NGS a more attractive option for organizations without the financial means to set up the required data storage hardware.
