In March, as the coronavirus pandemic spread from Asia to the Western world, drug discovery came to a near standstill. Most laboratories shut down, and instruments and reagents were left untouched except for the most essential work. The pandemic forced large and small companies, regulatory and government agencies, and academia to tap into technology, particularly artificial intelligence (AI) and machine learning (ML), for more than just speed and efficiency. “What we see at this point and will continue to see for the next two years and beyond are predominantly early-stage, proof-of-concept, and feasibility pilot studies demonstrating the high potential of numerous AI techniques for improving early-stage drug discovery and the performance of clinical trials,” says Stefan Harrer, Ph.D., Manager and Research Staff Member, Brain-Inspired Computing, IBM Research. “Obviously, the ongoing pandemic continues to be a huge stimulus to this exploration, and notable progress has been made already. But make no mistake, further focused research and development are needed to ensure the viability of these innovations.”

According to Harrer, while there are no shortcuts around scientific rigor and experimentation, AI can certainly accelerate the discovery of new drugs, particularly when combined with high-performance computing (HPC) and quantum computing. As part of the COVID-19 HPC Consortium, his colleague Jason Crain is combining advanced molecular simulations with AI-driven screening to identify compounds that could be repurposed as candidate antiviral drugs. “He is running this on two of the fastest supercomputers in the world, basically compressing months of research into hours,” says Harrer.

Similarly, in March, when the White House released more than 45,000 PDF documents on COVID-19 and coronavirus for experts to examine, researchers turned to AI. “Obviously, no human can read all these documents in any reasonable amount of time. Hence, the IBM lab in Switzerland trained an ML model on a few dozen of these papers. Once the model was trained, the IBM corpus conversion tool automatically ingested the remaining thousands of PDFs in a few days,” says Harrer. “More than 450 researchers are now posing queries to this freely available deep search tool to find out which drugs have been used so far against the coronavirus, what their outcomes were, and much more.”
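
To make the ingest-then-query pattern Harrer describes concrete, here is a minimal sketch of a literature search pipeline in Python. This is not IBM's corpus conversion tool; it stands in scikit-learn TF-IDF retrieval for the trained model, and the paper snippets and query are hypothetical placeholders.

```python
# Minimal sketch of a corpus-search pipeline: ingest documents once,
# then rank them against free-text queries. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for the extracted text of ingested PDFs
papers = [
    "Drug A showed reduced recovery time in hospitalized patients.",
    "Drug B trial reported no significant clinical benefit.",
    "Drug C lowered mortality in patients on ventilation.",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(papers)  # one row per ingested paper

def search(query, top_k=2):
    """Rank ingested papers by cosine similarity to the query."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(papers[i], round(float(scores[i]), 3)) for i in ranked]

for text, score in search("which drugs reduced mortality"):
    print(score, text)
```

A production system would add PDF text extraction and a far richer language model, but the retrieval loop has the same shape: vectorize the corpus once, then score each incoming query against it.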

AI Is Here to Stay. Now What?

Ben Miles, Ph.D., is Head of Product at Strateos, a company whose cloud-based software connects companies' existing laboratory infrastructure to generate the data they need. The company's remote robotic labs seamlessly automate and integrate existing equipment and workflows to help scientists design and run experiments. “We are helping the physical world meet the digital world,” says Miles. “We are using statistical techniques to generate ML models to do things that were previously not possible and to identify possibilities that did not exist.”

Such applications of AI/ML have certainly become more feasible and accessible thanks to the plethora of products and services offered by large companies like Amazon, Microsoft, and Google, as well as by smaller informatics-driven companies. Researchers can choose from several open-access and commercially available cloud computing and data analysis software products and quickly analyze massive amounts of data with complex algorithms. So the critical questions now are: How do you evaluate and invest in AI and ML tools, knowing that each one has its own set of advantages and limitations? How do you know which one will best address your query? And, more importantly, how can you assess the accuracy and reliability of the predictions?

Allan Jordan, Ph.D., Director of Oncology Drug Discovery at Sygnature Discovery, believes that “the key is to be able to clearly articulate the specifics of what you want the tool to achieve, or the specific question you want answered. You then take a long, objective look at how the tool seeks to answer the question, and the relevance of the dataset it uses to propose an answer. The quality of the answer, in turn, is entirely dependent upon the quality and relevance of the data upon which it rests.”

Evaluating new AI technologies, particularly in areas of drug discovery with few demonstrated successes, can be a real challenge. “If you plan to work in novel areas of science, do not be seduced by powerful-looking retrospective case studies on well-worked and well-exemplified areas,” says Jordan. “Instead, try to understand how the system can work in unknown areas—those ground-breaking areas of science with little or no exemplification. Can it answer a question that is relevant to your work, even if the specific answer does not exist?” Some areas are beginning to see more holistic uses of AI and ML, such as suggestive medicinal chemistry, where systems work alongside chemists to interpret data and suggest potential drug targets or new data to acquire. In integrated synthesis systems, the algorithm learns from the data, for example from a microfluidics assay in real time, and alters the molecules being made in a flow system to investigate structure-activity relationships (SAR) as it goes. “However, for de novo drug design, target identification, or using natural language processing to suggest opportunities for drug repurposing, the evidence that AI works well is less clear,” says Jordan.
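
A rough sketch of that learn-as-you-go loop: a surrogate model is retrained after each assay readout and picks the next compound to make. The three-column descriptors, the Gaussian process surrogate, and the toy_assay() function are all illustrative assumptions, not a real microfluidics or flow-chemistry interface.

```python
# Closed-loop SAR exploration, in miniature: measure, retrain, propose.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
candidates = rng.uniform(0, 1, size=(50, 3))  # toy molecular descriptors

def toy_assay(x):
    """Stand-in for a real-time assay readout of one compound."""
    return -np.sum((x - 0.6) ** 2) + rng.normal(scale=0.01)

# Seed the loop with two measured compounds, then let the model choose.
tested_idx = [0, 1]
activities = [toy_assay(candidates[i]) for i in tested_idx]

for _ in range(5):
    model = GaussianProcessRegressor().fit(candidates[tested_idx], activities)
    mean, std = model.predict(candidates, return_std=True)
    mean[tested_idx] = -np.inf             # never remake a tested compound
    next_idx = int(np.argmax(mean + std))  # favor promising, uncertain space
    tested_idx.append(next_idx)
    activities.append(toy_assay(candidates[next_idx]))
    print(f"made compound {next_idx}, activity {activities[-1]:.3f}")
```

The acquisition rule here (predicted activity plus uncertainty) is one common choice; real systems differ in the surrogate model and in how candidates are enumerated, but the measure-retrain-propose cycle is the core of the approach Jordan describes.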

Organizations are also incorporating their electronic lab notebook (ELN) data into customized implementations of these systems, in order to predict not just what worked, but what didn't. “I think the predictive synthesis platforms are just about starting to become credible—systems like Synthia, retrosynthesis software from Sigma-Aldrich, though I think they would benefit from more failed reaction data,” says Jordan.
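
To see why failed reactions matter, consider a minimal, hypothetical sketch of the kind of model ELN data could feed: a classifier trained on both successful and failed reaction records. The condition features and labels below are invented for illustration and are unrelated to Synthia's actual methods.

```python
# Toy reaction-outcome classifier trained on successes AND failures.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical ELN records. Columns: temperature (C), reagent equivalents,
# reaction time (h).
X = np.array([
    [25, 1.1, 2], [80, 2.0, 12], [60, 1.5, 6], [25, 0.9, 1],
    [100, 2.5, 24], [40, 1.2, 4], [25, 2.0, 18], [70, 1.0, 8],
])
y = np.array([0, 1, 1, 0, 1, 0, 1, 1])  # 1 = reaction worked, 0 = failed

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Estimated probability that a proposed set of conditions succeeds
proposed = np.array([[65, 1.4, 6]])
print("P(success):", model.predict_proba(proposed)[0, 1])
```

Without the failure rows (the 0 labels), the model has no decision boundary to learn; that is precisely the negative data Jordan wants reported.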

Predicting with Cautious Optimism

AI and ML have generated many predictions and promises over the years, but some have proved less credible and impactful than hoped, and expectations are now becoming more realistic. According to Jordan, the future of AI will be less dramatic than predicted. “Many times, we have heard that AI and ML will make the chemist obsolete, but I don’t think that’s going to happen just yet. However, it may improve the quality of decision-making and may remove some of the intrinsic biases that we all carry.”

AI/ML tools are being used quite successfully in clinical and diagnostic applications to stratify patients and to help identify responders and non-responders to a drug at earlier time points. They are also being used to assess cross-over between the control and treatment arms and to create complex clinical diagnostics with complicated decision rules and many layers of information. TrialSpark, a startup, is working to accelerate COVID-19 clinical trials and lower their costs by finding patients digitally through social media and using telemedicine to conduct at-home testing and collect patient data.
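
As a toy illustration of early responder stratification (not TrialSpark's methodology, and built entirely on synthetic data), a simple classifier trained on an early biomarker panel can flag likely non-responders for closer review:

```python
# Synthetic responder/non-responder stratification from early biomarkers.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 200
biomarkers = rng.normal(size=(n, 4))  # hypothetical week-2 lab panel
# Synthetic ground truth: response driven by two of the four markers
responded = (biomarkers[:, 0] + 0.5 * biomarkers[:, 2]
             + rng.normal(scale=0.5, size=n)) > 0

X_train, X_test, y_train, y_test = train_test_split(
    biomarkers, responded, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Patients with a low predicted response probability get flagged early
flagged = clf.predict_proba(X_test)[:, 1] < 0.3
print("patients flagged for early review:", int(flagged.sum()))
```

Real stratification models use far richer inputs (genomics, imaging, longitudinal vitals), but the payoff is the same: a defensible, data-driven signal at an earlier time point than the trial endpoint.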

However, large-scale datasets remain largely siloed, according to Kuan-Fu Ding, Ph.D., Chief Science Officer at Catalytic Data Science, who has led data science and computational biology efforts at various data-driven companies. “In the era of big data, data centralization and accessibility will be absolutely critical to yield more successful applications of AI/ML in medicine,” says Ding. “Over the next two years, there will be further advancement and solutions built around data sharing, data wrangling, and data engineering.” Hence, according to Ding, it is very important not to apply AI/ML algorithms blindly. “You must first understand the algorithms and their underlying principles, and take care not to arbitrarily set thresholds (e.g., parameter thresholds, significance thresholds, etc.). Verify that the data (and metadata) being used are correct, and discuss conclusions with area experts like clinicians, chemists, and geneticists. Remember that computers cannot reason!”
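
Ding's point about thresholds is easy to demonstrate. In this sketch on synthetic p-values, the same screen yields very different "hit" lists depending on the cutoff chosen and on whether a multiple-testing correction is applied:

```python
# How an arbitrary significance threshold changes conclusions. Synthetic data.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
# 1,000 tests: mostly null results, plus 10 strong true effects
p_values = np.concatenate([rng.uniform(size=990),
                           rng.uniform(high=1e-4, size=10)])

for alpha in (0.05, 0.01):
    raw_hits = int((p_values < alpha).sum())
    # Benjamini-Hochberg false discovery rate correction
    corrected = multipletests(p_values, alpha=alpha, method="fdr_bh")[0]
    print(f"alpha={alpha}: {raw_hits} raw hits, "
          f"{int(corrected.sum())} after FDR correction")
```

The raw count at alpha = 0.05 is dominated by false positives from the 990 null tests, which is exactly the trap of setting a threshold without understanding what the algorithm is doing.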

“As with any emerging and evolving technology, don’t fear it, but try it, explore it, and see what value it adds,” says Jordan. “I think it’s fair, and probably important, to remain a little skeptical—be led by the ‘real’ data, not just the predictive datasets, and be aware of the limitations of the predictions. It’s a tool that will become increasingly valuable, but I believe it will never be a substitute for a real lab experiment.”

What You Need to Know About AI

What AI Can Offer:

  • More data access and search capabilities
  • Objective data analysis
  • Enhanced signal from noise in datasets
  • Increase in speed and efficiency
  • Exploration of previously unidentified or ignored areas and ideas

What Can Enhance the AI Offering:

  • Improving data formatting and normalization
  • Reporting negative data to enrich the training sets
  • Improving data curation to better train the models
  • Effective integration into existing workflows
  • Finding ways to capture and report any data ambiguity
  • Eliminating silos and increasing data accessibility
  • Increased commitment to data security and privacy

What You Can Do to Enhance the AI Offering:

  • Know what question you want answered
  • Understand the algorithms and their underlying assumptions
  • Understand the limitations of AI in answering the question
  • Verify the AI data and validate the predictions
  • Know what worked and what didn’t to improve the next iteration of the AI model
  • Share your knowledge and experiences with the community