While convergence research is not a new concept, it has drawn considerable attention recently thanks to new breakthroughs from collaborative work in artificial intelligence and big data. This kind of cross-disciplinary work has long been championed within academia, but recent interest from high-profile groups makes the idea worth revisiting, especially when that interest also brings funding.

Two years ago, the National Science Foundation (NSF) announced ten big ideas for long-term future investment. Among them was convergence research, which NSF described as “research driven by a specific and compelling problem” and one that incorporates “deep integration across disciplines.” With convergence, areas outside of the traditional life sciences, such as engineering, data science, and computation, directly share methods and expertise to develop new approaches and solutions. A more in-depth description and history of convergence can be found in MIT’s 2016 Report.

Following up in a Dear Colleague Letter published last March, NSF announced it would be funding new research ideas that fulfill an interdisciplinary convergence approach (for those interested, a prospectus must be submitted by October 15 to be considered for 2019 funding).

Interest has also emerged from the private sector. In 2016, Facebook’s Mark Zuckerberg and Priscilla Chan launched the science arm of their Chan Zuckerberg Initiative (CZI) with the mission of curing, managing, or preventing all diseases by 2100. Among its approaches, the initiative is building open-source computing tools for data analysis, visualization, and sharing, with a specific focus on machine learning and cloud computing. By fostering direct “on the ground” collaboration between life scientists and computational biologists and software engineers, CZI aims to accelerate research. Perhaps more importantly, this commitment is backed by a generous pledge of $3 billion in investment over the next decade.

CZI is already having an impact on biomedicine. It committed $600 million to establish the CZ Biohub, a medical science research center dedicated to advancing work on infectious disease and the massive Human Cell Atlas project. CZI also announced 85 new projects in support of the Atlas, which aim to create computational tools, algorithms, and visualizations, as well as tools for coordinating the relevant data.

Much of the progress in convergence research is being driven by advances in computing and data science. Many of these computational tools arose out of necessity, to keep pace with the rapidly growing volume of data that biology itself is producing. A PLOS study projected that by 2025, storage of genomic data may reach 40 exabytes (1 exabyte = 10¹⁸ bytes) per year. To put that in perspective, these predicted requirements would exceed those of YouTube and the field of astronomy. Beyond genomics, research labs and clinics constantly produce countless images and associated data, including datasets from high-content screening, microscopy, and digital pathology.

Deep learning

One particularly intriguing area of computation increasingly being applied in the life sciences is a branch of artificial intelligence known as machine learning. At its core, machine learning involves building algorithms that learn from input data. That input trains the system, which can then make predictions when given a brand-new set of data. In a subtype known as “deep learning,” this training is taken further. Deep learning models comprise several hierarchical layers of functions that analyze raw input data. Over many training iterations, these layers learn to recognize key features of the data. In the case of image data, lower layers tend to learn to detect features such as edges, while higher layers can learn fuller shapes and objects. By combining information processed across many layers, deep learning algorithms are well suited to making accurate predictions from complex inputs such as images and text documents. A recent paper in bioRxiv provides more information on deep learning.
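To make the train-then-predict idea concrete, here is a minimal Python sketch. It is purely illustrative and not the method of any group mentioned in this article: it fits a single logistic unit (one artificial “neuron”) to a tiny, made-up one-dimensional dataset by gradient descent, then classifies inputs it never saw during training. The data, the function names `train` and `predict`, and the learning-rate and epoch settings are all hypothetical choices for this example. Deep learning models stack many such units into layers, but their training follows the same basic pattern of repeatedly nudging parameters to reduce prediction error.

```python
import math

# Hypothetical toy training data: inputs below 0 are class 0, above 0 are class 1.
TRAIN_X = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
TRAIN_Y = [0, 0, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.5, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        # Step against the average gradient to reduce the loss.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Classify a new, unseen input using the learned parameters."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


w, b = train(TRAIN_X, TRAIN_Y)
# Inputs the model never saw during training:
print(predict(w, b, -2.5), predict(w, b, 2.5))  # prints: 0 1
```

The learned weight ends up positive, so the unit has inferred the rule “larger input means class 1” from labeled examples alone rather than having it programmed in. That shift from hand-written rules to learned parameters is the essential idea that deep learning scales up to images and text.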

Deep learning is directly applicable to medicine, as demonstrated by the Barzilay group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). The group develops deep learning algorithms to answer oncological questions using large amounts of clinical data, including longitudinal mammography images as well as text. A related field known as “natural language processing” (NLP) aims to understand and interpret human language through computation, which makes it particularly useful for parsing text-based hospital reports. By automating this work, such algorithms could not only save a great deal of time but also make cancer predictions more accurate.

“We have a large amount of mammograms and MRIs from Massachusetts General Hospital (MGH) covering hundreds of thousands of exams and predicting things like cancer risk, early detection, recurrence, and response. We also work on a large set of pathology reports (unstructured medical text), and have built an automated system to automatically extract important facts like diagnosis and histological factors,” explains Adam Yala, a Ph.D. candidate who leads the Barzilay lab’s work on parsing breast pathology reports. “We have systems for both mammography and pathology reports deployed at the hospital now.”

“In hindsight, it's easy to see the cancer slowly developing, but it's very hard to identify the subtle pattern as a human. We are working on developing models that explicitly look at how the mammogram changes year over year and tries to diagnose the cancers earlier, and we're quite excited,” adds Yala.

Other applications of machine learning are increasingly dotting the research landscape. In January, CZI partnered with data scientists from UMass Amherst to design an AI-based tool that will dig through millions of published scientific articles. Known as Computable Knowledge, the method uses natural language processing to open new ways of tracking published findings and drawing more meaningful scientific connections. Then last April, a collaboration between the Gladstone Institutes and Google led to the development of a deep learning approach that labels phase contrast images in silico with predicted fluorescent colors. Using this algorithm, cells could be identified as alive or dead with 98% accuracy.

“We wanted to use our passion for machine learning to solve big problems,” notes Philip Nelson, director of engineering at Google Accelerated Science, in an interview with Gladstone. “A collaboration with Gladstone was an excellent opportunity for us to apply our expanding knowledge of artificial intelligence to help scientists in other fields in a way that could benefit society in a tangible way.”

The impact of big data and artificial intelligence on convergence research will be examined in an upcoming meeting hosted by AACR. The special conference, which spans four days, comprises eight sessions with presentations highlighting the application of data science to make predictions on all aspects of cancer—from cause to cure.

“This is going to be transformative,” explains Steven Finkbeiner, who led the Gladstone-Google study. “Deep learning is going to fundamentally change the way we conduct biomedical science in the future, not only by accelerating discovery, but also by helping find treatments to address major unmet medical needs.”