Drug development can be a very lucrative, and very costly, venture. Many more drugs fail in early stages—and even worse, in late stages—than succeed, driving up the average time to bring a new drug to market to about 12 years, at a cost of about $1.3 billion.

Like their counterparts in many other segments, developers of antibody-based pharmaceuticals are exploring new ways to accelerate development and reduce late-stage failures. Key among such efforts is the use of artificial intelligence (AI) tools, and more specifically machine learning (ML) and deep learning (DL). This article examines AI’s increasing role in therapeutic antibody discovery and refinement.

Tradition

Historically, antibodies were created in one of two major ways: “one is to immunize an animal with an antigen, which is the traditional way,” explains Mark DePristo, Co-founder and CEO of BigHat Biosciences. The other is “a sort of molecular biology-based approach using any number of display technologies that got us away from an animal.” The latter used pre-defined libraries of potential antibody binding regions to discover those with affinity to a given antigen.

In either case, hits could be selected and refined by subjecting them to further rounds of screens.

These methods have had great success, he adds. “The problem is they’re expensive. They’re slow. And it’s generally extremely difficult to improve upon what you get either directly out of the animal or from a library.”

AI puts the data to work

AI offers the potential to take an antibody found in a screen and improve on it, using knowledge gained from other, similar, antibodies. It also promises the ability to design an antibody from scratch, by picking and choosing and combining different attributes found in other antibodies. To do so it’s necessary to have the data (already-known antibody sequences or structures), a model (to analyze the data), and raw computing power.

Instead of just going with the hits from the screens you can feed the hits’ antibody sequences into a model, and then train the model with these hits. “The model can then infer further variants at a fairly high accuracy,” says Philip Kim, Professor in the Departments of Molecular Genetics and Computer Science at the University of Toronto. “Your model already has a good idea of what an antibody looks like. With selection you can train it to a certain extent of what kind of antibodies bind to your particular target at particular sites. Then you can let the model reason prospectively: knowing all this, give me new sequences that will bind the antibody maybe stronger. And then you can iterate.”

AI today

Most of what today is termed AI, such as the GPTs (Generative Pre-trained Transformers) like ChatGPT, is deep learning. Unlike AI, DL is “really a proper technical term,” Kim explains. “It has real meaning in the computer science and machine learning world. To be very exact, DL models are by definition sort of deep, multilayer neural networks.”

DL algorithms, which are a subset of machine learning, are built from layers of nodes. Nodes receive input data and pass their outputs to other nodes, in the same layer as well as in subsequent layers, abstracting the data into higher-level representations. The network is trained iteratively by adjusting the weights of the connections between nodes based on the training data, which allows it to learn non-linear relationships and deliver outputs that are not simply the sum of its inputs.
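To make the idea of adjusting connection weights concrete, here is a minimal sketch in plain Python. It is not taken from any tool mentioned in this article. It assumes a tiny network with one hidden layer and uses XOR as toy data, a relationship that no simple sum of the inputs can reproduce. All function names, the layer size, and the learning rate are illustrative choices.

```python
import math
import random

# Toy data: XOR, a relationship that a purely additive model cannot fit.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_hidden, rng):
    """Random starting weights for a 2-input, n_hidden-node, 1-output network."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Pass the inputs through the hidden layer, then the output node."""
    hidden = []
    for row, bias in zip(net["w1"], net["b1"]):
        hidden.append(math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias))
    out = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, out


def mean_loss(net):
    """Mean squared error over the toy data set."""
    return sum((forward(net, x)[1] - y) ** 2 for x, y in DATA) / len(DATA)


def train(net, epochs=3000, lr=0.5):
    """Iteratively nudge every weight against the gradient of the error."""
    for _ in range(epochs):
        for x, y in DATA:
            hidden, out = forward(net, x)
            # Gradient of the squared error at the output node (pre-sigmoid).
            d_out = 2 * (out - y) * out * (1 - out)
            for j, h in enumerate(hidden):
                # Error signal passed back to hidden node j (pre-tanh).
                d_hidden = d_out * net["w2"][j] * (1 - h * h)
                net["w2"][j] -= lr * d_out * h
                for i, xi in enumerate(x):
                    net["w1"][j][i] -= lr * d_hidden * xi
                net["b1"][j] -= lr * d_hidden
            net["b2"] -= lr * d_out
    return net


if __name__ == "__main__":
    network = init_network(8, random.Random(0))
    print("loss before training:", mean_loss(network))
    train(network)
    print("loss after training:", mean_loss(network))
```

Production DL models differ mainly in scale: the same update rule is applied to millions or billions of weights, using specialized libraries and hardware.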

DL approaches are “just giant models that make less assumptions than traditional ML models,” DePristo says. This makes them excel at things like pattern recognition. “They have enough parameters that they can learn astronomically complicated things, and it turns out that we can train them to do that.”

How does DL work for antibodies?

There are two broad categories under which most DL antibody engineering models fall: sequence-based and structure-based.

A host of protein sequences are available in UniProt and other public and proprietary protein databases. A lot of work has been done to date on modeling proteins with natural language processing methods, essentially treating amino acids as words and allowing the algorithm to learn how they are put together to form protein sentences. “You can just train an effectively off-the-shelf language model on these sequences, and then the language model learns quite well how a protein sequence looks,” Kim notes. “These models have since been used to suggest mutations to existing antibodies, lead candidates or whatever, and then more often than not these suggestions lead to an improvement of antibody properties. You can also use the models to generate antibodies” de novo.
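The "amino acids as words" idea can be sketched with a far simpler model than the language models Kim describes. The example below assumes a bigram model, which only counts which residue tends to follow which, and trains it on four made-up six-letter strings that are not real antibody sequences. The function names are hypothetical. It then ranks every single-residue substitution at one position by how plausible the model finds the result.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one letter each
START, END = "<", ">"

# Made-up training strings for illustration only, not real antibody sequences.
TOY_SEQUENCES = ["CARDYW", "CARGYW", "CAKDYW", "CARDFW"]


def tokenize(sequence):
    """Treat each residue as one 'word', framed by start and end markers."""
    return [START] + list(sequence) + [END]


def train_bigram(sequences):
    """Count how often each token follows each other token."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        tokens = tokenize(seq)
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def log_likelihood(model, sequence):
    """Add-one smoothed log-probability of a sequence under the bigram model."""
    pair_counts, context_counts = model
    vocab_size = len(AMINO_ACIDS) + 1  # any residue, or the end marker
    tokens = tokenize(sequence)
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        numerator = pair_counts[(prev, nxt)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def rank_point_mutations(model, sequence, position):
    """Score every single-residue substitution at one position, best first."""
    scored = []
    for aa in AMINO_ACIDS:
        variant = sequence[:position] + aa + sequence[position + 1:]
        scored.append((log_likelihood(model, variant), variant))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    model = train_bigram(TOY_SEQUENCES)
    for score, variant in rank_point_mutations(model, "CAWDYW", 2)[:3]:
        print(variant, round(score, 3))
```

A real protein language model replaces the counting step with a deep network trained on millions of sequences, but the workflow of tokenizing, learning what sequences look like, and scoring candidate mutations follows the same outline.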

Newer on the scene are DL models that take an antibody’s structure as their inputs and outputs. “We usually use some sort of coordinate-based representation of proteins,” Kim says. To make sense of these, a set of technologies had to be built with an infrastructure that can efficiently reason in 3D space rather than the 1D linear space inhabited by natural language. “The field has made tremendous progress in the past five, six, seven years.”
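One reason 3D data needs different machinery than 1D text can be shown with a short sketch. It assumes a structure is stored as one (x, y, z) point per residue. The coordinates below are invented for illustration and the helper names are hypothetical. Rotating or shifting the molecule changes every number in the coordinate list, yet the distances between residues, which describe the actual shape, stay the same. A model that works on coordinates has to cope with that.

```python
import math

# Invented coordinates (in angstroms) for a five-residue fragment, one point
# per residue. Illustrative only, not taken from a real structure.
TOY_COORDS = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (5.5, 3.4, 0.0),
    (9.1, 3.9, 1.2),
    (10.4, 7.3, 2.0),
]


def pairwise_distances(coords):
    """Distance between every pair of residues: a shape-only description."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def rotate_z(coords, angle):
    """Rotate all points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Move all points by the same (dx, dy, dz) offset."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def max_abs_difference(matrix_a, matrix_b):
    """Largest element-wise difference between two equally sized matrices."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(matrix_a, matrix_b)
        for a, b in zip(row_a, row_b)
    )


if __name__ == "__main__":
    moved = translate(rotate_z(TOY_COORDS, 1.0), (5.0, -2.0, 3.0))
    print("coordinates changed:", moved != TOY_COORDS)
    difference = max_abs_difference(
        pairwise_distances(TOY_COORDS), pairwise_distances(moved)
    )
    print("largest change in any residue-residue distance:", difference)
```

A sequence, by contrast, reads the same however the molecule is oriented, which is part of why off-the-shelf language models could be applied to it directly.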

More than just affinity

Traditionally antibodies were selected for their ability to stick to their target, and “any other property was a distant second, so you often got high affinity antibodies that had bad properties,” DePristo points out. AI is well suited for simultaneously optimizing multiple properties like affinity and specificity, but also safety profiles, physicochemical parameters, and others related to what are collectively known as “developability.”

“At BigHat, we solve these problems by having an integrated wet lab. Every week, we make thousands of antibodies and measure these properties,” and feed the data back into the program, he explains. The program suggests mutations, which are included in the next round of antibodies to be produced and tested. The data from that round are fed back in turn, and so on until an optimum combination of properties is achieved.
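The shape of such a design, measure, and feed-back loop can be sketched in a few lines. This is a toy under stated assumptions and not BigHat's method. The `toy_assay` function stands in for wet-lab measurements and returns two invented properties. `propose_variants` makes random single mutations where a real system would use a trained model. The weights that trade affinity off against stability are arbitrary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HIDDEN_TARGET = "ARDYGSLE"  # invented "ideal" sequence known only to the toy assay


def toy_assay(sequence):
    """Stand-in for wet-lab measurements; both properties are made up."""
    matches = sum(1 for a, b in zip(sequence, HIDDEN_TARGET) if a == b)
    affinity = matches / len(sequence)
    stability = 1.0 - sequence.count("W") / len(sequence)  # arbitrary toy rule
    return {"affinity": affinity, "stability": stability}


def combined_score(properties, weights):
    """Collapse several measured properties into one number to optimize."""
    return sum(weights[name] * value for name, value in properties.items())


def propose_variants(parent, n, rng):
    """Placeholder proposer: n distinct random single-residue mutants.

    n must not exceed the number of possible single mutants of `parent`.
    """
    variants = set()
    while len(variants) < n:
        pos = rng.randrange(len(parent))
        aa = rng.choice(AMINO_ACIDS)
        if aa != parent[pos]:
            variants.add(parent[:pos] + aa + parent[pos + 1:])
    return sorted(variants)


def design_loop(start, rounds, batch_size, weights, seed=0):
    """Each round: propose a batch, 'measure' it, keep the best seen so far."""
    rng = random.Random(seed)
    best = start
    best_score = combined_score(toy_assay(best), weights)
    history = [best_score]
    for _ in range(rounds):
        for variant in propose_variants(best, batch_size, rng):
            score = combined_score(toy_assay(variant), weights)
            if score > best_score:
                best, best_score = variant, score
        history.append(best_score)
    return best, history


if __name__ == "__main__":
    weights = {"affinity": 0.7, "stability": 0.3}
    final, scores = design_loop("WWWWWWWW", rounds=10, batch_size=20,
                                weights=weights)
    print(final, [round(s, 3) for s in scores])
```

Because the score combines more than one property, a variant that gains affinity but loses stability is not automatically kept, which is the multi-property trade-off DePristo describes.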

Designing antibodies

Kim remarks that advances in protein ML in general have been amazingly fast over the past few years, with no signs of slowing down. “Sequence-based models are already having an impact—they really work for making antibodies better.”

This leads him to predict that soon (he couldn’t commit to how soon) most antibodies will be designed on the computer. And eventually it will become like ordering a CRISPR/Cas9 guide RNA: “You go online, you order five, and a couple of them are tenable.”