A computational method developed by two Virginia Tech computer scientists is matching one of the most advanced AI systems available for predicting the three-dimensional shapes of RNA—and doing so without the large evolutionary sequence databases that most leading tools depend on. The method, called RNAbpFlow, was described in a study published in Nature Methods.
Predicting RNA structure matters because RNA molecules fold into specific three-dimensional shapes that can serve as drug targets. Risdiplam, one of the first small-molecule drugs designed to target RNA directly, works by latching onto a specific folded shape in an RNA molecule to treat spinal muscular atrophy—a leading genetic cause of infant death. Tools that can accurately predict RNA shapes could accelerate the search for similar therapies for diseases including Huntington’s, ALS, certain cancers, and viral infections.
“How can you target an RNA if you don’t have its shape?” said Sumit Tarafder, the study’s lead author. “In the shape, there are pockets where a drug can attach. If you can’t predict the shape, your pockets are wrong—and the drug won’t work.”
In a blind test on a widely used community benchmark, RNAbpFlow produced a correct overall structure for 12 of 14 RNA targets, compared with 8 of 14 for AlphaFold 3, the system from Google DeepMind. Rather than searching for thousands of related sequences to infer structure, as AlphaFold-style systems do, RNAbpFlow uses a technique called flow matching, which belongs to the same broad class of generative AI used to create images. It generates complete, all-atom 3D structures in a single end-to-end process from just the RNA sequence and base pairs.
“The model starts from complete noise and, guided by those base pairs, folds into the right 3D shape. That’s the beauty of flow matching, and we can generate as many structures as you want, which lets us capture how the molecule actually moves,” said Tarafder.
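For readers curious about what "starting from noise" looks like in practice, the toy sketch below illustrates the general sampling loop used in flow matching. It is not RNAbpFlow's code. The names `velocity` and `sample` are hypothetical, and the trained neural network is replaced by a closed-form stand-in that points straight at a known target, which a real model would not have access to. A real system would predict the velocity from the RNA sequence and base pairs, and different noise draws would yield different candidate structures.

```python
import random

def velocity(x, t, target):
    # Stand-in for a trained network (assumption for illustration only).
    # For a straight-line path from noise to target, the velocity at time t
    # is (target - x) / (1 - t).
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]

def sample(target, steps=50, seed=0):
    # Start from pure Gaussian noise, one value per coordinate.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / steps
    # Euler integration from t = 0 toward t = 1; the largest t used is
    # 1 - dt, so the division in velocity() never hits zero.
    for k in range(steps):
        t = k * dt
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Flattened (x, y, z) coordinates for three made-up "atoms".
toy_target = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
folded = sample(toy_target, steps=50, seed=1)
```

Because the stand-in velocity always points at one fixed target, every seed in this toy ends at the same coordinates. The variety of structures Tarafder describes comes from a learned velocity field, which this sketch does not attempt to reproduce.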
The low data-dependence is central to the approach. RNA is structurally flexible and badly underrepresented in databases, making it far harder to model than proteins. “We asked whether we could leverage what data we have, and use additional knowledge from experiments to fill the data-gap and give RNA-based drug discovery a fair shot,” said Debswapna Bhattacharya, the study’s senior author.
The researchers acknowledge that on larger, more complex RNAs, tools that draw on evolutionary data still hold an edge. RNAbpFlow performs best in cases where that data is limited—including a conserved structural element from the SARS-CoV-2 genome and a laboratory-built ribozyme tested in the study. Tarafder is now leading development of an improved version to be submitted to CASP, the community-wide prediction competition.