MIL-OSI Economics: Understanding protein motion is essential to understanding biology and advancing drug discovery. Today we’re introducing BioEmu, an AI system that emulates the structural ensembles proteins adopt, delivering insights in hours that would otherwise require years of simulation.

Source: Microsoft

Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulates protein equilibrium ensembles, which are key to understanding protein function at scale. https://msft.it/6043S7rAH

BioEmu aims to emulate the ensemble of structures that a protein will adopt in an experiment or in the cell. The ability of a protein to switch dynamically between distinct structures is a basis for its function.

BioEmu 1.1 is trained longer and more carefully, in three distinct stages, on a large body of protein structures, more than 200 milliseconds of molecular dynamics (MD) simulations, and 500,000 protein stability measurements.

BioEmu 1.1 predicts functionally relevant conformational changes, including large-scale domain motions and local unfolding events, and shows an increased success rate in predicting the formation of “cryptic” binding pockets.

BioEmu 1.1 can emulate the equilibrium distributions of millisecond-timescale MD with a speedup of many orders of magnitude, bringing GPU-years down to GPU-hours.

BioEmu 1.1 is also better at matching experimental protein stability measurements with sampled protein structure ensembles: prediction errors are below 1 kcal/mol and correlations exceed 0.6 on a large protein stability test set whose train-test sequence similarities are about 50%. This also holds for predicting stability changes of single and double mutants. These results indicate that the encoding of protein mutants still resolves enough differences to be predictive when fine-tuned with the right data.

Also available: the MD simulations generated to train BioEmu, more than 100 milliseconds of data covering thousands of protein systems and tens of thousands of mutants. This dataset stands out for its combination of protein sequence diversity and simulation length.

Learn more: https://msft.it/6044S7rAy
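For readers wondering how a sampled structure ensemble can be compared with a stability measurement in kcal/mol, here is a minimal sketch. It uses the standard two-state relation ΔG_fold = -RT ln(p_folded / p_unfolded) and is not BioEmu's own code or API. The function name `folding_free_energy`, the per-sample folded/unfolded labels, and the default temperature are all assumptions made for illustration; how each sampled structure is classified as folded is left open here.

```python
import math

# Gas constant in kcal/(mol*K), so the result is in kcal/mol.
R_KCAL_PER_MOL_K = 0.0019872041


def folding_free_energy(is_folded, temperature_k=298.15):
    """Two-state folding free energy from an ensemble of sampled structures.

    `is_folded` is a sequence of booleans, one per sampled structure
    (hypothetical labels; the classification rule is an assumption).
    Returns dG_fold = -RT * ln(p_folded / p_unfolded) in kcal/mol.
    Negative values mean the folded state is favored.
    """
    n_total = len(is_folded)
    n_folded = sum(1 for flag in is_folded if flag)
    n_unfolded = n_total - n_folded
    if n_folded == 0 or n_unfolded == 0:
        # The ratio is undefined when one state was never sampled.
        raise ValueError("need at least one folded and one unfolded sample")
    # The population ratio equals the ratio of counts.
    ratio = n_folded / n_unfolded
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(ratio)


if __name__ == "__main__":
    # Toy ensemble: 900 folded and 100 unfolded samples.
    samples = [True] * 900 + [False] * 100
    print(f"dG_fold = {folding_free_energy(samples):.2f} kcal/mol")
```

With the toy ensemble above (90% folded at 298.15 K) the estimate is about -1.30 kcal/mol. That puts the reported error of below 1 kcal/mol in context: an error of that size corresponds to shifting the folded-to-unfolded population ratio by roughly a factor of five.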
