Creating Differentiable Cell Simulators

Deep learning models in biology have mostly been fairly limited in scope. For example, graph convolutional networks provide a powerful tool for mapping a molecule’s structure to predictions about its properties. Similarly, models for analyzing biomedical images, such as the U-Net, consider each image in isolation. Deep learning applied in this fashion has no doubt been powerful, but current approaches have fundamental limitations. For one, deep architectures have no understanding of the unity of a biological system. Models can’t understand that molecules constitute cells, which constitute tissues, which constitute organs. They have no sense of how the various components of a biological system fit together.

Systems biology, on the other hand, has striven to build unified computational models of entire biological subsystems. A seminal paper from a few years ago (Karr et al.) constructs an exhaustive computational model of a small bacterium, Mycoplasma genitalium, and uses it to predict phenotypes. This approach is tremendously appealing, but it has limits too. A look at the simulation’s source code reveals that many parameters are hard-coded. It isn’t obvious how the model could be extended to simulate other cells, for example. Contrast this with the flexibility of deep architectures, where the same convolutional network design can handle image data from vastly different domains.

A differentiable cell simulator, that is, a simulator whose parameters can be learned from data using the techniques of deep learning, would be a tremendously powerful toolkit for approaching questions in cellular biology. Such a simulator could be tuned to real experimental data so that it yields meaningful biological predictions. This idea isn’t a pipe dream. A number of efforts are underway to build detailed datasets about particular cells. For example, the Pancreatic Beta-Cell Consortium is attempting to build multimodal datasets that combine many different aspects of a cell’s function. If datasets of this type were open sourced, the research community might be able to build differentiable simulators on top of them, which could then serve as a foundation for open source biology.
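The core idea, a simulator whose parameters are fit to data by gradient descent, can be illustrated on a toy problem. The sketch below assumes a hypothetical one-variable model of gene expression, dx/dt = k - gamma * x (production rate k, degradation rate gamma), integrated with explicit Euler steps. The model, its parameter values, and the names simulate, mse_loss, and fit_rate are illustrative assumptions, not taken from any real cell simulator. Gradients flow through every simulation step via a small hand-rolled forward-mode dual number, standing in for the automatic differentiation that a framework such as JAX or PyTorch would provide. The "observed" data are synthetic, generated from the same model with k = 2.0, and gamma is held fixed to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode dual number: a value and its derivative w.r.t. one parameter."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (zero derivative).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def simulate(k, gamma=0.8, x0=0.0, dt=0.1, steps=50):
    """Explicit Euler integration of the toy model dx/dt = k - gamma * x (hypothetical)."""
    x = Dual.lift(x0)
    trajectory = []
    for _ in range(steps):
        x = x + dt * (k - gamma * x)
        trajectory.append(x)
    return trajectory


def mse_loss(k, observed):
    """Mean squared error between the simulated trajectory and observed data."""
    total = Dual(0.0)
    for x, y in zip(simulate(k), observed):
        diff = x - y
        total = total + diff * diff
    return total * (1.0 / len(observed))


def fit_rate(observed, k_init=0.5, lr=0.2, iters=200):
    """Gradient descent on k; d(loss)/dk comes from forward-mode autodiff."""
    k = k_init
    for _ in range(iters):
        grad = mse_loss(Dual(k, 1.0), observed).der  # seed dk/dk = 1
        k -= lr * grad
    return k


# Synthetic "experimental" data generated with a known rate k = 2.0.
observed = [x.val for x in simulate(2.0)]
k_hat = fit_rate(observed)
print(f"recovered k = {k_hat:.4f}")  # prints "recovered k = 2.0000"
```

Forward mode needs one pass per learnable parameter, which is fine for this single-parameter sketch; for a simulator with many parameters, reverse-mode differentiation (backpropagation) is the usual choice.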

There’s been some early work in this vein already. For example, researchers have built a differentiable protein structure predictor and a differentiable protein simulator. However, there’s a large gap in complexity between a single protein and a cell, so considerable research will be needed before differentiable cell simulators become standard.

I’m surprised and disappointed that little significant work has built on the seminal paper by Karr et al. What’s the current status of this line of research?

I think the Covert lab has been hard at work trying to extend their machinery to E. coli. They have a recent paper outlining some of their results. It’s a hard problem in general, I think. I believe parameterizing the simulations is a major bottleneck, and switching to a deep learning architecture could help quite a bit with that.