One of deep learning’s biggest problems in life science applications is small datasets. Consider, for example, a human study with 50 participants. With careful statistics, a good statistician may very well be able to draw meaningful conclusions from this data. However, it’s not at all clear that a deep model trained on the same data would achieve good results.
What are some ways we could adapt deep learning methods to do useful things with small datasets? Some past work has demonstrated how techniques from one-shot learning can do things like find useful molecules for a downstream task. However, these techniques are finicky in practice and often limited in scope. A one-shot or meta-learning approach might allow a model trained on a small dataset to be applied to a “similar enough” dataset, but it’s very hard in practice to gauge what “similar enough” means.
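To make that concrete, here is a minimal sketch of one flavor of few-shot learning, in the spirit of prototypical networks: a tiny labelled “support set” (one example per class) defines prototype embeddings, and new queries are classified by nearest prototype. The encoder, feature dimensions, and data below are illustrative assumptions, not any particular published pipeline.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps raw features (e.g. a molecular fingerprint) to an embedding."""
    def __init__(self, in_dim=128, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, x):
        return self.net(x)

def one_shot_predict(encoder, support_x, support_y, query_x):
    """Classify queries by distance to per-class prototype embeddings."""
    emb_s = encoder(support_x)                 # embed the labelled support set
    emb_q = encoder(query_x)                   # embed the unlabelled queries
    classes = support_y.unique()
    prototypes = torch.stack([emb_s[support_y == c].mean(dim=0) for c in classes])
    dists = torch.cdist(emb_q, prototypes)     # (n_query, n_classes) distances
    return classes[dists.argmin(dim=1)]        # nearest prototype wins

# Example: two classes, one labelled example each, three unlabelled queries.
enc = Encoder()
support_x, support_y = torch.randn(2, 128), torch.tensor([0, 1])
query_x = torch.randn(3, 128)
print(one_shot_predict(enc, support_x, support_y, query_x))
```

In real use the encoder would be meta-trained on many related tasks first; the transfer only works to the extent that the new dataset is “similar enough” to those tasks, which is exactly the weakness noted above.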
To see a more fruitful path forward, it’s worth considering how a human scientist is able to draw meaningful conclusions from a small dataset. She might use her extensive past training, along with known physiological or biomedical facts, to narrow down the space of hypotheses under consideration. The narrowed question might then be answerable with a statistical technique applied to the small dataset at hand. What would the analogous feat be for a machine learning model?
The key would have to be introducing strong priors into the network that enable it to learn effectively from the data at hand. The physics community has investigated ideas like introducing inductive biases from the Hamiltonian to make models obey conservation laws (paper). Could something similar be done for biology? The challenge, of course, is that biological systems are complex and straightforward laws are not easy to come by. Nonetheless, it ought to be possible to add basic structural biases into models that provide useful grounding. I suspect the choice of prior would have to depend heavily on the system at hand, but with time it ought to be possible to build up a useful library of priors that can be combined into an effective toolchain for smaller studies.
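As a toy illustration of what such a structural bias might look like in practice, here is a sketch in which a known conservation-style constraint is added to the training loss as a soft penalty, loosely in the spirit of the physics-informed work mentioned above. The specific constraint (predicted components summing to a known total, as in a mass balance across compartments) and all names here are assumptions for the sake of example.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def conservation_penalty(pred, conserved_total):
    """Penalise predictions whose components fail to sum to a known total
    (a hypothetical stand-in for whatever the system actually conserves)."""
    return ((pred.sum(dim=1) - conserved_total) ** 2).mean()

def training_step(x, y, conserved_total, prior_weight=0.1):
    optimizer.zero_grad()
    pred = model(x)
    # Data-fit term plus a soft structural prior on the predictions.
    loss = nn.functional.mse_loss(pred, y) + prior_weight * conservation_penalty(pred, conserved_total)
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny synthetic batch: 8 samples, 10 features, 3 outputs that should sum to 1.
x, y = torch.randn(8, 10), torch.softmax(torch.randn(8, 3), dim=1)
print(training_step(x, y, conserved_total=torch.ones(8)))
```

The point is not this particular penalty but the pattern: domain knowledge enters as an extra differentiable term, so the same small dataset effectively has to explain less.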
It’s interesting to note that this approach harks back to old-school attempts to use expert systems in the life sciences. There’s a crucial difference though: those systems were hard-coded and couldn’t be easily trained. I’m suggesting something different, namely the development of effective inductive priors that can be added to the loss functions of deep networks. Such priors could be used to build models that can be trained on new datasets and easily transferred to new applications.
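Concretely, a “library of priors” could be as simple as a collection of penalty functions on a model’s predictions, mixed and weighted per study. The two priors sketched below (non-negativity and smoothness over an ordered axis) are illustrative placeholders I’m assuming for the example, not established biological laws.

```python
import torch

def nonnegativity_prior(pred):
    """Quantities like concentrations or counts should not go negative."""
    return torch.relu(-pred).mean()

def smoothness_prior(pred):
    """Adjacent outputs (e.g. consecutive time points) should not jump wildly."""
    return (pred[:, 1:] - pred[:, :-1]).pow(2).mean()

PRIOR_LIBRARY = {"nonnegative": nonnegativity_prior, "smooth": smoothness_prior}

def total_loss(base_loss, pred, priors):
    """Add the selected priors, each with its own weight, to the data-fit term."""
    return base_loss + sum(w * PRIOR_LIBRARY[name](pred) for name, w in priors.items())

# Usage: a study that believes both priors apply, with hand-chosen weights.
pred = torch.randn(8, 5, requires_grad=True)
base = pred.pow(2).mean()  # stand-in for the usual data-fit loss
loss = total_loss(base, pred, priors={"nonnegative": 0.5, "smooth": 0.1})
loss.backward()
```

Unlike a hard-coded expert system, nothing here fixes the model’s conclusions in advance; the priors only reshape the loss landscape, and the same library can be reused, reweighted, or extended as it is carried to new datasets.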
Acknowledgements: Thanks to Sandya Subramanian for useful discussion.