A Pathway Towards Deep Science

In previous posts I’ve laid out the case for moving DeepChem towards a deep science suite of packages for scientific computing (Towards a Deep Science Suite, Some thoughts on DeepChem Architecture, Making DeepChem a Better Framework for AI-Driven Science, Towards Differentiable DeepChem). Thus far, these discussions have been high level and haven’t proposed a concrete roadmap for DeepChem/DeepScience development efforts. In this post, I make a first attempt at sketching a roadmap for the DeepChem->DeepScience evolution. The problems I would like to solve are:

  • Unifying DeepChem layers: DeepChem’s layers are currently confusingly split across multiple frameworks (TensorFlow, PyTorch, Jax). This means that our layers are not readily composable, making it impossible to express complex architectures built from DeepChem primitives.
  • Unifying DeepChem’s datatypes: DeepChem featurizers generate a number of closely related representations which use different graph objects. This lack of composability makes it hard to reason about how different featurizers fit together.
  • Adding gradients to DeepChem’s operations: For DeepChem to become fully differentiable, all of DeepChem’s operations should have gradients defined.

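To make the composability problem concrete, here is a toy sketch (the classes `ConvMolLike` and `GraphDataLike` are hypothetical stand-ins, not actual DeepChem types) of how two featurizers that describe the same molecule with different graph objects force every downstream consumer to special-case each representation:

```python
import numpy as np

class ConvMolLike:
    """Stand-in for one graph representation (adjacency-list style)."""
    def __init__(self, atom_features, adj_list):
        self.atom_features = atom_features  # (n_atoms, n_features)
        self.adj_list = adj_list            # neighbor lists per atom

class GraphDataLike:
    """Stand-in for another representation (edge-index style)."""
    def __init__(self, node_features, edge_index):
        self.node_features = node_features  # (n_atoms, n_features)
        self.edge_index = edge_index        # (2, n_edges)

def num_atoms(graph):
    # Without a shared datatype, even trivial downstream code must
    # dispatch on every representation it might receive.
    if isinstance(graph, ConvMolLike):
        return graph.atom_features.shape[0]
    if isinstance(graph, GraphDataLike):
        return graph.node_features.shape[0]
    raise TypeError(f"Unsupported graph type: {type(graph).__name__}")
```

Every new featurizer output type adds another branch like this to every layer that consumes graphs; a single standard graph object would remove the dispatch entirely.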
I propose the following changes to DeepChem along the pathway towards DeepScience:

  • Continuing to flesh out the tutorial series to cover more areas of science (https://github.com/deepchem/deepchem/issues/2907)
  • Creating a lightweight tensor object: DeepChem has at various times used Theano, TensorFlow, PyTorch, and Jax tensors. Swapping between frameworks has required major rewrites of DeepChem multiple times. I propose we maintain a common tensor object with different backends. Then future swaps of backend will only require minor changes to the backend API. There is engineering overhead in this option which is why we have avoided it in the past, but I think it will bring the powerful advantage that DeepChem’s core code can remain stable without rewrites of layer logic when downstream frameworks change or evolve.
  • Introduce standard DeepChem datastructures for new scientifically relevant objects (multigraphs, meshes)
  • Rewrite DeepChem layers to use DeepChem tensors so that all layers are interoperable. For backwards compatibility, maintain all existing layers in the codebase through a long deprecation cycle before phasing them out.
  • Rewrite DeepChem featurizers to use standard datastructures (for example, all graph convs should use the same graph objects).
  • Add gradients to DeepChem’s layers/models/featurizers
  • (Optional) Consider using RAPIDS datastructures (https://docs.rapids.ai/api) to accelerate featurization on GPUs.
  • Continue fleshing out DeepChem model list (https://github.com/deepchem/deepchem/issues/2680)
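To illustrate the lightweight tensor proposal, here is a minimal sketch with a NumPy backend. The names `DCTensor` and `NumpyBackend` are hypothetical, not an existing DeepChem API; the point is that layer logic is written once against the wrapper, and only the backend class changes when the underlying framework does:

```python
import numpy as np

class NumpyBackend:
    """One pluggable backend; a TorchBackend or JaxBackend would
    implement the same small API against its own framework."""
    @staticmethod
    def add(a, b):
        return np.add(a, b)

    @staticmethod
    def matmul(a, b):
        return np.matmul(a, b)

class DCTensor:
    """Framework-agnostic tensor: all ops dispatch through `backend`."""
    backend = NumpyBackend  # swapped once, globally, on a framework change

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float64)

    def __add__(self, other):
        return DCTensor(self.backend.add(self.data, other.data))

    def __matmul__(self, other):
        return DCTensor(self.backend.matmul(self.data, other.data))

# Layer-style code written against DCTensor never mentions a framework.
x = DCTensor([[1.0, 2.0]])
w = DCTensor([[3.0], [4.0]])
y = x @ w
```

The engineering overhead mentioned above lives in keeping each backend class in sync with the wrapper API, but that surface is far smaller than rewriting every layer.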
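For the gradients item, one possible shape is for each operation to define a forward pass plus a vector-Jacobian product. The example below (a squared-pairwise-distance "featurizer" with an analytic gradient, checkable against finite differences) is purely illustrative, not actual DeepChem code:

```python
import numpy as np

def sq_dist_forward(coords):
    """Half the sum of squared pairwise distances over all ordered
    pairs of 3D coordinates, shape (n_atoms, 3)."""
    diffs = coords[:, None, :] - coords[None, :, :]
    return 0.5 * np.sum(diffs ** 2)

def sq_dist_vjp(coords, upstream_grad):
    """Analytic gradient of sq_dist_forward w.r.t. coords, scaled by
    the upstream gradient (the backward rule for this op)."""
    n = coords.shape[0]
    grad = 2.0 * (n * coords - coords.sum(axis=0))
    return upstream_grad * grad
```

With every featurizer op carrying a rule like `sq_dist_vjp`, gradients could flow end to end from model outputs back through featurization.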

This is still an early draft of the work that will be needed to move towards DeepScience. Please add on thoughts/feedback here!


So excited for this :smiley:

I feel like deep learning methods are becoming more and more interoperable, and it’s a great moment to broaden our community along the lines of the open ethos…

(DeepChem as scientific protocol ‘glue’)

… could be very cool to tell a bit of this story along with some of the cool DeepBio-esque projects (genomics etc.) we have going on for GSoC :smiley: :smiley: :smiley: