Brainstorming Ideas for DeepChem 2.5.0

We’re getting close to the DeepChem 2.4.0 release, which suggests it’s time to start planning for DeepChem 2.5.0. I’m starting this thread as a place for us to brainstorm features we’d like to see in DeepChem 2.5.0. I’ll kick us off with some items from my wishlist:

  • Improved support for FEP (free energy perturbation) calculations and tutorials on how to use FEP on real systems.
  • A state-of-the-art model for retrosynthesis and improved MoleculeNet support for synthesis datasets. Computational retrosynthesis has really exploded as a field and a high-quality model could be very useful.
  • Improved support for genomic datasets and models. More datasets (like UniProt, etc.) and models for genomic learning tasks like transcription factor binding site prediction. More tutorials for the same.
  • Improved support for microscopy. Adding more datasets like Recursion’s RxRx19 to MoleculeNet and creating models that perform well on these datasets. More tutorials for microscopy.
  • Starting up support for physical deep learning, with a high-quality implementation of something like PINNs (physics-informed neural networks).
  • Adding support for JAX with a new dc.models.JAXModel.

Please list more items from your wishlist in this thread! As we start to build consensus on desired features, we can start to do some planning.


We should think about an architecture for pre-trained models. For a lot of these things, it would be really useful if we had a way to provide users with fully trained models that they could either use directly or fine-tune for other tasks.


Does DeepChem support force-field based virtual screening?

I just noticed this paper, which uses DFCNN, DeepBindBC, and an FF-based technique to predict an effective (repurposed) drug for COVID:

Their FF-based technique is not deep learning based, as far as I can tell. So, if DeepChem could also be used for FF modeling, then there may be an opportunity for a complete multi-task architecture using DeepChem.

Apologies in advance if this topic has been covered previously.

What they did in this paper is pretty trivial. It was just:

  1. Use three different existing docking codes to evaluate each molecule.
  2. Keep only the molecules that scored well with all three codes. There were 14 of them.
  3. Run a short MD simulation of each one in the binding pocket.
  4. Look at the results by eye and see if they seemed to be binding stably. As far as I can tell from the discussion, this mainly just came down to counting hydrogen bonds.
  5. Do experiments to test the four most promising ones.
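Steps 1 and 2 above amount to a simple consensus filter. Here is a minimal sketch in plain Python, not the paper's actual code: the program names, molecule names, scores, and cutoff are all made up for illustration, and it assumes that a lower (more negative) score means better predicted binding, which is a common convention but not something stated in the thread.

```python
from typing import Dict, List

# Hypothetical docking scores, one dict per docking program.
# Assumption: lower (more negative) scores are better.
scores: Dict[str, Dict[str, float]] = {
    "program_a": {"mol1": -9.1, "mol2": -6.0, "mol3": -8.7},
    "program_b": {"mol1": -8.5, "mol2": -9.2, "mol3": -8.9},
    "program_c": {"mol1": -9.0, "mol2": -8.8, "mol3": -7.1},
}


def consensus_hits(scores: Dict[str, Dict[str, float]], cutoff: float) -> List[str]:
    """Return molecules that score at or below `cutoff` in every program."""
    per_program = list(scores.values())
    if not per_program:
        return []
    # Only consider molecules that every program actually scored.
    common = set(per_program[0])
    for program_scores in per_program[1:]:
        common &= set(program_scores)
    # Keep a molecule only if it passes the cutoff in all programs.
    return sorted(
        mol for mol in common
        if all(program_scores[mol] <= cutoff for program_scores in per_program)
    )


hits = consensus_hits(scores, cutoff=-8.0)
print(hits)
```

With these made-up numbers, only the molecule that passes the cutoff in all three programs survives. In a real workflow a single shared cutoff may not make sense, since different docking codes use different scoring scales, so per-program thresholds or rank-based filtering would probably be needed.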

To add a bit more color, the DeepBindBC/DFCNN work is similar to what we support with interaction fingerprints / atomic convolutions, so it might be possible to build a similar system directly in DeepChem. The dc.dock system allows for programmatic docking (with AutoDock Vina), so it could also help facilitate this type of work.

Thanks @bharath @peastman!

One additional item I forgot to mention. This may be too ambitious for DeepChem 2.5.0 but is an excellent long-term goal:

  • Improved support for AlphaFold-style architectures. This means support for more critical datasets, useful models, and relevant benchmark materials. And of course, tutorials for all of the above.

Hi, I would also suggest the inclusion of neural fingerprints trained on bioactivity data like:

I think the results are quite impressive, and ligand-based virtual screening is still used by many medicinal chemists.

To add an update to this thread, we’re now swapping to a faster release cycle. The planned release date for DeepChem 2.5.0 is now the end of February, which limits the set of features we can get in by then. That said, the brainstorming in this thread is great! I think we can build out a lot of these ideas in 2.6.0 and beyond if we don’t get to them in 2.5.0.