DeepChem 2.6.0 Release Notes

This post was co-authored with Bharath Ramsundar.

We are excited to announce the release of DeepChem 2.6.0! This release brings a number of usability improvements, including better documentation, new and re-organized tutorials, and new features and models. Improved test coverage makes DeepChem 2.6.0 more production-ready than previous releases.

The release contains contributions from many people. We would like to thank our amazing community of developers, users and well-wishers who have made it possible.

Detailed Overview of Changes

Improved Production Readiness

One of the major improvements in 2.6.0 is a broad increase in test coverage. DeepChem's CI now runs a wider range of tests for all components, including many slow tests that previously weren't run on CI. Separate CI workflows were created for individual components such as TensorFlow, Jax, and PyTorch models, as well as for other checks such as docstrings. All CI workflows pass on the Linux platform, making this release stable for production usage. Code health has also improved through stricter adherence to the coding standards enforced by mypy, flake8, and yapf. DeepChem 2.6.0 supports Python 3.7–3.9 on Ubuntu and macOS. On Windows, DeepChem is stable on Python 3.7.

This release upgrades DeepChem’s backend to support TensorFlow 2.7 and PyTorch 1.9.0. The current version is also compatible with NumPy 1.22, paving the way toward a more unified array API (ref).

Support for Jax Models

DeepChem 2.6.0 adds Jax as a machine learning backend, with support for Jax models via the dc.models.jax_models API. Jax support is limited to Ubuntu and macOS, as Jax does not currently support Windows.

Upgraded Tutorials

Substantial quality improvements have been made to DeepChem's tutorials, as they are a core part of making DeepChem usable. Existing tutorials were improved, and new tutorials were added to cover more aspects and use cases of DeepChem. All tutorials can be run in Google Colab for learning purposes. Some of the new tutorials are:

Also, the entire set of tutorials has been re-organized by use case: Molecular Machine Learning, Modeling Proteins, Protein-Ligand Modeling, Materials Science, Quantum Chemistry, Bioinformatics, and Physics-Informed Neural Networks. This time, we also have a couple of DeepChem YouTube tutorials complementing the existing set of interactive Jupyter Notebook tutorials.

Documentation

DeepChem 2.6.0 improves its API reference documentation with more usage examples and explanations wherever required. Documentation has also been expanded to cover infrastructure aspects such as making a release, CI, running test suites, and getting started with contributing to DeepChem.

Datasets and DataLoader Improvements

As part of the MoleculeNet suite of datasets, we have added the FreeSolv dataset (#2576) and the USPTO dataset (#2546). Improvements have been made to our deepchem.data.DataLoader classes to handle a wider variety of data, such as .zip files (#2446).
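To illustrate the kind of input handling this enables, here is a minimal standard-library sketch of pulling CSV records out of a .zip archive. This is only an illustration of the general pattern, not DeepChem's actual DataLoader code; the function name is hypothetical.

```python
import csv
import io
import zipfile

def rows_from_zip(zip_path):
    """Read all rows from every .csv file inside a zip archive.

    Hypothetical helper sketching the pattern behind loading zipped
    data; not DeepChem's implementation.
    """
    rows = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                with zf.open(name) as f:
                    # zf.open yields bytes; wrap for text-mode CSV parsing
                    reader = csv.DictReader(io.TextIOWrapper(f, encoding="utf-8"))
                    rows.extend(reader)
    return rows
```

In practice you would point a DataLoader at the archive and let it handle featurization as well; the sketch only shows the file-handling step.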

Improvements to Featurizers

Featurizers are key strengths of DeepChem. In this release, we have added the following new featurizers:

  • DeepChem now supports a DummyFeaturizer via the dc.feat.DummyFeaturizer API. This featurizer simply returns a datapoint without performing any featurization on it, which comes in handy when working with datasets that do not require featurization.
  • Addition of the PAGTN featurizer (dc.feat.PagtnMolGraphFeaturizer), which can be used with the PAGTN graph network for molecules
  • Addition of RobertaFeaturizer for transformer models
  • Addition of BertFeaturizer for transformer models
  • Addition of RxnFeaturizer for chemical reaction models
  • OneHotFeaturizer (dc.feat.OneHotFeaturizer) can now encode any arbitrary string or molecule as a one-hot array, which is useful in a wide variety of applications such as protein modeling. A transformer has also been added to split chemical reaction SMILES into the source and target strings required for machine translation tasks (#2597).
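To make the one-hot encoding idea concrete, here is a tiny pure-Python sketch of encoding a string character by character over a fixed charset. This is an illustration of the concept behind dc.feat.OneHotFeaturizer, not DeepChem's actual implementation, and the charset shown is an arbitrary example.

```python
# Sketch of one-hot encoding an arbitrary string over a fixed character set.
# Illustrative only; not DeepChem's OneHotFeaturizer implementation.

def one_hot_encode(s, charset):
    """Encode string `s` as a list of one-hot vectors over `charset`."""
    index = {ch: i for i, ch in enumerate(charset)}
    encoded = []
    for ch in s:
        vec = [0] * len(charset)  # one slot per charset entry
        vec[index[ch]] = 1        # mark the position of this character
        encoded.append(vec)
    return encoded

smiles = "CCO"  # ethanol, as a SMILES string
charset = ["C", "O", "N", "(", ")", "="]  # example charset
encoding = one_hot_encode(smiles, charset)
```

The real featurizer additionally handles unknown characters, padding to a fixed length, and molecule inputs, but the core mapping from characters to one-hot vectors is the same.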

Models and Layers

DeepChem has added a number of new models:

  • Pagtn model: Graph property prediction (dc.models.PagtnModel)
  • MolGAN model: A generative model for small molecular graphs
  • PINNModel: Partial differential equation solvers
  • Molecular Attention Transformer: Molecule property prediction

And new layers like:

  • A linear layer in Jax (#2634)
  • ScaleNorm
  • MATEncoderLayer
  • MultiHeadedMATAttention
  • MATEmbedding
  • MATGenerator
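The new linear layer in Jax (#2634) computes the familiar affine transform y = xW + b. Here is a NumPy sketch of that same computation, shown only to illustrate what such a layer does; DeepChem's Jax layer operates on Jax arrays and this is not its actual code.

```python
import numpy as np

# Illustrative sketch of what a linear (dense) layer computes: y = x @ W + b.
# Not DeepChem's Jax implementation, which works on Jax arrays.

rng = np.random.default_rng(0)

def linear(x, W, b):
    """Apply a linear layer: project `x` with weights `W`, then add bias `b`."""
    return x @ W + b

x = rng.normal(size=(4, 3))   # batch of 4 inputs, 3 features each
W = rng.normal(size=(3, 2))   # project 3 features down to 2
b = np.zeros(2)
y = linear(x, W, b)           # result has shape (4, 2)
```

The MAT layers above compose operations like this with attention and normalization to build the Molecular Attention Transformer.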

DeepChem is moving towards supporting fully differentiable layers (link). Support for NumPy 1.22 will enable us to add generic layers in future releases.

Minor improvements

This release also includes many minor improvements, such as fixes for minor CI errors and for saving and loading of models.

  • Utility functions have been added, such as helpers for using Graph Conv models and for finding the shortest path between atoms in a molecule.
  • New loss functions, such as Huber loss and Squared Hinge loss, have been added.
  • Optimizers such as AdamW (Adam with weight decay) and a sparse Adam optimizer have been added to give users a wider choice when training models.
  • Hyperparameter optimization methods have been improved and made consistent across the different hyperparameter optimization techniques.
  • Docking: Using Vina has been quite challenging in the past. The AutoDock Vina team has released a Python API, which has been integrated with DeepChem for docking use cases (#2741).
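As a concrete reference for one of the new loss functions, here is the standard definition of the Huber loss in pure Python: quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than squared error. This is the textbook formula, not DeepChem's implementation.

```python
# Standard Huber loss for a single prediction (textbook definition,
# not DeepChem's implementation).

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    err = abs(y_true - y_pred)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)
```

With delta=1.0, an error of 0.5 gives 0.125 (quadratic regime), while an error of 2.0 gives 1.5 (linear regime), rather than the 2.0 that squared error would produce.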

Improved Logging and Error Messages

DeepChem now produces better error messages and improved logging for failing cases and invalid inputs. A Weights & Biases logger (#2520) has been integrated with DeepChem to log training loss and validation metrics in addition to TensorBoard. Logging for hyperparameter optimizers has been improved to log all tested models.

Pull Request list:

Here is a full list of Pull Requests (PRs) describing the changes to the deepchem repository.

Dataset and DataLoader Improvements

Improvements to Testing

Changes in CI

Models

New Models, Layers and Modules

Jax Models

Existing Model Updates

Error Messages and Logging

Tutorial Updates

Updates to existing tutorials:

New Tutorials:

Minor Improvements

Dependency and Setup Fixes

Most of these fixes relate to version bumping, pinning, CI improvements, and installation issues.

Documentation improvements

Featurizers and Transformers

New Featurizers

Improvements