This post was co-authored with Bharath Ramsundar.
We are excited to announce the release of DeepChem 2.6.0! This release brings a number of usability improvements, including better documentation, new and re-organized tutorials, and new features and models. Improved test coverage also makes DeepChem 2.6.0 more production-ready than previous releases.
The release contains contributions from many people. We would like to thank our amazing community of developers, users and well-wishers who have made it possible.
Detailed Overview of Changes
Improved Production Readiness
One of the major improvements in 2.6.0 is a wide increase in test coverage. DeepChem's CI now runs a broad range of tests across all components, including many slow tests that were previously skipped on CI. Separate CI workflows were created for individual components such as the TensorFlow, Jax and PyTorch models, as well as for other checks like docstrings. All CI workflows pass on Linux, making this release stable for production use with increased robustness. Overall code health has also improved through stricter adherence to tooling conventions such as mypy, flake8 and yapf. DeepChem 2.6.0 supports Python 3.7–3.9 on Ubuntu and macOS. On Windows, DeepChem is stable on Python 3.7.
This release upgrades DeepChem's backend to support TensorFlow 2.7 and PyTorch 1.9.0. The current version is also compatible with the latest NumPy 1.22, paving the way toward a more unified array API (ref).
Support for Jax Models
DeepChem 2.6.0 adds Jax as a machine learning backend, with support for Jax models via the dc.models.jax_models API. Jax support is limited to Ubuntu and macOS, as Jax does not currently support Windows.
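Jax encourages a purely functional style in which model parameters are plain data and a training step is a pure function of those parameters. The sketch below illustrates that pattern in plain NumPy with hand-written gradients (in Jax itself, jax.grad would derive them automatically). This is an illustration of the style only, not the dc.models.jax_models API:

```python
import numpy as np

def init_params(rng, n_features):
    # Parameters are plain data, as in Jax's functional style
    return {"w": rng.normal(size=(n_features,)) * 0.01, "b": 0.0}

def predict(params, X):
    return X @ params["w"] + params["b"]

def train_step(params, X, y, lr=0.1):
    # Gradients of mean squared error, written out by hand here;
    # Jax would derive these automatically with jax.grad
    err = predict(params, X) - y
    grad_w = 2.0 * X.T @ err / len(y)
    grad_b = 2.0 * err.mean()
    # A pure function: returns new parameters instead of mutating
    return {"w": params["w"] - lr * grad_w, "b": params["b"] - lr * grad_b}

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.3

params = init_params(rng, 3)
for _ in range(500):
    params = train_step(params, X, y)
```

Because every step is a pure function, this style composes naturally with Jax transformations such as jit and vmap.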
Improved Tutorials
The tutorials are a core part of DeepChem's usability, and this release brings many quality improvements to them. Existing tutorials were improved and new ones were added to cover more aspects and use cases of DeepChem. All tutorials can be run in Google Colab for learning purposes. Some of the new tutorials are:
- Introduction to Material Science
- Protein Deep Learning
- Physics Informed Neural Networks
- Introduction to Molecular Attention Transformer
- Distributed Multi-GPU training of DeepChem Models with LitMatter
- Multisequence Alignments
Also, the entire set of tutorials has been re-organized by DeepChem use case: Molecular Machine Learning, Modeling Proteins, Protein-Ligand Modeling, Material Science, Quantum Chemistry, Bioinformatics and Physics Informed Neural Networks. This time, we also have a couple of DeepChem YouTube tutorials complementing the existing set of interactive Jupyter notebook tutorials.
Improved Documentation
DeepChem 2.6.0 improves the API reference documentation with more usage examples and explanations wherever required. Documentation now also covers infrastructure topics such as making a release, CI, running the test suites and getting started with contributing to DeepChem.
Datasets and DataLoader Improvements
As part of the MoleculeNet suite of datasets, we have added the FreeSolv dataset (#2576) and the USPTO dataset (#2546). Improvements have been made to our deepchem.data.DataLoader classes to handle a wider variety of data, such as .zip files (#2446).
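Conceptually, loading data bundled in a .zip archive means iterating over the files inside it and parsing each one before featurization. A minimal standard-library sketch of that idea (an illustration only, not the DataLoader internals; the file names and CSV contents are made up):

```python
import io
import zipfile

# Build a small in-memory .zip with two CSV shards (illustration only)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("shard_0.csv", "smiles,label\nCCO,1\n")
    zf.writestr("shard_1.csv", "smiles,label\nCCN,0\n")

# Iterate every CSV inside the archive and collect its rows,
# roughly the work a zip-aware loader must do before featurization
rows = []
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    for name in zf.namelist():
        if name.endswith(".csv"):
            text = zf.read(name).decode()
            header, *records = text.strip().splitlines()
            rows.extend(r.split(",") for r in records)
```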
Improvements to Featurizers
Featurizers are key strengths of DeepChem. In this release, we have added the following new featurizers:
- DeepChem now supports a DummyFeaturizer via the
dc.feat.DummyFeaturizer API. This featurizer simply returns a datapoint without performing any featurization on it. It comes in handy when working with datasets that do not require featurization.
- Addition of the PAGTN featurizer, which can be used with the PAGTN graph network for molecules
- Addition of RobertaFeaturizer for transformer models
- Addition of BertFeaturizer for transformer models
- Addition of RxnFeaturizer for chemical reaction models
- OneHotFeaturizer (
dc.feat.OneHotFeaturizer) can now be used to encode any arbitrary string or molecule as a one-hot array. This can be very useful for a wide variety of applications such as protein modeling.
- A transformer has been added that splits chemical reaction SMILES into the source and target strings required for machine translation tasks (#2597).
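The core idea behind one-hot string featurization can be sketched in a few lines of NumPy. This is a simplified illustration, not the dc.feat.OneHotFeaturizer implementation, which also handles things like padding to a maximum length and an unknown-character slot:

```python
import numpy as np

def one_hot_encode(s, charset):
    # One row per character of the string, one column per charset entry
    index = {c: i for i, c in enumerate(charset)}
    out = np.zeros((len(s), len(charset)))
    for row, ch in enumerate(s):
        out[row, index[ch]] = 1.0
    return out

# Encode the SMILES string "CCO" (ethanol) over a toy charset
encoded = one_hot_encode("CCO", ["C", "N", "O"])
```

The same scheme works for protein sequences by swapping the charset for the amino-acid alphabet.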
Models and Layers
DeepChem has added a number of new models:
- Pagtn model: Graph property prediction (dc.models.PagtnModel)
- MolGAN model: A generative model for small molecular graphs
- PINNModel: Partial differential equation solvers
- Molecular Attention Transformer: Molecule property prediction
And new layers like:
- A linear layer in Jax (#2634)
DeepChem is moving towards supporting fully differentiable layers (link). Support for NumPy 1.22 will enable us to add generic layers in future releases.
DeepChem also received many minor improvements, such as fixes to small CI errors and to the saving and loading of models.
- Utility functions have been added, such as helpers for using graph convolution models and for finding the shortest path between atoms in a molecule.
- New loss functions such as Huber loss and squared hinge loss have been added.
- Optimizers such as AdamW (Adam with weight decay) and a sparse Adam optimizer have been added, giving users a wider choice for training models.
- Hyperparameter optimization methods have been improved and made consistent across the different hyperparameter optimization techniques.
- Docking: using Vina has been quite challenging in the past. The AutoDock Vina team released a Python API, which has now been integrated with DeepChem for docking-related use cases (#2741).
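For reference, the Huber loss mentioned above is quadratic for small residuals and linear beyond a threshold delta, which makes it less sensitive to outliers than plain squared error. A minimal NumPy version of the standard formula (an illustration, not DeepChem's loss implementation):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    # Quadratic for |residual| <= delta, linear (with matched
    # value and slope at delta) beyond it
    r = np.abs(y_true - y_pred)
    quadratic = 0.5 * r ** 2
    linear = delta * r - 0.5 * delta ** 2
    return np.where(r <= delta, quadratic, linear)

small = huber_loss(np.array([0.0]), np.array([0.5]))  # inside the quadratic zone
large = huber_loss(np.array([0.0]), np.array([2.0]))  # inside the linear zone
```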
Improved Logging and Error Messages
DeepChem now has better error messages and improved logging for failure cases and invalid inputs. A Weights & Biases logger (#2520) has been integrated with DeepChem to log training loss and validation metrics alongside TensorBoard. Logging for hyperparameter optimizers has been improved to log all the tested models.
Pull Request list:
Here is a full list of Pull Requests (PRs) describing the changes to the DeepChem repository.
Dataset and DataLoader Improvements
- https://github.com/deepchem/deepchem/pull/2565 (Improvements to FASTA Loader)
Improvements to Testing
- https://github.com/deepchem/deepchem/pull/2604 (tests for Jax Models)
Changes in CI
New Models, Layers and Modules
- https://github.com/deepchem/deepchem/pull/2426 (molGAN model)
- https://github.com/deepchem/deepchem/pull/2508 (Pagtn model)
- https://github.com/deepchem/deepchem/pull/2622 (Attention module for MAT)
- https://github.com/deepchem/deepchem/pull/2624 (MAT Embedding and Generator Layer)
- https://github.com/deepchem/deepchem/pull/2691 (MATModel integration with deepchem)
- https://github.com/deepchem/deepchem/pull/2658 (PINNModel)
Existing Model Updates
- https://github.com/deepchem/deepchem/pull/2559 (conversion of MultitaskRegressor and MultitaskClassifier to PyTorch)
Error Messages and Logging
- https://github.com/deepchem/deepchem/pull/2520 (WandB logger addition)
- https://github.com/deepchem/deepchem/pull/2725 (Improvements to GridHyperparameter logging)
Updates to existing tutorials:
- https://github.com/deepchem/deepchem/pull/2637 (tutorial re-organization)
Dependency and Setup Fixes
Most of these fixes relate to version bumping and pinning, CI improvements and installation fixes.
- https://github.com/deepchem/deepchem/pull/2560 (Jax setup)
Featurizers and Transformers
- https://github.com/deepchem/deepchem/pull/2544 (MATFeaturizer)
- https://github.com/deepchem/deepchem/pull/2570 (Dummy Featurizer)
- https://github.com/deepchem/deepchem/pull/2656 (RxnFeaturizer)
- https://github.com/deepchem/deepchem/pull/2597 (RxnSplitTransformer)
- https://github.com/deepchem/deepchem/pull/2523 (RobertaFeaturizer)
- https://github.com/deepchem/deepchem/pull/2642 (BertFeaturizer)