DeepChem 2.7.0 Release Notes

This post was coauthored with Bharath Ramsundar.

We are excited to announce the release of DeepChem 2.7.0. This release adds multi-GPU support, adds scientifically important models including DMPNNs (as in Chemprop), and ports several layers to PyTorch for long-term stability. We have also added several new scientific tutorials and improved support for bioinformatics applications.

Highlights

  • DeepChem adds support for new models, including DMPNN and MEGNet
  • NormalizingFlow models have been ported to PyTorch
  • Added support for multi-GPU training via PyTorch Lightning
  • Utilities to run HHsearch multiple sequence alignment searches on a dataset
  • Several layers have been ported to PyTorch

Porting Models to PyTorch

  • The following models and layers have been ported to PyTorch: GRU, InterAtomicL2Distance, WeightedLinearCombo, CombineMeanStd, AtomicConvolution, NeighborList, CNN, and LSTMStep
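To give a flavor of what one of these ported layers computes, here is a pure-Python sketch of the reparameterization-style sampling that a layer like CombineMeanStd performs: it combines a mean tensor and a standard-deviation tensor by sampling mean + std * eps with Gaussian noise during training, and returns the mean at inference. This is an illustrative sketch, not DeepChem's implementation, which operates on PyTorch tensors.

```python
import random

def combine_mean_std(mean, std, training=True):
    """Sample mean + std * eps with eps ~ N(0, 1) during training;
    return the mean unchanged at inference time."""
    if not training:
        return list(mean)
    return [m + s * random.gauss(0.0, 1.0) for m, s in zip(mean, std)]

# At inference the layer is deterministic:
print(combine_mean_std([1.0, 2.0], [0.1, 0.2], training=False))  # [1.0, 2.0]
```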

New Features

  • Fake graph data generator for producing random graphs
  • FASTQ loader for loading biological sequence data
  • Added a top_k_accuracy_score metric for evaluating model performance
  • Extraction of molecular coordinates from the QM9 dataset
  • Support for random hyperparameter tuning
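For readers unfamiliar with the metric: top-k accuracy counts a prediction as correct when the true class appears among the k highest-scoring classes. The sketch below is a minimal pure-Python illustration of that definition, not DeepChem's implementation.

```python
def top_k_accuracy(y_true, y_scores, k=2):
    """Fraction of samples whose true label is among the k classes
    with the highest predicted scores."""
    hits = 0
    for label, scores in zip(y_true, y_scores):
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(y_true)

y_true = [0, 1, 2]
y_scores = [[0.5, 0.3, 0.2],    # true class 0 ranked 1st -> hit
            [0.4, 0.35, 0.25],  # true class 1 ranked 2nd -> hit at k=2
            [0.6, 0.3, 0.1]]    # true class 2 ranked 3rd -> miss at k=2
print(top_k_accuracy(y_true, y_scores, k=2))  # 2/3
```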

Featurizers

  • DMPNN featurizer
  • Sparse-matrix one-hot featurizer
  • Position Frequency Matrix Featurizer, which computes a position frequency matrix for each alignment in a list of multiple sequence alignments
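A position frequency matrix simply counts, for each column of an alignment, how often each symbol occurs. A minimal sketch of that computation (the function name and alphabet here are illustrative; the actual featurizer works on DeepChem's data structures):

```python
from collections import Counter

def position_frequency_matrix(alignment, alphabet="ACGT"):
    """Count occurrences of each alphabet symbol at each column of a
    multiple sequence alignment (sequences must have equal length)."""
    length = len(alignment[0])
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        matrix.append([counts.get(sym, 0) for sym in alphabet])
    return matrix  # shape: (alignment length, alphabet size)

pfm = position_frequency_matrix(["ACGT", "ACGA", "TCGT"])
print(pfm[0])  # column 0 holds two 'A's and one 'T' -> [2, 0, 0, 1]
```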

New Layers

  • MEGNet Layer

Deprecations

  • dc.evaluate.utils.relative_difference has been deprecated; a deprecation warning now recommends using math.isclose, np.isclose, or np.allclose instead
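Code that relied on relative_difference can typically switch to the standard library's math.isclose (or NumPy's np.isclose/np.allclose for arrays), passing an explicit relative tolerance:

```python
import math

# Before (deprecated): relative_difference(a, b) < tolerance
# After: an explicit relative-tolerance comparison from the stdlib
a, b = 1.0000001, 1.0
print(math.isclose(a, b, rel_tol=1e-5))  # True
print(math.isclose(a, b, rel_tol=1e-9))  # False
```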

Examples and Tutorials

  • Using the Hydra config system with PyTorch Lightning
  • New tutorials have been added covering DeepQMC, scVI and Scanpy, HierVAE, MolGAN, hyperparameter optimization, neural ODEs, Gaussian processes, PyTorch Lightning, training a normalizing flow on QM9, and GROVER.

Documentation

  • Documentation has been improved with broader examples, instructions for using DeepChem with Docker, and model cheat sheets.
  • Citations have been added to some of the tutorials to make them citable.

Improvements

  • Speedups in the atomic convolution model
  • Utilities in DeepChem's DiskDataset to convert it to a CSV file
  • Added file storage of validation and train scores during hyperparameter optimization.
  • Modified GraphData to support kwargs for storing additional attributes
  • Made it possible to run DeepChem in offline mode by removing the default download call from CGCNN

Refactors

  • Mol2vec fingerprints now use methods directly from the gensim library rather than the mol2vec sub-package.

Bug Fixes

  • Fixed retrieval of a DiskDataset's shape when task names are not specified
  • Improved k-fold splitting when the number of data points is not exactly divisible by k
  • Fixed a bug in the SmilesToSeq featurizer when the padding length is 0
  • Fixed a bug in which LogTransformer failed on data without an explicit task dimension
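The k-fold fix concerns how leftover samples are handled when n is not a multiple of k. A common approach, shown here as an illustrative sketch rather than DeepChem's exact implementation, gives the first n mod k folds one extra sample each so that every sample lands in exactly one fold:

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at
    most one, covering every index exactly once."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

print([len(f) for f in k_fold_indices(10, 3)])  # [4, 3, 3]
```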

Maintenance

  • Added type hints.
  • Reduced CI pipeline run time.

List of Pull Requests

DMPNN Featurizer

New Models

Normalizing Flow Model

Pytorch Lightning

Porting Models to PyTorch

Tutorials

Documentation

Improvements over existing features

Bug Fixes and Patches

Maintenance
