This post was coauthored with Bharath Ramsundar.
We are excited to announce the release of DeepChem 2.7.0. This release adds multi-GPU support, adds support for scientifically important models including DMPNNs (as in Chemprop), and ports several layers to PyTorch for long-term stability. We have also added several new scientific tutorials and improved support for bioinformatics applications.
Highlights
- DeepChem adds support for new models, including DMPNN and MEGNet
- We have ported NormalizingFlows to PyTorch
- Added support for multi-GPU training via PyTorch Lightning
- Added utilities to run an hhsearch multiple sequence alignment search on a dataset
- We have ported several layers to PyTorch
Porting Models to PyTorch
- The following models/layers have been ported to PyTorch: GRU, InterAtomicL2Distance, WeightedLinearCombo, CombineMeanStd, AtomicConvolution, NeighborList, CNN, LSTMStep
New Features
- Fake graph data generator to generate random graphs
- FASTQ loader to load biological sequence data
- Added top_k_accuracy_score metric for evaluating model performance
- Extracting molecular coordinates from the QM9 dataset
- Support for random hyperparameter search
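To make the top-k accuracy metric concrete, here is a minimal standalone sketch written with NumPy. It is not DeepChem's implementation, and the function name `top_k_accuracy` and the sample data are assumptions made for illustration. A prediction counts as correct when the true class is among the k highest-scored classes.

```python
import numpy as np


def top_k_accuracy(y_true, y_score, k=2):
    """Fraction of samples whose true label is among the k highest-scored classes.

    Hypothetical helper for illustration only.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Column indices of the k largest scores in each row.
    top_k = np.argsort(y_score, axis=1)[:, -k:]
    # A sample is a hit if its true label appears anywhere in its top-k set.
    hits = (top_k == y_true[:, None]).any(axis=1)
    return float(hits.mean())


# Illustrative data: 4 samples, 3 classes.
y_true = [0, 1, 2, 2]
y_score = [
    [0.5, 0.2, 0.2],
    [0.3, 0.4, 0.2],
    [0.2, 0.4, 0.3],
    [0.7, 0.2, 0.1],
]

print(top_k_accuracy(y_true, y_score, k=1))  # 0.5
print(top_k_accuracy(y_true, y_score, k=2))  # 0.75
```

With k=1 this reduces to ordinary accuracy; raising k relaxes the criterion, which can be useful when several classes are plausible for a sample.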
Featurizers
- DMPNN Featurizer
- Sparse matrix one hot featurizer
- Position Frequency Matrix Featurizer, which takes a list of multiple sequence alignments and returns a list of position frequency matrices
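For readers unfamiliar with position frequency matrices, the idea can be sketched in plain Python as below. This is an illustration of the concept only, not the DeepChem featurizer: the function name `position_frequency_matrix`, the DNA alphabet, and the toy alignment are all assumptions.

```python
def position_frequency_matrix(alignment, alphabet="ACGT"):
    """Count how often each symbol occurs at each column of an alignment.

    Returns one row per alphabet symbol, each holding one count per column.
    Hypothetical helper for illustration only.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences in an alignment must have the same length")
    row_of = {symbol: row for row, symbol in enumerate(alphabet)}
    pfm = [[0] * length for _ in alphabet]
    for seq in alignment:
        for col, symbol in enumerate(seq):
            if symbol in row_of:  # gaps and unknown symbols are skipped
                pfm[row_of[symbol]][col] += 1
    return pfm


# A toy alignment of three sequences, four columns each.
msa = ["ACGT", "ACGA", "TCGT"]
pfm = position_frequency_matrix(msa)
for symbol, counts in zip("ACGT", pfm):
    print(symbol, counts)
# A [2, 0, 0, 1]
# C [0, 3, 0, 0]
# G [0, 0, 3, 0]
# T [1, 0, 0, 2]
```

Applying the same function to each alignment in a list yields a list of matrices, which matches the list-in, list-out behavior described above.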
New Layers
- MEGNet Layer
Deprecations
- dc.utils.evaluate.relative_difference is being deprecated. A deprecation warning now directs users to math.isclose, np.isclose, or np.allclose instead.
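For code that needs to move off the deprecated helper, a comparison using the suggested replacements might look like the sketch below. The tolerances and sample values are illustrative assumptions, not values taken from DeepChem.

```python
import math

import numpy as np

# Hypothetical values for illustration.
expected = 1.0
actual = 1.0 + 1e-9

# Scalars: math.isclose compares with a relative tolerance (rel_tol).
scalar_ok = math.isclose(actual, expected, rel_tol=1e-6)

# Arrays: np.isclose returns an elementwise boolean array,
# while np.allclose reduces it to a single bool.
x = np.array([1.0, 2.0, 3.0])
y = x * (1.0 + 1e-8)
elementwise_ok = np.isclose(x, y, rtol=1e-6, atol=0.0)
all_ok = np.allclose(x, y, rtol=1e-6, atol=0.0)

print(scalar_ok, elementwise_ok, all_ok)
```

Note that the NumPy functions use a nonzero default `atol`, which can mask differences between values near zero, so it is worth setting both tolerances explicitly.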
Examples and Tutorials
- Using the Hydra config system with PyTorch Lightning
- New tutorials have been added for DeepQMC, SCVI and ScanPy, HierVAE, molGAN, hyperparameter optimization, neural ODEs, Gaussian processes, PyTorch Lightning, training a normalizing flow on QM9, and GROVER.
Documentation
- Documentation has been improved with a wider range of examples, a guide to using DeepChem with Docker, and model cheat sheets.
- Citations have been added to some of the tutorials to make them citable.
Improvements
- Sped up the atomic convolution model
- Added a utility to DiskDataset to convert it to a CSV file
- Added file storage of validation and train scores during hyperparameter optimization.
- Modified GraphData to support kwargs for storing additional attributes
- Made it possible to run DeepChem in offline mode by removing default download call from CGCNN
Refactors
- Mol2vec_fingerprints now uses a method from the gensim library directly rather than the mol2vec sub-package.
Bug Fixes
- Fixed retrieving the shape of a disk dataset when task names are not specified
- Improved the k-fold split when the number of data points is not exactly divisible by k
- Fix a bug in SmilesToSeq featurizer when the padding length is 0.
- A bug in which LogTransformer fails on data without an explicit task dimension has been fixed.
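The zero-padding fix is a reminder of a general Python slicing pitfall. The sketch below is an assumption about the kind of bug involved, inferred from the pull request title about subtracting padding from the list length; it is not the actual SmilesToSeq code, and both function names are hypothetical.

```python
def strip_padding_buggy(tokens, pad_len):
    # Looks right, but -0 == 0: with pad_len == 0 the slice becomes
    # tokens[0:0], which is an empty list.
    return tokens[pad_len:-pad_len]


def strip_padding(tokens, pad_len):
    # Subtracting the padding from the list length gives an explicit
    # end index that also works when pad_len == 0.
    return tokens[pad_len:len(tokens) - pad_len]


padded = ["<pad>", "C", "C", "O", "<pad>"]
unpadded = ["C", "C", "O"]

print(strip_padding(padded, 1))          # ['C', 'C', 'O']
print(strip_padding_buggy(unpadded, 0))  # []
print(strip_padding(unpadded, 0))        # ['C', 'C', 'O']
```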
Maintenance
- Added type hints
- Reduced the run time of the CI pipeline
List of Pull Requests
- Adding electron input streams for Ferminet in https://github.com/deepchem/deepchem/pull/2997
- Adding Metropolis Hasting sampler in https://github.com/deepchem/deepchem/pull/2935
- Position Frequency Matrix Featurizer in https://github.com/deepchem/deepchem/pull/2896
- NormalizingFlow class and Affine transformation created using Pytorch in https://github.com/deepchem/deepchem/pull/2918
- Add random hyperparameter search in https://github.com/deepchem/deepchem/pull/2897
DMPNN Featurizer
- implementation of batching for DMPNN model in https://github.com/deepchem/deepchem/pull/3040
- modify RDKitDescriptors class for normalized features in https://github.com/deepchem/deepchem/pull/2983
- atom features function and helper functions for DMPNN Featurizer in https://github.com/deepchem/deepchem/pull/2929
- add bond_features and reaction mapping with suitable tests for DMPNN in https://github.com/deepchem/deepchem/pull/2942
- added _MapperDMPNN class and suitable tests in https://github.com/deepchem/deepchem/pull/2962
- add global feature generator and suitable unit tests in https://github.com/deepchem/deepchem/pull/2971
- add count-based morgan fingerprint featurizer and suitable unit tests in https://github.com/deepchem/deepchem/pull/2980
- add DMPNN featurizer class and suitable unit tests in https://github.com/deepchem/deepchem/pull/2995
- add new global feature generators and units tests for DMPNN featurizer in https://github.com/deepchem/deepchem/pull/3005
- add dmpnn encoder layer and suitable unit test in https://github.com/deepchem/deepchem/pull/3023
- add dmpnn class and suitable unit tests in https://github.com/deepchem/deepchem/pull/3028
- modify PositionwiseFeedForward class and add unit tests in https://github.com/deepchem/deepchem/pull/3009
- add mapper class for dmpnn model and suitable unit tests in https://github.com/deepchem/deepchem/pull/3001
- add torch model wrapper for DMPNN model class in https://github.com/deepchem/deepchem/pull/3034
New Models
- Added Graph Networks in https://github.com/deepchem/deepchem/pull/2843
- Adding batch processing to GraphNet layer in https://github.com/deepchem/deepchem/pull/2874
- MEGNet layer implementation in https://github.com/deepchem/deepchem/pull/2837
Normalizing Flow Model
- Normalizing Flow Torch Model in https://github.com/deepchem/deepchem/pull/2944
- Adding KFAC in https://github.com/deepchem/deepchem/pull/2972
- Real NVP Transformation Layer for Normalizing Flows in https://github.com/deepchem/deepchem/pull/2996
Pytorch Lightning
- implement dc lightning dataset module in https://github.com/deepchem/deepchem/pull/2993
- update unit tests with DC Lightning dataset module in https://github.com/deepchem/deepchem/pull/2994
- Added pytorch-lightning in mac requirements in https://github.com/deepchem/deepchem/pull/3008
- GCNModel benchmark script for gpu in https://github.com/deepchem/deepchem/pull/3016
- Update pytorch-lightning version in https://github.com/deepchem/deepchem/pull/3042
- implement DCLightningModule in https://github.com/deepchem/deepchem/pull/2945
Porting Models to PyTorch
- Ported Tensorflow GRU to PyTorch in https://github.com/deepchem/deepchem/pull/3076
- Added torch equivalent of InterAtomicL2Distances in torch_layers.py + YAPF changes in https://github.com/deepchem/deepchem/pull/2934
- Port of WeightedLinearCombo from Keras to Torch in https://github.com/deepchem/deepchem/pull/3022
- Port of CombineMeanStd from Keras to Torch in https://github.com/deepchem/deepchem/pull/3021
- Port of AtomicConvolution to Torch in https://github.com/deepchem/deepchem/pull/3026
- Port of NeighborList from Keras to Torch in https://github.com/deepchem/deepchem/pull/3020
- Ported Tensorflow LSTMStep to PyTorch in https://github.com/deepchem/deepchem/pull/3072
- Porting CNN, TF to PyTorch in https://github.com/deepchem/deepchem/pull/2963
Tutorials
- Adding DeepQMC tutorial in https://github.com/deepchem/deepchem/pull/2914
- Scvi tools tutorial in https://github.com/deepchem/deepchem/pull/3025
- ScanPy tutorial in https://github.com/deepchem/deepchem/pull/3018
- added HierVAE tutorial in https://github.com/deepchem/deepchem/pull/2904
- adding Docker tutorial in https://github.com/deepchem/deepchem/pull/2814
- added molGAN tutorial in https://github.com/deepchem/deepchem/pull/2773
- adding hyperopt tutorials in https://github.com/deepchem/deepchem/pull/2851
- added neural ode tutorial in https://github.com/deepchem/deepchem/pull/2859
- Fresh gaussian process tutorial in https://github.com/deepchem/deepchem/pull/2864
- deepchem pytorch lightning tutorial in https://github.com/deepchem/deepchem/pull/2826
- Update Training_a_Normalizing_Flow_on_QM9.ipynb in https://github.com/deepchem/deepchem/pull/2885
- Adding a tutorial for GROVER. in https://github.com/deepchem/deepchem/pull/2901
Documentation
- Update models.rst in https://github.com/deepchem/deepchem/pull/2828
- Update to Documentation for Using DeepChem in Jupyter Notebook in https://github.com/deepchem/deepchem/pull/2856
- Update tutorials.rst in https://github.com/deepchem/deepchem/pull/2902
- add tutorial citations in https://github.com/deepchem/deepchem/pull/2921, https://github.com/deepchem/deepchem/pull/2922, https://github.com/deepchem/deepchem/pull/2925, https://github.com/deepchem/deepchem/pull/2924
Improvements over existing features
- Best score callback in https://github.com/deepchem/deepchem/pull/2866
- Speed up AtomicConv model, improvements to AtomicConv tutorial in https://github.com/deepchem/deepchem/pull/2888
- Adding to_csv method to DiskDataset in https://github.com/deepchem/deepchem/pull/3007
- Assigning task names when they are not specified in https://github.com/deepchem/deepchem/pull/3047
- Extracting molecular coordinates for QM9 dataset from sdf files in https://github.com/deepchem/deepchem/pull/2903
- added file output of validation and train scores in https://github.com/deepchem/deepchem/pull/3073
Bug Fixes and Patches
- Improvements to GraphData in https://github.com/deepchem/deepchem/pull/2860
- Resolve the bug issue of loading the .sdf files in https://github.com/deepchem/deepchem/pull/2795
- Removing mol2vec dependency in https://github.com/deepchem/deepchem/pull/3052
- Fixes to k-fold fingerprint splitting in https://github.com/deepchem/deepchem/pull/3038
- adding batch size argument to lightning module in https://github.com/deepchem/deepchem/pull/3106
- Fix issue #3057 (update _Mapper class for dmpnn) in https://github.com/deepchem/deepchem/pull/3058
- Fix broken typehint in SparseMatrixOneHotFeaturizer.untransform in https://github.com/deepchem/deepchem/pull/3080
- Substract padding from list length when slicing in https://github.com/deepchem/deepchem/pull/3079
Maintenance
- Removing package pins in https://github.com/deepchem/deepchem/pull/2783, https://github.com/deepchem/deepchem/pull/2873
- adding jax dependencies in https://github.com/deepchem/deepchem/pull/2877
- Fixing Colab Links in https://github.com/deepchem/deepchem/pull/2883
- Fixing broken CI on windows - Jax and Vina in https://github.com/deepchem/deepchem/pull/2886
- Fix to log transformer in https://github.com/deepchem/deepchem/pull/2887
- Minor Patch in https://github.com/deepchem/deepchem/pull/3101
- Minor patches in https://github.com/deepchem/deepchem/pull/3105
- Removing some obsolete code in https://github.com/deepchem/deepchem/pull/2855
- Adding FutureWarning to depreciate deepchem.utils.evaluate.relative_difference in https://github.com/deepchem/deepchem/pull/2909
- Module dl dependancies in https://github.com/deepchem/deepchem/pull/2908 - adds patches
- Improving CI in https://github.com/deepchem/deepchem/pull/3056
- Update init.py in https://github.com/deepchem/deepchem/pull/3063
- Adding CI tests for python 3.10 in https://github.com/deepchem/deepchem/pull/2846
- updated mdtraj requirement in https://github.com/deepchem/deepchem/pull/3112
- Correctly import lightning to avoid import errors in https://github.com/deepchem/deepchem/pull/3065
- Refactoring CI in https://github.com/deepchem/deepchem/pull/3055
- pinning scipy to fix torch build in ci in https://github.com/deepchem/deepchem/pull/3051
- Using github actions v3 in workflows in https://github.com/deepchem/deepchem/pull/3053
- Fixing CI errors in https://github.com/deepchem/deepchem/pull/2931
- first steps in fixing docker build in https://github.com/deepchem/deepchem/pull/2949
- python v3.8 in readthedocs.yml in https://github.com/deepchem/deepchem/pull/2984
- Pinning sphinx version in docs/requirements.txt in https://github.com/deepchem/deepchem/pull/2985
- Update torchvision version requirement in https://github.com/deepchem/deepchem/pull/2916