DeepChem 2.7.0 Release Notes

This post was coauthored with Bharath Ramsundar.

We are excited to announce the release of DeepChem 2.7.0. This release adds multi-GPU support, adds scientifically important models including DMPNNs (as in Chemprop), and ports several layers to PyTorch for long-term stability. We have also added several new scientific tutorials and improved support for bioinformatics applications.

Highlights

  • DeepChem adds support for new models, including DMPNN and MEGNet
  • NormalizingFlow models have been ported to PyTorch
  • Added support for multi-GPU training via PyTorch Lightning
  • Utilities to run HHsearch multiple sequence alignment searches on a dataset
  • Several layers have been ported to PyTorch

Porting Models to PyTorch

  • The following models and layers have been ported to PyTorch: GRU, InterAtomicL2Distance, WeightedLinearCombo, CombineMeanStd, AtomicConvolution, NeighborList, CNN, and LSTMStep
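To give a flavor of what one of these ported layers computes, here is a pure-Python sketch of the reparameterization-style sampling that a layer like CombineMeanStd performs: it combines a mean tensor and a standard-deviation tensor by sampling mean + std * eps with Gaussian noise during training, and returns the mean at inference. This is an illustrative sketch, not DeepChem's implementation, which operates on PyTorch tensors.

```python
import random

def combine_mean_std(mean, std, training=True):
    """Sample mean + std * eps with eps ~ N(0, 1) during training;
    return the mean unchanged at inference time."""
    if not training:
        return list(mean)
    return [m + s * random.gauss(0.0, 1.0) for m, s in zip(mean, std)]

# At inference the layer is deterministic:
print(combine_mean_std([1.0, 2.0], [0.1, 0.2], training=False))  # [1.0, 2.0]
```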

New Features

  • Fake graph data generator for producing random graphs
  • FASTQ loader for loading biological sequence data
  • Added a top_k_accuracy_score metric for evaluating model performance
  • Extraction of molecular coordinates from the QM9 dataset
  • Support for random hyperparameter tuning
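For readers unfamiliar with the metric: top-k accuracy counts a prediction as correct when the true class appears among the k highest-scoring classes. The sketch below is a minimal pure-Python illustration of that definition, not DeepChem's implementation.

```python
def top_k_accuracy(y_true, y_scores, k=2):
    """Fraction of samples whose true label is among the k classes
    with the highest predicted scores."""
    hits = 0
    for label, scores in zip(y_true, y_scores):
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(y_true)

y_true = [0, 1, 2]
y_scores = [[0.5, 0.3, 0.2],    # true class 0 ranked 1st -> hit
            [0.4, 0.35, 0.25],  # true class 1 ranked 2nd -> hit at k=2
            [0.6, 0.3, 0.1]]    # true class 2 ranked 3rd -> miss at k=2
print(top_k_accuracy(y_true, y_scores, k=2))  # 2/3
```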

Featurizers

  • DMPNN featurizer
  • Sparse-matrix one-hot featurizer
  • Position Frequency Matrix Featurizer, which computes a position frequency matrix for each alignment in a list of multiple sequence alignments
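A position frequency matrix simply counts, for each column of an alignment, how often each symbol occurs. A minimal sketch of that computation (the function name and alphabet here are illustrative; the actual featurizer works on DeepChem's data structures):

```python
from collections import Counter

def position_frequency_matrix(alignment, alphabet="ACGT"):
    """Count occurrences of each alphabet symbol at each column of a
    multiple sequence alignment (sequences must have equal length)."""
    length = len(alignment[0])
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        matrix.append([counts.get(sym, 0) for sym in alphabet])
    return matrix  # shape: (alignment length, alphabet size)

pfm = position_frequency_matrix(["ACGT", "ACGA", "TCGT"])
print(pfm[0])  # column 0 holds two 'A's and one 'T' -> [2, 0, 0, 1]
```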

New Layers

  • MEGNet Layer

Deprecations

  • dc.evaluate.utils.relative_difference has been deprecated; a deprecation warning now recommends using math.isclose, np.isclose, or np.allclose instead
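Code that relied on relative_difference can typically switch to the standard library's math.isclose (or NumPy's np.isclose/np.allclose for arrays), passing an explicit relative tolerance:

```python
import math

# Before (deprecated): relative_difference(a, b) < tolerance
# After: an explicit relative-tolerance comparison from the stdlib
a, b = 1.0000001, 1.0
print(math.isclose(a, b, rel_tol=1e-5))  # True
print(math.isclose(a, b, rel_tol=1e-9))  # False
```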

Examples and Tutorials

  • Using the Hydra config system with PyTorch Lightning
  • New tutorials have been added covering DeepQMC, scVI and Scanpy, HierVAE, MolGAN, hyperparameter optimization, neural ODEs, Gaussian processes, PyTorch Lightning, training a normalizing flow on QM9, and GROVER.

Documentation

  • Documentation has been improved with broader examples, instructions for using DeepChem with Docker, and model cheat sheets.
  • Citations have been added to some of the tutorials to make them citable.

Improvements

  • Speedups in the atomic convolution model
  • Utilities in DeepChem's DiskDataset to convert it to a CSV file
  • Added file storage of validation and train scores during hyperparameter optimization.
  • Modified GraphData to support kwargs for storing additional attributes
  • Made it possible to run DeepChem in offline mode by removing the default download call from CGCNN

Refactors

  • Mol2vec fingerprints now use methods directly from the gensim library rather than the mol2vec sub-package.

Bug Fixes

  • Fixed retrieval of a DiskDataset's shape when task names are not specified
  • Improved k-fold splitting when the number of data points is not exactly divisible by k
  • Fixed a bug in the SmilesToSeq featurizer when the padding length is 0
  • Fixed a bug in which LogTransformer failed on data without an explicit task dimension
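The k-fold fix concerns how leftover samples are handled when n is not a multiple of k. A common approach, shown here as an illustrative sketch rather than DeepChem's exact implementation, gives the first n mod k folds one extra sample each so that every sample lands in exactly one fold:

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at
    most one, covering every index exactly once."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

print([len(f) for f in k_fold_indices(10, 3)])  # [4, 3, 3]
```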

Maintenance

  • Added type hints.
  • Reduced CI pipeline run time.

List of Pull Requests

DMPNN Featurizer

New Models

Normalizing Flow Model

Pytorch Lightning

Porting Models to PyTorch

Tutorials

Documentation

Improvements over existing features

Bug Fixes and Patches

Maintenance
