DeepChem Papers and Discoveries List

DeepChem has been used in many publications at this point, but we don’t really have a well-curated list of papers/discoveries made with DeepChem. I propose that we use this thread to crowdsource a list of important papers/discoveries that were powered by DeepChem. This will help us eventually secure grants for DeepChem and perhaps even give us the grounds to add a Wikipedia page for the project. Here are a couple of older papers (and a rough template) to start us off:


Paper Title: MoleculeNet: a benchmark for molecular machine learning
Summary of DeepChem Usage: MoleculeNet is a benchmark collection that is tightly integrated with DeepChem and uses the DeepChem API to provide easy access to datasets.
Important Contributions: MoleculeNet was one of the first popular benchmarks for molecular machine learning and has been cited widely by the community.
Date Published: October 30th, 2017


Paper Title: Low data drug discovery with one-shot learning
Summary of DeepChem Usage: One-shot models were implemented in DeepChem. (Unfortunately, these models don’t work in today’s DeepChem, but I hope to fix this in one of the upcoming releases.)
Important Contributions: This paper demonstrated that low data techniques could be useful in some drug discovery applications
Date Published: April 3, 2017


I suspect there are a lot of papers out there, so please add your papers to this thread when you get a chance! I’ll continue adding to this thread as well.


Paper Title: AMPL: A Data-Driven Modeling Pipeline for Drug Discovery
Summary of DeepChem Usage: AMPL extends DeepChem into an end-to-end modeling pipeline for drug discovery teams. See https://github.com/ATOMconsortium/AMPL
Important Contributions: AMPL shows that DeepChem can be used in a large-scale pipeline for pharma drug discovery efforts.
Date Published: April 3, 2020



Paper Title: Is multitask deep learning practical for pharma?
Summary of DeepChem Usage: This paper used DeepChem to build multitask deep networks which were trained on obfuscated Merck internal data
Important Contributions: This paper demonstrated that DeepChem could achieve results similar to those of the Merck Kaggle contest, which kicked off the entire field of deep learning for drug discovery. It was also one of the first uses of DeepChem in a pharma collaboration project.
Date Published: July 10, 2017


Paper Title: Atomic convolutional networks for predicting protein-ligand binding affinity
Summary of DeepChem Usage: Atomic Convolutional Networks were implemented in DeepChem and are still supported in DeepChem. (Implementations are buggy but actively being fixed right now)
Important Contributions: One of the first models that used a structural inductive bias from physics to directly feed 3D structures into a deep network
Date Published: March 30, 2017 (arXiv)

Paper Title: Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches
Summary of DeepChem Usage: This paper implemented all its models in DeepChem
Important Contributions: Used DeepChem as part of a pharma drug discovery pipeline for BACE-1.
Date Published: September 30, 2016

Paper Title: Attempt to develop machine learning materials via Scratch (Scratchを経由する機械学習教材の開発の試み)
Summary of DeepChem Usage: “We have been developing a Scratch-based educational program for machine learning in the chemistry context. This system has an interface with the DeepChem library set of Python. The solubility prediction of several molecules with SMILES notation was demonstrated as a preliminary application.”
Important Contributions:
Date Published: 2019

Paper Title: Assessing Graph‐based Deep Learning Models for Predicting Flash Point
Summary of DeepChem Usage: “This work assesses the performance of GBDL models in predicting flash points with two methods. First, by comparing our results with previous studies that use traditional QSPR approaches, and second, by using our models to predict a test sample of our dataset as well as samples of data in different chemical domains. We apply two GBDL models that are implemented in DeepChem”
Important Contributions: Claims to be one of the first applications of graph-based models to flash point prediction.
Date Published: February 20, 2020

Paper Title: Large-scale comparison of machine learning methods for drug target prediction on ChEMBL
Summary of DeepChem Usage: Uses DeepChem Graph Conv and Weave models in its benchmarking of deep learning for virtual screening on ChEMBL.
Important Contributions: “We found (1) that deep learning methods significantly outperform all competing methods and (2) that the predictive performance of deep learning is in many cases comparable to that of tests performed in wet labs (i.e., in vitro assays).”
Date Published: May 16th, 2018

Paper Title: A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility
Summary of DeepChem Usage: DeepChem’s MPN is used as a baseline and as a code base: “In this paper, we describe a self-attention-based message-passing neural network (SAMPN) model, which is a modification of Deepchem’s MPN [16] and is state-of-the-art in deep learning.”
Important Contributions: Introduces SAMPN (Self-Attention Message Passing Network), which provides a new interpretable architecture for molecules. Code is at https://github.com/tbwxmu/SAMPN.
Date Published: February 21, 2020

Paper Title: Performance Optimization for Feature Extraction Section of DeepChem
Summary of DeepChem Usage: “For the performance problem of training process of DeepChem neural network, this paper rebuilds the original serial feature extraction algorithm of DeepChem and optimizes the rebuilt serial algorithm based on the multiple processes algorithm. The experiment results show that the parallel algorithm achieves 15.38 × speedup at the best compared with the serial algorithm.”
Important Contributions: Implements a multiprocess variant of DeepChem’s feature extraction that achieves up to a 15x speedup over the serial base code.
Date Published: September 29, 2020

Paper Title: Solvent-Specific Featurization for Predicting Free Energies of Solvation through Machine Learning
Summary of DeepChem Usage: “A featurization algorithm based on functional class fingerprints has been implemented within the DeepChem machine learning framework.”
Important Contributions: “Tests carried out on solvents with a range of polarity from the FreeSolv and MNSol data sets have shown slightly better accuracy than the commonly used topology-based extended connectivity fingerprint algorithm for hydration free energies. However, improvement was not as significant as hoped and less clear for less polar solvents suggesting that further solvent-specific descriptors may need to be taken into consideration.”
Date Published: March 1, 2019

Paper Title: ToxicBlend: Virtual screening of toxic compounds with ensemble predictors
Summary of DeepChem Usage: The Tox21 and ToxCast datasets from MoleculeNet (via DeepChem) are used. XGBoost, GCNN, and DNN models from DeepChem are used for benchmarking. DeepChem’s index, scaffold, and random splits are also used.
Important Contributions: "Timely assessment of compound toxicity is one of the biggest challenges facing the pharmaceutical industry today. A significant proportion of compounds identified as potential leads are ultimately discarded due to the toxicity they induce. In this paper, we propose a novel machine learning approach for the prediction of molecular activity on TOXCAST targets."
Date Published: January 9th, 2019

Paper Title: Learning Graph While Training: An Evolving Graph Convolutional Neural Network
Summary of DeepChem Usage: “Lastly, Deepchem is an outstanding open-source chem-informatics/machine learning benchmark. Our codes and demos were built and tested upon it.”
Important Contributions: “In this paper, we propose a more general and flexible graph convolution network (EGCN) fed by batch of arbitrarily shaped data together with their evolving graph Laplacians trained in supervised fashion.”
Date Published: August 10th, 2017

Paper Title: Synergy Effect between Convolutional Neural Networks and the Multiplicity of SMILES for Improvement of Molecular Prediction
Summary of DeepChem Usage: DeepChem is used to benchmark baselines for the authors’ new model.
Important Contributions: “In our study, we demonstrate the synergy effect between convolutional neural networks and the multiplicity of SMILES. The model we propose, the so-called Convolutional Neural Fingerprint (CNF) model, reaches the accuracy of traditional descriptors such as Dragon (Mauri et al. [22]), RDKit (Landrum [18]), CDK2 (Willighagen et al. [43]) and PyDescriptor (Masand and Rastija [20]).”
Date Published: December 11, 2018 (arXiv)

Paper Title: Combining Docking Pose Rank and Structure with Deep Learning Improves Protein–Ligand Binding Mode Prediction over a Baseline Docking Approach
Summary of DeepChem Usage: Featurization and network architecture are adapted from DeepChem. Notably, this paper comes from IBM Watson.
Important Contributions: "We present a simple, modular graph-based convolutional neural network that takes structural information from protein-ligand complexes as input to generate models for activity and binding mode prediction."
Date Published: February 20, 2020

Paper Title: Multitask deep networks with grid featurization achieve improved scoring performance for protein–ligand binding
Summary of DeepChem Usage: Fingerprints, datasets, and architectures from DeepChem were used.
Important Contributions: “Our results demonstrated that progressive network combined with grid featurization would be one powerful rescoring approach to strengthen screening results after obtaining protein–ligand complex in the conventional docking software.”
Date Published: September 12th, 2019

Paper Title: Augmentation Is What You Need!
Summary of DeepChem Usage: DeepChem’s TextCNN is used as a baseline
Important Contributions: "We investigate the effect of augmentation of SMILES to increase the performance of convolutional neural network models by extending the results of our previous study [1] to new methods and augmentation scenarios. We demonstrate that augmentation significantly increases performance and this effect is consistent across investigated methods."
Date Published: September 9th, 2019

Paper Title: GraphCPI: Graph Neural Representation Learning for Compound-Protein Interaction
Summary of DeepChem Usage: Atom features are adopted from DeepChem.
Important Contributions: “Accurately predicting compound-protein interactions (CPIs) is of great help to increase the efficiency and reduce costs in drug development… this paper suggests an end-to-end deep learning framework called GraphCPI, which captures the structural information of compounds and leverages the chemical context of protein sequences for solving the CPI prediction task.”
Date Published: November 18th, 2019

Paper Title: Prediction of aqueous solubility of compounds based on neural network
Summary of DeepChem Usage: The Delaney dataset is used for benchmarking
Important Contributions: Predicts the aqueous solubility of compounds and evaluates the prediction results of neural networks including CNN, RNN, DNN, and SNN.
Date Published: April 9th, 2019

Paper Title: Enhance Information Propagation for Graph Neural Network by Heterogeneous Aggregations
Summary of DeepChem Usage: A model from DeepChem is used as a baseline for the authors’ new architecture.
Important Contributions: Proposes a new generic GNN layer formulation and, built on it, a new GNN variant referred to as HAG-Net. The authors empirically validate the effectiveness of HAG-Net on a number of graph classification benchmarks and discuss the design options and criteria.
Date Published: February 8th, 2021