DeepChem/DeepScience has a lot of features and documentation that (I believe) make it a good tool for students. Our focus on production engineering has also increasingly made DeepChem a good fit for production use. DeepChem is also useful for researchers doing applied machine learning, since it lets them apply and optimize a large battery of machine learning models on their datasets.
However, DeepChem/DeepScience has not yet succeeded in winning mindshare as a tool for ML researchers (beyond its use as a source of benchmarking baselines, where it has a healthy niche). I believe there are two main reasons for this:
- DeepChem is not easily composable: DeepChem primitives can’t be chained together to create new algorithms (the way PyTorch tensors chain together to create new architectures, say).
- DeepChem’s featurizers and models are not interoperable: Most models have a uniquely associated featurizer. This makes it hard to mix/match different featurizations and architectures.
Here are two proposed changes that I believe will partially improve the state of affairs:
- Port DeepChem Layers to PyTorch: Our layers are currently split between Torch/TensorFlow/JAX. This fragmentation makes it hard to mix/match layers. Porting all layers to Torch means our layers will readily interoperate with one another. Improving layer documentation and tutorials will also help with interoperability.
- Standardize featurizer data classes: We should have all future graph convolutional models use `GraphData` as their data class. Over time, we should deprecate `ConvMol`, `WeaveMol`, etc. and standardize on `GraphData`. This will enable graph featurizations to be used across all graph convolutional architectures. We should also remove or reduce custom transformation code (the code that loads `ConvMol` into TensorFlow tensors in `GraphConvModel`, for example). By using `GraphData` we can have one standard function to load `GraphData` into Torch tensors.
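To make the second proposal a bit more concrete, here is a rough sketch of why a single shared data class helps. Everything in it is hypothetical and for illustration only: `GraphRecord` is a simplified stand-in (not DeepChem's actual `GraphData` class), the two featurizers are toy functions, and `collate` is an assumed name for the "one standard function" idea. Plain Python lists stand in for arrays so the sketch runs without dependencies; in practice the last step would hand the batched arrays to Torch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GraphRecord:
    """Hypothetical stand-in for a shared graph data class (not DeepChem's GraphData)."""
    node_features: List[List[float]]   # one feature row per node
    edge_index: List[Tuple[int, int]]  # (source, destination) node pairs


def featurize_by_atomic_number(atomic_numbers, bonds) -> GraphRecord:
    """Hypothetical featurizer A: one feature per atom (its atomic number)."""
    return GraphRecord([[float(z)] for z in atomic_numbers], list(bonds))


def featurize_by_degree(atomic_numbers, bonds) -> GraphRecord:
    """Hypothetical featurizer B: one feature per atom (its bond count)."""
    degree = [0] * len(atomic_numbers)
    for src, dst in bonds:
        degree[src] += 1
        degree[dst] += 1
    return GraphRecord([[float(d)] for d in degree], list(bonds))


def collate(graphs: List[GraphRecord]):
    """Merge graphs into one batch, whichever featurizer produced them.

    Returns stacked node features, edge pairs re-indexed into the stacked
    node list, and a per-node index saying which graph each node came from.
    """
    nodes, edges, graph_ids = [], [], []
    offset = 0
    for graph_id, graph in enumerate(graphs):
        nodes.extend(graph.node_features)
        edges.extend((src + offset, dst + offset) for src, dst in graph.edge_index)
        graph_ids.extend([graph_id] * len(graph.node_features))
        offset += len(graph.node_features)
    return nodes, edges, graph_ids


# Two toy molecules given as (atomic numbers, bonds): water and hydrogen cyanide.
water = ([8, 1, 1], [(0, 1), (0, 2)])
hcn = ([1, 6, 7], [(0, 1), (1, 2)])

# The same collate function serves either featurizer.
batch_a = collate([featurize_by_atomic_number(*water), featurize_by_atomic_number(*hcn)])
batch_b = collate([featurize_by_degree(*water), featurize_by_degree(*hcn)])
```

The point of the sketch is that `collate` never needs to know which featurizer ran. With per-model classes like `ConvMol` and `WeaveMol`, each model carries its own version of this loading code, which is what makes mixing featurizations and architectures hard today.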
Both of these efforts are already informally underway, but I wanted to document the general push so the community can suggest ideas to improve our efforts and make DeepChem a better tool for researchers.