DeepChem Minutes 8/28/2020

seyonec · September 2, 2020, 8:19pm

Date: August 28th, 2020

Attendees: Seyone, Peter, Nathan, Neel, Sanjiv

Summary:

Bharath was unable to come to today’s meeting for personal reasons, so Seyone took the minutes for today’s call. Daiki was also not at the call today.

Peter has been working on fixing the existing suite of tutorials, as many of them are broken (PR 1, PR 2, PR 3, PR 4). Peter believes we can make the existing tutorials more valuable to first-time users: currently, they are mainly examples of different models + classes in DeepChem, rather than tutorials that build on top of each other sequentially. He plans to spend some more time working on a better-designed series of notebook tutorials.

Nathan has been working on a normalizing flows tutorial while iterating on the normalizing flows API at the same time so that it fits the tutorial. The tutorial aims to take a 2-d Gaussian distribution and train a normalizing flow that transforms it into another distribution. He’s currently fixing some bugs and wrapping up some changes to the API + tutorial, so the PR should be ready for review soon.
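For readers unfamiliar with the idea, here is a minimal, standard-library-only sketch of what training a flow on a 2-d Gaussian base can look like. It is not Nathan's tutorial and does not use the DeepChem normalizing flows API. The flow is a single elementwise affine map, the target is a shifted and scaled Gaussian, and the function names (`forward`, `inverse`, `log_prob`, `fit_affine_flow`) are all assumptions chosen for illustration.

```python
import math
import random


def forward(z, shift, log_scale):
    """Map a base sample z ~ N(0, I) to data space: x = z * exp(log_scale) + shift."""
    return [zi * math.exp(si) + ti for zi, si, ti in zip(z, log_scale, shift)]


def inverse(x, shift, log_scale):
    """Map a data point x back to the base space."""
    return [(xi - ti) * math.exp(-si) for xi, si, ti in zip(x, log_scale, shift)]


def log_prob(x, shift, log_scale):
    """Change of variables: log p(x) = log N(z; 0, I) - sum(log_scale)."""
    z = inverse(x, shift, log_scale)
    base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return base - sum(log_scale)


def fit_affine_flow(data, steps=500, lr=0.1):
    """Fit shift and log_scale by gradient descent on the mean negative log-likelihood."""
    dim = len(data[0])
    n = len(data)
    shift = [0.0] * dim
    log_scale = [0.0] * dim
    for _ in range(steps):
        grad_shift = [0.0] * dim
        grad_log_scale = [0.0] * dim
        for x in data:
            z = inverse(x, shift, log_scale)
            for d in range(dim):
                # Per-dimension NLL is 0.5 * z^2 + log_scale (+ const).
                grad_shift[d] += -z[d] * math.exp(-log_scale[d]) / n
                grad_log_scale[d] += (1.0 - z[d] * z[d]) / n
        for d in range(dim):
            shift[d] -= lr * grad_shift[d]
            log_scale[d] -= lr * grad_log_scale[d]
    return shift, log_scale


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical target: a Gaussian with mean (2, -1) and std (0.5, 2).
    data = [[rng.gauss(2.0, 0.5), rng.gauss(-1.0, 2.0)] for _ in range(500)]
    shift, log_scale = fit_affine_flow(data)
    print("learned shift:", shift)
    print("learned scale:", [math.exp(s) for s in log_scale])
    # Draw a new sample by pushing base noise through the trained flow.
    print("sample:", forward([rng.gauss(0, 1), rng.gauss(0, 1)], shift, log_scale))
```

An affine map can only produce another Gaussian, so this only illustrates the training objective; a real tutorial would stack more expressive invertible layers.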

Neel is working on fixing the graph convolutions tutorial and will spend some more time understanding the GraphConv module in DeepChem in order to improve the tutorial. He is also looking to help Peter with refactoring once he has a better grasp of the library. Peter mentioned that a good first step might be to identify which parts of the tutorial are broken and move on from there. He also worked on fixing the commented code in the Solubility modelling tutorial ([PR](https://github.com/deepchem/deepchem/pull/2115)).

Sanjiv spent the past week familiarizing himself with the DeepChem library and will start to coordinate with Bharath on a good first contribution to work on during next week’s call.

Seyone has been working on a PR that creates a new Tokenizers API and migrates the SmilesTokenizer + BasicSmilesTokenizer from Philippe Schwaller’s ReactionMapper library. He’s hoping to include the SmilesTokenizer + regex-based BasicSmilesTokenizer in an updated version of the ChemBERTa tutorial (or a follow-up tutorial), for more semantically relevant attention visualization and better model performance. So far, he’s expanded the documentation and type annotations and added the transformers library to the Travis CI test, but the PR is still failing even though the unit test passes. Seyone mentioned he will go back this weekend to see where the remaining errors lie and will also get another pass from Bharath + Philippe on Monday before merging it in. Seyone also mentioned that he is working on updating the older ChemBERTa implementation PR in DeepChem to use the SmilesTokenizer once it is merged in. He is currently trying to wrap it around the TorchModel.
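As a rough illustration of what regex-based SMILES tokenization means, here is a minimal sketch. It is not the BasicSmilesTokenizer from the PR: the function name `tokenize_smiles` is hypothetical, and the pattern is the kind of atom-level SMILES regex commonly used in the literature, which may differ from the one the PR ships.

```python
import re

# Assumed pattern: bracket atoms, two-letter halogens, organic-subset atoms,
# aromatic atoms, bonds, branches, and ring-closure digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if some characters are not covered by the pattern,
    so that input is never silently dropped.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES string: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Aspirin, split into single-atom, bond, branch, and ring-closure tokens.
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

Compared with byte-level or character-level splitting, this keeps multi-character atoms such as `Cl`, `Br`, and bracketed atoms like `[NH4+]` as single tokens, which is what makes per-token attention easier to interpret chemically.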

After the roundtable updates, Nathan asked if there was an existing tutorial that would make a good example to follow for the normalizing flows tutorial he is working on. He is looking to balance theory with the code implementation. Peter suggested following the general flow of the existing tutorials and then iterating once the first working tutorial is done.

Seyone asked if there could be future tutorials aimed at making new contributions easier, specifically for adding new PyTorch-based implementations, as the bulk of papers nowadays use PyTorch implementations. He thinks that this could be a good way to increase the number of new developers adding model implementations to DeepChem. Peter mentioned a good way to do so would be to create tutorials on how to use TorchModel + KerasModel for new implementations. The goal of these tutorials would be to show how to wrap new PyTorch implementations with TorchModel and new Keras implementations with KerasModel. They would also show the similarity between the two classes despite their using different frameworks. Seyone agreed that this would be a very meaningful contribution, and that it may be good to add once the current batch of tutorial refactoring is done.

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.