DeepChem Minutes
Americas/Europe/Africa/Middle East Call
Date: October 29th, 2020
Attendees: Bharath, Hari, Mufei, Seyone, Daiki
Summary: Bharath hasn’t been able to do much on DeepChem due to a neck injury. He’s now starting to recover and has started catching up on all the issues and pull requests for the last week. Health permitting, next week Bharath will try to finish up the saving/reloading effort and start to get DeepChem 2.4.0 ready.
Hari has been working on changing the graph input features for a solar cell dataset. He’s adding additional features, which works well for his use case of solar cell property featurization. Bharath suggested that this might be a good addition to DeepChem. Hari mentioned that he wasn’t sure yet that the featurization method was robust, and that he’d check that first.
Mufei has submitted a PR for an initial model wrapper for a GCN-based model for molecular property prediction. A couple of additional comments still need to be addressed. There's also a discussion about replacing the backend; Mufei said he would be careful to raise a deprecation warning for at least one stable version before doing so. Once the current PR is merged in, he will put up subsequent PRs for review for the remaining wrappers, and then focus on getting the leaderboard up and running.
Seyone had a short update about ChemBERTa, as he was busy with midterms and scholarship applications. For the ChemBERTa tutorials, he managed to get the bert-loves-chemistry fork working, as well as the [SmilesTokenizer](https://deepchem.readthedocs.io/en/latest/tokenizers.html) and Byte-Pair Encoder classification models loading stably from HuggingFace, but he still needs to add in attention visualization and the masked language modelling demo. Seyone should have a lot more time this weekend to add the finishing touches and put up the PR. Seyone also noted that ChemBERTa's paper was accepted into the NeurIPS ML4Molecules workshop. He's aiming to have a more polished API for ChemBERTa transformers, with fine-tuning scripts and more featurizers, done by the conference date.
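As a rough illustration of what the SmilesTokenizer discussed above does, here is a minimal stdlib-only sketch of regex-based SMILES tokenization. The regex and function name are illustrative assumptions, not DeepChem's actual implementation; the real class lives in the DeepChem docs linked above.

```python
import re

# Illustrative regex for splitting a SMILES string into chemically
# meaningful tokens: bracketed atoms, two-letter elements (Br, Cl),
# single-letter atoms, ring closures, bonds, and branch parentheses.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into a flat list of tokens."""
    return SMILES_REGEX.findall(smiles)

# Example: acetic acid
print(tokenize_smiles("CC(=O)O"))  # ['C', 'C', '(', '=', 'O', ')', 'O']
```

A real tokenizer for a BERT-style model would additionally map each token to a vocabulary index and add special tokens such as `[CLS]` and `[SEP]`; this sketch only shows the splitting step.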
Bharath mentioned that it might be useful to have the cleaned up tutorials ready by the actual workshop. Seyone said we should be able to get cleaned up tutorials ready by then. It might also be good to add a SELFIES model for the tutorial.
Daiki was focused on improving the tests and the documentation. He tried converting Travis CI to GitHub Actions in his repository, but it looks like there are still some problems with GitHub Actions there, so he doesn't think they are stable enough for use yet. Daiki has also implemented several improvements to make the documentation more usable. He has written up an in-depth update of his week's work on his GitHub here.
Bharath asked if there were any general topics of discussion.
Seyone asked about the new PR for protein featurization by @zqwu and what the DeepChem plan for protein featurizations was. Bharath mentioned that the goal was to try to reproduce AlphaFold within DeepChem; this is an ambitious target that may take some time to reach. Seyone mentioned that there's an increasing number of protein transformer models on HuggingFace which might be useful to integrate more tightly into DeepChem.
Bharath mentioned that he wanted to try a new system for the notes: asking all participants to edit their own parts of the meeting notes. He noted that since he was out last week, it wasn't feasible to get a clean version of the notes up, so a more robust system would be useful; one person being out sick wouldn't prevent the notes from going out.
India/Asia/Pacific Call
Attendees: Bharath, Peter, Seyone, Samuel, Daniel, Nathan, Vignesh, James
Date: October 30th, 2020
Summary: Since we had a newcomer on the call we started off with an intro.
Samuel is an undergrad in his final year at Karunya Institute in India. He's been working with a team of chemoinformatics researchers from Ben-Gurion University of the Negev, Israel, focusing on applications of small fragment molecules in the field of drug discovery, which is how he learned about DeepChem. He's looking forward to using DeepChem in his research project.
Bharath gave the same update he did in the Thursday meeting.
Seyone gave the same update he did in the Thursday meeting.
Peter has mostly been working on other things this week, but hopes he will have time next week to keep up with updating the molnet loaders and the tutorials.
Daniel made some progress with the low data loader this week and put up a PR. He’s managed to get some datasets to load, but hasn’t started on implementing the actual low data notebooks. He will start working on the Siamese network next. Bharath mentioned some unit tests would be very useful.
Nathan worked this week on adding functionality and tests for saving/reloading the normalizing flow models in this PR. That should hopefully be ready soon! Nathan also mentioned a conference paper.
Vignesh has been working on implementing the lattice convolutions this week and has been having some difficulty. Bharath said he could grant permissions for AWS bucket access to directly add in the dataset. Remaining items: author attribution, the dataset itself, and a loader function.
James had a chance to read Mufei’s issue on the MoleculeNet refresh. He mentioned that he would be interested in adding some lead optimization datasets/tutorials. He also had a chance to read the ChemBERTa preprint and is excited to see how it scales.
Joining the DeepChem Developer Calls
As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.