Date: March 4th, 2021
Attendees: Bharath, Vignesh, Dhairya, Ashwin
Summary: Ashwin is a 2nd year undergrad student at BITS Pilani studying chemistry and computer science and is interested in contributing to DeepChem. He’s been going through the documentation and has started working through the tutorials.
Dhairya is a data scientist and research engineer at Julia Computing. He’s the lead developer of Flux.jl and Zygote.jl. Dhairya has been working with DeepChem as part of the ARPA-E CMU project.
Bharath has mostly worked on code review for PRs this week.
Vignesh has been working on the PAGTN model for DGL-Life Sciences and is planning to contribute a wrapper to DeepChem. Vignesh also contributed a PR for the LCNN model and has started on the tutorial.
Ashwin asked how models are selected for addition to DeepChem. Bharath said the process is pretty organic and based on developer interest.
Dhairya asked if there was common shared infrastructure across featurizers. Bharath said the utils had a collection of shared tools, but a lot of the infrastructure was custom.
Bharath asked Dhairya if there were any lessons we could take from his work with Zygote and Flux. Dhairya said that Zygote has adjoints tracked throughout the codebase, which allows for much faster autodifferentiation than even PyTorch. This is because Zygote works with standard semantics through the language IR and does not have to assume what user code looks like. There are also a lot of sophisticated operations that allow autodifferentiation to pass through complex language constructs like recursion and control structures. This is implemented by keeping the core code slim so that it supports generic recursion, and pairing it with custom hooks that allow for specialized adjoint definitions. Zygote has been useful for scientific computing, where it supports large-scale applications.
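For readers following along, here is a rough Python sketch of the "custom hooks for specialized adjoint definitions" idea Dhairya described. It is not how Zygote is implemented (Zygote transforms Julia IR); the `ADJOINTS` registry, the `adjoint` decorator, `call`, and `pow_rec` are all hypothetical names made up for this illustration. It only shows how a small set of registered pullbacks can be composed through ordinary recursion.

```python
import math

# Hypothetical registry mapping a primitive to a rule that returns
# (value, pullback). This mimics the idea of custom adjoints only.
ADJOINTS = {}


def adjoint(fn):
    """Register a custom adjoint rule for the primitive `fn`."""
    def register(rule):
        ADJOINTS[fn] = rule
        return rule
    return register


def mul(a, b):
    return a * b


@adjoint(mul)
def _mul_rule(a, b):
    # Pullback returns one gradient per input: (d/da, d/db).
    return a * b, lambda g: (g * b, g * a)


@adjoint(math.sin)
def _sin_rule(x):
    return math.sin(x), lambda g: (g * math.cos(x),)


def call(fn, *args):
    """Apply `fn` through its registered rule, returning (value, pullback)."""
    return ADJOINTS[fn](*args)


def pow_rec(x, n):
    """Compute x**n recursively; returns (value, pullback with respect to x)."""
    if n == 0:
        return 1.0, lambda g: 0.0
    rest, rest_back = pow_rec(x, n - 1)
    value, mul_back = call(mul, x, rest)

    def back(g):
        # x appears directly in the product and inside the recursive call,
        # so its gradient is the sum of both contributions.
        gx, grest = mul_back(g)
        return gx + rest_back(grest)

    return value, back


value, back = pow_rec(2.0, 3)
grad = back(1.0)  # derivative of x**3 at x = 2.0
```

The recursion in `pow_rec` is plain Python; only `mul` needed a hand-written adjoint, which is the division of labor the discussion pointed at.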
Date: March 5th, 2021
Attendees: Bharath, Peter, Patrick, Seyone, Nathan
Summary: Bharath gave the same update.
Seyone finished up the ChemBERTa pretraining script for the 10M models this week. He’s now standardizing the script so that it can run masked-language pre-training on any dataset/model, with options set through argparse.
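As a rough illustration of what an argparse-driven interface for such a script could look like, here is a minimal sketch. It is not Seyone's script: the flag names (`--dataset-path`, `--model-name`, `--mlm-probability`, `--epochs`, `--output-dir`) and the default values are assumptions chosen for the example, and the training step itself is left out.

```python
import argparse


def build_parser():
    """Build a command-line parser for a generic masked-language pre-training run.

    All flag names and defaults here are hypothetical.
    """
    parser = argparse.ArgumentParser(
        description="Masked-language pre-training (illustrative sketch)"
    )
    parser.add_argument("--dataset-path", required=True,
                        help="Path to a text file with one example per line")
    parser.add_argument("--model-name", required=True,
                        help="Name or path of the model/config to pre-train")
    parser.add_argument("--mlm-probability", type=float, default=0.15,
                        help="Fraction of tokens to mask (assumed default)")
    parser.add_argument("--epochs", type=int, default=1,
                        help="Number of passes over the dataset")
    parser.add_argument("--output-dir", default="./output",
                        help="Where checkpoints would be written")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--dataset-path", "smiles.txt", "--model-name", "example-model"]
)
print(args.dataset_path, args.model_name, args.mlm_probability)
```

Keeping the dataset and model as required arguments means the same script can be pointed at any dataset/model pair without editing the code.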
Nathan has been working on adding some functionality to the PDBBind loader (PR) to allow access to the core set and to the binding pockets instead of the whole protein. Nathan has also been working on refactoring and fixing the rdkit grid featurizer (PR1, PR2).
We had a brief discussion about updating the DeepChem book. O’Reilly has given the OK for some basic fixes, and we plan to coordinate over email to kick off these changes.
Discussion then moved to whether the rdkit grid featurizer was working. Mostly yes, but we need to run a test to confirm. Peter will try a book example to make the final call. While working on the complex featurizer, Nathan found that voxel arguments were not being passed, which may explain some of the issues we were seeing earlier with the rdkit grid featurizer.
Seyone asked whether the RDKit bug reported in Colab would affect the tutorials. Bharath said it shouldn’t be a problem as long as the tutorials use the standard conda installer. Seyone said he would re-run the tutorial to make sure no issues come up.
Joining the DeepChem Developer Calls
As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.