Date: November 12th, 2020
Attendees: Bharath, Vignesh
This week Bharath worked on getting the interaction fingerprints ready to merge (PR). Most of Nathan’s outstanding comments have been addressed, but there are still a few bugs remaining. Bharath will probably merge the PR with the buggy code commented out to unblock Nathan, and will fix the bugs in future PRs.
Vignesh made a [PR] from the author’s LCNN repo. The source PR was hard to follow because it was fairly complicated. Vignesh has finished writing the code for the model and
Seyone was unable to attend this week’s call because one of his classes was moved to the time of the call, but he sent his update for the week. He spent this week making progress on finishing the set of ChemBERTa tutorials. The SMILES-BPE tutorial is now done, along with the MolNet dataloader (utilizing a scaffold split) from the ChemBERTa public repo. He also got the Smiles Tokenizer models to work with DeepChem’s tokenizer class and simple-transformers. Both of the tutorials are now done, so he will set up a PR for everyone to review before merging it in. Seyone added that he has recently received some emails about open-sourcing more recent fine-tuning and pre-training scripts that work with SELFIES and the Smiles Tokenizer, so he plans to start moving them into the public ChemBERTa repository.
Vignesh asked about good testing methodology for developing models. Bharath recommended checking out the ML testing guide.
Vignesh asked whether any other DeepChem improvements were planned on the roadmap, and Bharath mentioned that physical deep learning and retrosynthesis were both planned for future work.
Americas/Europe/Africa/Middle East Call
Attendees: Bharath, Peter, Vaijeyanthi, James Y., Nathan, Hariharan, Daniel V.
Date: November 13th, 2020
Vaijeyanthi did her graduate work in biomedical engineering. She has 5 years of experience in data science, mostly in Matlab and R. She worked at the Indian Institute of Science on fluorescence microscopy.
Shubhandra did his master’s in bioinformatics and worked on molecular dynamics. He is currently working on enhanced sampling techniques for simulation and is interested in learning more about DeepChem.
Peter has been continuing to convert the MolNet loaders to the new API. He’s finished the simple ones and is now working on the harder ones.
Daniel has been continuing to work on the low data implementation. He’s been focusing on the Siamese network, but is currently not getting very good results. The loss doesn’t go down with training and it’s not yet clear what’s happening.
James had nothing to add this week.
Hari has been working on autoencoders for a bandgap dataset. He used the autoencoders to cluster his dataset into 2-3 clusters. This uses a combination of different loss functions.
Nathan has been working mostly on the docking module and a tutorial to accompany it. He has been working on a new utility function, dc.utils.vina_utils.prepare_inputs, that adds some preprocessing steps for docking structures. The function tries to make sensible default choices, but it comes with a lot of warnings and notes, since many structures have edge cases. The utility PR is just about ready to go, and the last thing left for the tutorial is to add a machine learning prediction layer using the new interaction fingerprints.
With roundtables complete, we moved into general discussion. Hari asked how a simple k-means encoding compares with an autoencoder and how the two differ. Bharath said they were pretty similar in practice, but that autoencoders might tend to do better with larger datasets.
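As a rough illustration of the comparison (not code from the call or from DeepChem), k-means can be read as a very simple encoder/decoder pair: the encoder maps a point to the index of its nearest centroid, and the decoder maps that index back to the centroid. An autoencoder instead learns a continuous code with neural networks. The sketch below uses only the Python standard library; the function names kmeans, encode and decode and the toy data are made up for this example.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(point, centroids):
    """k-means 'encoding': a single discrete code, the nearest centroid index."""
    return nearest(point, centroids)


def decode(code, centroids):
    """k-means 'decoding': reconstruct a point as the centroid for that code."""
    return centroids[code]


# Toy 2-D data with two well-separated groups (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = kmeans(points, k=2)
code = encode((9.0, 9.0), centroids)
reconstruction = decode(code, centroids)
```

The reconstruction here is always one of k fixed points, whereas an autoencoder's bottleneck can represent variation within a cluster.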
Bharath also mentioned that Travis CI is shutting off support for open source projects. Daiki has been leading our research on migrating to GitHub Actions.
The DeepChem 2.4.0 release is also coming up. 15 MolNet loaders still remain to be converted to the new style, and some of them are complex. Bharath also needs to fix atomic convolutions.
Joining the DeepChem Developer Calls
As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.