DeepChem Minutes 11/19/2020

bharath · November 26, 2020, 3:07am

DeepChem Minutes

India/Asia/Pacific Call

Date: November 19th, 2020

Attendees: Bharath, Mufei, Peter

Summary:

Bharath was busy with other work and unfortunately couldn’t do much this week. He plans to get back to finishing the interaction fingerprints and the remaining 3 save/reload tests, hopefully next week.

Mufei says we have now added 4 model wrappers for DGL/DGL-Life Sciences (PR 2249, PR 2280, and PR 2293). It’s now time to add some baseline examples for the leaderboard! Bharath suggested starting with random forest and a simple graph conv model. Mufei agreed that one graph model and one non-graph model would be a good place to start. Bharath suggested that we could use GitHub Pages to host the leaderboard. Mufei said that might make sense, but that we would probably need to iterate on the design. Mufei said he would put up a first PR next week for feedback.
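For anyone thinking about the leaderboard design, here is a minimal sketch of one possible approach: rank results by score and render them as a Markdown table that a GitHub Pages site could display. The `Entry` class, the `render_leaderboard` function, the model and dataset names, and the scores are all hypothetical placeholders for illustration. None of this was decided on the call, and the numbers are not real benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical schema, not a DeepChem API)."""
    model: str
    dataset: str
    metric: str
    score: float


def render_leaderboard(entries, higher_is_better=True):
    """Return a Markdown table with entries ranked by score."""
    ranked = sorted(entries, key=lambda e: e.score, reverse=higher_is_better)
    lines = [
        "| Rank | Model | Dataset | Metric | Score |",
        "| --- | --- | --- | --- | --- |",
    ]
    for rank, e in enumerate(ranked, start=1):
        lines.append(
            f"| {rank} | {e.model} | {e.dataset} | {e.metric} | {e.score:.3f} |"
        )
    return "\n".join(lines)


# Placeholder values for illustration only, not real results.
entries = [
    Entry("baseline-a", "example-dataset", "ROC-AUC", 0.5),
    Entry("baseline-b", "example-dataset", "ROC-AUC", 0.75),
]
table = render_leaderboard(entries)
print(table)
```

Keeping the results as plain records and generating the page from them would make it easy to iterate on the layout later without touching the underlying numbers.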

Peter is almost done with the loader overhaul. There are a couple of remaining mysteries with the PCBA dataset (issue). Once those are figured out, we can get in a PR with the last set of conversions, which will conclude the project. Peter has also been working on the tutorials, in particular the interpretability tutorial (issue), but has primarily been focusing on MoleculeNet.

With roundtable updates finished, we moved to general discussion.

Mufei asked if the next release was coming up and whether we would prepare release notes. Bharath said that he’d start a release notes document and share it with everyone for feedback.

Americas/Europe/Africa/Middle East Call

Attendees: Bharath, Vaijeyanthi, James Yu., Nathan, Shubhandra, Hari, Seyone, Hosein

Date: November 20th, 2020

Summary:

Hosein is a new attendee with a master’s of science in biomedical engineering. He is now working at a Cambridge-based company, mostly using machine learning for drug repositioning.

Vaijeyanthi is continuing to work on learning DeepChem.

James has a few questions we’ll hit later in the call.

Nathan has continued to work on a tutorial for the molecular docking API. Next week, Nathan will be working on overhauling the PDBBind data loader with the new interaction fingerprint featurizers. Once that’s in, the tutorial on molecular docking and predicting docking scores should be ready.

Shubhandra is working through the tutorials.

Hari has been working on a neural deciphering network that maps ECFP fingerprints back to molecular structures. Bharath mentioned this might be a useful feature to bring into DeepChem. https://github.com/bayer-science-for-a-better-life/neuraldecipher

Over the last two weeks, Seyone has gotten the two new tutorials for ChemBERTa ready to go. He’s put up a WIP PR which includes updates to the tokenizers docs, the existing ChemBERTa tutorial, and a new Smiles Tokenizer tutorial. The tutorials run masked token inference with the HuggingFace API, visualize attention, and fine-tune ChemBERTa on ClinTox. Seyone plans to run some extra benchmarks for the camera-ready paper for the NeurIPS ML for Molecules workshop, which is due next week. Seyone mentioned that he will also upload the PubChem 77M dataset to the MoleculeNet S3 bucket and draft an auxiliary MoleculeNet PubChem 77M dataset loader function mirroring the existing ZINC data loader made by Bharath and Nathan.
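For readers unfamiliar with SMILES tokenization, the following is a minimal standalone sketch of the general idea: split a SMILES string into chemically meaningful tokens (bracket atoms, two-letter halogens, ring closures, bonds) using a regular expression. This is not the tokenizer from Seyone's PR or any DeepChem API. The pattern is an assumption modeled on regexes commonly used for this purpose, and `tokenize_smiles` is a hypothetical helper name.

```python
import re

# Assumed pattern: bracket atoms first, then two-letter halogens, then
# single-letter atoms, aromatic atoms, and bond/branch/ring symbols.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is unmatched."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES string: {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)Cl"))   # acetyl chloride
print(tokenize_smiles("c1ccccc1"))   # benzene
print(tokenize_smiles("[NH4+]"))     # ammonium, kept as one bracket token
```

Treating `Cl` and `Br` as single tokens, and keeping bracket atoms whole, is what distinguishes this from naive character-level splitting. A masked-token model would then replace one of these tokens with a mask symbol and predict it from context.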

Joining the DeepChem Developer Calls

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.