NOTE: I’m the usual scribe for minutes, but due to a neck injury this week the minutes are rawer than usual since I can’t edit carefully. I’m posting them on the usual date for transparency, but with an upfront warning about cogency.
Americas/Europe/Africa/Middle East Call
Date: October 22nd, 2020
Attendees: Peter, Mufei, Seyone, Daiki, Hariharan
Summary: Bharath has spent the past week getting model saving/reloading working in DeepChem. Something like 26/33 of the models now have functional saving/reloading (see issue).
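To make the save/reload work concrete, here is a minimal sketch of the round-trip property such changes are meant to guarantee. The `ToyModel` class and its file layout are purely hypothetical stand-ins, not DeepChem's actual model API; real DeepChem models persist state via checkpoints in a model directory.

```python
import json
import os
import tempfile

# Hypothetical stand-in model. Real DeepChem models save richer state
# (weights, checkpoints); this only illustrates the round-trip contract.
class ToyModel:
    def __init__(self, weights=None):
        self.weights = weights or {"w": 0.5, "b": -1.0}

    def save(self, model_dir):
        os.makedirs(model_dir, exist_ok=True)
        with open(os.path.join(model_dir, "weights.json"), "w") as f:
            json.dump(self.weights, f)

    @classmethod
    def load(cls, model_dir):
        with open(os.path.join(model_dir, "weights.json")) as f:
            return cls(weights=json.load(f))

# Round-trip check: a reloaded model must reproduce the saved state.
with tempfile.TemporaryDirectory() as d:
    m = ToyModel({"w": 2.0, "b": 0.1})
    m.save(d)
    m2 = ToyModel.load(d)
    assert m2.weights == m.weights
```

A test of this shape, run per model class, is how "functional saving/reloading" can be verified mechanically.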
Peter has been working on cleaning up the MoleculeNet loader functions. He put up a first PR that fixed one MoleculeNet loader and iterated until we converged on a clean API, and now has a second PR that extends this pattern to more loaders. We may need to iterate a bit more on these loaders, since new issues crop up on new datasets. Peter also noted that the QM7/8/9 loaders load from .mat files and have a different structure than the usual loaders. Bharath mentioned that he wasn’t sure whether these loaders were still necessary or whether we could get rid of them.
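For readers unfamiliar with the loader cleanup, this is a sketch of the return-value contract the MoleculeNet loaders converge on: a tasks list, a train/valid/test split, and a list of transformers. The function name and the placeholder data here are illustrative only; the real `dc.molnet` loaders featurize molecules and return DeepChem `Dataset` objects.

```python
# Hypothetical loader illustrating the (tasks, datasets, transformers)
# contract; real loaders accept featurizer/splitter options and do the
# actual featurization and splitting internally.
def load_toy_dataset(featurizer="ECFP", splitter="random"):
    tasks = ["measured_property"]
    # Placeholder (smiles, label) pairs standing in for featurized datasets.
    train = [("CCO", 0.3)]
    valid = [("CCC", 0.1)]
    test = [("CCN", 0.7)]
    transformers = []  # e.g. normalization transformers in the real loaders
    return tasks, (train, valid, test), transformers

tasks, (train, valid, test), transformers = load_toy_dataset()
assert tasks == ["measured_property"]
```

Having every loader, including .mat-backed ones like QM7/8/9, honor this one shape is what makes the API feel uniform to callers.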
Mufei has come up with improved baselines for MoleculeNet and is preparing to submit some PRs for model wrappers; he can also create example scripts (there is a GitHub issue for this). This led to a discussion of how submission to the leaderboard should be handled. Mufei: mostly loosely; submitters could just provide a link to a GitHub repo so others can try to reproduce the results. There’s no perfect solution, and probably ways to hack it.
Bharath: an honors-system leaderboard, then? Mufei agreed, but suggested also asking submitters to provide a pretrained model so its performance can be verified, with a fixed period of time to fix any issues. Should we ask submitters to create a DeepChem model wrapper? Some models have only very minor implementations, so requiring a wrapper could add a burden.
Peter: perhaps we should just require that submitters provide working source code for training and evaluating their models. Even if we don’t execute the code ourselves, it has to be there for others to reproduce the results. A common script could be used to evaluate submissions.
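The "common evaluation script" idea could be sketched as follows. The CSV column names and the choice of RMSE are assumptions for illustration; the point is that every submission gets scored by the same code path rather than by self-reported numbers.

```python
import csv
import io
import math

# Hypothetical shared scorer: every submission provides predictions in a
# common CSV format (id, y_true, y_pred) and is scored with the same metric.
def score_submission(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    errs = [(float(r["y_true"]) - float(r["y_pred"])) ** 2 for r in reader]
    return math.sqrt(sum(errs) / len(errs))  # RMSE

submission = "id,y_true,y_pred\nmol1,1.0,1.5\nmol2,2.0,1.5\n"
rmse = score_submission(submission)  # sqrt((0.25 + 0.25) / 2) = 0.5
assert abs(rmse - 0.5) < 1e-9
```

Even under an honors system, a shared scorer like this keeps the reported numbers comparable across submissions.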
Seyone agreed with the MoleculeNet direction; strong benchmarks and a leaderboard will really help. On his end, he got the ChemBERTa paper onto arXiv and can start consolidating everything. He worked on fixing various bugs in the ChemBERTa tutorial, and is also working on getting new infrastructure in use; it’s closed right now, but will be open soon. There will be two separate tutorials: one on the BPE SMILES tokenizer, and a SMILES tokenizer tutorial that will utilize the DeepChem SMILES tokenizer. One question was whether it’s OK to clone and use the public-facing repo. Bharath: yes, that’s OK.
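For context on the tokenizer tutorials, here is a minimal regex-based SMILES tokenizer in the spirit of the atom-level pattern that tokenizers like DeepChem's build on. This is an illustrative sketch, not the library's implementation; the regex and function name are assumptions for this example.

```python
import re

# Atom-level SMILES pattern: bracket atoms, two-letter elements (Br, Cl),
# single-letter organic-subset atoms, bonds, ring closures, and branches.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles):
    tokens = SMILES_PATTERN.findall(smiles)
    # Sanity check: tokenization must be lossless.
    assert "".join(tokens) == smiles, "tokenizer dropped characters"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

A BPE tokenizer, by contrast, learns merges over a corpus rather than using a fixed pattern, which is why the two approaches get separate tutorials.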
Daiki is improving the paper and the benchmarking. He has been working on fixing the molecular fingerprint featurizer and the save/load functions, and on fixing an inconsistent API in hyperparameter tuning. He also added a new featurizer for MACCS key fingerprints.
Hari has been working on graph convolutions for solar energy datasets, and has been trying to create an ensemble of a deep graph network with a Morgan fingerprint model. Apart from that, the DeepChem atomic convolution uses 75 descriptors, and he is trying to change it to use a different featurizer. Bharath asked whether he was using RDKit or custom code; Hari said he adapted the DeepChem source code to featurize based on which particular atom it is, custom-made, based on CGCNN.
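The ensemble Hari describes can be sketched abstractly as prediction averaging. The two model functions below are trivial placeholders (the real components would be a trained graph network and a Morgan-fingerprint regressor); only the combination scheme is the point.

```python
# Placeholder component models; real ones would be a deep graph network
# and a Morgan-fingerprint-based model.
def graph_model(x):
    return 1.2 * x

def fingerprint_model(x):
    return 0.8 * x + 0.1

# Weighted average of component predictions; equal weights by default.
def ensemble(models, x, weights=None):
    weights = weights or [1.0 / len(models)] * len(models)
    return sum(w * m(x) for w, m in zip(weights, models))

pred = ensemble([graph_model, fingerprint_model], 2.0)
# (1.2*2.0 + (0.8*2.0 + 0.1)) / 2 = (2.4 + 1.7) / 2 = 2.05
assert abs(pred - 2.05) < 1e-9
```

In practice the weights would be tuned on a validation set rather than fixed at 50/50.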
Bharath asked if there were general points of discussion, but there was nothing further, so we wrapped up the call.
Date: October 23rd, 2020
Attendees: James Yoder, Vignesh, James, Peter, Seyone, Daniel, Steven
Summary: As usual, we started off with a round of introductions.
James Yoder trained as a statistician, graduated from Harvard, and works in QSAR. He’s excited about trying to build models for lead optimization and uncertainty quantification, and hopes he can add value.
Vignesh is an undergrad physics student interested in applying ML techniques to physical systems. He came to know about this group through Akshay.
James Yuan is a phd student based in the UK who’s looking to use DeepChem in his research.
Bharath gave the same update as in yesterday’s meeting.
Peter gave the same update as in yesterday’s meeting.
Seyone gave the same update as in yesterday’s meeting.
Michael: not much of an update; he’s still tasked with adding the slow tests to the doctests.
Daniel was a bit occupied at work and is planning to give the low-data model more hours.
Seyone asked when the DeepChem conference would happen. Bharath mentioned he was still working on the call for papers and would hopefully send out a draft for review next week. Seyone suggested reaching out to core academic groups in the field who aren’t currently involved with DeepChem, like the Debora Marks lab or others.
Steven mentioned that he has colleagues in academia who would love to get started contributing to the library and the community. He asked whether there were guides or issues specifically tailored for beginners, that is, for people getting started in research who would also love to contribute to open source at the same time.
Seyone mentioned that the tutorials are a great first step (https://github.com/deepchem/deepchem/tree/master/examples/tutorials)
Seyone mentioned that after that, the “Good first contribution” tags mark issues worth taking a look at. Bharath mentioned that the “Deep Learning for the Life Sciences” book was also a good resource.
Joining the DeepChem Developer Calls
As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.