DeepChem Minutes 1/14/2021

bharath · January 21, 2021, 3:37am

DeepChem Minutes

India/Asia/Pacific Call

Date: January 14th, 2021

Attendees: Bharath, Mufei, Yuanqi

Summary:

Bharath has been working this week on the DeepChem 2.4.0 release (issue). We’ve now tagged the release and gotten the pip and conda-forge packages installed. Bharath is now completing the release notes and getting the Docker package upgraded.

Mufei made two submissions this week (issue 20) on the clearance dataset. There were some issues with using RMSE as the metric for this dataset, due to its small size and label issues.
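To illustrate one way RMSE can become unstable on a small dataset, here is a minimal sketch in plain Python. The labels and predictions are made-up numbers, not clearance data, and the single bad label stands in for whatever label issues were actually seen. The minutes do not say this is the exact failure mode.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values for illustration only (not real clearance data).
# The last label plays the role of an outlier or a mislabeled sample.
y_true = [1.0, 2.0, 3.0, 4.0, 50.0]
y_pred = [1.5, 2.5, 2.5, 3.5, 5.0]

rmse_all = rmse(y_true, y_pred)
rmse_without_outlier = rmse(y_true[:-1], y_pred[:-1])

print(f"RMSE on all 5 samples:      {rmse_all:.2f}")
print(f"RMSE without the last one:  {rmse_without_outlier:.2f}")
```

With only five samples, the one bad label dominates the squared-error sum, so the score says more about that single point than about the model.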

Yuanqi has been trying to run one of the MoleculeNet v2 datasets. It’s been a busy week so he’s just gotten started and will update later.

For general discussion, Bharath asked for input on future releases. It had been a long while since the last release, and he asked whether we should move to a faster quarterly release cycle. Mufei suggested that we might consider an even shorter cycle, with monthly or bimonthly releases, and that monthly releases could be pip/conda only (which are automated) for ease of release. Bharath agreed this would be a good idea and suggested that we might target 2.4.1 for the end of February.

Bharath also noted that for MoleculeNet v2, we will likely want to apply https://github.com/PatWalters/rd_filters to filter some promiscuous binders from some of the assay datasets.

Americas/Europe/Africa/Middle East Call

Date: January 15th, 2021

Attendees: Bharath, Peter, Seyone, Nathan

Summary:

Bharath gave the same update as on the India/Asia/Pacific call.

Peter has been working on featurizations for models based on 3D conformations of atoms. The next step will be to create some models that make use of it.

Seyone has been hacking on a new pretraining task for ChemBERTa based on predicting RDKit, and is also updating the SMILES tokenizer to be more compatible with the RoBERTa tokenizer.

Nathan now has a minimally working tutorial on the new programmatic docking API and the interaction fingerprints. He’s still working through some issues related to the featurization but the tutorial should be in good enough shape to put up a WIP PR. Once the partial featurization / checkpoints are ready that should help. Nathan mentioned that it would be really handy to be able to get a subset of a dataset like PDBBind to be able to test featurization.
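As a rough sketch of the subset idea, one option is to sample a fixed, reproducible set of entry identifiers before featurizing, so that a featurization test touches only a handful of complexes. The helper name `sample_subset` and the placeholder identifiers below are hypothetical. This is not an existing DeepChem or PDBBind loader API.

```python
import random

def sample_subset(entries, n_subset, seed=0):
    """Return a reproducible random subset of `entries`, keeping original order.

    If `n_subset` exceeds the number of entries, all entries are returned.
    """
    n_subset = min(n_subset, len(entries))
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    chosen = sorted(rng.sample(range(len(entries)), n_subset))
    return [entries[i] for i in chosen]

# Placeholder identifiers standing in for dataset entries (hypothetical).
all_entries = [f"complex_{i:04d}" for i in range(1000)]

small_test_set = sample_subset(all_entries, n_subset=10, seed=42)
print(small_test_set)
```

Fixing the seed means the same entries are selected on every run, which makes featurization failures easier to reproduce.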

Hari has been reading the Schrodinger paper on FEP. There’s a lot of new terminology there, so it’s a little tricky. Hari has also started working through the Yank tutorials. Bharath said it would be useful to put together a sketch of a design for programmatic FEP once Hari was ready. Peter mentioned it might be useful to look at the Open Force Field publications (https://openforcefield.org/science/publications/).

Stanley looked through a lot of the codebase and the contributor guidelines. Stanley is particularly interested in hyperparameter tuning. Stanley did some testing with dc.hyper on a set of resistance prediction ensemble models as a starting point for considering other integrations (Keras Tuner specifically).
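For readers unfamiliar with hyperparameter search, here is a generic grid-search sketch in plain Python. It is not the dc.hyper or Keras Tuner API. The `grid_search` helper and the `toy_score` function (a stand-in for training a model and returning a validation score) are hypothetical.

```python
import itertools

def grid_search(score_fn, param_grid, use_max=True):
    """Evaluate `score_fn` on every combination in `param_grid`.

    Returns (best_params, best_score). `use_max` selects whether a
    higher or a lower score counts as better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        is_better = (
            best_score is None
            or (score > best_score if use_max else score < best_score)
        )
        if is_better:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(learning_rate, n_estimators):
    # Hypothetical stand-in for "train a model, return a validation score".
    return -((learning_rate - 0.1) ** 2) - ((n_estimators - 100) ** 2) / 1e4

best_params, best_score = grid_search(
    toy_score,
    {"learning_rate": [0.01, 0.1, 1.0], "n_estimators": [50, 100, 200]},
)
print(best_params, best_score)
```

In practice the scoring function would fit a model on the training split and evaluate it on a validation split. Libraries such as Keras Tuner replace the exhaustive loop with smarter search strategies.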

Bharath suggested that it might be useful to have more frequent releases, on a monthly or bimonthly cycle. Stanley agreed and suggested that this might make it easier for corporate users to add DeepChem to their stacks. Peter agreed, but suggested that instead of releasing 2.4.1, we probably want to release 2.5.0 since the next release wouldn’t just be a minor release. We all agreed to target DeepChem 2.5.0 for the end of February.

Joining the DeepChem Developer Calls

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.