DeepChem Minutes 1/7/2021


India/Asia/Pacific Call

Date: January 7th, 2021

Attendees: Bharath, Mufei, Peter, Du, Vignesh

Summary: We had a newcomer on the call so we started with a brief intro.

Yuanqi is an undergrad and a research intern at Microsoft Research who’s interested in deep graph learning for protein structure prediction.

Bharath is just coming back online after being out for the holidays. He hasn’t done much this week but mentioned that he was planning to work on the DeepChem 2.4.0 release during the coming week.

Mufei is still working on the MoleculeNet benchmarking and ran into some issues with the ChEMBL dataset (issue 2345). Mufei asked whether we are planning to support QSAR on proteins. Bharath said nothing was planned at the moment, but it would be interesting. Bharath said it might be useful to check out UniRep, and Peter suggested the ATOM3D paper.

Mufei mentioned that Yuanqi and other folks might be interested in contributing to the MoleculeNet benchmarks as well.

Peter doesn’t have much to report this week.

Vignesh was also busy this week and couldn’t get much done.

Yuanqi asked if it would be feasible to set up a leaderboard or server of some sort.

Mufei asked if we could make the loaders less entangled with the DeepChem infrastructure. Bharath said it might be feasible with Peter’s new infrastructure for MoleculeNet loaders. Peter said adding numpy arrays or other return formats wouldn’t be the challenging part; the featurization was the difficulty. Bharath mentioned it might be useful to add a featurization tutorial that explains how to write your own custom featurizer.
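To illustrate the kind of thing such a tutorial might cover, here is a minimal stand-alone sketch of a custom featurizer. It deliberately does not import DeepChem: the class name `ElementCountFeaturizer` and its methods are hypothetical, and the character counting is a crude stand-in for real chemistry parsing. In DeepChem itself the usual pattern is to subclass a featurizer base class and override its per-datapoint method.

```python
from typing import Iterable, List


class ElementCountFeaturizer:
    """Hypothetical featurizer: counts selected characters in a SMILES string."""

    def __init__(self, symbols: str = "CNOS"):
        # Characters to count; this is an illustrative choice, not a standard.
        self.symbols = symbols

    def _featurize(self, smiles: str) -> List[int]:
        # One feature per tracked symbol: how often it occurs in the string.
        # Note this is a plain character count (e.g. "Cl" also counts as "C").
        return [smiles.count(s) for s in self.symbols]

    def featurize(self, datapoints: Iterable[str]) -> List[List[int]]:
        # Apply the per-datapoint logic to every input.
        return [self._featurize(s) for s in datapoints]


features = ElementCountFeaturizer().featurize(["CCO", "CC(=O)N"])
print(features)
```

The point of the sketch is the split between per-datapoint logic (`_featurize`) and the batch entry point (`featurize`), which keeps the custom part small.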

Yuanqi asked if it would be feasible to have easy/medium/hard labels for MoleculeNet datasets and to make sure splits were deterministic. Bharath said the splits were handled in the latest version and that it might be feasible to have some sort of labels.
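As a rough illustration of what a deterministic split means in practice, the sketch below shuffles indices with a fixed seed so that the same call always yields the same train/test partition. The function `deterministic_split` is a hypothetical helper written for this note, not DeepChem's splitter API.

```python
import random
from typing import List, Tuple


def deterministic_split(
    n: int, frac_train: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split indices 0..n-1 into train/test using a seeded shuffle."""
    indices = list(range(n))
    # A private Random instance avoids touching the global RNG state.
    random.Random(seed).shuffle(indices)
    cut = int(frac_train * n)
    return indices[:cut], indices[cut:]


train_a, test_a = deterministic_split(10, seed=123)
train_b, test_b = deterministic_split(10, seed=123)
print(train_a == train_b and test_a == test_b)
```

Fixing the seed makes benchmark numbers comparable across runs, which matters if results are ever collected on a shared leaderboard.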

Bharath mentioned that we should figure out the issue with the QM9 dataset before we make the 2.4.0 release.

Americas/Europe/Africa/Middle East Call

Date: January 8th, 2021

Attendees: Bharath, Seyone, Hari, Hosein, Nathan

Summary: We had newcomers on the call so we started with brief intros.

Kishore is a neurostatistics researcher at BU who’s interested in learning more about DeepChem.

Stanley is a mathematician who works on deep learning research at a company called Drift. He’s interested in using NLP to model bacterial genomes.

Bharath gave the same update as he did the previous day.

Seyone has been working on university applications over the last couple of weeks but is done and will now restart ChemBERTa.

Hari came back from break just this Monday.

Hosein is working on active learning and Gaussian processes on a personal project, but hopes to contribute the implementation to DeepChem at the end.

Nathan has also been watching the news and getting back into DeepChem work. He’s working on the docking tutorial. The programmatic docking is working well, but the ComplexFeaturizer still has some issues. For the tutorial he’s thinking of using a small PDBBind set.

We then moved to the general discussion part.

Hosein asked about API design for composite featurizers. He pointed out that there’s no easy way to compose featurizers. Bharath agreed this would be an interesting thing to look into.
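One possible shape for such an API, sketched here purely for discussion, is a helper that concatenates the outputs of several featurizers. The names `compose`, `length_feature`, and `ring_closure_feature` are hypothetical and the featurizers are toy functions on SMILES strings; none of this reflects an existing DeepChem interface.

```python
from typing import Callable, List

# For this sketch, a featurizer is any function from a string to a feature list.
Featurizer = Callable[[str], List[float]]


def compose(*featurizers: Featurizer) -> Featurizer:
    """Return a featurizer that concatenates the outputs of the given ones."""

    def combined(datapoint: str) -> List[float]:
        out: List[float] = []
        for featurize in featurizers:
            out.extend(featurize(datapoint))
        return out

    return combined


def length_feature(smiles: str) -> List[float]:
    # Toy feature: number of characters in the SMILES string.
    return [float(len(smiles))]


def ring_closure_feature(smiles: str) -> List[float]:
    # Toy feature: each ring closure uses a pair of digits in simple SMILES.
    return [sum(ch.isdigit() for ch in smiles) / 2]


combined = compose(length_feature, ring_closure_feature)
print(combined("c1ccccc1"))
```

Concatenation is only one design choice; a real API would also need to decide how to handle featurizers with different input types or variable-length outputs.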

We then moved back to the FEP discussion. Nathan gave an update on the status (issue). The source is a paper from DeepMind that uses normalizing flows to sample protein-ligand conformations. Most of the implementation is in DeepChem and TensorFlow already, but the CircularSplineBijector is the last thing we need to replicate the paper.

Joining the DeepChem Developer Calls

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to, where X=bharath, Y=ramsundar.