DeepChem Minutes 5/22/2020

Date: 5/22/2020
Attendees: Bharath, Vignesh, Daiki, Seyone, Peter
Summary: This was our first call shifted to our new 3 pm PST time so Daiki could make it. Daiki gave a brief introduction about himself. He's a 2nd-year master's student who'll be working on DeepChem as part of Google Summer of Code this summer. Daiki has posted a project introduction on the forums and a project roadmap on GitHub.

Bharath has spent most of the past week working on his overhaul of DeepChem’s docking support (PR). This new PR looks a bit larger than it is due to the addition of a new data file for testing purposes. The core of the PR is relatively simple and strips down the design of the dc.dock module. The original design had classes such as VinaGridRFDocker which hard-coded the use of AutoDock Vina, the Grid Featurizer, and random forests. The new design removes these hard-coded classes and instead has fewer, more generic classes. This PR should be ready for review shortly.
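To give a flavor of the refactor, here’s a minimal sketch of what a generic docking pipeline along these lines might look like. The names here (PoseGenerator, Docker, generate_poses) are illustrative assumptions, not necessarily the API in the actual PR:

```python
# Illustrative sketch only: class and method names are hypothetical,
# not necessarily those used in the actual dc.dock PR.

class PoseGenerator:
    """Abstract interface for a docking engine (e.g. AutoDock Vina)."""

    def generate_poses(self, protein_file, ligand_file):
        raise NotImplementedError

class Docker:
    """Generic docker: composes a pose generator with an optional
    featurizer and scoring model, instead of hard-coding all three
    into one class like the old VinaGridRFDocker."""

    def __init__(self, pose_generator, featurizer=None, model=None):
        self.pose_generator = pose_generator
        self.featurizer = featurizer
        self.model = model

    def dock(self, protein_file, ligand_file):
        poses = self.pose_generator.generate_poses(protein_file, ligand_file)
        if self.featurizer is not None and self.model is not None:
            # Rescore poses with a learned model (e.g. a random forest)
            features = [self.featurizer.featurize(pose) for pose in poses]
            scores = [self.model.predict(feats) for feats in features]
            return list(zip(poses, scores))
        return poses
```

The appeal of a design like this is that swapping Vina for another engine, or random forests for another scoring model, becomes a constructor argument rather than a whole new hard-coded class.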

Seyone has primarily been working with graph attention models over the last week. He asked whether it would be possible to add a graph attention model into DeepChem. Bharath asked if Seyone would have the bandwidth to contribute a Keras implementation into DeepChem. Seyone said he had a PyTorch implementation ready which could be contributed to TorchChem, but a Keras one might be a little trickier to pull off. He said he’d take a look at it, and that he was also interested in giving the docking support a try.
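For readers unfamiliar with graph attention, a toy single-head attention layer in Keras (in the spirit of the GAT architecture from Veličković et al.) might look something like the sketch below. This is an illustrative assumption of how such a layer could be written, not code from DeepChem or TorchChem:

```python
import tensorflow as tf

class GraphAttentionLayer(tf.keras.layers.Layer):
    """Toy single-head graph attention layer. Illustrative only;
    a real implementation would add multi-head attention, dropout,
    and sparse adjacency handling."""

    def __init__(self, out_dim):
        super().__init__()
        self.out_dim = out_dim

    def build(self, input_shape):
        in_dim = input_shape[0][-1]
        # Shared linear transform W and attention vector a
        self.W = self.add_weight("W", shape=(in_dim, self.out_dim))
        self.a = self.add_weight("a", shape=(2 * self.out_dim, 1))

    def call(self, inputs):
        # features: (n_nodes, in_dim); adjacency: (n_nodes, n_nodes) 0/1
        features, adjacency = inputs
        h = tf.matmul(features, self.W)                 # (n_nodes, out_dim)
        n = tf.shape(h)[0]
        # Build all pairwise concatenations [h_i || h_j]
        h_i = tf.repeat(h, n, axis=0)                   # (n*n, out_dim)
        h_j = tf.tile(h, [n, 1])                        # (n*n, out_dim)
        pairs = tf.concat([h_i, h_j], axis=1)           # (n*n, 2*out_dim)
        e = tf.nn.leaky_relu(tf.matmul(pairs, self.a))  # attention logits
        e = tf.reshape(e, (n, n))
        # Mask out non-edges before the softmax so attention only
        # flows along bonds in the molecular graph
        mask = tf.where(adjacency > 0, 0.0, -1e9)
        alpha = tf.nn.softmax(e + mask, axis=1)
        return tf.nn.elu(tf.matmul(alpha, h))
```

Per-node features and an adjacency matrix go in, attention-weighted node embeddings come out, which is the core operation a DeepChem graph attention model would need.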

Peter’s been working on optimizing the graph convolution models. He found that the actual bottleneck was mainly in the DiskDataset class. He added a PR that caches shards of data in memory, which led to a nice speedup. The next major bottleneck was the ConvMol object. Peter simplified the agglomerate_mols function in this PR. There were also a few minor improvements to the GraphConvModel itself (PR). The upshot of these changes is that training a graph conv model (on a small benchmark) is twice as fast! GPU utilization is also now up to 40%. One nice feature of this work is that the general improvements to DiskDataset should speed up models across the board.
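The shard-caching idea is simple enough to sketch. The version below is a toy illustration under assumed names (CachingShardReader, get_shard), not the actual DiskDataset code from the PR:

```python
import numpy as np

class CachingShardReader:
    """Toy illustration of shard caching: keep recently loaded shards
    in memory so repeated passes over the dataset skip the disk.
    Hypothetical names; not the actual DiskDataset implementation."""

    def __init__(self, shard_paths, max_cached=8):
        self.shard_paths = shard_paths
        self.max_cached = max_cached
        self._cache = {}  # shard index -> in-memory array

    def get_shard(self, i):
        if i not in self._cache:
            if len(self._cache) >= self.max_cached:
                # Evict the oldest cached shard (simple FIFO policy)
                self._cache.pop(next(iter(self._cache)))
            self._cache[i] = np.load(self.shard_paths[i])
        return self._cache[i]

    def iter_shards(self):
        for i in range(len(self.shard_paths)):
            yield self.get_shard(i)
```

Since every DeepChem model that trains from a DiskDataset reads shards this way, a cache at this layer is why the speedup applies across the board rather than just to graph convolutions.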

Vignesh has been working on his NeurIPS paper this past week. He’s getting close to done, so he hopes to have more time to contribute to DeepChem development starting next week.

Seyone asked if there had been any progress on getting SageMaker support for DeepChem. Bharath mentioned that SageMaker should work with DeepChem given the right setup, but that we should add proper documentation and installation support. He said he’d try to get to it in a few weeks if possible.

Bharath asked if the new time worked for everyone, and it looks like everyone is OK with it. We’ll plan to continue the calls at 3 pm PST on Fridays.

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.
