DeepChem Minutes 5/8/2020

Date: 5/8/2020
Attendees: Bharath, Peter, Seyone, Sean, Dilip
Summary: Bharath is continuing work on the large refactoring PR. This last week he pulled out pieces of it and merged them in as smaller PRs: one updated the .gitignore file, another updated the data loaders, and a third removed NNScore. Bharath did some planning, and it looks like the large refactoring PR will be broken into 15-20 smaller PRs (see tentative list) that will be merged in over the next 3-4 weeks. One highlight of the upstream refactoring branch is its much-improved documentation, which will be pushed to the main branch as the large refactoring PR is merged in.

Peter made significant progress on getting Travis CI support for Windows! This is a nice milestone and means the next version of DeepChem will support Windows developers. In the process, he found that a number of DeepChem’s slower tests were broken, so he’s working on a PR to fix them. That work is unfortunately partially blocked by Bharath’s large refactoring PR, especially the parts that touch the docking submodule. Bharath is going to try to break that piece out as a sub-PR and get it merged in as soon as possible.

Sean has been continuing to work on a port of DeepChem to Julia. He ran into a couple of issues. First, getting GPU support for the port looks like it will be a little tricky. He asked whether graph convolutions could expect a large speedup on GPU. Bharath mentioned that DeepChem’s graph convolutions have poor GPU utilization (something like 20% right now), and that DGL reports large speedups over DeepChem’s implementation, which might be worth investigating (a rough way to observe the utilization figure is sketched after this paragraph). Seyone’s been working with DGL and mentioned that a lot of its featurizers look to be based on DeepChem’s implementation; it’s mainly the models that are structured differently. Dilip mentioned that another issue is that there’s no good Julia replacement for RDKit, so it hasn’t been possible to easily port the DeepChem featurizers over. Bharath suggested perhaps making a Julia-native RDKit port, and Peter took a look at the SWIG wrappers for RDKit and suggested it might be possible to extend them to directly generate Julia bindings.
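For readers who want to see the low-utilization behavior themselves, here is a minimal sketch, assuming the DeepChem 2.x `load_delaney` loader and `GraphConvModel` API; the dataset and hyperparameters are arbitrary choices for illustration, and GPU utilization would be watched externally while the model trains.

```python
# Rough sketch (not from the call): train a DeepChem graph convolution and
# watch GPU utilization in a second terminal, e.g. `watch -n 1 nvidia-smi`.
import deepchem as dc

# Delaney solubility dataset with graph-convolution featurization.
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = datasets

# A standard graph convolutional regression model.
model = dc.models.GraphConvModel(n_tasks=len(tasks), mode='regression',
                                 batch_size=128)

# Train for a few epochs; the low utilization Bharath mentioned would show up
# during a training run like this one.
model.fit(train_dataset, nb_epoch=10)
```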

Seyone has been working with DGL a good amount over the last week, using its graph attention networks and other graph convolutional implementations. He asked if it would be possible to improve DeepChem’s support for PyTorch, since a lot of the research community has been trending that way. Bharath mentioned this has been a longstanding goal, but is tricky to do. There’s the separate torchchem repo, which is highly experimental, but DeepChem development has been taking up the majority of the available development time, so torchchem has been languishing. He asked whether it would be feasible to just make torchchem a submodule of DeepChem so that development happens in a monorepo. In this model, PyTorch would be a soft dependency of DeepChem that people could install to use torchchem models (see the sketch below for what such a soft dependency could look like). Peter pointed out that there’s some unavoidable complexity here. If we merge the repos, we’ll face extra complexity at the build step, since people will likely want separate deepchem-tf and deepchem-pytorch packages that use either TensorFlow or PyTorch but not both. He suggested it might be cleaner to separate at the repo level, since that would force the design of clean APIs; left to organic development, the APIs would likely become entangled. Bharath noted there’s another tradeoff between application development and API separation: DeepChem’s support for biophysics, genetics, and chemical reactions is still underdeveloped, and those APIs will likely have to change considerably, so it might be hard to cleanly separate them until we know what the right interface is. Bharath said he’d think more about these design questions over the next few weeks.
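To make the soft-dependency idea concrete, here is a minimal sketch of an import-guard pattern, assuming a hypothetical deepchem.torchchem submodule; the `_require_torch` helper and `TorchModel` class are illustrative names, not existing DeepChem code.

```python
# Hypothetical sketch of PyTorch as a soft dependency: importing DeepChem's
# core works without torch installed, and the requirement only surfaces when
# a torchchem model is actually constructed.


def _require_torch():
    """Raise a clear error if PyTorch is not installed."""
    try:
        import torch  # noqa: F401
    except ImportError:
        raise ImportError(
            "This model requires PyTorch. Install it (e.g. `pip install torch`) "
            "to use deepchem.torchchem models.")


class TorchModel:
    """Illustrative base class for PyTorch-backed DeepChem models."""

    def __init__(self, *args, **kwargs):
        _require_torch()  # fails lazily, only for torchchem models
```

With this arrangement a TensorFlow-only install never touches PyTorch, while a separate deepchem-pytorch package could declare torch as a hard requirement; whether that is preferable to the repo-level split Peter suggested is exactly the open question above.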

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.
