DeepChem Minutes 11/5/2020

bharath · November 12, 2020, 12:36am

DeepChem Minutes

Americas/Europe/Africa/Middle East Call

Date: November 5th, 2020
Attendees: Bharath, Peter, Mufei, Renuka
Summary: Renuka hasn’t been on the call in a few weeks, so we started with a quick re-introduction. Renuka is an undergrad student interested in working on synthetic biology techniques and is learning more about DeepChem.

Bharath has been working this week on finishing up the list of remaining reload tests (issue). He added tests for reloading MAML, A2C, and PPO in PR1 and PR2. He has also been working on the interaction fingerprints.

Peter has been working on overhauling the MoleculeNet loaders and has gotten a new PR in. Most of the loaders are very straightforward, but some are very special purpose. The QM7/8/9 datasets in particular have some interesting behavior where different files are downloaded for different featurizations. Bharath mentioned that it might be useful to look at the source website, http://quantum-machine.org/datasets/. Another issue that Daiki pointed out is that some of the MoleculeNet loaders have no documentation. Peter has also continued the project of updating the tutorials. Again, some are very straightforward, but others have complications.

Bharath mentioned that the Stanford S3 bucket that held the original MoleculeNet datasets recently went down. Peter mentioned this was also an issue for MDTraj and OpenMM and that a temporary fix should hopefully land within the next day or two, but that we should try to get a more permanent fix in place. Bharath mentioned that we’ve migrated to the DeepChem S3 URLs on master, so the right fix might just be to get 2.4.0 released.

Mufei submitted PR 2280 this week, which adds PyTorch model wrappers for GAT and AttentiveFP. There is currently a model wrapper for GAT, but it is not robust enough, so this PR replaces it. Bharath suggested adding a model wrapper for MPNN as well, if the MPNN implementation from DGL-LifeSci is robust enough, and gradually deprecating the previous MPNN implementation in DeepChem. Mufei liked the idea and suggested starting work on baselines for MoleculeNet after this PR is merged. Bharath agreed.

Bharath asked if there were any questions we could answer for Renuka about DeepChem. Renuka said there weren’t any in particular, but asked about references for synthetic biology. Bharath mentioned that the Cello project might be a good place to start.

DeepChem’s 2.4.0 release is coming up. Bharath still has 3 reloading tests to add. Once those are in, Peter agrees the release should go out within a week or two. Bharath asked if we should make a new branch for the release. Peter mentioned that it wouldn’t matter since we could always make a new branch from the tag. Bharath also asked whether we needed a release candidate, and Peter said probably not since we’ve had most users on the nightly build for a while.

Daiki couldn’t make the call but sent in an update for the minutes:

This week, I fixed the docs build and reorganized the docs directory. I’m now working on further improvements to the API docs, including updates for data, splitters, transformers, and featurizers. I will open PRs for all of them this weekend.

I also continued testing GitHub Actions. Last week I said that GitHub Actions was not mature, but that was not correct. This week I looked into it in more detail, and GitHub Actions works well on macOS and Linux, but not on Windows. In the Windows CI, the Vina installation is not working. The code that causes the error is here.

https://github.com/deepchem/deepchem/blob/ab07c0b6b070d376039f55762e6894df289b7364/deepchem/dock/pose_generation.py#L143-L145

if platform.system() == 'Windows':
  msi_cmd = "msiexec /i %s" % downloaded_file
  check_output(msi_cmd.split())

I’m now trying to resolve this problem. If I can resolve it, I will start working on the migration from Travis CI to GitHub Actions.

Are you aware of the change to the pricing model…?

This change introduces some build limitations for OSS projects. If it has affected our builds, we need to migrate to GitHub Actions as soon as possible.
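
As a side note on the Windows installer call quoted in Daiki’s update: the minutes do not say why the Vina installation fails in CI, so the following is only a sketch of two things that might be worth checking, not a confirmed fix. Splitting the command string on whitespace would break if the downloaded file’s path contained a space, and msiexec without a quiet switch may wait on an installer UI that a CI runner cannot answer. The helper names `build_msi_command` and `install_msi` are hypothetical and are not part of DeepChem.

```python
import platform
import subprocess


def build_msi_command(installer_path, quiet=True):
    """Return an msiexec argument list for installing `installer_path`."""
    # Building a list (instead of formatting a string and calling .split())
    # keeps a path that contains spaces as a single argument.
    cmd = ["msiexec", "/i", installer_path]
    if quiet:
        # /quiet and /norestart are standard msiexec switches for
        # unattended installs; whether they resolve the CI failure
        # is an assumption.
        cmd += ["/quiet", "/norestart"]
    return cmd


def install_msi(installer_path):
    """Run the installer on Windows; do nothing on other platforms."""
    if platform.system() != "Windows":
        return None
    return subprocess.check_output(build_msi_command(installer_path))
```

Keeping the command construction separate from the call that runs it means the argument list can be checked on any platform, including the macOS and Linux runners.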

India/Asia/Pacific Call

Attendees: Bharath, Seyone, Akshay, James Yu., Nathan, James Yo., Swag
Date: November 6th, 2020
Summary: Akshay is an undergrad student at IIT Roorkee who’s interested in applying machine learning and deep learning to materials science.

Swag is the co-founder of a stealth startup. Swag’s background is in physics, and his startup focuses on using machine learning and biophysics for therapeutic design.

Bharath gave the same recap of his work for the week.

Seyone has wrapped up the work on the SMILES Byte-Pair Encoder tutorial. Seyone also plans to get the SMILES Tokenizer tutorial working before he puts up the next PR. Progress this week has been a little slow because of university applications, but he will hopefully have a lot more time after the end of next week to get back to building out ChemBERTa utilities such as the updated featurizer for large datasets.

Akshay has been working on adding the ElemNet dataset to MoleculeNet. Akshay made a new pull request today that adds the required featurizer for ElemNet.

James Yu. just submitted his first paper using DeepChem! James is working on hyperparameter optimization. James wanted to know what feature_num for graph convolutional models meant. Bharath mentioned it was the number of per-atom features considered.
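
To make Bharath’s answer concrete: a graph convolutional model sees each molecule as a table with one row per atom and one column per feature, and the per-atom feature count is the width of that table. The sketch below is a toy illustration only. It assumes a hypothetical four-element vocabulary plus the atom’s degree as features and is not DeepChem’s actual featurizer.

```python
# Toy illustration only: this is not DeepChem's featurizer.
ELEMENTS = ["C", "N", "O", "H"]  # hypothetical element vocabulary


def featurize_atom(symbol, degree):
    """One-hot encode the element and append the atom's degree."""
    one_hot = [1.0 if symbol == e else 0.0 for e in ELEMENTS]
    return one_hot + [float(degree)]


def featurize_molecule(atoms):
    """Return one feature row per atom, i.e. shape (n_atoms, n_features)."""
    return [featurize_atom(symbol, degree) for symbol, degree in atoms]


# Heavy atoms of ethanol (C-C-O) with their heavy-atom degrees.
ethanol = [("C", 1), ("C", 2), ("O", 1)]
features = featurize_molecule(ethanol)

n_atoms = len(features)        # number of rows: one per atom
n_features = len(features[0])  # number of columns: the per-atom feature count
```

The number of rows changes from molecule to molecule, but the number of columns is fixed by the featurizer, which is why a model needs it as a parameter.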

Nathan has been working on overhauling one of the tutorials on protein-ligand interactions. Once finished, the tutorial will introduce the dc.dock API and the interaction fingerprints. Nathan also reviewed Bharath’s interaction fingerprints PR…

James Yo. had a good conversation with Nathan about some of his work this week.

Bharath talked about the 2.4.0 release and his hopes to get it out the door in the next couple of weeks…

Nathan has been thinking more about docking in general and noticed that the process involves computing a number of noncovalent interactions, very similar to physics. Nathan has been brainstorming more about causal modelling, using ideas like the interaction fingerprints as a base. Swag noted that even in physics some interactions are correlative and not necessarily causal, as in electron interactions, and asked for more clarification on the causal interaction. Nathan said there’s an interaction in the docking that causes a jump in energetic favorability. Bharath mentioned that this paper might be pertinent. Swag mentioned that another pertinent idea might be mediator analysis, which assumes there’s one intermediate state between the undocked and docked states (wiki).

James Yuan asked what folks’ favorite cloud infra was. Bharath mentioned AWS, and Swag mentioned that he uses Google Cloud and that it’s been pretty solid for his use cases.

Joining the DeepChem Developer Calls

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.