DeepChem Minutes 10/9/2020

Date: October 9th, 2020
Attendees: Bharath, Peter, Hari, Nathan, Seyone, Tyler, Alana
Summary: Since there were no new attendees on the call, we jumped straight into roundtable updates.

Bharath has continued working this week on getting DeepChem running on SageMaker. This has been technically tricky since RDKit can only run within a conda environment, so we have to wrap a conda environment within a Docker image within SageMaker. Bharath has a first simple model running and hopes to turn that into a tutorial and more documentation in the next week or two. Bharath plans to work on tutorial 3 for MoleculeNet this week, and has cleared out several days next week to take a crack at solving our serialization issue. This is the biggest blocker to getting our next release out the door, so hopefully we can clear it in the next week.

Peter has continued working on the tutorials this week. He’s got a new PR with more updates to the tutorials. Working on the tutorials is a long, ongoing project, which is now about halfway done. Bharath mentioned that now that we have the tutorial series up and running, we might want to consider putting up some type of course that teaches DeepChem through these materials. O’Reilly has an online platform and was interested in hosting the material. Peter said he wouldn’t have the bandwidth to work on a course right now, but suggested that, given all the new changes, it might be a good time to start thinking about the next version of the book after we release the next version of DeepChem.

Hari has continued working on orbital convolutions research and in particular is exploring the use of ensembles to improve the predictive power of these models. Hari wanted to know if orbital convolutions would fit into DeepChem given that most of DeepChem focuses on organic compounds and orbital convolutions are aimed more at inorganic materials. Bharath said that it absolutely would fit, especially given DeepChem’s new support for crystal graph convolutions that Daiki added and Nathan’s newly added materials datasets.

As an aside, Bharath congratulated Nathan and Seyone for getting in submissions to the ML for molecules NeurIPS workshop. These are the first DeepChem research papers that have been submitted in the last few years and hopefully we’ll see many more papers in the years to come.

Nathan has been focused this last week on running experiments and getting together a submission for the NeurIPS workshop. Nathan’s next planned DeepChem project is putting together a DeepChem FEP module that wraps YANK. For active learning experiments, we need a programmatic and automatic way to set up and launch FEP runs, and adding some DeepChem infrastructure for this need would be very useful.

Following up on a discussion with Nathan from a few weeks ago, Bharath mentioned that he saw online that HuggingFace’s model hub only takes up 3.5 TB. This is very interesting since it suggests that it might be possible for us to maintain a model hub of our own. Nathan said he’d put together a rough design for what a DeepChem model hub could look like but hadn’t yet been able to post it online. Bharath suggested that we could perhaps take a hybrid approach where we limit model hub upload to DeepChem developers but allow anyone to download models. Seyone agreed from his experience with HuggingFace that this could be a useful design for the system.
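As a rough illustration of the hybrid approach discussed above, here is a minimal in-memory sketch in which only listed developers can upload and anyone can download. The `ModelHub` class and its method names are hypothetical and only meant to make the access rule concrete; they are not an existing DeepChem API or Nathan's design.

```python
class ModelHub:
    """Toy model hub: uploads restricted to developers, downloads open to all.

    Hypothetical sketch for illustration; not a DeepChem API.
    """

    def __init__(self, developers):
        # Usernames allowed to upload models.
        self._developers = set(developers)
        # Maps model name -> serialized weights (bytes).
        self._models = {}

    def upload(self, user, name, weights):
        # Only listed developers may add or replace a model.
        if user not in self._developers:
            raise PermissionError(f"{user!r} is not allowed to upload models")
        self._models[name] = weights

    def download(self, name):
        # No user check: anyone may download. Raises KeyError if the model is missing.
        return self._models[name]
```

A real hub would also need storage, versioning, and authentication, none of which were specified on the call.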

Seyone has been wrapping up the ChemBERTa submission to the NeurIPS workshop. As next steps, he’s working on putting up an arXiv paper and open sourcing the code. Once the code is open sourced, we will have good grounds to start incorporating the code release further into DeepChem. Some good steps toward this are adding more tutorials, improving the SMILES tokenizers, adding SELFIES support, and more. Bharath mentioned that some of the infrastructure built for the paper created easy ways to benchmark models against MoleculeNet. It might be useful to add this infrastructure into DeepChem.

Tyler said that he hadn’t been able to work on DeepChem much this week, but mentioned that the Kipoi project, which produced a model hub for genomics, might be a good model to follow. Bharath said that was a great pointer and that he’d follow up with some of the folks involved.

Alana has been looking into improving DeepChem’s vaccine infrastructure, and in particular has been looking at implementing this paper in DeepChem. Alana mentioned that the authors were open sourcing the model and asked if there would still be value in adding it to DeepChem. Bharath said yes, absolutely, since the authors may not be able to maintain the code for the long term, or may not be interested in doing so, and DeepChem could help maintain the implementation and test it thoroughly.

With the roundtable complete, we moved into administrative discussions.

Bharath said that he had sent out a survey to find convenient times for the new split developer calls (one for Americas/Europe/Africa/Middle East and one for India/Asia/Pacific). Bharath asked everyone to fill out the survey if they hadn’t already and said he’d select times by focusing on times that worked for DeepChem developer call regulars. Votes from non-regulars would be used as tie-breakers. Bharath asked if this setup made sense to folks, and it seemed to.
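The selection rule described above (votes from regulars decide, votes from non-regulars only break ties) can be sketched as follows. This is a minimal illustration, not the actual procedure used for the survey; the function name `pick_call_time` and the input format (a mapping from person to the list of slots they can attend) are assumptions.

```python
from collections import Counter


def pick_call_time(regular_votes, other_votes):
    """Pick a slot by regulars' votes, using non-regulars' votes as tie-breakers.

    Both arguments map a person's name to the list of slots that work for them.
    Only slots that at least one regular voted for are considered.
    """
    regular_counts = Counter(
        slot for slots in regular_votes.values() for slot in slots
    )
    other_counts = Counter(
        slot for slots in other_votes.values() for slot in slots
    )
    if not regular_counts:
        raise ValueError("no votes from regulars")
    # Rank by most regular votes, then most non-regular votes,
    # then slot name so the result is deterministic.
    return min(
        regular_counts,
        key=lambda slot: (-regular_counts[slot], -other_counts[slot], slot),
    )
```

For example, if two slots are tied among regulars, the one with more votes from non-regulars is chosen.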

As a second point, Bharath mentioned that he was looking more seriously into setting up the DeepChem virtual conference to happen this coming February. He said he wasn’t sure if we should run the conference informally or more formally with invited talks. Nathan mentioned from his experience with the Materials Project that having a more formal structure with invited talks helped significantly improve the quality of submissions. Bharath said that made sense and suggested that we ask conference attendees to submit 1-2 page papers. These papers can be archived on the DeepChem website so submitters can list accepted papers on their CVs. Bharath said he’d send out the conference call for papers to folks for review in the next couple of weeks.

As a third administrative point, Bharath mentioned that he’d chatted with an experienced open source lawyer who’d been involved with the Mozilla project. One recommendation the lawyer gave was that it is important for an open source project to secure a trademark for its name to prevent bad actors from misusing the project’s name. Bharath suggested that we should go ahead and secure the DeepChem trademark and that he could pay for the costs. Bharath also suggested that we put up general trademark usage guidelines following best practices from Mozilla or other open source groups. Bharath asked if that plan made sense to folks. Alana said this made sense but asked that we check that this keeps us in compliance with OSI guidelines. Bharath said he’d look into it to make sure. Peter said that although he hadn’t personally gotten a trademark for any of the open source projects he’d been involved with, it was common for open source projects to secure trademarks on their names, and it sounded like a good idea. Bharath said he’d proceed with getting the trademark and that he’d put up draft trademark usage guidelines for community review in the next few weeks.

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.
