DeepChem Minutes 4/15/2021

India/Asia/Pacific Call

Date: April 8th, 2021
Attendees: Bharath, Dhairya, Peter, Ashwin
Summary: Over the last week, Bharath has worked on DeepChem planning and reviewing PRs.

Ashwin has been working on a plan for implementing a method to interface DeepChem with Hugging Face. Bharath suggested posting it to the forums for review once it's ready.

Peter has been working on updating the book chapters. He has also been listening to the GTC talks and mentioned the XLA talk was interesting. There are a number of interesting talks on new libraries like Legate and Horovod.

Dhairya mentioned that the Julia ecosystem already has native profilers available. Julia has a number of interesting features that put it in many ways ahead of the Python ecosystem, including a much faster compiler.

Dhairya also mentioned that Julia's chemistry featurization support has been coming along as well.

Bharath mentioned that the Dex team had put out a new paper https://arxiv.org/abs/2104.05372 that looked really cool.

Americas/Europe/Africa/Middle East Call

Date: April 9th, 2021
Attendees: Bharath, Seyone, Andrea, Vignesh, Nathan, Walid, Herman, Stanley, David
Summary: Walid is an engineer at Reverie Labs who is working with Bharath and Seyone on the ChemBERTa project.

Herman works at Nurix and is here to learn more about DeepChem.

Bharath gave the same update as he did yesterday.

Seyone has been working on getting fine-tuning working with ChemBERTa sequence classification. He's trying to get this pure Hugging Face implementation's benchmarks to the same level as the original Simple Transformers approach in the ChemBERTa tutorial.
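For readers following along, here is a minimal sketch of what a pure Hugging Face fine-tuning loop for sequence classification might look like. The checkpoint name is the one from the ChemBERTa tutorial, and the toy SMILES dataset is an illustrative assumption, not the actual benchmark setup.

```python
import torch
from transformers import (AutoTokenizer, RobertaForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "seyonec/ChemBERTa-zinc-base-v1"  # tutorial checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = RobertaForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy data standing in for a real benchmark: SMILES strings with binary labels.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
labels = [0, 1, 0, 1]
enc = tokenizer(smiles, padding=True, truncation=True, return_tensors="pt")

class SmilesDataset(torch.utils.data.Dataset):
    """Wraps tokenized SMILES plus labels in the format Trainer expects."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir="chemberta-finetune",
                         num_train_epochs=1, per_device_train_batch_size=2)
Trainer(model=model, args=args,
        train_dataset=SmilesDataset(enc, labels)).train()
```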

Andrea was looking into MLflow and how to integrate DeepChem with it. He was trying to figure out how to save models and ran into difficulties. Bharath mentioned that fit() autosaves checkpoints.
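To make the autosave point concrete, here is a small sketch with a toy dataset: DeepChem models write checkpoints to model_dir during fit(), and a fresh model object can pick them back up with restore().

```python
import numpy as np
import deepchem as dc

# Toy binary-classification dataset, purely for illustration.
X = np.random.rand(100, 50)
y = np.random.randint(2, size=(100, 1))
dataset = dc.data.NumpyDataset(X, y)

model = dc.models.MultitaskClassifier(
    n_tasks=1, n_features=50, model_dir="./my_model")
model.fit(dataset, nb_epoch=5)  # checkpoints are written to model_dir

# A new model object pointed at the same directory reloads the weights.
restored = dc.models.MultitaskClassifier(
    n_tasks=1, n_features=50, model_dir="./my_model")
restored.restore()
```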

Vignesh didn't have time to submit a Ray proposal for GSoC but is still interested in it. He was looking at tune-sklearn, a library that combines Ray and scikit-learn: https://github.com/ray-project/tune-sklearn. Vignesh is also working on a wrapper for the PAGTN model in DGL-LifeSci.
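For context, tune-sklearn provides drop-in replacements for scikit-learn's hyperparameter search classes that run trials on Ray Tune. A small example on toy data might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneGridSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
param_grid = {"alpha": [1e-4, 1e-3, 1e-2]}

# Same interface as sklearn's GridSearchCV, but trials run on Ray,
# with optional early stopping of poorly performing configurations.
search = TuneGridSearchCV(SGDClassifier(), param_grid,
                          early_stopping=True, max_iters=10)
search.fit(X, y)
print(search.best_params_)
```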

Nathan finished up some preliminary benchmarks on the PDBBind dataset and is reviewing GSoC proposals.

Stanley mentioned he had a talk accepted to the Ray Summit.

David has been working on a tutorial for predicting pressures in proteins and has put up a first pull request.

Walid asked when we should subclass versus compose classes. In particular, should a SMILES featurizer inherit from MolecularFeaturizer or from RobertaTokenizer? Bharath mentioned this was an example where mixins apply: https://www.residentmar.io/2019/07/07/python-mixins.html.
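To illustrate the mixin pattern from that article (all class names here are hypothetical sketches, not DeepChem API), each mixin contributes one capability and a concrete class combines them via multiple inheritance instead of picking a single parent:

```python
class MolecularFeaturizerMixin:
    """Contributes the featurize-a-list-of-molecules loop."""
    def featurize(self, smiles_list):
        return [self._featurize_one(s) for s in smiles_list]

class TokenizerMixin:
    """Contributes token-based featurization of a single SMILES string."""
    def _featurize_one(self, smiles):
        return smiles.split()  # stand-in for a real tokenizer call

class SmilesTokenizerFeaturizer(TokenizerMixin, MolecularFeaturizerMixin):
    """Gets both behaviors without a deep single-inheritance chain."""
    pass

feats = SmilesTokenizerFeaturizer().featurize(["C C O", "c1 ccccc1"])
```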

Seyone asked whether a ChemBERTa sequence classification model should perhaps also use mixins, since it might subclass both TorchModel and PreTrainedModel. Bharath responded that this was a tough call, given that no other torch model employs mixins, and perhaps there should be only one subclass. Seyone agreed and suggested that the SMILES featurizer would be a good first step before figuring out how to design the sequence classification models.
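The one-subclass alternative would hold the Hugging Face model by composition rather than inheritance. A rough sketch of that direction, where the wrapper name is a hypothetical placeholder and not settled design:

```python
import torch
import deepchem as dc
from transformers import RobertaForSequenceClassification

class HFWrapper(torch.nn.Module):
    """nn.Module facade that returns plain logits for TorchModel."""
    def __init__(self, hf_model):
        super().__init__()
        self.hf_model = hf_model
    def forward(self, input_ids):
        return self.hf_model(input_ids=input_ids).logits

hf = RobertaForSequenceClassification.from_pretrained(
    "seyonec/ChemBERTa-zinc-base-v1", num_labels=2)
# TorchModel stays the single parent; the pretrained model is a member.
model = dc.models.TorchModel(
    HFWrapper(hf), loss=dc.models.losses.SoftmaxCrossEntropy())
```

Composition keeps TorchModel as the only superclass while still reusing the pretrained weights, at the cost of a thin adapter layer.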

Joining the DeepChem Developer Calls

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.
