DeepChem Minutes 4/28/2022

DeepChem Minutes

India/Asia/Pacific Call

Date: April 28th, 2022

Attendees: Bharath, Ashish, Arun, Julius

Summary:

Julius is a software developer who is interested in learning more about data science and wants to contribute to DeepChem.

Ashish is a software developer who is interested in learning more about rare diseases and cancer.

Bharath has been primarily working with students this week.

Arun has been working on a pull request to improve the QM9 dataset in DeepChem.

Julius has been working on adding M1 support for DeepChem but has run into serious errors.

Ashish asked where a good place to get started would be. Bharath mentioned that improving the tutorials would be a good starting point.

Americas/Europe/Africa/Middle East Call

Date: April 29th, 2022

Attendees: Bharath, Stanley, Shai, Aryan, Jose, Abhishek

Summary:

Bharath gave the same update as at the previous call.

Stanley has been working on using Ray for protein-ligand docking and has been adding more infrastructure for transformers. In particular, Stanley has been looking at the BigBird paper.

Shai has been working on getting DeepQMC running but has run into compute limitations. Abhishek and Stanley suggested applying for free AWS/GCP student credits.

Jose has just finished working through the Deep Learning for the Life Sciences book.

Aryan has been busy with academics but has done some work on updating his docs PR.

Abhishek has been reviewing DeepChem GSoC proposals.

Mahdi is joining in to stay up to date.

DeepChem Bioinformatics Call

Date: May 2nd, 2022

Attendees: Bharath, Paulina, Stanley, Arun, David, Prashant, Tony

Summary:

Prashant is a second-year undergraduate who is interested in learning more about bioinformatics.

Bharath has been busy with other work related to RNA structure prediction and hasn’t done much on the bioinformatics side this week.

Stanley has been working with David to learn more about protein sequence transformers. Stanley has also been studying BigBird and how different features are learned by different attention mechanisms. He also had a mellow sprint with Mahdi focused on unit tests.

David has primarily been busy with other work this past week. He has ideas for adding protein sequence datasets.

Paulina has been working to learn more about DeepChem’s core structure and has been reading about BigBird and similar models. Paulina is interested in working to make the tutorials citable.

Tony has been working to build secondary structure prediction for proteins into DeepChem (check out his latest PR for more info).

Mahdi has been working on hardware upgrades for his system and hopes to work with Stanley on protein transformers.

Arun has been working to set up AWS Batch infrastructure for Deep Forest Sciences, which can be used to train protein transformers. Arun also opened https://github.com/deepchem/deepbio/issues/2 to start gathering useful datasets.

Bharath mentioned https://docs.scvi-tools.org/en/stable/tutorials/index.html (scvi-tools) as a good resource we should learn more about. Paulina mentioned she had been looking into it and might be able to write a tutorial.

Currently:

  • Gathering more resources and issues, example code, and potential datasets.

Down the road:

  • Which architectures work well? Differentiable sequence alignments.
  • Possibly a wrapper for scvi-tools.

Joining the DeepChem Developer Calls

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.