DeepChem Minutes 8/5/2021

India/Asia/Pacific Call

Date: August 5th, 2021

Attendees: Bharath, Ashwin, Arun

Summary:

Bharath has mostly been working with GSoC student code this summer. He’s also opened a few issues suggesting DeepChem improvements (https://github.com/deepchem/deepchem/issues/2640).

Ashwin has been working on the reaction tokenizer. He put up a PR to fix the doctest for the tokenizer. He’s also put up a bugfix which enables splitting of datasets that don’t have labels. Splitting on the USPTO dataset looks good. Ashwin also worked on switching to using RobertaFeaturizer with USPTO. He’s managed to resolve the tokenization detection by using the ChemBERTa vocabulary. The next step will be to attempt to tokenize the full dataset.

Arun has been working on reordering the tutorials. Bharath said the PR looked good. Arun said he plans to add a model contribution guide this week and possibly make some YouTube videos. Arun also pointed out that new users would not know which models are available on which backends. Bharath suggested adding a backends label to the model table.

Americas/Europe/Africa/Middle East Call

Date: August 6th, 2021

Attendees: Bharath, Peter, Atreya, Vignesh, Seyone

Summary:

Bharath gave the same update as he did the previous day.

Peter has gone through the remaining failing test cases and is trying to address them. He has a PR up to fix a flaky test case. Another is an argument error caused by a change to atomic conv. The last is an issue where a molnet test isn’t achieving decent performance.

Atreya has put up his PRs for his layers and was working on getting them merged in. He’s currently working on a PR to add in the MAT model and believes it should be done soon.

Vignesh has been working on the new PINNs model and has gotten it to work on Burgers’ equation and the Schrödinger equation. He’s now working on a PINNs superclass to add in. Bharath mentioned it might be useful to look at NVIDIA SimNet (https://developer.nvidia.com/simnet).
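
For readers unfamiliar with PINNs, the sketch below illustrates the kind of PDE residual such a model is trained to drive toward zero, using the viscous Burgers’ equation u_t + u*u_x = nu*u_xx. It is not Vignesh’s implementation and does not use DeepChem: the function names, the parameter values, and the use of finite differences in place of automatic differentiation are all assumptions made for illustration. A known travelling-wave solution stands in for the network output.

```python
import math

# Illustrative constants (assumed values, not taken from the call).
NU = 0.1   # viscosity
A = 1.0    # wave amplitude
C = 0.5    # wave speed


def travelling_wave(x, t):
    """Exact travelling-wave solution of the viscous Burgers' equation."""
    return C - A * math.tanh(A * (x - C * t) / (2.0 * NU))


def burgers_residual(f, x, t, nu=NU, h=1e-4):
    """Residual f_t + f*f_x - nu*f_xx, estimated with central differences.

    A PINN would compute these derivatives with automatic differentiation
    and penalize the squared residual at sampled collocation points.
    """
    f_t = (f(x, t + h) - f(x, t - h)) / (2 * h)
    f_x = (f(x + h, t) - f(x - h, t)) / (2 * h)
    f_xx = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / (h * h)
    return f_t + f(x, t) * f_x - nu * f_xx


def not_a_solution(x, t):
    """An arbitrary smooth function that does not satisfy the equation."""
    return math.sin(x) * math.cos(t)


# Hypothetical collocation points (x, t).
points = [(-0.5, 0.0), (0.1, 0.3), (0.4, 1.0)]

max_residual = max(abs(burgers_residual(travelling_wave, x, t)) for x, t in points)
bad_residual = abs(burgers_residual(not_a_solution, 0.1, 0.3))

print("max residual for exact solution:", max_residual)
print("residual for non-solution:", bad_residual)
```

The residual is close to zero for the exact solution and clearly nonzero for the arbitrary function, which is the signal a physics-informed loss relies on.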

Seyone merged in the RobertaFeaturizer PR this week and is now working on the older ChemBERTa model PR. He’s also working with Alana to get RobertaFeaturizer working with CSVLoader. Seyone also asked if there was a convenient way to convert a DeepChem dataset to a PyTorch dataset. Peter said yes and pointed to https://deepchem.readthedocs.io/en/latest/api_reference/data.html#deepchem.data.NumpyDataset.make_pytorch_dataset.

David has been working on the PR adding the SwissPROT dataset. The PR looks to be working, but there’s an issue with the padding in CSVLoader.

Joining the DeepChem Developer Calls

As a quick reminder to anyone reading along, the DeepChem developer calls are open to the public! If you’re interested in attending either or both of the calls, please send an email to X.Y@gmail.com, where X=bharath, Y=ramsundar.