Date: 4/10/2020
Attendees: Bharath, Peter, Vignesh, Seyone, Akshay
Summary: Bharath fixed the major issues remaining in the TF2.X PR and got it merged in. The graph convolution and weave models now appear to be working fine. The DAG models appear to converge slowly, but the original paper notes slow convergence, so this may be fine. He's now working through cleaning up the examples. There are a lot of examples, so this will take a while.
Peter has finished the book code fixes now that the TF2.X fix is in (PR). With this fix in, all of the book code now runs on the latest DeepChem.
Vignesh is finishing his thesis next week and should have time from then on to hack on DeepChem. After finishing his thesis work, he's planning to clean up the ChemCeption scripts and the transfer learning framework.
Akshay’s a new attendee on the calls. He’s an undergrad researcher at IIT Roorkee who’s interested in applying deep learning methods to molecule and materials design. He just submitted an implementation of MolDQN in PyTorch to the TorchChem repo. Bharath mentioned that once he’s got the examples cleaned up in the main deepchem repo, he’s planning to do a pass back on torchchem and get it into a stable state. It’s currently a raw mix of code.
Seyone’s got a first crack at a tutorial for applying BERT-style methods to molecular prediction (ChemBERTa). ChemBERTa weights are now uploaded onto HuggingFace and can work for the task of filling in masked atoms on molecules. He’s planning to try them on downstream challenges like Tox21 property prediction next.
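As a rough illustration of the masked-atom capability described above, weights hosted on HuggingFace can be queried through the `transformers` fill-mask pipeline. This is a minimal sketch, not code from the call: the checkpoint name `seyonec/ChemBERTa-zinc-base-v1` and the specific SMILES string are assumptions for illustration.

```python
# Sketch: filling in a masked atom on a molecule with a ChemBERTa-style
# checkpoint from HuggingFace. The checkpoint name is an assumption; any
# RoBERTa-style model trained on SMILES would be queried the same way.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="seyonec/ChemBERTa-zinc-base-v1")

# Aspirin SMILES with one atom masked; the model ranks candidate tokens.
results = fill_mask("CC(=O)Oc1ccccc1C(=O)<mask>")
for r in results:
    print(r["token_str"], round(r["score"], 3))
```

The same masked-language-model head is what gets discarded when fine-tuning on downstream tasks such as Tox21 property prediction.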
Bharath mentioned that there’s an intriguing connection between ChemCeption and ChemBERTa as two methods for using transfer learning techniques from vision and NLP respectively. He asked whether it might be possible to use contrastive learning techniques on ChemCeption. Vignesh said that it might be possible to adapt SimCLR methods there. Bharath commented that in general, it might be useful to expand DeepChem’s support for transfer learning to make it more useful for practitioners with small datasets.