Google Summer of Code 2022: D-MPNN Model Implementation for DeepChem

Hey everyone!

I am Aryan Amit Barsainyan. I will be working with DeepChem this summer as a GSoC contributor.

About me

I am a second-year undergraduate student at the National Institute of Technology Karnataka, India, pursuing Mechanical Engineering (Major) and Information Technology (Minor). I am a deep learning enthusiast and enjoy working on technology that uses ML and deep learning to solve complex scientific problems.

I started my journey as part of the DeepChem community in January 2022. I am happy to have found an organization where I can learn a lot about how ML models, especially graph-based neural networks, work in the real world. I plan to keep contributing to DeepChem after GSoC ends by improving the project implementation, fixing issues, and helping the community wherever and whenever possible!

Contact Details

About the Project

This project seeks to bring a new message passing tool to the DeepChem suite, based on recent advances in graph convolutional network (GCN) research. The aim is to implement a Directed Message Passing Neural Network (D-MPNN) model, a GCN that builds on the existing Message Passing Neural Network (MPNN) model, using the implementation in Chemprop as its base.
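The key difference from a plain MPNN, as described in the D-MPNN paper behind Chemprop, is that hidden states live on directed bonds rather than on atoms: the message for bond v->w sums the states of bonds coming into v but skips the reverse bond w->v, so information does not immediately bounce back along the edge it came from. Below is a minimal, dependency-free sketch of just that aggregation rule. It is not Chemprop's or DeepChem's code: the learned weight matrices are replaced by the identity, and the initial state h0 (which in the real model comes from a learned projection of concatenated atom and bond features) is simply set to 1.0 on every edge.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # directed bond: (source atom, target atom)
Hidden = Dict[Edge, List[float]]


def directed_edges(bonds: List[Tuple[int, int]]) -> List[Edge]:
    """Split every undirected bond (a, b) into the directed pair a->b and b->a."""
    edges: List[Edge] = []
    for a, b in bonds:
        edges.extend([(a, b), (b, a)])
    return edges


def relu(vec: List[float]) -> List[float]:
    return [max(0.0, x) for x in vec]


def add(u: List[float], v: List[float]) -> List[float]:
    return [x + y for x, y in zip(u, v)]


def dmpnn_step(h: Hidden, h0: Hidden, edges: List[Edge]) -> Hidden:
    """One directed message passing step.

    For edge v->w, the message is the sum of h[k->v] over neighbours k of v,
    excluding k == w (the reverse edge). The learned weight matrix is
    replaced by the identity to keep the sketch dependency-free.
    """
    new_h: Hidden = {}
    for v, w in edges:
        msg = [0.0] * len(h0[(v, w)])
        for k, target in edges:
            if target == v and k != w:  # incoming to v, but not coming back from w
                msg = add(msg, h[(k, target)])
        new_h[(v, w)] = relu(add(h0[(v, w)], msg))
    return new_h


def atom_messages(h: Hidden, edges: List[Edge], n_atoms: int) -> List[List[float]]:
    """Sum the final hidden states of all edges pointing into each atom."""
    dim = len(next(iter(h.values())))
    out = [[0.0] * dim for _ in range(n_atoms)]
    for k, v in edges:
        out[v] = add(out[v], h[(k, v)])
    return out


# Toy three-atom chain 0-1-2 with a 1-d initial state of 1.0 on every directed edge.
edges = directed_edges([(0, 1), (1, 2)])
h0 = {e: [1.0] for e in edges}
h = h0
for _ in range(2):  # two message passing steps
    h = dmpnn_step(h, h0, edges)
print(h)  # {(0, 1): [1.0], (1, 0): [2.0], (1, 2): [2.0], (2, 1): [1.0]}
print(atom_messages(h, edges, n_atoms=3))  # [[2.0], [2.0], [2.0]]
```

The edges that start at the terminal atoms (0->1 and 2->1) stay at their initial value, because the only edge entering atom 0 or atom 2 is their own reverse edge, which the directed rule excludes.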

I shall be updating my progress here over the summer on a weekly basis. Stay tuned!

Project description is available here.


Extended Week 1 progress (1 June 2022 - 10 June 2022)

PR 1: https://github.com/deepchem/deepchem/pull/2929

  • Files modified:
    • deepchem/feat/molecule_featurizers/__init__.py
      
    • deepchem/feat/molecule_featurizers/dmpnn_featurizer.py
      
    • deepchem/feat/tests/test_atom_feature_generator_dmpnn.py
      
  • Created:
    • atom_features()
      
    • get_atomic_num_one_hot()
      
    • get_atom_chiral_tag_one_hot()
      
    • get_atom_mass()
      
    • Suitable unit tests for all the added functions
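
The atom-level helpers in this PR follow the Chemprop idea of one-hot encoding each property over an allowable set with one extra "unknown" slot, plus a scaled atomic mass (Chemprop multiplies the mass by 0.01, as far as I know). The sketch below is only an illustration of that idea: the function names, the allowable sets and the plain-number inputs (instead of RDKit atoms) are assumptions, not the signatures merged in PR 1.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration; they take plain numbers instead of RDKit atoms.
ATOMIC_NUMS = list(range(1, 101))  # assumed allowable atomic numbers 1..100
CHIRAL_TAGS = [0, 1, 2, 3]  # first four RDKit ChiralType values (unspecified, CW, CCW, other)


def one_hot_with_unknown(value: int, allowable: Sequence[int]) -> List[float]:
    """One-hot encode value over allowable, with one extra trailing 'unknown' slot."""
    encoding = [0.0] * (len(allowable) + 1)
    index = list(allowable).index(value) if value in allowable else len(allowable)
    encoding[index] = 1.0
    return encoding


def atomic_num_one_hot(atomic_num: int) -> List[float]:
    return one_hot_with_unknown(atomic_num, ATOMIC_NUMS)


def chiral_tag_one_hot(chiral_tag: int) -> List[float]:
    return one_hot_with_unknown(chiral_tag, CHIRAL_TAGS)


def scaled_atom_mass(mass: float) -> List[float]:
    """Scale the mass by 0.01 so it sits on a similar order of magnitude to the one-hot bits."""
    return [mass * 0.01]


def toy_atom_features(atomic_num: int, chiral_tag: int, mass: float) -> List[float]:
    """Concatenate the pieces into a single atom feature vector."""
    return atomic_num_one_hot(atomic_num) + chiral_tag_one_hot(chiral_tag) + scaled_atom_mass(mass)


carbon = toy_atom_features(atomic_num=6, chiral_tag=0, mass=12.011)
print(len(carbon))  # 101 + 5 + 1 = 107
```

The unknown slot means an unexpected value (a new chiral tag, an exotic element) still yields a valid, fixed-length vector instead of raising an error.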

PR 2 (Draft): https://github.com/deepchem/deepchem/pull/2939

  • Files modified:
    • deepchem/feat/graph_features.py
      
    • deepchem/feat/molecule_featurizers/dmpnn_featurizer.py
      
    • deepchem/feat/tests/test_atom_feature_generator_dmpnn.py
      
    • deepchem/feat/tests/test_features_generator_dmpnn.py
      
  • Modified:
    • bond_features() in graph_features.py
      
    • atom_features()
      
  • Created:
    • bond_features() in dmpnn_featurizer.py
      
    • map_reac_to_prod()
      
    • Suitable unit tests for all the added/modified functions
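
For reference, Chemprop's bond features use a 14-dimensional layout (a "no bond" flag, four bond types, conjugation, ring membership and a stereo one-hot with an unknown slot), and its map_reac_to_prod pairs reactant and product atoms through shared atom-map numbers. The sketch below reproduces those two ideas with plain Python inputs. toy_bond_features, toy_map_reac_to_prod, the string bond types and the stereo range are illustrative assumptions; the functions in PR 2 operate on RDKit objects, and the real map_reac_to_prod may return more than this simplified mapping.

```python
from typing import Dict, List, Optional, Sequence

BOND_TYPES = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"]
STEREO_CHOICES = list(range(6))  # assumed integer stereo codes 0..5


def one_hot_with_unknown(value: int, allowable: Sequence[int]) -> List[float]:
    encoding = [0.0] * (len(allowable) + 1)
    index = list(allowable).index(value) if value in allowable else len(allowable)
    encoding[index] = 1.0
    return encoding


def toy_bond_features(bond_type: Optional[str], conjugated: bool = False,
                      in_ring: bool = False, stereo: int = 0) -> List[float]:
    """14-dim bond vector: [is_none] + 4 bond types + conjugated + in_ring + 7 stereo slots."""
    if bond_type is None:
        return [1.0] + [0.0] * 13  # placeholder vector when there is no bond
    features = [0.0] + [float(bond_type == t) for t in BOND_TYPES]
    features += [float(conjugated), float(in_ring)]
    features += one_hot_with_unknown(stereo, STEREO_CHOICES)
    return features


def toy_map_reac_to_prod(reac_map_nums: List[int], prod_map_nums: List[int]) -> Dict[int, int]:
    """Map reactant atom indices to product atom indices via shared atom-map numbers.

    Each list holds the atom-map number of the atom at that index; 0 means unmapped.
    """
    prod_index = {num: idx for idx, num in enumerate(prod_map_nums) if num > 0}
    return {idx: prod_index[num] for idx, num in enumerate(reac_map_nums)
            if num > 0 and num in prod_index}


print(toy_bond_features("SINGLE"))
print(toy_map_reac_to_prod([1, 2, 0], [2, 1]))  # {0: 1, 1: 0}
```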

Issue created: https://github.com/deepchem/deepchem/issues/2936

  • To address a doctest warning for ndarray()

PR created to rectify the problem: https://github.com/deepchem/deepchem/pull/2937

  • Files modified:
    • deepchem/feat/graph_features.py
      

Content of PR 3 in progress

  • Goal: to implement the DMPNNFeaturizer class
  • Open topics to discuss:
    • Normalization of features
    • Phase features generation
    • Usage of BatchGraphData class
  • Files modified so far:
    • deepchem/feat/molecule_featurizers/dmpnn_featurizer.py
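
One open question for the featurizer is what it needs to store besides node and edge features. Chemprop's MolGraph keeps index arrays over directed bonds (named a2b, b2a and b2revb there, if I remember correctly), which let the model compute each D-MPNN message as "everything coming into the source atom, minus the reverse bond". The sketch below is a plain-Python illustration of that bookkeeping under my own assumptions; the function names are hypothetical, and the DMPNNFeaturizer in PR 3 may store this information differently (for example inside a GraphData object).

```python
from typing import List, Tuple


def build_directed_graph(n_atoms: int, bonds: List[Tuple[int, int]]
                         ) -> Tuple[List[List[int]], List[int], List[int]]:
    """Index arrays for directed bonds.

    a2b[atom] -> indices of directed bonds pointing INTO that atom
    b2a[bond] -> source atom of each directed bond
    b2revb[b] -> index of the reverse directed bond
    """
    a2b: List[List[int]] = [[] for _ in range(n_atoms)]
    b2a: List[int] = []
    b2revb: List[int] = []
    for a1, a2 in bonds:
        b1, b2 = len(b2a), len(b2a) + 1  # b1: a1 -> a2, b2: a2 -> a1
        a2b[a2].append(b1)
        a2b[a1].append(b2)
        b2a.extend([a1, a2])
        b2revb.extend([b2, b1])
    return a2b, b2a, b2revb


def bond_messages(h: List[float], a2b: List[List[int]], b2a: List[int],
                  b2revb: List[int]) -> List[float]:
    """Message for bond u->v: sum over bonds into u, minus the reverse bond v->u."""
    return [sum(h[b_in] for b_in in a2b[b2a[b]]) - h[b2revb[b]] for b in range(len(b2a))]


a2b, b2a, b2revb = build_directed_graph(3, [(0, 1), (1, 2)])
print(a2b, b2a, b2revb)  # [[1], [0, 3], [2]] [0, 1, 1, 2] [1, 0, 3, 2]
print(bond_messages([1.0, 1.0, 1.0, 1.0], a2b, b2a, b2revb))  # [0.0, 1.0, 1.0, 0.0]
```

Subtracting the reverse bond from a per-atom sum gives the same result as excluding it inside the sum, but it only needs one gather and one subtraction per bond, which is why this layout vectorizes well.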
      

Highlights of the office-hour calls:

  • Discussion on creating PRs through a feature branch in the remote repo.
  • Discussion on using GraphData class instead of implementing new DmpnnMol and DMPNNEncoding classes.
  • Volunteering to review PRs from fellow contributors.
  • Addressed errors in CI.
  • Why switch from Keras to PyTorch?
  • Discussion on initially restricting the D-MPNN model to non-reaction datapoints.
  • Future scope: a new base class for reaction featurizers.
  • Discussion on atomic mass normalization and on DeepChem's lack of support for inequality targets.

New learnings this week:

  • Handling multiple branches in a repository.
  • The need for rebasing, and how to do it.
  • Procedure for error-free type annotation (invariance and covariance).
  • Improved the knowledge of writing proper unit tests.
  • Better practice is to lint the code before formatting.
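
A short illustration of the invariance/covariance point, using two hypothetical classes: a function that only reads its argument can accept Sequence[Base], which is covariant, while a function that mutates the list has to take List[Base], which mypy treats as invariant.

```python
from typing import List, Sequence


class Datapoint:  # hypothetical base class
    pass


class Molecule(Datapoint):  # hypothetical subclass
    pass


def describe(items: Sequence[Datapoint]) -> List[str]:
    """Read-only access, so Sequence (covariant) is the right annotation."""
    return [type(item).__name__ for item in items]


def append_placeholder(items: List[Datapoint]) -> None:
    """Mutates the list, so it must take List (invariant)."""
    items.append(Datapoint())


molecules: List[Molecule] = [Molecule(), Molecule()]
print(describe(molecules))  # ['Molecule', 'Molecule']; accepted by mypy
# append_placeholder(molecules)  # rejected by mypy: List[Molecule] is not a List[Datapoint],
#                                # otherwise a plain Datapoint could end up in a list of Molecules
```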

Week 2 Progress Report

11 June 2022 - 17 June 2022

  • The bond features PR got merged.
  • Re-read the D-MPNN paper to better understand the algorithm.
  • Created the DMPNN Featurizer class and the _featurize function.
  • Wrote unit tests for the featurizer class.
  • Cleared up a misunderstanding about the CanonicalRankAtoms() function.
  • Modified the base class MolecularFeaturizer.
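
The split between the featurizer class and its _featurize function follows the pattern DeepChem featurizers use, as I understand it: the base class's featurize() loops over the inputs and handles failures, and each subclass implements _featurize() for a single datapoint. Below is a stripped-down sketch of that pattern. The class names, the dictionary input format and the ToyGraph container are hypothetical; the real featurizer works on RDKit molecules, builds on MolecularFeaturizer, and (I believe) appends an empty array rather than None on failure.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Tuple

logger = logging.getLogger(__name__)


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a graph container with node and edge data."""
    node_features: List[List[float]]
    edge_index: Tuple[List[int], List[int]]  # (sources, targets) of directed edges


class ToyFeaturizer:
    """featurize() handles the loop and failures; subclasses only implement _featurize()."""

    def featurize(self, datapoints: Iterable[Any]) -> List[Optional[Any]]:
        results: List[Optional[Any]] = []
        for i, datapoint in enumerate(datapoints):
            try:
                results.append(self._featurize(datapoint))
            except Exception as err:  # record the failure and move on
                logger.warning("Failed to featurize datapoint %d: %s", i, err)
                results.append(None)
        return results

    def _featurize(self, datapoint: Any) -> Any:
        raise NotImplementedError


class ToyDMPNNFeaturizer(ToyFeaturizer):
    """Expects {'atomic_nums': [...], 'bonds': [(a, b), ...]} instead of an RDKit Mol."""

    def _featurize(self, datapoint: Dict[str, Any]) -> ToyGraph:
        atomic_nums = datapoint["atomic_nums"]
        sources: List[int] = []
        targets: List[int] = []
        for a, b in datapoint["bonds"]:
            if not (0 <= a < len(atomic_nums) and 0 <= b < len(atomic_nums)):
                raise ValueError(f"bond ({a}, {b}) refers to a missing atom")
            sources += [a, b]  # every bond becomes two directed edges
            targets += [b, a]
        return ToyGraph([[float(z)] for z in atomic_nums], (sources, targets))


graphs = ToyDMPNNFeaturizer().featurize([
    {"atomic_nums": [6, 6, 8], "bonds": [(0, 1), (1, 2)]},  # heavy atoms of ethanol
    {"atomic_nums": [6], "bonds": [(0, 3)]},  # invalid bond, so this entry becomes None
])
```

Keeping the loop and error handling in the base class means a single bad molecule in a large dataset is logged and skipped instead of aborting the whole run.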

Upcoming tasks for the weekend:

  • Make the changes to the DeepChem docs and submit the PR for the DMPNN featurizer.
  • Create a PR to add global features to the featurizer.

Week 3 Progress Report

20 June 2022 - 24 June 2022

  • Earlier this week, submitted a PR (6 files changed) to add the DMPNN featurizer class to DeepChem, with changes to the base class MolecularFeaturizer.
  • Created a function to generate global features.
  • Working on adding a global featurizer from the 'descriptastorus' library to DeepChem.
  • Working on splitting the DMPNN featurizer PR and making the changes suggested in the office-hour meeting.
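
In the Chemprop version of D-MPNN, global features are molecule-level descriptors (for example the RDKit 2D descriptors that descriptastorus can compute) concatenated with the pooled graph embedding before the feed-forward layers. A minimal sketch of that combination step follows, assuming mean pooling and replacing NaN descriptor values with zero (which, as far as I know, is what Chemprop does). The function names are hypothetical and the numbers are made up for the example.

```python
import math
from typing import List, Sequence


def readout(atom_hiddens: Sequence[Sequence[float]], aggregation: str = "mean") -> List[float]:
    """Pool per-atom hidden vectors into one molecule vector."""
    dim = len(atom_hiddens[0])
    totals = [sum(h[i] for h in atom_hiddens) for i in range(dim)]
    if aggregation == "sum":
        return totals
    if aggregation == "mean":
        return [t / len(atom_hiddens) for t in totals]
    raise ValueError(f"unknown aggregation: {aggregation}")


def clean_global_features(features: Sequence[float]) -> List[float]:
    """Replace NaN descriptor values with 0.0 (descriptor calculation can fail for some molecules)."""
    return [0.0 if math.isnan(x) else float(x) for x in features]


def molecule_vector(atom_hiddens: Sequence[Sequence[float]], global_features: Sequence[float],
                    aggregation: str = "mean") -> List[float]:
    """Concatenate the pooled graph embedding with the molecule-level (global) features."""
    return readout(atom_hiddens, aggregation) + clean_global_features(global_features)


vec = molecule_vector([[1.0, 2.0], [3.0, 4.0]], [0.5, float("nan")])
print(vec)  # [2.0, 3.0, 0.5, 0.0]
```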

Upcoming tasks for the weekend:

  • Push a PR containing only the base class modification and the required changes.
  • Run the DMPNN featurizer on existing datasets and create a helper function for the featurizer class.