GSoC ‘24 | Porting Smiles2Vec Model from Tensorflow to PyTorch | Final Report

Hello everyone!

I am Harishwar Adithya, and I spent this summer as a contributor to DeepChem in GSoC 2024.

Recently, DeepChem has been shifting its backend from TensorFlow to PyTorch. This transition requires the porting of models that were previously built in TensorFlow to PyTorch.

My work involved porting the Smiles2Vec model from TensorFlow to PyTorch. I used the TorchModel abstraction in DeepChem and wrote tests to ensure that the qualitative and quantitative behavior of the new model matches the base TensorFlow implementation.

In this forum post, I have provided comprehensive documentation and a detailed description of all the tasks and projects I have undertaken throughout the summer as weekly updates.
link: GSOC 2024 Project: Porting Smiles2Vec from Tensorflow to Pytorch

Pull requests:
PR1 (Writing the forward method of the model, inheriting from nn.Module): https://github.com/deepchem/deepchem/pull/4039
PR2 (Integrating the model with the TorchModel abstraction and writing appropriate tests): https://github.com/deepchem/deepchem/pull/4045

Progress

Step 1: Wrote the model class ‘Smiles2Vec’ by subclassing torch.nn.Module and implementing the forward method, which manually defines how data flows through the network.
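To illustrate the pattern, here is a minimal sketch of a Smiles2Vec-style network written this way: an embedding layer feeding a GRU, followed by a dense head. The class name, layer sizes, and defaults are illustrative, not the actual DeepChem implementation.

```python
import torch
import torch.nn as nn

class Smiles2VecSketch(nn.Module):
    """Simplified Smiles2Vec-style network (hypothetical sketch):
    token embedding -> GRU -> dense head."""

    def __init__(self, vocab_size=35, embedding_dim=50,
                 hidden_dim=64, n_tasks=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_tasks)

    def forward(self, x):
        # x: (batch, seq_len) of integer SMILES token indices
        emb = self.embedding(x)           # (batch, seq_len, embedding_dim)
        _, h_n = self.gru(emb)            # final hidden state: (1, batch, hidden_dim)
        return self.head(h_n.squeeze(0))  # (batch, n_tasks)
```

The forward method is where the data flow is spelled out explicitly, in contrast to Keras, where stacking layers in a Sequential model defines the flow implicitly.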

Key learnings:

  • Keras is higher-level, designed for rapid prototyping and ease of use: it abstracts away many details, making it easy to build models without delving into the underlying mechanics. PyTorch, by contrast, is lower-level and more flexible, giving users greater control over the model’s behavior and letting them express models more explicitly.

Step 2: Used DeepChem’s TorchModel abstraction, a high-level wrapper that integrates PyTorch models into the DeepChem ecosystem.

Key learnings:

  • TorchModel abstracts away some of the boilerplate code required when working with PyTorch directly, such as defining training loops, handling batch sizes, and managing device placement (CPU/GPU).
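For a sense of what that boilerplate looks like, here is the kind of code TorchModel takes off your hands, written out by hand in plain PyTorch for a toy regression model (all names and hyperparameters here are illustrative, not DeepChem code):

```python
import torch
import torch.nn as nn

# Manual version of what a wrapper like TorchModel automates:
# device placement, mini-batching, and the optimization step.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 1).to(device)               # device placement
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# Toy data: target is the sum of the features.
X = torch.randn(64, 10)
y = X.sum(dim=1, keepdim=True)

for epoch in range(100):
    for start in range(0, len(X), 16):            # manual batching
        xb = X[start:start + 16].to(device)       # manual per-batch transfer
        yb = y[start:start + 16].to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```

With TorchModel, this loop collapses to constructing the wrapper around the nn.Module and calling its fit method on a DeepChem dataset.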

Step 3: Wrote tests to check the implementation of the forward method, plus overfitting tests for both the regression and classification tasks of the model.

Key learnings:

  • Tests often serve as a form of documentation, illustrating how the code is supposed to work. They provide examples of how functions, classes, or models should be used, helping new users and contributors understand the project while also adding an extra layer of verification during code reviews.
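An overfitting test checks that a model with enough capacity can drive its training loss near zero on a tiny dataset; failure usually means a bug in the forward pass or the training plumbing. A sketch of the idea, using a hypothetical stand-in network rather than the ported Smiles2Vec:

```python
import torch
import torch.nn as nn

def test_overfit_regression():
    """Overfitting test sketch: memorize 8 random targets."""
    torch.manual_seed(0)
    x = torch.randint(0, 20, (8, 15))   # 8 toy token sequences
    y = torch.randn(8, 1)               # random regression targets
    model = nn.Sequential(
        nn.Embedding(20, 16), nn.Flatten(),
        nn.Linear(16 * 15, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(300):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    # An overparameterized model should memorize 8 points easily.
    assert loss.item() < 0.05
```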

Step 4: Wrote tests that copy weights from the base TensorFlow model to the PyTorch model and compare the outputs of both.

Key Learnings:

  • The order of the three gates differs between Keras and PyTorch. So, while copying weights from the Keras GRU layer to the PyTorch GRU layer, the gate parameters have to be reordered from the TensorFlow order (z, r, h) to the Torch order (r, z, h).
  • CPU and GPU computations in RNN/GRU layers may produce slightly different outputs, because small precision differences between the two devices can accumulate over a deep network, i.e. an RNN/GRU with many hidden layers.
  • This test makes the port robust, since it shows that the model’s output is the same regardless of the framework.
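The gate reordering from the first point can be sketched in NumPy. Keras stores the GRU input kernel as (input_dim, 3 * units) with gates concatenated in (z, r, h) order, while PyTorch stores weight_ih as (3 * units, input_dim) in (r, z, n) order, so the gate blocks must be swapped and the matrix transposed. The function name is my own; the recurrent kernel and biases need the same swap.

```python
import numpy as np

def keras_gru_kernel_to_torch(kernel):
    """Convert a Keras GRU input kernel, shaped (input_dim, 3*units)
    with gates in (z, r, h) order, to PyTorch's weight_ih layout,
    shaped (3*units, input_dim) with gates in (r, z, n) order."""
    z, r, h = np.split(kernel, 3, axis=1)       # Keras order: update, reset, new
    return np.concatenate([r, z, h], axis=1).T  # torch order: reset, update, new
```

Copying the blocks without this swap still produces a model that runs, which is exactly why an output-comparison test is needed to catch it.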