GSoC ‘25 | Porting Chemception Model to PyTorch | Final Report

Hello! This summer, I had the privilege of contributing to DeepChem through the Google Summer of Code program.

Links:

Link to Proposal: Porting Chemception Model to PyTorch
Link to weekly updates thread: GSOC’25 Project: Porting Chemception Model to PyTorch
My Github: https://github.com/Dragonwagon18
My LinkedIn: https://www.linkedin.com/in/bibhusundar-mohapatra-54542526b/

Project Overview:

This project focuses on porting Chemception, a convolutional neural network for chemical property prediction, from TensorFlow to PyTorch as part of DeepChem’s transition to a PyTorch-first framework. Chemception learns directly from 2D molecular images, bypassing handcrafted descriptors, and has shown strong performance in tasks such as toxicity, bioactivity, and solvation energy prediction.

The work involves re-implementing Chemception’s core layers (Stem, Inception-ResNet A/B/C, Reduction A/B) in PyTorch, assembling the full model as an nn.Module, and integrating it into DeepChem. A comprehensive test suite validates structural integrity, PyTorch–TensorFlow equivalence, save/reload functionality, and model stability.
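As a concrete illustration, a structural check of the kind this test suite performs might look like the sketch below. The helper name and shapes are illustrative, not DeepChem's actual test code; Chemception's original inputs are 80×80 molecular images.

```python
import torch

def check_output_shape(model, batch_size=4, channels=1, size=80):
    """Hypothetical structural check: the ported model should map a batch
    of 80x80 single-channel molecular images to one output per molecule.
    Function name and defaults are illustrative, not DeepChem's API."""
    x = torch.randn(batch_size, channels, size, size)
    with torch.no_grad():
        y = model(x)
    assert y.shape[0] == batch_size, "batch dimension must be preserved"
    return y.shape
```

A check like this catches broken strides or mismatched channel counts early, before any attempt at numerical TensorFlow–PyTorch equivalence testing.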

Pull Requests:

  • #4286 – Stem layer + tests [Merged]
  • #4453 – InceptionA and InceptionB layers + tests [Merged]
  • #4471 – InceptionC, ReductionA and ReductionB layers + tests [Merged]
  • #4499 – Chemception class extending Torch’s nn.Module class + test [Merged]
  • #4516 – ChemceptionModel: ModularTorchModel wrapper around Chemception nn.Module class [Not merged]

Chemception

Chemception is a deep convolutional neural network (CNN) designed for predicting chemical properties using only 2D molecular images. Inspired by Google’s Inception architecture, Chemception processes molecular structures without explicit chemical features like molecular descriptors or fingerprints. The model encodes input molecular images into a hierarchical representation, extracts spatial features, and predicts chemical properties such as toxicity, activity, and solvation energy. During training, it minimizes a classification loss (e.g., cross-entropy) or a regression loss (e.g., RMSE) using a two-stage optimization process. Chemception matches or exceeds conventional QSAR/QSPR models, including MLPs trained on expert-designed fingerprints, particularly in HIV activity and solvation energy prediction, making it a powerful tool for cheminformatics.

Model Architecture

  • Stem Layer: Begins with a 3×3 convolution (stride 2) for feature extraction, followed by 1×1 → 3×3 convolutions for refinement, and a 3×3 max pooling (stride 2) for dimensionality reduction. Captures low-level patterns efficiently.
  • InceptionResNetA: Three parallel paths: a 1×1 conv (identity), a 1×1 → 3×3 conv (local features), and a 1×1 → 3×3 → 3×3 conv (deeper features). Outputs are concatenated, scaled with a 1×1 conv, and activated with ReLU. Refines multi-scale features with residual stability.
  • InceptionResNetB: Paths include a 1×1 conv (identity) and a 1×1 → 1×7 → 7×1 conv (elongated receptive fields). Outputs are concatenated, scaled with a 1×1 conv, and passed through ReLU. Enhances long-range feature learning efficiently.
  • InceptionResNetC: Paths include a 1×1 conv (identity) and a 1×1 → 1×3 → 3×1 conv (asymmetric refinement). Outputs are concatenated, scaled with a 1×1 conv, and activated with ReLU. Captures fine-grained features with stability.
  • ReductionA: Combines 3×3 max pooling (stride 2), a 3×3 conv (stride 2), and a 1×1 → 3×3 conv (stride 2). Outputs are concatenated and passed through ReLU. Efficiently downsamples while preserving detail.
  • ReductionB: Uses 3×3 max pooling, 1×1 → 3×3 convs (stride 2), and asymmetric branches (1×1 → 3×1, 1×1 → 1×3). Outputs are concatenated and passed through ReLU. Ensures robust downsampling with diverse feature extraction.
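To give a feel for how these blocks translate to PyTorch, here is a rough sketch of the Stem block described above. Channel counts and padding choices are illustrative; the merged implementation in PR #4286 is the authoritative version.

```python
import torch
import torch.nn as nn

class Stem(nn.Module):
    """Sketch of the Chemception stem: a 3x3 stride-2 conv, 1x1 -> 3x3
    refinement convs, then 3x3 stride-2 max pooling. Channel counts here
    are illustrative, not the exact values used in DeepChem."""

    def __init__(self, in_channels: int = 1, base_filters: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, base_filters, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(base_filters, base_filters, 1)
        self.conv3 = nn.Conv2d(base_filters, base_filters * 2, 3, padding=1)
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.conv1(x))  # low-level features, halves H and W
        x = self.relu(self.conv2(x))  # 1x1 channel mixing
        x = self.relu(self.conv3(x))  # 3x3 refinement, doubles channels
        return self.pool(x)           # halves H and W again
```

With an 80×80 single-channel input, this sketch produces a (batch, 32, 20, 20) feature map, ready to be fed into the Inception-ResNet blocks.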

Challenges and Learnings

Challenges:

  • Adapting Chemception from TensorFlow to PyTorch while ensuring architectural fidelity.
  • Maintaining compatibility with DeepChem’s model API.
  • Designing reusable Inception-style blocks (Stem, InceptionA/B/C, ReductionA/B).
  • Handling different image conventions: TensorFlow/Keras uses channels-last (HWC) while PyTorch uses channels-first (CHW), requiring careful tensor reshaping.
  • Technical challenge: implementing a ModularTorchModel wrapper, a subclass of TorchModel, to enable component-wise pretraining, modular construction (build_components, build_model), and custom loss functions leveraging intermediate network states.
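The channels-last vs. channels-first mismatch ultimately comes down to a dimension permute at the framework boundary. A minimal sketch (helper names are my own, not DeepChem's):

```python
import torch

# TensorFlow/Keras tensors are NHWC (batch, height, width, channels);
# PyTorch convolutions expect NCHW. A permute, plus .contiguous() for a
# clean memory layout, converts between the two conventions.
def nhwc_to_nchw(x: torch.Tensor) -> torch.Tensor:
    return x.permute(0, 3, 1, 2).contiguous()

def nchw_to_nhwc(x: torch.Tensor) -> torch.Tensor:
    return x.permute(0, 2, 3, 1).contiguous()
```

Getting this conversion right at the input boundary (and only there) avoids silently transposed feature maps, which would otherwise make the TensorFlow–PyTorch equivalence tests fail in confusing ways.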

Learnings:

  • Deepened understanding of ChemCeption’s architecture and image-based chemical representation.
  • Strengthened PyTorch skills in modular design and custom layer implementation.
  • Learned best practices in integrating new backends into DeepChem.
  • Gained experience in open-source contribution workflow: testing, refactoring, and documentation.

Future Work:

  • Getting the ChemceptionModel (the ModularTorchModel wrapper) merged into the codebase along with comprehensive tests.
  • Raising a PR to fix and improve documentation.
  • Writing a tutorial on the pretraining protocol using the Chemception model, explaining how pretraining works in DeepChem and demonstrating the use of ModularTorchModel.

Acknowledgement

I’m deeply grateful to my mentors, Shreyash and Bharath Ramsundar, for their invaluable guidance and feedback throughout this project. Contributing to DeepChem has been a transformative learning experience: it not only deepened my understanding of how open-source scientific software is built and maintained, but also highlighted the importance of writing clear, accessible code and tutorials that can be used by researchers and practitioners of all backgrounds.