GSoC'24 Final Report | Incorporating Polymer Representation in DeepChem for Drug Delivery and Material Science Support


Google Summer of Code, 2024 with DeepChem

Organization:

Project:

Mentor:

Participant:


Objectives

  1. Develop Polymer-Specific Representations:

    • Implement a graph-based polymer representation system that captures the recurrent nature, topology, isomerism, and varying monomer composition and stoichiometry of polymers.

    • Introduce a text-based polymer representation system using BigSMILES notation to better represent the stochastic nature of polymer molecules, allowing for easier storage and retrieval.

  2. Enhance Polymer Representation in Machine Learning:

    • Address the limitations of existing DeepChem representations (e.g., fingerprints, graph representations, and SMILES) that fail to capture the complex features of polymers.

    • Ensure the new representations account for crucial polymer properties like degree of polymerization, copolymer structure, and monomer arrangement.

  3. Evaluate Performance of New Representations:

    • Test the accuracy and applicability of the graph-based and BigSMILES-based polymer representations.

  4. Integrate and Validate Representations in DeepChem:

    • Implement the new polymer representations within DeepChem’s existing architecture, including models and layers.

  5. Create Documentation and Tutorials:

    • Develop comprehensive tutorials and documentation for the new polymer representation methods within DeepChem, focusing on real-world applications such as drug delivery and material science.
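To make the graph-based representation in objective 1 more concrete, here is a minimal sketch of a weighted directed graph for a two-monomer copolymer. It is an illustration under simplifying assumptions, not DeepChem's actual data class: the name `WeightedDigraphSketch`, its fields, and the helper `two_monomer_copolymer` are hypothetical, and each monomer is collapsed to a single node instead of being expanded into atoms and bonds. Node weights stand for stoichiometric fractions, and each edge weight stands for the probability that one repeat unit is bonded to another.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WeightedDigraphSketch:
    """Hypothetical container for illustration; not DeepChem's data class."""

    node_labels: List[str]      # one entry per monomer (coarse-grained)
    node_weights: List[float]   # stoichiometric fraction of each monomer
    edge_weights: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.node_labels) != len(self.node_weights):
            raise ValueError("need exactly one weight per node")

    def add_edge(self, src: int, dst: int, weight: float) -> None:
        """Add a directed edge src -> dst; weight is a bonding probability."""
        n = len(self.node_labels)
        if not (0 <= src < n and 0 <= dst < n):
            raise IndexError("edge endpoint out of range")
        if not 0.0 <= weight <= 1.0:
            raise ValueError("edge weight must lie in [0, 1]")
        self.edge_weights[(src, dst)] = weight

    def in_weight(self, node: int) -> float:
        """Sum of the weights of all edges pointing into `node`."""
        return sum(w for (_, dst), w in self.edge_weights.items() if dst == node)


def two_monomer_copolymer(p_cross: float) -> WeightedDigraphSketch:
    """Build a 1:1 A/B copolymer.

    p_cross is the assumed probability that a unit bonds to the other
    monomer type; 1 - p_cross is the probability it bonds to its own type.
    """
    graph = WeightedDigraphSketch(["A", "B"], [0.5, 0.5])
    graph.add_edge(0, 1, p_cross)        # A -> B
    graph.add_edge(1, 0, p_cross)        # B -> A
    graph.add_edge(0, 0, 1.0 - p_cross)  # A -> A
    graph.add_edge(1, 1, 1.0 - p_cross)  # B -> B
    return graph


alternating = two_monomer_copolymer(1.0)  # strictly A-B-A-B
random_like = two_monomer_copolymer(0.5)  # no preference for either neighbor
```

The two toy graphs share the same monomers and stoichiometry and differ only in edge weights, which is the kind of information (monomer arrangement) that a plain SMILES string of a single repeat unit cannot carry.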

Tasks

  1. Define Abstract Base Class for Polymer Featurization

  2. Implement featurize Methods for Polymer Featurizers

  3. Create PolymerGraphFeaturizer Class

  4. Create PolymerBigSMILESFeaturizer Class

  5. Develop Dataset Loaders for Polymer Featurizers

  6. Integrate Featurizers into MoleculeNet

  7. Implement and Validate the Workflow

  8. Develop Documentation and Scientific Explanation

Progress

  1. The base polymer featurizer has been implemented in DeepChem.

  2. A weighted directed graph data class has been added to DeepChem.

  3. Utilities and validation functionality for weighted directed graph featurization are under review.

  4. A tutorial explaining the functionality of weighted directed graphs has been added to the DeepChem website.

  5. Documentation and an implementation tutorial on the simpler PSMILES polymer string representation have been added.

  6. Example material has been developed that uses DeepChem for regression of the crystallization tendency of polymers.

  7. A tutorial has been developed on using PSMILES with PolyBERT (a chemical language model for polymer fingerprint generation) to predict polymer similarity from transformer embeddings.

Pull requests for the items above have been opened on the DeepChem GitHub repo. Several other PRs were made during GSoC to fix bugs and CI issues. The PR archive, with its timeline, can be found in this link. As part of the GSoC project, we maintained a weekly slide deck to track updates and a forum thread to record the technical details of each week.

Link to Forum, Link to Weekly Slides

Pending

  1. BigSMILES is a complex polymer string representation. Before adopting it, we chose to work with the simpler PSMILES for easier integration and tested it with PolyBERT. Once PSMILES is integrated, we will work on adding BigSMILES.

  2. Because the available polymer datasets are small, the MoleculeNet integration of the polymer data was delayed. It will be revisited once the polymer representation has been properly standardized.
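As a rough illustration of why PSMILES is the simpler starting point: a PSMILES string is an ordinary SMILES string for one repeat unit in which two wildcard atoms, usually written `[*]`, mark where the unit connects to its neighbors. The sketch below checks only that convention, using the Python standard library. The function names are hypothetical and are not part of DeepChem or PolyBERT, and the check does not validate the underlying SMILES chemistry.

```python
import re

# Match a wildcard atom written either as "[*]" or as a bare "*".
_WILDCARD = re.compile(r"\[\*\]|\*")


def count_connection_points(psmiles: str) -> int:
    """Count wildcard atoms that mark the ends of the repeat unit."""
    return len(_WILDCARD.findall(psmiles))


def looks_like_psmiles(psmiles: str) -> bool:
    """Return True when the string has exactly two connection points.

    This is only a surface check: it does not parse the SMILES itself.
    """
    return count_connection_points(psmiles) == 2


polyethylene = "[*]CC[*]"
polystyrene = "[*]CC([*])c1ccccc1"
ethane = "CC"  # a plain small molecule with no connection points
```

Here `looks_like_psmiles(polyethylene)` and `looks_like_psmiles(polystyrene)` both pass, while `looks_like_psmiles(ethane)` does not. BigSMILES adds further syntax to describe stochastic structure, which a check this small would not cover.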

Future Prospects

  • Beyond the GSoC contribution, I look forward to continuing to contribute to DeepChem. I will be volunteering to support the adoption of AI in materials discovery and deep chemical study.

  • I look forward to publishing a study on the standardization of polymer representations with DeepChem.

Key Learnings

  • Learned about the maintenance of a large scientific codebase.

  • Developed an understanding of stabilizing CI/CD pipelines through dependency updates.

  • Gained insight into writing scientific articles for end users.

  • Had the opportunity to discuss scientific concepts with scientists and developers.

Conclusion

It has been a privilege to be part of GSoC with DeepChem, and a transformative journey for me in understanding scientific development. I am very thankful to Bharath Ramsundar and Shreyas Vinaya for their continuous support and guidance during GSoC. As I pursue research, I look forward to continuing to contribute to DeepChem. It has been a very productive journey, and I hope to increase my efforts to help realize DeepChem's vision.