Progress Report [ Week - 0 (27.05.2024 - 03.06.2024) ]
New Featurizers for Polymer Representation and Utilities for Graph-Based Featurizations!
Hi everyone,
I’m excited to share two new PRs that introduce tools for working with polymers and graph-based featurizations in DeepChem!
PR 1: Polymer Featurizer Base Class and Weighted Directed Graph Data Structure
This PR tackles the challenge of representing and featurizing polymers, which are large molecules with repeating structural units.
- New `PolymerFeaturizer` Base Class: This abstract class provides a framework for converting different polymer representations into features. It supports both BigSMILES string representations and a novel Weighted Directed Graph representation (more on that below!). Subclasses can implement the `_featurize` method to handle specific feature calculations.
- Introducing `WeightedDirectedGraphData`: This new class allows us to represent polymers as weighted directed graphs, capturing both the monomer structure and the distribution of different fragments within the polymer chain. This representation provides a more detailed and flexible way to encode polymer information.
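To make the relationship between the two pieces concrete, here is a minimal, dependency-free sketch of the pattern. It is not the code from the PR: the names `WeightedDigraphSketch`, `PolymerFeaturizerSketch`, and `ToyFragmentFeaturizer`, their fields, and the placeholder weights are all assumptions made up for illustration, and the toy subclass does no real chemistry or BigSMILES parsing.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WeightedDigraphSketch:
    """Hypothetical stand-in for a weighted directed graph (not the PR's class)."""

    node_features: List[List[float]]  # one feature vector per node
    node_weights: List[float]         # e.g. relative abundance of each fragment
    edges: List[Tuple[int, int]]      # directed (source, target) index pairs
    edge_weights: List[float]         # one weight per directed edge

    def __post_init__(self) -> None:
        n = len(self.node_features)
        if len(self.node_weights) != n:
            raise ValueError("expected one weight per node")
        if len(self.edge_weights) != len(self.edges):
            raise ValueError("expected one weight per edge")
        if any(not (0 <= s < n and 0 <= t < n) for s, t in self.edges):
            raise ValueError("edge refers to a node that does not exist")

    def out_weight(self, node: int) -> float:
        """Sum of the weights on edges leaving `node`."""
        return sum(w for (s, _), w in zip(self.edges, self.edge_weights) if s == node)


class PolymerFeaturizerSketch(ABC):
    """Hypothetical base class: subclasses only implement `_featurize`."""

    def featurize(self, polymers: Sequence[str]) -> list:
        return [self._featurize(p) for p in polymers]

    @abstractmethod
    def _featurize(self, polymer: str):
        """Convert one polymer string into features."""


class ToyFragmentFeaturizer(PolymerFeaturizerSketch):
    """Toy subclass: one node per dot-separated fragment, no real chemistry."""

    def _featurize(self, polymer: str) -> WeightedDigraphSketch:
        fragments = polymer.split(".")
        n = len(fragments)
        edges, weights = [], []
        for i in range(n - 1):
            # Connect neighbouring fragments in both directions.
            edges += [(i, i + 1), (i + 1, i)]
            weights += [0.5, 0.5]  # placeholder weights, chosen arbitrarily
        return WeightedDigraphSketch(
            node_features=[[float(len(f))] for f in fragments],
            node_weights=[1.0 / n] * n,
            edges=edges,
            edge_weights=weights,
        )


graph = ToyFragmentFeaturizer().featurize(["CC.CCO"])[0]
print(graph.edges)
print(graph.out_weight(0))
```

The point of the split is that the base class owns the loop over inputs while each subclass only decides how a single polymer becomes a graph, so a different representation can be added without touching the calling code.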
PR 2: Utility Functions for Graph-Based Featurizations
This PR focuses on providing helpful utility functions to streamline the process of building graph-based featurizers, particularly for polymers.
- `FeaturizationParameters` Class: This class stores all the parameters needed for encoding atom and bond features, ensuring consistency and simplifying featurizer development.
- Hydrogen Handling: The `handle_hydrogen` function provides fine-grained control over hydrogen addition and removal during molecule construction from SMILES strings.
- Polymer-Specific Utilities: Functions like `make_polymer_mol`, `parse_polymer_rules`, and `tag_atoms_in_repeating_unit` offer specialized tools for building and manipulating polymer molecules from SMILES strings and rules describing their composition.
- Feature Generation: The `generate_atom_features` and `generate_bond_features` functions provide standardized methods for generating feature vectors for atoms and bonds, respectively.
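As a rough illustration of why a shared parameter object helps, the sketch below keeps the allowed categories in one place and derives both the feature vectors and their lengths from it. Everything here is an assumption for illustration only: `FeatureParamsSketch`, `one_hot`, `atom_features`, `bond_features`, and the particular category lists are hypothetical and are not the signatures or defaults used in the PR. It also takes plain Python values instead of RDKit atoms and bonds so that it runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import List


def one_hot(value, choices: List) -> List[int]:
    """One-hot encode `value` over `choices`, with a final slot for unknowns."""
    encoding = [0] * (len(choices) + 1)
    index = choices.index(value) if value in choices else len(choices)
    encoding[index] = 1
    return encoding


@dataclass
class FeatureParamsSketch:
    """Hypothetical parameter holder; the category lists are arbitrary examples."""

    atomic_numbers: List[int] = field(default_factory=lambda: [1, 6, 7, 8, 9])
    degrees: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4])
    bond_types: List[str] = field(
        default_factory=lambda: ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"])

    @property
    def atom_fdim(self) -> int:
        # Two one-hot blocks (each with an "unknown" slot) plus an aromatic flag.
        return (len(self.atomic_numbers) + 1) + (len(self.degrees) + 1) + 1

    @property
    def bond_fdim(self) -> int:
        # One one-hot block (with an "unknown" slot) plus an in-ring flag.
        return (len(self.bond_types) + 1) + 1


def atom_features(atomic_num: int, degree: int, is_aromatic: bool,
                  params: FeatureParamsSketch) -> List[int]:
    """Build an atom feature vector from the shared parameter object."""
    return (one_hot(atomic_num, params.atomic_numbers)
            + one_hot(degree, params.degrees)
            + [int(is_aromatic)])


def bond_features(bond_type: str, in_ring: bool,
                  params: FeatureParamsSketch) -> List[int]:
    """Build a bond feature vector from the shared parameter object."""
    return one_hot(bond_type, params.bond_types) + [int(in_ring)]


params = FeatureParamsSketch()
carbon = atom_features(6, 4, False, params)
double_bond = bond_features("DOUBLE", True, params)
print(len(carbon) == params.atom_fdim, len(double_bond) == params.bond_fdim)
```

Because the vector lengths are computed from the same lists that drive the encoding, changing a category list in one place keeps every featurizer that shares the parameter object consistent.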
Pointers for Review
I welcome your feedback and suggestions on these PRs! Let’s discuss how we can further enhance DeepChem’s capabilities for polymer modeling and graph-based featurizations.