Are there tools to go from structural data (e.g., crystallography data and other PDB files) to a graph representation (nodes and edges) that can be used in a GNN? If there isn't a tool, is there a notebook or blog post that goes through these steps?
AtomicConvModel does this for protein-ligand complexes by reading in the PDB files and converting them to a graph representation. I think https://github.com/a-r-j/graphein also has some tools that could be useful for dealing with PDB files!
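To make the nodes-and-edges idea concrete, here is a minimal standard-library sketch of one common construction: atoms become nodes, and any two atoms within a distance cutoff get an edge. This is not AtomicConvModel's or graphein's actual code. The helper names (atom_record, parse_atoms, distance_graph), the toy coordinates, and the 4.0 Å cutoff are all made up for illustration.

```python
import math


def atom_record(serial, name, element, x, y, z):
    """Build one fixed-column PDB ATOM line (hypothetical helper for the demo)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} GLY A   1    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00" + " " * 10 + f"{element:>2s}"
    )


def parse_atoms(pdb_text):
    """Return a list of (element, (x, y, z)) for every ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB is a fixed-column format: x, y, z live in columns 31-54.
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            # Element symbol is in columns 77-78; fall back to the atom name.
            element = line[76:78].strip() or line[12:16].strip()[:1]
            atoms.append((element, xyz))
    return atoms


def distance_graph(atoms, cutoff=4.0):
    """Nodes are element symbols; edges join atom pairs within `cutoff` angstroms."""
    nodes = [element for element, _ in atoms]
    edges = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i][1], atoms[j][1]) <= cutoff:
                edges.append((i, j))
    return nodes, edges


# Toy input: two atoms close together and one far away.
pdb_text = "\n".join([
    atom_record(1, " N  ", "N", 0.0, 0.0, 0.0),
    atom_record(2, " CA ", "C", 1.5, 0.0, 0.0),
    atom_record(3, " O  ", "O", 10.0, 0.0, 0.0),
])

nodes, edges = distance_graph(parse_atoms(pdb_text), cutoff=4.0)
print(nodes)
print(edges)
```

The node list and edge list can then be converted into whatever tensors your GNN library expects. The all-pairs loop is quadratic in the number of atoms, so a real protein would probably want a neighbor search instead.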
AtomicConvModel defines the graph based on distance between atoms in 3D space. Possibly instead you want to define your graph based on covalent bonds? RDKit can read PDB files and identify the bonds. See the documentation at https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html. Wherever you see
Chem.MolFromSmiles() in the sample code, you can change it to
I’m not sure we have any classes in DeepChem to automate that though? ConvMolFeaturizer expects all the inputs to be SMILES strings, not PDB files.
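For the covalent-bond route, a rough sketch with RDKit might look like the following. It assumes RDKit is installed and uses Chem.MolFromPDBBlock on a toy three-atom ethanol fragment so the snippet is self-contained (Chem.MolFromPDBFile is the file-path counterpart). The helper hetatm_record and the coordinates are invented for illustration, and real PDB files often need extra cleanup before RDKit will parse them cleanly.

```python
from rdkit import Chem


def hetatm_record(serial, name, element, x, y, z):
    """Build one fixed-column PDB HETATM line (hypothetical helper for the demo)."""
    return (
        f"HETATM{serial:5d} {name:<4s} EOH A   1    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00" + " " * 10 + f"{element:>2s}"
    )


# Heavy atoms of ethanol: C-C about 1.52 angstroms, C-O about 1.43 angstroms.
pdb_block = "\n".join([
    hetatm_record(1, " C1 ", "C", 0.00, 0.00, 0.00),
    hetatm_record(2, " C2 ", "C", 1.52, 0.00, 0.00),
    hetatm_record(3, " O1 ", "O", 2.00, 1.35, 0.00),
    "END",
])

# With no CONECT records, RDKit infers bonds from interatomic distances
# (the default proximityBonding=True behavior of its PDB reader).
mol = Chem.MolFromPDBBlock(pdb_block)
if mol is None:
    raise ValueError("RDKit could not parse the PDB block")

# Nodes: one entry per atom. Edges: one (begin, end) index pair per bond.
nodes = [atom.GetSymbol() for atom in mol.GetAtoms()]
edges = sorted(
    tuple(sorted((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())))
    for bond in mol.GetBonds()
)
print(nodes)
print(edges)
```

Here the edges follow chemical bonds rather than spatial proximity, so two atoms that sit close together in space but are not bonded stay unconnected.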
That would be a really cool feature: extending ConvMolFeaturizer and others to also handle PDB/SDF files. There's a lot of interest in structural models these days.