Crystallography Data to Graph

Are there tools to go from structural data (e.g. crystallography data and other PDB files) to a graph representation (nodes and edges) that can be used in a GNN? If there isn’t a tool, is there a notebook or blog post that goes through these steps?

The AtomicConvModel does this for protein-ligand complexes by reading in the PDB files and converting them to a graph representation. I think https://github.com/a-r-j/graphein also has some tools that could be useful for dealing with PDB files!
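
To make "PDB file to graph" a bit more concrete, here is a minimal sketch that uses only the Python standard library: it reads ATOM/HETATM records and connects any two atoms closer than a distance cutoff. This is not how AtomicConvModel or graphein are implemented; the function names, the 4.5 Å default cutoff, and the crude element fallback are all assumptions made for illustration.

```python
import math


def parse_pdb_atoms(pdb_text):
    """Return a list of (element, (x, y, z)) from ATOM/HETATM records.

    Relies on the fixed-column PDB layout: x, y, z in columns 31-54
    and the element symbol in columns 77-78.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        coords = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        element = line[76:78].strip()
        if not element:
            # Crude fallback (assumption): first letter of the atom name.
            # This is wrong for two-letter elements such as calcium.
            element = line[12:16].strip().lstrip("0123456789")[:1]
        atoms.append((element, coords))
    return atoms


def distance_graph(atoms, cutoff=4.5):
    """Build (nodes, edges) where an edge joins atoms within `cutoff` angstroms.

    nodes: list of element symbols, indexed by atom order in the file.
    edges: list of (i, j, distance) with i < j.
    """
    nodes = [element for element, _ in atoms]
    edges = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i][1], atoms[j][1])
            if d <= cutoff:
                edges.append((i, j, d))
    return nodes, edges
```

You would call parse_pdb_atoms() on the text of a PDB file and pass the result to distance_graph(). The all-pairs loop is quadratic in the number of atoms, so for a full protein you would probably want a spatial index or neighbor list instead.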

AtomicConvModel defines the graph based on the distance between atoms in 3D space. You may instead want to define your graph based on covalent bonds. RDKit can read PDB files and identify the bonds. See the documentation at https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html. Wherever you see Chem.MolFromSmiles() in the sample code, you can change it to Chem.MolFromPDBFile(), passing the path to the PDB file instead of a SMILES string.
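
As a rough illustration of the covalent-bond route, the sketch below asks RDKit to parse a PDB block and then pulls the atoms out as nodes and the bonds as edges. The helper names mol_to_bond_graph and pdb_block_to_bond_graph are made up for this example and are not part of RDKit or DeepChem; only the RDKit calls themselves (Chem.MolFromPDBBlock, GetAtoms, GetBonds, and so on) are real API.

```python
from rdkit import Chem


def mol_to_bond_graph(mol):
    """Turn an RDKit molecule into (nodes, edges).

    nodes: atomic numbers, indexed by RDKit atom index.
    edges: (begin_idx, end_idx, bond_order), one entry per covalent bond,
           with bond_order as a float (1.0 single, 2.0 double, 1.5 aromatic).
    """
    nodes = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    edges = [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondTypeAsDouble())
        for bond in mol.GetBonds()
    ]
    return nodes, edges


def pdb_block_to_bond_graph(pdb_block):
    """Parse PDB-format text with RDKit and return its bond graph.

    For a file on disk, Chem.MolFromPDBFile(path) can be used instead.
    Both return None when parsing or sanitization fails.
    """
    mol = Chem.MolFromPDBBlock(pdb_block)
    if mol is None:
        raise ValueError("RDKit could not parse the PDB input")
    return mol_to_bond_graph(mol)
```

By default RDKit drops hydrogens when reading PDB input, so the graph contains heavy atoms only. Most GNN libraries expect directed edges, so you would typically add each edge in both directions before building the edge index.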

I’m not sure we have any classes in DeepChem to automate that, though. ConvMolFeaturizer expects all the inputs to be SMILES strings, not PDB files.

It would be a really cool feature to extend ConvMolFeaturizer and others to also handle PDB/SDF files. There’s a lot of interest in structural models these days.