Prepare Protein-Ligand Complexes for Featurization

I am trying to construct a new data set of protein-ligand complexes labeled with affinity data. I see that DeepChem provides tools to featurize the structural information, but how do I prepare the complexes for featurization? It would be great if you could share which pipelines and open-source tools you use to clean the protein complexes before featurization.

For example, how do I go from a PDB file downloaded from the PDB to a PDB file with the HET residues I am not interested in removed, and without the crystallization artifacts?

Thank you!


Try PDBFixer. It can deal with a lot of the issues that come up in downloaded PDB files. Some manual curation is still needed though. Software can automate a lot of the cleanup, but there’s no substitute for looking at the structure and making sure everything is ok.
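For the specific step asked about above (dropping unwanted HET residues while keeping the ligand), here is a minimal sketch in plain Python. It does not use PDBFixer or DeepChem. The function name `strip_hetero_residues` and the default list of residue names are my own assumptions for illustration, so adjust them to your structures. It only filters `HETATM` records by the residue name in columns 18-20 of the PDB format, and leaves `CONECT`, `HET`, and other records untouched.

```python
# Hypothetical helper for illustration; not part of PDBFixer or DeepChem.
# The residue names below are an assumed example list (water plus a few
# common additives). Choose the set that matches your own structures.
DEFAULT_DROP = frozenset({"HOH", "GOL", "EDO", "SO4", "PO4", "PEG"})


def strip_hetero_residues(pdb_lines, keep=(), drop=DEFAULT_DROP):
    """Return the PDB lines with selected HETATM records removed.

    pdb_lines: iterable of PDB-format text lines.
    keep:      residue names that must never be removed (e.g. your ligand).
    drop:      residue names whose HETATM records are removed.
    """
    keep_names = {name.strip().upper() for name in keep}
    drop_names = {name.strip().upper() for name in drop} - keep_names

    kept = []
    for line in pdb_lines:
        # Columns 1-6 hold the record type, columns 18-20 the residue name.
        if line[:6].strip() == "HETATM":
            residue_name = line[17:20].strip().upper()
            if residue_name in drop_names:
                continue  # skip this heteroatom record
        kept.append(line)
    return kept


# Example usage (file names are placeholders):
# with open("complex.pdb") as src:
#     cleaned = strip_hetero_residues(src, keep={"LIG"})
# with open("complex_clean.pdb", "w") as dst:
#     dst.writelines(cleaned)
```

This is only a first pass. As noted above, you should still open the result in a viewer and check that the ligand, any cofactors you care about, and the binding site look right.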


Thank you for the suggestion!
