I am trying to construct a new dataset of protein-ligand complexes labeled with affinity data. I see that DeepChem provides tools to featurize the structural information, but how do I prepare the complexes for featurization? It would be great if you could share which pipelines and open-source tools you use to clean protein complexes before featurization.
For example, how do I go from a PDB file downloaded from the PDB to a PDB file with the HET residues I am not interested in removed and without the crystallization artifacts?
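To make the question concrete, here is a rough, pure-Python sketch of the kind of filtering I have in mind. It is not a DeepChem API: the function name `strip_hetero` and its `keep_ligands` parameter are placeholders I made up, and it only relies on the fixed-column layout of ATOM/HETATM records (record name in columns 1-6, residue name in columns 18-20). It keeps every ATOM record and only those HETATM records whose residue name I explicitly ask for, so waters and additives such as glycerol are dropped along with the other unwanted HET groups.

```python
def strip_hetero(pdb_lines, keep_ligands=()):
    """Return PDB lines with unwanted HETATM records removed.

    Hypothetical helper, not part of DeepChem. `keep_ligands` is a list of
    three-letter residue names (ligand codes) that should survive.
    """
    keep = {name.strip().upper() for name in keep_ligands}
    cleaned = []
    last_atom_kept = True  # tracks whether the preceding atom record was kept
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "HETATM":
            # Residue name lives in columns 18-20 (0-based slice 17:20).
            last_atom_kept = line[17:20].strip() in keep
            if not last_atom_kept:
                continue
        elif record == "ATOM":
            last_atom_kept = True
        elif record == "ANISOU" and not last_atom_kept:
            # Drop anisotropic B-factor lines that belong to a removed atom.
            continue
        cleaned.append(line)
    return cleaned
```

I would call it as `strip_hetero(open("complex.pdb"), keep_ligands=["STI"])`, where `complex.pdb` and the ligand code `STI` are only examples. I know this is too naive: it leaves CONECT and HET header records untouched, and it would also throw away modified residues such as MSE that are stored as HETATM, which is why I am asking what people actually use.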