Representation vectors of protein-ligand interactions and clustering with DeepChem GraphConv


I have a set of virtual screening hits and want to create “representation vectors” for ligands (L) and ligand-protein interactions (LP) using DeepChem GraphConv.

How can I create representation vectors for L and LP using GraphConv? Can you recommend any tutorial-like material for that? It would be great if you could share pipelines you use for creating the vectors of protein-ligand interactions.

Then, I aim to use these vectors, individually (L, LP) or combined (L+LP), as input for an unsupervised clustering task. So, I want to use the representation vectors for clustering the compounds instead of classifying them. Do you think they are suitable for clustering tasks?

Thank you!


Answering here as well for searchability.

The right answer here is probably to use the new interaction fingerprints which were merged into DeepChem just now. We’re working on tutorials for them so hopefully we’ll have some nice resources for this in the next month or so!
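
For the clustering part of the question, here is a minimal sketch of one way to group compounds once L and LP vectors exist. It assumes both are fixed-length binary fingerprints, combines them by concatenation (L+LP), and uses a simple leader-style clustering on Tanimoto similarity. The toy vectors, the function names `tanimoto` and `leader_cluster`, and the 0.6 threshold are illustrative assumptions, not DeepChem functionality.

```python
from typing import List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity between two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as identical.
    return both / either if either else 1.0


def leader_cluster(vectors: List[Sequence[int]],
                   threshold: float = 0.6) -> List[int]:
    """Assign each vector to the first cluster whose leader is similar enough."""
    leaders: List[int] = []  # index of the first member of each cluster
    labels: List[int] = []   # cluster id for each vector, in input order
    for vec in vectors:
        for cluster_id, leader_idx in enumerate(leaders):
            if tanimoto(vec, vectors[leader_idx]) >= threshold:
                labels.append(cluster_id)
                break
        else:
            # No leader was similar enough: this vector starts a new cluster.
            leaders.append(len(labels))
            labels.append(len(leaders) - 1)
    return labels


# Toy data (made up): one ligand vector (L) and one
# interaction vector (LP) per compound.
ligand_vecs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
complex_vecs = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1]]

# L+LP: concatenate the two vectors of each compound.
combined = [l + lp for l, lp in zip(ligand_vecs, complex_vecs)]
labels = leader_cluster(combined, threshold=0.6)
print(labels)
```

Running `leader_cluster` on `ligand_vecs` or `complex_vecs` alone gives the L-only and LP-only groupings for comparison with L+LP.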

I’ve checked out the new interaction fingerprints from deepchem/deepchem#2212. Can I use them now?

I’ve tried to use one of them but received an attribute error as below (dc.version>

size = 8
featurizer = dc.feat.ContactCircularFingerprint(size=size)
features = featurizer.featurize(mol_files=['ligand.sdf'], protein_pdbs=['protein.pdb'])

AttributeError Traceback (most recent call last)
in ()
1 size = 8
----> 2 featurizer = dc.feat.ContactCircularFingerprint(size=size)
3 features = featurizer.featurize(mol_files=['ligand.sdf'], protein_pdbs=['protein.pdb'])

AttributeError: module 'deepchem.feat' has no attribute 'ContactCircularFingerprint'

Thanks for bringing this up! It looks like there is a problem with the nightly build that we need to figure out on our end… We’ll look into it and see if we can get this fixed so you can start trying out the new features!

Hi again,
I think the new interaction fingerprints are ready to use (I’m no longer receiving the “no attribute” error). Thanks for your help and effort!
However, I think I’ve made a mistake in the featurize call, so I still cannot get the complex features, as shown below. Could you please help me correct the mistake?
I hope the tutorial for the complex features will come soon…

Here is my code and its output:

#I have a protein-ligand complex from a VS campaign. I saved the target protein as my_protein.pdb and the hit molecule pose as my_ligand.sdf

size = 8
featurizer = dc.feat.ContactCircularFingerprint(size=size)
features = featurizer.featurize(("my_ligand.sdf", "my_protein.pdb"))

Failed to featurize datapoint 0. Appending empty array.
Failed to featurize datapoint 1. Appending empty array.

Hmm, is there perhaps anything exotic about the protein you’re working with? If you can share the PDB id we could try to see if we can reproduce the error.

Tagging in @ncfrey here as well in case he’s run into this error before
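
One thing that may be worth checking in the call above: two warnings for a single complex would be consistent with the featurizer iterating over the two file names separately. The sketch below assumes that `featurize` expects an iterable with one `(ligand_file, protein_file)` tuple per complex. That is an assumption about the API, and `pair_complexes` is a hypothetical helper, not part of DeepChem.

```python
from typing import List, Sequence, Tuple


def pair_complexes(ligand_files: Sequence[str],
                   protein_files: Sequence[str]) -> List[Tuple[str, str]]:
    """Build one (ligand_file, protein_file) tuple per complex."""
    if len(ligand_files) != len(protein_files):
        raise ValueError("Need exactly one protein file per ligand file.")
    return list(zip(ligand_files, protein_files))


# A bare tuple is iterated element by element:
# two "datapoints", each a single file name.
bare = ("my_ligand.sdf", "my_protein.pdb")
n_items_bare = len(list(bare))

# A list holding one tuple is iterated as a single datapoint:
# the (ligand, protein) pair.
complexes = pair_complexes(["my_ligand.sdf"], ["my_protein.pdb"])
n_items_list = len(complexes)

print(n_items_bare, n_items_list)

# Assumed usage (not run here; needs DeepChem and the input files):
# import deepchem as dc
# featurizer = dc.feat.ContactCircularFingerprint(size=8)
# features = featurizer.featurize(complexes)
```

If the input files themselves cannot be parsed, the same warning would still appear, so this only rules out one possible cause.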

I am still debugging the new complex featurizers. I’ll take a look at the pdb files you provided and let you know if I’m able to find a solution.


Hi Nathan,

I wonder whether you have been able to solve the issues with the ComplexFeaturizer. I saw that other people have reported the same issue on GitHub (title: ComplexFeaturizer failed to featurize complexes #2347).

Have you had a chance to look at my input files?

Hi @okay, I think it’s an issue with your ligand being a mol2 file, although I’m not sure why. I read the file in with RDKit, sanitized it, and wrote it back to a PDB file and the complex featurization worked.

Thanks a lot, Nathan! Could you please share your sanitization code?
I will need to prepare all my ligands as you’ve done.

Sure! I wrote a flexible sanitization function called deepchem.utils.vina_utils.prepare_inputs that you can find here.

Thanks a lot Nathan! It’s very helpful.