Atom counts in ConvMol objects

OK, I have parsed my dataset from its source website, and here are the preliminary results:

shulgin dataset

I got molfiles and loaded them as Mol objects.

Then I converted those Mol objects to SMILES strings.

Then I used the ConvMol featurizer, and HERE IS THE THING:

I’m not getting the expected number of atoms from the get_num_atoms method.

My current guess is that one of these steps needs more careful handling than I gave it: loading the molfiles, converting the Mol objects to SMILES strings, or applying the ConvMol featurizer.

Has anyone else encountered such issues before?

Thanks for your attention!

One possibility is that hydrogens are being stripped, which may account for the difference. Most featurizers don't include hydrogens in their atom counts.
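A quick way to test the hydrogen hypothesis is to count atoms with and without hydrogens. This sketch assumes the molfiles are read with RDKit, whose `MolFromMolBlock` removes explicit hydrogens by default. `hydrogen_report` is a hypothetical helper name, and mescaline is only an illustrative molecule.

```python
# A minimal sketch using RDKit only; the molecule is an illustrative example.
from rdkit import Chem


def hydrogen_report(molblock: str) -> dict:
    """Compare atom counts with and without the hydrogens stored in a molfile."""
    # Default parsing (removeHs=True) drops explicit hydrogens from the graph.
    stripped = Chem.MolFromMolBlock(molblock)
    # removeHs=False keeps every hydrogen written in the molfile as an atom.
    kept = Chem.MolFromMolBlock(molblock, removeHs=False)
    # Hydrogens RDKit still tracks on each heavy atom after stripping.
    hs_on_heavy_atoms = sum(atom.GetTotalNumHs() for atom in stripped.GetAtoms())
    return {
        "heavy_atoms": stripped.GetNumAtoms(),
        "all_atoms_in_file": kept.GetNumAtoms(),
        "heavy_atoms_plus_hs": stripped.GetNumAtoms() + hs_on_heavy_atoms,
    }


# Molfile with explicit hydrogens for mescaline (C11H17NO3): 15 heavy atoms, 17 H.
sample_molblock = Chem.MolToMolBlock(
    Chem.AddHs(Chem.MolFromSmiles("COc1cc(CCN)cc(OC)c1OC"))
)
report = hydrogen_report(sample_molblock)
print(report)
```

If `get_num_atoms` matches `heavy_atoms` while the count you expected matches `all_atoms_in_file`, the difference is just the hydrogens.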


Interesting! Thank you, that might explain it.

Full disclosure: I just realized that I may not have actually confirmed a discrepancy. I will take a closer, more careful look and check whether everything adds up.