Featurizer failure in DeepChem 2.7: ValueError: setting an array element with a sequence

Hi,
This is my first post, so please let me know if there are conventions I should follow. I am featurizing a dataset created with dc.data.CSVLoader using the CircularFingerprint featurizer, as shown below:

import deepchem as dc

# dataset_path: path to the CSV file with the CID, SMILES, task1 and task2 columns
featurizer = dc.feat.CircularFingerprint(size=2048, radius=4)
loader = dc.data.CSVLoader(["task1", "task2"], id_field="CID", feature_field="SMILES", featurizer=featurizer)
CF_dataset = loader.create_dataset(dataset_path)

DeepChem Version: 2.7
Issue: Featurizer failure.
Error message: ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (8192,) + inhomogeneous part.

Please let me know if you have any suggestions. I ran this same code on DeepChem 2.5 a few months back, and had no problem. I have also tried using many other featurizers, and I get the same error except when I use RDKitDescriptors, which successfully featurizes the dataset.

Could you please provide a sample of the dataset you are trying to load?

I was able to run the code on a sample dataset without any issues.


OK, nice. I wonder if it's a dependency issue, then. Which Python and DeepChem versions are you using?

I created a synthetic sample dataset that should match the structure of yours and ran the same commands on it. It executed without any errors.

DeepChem version: 2.7.1
Python version: 3.8.10
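
Since a dependency mismatch is suspected, it may help to compare more than the Python and DeepChem versions. The snippet below is a minimal sketch that prints the installed versions of a few packages using only the standard library. The helper name installed_versions and the package list (including numpy and rdkit) are my own choices for illustration, and the distribution name for RDKit may differ depending on how it was installed.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version string (hypothetical helper)."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is missing or is installed under a different distribution name.
            versions[name] = "not installed"
    return versions


report = installed_versions(["deepchem", "numpy", "rdkit"])
for name, version in report.items():
    print(f"{name}: {version}")
```

Posting this output from both the failing and the working environment would make any version difference easy to spot.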


@bharath
The CircularFingerprint code in DeepChem contains the following line, which could be the cause of the error:

fp = np.asarray(fp, dtype=float)

This line converts the feature vector returned by the RDKit module rdkit.Chem.rdMolDescriptors into a NumPy array.

This call can raise a similar error when it is given a nested list whose sublists have different lengths, for example [[1, 2], [3]].
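
To make the failure mode concrete, here is a small NumPy-only sketch. It is not DeepChem code: the list rows is made up, and its empty last entry stands in for a molecule whose featurization produced nothing. In the "inhomogeneous shape" wording of the reported message, the detected shape is the length of the outer list, so (8192,) would correspond to 8192 entries being stacked.

```python
import numpy as np

# Two feature vectors of length 4 and one empty vector (illustrative data only).
rows = [np.zeros(4), np.zeros(4), np.array([])]

raised = False
try:
    # A ragged list cannot be converted into a rectangular float array.
    np.asarray(rows, dtype=float)
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
```

If every entry has the same length, the same call succeeds and returns a 2-D array.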

We could check the dataset for a datapoint that triggers this error during featurization. DeepChem could also handle the error internally and emit a warning instead.
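
One way to look for such a datapoint is to featurize the inputs one at a time and record every row whose output does not have the expected length. The sketch below assumes nothing about DeepChem internals: find_bad_rows and toy_featurizer are hypothetical names, and the toy featurizer only imitates a featurizer that returns an empty result for an input it cannot handle.

```python
import numpy as np


def find_bad_rows(inputs, featurize_one, expected_len):
    """Return (index, input, reason) for every input whose features look wrong."""
    bad = []
    for i, item in enumerate(inputs):
        try:
            feats = np.asarray(featurize_one(item), dtype=float)
        except Exception as err:
            bad.append((i, item, f"raised {type(err).__name__}"))
            continue
        if feats.shape != (expected_len,):
            bad.append((i, item, f"unexpected shape {feats.shape}"))
    return bad


def toy_featurizer(smiles):
    """Stand-in featurizer: empty output for one deliberately invalid string."""
    return [] if smiles == "not_a_smiles" else [0.0] * 8


problems = find_bad_rows(["CCO", "not_a_smiles", "c1ccccc1"], toy_featurizer, expected_len=8)
for index, item, reason in problems:
    print(index, item, reason)
```

With the real dataset, featurize_one could be something like lambda s: featurizer.featurize([s])[0] and expected_len the fingerprint size (2048 in the original post); I have not run that variant against DeepChem 2.7, so treat it as an assumption.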