RDKit error in MolGraphConv featurizer

I am facing an issue with the MolGraphConv featurizer. Some data points are not getting featurized. I have attached a screenshot of the error I am getting. Is there a way to catch the exception, or to find out which data points are failing, so that I can remove them automatically in code?

As you can see in this line: https://github.com/deepchem/deepchem/blob/a8ead748384490c43af5b13dbc44956eb3cdeff5/deepchem/feat/base_classes.py#L280

When featurization fails for a data point, the featurizer does not record the SMILES string; it appends an empty array instead and logs the array index of the failed data point. If there aren't too many failed data points, I would recommend removing those SMILES strings manually.
Or, since it only appends an empty array, you could write a script that removes those entries and see if it works.

I have been removing them manually so far. I wanted to know if we can automate that process.

From what I understand, this is how I would fix the problem, provided you pass a list of SMILES strings:

import numpy as np
from deepchem.feat import MolGraphConvFeaturizer
from deepchem.feat.graph_data import GraphData
from deepchem.data import NumpyDataset

# indices 1 and 3 invalid
smiles = ["C1CCC1", "abcd", "C1=CC=CN=C1", "efgh"]  
labels = np.array([0.642, 0.723, 0.823, 2.41])

featurizer = MolGraphConvFeaturizer(use_edges=True)

# The line below does not raise for data points 1 and 3; the featurizer
# logs the failure and stores an empty array in their place
features = featurizer.featurize(smiles)

# Keep the indices of the valid data points, then select them
indices = [i for i, data in enumerate(features) if isinstance(data, GraphData)]
features = features[indices]
labels = labels[indices]

# Build the dataset for passing into the model
dataset = NumpyDataset(features, labels)

Code description: MolGraphConvFeaturizer returns an np.ndarray of GraphData objects. For a data point that fails, it stores an empty np.ndarray in that position, so filtering on the element type keeps only the valid entries.
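
If you also want to know which SMILES strings failed (the other half of the original question), a small helper along these lines might work. This is only a sketch: split_failed and is_valid are names I made up for illustration, they are not part of DeepChem, and the helper assumes the featurizer output has one entry per input, in the same order.

```python
from typing import Any, Callable, List, Sequence, Tuple


def split_failed(
    inputs: Sequence[Any],
    features: Sequence[Any],
    is_valid: Callable[[Any], bool],
) -> Tuple[List[int], List[Tuple[int, Any]]]:
    """Split featurizer output into valid indices and failed inputs.

    Hypothetical helper, not a DeepChem API. `is_valid` decides whether a
    single featurized entry counts as a success.
    """
    if len(inputs) != len(features):
        raise ValueError("inputs and features must have the same length")

    valid_idx: List[int] = []               # positions that featurized fine
    failed: List[Tuple[int, Any]] = []      # (position, original input) pairs
    for i, (item, feat) in enumerate(zip(inputs, features)):
        if is_valid(feat):
            valid_idx.append(i)
        else:
            failed.append((i, item))
    return valid_idx, failed
```

With the variables from the snippet above, you would call it as valid_idx, failed = split_failed(smiles, features, lambda f: isinstance(f, GraphData)), then select features[valid_idx] and labels[valid_idx] as before. For the example input, failed should hold indices 1 and 3 together with their strings, which you can print or write to a file to review later.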

Hope this helps.

Thank you. I will try this method.