Unable to load and featurize using dc.molnet.load_pdbbind()

cheongw · April 24, 2023, 1:09pm

Hi,

I am new to the DeepChem community.

I have been trying to run the revised code from the web tutorial (pdbbind_rf.py). However, I cannot load and featurize the pdbbind data from MolNet.

It seems that other people asked about a similar issue some time ago, but I cannot find a solution to it.

Please refer to the line of code (from the tutorial) and the portion of the error message shown in the snapshot.

tasks, datasets, transformers = dc.molnet.load_pdbbind(featurizer=grid, splitter="random", subset="core")

Upon execution, I get 193 lines of the following warning message:
WARNING: deepchem.feat.base_classes: Failed to featurize datapoint 0. Appending empty array.
WARNING: deepchem.feat.base_classes: Failed to featurize datapoint 1. Appending empty array.
…
WARNING: deepchem.feat.base_classes: Failed to featurize datapoint 192. Appending empty array.

When printing the first 5 entries from each dataset, I get the following:
print(train_dataset.X[0:5])
print(train_dataset.y[0:5])
print(train_dataset.w[0:5])
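
A quick way to quantify the problem is to count how many entries of X came back without any features. The helper below is only a sketch: the name empty_feature_indices is made up for illustration, and it assumes that a failed datapoint shows up as an empty array (as the warning text "Appending empty array" suggests). The stand-in list is there so the snippet runs without DeepChem installed.

```python
def empty_feature_indices(X):
    """Return the indices of entries in X that contain no feature values."""
    empty = []
    for i, x in enumerate(X):
        # NumPy arrays expose .size; plain sequences fall back to len().
        size = x.size if hasattr(x, "size") else len(x)
        if size == 0:
            empty.append(i)
    return empty


# Stand-in data (not from the thread) so the sketch runs on its own.
print(empty_feature_indices([[0.1, 0.2], [], [0.3]]))  # [1]

# With the loaded dataset this would be called as:
# empty_feature_indices(train_dataset.X)
```

If every index is returned, none of the complexes were featurized, which points away from a few corrupt files and toward a problem shared by all datapoints.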

It would be great if someone could provide a solution to this problem.

Thank you in advance.

RamashrayChauhan · April 26, 2023, 1:09pm

To solve this problem, check that the featurizer is properly configured to handle pdbbind data. You may need to use a different featurizer, or adjust its parameters to handle pdbbind data.
Also note that you are using splitter="random", which means that the data is split into random samples. If you want to use fixed data sets for training and testing, you should use a different splitter, such as splitter="scaffold".
Finally, note that the error messages indicate that the data cannot be allocated for data point 1 and data point 192. This could be because the data in these data points is corrupt or improperly organized. Check these entries in your original dataset to make sure they are properly organized and not corrupted.
I hope this helps!

cheongw · April 26, 2023, 2:56pm

Thank you for your insights.

I understood the error message in a similar way to what you explained. In fact, I could not include more than one snapshot, so I wrote down only a portion of the error message. Yes, the error message indicated that data points 0 through 192 (all X data in the core set) could not be featurized and allocated.
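
To confirm which datapoints are reported as failed without reading all 193 lines by eye, the captured log text can be parsed. This is a minimal sketch: the function name failed_indices is hypothetical, and the pattern assumes the warnings have exactly the wording quoted earlier in this thread.

```python
import re

# Matches the warning wording quoted in this thread.
_FAILED = re.compile(r"Failed to featurize datapoint (\d+)")


def failed_indices(log_text):
    """Return the sorted, de-duplicated datapoint indices reported as failed."""
    return sorted({int(m.group(1)) for m in _FAILED.finditer(log_text)})


sample = """WARNING: deepchem.feat.base_classes: Failed to featurize datapoint 0. Appending empty array.
WARNING: deepchem.feat.base_classes: Failed to featurize datapoint 1. Appending empty array.
WARNING: deepchem.feat.base_classes: Failed to featurize datapoint 192. Appending empty array."""

print(failed_indices(sample))  # [0, 1, 192]
```

On the full log, comparing the result against list(range(193)) would show whether every datapoint in the core set failed.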

I believe that there are 3 possible causes of this error: (1) a problem with the grid featurizer, (2) a problem with the dataset in MolNet, or (3) a problem with the loading function (dc.molnet.load_pdbbind()).

Given that (a) the MolNet pdbbind dataset could be downloaded from its URL and (b) the grid featurization command did not report any error, I lean toward an issue with the dc.molnet.load_pdbbind() function. However, it could be something else that I am not aware of. As I mentioned, I am new to the DeepChem community.
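
One way to separate a featurizer problem from a loader problem would be to featurize a single complex directly and look at the real exception instead of the generic warning. The sketch below rests on assumptions: diagnose_one is a made-up helper, the file paths are placeholders, and it presumes the featurizer has a per-datapoint _featurize method whose exceptions the public featurize call catches and replaces with the warning. That may differ between DeepChem versions.

```python
import traceback


def diagnose_one(featurize_one, datapoint):
    """Call a per-datapoint featurization function and capture any exception.

    Returns (features, None) on success or (None, traceback_text) on failure.
    """
    try:
        return featurize_one(datapoint), None
    except Exception:
        return None, traceback.format_exc()


# Hypothetical usage with the grid featurizer from the tutorial
# (placeholder paths, and _featurize is an assumed method name):
# features, err = diagnose_one(grid._featurize, ("ligand.sdf", "protein.pdb"))
# print(err if err is not None else features.shape)
```

If the traceback points to a missing dependency or a file-format error, the featurizer setup is the likely culprit. If a single complex featurizes fine, the loader or the downloaded data deserves a closer look.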

cheongw · May 16, 2023, 9:02pm

Hi,

The developers of dc.molnet.load_pdbbind() must have found the bug and fixed the code or data. The function now works well. The results are shown in the snapshot below:

load_pdbbind_results

Thank you very much.