Trouble fitting a model on custom-created WeaveMols


I have been trying to featurize my own data using the same kind of one-hot encodings that are produced when creating a WeaveMol. However, I run into issues when I try to fit a model on a dataset built from my own WeaveMols.

My data consists of molecules represented as graphs with custom values on the edges (similar to distances). I have tried to build something similar to the WeaveFeaturizer, and it works most of the way.

The problem appears when I try to feed this data to an MPNNModel. This is the response I get:

ValueError                                Traceback (most recent call last)
<ipython-input-72-6a0b50c0b8f0> in <module>
----> 1, nb_epoch=10)

~/.local/lib/python3.8/site-packages/deepchem/models/ in fit(self, dataset, nb_epoch, max_checkpoints_to_keep, checkpoint_interval, deterministic, restore, variables, loss, callbacks, all_losses)
    318     The average loss over the most recent checkpoint interval
    319    """
--> 320     return self.fit_generator(
    321         self.default_generator(
    322             dataset, epochs=nb_epoch,

~/.local/lib/python3.8/site-packages/deepchem/models/ in fit_generator(self, generator, max_checkpoints_to_keep, checkpoint_interval, restore, variables, loss, callbacks, all_losses)
    395     # Main training loop.
--> 397     for batch in generator:
    398       self._create_training_ops(batch)
    399       if restore:

~/.local/lib/python3.8/site-packages/deepchem/models/ in default_generator(self, dataset, epochs, mode, deterministic, pad_batches)
   1159           # pair features
   1160           pair_feat.append(
-> 1161               np.reshape(mol.get_pair_features(),
   1162                          (n_atoms * n_atoms, self.n_pair_feat)))

<__array_function__ internals> in reshape(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/core/ in reshape(a, newshape, order)
    297            [5, 6]])
    298     """
--> 299     return _wrapfunc(a, 'reshape', newshape, order=order)

~/.local/lib/python3.8/site-packages/numpy/core/ in _wrapfunc(obj, method, *args, **kwds)
     57     try:
---> 58         return bound(*args, **kwds)
     59     except TypeError:
     60         # A TypeError occurs if the object does have such a method in its

ValueError: cannot reshape array of size 6272 into shape (196,3)

I’m fairly sure this is because the shape of my data is not what MPNNModel expects.
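The numbers in the error message already narrow this down. The sketch below only redoes the arithmetic from the ValueError with NumPy; it does not call DeepChem. It assumes the pair-feature array holds one row for every ordered atom pair (all 196 of them), which is what the target shape `(n_atoms * n_atoms, self.n_pair_feat)` in the traceback implies. The variable names are chosen here for illustration.

```python
import numpy as np

# Values copied from the error message:
# "cannot reshape array of size 6272 into shape (196,3)"
total_size = 6272    # number of values returned by get_pair_features()
n_pairs = 196        # n_atoms * n_atoms
expected_width = 3   # self.n_pair_feat on the model

# 196 pairs means a 14-atom molecule.
n_atoms = int(round(np.sqrt(n_pairs)))
assert n_atoms * n_atoms == n_pairs

# Width per pair that the data actually has (assuming one row per pair).
actual_width, remainder = divmod(total_size, n_pairs)
print(n_atoms, actual_width, remainder)

# Reproduce the failure on a dummy array of the same size.
pair_features = np.zeros(total_size)
try:
    np.reshape(pair_features, (n_pairs, expected_width))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)

# The same array reshapes cleanly with the width the data really has.
reshaped = np.reshape(pair_features, (n_pairs, actual_width))
print(reshaped.shape)
```

Under that assumption, each atom pair carries 32 values while the model was told to expect 3, so either the model's `n_pair_feat` or the length of the per-pair vectors would need to change so that the two agree.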

The data is basically a list of WeaveMol objects, each containing:

  • nodes: mostly the same as in the WeaveFeaturizer
  • pair_edges: exactly the same as in the WeaveFeaturizer
  • pairs: this is where I suspect the problem is, as it contains a vector for every edge in the complete graph of the molecule, with one-hot encodings of my distance measure.
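One way to test the suspicion about the pairs component before calling fit is to compare the size of each molecule's pair array with what the reshape in the traceback requires. The helper below is a hypothetical sketch, not part of DeepChem: `find_pair_width_mismatches` and its arguments are names made up here, and it works on plain NumPy arrays rather than on WeaveMol objects.

```python
import numpy as np

def find_pair_width_mismatches(pair_arrays, atom_counts, n_pair_feat):
    """Return (index, n_atoms, size, expected_size) for every molecule whose
    pair array could not be reshaped to (n_atoms * n_atoms, n_pair_feat)."""
    mismatches = []
    for i, (pairs, n_atoms) in enumerate(zip(pair_arrays, atom_counts)):
        expected = n_atoms * n_atoms * n_pair_feat
        size = np.asarray(pairs).size
        if size != expected:
            mismatches.append((i, n_atoms, size, expected))
    return mismatches

# Synthetic stand-ins for the pair arrays of two molecules:
good = np.zeros((3 * 3, 3))      # 3 atoms, 3 values per pair
bad = np.zeros((14 * 14, 32))    # 14 atoms, 32 values per pair
report = find_pair_width_mismatches([good, bad], [3, 14], n_pair_feat=3)
print(report)
```

With real data, the arrays would come from each molecule's `get_pair_features()` call (the method shown in the traceback), and `n_pair_feat` would be whatever value the model was constructed with.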

Any input would be appreciated.

As a quick question, are you able to fit WeaveModel itself on your custom WeaveMols? I think there are also some issues at present with how WeaveModel handles distances (this isn’t well covered by the unit test suite), so it’s possible you have found an underlying bug.

For us to take a closer look, it would be helpful if you could give us a minimal reproducing code snippet so we can reproduce the bug locally.

I was not aware that WeaveModel existed and had only tried MPNNModel. I will try it and get back to you.

Right now the data I’m working with is somewhat classified, but I can try to create something to share that won’t violate this. I will return when this is done.

