Trouble fitting a model on custom-created WeaveMols

Hello.

I have been trying to encode and featurize my own data using the one-hot encodings that are applied when creating a WeaveMol. However, I run into issues when I try to fit a model on a dataset built from my own WeaveMols.

My data consists of molecules represented as graphs with custom values on the edges (similar to distances). I have tried to create something similar to the WeaveFeaturizer, and it works most of the way.

The problem arises when I try to feed this data to an MPNNModel. The response I get is this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-72-6a0b50c0b8f0> in <module>
----> 1 model.fit(train, nb_epoch=10)

~/.local/lib/python3.8/site-packages/deepchem/models/keras_model.py in fit(self, dataset, nb_epoch, max_checkpoints_to_keep, checkpoint_interval, deterministic, restore, variables, loss, callbacks, all_losses)
    318     The average loss over the most recent checkpoint interval
    319    """
--> 320     return self.fit_generator(
    321         self.default_generator(
    322             dataset, epochs=nb_epoch,

~/.local/lib/python3.8/site-packages/deepchem/models/keras_model.py in fit_generator(self, generator, max_checkpoints_to_keep, checkpoint_interval, restore, variables, loss, callbacks, all_losses)
    395     # Main training loop.
    396 
--> 397     for batch in generator:
    398       self._create_training_ops(batch)
    399       if restore:

~/.local/lib/python3.8/site-packages/deepchem/models/graph_models.py in default_generator(self, dataset, epochs, mode, deterministic, pad_batches)
   1159           # pair features
   1160           pair_feat.append(
-> 1161               np.reshape(mol.get_pair_features(),
   1162                          (n_atoms * n_atoms, self.n_pair_feat)))
   1163 

<__array_function__ internals> in reshape(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in reshape(a, newshape, order)
    297            [5, 6]])
    298     """
--> 299     return _wrapfunc(a, 'reshape', newshape, order=order)
    300 
    301 

~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     56 
     57     try:
---> 58         return bound(*args, **kwds)
     59     except TypeError:
     60         # A TypeError occurs if the object does have such a method in its

ValueError: cannot reshape array of size 6272 into shape (196,3)

I’m sure this has something to do with the shape of my data being something that the MPNN doesn’t expect.
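The numbers in the ValueError already hint at where the mismatch is. As a minimal sketch in plain Python (the helper name `infer_pair_feature_width` is hypothetical, and the atom count of 14 is inferred from the 196 rows in the target shape, not stated in the post), one can work out how many features each atom pair actually carries:

```python
def infer_pair_feature_width(total_size, n_atoms):
    """Return the per-pair feature count implied by a flattened pair array.

    Assumes the pair features cover all n_atoms * n_atoms ordered pairs,
    as the reshape in the traceback does.
    """
    n_pairs = n_atoms * n_atoms
    if total_size % n_pairs != 0:
        raise ValueError(
            f"{total_size} values cannot be split evenly over {n_pairs} pairs"
        )
    return total_size // n_pairs


# Values taken from the error message: size 6272, target shape (196, 3).
n_atoms = 14             # assumption: 14 * 14 = 196 rows
expected_width = 3       # second entry of the target shape, i.e. n_pair_feat
actual_width = infer_pair_feature_width(6272, n_atoms)

print(actual_width)      # 32
print(actual_width == expected_width)  # False
```

Under these assumptions each pair carries 32 values while the model was configured for 3, which would explain why the reshape fails.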

The data is basically a list of WeaveMol objects, each with:

  • nodes: mostly the same as in the WeaveFeaturizer.
  • pair_edges: exactly the same as in the WeaveFeaturizer.
  • pairs: this is where I suspect the problem is, as it contains a vector for every edge in the complete graph of the molecule, with one-hot encodings for my distance measure.

Any input would be appreciated.

As a quick question, are you able to fit WeaveModel itself on your custom weavemols? I think there are also some issues at present handling distances with WeaveModel (this isn’t well covered by the unit test suite at present), so it’s possible you might have found an underlying bug.

For us to take a closer look, it would be helpful if you could give us a minimal reproducing code snippet so we can repro the bug locally.

I was not aware that WeaveModel existed and had only tried MPNNModel. I will try it and get back to you.

Right now the data I’m working with is somewhat classified, but I can try to create something to share that won’t violate this. I will return when this is done.


This is probably way too late, but I have recently been doing a similar thing, creating my own WeaveMol featurization. You need to ensure that the n_atom_feat and n_pair_feat parameters of the WeaveModel (or MPNNModel) match the length of the one-hot encoded vectors stored in your WeaveMol object. That is, print out the len() of the first entry returned by WeaveMol.get_atom_features() and WeaveMol.get_pair_features(), then manually pass these numbers in for those parameters. Hope this at least helps anyone in the future.
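The check described above could be sketched roughly as follows. This is only an illustration: `StubWeaveMol`, `last_dim` and `feature_widths` are hypothetical names, the stub stands in for a real WeaveMol so the snippet runs without DeepChem, and the feature sizes (75 and 32) are made up. With NumPy arrays, reading `array.shape[-1]` would do the same job as `last_dim`.

```python
def last_dim(features):
    """Length of the innermost axis of a non-empty nested list."""
    width = len(features)
    first = features[0]
    # Descend through the first element of each level until a scalar is hit.
    while hasattr(first, "__len__") and not isinstance(first, str):
        width = len(first)
        first = first[0]
    return width


class StubWeaveMol:
    """Hypothetical stand-in exposing the two getters named in the post."""

    def __init__(self, nodes, pairs):
        self._nodes = nodes
        self._pairs = pairs

    def get_atom_features(self):
        return self._nodes

    def get_pair_features(self):
        return self._pairs


def feature_widths(mol):
    """Return (n_atom_feat, n_pair_feat) as implied by the molecule's data."""
    return last_dim(mol.get_atom_features()), last_dim(mol.get_pair_features())


# Illustrative molecule: 14 atoms, 196 atom pairs, made-up feature sizes.
mol = StubWeaveMol(
    nodes=[[0.0] * 75 for _ in range(14)],
    pairs=[[0.0] * 32 for _ in range(196)],
)
n_atom_feat, n_pair_feat = feature_widths(mol)
print(n_atom_feat, n_pair_feat)  # 75 32

# These two numbers are what would then be passed as the n_atom_feat and
# n_pair_feat arguments when constructing the WeaveModel or MPNNModel.
```

Reading the widths from the data instead of hard-coding them keeps the model configuration in step with the featurization if the encoding changes later.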