Molecule WGAN in DeepChem

This is really cool! Awesome progress :).

A couple of thoughts: this might be ready for a first PR to DeepChem so the implementation can go through a first round of code review. We typically ask for more unit tests and other modifications during the review process, so it might be good to kick that off sooner rather than later. We could also discuss the changes to GAN/WGAN you suggest on that PR and kick off some fixes :slight_smile:

I still want to polish the code a bit, but it should not take too long; I might finish this weekend.
I will then let you know, and we can start thinking about pushing it forward.
I still have to do the following:

  • implement a get_config() method in the custom layers to enable model saving
  • rework the featurization to be in line with the DC featurizer class
  • style the code in line with DC conventions, e.g. typing and comments
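For the first item, the usual Keras pattern is for get_config() to return every constructor argument so the layer can be rebuilt when a saved model is loaded. A minimal sketch of that round trip, with a plain class standing in for tf.keras.layers.Layer (the class name and arguments here are illustrative, not DeepChem's actual layers):

```python
class GraphConvLayerSketch:
    """Stand-in for a custom Keras layer; illustrative names only."""

    def __init__(self, units=128, dropout_rate=0.0):
        self.units = units
        self.dropout_rate = dropout_rate

    def get_config(self):
        # Return every constructor argument so the layer can be
        # reconstructed during model deserialization.
        return {"units": self.units, "dropout_rate": self.dropout_rate}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


layer = GraphConvLayerSketch(units=64, dropout_rate=0.1)
rebuilt = GraphConvLayerSketch.from_config(layer.get_config())
assert rebuilt.units == 64 and rebuilt.dropout_rate == 0.1
```

In a real subclass of tf.keras.layers.Layer, get_config() would also merge in super().get_config() so the base layer's settings (name, dtype, etc.) survive the round trip.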

@bharath
Hi Bharath,
Happy New Year!
Sorry it took me so long, but I have been snowed under lately.
If you are happy with the current format, we can think about a first release.
Obviously, I still need to convert the Jupyter notebook into Python files.

I have created a featurizer that is in line with the DeepChem featurizer class, with added defeaturization functionality that converts graphs back into RDKit molecules.
I have also modified the custom layers to include get_config(), which will enable serialization.
If you think of something else that needs to be done before release just let me know.
Cheers,
Milosz


That sounds great! Please do send in a first pull request for us to review. I’d suggest breaking the feature up into multiple smaller pull requests for ease of review if possible. We’re targeting the 2.5.0 release for the end of February, so hopefully we can get MolGAN into 2.5.0 :slight_smile:


First pull request ready: https://github.com/deepchem/deepchem/pull/2371

Working on the layers now. I have done an initial round of code cleaning.
I still need to create basic unit tests. If nothing urgent comes up, I should have the PR ready before the end of the week.


@bharath
Layers PR: https://github.com/deepchem/deepchem/pull/2386


@bharath
Model PR: https://github.com/deepchem/deepchem/pull/2426


Hi @MiloszGrabski, I tried to run main.ipynb and tutorial.ipynb from this repo https://github.com/MiloszGrabski/DeepChem_MolGAN. They didn’t run properly; I am getting the following error. Just wanted to let you know. Thanks!

[Screenshot of the error (2021-04-03) omitted]

@MiloszGrabski @bharath
What is the difference between these SMILES strings? I am trying to run the GAN with SMILES from the MUV dataset, but I am getting the following error.

SMILES strings:

[Screenshot of the SMILES strings (2021-04-05) omitted]

Output and error:

[Screenshot of the output and error (2021-04-05) omitted]

Hi,
Sorry for the late response. I am currently quite busy and struggling to find free time for DC.
The problem is the number and types of atoms.
The current infrastructure was tested on the QM9 dataset, with a maximum of 8 atoms per molecule and the basic atom types C, N, O, F.
You can change the maximum number of atoms when you create the featurizer (the max_atom_count parameter).
Also, you need to remove charged molecules.
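As a sketch of that last step, charged molecules can be filtered out of the SMILES list before featurization. This crude version just looks for charge markers inside SMILES brackets; a real pipeline would check RDKit formal charges instead, and remove_charged here is a hypothetical helper, not part of DeepChem:

```python
import re

def remove_charged(smiles_list):
    """Drop SMILES strings that contain a +/- charge inside brackets.

    A crude string check; RDKit's GetFormalCharge() is the robust way.
    """
    charged = re.compile(r"\[[^\]]*[+-]")
    return [s for s in smiles_list if not charged.search(s)]

smiles = ["CCO", "C[N+](C)(C)C", "c1ccccc1", "[O-]C(=O)C"]
print(remove_charged(smiles))  # ['CCO', 'c1ccccc1']
```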


I received the same error as above even after increasing max_atom_count.

Some example SMILES I used are:
COC(=O)C1(C)CCCC2=C3C(=O)C(=O)C4=C(OC=C4C)C3=CC=C12
CC1COC2=C1C(=O)C(=O)C1=C3CCCC(C)(C)C3=CC=C21

Failed to featurize datapoint 3, CC(=O)c1ccccc1S(=O)(=O)c1ccccc1C(=O)O. Appending empty array
Exception message: 16

EDIT: Is there any way for me to use SMILES containing S? When I removed them, it worked.
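For what it's worth, the `Exception message: 16` above is the atomic number the featurizer could not encode: 16 is sulfur. A guess at the failing lookup, with illustrative names (the featurizer's actual internals and default label list may differ):

```python
atom_labels = [0, 6, 7, 8, 9]  # padding plus C, N, O, F (QM9-style defaults)
atom_encoder = {num: idx for idx, num in enumerate(atom_labels)}

def encode(atomic_num):
    # An atomic number missing from the label set raises KeyError(16),
    # which would surface in the log as "Exception message: 16".
    return atom_encoder[atomic_num]

# Expanding the label set to include sulfur (16) when constructing the
# featurizer should make S-containing SMILES featurizable:
atom_labels.append(16)
atom_encoder[16] = len(atom_encoder)
assert encode(16) == 5
```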

EDIT#2: While going through the tutorial I also received this error while training the model:

>     # train model
>     gan.fit_gan(iterbatches(10), generator_steps=0.2, checkpoint_interval=5000)

ValueError: Layer model_1 expects 2 input(s), but it received 4 input tensors. Inputs received: [<tf.Tensor 'model_2/model/tf.nn.softmax/Softmax:0' shape=(100, 9, 9, 5) dtype=float32>, <tf.Tensor 'model_2/model/tf.nn.softmax_1/Softmax:0' shape=(100, 9, 5) dtype=float32>, <tf.Tensor 'model_2/model/tf.math.argmax_1/ArgMax:0' shape=(100, 9, 9) dtype=int64>, <tf.Tensor 'model_2/model/tf.math.argmax_3/ArgMax:0' shape=(100, 9) dtype=int64>]
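For reference, the iterbatches generator passed to fit_gan() in the snippet above follows roughly this shape (a sketch with dummy data; in the actual tutorial each yielded dict maps the model's data inputs to featurized adjacency and node tensors):

```python
def iterbatches(epochs, data, batch_size=4):
    # Yield one feed dict per batch, for the given number of epochs.
    for _ in range(epochs):
        for i in range(0, len(data), batch_size):
            yield {"features": data[i:i + batch_size]}

batches = list(iterbatches(2, list(range(10))))
print(len(batches))  # 2 epochs x 3 batches = 6
```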


@seyonec I think you’ve dealt with a similar bug before with MolGAN right? I recall there’s something like expanding the set of atoms that you have to do.

CC @MiloszGrabski

After re-running the code with the updated BasicMolGANModel from molgan.py, I no longer got this error.
