Validating the MoleculeNet paper results in DeepChem

Thread for validating any of the results in ‘MoleculeNet: A Benchmark for Molecular
Machine Learning’ (see https://arxiv.org/pdf/1703.00564.pdf).

Please post links to any validation scripts or models that reproduce the results in the MoleculeNet paper, so that new researchers can use them to quickly check that their setup reproduces the results.

I cannot validate the QM8 results.

The dataset seems to have an error involving duplicated data. As a result, I get an MAE of around 0.039–0.040 when using the MultitaskRegressor model on data loaded via load_qm8.

If anyone manages to fix this please do let me know.
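A quick way to confirm the duplicated-data suspicion is to count repeated identifiers in the loaded dataset. The sketch below assumes DeepChem's dataset API (a `load_qm8` call returning datasets whose `.ids` attribute lists molecule identifiers); the duplicate-counting logic itself is plain Python, shown here on an illustrative stand-in list:

```python
from collections import Counter

def find_duplicates(ids):
    """Return a mapping of identifier -> count for identifiers seen more than once."""
    counts = Counter(ids)
    return {mol_id: n for mol_id, n in counts.items() if n > 1}

# With DeepChem one would obtain the real ids roughly like this (not run here):
#   import deepchem as dc
#   tasks, (train, valid, test), transformers = dc.molnet.load_qm8()
#   print(find_duplicates(train.ids))
#
# Illustrative stand-in for dataset.ids:
example_ids = ["gdb_1", "gdb_2", "gdb_2", "gdb_3"]
print(find_duplicates(example_ids))  # {'gdb_2': 2}
```

If the returned dict is non-empty for the real QM8 split, the duplicates would need to be removed (or the loader fixed) before the reported MAE can be compared fairly against the paper.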

We are working to create a modern set of scripts for running the MoleculeNet benchmarks at https://github.com/deepchem/moleculenet. We have not yet added QM8 to these scripts, but I’d recommend using them as a template to get numbers. Precise hyperparameters for the old benchmarks from the 2018 paper are available at https://github.com/deepchem/deepchem/blob/2.3.0/deepchem/molnet/preset_hyper_parameters.py. Note that considerable changes have been made to TensorFlow since the original benchmarks were run, so there is no guarantee that these hyperparameters will reproduce the exact results. Our goal with the new MoleculeNet repo is to provide an up-to-date way to get maintained benchmark numbers, but as you can see, the repo is still a work in progress.
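The preset file linked above stores hyperparameters as a plain Python dict keyed by model name, so reusing it mostly means merging those presets with the shape of your dataset. The sketch below mirrors that style; the key names and values here are illustrative assumptions, not the published 2018 settings:

```python
# Illustrative preset dict in the style of preset_hyper_parameters.py.
# Values are placeholders, NOT the actual benchmark hyperparameters.
hps = {
    "tf_regression": {
        "layer_sizes": [1000, 1000],
        "dropouts": 0.25,
        "learning_rate": 0.001,
        "batch_size": 128,
    },
}

def model_kwargs(model_name, n_tasks, n_features):
    """Merge dataset shape information into the preset hyperparameters."""
    params = dict(hps[model_name])
    params.update(n_tasks=n_tasks, n_features=n_features)
    return params

kwargs = model_kwargs("tf_regression", n_tasks=12, n_features=1024)
print(kwargs["layer_sizes"])  # [1000, 1000]
# The resulting kwargs could then be unpacked into a model constructor,
# e.g. dc.models.MultitaskRegressor(**kwargs) in DeepChem (not run here).
```

Keeping this merge step in one helper makes it easy to sweep over the preset dict when checking several benchmark models against the paper's numbers.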

OK thank you, this looks good. I’ll have a go with these parameters and models and see if I can replicate the results.

Thanks Bharath.

It should be relatively straightforward to bypass the TensorFlow update issue by running verifications with the same version of TensorFlow as the original benchmarks. From looking at the release Bharath linked, that code is pinned to TensorFlow 1.14.0 (based on the install_deepchem_conda script).
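For anyone attempting that, a minimal environment pin might look like the following. This is a sketch: the Python and package versions are assumptions inferred from the 2.3.0 release scripts, not verified here.

```shell
# Create an isolated environment matching the 2.3.0-era stack (versions assumed).
conda create -y -n deepchem-2.3.0 python=3.6
conda activate deepchem-2.3.0

# Pin TensorFlow to the version named in the install_deepchem_conda script,
# then install the matching DeepChem release.
pip install tensorflow==1.14.0
pip install deepchem==2.3.0
```

Running the old preset hyperparameters inside this environment removes the TensorFlow version as a variable when comparing against the paper's numbers.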