Problem with metric evaluation?

Hi all, and huge thanks to the authors of DeepChem!
I can't understand why the outputs of the several code variants below are different. Here is a basic example of a GCN fit on the Delaney (ESOL) dataset:

!pip install deepchem
import deepchem as dc
import tensorflow as tf
import pandas as pd

from deepchem.feat import ConvMolFeaturizer
from sklearn.metrics import mean_squared_error

# Load Delaney with DeepChem's MoleculeNet loader
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='GraphConv', split='random')
train_dataset, valid_dataset, test_dataset = datasets

model = dc.models.GraphConvModel(n_tasks=1, batch_size=128, mode='regression', dropout=0.2)
model.fit(train_dataset, nb_epoch=200)

metric = dc.metrics.Metric(dc.metrics.rms_score)
train_score = model.evaluate(train_dataset, [metric], transformers)
valid_score = model.evaluate(valid_dataset, [metric], transformers)
test_score = model.evaluate(test_dataset, [metric], transformers)
print('Training set score:', train_score)
print('Validation set score:', valid_score)
print('Test set score:', test_score)

The results are close to the original paper, except for the training set (maybe it needs more epochs?):
Training set score: {'rms_score': 0.9388395539670715}
Validation set score: {'rms_score': 1.049190558227201}
Test set score: {'rms_score': 1.19915950432068}
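
As I understand it, passing transformers to evaluate makes it undo the y-normalization before scoring, so the call above should be roughly equivalent to this sketch (my reading of the API, not necessarily the exact internals):

# Roughly what evaluate does with transformers, as I understand it:
# undo the y-transforms on both predictions and labels, then score.
y_pred = dc.trans.undo_transforms(model.predict(test_dataset), transformers)
y_true = dc.trans.undo_transforms(test_dataset.y, transformers)
print('Test rms:', dc.metrics.rms_score(y_true, y_pred))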

Now I use code that should do exactly the same thing:

!mkdir data
!wget https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/delaney-processed.csv -O data/delaney.csv
target_df = pd.read_csv('data/delaney.csv')
smiles = target_df['smiles'].values.astype(str)
target_values = target_df['measured log solubility in mols per litre'].values.astype('float64')
featurizer = ConvMolFeaturizer()
X = featurizer.featurize(smiles)
dataset = dc.data.DiskDataset.from_numpy(X=X, y=target_values, ids=smiles, tasks=['log solubility (mol/L)'])
splitter = dc.splits.RandomSplitter()
train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(dataset, frac_train=0.8, frac_valid=0.1, frac_test=0.1, seed=0)
transformers = [
  dc.trans.NormalizationTransformer(transform_y=True, dataset=train_dataset)
]
for dataset in [train_dataset, valid_dataset, test_dataset]:
  for transformer in transformers:
    dataset = transformer.transform(dataset)

model = dc.models.GraphConvModel(n_tasks=1, batch_size=128, mode='regression', dropout=0.2)
model.fit(train_dataset, nb_epoch=200)
metric = dc.metrics.Metric(dc.metrics.rms_score)
train_score = model.evaluate(train_dataset, [metric], transformers)
valid_score = model.evaluate(valid_dataset, [metric], transformers)
test_score = model.evaluate(test_dataset, [metric], transformers)
print('Training set score:', train_score)
print('Validation set score:', valid_score)
print('Test set score:', test_score)

And the results are different:
Training set score: {'rms_score': 1.973609741740122}
Validation set score: {'rms_score': 2.1925210282279806}
Test set score: {'rms_score': 2.137958534792295}

When I evaluate this metric another way, it looks like the results are the same:

# Manual check with sklearn: compare predictions against dataset.y directly,
# without undoing any transformers
pred_train = model.predict(train_dataset)
pred_test = model.predict(test_dataset)
pred_val = model.predict(valid_dataset)
model_test_mse = mean_squared_error(test_dataset.y, pred_test, squared=True)
model_train_mse = mean_squared_error(train_dataset.y, pred_train, squared=True)
model_val_mse = mean_squared_error(valid_dataset.y, pred_val, squared=True)
print('Training set score:', model_train_mse ** 0.5)
print('Validation set score:', model_val_mse ** 0.5)
print('Test set score:', model_test_mse ** 0.5)

gives:
Training set score: 0.9395447458414238
Validation set score: 1.0437583246097768
Test set score: 1.0177836506555855

So, the question is: why is the output of the second code different?
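
A check that might help narrow this down (just printing label statistics, on the assumption that NormalizationTransformer should leave y roughly zero-mean and unit-std if it was actually applied):

# If the transform was applied, train y should be ~zero-mean / unit-std;
# if not, it will still be on the raw log-solubility scale.
import numpy as np
print('train y mean/std:', np.mean(train_dataset.y), np.std(train_dataset.y))
print('valid y mean/std:', np.mean(valid_dataset.y), np.std(valid_dataset.y))
print('test y mean/std:', np.mean(test_dataset.y), np.std(test_dataset.y))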

I ran it all in Colab; here is a link to the notebook:
https://colab.research.google.com/drive/11C3lk9QPZ5pNtmIx1NNnhWCe4CnNdxcR?usp=sharing


I've got the same problem. Is there any explanation yet?

My apologies for the slow response on this thread, folks! I lost track of it. Would one of you be able to come by office hours (MWF at 9 am PST)? I can talk through this issue with you both there. I suspect this has something to do with the transformers not being passed around properly, but we can discuss.
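
For reference, here is a minimal sketch of what I mean by "passed around properly", assuming the transform loop in the second snippet is the culprit: transform returns a new dataset rather than modifying in place, so the loop variable dataset gets rebound but train_dataset, valid_dataset, and test_dataset keep pointing at the untransformed data. Something like this would rebind the original names:

# Sketch of the suspected fix: keep the transformed datasets and rebind the names.
# transformer.transform() returns a new dataset; it does not modify in place.
datasets = [train_dataset, valid_dataset, test_dataset]
for i, d in enumerate(datasets):
  for transformer in transformers:
    d = transformer.transform(d)
  datasets[i] = d
train_dataset, valid_dataset, test_dataset = datasets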