How to migrate from TensorFlow to PyTorch?

I looked at this tutorial:

https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/The_Basic_Tools_of_the_Deep_Life_Sciences.ipynb

Then I change
!pip install --pre deepchem[tensorflow]
to
!pip install --pre deepchem
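(If PyTorch itself isn't already installed in the environment, DeepChem also ships a torch extra that pulls it in. This is an assumption based on DeepChem's packaging, not a step from the tutorial:)

!pip install --pre deepchem[torch]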

Then I change
model = dc.models.GraphConvModel(n_tasks=1, mode='regression', dropout=0.2, batch_normalize=False)
to
num_features = train_dataset.X[0].get_atom_features().shape[1]
model = dc.models.torch_models.GraphConvModel(n_tasks=len(tasks), mode='regression', dropout=0.2, number_input_features=[num_features, 64], batch_normalize=False)

Then I run model.fit and get a very bad result.

Where am I wrong?

Would you mind at least sharing what “bad result” you had?


It works correctly now. Apparently, I did something wrong a month ago.


import deepchem as dc
import numpy as np

# Load your dataset (replace with your actual data loading)
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv', split='index')
train_dataset, valid_dataset, test_dataset = datasets

# Get the number of atom features
num_features = train_dataset.X[0].get_atom_features().shape[1]
print(f"Number of atom features: {num_features}")

# Initialize the GraphConvModel (PyTorch version)
model = dc.models.torch_models.GraphConvModel(
    n_tasks=len(tasks),
    mode='regression',  # or 'classification'
    dropout=0.2,
    number_input_features=num_features,  # pass the integer directly
    batch_normalize=False
)

# Train the model
model.fit(train_dataset, nb_epoch=10)  # adjust nb_epoch as needed

# Evaluate the model
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score, task_averager=np.mean)
train_scores = model.evaluate(train_dataset, [metric], transformers)
valid_scores = model.evaluate(valid_dataset, [metric], transformers)

print(f"Train scores: {train_scores}")
print(f"Validation scores: {valid_scores}")

Try it.


I'm continuing. Now I'm trying to convert this notebook:

https://github.com/dbetm/DeepLearningLifeSciences/blob/main/06_Genomics/Predicting_transcription_factor_binding.ipynb

Changed code:

import deepchem as dc
import torch
import numpy as np
from torchsummary import summary

pytorch_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Unflatten(1, (4, 101)),
    torch.nn.Conv1d(in_channels=4, out_channels=15, kernel_size=10, padding='same'),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Conv1d(in_channels=15, out_channels=15, kernel_size=10, padding='same'),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Conv1d(in_channels=15, out_channels=15, kernel_size=10, padding='same'),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Flatten(),
    torch.nn.Linear(1515, 1),
    torch.nn.Sigmoid()
)

print(summary(pytorch_model, (101,4)))

model = dc.models.TorchModel(
    pytorch_model,
    loss=dc.models.losses.SigmoidCrossEntropy(),
    output_types=['prediction', 'loss'],
    batch_size=1000,
    model_dir='pt')

train = dc.data.DiskDataset('train_dataset')
valid = dc.data.DiskDataset('valid_dataset')

metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
train2 = dc.data.DiskDataset.from_numpy(train.X.astype(np.float64), train.y.astype(np.float64), train.w.astype(np.float64))
valid2 = dc.data.DiskDataset.from_numpy(valid.X.astype(np.float64), valid.y.astype(np.float64), valid.w.astype(np.float64))
for i in range(20):
    model.fit(train2, nb_epoch=10)
    print(model.evaluate(train2, [metric]))
    print(model.evaluate(valid2, [metric]))

Error:

IndexError                                Traceback (most recent call last)
in <cell line: 0>()
      6 print(train2)
      7 for i in range(20):
----> 8     model.fit(train2, nb_epoch=10)
      9     print(model.evaluate(train2, [metric]))
     10     print(model.evaluate(valid2, [metric]))

2 frames
/usr/local/lib/python3.11/dist-packages/deepchem/models/torch_models/torch_model.py in <listcomp>(.0)
    435                 outputs = [outputs]
    436             if self._loss_outputs is not None:
--> 437                 outputs = [outputs[i] for i in self._loss_outputs]
    438             batch_loss = loss(outputs, labels, weights)
    439             batch_loss.backward()

IndexError: list index out of range

Where am I wrong?
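A note on the likely cause, judging from the traceback: output_types=['prediction', 'loss'] tells TorchModel to expect two outputs from the wrapped module, but the Sequential above returns only one tensor, so indexing the second output for the loss fails with IndexError. The usual DeepChem pattern is to have the module return both the sigmoid prediction and the raw logits, since SigmoidCrossEntropy expects logits. A minimal sketch of that pattern, reusing the layers above (the class name BindingModel is just an illustrative choice):

import deepchem as dc
import torch

class BindingModel(torch.nn.Module):
    """Wraps the conv stack so forward() returns (prediction, logits)."""
    def __init__(self):
        super().__init__()
        # Same stack as above, but stopping at the logits:
        # the sigmoid is applied separately in forward().
        self.body = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Unflatten(1, (4, 101)),
            torch.nn.Conv1d(in_channels=4, out_channels=15, kernel_size=10, padding='same'),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.5),
            torch.nn.Conv1d(in_channels=15, out_channels=15, kernel_size=10, padding='same'),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.5),
            torch.nn.Conv1d(in_channels=15, out_channels=15, kernel_size=10, padding='same'),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.5),
            torch.nn.Flatten(),
            torch.nn.Linear(1515, 1))

    def forward(self, x):
        logits = self.body(x)
        # First output matches 'prediction', second matches 'loss'.
        return torch.sigmoid(logits), logits

model = dc.models.TorchModel(
    BindingModel(),
    loss=dc.models.losses.SigmoidCrossEntropy(),
    output_types=['prediction', 'loss'],
    batch_size=1000,
    model_dir='pt')

This is only a sketch under the assumption that the inputs are (batch, 101, 4) one-hot sequences as in the original notebook.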