Hello,
I am new to Python and DeepChem. I need to do a scaffold split on my own dataset (to evaluate ROCS scaffold hopping).
I ran the example and was able to get a split, but I don't know how to write the output to CSV files. Any help is appreciated.
This is what I have done so far with the example dataset (example.csv) provided with DeepChem:
import deepchem as dc
import pandas as pd
import os

# Resolve example.csv relative to the current working directory
current_dir = os.path.dirname(os.path.realpath('__file__'))
input_data = os.path.join(current_dir, './example.csv')

tasks = ['log-solubility']
featurizer = dc.feat.CircularFingerprint(size=1024)
loader = dc.data.CSVLoader(tasks=tasks, smiles_field="smiles", featurizer=featurizer)
dataset = loader.featurize(input_data)

# The splitter is constructed without arguments and applied to the featurized dataset
splitter = dc.splits.ScaffoldSplitter()
train_dataset, test_dataset = splitter.train_test_split(dataset)
print(len(train_dataset), len(test_dataset))
The output I am trying to get is two CSV files, train_dataset and test_dataset, ideally including the Compound ID column.
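One possible way to get there, as a rough sketch rather than a confirmed DeepChem recipe: filter the rows of the original CSV by the SMILES strings in each split, so every original column (including the compound ID) carries over. This assumes that train_dataset.ids and test_dataset.ids hold the SMILES strings, which is the loader's default when no separate ID column is configured, and that the SMILES column is named "smiles". The helper name write_split_csv is hypothetical and uses only the standard csv module.

```python
import csv

def write_split_csv(input_csv, split_ids, output_csv, smiles_column="smiles"):
    """Copy rows of input_csv whose SMILES appear in split_ids to output_csv.

    Hypothetical helper: split_ids is assumed to be an iterable of SMILES
    strings, e.g. train_dataset.ids from the split above.
    """
    wanted = {str(s) for s in split_ids}
    n_written = 0
    with open(input_csv, newline="") as src, open(output_csv, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Reuse the original header so all columns are preserved
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[smiles_column] in wanted:
                writer.writerow(row)
                n_written += 1
    return n_written

# Intended usage with the script above (commented out so this block runs on its own):
# write_split_csv(input_data, train_dataset.ids, "train_dataset.csv")
# write_split_csv(input_data, test_dataset.ids, "test_dataset.csv")
```

Rows that the loader dropped during featurization would not appear in either output file, so the row counts may be smaller than in the input CSV.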
Thanks
Mohamed