Hello All,
I am trying to load a dataset from a pandas DataFrame. Loading works fine, but when I then run a scaffold split on the dataset, I get these errors:
[10:24:13] SMILES Parse Error: syntax error while parsing: CHEMBL132806
[10:24:13] SMILES Parse Error: Failed parsing SMILES 'CHEMBL132806' for input: 'CHEMBL132806'
This is the code I am using:
import deepchem as dc

# df_adn_simple is my pandas DataFrame with the three columns named below
dc_dataset = dc.data.DiskDataset.from_dataframe(df_adn_simple,
                                                X="canonical_smiles",
                                                y="pKI",
                                                ids="molecule_chembl_id")
splitter_scaffold = dc.splits.ScaffoldSplitter()
scaffold_train_dataset_1, scaffold_test_dataset_1 = splitter_scaffold.train_test_split(dc_dataset)
If I swap the X and ids fields (SMILES as ids, ChEMBL IDs as X), the split works, but then the field names obviously no longer match what they hold.
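For reference, here is a minimal sketch of that swap. It assumes the splitter reads its SMILES from the ids field, which is only my guess from the error message, since the string it fails to parse is a ChEMBL ID. The helper names build_id_lookup and scaffold_split_with_swap are my own and not part of DeepChem.

```python
def build_id_lookup(smiles, chembl_ids):
    """Map each SMILES string to its ChEMBL ID.

    Assumes SMILES are unique; with duplicates, the last ID seen wins.
    """
    return dict(zip(smiles, chembl_ids))


def scaffold_split_with_swap(df,
                             smiles_col="canonical_smiles",
                             label_col="pKI",
                             id_col="molecule_chembl_id"):
    """Scaffold-split with SMILES in `ids`, then recover ChEMBL IDs per split."""
    # Imported lazily so build_id_lookup can be used without DeepChem installed.
    import deepchem as dc

    # The swap from my post: ChEMBL IDs go in X, SMILES go in ids.
    dataset = dc.data.DiskDataset.from_dataframe(df,
                                                 X=id_col,
                                                 y=label_col,
                                                 ids=smiles_col)
    splitter = dc.splits.ScaffoldSplitter()
    train, test = splitter.train_test_split(dataset)

    # Translate the SMILES stored in `ids` back to ChEMBL IDs for each split.
    lookup = build_id_lookup(df[smiles_col], df[id_col])
    train_chembl_ids = [lookup[s] for s in train.ids]
    test_chembl_ids = [lookup[s] for s in test.ids]
    return train, test, train_chembl_ids, test_chembl_ids
```

Calling scaffold_split_with_swap(df_adn_simple) would give me the two splits plus the ChEMBL IDs for each, but X still holds IDs rather than features, so this feels like a workaround rather than the intended usage.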
Any help would be appreciated!