Splitting dataset

Hi All:

I’m not sure if this is the right forum to ask this question so please delete it if it’s inappropriate.

I’m currently working on some experimental data and I would like to split it into train, valid, and test sets.

My code looks something like this:

from deepchem.splits.splitters import RandomSplitter
split_dataset = RandomSplitter()
train, val, test = split_dataset.split(dataset=balanced_dataset)

This returns three arrays of indices corresponding to an 80/10/10 split.

How do I use these indices to split my DiskDataset according to the specified arrays?

Try using split_dataset.train_valid_test_split() instead of split_dataset.split(). It returns the three datasets directly rather than arrays of indices. See https://deepchem.readthedocs.io/en/latest/api_reference/splitters.html#deepchem.splits.RandomSplitter.train_valid_test_split
