With the normalization transformer, one can choose whether or not to pass the dataset at initialization:
transformer = dc.trans.NormalizationTransformer(transform_y=True, dataset=dataset)
The dataset is then used to compute X_means and X_stds, or y_means and y_stds, depending on which quantity is being transformed:
if dataset is not None and transform_X:
    X_means, X_stds = dataset.get_statistics(X_stats=True, y_stats=False)
    self.X_means = X_means
    self.X_stds = X_stds
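For reference, the statistics in question are just per-feature means and standard deviations. A pure-NumPy sketch (not the actual DeepChem implementation) of what these values look like:

```python
import numpy as np

# Toy feature matrix: 4 samples, 3 features.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])

# Per-column statistics, analogous to what get_statistics returns.
X_means = X.mean(axis=0)
X_stds = X.std(axis=0)

print(X_means)  # [2.5 5.  7.5]
```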
From the source code, however, it seems that while these statistics are not computed when no dataset is passed to __init__(), the transform_array method still looks for them:
if self.transform_X:
    if not hasattr(self, 'move_mean') or self.move_mean:
        X = np.nan_to_num((X - self.X_means) / self.X_stds)
This seems to be a bug: if no dataset was ever passed, self.X_means and self.X_stds do not exist and the transform fails. A possible workaround is to set these attributes manually, but that is cumbersome.
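To illustrate the manual workaround, here is a minimal sketch in plain NumPy. The stand-in class below is hypothetical and only mimics the transform_array logic quoted above; it is not the real NormalizationTransformer:

```python
import numpy as np

class StandInTransformer:
    """Hypothetical stand-in mimicking the relevant part of
    NormalizationTransformer, for illustration only."""

    def __init__(self, transform_X=True):
        self.transform_X = transform_X
        # X_means / X_stds are deliberately NOT set here when no dataset
        # is given, mirroring the behavior described above.

    def transform_array(self, X):
        if self.transform_X:
            # Raises AttributeError if X_means / X_stds were never set.
            return np.nan_to_num((X - self.X_means) / self.X_stds)
        return X

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
t = StandInTransformer()

# Workaround: set the statistics manually before transforming.
t.X_means = X.mean(axis=0)
t.X_stds = X.std(axis=0)
Xt = t.transform_array(X)
print(Xt)  # columns now have zero mean and unit variance
```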