autointent.context.data_handler.StratifiedSplitter

class autointent.context.data_handler.StratifiedSplitter(test_size, label_feature, random_seed, shuffle=True)

A class for stratified splitting of datasets.

Calling an instance splits a dataset into training and testing subsets while preserving the distribution of target labels in both. Both single-label and multi-label datasets are supported.
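To illustrate the idea behind stratified splitting (this is not the library's implementation), the pure-Python sketch below splits each label group separately, so every label keeps roughly the same share in both subsets. The helper name `stratified_split`, the list-of-dicts input, and the grouping of multi-label vectors by their full label combination (a label-powerset simplification) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_size, random_seed=None, shuffle=True):
    """Hypothetical helper: split rows into (train, test), stratified by label.

    rows[i][label_key] is an int (single-label) or a list of 0/1 flags
    (multi-label). Multi-label rows are grouped by their whole label
    combination, which is a simplification, not iterative stratification.
    """
    rng = random.Random(random_seed)
    groups = defaultdict(list)
    for row in rows:
        label = row[label_key]
        # Lists are unhashable, so multi-label vectors are keyed by a tuple.
        key = tuple(label) if isinstance(label, list) else label
        groups[key].append(row)

    train, test = [], []
    for members in groups.values():  # insertion order keeps output deterministic
        if shuffle:
            rng.shuffle(members)
        # Each group contributes the same fraction of its rows to the test split.
        n_test = round(len(members) * test_size)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


rows = [{"text": f"a{i}", "label": 0} for i in range(8)]
rows += [{"text": f"b{i}", "label": 1} for i in range(2)]
train, test = stratified_split(rows, "label", test_size=0.5, random_seed=0)
```

With `test_size=0.5`, the label with 8 examples contributes 4 rows to the test split and the label with 2 examples contributes 1, so the minority label appears in both subsets.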

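Parameters:
  • test_size (float) – Proportion of the dataset to include in the test split.

  • label_feature (str) – Name of the dataset feature (column) that holds the labels used for stratification.

  • random_seed (int | None) – Seed for the random number generator, for reproducible splits.

  • shuffle (bool) – Whether to shuffle the data before splitting. Defaults to True.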

test_size
label_feature
random_seed
shuffle = True
__call__(dataset, multilabel, allow_oos_in_train=None)

Split the dataset into training and testing subsets.

Parameters:
  • dataset (datasets.Dataset) – The input dataset to be split.

  • multilabel (bool) – Whether the dataset is multi-label.

  • allow_oos_in_train (bool | None) – Set to True to allow out-of-scope (OOS) utterances in the training split.

Returns:

A tuple containing the training and testing datasets.

Raises:

ValueError – If the dataset contains OOS samples but allow_oos_in_train is not specified (left as None).

Return type:

tuple[datasets.Dataset, datasets.Dataset]
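The documented OOS contract can be sketched as below. This is a hypothetical helper, not part of autointent: it assumes OOS utterances are rows whose label is None, takes any in-scope splitting function as `split_fn`, and assumes that with `allow_oos_in_train=False` all OOS rows are routed to the test split.

```python
def split_with_oos(rows, label_key, split_fn, allow_oos_in_train=None):
    """Hypothetical sketch of OOS handling around a (train, test) split.

    Assumptions: OOS rows have label None; split_fn(rows) -> (train, test).
    """
    oos = [row for row in rows if row[label_key] is None]
    if not oos:
        # Nothing is out of scope, so allow_oos_in_train does not matter.
        return split_fn(rows)
    if allow_oos_in_train is None:
        # Mirrors the documented Raises entry.
        raise ValueError(
            "Dataset contains out-of-scope samples; set allow_oos_in_train explicitly."
        )
    if allow_oos_in_train:
        # OOS rows take part in the split like any other rows.
        return split_fn(rows)
    # Assumed policy: keep OOS rows out of train by sending them all to test.
    in_scope = [row for row in rows if row[label_key] is not None]
    train, test = split_fn(in_scope)
    return train, list(test) + oos
```

Under this assumed policy, passing `allow_oos_in_train=False` yields a training split with no OOS rows, while leaving it as None on a dataset with OOS rows raises the ValueError described above.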