autointent.context.data_handler.StratifiedSplitter
- class autointent.context.data_handler.StratifiedSplitter(test_size, label_feature, random_seed, shuffle=True)
A class for stratified splitting of datasets.
This class provides methods to split a dataset into training and testing subsets while preserving the distribution of target labels. It supports both single-label and multi-label datasets.
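The idea behind stratified splitting can be illustrated with a small, self-contained sketch. It is not the AutoIntent implementation. It assumes single-label data given as a plain Python list of labels, rather than a `datasets.Dataset` column selected by `label_feature`. It also assumes that out-of-scope (OOS) samples are marked with the label `None`, and that OOS samples go entirely to the test split when `allow_oos_in_train` is False. The helper name `stratified_split` is hypothetical. Multi-label stratification is not covered here.

```python
import random
from collections import defaultdict

OOS = None  # assumption: out-of-scope samples carry no label


def stratified_split(labels, test_size, random_seed, shuffle=True, allow_oos_in_train=None):
    """Return (train_indices, test_indices) that preserve per-label proportions.

    Hypothetical helper for illustration only; not AutoIntent's code.
    """
    has_oos = any(label is OOS for label in labels)
    if has_oos and allow_oos_in_train is None:
        # Mirrors the documented behavior: OOS present but no explicit choice made.
        raise ValueError("OOS samples found; set allow_oos_in_train explicitly")

    # Group sample indices by label so that each class is split separately.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    rng = random.Random(random_seed)  # seeded RNG makes the split reproducible
    train, test = [], []
    for label, indices in groups.items():
        if shuffle:
            rng.shuffle(indices)
        if label is OOS and not allow_oos_in_train:
            # assumption: OOS samples that may not enter train all go to test
            test.extend(indices)
            continue
        # Take the same fraction of every class for the test split.
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels, test_size=0.25, random_seed=0)
print([labels[i] for i in test_idx])  # two "a" and one "b", matching the 2:1 ratio
```

Each class is split on its own, so the label ratio in the test split follows the ratio in the full dataset. With very small classes, per-class rounding can make the overall test fraction drift slightly from `test_size`.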
- test_size
- label_feature
- random_seed
- shuffle = True
- __call__(dataset, multilabel, allow_oos_in_train=None)
Split the dataset into training and testing subsets.
- Parameters:
dataset (datasets.Dataset) – The input dataset to be split.
multilabel (bool) – Whether the dataset is multi-label.
allow_oos_in_train (bool | None) – Set to True to include out-of-scope (OOS) utterances in the train split.
- Returns:
A tuple containing the training and testing datasets.
- Raises:
ValueError – If OOS samples are present but allow_oos_in_train is not specified.
- Return type: