autointent.context.data_handler.DataHandler

class autointent.context.data_handler.DataHandler(dataset, force_multilabel=False, random_seed=0)

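Data handler class. Wraps a dataset and exposes its training, validation, test, and out-of-scope splits, along with its intent metadata.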

Parameters:

dataset
force_multilabel – Defaults to False.
random_seed – Defaults to 0.

Attributes:

dataset
n_classes
regexp_patterns
intent_descriptions
tags

property multilabel: bool

Check if the dataset is multilabel.

Returns:

True if the dataset is multilabel, False otherwise.

Return type:

bool
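
The sketch below illustrates one way calling code might branch on `multilabel` when processing labels. It does not use AutoIntent itself: `FakeHandler` is a hypothetical stand-in that only mimics the documented names (`multilabel`, `n_classes`, `train_labels`). The label shapes are also an assumption made for illustration: integer class ids in the single-label case, and 0/1 indicator lists of length `n_classes` in the multilabel case.

```python
from __future__ import annotations


class FakeHandler:
    """Hypothetical stand-in that mimics a few documented DataHandler names."""

    def __init__(self, labels: list, n_classes: int, multilabel: bool) -> None:
        self._labels = labels
        self.n_classes = n_classes
        # A plain attribute in this stub; a read-only property in DataHandler.
        self.multilabel = multilabel

    def train_labels(self, idx: int | None = None) -> list:
        # The stub holds a single split, so idx is ignored.
        return self._labels


def class_counts(handler) -> list[int]:
    """Count training examples per class for either label layout (assumed)."""
    counts = [0] * handler.n_classes
    for label in handler.train_labels():
        if handler.multilabel:
            # Assumed layout: one 0/1 flag per class.
            for class_id, flag in enumerate(label):
                counts[class_id] += flag
        else:
            # Assumed layout: a single integer class id.
            counts[label] += 1
    return counts


single = FakeHandler([0, 2, 2], n_classes=3, multilabel=False)
multi = FakeHandler([[1, 0, 1], [0, 0, 1]], n_classes=3, multilabel=True)

single_counts = class_counts(single)
multi_counts = class_counts(multi)
```

With a real handler, the actual element type is whatever `autointent.custom_types.LabelType` defines, so the two branches may need adjusting.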

train_utterances(idx=None)

Retrieve training utterances from the dataset.

If a specific training split index is provided, retrieves utterances from the indexed training split. Otherwise, retrieves utterances from the primary training split.

Parameters:

idx (int | None) – Optional index for a specific training split.

Returns:

List of training utterances.

Return type:

list[str]

train_labels(idx=None)

Retrieve training labels from the dataset.

If a specific training split index is provided, retrieves labels from the indexed training split. Otherwise, retrieves labels from the primary training split.

Parameters:

idx (int | None) – Optional index for a specific training split.

Returns:

List of training labels.

Return type:

list[autointent.custom_types.LabelType]
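
As a rough illustration of how `train_utterances` and `train_labels` fit together, the sketch below pairs each utterance with its label. To stay runnable without AutoIntent installed, it uses a hypothetical stand-in class, `FakeHandler`, that only mimics the two documented method signatures. The split contents, and the assumption that the two lists are index-aligned, are illustrative and not taken from the library.

```python
from __future__ import annotations

from typing import Any


class FakeHandler:
    """Hypothetical stand-in mimicking two documented DataHandler methods."""

    def __init__(self) -> None:
        # Key None plays the role of the primary training split;
        # integer keys play the role of indexed training splits.
        self._utterances = {
            None: ["book a flight", "play some jazz", "cancel my ticket"],
            0: ["book a flight", "play some jazz"],
            1: ["cancel my ticket"],
        }
        self._labels = {None: [0, 1, 0], 0: [0, 1], 1: [0]}

    def train_utterances(self, idx: int | None = None) -> list[str]:
        return self._utterances[idx]

    def train_labels(self, idx: int | None = None) -> list[Any]:
        return self._labels[idx]


def training_pairs(handler: Any, idx: int | None = None) -> list[tuple[str, Any]]:
    """Zip utterances with labels, assuming the two lists are index-aligned."""
    utterances = handler.train_utterances(idx)
    labels = handler.train_labels(idx)
    if len(utterances) != len(labels):
        raise ValueError("utterances and labels differ in length")
    return list(zip(utterances, labels))


handler = FakeHandler()
primary_pairs = training_pairs(handler)     # idx=None: primary split
indexed_pairs = training_pairs(handler, 1)  # idx=1: an indexed split
```

The same pattern would apply to the validation and test accessors, since they share the `idx=None` signature.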

validation_utterances(idx=None)

Retrieve validation utterances from the dataset.

If a specific validation split index is provided, retrieves utterances from the indexed validation split. Otherwise, retrieves utterances from the primary validation split.

Parameters:

idx (int | None) – Optional index for a specific validation split.

Returns:

List of validation utterances.

Return type:

list[str]

validation_labels(idx=None)

Retrieve validation labels from the dataset.

If a specific validation split index is provided, retrieves labels from the indexed validation split. Otherwise, retrieves labels from the primary validation split.

Parameters:

idx (int | None) – Optional index for a specific validation split.

Returns:

List of validation labels.

Return type:

list[autointent.custom_types.LabelType]

test_utterances(idx=None)

Retrieve test utterances from the dataset.

If a specific test split index is provided, retrieves utterances from the indexed test split. Otherwise, retrieves utterances from the primary test split.

Parameters:

idx (int | None) – Optional index for a specific test split.

Returns:

List of test utterances.

Return type:

list[str]

test_labels(idx=None)

Retrieve test labels from the dataset.

If a specific test split index is provided, retrieves labels from the indexed test split. Otherwise, retrieves labels from the primary test split.

Parameters:

idx (int | None) – Optional index for a specific test split.

Returns:

List of test labels.

Return type:

list[autointent.custom_types.LabelType]

oos_utterances(idx=None)

Retrieve out-of-scope (OOS) utterances from the dataset.

If the dataset contains out-of-scope samples, retrieves the utterances from the specified OOS split index (if provided) or the primary OOS split. Returns an empty list if no OOS samples are available in the dataset.

Parameters:

idx (int | None) – Optional index for a specific OOS split.

Returns:

List of out-of-scope utterances, or an empty list if unavailable.

Return type:

list[str]

has_oos_samples()

Check if there are out-of-scope samples.

Returns:

True if there are out-of-scope samples, False otherwise.

Return type:

bool
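
The following sketch shows how `has_oos_samples` and `oos_utterances` might be used together. `FakeHandler` is a hypothetical stand-in, not the real class. It only reproduces the documented behaviour that `oos_utterances` returns an empty list when no out-of-scope samples exist.

```python
from __future__ import annotations


class FakeHandler:
    """Hypothetical stand-in for the documented OOS-related methods."""

    def __init__(self, oos: list[str]) -> None:
        self._oos = oos

    def has_oos_samples(self) -> bool:
        return len(self._oos) > 0

    def oos_utterances(self, idx: int | None = None) -> list[str]:
        # The stub holds a single OOS split, so idx is ignored.
        return list(self._oos)


def evaluation_inputs(handler, in_scope: list[str]) -> list[str]:
    """Append OOS utterances to the in-scope ones when any are available."""
    if not handler.has_oos_samples():
        return list(in_scope)
    return list(in_scope) + handler.oos_utterances()


with_oos = FakeHandler(["what is the meaning of life"])
without_oos = FakeHandler([])

mixed = evaluation_inputs(with_oos, ["book a flight"])
plain = evaluation_inputs(without_oos, ["book a flight"])
```

Because `oos_utterances` is documented to return an empty list when no samples are available, the explicit `has_oos_samples` check is mostly a readability choice in this sketch.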

dump(filepath)

Save the dataset splits and intents to a JSON file.

Parameters:

filepath (str | pathlib.Path) – The path to the file where the JSON data will be saved.

Return type:

None
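
A minimal sketch of calling `dump` with either a `pathlib.Path` or a plain string follows. `FakeHandler` and the JSON payload it writes are hypothetical. This page does not describe the layout of the file the real method produces, so the example only checks that the file can be read back as JSON.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class FakeHandler:
    """Hypothetical stand-in; the real JSON layout is not documented here."""

    def dump(self, filepath: str | Path) -> None:
        # Made-up payload, used only to have something to write.
        payload = {"train": [{"utterance": "book a flight", "label": 0}]}
        with Path(filepath).open("w", encoding="utf-8") as file:
            json.dump(payload, file)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "dataset.json"
    FakeHandler().dump(target)       # a pathlib.Path is accepted
    FakeHandler().dump(str(target))  # so is a plain string
    loaded = json.loads(target.read_text(encoding="utf-8"))
```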