autointent.Dataset

class autointent.Dataset(*args, intents, **kwargs)

Bases: dict[str, datasets.Dataset]

Represents a dataset with associated metadata and utilities for processing.

This class extends a dictionary where the keys represent dataset splits (e.g., ‘train’, ‘test’), and the values are Hugging Face datasets.
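The dictionary-of-splits design can be pictured with a small stand-alone sketch. It does not import autointent or the datasets package: `SplitMapping` and the sample fields are hypothetical names used only for illustration, and plain lists stand in for Hugging Face datasets.

```python
from typing import Any


class SplitMapping(dict):
    """Hypothetical stand-in: maps split names to samples and carries intent metadata."""

    def __init__(self, *args: Any, intents: list[dict[str, Any]], **kwargs: Any) -> None:
        # Positional and keyword arguments are forwarded to dict unchanged.
        super().__init__(*args, **kwargs)
        # Metadata lives in an attribute, not in a dictionary key.
        self.intents = intents


splits = SplitMapping(
    {
        "train": [{"utterance": "hi", "label": 0}],
        "test": [{"utterance": "bye", "label": 1}],
    },
    intents=[{"id": 0, "name": "greeting"}, {"id": 1, "name": "farewell"}],
)

split_names = sorted(splits)       # ordinary dict iteration over split names
train_size = len(splits["train"])  # ordinary dict lookup by split name
```

Keeping the metadata outside the mapping means that iterating over the object yields only split names.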

Parameters:
label_feature: str = 'label'

The feature name corresponding to labels in the dataset.

utterance_feature: str = 'utterance'

The feature name corresponding to utterances in the dataset.

has_descriptions: bool

Whether the dataset includes descriptions for intents.

intents: list[autointent.schemas.Intent]

All metadata about intents used in this dataset.

property multilabel: bool

Checks if the dataset is multilabel.

Return type:

bool

property n_classes: int

Returns the number of classes in the dataset.

Return type:

int

classmethod from_dict(mapping)

Creates a dataset from a dictionary mapping.

Parameters:

mapping (dict[str, Any]) – A dictionary representation of the dataset.

Return type:

Dataset

classmethod from_json(filepath)

Loads a dataset from a JSON file.

Parameters:

filepath (str | pathlib.Path) – Path to the JSON file.

Return type:

Dataset

classmethod from_hub(repo_name)

Loads a dataset from the Hugging Face Hub.

Parameters:

repo_name (str) – The name of the Hugging Face repository, like AutoIntent/clinc150.

Return type:

Dataset

to_multilabel()

Converts dataset labels to multilabel format.

Return type:

Dataset
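This page does not describe the target label format. One common convention, assumed here and not confirmed for this library, is to replace each integer class label with a one-hot vector of length `n_classes`. The helper `to_one_hot` and the sample layout below are hypothetical.

```python
def to_one_hot(label: int, n_classes: int) -> list[int]:
    """Return a 0/1 vector with a single 1 at position `label`."""
    if not 0 <= label < n_classes:
        raise ValueError(f"label {label} is outside the range [0, {n_classes})")
    return [int(i == label) for i in range(n_classes)]


samples = [
    {"utterance": "hi", "label": 0},
    {"utterance": "bye", "label": 2},
]

# Build new sample dicts so the original single-label data is left untouched.
multilabel_samples = [
    {**sample, "label": to_one_hot(sample["label"], n_classes=3)}
    for sample in samples
]
```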

to_dict()

Converts the dataset into a dictionary format.

Returns a dictionary where the keys are dataset splits and the values are lists of samples.

Return type:

dict[str, list[dict[str, Any]]]

to_json(filepath)

Saves the dataset to a JSON file.

Parameters:

filepath (str | pathlib.Path) – The file path where the dataset should be saved.

Return type:

None
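A save-and-load round trip can be sketched with the standard library alone. The sketch assumes the file holds a mapping from split names to lists of samples, the shape documented as the return type of `to_dict()`. The actual on-disk layout is not described on this page, and the sample fields are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Same shape as the documented return type: dict[str, list[dict[str, Any]]].
data = {
    "train": [{"utterance": "hi", "label": 0}],
    "test": [{"utterance": "bye", "label": 1}],
}

with tempfile.TemporaryDirectory() as tmp:
    filepath = Path(tmp) / "dataset.json"

    # Write the mapping as UTF-8 JSON, keeping non-ASCII utterances readable.
    with filepath.open("w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

    # Read it back to check that nothing was lost.
    with filepath.open(encoding="utf-8") as f:
        restored = json.load(f)
```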

push_to_hub(repo_name, private=False)

Uploads the dataset to the Hugging Face Hub.

Parameters:
  • repo_name (str) – The ID of the Hugging Face repository.

  • private (bool) – Whether to make the repository private.

Return type:

None

get_tags()

Extracts unique tags from the dataset’s intents.

Return type:

list[autointent.schemas.Tag]

get_n_classes(split)

Calculates the number of unique classes in a dataset split.

Parameters:

split (str) – The dataset split to analyze.

Return type:

int
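Counting the unique classes in a split can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `count_classes` is a hypothetical helper, single-label samples are assumed to store an integer, and multilabel samples a 0/1 vector.

```python
from typing import Any


def count_classes(samples: list[dict[str, Any]], label_feature: str = "label") -> int:
    """Count the distinct classes that occur in a list of samples."""
    seen: set[int] = set()
    for sample in samples:
        label = sample[label_feature]
        if isinstance(label, list):
            # Multilabel: every position set to 1 is a class that occurs.
            seen.update(i for i, flag in enumerate(label) if flag)
        else:
            # Single-label: the integer itself is the class.
            seen.add(label)
    return len(seen)


single = [{"label": 0}, {"label": 2}, {"label": 0}]
multi = [{"label": [1, 0, 1]}, {"label": [0, 0, 1]}]

n_single = count_classes(single)
n_multi = count_classes(multi)
```

Under these assumptions, a class that never occurs in the split is not counted, so the result can be smaller than the total number of intents.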

validate_descriptions()

Validates whether all intents in the dataset contain descriptions.

Return type:

bool
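The check can be pictured as an all-or-nothing test over the intent list. In this sketch plain dictionaries stand in for `autointent.schemas.Intent`, the `description` key is an assumed field name, and `all_have_descriptions` is a hypothetical helper. Whether the library treats an empty string as a missing description is not stated here, so the sketch assumes it does.

```python
from typing import Any


def all_have_descriptions(intents: list[dict[str, Any]]) -> bool:
    """Return True only if every intent has a non-empty description."""
    return all(bool(intent.get("description")) for intent in intents)


described = [
    {"id": 0, "name": "greeting", "description": "User says hello."},
    {"id": 1, "name": "farewell", "description": "User says goodbye."},
]
partly_described = [
    {"id": 0, "name": "greeting", "description": "User says hello."},
    {"id": 1, "name": "farewell"},
]

ok = all_have_descriptions(described)
not_ok = all_have_descriptions(partly_described)
```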