autointent.Dataset

class autointent.Dataset(*args, intents, **kwargs)

Bases: dict[str, datasets.Dataset]

Represents a dataset with associated metadata and utilities for processing.

Parameters:
  • args (Any) – Positional arguments to initialize the dataset.

  • intents (list[autointent.schemas.Intent]) – List of intents associated with the dataset.

  • kwargs (Any) – Additional keyword arguments to initialize the dataset.

label_feature = 'label'
utterance_feature = 'utterance'
intents

property multilabel: bool

Check if the dataset is multilabel.

Returns:

True if the dataset is multilabel, False otherwise.

Return type:

bool

property n_classes: int

Get the number of classes in the training split.

Returns:

Number of classes.

Return type:

int

classmethod from_dict(mapping)

Load a dataset from a dictionary mapping.

Parameters:

mapping (dict[str, Any]) – Dictionary representing the dataset.

Returns:

Initialized Dataset object.

Return type:

Dataset
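The layout of `mapping` is not spelled out in this reference. As a minimal sketch in plain Python, assume each split name maps to a list of records keyed by the `utterance_feature` and `label_feature` names above, and that an `intents` key holds the intent records. `split_mapping` is a hypothetical helper used only for illustration; it is not part of autointent.

```python
from typing import Any


def split_mapping(
    mapping: dict[str, Any],
) -> tuple[dict[str, list[dict[str, Any]]], list[dict[str, Any]]]:
    """Separate split records from intent records (hypothetical helper)."""
    # Assumption: intents live under the "intents" key; every other key is a split.
    intents = list(mapping.get("intents", []))
    splits = {name: list(rows) for name, rows in mapping.items() if name != "intents"}
    return splits, intents


# Assumed mapping layout; the field names inside each record are illustrative.
mapping = {
    "train": [
        {"utterance": "book a table for two", "label": 0},
        {"utterance": "what is the weather today", "label": 1},
    ],
    "test": [{"utterance": "reserve a table", "label": 0}],
    "intents": [{"id": 0, "name": "booking"}, {"id": 1, "name": "weather"}],
}

splits, intents = split_mapping(mapping)
print(sorted(splits))
print(len(intents))
```

If the assumption holds, a mapping of this shape is what would be passed to `Dataset.from_dict(mapping)`.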

classmethod from_json(filepath)

Load a dataset from a JSON file.

Parameters:

filepath (str | pathlib.Path) – Path to the JSON file.

Returns:

Initialized Dataset object.

Return type:

Dataset

classmethod from_hub(repo_id)

Load a dataset from a Hugging Face repository.

Parameters:

repo_id (str) – ID of the Hugging Face repository.

Returns:

Initialized Dataset object.

Return type:

Dataset

to_multilabel()

Convert dataset labels to multilabel format.

Returns:

Self, with labels converted to multilabel.

Return type:

Dataset
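This reference does not state which encoding "multilabel format" uses. A common choice is a one-hot vector of length `n_classes`, and the sketch below illustrates that conversion in plain Python under that assumption. `to_one_hot` is a hypothetical helper, not an autointent function.

```python
def to_one_hot(label: int, n_classes: int) -> list[int]:
    """Encode a single class index as a one-hot vector (hypothetical helper)."""
    if not 0 <= label < n_classes:
        raise ValueError(f"label {label} is outside the range [0, {n_classes})")
    return [int(index == label) for index in range(n_classes)]


# Single-label class indices for three utterances, with three classes in total.
labels = [0, 2, 1]
multilabel = [to_one_hot(label, n_classes=3) for label in labels]
print(multilabel)
```

Under this encoding, an utterance that belongs to several intents would simply have more than one position set to 1.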

to_dict()

Convert the dataset splits and intents to a dictionary of lists.

Returns:

A dictionary containing dataset splits and intents as lists of dictionaries.

Return type:

dict[str, list[dict[str, Any]]]

to_json(filepath)

Save the dataset splits and intents to a JSON file.

Parameters:

filepath (str | pathlib.Path) – The path to the file where the JSON data will be saved.

Return type:

None
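Since `to_dict()` returns a dictionary of lists of dictionaries, the saved file can be pictured as that structure serialized as JSON. The sketch below uses only the standard library to show such a round trip; the record fields are assumptions, and the library's actual formatting options (indentation, encoding) may differ.

```python
import json
import tempfile
from pathlib import Path

# Assumed shape of the to_dict() output: split names and "intents" map to lists of records.
data = {
    "train": [
        {"utterance": "book a table for two", "label": 0},
        {"utterance": "what is the weather today", "label": 1},
    ],
    "intents": [{"id": 0, "name": "booking"}, {"id": 1, "name": "weather"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = Path(tmp_dir) / "dataset.json"
    # Write the dictionary as UTF-8 JSON, keeping non-ASCII utterances readable.
    filepath.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    # Read it back, as a loader such as from_json would need to do.
    restored = json.loads(filepath.read_text(encoding="utf-8"))

print(restored == data)
```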

push_to_hub(repo_id, private=False)

Push dataset splits to a Hugging Face repository.

Parameters:
  • repo_id (str) – ID of the Hugging Face repository.

  • private (bool) – Whether the repository should be private. Defaults to False.

Return type:

None

get_tags()

Extract unique tags from the dataset’s intents.

Returns:

List of tags with their associated intent IDs.

Return type:

list[autointent.schemas.Tag]
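The grouping described above can be sketched in plain Python. The record fields used here (`tags` on an intent, `name` and `intent_ids` on a tag) are assumptions made for illustration, and `collect_tags` is a hypothetical helper rather than the library's implementation.

```python
from collections import defaultdict
from typing import Any


def collect_tags(intents: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group intent IDs by tag name, in order of first appearance (hypothetical helper)."""
    ids_by_tag: dict[str, list[int]] = defaultdict(list)
    for intent in intents:
        # Intents without a "tags" field contribute nothing.
        for tag in intent.get("tags", []):
            ids_by_tag[tag].append(intent["id"])
    return [{"name": name, "intent_ids": ids} for name, ids in ids_by_tag.items()]


intents = [
    {"id": 0, "name": "booking", "tags": ["service"]},
    {"id": 1, "name": "weather", "tags": ["info"]},
    {"id": 2, "name": "cancel_booking", "tags": ["service", "urgent"]},
]

tags = collect_tags(intents)
for tag in tags:
    print(tag["name"], tag["intent_ids"])
```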

get_n_classes(split)

Calculate the number of unique classes in a given split.

Parameters:

split (str) – The split to analyze.

Returns:

Number of unique classes.

Return type:

int
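As a rough illustration of counting unique classes in a split, the sketch below works on plain records with integer labels. It assumes single-label data and skips records whose label is missing; how the library treats unlabeled or multilabel records is not stated here. `count_classes` is a hypothetical helper.

```python
from typing import Any


def count_classes(rows: list[dict[str, Any]], label_key: str = "label") -> int:
    """Count distinct integer labels in a list of records (hypothetical helper)."""
    classes: set[int] = set()
    for row in rows:
        label = row.get(label_key)
        # Assumption: records without a label are ignored.
        if label is None:
            continue
        classes.add(label)
    return len(classes)


train = [
    {"utterance": "book a table for two", "label": 0},
    {"utterance": "what is the weather today", "label": 1},
    {"utterance": "reserve a table", "label": 0},
    {"utterance": "tell me a joke", "label": None},
]

print(count_classes(train))
```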