autointent.Dataset
- class autointent.Dataset(*args, intents, **kwargs)
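Bases: dict[str, datasets.Dataset]
Represents a dataset as a dictionary that maps split names to datasets.Dataset objects. It also carries the associated intents and provides utilities for loading, converting, and saving the data.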
- Parameters:
args (Any) – Positional arguments to initialize the dataset.
intents (list[autointent.schemas.Intent]) – List of intents associated with the dataset.
kwargs (Any) – Additional keyword arguments to initialize the dataset.
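`Dataset` subclasses `dict`, and `intents` follows `*args` in the signature. As a result, `intents` can only be passed by keyword, and all other arguments are forwarded to `dict`. The minimal sketch below mimics that pattern with a hypothetical `SplitDict` class that stores plain lists instead of `datasets.Dataset` objects. It is an illustration of the constructor shape, not the autointent implementation.

```python
from typing import Any


class SplitDict(dict):
    """Hypothetical stand-in: a dict of named splits that also stores intents."""

    def __init__(self, *args: Any, intents: list, **kwargs: Any) -> None:
        # Everything except `intents` is forwarded to dict.__init__, so any
        # way of constructing a dict also constructs the splits.
        super().__init__(*args, **kwargs)
        # `intents` follows *args, so it can only be passed by keyword.
        self.intents = intents


splits = SplitDict(
    {"train": [{"utterance": "hi there", "label": 0}]},
    intents=[{"id": 0, "name": "greeting"}],
)
print(list(splits))               # ['train']
print(splits.intents[0]["name"])  # greeting
```

Omitting `intents` raises a `TypeError`, because a keyword-only argument without a default is required.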
- label_feature = 'label'
- utterance_feature = 'utterance'
- intents
- property multilabel: bool
Check if the dataset is multilabel.
- Returns:
True if the dataset is multilabel, False otherwise.
- Return type:
bool
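One plausible way to perform such a check is to inspect the type of the first training label. This rests on an assumption that the source does not state: single-label records store an integer class ID under `label`, while multilabel records store a list of 0/1 flags. The helper `is_multilabel` is hypothetical and operates on plain lists of records.

```python
def is_multilabel(train_split: list) -> bool:
    """Hypothetical check: True if labels are stored as lists of flags."""
    # Assumption: multilabel records hold a list under "label";
    # single-label records hold an int.
    return isinstance(train_split[0]["label"], list)


single = [{"utterance": "book a flight", "label": 2}]
multi = [{"utterance": "book a flight and a hotel", "label": [0, 1, 1]}]
print(is_multilabel(single))  # False
print(is_multilabel(multi))   # True
```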
- property n_classes: int
Get the number of classes in the training split.
- Returns:
Number of classes.
- Return type:
int
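Counting classes must handle both label formats. A minimal sketch follows, assuming integer labels for single-label data and 0/1 flag lists for multilabel data (an assumption, not confirmed by the source). The hypothetical `count_classes` counts only classes that actually occur in the split it is given.

```python
def count_classes(train_split: list) -> int:
    """Count distinct classes seen in a training split (hypothetical helper)."""
    classes = set()
    for record in train_split:
        label = record["label"]
        if isinstance(label, int):
            # Single-label: the label is the class ID itself.
            classes.add(label)
        else:
            # Multilabel: each nonzero position marks an active class.
            classes.update(i for i, flag in enumerate(label) if flag)
    return len(classes)


print(count_classes([{"label": 0}, {"label": 2}, {"label": 0}]))    # 2
print(count_classes([{"label": [1, 0, 1]}, {"label": [0, 1, 0]}]))  # 3
```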
- classmethod from_dict(mapping)
Load a dataset from a dictionary mapping.
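The source does not document the layout of `mapping`. Assuming it holds split names mapped to lists of records plus an `intents` key (a guess based on `to_dict` converting "splits and intents" to a dictionary of lists), the first step of loading is to separate the two. The helper `split_mapping` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_mapping(mapping: Dict[str, Any]) -> Tuple[Dict[str, list], List[dict]]:
    """Separate split records from intent metadata (assumed layout)."""
    # Assumption: intents live under the "intents" key; every other key is a split.
    intents = mapping.get("intents", [])
    splits = {name: records for name, records in mapping.items() if name != "intents"}
    return splits, intents


raw = {
    "train": [{"utterance": "hello", "label": 0}],
    "test": [{"utterance": "hey", "label": 0}],
    "intents": [{"id": 0, "name": "greeting", "tags": []}],
}
splits, intents = split_mapping(raw)
print(sorted(splits))      # ['test', 'train']
print(intents[0]["name"])  # greeting
```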
- classmethod from_json(filepath)
Load a dataset from a JSON file.
- Parameters:
filepath (str | pathlib.Path) – Path to the JSON file.
- Returns:
Initialized Dataset object.
- Return type:
autointent.Dataset
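The documented parameter type `str | pathlib.Path` can be handled uniformly by wrapping the argument in `Path`. The sketch below covers only the file-reading step, using the hypothetical helper `load_mapping`. It assumes (not stated in the source) that the JSON file contains the same kind of mapping that `from_dict` accepts.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def load_mapping(filepath: Union[str, Path]) -> dict:
    """Read a JSON file into a plain dict (hypothetical helper)."""
    # Path() accepts both str and Path, matching the documented parameter type.
    with Path(filepath).open(encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"
    path.write_text('{"train": [{"utterance": "hi", "label": 0}], "intents": []}', encoding="utf-8")
    data = load_mapping(str(path))  # a plain str path works as well as a Path
print(data["train"][0]["utterance"])  # hi
```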
- classmethod from_hub(repo_id)
Load a dataset from a Hugging Face repository.
- to_multilabel()
Convert dataset labels to multilabel format.
- Returns:
Self, with labels converted to multilabel.
- Return type:
autointent.Dataset
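Converting a single-label dataset to multilabel format amounts to one-hot encoding each label. A minimal sketch follows, assuming single-label records store an integer class ID under `label` and that the number of classes is known in advance (both assumptions, not stated in the source). The `to_multilabel` function here is a hypothetical plain-Python helper, not the method documented above.

```python
def to_multilabel(records: list, n_classes: int) -> list:
    """Return copies of records with int labels turned into one-hot lists."""
    converted = []
    for record in records:
        label = record["label"]
        if isinstance(label, int):
            # One-hot: 1 at the class index, 0 everywhere else.
            label = [int(i == label) for i in range(n_classes)]
        converted.append({**record, "label": label})
    return converted


print(to_multilabel([{"utterance": "hi", "label": 1}], n_classes=3))
# [{'utterance': 'hi', 'label': [0, 1, 0]}]
```

The documented method returns the dataset itself. This sketch instead returns new records and leaves its input untouched, which keeps it easy to test in isolation.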
- to_dict()
Convert the dataset splits and intents to a dictionary of lists.
- to_json(filepath)
Save the dataset splits and intents to a JSON file.
- Parameters:
filepath (str | pathlib.Path) – The path to the file where the JSON data will be saved.
- Return type:
None
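Saving is the reverse of loading: build one mapping that holds the splits and intents, then serialize it. The sketch below assumes that intents are stored under an `intents` key next to the split names. That layout is a guess, not confirmed by the source, and `save_mapping` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def save_mapping(splits: Dict[str, List[dict]], intents: List[dict], filepath: Union[str, Path]) -> None:
    """Write splits and intents to a single JSON file (assumed layout)."""
    # Assumption: intents are stored under an "intents" key next to the splits.
    payload = {**splits, "intents": intents}
    with Path(filepath).open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII utterances human-readable.
        json.dump(payload, f, ensure_ascii=False, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.json"
    save_mapping({"train": [{"utterance": "hi", "label": 0}]}, [{"id": 0, "name": "greeting"}], path)
    restored = json.loads(path.read_text(encoding="utf-8"))
print(restored["intents"][0]["name"])  # greeting
```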
- push_to_hub(repo_id, private=False)
Push dataset splits to a Hugging Face repository.
- get_tags()
Extract unique tags from the dataset’s intents.
- Returns:
List of tags with their associated intent IDs.
- Return type:
list
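Extracting tags means inverting the intent-to-tags relationship into a tag-to-intent-IDs mapping. A minimal sketch follows, assuming each intent is a record with an `id` and a list of string `tags` (an assumed shape). The documented method returns a list of tag entries, whereas the hypothetical helper `collect_tags` returns a plain dict for brevity.

```python
from collections import defaultdict


def collect_tags(intents: list) -> dict:
    """Map each tag name to the IDs of the intents that carry it (hypothetical helper)."""
    tag_to_ids = defaultdict(list)
    for intent in intents:
        for tag in intent.get("tags", []):
            tag_to_ids[tag].append(intent["id"])
    return dict(tag_to_ids)


intents = [
    {"id": 0, "name": "book_flight", "tags": ["travel"]},
    {"id": 1, "name": "book_hotel", "tags": ["travel", "booking"]},
    {"id": 2, "name": "greeting", "tags": []},
]
print(collect_tags(intents))  # {'travel': [0, 1], 'booking': [1]}
```

Intents without tags, such as `greeting` above, simply do not appear in the result.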