autointent.generation.utterances.DatasetBalancer#
- class autointent.generation.utterances.DatasetBalancer(generator, prompt_maker, async_mode=False, max_samples_per_class=None)#
Balance dataset’s classes distribution.
If your dataset is unbalanced, you can add LLM-generated samples. This method uses
autointent.generation.utterances.UtteranceGenerator
under the hood.See tutorial Balancing Datasets with DatasetBalancer for usage examples.
- Parameters:
generator (Generator) – The generator object used to create utterances.
prompt_maker (Callable[[Intent, int], list[Message]]) – A callable that creates prompts for the generator.
async_mode (bool, optional) – Whether to run the generator in asynchronous mode. Defaults to False.
max_samples_per_class (int | None, optional) – The maximum number of samples per class. Must be a positive integer or None. Defaults to None.
- utterance_generator#
- max_samples = None#
- balance(dataset, split=Split.TRAIN, batch_size=4)#
Balances the specified dataset split.
- Parameters:
dataset (autointent.Dataset) – Source dataset
split (str) – Target split for balancing
batch_size (int) – Batch size for asynchronous processing
- Return type: