autointent.generation.utterances.DatasetBalancer#

class autointent.generation.utterances.DatasetBalancer(generator, prompt_maker, async_mode=False, max_samples_per_class=None)#

Balance dataset’s classes distribution.

If your dataset is unbalanced, you can add LLM-generated samples. This method uses autointent.generation.utterances.UtteranceGenerator under the hood.

See tutorial Balancing Datasets with DatasetBalancer for usage examples.

Parameters:
  • generator (Generator) – The generator object used to create utterances.

  • prompt_maker (Callable[[Intent, int], list[Message]]) – A callable that creates prompts for the generator.

  • async_mode (bool, optional) – Whether to run the generator in asynchronous mode. Defaults to False.

  • max_samples_per_class (int | None, optional) – The maximum number of samples per class. Must be a positive integer or None. Defaults to None.

utterance_generator#
max_samples = None#
balance(dataset, split=Split.TRAIN, batch_size=4)#

Balances the specified dataset split.

Parameters:
  • dataset (autointent.Dataset) – Source dataset

  • split (str) – Target split for balancing

  • batch_size (int) – Batch size for asynchronous processing

Return type:

autointent.Dataset