Adversarial human-like augmentation

This tutorial covers autointent.generation.utterances.HumanUtteranceGenerator together with autointent.generation.utterances.CriticHumanLike. The generator proposes paraphrases of training utterances; the critic asks an LLM to label each candidate as human or generated. Candidates classified as generated are rejected and refined in a loop until the critic accepts them (or retries are exhausted).
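The reject-and-refine loop described above can be sketched in plain Python. This is an illustration of the control flow only, not the library's implementation: `refine_until_accepted`, `toy_propose`, `toy_is_human`, and `max_retries` are hypothetical names, and the assumption that each retry rewrites the previously rejected candidate (rather than the original utterance) is mine.

```python
from typing import Callable, Optional


def refine_until_accepted(
    utterance: str,
    propose: Callable[[str], str],
    is_human: Callable[[str], bool],
    max_retries: int = 3,
) -> Optional[str]:
    """Return a candidate the critic accepts, or None once retries run out."""
    candidate = propose(utterance)
    for attempt in range(max_retries + 1):
        if is_human(candidate):
            return candidate
        if attempt < max_retries:
            # Rejected: ask for another rewrite of the rejected candidate.
            candidate = propose(candidate)
    return None


def toy_propose(text: str) -> str:
    # Stand-in for an LLM rewrite: drop one stiff word per call.
    for stiff in ("kindly ", "hereby "):
        if stiff in text:
            return text.replace(stiff, "", 1)
    return text


def toy_is_human(text: str) -> bool:
    # Stand-in for the critic: reject anything that still sounds stiff.
    return "kindly" not in text and "hereby" not in text


seed = "I hereby kindly ask to reset my password"
accepted = refine_until_accepted(seed, toy_propose, toy_is_human)
```

With these toy stand-ins the first proposal is still rejected and the second one passes. Setting `max_retries=0` makes the same call return `None`, which corresponds to the "retries are exhausted" case.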

Warning

This path is experimental and may hurt data quality if the critic or base model misjudges natural text. Start with small n_final_per_class values and inspect the outputs.

How it fits together

  • Generator (autointent.generation.Generator) wraps your OpenAI-compatible chat/structured-output API.

  • CriticHumanLike — builds a JSON-schema prompt so the LLM returns reasoning and label (human | generated); is_human() returns whether the utterance passed.

  • HumanUtteranceGenerator — orchestrates rewrite attempts per intent; augment() can append accepted samples back into a chosen split (default: train).
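To make the critic's contract concrete, here is a minimal sketch of a verdict with the two fields named above and a check that turns it into a pass/fail result. The schema layout and the `passed` helper are assumptions for illustration; the schema that CriticHumanLike actually builds may differ in field order, descriptions, or strictness.

```python
import json

# Hypothetical JSON schema for the critic's reply: free-text reasoning
# plus a label restricted to the two values mentioned in this tutorial.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "label": {"type": "string", "enum": ["human", "generated"]},
    },
    "required": ["reasoning", "label"],
    "additionalProperties": False,
}


def passed(raw_reply: str) -> bool:
    """Parse a JSON verdict and report whether the label is 'human'."""
    verdict = json.loads(raw_reply)
    label = verdict.get("label")
    if label not in ("human", "generated"):
        raise ValueError(f"unexpected label: {label!r}")
    return label == "human"


reply = '{"reasoning": "Casual phrasing with a typo.", "label": "human"}'
ok = passed(reply)
```

Rejecting unknown labels explicitly, instead of treating them as "generated", keeps a malformed reply from silently counting as a rejection.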

Installation

Install the OpenAI-backed generator extra (the Generator wrapper loads the OpenAI client):

pip install "autointent[openai]"

Set OPENAI_API_KEY (and optional base URL) as required by your deployment. No separate DSPy extra is needed for this augmentation path.
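A small pre-flight check can catch a missing key before any request is sent. The helper name `missing_openai_settings` is hypothetical. The official OpenAI Python client also reads `OPENAI_BASE_URL` for a custom endpoint; whether your deployment relies on that variable or passes the URL another way is an assumption to verify.

```python
import os
from typing import List, Mapping


def missing_openai_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required settings that are unset or empty."""
    required = ["OPENAI_API_KEY"]  # the base URL is optional, so it is not listed
    return [name for name in required if not env.get(name)]


# Example with explicit mappings, so the check does not depend on the real shell.
problems = missing_openai_settings({"OPENAI_API_KEY": ""})
```

Calling `missing_openai_settings()` with no argument checks the current process environment.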

Minimal sketch

from autointent import Dataset
from autointent.generation import Generator
from autointent.generation.utterances import CriticHumanLike, HumanUtteranceGenerator

dataset = Dataset.from_dict({...})  # your train split, with intent names if you use them in prompts

llm = Generator(model_name="gpt-4o-mini")
critic = CriticHumanLike(generator=llm)
augmenter = HumanUtteranceGenerator(generator=llm, critic=critic, async_mode=False)

new_samples = augmenter.augment(dataset, split_name="train", n_final_per_class=3)
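Since the warning above recommends inspecting outputs, a grouping helper makes manual review easier. This page does not state the return type of augment(), so the sketch assumes each sample can be read as a mapping with `utterance` and `label` keys; if the library returns objects with attributes instead, adapt the two lookups accordingly. `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def group_by_label(samples: Iterable[Mapping[str, Any]]) -> Dict[Any, List[str]]:
    """Collect utterances per label so each intent can be reviewed in one place."""
    grouped: Dict[Any, List[str]] = defaultdict(list)
    for sample in samples:
        # Assumed record shape: {"utterance": ..., "label": ...}
        grouped[sample["label"]].append(sample["utterance"])
    return dict(grouped)


# Toy records standing in for accepted samples.
toy_samples = [
    {"utterance": "how do i reset my password", "label": 0},
    {"utterance": "whats my balance", "label": 1},
    {"utterance": "forgot my password, help", "label": 0},
]
review = group_by_label(toy_samples)
```

Reading the grouped utterances for each intent side by side is a quick way to spot paraphrases that drifted away from the original meaning.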

See the API reference for full argument lists (HumanUtteranceGenerator, CriticHumanLike).