Adversarial human-like augmentation
This tutorial covers autointent.generation.utterances.HumanUtteranceGenerator together with autointent.generation.utterances.CriticHumanLike. The generator proposes paraphrases of training utterances; the critic asks an LLM to label each candidate as human or generated. Candidates classified as generated are rejected and refined in a loop until the critic accepts them (or retries are exhausted).
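The reject-and-refine loop described above can be pictured with a small standalone sketch. It does not use autointent. The names refine_until_human, rewrite, is_human, and max_retries are hypothetical stand-ins for the generator call, the critic call, and the retry budget, and the library's actual control flow may differ.

```python
from typing import Callable, Optional


def refine_until_human(
    utterance: str,
    rewrite: Callable[[str], str],
    is_human: Callable[[str], bool],
    max_retries: int = 3,
) -> Optional[str]:
    """Return the first rewrite the critic accepts, or None if retries run out."""
    candidate = utterance
    for _ in range(max_retries):
        # Ask the (stand-in) generator for a new attempt based on the last one.
        candidate = rewrite(candidate)
        # Keep the candidate only if the (stand-in) critic labels it human.
        if is_human(candidate):
            return candidate
    return None


# Toy stand-ins so the sketch runs without an LLM.
def toy_rewrite(text: str) -> str:
    return text.lower().replace("i would like to", "i wanna")


def toy_is_human(text: str) -> bool:
    return "would like to" not in text


accepted = refine_until_human("I would like to book a flight", toy_rewrite, toy_is_human)
```

Returning None when the budget is exhausted makes the rejected case explicit, so a caller can drop that candidate instead of silently keeping text the critic flagged.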
Warning
This path is experimental and may hurt data quality if the critic or base model misjudges natural text. Start with small n_final_per_class values and inspect the outputs.
How it fits together
Generator: autointent.generation.Generator wraps your chat/structured-output API (OpenAI-compatible).
CriticHumanLike: builds a JSON-schema prompt so the LLM returns reasoning and label (human | generated); is_human() returns whether the utterance passed.
HumanUtteranceGenerator: orchestrates rewrite attempts per intent; augment() can append accepted samples back into a chosen split (default: train).
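To make the critic's structured reply concrete, here is a minimal sketch of parsing a JSON object with reasoning and label fields and deriving a pass/fail flag from it. Verdict and parse_verdict are hypothetical names used for illustration only. The real CriticHumanLike schema and parsing code may look different.

```python
import json
from dataclasses import dataclass

# The two label values named in the text above.
ALLOWED_LABELS = {"human", "generated"}


@dataclass
class Verdict:
    reasoning: str
    label: str

    @property
    def is_human(self) -> bool:
        # The utterance passes only when the critic labels it "human".
        return self.label == "human"


def parse_verdict(raw: str) -> Verdict:
    """Parse a JSON reply and reject labels outside the allowed set."""
    data = json.loads(raw)
    label = data["label"]
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return Verdict(reasoning=data["reasoning"], label=label)


verdict = parse_verdict('{"reasoning": "Stiff, formulaic phrasing.", "label": "generated"}')
```

Validating the label up front means a malformed reply fails loudly rather than being counted as a rejection or an acceptance.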
Installation
Install the OpenAI-backed generator extra (the Generator wrapper loads the OpenAI client):
pip install "autointent[openai]"
Set OPENAI_API_KEY (and, optionally, a base URL) as required by your deployment. No separate DSPy extra is needed for this augmentation path.
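A quick preflight check can catch a missing key before any LLM call is made. The sketch below is only an illustration: check_openai_env is a hypothetical helper, and OPENAI_BASE_URL is the variable read by the official OpenAI Python client. Whether your deployment or the Generator wrapper relies on it is an assumption to verify.

```python
import os
from typing import Mapping, Optional


def check_openai_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Fail early if the API key is missing; report the optional base URL."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key.strip():
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumption: the base URL, if any, is supplied via OPENAI_BASE_URL.
    return {"has_key": True, "base_url": env.get("OPENAI_BASE_URL")}


# Example with an explicit mapping instead of the real environment.
status = check_openai_env({"OPENAI_API_KEY": "sk-example", "OPENAI_BASE_URL": "http://localhost:8000/v1"})
```

Passing a mapping keeps the helper testable. Calling it with no arguments reads the process environment.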
Minimal sketch
from autointent import Dataset
from autointent.generation import Generator
from autointent.generation.utterances import CriticHumanLike, HumanUtteranceGenerator

# Your train split, with intent names if you use them in prompts.
dataset = Dataset.from_dict({...})

# One LLM wrapper is shared by the critic and the augmenter.
llm = Generator(model_name="gpt-4o-mini")
critic = CriticHumanLike(generator=llm)
augmenter = HumanUtteranceGenerator(generator=llm, critic=critic, async_mode=False)

# Request up to three critic-approved paraphrases per intent for the train split.
new_samples = augmenter.augment(dataset, split_name="train", n_final_per_class=3)
See the API reference for full argument lists (HumanUtteranceGenerator, CriticHumanLike).
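Because the warning above recommends inspecting outputs, a small review helper can be useful before trusting the augmented data. The sketch below assumes each new sample can be read as a dict with "utterance" and "label" keys. That shape is an assumption, so adapt the field access to whatever augment() actually returns. summarize_new_samples is a hypothetical helper, not part of autointent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_new_samples(
    new_samples: Iterable[dict],
    original_utterances: Iterable[str],
) -> Tuple[Dict[object, List[str]], List[str]]:
    """Group new utterances by label and flag copies of existing utterances."""
    seen = {u.strip().lower() for u in original_utterances}
    by_label = defaultdict(list)
    duplicates = []
    for sample in new_samples:
        text = sample["utterance"]  # assumed field name
        if text.strip().lower() in seen:
            # A "paraphrase" identical to training data adds nothing new.
            duplicates.append(text)
        else:
            by_label[sample["label"]].append(text)  # assumed field name
    return dict(by_label), duplicates


# Toy data standing in for real augmentation output.
toy_samples = [
    {"utterance": "can u book me a flight", "label": 0},
    {"utterance": "Book a flight", "label": 0},
    {"utterance": "whats my balance", "label": 1},
]
grouped, dupes = summarize_new_samples(toy_samples, ["book a flight", "check balance"])
```

Reading the grouped utterances class by class makes it easier to spot paraphrases that drifted away from the intent they were generated for.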