AutoIntent documentation
========================

**AutoIntent** is an open source tool for automatic configuration of text classification pipelines, with specialized support for intent prediction.

.. note::

   This project is under active development.

The task of intent detection is one of the main subtasks in creating task-oriented dialogue systems, along with scriptwriting and slot filling. While AutoIntent is particularly well-suited for intent detection, it can be applied to any text classification problem, including sentiment analysis, topic classification, document categorization, and other NLP tasks.

AutoIntent project offers users the following:

- A convenient library of methods for intent classification that can be used in a sklearn-like "fit-predict" format.
- An AutoML approach to creating classifiers, where the only thing needed is to upload a set of labeled data.

Example of building an intent classifier in a couple of lines of code:

.. testsetup::

   import importlib.resources as ires

   path_to_json = ires.files("tests.assets.data").joinpath("clinc_subset.json")

.. testcode::

   from autointent import Pipeline, Dataset

   dataset = Dataset.from_json(path_to_json)
   pipeline = Pipeline.from_preset("classic-light")
   pipeline.fit(dataset)
   pipeline.predict(["show me my latest recent transactions"])

.. testcleanup::

   import shutil
   from glob import glob
   for match in glob("vector_db*"):
      shutil.rmtree(match)

Documentation Guide
-------------------

Getting Started
...............

:doc:`🚀 Quickstart <quickstart>`
   Jump right in! Install AutoIntent and build your first text classifier in minutes. Perfect for users who want to get up and running quickly with practical examples.

:doc:`📚 Key Concepts <concepts>`
   Essential terminology and concepts used throughout AutoIntent. Understanding these will help you navigate the documentation and make the most of the library's features.

In-Depth Learning
.................

:doc:`📖 User Guides <user_guides>`
   Comprehensive tutorials and examples that walk you through AutoIntent's capabilities step-by-step. These hands-on guides cover everything from basic usage to advanced techniques.

:doc:`🎓 Learn AutoIntent <learn/index>`
   Dive deeper into the theory behind AutoIntent. Learn about dialogue systems, AutoML principles, and the science that powers intelligent text classification.

Reference
.........

:doc:`🔧 API Reference <autoapi/autointent/index>`
   Complete technical documentation for all classes, methods, and functions. Essential reference for developers integrating AutoIntent into their applications.
   
   Key section: :doc:`Modules <autoapi/autointent/modules/index>`


.. toctree::
   :hidden:
   :maxdepth: 1

   quickstart
   concepts
   user_guides
   learn/index
   autoapi/autointent/index

Quickstart
==========

Welcome to AutoIntent! This guide will get you up and running with intent classification in just a few minutes.

What is AutoIntent?
-------------------

AutoIntent is a powerful AutoML library for intent classification that automatically finds the best model architecture and hyperparameters for your text classification tasks. Whether you're building chatbots or text analysis pipelines, AutoIntent simplifies the process of creating high-performance intent classifiers.

Key Features
------------

* ✨ **AutoML Pipeline**: Automated model selection and hyperparameter optimization
* 🔧 **Modular Design**: Use individual components or the full pipeline
* 📊 **Multiple Algorithms**: Support for classical neural networks, transformers, and traditional ML methods
* 📈 **Experiment Tracking**: Built-in support for Weights & Biases, TensorBoard and CodeCarbon

Installation
------------

Basic Installation
..................

AutoIntent is compatible with Python 3.10+. For core functionality:

.. code-block:: bash

    pip install autointent

With Experiment Tracking
........................

To include experiment tracking capabilities:

.. code-block:: bash

    pip install autointent[wandb,codecarbon]

Development Installation
........................

To install the latest development version:

.. code-block:: bash

    git clone https://github.com/voorhs/AutoIntent.git
    cd AutoIntent
    pip install .

Quick Example
-------------

Here's a complete example that demonstrates AutoIntent's capabilities:

.. testcode:: python

    from autointent import Dataset, Pipeline

    # Prepare your data
    data = {
        "train": [
            {"utterance": "I want to check my account balance", "label": 0},
            {"utterance": "How do I transfer money?", "label": 1},
            {"utterance": "What's my current balance?", "label": 0},
            {"utterance": "I need to send money to my friend", "label": 1},
            {"utterance": "Can you help me make a payment?", "label": 1},
            {"utterance": "Show me my transaction history", "label": 0},
            {"utterance": "Can you show me my account details?", "label": 0},
            {"utterance": "I want to send funds to someone", "label": 1},
            {"utterance": "What is my available balance?", "label": 0},
            {"utterance": "How can I make a transfer?", "label": 1},
            {"utterance": "Please help me with a payment", "label": 1},
            {"utterance": "I need to view my recent transactions", "label": 0}
        ],
        "validation": [
            {"utterance": "Display my account info", "label": 0},
            {"utterance": "I want to transfer funds", "label": 1}
        ]
    }

    # Load data into AutoIntent
    dataset = Dataset.from_dict(data)

    # Initialize and train the AutoML pipeline
    pipeline = Pipeline.from_preset("classic-light")
    pipeline.fit(dataset)

    # Make predictions on new data
    predictions = pipeline.predict([
        "What is my available balance?",
        "Transfer money to John"
    ])

That's it! AutoIntent will automatically find the best model for your data.

Data Format
-----------

AutoIntent expects your data in a simple dictionary format with train/validation/test splits:

Single-Label Classification
...........................

.. code-block:: python

    data = {
        "train": [
            {"utterance": "Hello, how are you?", "label": 0},
            {"utterance": "Book a flight to Paris", "label": 1},
            {"utterance": "What's the weather like?", "label": 2}
        ],
        "validation": [  # Optional
            {"utterance": "Hi there!", "label": 0}
        ],
        "test": [  # Optional but highly recommended
            {"utterance": "Good morning", "label": 0}
        ]
    }

Multi-Label Classification
..........................

For multi-label tasks, use lists of 0s and 1s:

.. code-block:: python

    data = {
        "train": [
            {"utterance": "Book urgent flight to Paris", "label": [1, 0, 1]},  # booking=1, weather=0, urgent=1
            {"utterance": "What's the weather?", "label": [0, 1, 0]}
        ]
    }

Loading Data
............

.. code-block:: python

    from autointent import Dataset

    # From dictionary
    dataset = Dataset.from_dict(data)
    
    # From JSON file
    dataset = Dataset.from_json("/path/to/your/data.json")
    
    # From Hugging Face Hub
    dataset = Dataset.from_hub("your-username/your-dataset")

AutoML Training
---------------

AutoIntent provides several preset configurations optimized for different scenarios:

.. code-block:: python

    from autointent import Pipeline

    # Our quick and accurate SoTA
    pipeline = Pipeline.from_preset("classic-light")

    # If you have more training time
    pipeline = Pipeline.from_preset("classic-heavy")

    # Experimental preset with fine-tuning methods
    pipeline = Pipeline.from_preset("transformers-light")

    # Train the pipeline
    pipeline.fit(dataset)

Available Presets
.................

- ``classic-light``: Fast training with traditional ML methods
- ``classic-heavy``: Comprehensive search with traditional methods
- ``nn-medium``: Classic neural network-based approaches (RNN, CNN)
- ``nn-heavy``: Comprehensive neural network optimization
- ``transformers-light``: Transformer models with limited search
- ``transformers-no-hpo``: Transformer models without hyperparameter optimization
- ``zero-shot-llm``: Zero-shot classification using OpenAI models
- ``zero-shot-encoders``: Zero-shot classification using transformer models

Making Predictions
-------------------

Once trained, use your pipeline for inference:

.. code-block:: python

    # Batch predictions
    results = pipeline.predict([
        "What's my account balance?",
        "Transfer $100 to John",
        "Show me recent transactions"
    ])


Direct Module Usage
-------------------

For more control, use individual components without AutoML:

.. testcode:: python

    from autointent.modules import KNNScorer

    # Initialize a specific scorer
    scorer = KNNScorer(
        embedder_config="sentence-transformers/all-MiniLM-L6-v2",
        k=3
    )

    # Train on your data
    train_utterances = [
        "Check my account balance",
        "Transfer money to account",
        "Show transaction history"
    ]
    train_labels = [0, 1, 0]
    
    scorer.fit(train_utterances, train_labels)

    # Make predictions
    predictions = scorer.predict([
        "What's my current balance?",
        "Send money to my friend"
    ])

Available Modules
.................

- **Scoring**: :class:`autointent.modules.KNNScorer`, :class:`autointent.modules.BertScorer`, :class:`autointent.modules.SklearnScorer`, :class:`autointent.modules.CatBoostScorer`
- **Decision**: :class:`autointent.modules.ArgmaxDecision`,  :class:`autointent.modules.TunableDecision`, :class:`autointent.modules.AdaptiveDecision`

See more at API reference  :doc:`Modules <autoapi/autointent/modules/index>`.

Next Steps
----------

🚀 **Ready to dive deeper?**

- **Concepts**: Learn about :doc:`concepts` and AutoIntent's architecture
- **Tutorials**: Follow our step-by-step guides in :doc:`user_guides`
- **Background**: Learn the AutoML and intent classification from the theoretical perspective at :doc:`learn/index` page.

🛠️ **Need Help?**

- Report issues on our `GitHub Issues <https://github.com/DeepPavlov/AutoIntent/issues>`_
- Check out the full :doc:`API reference <autoapi/autointent/index>`

Happy intent classification! 🎯

============
Key Concepts
============

This page introduces the fundamental concepts that underpin AutoIntent's design and functionality. Understanding these concepts will help you effectively use the framework and make informed decisions about your text classification projects.

.. _concepts-pipeline:

Three-Stage Pipeline Architecture
=================================

AutoIntent organizes text classification into a modular three-stage pipeline, providing clear separation of concerns and flexibility in optimization:

**🔤 Embedding Stage**
   Transforms raw text into dense vector representations using pre-trained transformer models. This stage handles the computationally intensive text encoding and can be optimized independently from downstream classification tasks.

**📊 Scoring Stage**
   Processes embeddings to predict class probabilities. This stage supports diverse approaches from classical machine learning (KNN, logistic regression) to deep learning models (BERT fine-tuning, CNNs). All models operate on pre-computed embeddings for efficiency.

**⚖️ Decision Stage**
   Converts predicted probabilities into final classifications by applying thresholds and decision rules. This stage is crucial for multi-label classification and out-of-scope detection scenarios.

This modular design enables efficient experimentation, allows reusing expensive embedding computations across different models, and supports deployment on CPU-only systems.

.. _concepts-automl:

AutoML Optimization Strategy
============================

AutoIntent employs a hierarchical optimization approach that balances exploration with computational efficiency:

**🔧 Module-Level Optimization**
   Components are optimized sequentially: embedding → scoring → decision. Each stage builds upon the best model from the previous stage, creating a cohesive pipeline while preventing combinatorial explosion.

**🤖 Model-Level Optimization**
   Within each module, both model architectures and hyperparameters are jointly optimized using Optuna's Tree-structured Parzen Estimators and random sampling.

**🗺️ Search Space Configuration**
   Optimization behavior is controlled through dictionary-like search spaces that define:
   
   - Available model types and their hyperparameter ranges
   - Optimization budget and resource constraints  
   - Cross-validation and evaluation strategies

.. _concepts-embedding-centric:

Embedding-Centric Design
========================

AutoIntent's architecture centers around transformer-based text embeddings, providing several key advantages:

**⚡ Pre-computed Embeddings**
   Text is encoded once and reused across all scoring models, dramatically reducing computational overhead during hyperparameter optimization and enabling efficient experimentation.

**🤗 Model Repository Integration**
   Seamless access to thousands of pre-trained models from Hugging Face Hub, with intelligent selection strategies based on retrieval metrics or downstream task performance.

**🚀 Deployment Flexibility**
   Separation of embedding generation from classification enables deploying lightweight classifiers on resource-constrained systems while leveraging powerful transformer representations.

.. _concepts-multiclass-multilabel:

Multi- vs. Single-label classification
======================================

AutoIntent supports various classification scenarios through its flexible decision module:

**🏷️ Multi-Class Classification**
   Each input gets assigned to exactly one category - like sorting emails into "Spam", "Work", or "Personal" folders. Common examples include sentiment analysis (positive/negative/neutral) or determining user intent where each message has a single purpose. The model picks the single best match from all possible categories.

**🔖 Multi-Label Classification** 
   Each input can belong to multiple categories at once - like tagging a news article as both "Politics" and "Economics". Essential for scenarios like multi-intent messages ("book a flight and check weather"), content tagging, or any situation where multiple labels can apply simultaneously. The model almost independently decides whether each possible category fits or not.


.. _concepts-oos:

Out-of-Scope Detection
======================

A critical capability for production text classification systems, especially in conversational AI:

**📏 Confidence Thresholding**
   Uses predicted probability scores to identify inputs that don't belong to any known class. Threshold values can be tuned automatically to balance precision and recall.

**🔗 Integration with Multi-Label**
   OOS detection works seamlessly with multi-label scenarios, enabling detection of completely unknown inputs vs. partial matches to known classes.

.. _concepts-presets:

Optimization Presets
====================

AutoIntent provides predefined optimization strategies that balance quality, speed, and resource consumption:

**⚡ Zero-Shot Presets**
   Leverage class descriptions and large language models for classification without training data. Ideal for rapid prototyping and cold-start scenarios.

**📈 Classic Presets**
   Focus on traditional ML approaches (KNN, linear models, tree-based methods) operating on transformer embeddings. Offer excellent balance of performance and efficiency.

**🧠 Neural Network Presets**
   Include deep learning approaches like CNN, RNN, and transformer fine-tuning. Provide highest potential performance at increased computational cost.

**🪜 Computational Tiers**
   Each preset family offers light, medium, and heavy variants that trade optimization time for potential performance improvements.

.. _concepts-modularity:

Modular Architecture
====================

AutoIntent's design emphasizes modularity and extensibility:

**🧩 Plugin Architecture**
   Each component (embedding models, scoring methods, decision strategies) implements a common interface, enabling easy addition of new approaches without modifying core framework code.

**⚙️ Configuration-Driven**
   All aspects of optimization can be controlled through declarative configuration files, supporting reproducible experiments and easy sharing of optimization strategies.

**🔧 Extensibility**
   Framework can be extended with custom embedding models, scoring algorithms, and decision strategies while maintaining compatibility with the AutoML optimization pipeline.

This modular design ensures that AutoIntent can evolve with advances in NLP research while maintaining stability and backward compatibility for existing users.


.. _user_guides:

User Guides
-----------

.. toctree::
   :glob:
   :maxdepth: 1

   user_guides/index_basic_usage
   user_guides/index_advanced_usage
   augmentation_tutorials/index


Basic Usage
===========

.. nbgallery::
   user_guides.basic_usage.01_data.py
   user_guides.basic_usage.02_modules.py
   user_guides.basic_usage.03_automl.py
   user_guides.basic_usage.04_inference.py


# %% [markdown]
"""
# Data

In this chapter you will learn how to work with intent classification data in AutoIntent. We'll cover creating datasets, loading data from different sources, and manipulating your data for optimal results.
"""

# %%
import datasets

from autointent import Dataset

# %%
datasets.logging.disable_progress_bar()  # disable tqdm outputs

# %% [markdown]
"""
## Creating your first dataset

The easiest way to get started is by creating a dataset from a Python dictionary. Let's start with a simple banking intent classification example:
"""

# %%
# Create a simple intent classification dataset
data = {
    "train": [
        {"utterance": "What is my account balance?", "label": 0},
        {"utterance": "Check my current balance", "label": 0},
        {"utterance": "Show me my account details", "label": 0},
        {"utterance": "I want to transfer money", "label": 1},
        {"utterance": "How do I send funds to someone?", "label": 1},
        {"utterance": "Make a payment to my friend", "label": 1},
        {"utterance": "Cancel my last transaction", "label": 2},
        {"utterance": "Reverse the payment I just made", "label": 2},
        {"utterance": "Stop this transfer", "label": 2},
    ],
    "validation": [
        {"utterance": "Display my balance", "label": 0},
        {"utterance": "Send money to John", "label": 1},
        {"utterance": "Cancel the last payment", "label": 2},
    ],
    "test": [
        {"utterance": "How much money is in my account?", "label": 0},
        {"utterance": "Transfer funds to my savings", "label": 1},
        {"utterance": "Undo my recent payment", "label": 2},
    ],
}

# Load the data into AutoIntent
dataset = Dataset.from_dict(data)
print(f"Dataset created with {len(dataset['train'])} training samples")

# %% [markdown]
"""
This creates a dataset with three intent classes:
- **0**: Balance inquiries
- **1**: Money transfers
- **2**: Transaction cancellations

**Important notes about data splits:**
- **Test split**: Highly recommended as a frozen evaluation set that's never used during training
- **Validation split**: Optional - if not provided, AutoIntent will automatically split your training data
- **Training split**: Required - this is where your model learns from
"""

# %% [markdown]
"""
## Understanding the data format

AutoIntent expects your data in a specific format. Here are the key requirements:

### Single-label classification
For most intent classification tasks, each utterance belongs to exactly one class:

```python
{
    "train": [
        {"utterance": "Hello!", "label": 0},
        {"utterance": "Book a flight", "label": 1},
        {"utterance": "What's the weather?", "label": 2}
    ],
    "test": [  # Recommended: frozen test set
        {"utterance": "Hi there!", "label": 0}
    ]
    # validation split is optional - AutoIntent will create one if needed
}
```

### Multi-label classification
For tasks where utterances can belong to multiple classes, use a list of labels:

```python
{
    "train": [
        {"utterance": "Book urgent flight to Paris", "label": [1, 0, 1]},  # booking=1, weather=0, urgent=1
        {"utterance": "What's the weather like?", "label": [0, 1, 0]}
    ],
    "test": [
        {"utterance": "Emergency flight booking", "label": [1, 0, 1]}
    ]
}
```
"""

# %% [markdown]
"""
## Loading data from different sources

AutoIntent supports multiple ways to load your data:
"""

# %% [markdown]
"""
### From a dictionary (recommended for getting started)
Perfect when you have your data ready in Python:
"""

# %%
# Example with a complete dataset including all splits
banking_data = {
    "train": [
        {"utterance": "What is my account balance?", "label": 0},
        {"utterance": "Check my savings balance", "label": 0},
        {"utterance": "How much money do I have?", "label": 0},
        {"utterance": "Transfer $100 to savings", "label": 1},
        {"utterance": "Send money to my friend", "label": 1},
        {"utterance": "Make a payment", "label": 1},
        {"utterance": "Cancel my last payment", "label": 2},
        {"utterance": "Stop this transaction", "label": 2},
        {"utterance": "Reverse my transfer", "label": 2},
    ],
    "validation": [
        {"utterance": "Display my balance", "label": 0},
        {"utterance": "Send $50 to John", "label": 1},
        {"utterance": "Stop my last transaction", "label": 2},
    ],
    "test": [
        {"utterance": "Show me my current balance", "label": 0},
        {"utterance": "I want to transfer funds", "label": 1},
        {"utterance": "Cancel this payment", "label": 2},
    ],
}

dataset_from_dict = Dataset.from_dict(banking_data)
print("✅ Dataset loaded from dictionary")
print(f"Splits: {list(dataset_from_dict.keys())}")

# %% [markdown]
"""
### From a JSON file
When you have your data saved as a JSON file with the same structure:
"""

# %%
# Example: dataset_from_json = Dataset.from_json("/path/to/your/data.json")

# %% [markdown]
"""
### From Hugging Face Hub
For loading public datasets or sharing your own:
"""

# %%
# Load a sample dataset from HuggingFace Hub
dataset_from_hub = Dataset.from_hub("DeepPavlov/banking77")
print("✅ Dataset loaded from Hugging Face Hub")
print(f"Training samples: {len(dataset_from_hub['train'])}")

# %% [markdown]
"""
## Working with your dataset

Once loaded, your dataset behaves like a dictionary of [Hugging Face datasets](https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.Dataset):
"""

# %%
# Access different splits
print("Available splits:", list(dataset_from_hub.keys()))
print(f"Training samples: {len(dataset_from_hub['train'])}")

# View the first few samples
print("\nFirst 3 training samples:")
train_split = dataset_from_hub["train"][:3]
for i, (utterance, label) in enumerate(zip(train_split["utterance"], train_split["label"], strict=True)):
    print(f"{i+1}. '{utterance}' → label {label}")

# %% [markdown]
"""
### Working with individual samples
"""

# %%
# Access specific samples
first_sample = dataset_from_hub["train"][0]
print(f"First sample: '{first_sample['utterance']}' (label: {first_sample['label']})")

# Slice multiple samples
batch = dataset_from_hub["train"][5:10]
print("\nBatch of 5 samples:")
for utterance, label in zip(batch["utterance"], batch["label"], strict=True):
    print(f"  '{utterance}' → {label}")

# %% [markdown]
"""
## Saving and sharing datasets

### Save to Hugging Face Hub
To share your dataset with others or for reproducibility:
"""

# %%
# dataset.push_to_hub("your-username/your-dataset-name")
# Note: Make sure you're logged in with `huggingface-cli login`

# %% [markdown]
"""
### Save to a local JSON file

You can also save your dataset to a local JSON file for backup or sharing outside the Hugging Face Hub. Use the `to_json` method:
"""

# %%
# Save the dataset to a local JSON file
# dataset_from_hub.to_json("my_banking77_dataset.json")


# %% [markdown]
"""
## Best practices

### 1. **Data quality matters**
- Ensure consistent labeling across your dataset
- Include diverse examples for each intent
- Aim for balanced classes when possible

### 2. **Split your data wisely**
- **Training**: 60-80% of your data (required)
- **Validation**: 10-20% (optional - AutoIntent will create from training if not provided)
- **Test**: 10-20% (highly recommended - keep this frozen for final evaluation)

### 3. **Start small, then scale**
- Begin with a small representative sample (10-20 examples per intent)
- Use AutoIntent to find the best approach
- Scale up with more data once you've validated your setup
- **Tip**: If you have limited data, consider using AutoIntent's augmentation tools (see %mddoclink(rst,augmentation_tutorials.index))
"""

# %% [markdown]
"""
## Next steps

Now that you know how to work with data in AutoIntent, you're ready to explore the different modules and techniques available for intent classification.

**Up next**: Learn how to use individual modules for more control over your intent classification pipeline.

- Next chapter: %mddoclink(notebook,basic_usage.02_modules)
- See also: %mddoclink(notebook,advanced.01_data) covering advanced topics like OOS samples and adding information about intent
"""


Advanced Usage
==============

.. nbgallery::
   user_guides.advanced.01_data.py
   user_guides.advanced.02_embedder_configuration.py
   user_guides.advanced.03_automl.py
   user_guides.advanced.04_reporting.py
   user_guides.advanced.05_logging.py


# %% [markdown]
"""
# Data

This chapter covers advanced data handling techniques in AutoIntent that go beyond basic dataset creation. You'll learn how to handle out-of-scope samples, enrich your data with intent metadata, and leverage advanced features for robust intent classification systems.
"""

# %% [markdown]
"""
**Prerequisites**: Complete the %mddoclink(notebook,basic_usage.01_data) tutorial first.
"""

# %%
import datasets

from autointent import Dataset

# %%
datasets.logging.disable_progress_bar()  # disable tqdm outputs for cleaner output

# %% [markdown]
"""
## Handling Out-of-Scope (OOS) Samples

Out-of-scope detection is crucial for robust intent classification systems. Users often say things that don't match any of your predefined intents, and your system needs to handle these gracefully.

### What are Out-of-Scope Samples?

Out-of-scope (OOS) samples are utterances that don't belong to any of your defined intent classes. For example, in a banking chatbot, "What's the weather like?" would be out-of-scope.
"""

# %%
# Create a dataset with out-of-scope samples
banking_with_oos = {
    "train": [
        # In-domain samples
        {"utterance": "What's my account balance?", "label": 0},
        {"utterance": "Check my current balance", "label": 0},
        {"utterance": "I want to transfer money to my friend", "label": 1},
        {"utterance": "How do I send funds to someone?", "label": 1},
        {"utterance": "Cancel my last transaction", "label": 2},
        {"utterance": "Reverse the payment I just made", "label": 2},
        # Out-of-scope samples (no label field)
        {"utterance": "What's the weather like today?"},
        {"utterance": "Tell me a joke"},
        {"utterance": "How do I cook pasta?"},
        {"utterance": "What time is it?"},
        {"utterance": "I love pizza"},
    ],
    "test": [
        {"utterance": "Show me my current balance", "label": 0},
        {"utterance": "Transfer $100 to my savings", "label": 1},
        {"utterance": "Stop my recent payment", "label": 2},
        {"utterance": "What's your favorite movie?"},  # OOS
        {"utterance": "How's the traffic today?"},  # OOS
    ],
    "intents": [
        {"id": 0, "name": "balance_inquiry"},
        {"id": 1, "name": "money_transfer"},
        {"id": 2, "name": "transaction_cancellation"},
    ],
}

dataset_with_oos = Dataset.from_dict(banking_with_oos)
print("✅ Dataset with OOS samples created")
print(f"Available splits: {list(dataset_with_oos.keys())}")

# %% [markdown]
"""
### Advanced OOS Strategies

For robust systems, you'll want to carefully curate your OOS samples:

1. **Domain-adjacent samples**: Include utterances that are close to your domain but still out-of-scope
2. **Common conversational patterns**: Add greetings, small talk, and common user behaviors
3. **Edge cases**: Include borderline cases that might confuse your model
"""

# %%
# Example of well-curated OOS samples for a banking domain
sophisticated_oos_data = {
    "train": [
        # In-scope samples
        {"utterance": "What's my account balance?", "label": 0},
        {"utterance": "I want to transfer money", "label": 1},
        # Sophisticated out-of-scope samples
        {"utterance": "Hello, how are you?"},  # Greeting
        {"utterance": "Thanks for your help!"},  # Courtesy
        {"utterance": "What other services do you offer?"},  # Domain-adjacent
        {"utterance": "I'm having trouble with the app"},  # Technical support (different domain)
        {"utterance": "Can you recommend a good investment?"},  # Financial advice (borderline)
        {"utterance": "What are your business hours?"},  # Information request (different domain)
    ],
    "intents": [
        {"id": 0, "name": "balance_inquiry"},
        {"id": 1, "name": "money_transfer"},
    ],
}

sophisticated_dataset = Dataset.from_dict(sophisticated_oos_data)

# %% [markdown]
"""
## Enriching Data with Intent Metadata

Intent metadata allows you to provide additional information about your intents that can be leveraged by various AutoIntent modules for improved performance.
"""

# %% [markdown]
"""
### Intent Metadata Example

Here's an example showing how to add metadata to your intents:
"""

# %%
# Create a dataset with rich intent metadata
comprehensive_banking_data = {
    "train": [
        {"utterance": "What's my account balance?", "label": 0},
        {"utterance": "Check my current balance", "label": 0},
        {"utterance": "How much money do I have?", "label": 0},
        {"utterance": "I want to transfer money", "label": 1},
        {"utterance": "Send funds to my friend", "label": 1},
        {"utterance": "Make a payment to someone", "label": 1},
        {"utterance": "Cancel my last transaction", "label": 2},
        {"utterance": "Reverse this payment", "label": 2},
        {"utterance": "Stop my transfer", "label": 2},
        {"utterance": "I need help with my account", "label": 3},
        {"utterance": "Can someone assist me?", "label": 3},
        {"utterance": "I have a question about my account", "label": 3},
    ],
    "intents": [
        {
            "id": 0,
            "name": "balance_inquiry",
            "description": "User wants to check their account balance or available funds",
        },
        {
            "id": 1,
            "name": "money_transfer",
            "description": "User wants to transfer money or make a payment to another person or account",
        },
        {
            "id": 2,
            "name": "transaction_cancellation",
            "description": "User wants to cancel, reverse, or stop a transaction or payment",
        },
        {
            "id": 3,
            "name": "general_help",
            "description": "User is requesting general assistance or has a question",
        },
    ],
}

rich_dataset = Dataset.from_dict(comprehensive_banking_data)
print("✅ Dataset with rich intent metadata created")

# %% [markdown]
"""
### Understanding Intent Metadata Fields

Let's examine what each metadata field does and how AutoIntent modules use them:
"""

# %%
# Examine the intent metadata
print("Intent metadata breakdown:\n")
for intent in rich_dataset.intents:
    print(f"🎯 Intent: {intent.name} (ID: {intent.id})")
    print(f"   Description: {intent.description}")
    print()

# %% [markdown]
"""
### How Modules Use Intent Metadata

- **`name`**: Human-readable intent names for interpretability and debugging, also can be used by AutoIntent augmentation methods
- **`description`**: Used by %mddoclink(class,modules.scoring,BiEncoderDescriptionScorer), %mddoclink(class,modules.scoring,CrossEncoderDescriptionScorer), %mddoclink(class,modules.scoring,LLMDescriptionScorer) to calculate semantic similarity between utterances and intent descriptions

**Pro tip**: Well-crafted descriptions significantly improve performance for description-based scoring modules, especially with limited training data.
"""

# %% [markdown]
"""
## Advanced Dataset Manipulation

### Working with Large Datasets

For systems with large datasets, you'll want efficient ways to manipulate and analyze your data:
"""

# %%
# Load a larger dataset for demonstration
dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")

# Dataset analysis
print("📊 Dataset Analysis")
print(f"Dataset splits: {list(dataset.keys())}")
print(f"Total training samples: {len(dataset['train_0']) + len(dataset['train_1'])}")
print(f"Number of intents: {len(dataset.intents)}")

# Examine class distribution
from collections import Counter

label_counts = Counter(dataset["train_0"]["label"])
print("\nClass distribution (top 5):")
for label, count in label_counts.most_common(5):
    intent_name = dataset.intents[label].name
    print(f"  {intent_name} (label {label}): {count} samples")

# %% [markdown]
"""
### Custom Data Processing

You can process your datasets using the underlying Hugging Face datasets functionality:
"""

# %%
# Example: Filter samples by length
short_utterances = dataset["train_0"].filter(lambda x: len(x["utterance"].split()) <= 5)
print(f"Short utterances (≤5 words): {len(short_utterances)} samples")


# Example: Add computed features
def add_utterance_length(example):
    example["utterance_length"] = len(example["utterance"].split())
    return example


enriched_train = dataset["train_0"].map(add_utterance_length)
print(f"Added utterance_length feature to {len(enriched_train)} samples")

# Show example with new feature
sample = enriched_train[0]
print(f"Sample: '{sample['utterance']}' (length: {sample['utterance_length']} words)")

# %% [markdown]
"""
### Creating Custom Splits

For advanced experimentation, you might want to create custom data splits:
"""


# %%
# Example: Create a custom split based on utterance characteristics
def create_length_based_splits(dataset_split, short_threshold=5, long_threshold=10):
    """Split data based on utterance length for targeted evaluation."""

    def is_short(example):
        return len(example["utterance"].split()) <= short_threshold

    def is_long(example):
        return len(example["utterance"].split()) >= long_threshold

    short_split = dataset_split.filter(is_short)
    long_split = dataset_split.filter(is_long)

    return short_split, long_split


short_test, long_test = create_length_based_splits(dataset["test"])
print("Custom splits created:")
print(f"  Short utterances: {len(short_test)} samples")
print(f"  Long utterances: {len(long_test)} samples")

# %% [markdown]
"""
## Next Steps

You now understand advanced data handling in AutoIntent, including:

- ✅ Out-of-scope sample handling for robust intent classification
- ✅ Intent metadata for improved model performance
- ✅ Advanced dataset manipulation and analysis techniques

**What's next:**
- Explore %mddoclink(notebook,advanced.03_automl) for advanced AutoML techniques
- Learn about %mddoclink(rst,augmentation_tutorials.index) to expand your datasets
- See %mddoclink(notebook,advanced.04_reporting) for comprehensive model evaluation

**Pro tip**: Start with a small, well-curated dataset with good intent descriptions, then scale up using AutoIntent's optimization capabilities to find the best approach for your specific use case.
"""


# %% [markdown]
"""
# Embedder Configuration

This tutorial covers comprehensive embedder configuration for text classification modules in AutoIntent. Most scoring modules use embedders to convert text into vector representations, which are crucial for model performance.

## Overview

AutoIntent uses the **sentence-transformers** library under the hood to access embedding models from the Hugging Face Hub. The library automatically detects available devices (CUDA, MPS, CPU, etc.) and optimizes performance accordingly. This means you don't need to manually specify device preferences in most cases - the system will automatically use the best available hardware.

## Configuration Approaches

### Simple Configuration

The simplest way is to pass a model name as a string:
"""

# %%
from autointent.modules.scoring import KNNScorer, LinearScorer

# Using just the model name - sentence-transformers handles device detection
scorer = LinearScorer(embedder_config="sentence-transformers/all-MiniLM-L6-v2")

# %% [markdown]
"""
### Advanced Configuration

For more control, pass a dictionary with configuration parameters:
"""

# %%
from autointent.configs import EmbedderConfig

# Using a dictionary for detailed configuration
advanced_embedder_config = {
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "batch_size": 64,  # Increase batch size for faster processing
    "device": "cuda:0",  # Override automatic detection if needed
    "tokenizer_config": {
        "max_length": 256,  # Set custom max sequence length
        "padding": True,
        "truncation": True,
    },
    "similarity_fn_name": "cosine",  # Choose similarity function
    "use_cache": True,  # Enable embedding caching
}

scorer = LinearScorer(embedder_config=advanced_embedder_config)

# %% [markdown]
"""
### Using EmbedderConfig Class

You can also use the `EmbedderConfig` class directly for type safety and IDE support:
"""

# %%
import torch

from autointent.configs import TokenizerConfig

embedder_config = EmbedderConfig(
    model_name="sentence-transformers/all-mpnet-base-v2",
    batch_size=32,
    # Device is auto-detected, but you can override if needed
    device="cuda" if torch.cuda.is_available() else "cpu",
    tokenizer_config=TokenizerConfig(max_length=512, padding=True, truncation=True),
    classification_prompt="Classify the following text: ",  # Task-specific prompt
    similarity_fn_name="cosine",
    use_cache=True,
    freeze=True,  # Freeze model parameters for consistent embeddings
)

scorer = KNNScorer(embedder_config=embedder_config, k=10)

# %% [markdown]
"""
## Key Configuration Options

### Model Selection
- **`model_name`**: Any Sentence Transformers or Hugging Face model name
  - Popular choices: `"sentence-transformers/all-MiniLM-L6-v2"`, `"sentence-transformers/all-mpnet-base-v2"`
  - Language-specific: `"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"`
  - Specialized models: `"sentence-transformers/all-distilroberta-v1"`, `"sentence-transformers/gtr-t5-base"`

### Infrastructure Settings
- **`device`**: Hardware device (`"cpu"`, `"cuda"`, `"cuda:0"`, `"mps"`, etc.)
  - Usually auto-detected by sentence-transformers
  - Override only if you need specific device control
- **`batch_size`**: Number of texts to process simultaneously (higher = faster but more memory)
- **`bf16`/`fp16`**: Enable mixed precision for memory efficiency (requires compatible hardware)
- **`trust_remote_code`**: Whether to trust remote code when loading models (default: False)

### Tokenizer Settings
- **`tokenizer_config.max_length`**: Maximum sequence length (longer texts are truncated)
- **`tokenizer_config.padding`**: How to pad shorter sequences (`True`, `"longest"`, `"max_length"`, `"do_not_pad"`)
- **`tokenizer_config.truncation`**: Whether to truncate longer sequences (default: True)

### Task-Specific Prompts
Prompts can significantly improve embedding quality for specific tasks:

- **`classification_prompt`**: Prompt for classification tasks
- **`default_prompt`**: General-purpose prompt used when no task-specific prompt is available
- **`query_prompt`/`passage_prompt`**: For retrieval and search tasks
- **`cluster_prompt`**: For clustering tasks
- **`sts_prompt`**: For semantic textual similarity tasks

### Performance Settings
- **`use_cache`**: Cache embeddings to disk for repeated use (highly recommended)
- **`freeze`**: Freeze model parameters for consistent embeddings across runs
- **`similarity_fn_name`**: Similarity function (default: `"cosine"`; other options like `"dot"`, `"euclidean"`, `"manhattan"` are available, but we recommend keeping the default unless you have a specific reason)

## Practical Examples

### Performance-Optimized Configuration
"""

# %%
# Example: Performance-optimized configuration
perf_config = EmbedderConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # Fast, lightweight model
    batch_size=128,  # Large batch for speed
    # Device auto-detected by sentence-transformers
    tokenizer_config=TokenizerConfig(max_length=128),  # Shorter sequences for speed
    use_cache=True,  # Cache for repeated experiments
    fp16=torch.cuda.is_available(),  # Use mixed precision on GPU
)

scorer = KNNScorer(embedder_config=perf_config, k=5)

# %% [markdown]
"""
### Quality-Optimized Configuration
"""

# %%
# Example: Quality-optimized configuration
quality_config = EmbedderConfig(
    model_name="sentence-transformers/all-mpnet-base-v2",  # High-quality model
    batch_size=16,  # Smaller batch to handle longer sequences
    tokenizer_config=TokenizerConfig(max_length=512),  # Longer sequences for context
    classification_prompt="Classify the intent of this message: ",
    use_cache=True,
    freeze=True,
    similarity_fn_name="cosine",
)

scorer = LinearScorer(embedder_config=quality_config)

# %% [markdown]
"""
### Multilingual Configuration
"""

# %%
# Example: Multilingual setup
multilingual_config = EmbedderConfig(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    batch_size=32,
    tokenizer_config=TokenizerConfig(max_length=256),
    use_cache=True,
    freeze=True,
)

scorer = KNNScorer(embedder_config=multilingual_config, k=7)

# %% [markdown]
"""
## Performance Tips

### 1. Leverage Automatic Device Detection
- Sentence-transformers automatically detects and uses the best available hardware
- Only override `device` if you need specific control (e.g., multi-GPU setups)
- The library handles CUDA, MPS (Apple Silicon), and CPU optimization automatically

### 2. Use Caching Effectively
- Enable `use_cache=True` for repeated experiments
- Cached embeddings are stored on disk and reused across runs
- Particularly useful during hyperparameter tuning

### 3. Optimize Batch Size
- Increase `batch_size` for faster processing
- Monitor memory usage - larger batches use more GPU/CPU memory

### 4. Choose Appropriate Sequence Length
- Longer sequences (`max_length`) provide more context but are slower
- For short texts (tweets, queries): 128-256 tokens
- For documents: 512+ tokens
- Balance accuracy vs. speed based on your use case

### 5. Select the Right Model
- **Tip**: For best results, choose a model from the [Massive Text Embedding Benchmark (MTEB) leaderboard](https://huggingface.co/spaces/mteb/leaderboard), which ranks models by quality and speed across many tasks.

### 6. Use Mixed Precision
- Enable `fp16=True` on compatible GPUs for faster inference
- Reduces memory usage without significant quality loss
- Automatically handled by sentence-transformers on supported hardware

## Troubleshooting

### Common Issues

1. **Out of Memory Errors**
   - Reduce `batch_size`
   - Decrease `max_length`
   - Enable mixed precision (`fp16=True`) [planned to implement]

2. **Slow Inference**
   - Increase `batch_size` (if memory allows)
   - Use a lighter model (e.g., MiniLM instead of MPNet)
   - Reduce `max_length`
   - Ensure GPU/MPS utilization

3. **Inconsistent Results**
   - Set `freeze=True` for reproducible embeddings
   - Use `use_cache=True` to avoid recomputation
   - Check if seed is set for your program
"""


# %% [markdown]
"""
# AutoML Customization

In this guide, you will learn how to configure a custom hyperparameter search space.
"""

# %% [markdown]
"""
## Python API

> Before reading this guide, we recommend familiarizing yourself with the sections %mddoclink(rst,concepts) and %mddoclink(rst,learn.optimization).
"""

# %% [markdown]
"""
### Optimization Module


To set up the optimization module, you need to create the following dictionary:
"""

# %%
knn_module = {
    "module_name": "knn",
    "k": [1, 5, 10, 50],
    "embedder_config": ["sergeyzh/rubert-tiny-turbo"],
}

# %% [markdown]
"""
The ``module_name`` field specifies the name of the module. You can explore the available names by yourself:
"""

# %%
from autointent.modules import DECISION_MODULES, EMBEDDING_MODULES, REGEX_MODULES, SCORING_MODULES

print(list(SCORING_MODULES.keys()))
print(list(DECISION_MODULES.keys()))
print(list(EMBEDDING_MODULES.keys()))
print(list(REGEX_MODULES.keys()))

# %% [markdown]
"""
All fields except ``module_name`` are lists that define the search space for each hyperparameter (see %mddoclink(class,modules.scoring,KNNScorer)). If you omit them, the default set of hyperparameters will be used:
"""

# %%
linear_module = {"module_name": "linear"}

# %% [markdown]
"""
See docs %mddoclink(class,modules.scoring,LinearScorer).
"""

# %% [markdown]
"""
### Optimization Node

To set up the optimization node, you need to create a list of modules and specify the target metric for optimization:
"""

# %%
scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [
        knn_module,
        linear_module,
    ],
}

# %% [markdown]
"""
### Search Space

The search space for the entire pipeline looks approximately like this:
"""

# %%
search_space = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {
                "module_name": "dnnc",
                "cross_encoder_config": ["DiTy/cross-encoder-russian-msmarco"],
                "k": [1, 3, 5, 10],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]

# %% [markdown]
"""
### Load Data

Let us use small subset of popular `clinc150` dataset:
"""

# %%
from autointent import Dataset

dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")

# %% [markdown]
"""
### Start Auto Configuration
"""

# %%
from autointent import Pipeline

pipeline_optimizer = Pipeline.from_search_space(search_space)
pipeline_optimizer.fit(dataset)

# %% [markdown]
"""
There are three hyperparameter tuning samplers available:

- "random"
- "tpe"

All the samplers are implemented with [optuna](https://optuna.org/).
"""

# %% [markdown]
"""
One can use more versatile %mddoclink(class,,OptimizationConfig) and %mddoclink(method,Pipeline,from_optimization_config).
"""

# %% [markdown]
"""
## See Also

- [Modules API reference](../autoapi/autointent/modules/index.rst) to get familiar with modules to include into search space
"""


# %% [markdown]
"""
# Run reporting

This script demonstrates how to report the optimization process using the AutoIntent library.
"""

# %%
search_space = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {
                "module_name": "dnnc",
                "cross_encoder_config": ["cross-encoder/ms-marco-MiniLM-L6-v2"],
                "k": [1, 3, 5, 10],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]

# %% [markdown]
"""
### Load Data

Let us use small subset of popular `clinc150` dataset:
"""

# %%

from autointent import Dataset

dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")

# %% [markdown]
"""
### Start Auto Configuration
"""

# %%
from autointent import Pipeline

pipeline_optimizer = Pipeline.from_search_space(search_space)

# %% [markdown]
"""
## Reporting

Currently supported reporting options are:
- tensorboard
- wandb
"""
# %%
from autointent.configs import LoggingConfig
from pathlib import Path

log_config = LoggingConfig(
    run_name="test_tensorboard", report_to=["tensorboard"], project_dir=Path("my_projects"), dump_modules=False
)

pipeline_optimizer.set_config(log_config)

# %%
pipeline_optimizer.fit(dataset)

# %% [markdown]
"""
Now results of the optimization process can be viewed in the tensorboard.

```bash
tensorboard --logdir test_tensorboard
```
"""


# %% [markdown]
"""
# Logging to stdout and file

This guide will teach you how to configure logging in AutoIntent. By default, it is fully disabled.

It will be demonstrated on toy search_space example:
"""

# %%
from pathlib import Path

from autointent import Dataset, Pipeline
from autointent.configs import LoggingConfig

search_space = [
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {
                "module_name": "knn",
                "k": [1],
                "weights": ["uniform"],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0"],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [
            {"module_name": "threshold", "thresh": [0.5]},
            {"module_name": "argmax"},
        ],
    },
]

log_config = LoggingConfig(project_dir=Path("logging_tutorial"))
pipeline_optimizer = Pipeline.from_search_space(search_space)
pipeline_optimizer.set_config(log_config)

dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")

# %% [markdown]
"""
## Fully Custom Logging

One can fully customize logging via python's standard module [`logging`](https://docs.python.org/3/library/logging.html). Everything you need to do is configure it before AutoIntent execution:
"""
# %%
import logging

logging.basicConfig(level="INFO")
pipeline_optimizer.fit(dataset)

# %% [markdown]
"""
See external tutorials and guides about `logging` module.
"""

# %% [markdown]
"""
## Export from AutoIntent

If you don't have to customize logging, you can export our configuration. Everything you need to do is setup it before AutoIntent execution:
"""

# %%
from autointent import setup_logging

setup_logging("INFO", log_filename="tests/logs/my_exp")

# %% [markdown]
"""
The first parameter affects the logs to the standard output stream. The second parameter is optional. If it is specified, then the "DEBUG" messages are logged to the file, regardless of what is specified by the first parameter.
"""


# %% [markdown]
"""
# Modules

Modules are the core building blocks of AutoIntent, providing the fundamental functionality for intent classification tasks. This guide will walk you through everything you need to know about using modules effectively in your projects.

## What You'll Learn

By the end of this tutorial, you'll be able to:

- Understand the different types of modules and their roles
- Initialize and configure modules for your specific needs
- Train modules on your data and use them for inference
- Save and load trained modules for reuse
- Debug and inspect module predictions

## Understanding Module Types

AutoIntent provides two complementary types of modules that work together to solve intent classification:

### Scoring Modules
**Purpose**: Convert text utterances into probability distributions over intent classes.

**What they do**: Take raw text as input and output a probability vector where each element represents the likelihood of the utterance belonging to a specific intent class.

**Examples**: %mddoclink(class,modules.scoring,KNNScorer), %mddoclink(class,modules.scoring,LinearScorer), %mddoclink(class,modules.scoring,BertScorer), %mddoclink(class,modules.scoring,CatBoostScorer)

### Decision Modules
**Purpose**: Convert probability scores into final predictions with support for multi-label and out-of-domain detection.

**What they do**: Take probability vectors from scoring modules and apply decision logic to determine the final set of predicted labels.

**Examples**: %mddoclink(class,modules.decision,ArgmaxDecision), %mddoclink(class,modules.decision,ThresholdDecision), %mddoclink(class,modules.decision,TunableDecision)

## Getting Started: Your First Module

Let's start by initializing a simple but effective K-Nearest Neighbors scorer:
"""

# %%
from autointent.modules.scoring import KNNScorer

# Initialize a KNN scorer with basic configuration
scorer = KNNScorer(
    embedder_config="sergeyzh/rubert-tiny-turbo",  # Pre-trained embedding model
    k=5,  # Number of nearest neighbors
)

print(f"Initialized {scorer.__class__.__name__} with k={scorer.k}")

# %% [markdown]
"""
### Embedder Configuration Deep Dive

Most modules in AutoIntent rely on embedders to convert text into numerical representations. We use the powerful **sentence-transformers** library, which provides:

- 🔄 **Automatic device detection** (CUDA, MPS, CPU)
- 🚀 **Optimized inference** with batching and caching
- 🌐 **Access to enumerous models** from Hugging Face Hub
"""

# %% [markdown]
"""
💡 For detailed embedder configuration including custom models, optimization settings, and advanced features, check out our dedicated guide: %mddoclink(notebook,advanced.02_embedder_configuration).
"""

# %% [markdown]
"""
## Loading Training Data

Before we can train our module, we need to prepare the training data. AutoIntent provides convenient data loading utilities:
"""

# %%
from autointent import Dataset

# Load a pre-processed dataset from the hub
dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")

# Let's explore the dataset structure
print("Dataset structure:")
print(f"- Splits: {list(dataset.keys())}")
print(f"- Train split size: {len(dataset['train_0'])}")
print(f"- Sample utterance: '{dataset['train_0']['utterance'][0]}'")
print(f"- Sample label: '{dataset['train_0']['label'][0]}'")

# %% [markdown]
"""
### Understanding the Data Format

The dataset contains text utterances paired with their intent labels. Let's examine a few examples to understand what we're working with:
"""

# %%
# Display sample data
print("Sample training examples:")
for i in range(3):
    utterance = dataset["train_0"]["utterance"][i]
    label = dataset["train_0"]["label"][i]
    print(f'  {i+1}. "{utterance}" → {label}')

print(f"\nTotal unique intents: {len(set(dataset['train_0']['label']))}")

# %% [markdown]
"""
## Training Your Module

Now comes the exciting part - training your module on the data! This is where the module learns to map utterances to intent probabilities:
"""

# %%
import time

print("🚀 Starting training...")
start_time = time.time()

# Train the module on utterances and their corresponding labels
scorer.fit(dataset["train_0"]["utterance"], dataset["train_0"]["label"])

training_time = time.time() - start_time
print(f"✅ Training completed in {training_time:.2f} seconds!")

# %% [markdown]
"""
### What Happens During Training

During the `fit()` process, the module:

1. **Validates setup**: Ensures the data is properly formatted and compatible with this module
2. **Processes text**: Converts utterances into embeddings using the configured embedder
3. **Learns from data**: Builds internal representations (e.g., stores training examples for %mddoclink(class,modules.scoring,KNNScorer), learns weights for %mddoclink(class,modules.scoring,LinearScorer))

The exact process depends on the module type - some modules like %mddoclink(class,modules.scoring,KNNScorer) simply store the training examples, while others like neural networks perform gradient-based optimization.
"""

# %% [markdown]
"""
## Making Predictions

Now that your module is trained, let's see it in action! We can use it to predict intents for new utterances:
"""

# %%
# Test with some example utterances
test_utterances = [
    "hello world!",
    "What's the weather like today?",
    "I want to book a flight to Paris",
    "Play some music please",
]

print("🔮 Making predictions...")
predictions = scorer.predict(test_utterances)

print("\nPrediction results:")
for utterance, prediction in zip(test_utterances, predictions, strict=False):
    print(f'  "{utterance}" → {prediction}')

# %% [markdown]
"""
### Understanding Predictions

The `predict()` method returns the most likely intent class for each input utterance. Behind the scenes, the module:

1. **Embeds the text** using the same embedder used during training
2. **Computes similarities** with training examples (for %mddoclink(class,modules.scoring,KNNScorer)) or applies learned weights (for %mddoclink(class,modules.scoring,LinearScorer))
3. **Returns the highest-scoring intent** as the final prediction

### Batch vs Single Predictions

You can predict on single utterances or batches efficiently:
"""

# %%
# Single prediction
single_prediction = scorer.predict(["How do I reset my password?"])
print(f"Single prediction: {single_prediction[0]}")

# Batch prediction (more efficient for multiple utterances)
batch_predictions = scorer.predict(
    ["Show me my account balance", "What time does the store close?", "Cancel my subscription"]
)
print(f"Batch predictions: {batch_predictions}")

# %% [markdown]
"""
## Saving and Loading Models

One of the most important features for production use is the ability to save trained models and load them later. AutoIntent makes this simple and reliable:

### Saving Your Trained Model
"""

# %%
from pathlib import Path

# Create a directory for saving the model
model_path = Path("my_dumps/knnscorer_clinc150")
model_path.mkdir(parents=True, exist_ok=True)

print(f"💾 Saving model to: {model_path}")
scorer.dump(model_path)
print("✅ Model saved successfully!")

# Let's see what files were created
print("\nSaved files:")
for file in model_path.rglob("*"):
    if file.is_file():
        print(f"  - {file.name}")

# %% [markdown]
"""
### Loading a Saved Model

Loading is just as easy - you can restore the exact same model state without retraining:
"""

# %%
# Load the model from disk
print("📁 Loading saved model...")
loaded_scorer = KNNScorer.load(model_path)
print("✅ Model loaded successfully!")

# Verify it works the same as the original
test_utterance = ["hello world!"]
original_prediction = scorer.predict(test_utterance)
loaded_prediction = loaded_scorer.predict(test_utterance)

print("\nVerification:")
print(f"  Original model: {original_prediction}")
print(f"  Loaded model:   {loaded_prediction}")
print(f"  Identical:      {original_prediction == loaded_prediction}")

# %% [markdown]
"""
## Advanced: Debugging with Rich Output

Many modules provide detailed prediction metadata that's valuable for understanding model behavior and debugging issues:
"""

# %%
# Get detailed prediction information
print("🔍 Analyzing prediction with metadata...")
scores, meta = loaded_scorer.predict_with_metadata(["hello world!"])

print("Detailed prediction analysis:")
print("  Input: 'hello world!'")
print(f"  Prediction: {scores[0]}")

# Display additional metadata if available
print(f"  Similar examples found: {len(meta[0]['neighbors'])}")

# %% [markdown]
"""
### What's in the Metadata?

Different modules provide different types of debugging information:

- **KNN modules** (e.g., %mddoclink(class,modules.scoring,KNNScorer)): Show nearest neighbors from training data
- **Linear modules** (e.g., %mddoclink(class,modules.scoring,LinearScorer)): Display feature importance scores
- **Neural modules** (e.g., %mddoclink(class,modules.scoring,BertScorer)): Provide attention weights and layer activations [planned to implement]

This metadata is crucial for:
- 🐛 **Debugging** unexpected predictions
- 📊 **Model interpretation** and explainability
- 🎯 **Performance optimization** by identifying weak spots
- 🔧 **Feature engineering** based on what the model focuses on

## Summary and Next Steps

Congratulations! You've learned the fundamentals of working with AutoIntent modules:

✅ **What you've accomplished:**
- Understood different module types and their roles
- Configured and initialized modules for your needs
- Trained a module on real data
- Made predictions on new utterances
- Saved and loaded models for reuse
- Explored debugging capabilities with rich output

### 🚀 What's Next?

Now that you understand modules, you're ready for more advanced topics:

1. **Pipeline Automation**: Learn about %mddoclink(notebook,basic_usage.03_automl) to automatically find the best module configurations
2. **Advanced Configuration**: Dive deeper into %mddoclink(notebook,advanced.02_embedder_configuration) for optimal performance
3. **Production Deployment**: Explore inference optimization and serving strategies
4. **Custom Modules**: Build your own modules for specific use cases

### 💡 Key Takeaways

- **Modules are composable**: Combine scoring and decision modules for complex workflows
- **Configuration matters**: Proper embedder and hyperparameter setup significantly impacts performance
- **Debugging is built-in**: Use metadata outputs to understand and improve your models
- **Persistence is seamless**: Save and load models without losing any functionality
"""

# %%
# Clean up the saved files
import shutil

print("🧹 Cleaning up demo files...")
shutil.rmtree(model_path.parent)
print("✅ Cleanup completed!")


# %% [markdown]
"""
# AutoML Pipeline Configuration

AutoML (Automated Machine Learning) in AutoIntent allows you to automatically find the best configuration for your intent classification pipeline. Instead of manually tuning hyperparameters and selecting components, AutoML explores different combinations to find the optimal setup for your specific dataset.
"""

# %%
from autointent import Pipeline

# %% [markdown]
"""
In this tutorial, we'll walk through the pipeline auto-configuration process step by step. We'll learn how to:

- Use predefined search spaces and presets
- Customize search configurations
- Set up logging and validation strategies
- Run the optimization process
- Save and load optimized pipelines

Let's start by loading a small subset of the popular `clinc150` dataset for demonstration.
"""

# %%
from autointent import Dataset

# Load the dataset from Hugging Face hub
dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")
print(f"Dataset contains {len(dataset)} splits")
dataset

# %% [markdown]
"""
Let's examine the structure of our dataset by looking at a sample utterance:
"""

# %%
sample = dataset["train_0"][0]
print(f"Sample utterance: '{sample['utterance']}'")
print(f"Intent label: '{sample['label']}'")
sample

# %% [markdown]
"""
## Search Space

AutoIntent provides default search spaces. One can utilize them by constructing %mddoclink(class,,Pipeline) with factory %mddoclink(method,Pipeline,from_preset):
"""

# %%
pipeline = Pipeline.from_preset("classic-light")

# %% [markdown]
"""
You can inspect the structure and default values of any preset:
"""

# %%
from pprint import pprint

from autointent.utils import load_preset

preset = load_preset("classic-light")
pprint(preset)

# %% [markdown]
"""
### Customizing Search Spaces

The search space can be customized to fit your specific needs. For example, you can modify hyperparameter ranges:
"""

# %%
# Example: modify the maximum k value for KNN-based components
preset["search_space"][0]["search_space"][0]["k"]["high"] = 10
custom_pipeline = Pipeline.from_optimization_config(preset)

# %% [markdown]
"""
See tutorial %mddoclink(notebook,advanced.03_search_space_configuration) on how the search space is structured.
"""

# %% [markdown]
"""
## Logging and Storage Configuration

During the AutoML process, you'll want to control what artifacts are saved and where they're stored. The %mddoclink(class,configs,LoggingConfig) allows you to specify:

- `project_dir`: Directory where results will be saved
- `dump_modules`: Whether to save trained model files
- `clear_ram`: Whether to clear models from memory after training to save RAM
"""

# %%
from pathlib import Path

from autointent.configs import LoggingConfig

logging_config = LoggingConfig(
    project_dir=Path.cwd() / "runs",  # Save results to 'runs' directory
    dump_modules=False,  # Don't save large model files
    clear_ram=False,  # Keep models in memory for inference
)
custom_pipeline.set_config(logging_config)

# %% [markdown]
"""
## Model Configuration

You can specify which transformer models to use for text embeddings and cross-encoding. This is useful when you want to:

- Use smaller/faster models for experimentation
- Apply domain-specific pre-trained models
- Control model parameters like tokenizer settings
"""

# %%
from autointent.configs import CrossEncoderConfig, EmbedderConfig, TokenizerConfig

# Configure embedding model (used for vector representations)
custom_pipeline.set_config(EmbedderConfig(model_name="prajjwal1/bert-tiny"))

# Configure cross-encoder model (used for scoring text pairs)
custom_pipeline.set_config(
    CrossEncoderConfig(model_name="cross-encoder/ms-marco-MiniLM-L2-v2", tokenizer_config=TokenizerConfig(max_length=8))
)

# %% [markdown]
"""
See the documentation for %mddoclink(class,configs,EmbedderConfig) and %mddoclink(class,configs,CrossEncoderConfig) for all available customization options.
"""

# %% [markdown]
"""
## Validation Strategy

Choose between two validation approaches based on your dataset size:

**Hold-out validation** (default): Uses separate train/validation splits. Best when you have plenty of data.

**Cross-validation**: Splits data into k folds for more robust evaluation. Better for smaller datasets as it uses all data for both training and validation.
"""

# %%
from autointent.configs import DataConfig

# Use 3-fold cross-validation for better performance on small datasets
custom_pipeline.set_config(DataConfig(scheme="cv", n_folds=3))

# %% [markdown]
"""
See the docs for %mddoclink(class,configs,DataConfig) for other options available to customize.
"""

# %% [markdown]
"""
## Complete Example

Let's put everything together in a comprehensive example that demonstrates the full AutoML workflow:
"""

# %%
from autointent import Dataset, Pipeline
from autointent.configs import LoggingConfig
from autointent.utils import load_preset

# Step 1: Load your dataset
dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")
print(f"Loaded dataset with {len(dataset)} splits")

# Step 2: Load and customize a preset configuration
preset = load_preset("classic-light")
# You can modify the preset here if needed
# preset["search_space"][0]["search_space"][0]["k"]["high"] = 5

# Step 3: Create pipeline from the configuration
pipeline = Pipeline.from_optimization_config(preset)

# Step 4: Configure logging and storage
logging_config = LoggingConfig(
    dump_modules=True,  # Save trained models for later use
    clear_ram=False,  # Keep models in memory for immediate inference
)
pipeline.set_config(logging_config)

# Step 5: Run AutoML optimization
print("Starting AutoML optimization...")
context = pipeline.fit(dataset)
print("✅ AutoML optimization completed!")

# Step 6: Test the optimized pipeline
test_utterances = ["hello world!", "I want to transfer money", "book a flight"]
predictions = pipeline.predict(test_utterances)
print(f"Predictions: {predictions}")

# %% [markdown]
"""
## Dump Results

One can save all results of auto-configuration process to file system (to ``LoggingConfig.dirpath``):
"""

# %%
context.dump()

# %% [markdown]
"""
Or one can dump only the configured pipeline to any desired location (by default ``LoggingConfig.dirpath``):
"""

# %%
pipeline.dump()

# %% [markdown]
"""
## Load Pipeline for Inference
"""

# %%
loaded_pipe = Pipeline.load(logging_config.dirpath)

# %% [markdown]
"""
Since this notebook is launched automatically while building the docs, we will clean the space if you don't mind :)
"""

# %%
import shutil

shutil.rmtree(logging_config.dirpath)


# %% [markdown]
"""
# Inference Pipeline

After you configured optimal pipeline with AutoIntent, you probably want to test its power on some new data! There are several options:

- use it right after optimization
- save to file system and then load

## Right After

Here's the basic example:
"""

# %%
from autointent import Dataset, Pipeline

search_space = [
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {
                "module_name": "knn",
                "k": [1],
                "weights": ["uniform"],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0"],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [
            {"module_name": "threshold", "thresh": [0.5]},
            {"module_name": "argmax"},
        ],
    },
]

dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")
pipeline = Pipeline.from_search_space(search_space)
context = pipeline.fit(dataset)
pipeline.predict(["hello, world!"])

# %% [markdown]
"""
There are several caveats.

**RAM usage.**

You can optimize RAM usage by saving all modules to file system. Just set the following options:
"""

# %%
from autointent.configs import LoggingConfig

logging_config = LoggingConfig(dump_modules=True, clear_ram=True)

# %% [markdown]
"""
## Load from File System

Firstly, your auto-configuration run should dump modules into file system:
"""

# %%
from autointent import Dataset, Pipeline
from autointent.configs import LoggingConfig

dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")
pipeline = Pipeline.from_search_space(search_space)
pipeline.set_config(LoggingConfig(dump_modules=True, clear_ram=True))

# %% [markdown]
"""
Secondly, after optimization finished, you need to save the auto-configuration results to file system:
"""

# %%
context = pipeline.fit(dataset)
context.dump()
# or pipeline.dump() to save only configured pipeline but not all the optimization assets

# %% [markdown]
"""
This command saves all results to the run's directory:
"""

# %%
run_directory = context.logging_config.dirpath
run_directory

# %% [markdown]
"""
After that, you can load pipeline for inference:
"""

# %%
loaded_pipeline = Pipeline.load(run_directory)
loaded_pipeline.predict(["hello, world!"])

# %% [markdown]
"""
## That's all!
"""


.. _data-aug-tuts:

Data augmentation tutorials
---------------------------

.. toctree::
   :maxdepth: 1

   balancer
   dspy_augmentation
   intent_description


.. _balancer_aug:

Balancing Datasets with DatasetBalancer
=======================================

This guide demonstrates how to use the :py:class:`autointent.generation.utterances.DatasetBalancer` class to balance class distribution in your datasets through LLM-based data augmentation. This method is a wrapper for more simple method :py:class:`autointent.generation.utterances.UtteranceGenerator`.

.. contents:: Table of Contents
    :depth: 2

Why Balance Datasets?
---------------------

Imbalanced datasets can lead to biased models that perform well on majority classes but poorly on minority classes. DatasetBalancer helps address this issue by generating additional examples for underrepresented classes using large language models.

Creating a Sample Imbalanced Dataset
-----------------------------------

Let's create a small imbalanced dataset to demonstrate the balancing process:

.. code-block:: python

    from autointent import Dataset
    from autointent.generation.utterances.balancer import DatasetBalancer
    from autointent.generation.utterances.generator import Generator
    from autointent.generation.chat_templates import EnglishSynthesizerTemplate

    # Create a simple imbalanced dataset
    sample_data = {
        "intents": [
            {"id": 0, "name": "restaurant_booking", "description": "Booking a table at a restaurant"},
            {"id": 1, "name": "weather_query", "description": "Checking weather conditions"},
            {"id": 2, "name": "navigation", "description": "Getting directions to a location"},
        ],
        "train": [
            # Restaurant booking examples (5)
            {"utterance": "Book a table for two tonight", "label": 0},
            {"utterance": "I need a reservation at Le Bistro", "label": 0},
            {"utterance": "Can you reserve a table for me?", "label": 0},
            {"utterance": "I want to book a restaurant for my anniversary", "label": 0},
            {"utterance": "Make a dinner reservation for 8pm", "label": 0},

            # Weather query examples (3)
            {"utterance": "What's the weather like today?", "label": 1},
            {"utterance": "Will it rain tomorrow?", "label": 1},
            {"utterance": "Weather forecast for New York", "label": 1},

            # Navigation example (1)
            {"utterance": "How do I get to the museum?", "label": 2},
        ]
    }

    # Create the dataset
    dataset = Dataset.from_dict(sample_data)

Setting up the Generator and Template
------------------------------------

DatasetBalancer requires two main components:

1. A :py:class:`autointent.generation.Generator`` - responsible for creating new utterances using an LLM
2. A :py:class:`autointent.generation.chat_templates.EnglishSynthesizerTemplate` - defines the prompt format sent to the LLM

Let's set up these components:

.. code-block:: python

    # Initialize a generator (uses OpenAI API by default)
    generator = Generator()

    # Create a template for generating utterances
    template = EnglishSynthesizerTemplate(dataset=dataset, split="train")

Creating the DatasetBalancer
----------------------------

Now we can create our DatasetBalancer instance:

.. code-block:: python

    balancer = DatasetBalancer(
        generator=generator,
        prompt_maker=template,
        async_mode=False,  # Set to True for faster generation with async processing
        max_samples_per_class=5,  # Each class will have exactly 5 samples after balancing
    )

Checking Initial Class Distribution
----------------------------------

Let's examine the class distribution before balancing:

.. code-block:: python

    # Check the initial distribution of classes in the training set
    initial_distribution = {}
    for sample in dataset["train"]:
        label = sample[Dataset.label_feature]
        initial_distribution[label] = initial_distribution.get(label, 0) + 1

    print("Initial class distribution:")
    for class_id, count in sorted(initial_distribution.items()):
        intent = next(i for i in dataset.intents if i.id == class_id)
        print(f"Class {class_id} ({intent.name}): {count} samples")

    print(f"\nMost represented class: {max(initial_distribution.values())} samples")
    print(f"Least represented class: {min(initial_distribution.values())} samples")

Balancing the Dataset
---------------------

Now we'll use the DatasetBalancer to augment our dataset:

.. code-block:: python

    # Create a copy of the dataset
    dataset_copy = Dataset.from_dict(dataset.to_dict())

    # Balance the training split
    balanced_dataset = balancer.balance(
        dataset=dataset_copy,
        split="train",
        batch_size=2,  # Process generations in batches of 2
    )

Checking the Results
-------------------

Let's examine the class distribution after balancing:

.. code-block:: python

    # Check the balanced distribution
    balanced_distribution = {}
    for sample in balanced_dataset["train"]:
        label = sample[Dataset.label_feature]
        balanced_distribution[label] = balanced_distribution.get(label, 0) + 1

    print("Balanced class distribution:")
    for class_id, count in sorted(balanced_distribution.items()):
        intent = next(i for i in dataset.intents if i.id == class_id)
        print(f"Class {class_id} ({intent.name}): {count} samples")

    print(f"\nMost represented class: {max(balanced_distribution.values())} samples")
    print(f"Least represented class: {min(balanced_distribution.values())} samples")

Examining Generated Examples
---------------------------

Let's look at some examples of original and generated utterances for the navigation class,
which was the most underrepresented:

.. code-block:: python

    # Navigation class (Class 2)
    navigation_class_id = 2
    intent = next(i for i in dataset.intents if i.id == navigation_class_id)

    print(f"Examples for class {navigation_class_id} ({intent.name}):")

    # Original examples
    original_examples = [
        s[Dataset.utterance_feature] for s in dataset["train"] if s[Dataset.label_feature] == navigation_class_id
    ]
    print("\nOriginal examples:")
    for i, example in enumerate(original_examples, 1):
        print(f"{i}. {example}")

    # Generated examples
    all_examples = [
        s[Dataset.utterance_feature] for s in balanced_dataset["train"] if s[Dataset.label_feature] == navigation_class_id
    ]
    generated_examples = [ex for ex in all_examples if ex not in original_examples]
    print("\nGenerated examples:")
    for i, example in enumerate(generated_examples, 1):
        print(f"{i}. {example}")

Configuring the Number of Samples per Class
------------------------------------------

You can configure how many samples each class should have:

.. code-block:: python

    # To bring all classes to exactly 10 samples
    original_dataset = Dataset.from_dict(sample_data)
    exact_template = EnglishSynthesizerTemplate(dataset=original_dataset, split="train")

    exact_balancer = DatasetBalancer(
        generator=generator,
        prompt_maker=exact_template,
        max_samples_per_class=10
    )

    # Balance to the level of the most represented class
    max_template = EnglishSynthesizerTemplate(dataset=original_dataset, split="train")

    max_balancer = DatasetBalancer(
        generator=generator,
        prompt_maker=max_template,
        max_samples_per_class=None  # Will use the count of the most represented class
    )

Tips for Effective Dataset Balancing
-----------------------------------

1. **Quality Control**: Always review a sample of generated utterances to ensure quality.
2. **Template Selection**: Different templates may work better for different domains.
3. **Model Selection**: Larger models generally produce higher quality utterances.
4. **Batch Size**: Increase batch size for faster generation if your hardware allows.
5. **Validation**: Test your model on both original and augmented data to ensure it generalizes well.


.. _evolutionary_strategy_augmentation:

DSPY Augmentation
#################

This tutorial covers the implementation and usage of an evolutionary strategy to augment utterances using DSPy. It explains how DSPy is used, how the module functions, and how the scoring metric works. This method is a wrapper for more simple method :py:class:`autointent.generation.utterances.UtteranceEvolver`.

.. contents:: Table of Contents
    :depth: 2

What is DSPy?
-------------

DSPy is a framework for optimizing and evaluating language models. It provides tools for defining signatures, optimizing modules, and measuring evaluation metrics. This module leverages DSPy to generate augmented utterances using an evolutionary approach.

How This Module Works
---------------------

This module applies an incremental evolutionary strategy for augmenting utterances. It generates new utterances based on a given dataset and refines them using an iterative process. The generated utterances are evaluated using a scoring mechanism that includes:

- **SemanticF1**: Measures how well the generated utterance matches the ground truth.
- **ROUGE-1 penalty**: Discourages excessive repetition.
- **Pipeline Decision Metric**: Assesses whether the augmented utterances improve model performance.

The augmentation process runs for a specified number of evolutions, saving intermediate models and optimizing the results.

Installation
------------

Ensure you have the required dependencies installed:

.. code-block:: bash

    pip install "autointent[dspy]"

Scoring Metric
--------------

The scoring metric consists of:

1. **SemanticF1 Score**:
   - Computes precision and recall between system-generated utterances and ground truth by LLM.
   - Uses DSPy’s `SemanticRecallPrecision` module.

2. **Repetition Factor (ROUGE-1 Penalty)**:
   - Measures overlap of words between the generated and ground truth utterances.
   - Ensures diversity in augmentation.

3. **Final Score Calculation**:
   - `Final Score = SemanticF1 * Repetition Factor`
   - A higher score means better augmentation.

Usage Example
-------------

Before running the following code, refer to the `LiteLLM documentation <https://docs.litellm.ai/docs/providers>`_ for proper model configuration.

.. code-block:: python

    import os
    os.environ["OPENAI_API_KEY"] = "your-api-key"

    from autointent import Dataset
    from autointent.custom_types import Split

    dataset = Dataset.from_hub("AutoIntent/clinc150_subset")
    evolver = DSPYIncrementalUtteranceEvolver(
        "openai/gpt-4o-mini"
    )

    augmented_dataset = evolver.augment(
        dataset,
        split_name=Split.TEST,
        n_evolutions=1,
        mipro_init_params={
            "auto": "light",
        },
        mipro_compile_params={
            "minibatch": False,
        },
    )

    augmented_dataset.to_csv("clinc150_dspy_augment.csv")


.. _intent_description_generation:

Intent Description Generation
#############################

This documentation covers the implementation and usage of the Intent Description Generation module. It explains the function of the module, the underlying mechanisms, and provides examples of usage.

The approach used in this module is based on the paper `Exploring Description-Augmented Dataless Intent Classification <https://arxiv.org/pdf/2407.17862>`_.

.. contents:: Table of Contents
    :depth: 2

Overview
--------

The Intent Description Generation module is designed to automatically generate detailed and coherent descriptions of intents using large language models (LLMs). It enhances datasets by creating human-readable explanations for intents, supplemented by examples (utterances) and regex patterns.

How the Module Works
--------------------

The module leverages prompt engineering to interact with LLMs, creating structured intent descriptions that are suitable for documentation, user interaction, and training purposes. Each generated description includes:

- **Intent Name**: Clearly identifies the intent.
- **Examples (User Utterances)**: Demonstrates real-world user inputs.
- **Regex Patterns**: Highlights relevant regex patterns associated with the intent.

The module uses a templated approach, defined through `PromptDescription`, to maintain consistency and clarity across descriptions.

Installation
------------

Ensure you have the necessary dependencies installed:

.. code-block:: bash

    pip install autointent openai

Usage
-----

Here's an example demonstrating how to generate intent descriptions:

.. code-block:: python

    import openai
    from autointent import Dataset
    from autointent.generation.intents import generate_descriptions
    from autointent.generation.chat_templates import PromptDescription

    client = openai.AsyncOpenAI(
        api_key="your-api-key"
    )

    dataset = Dataset.from_hub("AutoIntent/clinc150_subset")

    prompt = PromptDescription(
        text="Describe intent {intent_name} with examples: {user_utterances} and patterns: {regex_patterns}",
    )

    enhanced_dataset = generate_descriptions(
        dataset=dataset,
        client=client,
        prompt=prompt,
        model_name="gpt4o-mini",
    )

    enhanced_dataset.to_csv("enhanced_clinc150.csv")

Prompt Customization
--------------------

The `PromptDescription` can be customized to better fit specific requirements. It uses the following placeholders:

- ``{intent_name}``: The name of the intent being described.
- ``{user_utterances}``: Example utterances related to the intent.
- ``{regex_patterns}``: Associated regular expression patterns.

Adjusting the prompt allows tailoring descriptions to different contexts or detail levels.

Model Selection
---------------

This module supports various LLMs available through OpenAI-compatible APIs. Configure your preferred model via the `model_name` parameter. Refer to your LLM provider’s documentation for available models.

Recommended models include:

- ``gpt4o-mini`` (for balanced performance and efficiency)
- ``gpt-4`` (for maximum descriptive quality)

API Integration
---------------

Ensure your OpenAI-compatible client is properly configured with an API endpoint and key:

.. code-block:: python

    client = openai.AsyncOpenAI(
        base_url="your-api-base-url",
        api_key="your-api-key"
    )


Learn
=====

.. toctree::
    :glob:
    :maxdepth: 1
    :caption: Contents:

    ./*

AutoML and Hyperparameter Optimization
======================================

This section provides a deep dive into the theoretical foundations of automated machine learning (AutoML) and hyperparameter optimization as implemented in AutoIntent.

The Hyperparameter Optimization Problem
---------------------------------------

**The Core Problem**

Hyperparameter optimization is about finding the best configuration of settings that maximizes model performance. Think of it as searching through all possible combinations of hyperparameters (like learning rates, model sizes, regularization strengths) to find the combination that gives the best results on validation data.

The performance metric is typically estimated through cross-validation to avoid overfitting - we want configurations that work well on unseen data, not just the training data.

**The Challenge of Combinatorial Explosion**

In AutoIntent's three-stage pipeline, the total search space grows multiplicatively across all stages. If we have:

- 10 different embedding models to choose from
- 20 different scoring configurations 
- 5 different decision strategies

Then we have 10 × 20 × 5 = 1,000 total combinations. In realistic scenarios, this can easily exceed 1,000,000 configurations, making it impossible to test every combination within reasonable time and computational budgets.

Hierarchical Optimization Strategy
----------------------------------

AutoIntent addresses combinatorial explosion through a **hierarchical greedy optimization** approach that optimizes modules sequentially.

**Sequential Module Optimization**

The optimization proceeds in three stages, where each stage builds on the results of the previous one:

1. **Embedding Optimization**: First, find the best embedding model configuration by testing different models and settings, evaluating them using retrieval or classification metrics.

2. **Scoring Optimization**: Using the best embedding model from step 1, now optimize the scoring module by testing different classifiers (KNN, linear, neural networks, etc.) with various hyperparameters.

3. **Decision Optimization**: Using the best embedding and scoring combination from steps 1-2, optimize the decision module by finding optimal thresholds and decision strategies for final predictions.

**Proxy Metrics**

Each stage uses specialized proxy metrics that correlate with final performance:

- **Embedding Stage**: Retrieval metrics (NDCG, hit rate) or lightweight classification accuracy
- **Scoring Stage**: Classification metrics (F1, ROC-AUC) on validation data  
- **Decision Stage**: Threshold-specific metrics for multi-label/OOS scenarios

**Trade-offs**

- ✅ **Computational Efficiency**: Instead of testing all possible combinations (which grows exponentially), we only test combinations within each stage separately, making optimization much faster and more manageable.
- ✅ **Parallelization**: Each stage can be parallelized independently, allowing multiple configurations to be tested simultaneously.
- ⚠️ **Local Optimality**: May miss globally optimal combinations due to greedy choices - the best embedding might work better with a different scorer than the one we pick, but we won't discover this combination.

Tree-Structured Parzen Estimators (TPE)
----------------------------------------

AutoIntent uses Optuna's TPE algorithm for sophisticated hyperparameter optimization within each module. This is a form of Bayesian optimization that learns from previous trials to make smarter choices about which hyperparameters to try next.

**How TPE Works**

TPE builds two separate models:

- **Good Configuration Model**: Learns the distribution of hyperparameters that led to good performance (typically the top 25% of trials)
- **Bad Configuration Model**: Learns the distribution of hyperparameters that led to poor performance (the remaining 75% of trials)

The algorithm then suggests new hyperparameters by finding configurations that are likely under the "good" model but unlikely under the "bad" model. This naturally balances exploration (trying untested areas) with exploitation (focusing on promising regions).

**Benefits of TPE**

- **Smart Sampling**: After initial random trials, TPE makes increasingly informed decisions about which hyperparameters to try
- **Handles Different Parameter Types**: Works well with categorical, continuous, and integer parameters
- **Robust to Noisy Evaluations**: Can handle situations where the same hyperparameters might give slightly different results due to randomness
- **No Prior Knowledge Required**: Works without needing to specify complex relationships between parameters

Search Space Design
-------------------

**Parameter Types**

AutoIntent supports various hyperparameter types with appropriate sampling strategies:

AutoIntent supports several types of hyperparameters, each requiring different optimization strategies:

**Categorical Parameters**: These are discrete choices from a fixed set of options, like choosing between different model types ("knn", "linear", "bert") or activation functions ("relu", "tanh", "sigmoid"). The optimizer samples uniformly from the available choices.

**Continuous Parameters**: These are real-valued parameters like learning rates, regularization strengths, or temperature values. The optimizer can sample from uniform distributions (for parameters like dropout rates between 0.0 and 1.0) or log-uniform distributions (for parameters like learning rates that work better on logarithmic scales).

**Integer Parameters**: These are whole number parameters like the number of neighbors in KNN, hidden dimensions in neural networks, or batch sizes. The optimizer can specify step sizes and bounds to ensure valid configurations.

**Conditional Parameters**: Some parameters only make sense when certain other parameters have specific values. For example, LoRA-specific parameters (like lora_alpha and lora_r) only apply when the model type is "lora". AutoIntent handles these dependencies automatically in the search space configuration.


**Search Space Configuration**

.. code-block:: yaml

   search_space:
     - node_type: scoring
       target_metric: scoring_f1
       search_space:
         - module_name: knn
           k:
             low: 1
             high: 20
           weights: [uniform, distance, closest]
         - module_name: linear
           cv: [3, 5, 10]

Cross-Validation and Data Splitting
-----------------------------------

**Validation Schemes**

AutoIntent supports multiple validation strategies to ensure robust hyperparameter selection:

**Hold-out Validation (HO)**

Split data into training and validation sets once. Train the model on the training set and evaluate performance on the validation set. This gives a single performance score for each hyperparameter configuration.

**Cross-Validation (CV)**

Split data into K folds (typically 3-5). For each fold, train on the remaining folds and validate on the current fold. Average the performance scores across all K folds to get a more robust estimate of how well the hyperparameters work.

**Stratified Splitting**

For imbalanced datasets, AutoIntent uses stratified sampling to maintain class distributions:

.. code-block:: python

   from autointent.configs import DataConfig
   
   data_config = DataConfig(
       scheme="cv",           # Cross-validation
       n_folds=5,             # 5-fold CV
       validation_size=0.2,   # 20% for validation in HO
       separation_ratio=0.5   # Prevent data leakage between modules
   )

**Data Leakage Prevention**

The ``separation_ratio`` parameter prevents information leakage between scoring and decision modules by using different data subsets for each stage.

**Hyperparameter Bounds**

Search spaces include reasonable bounds to prevent extreme configurations:

.. code-block:: yaml

   learning_rate:
     low: 1.0e-5    # Prevent too slow learning
     high: 1.0e-2   # Prevent instability
     log: true      # Log-uniform sampling

Multi-Objective Optimization Considerations
--------------------------------------------

While AutoIntent primarily optimizes single metrics, it considers multiple objectives implicitly:

**Performance vs. Efficiency Trade-offs**

- **Model size**: Smaller models for deployment efficiency  
- **Training time**: Faster models for rapid iteration
- **Inference speed**: Optimized for production latency

**Presets as Multi-Objective Solutions**

AutoIntent provides presets that balance different objectives:

.. code-block:: python

   # Different computational budgets
   pipeline_light = Pipeline.from_preset("classic-light")    # Speed-focused
   pipeline_heavy = Pipeline.from_preset("classic-heavy")    # Performance-focused
   
   # Different model types  
   pipeline_zero_shot = Pipeline.from_preset("zero-shot-encoders")  # No training data

Bayesian Optimization Theory
-----------------------------

**Gaussian Process Surrogate Models**

While TPE uses tree-structured models, the general Bayesian optimization framework uses Gaussian Processes as surrogate models. These are probabilistic models that learn to predict performance based on previous trials, including uncertainty estimates about unexplored regions of the hyperparameter space.

**Exploration vs. Exploitation**

Bayesian optimization balances:

- **Exploitation**: Sampling near known good configurations
- **Exploration**: Sampling in uncertain regions of the space

The acquisition function mathematically encodes this trade-off.

**Convergence Properties**

TPE and related algorithms have theoretical guarantees for convergence to global optima under certain conditions, though practical performance depends on:

- Search space dimensionality
- Function smoothness  
- Available computational budget

Practical Optimization Strategies
----------------------------------

**Budget Allocation**

.. code-block:: python

   hpo_config = HPOConfig(
       sampler="tpe",
       n_trials=50,              # Total optimization budget
       n_startup_trials=10,      # Random initialization
       timeout=3600,             # 1-hour time limit
       n_jobs=4                  # Parallel trials
   )

**Warm Starting**

AutoIntent can resume interrupted optimization. This is the approximate code we use for creating optuna studies:

.. code-block:: python

   # Optimization state is automatically saved
   study = optuna.create_study(
       study_name="intent_classification",
       storage="sqlite:///optuna.db",
       load_if_exists=True
   )

Advanced Topics
---------------

**Meta-Learning**

AutoIntent's presets can be viewed as meta-learning solutions - configurations that work well across diverse datasets based on empirical analysis.

**Neural Architecture Search (NAS)**

While not fully implemented, AutoIntent's modular design supports architecture search within model families (e.g., different CNN configurations).

**Automated Feature Engineering**

AutoIntent's embedding-centric design can be seen as automated feature engineering - the system automatically learns relevant representations through selecting best fitting embedding model.


Dialogue Systems Theory and Practice
====================================

In this section, you will learn about the theoretical foundations and practical challenges of building dialogue systems, with a focus on how AutoIntent addresses these challenges.

What are Dialogue Systems?
--------------------------

A dialogue system is a computational framework that enables natural language interaction between humans and machines. These systems serve as intelligent interfaces that can understand user requests, maintain conversation context, and provide appropriate responses or actions.

**📋 Types of Dialogue Systems**

**🎯 Task-Oriented Systems**: Designed to help users accomplish specific tasks like booking flights, making restaurant reservations, or accessing bank account information. These systems typically have well-defined goals and operate within limited domains.

**💬 Open-Domain Chatbots**: Designed for general conversation without specific task constraints. Examples include social chatbots and virtual companions that can discuss various topics.

**❓ Question-Answering Systems**: Focused on providing factual answers to user questions, often by retrieving information from knowledge bases or documents.

**🔀 Hybrid Systems**: Combine multiple approaches, supporting both task-oriented interactions and general conversation.

Core Components of Dialogue Systems
-----------------------------------

Modern dialogue systems typically consist of several interconnected components:

**🧠 Natural Language Understanding (NLU)**

The NLU component processes user input and extracts structured meaning, typically including:

- **🎯 Intent Classification**: Determining what the user wants to do
- **🏷️ Entity Extraction**: Identifying specific pieces of information (names, dates, locations)
- **😊 Sentiment Analysis**: Understanding the user's emotional state or attitude

**🎛️ Dialogue Management**

This component maintains conversation state and decides what action to take next:

- **📊 State Tracking**: Keeping track of what has been discussed and what information is needed
- **🧭 Policy Learning**: Deciding what response or action is most appropriate given the current state
- **💭 Context Management**: Handling multi-turn conversations and maintaining dialogue history

**✍️ Natural Language Generation (NLG)**

Converts system decisions into natural language responses that users can understand.

**🔌 Backend Integration**

Connects to external services, databases, or APIs to fulfill user requests.

Intent Classification: The Heart of NLU
---------------------------------------

Intent classification is arguably the most critical component of task-oriented dialogue systems. It determines which service or action the user wants to invoke.

**🎯 What is an Intent?**

An intent represents the user's goal or purpose behind an utterance. For example:

- "Book a flight to Paris" → `book_flight` intent
- "What's my account balance?" → `check_balance` intent  
- "Cancel my reservation" → `cancel_booking` intent

**🤖 Intent Classification as Machine Learning**

From a technical perspective, intent classification is a text classification problem where:

- **📥 Input**: User utterance (text)
- **📤 Output**: Intent class (category)
- **📚 Training Data**: Examples of utterances paired with their corresponding intents

**⚠️ Unique Challenges in Dialogue Systems**

Intent classification in dialogue systems faces several challenges that distinguish it from general text classification:

**1️⃣ Domain Complexity and Scale**

Real-world dialogue systems often need to handle dozens or hundreds of different intents. A banking chatbot might support intents like `transfer_money`, `check_balance`, `report_fraud`, `apply_for_loan`, `find_atm`, `update_personal_info`, and many others. This scale makes manual rule-based approaches impractical.

**2️⃣ Out-of-Scope Detection**

Users don't always stay within the system's intended domain. They might ask questions the system wasn't designed to handle:

- User: "What's the weather like?" (to a banking bot)
- User: "Tell me a joke" (to a flight booking system)

The system must recognize these out-of-scope (OOS) utterances and handle them gracefully, rather than misclassifying them as valid intents.

**3️⃣ Multi-Intent Utterances**

Users sometimes express multiple intentions in a single utterance:

- "I want to book a flight to London and also check if I have enough points for an upgrade"
- "Transfer $500 to John's account and send me a confirmation email"

This requires multi-label classification where an utterance can belong to multiple intent categories simultaneously.

**4️⃣ Limited Training Data**

Collecting comprehensive training data for dialogue systems is challenging:

- **🚀 Cold Start Problem**: New domains or intents may have little or no training data
- **📉 Long Tail Distribution**: Some intents occur much less frequently than others
- **💬 Conversation Context**: Training data should ideally capture how intents appear in real conversations, not just isolated utterances

**5️⃣ Linguistic Variation**

Users express the same intent in many different ways:

- "Book me a flight" / "I need to fly somewhere" / "Can you help me travel to NYC?"
- "What's my balance?" / "How much money do I have?" / "Show me my account status"

The system must handle this linguistic variation while maintaining accuracy.

**6️⃣ Contextual Dependencies**

In multi-turn dialogues, the same utterance can have different meanings depending on context:

- User: "Book it" (could mean book a flight, hotel, or restaurant depending on previous conversation)
- User: "Yes" (confirmation, but confirmation of what?)

How AutoIntent Addresses Dialogue System Challenges
----------------------------------------------------

AutoIntent specifically addresses the key challenges faced by dialogue system developers:

**🔄 Automated Model Selection**

Instead of manually trying different approaches, AutoIntent automatically tests and compares multiple classification methods (KNN, neural networks, transformer models, etc.) to find the best approach for your specific dataset and use case.

**🚫 Out-of-Scope Detection**

AutoIntent provides built-in support for OOS detection through:

- **📊 Confidence Thresholding**: Rejecting predictions below a certain confidence level
- **🎯 Specialized Decision Modules**: Like `JinoosDecision` and `TunableDecision` that are designed for OOS scenarios
- **⚖️ Threshold Optimization**: Automatically finding the best confidence thresholds that balance precision and recall

**🏷️ Multi-Label Classification**

AutoIntent natively supports multi-label scenarios through:

- **🤖 Multi-Label Aware Algorithms**: Methods like `MLKnnScorer` designed specifically for multi-label tasks
- **📈 Adaptive Thresholding**: The `AdaptiveDecision` module can set different thresholds for different intent classes

**🎯 Few-Shot Learning**

AutoIntent excels in scenarios with limited training data through:

- **🔍 Embedding-Based Methods**: KNN and similarity-based approaches that work well with few examples
- **⚡ Zero-Shot Capabilities**: Using intent descriptions instead of training examples
- **🔄 Transfer Learning**: Leveraging pre-trained models and embeddings

**⚙️ Hyperparameter Optimization**

AutoIntent eliminates the need for manual hyperparameter tuning through automated optimization, saving significant development time.

Multi-Turn Dialogue Considerations
-----------------------------------

While AutoIntent focuses primarily on single-utterance intent classification, real dialogue systems must handle multi-turn conversations.

**🔗 Context Propagation**

In multi-turn scenarios, context from previous turns affects intent classification:

- **Turn 1**: "I want to book a flight"
- **Turn 2**: "Make it for tomorrow" (context: still talking about flight booking)
- **Turn 3**: "Actually, change that to next week" (context: modifying the previous request)

**📋 Session Management**

Dialogue systems must maintain session state across multiple interactions:

- **👤 User Identity**: Who is the user?
- **📜 Conversation History**: What has been discussed?
- **💾 Slot Values**: What information has been collected?
- **🎯 Current Goal**: What is the user trying to accomplish?

**🔌 Integration with AutoIntent**

AutoIntent cannot directly be integrated into multi-turn system for now, but here are a few things to bridge the gap:

1. **🔍 Processing Each Turn**: Using AutoIntent to classify each user utterance
2. **📝 Context Enrichment**: Adding conversation context as text features to improve classification

Practical Applications and Use Cases
------------------------------------

**📞 Customer Service Chatbots**

- **🏦 Banking**: Account inquiries, transactions, fraud reporting
- **🛒 E-commerce**: Order tracking, returns, product recommendations
- **📡 Telecommunications**: Bill payments, service upgrades, technical support
- **✈️ Travel**: Flight, hotel, and car rental bookings
- **🏥 Healthcare**: Appointment scheduling, prescription refills
- **🍕 Food Services**: Restaurant reservations, food delivery


**🎙️ Voice Assistants**

- **🏠 Smart Home**: Device control, automation setup
- **🎵 Entertainment**: Music playback, content search
- **📅 Productivity**: Calendar management, reminders, note-taking

**📅 Booking Systems**


**🤖 LLM Agents and Tool Selection**

Modern AI systems increasingly rely on Large Language Models (LLMs) that can use external tools and APIs to accomplish complex tasks. These systems, often called "AI agents" need to determine which tools to use for specific user requests.

- **⚡ Function Calling**: LLMs like GPT-4, Claude, or Llama can be equipped with function-calling capabilities to use external APIs, databases, or computational tools
- **🔧 Tool Orchestration**: Complex agents that combine multiple tools (web search, calculator, database queries, file operations) based on user needs
- **⚙️ Workflow Automation**: Systems that can execute multi-step processes by selecting appropriate tools in sequence

**🚀 Performance Advantages of AutoIntent for LLM Systems**

Even when using powerful LLMs, AutoIntent can provide significant advantages:

**⚡ Latency Optimization**: API calls to distant LLM servers typically take 500-4000ms, while local ML model predictions with AutoIntent can complete much faster. For tool selection in real-time applications, this speed difference is crucial.

**💰 Cost Efficiency**: Local intent classification reduces the number of expensive LLM API calls by pre-filtering and routing requests to appropriate tools without requiring LLM reasoning.

**🔒 Reliability**: Local models provide consistent performance without dependency on external API availability, rate limits, or network connectivity issues.

**🛡️ Privacy**: Sensitive user requests can be classified locally without sending data to external LLM providers.

**🔄 Hybrid Architecture Benefits**

- **⚡ Fast Intent Routing**: Use AutoIntent to quickly classify user requests and route them to appropriate specialized tools or LLM prompts
- **🎯 Tool Pre-selection**: Narrow down the set of available tools before presenting options to the LLM, improving accuracy and reducing hallucination
- **🔄 Fallback Strategies**: When local classification is confident, execute actions directly; when uncertain, escalate to LLM for more sophisticated reasoning
- **🤖 Multi-Agent Coordination**: Route different types of requests to specialized LLM agents based on local intent classification

Slots and Entity Extraction
----------------------------

While AutoIntent focuses primarily on intent classification, understanding slots (also called entities) is crucial for building complete dialogue systems.

**❓ What are Slots?**

Slots are specific pieces of information that the system needs to extract from user utterances to fulfill their requests. They represent the parameters or arguments required by the intended action.

**💡 Examples of Slots**

For a flight booking intent, relevant slots might include:

- **🛫 Departure City**: "I want to fly from New York"
- **🛬 Destination City**: "to London"  
- **📅 Date**: "on March 15th"
- **👥 Number of Passengers**: "for two people"
- **💺 Class**: "in business class"

**📋 Types of Slots**

**📝 Categorical Slots**: Fixed set of possible values

- Seat class: economy, business, first
- Payment method: credit card, debit card, PayPal

**🔢 Numerical Slots**: Numeric values

- Number of passengers: 1, 2, 3, 4...
- Amount to transfer: $100, $500, $1,250

**⏰ Temporal Slots**: Date and time information

- Departure date: "tomorrow", "March 15th", "next Friday"
- Time: "morning", "3 PM", "around noon"

**📍 Location Slots**: Geographic information

- Cities: "New York", "London", "Tokyo"
- Addresses: "123 Main Street", "downtown area"

**👤 Named Entity Slots**: Proper nouns

- Person names: "John Smith", "Maria Garcia"
- Organization names: "Chase Bank", "Delta Airlines"

**🔗 Relationship Between Intents and Slots**

Different intents require different slots:

- `book_flight` intent needs: departure_city, destination_city, date, passengers
- `transfer_money` intent needs: amount, recipient, account_type
- `check_weather` intent needs: location, date/time

**🔌 Integration with AutoIntent**

While AutoIntent doesn't directly handle slot extraction, it can be integrated with slot filling systems:

1. **🎯 Intent Classification First**: Use AutoIntent to determine the user's intent
2. **🏷️ Slot Extraction**: Based on the predicted intent, apply appropriate slot extraction models
3. **🤝 Joint Training**: Use intent predictions as features for slot extraction models

**🛠️ Popular Slot Extraction Approaches**

- **📝 Rule-Based**: Regular expressions and pattern matching
- **📊 Classical ML**: CRF (Conditional Random Fields)
- **🧠 Neural Approaches**: BERT-based NER models, BiLSTM-CRF
- **🤖 Joint Models**: Models that predict intents and slots simultaneously (encoders like BERT or LLMs like GPT)

Dialogue Management and Flow Control
-------------------------------------

Dialogue management orchestrates the conversation flow and determines system actions based on the current state.

**📊 State Representation**

The dialogue state typically includes:

- **🎯 Current Intent**: What the user wants to do
- **💾 Slot Values**: Information collected so far
- **📜 Dialogue History**: Previous turns and actions
- **👤 User Profile**: Persistent information about the user
- **🌍 Context**: External information (time, location, etc.)

**🌊 Dialogue Flow Patterns**

**📏 Linear Flows**: Predetermined sequence of steps

1. Collect departure city
2. Collect destination city  
3. Collect travel date
4. Confirm booking
5. Process payment

**🔀 Branching Flows**: Different paths based on conditions

- If user is premium member → offer upgrade options
- If destination requires visa → provide visa information
- If date is invalid → ask for alternative dates

**🤝 Mixed-Initiative**: Both user and system can drive the conversation

- System can ask clarifying questions
- User can provide unrequested information
- System can make proactive suggestions

**🛠️ Error Handling and Recovery**

**❌ Recognition Errors**: When the system misunderstands user input

- Confidence scoring to detect uncertain predictions
- Confirmation strategies ("Did you say London?")
- Graceful fallback to human agents

**💥 Task Completion Failures**: When the system cannot fulfill the request

- Alternative suggestions
- Partial completion with explanation
- Escalation procedures

**😕 User Confusion**: When users don't understand the system

- Help messages and tutorials
- Progressive disclosure of capabilities
- Context-sensitive guidance

**🧭 Dialogue Policies**

**📋 Rule-Based Policies**: Hand-crafted decision trees

- Simple and predictable
- Easy to debug and modify
- Limited scalability and flexibility

**🤖 Machine Learning Policies**: Learned from data

- Reinforcement learning approaches
- Supervised learning from conversation logs
- Better handling of complex scenarios

**🧠 LLM-Based Policies**: Leveraging large language models for dialogue management

- Use LLMs (e.g., GPT, Llama) to generate system responses dynamically
- Few-shot or zero-shot prompting for intent recognition and slot filling
- Can handle open-domain and complex, unanticipated user inputs
- Requires careful prompt engineering and safety controls
- May be combined with retrieval-augmented generation for factual accuracy

**🔄 Hybrid Approaches**: Combining rules and learning

- Rules for critical paths and constraints
- ML for optimization and personalization
- Best of both worlds approach

Production Considerations for Dialogue Systems
----------------------------------------------

There are lots of considerations to think about, but the one where AutoIntent can help is **🔄 Automated Model Update**. AutoIntent's AutoML capabilities enable periodic retraining and updating of classifiers with new data, ensuring the system stays accurate and up-to-date with minimal manual intervention.


Conclusion
----------

This comprehensive understanding of dialogue systems provides the context for how AutoIntent fits into the broader ecosystem. While AutoIntent specifically excels at the intent classification component, understanding the full picture helps developers build more effective and robust conversational systems.


Optimization
============

In this section, you will learn how hyperparameter optimization works in our library.

Pipeline
--------

The entire process of configuring a classifier in our library is divided into sequential steps (:ref:`and that's why <concepts-pipeline>`):

1. Selecting an embedder (EmbeddingNode)
2. Selecting a classifier (ScoringNode)
3. Selecting a decision rule (PredictionNode)

Each step has its own set of hyperparameters. To theoretically guarantee finding the ideal configuration through exhaustive search, it is necessary to check every element of the Cartesian product of the hyperparameter sets of these steps (grid search). In practice, achieving this is usually impossible because the number of combinations is too large.

Greedy Strategy
---------------

This is one of the ways to solve the problem of an overwhelming number of combinations. In our case, the greedy optimization algorithm is as follows:

1. Iterate through the hyperparameters of the embedder and fix the best one.
2. Iterate through the hyperparameters of the classifier and fix the best one.
3. Iterate through the hyperparameters of the decision rule and fix the best one.

This algorithm checks fewer combinations, which speeds up the process. To implement such an algorithm, it is necessary to be able to evaluate the quality of not only the final prediction of the entire pipeline but also its intermediate predictions. The main drawback of this approach is that the decisions made are optimal only locally, not globally. The metrics for evaluating intermediate predictions are only a proxy signal for the quality of the final prediction.

Random Search
-------------

A simpler strategy is to take a random subset of the full search space (random grid search). A straightforward strategy is to iterate through all combinations in random order until a certain time budget is exhausted.

This approach is less intelligent than the greedy strategy because, at any moment during the random combination search, poor embedders or any other bad parameters might keep appearing, despite they have been tested already. The greedy strategy would have eliminated such embedders at the beginning and not revisited them. On the other hand, random search, by its nature, does not rely on any local decisions.

Bayesian Optimization
---------------------

This is similar to random search over a subset, but during the search, we attempt to model the probabilistic space of hyperparameters. This allows us to avoid repeating hyperparameter values that have previously performed poorly. The search itself aims to balance exploration and exploitation.

This approach is more sophisticated and can lead to better results by intelligently exploring the hyperparameter space.


Text Embeddings and Representation Learning
===========================================

In this section, you will learn about the theoretical foundations of text embeddings and how AutoIntent leverages them for efficient intent classification.

What are Text Embeddings?
--------------------------

Text embeddings are dense vector representations of text that capture semantic meaning in a continuous vector space. Unlike traditional bag-of-words approaches that treat words as discrete tokens, embeddings map text to points in a high-dimensional space where semantically similar texts are located close to each other.

**Mathematical Foundation**

An embedding function :math:`f: \mathcal{T} \rightarrow \mathbb{R}^d` maps text :math:`t \in \mathcal{T}` to a dense vector :math:`\mathbf{e} \in \mathbb{R}^d`, where :math:`d` is the embedding dimension (typically 384, 768, or 1024). The key property is that semantic similarity in text space translates to geometric proximity in embedding space:

.. math::
   \text{semantic_similarity}(t_1, t_2) \approx \cos(\mathbf{e}_1, \mathbf{e}_2)

where :math:`\cos(\mathbf{e}_1, \mathbf{e}_2) = \frac{\mathbf{e}_1 \cdot \mathbf{e}_2}{||\mathbf{e}_1|| \cdot ||\mathbf{e}_2||}`

Transformer-Based Embeddings
-----------------------------

AutoIntent primarily uses transformer-based embedding models, which have revolutionized natural language processing through their attention mechanisms and contextual representations.

**Sentence Transformers**

The library leverages the `sentence-transformers <https://www.sbert.net/>`_ framework, which provides pre-trained models specifically optimized for semantic similarity tasks. These models are fine-tuned versions of BERT, RoBERTa, or other transformer architectures that produce high-quality sentence-level embeddings.

**Key Advantages:**

1. **Contextual Understanding**: Unlike word2vec or GloVe, transformer embeddings understand context. The word "bank" will have different representations in "river bank" vs. "money bank."

2. **Cross-lingual Capabilities**: Many models support multiple languages, crucial for dialog systems serving diverse users.

3. **Task Adaptation**: Models can be fine-tuned for specific domains or similarity tasks.

**Model Types in AutoIntent:**

- **Bi-encoders**: Encode texts independently, enabling efficient pre-computation and caching
- **Cross-encoders**: Process text pairs jointly for higher accuracy but at computational cost


Task-Specific Prompting
-----------------------

AutoIntent supports task-specific prompts to optimize embedding quality for different use cases.

Different tasks may benefit from different prompting strategies:

.. code-block:: python

   # Query prompt for search
   query_embeddings = embedder.embed(queries, TaskTypeEnum.query)
   
   # Passage prompt for documents
   doc_embeddings = embedder.embed(documents, TaskTypeEnum.passage)
   
   # Classification prompt for intents
   intent_embeddings = embedder.embed(utterances, TaskTypeEnum.classification)

Embedding Quality and Evaluation
---------------------------------

AutoIntent evaluates embedding quality using retrieval metrics:

- **NDCG** (Normalized Discounted Cumulative Gain)
- **Hit Rate** (Proportion of relevant items in top-k results)
- **Precision@k** and **Recall@k**


Practical Applications in Dialog Systems
-----------------------------------------

**Intent Classification Pipeline**

1. **User utterance**: "I want to book a flight to Paris"
2. **Embedding**: Convert to 768-dimensional vector
3. **Similarity search**: Find nearest training examples
4. **Classification**: Use embedding-based classifier (KNN, linear, etc.)
5. **Decision**: Apply confidence thresholds for final prediction

**Zero-Shot Classification**

Using intent descriptions for classification without training data:

.. code-block:: python

   from autointent.modules.scoring import BiEncoderDescriptionScorer
   
   scorer = BiEncoderDescriptionScorer()
   
   # Intent descriptions instead of training data
   descriptions = [
       "User wants to book a flight",
       "User wants to cancel a reservation",
       "User asks about flight status"
   ]
   
   scorer.fit([], [], descriptions)
   predictions = scorer.predict(["I want to fly to London"])

**Few-Shot Learning**

Embeddings excel in few-shot scenarios where limited training data is available. AutoIntent's k-NN based methods are particularly effective.