Run reporting

This notebook demonstrates how to report the progress and results of pipeline optimization with the AutoIntent library.

[1]:

search_space = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {
                "module_name": "dnnc",
                "cross_encoder_config": ["cross-encoder/ms-marco-MiniLM-L6-v2"],
                "k": [1, 3, 5, 10],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]
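To get a feel for the size of this search space, the sketch below counts the candidate configurations per node. It assumes that every list-valued hyperparameter is expanded as an exhaustive grid; that is an assumption made for illustration, and the library's optimizer may sample candidates instead of enumerating them. The helper names `count_module_variants` and `count_node_variants` are hypothetical and not part of AutoIntent.

```python
from math import prod

# A copy of the search space from the cell above, so the sketch is self-contained.
search_space = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {
                "module_name": "dnnc",
                "cross_encoder_config": ["cross-encoder/ms-marco-MiniLM-L6-v2"],
                "k": [1, 3, 5, 10],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]


def count_module_variants(module_space: dict) -> int:
    """Count grid points for one module entry (hypothetical helper)."""
    sizes = [
        len(values)
        for name, values in module_space.items()
        if name != "module_name" and isinstance(values, list)
    ]
    # A module without list-valued hyperparameters has exactly one variant.
    return prod(sizes)


def count_node_variants(node: dict) -> int:
    """Sum the variants of all modules listed under one node."""
    return sum(count_module_variants(module) for module in node["search_space"])


counts = {node["node_type"]: count_node_variants(node) for node in search_space}
print(counts)  # {'embedding': 2, 'scoring': 17, 'decision': 2}
```

Under the grid assumption, the scoring node dominates: 12 knn variants, 1 linear variant, and 4 dnnc variants.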

Load Data

Let us use a small subset of the popular clinc150 dataset:

[2]:

from autointent import Dataset

dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")

Start Auto Configuration

[3]:

from autointent import Pipeline

pipeline_optimizer = Pipeline.from_search_space(search_space)

Reporting

Currently supported reporting options are:

tensorboard
wandb
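As a minimal sketch, switching to the other backend should only require changing the `report_to` value. The keyword arguments below are the same ones this notebook passes to `LoggingConfig` for tensorboard; the run name `test_wandb` is a hypothetical placeholder, and the snippet assumes that `"wandb"` is accepted in the same list form. The import is guarded so the fragment can be read and run even where AutoIntent is not installed.

```python
from pathlib import Path

# Same keyword arguments as the tensorboard configuration in this notebook,
# with only the reporting backend (and an illustrative run name) changed.
wandb_kwargs = {
    "run_name": "test_wandb",  # hypothetical run name
    "report_to": ["wandb"],
    "project_dir": Path("my_projects"),
    "dump_modules": False,
}

try:
    from autointent.configs import LoggingConfig
except ImportError:
    # AutoIntent is not installed; keep only the plain keyword arguments.
    LoggingConfig = None

if LoggingConfig is not None:
    log_config = LoggingConfig(**wandb_kwargs)
```

Note that Weights & Biases itself normally expects credentials, for example via `wandb login` or the `WANDB_API_KEY` environment variable, before a run can be uploaded.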

[4]:

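from pathlib import Path

from autointent.configs import LoggingConfig

log_config = LoggingConfig(
    run_name="test_tensorboard",  # name of this optimization run
    report_to=["tensorboard"],  # reporting backends to enable
    project_dir=Path("my_projects"),
    dump_modules=False,  # modules stay in memory, so the run cannot be resumed later
)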

pipeline_optimizer.set_config(log_config)

[5]:

pipeline_optimizer.fit(dataset)

Memory storage is not compatible with resuming optimization. Modules from previous runs won't be available. Set dump_modules=True in LoggingConfig to enable proper resuming.
Storage directory must be provided for study persistence.
[I 2025-08-01 06:19:02,956] A new study created in memory with name: embedding
Storage directory must be provided for study persistence.
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/torch/nn/modules/module.py:1762: FutureWarning: `encoder_attention_mask` is deprecated and will be removed in version 4.55.0 for `BertSdpaSelfAttention.forward`.
  return forward_call(*args, **kwargs)
Storage directory must be provided for study persistence.
"argmax" is NOT designed to handle OOS samples, but your data contains it. So, using this method reduces the power of classification.
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1731: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])

[5]:

<autointent.context._context.Context at 0x7f3ac76a36a0>

The results of the optimization process can now be viewed in TensorBoard:

tensorboard --logdir test_tensorboard
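If the dashboard comes up empty, it can help to check whether any event files were written at all. The sketch below searches a directory tree for files that follow TensorBoard's usual `events.out.tfevents.*` naming. The helper `find_event_files` is hypothetical, and the directory name is simply the one passed to `--logdir` above; where exactly AutoIntent places its logs relative to `project_dir` is not confirmed here.

```python
from pathlib import Path


def find_event_files(logdir):
    """Return TensorBoard event files found anywhere under logdir, sorted by path."""
    root = Path(logdir)
    if not root.is_dir():
        # Nothing to search if the directory does not exist.
        return []
    return sorted(p for p in root.rglob("events.out.tfevents.*") if p.is_file())


# Directory name taken from the `tensorboard --logdir` command above.
event_files = find_event_files("test_tensorboard")
print(f"{len(event_files)} event file(s) found")
for path in event_files:
    print(path)
```

An empty result suggests pointing `--logdir` at a different directory, for example one under the configured `project_dir`.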