AutoML Customization

In this guide, you will learn how to configure a custom hyperparameter search space.

Python API

Before reading this guide, we recommend familiarizing yourself with the Concepts and Optimization sections.

Optimization Module

To configure an optimization module, create a dictionary like the following:

knn_module = {
    "module_name": "knn",
    "k": [1, 5, 10, 50],
    "embedder_config": ["sergeyzh/rubert-tiny-turbo"],
}

The module_name field specifies the name of the module. You can list the available module names yourself:

from autointent.modules import DECISION_MODULES, EMBEDDING_MODULES, REGEX_MODULES, SCORING_MODULES

print(list(SCORING_MODULES.keys()))
print(list(DECISION_MODULES.keys()))
print(list(EMBEDDING_MODULES.keys()))
print(list(REGEX_MODULES.keys()))
['catboost', 'dnnc', 'gcn', 'knn', 'linear', 'description_bi', 'description_cross', 'description_llm', 'rerank', 'sklearn', 'mlknn', 'bert', 'cnn', 'lora', 'ptuning', 'rnn']
['argmax', 'jinoos', 'threshold', 'tunable', 'adaptive']
['retrieval', 'logreg_embedding']
['simple']
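To catch a misspelled module_name before building a pipeline, you can check names against the registries printed above. The sketch below is a hypothetical helper, not part of AutoIntent: the sets are copied by hand from the output above, and the mapping from node types to registries is an assumption. With AutoIntent installed, you could build the sets from SCORING_MODULES.keys() and the other registries instead.

```python
# Hypothetical mapping from node type to registered module names, copied from
# the printed output above (assumption: node types map one-to-one to registries).
AVAILABLE_MODULES: dict[str, set[str]] = {
    "embedding": {"retrieval", "logreg_embedding"},
    "scoring": {
        "catboost", "dnnc", "gcn", "knn", "linear", "description_bi",
        "description_cross", "description_llm", "rerank", "sklearn",
        "mlknn", "bert", "cnn", "lora", "ptuning", "rnn",
    },
    "decision": {"argmax", "jinoos", "threshold", "tunable", "adaptive"},
}


def find_unknown_modules(node_type: str, modules: list[dict]) -> list[str]:
    """Return the module names that are not registered for the given node type."""
    known = AVAILABLE_MODULES.get(node_type, set())
    return [m["module_name"] for m in modules if m["module_name"] not in known]


print(find_unknown_modules("scoring", [{"module_name": "knn"}, {"module_name": "lineer"}]))
```

For an unknown node type the helper reports every module, which makes a typo in node_type visible as well.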

All fields except module_name are lists that define the candidate values for each hyperparameter (see KNNScorer). If you omit them, the default set of hyperparameters is used:

linear_module = {"module_name": "linear"}

See the LinearScorer documentation.
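Because every hyperparameter field is a list of candidates, a module config describes a grid of concrete configurations. The plain-Python sketch below enumerates that grid and rejects non-list values. The function expand_module is hypothetical and only illustrates the convention; it is not how AutoIntent or Optuna sample internally.

```python
from collections.abc import Iterator
from itertools import product
from typing import Any


def expand_module(module: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every concrete hyperparameter combination of one module config.

    Illustration of the list-valued convention only; not an AutoIntent API.
    """
    name = module["module_name"]
    params = {key: value for key, value in module.items() if key != "module_name"}
    # Every field except module_name must be a list of candidate values.
    for key, value in params.items():
        if not isinstance(value, list):
            raise TypeError(f"{name}.{key} must be a list of candidate values, got {value!r}")
    keys = list(params)
    # product() over zero lists yields one empty tuple, so a module with no
    # listed hyperparameters (library defaults) yields a single config.
    for values in product(*(params[key] for key in keys)):
        yield {"module_name": name, **dict(zip(keys, values))}


knn_module = {
    "module_name": "knn",
    "k": [1, 5, 10, 50],
    "embedder_config": ["sergeyzh/rubert-tiny-turbo"],
}
configs = list(expand_module(knn_module))
print(len(configs))  # 4
print(configs[0])  # {'module_name': 'knn', 'k': 1, 'embedder_config': 'sergeyzh/rubert-tiny-turbo'}
print(list(expand_module({"module_name": "linear"})))  # [{'module_name': 'linear'}]
```

Writing "k": 5 instead of "k": [5] raises a TypeError here, which is the most common mistake when editing a search space by hand.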

Optimization Node

To configure an optimization node, specify the node type, the target metric to optimize, and the list of candidate modules to search over:

scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [
        knn_module,
        linear_module,
    ],
}
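The modules in a node's search_space are alternatives, so the explicitly listed grid of a node is the sum, over its modules, of the product of each module's list lengths. A minimal sketch with a hypothetical helper node_grid_size; a module without listed hyperparameters counts as 1 here, although the library may search over its own defaults, and samplers such as TPE do not necessarily evaluate every combination.

```python
from math import prod
from typing import Any


def node_grid_size(node: dict[str, Any]) -> int:
    """Count the explicitly listed combinations across all modules of a node."""
    total = 0
    for module in node["search_space"]:
        lengths = [len(value) for key, value in module.items() if key != "module_name"]
        total += prod(lengths)  # prod([]) == 1 for modules relying on defaults
    return total


scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [
        {"module_name": "knn", "k": [1, 5, 10, 50], "embedder_config": ["sergeyzh/rubert-tiny-turbo"]},
        {"module_name": "linear"},
    ],
}
print(node_grid_size(scoring_node))  # 5
```

A quick count like this helps when deciding how many trials a search is worth, since adding one more candidate value multiplies rather than adds to a module's grid.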

Search Space

The search space for the entire pipeline is a list of node configurations, for example:

from typing import Any

search_space: list[dict[str, Any]] = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {
                "module_name": "dnnc",
                "cross_encoder_config": ["DiTy/cross-encoder-russian-msmarco"],
                "k": [1, 3, 5, 10],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]
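Since the search space consists only of lists, dicts, strings, and numbers, it can be stored as JSON and reviewed before a long run. The sketch below uses only the standard library; summarize is a hypothetical helper, not part of AutoIntent, and whether AutoIntent reads such files directly is not covered here. You can always load the file yourself and pass the resulting list to Pipeline.from_search_space as in this guide.

```python
import json
from typing import Any


def summarize(search_space: list[dict[str, Any]]) -> list[tuple[str, str, list[str]]]:
    """Return (node_type, target_metric, module names) for each node, in order."""
    return [
        (node["node_type"], node["target_metric"], [m["module_name"] for m in node["search_space"]])
        for node in search_space
    ]


# A trimmed copy of the search space above, so this snippet runs on its own.
example: list[dict[str, Any]] = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [{"module_name": "retrieval", "k": [10]}],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [{"module_name": "knn", "k": [1, 3, 5, 10]}, {"module_name": "linear"}],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]

# Only JSON-compatible types are involved, so the round trip is lossless.
restored = json.loads(json.dumps(example, indent=2))

for node_type, metric, modules in summarize(restored):
    print(f"{node_type:<10} {metric:<20} {modules}")
```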

Load Data

Let us use a small subset of the popular clinc150 dataset:

from autointent import Dataset

dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")

Start Auto Configuration

from autointent import Pipeline

pipeline_optimizer = Pipeline.from_search_space(search_space)
pipeline_optimizer.fit(dataset)
Memory storage is not compatible with resuming optimization. Modules from previous runs won't be available. Set dump_modules=True in LoggingConfig to enable proper resuming.
Storage directory must be provided for study persistence.
[I 2026-06-17 19:12:53,675] A new study created in memory with name: NodeType.embedding
Storage directory must be provided for study persistence.
Storage directory must be provided for study persistence.
"argmax" is NOT designed to handle OOS samples, but your data contains it. So, using this method reduces the power of classification.
<autointent.context._context.Context at 0x7faeeb971640>

The following hyperparameter tuning samplers are available:

  • “random”

  • “tpe”

All the samplers are implemented with optuna.
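To make the "random" option concrete, the sketch below draws configurations uniformly at random from a node's search space: it picks a module, then one candidate per hyperparameter. This is a simplified illustration, not AutoIntent's or Optuna's implementation; in particular, choosing the module uniformly is an assumption, and the function sample_random is hypothetical. Unlike this sketch, TPE uses the scores of earlier trials to propose promising values.

```python
import random
from typing import Any


def sample_random(node: dict[str, Any], n_trials: int, seed: int = 0) -> list[dict[str, Any]]:
    """Draw n_trials configs uniformly at random from a node's search space."""
    rng = random.Random(seed)  # seeded so repeated runs give the same trials
    trials: list[dict[str, Any]] = []
    for _ in range(n_trials):
        # Assumption: each candidate module is equally likely to be chosen.
        module = rng.choice(node["search_space"])
        trial: dict[str, Any] = {"module_name": module["module_name"]}
        for key, candidates in module.items():
            if key != "module_name":
                trial[key] = rng.choice(candidates)  # one value per hyperparameter
        trials.append(trial)
    return trials


example_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [
        {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
        {"module_name": "linear"},
    ],
}
for trial in sample_random(example_node, n_trials=5, seed=42):
    print(trial)
```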

For finer control, you can use the more versatile OptimizationConfig together with from_optimization_config.

See Also