AutoML Customization#
In this guide, you will learn how to configure a custom hyperparameter search space.
Python API#
Before reading this guide, we recommend familiarizing yourself with the Concepts and Optimization sections.
Optimization Module#
To set up the optimization module, you need to create the following dictionary:
[1]:
knn_module = {
    "module_name": "knn",
    "k": [1, 5, 10, 50],
    "embedder_config": ["sergeyzh/rubert-tiny-turbo"],
}
The module_name field specifies the name of the module. You can list the available names yourself:
[2]:
from autointent.modules import DECISION_MODULES, EMBEDDING_MODULES, REGEX_MODULES, SCORING_MODULES
print(list(SCORING_MODULES.keys()))
print(list(DECISION_MODULES.keys()))
print(list(EMBEDDING_MODULES.keys()))
print(list(REGEX_MODULES.keys()))
['catboost', 'dnnc', 'gcn', 'knn', 'linear', 'description_bi', 'description_cross', 'description_llm', 'rerank', 'sklearn', 'mlknn', 'bert', 'cnn', 'lora', 'ptuning', 'rnn']
['argmax', 'jinoos', 'threshold', 'tunable', 'adaptive']
['retrieval', 'logreg_embedding']
['simple']
All fields except module_name are lists that define the search space for each hyperparameter (see KNNScorer). If you omit them, the default set of hyperparameters will be used:
[3]:
linear_module = {"module_name": "linear"}
See the LinearScorer docs.
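To make the idea of a list-valued search space concrete, the sketch below expands a module dictionary into the individual hyperparameter combinations it describes. It uses only the standard library. The helper name `expand_module` is hypothetical and is not part of AutoIntent, and the optimizer may sample from this space rather than enumerate it.

```python
from itertools import product


def expand_module(module: dict) -> list[dict]:
    """Enumerate every hyperparameter combination described by a module dict.

    Hypothetical helper for illustration only; not part of AutoIntent.
    """
    # Every key except "module_name" maps to a list of candidate values.
    names = [key for key in module if key != "module_name"]
    value_lists = [module[name] for name in names]
    # product() with no arguments yields one empty tuple, so a module that
    # only sets "module_name" expands to a single entry with no overrides.
    return [dict(zip(names, values)) for values in product(*value_lists)]


knn_module = {
    "module_name": "knn",
    "k": [1, 5, 10, 50],
    "embedder_config": ["sergeyzh/rubert-tiny-turbo"],
}
linear_module = {"module_name": "linear"}

for combination in expand_module(knn_module):
    print(combination)
```

Here `knn_module` describes four combinations (one per value of `k`), while `linear_module` yields a single empty entry, which stands in for the default hyperparameters.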
Optimization Node#
To set up the optimization node, you need to create a list of modules and specify the target metric for optimization:
[4]:
scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [
        knn_module,
        linear_module,
    ],
}
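Because a node is a plain dictionary, a typo in a key name is easy to miss. The following sketch checks a node against the structure described in this guide before it is passed on. The helper `check_node` and its rules are assumptions derived from the examples here, not an AutoIntent API, and the library's own validation may accept more than this sketch does.

```python
REQUIRED_NODE_KEYS = {"node_type", "target_metric", "search_space"}


def check_node(node: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper; the rules are assumptions taken from this guide.
    """
    problems = []
    missing = REQUIRED_NODE_KEYS - node.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    for index, module in enumerate(node.get("search_space", [])):
        if "module_name" not in module:
            problems.append(f"module {index}: no 'module_name'")
        for name, values in module.items():
            # Assumption: every hyperparameter is given as a list of candidates.
            if name != "module_name" and not isinstance(values, list):
                problems.append(f"module {index}: '{name}' should be a list")
    return problems


scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [
        {"module_name": "knn", "k": [1, 5, 10, 50]},
        {"module_name": "linear"},
    ],
}

print(check_node(scoring_node))
```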
Search Space#
The search space for the entire pipeline looks approximately like this:
[5]:
search_space = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {
                "module_name": "dnnc",
                "cross_encoder_config": ["DiTy/cross-encoder-russian-msmarco"],
                "k": [1, 3, 5, 10],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]
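Before starting a run, it can help to see how large each node's search space is. The sketch below counts the listed hyperparameter combinations per node using only the standard library. The helpers `count_candidates` and `summarize` are hypothetical, and the count treats a module without explicit hyperparameters as a single candidate, which is an assumption about how defaults behave.

```python
from math import prod


def count_candidates(module: dict) -> int:
    """Number of hyperparameter combinations listed for one module."""
    # prod() of an empty sequence is 1, so a module that only sets
    # "module_name" counts as a single candidate.
    return prod(len(values) for name, values in module.items() if name != "module_name")


def summarize(search_space: list[dict]) -> dict[str, int]:
    """Map each node type to the total number of listed candidates."""
    return {
        node["node_type"]: sum(count_candidates(m) for m in node["search_space"])
        for node in search_space
    }


search_space = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {"module_name": "dnnc", "cross_encoder_config": ["DiTy/cross-encoder-russian-msmarco"], "k": [1, 3, 5, 10]},
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]

for node_type, total in summarize(search_space).items():
    print(f"{node_type}: {total} candidate configuration(s)")
```

For the search space above, the scoring node dominates: the `knn` entry alone lists 4 × 3 = 12 combinations.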
Load Data#
Let us use a small subset of the popular clinc150 dataset:
[6]:
from autointent import Dataset
dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")
Start Auto Configuration#
[7]:
from autointent import Pipeline
pipeline_optimizer = Pipeline.from_search_space(search_space)
pipeline_optimizer.fit(dataset)
Memory storage is not compatible with resuming optimization. Modules from previous runs won't be available. Set dump_modules=True in LoggingConfig to enable proper resuming.
Storage directory must be provided for study persistence.
[I 2026-05-20 09:46:13,127] A new study created in memory with name: NodeType.embedding
"argmax" is NOT designed to handle OOS samples, but your data contains it. So, using this method reduces the power of classification.
[7]:
<autointent.context._context.Context at 0x7f70c3a4b2c0>
The following hyperparameter tuning samplers are available:
“random”
“tpe”
All samplers are implemented with Optuna.
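As a rough mental model of the "random" sampler, each trial draws one value per hyperparameter, independently of earlier trials. The sketch below illustrates this with the standard library only. It is not Optuna's implementation, and the helper `sample_random` is hypothetical. A TPE sampler differs in that it uses the results of earlier trials to decide which values to try next.

```python
import random


def sample_random(module: dict, rng: random.Random) -> dict:
    """Draw one value per hyperparameter, uniformly and independently.

    Hypothetical helper for illustration; not how Optuna is implemented.
    """
    return {
        name: rng.choice(values)
        for name, values in module.items()
        if name != "module_name"
    }


knn_module = {
    "module_name": "knn",
    "k": [1, 3, 5, 10],
    "weights": ["uniform", "distance", "closest"],
}

rng = random.Random(0)  # fixed seed so the draws are reproducible
trials = [sample_random(knn_module, rng) for _ in range(3)]
for trial in trials:
    print(trial)
```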
For finer control, you can use the more versatile OptimizationConfig together with from_optimization_config.
See Also#
Modules API reference, to get familiar with the modules you can include in a search space.