AutoML Customization
In this guide, you will learn how to configure a custom hyperparameter search space.
Python API
Before reading this guide, we recommend familiarizing yourself with the Concepts and Optimization sections.
Optimization Module
To configure an optimization module, create a dictionary like the following:
knn_module = {
    "module_name": "knn",
    "k": [1, 5, 10, 50],
    "embedder_config": ["sergeyzh/rubert-tiny-turbo"],
}
The module_name field specifies the name of the module. You can list the available module names yourself:
from autointent.modules import DECISION_MODULES, EMBEDDING_MODULES, REGEX_MODULES, SCORING_MODULES
print(list(SCORING_MODULES.keys()))
print(list(DECISION_MODULES.keys()))
print(list(EMBEDDING_MODULES.keys()))
print(list(REGEX_MODULES.keys()))
['catboost', 'dnnc', 'knn', 'linear', 'description_bi', 'description_cross', 'description_llm', 'rerank', 'sklearn', 'mlknn', 'bert', 'cnn', 'lora', 'ptuning', 'rnn']
['argmax', 'jinoos', 'threshold', 'tunable', 'adaptive']
['retrieval', 'logreg_embedding']
['simple']
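A misspelled module_name is only caught once the pipeline is built, so it can help to check names up front. The sketch below uses a hypothetical helper, unknown_module_names, which is not part of AutoIntent: it hardcodes the embedding, scoring, and decision names printed above (these may differ between library versions) and reports any module_name in a node that is not among the names for that node's node_type.

```python
# Hypothetical helper (not an AutoIntent API): check module names before fitting.
# The name sets are copied from the printed output above and may change between versions.
AVAILABLE = {
    "embedding": {"retrieval", "logreg_embedding"},
    "scoring": {
        "catboost", "dnnc", "knn", "linear", "description_bi", "description_cross",
        "description_llm", "rerank", "sklearn", "mlknn", "bert", "cnn", "lora",
        "ptuning", "rnn",
    },
    "decision": {"argmax", "jinoos", "threshold", "tunable", "adaptive"},
}


def unknown_module_names(node: dict) -> list[str]:
    """Return the module names in `node` that are not known for its node_type."""
    known = AVAILABLE.get(node["node_type"], set())
    return [m["module_name"] for m in node["search_space"] if m["module_name"] not in known]


node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [{"module_name": "knn"}, {"module_name": "knnn"}],  # "knnn" is a typo
}
print(unknown_module_names(node))  # ['knnn']
```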
All fields except module_name are lists that define the search space for each hyperparameter (see KNNScorer). If you omit them, default values are used:
linear_module = {"module_name": "linear"}
See the LinearScorer documentation.
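To make the "lists define the search space" rule concrete, a module dictionary can be expanded into the configurations it spans, namely the Cartesian product of its hyperparameter lists. The expand_module helper below is an illustrative sketch, not how AutoIntent enumerates trials internally; an omitted hyperparameter simply does not appear in the output, standing in for the library's default.

```python
from itertools import product


def expand_module(module: dict) -> list[dict]:
    """Illustrative only: list every configuration in a module's search space."""
    name = module["module_name"]
    # Every key except module_name maps to a list of candidate values.
    params = {k: v for k, v in module.items() if k != "module_name"}
    keys = list(params)
    configs = []
    for values in product(*(params[k] for k in keys)):
        configs.append({"module_name": name, **dict(zip(keys, values))})
    return configs


knn_module = {
    "module_name": "knn",
    "k": [1, 5, 10, 50],
    "embedder_config": ["sergeyzh/rubert-tiny-turbo"],
}
for config in expand_module(knn_module):
    print(config)

# With no listed hyperparameters, product() yields one empty tuple: a single default config.
print(len(expand_module({"module_name": "linear"})))  # 1
```

Here the knn dictionary spans four configurations (four values of k times one embedder).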
Optimization Node
To set up an optimization node, specify the node type, the target metric to optimize, and a list of candidate modules:
scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [
        knn_module,
        linear_module,
    ],
}
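A node dictionary has three keys, and every module field other than module_name should be a list. A small structural check can catch a common slip such as writing "k": 5 instead of "k": [5]. The node_problems function below is a hypothetical helper, not AutoIntent's own validation, and the library may report such errors differently.

```python
# Hypothetical structural check (not AutoIntent's validation logic).
REQUIRED_NODE_KEYS = {"node_type", "target_metric", "search_space"}


def node_problems(node: dict) -> list[str]:
    """Report structural problems in a single node configuration."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_NODE_KEYS - set(node))]
    for module in node.get("search_space", []):
        if "module_name" not in module:
            problems.append("module without module_name")
        name = module.get("module_name", "<unnamed>")
        for key, value in module.items():
            # Hyperparameter values must be lists of candidates, not scalars.
            if key != "module_name" and not isinstance(value, list):
                problems.append(f"{name}.{key} should be a list, got {value!r}")
    return problems


scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",
    "search_space": [{"module_name": "knn", "k": 5}, {"module_name": "linear"}],
}
print(node_problems(scoring_node))  # ['knn.k should be a list, got 5']
```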
Search Space
The search space for the entire pipeline is a list of node configurations like the one above, for example:
search_space = [
    {
        "node_type": "embedding",
        "target_metric": "retrieval_hit_rate",
        "search_space": [
            {
                "module_name": "retrieval",
                "k": [10],
                "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"],
            }
        ],
    },
    {
        "node_type": "scoring",
        "target_metric": "scoring_roc_auc",
        "search_space": [
            {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
            {"module_name": "linear"},
            {
                "module_name": "dnnc",
                "cross_encoder_config": ["DiTy/cross-encoder-russian-msmarco"],
                "k": [1, 3, 5, 10],
            },
        ],
    },
    {
        "node_type": "decision",
        "target_metric": "decision_accuracy",
        "search_space": [{"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"}],
    },
]
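To gauge the size of the pipeline search space above, one can count, per node, the configurations spanned by the listed hyperparameter values. The grid_sizes helper below is hypothetical and illustrative: it assumes a module with no listed hyperparameters (such as linear or argmax) contributes a single configuration, although the library's default ranges may contain more values.

```python
from math import prod


def grid_sizes(search_space: list[dict]) -> dict[str, int]:
    """Illustrative: number of explicitly listed configurations per node."""
    sizes = {}
    for node in search_space:
        total = 0
        for module in node["search_space"]:
            lists = [v for k, v in module.items() if k != "module_name"]
            total += prod(len(v) for v in lists)  # prod() of nothing is 1 (defaults only)
        sizes[node["node_type"]] = total
    return sizes


# Copy of the pipeline search space above, written compactly.
search_space = [
    {"node_type": "embedding", "target_metric": "retrieval_hit_rate", "search_space": [
        {"module_name": "retrieval", "k": [10],
         "embedder_config": ["avsolatorio/GIST-small-Embedding-v0", "sergeyzh/rubert-tiny-turbo"]},
    ]},
    {"node_type": "scoring", "target_metric": "scoring_roc_auc", "search_space": [
        {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]},
        {"module_name": "linear"},
        {"module_name": "dnnc", "cross_encoder_config": ["DiTy/cross-encoder-russian-msmarco"],
         "k": [1, 3, 5, 10]},
    ]},
    {"node_type": "decision", "target_metric": "decision_accuracy", "search_space": [
        {"module_name": "threshold", "thresh": [0.5]}, {"module_name": "argmax"},
    ]},
]
print(grid_sizes(search_space))  # {'embedding': 2, 'scoring': 17, 'decision': 2}
```

The scoring node dominates: knn alone contributes 4 × 3 = 12 combinations.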
Load Data
Let us use a small subset of the popular clinc150 dataset:
from autointent import Dataset
dataset = Dataset.from_hub("DeepPavlov/clinc150_subset")
Start Auto Configuration
from autointent import Pipeline
pipeline_optimizer = Pipeline.from_search_space(search_space)
pipeline_optimizer.fit(dataset)
Memory storage is not compatible with resuming optimization. Modules from previous runs won't be available. Set dump_modules=True in LoggingConfig to enable proper resuming.
Storage directory must be provided for study persistence.
[I 2025-08-01 06:16:19,203] A new study created in memory with name: embedding
Storage directory must be provided for study persistence.
Storage directory must be provided for study persistence.
"argmax" is NOT designed to handle OOS samples, but your data contains it. So, using this method reduces the power of classification.
The available hyperparameter tuning samplers include:
"random"
"tpe"
All samplers are implemented with Optuna.
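To illustrate the idea behind random search, the sketch below draws a fixed number of configurations uniformly at random from a module's hyperparameter lists using Python's standard random module. It is an analogy only: AutoIntent delegates sampling to Optuna, whose samplers work differently (TPE, for instance, uses the scores of earlier trials to propose new candidates). The sample_configs function and its n_trials parameter are hypothetical names introduced for this sketch.

```python
import random


def sample_configs(module: dict, n_trials: int, seed: int = 0) -> list[dict]:
    """Conceptual random search: pick each hyperparameter uniformly from its list."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    params = {k: v for k, v in module.items() if k != "module_name"}
    # Unlike an exhaustive grid, independent draws may repeat a configuration.
    return [
        {"module_name": module["module_name"], **{k: rng.choice(v) for k, v in params.items()}}
        for _ in range(n_trials)
    ]


knn_module = {"module_name": "knn", "k": [1, 3, 5, 10], "weights": ["uniform", "distance", "closest"]}
for config in sample_configs(knn_module, n_trials=3):
    print(config)
```

Random search trades completeness for a fixed budget: three trials here cover only part of the twelve knn combinations.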
For more flexible configuration, use OptimizationConfig together with from_optimization_config.
See Also
Modules API reference, to get familiar with the modules you can include in a search space