autointent.modules.CatBoostScorer

class autointent.modules.CatBoostScorer(embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)

Bases: autointent.modules.base.BaseScorer

CatBoost scorer using either external embeddings or CatBoost’s own BoW encoding.

Parameters:

embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the base transformer model (EmbedderConfig, str, or dict). If None (default), the scorer relies on CatBoost’s own Bag-of-Words encoding; otherwise the provided embedder is used.
features_type (FeaturesType) – Type of features used in CatBoost. Can be one of:
  - “text”: use only text features (CatBoost’s BoW encoding).
  - “embedding”: use only embedding features.
  - “both”: use both text and embedding features.
use_embedding_features (bool) – If True, embeddings are passed to CatBoost as embedding_features; otherwise each embedding dimension is placed in a separate numeric column.
loss_function (str | None) – CatBoost loss function. If None, an appropriate loss is chosen automatically from the task type.
verbose (bool) – If True, CatBoost prints training progress.
val_fraction (float | None) – Fraction of the training data held out for early stopping. Set to None to disable early stopping. Note: early stopping is not supported for multilabel classification.
early_stopping_rounds (int) – Number of iterations without improvement in the evaluation metric after which training stops early. Ignored when val_fraction is None.
iterations (int) – Maximum number of boosting iterations (trees).
depth (int) – Depth of each tree.
**catboost_kwargs (dict[str, Any]) – Any additional keyword arguments forwarded to catboost.CatBoostClassifier. Refer to CatBoost’s documentation for the available options.
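To make `features_type` and `use_embedding_features` concrete, here is a minimal sketch of the input table layout these options imply. It is an illustration under assumptions, not the library's code: the helper `build_feature_columns` and the column names `text`, `embedding` and `emb_<i>` are hypothetical, and the embeddings are assumed to be precomputed lists of floats. In CatBoost itself, such columns would be declared through the `text_features` and `embedding_features` arguments of `catboost.Pool`.

```python
from __future__ import annotations

from typing import Literal

FeaturesKind = Literal["text", "embedding", "both"]


def build_feature_columns(
    utterances: list[str],
    embeddings: list[list[float]] | None,
    features_type: FeaturesKind,
    use_embedding_features: bool,
) -> tuple[list[str], list[list[object]]]:
    """Return (column_names, rows) for a hypothetical CatBoost input table."""
    use_text = features_type in ("text", "both")
    use_emb = features_type in ("embedding", "both")
    if use_emb and embeddings is None:
        raise ValueError("embedding features require precomputed embeddings")
    vectors = embeddings if embeddings is not None else [[] for _ in utterances]
    dim = len(vectors[0]) if vectors else 0

    columns: list[str] = []
    if use_text:
        columns.append("text")  # raw utterance, left to CatBoost's text processing
    if use_emb:
        if use_embedding_features:
            columns.append("embedding")  # whole vector in a single embedding feature
        else:
            columns.extend(f"emb_{i}" for i in range(dim))  # one numeric column per dimension

    rows: list[list[object]] = []
    for text, vec in zip(utterances, vectors):
        row: list[object] = []
        if use_text:
            row.append(text)
        if use_emb:
            if use_embedding_features:
                row.append(list(vec))
            else:
                row.extend(vec)
        rows.append(row)
    return columns, rows


columns, rows = build_feature_columns(
    ["hi", "bye"], [[0.1, 0.2], [0.3, 0.4]], "both", use_embedding_features=False
)
print(columns)  # ['text', 'emb_0', 'emb_1']
```

Keeping the vector in one column lets CatBoost treat it as a single embedding feature, whereas splitting it exposes every dimension to ordinary numeric splits.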

Example:

from autointent.modules import CatBoostScorer

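scorer = CatBoostScorer(
    iterations=50,
    # learning_rate, l2_leaf_reg, eval_metric and random_seed are not named
    # parameters of CatBoostScorer; they are forwarded to
    # catboost.CatBoostClassifier through **catboost_kwargs.
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    eval_metric="Accuracy",
    random_seed=42,
    verbose=False,
    features_type="embedding",  # or "text" or "both"
)
# Toy multiclass data: one integer label per utterance.
utterances = ["hello", "goodbye", "allo", "sayonara"]
labels = [0, 1, 0, 1]
scorer.fit(utterances, labels)
test_utterances = ["hi", "bye"]
# One row of predicted scores per test utterance.
probabilities = scorer.predict(test_utterances)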

name = 'catboost'

Name of the module to reference in search space configuration.

supports_multiclass = True

Whether the module supports multiclass classification.

supports_multilabel = True

Whether the module supports multilabel classification.

encoder_features_types

val_fraction = 0.2

early_stopping_rounds = 100

iterations = 1000

depth = 6

features_type

use_embedding_features = True

embedder_config

loss_function = None

verbose = False

catboost_kwargs
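The `val_fraction` and `early_stopping_rounds` attributes govern early stopping. A minimal sketch of the hold-out split they imply follows; the function `split_for_early_stopping`, the plain random shuffle (no stratification) and the rounding rule are all assumptions for illustration. The resulting validation rows would be what CatBoost's `fit` receives as `eval_set`, with `early_stopping_rounds` bounding how many non-improving iterations it tolerates.

```python
from __future__ import annotations

import random


def split_for_early_stopping(
    n_samples: int,
    val_fraction: float | None,
    is_multilabel: bool,
    seed: int = 0,
) -> tuple[list[int], list[int]]:
    """Return (train_indices, val_indices); val is empty when early stopping is off."""
    indices = list(range(n_samples))
    # Early stopping is disabled when val_fraction is None and, per the docs,
    # is not supported for multilabel classification.
    if val_fraction is None or is_multilabel:
        return indices, []
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    rng = random.Random(seed)
    rng.shuffle(indices)
    n_val = max(1, int(round(n_samples * val_fraction)))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_for_early_stopping(10, 0.2, is_multilabel=False)
print(len(train_idx), len(val_idx))  # 8 2
```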

classmethod from_context(context, embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)

Initialize self from context.

Parameters:

context (autointent.Context) – Context to init from
embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None)
features_type (FeaturesType)
use_embedding_features (bool)
loss_function (str | None)
verbose (bool)
val_fraction (autointent.custom_types.FloatFromZeroToOne | None)
early_stopping_rounds (pydantic.PositiveInt)
iterations (pydantic.PositiveInt)
depth (pydantic.PositiveInt)
**catboost_kwargs (dict[str, Any]) – Additional keyword arguments forwarded to catboost.CatBoostClassifier.

Returns:

Initialized module

Return type:

CatBoostScorer

get_implicit_initialization_params()

Return default params used in __init__ method.

Some parameters of the module may be inferred from the context rather than passed to the __init__ method. They still need to be logged for reproducibility when the module is loaded from disk.

Returns: Dictionary of default params
Return type: dict[str, Any]
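The pattern described above can be illustrated with a toy class. `ToyScorer`, its `embedder_name` parameter and the string standing in for the context are all hypothetical; the point is only that a value inferred at `from_context` time is reported by `get_implicit_initialization_params`, saved together with the explicit parameters, and is then enough to rebuild the module without the context.

```python
from __future__ import annotations

import json
from typing import Any


class ToyScorer:
    """Hypothetical stand-in for a scorer; not autointent's implementation."""

    def __init__(self, embedder_name: str | None = None, depth: int = 6) -> None:
        self.embedder_name = embedder_name
        self.depth = depth

    @classmethod
    def from_context(
        cls, context_embedder: str, embedder_name: str | None = None, depth: int = 6
    ) -> ToyScorer:
        # If no embedder is passed explicitly, infer it from the (toy) context.
        return cls(embedder_name=embedder_name or context_embedder, depth=depth)

    def get_implicit_initialization_params(self) -> dict[str, Any]:
        # Report the value actually used so it can be saved next to the explicit params.
        return {"embedder_name": self.embedder_name}


explicit = {"depth": 4}
scorer = ToyScorer.from_context("ctx-embedder", **explicit)
saved = json.dumps({**explicit, **scorer.get_implicit_initialization_params()})

# Reloading needs no context: every constructor argument is in the saved params.
restored = ToyScorer(**json.loads(saved))
print(restored.embedder_name, restored.depth)  # ctx-embedder 4
```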

get_extra_params()

Return type: dict[str, Any]

fit(utterances, labels)

Fit the scoring module to the training data.

Parameters:

utterances (list[str]) – List of training utterances.
labels (autointent.custom_types.ListOfLabels) – List of training labels.

Return type:

None
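When `loss_function` is None, the loss is chosen from the task type. The sketch below shows one plausible rule, assuming multiclass labels arrive as integers and multilabel labels as multi-hot lists. The exact mapping the library uses is not documented here, so `choose_loss_function` and the choice of CatBoost's `MultiClass` and `MultiLogloss` losses are assumptions.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Union

Label = Union[int, Sequence[int]]


def is_multilabel(labels: Sequence[Label]) -> bool:
    # Assumed convention: multilabel targets are multi-hot lists, multiclass targets are ints.
    return len(labels) > 0 and not isinstance(labels[0], int)


def choose_loss_function(labels: Sequence[Label], loss_function: str | None = None) -> str:
    """Pick a CatBoost loss name when none is given (hypothetical mapping)."""
    if loss_function is not None:
        return loss_function  # an explicit choice is used as-is
    return "MultiLogloss" if is_multilabel(labels) else "MultiClass"


print(choose_loss_function([0, 1, 0, 1]))      # MultiClass
print(choose_loss_function([[1, 0], [1, 1]]))  # MultiLogloss
```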

predict(utterances)

Predict scores for a list of utterances.

Parameters: utterances (list[str]) – List of utterances to score.
Returns: Array of predicted scores.
Return type: numpy.typing.NDArray[numpy.float64]
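For orientation on what the returned scores look like: with CatBoost's `MultiClass` loss, class probabilities are the softmax of the raw per-class values, while with `MultiLogloss` each label gets an independent sigmoid. The sketch below reproduces that mapping in plain Python, assuming `predict` returns probabilities of shape `(n_utterances, n_classes)`; `raw_to_scores` is a hypothetical helper, not part of autointent.

```python
from __future__ import annotations

import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def raw_to_scores(raw: list[list[float]], multilabel: bool) -> list[list[float]]:
    """Map per-class raw values to scores of shape (n_utterances, n_classes)."""
    if multilabel:
        return [[sigmoid(x) for x in row] for row in raw]  # independent per label
    return [softmax(row) for row in raw]  # each row sums to 1
```

In the multiclass case each row is a distribution over classes, so an argmax gives the predicted intent; in the multilabel case each column must be thresholded on its own.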

clear_cache()

Clear cache.

Return type: None