autointent.modules.scoring.CatBoostScorer#

class autointent.modules.scoring.CatBoostScorer(embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)#

Bases: autointent.modules.base.BaseScorer

CatBoost scorer using either external embeddings or CatBoost’s own BoW encoding.

Parameters:
  • embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the base transformer model (HFModelConfig, str, or dict). If None (default), the scorer relies on CatBoost’s own Bag-of-Words encoding; otherwise the provided embedder is used.

  • features_type (FeaturesType) – Type of features used in CatBoost. Can be one of:
      - “text”: Use only text features (CatBoost’s BoW encoding).
      - “embedding”: Use only embedding features.
      - “both”: Use both text and embedding features.

  • use_embedding_features (bool) – If True, embeddings are passed to CatBoost as embedding_features; otherwise each component of the embedding is placed in a separate column.

  • loss_function (str | None) – CatBoost loss function. If None, an appropriate loss is chosen automatically from the task type.

  • verbose (bool) – If True, CatBoost prints training progress.

  • val_fraction (float | None) – Fraction of the training data held out for early stopping. Set to None to disable early stopping. Note: early stopping is not supported with multilabel classification.

  • early_stopping_rounds (int) – Number of iterations without improvement in the validation metric after which training stops early. Ignored when val_fraction is None.

  • **catboost_kwargs (dict[str, Any]) – Any additional keyword arguments forwarded to catboost.CatBoostClassifier. Refer to CatBoost’s documentation for the available options.

  • iterations (int) – Maximum number of boosting iterations (trees) to train.

  • depth (int) – Depth of each tree.
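To make the interplay of val_fraction and early_stopping_rounds more concrete, here is a minimal, library-free sketch of patience-based early stopping. It is not CatBoost's or autointent's implementation: the helpers split_train_val and best_iteration and the metric values are hypothetical, and the sketch only assumes that a fraction of the data is held out and that training stops once the validation metric has not improved for the given number of iterations.

```python
import random


def split_train_val(items, val_fraction, seed=0):
    """Hold out `val_fraction` of the items for validation (hypothetical helper)."""
    if val_fraction is None:
        # No validation split means no early stopping: train on everything.
        return list(items), []
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def best_iteration(val_metric_per_iteration, early_stopping_rounds):
    """Return (index of the best iteration, index of the last iteration run)."""
    best_idx, best_value = 0, float("-inf")
    last_idx = -1
    for idx, value in enumerate(val_metric_per_iteration):
        last_idx = idx
        if value > best_value:
            best_idx, best_value = idx, value
        elif idx - best_idx >= early_stopping_rounds:
            # The metric has not improved for `early_stopping_rounds` iterations.
            break
    return best_idx, last_idx


train, val = split_train_val(range(10), val_fraction=0.2)
# Made-up validation accuracies: improvement stalls after iteration 2.
history = [0.60, 0.70, 0.75, 0.74, 0.75, 0.73, 0.72, 0.71]
best, stopped_at = best_iteration(history, early_stopping_rounds=3)
```

With val_fraction=None the first helper returns an empty validation set, which mirrors the documented behaviour that early_stopping_rounds is then ignored.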

Example:#

from autointent.modules import CatBoostScorer

scorer = CatBoostScorer(
    iterations=50,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    eval_metric="Accuracy",
    random_seed=42,
    verbose=False,
    features_type="embedding",  # or "text" or "both"
)
# Toy training set: two intents, two utterances each.
utterances = ["hello", "goodbye", "allo", "sayonara"]
labels = [0, 1, 0, 1]
scorer.fit(utterances, labels)

# Score unseen utterances.
test_utterances = ["hi", "bye"]
probabilities = scorer.predict(test_utterances)
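
The features_type and use_embedding_features options decide what the classifier sees for each utterance. The sketch below is a hypothetical, library-free illustration of the resulting row layouts, not autointent's actual code; build_row, the column names, and the toy vector are invented for the example.

```python
def build_row(text, embedding, features_type="both", use_embedding_features=True):
    """Assemble one feature row as a dict (hypothetical helper, for illustration)."""
    row = {}
    if features_type in ("text", "both"):
        # Raw text column; CatBoost would build its Bag-of-Words features from it.
        row["text"] = text
    if features_type in ("embedding", "both"):
        if use_embedding_features:
            # The whole vector stays in a single embedding-typed column.
            row["embedding"] = list(embedding)
        else:
            # Each vector component becomes its own numeric column.
            for i, value in enumerate(embedding):
                row[f"emb_{i}"] = value
    return row


vector = [0.1, -0.3, 0.7]  # toy 3-dimensional embedding
packed = build_row("hello", vector, "both", use_embedding_features=True)
unpacked = build_row("hello", vector, "embedding", use_embedding_features=False)
```

Here packed keeps the text next to one vector-valued column, while unpacked contains three separate numeric columns and no text.
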
name = 'catboost'#

Name of the module to reference in search space configuration.

supports_multiclass = True#

Whether the module supports multiclass classification.

supports_multilabel = True#

Whether the module supports multilabel classification.

encoder_features_types#
val_fraction = 0.2#
early_stopping_rounds = 100#
iterations = 1000#
depth = 6#
features_type#
use_embedding_features = True#
embedder_config#
loss_function = None#
verbose = False#
catboost_kwargs#
classmethod from_context(context, embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)#

Initialize self from context.

Parameters:
  • context (autointent.Context) – Context to initialize the module from.

  • **kwargs – Additional kwargs

  • embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None)

  • features_type (FeaturesType)

  • use_embedding_features (bool)

  • loss_function (str | None)

  • verbose (bool)

  • val_fraction (autointent.custom_types.FloatFromZeroToOne | None)

  • early_stopping_rounds (pydantic.PositiveInt)

  • iterations (pydantic.PositiveInt)

  • depth (pydantic.PositiveInt)

  • catboost_kwargs (dict[str, Any])

Returns:

Initialized module

Return type:

CatBoostScorer

get_implicit_initialization_params()#

Return default params used in __init__ method.

Some parameters of the module may be inferred from the context rather than passed to the __init__ method. They still need to be logged so that the module can be reproduced when it is loaded from disk.

Returns:

Dictionary of default params

Return type:

dict[str, Any]

get_extra_params()#
Return type:

dict[str, Any]

fit(utterances, labels)#

Fit the scoring module to the training data.

Parameters:
  • utterances (list[str]) – List of training utterances.

  • labels (autointent.custom_types.ListOfLabels) – List of training labels.

Return type:

None
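
Because early stopping is not supported with multilabel classification, it can help to check the label format before constructing the scorer. The sketch below assumes that multiclass labels are plain integers and multilabel labels are lists of 0/1 flags; is_multilabel and choose_val_fraction are hypothetical helpers, not part of autointent.

```python
def is_multilabel(labels):
    """Treat labels as multilabel when each one is a list of 0/1 flags (assumption)."""
    return len(labels) > 0 and all(isinstance(label, (list, tuple)) for label in labels)


def choose_val_fraction(labels, default=0.2):
    """Disable the validation split for multilabel data, keep the default otherwise."""
    return None if is_multilabel(labels) else default


multiclass_labels = [0, 1, 0, 1]
multilabel_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]

multiclass_fraction = choose_val_fraction(multiclass_labels)
multilabel_fraction = choose_val_fraction(multilabel_labels)
# The chosen value could then be passed as CatBoostScorer(val_fraction=...).
```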

predict(utterances)#

Predict scores for a list of utterances.

Parameters:

utterances (list[str]) – List of utterances to score.

Returns:

Array of predicted scores.

Return type:

numpy.typing.NDArray[numpy.float64]
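
As a rough illustration of how the returned scores might be consumed, the sketch below turns score rows into predictions. It assumes one row per utterance and one column per class, which is not stated above, and it uses plain lists with made-up values; top_class and active_classes are hypothetical helpers.

```python
def top_class(scores_row):
    """Multiclass: index of the highest score in one row."""
    return max(range(len(scores_row)), key=lambda i: scores_row[i])


def active_classes(scores_row, threshold=0.5):
    """Multilabel: indices of all classes whose score reaches the threshold."""
    return [i for i, score in enumerate(scores_row) if score >= threshold]


# Made-up scores for two utterances and two classes; a real array returned by
# predict could be converted with `.tolist()` first.
scores = [[0.9, 0.1], [0.2, 0.8]]
predicted = [top_class(row) for row in scores]
multilabel = [active_classes(row) for row in [[0.7, 0.6], [0.1, 0.3]]]
```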

clear_cache()#

Clear cache.

Return type:

None