autointent.modules.scoring.CatBoostScorer#

class autointent.modules.scoring.CatBoostScorer(embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)#

Bases: autointent.modules.base.BaseScorer

CatBoost scorer using either external embeddings or CatBoost’s own BoW encoding.

Parameters:
  • embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the base transformer model (HFModelConfig, str, or dict). If None (default), the scorer relies on CatBoost’s own Bag-of-Words encoding; otherwise the provided embedder is used.

  • features_type (FeaturesType) – Type of features used in CatBoost. One of:
      - “text”: use only text features (CatBoost’s BoW encoding).
      - “embedding”: use only embedding features.
      - “both”: use both text and embedding features.

  • use_embedding_features (bool) – If True, embeddings are passed to CatBoost as embedding_features; otherwise each embedding dimension is placed in a separate numeric column.

  • loss_function (str | None) – CatBoost loss function. If None, an appropriate loss is chosen automatically from the task type.

  • verbose (bool) – If True, CatBoost prints training progress.

  • val_fraction (float | None) – Fraction of the training data held out for early stopping. Set to None to disable early stopping. Note: early stopping is not supported with multilabel classification.

  • early_stopping_rounds (int) – Number of consecutive iterations without improvement in the validation metric before training stops early. Ignored when val_fraction is None.

  • **catboost_kwargs (dict[str, Any]) – Additional keyword arguments forwarded to catboost.CatBoostClassifier. See CatBoost’s documentation for the available options.

  • iterations (int) – Maximum number of boosting iterations (trees) for CatBoost.

  • depth (int) – Depth of each tree.
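
The interaction between early_stopping_rounds and the validation metric described above can be illustrated with a small, library-independent sketch. It is not the scorer's or CatBoost's actual implementation: the function name early_stopping_iteration and the sample scores are hypothetical, and the metric is assumed to be "higher is better".

```python
def early_stopping_iteration(val_scores, early_stopping_rounds):
    """Return the index of the iteration at which training would stop.

    val_scores is a hypothetical list of validation metric values,
    one per boosting iteration, where higher is better.
    """
    best_score = float("-inf")
    best_iter = -1
    for i, score in enumerate(val_scores):
        if score > best_score:
            # A strict improvement resets the patience counter.
            best_score, best_iter = score, i
        elif i - best_iter >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` iterations.
            return i
    # The metric kept improving often enough; all iterations were used.
    return len(val_scores) - 1


# Illustrative values only, not real training output.
scores = [0.60, 0.70, 0.72, 0.71, 0.72, 0.70, 0.69]
stop_at = early_stopping_iteration(scores, early_stopping_rounds=3)
```

With these sample values the best score is reached at iteration 2, so training would stop three iterations later. When val_fraction is None there is no held-out set to compute such a metric on, which is why early_stopping_rounds is ignored in that case.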

Example:#

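from autointent.modules import CatBoostScorer

# Keyword arguments that are not named parameters of CatBoostScorer
# (learning_rate, l2_leaf_reg, eval_metric, random_seed) are collected
# into **catboost_kwargs and forwarded to catboost.CatBoostClassifier.
scorer = CatBoostScorer(
    iterations=50,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    eval_metric="Accuracy",
    random_seed=42,
    verbose=False,
    features_type="embedding",  # or "text" or "both"
)

# Fit on a toy dataset with two intent classes.
utterances = ["hello", "goodbye", "allo", "sayonara"]
labels = [0, 1, 0, 1]
scorer.fit(utterances, labels)

# Score unseen utterances; predict returns a NumPy float64 array.
test_utterances = ["hi", "bye"]
probabilities = scorer.predict(test_utterances)
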
name = 'catboost'#
supports_multiclass = True#
supports_multilabel = True#
encoder_features_types#
val_fraction = 0.2#
early_stopping_rounds = 100#
iterations = 1000#
depth = 6#
features_type#
use_embedding_features = True#
embedder_config#
loss_function = None#
verbose = False#
catboost_kwargs#
classmethod from_context(context, embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)#
Parameters:
  • context (autointent.Context)

  • embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None)

  • features_type (FeaturesType)

  • use_embedding_features (bool)

  • loss_function (str | None)

  • verbose (bool)

  • val_fraction (autointent.custom_types.FloatFromZeroToOne | None)

  • early_stopping_rounds (pydantic.PositiveInt)

  • iterations (pydantic.PositiveInt)

  • depth (pydantic.PositiveInt)

  • catboost_kwargs (dict[str, Any])

Return type:

CatBoostScorer

get_implicit_initialization_params()#
Return type:

dict[str, Any]

get_extra_params()#
Return type:

dict[str, Any]

fit(utterances, labels)#
Parameters:
  • utterances (list[str])

  • labels (autointent.custom_types.ListOfLabels)

Return type:

None

predict(utterances)#
Parameters:

utterances (list[str])

Return type:

numpy.typing.NDArray[numpy.float64]
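
As a hedged illustration of consuming the returned array, the sketch below turns scores into label decisions with NumPy. The (n_utterances, n_classes) layout, the sample values, and the 0.5 threshold are assumptions made for this example and are not stated on this page.

```python
import numpy as np

# Stand-in for the array returned by predict(); the layout
# (one row per utterance, one column per class) is an assumption.
scores = np.array([[0.9, 0.1], [0.2, 0.8]], dtype=np.float64)

# Multiclass: pick the highest-scoring class for each utterance.
predicted = scores.argmax(axis=1)

# Multilabel: keep every class whose score clears a threshold.
# The 0.5 cutoff is arbitrary and chosen only for illustration.
multilabel = (scores >= 0.5).astype(int)
```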

clear_cache()#
Return type:

None