autointent.modules.CatBoostScorer
- class autointent.modules.CatBoostScorer(embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)
Bases: autointent.modules.base.BaseScorer
CatBoost scorer using either external embeddings or CatBoost’s own BoW encoding.
- Parameters:
embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the base transformer model (HFModelConfig, str, or dict). If None (default), the scorer relies on CatBoost’s own Bag-of-Words encoding; otherwise the provided embedder is used.
features_type (FeaturesType) – Type of features used in CatBoost. One of:
    - “text”: use only text features (CatBoost’s BoW encoding).
    - “embedding”: use only embedding features.
    - “both”: use both text and embedding features.
use_embedding_features (bool) – If True, the embeddings are passed to CatBoost as embedding_features; otherwise each embedding component is placed in a separate column.
loss_function (str | None) – CatBoost loss function. If None, an appropriate loss is chosen automatically from the task type.
verbose (bool) – If True, CatBoost prints training progress.
val_fraction (float | None) – Fraction of the training data held out for early stopping. Set to None to disable early stopping. Note: early stopping is not supported with multilabel classification.
early_stopping_rounds (int) – Number of iterations without improvement in the validation metric before training stops early. Ignored when val_fraction is None.
iterations (int)
depth (int)
**catboost_kwargs (dict[str, Any]) – Any additional keyword arguments forwarded to catboost.CatBoostClassifier. Please refer to CatBoost’s documentation.
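The way early_stopping_rounds acts on the held-out validation metric can be pictured as a small patience loop. The sketch below is plain Python and not CatBoost's internal implementation: the helper name best_iteration_with_patience and the metric values are hypothetical, and it assumes a metric where higher is better.

```python
def best_iteration_with_patience(val_metrics, early_stopping_rounds):
    """Return (best_iteration, stopped_at) for a higher-is-better metric.

    Hypothetical helper for illustration only; CatBoost implements its own
    overfitting detector internally.
    """
    best_value = float("-inf")
    best_iteration = -1
    for iteration, value in enumerate(val_metrics):
        if value > best_value:
            best_value = value
            best_iteration = iteration
        elif iteration - best_iteration >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` iterations: stop.
            return best_iteration, iteration
    # Ran through every iteration without triggering early stopping.
    return best_iteration, len(val_metrics) - 1


# Made-up validation accuracies, one per boosting iteration.
history = [0.60, 0.70, 0.75, 0.74, 0.75, 0.73, 0.72]
best, stopped = best_iteration_with_patience(history, early_stopping_rounds=3)
```

With a patience of 3, training in this toy run would stop at iteration 5 and keep the model from iteration 2, the last point at which the validation metric improved.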
Example:
    from autointent.modules import CatBoostScorer

    scorer = CatBoostScorer(
        iterations=50,
        learning_rate=0.05,
        depth=6,
        l2_leaf_reg=3,
        eval_metric="Accuracy",
        random_seed=42,
        verbose=False,
        features_type="embedding",  # or "text" or "both"
    )
    utterances = ["hello", "goodbye", "allo", "sayonara"]
    labels = [0, 1, 0, 1]
    scorer.fit(utterances, labels)
    test_utterances = ["hi", "bye"]
    probabilities = scorer.predict(test_utterances)
- name = 'catboost'
Name of the module to reference in search space configuration.
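As a rough sketch of how that name might be referenced, the snippet below builds a search-space entry as a plain Python structure. The surrounding layout (the node_type, target_metric, search_space and module_name keys) and the metric name are assumptions about AutoIntent's configuration format rather than something stated on this page. The hyperparameter names come from the constructor signature, and the candidate values are arbitrary.

```python
# Assumed layout of a search-space node; check the AutoIntent
# configuration docs before relying on these key names.
scoring_node = {
    "node_type": "scoring",
    "target_metric": "scoring_roc_auc",  # assumed metric name
    "search_space": [
        {
            "module_name": "catboost",  # matches CatBoostScorer.name
            "iterations": [200, 1000],  # candidate values (arbitrary)
            "depth": [4, 6],
            "features_type": ["text", "both"],
        }
    ],
}

# Collect the module names referenced by this node.
module_names = [entry["module_name"] for entry in scoring_node["search_space"]]
```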
- supports_multiclass = True
Whether the module supports multiclass classification.
- supports_multilabel = True
Whether the module supports multilabel classification.
- encoder_features_types
- val_fraction = 0.2
- early_stopping_rounds = 100
- iterations = 1000
- depth = 6
- features_type
- use_embedding_features = True
- embedder_config
- loss_function = None
- verbose = False
- catboost_kwargs
- classmethod from_context(context, embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)
Initialize self from context.
- Parameters:
context (autointent.Context) – Context to init from
**kwargs – Additional kwargs
embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None)
features_type (FeaturesType)
use_embedding_features (bool)
loss_function (str | None)
verbose (bool)
val_fraction (autointent.custom_types.FloatFromZeroToOne | None)
early_stopping_rounds (pydantic.PositiveInt)
iterations (pydantic.PositiveInt)
depth (pydantic.PositiveInt)
- Returns:
Initialized module
- Return type:
- get_implicit_initialization_params()
Return default params used in the __init__ method.
Some parameters of the module may be inferred from the context rather than from the __init__ method, but they still need to be logged for reproducibility when loading from disk.
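The idea can be illustrated with a toy class that is unrelated to AutoIntent's actual code: a value left as None at construction time is resolved from a context object, and the resolved value is what gets reported for logging. Every name here (ToyModule, model_name, default_model_name) is hypothetical, and a plain dict stands in for a real Context.

```python
class ToyModule:
    """Hypothetical module; not part of AutoIntent."""

    def __init__(self, model_name=None):
        # None means "decide later from the context".
        self.model_name = model_name

    @classmethod
    def from_context(cls, context, model_name=None):
        # Fall back to a context-provided default when nothing was passed.
        if model_name is None:
            model_name = context["default_model_name"]
        return cls(model_name=model_name)

    def get_implicit_initialization_params(self):
        # Report the resolved value so a saved run can be reproduced.
        return {"model_name": self.model_name}


context = {"default_model_name": "some-encoder"}  # stand-in for a real Context
module = ToyModule.from_context(context)
logged = module.get_implicit_initialization_params()
```

Logging the resolved value rather than the original None is what allows the module to be rebuilt later without access to the same context.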
- fit(utterances, labels)
Fit the scoring module to the training data.
- predict(utterances)
Predict scores for a list of utterances.
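Assuming the returned scores form one row per utterance and one column per class (an assumption; this page does not state the shape or type), a predicted label for single-label classification can be read off as the highest-scoring column. The sketch below uses made-up numbers in a plain nested list rather than a real predict() result, and the helper name top_labels is hypothetical.

```python
def top_labels(scores):
    """Return the index of the highest score in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Made-up scores for two utterances and two classes.
example_scores = [
    [0.9, 0.1],
    [0.2, 0.8],
]
predicted = top_labels(example_scores)
```

For multilabel classification a per-class threshold would be the more natural post-processing step than an argmax.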
- clear_cache()
Clear cache.
- Return type: None