autointent.modules.CatBoostScorer
- class autointent.modules.CatBoostScorer(embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)
- Bases: autointent.modules.base.BaseScorer
- CatBoost scorer using either external embeddings or CatBoost’s own BoW encoding.
- Parameters:
- embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the base transformer model (HFModelConfig, str, or dict). If None (default), the scorer relies on CatBoost’s own Bag-of-Words encoding; otherwise the provided embedder is used.
- features_type (FeaturesType) – Type of features used in CatBoost. Can be one of:
  - “text”: Use only text features (CatBoost’s BoW encoding).
  - “embedding”: Use only embedding features.
  - “both”: Use both text and embedding features.
- use_embedding_features (bool) – If True, embeddings are passed to CatBoost as embedding_features; otherwise each embedding dimension is placed in a separate numeric column.
- loss_function (str | None) – CatBoost loss function. If None, an appropriate loss is chosen automatically from the task type. 
- verbose (bool) – If True, CatBoost prints training progress. 
- val_fraction (float | None) – Fraction of the training data held out for early stopping. Set to None to disable early stopping. Note: early stopping is not supported with multilabel classification.
- early_stopping_rounds (int) – Number of iterations without improvement in the validation metric to wait before stopping early. Ignored when val_fraction is None.
- **catboost_kwargs (dict[str, Any]) – Any additional keyword arguments forwarded to catboost.CatBoostClassifier. Please refer to CatBoost’s documentation.
- iterations (int) 
- depth (int) 
- **catboost_kwargs 
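
The interaction between val_fraction and early_stopping_rounds described above can be pictured with a small pure-Python sketch. It is only an illustration under stated assumptions, not the scorer's or CatBoost's actual implementation: the helper names `holdout_split` and `stopping_iteration` are hypothetical, the validation metric is assumed to be "higher is better", and the stopping rule is a simplified patience counter.

```python
import random


def holdout_split(n_samples, val_fraction, seed=0):
    """Split sample indices into (train_idx, val_idx).

    Hypothetical helper: when val_fraction is None no hold-out set is
    created, which corresponds to early stopping being disabled.
    """
    indices = list(range(n_samples))
    if val_fraction is None:
        return indices, []
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(n_samples * val_fraction))
    return sorted(indices[n_val:]), sorted(indices[:n_val])


def stopping_iteration(val_scores, early_stopping_rounds):
    """Return the 1-based iteration at which training would stop.

    val_scores[i] is the validation metric after iteration i + 1
    (assumed higher is better). Training stops once the metric has
    failed to improve for `early_stopping_rounds` iterations in a row;
    otherwise it runs through all iterations.
    """
    best_score = float("-inf")
    rounds_since_best = 0
    for iteration, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score = score
            rounds_since_best = 0
        else:
            rounds_since_best += 1
            if rounds_since_best >= early_stopping_rounds:
                return iteration
    return len(val_scores)


train_idx, val_idx = holdout_split(n_samples=10, val_fraction=0.2)
stop_at = stopping_iteration([0.5, 0.6, 0.7, 0.69, 0.68, 0.67, 0.9], 3)
```

In this toy run the metric peaks at iteration 3 and then fails to improve three times in a row, so the sketch stops at iteration 6 and never sees the later score of 0.9. This is the usual trade-off when choosing a small early_stopping_rounds value.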
 
 - Example:

```python
from autointent.modules import CatBoostScorer

scorer = CatBoostScorer(
    iterations=50,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    eval_metric="Accuracy",
    random_seed=42,
    verbose=False,
    features_type="embedding",  # or "text" or "both"
)
utterances = ["hello", "goodbye", "allo", "sayonara"]
labels = [0, 1, 0, 1]
scorer.fit(utterances, labels)
test_utterances = ["hi", "bye"]
probabilities = scorer.predict(test_utterances)
```

 - name = 'catboost'
 - supports_multiclass = True
 - supports_multilabel = True
 - encoder_features_types
 - val_fraction = 0.2
 - early_stopping_rounds = 100
 - iterations = 1000
 - depth = 6
 - features_type
 - use_embedding_features = True
 - embedder_config
 - loss_function = None
 - verbose = False
 - catboost_kwargs
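
To make the effect of features_type and use_embedding_features more concrete, here is a minimal sketch of how one utterance could be laid out as a feature row. It is an assumption-laden illustration rather than the scorer's internal code: the function `build_feature_row` and the column names `text`, `embedding`, and `emb_<i>` are hypothetical.

```python
def build_feature_row(text, embedding, features_type="both",
                      use_embedding_features=True):
    """Return a dict of hypothetical column names to values for one utterance."""
    if features_type not in {"text", "embedding", "both"}:
        raise ValueError(f"unknown features_type: {features_type!r}")

    row = {}
    if features_type in {"text", "both"}:
        # Raw text column, left for CatBoost's own BoW encoding.
        row["text"] = text
    if features_type in {"embedding", "both"}:
        if use_embedding_features:
            # One column holding the whole vector (an embedding feature).
            row["embedding"] = list(embedding)
        else:
            # One numeric column per embedding dimension.
            for i, value in enumerate(embedding):
                row[f"emb_{i}"] = value
    return row


as_vector = build_feature_row("hello", [0.1, 0.2, 0.3], "both", True)
as_columns = build_feature_row("hello", [0.1, 0.2, 0.3], "embedding", False)
```

With `use_embedding_features=True` the row keeps a single vector-valued column, while with `False` a three-dimensional embedding becomes three separate numeric columns.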
 - classmethod from_context(context, embedder_config=None, features_type=FeaturesType.BOTH, use_embedding_features=True, loss_function=None, verbose=False, val_fraction=0.2, early_stopping_rounds=100, iterations=1000, depth=6, **catboost_kwargs)
- Parameters:
- context (autointent.Context) 
- embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) 
- features_type (FeaturesType) 
- use_embedding_features (bool) 
- loss_function (str | None) 
- verbose (bool) 
- val_fraction (autointent.custom_types.FloatFromZeroToOne | None) 
- early_stopping_rounds (pydantic.PositiveInt) 
- iterations (pydantic.PositiveInt) 
- depth (pydantic.PositiveInt) 
 
- Return type:
 
 - fit(utterances, labels)
 - predict(utterances)
 - clear_cache()
- Return type:
- None
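
The example above stores the result of predict(utterances) in a variable named probabilities. Assuming that result holds one row of per-class scores per utterance (an assumption, since the return type is not documented here), the scores could be turned into labels roughly as follows. The helper names and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
def to_multiclass_labels(probabilities):
    """Pick the index of the highest score in each row (multiclass case)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def to_multilabel_labels(probabilities, threshold=0.5):
    """Mark every class whose score reaches the threshold (multilabel case)."""
    return [[int(p >= threshold) for p in row] for row in probabilities]


# Hypothetical scores for two utterances and two classes.
scores = [[0.9, 0.1], [0.2, 0.8]]
multiclass = to_multiclass_labels(scores)
multilabel = to_multilabel_labels([[0.9, 0.6], [0.2, 0.4]])
```

In the multiclass case exactly one class is chosen per utterance, whereas the multilabel variant may select several classes or none, depending on the threshold.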