autointent.modules.scoring.MLKnnScorer#
- class autointent.modules.scoring.MLKnnScorer(k, embedder_config=None, s=1.0, ignore_first_neighbours=0)#
Bases:
autointent.modules.base.BaseScorer
Multi-label k-nearest neighbors (ML-KNN) scorer.
This module implements ML-KNN, a multi-label classifier that computes probabilities based on the k-nearest neighbors of a query instance.
- Parameters:
k (pydantic.PositiveInt) – Number of nearest neighbors to consider
embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the embedder used for vectorization
s (float) – Smoothing parameter for probability calculations, defaults to 1.0
ignore_first_neighbours (int) – Number of closest neighbors to ignore, defaults to 0
Example:#
from autointent.modules.scoring import MLKnnScorer

utterances = ["what is your name?", "how are you?"]
labels = [[1, 0], [0, 1]]
scorer = MLKnnScorer(
    k=5,
    embedder_config="sergeyzh/rubert-tiny-turbo",
)
scorer.fit(utterances, labels)
test_utterances = ["Hi!", "What's up?"]
probabilities = scorer.predict(test_utterances)
print(probabilities)  # Outputs predicted probabilities for each label
[[0.5 0.5]
 [0.5 0.5]]
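To illustrate how k and the smoothing parameter s enter the probability estimates, here is a minimal NumPy sketch of the textbook ML-KNN formulation: smoothed label priors, smoothed likelihoods of seeing j label-carrying neighbours, and a Bayes posterior per label. It is not the library's implementation. The helper names (`knn_indices`, `mlknn_fit`, `mlknn_predict`), the brute-force Euclidean search, and the `skip` argument (a stand-in for the idea behind `ignore_first_neighbours`) are all assumptions made for illustration, and the toy 2-D points stand in for utterance embeddings.

```python
import numpy as np


def knn_indices(queries, corpus, k, skip=0):
    """Hypothetical helper: indices of the k nearest corpus rows per query,
    after skipping the `skip` closest ones (brute-force squared Euclidean)."""
    dists = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(axis=-1)
    order = np.argsort(dists, axis=1, kind="stable")
    return order[:, skip : skip + k]


def mlknn_fit(X, Y, k, s=1.0):
    """Estimate smoothed priors and likelihoods from training data."""
    n, n_labels = Y.shape
    # Smoothed prior that a label is present: (s + count) / (2s + n)
    prior1 = (s + Y.sum(axis=0)) / (2 * s + n)
    # Neighbours of each training point; skip=1 drops the point itself
    nbrs = knn_indices(X, X, k, skip=1)
    counts = Y[nbrs].sum(axis=1)  # (n, n_labels): neighbours carrying each label
    c1 = np.zeros((n_labels, k + 1))  # points WITH the label, by neighbour count j
    c0 = np.zeros((n_labels, k + 1))  # points WITHOUT the label, by neighbour count j
    for label in range(n_labels):
        for j in range(k + 1):
            hit = counts[:, label] == j
            c1[label, j] = np.sum(hit & (Y[:, label] == 1))
            c0[label, j] = np.sum(hit & (Y[:, label] == 0))
    # Smoothed likelihoods P(j neighbours carry the label | label present/absent)
    like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
    like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
    return prior1, like1, like0


def mlknn_predict(X_train, Y, model, queries, k, skip=0):
    """Posterior probability of each label for each query point."""
    prior1, like1, like0 = model
    nbrs = knn_indices(queries, X_train, k, skip=skip)
    counts = Y[nbrs].sum(axis=1)  # (n_queries, n_labels)
    cols = np.arange(Y.shape[1])
    p1 = prior1 * like1[cols, counts]
    p0 = (1 - prior1) * like0[cols, counts]
    return p1 / (p1 + p0)


# Toy data: two well-separated clusters, one label each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
Y = np.array([[1, 0]] * 3 + [[0, 1]] * 3)
model = mlknn_fit(X, Y, k=2, s=1.0)
probs = mlknn_predict(X, Y, model, np.array([[0.2, 0.1]]), k=2)
print(probs)  # [[0.8 0.2]]
```

With s=1.0 the estimates never reach exactly 0 or 1, even though every neighbour of the query carries the first label. Larger values of s pull the probabilities further toward 0.5.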
- name = 'mlknn'#
Name of the module.
- supports_multiclass = False#
Whether the module supports multiclass classification
- supports_multilabel = True#
Whether the module supports multilabel classification
- k#
- embedder_config#
- s = 1.0#
- ignore_first_neighbours = 0#
- classmethod from_context(context, k, s=1.0, ignore_first_neighbours=0, embedder_config=None)#
Create an MLKnnScorer instance using a Context object.
- Parameters:
context (autointent.Context) – Context containing configurations and utilities
k (pydantic.PositiveInt) – Number of nearest neighbors to consider
s (pydantic.PositiveFloat) – Smoothing parameter for probability calculations, defaults to 1.0
ignore_first_neighbours (pydantic.NonNegativeInt) – Number of closest neighbors to ignore, defaults to 0
embedder_config (autointent.configs.EmbedderConfig | str | None) – Config of the embedder, or None to use the best embedder
- Returns:
Initialized MLKnnScorer instance
- Return type:
MLKnnScorer
- get_embedder_config()#
Get the configuration of the embedder.
- fit(utterances, labels)#
Fit the scorer by training or loading the vector index and calculating probabilities.
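- Parameters:
utterances – Training utterances
labels – Multi-label targets for the utterances, one list of binary indicators per utterance (as in the class example)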
- Raises:
TypeError – If the labels are not multi-label
ValueError – If the vector index mismatches the provided utterances
- Return type:
None
- predict_labels(utterances, thresh=0.5)#
Predict labels for the given utterances.
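Conceptually, turning per-label probabilities into labels with a threshold such as thresh=0.5 can be sketched as below. This is only an illustration: the helper `labels_from_probabilities` is hypothetical, and whether the library compares with a strict or non-strict inequality is not stated here, so the strict `>` is an assumption.

```python
import numpy as np


def labels_from_probabilities(probabilities, thresh=0.5):
    """Hypothetical helper: mark a label as predicted when its probability
    exceeds `thresh` (strict comparison is an assumption)."""
    probs = np.asarray(probabilities, dtype=float)
    return (probs > thresh).astype(int)


predicted = labels_from_probabilities([[0.8, 0.2], [0.5, 0.9]])
print(predicted.tolist())  # [[1, 0], [0, 1]]
```

Because each label is thresholded independently, an utterance can receive several labels or none at all.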
- predict(utterances)#
Predict probabilities for the given utterances.
- predict_with_metadata(utterances)#
Predict probabilities along with metadata for the given utterances.
- clear_cache()#
Clear cached data in memory used by the vector index.
- Return type:
None