autointent.modules.scoring.MLKnnScorer#
- class autointent.modules.scoring.MLKnnScorer(k, embedder_name, db_dir=None, s=1.0, ignore_first_neighbours=0, embedder_device='cpu', batch_size=32, max_length=None, embedder_use_cache=False)#
Bases:
autointent.modules.abc.ScoringModule
Multi-label k-nearest neighbors (ML-KNN) scorer.
This module implements ML-KNN, a multi-label classifier that estimates a probability for each label from the labels of the k nearest neighbors of a query instance.
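To make the idea concrete, the following is a minimal sketch of the textbook ML-KNN procedure, not the code used by MLKnnScorer. It assumes Euclidean distance over toy 2-D vectors in place of real embeddings, and the helper names (`knn_indices`, `fit_mlknn`, `predict_mlknn`) are hypothetical. The smoothing constant `s` plays the same role as the `s` parameter above.

```python
from math import dist


def knn_indices(query, points, k, skip=None):
    """Indices of the k points closest to `query`, optionally skipping one index."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: dist(query, points[i]),
    )
    return order[:k]


def fit_mlknn(X, Y, k, s=1.0):
    """Estimate smoothed label priors and neighbour-count likelihoods."""
    n, n_labels = len(X), len(Y[0])
    # Smoothed prior probability that each label is present.
    prior = [(s + sum(y[l] for y in Y)) / (2 * s + n) for l in range(n_labels)]
    # counts[l][present][j]: number of training points whose own label l equals
    # `present` and that have exactly j neighbours carrying label l.
    counts = [[[0] * (k + 1) for _ in range(2)] for _ in range(n_labels)]
    for i in range(n):
        neighbours = knn_indices(X[i], X, k, skip=i)  # exclude the point itself
        for l in range(n_labels):
            j = sum(Y[m][l] for m in neighbours)
            counts[l][Y[i][l]][j] += 1
    # Smoothed likelihood of seeing j positive neighbours, given label absent/present.
    likelihood = [
        [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in per_label]
        for per_label in counts
    ]
    return prior, likelihood


def predict_mlknn(query, X, Y, k, prior, likelihood):
    """Posterior probability of each label for one query vector (Bayes' rule)."""
    neighbours = knn_indices(query, X, k)
    probs = []
    for l, p in enumerate(prior):
        j = sum(Y[m][l] for m in neighbours)
        pos = p * likelihood[l][1][j]
        neg = (1 - p) * likelihood[l][0][j]
        probs.append(pos / (pos + neg))
    return probs


# Toy data: two well-separated clusters, one label each.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
prior, likelihood = fit_mlknn(X, Y, k=2, s=1.0)
probs = predict_mlknn([0.2, 0.2], X, Y, 2, prior, likelihood)
print(probs)  # roughly [0.8, 0.2]
```

A larger `s` pulls both the priors and the likelihoods toward uniform values, which matters most when the training set is small.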
- Variables:
arrays_filename – Filename for saving probabilities to disk.
metadata – Metadata about the scorer’s configuration.
prebuilt_index – Flag indicating if the vector index is prebuilt.
name – Name of the scorer, defaults to “mlknn”.
- Parameters:
Example#
from autointent.modules.scoring import MLKnnScorer

# Assumed location for the vector index; adjust to your setup.
db_dir = "./vector_db"

utterances = ["what is your name?", "how are you?"]
labels = [[1, 0], [0, 1]]
scorer = MLKnnScorer(
    k=5,
    embedder_name="sergeyzh/rubert-tiny-turbo",
    db_dir=db_dir,
)
scorer.fit(utterances, labels)
test_utterances = ["Hi!", "What's up?"]
probabilities = scorer.predict(test_utterances)
print(probabilities)  # Outputs predicted probabilities for each label
[[0.5 0.5]
 [0.5 0.5]]
- metadata: MLKnnScorerDumpMetadata#
- name = 'mlknn'#
- k#
- embedder_name#
- s = 1.0#
- ignore_first_neighbours = 0#
- embedder_device = 'cpu'#
- batch_size = 32#
- max_length = None#
- embedder_use_cache = False#
- property db_dir: str#
Get the database directory for the vector index.
- Returns:
Path to the database directory.
- Return type:
str
- classmethod from_context(context, k, s=1.0, ignore_first_neighbours=0, embedder_name=None)#
Create an MLKnnScorer instance using a Context object.
- Parameters:
context (autointent.Context) – Context containing configurations and utilities.
k (int) – Number of nearest neighbors to consider.
s (float) – Smoothing parameter for probability calculations, defaults to 1.0.
ignore_first_neighbours (int) – Number of closest neighbors to ignore, defaults to 0.
embedder_name (str | None) – Name of the embedder, or None to use the best embedder.
- Returns:
Initialized MLKnnScorer instance.
- Return type:
MLKnnScorer
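The interaction between k and ignore_first_neighbours can be pictured with a small sketch. It assumes that the closest neighbours are dropped first and the next k are kept; the actual order of operations inside the scorer is not stated here, and `select_neighbours` is a hypothetical helper, not part of autointent.

```python
def select_neighbours(ranked_ids, k, ignore_first_neighbours=0):
    """Drop the closest `ignore_first_neighbours` ids, then keep the next k."""
    start = ignore_first_neighbours
    return ranked_ids[start : start + k]


# Neighbour ids ordered from closest to farthest (made-up values).
ranked = ["u3", "u7", "u1", "u9", "u4"]
selected = select_neighbours(ranked, k=3, ignore_first_neighbours=1)
print(selected)  # ['u7', 'u1', 'u9']
```

Skipping the first neighbour is useful when a query may also be present in the index, since its closest match would otherwise be itself.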
- fit(utterances, labels)#
Fit the scorer by training or loading the vector index and calculating probabilities.
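- Parameters:
utterances – Training utterances.
labels – Multi-label targets, one list of binary indicators per utterance (as in the example above).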
- Raises:
TypeError – If the labels are not multi-label.
ValueError – If the vector index mismatches the provided utterances.
- Return type:
None
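The multi-label requirement behind the TypeError above can be illustrated with a simple check. This is a sketch under the assumption that "multi-label" means every label is a list of indicators, as in the class example; `ensure_multilabel` is a hypothetical helper, and the scorer's own validation may differ.

```python
def ensure_multilabel(labels):
    """Raise TypeError unless every label is a list."""
    for i, label in enumerate(labels):
        if not isinstance(label, list):
            raise TypeError(f"label at index {i} is not multi-label: {label!r}")


ensure_multilabel([[1, 0], [0, 1]])  # passes silently

try:
    ensure_multilabel([0, 1])  # plain class ids, not indicator lists
    rejected = False
except TypeError:
    rejected = True
```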
- predict_labels(utterances, thresh=0.5)#
Predict labels for the given utterances.
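As a rough illustration of what thresh does, the sketch below turns per-label probabilities into binary labels. It is not the library's implementation: `to_labels` is a hypothetical helper, and the strict comparison is an assumption, since the handling of a probability exactly equal to thresh is not stated.

```python
def to_labels(probabilities, thresh=0.5):
    """Mark a label as present when its probability exceeds `thresh`."""
    return [[int(p > thresh) for p in row] for row in probabilities]


predicted = to_labels([[0.8, 0.2], [0.5, 0.9]])
print(predicted)  # [[1, 0], [0, 1]]
```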
- predict(utterances)#
Predict probabilities for the given utterances.
- predict_with_metadata(utterances)#
Predict probabilities along with metadata for the given utterances.
- clear_cache()#
Clear cached data in memory used by the vector index.
- Return type:
None