autointent.modules.scoring.MLKnnScorer#

class autointent.modules.scoring.MLKnnScorer(k, embedder_name, db_dir=None, s=1.0, ignore_first_neighbours=0, embedder_device='cpu', batch_size=32, max_length=None, embedder_use_cache=False)#

Bases: autointent.modules.abc.ScoringModule

Multi-label k-nearest neighbors (ML-KNN) scorer.

This module implements ML-KNN, a multi-label classifier that estimates a probability for each label from the labels of the k nearest neighbors of a query utterance.
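As a rough illustration of the idea, the sketch below follows the textbook ML-KNN formulation (smoothed label priors plus smoothed neighbor-count likelihoods, combined with Bayes' rule). It is not taken from the autointent source: the function names `fit_mlknn` and `predict_mlknn` are hypothetical, and the neighbor search is assumed to have been done already, so each item arrives with the label vectors of its k nearest neighbors.

```python
def fit_mlknn(labels, neighbor_labels, k, s=1.0):
    """Estimate smoothed priors and likelihoods (hypothetical helper).

    labels[i] is the binary label vector of training item i.
    neighbor_labels[i] holds the label vectors of its k nearest
    training neighbors (the item itself excluded).
    """
    n, n_labels = len(labels), len(labels[0])
    # Smoothed prior that a label is present.
    prior = [
        (s + sum(y[lab] for y in labels)) / (2 * s + n)
        for lab in range(n_labels)
    ]
    # counts[lab][h][j]: training items whose own value for `lab` is h
    # and that have exactly j neighbors carrying `lab`.
    counts = [[[0] * (k + 1) for _ in range(2)] for _ in range(n_labels)]
    for y, neigh in zip(labels, neighbor_labels):
        for lab in range(n_labels):
            j = sum(v[lab] for v in neigh)
            counts[lab][y[lab]][j] += 1
    # Smoothed likelihood of seeing j positive neighbors given h.
    likelihood = [
        [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in per_label]
        for per_label in counts
    ]
    return prior, likelihood


def predict_mlknn(query_neighbor_labels, prior, likelihood):
    """Posterior probability of each label for one query (hypothetical helper)."""
    probs = []
    for lab, p1 in enumerate(prior):
        j = sum(v[lab] for v in query_neighbor_labels)
        pos = p1 * likelihood[lab][1][j]
        neg = (1 - p1) * likelihood[lab][0][j]
        probs.append(pos / (pos + neg))
    return probs


# Toy data: two clusters of three items each, k = 2, so every item's
# neighbors come from its own cluster.
labels = [[1, 0]] * 3 + [[0, 1]] * 3
neighbor_labels = [[[1, 0], [1, 0]]] * 3 + [[[0, 1], [0, 1]]] * 3
prior, likelihood = fit_mlknn(labels, neighbor_labels, k=2, s=1.0)

# A query whose two nearest neighbors both carry the first label.
probs = predict_mlknn([[1, 0], [1, 0]], prior, likelihood)
print(probs)  # roughly [0.8, 0.2]
```

The smoothing parameter `s` keeps every estimate away from 0 and 1, so larger values pull the outputs toward 0.5.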

Variables:
  • arrays_filename – Filename for saving probabilities to disk.

  • metadata – Metadata about the scorer’s configuration.

  • prebuilt_index – Flag indicating if the vector index is prebuilt.

  • name – Name of the scorer, defaults to “mlknn”.

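Parameters:
  • k (int) – Number of nearest neighbors to consider.

  • embedder_name (str) – Name of the embedder.

  • db_dir (str | None) – Path to the database directory for the vector index, defaults to None.

  • s (float) – Smoothing parameter for probability calculations, defaults to 1.0.

  • ignore_first_neighbours (int) – Number of closest neighbors to ignore, defaults to 0.

  • embedder_device (str) – Device on which the embedder runs, defaults to “cpu”.

  • batch_size (int) – Batch size used by the embedder, defaults to 32.

  • max_length (int | None) – Maximum sequence length for the embedder, defaults to None.

  • embedder_use_cache (bool) – Whether the embedder uses caching, defaults to False.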

Example#

from autointent.modules.scoring import MLKnnScorer

utterances = ["what is your name?", "how are you?"]
labels = [[1, 0], [0, 1]]  # one binary indicator per label

# Directory for the vector index; this path is a placeholder.
db_dir = "./vector_db"

scorer = MLKnnScorer(
    k=5,
    embedder_name="sergeyzh/rubert-tiny-turbo",
    db_dir=db_dir,
)
scorer.fit(utterances, labels)

test_utterances = ["Hi!", "What's up?"]
probabilities = scorer.predict(test_utterances)
print(probabilities)  # Outputs predicted probabilities for each label
# [[0.5 0.5]
#  [0.5 0.5]]
arrays_filename: str = 'probs.npz'#
metadata: MLKnnScorerDumpMetadata#
prebuilt_index: bool = False#
name = 'mlknn'#
k#
embedder_name#
s = 1.0#
ignore_first_neighbours = 0#
embedder_device = 'cpu'#
batch_size = 32#
max_length = None#
embedder_use_cache = False#
property db_dir: str#

Get the database directory for the vector index.

Returns:

Path to the database directory.

Return type:

str

classmethod from_context(context, k, s=1.0, ignore_first_neighbours=0, embedder_name=None)#

Create an MLKnnScorer instance using a Context object.

Parameters:
  • context (autointent.Context) – Context containing configurations and utilities.

  • k (int) – Number of nearest neighbors to consider.

  • s (float) – Smoothing parameter for probability calculations, defaults to 1.0.

  • ignore_first_neighbours (int) – Number of closest neighbors to ignore, defaults to 0.

  • embedder_name (str | None) – Name of the embedder, or None to use the best embedder recorded in the context, defaults to None.

Returns:

Initialized MLKnnScorer instance.

Return type:

MLKnnScorer
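One way to picture how k and ignore_first_neighbours could interact is shown below. This is only a sketch under the assumption that the closest matches are skipped first and the next k are kept; the helper `select_neighbours` and the utterance ids are hypothetical and not part of autointent.

```python
def select_neighbours(ranked, k, ignore_first_neighbours=0):
    """Drop the closest matches, then keep the next k (hypothetical helper).

    `ranked` is assumed to be sorted from nearest to farthest.
    """
    start = ignore_first_neighbours
    return ranked[start:start + k]


# Hypothetical ids of retrieved training utterances, nearest first.
ranked = ["u3", "u7", "u1", "u9", "u4"]
kept = select_neighbours(ranked, k=3, ignore_first_neighbours=1)
print(kept)  # ['u7', 'u1', 'u9']
```

Skipping the first neighbor is useful when a query may itself be present in the index, since its nearest match would then be an exact copy.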

get_embedder_name()#

Get the name of the embedder.

Returns:

Embedder name.

Return type:

str

fit(utterances, labels)#

Fit the scorer by building or loading the vector index and computing the label probabilities.

Parameters:
  • utterances (list[str]) – List of training utterances.

  • labels (list[autointent.custom_types.LabelType]) – List of multi-label targets for each utterance.

Raises:
  • TypeError – If the labels are not multi-label.

  • ValueError – If the vector index does not match the provided utterances.

Return type:

None

predict_labels(utterances, thresh=0.5)#

Predict labels for the given utterances.

Parameters:
  • utterances (list[str]) – List of query utterances.

  • thresh (float) – Threshold for binary classification, defaults to 0.5.

Returns:

Predicted labels as a binary array.

Return type:

numpy.typing.NDArray[numpy.int64]
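The thresholding step can be sketched in plain Python as follows. The helper `threshold_labels` is hypothetical, nested lists stand in for the NumPy array the method returns, and the strict `>` comparison is an assumption rather than something this page states.

```python
def threshold_labels(probabilities, thresh=0.5):
    """Turn per-label probabilities into 0/1 labels (hypothetical helper)."""
    # Assumption: a label is predicted only when its probability is
    # strictly greater than the threshold.
    return [[int(p > thresh) for p in row] for row in probabilities]


probabilities = [[0.8, 0.2], [0.5, 0.9]]
predicted = threshold_labels(probabilities, thresh=0.5)
print(predicted)  # [[1, 0], [0, 1]]
```

Lowering `thresh` trades precision for recall: more labels pass the cut for each utterance.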

predict(utterances)#

Predict probabilities for the given utterances.

Parameters:

utterances (list[str]) – List of query utterances.

Returns:

Array of predicted probabilities for each class.

Return type:

numpy.typing.NDArray[numpy.float64]

predict_with_metadata(utterances)#

Predict probabilities along with metadata for the given utterances.

Parameters:

utterances (list[str]) – List of query utterances.

Returns:

Tuple of probabilities and metadata with neighbor information; the metadata element may be None.

Return type:

tuple[numpy.typing.NDArray[Any], list[dict[str, Any]] | None]

clear_cache()#

Clear cached data in memory used by the vector index.

Return type:

None

dump(path)#

Save the MLKnnScorer’s metadata and probabilities to disk.

Parameters:

path (str) – Path to the directory where assets will be dumped.

Return type:

None

load(path)#

Load the MLKnnScorer’s metadata and probabilities from disk.

Parameters:

path (str) – Path to the directory containing the dumped assets.

Return type:

None