autointent.modules.scoring.MLKnnScorer#

class autointent.modules.scoring.MLKnnScorer(k=5, embedder_config=None, s=1.0, ignore_first_neighbours=0)#

Bases: autointent.modules.base.BaseScorer

Multi-label k-nearest neighbors (ML-KNN) scorer.

This module implements ML-KNN, a multi-label classifier that estimates a probability for each label from the labels of the k nearest neighbors of a query instance.

Parameters:

k (pydantic.PositiveInt) – Number of nearest neighbors to consider
embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the embedder used for vectorization
s (float) – Smoothing parameter for probability calculations, defaults to 1.0
ignore_first_neighbours (int) – Number of closest neighbors to ignore, defaults to 0
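
To make the roles of k, s, and ignore_first_neighbours concrete, here is a minimal pure-Python sketch of the textbook ML-KNN calculation (smoothed label priors combined with smoothed neighbor-count likelihoods). It is an illustration under stated assumptions, not this module's implementation: the helper names knn_indices, fit_mlknn, and predict_mlknn are hypothetical, points are toy 1-D tuples instead of embeddings, and skip stands in for ignore_first_neighbours.

```python
def knn_indices(points, query, k, skip=0, exclude=None):
    """Indices of the k nearest points to `query`, after dropping the `skip` closest."""
    order = sorted(
        (i for i in range(len(points)) if i != exclude),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], query)),
    )
    return order[skip:skip + k]


def fit_mlknn(points, labels, k, s=1.0, skip=0):
    n, n_labels = len(points), len(labels[0])
    # Smoothed prior probability that each label is present.
    prior = [(s + sum(y[lab] for y in labels)) / (2 * s + n) for lab in range(n_labels)]
    # c1[lab][j]: training items WITH the label that have j neighbours carrying it.
    # c0[lab][j]: training items WITHOUT the label that have j such neighbours.
    c1 = [[0] * (k + 1) for _ in range(n_labels)]
    c0 = [[0] * (k + 1) for _ in range(n_labels)]
    for i, (x, y) in enumerate(zip(points, labels)):
        neigh = knn_indices(points, x, k, skip, exclude=i)
        for lab in range(n_labels):
            j = sum(labels[m][lab] for m in neigh)
            (c1 if y[lab] else c0)[lab][j] += 1
    return prior, c1, c0


def predict_mlknn(points, labels, model, query, k, s=1.0, skip=0):
    prior, c1, c0 = model
    neigh = knn_indices(points, query, k, skip)
    probs = []
    for lab in range(len(prior)):
        j = sum(labels[m][lab] for m in neigh)
        # Smoothed likelihood of seeing j labelled neighbours, with and without the label.
        like1 = (s + c1[lab][j]) / (s * (k + 1) + sum(c1[lab]))
        like0 = (s + c0[lab][j]) / (s * (k + 1) + sum(c0[lab]))
        num = prior[lab] * like1
        probs.append(num / (num + (1 - prior[lab]) * like0))
    return probs


# Toy data: two well-separated clusters, one label per cluster.
points = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
labels = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
model = fit_mlknn(points, labels, k=2)
probs = predict_mlknn(points, labels, model, (0.05,), k=2)
print([round(p, 3) for p in probs])  # [0.8, 0.2]
```

In this sketch a larger s pulls every estimate toward 0.5, which matters most when few training items carry a given label.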

Example:#

from autointent.modules.scoring import MLKnnScorer
utterances = ["what is your name?", "how are you?"]
labels = [[1, 0], [0, 1]]
scorer = MLKnnScorer(
    k=5,
    embedder_config="sergeyzh/rubert-tiny-turbo",
)
scorer.fit(utterances, labels)
test_utterances = ["Hi!", "What's up?"]
probabilities = scorer.predict(test_utterances)
print(probabilities)  # Outputs predicted probabilities for each label

[[0.5 0.5]
 [0.5 0.5]]

name = 'mlknn'#

Name of the module to reference in search space configuration.

supports_multiclass = False#

Whether the module supports multiclass classification

supports_multilabel = True#

Whether the module supports multilabel classification

k = 5#

embedder_config#

s = 1.0#

ignore_first_neighbours = 0#

classmethod from_context(context, k=5, s=1.0, ignore_first_neighbours=0, embedder_config=None)#

Create an MLKnnScorer instance using a Context object.

Parameters:

context (autointent.Context) – Context containing configurations and utilities
k (pydantic.PositiveInt) – Number of nearest neighbors to consider
s (pydantic.PositiveFloat) – Smoothing parameter for probability calculations, defaults to 1.0
ignore_first_neighbours (pydantic.NonNegativeInt) – Number of closest neighbors to ignore, defaults to 0
embedder_config (autointent.configs.EmbedderConfig | str | None) – Config of the embedder, or None to use the best embedder

Returns:

Initialized MLKnnScorer instance

Return type:

MLKnnScorer

get_implicit_initialization_params()#

Return the default parameters used in the __init__ method.

Some parameters of the module may be inferred from the context rather than passed to the __init__ method. They still need to be logged so that the module can be reproduced when it is loaded from disk.

Returns:

Dictionary of default params

Return type:

dict[str, Any]

fit(utterances, labels)#

Fit the scorer by training or loading the vector index and calculating the prior and conditional label probabilities.

Parameters:

utterances (list[str]) – List of training utterances
labels (autointent.custom_types.ListOfLabels) – List of multi-label targets for each utterance

Raises:

TypeError – If the labels are not multi-label
ValueError – If the vector index mismatches the provided utterances

Return type:

None
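
As an illustration of what "multi-label targets" means here, the sketch below checks that every target is a binary indicator list of the same length and raises TypeError otherwise. The helper check_multilabel is hypothetical, and the exact validation performed by fit may differ.

```python
def check_multilabel(labels):
    """Raise TypeError unless every target is a same-length list of 0/1 flags."""
    if not labels or not all(isinstance(y, list) for y in labels):
        raise TypeError("expected a list of multi-label targets, e.g. [[1, 0], [0, 1]]")
    width = len(labels[0])
    for y in labels:
        if len(y) != width or any(v not in (0, 1) for v in y):
            raise TypeError("each target must be a binary indicator list of equal length")


check_multilabel([[1, 0], [0, 1]])  # valid multi-label targets pass silently

try:
    check_multilabel([0, 1])  # multiclass-style integer labels
    rejected = False
except TypeError:
    rejected = True
```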

predict_labels(utterances, thresh=0.5)#

Predict labels for the given utterances.

Parameters:

utterances (list[str]) – List of query utterances
thresh (float) – Threshold for binary classification, defaults to 0.5

Returns:

Predicted labels as a binary array

Return type:

numpy.typing.NDArray[numpy.int64]
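
The thresholding step can be pictured with a short sketch: each probability is compared against thresh and mapped to 1 or 0. The helper to_binary_labels is hypothetical, and the strict `>` comparison is an assumption (the method may treat values exactly equal to the threshold differently).

```python
def to_binary_labels(probabilities, thresh=0.5):
    """Mark a label as present when its probability exceeds `thresh` (strict `>` is an assumption)."""
    return [[int(p > thresh) for p in row] for row in probabilities]


binary = to_binary_labels([[0.8, 0.2], [0.5, 0.9]])
print(binary)  # [[1, 0], [0, 1]]
```

Lowering thresh trades precision for recall: more labels are switched on per utterance.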

predict(utterances)#

Predict probabilities for the given utterances.

Parameters:

utterances (list[str]) – List of query utterances

Returns:

Array of predicted probabilities for each class

Return type:

numpy.typing.NDArray[numpy.float64]

predict_with_metadata(utterances)#

Predict probabilities along with metadata for the given utterances.

Parameters:

utterances (list[str]) – List of query utterances

Returns:

Tuple containing:
Array of predicted probabilities
List of metadata with neighbor information

Return type:

tuple

clear_cache()#

Clear cached data in memory used by the vector index.

Return type:

None