autointent.modules.scoring.KNNScorer

class autointent.modules.scoring.KNNScorer(k=5, embedder_config=None, weights='distance')

Bases: autointent.modules.base.BaseScorer

K-nearest neighbors (KNN) scorer for intent classification.

This module uses a vector index to retrieve nearest neighbors for query utterances and applies a weighting strategy to compute class probabilities.

Parameters:

embedder_config (autointent.configs.EmbedderConfig | str | dict[str, Any] | None) – Config of the embedder used for vectorization
k (pydantic.PositiveInt) – Number of closest neighbors to consider during inference
weights (autointent.custom_types.WeightType) –
Weighting strategy:
- "uniform": Equal weight for all neighbors
- "distance": Weight inversely proportional to distance
- "closest": Only the closest neighbor of each class is weighted
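The three weighting strategies can be illustrated with a small, dependency-free sketch that turns the labels and distances of the k retrieved neighbors into class probabilities. This is not KNNScorer's actual implementation: the function name `knn_class_scores`, the `eps` guard against zero distances, scoring "closest" neighbors by 1/distance, and normalizing the result to sum to 1 are all assumptions made for illustration.

```python
def knn_class_scores(
    neighbor_labels: list[int],
    distances: list[float],
    n_classes: int,
    weights: str = "distance",
    eps: float = 1e-9,
) -> list[float]:
    """Hypothetical sketch: combine k neighbors into class probabilities.

    Assumes neighbors are sorted by ascending distance.
    """
    raw = [0.0] * n_classes
    if weights == "uniform":
        # Every neighbor casts one equal vote for its class.
        for label in neighbor_labels:
            raw[label] += 1.0
    elif weights == "distance":
        # Closer neighbors cast larger votes: weight = 1 / distance.
        for label, dist in zip(neighbor_labels, distances):
            raw[label] += 1.0 / (dist + eps)
    elif weights == "closest":
        # Only the first (closest) neighbor seen for each class counts.
        # Scoring it by 1 / distance is an assumption of this sketch.
        seen: set[int] = set()
        for label, dist in zip(neighbor_labels, distances):
            if label not in seen:
                seen.add(label)
                raw[label] = 1.0 / (dist + eps)
    else:
        raise ValueError(f"unknown weighting strategy: {weights!r}")
    total = sum(raw)
    # Normalize so the scores form a probability distribution.
    return [score / total for score in raw] if total > 0 else raw
```

For example, with neighbor labels `[0, 1]` at distances `[0.1, 0.4]`, "distance" weighting gives class 0 roughly four times the vote of class 1, while "uniform" would split them evenly.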

Examples:

from autointent.modules.scoring import KNNScorer
utterances = ["hello", "how are you?"]
labels = [0, 1]
scorer = KNNScorer(
    embedder_config="sergeyzh/rubert-tiny-turbo",
    k=5,
)
scorer.fit(utterances, labels)
test_utterances = ["hi", "what's up?"]
probabilities = scorer.predict(test_utterances)

name = 'knn'
Name of the module to reference in the search space configuration.

supports_multilabel = True
Whether the module supports multilabel classification.

supports_multiclass = True
Whether the module supports multiclass classification.
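One way a kNN scorer can support multilabel classification is to average the neighbors' multi-hot label vectors, so each class gets an independent score in [0, 1]. The sketch below assumes inverse-distance weighting and a hypothetical helper `knn_multilabel_scores`; it illustrates the idea and is not taken from the autointent source.

```python
def knn_multilabel_scores(
    neighbor_labels: list[list[int]],
    distances: list[float],
    eps: float = 1e-9,
) -> list[float]:
    """Hypothetical sketch: distance-weighted average of multi-hot label vectors."""
    n_classes = len(neighbor_labels[0])
    # Inverse-distance weights (an assumption of this sketch).
    weights = [1.0 / (d + eps) for d in distances]
    total = sum(weights)
    scores = [0.0] * n_classes
    for label_vec, w in zip(neighbor_labels, weights):
        for c, present in enumerate(label_vec):
            scores[c] += w * present
    # Classes are not mutually exclusive, so each score is normalized
    # independently and the scores need not sum to 1.
    return [s / total for s in scores]
```

A class shared by every neighbor scores 1.0 here, which a downstream decision step could then threshold per class.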

embedder_config

k = 5

weights = 'distance'

classmethod from_context(context, k=5, weights='distance', embedder_config=None)

Create a KNNScorer instance using a Context object.

Parameters:

context (autointent.Context) – Context containing configurations and utilities
k (pydantic.PositiveInt) – Number of closest neighbors to consider during inference
weights (autointent.custom_types.WeightType) – Weighting strategy for scoring
embedder_config (autointent.configs.EmbedderConfig | str | None) – Config of the embedder, or None to use the best embedder from the context

Return type:

KNNScorer

get_implicit_initialization_params()

Return the default parameters used in the __init__ method.

Some parameters of the module may be inferred from the context rather than passed to the __init__ method. They still need to be logged so that the module can be reproduced when it is loaded from disk.

Returns: Dictionary of default params
Return type: dict[str, Any]
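The reproducibility concern above can be sketched with a toy class: when no embedder is passed, one is filled in from the context, and `get_implicit_initialization_params` reports the value actually used so that a reloaded module does not depend on that context. `ToyScorer` is hypothetical, and the plain `best_embedder` string stands in for the real `autointent.Context` object.

```python
from typing import Any, Optional


class ToyScorer:
    """Hypothetical scorer illustrating implicit-parameter logging."""

    def __init__(self, k: int = 5, embedder_config: Optional[str] = None) -> None:
        self.k = k
        self.embedder_config = embedder_config

    @classmethod
    def from_context(
        cls,
        best_embedder: str,  # stand-in for the context's best embedder
        k: int = 5,
        embedder_config: Optional[str] = None,
    ) -> "ToyScorer":
        # Fall back to the context's choice only when none was given.
        if embedder_config is None:
            embedder_config = best_embedder
        return cls(k=k, embedder_config=embedder_config)

    def get_implicit_initialization_params(self) -> dict[str, Any]:
        # Log the embedder that was actually resolved, so saving and
        # reloading the module reproduces the same configuration.
        return {"embedder_config": self.embedder_config}
```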

fit(utterances, labels)

Fit the scorer by training or loading the vector index.

Parameters:

utterances (list[str]) – List of training utterances
labels (autointent.custom_types.ListOfLabels) – List of labels corresponding to the utterances
clear_cache – Whether to clear the vector index cache before fitting

Raises:

ValueError – If the vector index mismatches the provided utterances

Return type:

None
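A minimal sketch of what "training or loading the vector index" can mean: build a brute-force index from embedded utterances, or reuse a cached one only if it was built from the same utterances, otherwise raise `ValueError`. `toy_embed` (normalized letter counts), `ToyVectorIndex` and `fit_index` are hypothetical stand-ins for the real embedder and vector index.

```python
from dataclasses import dataclass, field
from typing import Optional


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: L2-normalized counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = sum(c * c for c in counts) ** 0.5 or 1.0
    return [c / norm for c in counts]


@dataclass
class ToyVectorIndex:
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)
    labels: list[int] = field(default_factory=list)


def fit_index(
    utterances: list[str],
    labels: list[int],
    cached: Optional[ToyVectorIndex] = None,
) -> ToyVectorIndex:
    if len(utterances) != len(labels):
        raise ValueError("utterances and labels must have the same length")
    if cached is not None:
        # Reuse a previously built index only if it covers the same data.
        if cached.texts != list(utterances):
            raise ValueError("cached vector index does not match the provided utterances")
        return cached
    # Otherwise embed every utterance and store it with its label.
    return ToyVectorIndex(list(utterances), [toy_embed(u) for u in utterances], list(labels))
```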

predict(utterances)

Predict class probabilities for the given utterances.

Parameters: utterances (list[str]) – List of query utterances
Returns: Array of predicted probabilities for each class
Return type: numpy.typing.NDArray[Any]
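Prediction can be sketched as a brute-force nearest-neighbor search followed by a vote, producing one probability row per query. This assumes cosine distance, uniform weighting and precomputed embeddings; the exhaustive search is a stand-in for the vector index, and `knn_predict` is a hypothetical name.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)  # assumes non-zero vectors


def knn_predict(
    train_vectors: list[list[float]],
    train_labels: list[int],
    query_vectors: list[list[float]],
    n_classes: int,
    k: int = 5,
) -> list[list[float]]:
    """Hypothetical uniform-vote kNN returning one probability row per query."""
    rows = []
    for q in query_vectors:
        # Exhaustive search stands in for the vector index lookup.
        ranked = sorted(
            range(len(train_vectors)),
            key=lambda i: cosine_distance(q, train_vectors[i]),
        )
        nearest = ranked[:k]  # slicing caps k at the training-set size
        row = [0.0] * n_classes
        for i in nearest:
            row[train_labels[i]] += 1.0 / len(nearest)
        rows.append(row)
    return rows
```

Because of the slice, asking for more neighbors than there are training utterances simply uses all of them.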

predict_with_metadata(utterances)

Predict class probabilities along with metadata for the given utterances.

Parameters:

utterances (list[str]) – List of query utterances

Returns:

Tuple containing:
- Array of predicted probabilities
- List of metadata with neighbor information

Return type:

tuple
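Returning neighbor metadata alongside the probabilities can be sketched as follows, using Euclidean distance and uniform voting. The helper name `knn_predict_with_metadata` and the metadata keys `"neighbors"` and `"distances"` are hypothetical; the real method's metadata layout may differ.

```python
import math
from typing import Any


def knn_predict_with_metadata(
    train_texts: list[str],
    train_vectors: list[list[float]],
    train_labels: list[int],
    query_vectors: list[list[float]],
    n_classes: int,
    k: int = 5,
) -> tuple[list[list[float]], list[dict[str, Any]]]:
    """Hypothetical kNN that also reports which training utterances were retrieved."""
    probs: list[list[float]] = []
    metadata: list[dict[str, Any]] = []
    for q in query_vectors:
        # (distance, index) pairs sorted ascending; keep the k closest.
        nearest = sorted((math.dist(q, v), i) for i, v in enumerate(train_vectors))[:k]
        row = [0.0] * n_classes
        for _, i in nearest:
            row[train_labels[i]] += 1.0 / len(nearest)
        probs.append(row)
        metadata.append({
            "neighbors": [train_texts[i] for _, i in nearest],
            "distances": [d for d, _ in nearest],
        })
    return probs, metadata
```

Keeping the retrieved utterances next to each prediction makes it easy to inspect why a query was assigned a given class.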

clear_cache()

Clear cached data in memory used by the vector index.

Return type: None