autointent.modules.embedding.RetrievalAimedEmbedding

class autointent.modules.embedding.RetrievalAimedEmbedding(embedder_config, k=10)

Bases: autointent.modules.base.BaseEmbedding

Module for configuring embeddings optimized for retrieval tasks.

This module is intended for the embedding node, where it optimizes the embedding configuration using retrieval quality as a proxy metric.
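
The idea of using retrieval quality as a proxy metric can be illustrated with a minimal, library-independent sketch. It does not reproduce the module's implementation: the toy vectors stand in for the outputs of two hypothetical embedders, and `hit_rate_at_k` is an illustrative helper, not an autointent function.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hit_rate_at_k(vectors, labels, k):
    """Leave-one-out hit rate: share of items with a same-label item among their k nearest neighbors."""
    hits = 0
    for i, query in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine(query, vectors[j]), reverse=True)
        if any(labels[j] == labels[i] for j in others[:k]):
            hits += 1
    return hits / len(vectors)

labels = [0, 0, 1, 1]
# Toy vectors standing in for two hypothetical embedders (assumption, not real model output).
embedder_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # classes well separated
embedder_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # classes mixed together

scores = {
    "embedder_a": hit_rate_at_k(embedder_a, labels, k=1),
    "embedder_b": hit_rate_at_k(embedder_b, labels, k=1),
}
# The configuration with the better retrieval score is preferred.
best = max(scores, key=scores.get)
```

An embedder that places same-label utterances close together scores higher, which is why retrieval quality can serve as a cheap stand-in for downstream classification quality.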

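Parameters:
  • embedder_config – Configuration of the embedder used to compute embeddings (the example below passes a model name)

  • k (int) – Number of nearest neighbors to retrieve (default: 10)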

Examples:

from autointent.modules.embedding import RetrievalAimedEmbedding
utterances = ["bye", "how are you?", "good morning"]
labels = [0, 1, 1]
retrieval = RetrievalAimedEmbedding(
    k=2,
    embedder_config="sergeyzh/rubert-tiny-turbo",
)
retrieval.fit(utterances, labels)

name = 'retrieval'

Name of the module.

supports_multiclass = True

Whether the module supports multiclass classification.

supports_multilabel = True

Whether the module supports multilabel classification.

supports_oos = False

Whether the module supports out-of-scope (OOS) data.

k = 10

embedder_config

classmethod from_context(context, embedder_config, k=10)

Create an instance using a Context object.

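Parameters:
  • context (autointent.Context) – Context object used to create the instance

  • embedder_config – Configuration of the embedder used to compute embeddings

  • k (int) – Number of nearest neighbors to retrieve (default: 10)
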
Return type:

RetrievalAimedEmbedding

fit(utterances, labels)

Fit the vector index using the provided utterances and labels.

Parameters:
  • utterances (list[str]) – List of text data to index

  • labels (autointent.custom_types.ListOfLabels) – List of corresponding labels for the utterances

Return type:

None

score_ho(context, metrics)

Evaluate the embedding model on hold-out data using the specified metric functions.

Parameters:
  • context (autointent.Context) – Context containing test data and labels

  • metrics (list[str]) – List of metric names to compute

Returns:

Dictionary of computed metric values for the test set

Return type:

dict[str, float]
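
How a list of metric names turns into a dictionary of values can be pictured with the sketch below. It is an illustration under stated assumptions, not the library's code: the `METRICS` registry, the metric names `hit_rate` and `precision`, and the `score` helper are all hypothetical.

```python
def hit_rate(true_labels, retrieved_labels):
    """Share of queries whose true label appears among the retrieved labels."""
    hits = sum(t in r for t, r in zip(true_labels, retrieved_labels))
    return hits / len(true_labels)

def precision(true_labels, retrieved_labels):
    """Mean share of retrieved labels that equal the query's true label."""
    per_query = [sum(x == t for x in r) / len(r) for t, r in zip(true_labels, retrieved_labels)]
    return sum(per_query) / len(per_query)

# Hypothetical registry; the metric names accepted by the library may differ.
METRICS = {"hit_rate": hit_rate, "precision": precision}

def score(true_labels, retrieved_labels, metrics):
    """Compute each requested metric and return the values keyed by metric name."""
    return {name: METRICS[name](true_labels, retrieved_labels) for name in metrics}

true_labels = [0, 1, 1]
retrieved = [[0, 1], [0, 0], [1, 1]]  # labels of the neighbors retrieved for each query
result = score(true_labels, retrieved, ["hit_rate", "precision"])
```

Here `result` has one float per requested name, matching the `dict[str, float]` return type documented above.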

score_cv(context, metrics)

Evaluate the embedding model with cross-validation using the specified metric functions.

Parameters:
  • context (autointent.Context) – Context containing test data and labels

  • metrics (list[str]) – List of metric names to compute

Returns:

Dictionary of computed metric values for the test set

Return type:

dict[str, float]

get_assets()

Get the retriever artifacts for this module.

Returns:

An EmbeddingArtifact object containing embedder information

Return type:

autointent.context.optimization_info.EmbeddingArtifact

clear_cache()

Clear cached data in memory used by the vector index.

Return type:

None

predict(utterances)

Predict the nearest neighbors for a list of utterances.

Parameters:

utterances (list[str]) – List of utterances for which nearest neighbors are to be retrieved

Returns:

For each input utterance, the list of labels of its retrieved nearest neighbors

Return type:

list[autointent.custom_types.ListOfLabels]
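
The shape of the value returned by predict can be illustrated with a toy retriever. This is a minimal sketch, not the module's implementation: `ToyRetriever` and the character-count `embed` function are hypothetical stand-ins for the real vector index and embedder.

```python
from collections import Counter
import math

def embed(text):
    """Hypothetical stand-in for a real embedder: a bag of character counts."""
    return Counter(text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[ch] * b[ch] for ch in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyRetriever:
    def __init__(self, k):
        self.k = k
        self._index = []  # list of (embedding, label) pairs

    def fit(self, utterances, labels):
        # Store one embedding per training utterance together with its label.
        self._index = [(embed(u), label) for u, label in zip(utterances, labels)]

    def predict(self, utterances):
        # For every query, return the labels of its k most similar indexed utterances.
        results = []
        for utterance in utterances:
            query = embed(utterance)
            ranked = sorted(self._index, key=lambda item: cosine(query, item[0]), reverse=True)
            results.append([label for _, label in ranked[: self.k]])
        return results

retriever = ToyRetriever(k=2)
retriever.fit(["bye", "how are you?", "good morning"], [0, 1, 1])
neighbors = retriever.predict(["bye bye", "good evening"])
```

The result contains one inner list per input utterance, each holding `k` labels ordered from the most to the least similar neighbor.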