autointent.modules.scoring.LinearScorer

class autointent.modules.scoring.LinearScorer(embedder_name, cv=3, n_jobs=None, embedder_device='cpu', seed=0, batch_size=32, max_length=None, embedder_use_cache=False)

Bases: autointent.modules.abc.ScoringModule

Scoring module for linear classification using logistic regression.

This module uses embeddings generated from a transformer model to train a logistic regression classifier for intent classification.

Variables:
  • classifier_file_name – Filename for saving the classifier to disk.

  • embedding_model_subdir – Directory for saving the embedding model.

  • precomputed_embeddings – Flag indicating whether embeddings are precomputed.

  • db_dir – Path to the database directory.

  • name – Name of the scorer, defaults to “linear”.

Parameters:
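  • embedder_name (str) – Name of the embedding model.

  • cv (int) – Number of cross-validation folds. Defaults to 3.

  • n_jobs (int | None) – Number of parallel jobs. Defaults to None.

  • embedder_device (str) – Device on which the embedder runs. Defaults to “cpu”.

  • seed (int) – Random seed. Defaults to 0.

  • batch_size (int) – Batch size for embedding generation. Defaults to 32.

  • max_length (int | None) – Maximum sequence length for the embedder. Defaults to None.

  • embedder_use_cache (bool) – Whether the embedder caches computed embeddings. Defaults to False.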
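
The class description and the cv, n_jobs, and seed parameters suggest a pipeline of roughly the following shape. This is a minimal sketch, not the library's implementation: it assumes scikit-learn's LogisticRegressionCV as the classifier and uses random vectors as a stand-in for transformer embeddings, so it runs without downloading a model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Stand-in for transformer embeddings: two well-separated clusters,
# one per intent class (assumption, for illustration only).
rng = np.random.default_rng(0)
n_per_class, dim = 20, 8
embeddings = np.vstack([
    rng.normal(loc=-1.0, scale=0.5, size=(n_per_class, dim)),
    rng.normal(loc=1.0, scale=0.5, size=(n_per_class, dim)),
])
labels = np.array([0] * n_per_class + [1] * n_per_class)

# cv, n_jobs, and random_state mirror the cv, n_jobs, and seed defaults
# from the constructor signature; the mapping itself is an assumption.
clf = LogisticRegressionCV(cv=3, n_jobs=None, random_state=0, max_iter=1000)
clf.fit(embeddings, labels)

# One row per input vector, one column per class.
probabilities = clf.predict_proba(embeddings[:2])
print(probabilities.shape)  # (2, 2)
```

Each row of the result sums to 1, which matches the shape of the probabilities shown in the class example.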

Example

from autointent.modules import LinearScorer
scorer = LinearScorer(
    embedder_name="sergeyzh/rubert-tiny-turbo", cv=2
)
utterances = ["hello", "goodbye", "allo", "sayonara"]
labels = [0, 1, 0, 1]
scorer.fit(utterances, labels)
test_utterances = ["hi", "bye"]
probabilities = scorer.predict(test_utterances)
print(probabilities)
# Output:
# [[0.50000032 0.49999968]
#  [0.50000032 0.49999968]]

classifier_file_name: str = 'classifier.joblib'
embedding_model_subdir: str = 'embedding_model'
precomputed_embeddings: bool = False
db_dir: str
name = 'linear'
cv = 3
n_jobs = None
embedder_device = 'cpu'
seed = 0
embedder_name
batch_size = 32
max_length = None
embedder_use_cache = False
classmethod from_context(context, embedder_name=None)

Create a LinearScorer instance using a Context object.

Parameters:
  • context (autointent.Context) – Context containing configurations and utilities.

  • embedder_name (str | None) – Name of the embedder, or None to use the best embedder.

Returns:

Initialized LinearScorer instance.

Return type:

LinearScorer

get_embedder_name()

Get the name of the embedder.

Returns:

Embedder name.

Return type:

str

fit(utterances, labels)

Train the logistic regression classifier.

Parameters:
  • utterances (list[str]) – List of training utterances.

  • labels (list[autointent.custom_types.LabelType]) – List of labels corresponding to the utterances.

Raises:

ValueError – If the vector index does not match the provided utterances.

Return type:

None

predict(utterances)

Predict probabilities for the given utterances.

Parameters:

utterances (list[str]) – List of query utterances.

Returns:

Array of predicted probabilities for each class.

Return type:

numpy.typing.NDArray[Any]
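
A common next step is turning the returned probabilities into predicted labels. The sketch below reuses the values from the class example and assumes that columns are ordered by class label (column 0 for label 0, column 1 for label 1); the source does not state this ordering explicitly.

```python
import numpy as np

# Probabilities in the layout of the class example: one row per
# utterance, one column per class (column order is an assumption).
probabilities = np.array([
    [0.50000032, 0.49999968],
    [0.50000032, 0.49999968],
])

predicted_labels = probabilities.argmax(axis=1)  # most likely class per row
confidences = probabilities.max(axis=1)          # probability of that class

print(predicted_labels.tolist())  # [0, 0]
```

With probabilities this close to 0.5, the argmax is only weakly informative, so a confidence threshold may be worth applying before trusting the label.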

clear_cache()

Clear the embedder's in-memory cache.

Return type:

None

dump(path)

Save the LinearScorer’s metadata, classifier, and embedder to disk.

Parameters:

path (str) – Path to the directory where assets will be dumped.

Return type:

None

load(path)

Load the LinearScorer’s metadata, classifier, and embedder from disk.

Parameters:

path (str) – Path to the directory containing the dumped assets.

Return type:

None
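
The classifier_file_name default of 'classifier.joblib' hints that the classifier is persisted with joblib. The following sketch illustrates that kind of round trip on a stand-in scikit-learn model. It is an assumption about the mechanism, not the library's actual dump/load code, and it leaves out the metadata and embedder that the real methods also handle.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny stand-in classifier trained on toy 2-D "embeddings".
X = np.array([[-1.0, -1.0], [-0.8, -1.2], [1.0, 1.0], [1.2, 0.8]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    dump_dir = Path(tmp)
    # File name taken from the documented classifier_file_name default.
    classifier_path = dump_dir / "classifier.joblib"
    joblib.dump(clf, classifier_path)        # analogous to dump(path)
    restored = joblib.load(classifier_path)  # analogous to load(path)

# The restored classifier reproduces the original predictions.
same_predictions = bool(
    np.allclose(clf.predict_proba(X), restored.predict_proba(X))
)
print(same_predictions)  # True
```

Note that the documented load(path) returns None, so it presumably restores state on an existing instance rather than returning a new object.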