autointent.modules.scoring.DNNCScorer
- class autointent.modules.scoring.DNNCScorer(cross_encoder_name, embedder_name, k, db_dir=None, device='cpu', train_head=False, batch_size=32, max_length=None, embedder_use_cache=False)
Bases: autointent.modules.abc.ScoringModule
Scoring module for intent classification based on discriminative nearest neighbor classification (DNNC).
This module uses an embedder to retrieve the k nearest training utterances, scores each candidate with a CrossEncoder, and can optionally train a logistic regression head on top of cross-encoder features.
@misc{zhang2020discriminativenearestneighborfewshot,
  title={Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference},
  author={Jian-Guo Zhang and Kazuma Hashimoto and Wenhao Liu and Chien-Sheng Wu and Yao Wan and Philip S. Yu and Richard Socher and Caiming Xiong},
  year={2020},
  eprint={2010.13009},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2010.13009},
}
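The retrieve-then-rescore flow can be illustrated in plain Python. This is a minimal sketch, not AutoIntent's implementation: the toy 2-D vectors stand in for embedder outputs, the hypothetical `word_overlap` function stands in for the CrossEncoder, and the aggregation rule (keep the best-scoring neighbor, write its score into that neighbor's class column, leave the other columns at 0) is an assumption based on the DNNC paper's nearest-neighbor decision rule and the shape of the sample output in the Examples section.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def word_overlap(a: str, b: str) -> float:
    """Hypothetical cross-encoder stand-in: Jaccard overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


def dnnc_score_row(
    query_text: str,
    query_vec: Sequence[float],
    train_texts: Sequence[str],
    train_vecs: Sequence[Sequence[float]],
    train_labels: Sequence[int],
    n_classes: int,
    k: int,
    pair_score: Callable[[str, str], float],
) -> list[float]:
    # Step 1: retrieve the k training utterances closest to the query in embedding space.
    sims = [cosine(query_vec, v) for v in train_vecs]
    neighbors = sorted(range(len(train_vecs)), key=lambda i: sims[i], reverse=True)[:k]
    # Step 2: re-score every (query, neighbor) pair with the pair scorer.
    pair_scores = {i: pair_score(query_text, train_texts[i]) for i in neighbors}
    # Step 3 (assumed aggregation): the best-scoring neighbor's score goes into
    # its class column; all other columns stay 0.
    best = max(neighbors, key=lambda i: pair_scores[i])
    row = [0.0] * n_classes
    row[train_labels[best]] = pair_scores[best]
    return row


train_texts = ["what is your name", "how are you", "tell me your name"]
train_labels = [0, 1, 0]
train_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy embeddings

row = dnnc_score_row(
    "what is your name please", [1.0, 0.05],
    train_texts, train_vecs, train_labels, n_classes=2, k=2, pair_score=word_overlap,
)
print(row)  # [0.8, 0.0]
```

The cheap embedding search narrows the candidates to k, so the expensive pairwise scorer only runs k times per query instead of once per training utterance.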
- Variables:
crossencoder_subdir – Subdirectory for storing the cross-encoder model (crossencoder).
model – The model used for scoring, which could be a CrossEncoder or a CrossEncoderWithLogreg.
prebuilt_index – Flag indicating whether a prebuilt vector index is used.
_db_dir – Path to the database directory where the vector index is stored.
name – Name of the scorer, defaults to “dnnc”.
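- Parameters:
cross_encoder_name (str) – Name of the cross-encoder model.
embedder_name (str) – Name of the embedder model used to retrieve nearest neighbors.
k (int) – Number of nearest neighbors to retrieve.
db_dir (str | None) – Path to the database directory where the vector index is stored, or None to use the default.
device (str) – Device to run the models on, defaults to 'cpu'.
train_head (bool) – Whether to train a logistic regression head, defaults to False.
batch_size (int) – Batch size for processing, defaults to 32.
max_length (int | None) – Maximum input sequence length, or None to use the model default.
embedder_use_cache (bool) – Whether to cache embeddings computed by the embedder, defaults to False.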
Examples
import tempfile

from autointent.modules.scoring import DNNCScorer

utterances = ["what is your name?", "how are you?"]
labels = [0, 1]

# directory where the vector index will be stored
db_dir = tempfile.mkdtemp()

scorer = DNNCScorer(
    cross_encoder_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
    embedder_name="sergeyzh/rubert-tiny-turbo",
    k=5,
    db_dir=db_dir,
)
scorer.fit(utterances, labels)

test_utterances = ["Hello!", "What's up?"]
scores = scorer.predict(test_utterances)
print(scores)  # one row per utterance, one column per class
[[-8.90408421  0.        ]
 [-8.10923195  0.        ]]
- name = 'dnnc'
- model: sentence_transformers.CrossEncoder | autointent.modules.scoring._dnnc.head_training.CrossEncoderWithLogreg
- cross_encoder_name
- embedder_name
- k
- train_head = False
- device = 'cpu'
- batch_size = 32
- max_length = None
- embedder_use_cache = False
- property db_dir: str
Get the database directory for the vector index.
- Returns:
Path to the database directory.
- Return type:
str
- classmethod from_context(context, cross_encoder_name, k, embedder_name=None, train_head=False)
Create a DNNCScorer instance using a Context object.
- Parameters:
context (autointent.Context) – Context containing configurations and utilities.
cross_encoder_name (str) – Name of the cross-encoder model.
k (int) – Number of nearest neighbors to retrieve.
embedder_name (str | None) – Name of the embedder model, or None to use the best embedder.
train_head (bool) – Whether to train a logistic regression head, defaults to False.
- Returns:
Initialized DNNCScorer instance.
- Return type:
DNNCScorer
- fit(utterances, labels)
Fit the scorer by training or loading the vector index and optionally training a logistic regression head.
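- Parameters:
utterances (list[str]) – Training utterances.
labels – Intent label for each utterance, aligned with utterances.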
- Raises:
ValueError – If the vector index mismatches the provided utterances.
- Return type:
None
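When train_head=True, the head has to be fit on labelled utterance pairs. As a hedged sketch following the DNNC paper's pairwise formulation (not AutoIntent's code), training pairs can be built from utterances and labels, labelling a pair 1 when both utterances share an intent and 0 otherwise. `make_pairs` is a hypothetical helper, not part of the AutoIntent API.

```python
from itertools import permutations


def make_pairs(utterances: list[str], labels: list[int]) -> list[tuple[str, str, int]]:
    """Hypothetical helper: build (anchor, candidate, same_intent) triples for every ordered pair i != j."""
    if len(utterances) != len(labels):
        raise ValueError("utterances and labels must have the same length")
    return [
        # target is 1 when both utterances carry the same intent label, else 0
        (utterances[i], utterances[j], int(labels[i] == labels[j]))
        for i, j in permutations(range(len(utterances)), 2)
    ]


pairs = make_pairs(["what is your name?", "how are you?", "what should I call you?"], [0, 1, 0])
for anchor, candidate, target in pairs:
    print(target, anchor, "|", candidate)
```

Ordered pairs are kept because a cross-encoder reads its two inputs as one concatenated sequence and is therefore not guaranteed to be symmetric. The number of pairs grows quadratically with the number of training utterances.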
- predict(utterances)
Predict class scores for the given utterances.
- predict_with_metadata(utterances)
Predict class scores along with metadata for the given utterances.
- clear_cache()
Clear cached data in memory used by the vector index.
- Return type:
None