Modules#

In this chapter, you will become familiar with modules and learn how to use them for intent classification.

Modules are the basic units of our library. They perform core operations such as predicting probabilities and constructing the final set of predicted labels.

Module Types#

There are two main module types in AutoIntent:

  • Scoring modules. These modules predict probabilities: they take an utterance as input and output a vector of probabilities.

  • Prediction modules. These modules take a vector of probabilities and output a set of labels. Prediction modules are needed to support multi-label classification and out-of-domain utterance detection.
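
The division of labor between the two module types can be sketched in plain Python. This is a toy illustration, not AutoIntent's implementation: the function name `predict_labels`, the threshold of 0.5, and the probability values are all assumptions made for the example. A scoring module would supply the probability vectors. The prediction step then keeps every label whose probability reaches the threshold, which allows several labels (multi-label) or none (treated here as out-of-domain).

```python
def predict_labels(probabilities, threshold=0.5):
    """Toy prediction step: turn probability vectors into label sets.

    Hypothetical helper for illustration, not part of AutoIntent.
    An empty list means no label was confident enough, which this
    sketch interprets as an out-of-domain utterance.
    """
    return [
        [label for label, p in enumerate(row) if p >= threshold]
        for row in probabilities
    ]


# Probability vectors as a scoring module might produce them (made-up values).
scores = [
    [0.0, 0.0, 1.0],  # one confident label
    [0.7, 0.6, 0.1],  # two labels pass the threshold (multi-label)
    [0.2, 0.3, 0.1],  # nothing passes: treated as out-of-domain
]
labels = predict_labels(scores)
print(labels)
```

A single fixed threshold is only one possible decision rule. It is used here because it makes both the multi-label and the out-of-domain cases visible in a few lines.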

Initialize Module#

First, initialize a module:

[1]:
from autointent.modules.scoring import KNNScorer

scorer = KNNScorer(
    embedder_name="sergeyzh/rubert-tiny-turbo",
    k=5,
)

At this point, you do two things:

  • Set hyperparameters. Refer to the Modules API Reference to see all available hyperparameters and their default values.

  • Configure infrastructure. You can

    • choose the CUDA device (embedder_device)

    • customize the embedder batch size (batch_size) and truncation length (embedder_max_length)

    • set the location where the module’s assets are saved (db_dir)
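
The infrastructure options listed above could be collected in one place and passed alongside the hyperparameters. The parameter names come from the list above. Every value below (device string, batch size, truncation length, directory name) is an illustrative assumption, and the accepted types should be checked against the Modules API Reference.

```python
from pathlib import Path

# Hyperparameters, as in the initialization cell above.
hyperparams = {
    "embedder_name": "sergeyzh/rubert-tiny-turbo",
    "k": 5,
}

# Infrastructure settings; all values are assumptions for illustration.
infrastructure = {
    "embedder_device": "cuda:0",      # device that runs the embedder
    "batch_size": 32,                 # embedder batch size
    "embedder_max_length": 128,       # truncation length
    "db_dir": Path("my_vector_db"),   # where the module's assets are saved
}

# Merge both groups into one set of keyword arguments.
scorer_kwargs = {**hyperparams, **infrastructure}
print(sorted(scorer_kwargs))

# With autointent installed, these could then be passed to the constructor:
# scorer = KNNScorer(**scorer_kwargs)
```

Keeping hyperparameters and infrastructure settings in separate dictionaries makes it easier to reuse the same hyperparameters on a different machine.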

Load Data#

Second, load the training data (see the previous chapter for a detailed explanation of what happens):

[2]:
from autointent import Dataset

dataset = Dataset.from_hub("AutoIntent/clinc150_subset")

Fit Module#

[3]:
scorer.fit(dataset["train"]["utterance"], dataset["train"]["label"])

Inference#

After fitting, the module is ready for inference:

[4]:
scorer.predict(["hello world!"])
[4]:
array([[0., 0., 1.]])

Dump and Load#

A module can be saved and restored. To save it, provide a path to a directory:

[5]:
from pathlib import Path

pathdir = Path("my_dumps/knnscorer_clinc150")
pathdir.mkdir(parents=True, exist_ok=True)  # exist_ok avoids an error when the cell is re-run
scorer.dump(pathdir)

To restore it, initialize a module with the same hyperparameters and call the load method:

[6]:
loaded_scorer = KNNScorer(
    embedder_name="sergeyzh/rubert-tiny-turbo",
    k=5,
)
loaded_scorer.load(pathdir)
loaded_scorer.predict(["hello world!"])
[6]:
array([[0., 0., 1.]])

Rich Output#

Some scoring modules can return rich output along with the prediction. It is useful for inspecting how your classifier works and for debugging, because it contains internal information such as the retrieved candidates. Example:

[7]:
loaded_scorer.predict_with_metadata(["hello world!"])
[7]:
(array([[0., 0., 1.]]),
 [{'neighbors': ['i need an alarm set please',
    'you need to set alarm for me',
    'please set up an alarm to go off tomorrow at daybreak',
    'set the alarm for me',
    'set alarm']}])
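
The retrieved neighbors hint at why the scorer returned `[0., 0., 1.]`: all five nearest training utterances appear to concern alarms, so all of the probability mass plausibly lands on one class. Below is a minimal sketch of that idea, assuming the simplest k-nearest-neighbors scheme, in which each class's score is the fraction of neighbors carrying its label. The helper name `knn_scores` and the neighbor labels are hypothetical, and AutoIntent's `KNNScorer` may weight neighbors differently.

```python
from collections import Counter


def knn_scores(neighbor_labels, n_classes):
    """Score each class by the fraction of neighbors with that label.

    Hypothetical helper: uniform voting is an assumption, not
    necessarily what KNNScorer does internally.
    """
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    return [counts[c] / k for c in range(n_classes)]


# Assume all five retrieved neighbors belong to class 2.
unanimous = knn_scores([2, 2, 2, 2, 2], n_classes=3)
print(unanimous)

# A mixed neighborhood spreads the probability across classes.
mixed = knn_scores([2, 2, 0, 2, 1], n_classes=3)
print(mixed)
```

Under this assumption, a vector with a single 1.0 means the neighbors were unanimous, while fractional scores indicate that the utterance sits near training examples from several classes.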

That’s all!#

[8]:
# [you didn't see it] Cleanup: remove the dump directory and any vector_db* directories in the working directory
import shutil

shutil.rmtree(pathdir.parent)

for file in Path.cwd().glob("vector_db*"):
    shutil.rmtree(file)