Modules#

In this chapter you will become familiar with modules and learn how to use them for intent classification.

Modules are the basic units of the library. They perform core operations such as predicting probabilities and constructing the final set of predicted labels.

Module Types#

There are two main module types in AutoIntent:

Scoring modules. These modules predict probabilities: they take an utterance as input and output a vector of probabilities.
Prediction modules. These modules take a vector of probabilities and output a set of labels. They are needed to support multi-label classification and the detection of out-of-domain utterances.
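
The split between the two module types can be sketched in plain Python. The helper `decide_labels` below is hypothetical and not part of AutoIntent; it only illustrates how a prediction step might turn a probability vector into a set of labels, assuming a fixed threshold and reading an empty result as an out-of-domain utterance.

```python
def decide_labels(probabilities, threshold=0.5):
    """Return the indices of classes whose probability reaches the threshold.

    Hypothetical helper for illustration only. An empty list is read
    here as "out of domain".
    """
    return [i for i, p in enumerate(probabilities) if p >= threshold]


# A scoring module would produce vectors like these (values are made up).
in_domain = [0.1, 0.7, 0.6, 0.0]
out_of_domain = [0.2, 0.1, 0.3, 0.1]

print(decide_labels(in_domain))      # [1, 2]
print(decide_labels(out_of_domain))  # []
```

Because the threshold is applied to each class independently, one utterance can receive several labels or none at all, which is what multi-label classification and out-of-domain detection require.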

Initialize Module#

First, initialize a module:

[1]:

from autointent.modules.scoring import KNNScorer

scorer = KNNScorer(
    embedder_config="sergeyzh/rubert-tiny-turbo",
    k=5,
)

At this point, you do two things:

Set hyperparameters. Refer to the Modules API Reference for all available hyperparameters and their default values.
Configure infrastructure. You can
- choose the CUDA device (embedder_device)
- customize the embedder batch size (batch_size) and truncation length (embedder_max_length)

Load Data#

Second, load the training data (see the previous chapter for a detailed explanation of what happens):

[2]:

from autointent import Dataset

dataset = Dataset.from_hub("AutoIntent/clinc150_subset")
dataset

[2]:

{'train_0': Dataset({
     features: ['utterance', 'label'],
     num_rows: 18
 }),
 'train_1': Dataset({
     features: ['utterance', 'label'],
     num_rows: 18
 }),
 'validation_0': Dataset({
     features: ['utterance', 'label'],
     num_rows: 4
 }),
 'validation_1': Dataset({
     features: ['utterance', 'label'],
     num_rows: 8
 }),
 'test': Dataset({
     features: ['utterance', 'label'],
     num_rows: 12
 })}

Fit Module#

Third, fit the module on the training utterances and their labels:

[3]:

scorer.fit(dataset["train_0"]["utterance"], dataset["train_0"]["label"])

Inference#

After fitting, the module is ready for inference:

[4]:

scorer.predict(["hello world!"])

[4]:

array([[0.19671963, 0.19584422, 0.60743615, 0.        ]])
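
The result has one row per input utterance and one entry per class. As a rough intuition for where such numbers come from, here is a minimal sketch of k-nearest-neighbor scoring. The function `knn_probabilities` is hypothetical, and uniform voting is an assumption made for this sketch; the library's KNNScorer may weight neighbors differently.

```python
from collections import Counter


def knn_probabilities(neighbor_labels, n_classes):
    """Turn the labels of the k nearest neighbors into a probability vector.

    Uniform voting is an assumption of this sketch, not a description
    of the library's implementation.
    """
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    return [counts.get(label, 0) / k for label in range(n_classes)]


# Five hypothetical neighbors (k=5) spread over four classes.
probs = knn_probabilities([2, 2, 2, 0, 1], n_classes=4)
print(probs)  # [0.2, 0.2, 0.6, 0.0]
```

A class with no neighbors among the k retrieved examples gets probability zero, as in the last entry of the vector.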

Dump and Load#

Modules can be saved and restored. To save a module, provide a path to a directory:

[5]:

from pathlib import Path

pathdir = Path("my_dumps/knnscorer_clinc150")
pathdir.mkdir(parents=True, exist_ok=True)  # exist_ok keeps the cell re-runnable
scorer.dump(pathdir)

To restore, initialize a module with the same hyperparameters and call its load method:

[6]:

loaded_scorer = KNNScorer(
    embedder_config="sergeyzh/rubert-tiny-turbo",
    k=5,
)
loaded_scorer.load(pathdir)
loaded_scorer.predict(["hello world!"])

[6]:

array([[0.19671963, 0.19584422, 0.60743615, 0.        ]])

Rich Output#

Some scoring modules support rich output as a result of prediction. It is useful for inspecting how your classifier works and for debugging, because it contains internal information such as the retrieved candidates. Example:

[7]:

loaded_scorer.predict_with_metadata(["hello world!"])

[7]:

(array([[0.19671963, 0.19584422, 0.60743615, 0.        ]]),
 [{'neighbors': ['wake me up at noon tomorrow',
    'set my alarm for getting up',
    'i need you to schedule an alarm',
    "does michael's accept reservations",
    'why in the world am i locked out of my bank account']}])
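
The returned pair can be unpacked into scores and per-utterance metadata. The sketch below copies the values from the output above into plain Python lists, so it runs without the library; with the real call, the first element is a NumPy array rather than a list.

```python
# Values copied from the output above.
scores = [[0.19671963, 0.19584422, 0.60743615, 0.0]]
metadata = [
    {
        "neighbors": [
            "wake me up at noon tomorrow",
            "set my alarm for getting up",
            "i need you to schedule an alarm",
            "does michael's accept reservations",
            "why in the world am i locked out of my bank account",
        ]
    }
]

# Pair each score row with its metadata and report the top-scoring class.
for row, meta in zip(scores, metadata):
    top_class = max(range(len(row)), key=lambda i: row[i])
    print(f"top class: {top_class} (score {row[top_class]:.3f})")
    for neighbor in meta["neighbors"]:
        print("  retrieved:", neighbor)
```

Reading the retrieved neighbors next to the scores makes it easier to see why a class was favored.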

That’s all!#

[8]:

# Clean up: remove the dump directory created above
import shutil

shutil.rmtree(pathdir.parent)