Quickstart#
Installation#
AutoIntent can be installed using the package manager pip
:
git clone https://github.com/voorhs/AutoIntent.git
cd AutoIntent
pip install .
The library is compatible with Python 3.10+.
Data Format#
To work with AutoIntent, you need to format your training data in a specific way. You need to provide a training split containing samples with utterances and labels, as shown below:
{
"train": [
{
"utterance": "Hello!",
"label": 0
}
]
}
For a multilabel dataset, the label
field should be a list of integers representing the corresponding class labels.
To use it with our Python API, you can use our autointent.Dataset
object.
from autointent import Dataset
dataset = Dataset.from_dict({"train": [...]})
To load a dataset from the file system into Python, the autointent.Dataset.from_json()
method exists:
dataset = Dataset.from_json("/path/to/json")
AutoML goes brrr…#
Once the data is ready, you can start building the optimal classifier from the command line:
autointent data.train_path="path/to/your/data.json"
This command will start the hyperparameter search in the default search space.
As a result, a runs
folder will be created in the current working directory, which will save the selected classifier ready for inference.
Similar actions but in a limited mode can be started using the Python API:
from autointent import PipelineOptimizer
pipeline_optimizer = PipelineOptimizer.default(multilabel=False)
pipeline_optimizer.fit(dataset)
Inference#
To apply the built classifier to new data, you can use our Python API:
from autointent import Pipeline
pipeline = Pipeline.load("path/to/run/directory")
utterances = ["123", "hello world"]
prediction = pipeline.predict(utterances)
Modular Approach#
If there is no need to iterate over pipelines and hyperparameters, you can import classification methods directly.
from autointent.modules import KNNScorer
scorer = KNNScorer(embedder_name="sergeyzh/rubert-tiny-turbo", k=1)
train_utterances = [
"why is there a hold on my american saving bank account",
"i am not sure why my account is blocked",
"why is there a hold on my capital one checking account",
]
train_labels = [0, 2, 1]
scorer.fit(train_utterances, train_labels)
test_utterances = [
"i think my account is blocked but i do not know the reason",
"can you tell me why is my bank account frozen",
]
scorer.predict(test_utterances)
Further Reading#
Get familiar with Key Concepts.
Check out the guide on basic Python API Usage: Basic Usage