autointent.VectorIndex
- class autointent.VectorIndex(embedder_config)
A class for managing a vector index built with FAISS and an embedding model.
It supports adding, querying, saving, and loading embeddings and their associated labels for efficient nearest-neighbor search.
- Parameters:
embedder_config (autointent.configs.EmbedderConfig) – Configuration of the embedding model used to embed texts.
- embedder
- labels: autointent.custom_types.ListOfLabels = []
- add(texts, labels)
Add texts and their corresponding labels to the index.
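The sketch below illustrates the bookkeeping that `add` implies: embeddings, labels, and texts kept as parallel lists, so the i-th label always belongs to the i-th text. `toy_embed` and `ToyVectorIndex` are hypothetical stand-ins (a character-histogram "embedding" instead of a real embedding model, plain lists instead of FAISS), not AutoIntent's implementation.

```python
from __future__ import annotations

import math


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for the embedding model: an L2-normalized character histogram."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorIndex:
    """Hypothetical, simplified stand-in for VectorIndex (no FAISS)."""

    def __init__(self) -> None:
        self.embeddings: list[list[float]] = []
        self.labels: list[int] = []
        self.texts: list[str] = []

    def add(self, texts: list[str], labels: list[int]) -> None:
        # Parallel lists: this sketch rejects mismatched lengths
        # (an assumption, not documented behavior of the real method).
        if len(texts) != len(labels):
            raise ValueError("texts and labels must have the same length")
        self.embeddings.extend(toy_embed(t) for t in texts)
        self.labels.extend(labels)
        self.texts.extend(texts)


index = ToyVectorIndex()
index.add(["book a flight", "cancel my order"], [0, 1])
index.add(["order pizza"], [2])  # later calls append to what is already stored
print(len(index.embeddings), index.labels)  # 3 [0, 1, 2]
```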
- is_empty()
Check if the index is empty.
- Returns:
True if the index contains no embeddings, False otherwise.
- Return type:
bool
- delete()
Delete the vector index and all associated data from disk and memory.
- Return type:
None
- clear_ram()
Clear the vector index from RAM.
- Return type:
None
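To make the difference between `clear_ram` and `delete` concrete, here is a minimal sketch under one stated assumption: that the index remembers the directory it was last dumped to. `ToyVectorIndex`, its `dump_dir` attribute, and the `vectors.json` file name are hypothetical; only the memory-versus-disk distinction comes from the descriptions above.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path


class ToyVectorIndex:
    """Hypothetical stand-in contrasting clear_ram() and delete()."""

    def __init__(self, vectors: list[list[float]]) -> None:
        self.vectors: list[list[float]] | None = vectors
        self.dump_dir: Path | None = None  # assumption: remembers where it was saved

    def dump(self, dir_path: Path) -> None:
        dir_path.mkdir(parents=True, exist_ok=True)
        (dir_path / "vectors.json").write_text(json.dumps(self.vectors))
        self.dump_dir = dir_path

    def clear_ram(self) -> None:
        # Free in-memory data only; anything already written to disk survives.
        self.vectors = None

    def delete(self) -> None:
        # Free memory and also remove the on-disk copy.
        self.clear_ram()
        if self.dump_dir is not None and self.dump_dir.exists():
            shutil.rmtree(self.dump_dir)


root = Path(tempfile.mkdtemp())
idx = ToyVectorIndex([[1.0, 0.0]])
idx.dump(root / "index")
idx.clear_ram()
print(idx.vectors, (root / "index").exists())  # None True
idx.delete()
print((root / "index").exists())  # False
root.rmdir()  # clean up the now-empty temporary directory
```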
- get_all_embeddings()
Retrieve all embeddings stored in the index.
- Returns:
Array of all embeddings.
- Raises:
ValueError – If the index has not been created yet.
- Return type:
numpy.typing.NDArray[Any]
- get_all_labels()
Retrieve all labels stored in the index.
- Returns:
List of all labels.
- Return type:
autointent.custom_types.ListOfLabels
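The three inspection methods above fit together as sketched below, assuming the underlying index is created lazily on the first `add`: before that, `is_empty` reports `True` and `get_all_embeddings` raises `ValueError`, matching the documented "index has not been created yet" case. `ToyVectorIndex` and its private `_vectors` attribute are hypothetical, and vectors are passed in directly to keep the example short.

```python
from __future__ import annotations


class ToyVectorIndex:
    """Hypothetical stand-in: the underlying index is created lazily on the first add."""

    def __init__(self) -> None:
        self._vectors: list[list[float]] | None = None  # None = index not created yet
        self.labels: list[int] = []

    def add(self, vectors: list[list[float]], labels: list[int]) -> None:
        if self._vectors is None:
            self._vectors = []  # "create" the index on first use
        self._vectors.extend(vectors)
        self.labels.extend(labels)

    def is_empty(self) -> bool:
        # Empty both before creation and when no embeddings are stored.
        return self._vectors is None or len(self._vectors) == 0

    def get_all_embeddings(self) -> list[list[float]]:
        if self._vectors is None:
            raise ValueError("Index is not created yet")
        return list(self._vectors)  # return a copy so callers cannot mutate the index

    def get_all_labels(self) -> list[int]:
        return list(self.labels)


idx = ToyVectorIndex()
print(idx.is_empty())  # True
try:
    idx.get_all_embeddings()
except ValueError as err:
    print("error:", err)  # error: Index is not created yet
idx.add([[0.1, 0.2], [0.3, 0.4]], [1, 0])
print(idx.is_empty(), idx.get_all_labels())  # False [1, 0]
```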
- query(queries, k)
Query the index to retrieve the k nearest neighbors of each query.
- Parameters:
queries – Queries to search for.
k – Number of nearest neighbors to retrieve for each query.
- Returns:
A tuple containing:
labels: List of retrieved labels for each query.
distances: Corresponding distances for each neighbor.
texts: Corresponding texts for each neighbor.
- Return type:
tuple
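The shape of the result is easiest to see on a tiny example. The sketch below swaps FAISS for exact brute-force search and assumes Euclidean (L2) distance and queries given as precomputed vectors; the real method's metric and accepted query types may differ. The standalone `query` function is hypothetical. For every query it returns one list of labels, one list of distances (smallest first), and one list of texts, each of length `k`.

```python
from __future__ import annotations

import math


def query(
    stored: list[list[float]],
    labels: list[int],
    texts: list[str],
    queries: list[list[float]],
    k: int,
) -> tuple[list[list[int]], list[list[float]], list[list[str]]]:
    """Exact k-nearest-neighbor search by L2 distance (brute force, no FAISS)."""
    all_labels, all_dists, all_texts = [], [], []
    for q in queries:
        # Distance from this query to every stored embedding, smallest first.
        ranked = sorted((math.dist(q, v), i) for i, v in enumerate(stored))[:k]
        all_labels.append([labels[i] for _, i in ranked])
        all_dists.append([d for d, _ in ranked])
        all_texts.append([texts[i] for _, i in ranked])
    return all_labels, all_dists, all_texts


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
labels = [0, 1, 2]
texts = ["hi", "hello", "bye"]
lbl, dist, txt = query(stored, labels, texts, [[0.9, 0.0]], k=2)
print(lbl, txt)  # [[1, 0]] [['hello', 'hi']]
```

Brute force costs one distance computation per stored vector per query; an approximate or specialized index such as FAISS exists precisely to avoid that linear scan on large collections.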
- dump(dir_path)
Save the index and associated data to disk.
- Parameters:
dir_path (pathlib.Path) – Directory path where the data will be stored.
- Return type:
None
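As a rough picture of what "the index and associated data" means on disk, the sketch below writes each component of a toy index to its own file in `dir_path`. The `dump_index` helper and the file names `embeddings.json` and `data.json` are made up for illustration; the real method presumably also serializes the FAISS index and the embedder configuration, in formats not described here.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def dump_index(
    dir_path: Path,
    embeddings: list[list[float]],
    labels: list[int],
    texts: list[str],
) -> None:
    """Hypothetical dump: write each component of the index to its own file."""
    dir_path.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    (dir_path / "embeddings.json").write_text(json.dumps(embeddings))
    (dir_path / "data.json").write_text(json.dumps({"labels": labels, "texts": texts}))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "vector_index"
    dump_index(target, [[0.1, 0.2], [0.3, 0.4]], [0, 1], ["hi", "bye"])
    saved_files = sorted(p.name for p in target.iterdir())
    restored = json.loads((target / "data.json").read_text())

print(saved_files)  # ['data.json', 'embeddings.json']
print(restored["labels"])  # [0, 1]
```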
- classmethod load(dir_path, embedder_device=None, embedder_batch_size=None, embedder_use_cache=None)
Load the index and associated data from disk.
- Parameters:
dir_path (pathlib.Path) – Directory path where the data is stored.
embedder_device (str | None) – Device for the embedding model.
embedder_batch_size (int | None) – Batch size for the embedding model.
embedder_use_cache (bool | None) – Whether to use caching for the embedding model.
- Return type:
autointent.VectorIndex
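The three `embedder_*` arguments all default to `None`. A plausible reading, stated here as an assumption rather than documented behavior, is that `None` keeps the setting saved at dump time and any other value overrides it. The hypothetical `load_embedder_settings` helper below sketches that merge; the file name `embedder_config.json` is also made up.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_embedder_settings(
    dir_path: Path,
    embedder_device: str | None = None,
    embedder_batch_size: int | None = None,
    embedder_use_cache: bool | None = None,
) -> dict:
    """Hypothetical: read saved embedder settings, letting non-None arguments override them."""
    saved = json.loads((dir_path / "embedder_config.json").read_text())
    overrides = {
        "device": embedder_device,
        "batch_size": embedder_batch_size,
        "use_cache": embedder_use_cache,
    }
    # None means "keep whatever was saved at dump time".
    saved.update({key: value for key, value in overrides.items() if value is not None})
    return saved


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp)
    (path / "embedder_config.json").write_text(
        json.dumps({"device": "cuda", "batch_size": 32, "use_cache": True})
    )
    settings = load_embedder_settings(path, embedder_device="cpu")
    no_cache = load_embedder_settings(path, embedder_use_cache=False)

print(settings)  # {'device': 'cpu', 'batch_size': 32, 'use_cache': True}
```

The check is `is not None` rather than a truthiness test, so explicit falsy overrides such as `embedder_use_cache=False` still take effect.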