autointent.VectorIndex

class autointent.VectorIndex(embedder_config)

A class for managing a vector index using FAISS and embedding models.

This class allows adding, querying, and managing embeddings and their associated labels for efficient nearest neighbor search.

Parameters:

embedder_config (autointent.configs.EmbedderConfig) – Configuration for the embedding model.

embedder
labels: autointent.custom_types.ListOfLabels = []
texts: list[str] = []
add(texts, labels)

Add texts and their corresponding labels to the index.

Parameters:
  • texts (list[str]) – List of input texts.

  • labels (autointent.custom_types.ListOfLabels) – List of labels corresponding to the texts.

Return type:

None

is_empty()

Check if the index is empty.

Returns:

True if the index contains no embeddings, False otherwise.

Return type:

bool

delete()

Delete the vector index and all associated data from disk and memory.

Return type:

None

clear_ram()

Clear the vector index from RAM.

Return type:

None

get_all_embeddings()

Retrieve all embeddings stored in the index.

Returns:

Array of all embeddings.

Raises:

ValueError – If the index has not been created yet.

Return type:

numpy.typing.NDArray[Any]

get_all_labels()

Retrieve all labels stored in the index.

Returns:

List of all labels.

Return type:

autointent.custom_types.ListOfLabels

query(queries, k)

Query the index to retrieve nearest neighbors.

Parameters:
  • queries (list[str] | numpy.typing.NDArray[numpy.float32]) – List of text queries or embedding vectors.

  • k (int) – Number of nearest neighbors to return for each query.

Returns:

A tuple containing:

  • labels: List of retrieved labels for each query.

  • distances: Corresponding distances for each neighbor.

  • texts: Corresponding texts for each neighbor.

Return type:

tuple
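
To illustrate the shape of the value returned by query, here is a minimal pure-Python sketch of a brute-force nearest-neighbor lookup. It is not the library's implementation: the function toy_query, its arguments, and the use of Euclidean distance are all assumptions made for illustration (the actual index is backed by FAISS, and its distance metric is not stated here).

```python
import math


def toy_query(index_vectors, index_labels, index_texts, query_vectors, k):
    """Hypothetical stand-in that mirrors the (labels, distances, texts) result."""
    labels, distances, texts = [], [], []
    for q in query_vectors:
        # Rank every indexed vector by Euclidean distance to the query
        # (ASSUMPTION: the real index may use a different metric).
        ranked = sorted(
            (math.dist(q, v), i) for i, v in enumerate(index_vectors)
        )[:k]
        # One inner list per query, nearest neighbor first.
        labels.append([index_labels[i] for _, i in ranked])
        distances.append([d for d, _ in ranked])
        texts.append([index_texts[i] for _, i in ranked])
    return labels, distances, texts


# Three indexed items with toy 2-D "embeddings".
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
labels, distances, texts = toy_query(
    vectors, [0, 1, 0], ["hello", "bye", "hi there"], [[0.9, 0.0]], k=2
)
```

Each of the three returned lists has one entry per query, and each entry holds k items ordered from nearest to farthest, so labels[i][j], distances[i][j], and texts[i][j] all describe the same neighbor.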

dump(dir_path)

Save the index and associated data to disk.

Parameters:

dir_path (pathlib.Path) – Directory path where the data will be stored.

Return type:

None

classmethod load(dir_path, embedder_device=None, embedder_batch_size=None, embedder_use_cache=None)

Load the index and associated data from disk.

Parameters:
  • dir_path (pathlib.Path) – Directory path where the data is stored.

  • embedder_device (str | None) – Device for the embedding model.

  • embedder_batch_size (int | None) – Batch size for the embedding model.

  • embedder_use_cache (bool | None) – Whether to use caching for the embedding model.

Return type:

VectorIndex
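
The methods above might be combined into a build, query, save, and reload round trip along the following lines. This is a sketch under stated assumptions rather than a verified recipe: the model_name keyword of EmbedderConfig, the placeholder model name, the use of integer labels, and the helper name round_trip are not confirmed by this page.

```python
from pathlib import Path


def round_trip(dir_path: Path) -> None:
    """Hypothetical helper: build an index, query it, save it, and reload it."""
    # Imported lazily so that defining this sketch does not require the package.
    from autointent import VectorIndex
    from autointent.configs import EmbedderConfig

    # ASSUMPTION: EmbedderConfig accepts `model_name`; the model is a placeholder.
    config = EmbedderConfig(model_name="sentence-transformers/all-MiniLM-L6-v2")
    index = VectorIndex(config)

    # ASSUMPTION: plain integers are valid entries of ListOfLabels.
    index.add(["hello", "goodbye"], [0, 1])

    # One nearest neighbor for a single text query.
    labels, distances, texts = index.query(["hi"], k=1)

    # Persist to disk, then restore with an explicit embedder device.
    index.dump(dir_path)
    restored = VectorIndex.load(dir_path, embedder_device="cpu")
    if restored.is_empty():
        raise RuntimeError("expected the restored index to contain embeddings")
```

Parameters left as None in load are not overridden in this sketch; how the library resolves them is not described on this page.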