autointent.VectorIndex

class autointent.VectorIndex(embedder_config, config)

A class for managing a vector index and embedding models.

This class allows adding, querying, and managing embeddings and their associated labels for efficient nearest neighbor search.

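Parameters:
  • embedder_config – Configuration of the embedding model.

  • config – Configuration of the vector index.

Attributes:
  • embedder (autointent._wrappers.Embedder)

  • index (autointent._wrappers.vector_index.base_backend.BaseIndexBackend)

  • config
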
add(texts, labels)

Add texts and their corresponding labels to the index.

Parameters:
  • texts (list[str]) – List of input texts.

  • labels (autointent.custom_types.ListOfLabels) – List of labels corresponding to the texts.

Return type:

None
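The contract of add() (one label per text, embeddings computed and stored alongside their labels) can be illustrated with a minimal, self-contained sketch. It is not autointent's implementation: `ToyVectorIndex` and `toy_embed` are hypothetical, a SHA-256 hash stands in for the real embedding model, and plain integers stand in for ListOfLabels.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for an embedding model: hash bytes -> unit vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
    return [x / norm for x in vec]


class ToyVectorIndex:
    """Hypothetical in-memory index mirroring the documented add() contract."""

    def __init__(self) -> None:
        self.embeddings: list[list[float]] = []
        self.texts: list[str] = []
        self.labels: list[int] = []

    def add(self, texts: list[str], labels: list[int]) -> None:
        # Assumption: reject mismatched inputs before mutating any state.
        if len(texts) != len(labels):
            raise ValueError("texts and labels must have the same length")
        self.embeddings.extend(toy_embed(t) for t in texts)
        self.texts.extend(texts)
        self.labels.extend(labels)


index = ToyVectorIndex()
index.add(["book a flight", "cancel my order"], [0, 1])
print(len(index.embeddings))  # 2
```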

clear_ram()

Clear the vector index from RAM.

Return type:

None

get_all_embeddings()

Retrieve all embeddings stored in the index.

Returns:

Array of all embeddings.

Raises:

ValueError – If the index has not been created yet.

Return type:

numpy.typing.NDArray[Any]
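The documented error behavior of get_all_embeddings() can be sketched as follows. This is a hypothetical toy, not the library's code: `ToyVectorIndex` and `add_embeddings` are illustrative names, "not created yet" is modeled as the storage being `None`, and nested lists stand in for the NumPy array the real method returns.

```python
from __future__ import annotations


class ToyVectorIndex:
    """Hypothetical index that is 'created' on the first insertion."""

    def __init__(self) -> None:
        # Assumption: None means the index has not been created yet.
        self._embeddings: list[list[float]] | None = None

    def add_embeddings(self, vectors: list[list[float]]) -> None:
        if self._embeddings is None:
            self._embeddings = []
        self._embeddings.extend(list(v) for v in vectors)

    def get_all_embeddings(self) -> list[list[float]]:
        if self._embeddings is None:
            raise ValueError("Index is not created yet")
        # Return a copy so callers cannot mutate the stored vectors.
        return [list(v) for v in self._embeddings]


index = ToyVectorIndex()
try:
    index.get_all_embeddings()
except ValueError as err:
    print(err)  # Index is not created yet
index.add_embeddings([[0.1, 0.2], [0.3, 0.4]])
print(index.get_all_embeddings())  # [[0.1, 0.2], [0.3, 0.4]]
```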

query(queries, k)

Query the index to retrieve nearest neighbors.

Parameters:
  • queries (list[str] | numpy.typing.NDArray[Any]) – List of text queries or embedding vectors.

  • k (int) – Number of nearest neighbors to return for each query.

Returns:

A tuple containing:

  • distances: Corresponding distances for each neighbor.

  • documents: Corresponding documents for each neighbor.

Return type:

tuple
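To show what a k-nearest-neighbor query returns per query (k distances and the k matching documents, closest first), here is an exact brute-force sketch over precomputed embedding vectors. It assumes cosine distance and a linear scan; `brute_force_query` and `cosine_distance` are hypothetical helpers, and a real backend may use a different metric, an approximate index, or return additional fields.

```python
from __future__ import annotations

import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means closer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat zero vectors as unrelated (distance 1.0).
    return 1.0 - dot / norm if norm else 1.0


def brute_force_query(
    stored: list[list[float]],
    documents: list[str],
    queries: list[list[float]],
    k: int,
) -> tuple[list[list[float]], list[list[str]]]:
    """Hypothetical exact k-NN search returning (distances, documents) per query."""
    all_distances, all_documents = [], []
    for q in queries:
        # Score every stored vector, then keep the k closest.
        scored = sorted(
            ((cosine_distance(q, v), doc) for v, doc in zip(stored, documents)),
            key=lambda pair: pair[0],
        )[:k]
        all_distances.append([d for d, _ in scored])
        all_documents.append([doc for _, doc in scored])
    return all_distances, all_documents


stored = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
docs = ["refund", "weather", "refund status"]
distances, neighbors = brute_force_query(stored, docs, [[1.0, 0.1]], k=2)
print(neighbors)  # [['refund', 'refund status']]
```

A linear scan is exact but costs O(n) per query; it is shown here only because it makes the shape of the returned tuple easy to follow.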

dump(dir_path)

Save the index and associated data to disk.

Parameters:

dir_path (pathlib.Path) – Directory path where the data will be stored.

Return type:

None
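A minimal sketch of persisting index data to a directory, in the spirit of dump(dir_path). The on-disk layout used by autointent is not documented here, so the function `dump_index` and the file names `embeddings.json` and `data.json` are hypothetical; JSON is chosen only to keep the example dependency-free.

```python
import json
import tempfile
from pathlib import Path


def dump_index(
    dir_path: Path,
    embeddings: list[list[float]],
    texts: list[str],
    labels: list[int],
) -> None:
    """Hypothetical dump(): persist vectors and their payload as JSON files."""
    dir_path.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    (dir_path / "embeddings.json").write_text(json.dumps(embeddings))
    (dir_path / "data.json").write_text(json.dumps({"texts": texts, "labels": labels}))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "vector_index"
    dump_index(target, [[0.1, 0.2]], ["hello"], [0])
    saved_files = sorted(p.name for p in target.iterdir())
    print(saved_files)  # ['data.json', 'embeddings.json']
```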

classmethod load(dir_path, embedder_override_config=None)

Load the index and associated data from disk.

Parameters:
  • dir_path (pathlib.Path) – Directory path where the data is stored.

  • embedder_override_config (autointent.configs.EmbedderConfig | None) – Configuration that overrides selected embedder settings, such as the device and the inference batch size.

Return type:

VectorIndex
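The override semantics of load() (saved settings are the baseline, and the override replaces only the fields it sets) can be illustrated with a round-trip sketch. Everything here is hypothetical: `ToyVectorIndex`, the JSON files, and the `device` and `batch_size` keys stand in for the real index, its storage format, and EmbedderConfig; the override is modeled as a plain dict.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class ToyVectorIndex:
    """Hypothetical index persisted as JSON; illustrates dump()/load() only."""

    def __init__(self, embeddings: list[list[float]], texts: list[str], embedder_settings: dict) -> None:
        self.embeddings = embeddings
        self.texts = texts
        self.embedder_settings = embedder_settings

    def dump(self, dir_path: Path) -> None:
        dir_path.mkdir(parents=True, exist_ok=True)
        payload = {"embeddings": self.embeddings, "texts": self.texts}
        (dir_path / "data.json").write_text(json.dumps(payload))
        (dir_path / "embedder.json").write_text(json.dumps(self.embedder_settings))

    @classmethod
    def load(cls, dir_path: Path, embedder_override: dict | None = None) -> ToyVectorIndex:
        data = json.loads((dir_path / "data.json").read_text())
        settings = json.loads((dir_path / "embedder.json").read_text())
        # Saved settings are the baseline; keys present in the override win.
        if embedder_override:
            settings.update(embedder_override)
        return cls(data["embeddings"], data["texts"], settings)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index"
    original = ToyVectorIndex([[0.1, 0.2]], ["hello"], {"device": "cuda", "batch_size": 32})
    original.dump(path)
    restored = ToyVectorIndex.load(path, embedder_override={"device": "cpu"})
    print(restored.embedder_settings)  # {'device': 'cpu', 'batch_size': 32}
```

Keeping the override separate from the saved state means an index built on one machine (for example, with a GPU) can be reloaded elsewhere without editing the files on disk.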