autointent.context.vector_index_client.VectorIndex

class autointent.context.vector_index_client.VectorIndex(model_name, embedder_device, embedder_batch_size=32, embedder_max_length=None, embedder_use_cache=False)

A class for managing a vector index using FAISS and embedding models.

This class allows adding, querying, and managing embeddings and their associated labels for efficient nearest neighbor search.

Parameters:
  • model_name (str)

  • embedder_device (str)

  • embedder_batch_size (int)

  • embedder_max_length (int | None)

  • embedder_use_cache (bool)

model_name
embedder
embedder_device
labels: list[autointent.custom_types.LabelType] = []
texts: list[str] = []
logger
add(texts, labels)

Add texts and their corresponding labels to the index.

Parameters:
  • texts (list[str]) – List of input texts.

  • labels (list[autointent.custom_types.LabelType]) – List of labels corresponding to the texts.

Return type:

None

is_empty()

Check if the index is empty.

Returns:

True if the index contains no embeddings, False otherwise.

Return type:

bool

delete()

Delete the vector index and all associated data from disk and memory.

Return type:

None

clear_ram()

Clear the vector index from RAM.

Return type:

None

get_all_embeddings()

Retrieve all embeddings stored in the index.

Returns:

Array of all embeddings.

Raises:

ValueError – If the index has not been created yet.

Return type:

numpy.typing.NDArray[Any]

get_all_labels()

Retrieve all labels stored in the index.

Returns:

List of all labels.

Return type:

list[autointent.custom_types.LabelType]

query(queries, k)

Query the index to retrieve nearest neighbors.

Parameters:
  • queries (list[str] | numpy.typing.NDArray[numpy.float32]) – List of text queries or embedding vectors.

  • k (int) – Number of nearest neighbors to return for each query.

Returns:

A tuple containing:
  • labels: List of retrieved labels for each query.

  • distances: Corresponding distances for each neighbor.

  • texts: Corresponding texts for each neighbor.

Return type:

tuple[list[list[autointent.custom_types.LabelType]], list[list[float]], list[list[str]]]
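
The nested return type is easier to read with a concrete case. The following is a minimal sketch of a brute-force nearest neighbor lookup that returns the same three-part shape (labels, distances, texts, each with one inner list per query). It does not use FAISS or autointent: `toy_query` is a hypothetical helper, the vectors are hand-written instead of produced by an embedder, and Euclidean distance is an assumption, since the metric used by the real index is not stated here.

```python
import math


def toy_query(index_vectors, index_labels, index_texts, query_vectors, k):
    """Brute-force k-NN returning (labels, distances, texts), one inner list per query."""
    all_labels, all_distances, all_texts = [], [], []
    for q in query_vectors:
        # Distance from this query to every stored vector, nearest first.
        # Euclidean distance is an assumption made for this sketch.
        scored = sorted((math.dist(q, v), i) for i, v in enumerate(index_vectors))[:k]
        all_labels.append([index_labels[i] for _, i in scored])
        all_distances.append([d for d, _ in scored])
        all_texts.append([index_texts[i] for _, i in scored])
    return all_labels, all_distances, all_texts


# Hand-written 2-D "embeddings" standing in for embedder output.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
labels = [0, 1, 2]
texts = ["hello", "hi there", "goodbye"]

labels_out, distances_out, texts_out = toy_query(
    vectors, labels, texts, [(0.0, 0.0), (0.0, 2.0)], k=2
)
print(labels_out)     # [[0, 1], [2, 0]]
print(distances_out)  # [[0.0, 1.0], [1.0, 2.0]]
print(texts_out)      # [['hello', 'hi there'], ['goodbye', 'hello']]
```

Each of the three outputs has one inner list per query and k entries per inner list, so position j of `labels_out[i]`, `distances_out[i]`, and `texts_out[i]` all describe the same neighbor.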

dump(dir_path)

Save the index and associated data to disk.

Parameters:

dir_path (pathlib.Path) – Directory path to save the data.

Return type:

None

load(dir_path)

Load the index and associated data from disk.

Parameters:

dir_path (pathlib.Path) – Directory path where the data is stored.

Return type:

None
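
The dump and load methods suggest a save-then-restore round trip through a directory. The sketch below illustrates that pattern with a hypothetical stand-in class, `ToyIndex`, which borrows only the method names documented above (`add`, `is_empty`, `dump`, `load`). It stores texts and labels as JSON and holds no embeddings, and the file name `toy_index.json` is an assumption made for this example. The files written by the real class are not described here.

```python
import json
import tempfile
from pathlib import Path


class ToyIndex:
    """Hypothetical stand-in that mirrors the documented method names only."""

    def __init__(self):
        self.texts = []
        self.labels = []

    def add(self, texts, labels):
        # Texts and labels are parallel lists, one label per text.
        if len(texts) != len(labels):
            raise ValueError("texts and labels must have the same length")
        self.texts.extend(texts)
        self.labels.extend(labels)

    def is_empty(self):
        return len(self.texts) == 0

    def dump(self, dir_path):
        dir_path.mkdir(parents=True, exist_ok=True)
        # The file name is an assumption made for this sketch.
        payload = {"texts": self.texts, "labels": self.labels}
        (dir_path / "toy_index.json").write_text(json.dumps(payload))

    def load(self, dir_path):
        data = json.loads((dir_path / "toy_index.json").read_text())
        self.texts, self.labels = data["texts"], data["labels"]


with tempfile.TemporaryDirectory() as tmp:
    dir_path = Path(tmp) / "index"

    original = ToyIndex()
    original.add(["hello", "goodbye"], [0, 1])
    original.dump(dir_path)

    # A fresh instance starts empty and is populated from disk.
    restored = ToyIndex()
    was_empty_before_load = restored.is_empty()
    restored.load(dir_path)
```

After the round trip, `restored.texts` and `restored.labels` match what was added to `original`, and the order is preserved so that each label still lines up with its text.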