autointent.Context#

class autointent.Context(seed=42)#

Context manager for configuring and managing data handling, vector indexing, and optimization.

This class provides methods to set up logging, configure data and vector index components, manage datasets, and retrieve various configurations for inference and optimization.

Parameters:

seed (int) – Random seed. Defaults to 42.

data_handler: autointent.context.data_handler.DataHandler#
vector_index_client: autointent.context.vector_index_client.VectorIndexClient#
optimization_info: autointent.context.optimization_info.OptimizationInfo#
callback_handler#
seed = 42#

configure_logging(config)#

Configure logging settings.

Parameters:

config (autointent.configs.LoggingConfig) – Logging configuration settings.

Return type:

None

configure_vector_index(config, embedder_config=None)#

Configure the vector index client and embedder.

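Parameters:

  • config – Configuration for the vector index client.

  • embedder_config – Configuration for the embedder. Optional; defaults to None.

Return type: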

None

configure_data(config)#

Configure data handling.

Parameters:

config (autointent.configs.DataConfig) – Configuration for the data handling process.

Return type:

None

set_dataset(dataset, force_multilabel=False)#

Set the datasets for training, validation and testing.

Parameters:
  • dataset (autointent.Dataset) – Dataset that provides the training, validation and test data.

  • force_multilabel (bool) – Whether to force multilabel classification.

Return type:

None
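The four setup methods above can be chained in one helper. The following is a minimal sketch, not the library's own setup routine: the helper name `prepare_context` is hypothetical, the call order (configuration first, dataset last) is an assumption, and the function takes an already constructed context so that it relies only on the method names and signatures documented on this page.

```python
from typing import Any


def prepare_context(
    ctx: Any,
    logging_config: Any,
    vector_index_config: Any,
    data_config: Any,
    dataset: Any,
    force_multilabel: bool = False,
) -> Any:
    """Run the documented setup calls on an existing Context-like object."""
    # Assumed order: configure every component first, attach the dataset last.
    ctx.configure_logging(logging_config)
    ctx.configure_vector_index(vector_index_config)
    ctx.configure_data(data_config)
    ctx.set_dataset(dataset, force_multilabel=force_multilabel)
    return ctx
```

Returning the context lets the caller keep working with the same object after setup.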

get_inference_config()#

Generate configuration settings for inference.

Returns:

Dictionary containing inference configuration.

Return type:

dict[str, Any]
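Because the return type is a plain `dict[str, Any]`, the result can be inspected or stored with standard tools. The sketch below serializes such a dictionary to JSON; the helper name `inference_config_to_json` is hypothetical, and since the keys of the real dictionary are not listed on this page, the function makes no assumption about them.

```python
import json
from typing import Any


def inference_config_to_json(config: dict[str, Any]) -> str:
    """Serialize an inference configuration dictionary to a JSON string."""
    # default=str is a fallback for values json cannot encode natively,
    # for example pathlib.Path objects.
    return json.dumps(config, indent=2, sort_keys=True, default=str)
```

A typical call would pass the value returned by `get_inference_config()` directly to this helper.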

dump()#

Save logs, configurations, and datasets to disk.

Dumps evaluation results, training/test data splits, and inference configurations to the specified logging directory.

Return type:

None

get_db_dir()#

Get the database directory of the vector index.

Returns:

Path to the database directory.

Return type:

pathlib.Path

get_device()#

Get the embedder device used by the vector index client.

Returns:

Device name.

Return type:

str

get_batch_size()#

Get the batch size used by the embedder.

Returns:

Batch size.

Return type:

int

get_max_length()#

Get the maximum sequence length for embeddings.

Returns:

Maximum length or None if not set.

Return type:

int | None
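Since the result may be None, callers need to handle the unset case explicitly. The following sketch shows one way to do that; the helper `truncation_kwargs` and the keyword names it produces are hypothetical and are not part of autointent.

```python
from typing import Any, Optional


def truncation_kwargs(max_length: Optional[int]) -> dict[str, Any]:
    """Build keyword arguments only when a maximum length is set."""
    if max_length is None:
        # No limit configured: pass nothing and keep the consumer's default.
        return {}
    return {"truncation": True, "max_length": max_length}
```

Calling it as `truncation_kwargs(context.get_max_length())` keeps the None check in one place.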

get_use_cache()#

Check if caching is enabled for the embedder.

Returns:

True if caching is enabled, False otherwise.

Return type:

bool

get_dump_dir()#

Get the directory for saving dumped modules.

Returns:

Path to the dump directory or None if dumping is disabled.

Return type:

pathlib.Path | None
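A None result means dumping is disabled, so any code that builds file paths from the dump directory should check for it first. A minimal sketch, assuming a hypothetical helper `resolve_module_path` and a hypothetical one-subdirectory-per-module layout:

```python
from pathlib import Path
from typing import Optional


def resolve_module_path(dump_dir: Optional[Path], module_name: str) -> Optional[Path]:
    """Return the target path for a module, or None when dumping is disabled."""
    if dump_dir is None:
        return None
    # Assumed layout: one entry per module directly under the dump directory.
    return dump_dir / module_name
```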

is_multilabel()#

Check if the dataset is configured for multilabel classification.

Returns:

True if multilabel classification is enabled, False otherwise.

Return type:

bool

get_n_classes()#

Get the number of classes in the dataset.

Returns:

Number of classes.

Return type:

int

is_ram_to_clear()#

Check if RAM clearing is enabled in the logging configuration.

Returns:

True if RAM clearing is enabled, False otherwise.

Return type:

bool

has_saved_modules()#

Check if any modules have been saved.

Returns:

True if there are saved modules, False otherwise.

Return type:

bool
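The getters above can be collected into a single snapshot for logging or debugging. The sketch below uses only the method names documented on this page; the helper name `summarize_context` and the dictionary keys are hypothetical, and it assumes the context has already been configured and given a dataset.

```python
from typing import Any


def summarize_context(ctx: Any) -> dict[str, Any]:
    """Collect embedder and dataset settings from a configured Context-like object."""
    return {
        "device": ctx.get_device(),
        "batch_size": ctx.get_batch_size(),
        "max_length": ctx.get_max_length(),  # may be None if not set
        "use_cache": ctx.get_use_cache(),
        "multilabel": ctx.is_multilabel(),
        "n_classes": ctx.get_n_classes(),
    }
```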