Command Line Interface for Pipeline Auto Configuration#

Data#

As with the Python API, you can run automatic pipeline configuration with nothing more than a prepared dataset.

You can use a local JSON file:

autointent data.train_path="path/to/my.json"

Or a dataset from the Hugging Face Hub:

autointent data.train_path="AutoIntent/banking77"

Search Space#

You can provide a custom search space saved as a YAML file (as explained in 01_search_space):

autointent data.train_path="AutoIntent/banking77" task.search_space_path="path/to/my/search/space.yaml"

Logging Level#

AutoIntent provides comprehensive logs. You can enable them by changing the default logging level:

autointent data.train_path="AutoIntent/banking77" hydra.job_logging.root.level=INFO
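
When AutoIntent is launched from a script, the key=value overrides shown above can be assembled programmatically. The following is a minimal sketch and not part of AutoIntent: the helper names build_command and run_autointent are hypothetical, and the sketch assumes the autointent executable is on PATH. Because the arguments are passed as a list, no shell quoting is involved and the values are written without the surrounding quotes.

```python
import subprocess


def build_command(overrides):
    """Turn a mapping of dotted option names into an autointent argument list."""
    command = ["autointent"]
    for key, value in overrides.items():
        # Each override is passed as a single key=value argument.
        command.append(f"{key}={value}")
    return command


def run_autointent(overrides):
    """Run autointent with the given overrides (assumes it is installed and on PATH)."""
    return subprocess.run(build_command(overrides), check=True)


command = build_command(
    {
        "data.train_path": "AutoIntent/banking77",
        "task.search_space_path": "path/to/my/search/space.yaml",
        "hydra.job_logging.root.level": "INFO",
    }
)
print(" ".join(command))
```

Calling run_autointent with the same mapping would start the optimization; it is left uncalled here because the search space path is a placeholder.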

All Options#

data:
# Path to a JSON file with training data, or the name of a dataset on the Hugging Face Hub.
# Set to "default" to use AutoIntent/clinc150_subset from the Hub.
  train_path: ???

# Path to a json file with test records. Skip this option if you want to use a random subset of the
# training sample as test data.
  test_path: null

# Set to true if your data is multiclass but you want to train the multilabel classifier.
  force_multilabel: false

task:
# Path to a yaml configuration file that defines the optimization search space.
# Omit this to use the default configuration.
  search_space_path: null

logs:
# Name of the run, prepended to the optimization assets dirname (generated randomly if omitted).
  run_name: "awful_hippo_10-30-2024_19-42-12"

# Directory for optimization logs, which are saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`.
# Omit to use the current working directory (note: this default is not correct on Windows).
  dirpath: "/home/user/AutoIntent/awful_hippo_10-30-2024_19-42-12"

  dump_dir: "/home/user/AutoIntent/runs/awful_hippo_10-30-2024_19-42-12/modules_dumps"

vector_index:
# Location where to save faiss database file. Omit to use your system's default cache directory.
  db_dir: null

# Device in torch notation, for example "cpu" or "cuda:0".
  device: cpu

augmentation:
# Number of shots per intent to sample from regular expressions. This option extends the sample utterances
# within multiclass intent records.
  regex_sampling: 0

# Config string like "[20, 40, 20, 10]" means 20 one-label examples, 40 two-label examples, 20 three-label examples,
# 10 four-label examples. This option extends multilabel utterance records.
  multilabel_generation_config: null

embedder:
# Batch size for embedding computation.
  batch_size: 1
# Sentence length limit for embedding computation.
  max_length: null

# Random seed; affects the randomness of the run.
seed: 0

# String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}. Omit to use ERROR by default.
hydra.job_logging.root.level: "ERROR"
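
The multilabel_generation_config option packs several counts into one string. As a minimal sketch of how such a string reads, position i of the list gives the number of examples with i labels. This only illustrates the format described in the comment above; it is not AutoIntent's own parser, and the function name describe_multilabel_config is hypothetical.

```python
import json


def describe_multilabel_config(config):
    """Map the number of labels per example to the number of examples to generate."""
    counts = json.loads(config)
    if not isinstance(counts, list) or not all(
        isinstance(c, int) and c >= 0 for c in counts
    ):
        raise ValueError("expected a list of non-negative integers")
    # The first entry is for one-label examples, the second for two-label ones, and so on.
    return {n_labels: count for n_labels, count in enumerate(counts, start=1)}


print(describe_multilabel_config("[20, 40, 20, 10]"))
# {1: 20, 2: 40, 3: 20, 4: 10}
```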

Run from Config File#

Create a YAML file, for example my_config.yaml, in a separate folder with the following structure:

defaults:
- optimization_config
- _self_
- override hydra/job_logging: custom

# put the configuration options you want to override here. The full structure is presented above.
# Here is just an example with the same options as for the command line variant above.
embedder:
  batch_size: 32

Launch AutoIntent:

autointent --config-path=/path/to/config/directory --config-name=my_config

Important:

Specify the full path in the config-path option.
Do not use tabs in the YAML file.
Give the file a name other than optimization_config.yaml to avoid warnings from Hydra.
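
The three rules above can be checked before launching. The following is a minimal sketch that assumes nothing about AutoIntent beyond what is stated above; the helper names check_config and launch_command are hypothetical.

```python
from pathlib import Path


def check_config(config_file):
    """Return a list of problems found in a config file (empty if none)."""
    path = Path(config_file)
    problems = []
    if not path.parent.is_absolute():
        problems.append("config directory must be given as a full path")
    if "\t" in path.read_text(encoding="utf-8"):
        problems.append("config file contains tab characters")
    if path.name == "optimization_config.yaml":
        problems.append("file name clashes with optimization_config.yaml")
    return problems


def launch_command(config_file):
    """Build the autointent invocation for a config file."""
    path = Path(config_file)
    return [
        "autointent",
        f"--config-path={path.parent}",  # directory that holds the config
        f"--config-name={path.stem}",    # file name without the .yaml suffix
    ]
```

For a file at /path/to/config/directory/my_config.yaml, launch_command produces the same arguments as the command shown under "Launch AutoIntent".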

You can combine both approaches, passing command-line options together with a config file. Command-line options have the highest priority.
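
The precedence rule can be pictured as a merge in which command-line values are applied last. The sketch below only illustrates that ordering with plain dictionaries; it is not how Hydra implements overrides, and the function name apply_overrides is hypothetical.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied on top."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Walk down to the section that owns the option, creating it if needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return merged


# Values as they might come from a config file.
file_config = {"data": {"train_path": "path/to/my.json"}, "seed": 0}

# The same option given on the command line wins.
final = apply_overrides(file_config, ["data.train_path=AutoIntent/banking77"])
print(final)
# {'data': {'train_path': 'AutoIntent/banking77'}, 'seed': 0}
```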

Example configs are stored in our GitHub repository in example_configs.