Search Space as YAML#
If you want to use the default search space, you can skip this tutorial. Here we discuss how to save a custom search space as a YAML file so that it can be used from the CLI for pipeline auto-configuration.
YAML#
YAML (YAML Ain’t Markup Language) is a human-readable data serialization standard that is often used for configuration files and for data exchange between languages with different data structures. It serves a similar purpose to JSON but is usually easier to read, because nesting is expressed through indentation rather than brackets and comments are allowed.
Here’s an example YAML file:
database:
    host: localhost
    port: 5432
    username: admin
    # this is a comment
    password: secret
counts:
- 10
- 20
- 30
literal_counts: [10, 20, 30]
users:
- name: Alice
  age: 30
  email: alice@example.com
- name: Bob
  age: 25
  email: bob@example.com
settings:
debug: true
timeout: 30
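Explanation:
- the whole file represents a dictionary with keys database, counts, literal_counts, users, settings, debug, timeout
- database itself is a dictionary with keys host, port, and so on
- counts is a list (Python [10, 20, 30])
- literal_counts is a list too, written in inline form
- users is a list of dictionaries
- settings has no value, so it is loaded as null (Python None); debug and timeout are not indented under it, so they are separate top-level keys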
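To make the mapping between YAML and Python concrete, here is a minimal sketch of the Python object that the example file corresponds to. It is written by hand rather than loaded from disk, and it assumes the file is read with a standard YAML loader (for instance PyYAML's `yaml.safe_load`) and that `settings`, `debug`, and `timeout` are all top-level keys, as the indentation in the example suggests. The variable name `data` is only illustrative.

```python
import json

# Hand-written Python equivalent of the example YAML file.
# Assumption: a standard YAML loader would return this structure.
data = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "username": "admin",
        "password": "secret",  # the YAML comment above this key is dropped on load
    },
    "counts": [10, 20, 30],          # block-style list
    "literal_counts": [10, 20, 30],  # inline (flow-style) list
    "users": [
        {"name": "Alice", "age": 30, "email": "alice@example.com"},
        {"name": "Bob", "age": 25, "email": "bob@example.com"},
    ],
    "settings": None,  # a key without a value is loaded as None
    "debug": True,
    "timeout": 30,
}

# Both list notations describe the same data.
assert data["counts"] == data["literal_counts"]

# The same content as JSON, for comparison with the YAML version.
print(json.dumps(data, indent=2))
```

Comparing the printed JSON with the YAML file shows the trade-off mentioned earlier: the content is identical, but YAML needs no quotes or brackets for the common cases.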
Example Search Space#
- node_type: embedding
  metric: retrieval_hit_rate
  search_space:
    - module_name: retrieval
      k: [10]
      embedder_name:
        - avsolatorio/GIST-small-Embedding-v0
        - infgrad/stella-base-en-v2
- node_type: scoring
  metric: scoring_roc_auc
  search_space:
    - module_name: knn
      k: [1, 3, 5, 10]
      weights: ["uniform", "distance", "closest"]
    - module_name: linear
    - module_name: dnnc
      cross_encoder_name:
        - BAAI/bge-reranker-base
        - cross-encoder/ms-marco-MiniLM-L-6-v2
      k: [1, 3, 5, 10]
- node_type: decision
  metric: decision_accuracy
  search_space:
    - module_name: threshold
      thresh: [0.5]
    - module_name: argmax
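To get a feel for the size of such a search space, the sketch below rewrites the `scoring` node as a Python dictionary and enumerates its candidate configurations. It assumes that every list-valued hyperparameter lists the values to try and that values are combined grid-style; this is an assumption about how the search space is interpreted, not a statement about the library's internals. The helper `expand_module` is hypothetical and exists only for this illustration.

```python
from itertools import product

# Python equivalent of the `scoring` node from the example search space.
scoring_node = {
    "node_type": "scoring",
    "metric": "scoring_roc_auc",
    "search_space": [
        {
            "module_name": "knn",
            "k": [1, 3, 5, 10],
            "weights": ["uniform", "distance", "closest"],
        },
        {"module_name": "linear"},
        {
            "module_name": "dnnc",
            "cross_encoder_name": [
                "BAAI/bge-reranker-base",
                "cross-encoder/ms-marco-MiniLM-L-6-v2",
            ],
            "k": [1, 3, 5, 10],
        },
    ],
}


def expand_module(module):
    """Yield one config dict per hyperparameter combination (hypothetical helper)."""
    names = [key for key in module if key != "module_name"]
    # With no hyperparameters, product() yields a single empty combination.
    for values in product(*(module[name] for name in names)):
        yield {"module_name": module["module_name"], **dict(zip(names, values))}


candidates = [
    config
    for module in scoring_node["search_space"]
    for config in expand_module(module)
]

print(len(candidates))  # 12 (knn) + 1 (linear) + 8 (dnnc) = 21
```

Under this assumption, adding one more value to a list multiplies the number of candidates for that module, so short lists keep the search manageable.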