Search Space as YAML
If you want to use the default search space, you can skip this tutorial. Here we discuss how to save a custom search space as a YAML file so that it can be used from the CLI for pipeline auto-configuration.
YAML
YAML (YAML Ain’t Markup Language) is a human-readable data serialization standard that is often used for configuration files and for data exchange between languages with different data structures. It serves a similar purpose to JSON but is generally easier to read and allows comments.
Here’s an example YAML file:
database:
  host: localhost
  port: 5432
  username: admin
  # this is a comment
  password: secret
counts:
  - 10
  - 20
  - 30
literal_counts: [10, 20, 30]
users:
  - name: Alice
    age: 30
    email: alice@example.com
  - name: Bob
    age: 25
    email: bob@example.com
settings:
  debug: true
  timeout: 30
Explanation:
- the whole file represents a dictionary with the keys database, counts, literal_counts, users, and settings
- database is itself a dictionary with the keys host, port, and so on
- counts is a list (in Python, [10, 20, 30])
- literal_counts is a list too, written in inline form
- users is a list of dictionaries
- settings is a dictionary with the keys debug and timeout
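To make the mapping concrete, here is a sketch of the Python structure that a YAML parser (for example, PyYAML's `yaml.safe_load`) would be expected to produce for the database, counts, literal_counts, and users keys of the example file. The variable name `config` is chosen for illustration only. Note that the YAML comment does not appear anywhere, because parsers discard comments.

```python
# Python equivalent of part of the example YAML file.
# `config` is an illustrative name, not something the tutorial defines.
config = {
    # A nested mapping becomes a nested dict.
    "database": {
        "host": "localhost",
        "port": 5432,  # unquoted numbers are parsed as integers
        "username": "admin",
        "password": "secret",
    },
    # Block-style list ("- 10" on separate lines).
    "counts": [10, 20, 30],
    # Inline list ("[10, 20, 30]"); the parsed value is identical.
    "literal_counts": [10, 20, 30],
    # A list whose items are mappings becomes a list of dicts.
    "users": [
        {"name": "Alice", "age": 30, "email": "alice@example.com"},
        {"name": "Bob", "age": 25, "email": "bob@example.com"},
    ],
}

# Nested values are reached by chaining keys and indices.
port = config["database"]["port"]
user_names = [user["name"] for user in config["users"]]
same_lists = config["counts"] == config["literal_counts"]

print(port, user_names, same_lists)
```

The two list notations differ only in how they are written; after parsing they are indistinguishable.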
Example Search Space
- node_type: embedding
  metric: retrieval_hit_rate
  search_space:
    - module_name: retrieval
      k: [10]
      embedder_name:
        - avsolatorio/GIST-small-Embedding-v0
        - infgrad/stella-base-en-v2
- node_type: scoring
  metric: scoring_roc_auc
  search_space:
    - module_name: knn
      k: [1, 3, 5, 10]
      weights: ["uniform", "distance", "closest"]
    - module_name: linear
    - module_name: dnnc
      cross_encoder_name:
        - BAAI/bge-reranker-base
        - cross-encoder/ms-marco-MiniLM-L-6-v2
      k: [1, 3, 5, 10]
- node_type: decision
  metric: decision_accuracy
  search_space:
    - module_name: threshold
      thresh: [0.5]
    - module_name: argmax
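The structure above is a list of nodes, each with a node_type, a metric, and a search_space that lists modules together with candidate values for their hyperparameters. As a rough illustration of how large such a space is, the sketch below writes the scoring node as a Python structure and enumerates every combination of its listed values. It assumes that each list is searched exhaustively as a grid, which the file itself does not state. The helper `expand_module` is hypothetical and is not part of any library API.

```python
from itertools import product

# Python equivalent of the "scoring" node from the search space above.
scoring_node = {
    "node_type": "scoring",
    "metric": "scoring_roc_auc",
    "search_space": [
        {
            "module_name": "knn",
            "k": [1, 3, 5, 10],
            "weights": ["uniform", "distance", "closest"],
        },
        {"module_name": "linear"},
        {
            "module_name": "dnnc",
            "cross_encoder_name": [
                "BAAI/bge-reranker-base",
                "cross-encoder/ms-marco-MiniLM-L-6-v2",
            ],
            "k": [1, 3, 5, 10],
        },
    ],
}


def expand_module(module):
    """Hypothetical helper: list every hyperparameter combination of one module.

    Assumes every key other than "module_name" maps to a list of candidates.
    """
    params = {key: values for key, values in module.items() if key != "module_name"}
    names = list(params)
    combos = product(*(params[name] for name in names))
    # A module without hyperparameters yields exactly one (empty) combination.
    return [
        {"module_name": module["module_name"], **dict(zip(names, combo))}
        for combo in combos
    ]


candidates = [
    candidate
    for module in scoring_node["search_space"]
    for candidate in expand_module(module)
]

print(len(candidates))
for candidate in candidates[:3]:
    print(candidate)
```

Under this grid assumption, knn contributes 4 × 3 combinations, linear contributes one, and dnnc contributes 2 × 4, so adding a value to any list multiplies the work for that module.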