8. Partial context updates#
The following tutorial shows the advanced usage of context storage and context storage schema.
[1]:
# installing dependencies
%pip install -q chatsky==0.10.0
Note: you may need to restart the kernel to use updated packages.
[2]:
from pathlib import Path
from chatsky import Pipeline
from chatsky.context_storages import context_storage_factory
from chatsky.context_storages.database import NameConfig
from chatsky.utils.testing.common import check_happy_path, is_interactive_mode
from chatsky.utils.testing.toy_script import TOY_SCRIPT_KWARGS, HAPPY_PATH
[3]:
Path("dbs").mkdir(exist_ok=True)
db = context_storage_factory("shelve://dbs/partly.shlv")
pipeline = Pipeline(**TOY_SCRIPT_KWARGS, context_storage=db)
Most of the Context
fields, that might grow in size uncontrollably, are stored in a special structure, ContextDict
. This structure can be used for fine-grained access to the underlying database, partial and asynchronous element loading. In particular, this is relevant for labels
, requests
and responses
fields, while misc
and framework_data
are always loaded fully.
How does that partial field writing work?
In most cases, every context storage operates two “tables”, “dictionaries”, “files”, etc.
One of them is called MAIN
and contains all the “primitive” Context
data (and also the data that will be read and written completely every time) - that includes context id
, current_turn_id
, _created_at
, _updated_at
, misc
and framework_data
fields.
The other one is called TURNS
and contains triplets of the data generated on each conversation step: label
, request
and response
.
Whenever a context is loaded, all of its information from MAIN
table and one to few items from TURNS
table are loaded. More items from TURNS
table can be loaded later on demand (via the get
or __getitem__
methods of corresponding fields).
Database table layout and default behavior are controlled by some special fields of the DBContextStorage
class.
All the table and field names are stored in a special NameConfig
static class.
One of the important configuration options is _subscripts
: this property controls the number of last dictionary items that will be read and written (the items are ordered by keys, ascending) - default value is 3. In order to read all items at once, the property can also be set to “all” literal. In order to read only a specific subset of keys, the property can be set to a set of desired integers.
[4]:
# All items will be loaded on every turn.
db._subscripts[NameConfig._requests_field] = "__all__"
[5]:
# 5 last items will be loaded on every turn.
db._subscripts[NameConfig._requests_field] = 5
[6]:
# Items 1, 3, 5 and 7 will be loaded on every turn.
db._subscripts[NameConfig._requests_field] = {1, 3, 5, 7}
Last but not least, comes rewrite_existing
boolean flag.
Without it any “silent” modifications to the values of ContextDict
will be discarded at the end of each turn.
I.e. explicit modification of values via methods such as __setitem__
, __delitem__
or pop
will be kept track of and preserved, while implicit modification via object manipulation, e.g. ctx.last_request.text = "new_text"
, will be discarded.
Turning the option on will enable calculating hashes for all items stored locally and comparing them at the end of every turn, updating any that were implicitly changed.
NB! Keeping track of the modified elements comes with a price of calculating their hashes and comparing them, so in performance-critical environments this feature can be disabled by setting the flag to False.
[7]:
# Any modifications done to the elements already present in storage
# will be preserved.
db.rewrite_existing = True
[8]:
if __name__ == "__main__":
check_happy_path(pipeline, HAPPY_PATH)
# This is a function for automatic
# tutorial running (testing) with HAPPY_PATH
# This runs tutorial in interactive mode if not in IPython env
# and if `DISABLE_INTERACTIVE_MODE` is not set
if is_interactive_mode():
pipeline.run() # This runs tutorial in interactive mode