Using Dialog2Graph algorithms
This guide demonstrates how to use the dialog2graph dialog graph generation algorithms. The following example shows how to use a particular algorithm that generates a graph from a set of dialogs; it leverages both an embedding model and an LLM. We also dive into the details of ModelStorage usage.
First of all, we need to import ModelStorage and LLMGraphGenerator.
from dialog2graph import Dialog
from dialog2graph.pipelines.model_storage import ModelStorage
from dialog2graph.pipelines.d2g_llm import LLMGraphGenerator
from dialog2graph.pipelines.helpers.parse_data import PipelineRawDataType
Now we need to get the dialogs that serve as the source for the graph. In this example we read the dialogs from a JSON file. The dialogs should be presented in the following format:
{
    "dialogs": [
        {
            "messages": [
                {"text": "Hey there! How can I help you today?", "participant": "assistant"},
                {"text": "I need to book a ride to the airport.", "participant": "user"},
                {
                    "text": "Sure! I can help with that. When is your flight, and where are you departing from?",
                    "participant": "assistant"
                },
                {"text": "Do you have any other options?", "participant": "user"},
                {
                    "text": "If you'd prefer, I can send you options for ride-share services instead. Would you like that?",
                    "participant": "assistant"
                },
                {"text": "No, I'll manage on my own.", "participant": "user"},
                {"text": "No worries! Feel free to reach out anytime.", "participant": "assistant"},
                {"text": "Alright, thanks anyway.", "participant": "user"},
                {"text": "You're welcome! Have a fantastic trip!", "participant": "assistant"}
            ]
        }
    ]
}
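If you'd rather keep the data in code than in a file, the same structure can be built inline. A minimal sketch mirroring the JSON layout above, shortened to two messages (the example_data name is ours):

# Hypothetical inline equivalent of the JSON file above (shortened).
example_data = {
    "dialogs": [
        {
            "messages": [
                {"text": "Hey there! How can I help you today?", "participant": "assistant"},
                {"text": "I need to book a ride to the airport.", "participant": "user"},
            ],
        },
    ],
}

In the rest of this guide we read the data from the file.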
Having read the dialogs from the file, we convert the initial dictionaries into Dialog objects. Then the dialogs are passed to the data parser PipelineRawDataType. With that, the data for the generator is ready:
import json

# Read the example dialogs from disk.
with open("example_dialog.json", "r") as f:
    data = json.load(f)

# Convert the raw dictionaries into Dialog objects.
dialogs = [Dialog(**dialog) for dialog in data["dialogs"]]

# Wrap the dialogs in the raw data container expected by the pipeline.
data = PipelineRawDataType(
    dialogs=dialogs,
    supported_graph=None,
    true_graph=None,
)
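As a quick sanity check, you can inspect what was parsed. A minimal sketch, assuming Dialog exposes the parsed messages under a messages attribute as in the JSON above:

# Sanity check (assumes Dialog keeps its messages in a `messages` list).
print(f"Loaded {len(dialogs)} dialog(s)")
print(dialogs[0].messages[0].text)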
Now we should create a ModelStorage object. This object stores the models used for generation. In this example we use an LLM model and an embedding model: the LLM generates the graph, and the embedding model produces embeddings for the nodes in the graph.
from langchain_openai import ChatOpenAI
from langchain_huggingface import HuggingFaceEmbeddings

model_storage = ModelStorage()

# LLM used for the formatting task during graph generation.
model_storage.add(
    "my_formatting_model",
    config={"model_name": "gpt-4.1-mini"},
    model_type=ChatOpenAI,
)

# Embedding model used for the similarity task.
model_storage.add(
    "my_embedding_model",
    config={
        "model_name": "sentence-transformers/all-MiniLM-L6-v2",
        "model_kwargs": {"device": "cpu"},
    },
    model_type=HuggingFaceEmbeddings,
)
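Note that ChatOpenAI reads the OpenAI API key from the environment, so make sure it is set before the model is actually invoked. The key value below is a placeholder:

import os

# ChatOpenAI picks up the key from OPENAI_API_KEY; the value here is a placeholder.
os.environ["OPENAI_API_KEY"] = "sk-..."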
Now we can create the LLMGraphGenerator object, which generates the graph. We pass the ModelStorage object to its constructor. Note that we are overriding the default models for the formatting and similarity tasks with the models we added to the ModelStorage; the defaults are used for all remaining tasks. Don't forget to use the correct model_type when adding a model to the ModelStorage. The available types are llm for LLMs and emb for embedders.
graph_generator = LLMGraphGenerator(
model_storage=model_storage,
formatting_llm="my_formatting_model",
sim_model="my_embedding_model"
)
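If the defaults suit every task, the overrides can presumably be omitted entirely. A hypothetical minimal construction, following the note above that default models cover the remaining tasks:

# Hypothetical: rely on the default models for all tasks.
graph_generator = LLMGraphGenerator(model_storage=model_storage)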
Now we can generate the graph by passing the prepared data to the .invoke() method of the LLMGraphGenerator object. The method returns a graph object and a report object. To include metrics in the report, set the enable_evals parameter to True; this runs several metrics on the graph during and after the generation process. Keep in mind that doing so usually slows generation down and raises the token count.
graph, report = graph_generator.invoke(data, enable_evals=True)
graph.visualise()
print(report)