
Challenge 24 - Evaluating Text Generation Workflows with Giskard


URL: LLM Vulnerabilities https://docs.giskard.ai/en/stable/knowledge/llm_vulnerabilities/index.html
URL: Giskard LLM Scan https://docs.giskard.ai/en/stable/knowledge/llm_vulnerabilities/index.html

Workflow segment to be evaluated

  • The segment could contain more nodes, e.g., chained LLM Prompters or a RAG agent

Challenge 24: Evaluating Text Generation Workflows with Giskard


Level: Medium

Description: The LLM world keeps evolving at a fast pace, with new and better models coming to market often. You want to build a workflow that evaluates the output of LLMs, including the detection of their potential vulnerabilities, using Giskard. The goal is to use this workflow to facilitate decision-making when picking an LLM for a new task. As an initial test, you want to evaluate LLMs that tackle the following task: given a prompt with product descriptions, the LLM should create emails to customers detailing such products. Hint 1: Use the Giskard LLM Scanner node for the evaluation of LLMs. Hint 2: In this challenge, you're free to choose what LLMs you'll work with -- they can be local (e.g., Ollama's Llama models) or cloud-based (e.g., OpenAI's GPT models).

Beginner-friendly objective(s): 1. Load the dataset containing the products' descriptions. 2. Pick two LLMs of your preference and connect to them. The first LLM will be used to handle the email task; the second LLM will be one of the inputs for the Giskard LLM Scanner node, helping evaluate the results of the first LLM.
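For orientation, this is roughly how the two roles map onto Giskard's Python library, which the Giskard LLM Scanner node appears to be built on. A minimal sketch, assuming an OpenAI backend; the model names and the system prompt are placeholders, and giskard.llm.set_llm_model should be checked against the Giskard docs for your version:

    import os
    from openai import OpenAI
    import giskard

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # First LLM: handles the email-writing task.
    def generate_email(product_description: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any local or cloud chat model works
            messages=[
                {"role": "system", "content": "You write short, friendly emails to customers."},
                {"role": "user", "content": f"Write an email presenting this product: {product_description}"},
            ],
        )
        return response.choices[0].message.content

    # Second LLM: used internally by Giskard to generate probes and judge the
    # first model's answers (assumed API for recent Giskard versions).
    giskard.llm.set_llm_model("gpt-4o")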

Intermediate-friendly objective(s): 1. Create a prompt that asks an LLM to leverage products' descriptions and create emails for customers detailing them. 2. Isolate the workflow segment that contains this prompt and an instance of the LLM Prompter -- you can turn this workflow segment into a new, separate workflow by using the Workflow Writer node. 3. Send your second chosen LLM, the workflow segment you created, and the loaded dataset with product descriptions to the Giskard LLM Scanner. What does Giskard's final report look like? What are the main vulnerabilities or fragilities of the LLM you chose to create the emails?
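In Python terms, objectives 2 and 3 amount to wrapping the prompt-plus-prompter segment as a callable model and handing it to the scanner together with the dataset. A minimal sketch, reusing the hypothetical generate_email() helper from the sketch above; the CSV file name and the product_description column are assumptions about the dataset's schema:

    import pandas as pd
    import giskard

    df = pd.read_csv("product_descriptions.csv")  # assumed file name and schema

    def batch_predict(batch: pd.DataFrame) -> list[str]:
        # The workflow segment under evaluation: prompt engineering + LLM Prompter.
        return [generate_email(desc) for desc in batch["product_description"]]

    model = giskard.Model(
        model=batch_predict,
        model_type="text_generation",
        name="Product email generator",
        description="Writes customer emails presenting the product described in the input.",
        feature_names=["product_description"],
    )

    dataset = giskard.Dataset(df, name="Product descriptions", target=None)

    # The scan probes for hallucination, prompt injection, harmful content,
    # robustness and other vulnerability categories, then summarises the findings.
    scan_results = giskard.scan(model, dataset)
    scan_results.to_html("giskard_scan_report.html")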

Nodes and annotations in the workflow:

  • Capture Workflow Start / Capture Workflow End
  • OpenAI Authenticator: Authenticate
  • Credentials Configuration: OpenAI API Key
  • OpenAI LLM Selector: Choose a model for generation and configure it
  • Giskard LLM Scanner: Create report on potential vulnerabilities of the LLM model
  • LLM Prompter: Generates emails based on the product information in the prompt
  • OpenAI LLM Selector: Choose a model for evaluation and configure it
  • Expression: Prompt engineering
  • CSV Reader: Dataset that is used by Giskard to generate domain-specific probes
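Since the stated goal is to compare candidate LLMs, one hedged way to reuse the scan beyond the report itself is Giskard's test-suite export: the issues found in the scan become tests that can be re-run against a second model wrapped the same way. This continues from the sketches above; candidate_predict is a hypothetical wrapper around a second LLM, and generate_test_suite / Suite.run should be double-checked in the Giskard docs for your version:

    # Turn the scan findings into a reusable test suite.
    test_suite = scan_results.generate_test_suite("Email generation checks")

    candidate = giskard.Model(
        model=candidate_predict,  # hypothetical wrapper around a second LLM
        model_type="text_generation",
        name="Candidate email generator",
        description="Same email-writing task, different underlying LLM.",
        feature_names=["product_description"],
    )

    # Re-run the same checks against the candidate to support the model choice.
    suite_results = test_suite.run(model=candidate)
    print(suite_results.passed)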
