
Challenge 24 - Evaluating Text Generation Workflows with Giskard


URL: LLM Vulnerabilities https://docs.giskard.ai/en/stable/knowledge/llm_vulnerabilities/index.html
URL: Giskard LLM Scan https://docs.giskard.ai/en/stable/knowledge/llm_vulnerabilities/index.html

Workflow segment to be evaluated

  • The segment could contain more nodes, e.g., chained LLM Prompters or a RAG agent

Challenge 24: Evaluating Text Generation Workflows with Giskard


Level: Medium

Description: The LLM world keeps evolving at a fast pace, with new and better models coming to market often. You want to build a workflow that evaluates the output of LLMs, including the detection of their potential vulnerabilities, using Giskard. The goal is to use this workflow to facilitate decision-making when picking an LLM for a new task. As an initial test, you want to evaluate LLMs that tackle the following task: given a prompt with product descriptions, the LLM should create emails to customers detailing such products. Hint 1: Use the Giskard LLM Scanner node for the evaluation of LLMs. Hint 2: In this challenge, you're free to choose what LLMs you'll work with -- they can be local (e.g., Ollama's Llama models) or cloud-based (e.g., OpenAI's GPT models).

Beginner-friendly objective(s): 1. Load the dataset containing the products' descriptions. 2. Pick two LLMs of your preference and connect to them. The first LLM will be used to handle the email task; the second LLM will be one of the inputs for the Giskard LLM Scanner node, helping evaluate the results of the first LLM.
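For orientation, this is roughly how the two roles map onto Giskard's Python library, which the Giskard LLM Scanner node appears to be built on. A minimal sketch, assuming an OpenAI backend; the model names and the system prompt are placeholders, and giskard.llm.set_llm_model should be checked against the Giskard docs for your version:

    import os
    from openai import OpenAI
    import giskard

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # First LLM: handles the email-writing task.
    def generate_email(product_description: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any local or cloud chat model works
            messages=[
                {"role": "system", "content": "You write short, friendly emails to customers."},
                {"role": "user", "content": f"Write an email presenting this product: {product_description}"},
            ],
        )
        return response.choices[0].message.content

    # Second LLM: used internally by Giskard to generate probes and judge the
    # first model's answers (assumed API for recent Giskard versions).
    giskard.llm.set_llm_model("gpt-4o")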

Intermediate-friendly objective(s): 1. Create a prompt that asks an LLM to leverage products' descriptions and create emails for customers detailing them. 2. Isolate the workflow segment that contains this prompt and an instance of the LLM Prompter -- you can turn this workflow segment into a new, separate workflow by using the Workflow Writer node. 3. Send your second chosen LLM, the workflow segment you created, and the loaded dataset with product descriptions to the Giskard LLM Scanner. What does Giskard's final report look like? What are the main vulnerabilities or fragilities of the LLM you chose to create the emails?
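In Python terms, objectives 2 and 3 amount to wrapping the prompt-plus-prompter segment as a callable model and handing it to the scanner together with the dataset. A minimal sketch, reusing the hypothetical generate_email() helper from the sketch above; the CSV file name and the product_description column are assumptions about the dataset's schema:

    import pandas as pd
    import giskard

    df = pd.read_csv("product_descriptions.csv")  # assumed file name and schema

    def batch_predict(batch: pd.DataFrame) -> list[str]:
        # The workflow segment under evaluation: prompt engineering + LLM Prompter.
        return [generate_email(desc) for desc in batch["product_description"]]

    model = giskard.Model(
        model=batch_predict,
        model_type="text_generation",
        name="Product email generator",
        description="Writes customer emails presenting the product described in the input.",
        feature_names=["product_description"],
    )

    dataset = giskard.Dataset(df, name="Product descriptions", target=None)

    # The scan probes for hallucination, prompt injection, harmful content,
    # robustness and other vulnerability categories, then summarises the findings.
    scan_results = giskard.scan(model, dataset)
    scan_results.to_html("giskard_scan_report.html")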

Nodes and annotations in the workflow:

  • Capture Workflow Start / Capture Workflow End
  • OpenAI Authenticator: Authenticate
  • Credentials Configuration: OpenAI API Key
  • OpenAI LLM Selector: Choose a model for generation and configure it
  • Giskard LLM Scanner: Create report on potential vulnerabilities of the LLM model
  • LLM Prompter: Generates emails based on the product information in the prompt
  • OpenAI LLM Selector: Choose a model for evaluation and configure it
  • Expression: Prompt engineering
  • CSV Reader: Dataset that is used by Giskard to generate domain-specific probes
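Since the stated goal is to compare candidate LLMs, one hedged way to reuse the scan beyond the report itself is Giskard's test-suite export: the issues found in the scan become tests that can be re-run against a second model wrapped the same way. This continues from the sketches above; candidate_predict is a hypothetical wrapper around a second LLM, and generate_test_suite / Suite.run should be double-checked in the Giskard docs for your version:

    # Turn the scan findings into a reusable test suite.
    test_suite = scan_results.generate_test_suite("Email generation checks")

    candidate = giskard.Model(
        model=candidate_predict,  # hypothetical wrapper around a second LLM
        model_type="text_generation",
        name="Candidate email generator",
        description="Same email-writing task, different underlying LLM.",
        feature_names=["product_description"],
    )

    # Re-run the same checks against the candidate to support the model choice.
    suite_results = test_suite.run(model=candidate)
    print(suite_results.passed)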
