
Challenge 24 - Evaluating Text Generation with Giskard

Level: Medium

Description:
The LLM world keeps evolving at a fast pace, with new and better models frequently coming to market. You want to build a workflow that evaluates the output of LLMs, including the detection of their potential vulnerabilities, using Giskard. The goal is to use this workflow to facilitate decision-making when picking an LLM for a new task. As an initial test, you want to evaluate LLMs that tackle the following task: "given a prompt with product descriptions, the LLM should create emails to customers detailing such products."

Hint 1: Use the Giskard LLM Scanner node for the evaluation of LLMs.
Hint 2: In this challenge, you are free to choose which LLMs you will work with.
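For illustration, the prompt for the email task could look like the sketch below. The exact wording and the placeholder for the description column are assumptions; adapt them to your dataset and to the LLM you pick.

```
You are a marketing assistant. Using the product description below, write a short,
friendly email to a customer presenting the product and its main features.
End with a polite call to action.

Product description: {{product_description}}
```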

Beginner-friendly objective(s):
1. Load the dataset containing the product descriptions.
2. Pick two LLMs of your preference and connect to them. The first LLM will handle the email task; the second LLM will be one of the inputs to the Giskard LLM Scanner node, helping evaluate the results of the first LLM.

Intermediate-friendly objective(s):
1. Create a prompt that asks an LLM to leverage product descriptions and create emails for customers detailing them.
2. Isolate the workflow segment that contains this prompt and an instance of the LLM Prompter node -- you can turn this workflow segment into a new, separate workflow by using the Workflow Writer node.
3. Send your second chosen LLM, the workflow segment you created, and the loaded dataset with product descriptions to the Giskard LLM Scanner (see the sketch below). What does Giskard's final report look like? What are the main vulnerabilities or fragilities of the LLM you chose to create the emails?
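To see what the Giskard LLM Scanner node does conceptually, here is a minimal Python sketch using the open-source giskard library, which the node builds on. The model names, the product_description column, and the file names are assumptions for this sketch; in the actual challenge, the KNIME workflow segment plays the role of the wrapped generation function.

```python
import os

import giskard
import pandas as pd
from openai import OpenAI

# "First" LLM: the one that writes the customer emails. The model name,
# the column name, and the CSV file name are assumptions for this sketch.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def write_emails(df: pd.DataFrame) -> list[str]:
    """Generate one customer email per product description row."""
    emails = []
    for description in df["product_description"]:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    "Write a short, friendly email to a customer presenting "
                    f"the following product:\n\n{description}"
                ),
            }],
        )
        emails.append(response.choices[0].message.content)
    return emails

# Wrap the generation function and the product-description dataset so that
# Giskard can probe the model for vulnerabilities (prompt injection,
# hallucination, harmful content, robustness issues, etc.).
model = giskard.Model(
    model=write_emails,
    model_type="text_generation",
    name="Product email writer",
    description="Writes customer emails from product descriptions.",
    feature_names=["product_description"],
)
dataset = giskard.Dataset(pd.read_csv("product_descriptions.csv"), target=None)

# "Second" LLM: the judge model Giskard uses internally to evaluate outputs.
# Depending on your Giskard version, it can be selected with, e.g.,
# giskard.llm.set_llm_model("gpt-4o") (assumption: recent releases expose this).

report = giskard.scan(model, dataset)
report.to_html("giskard_report.html")  # inspect the vulnerabilities found
```

The resulting HTML report groups the detected issues by vulnerability category, which is the kind of output you can use to compare candidate LLMs for the email task.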
