Challenge 24 - Evaluating Text Generation with Giskard
Level: Medium
Description: The LLM landscape is evolving quickly, with new and better models reaching the market frequently. You want to build a workflow that uses Giskard to evaluate the output of LLMs, including detecting their potential vulnerabilities. The goal is to use this workflow to support decision-making when picking an LLM for a new task. As an initial test, you want to evaluate LLMs on the following task: "Given a prompt with product descriptions, the LLM should write emails to customers describing these products."
Hint 1: Use the Giskard LLM Scanner node to evaluate the LLMs.
Hint 2: In this challenge, you are free to choose which LLMs to work with.
Beginner-friendly objective(s):
1. Load the dataset containing the product descriptions.
2. Pick two LLMs of your choice and connect to them. The first LLM will handle the email task. The second LLM will be one of the inputs to the Giskard LLM Scanner node, where it helps evaluate the results of the first LLM.
Intermediate-friendly objective(s):
1. Create a prompt that asks an LLM to use the product descriptions to write emails to customers presenting those products.
2. Isolate the workflow segment that contains this prompt and an instance of the LLM Prompter node. You can turn this segment into a new, separate workflow with the Workflow Writer node.
3. Send your second chosen LLM, the workflow segment you created, and the loaded dataset of product descriptions to the Giskard LLM Scanner node. What does Giskard's final report look like? What are the main vulnerabilities or weaknesses of the LLM you chose to write the emails?
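The prompt-building step in the intermediate objectives can also be sketched outside KNIME, for example in a Python Script node. This is a minimal sketch under stated assumptions. The column name `description`, the template wording, and the helper names `build_prompt` and `generate_emails` are all hypothetical. `fake_llm` stands in for a real LLM connection so that the example runs offline. The function that maps a table of product descriptions to generated emails is only meant to mirror, roughly, the workflow segment you hand to the scanner. It is not Giskard's or KNIME's API.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template; the wording is illustrative only.
PROMPT_TEMPLATE = (
    "You are a marketing assistant. Using the product description below, "
    "write a short, friendly email to a customer that presents the product.\n\n"
    "Product description:\n{description}\n\nEmail:"
)


def build_prompt(description: str) -> str:
    """Fill the template with one product description (whitespace trimmed)."""
    return PROMPT_TEMPLATE.format(description=description.strip())


def generate_emails(
    rows: List[Dict[str, str]],
    llm: Callable[[str], str],
    column: str = "description",  # hypothetical column name
) -> List[str]:
    """Build one prompt per row and return the LLM's reply for each row.

    `llm` is any callable that maps a prompt string to a completion.
    In the KNIME workflow, the LLM Prompter node plays this role.
    """
    return [llm(build_prompt(row[column])) for row in rows]


if __name__ == "__main__":
    products = [
        {"description": "Noise-cancelling wireless headphones with 30-hour battery life."},
        {"description": "Stainless steel water bottle that keeps drinks cold for 24 hours."},
    ]

    # Stand-in "LLM" that reports the prompt length, so the sketch runs without an API key.
    fake_llm = lambda prompt: f"[email generated from a {len(prompt)}-character prompt]"

    for email in generate_emails(products, fake_llm):
        print(email)
```

Injecting the LLM as a plain callable keeps the prompt logic independent of any specific provider. This matches the spirit of the challenge, where the email-writing LLM should be easy to swap before the next scan.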