Local GPT4All LLM Selector

This node allows you to connect to a local GPT4All LLM. To get started, download a model either through the GPT4All client or by downloading a GGUF model from Hugging Face Hub. Once you have downloaded the model, specify its file path in the configuration dialog to use it.

It is not necessary to install the GPT4All client to execute the node.

It is recommended to use models (e.g. Llama 2) that have been fine-tuned for chat applications. For model specifications, including prompt templates, see the GPT4All model list.

The currently supported models are based on GPT-J, LLaMA, MPT, Replit, Falcon and StarCoder.

For more information and detailed instructions on downloading compatible models, please visit the GPT4All GitHub repository.

Note: This node cannot be used on the KNIME Hub, as the models cannot be embedded into the workflow due to their large size.

Options

Model type

The type of the selected model.

Available options:

Chat: The model is trained to follow instructions in a conversational style.
Instruct: The model is trained to follow one-shot instructions and is not well suited for conversations.

Model Usage

Model path: Path to the pre-trained GPT4All model file, e.g. my/path/model.gguf. You can find the model folder under Settings -> Application in the GPT4All desktop application.
Thread Count: Number of CPU threads used by GPT4All. If set to 0, the number of threads is determined automatically.
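
The two settings above can be sanity-checked before configuring the node. The following is a minimal sketch, not part of KNIME or GPT4All: the helper names check_model_path and resolve_thread_count are hypothetical, requiring a .gguf suffix is an assumption based on the example path, and falling back to the logical CPU count is only a guess at what "determined automatically" might mean.

```python
import os
from pathlib import Path


def check_model_path(path_str: str) -> Path:
    """Return the path if it points to an existing .gguf file, else raise."""
    path = Path(path_str).expanduser()
    # Assumption: the model file uses the .gguf extension.
    if path.suffix.lower() != ".gguf":
        raise ValueError(f"Expected a .gguf file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No model file at: {path}")
    return path


def resolve_thread_count(thread_count: int) -> int:
    """Map 0 to an automatic value (assumption: the logical CPU count)."""
    if thread_count < 0:
        raise ValueError("Thread count must be 0 or a positive integer")
    if thread_count == 0:
        return os.cpu_count() or 1
    return thread_count
```

Running such a check first gives a clearer error message than a failed model load would.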

Prompt Templates

System prompt template

Model-specific system template. Defaults to "%1". Refer to the GPT4All model list for the correct template for your model:

Locate the model you are using under the field "name".
Within the Model object, locate the "systemPrompt" field and use its value.

Prompt template

Model-specific prompt template. Defaults to "%1". Refer to the GPT4All model list for the correct template for your model:

Locate the model you are using under the field "name".
Within the Model object, locate the "promptTemplate" field and use its value.

Note: For instruction-based models, it is recommended to use "[INST] %1 [/INST]" as the prompt template for better output if the "promptTemplate" field is not specified in the model list.
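
To illustrate how a template with the "%1" placeholder is filled in, here is a minimal sketch. The function apply_template is hypothetical and only mimics the substitution; how the node itself combines the system and prompt templates is not described here.

```python
def apply_template(template: str, text: str) -> str:
    """Replace every "%1" placeholder in the template with the given text.

    A template without "%1" is returned unchanged.
    """
    return template.replace("%1", text)


# The default template passes the text through unchanged.
print(apply_template("%1", "Summarize this text."))

# The template recommended above for instruction-based models.
print(apply_template("[INST] %1 [/INST]", "Summarize this text."))
```

With the default "%1" the model receives the raw message, which is why a model-specific template usually improves results.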

Model Parameters

Maximum response length (token)

The maximum number of tokens to generate.

This value, plus the token count of your prompt, cannot exceed the model's context length.

Context length

The maximum number of tokens a model can process in a single input sequence.

This value must be at least the number of tokens in your prompt plus the maximum response length.
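
The relationship between prompt size, maximum response length and context length is simple arithmetic. The sketch below uses hypothetical helper names and illustrative token counts; real token counts depend on the model's tokenizer.

```python
def fits_context(prompt_tokens: int, max_response_tokens: int,
                 context_length: int) -> bool:
    """True if the prompt plus the longest possible response fits the context."""
    return prompt_tokens + max_response_tokens <= context_length


def max_allowed_response(prompt_tokens: int, context_length: int) -> int:
    """Largest maximum response length that still fits next to the prompt."""
    return max(0, context_length - prompt_tokens)


# Illustrative numbers only.
print(fits_context(1500, 600, 2048))       # 1500 + 600 = 2100 > 2048
print(max_allowed_response(1500, 2048))    # 2048 - 1500
```

If the check fails, either shorten the prompt, lower the maximum response length, or raise the context length if the model supports it.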

Temperature

Sampling temperature to use, between 0.0 and 1.0.

Higher values will lead to less deterministic answers.

Try 0.9 for more creative applications, and 0 for ones with a well-defined answer. It is generally recommended to alter this or Top-p, but not both.
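
The effect of temperature can be seen on a toy example. This sketch shows the standard temperature-scaled softmax, not necessarily GPT4All's exact implementation; treating a temperature of 0 as a greedy choice is an assumption made here to avoid division by zero.

```python
import math


def softmax_with_temperature(logits: list[float],
                             temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means "always pick the top-scoring token".
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print(softmax_with_temperature(toy_logits, 0.1))  # nearly all mass on the first token
print(softmax_with_temperature(toy_logits, 0.9))  # mass spread more evenly
```

Lower temperatures concentrate probability on the top token, which is why answers become more deterministic.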

Top-k sampling

Set the "k" value to limit the vocabulary used during text generation. Smaller values (e.g., 10) restrict the choices to the most probable words, while larger values (e.g., 50) allow for more variety.
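
As a minimal sketch of the idea, the following keeps only the k most probable tokens of a toy distribution and renormalizes them. The function name and the probabilities are made up for illustration; this is the generic technique, not GPT4All's internal code.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and rescale them to sum to 1."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy_probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}
print(top_k_filter(toy_probs, 2))  # only "cat" and "dog" remain
```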

Prompt batch size

Number of prompt tokens to process at once.

Note: On CPU, higher values can speed up reading prompts but will also use more RAM. On GPU, a batch size of 1 has outperformed other batch sizes in our experiments.
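
Conceptually, the batch size determines how the prompt tokens are split into chunks before they are fed to the model. The sketch below only illustrates that chunking on made-up token IDs; the function name is hypothetical and the actual batching happens inside GPT4All.

```python
def batch_prompt_tokens(tokens: list[int], batch_size: int) -> list[list[int]]:
    """Split a token sequence into consecutive chunks of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return [tokens[i:i + batch_size] for i in range(0, len(tokens), batch_size)]


toy_tokens = list(range(10))  # stand-in for ten prompt token IDs
print(batch_prompt_tokens(toy_tokens, 4))  # three batches: 4, 4 and 2 tokens
```

A larger batch size means fewer, bigger chunks, which is where the trade-off between speed and memory comes from.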

Device

The processing unit on which the GPT4All model will run. It can be set to:

"cpu": Model will run on the central processing unit.
"gpu": Model will run on the best available graphics processing unit, irrespective of its vendor.
"amd", "nvidia", "intel": Model will run on the best available GPU from the specified vendor.

Alternatively, a specific GPU name can also be provided, and the model will run on the GPU that matches the name if it's available. Default is "cpu".

Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown. It's advised to ensure the device has enough memory before initiating the model.
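
The selection rules above can be sketched as follows. Everything here is hypothetical: the Gpu type, the select_device function, and in particular the choice of "most memory" as the meaning of "best available" are assumptions for illustration and do not reflect how GPT4All actually ranks devices.

```python
from typing import NamedTuple


class Gpu(NamedTuple):
    name: str
    vendor: str        # "amd", "nvidia" or "intel"
    memory_bytes: int


def select_device(setting: str, gpus: list[Gpu], model_bytes: int) -> str:
    """Resolve a device setting to "cpu" or to the name of a matching GPU."""
    if setting == "cpu":
        return "cpu"
    if setting == "gpu":
        candidates = list(gpus)
    elif setting in ("amd", "nvidia", "intel"):
        candidates = [g for g in gpus if g.vendor == setting]
    else:
        # Anything else is treated as a specific GPU name.
        candidates = [g for g in gpus if g.name == setting]
    if not candidates:
        raise ValueError(f"No GPU available for setting: {setting!r}")
    # Assumption: "best available" means the GPU with the most memory.
    best = max(candidates, key=lambda g: g.memory_bytes)
    if best.memory_bytes < model_bytes:
        raise MemoryError(f"{best.name} does not have enough memory for the model")
    return best.name
```

Checking memory up front mirrors the note above: a model that does not fit on the selected GPU leads to an error rather than a silent fallback.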

Top-p sampling

An alternative to sampling with temperature, in which the model considers only the most probable tokens (words) whose cumulative probability reaches top_p. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
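
A minimal sketch of this filtering on a toy distribution is shown below. The function name and the probabilities are made up; it implements the generic nucleus (top-p) technique and is not taken from GPT4All's source.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the interval (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy_probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}
print(top_p_filter(toy_probs, 0.1))  # only "cat" is needed to reach 10%
print(top_p_filter(toy_probs, 0.7))  # "cat" and "dog" together reach 70%
```

Unlike Top-k, the number of tokens kept varies with how confident the model is about the next token.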

Input Ports

This node has no input ports

Output Ports

A GPT4All Large Language Model.

Views

This node has no views

Installation

To use this node in KNIME, install the extension KNIME Python Extension Development (Labs) from the update site below, following the NodePit Product and Node Installation Guide:

v5.8

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.8.0.v202510031553

On NodePit since: 2025-10-17

Last update: 2025-10-29

Tags: Modern UI

KNIME versions: Since v5.3