Local GPT4All LLM Connector

This node allows you to connect to a local GPT4All LLM. To get started, download a model either through the GPT4All client or by downloading a GGUF model from Hugging Face Hub. Once you have downloaded the model, specify its file path in the configuration dialog to use it.

It is not necessary to install the GPT4All client to execute the node.

Some models (e.g. Llama 2) have been fine-tuned for chat applications, so they might behave unexpectedly if their prompts do not follow a chat-like structure:

User: <The prompt you want to send to the model>
Assistant:

Use the prompt template for the specific model from the GPT4All model list if one is provided.
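A chat-style prompt like the one above can be assembled programmatically before sending it to the model. A minimal sketch (the "User:"/"Assistant:" template shown here mirrors the example above; real models may require their own specific template from the GPT4All model list):

```python
def build_chat_prompt(user_message: str) -> str:
    """Wrap a raw prompt in the User/Assistant chat structure shown above."""
    return f"User: {user_message}\nAssistant:"

# The model then continues the text after "Assistant:".
prompt = build_chat_prompt("Summarize this document in one sentence.")
print(prompt)
```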

The currently supported models are based on GPT-J, LLaMA, MPT, Replit, Falcon and StarCoder.

For more information and detailed instructions on downloading compatible models, please visit the GPT4All GitHub repository.

Note: This node cannot be used on the KNIME Hub, as the models cannot be embedded into the workflow due to their large size.

Options

Model Usage

Model path

Path to the pre-trained GPT4All model file, e.g. my/path/model.gguf. You can find the model folder via Settings -> Application in the GPT4All desktop application.

Thread Count

Number of CPU threads used by GPT4All. If set to 0, the number of threads is determined automatically.

Model Parameters

Maximum response length (tokens)

The maximum number of tokens to generate.

This value, plus the token count of your prompt, cannot exceed the model's context length.
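This constraint is simple arithmetic; a small sketch of the check (the token counts and the 2048-token context length are illustrative values, not a property of any particular model):

```python
def fits_context(prompt_tokens: int, max_response_tokens: int,
                 context_length: int) -> bool:
    """Check that prompt plus maximum response fit in the context window."""
    return prompt_tokens + max_response_tokens <= context_length

# With a hypothetical 2048-token context and a 300-token prompt,
# the response length can be at most 1748 tokens.
print(fits_context(300, 1748, 2048))
print(fits_context(300, 1749, 2048))
```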

Temperature

Sampling temperature to use, between 0.0 and 1.0.

Higher values will lead to less deterministic answers.

Try 0.9 for more creative applications, and 0 for ones with a well-defined answer. It is generally recommended to alter this or Top-p, but not both.
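The effect of temperature can be illustrated with a plain softmax over logits: the logits are divided by the temperature before normalization, so low temperatures sharpen the distribution and high temperatures flatten it. A self-contained sketch (the logit values are made up for illustration; a temperature of exactly 0 corresponds in practice to greedy selection of the top token, since dividing by 0 is undefined):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Apply temperature scaling, then a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # sharply peaked on the top token
high = softmax_with_temperature(logits, 0.9)  # probability spread more evenly
```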

Top-k sampling

Set the "k" value to limit the vocabulary used during text generation. Smaller values (e.g., 10) restrict the choices to the most probable words, while larger values (e.g., 50) allow for more variety.
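Top-k sampling can be sketched as keeping only the k most probable tokens and renormalizing before sampling. The token probabilities below are illustrative, not taken from any real model:

```python
def top_k_filter(token_probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    top = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}

probs = {"the": 0.4, "a": 0.3, "cat": 0.2, "dog": 0.1}
# With k=2, only "the" and "a" remain as candidates.
print(top_k_filter(probs, 2))
```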

Prompt batch size

Number of prompt tokens to process at once.

NOTE: On CPU, higher values can speed up reading prompts but will also use more RAM. On GPU, a batch size of 1 has outperformed other batch sizes in our experiments.

Device

The processing unit on which the GPT4All model will run. It can be set to:

  • "cpu": Model will run on the central processing unit.
  • "gpu": Model will run on the best available graphics processing unit, irrespective of its vendor.
  • "amd", "nvidia", "intel": Model will run on the best available GPU from the specified vendor.

Alternatively, a specific GPU name can also be provided, and the model will run on the GPU that matches the name if it's available. Default is "cpu".

Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown. It's advised to ensure the device has enough memory before initiating the model.

Top-p sampling

An alternative to sampling with temperature, where the model considers only the smallest set of tokens whose cumulative probability mass reaches top_p. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered.
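This cutoff, often called nucleus sampling, can be sketched as follows: sort tokens by probability, accumulate mass until top_p is reached, drop the rest, and renormalize. The probabilities are illustrative:

```python
def top_p_filter(token_probs, p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches p, then renormalize (nucleus sampling)."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
# "the" and "a" together reach the 0.8 probability mass; the rest are dropped.
print(top_p_filter(probs, 0.8))
```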

Input Ports

This node has no input ports

Output Ports


A GPT4All large language model.


Views

This node has no views
