This node allows you to connect to a local GPT4All LLM. To get started, download a model either through the GPT4All client or by downloading a GGUF model from the Hugging Face Hub. Once you have downloaded the model, specify its file path in the configuration dialog to use it.
It is not necessary to install the GPT4All client to execute the node.
It is recommended to use models (e.g. Llama 2) that have been fine-tuned for chat applications. For model specifications, including prompt templates, see the GPT4All model list.
The currently supported models are based on GPT-J, LLaMA, MPT, Replit, Falcon and StarCoder.
For more information and detailed instructions on downloading compatible models, please visit the GPT4All GitHub repository.
Note: This node cannot be used on the KNIME Hub, as the models can't be embedded into the workflow due to their large size.
Path to the pre-trained GPT4All model file, e.g. my/path/model.gguf. You can find the folder through Settings -> Application in the GPT4All desktop application.
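Before entering the path in the dialog, it can help to confirm that it points to an actual model file. The following is a minimal sketch, not part of KNIME or GPT4All: the helper name check_model_path is hypothetical, and it only checks that the file exists and has a .gguf extension, not that the contents are a valid model.

```python
from pathlib import Path


def check_model_path(path_str: str) -> Path:
    """Return the resolved path if it looks like a local GGUF model file.

    Hypothetical helper for illustration only. It checks existence and the
    file extension, not whether the file is a valid model.
    """
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No model file found at: {path}")
    if path.suffix.lower() != ".gguf":
        raise ValueError(f"Expected a .gguf file, got: {path.name}")
    return path.resolve()
```

Calling check_model_path("my/path/model.gguf") returns the absolute path when the file is present and raises an error otherwise.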
Number of CPU threads used by GPT4All. If set to 0, the number of threads is determined automatically.
Model-specific system template. Defaults to "%1". Refer to the GPT4All model list for the correct template for your model.
Model-specific prompt template. Defaults to "%1". Refer to the GPT4All model list for the correct template for your model.
Note: For instruction-based models, it is recommended to use "[INST] %1 [/INST]" as the prompt template for better output if the "promptTemplate" field is not specified in the model list.
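To illustrate what the "%1" placeholder means, the sketch below substitutes a user prompt into a template. It assumes the placeholder is simply replaced by the prompt text; how the node and GPT4All perform the substitution internally is not described here, and apply_template is a hypothetical helper.

```python
def apply_template(template: str, user_prompt: str) -> str:
    """Replace the "%1" placeholder with the user's prompt.

    Hypothetical helper; assumes plain text substitution.
    """
    if "%1" not in template:
        raise ValueError('Template must contain the "%1" placeholder')
    return template.replace("%1", user_prompt)


# The default template passes the prompt through unchanged.
print(apply_template("%1", "What is KNIME?"))
# prints: What is KNIME?

# The instruction-style template wraps the prompt in [INST] tags.
print(apply_template("[INST] %1 [/INST]", "What is KNIME?"))
# prints: [INST] What is KNIME? [/INST]
```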
The maximum number of tokens to generate.
This value, plus the token count of your prompt, cannot exceed the model's context length.
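The constraint above is simple arithmetic. As a sketch with made-up numbers (the context length and prompt size below are assumptions, not values from any particular model), the largest usable setting can be computed like this:

```python
def max_new_tokens_allowed(context_length: int, prompt_tokens: int) -> int:
    """Largest 'maximum tokens' value that still fits in the context window."""
    if prompt_tokens > context_length:
        raise ValueError("The prompt alone exceeds the model's context length")
    return context_length - prompt_tokens


# Hypothetical example: a 2048-token context and a 300-token prompt.
print(max_new_tokens_allowed(2048, 300))  # prints: 1748
```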
Sampling temperature to use, between 0.0 and 1.0.
Higher values will lead to less deterministic answers.
Try 0.9 for more creative applications, and 0 for ones with a well-defined answer. It is generally recommended to alter this or Top-p, but not both.
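The effect of temperature can be sketched with the standard temperature-scaled softmax. This illustrates the general technique, not the GPT4All backend's actual code; the scores are made up, and treating a temperature of 0 as "always pick the top token" is an assumption of this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed greedy behavior: all probability on the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(scores, 0.2)
high = softmax_with_temperature(scores, 0.9)
# The top token dominates at the low temperature; the distribution
# is flatter (less deterministic) at the high temperature.
print(low[0] > high[0])  # prints: True
```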
Set the "k" value to limit the vocabulary used during text generation. Smaller values (e.g., 10) restrict the choices to the most probable words, while larger values (e.g., 50) allow for more variety.
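As a rough sketch of the idea, top-k filtering keeps the k most probable candidate tokens and renormalizes their probabilities before sampling. The token probabilities below are invented for illustration, and top_k_filter is a hypothetical helper, not a GPT4All function.

```python
def top_k_filter(probs: dict, k: int) -> dict:
    """Keep the k most probable tokens and renormalize them to sum to 1."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution.
candidates = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
# With k=2, only "the" and "a" remain as candidates.
print(sorted(top_k_filter(candidates, 2)))  # prints: ['a', 'the']
```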
Amount of prompt tokens to process at once.
Note: On CPU, higher values can speed up reading prompts but will also use more RAM. On GPU, a batch size of 1 has outperformed other batch sizes in our experiments.
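Conceptually, the batch size determines how the prompt's tokens are split into chunks for processing. The sketch below only illustrates that chunking with placeholder token IDs; it does not reflect how the GPT4All backend schedules work internally.

```python
def batches(tokens, batch_size):
    """Yield consecutive chunks of at most batch_size tokens."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tokens), batch_size):
        yield tokens[start:start + batch_size]


prompt_tokens = list(range(10))  # placeholder token IDs
print(list(batches(prompt_tokens, 4)))
# prints: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A larger batch size means fewer, bigger chunks, which matches the trade-off described above: fewer passes over the prompt, at the cost of holding more data in memory at once.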
The processing unit on which the GPT4All model will run. It can be set to:
Alternatively, a specific GPU name can also be provided, and the model will run on the GPU that matches the name if it's available. Default is "cpu".
Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown. It's advised to ensure the device has enough memory before initiating the model.
An alternative to sampling with temperature, where the model considers only the most probable tokens (words) whose combined probability mass reaches top_p. For example, 0.1 means that only the tokens comprising the top 10% probability mass are considered.
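Top-p (nucleus) filtering can be sketched as follows: sort the candidate tokens by probability, keep adding them until their cumulative probability reaches top_p, and renormalize what remains. The probabilities are invented for illustration, and top_p_filter is a hypothetical helper rather than GPT4All's own implementation.

```python
def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches top_p, then renormalize them to sum to 1."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the interval (0, 1]")
    kept = []
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution.
candidates = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
# With top_p=0.1 the single most probable token already covers the mass.
print(list(top_p_filter(candidates, 0.1)))  # prints: ['the']
# With top_p=0.7 the two most probable tokens are needed.
print(sorted(top_p_filter(candidates, 0.7)))  # prints: ['a', 'the']
```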
To use this node in KNIME, install the extension KNIME Python Extension Development (Labs) from its update site, following the NodePit Product and Node Installation Guide.