HF Hub LLM Selector

This node establishes a connection to a specific chat model hosted on the Hugging Face Hub.

To use this node, you must first authenticate with the Hugging Face Hub using the HF Hub Authenticator node.

Provide the repository ID of the desired chat model on the Hugging Face Hub. The model is executed by the selected Inference Provider for the conversational task.

Make sure you have the necessary permissions to access the model. Requests to gated models may fail if your access token is outdated.

Note: Tool calling is currently not supported for HF Hub models.

Note: If you pass the API key via the Credentials Configuration node without selecting the "Save password in configuration (weakly encrypted)" option, the credentials flow variable is not saved with the workflow. You must then reconfigure the Credentials Configuration node each time you reopen the workflow; until you do, the credentials are not available to downstream nodes.

Options

Provider selection

Specify whether the Inference Provider is selected automatically or manually.

Available options:

Auto: The first provider that supports the model is selected automatically.
Manual: Lets you select a specific provider from a list of all providers.
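The two modes can be sketched as follows. This is a minimal illustration, not the node's implementation: `select_provider` is a hypothetical helper, the provider names in the usage lines are only illustrative, and we assume the supported providers arrive as an ordered list so that "first" is well defined. The sketch also rejects a manual choice that does not support the model, which may differ from how the node reports such a problem.

```python
from typing import Optional, Sequence


def select_provider(
    supported: Sequence[str],
    mode: str = "auto",
    manual_choice: Optional[str] = None,
) -> str:
    """Pick an inference provider for a model (hypothetical helper).

    supported: providers that serve the model, in priority order (assumed).
    mode: "auto" takes the first supported provider; "manual" uses manual_choice.
    """
    if mode == "auto":
        if not supported:
            raise ValueError("no provider supports this model")
        return supported[0]
    if mode == "manual":
        if manual_choice is None:
            raise ValueError("manual mode requires a provider")
        # Assumption: reject a provider that does not serve this model.
        if manual_choice not in supported:
            raise ValueError(f"{manual_choice!r} does not support this model")
        return manual_choice
    raise ValueError(f"unknown mode: {mode!r}")


available = ["together", "hf-inference"]  # illustrative provider list
print(select_provider(available))                      # together
print(select_provider(available, "manual", "hf-inference"))  # hf-inference
```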

Inference provider

The Inference Provider that runs the model. The HF Hub website shows which providers are available for each model.

Hugging Face Hub Settings

Repo ID

The ID of the model repository to use, in the format <organization_name>/<model_name>. For example, mistralai/Mistral-7B-Instruct-v0.3 for text generation, or sentence-transformers/all-MiniLM-L6-v2 for an embedding model.

You can find available models at the Hugging Face Models repository.
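A quick way to check that a value follows the documented <organization_name>/<model_name> format is to split it on the slash. The `parse_repo_id` helper below is hypothetical and only checks the format; it does not contact the Hub or confirm that the repository exists or that you can access it.

```python
def parse_repo_id(repo_id: str) -> tuple[str, str]:
    """Split '<organization_name>/<model_name>' into its two parts.

    Only checks the documented format; it does not contact the Hub or
    verify that the repository exists or is accessible.
    """
    parts = repo_id.strip().split("/")
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError(
            f"expected '<organization_name>/<model_name>', got {repo_id!r}"
        )
    organization, model = parts
    return organization, model


print(parse_repo_id("mistralai/Mistral-7B-Instruct-v0.3"))
# ('mistralai', 'Mistral-7B-Instruct-v0.3')
```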

Model Parameters

Maximum response length (token)

The maximum number of tokens to generate.

This value, plus the token count of your prompt, cannot exceed the model's context length.
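The constraint above is simple arithmetic: prompt tokens plus response tokens must not exceed the context length. The sketch below uses hypothetical helper names and made-up numbers (an 8192-token context and a 1200-token prompt); look up the actual context length on the model card.

```python
def max_response_tokens(prompt_tokens: int, context_length: int) -> int:
    """Largest 'Maximum response length' that still fits the context window."""
    if prompt_tokens < 0 or context_length <= 0:
        raise ValueError("prompt_tokens must be >= 0 and context_length > 0")
    # prompt + response must not exceed the context length
    return max(0, context_length - prompt_tokens)


def fits_context(prompt_tokens: int, response_tokens: int, context_length: int) -> bool:
    """True if the prompt plus the requested response fits the context window."""
    return prompt_tokens + response_tokens <= context_length


# Illustrative numbers only: an 8192-token context and a 1200-token prompt.
print(max_response_tokens(1200, 8192))  # 6992
print(fits_context(1200, 7000, 8192))   # False
```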

Temperature

Sampling temperature to use, between 0.0 and 2.0. Higher values will make the output more random, while lower values will make it more focused and deterministic.
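To illustrate the effect, here is the standard temperature-scaled softmax: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. This is a generic sketch, not code from the node or any provider, and the logits are made up.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after dividing them by temperature."""
    if temperature <= 0:
        # Assumption: this sketch rejects 0; many implementations treat a
        # temperature of 0 as (near-)greedy decoding instead.
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the most likely token's share shrinking as the temperature rises, which is why higher values produce more varied output.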

Top-p sampling

An alternative to sampling with temperature, also known as nucleus sampling: the model considers only the smallest set of most likely tokens whose cumulative probability reaches top_p. Hence, 0.1 means only the tokens comprising the top 10% probability mass are considered.
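A minimal sketch of top-p filtering over a single next-token distribution follows. The `top_p_filter` function and the probabilities are hypothetical; real providers apply this inside their decoding loop, and details such as tie handling may differ.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Nucleus (top-p) filtering over a token -> probability mapping."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:  # smallest set reaching the top_p mass
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}  # made-up distribution
print(top_p_filter(probs, 0.1))  # {'the': 1.0}
print(top_p_filter(probs, 0.7))
```

With top_p = 0.7, the sketch keeps "the" and "a" (cumulative 0.8) and samples only between them.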

Number of concurrent requests

The maximum number of concurrent requests to LLMs, whether made through API calls or to an inference server. Setting this value higher than the provider's rate limits allow may result in temporary restrictions on your access.

Plan your usage according to the model provider's rate limits, and keep in mind that both software and hardware constraints can affect performance.

For OpenAI, please refer to the Limits page for the rate limits available to you.
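One common way to enforce such a cap is a semaphore that blocks new requests while the limit is reached. The sketch below assumes an asyncio-based client; `run_with_limit` and `fake_send` are hypothetical names, `fake_send` stands in for a real provider call, and this is not a description of how the node itself schedules requests.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def run_with_limit(
    prompts: Sequence[str],
    send: Callable[[str], Awaitable[str]],
    max_concurrent: int,
) -> list[str]:
    """Send every prompt, keeping at most max_concurrent requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(prompt: str) -> str:
        async with semaphore:  # waits while max_concurrent requests are running
            return await send(prompt)

    # gather returns results in the same order as the prompts
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def demo() -> tuple[list[str], int]:
    in_flight = 0
    peak = 0

    async def fake_send(prompt: str) -> str:
        # Hypothetical stand-in for a real API call to an inference provider.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return prompt.upper()

    results = await run_with_limit(["a", "b", "c", "d", "e"], fake_send, 2)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(results, "peak concurrency:", peak)
```

Raising `max_concurrent` increases throughput only until the provider's rate limits or your own hardware become the bottleneck.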

Input Ports

Validated authentication for Hugging Face Hub.

Output Ports

Connection to a specific chat model from Hugging Face Hub.

Views

This node has no views

Installation

To use this node in KNIME, install the extension KNIME Python Extension Development (Labs) from the update site below, following our NodePit Product and Node Installation Guide:

v5.10

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.10.0.v202601271422

On NodePit since: 2026-02-18

Last update: 2026-03-09

Tags: Modern UI

KNIME versions: Since v5.8