HF TGI LLM Selector

This node can connect to locally or remotely hosted TGI servers which includes Text Generation Inference Endpoints of popular text generation models that are deployed via Hugging Face Hub.

Protected endpoints require a connection with a HF Hub Authenticator node in order to authenticate with Hugging Face Hub.

The Text Generation Inference is a Rust, Python, and gRPC server specifically designed for text generation inference. It can be self-hosted to power LLM APIs and inference widgets.

For more details and information about integrating with the Hugging Face TextGen Inference and setting up a local server, refer to the LangChain documentation.

Note: If you use the Credentials Configuration node and do not select the "Save password in configuration (weakly encrypted)" option for passing the API key via the HF Hub Authenticator node, the Credentials Configuration node will need to be reconfigured upon reopening the workflow, as the credentials flow variable was not saved and will therefore not be available to downstream nodes.

Options

Model type

The type of the selected model.

Available options:

Chat: The model is trained to follow instructions in a conversational style.
Instruct: The model is trained to follow one-shot instructions and is not well suited for conversations.

Hugging Face TextGen Inference Server Settings

Inference server URL: The URL of the inference server to use, e.g. http://localhost:8010/.

Prompt Templates

System prompt template: Model specific system prompt template. Defaults to "%1". Refer to the Hugging Face Hub model card for information on the correct prompt template.
Prompt template: Model specific prompt template. Defaults to "%1". Refer to the Hugging Face Hub model card for information on the correct prompt template.

Model Parameters

Seed

Set the seed parameter to any integer of your choice and use the same value across requests to have reproducible outputs.

The default value of 0 means that no seed is specified.

Top k

The number of top-k tokens to consider when generating text.

Typical p

The typical probability threshold for generating text.

Repetition penalty

The repetition penalty to use when generating text.

Max new tokens

The maximum number of tokens to generate in the completion.

The token count of your prompt plus max new tokens cannot exceed the model's context length.

Number of concurrent requests

Maximum number of concurrent requests to LLMs that can be made, whether through API calls or to an inference server. Exceeding this limit may result in temporary restrictions on your access.

It is important to plan your usage according to the model provider's rate limits, and keep in mind that both software and hardware constraints can impact performance.

For OpenAI, please refer to the Limits page for the rate limits available to you.

Temperature

Sampling temperature to use, between 0.0 and 100.0. Higher values will make the output more random, while lower values will make it more focused and deterministic.

Top-p sampling

An alternative to sampling with temperature, where the model considers the results of the tokens (words) with top_p probability mass. Hence, 0.1 means only the tokens comprising the top 10% probability mass are considered.

Input Ports

…: An optional Hugging Face Hub connection that can be used to access protected Hugging Face inference endpoints.

Output Ports

: Connection to a chat model hosted on a Text Generation Inference server.

Popular Predecessors

No recommendations found

Popular Successors

No recommendations found

Views

This node has no views

Workflows

No workflows found

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.

Installation

To use this node in KNIME, install the extension KNIME Python Extension Development (Labs) from the below update site following our NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202506241445

On NodePit since: 2025-07-02

Last update: 2025-07-27

KNIME versions: Since v5.2

Deploy, schedule, execute, and monitor your KNIME workflows locally, in the cloud or on-premises – with our brand new NodePit Runner.

Try NodePit Runner!