HF Hub Chat Model Connector

This node establishes a connection to a specific chat model hosted on the Hugging Face Hub. The difference to the HF Hub LLM Connector is that this node allows you to provide prompt templates which are crucial for obtaining the best output from many models that have been fine-tuned for chat-based usecases.

To use this node, you need to successfully authenticate with the Hugging Face Hub using the HF Hub Authenticator node.

Provide the name of the desired chat model repository available on the Hugging Face Hub as an input.

Please ensure that you have the necessary permissions to access the model. Failures with gated models may occur due to outdated tokens.

Options

Hugging Face Hub Settings

Repo ID

The model name to be used, in the format <organization_name>/<model_name>. For example, mistralai/Mistral-7B-Instruct-v0.3 for text generation, or sentence-transformers/all-MiniLM-L6-v2 for embedding model.

You can find available models at the Hugging Face Models repository.

Prompt Templates

System prompt template

Model specific system prompt template. Defaults to "%1". Refer to the Hugging Face Hub model card for information on the correct prompt template.

Prompt template

Model specific prompt template. Defaults to "%1". Refer to the Hugging Face Hub model card for information on the correct prompt template.

Model Parameters

Top k

The number of top-k tokens to consider when generating text.

Typical p

The typical probability threshold for generating text.

Repetition penalty

The repetition penalty to use when generating text.

Max new tokens

The maximum number of tokens to generate in the completion.

The token count of your prompt plus max new tokens cannot exceed the model's context length.

Number of concurrent requests

Maximum number of concurrent requests to LLMs that can be made, whether through API calls or to an inference server. Exceeding this limit may result in temporary restrictions on your access.

It is important to plan your usage according to the model provider's rate limits, and keep in mind that both software and hardware constraints can impact performance.

For OpenAI, please refer to the Limits page for the rate limits available to you.

Temperature

Sampling temperature to use, between 0.0 and 100.0. Higher values will make the output more random, while lower values will make it more focused and deterministic.

Top-p sampling

An alternative to sampling with temperature, where the model considers the results of the tokens (words) with top_p probability mass. Hence, 0.1 means only the tokens comprising the top 10% probability mass are considered.

Input Ports

Icon

Validated authentication for Hugging Face Hub.

Output Ports

Icon

Connection to a specific chat model from Hugging Face Hub.

Popular Predecessors

  • No recommendations found

Popular Successors

  • No recommendations found

Views

This node has no views

Workflows

  • No workflows found

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.