Create Realtime Session

Create an ephemeral API token for use in client-side applications with the Realtime API. The session can be configured with the same parameters as the `session.update` client event. The response contains a session object plus a `client_secret` key holding an ephemeral API token that browser clients can use to authenticate with the Realtime API.
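
For context, here is a minimal server-side sketch of the underlying REST call this node wraps. The endpoint URL and the `OPENAI_API_KEY` environment variable are assumptions; adapt them to your deployment.

// Minimal sketch: mint an ephemeral Realtime API token server-side.
// The endpoint path and env var name are assumptions; verify against
// your OpenAI account and the current Realtime API documentation.
const response = await fetch("https://api.openai.com/v1/realtime/sessions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-realtime-preview-2024-12-17",
    voice: "verse",
  }),
});
const session = await response.json();
// The ephemeral token lives under client_secret; hand it to the browser
// client promptly, since it is short-lived.
console.log(session.client_secret.value);

The `client_secret` value can then serve as the browser client's bearer token for the Realtime API, so the long-lived API key never leaves the server.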

Options

Body
Create an ephemeral API key with the given session configuration.

Request body, which must comply with the following JSON Schema:

{
  "required" : [ "model" ],
  "type" : "object",
  "properties" : {
    "modalities" : {
      "type" : "array",
      "description" : "The set of modalities the model can respond with. To disable audio,\nset this to [\"text\"].\n",
      "items" : {
        "type" : "string",
        "enum" : [ "text", "audio" ]
      }
    },
    "model" : {
      "type" : "string",
      "description" : "The Realtime model used for this session.\n",
      "enum" : [ "gpt-4o-realtime-preview", "gpt-4o-realtime-preview-2024-10-01", "gpt-4o-realtime-preview-2024-12-17", "gpt-4o-mini-realtime-preview", "gpt-4o-mini-realtime-preview-2024-12-17" ]
    },
    "instructions" : {
      "type" : "string",
      "description" : "The default system instructions (i.e. system message) prepended to model \ncalls. This field allows the client to guide the model on desired \nresponses. The model can be instructed on response content and format, \n(e.g. \"be extremely succinct\", \"act friendly\", \"here are examples of good \nresponses\") and on audio behavior (e.g. \"talk quickly\", \"inject emotion \ninto your voice\", \"laugh frequently\"). The instructions are not guaranteed \nto be followed by the model, but they provide guidance to the model on the \ndesired behavior.\n\nNote that the server sets default instructions which will be used if this \nfield is not set and are visible in the `session.created` event at the \nstart of the session.\n"
    },
    "voice" : {
      "type" : "string",
      "description" : "The voice the model uses to respond. Voice cannot be changed during the \nsession once the model has responded with audio at least once. Current \nvoice options are `alloy`, `ash`, `ballad`, `coral`, `echo` `sage`, \n`shimmer` and `verse`.\n",
      "enum" : [ "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse" ]
    },
    "input_audio_format" : {
      "type" : "string",
      "description" : "The format of input audio. Options are `pcm16`, `g711_ulaw`, or `g711_alaw`.\n",
      "enum" : [ "pcm16", "g711_ulaw", "g711_alaw" ]
    },
    "output_audio_format" : {
      "type" : "string",
      "description" : "The format of output audio. Options are `pcm16`, `g711_ulaw`, or `g711_alaw`.\n",
      "enum" : [ "pcm16", "g711_ulaw", "g711_alaw" ]
    },
    "input_audio_transcription" : {
      "type" : "object",
      "properties" : {
        "model" : {
          "type" : "string",
          "description" : "The model to use for transcription, `whisper-1` is the only currently \nsupported model.\n"
        }
      },
      "description" : "Configuration for input audio transcription, defaults to off and can be \nset to `null` to turn off once on. Input audio transcription is not native \nto the model, since the model consumes audio directly. Transcription runs \nasynchronously through Whisper and should be treated as rough guidance \nrather than the representation understood by the model.\n"
    },
    "turn_detection" : {
      "type" : "object",
      "properties" : {
        "type" : {
          "type" : "string",
          "description" : "Type of turn detection, only `server_vad` is currently supported.\n"
        },
        "threshold" : {
          "type" : "number",
          "description" : "Activation threshold for VAD (0.0 to 1.0), this defaults to 0.5. A \nhigher threshold will require louder audio to activate the model, and \nthus might perform better in noisy environments.\n"
        },
        "prefix_padding_ms" : {
          "type" : "integer",
          "description" : "Amount of audio to include before the VAD detected speech (in \nmilliseconds). Defaults to 300ms.\n"
        },
        "silence_duration_ms" : {
          "type" : "integer",
          "description" : "Duration of silence to detect speech stop (in milliseconds). Defaults \nto 500ms. With shorter values the model will respond more quickly, \nbut may jump in on short pauses from the user.\n"
        },
        "create_response" : {
          "type" : "boolean",
          "description" : "Whether or not to automatically generate a response when VAD is\nenabled. `true` by default.\n",
          "default" : true
        }
      },
      "description" : "Configuration for turn detection. Can be set to `null` to turn off. Server \nVAD means that the model will detect the start and end of speech based on \naudio volume and respond at the end of user speech.\n"
    },
    "tools" : {
      "type" : "array",
      "description" : "Tools (functions) available to the model.",
      "items" : {
        "type" : "object",
        "properties" : {
          "type" : {
            "type" : "string",
            "description" : "The type of the tool, i.e. `function`.",
            "enum" : [ "function" ]
          },
          "name" : {
            "type" : "string",
            "description" : "The name of the function."
          },
          "description" : {
            "type" : "string",
            "description" : "The description of the function, including guidance on when and how \nto call it, and guidance about what to tell the user when calling \n(if anything).\n"
          },
          "parameters" : {
            "type" : "object",
            "description" : "Parameters of the function in JSON Schema."
          }
        }
      }
    },
    "tool_choice" : {
      "type" : "string",
      "description" : "How the model chooses tools. Options are `auto`, `none`, `required`, or \nspecify a function.\n"
    },
    "temperature" : {
      "type" : "number",
      "description" : "Sampling temperature for the model, limited to [0.6, 1.2]. Defaults to 0.8.\n"
    },
    "max_response_output_tokens" : {
      "description" : "Maximum number of output tokens for a single assistant response,\ninclusive of tool calls. Provide an integer between 1 and 4096 to\nlimit output tokens, or `inf` for the maximum available tokens for a\ngiven model. Defaults to `inf`.\n",
      "oneOf" : [ {
        "type" : "integer"
      }, {
        "type" : "string",
        "enum" : [ "inf" ]
      } ]
    }
  },
  "description" : "Realtime session object configuration."
}
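
As an illustration, a request body that satisfies this schema could look like the following TypeScript object literal. Only `model` is required; the other values shown are the documented defaults or sample overrides taken from the schema above.

// Illustrative session configuration satisfying the schema above.
// Only "model" is required; remaining fields override server defaults.
const sessionConfig = {
  model: "gpt-4o-realtime-preview-2024-12-17",
  modalities: ["text", "audio"],
  instructions: "Act friendly and be extremely succinct.",
  voice: "sage",
  input_audio_format: "pcm16",
  output_audio_format: "pcm16",
  input_audio_transcription: { model: "whisper-1" },
  turn_detection: {
    type: "server_vad",       // currently the only supported type
    threshold: 0.5,           // default activation threshold
    prefix_padding_ms: 300,   // default
    silence_duration_ms: 500, // default
    create_response: true,    // default
  },
  tool_choice: "auto",
  temperature: 0.8,           // default; limited to [0.6, 1.2]
  max_response_output_tokens: "inf", // or an integer between 1 and 4096
};
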
Result Format

Specify how the response should be mapped to the table output. The following formats are available:

Raw Response: Returns the raw response in a single row with the following columns (a sketch of the body contents is shown after this list):

  • body: Response body
  • status: HTTP status code
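
For orientation, here is a hedged sketch of the JSON you can expect in the body column. The `client_secret` sub-fields are assumptions based on the endpoint description above; verify them against an actual response.

// Hedged sketch of the response body; all values are placeholders.
const exampleBody = {
  client_secret: {
    value: "ek_...",        // ephemeral API token (placeholder)
    expires_at: 1700000000, // assumed Unix-timestamp expiry field
  },
  // ...the configured session fields (model, voice, modalities, etc.)
  // are echoed back alongside client_secret.
};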

Input Ports

  • Configuration data.

Output Ports

  • Result of the request, depending on the selected Result Format.
  • Configuration data (this is the same as the input port; it is provided as a passthrough for sequentially chaining nodes to declutter your workflow connections).
