OpenAI Realtime API
📝 Overview
Introduction
OpenAI Realtime API provides two connection methods:
- WebRTC - For real-time audio/video interaction in browsers and mobile clients
- WebSocket - For server-to-server application integration
Use Cases
- Real-time voice conversations
- Audio/video conferencing
- Real-time translation
- Speech transcription
- Real-time code generation
- Server-side real-time integration
Key Features
- Bidirectional audio streaming
- Mixed text and audio conversations
- Function calling support
- Automatic Voice Activity Detection (VAD)
- Audio transcription capabilities
- WebSocket server-side integration
🔐 Authentication & Security
Authentication Methods
- Standard API Key (server-side only)
- Ephemeral Token (client-side use)
Ephemeral Token
- Validity: 1 minute
- Usage limit: Single connection
- Generation: Created via server-side API
POST https://www.uniaix.com/v1/realtime/sessions
Content-Type: application/json
Authorization: Bearer $NEW_API_KEY
{
"model": "gpt-4o-realtime-preview-2024-12-17",
"voice": "verse"
}
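As a minimal sketch, the request above could be assembled in Python as follows. The helper name `build_session_request` and the placeholder key `sk-example` are illustrative assumptions; the URL, headers, and body fields are taken from the request shown above. The request is only built here, not sent.

```python
import json
import urllib.request

SESSIONS_URL = "https://www.uniaix.com/v1/realtime/sessions"


def build_session_request(api_key: str, model: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a session."""
    body = json.dumps({"model": model, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SESSIONS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# "sk-example" is a placeholder, not a real key.
req = build_session_request("sk-example", "gpt-4o-realtime-preview-2024-12-17", "verse")
```

Sending is left to the caller (for example with `urllib.request.urlopen(req)` on the server). The client example later in this page reads the ephemeral token from `client_secret.value` in the JSON reply.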
Security Recommendations
- Never expose standard API keys on the client side
- Use HTTPS/WSS for communication
- Implement appropriate access controls
- Monitor for unusual activity
🔌 Connection Establishment
WebRTC Connection
- URL: https://www.uniaix.com/v1/realtime
- Query parameters: model
- Headers:
  - Authorization: Bearer EPHEMERAL_KEY
  - Content-Type: application/sdp
WebSocket Connection
- URL: wss://www.uniaix.com/v1/realtime
- Query parameters: model
- Headers:
  - Authorization: Bearer YOUR_API_KEY
  - OpenAI-Beta: realtime=v1
Connection Flow
sequenceDiagram
participant Client
participant Server
participant OpenAI
alt WebRTC Connection
Client->>Server: Request ephemeral token
Server->>OpenAI: Create session
OpenAI-->>Server: Return ephemeral token
Server-->>Client: Return ephemeral token
Client->>OpenAI: Create WebRTC offer
OpenAI-->>Client: Return answer
Note over Client,OpenAI: Establish WebRTC connection
Client->>OpenAI: Create data channel
OpenAI-->>Client: Confirm data channel
else WebSocket Connection
Server->>OpenAI: Establish WebSocket connection
OpenAI-->>Server: Confirm connection
Note over Server,OpenAI: Begin real-time conversation
end
Data Channel
- Name: oai-events
- Purpose: Event transmission
- Format: JSON
Audio Stream
- Input: addTrack()
- Output: ontrack event
💬 Conversation Interaction
Conversation Modes
- Text-only conversations
- Voice conversations
- Mixed conversations
Session Management
- Create session
- Update session
- End session
- Session configuration
Event Types
- Text events
- Audio events
- Function calls
- Status updates
- Error events
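Because every event carries a `type` field, one way to handle these event categories is a small router keyed on that field. The sketch below is an illustration only; the `EventRouter` class and its handler names are hypothetical and not part of the API.

```python
import json
from typing import Callable


class EventRouter:
    """Route parsed events to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[dict], None]] = {}

    def on(self, event_type: str):
        """Decorator that registers a handler for one event type."""
        def register(handler):
            self._handlers[event_type] = handler
            return handler
        return register

    def dispatch(self, raw_message: str) -> bool:
        """Parse a JSON message and call its handler; return False if unhandled."""
        event = json.loads(raw_message)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            return False
        handler(event)
        return True


router = EventRouter()
seen = []


@router.on("response.text.delta")
def on_text_delta(event: dict) -> None:
    seen.append(event["delta"])


@router.on("error")
def on_error(event: dict) -> None:
    seen.append("error: " + event["error"]["message"])


router.dispatch('{"type": "response.text.delta", "delta": "Hi"}')
```

`dispatch` can be called from a WebSocket `on_message` callback or a data channel listener with the raw JSON string.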
⚙️ Configuration Options
Audio Configuration
- Input formats: pcm16, g711_ulaw, g711_alaw
- Output formats: pcm16, g711_ulaw, g711_alaw
- Voice types: alloy, echo, shimmer
Model Configuration
- Temperature
- Maximum output length
- System prompt
- Tool configuration
VAD Configuration
- Threshold
- Silence duration
- Prefix padding
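These options are applied by sending a `session.update` client event. The sketch below builds one in Python. It assumes the settings are nested under a `session` key, as in OpenAI's published event schema; this page lists the fields flat, so verify the nesting against your endpoint. The helper name is hypothetical.

```python
import json


def build_session_update(instructions: str, voice: str = "alloy") -> dict:
    """Build a session.update event with audio, model, and VAD settings."""
    return {
        "type": "session.update",
        # Assumption: settings live under "session" (not confirmed by this page).
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 500,
                "silence_duration_ms": 1000,
            },
            "temperature": 0.8,
        },
    }


payload = json.dumps(build_session_update("Answer briefly."))
```

The resulting `payload` string is what would be passed to `ws.send(...)` or `dc.send(...)`.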
💡 Request Examples
WebRTC Connection ❌
Client Implementation (Browser)
async function init() {
  // Get ephemeral key from server - see server code below
  const tokenResponse = await fetch("/session");
  const data = await tokenResponse.json();
  const EPHEMERAL_KEY = data.client_secret.value;

  // Create peer connection
  const pc = new RTCPeerConnection();

  // Set up remote audio from model playback
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = e => audioEl.srcObject = e.streams[0];

  // Add local audio track from browser microphone input
  const ms = await navigator.mediaDevices.getUserMedia({
    audio: true
  });
  pc.addTrack(ms.getTracks()[0]);

  // Set up data channel for sending and receiving events
  const dc = pc.createDataChannel("oai-events");
  dc.addEventListener("message", (e) => {
    // Receive real-time server events here!
    console.log(e);
  });

  // Start session using Session Description Protocol (SDP)
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const baseUrl = "https://www.uniaix.com/v1/realtime";
  const model = "gpt-4o-realtime-preview-2024-12-17";
  const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${EPHEMERAL_KEY}`,
      "Content-Type": "application/sdp"
    },
  });

  const answer = {
    type: "answer",
    sdp: await sdpResponse.text(),
  };
  await pc.setRemoteDescription(answer);
}

init();
Server Implementation (Node.js)
import express from "express";

const app = express();

// Create an endpoint for generating ephemeral tokens
// This endpoint works with the client code above
app.get("/session", async (req, res) => {
  const r = await fetch("https://www.uniaix.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.NEW_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2024-12-17",
      voice: "verse",
    }),
  });
  const data = await r.json();

  // Send the JSON received from OpenAI REST API back to client
  res.send(data);
});

app.listen(3000);
WebRTC Event Send/Receive Example
// Create data channel from peer connection
const dc = pc.createDataChannel("oai-events");

// Listen for server events on data channel
// Event data needs to be parsed from JSON string
dc.addEventListener("message", (e) => {
  const realtimeEvent = JSON.parse(e.data);
  console.log(realtimeEvent);
});

// Send client event: serialize valid client events to
// JSON and send via data channel
const responseCreate = {
  type: "response.create",
  response: {
    modalities: ["text"],
    instructions: "Write a haiku about code",
  },
};
dc.send(JSON.stringify(responseCreate));
WebSocket Connection ✅
Node.js (ws module)
import WebSocket from "ws";

const url = "wss://www.uniaix.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.NEW_API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(JSON.parse(message.toString()));
});
Python (websocket-client)
# Requires websocket-client library:
# pip install websocket-client
import os
import json
import websocket
NEW_API_KEY = os.environ.get("NEW_API_KEY")
url = "wss://www.uniaix.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
headers = [
    "Authorization: Bearer " + NEW_API_KEY,
    "OpenAI-Beta: realtime=v1",
]


def on_open(ws):
    print("Connected to server.")


def on_message(ws, message):
    # Each message is a JSON-encoded server event
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))


ws = websocket.WebSocketApp(
    url,
    header=headers,
    on_open=on_open,
    on_message=on_message,
)
ws.run_forever()
Browser (Standard WebSocket)
/*
Note: In browser and other client environments, we recommend using WebRTC.
But in Deno and Cloudflare Workers and other browser-like environments,
you can also use the standard WebSocket interface.
*/
const ws = new WebSocket(
  "wss://www.uniaix.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
  [
    "realtime",
    // Authentication
    "openai-insecure-api-key." + NEW_API_KEY,
    // Optional
    "openai-organization." + OPENAI_ORG_ID,
    "openai-project." + OPENAI_PROJECT_ID,
    // Beta protocol, required
    "openai-beta.realtime-v1"
  ]
);

// The standard WebSocket interface has no .on() method; use addEventListener
ws.addEventListener("open", () => {
  console.log("Connected to server.");
});

ws.addEventListener("message", (event) => {
  console.log(event.data);
});
Message Send/Receive Example
Node.js/Browser
// Receive server events
// addEventListener is available on both the ws module and the standard WebSocket
ws.addEventListener("message", (message) => {
  // Message data must be parsed from JSON
  const serverEvent = JSON.parse(message.data);
  console.log(serverEvent);
});

// Send events: build a JSON structure conforming to the client event format
const event = {
  type: "response.create",
  response: {
    modalities: ["audio", "text"],
    instructions: "Give me a haiku about code.",
  }
};
ws.send(JSON.stringify(event));
Python
# Send client events: serialize a dictionary to JSON
def on_open(ws):
    print("Connected to server.")
    event = {
        "type": "response.create",
        "response": {
            "modalities": ["text"],
            "instructions": "Please assist the user.",
        },
    }
    ws.send(json.dumps(event))


# Receive messages: the message payload must be parsed from JSON
def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))
⚠️ Error Handling
Common Errors
- Connection errors
- Network issues
- Authentication failures
- Configuration errors
- Audio errors
- Device permissions
- Unsupported formats
- Codec issues
- Session errors
- Token expiration
- Session timeout
- Concurrency limits
Error Recovery
- Automatic reconnection
- Session recovery
- Error retry
- Graceful degradation
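Automatic reconnection and error retry are commonly implemented with exponential backoff. The sketch below generates a delay schedule; the function name, the default values, and the jitter scheme are assumptions chosen for illustration, not values prescribed by the API.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one delay (in seconds) per reconnection attempt.

    The delay doubles on every attempt, is limited to `cap`, and is
    increased by up to `jitter` times its own value to spread out clients.
    """
    for attempt in range(max_attempts):
        delay = min(cap, base * 2 ** attempt)
        yield delay + delay * jitter * rng()


schedule = list(backoff_delays(max_attempts=4))
```

A reconnect loop would sleep for each yielded delay before retrying the connection and stop once the connection succeeds or the schedule is exhausted. For WebRTC clients, a fresh ephemeral token is needed for each new connection because tokens are single-use.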
📝 Event Reference
The connection request must include the following headers:

| Header | Type | Description | Example Value |
| --- | --- | --- | --- |
| Authorization | String | Authentication token | Bearer $NEW_API_KEY |
| OpenAI-Beta | String | API version | realtime=v1 |
Client Events
session.update
Update the default configuration for the session.
| Parameter | Type | Required | Description | Example Value/Optional Values |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_123 |
| type | String | No | Event type | session.update |
| modalities | String array | No | Modality types the model can respond with | ["text", "audio"] |
| instructions | String | No | System instructions prepended to model calls | "Your knowledge cutoff is 2023-10..." |
| voice | String | No | Voice type used by the model | alloy, echo, shimmer |
| input_audio_format | String | No | Input audio format | pcm16, g711_ulaw, g711_alaw |
| output_audio_format | String | No | Output audio format | pcm16, g711_ulaw, g711_alaw |
| input_audio_transcription.model | String | No | Model used for transcription | whisper-1 |
| turn_detection.type | String | No | Voice detection type | server_vad |
| turn_detection.threshold | Number | No | VAD activation threshold (0.0-1.0) | 0.8 |
| turn_detection.prefix_padding_ms | Integer | No | Audio duration included before speech starts | 500 |
| turn_detection.silence_duration_ms | Integer | No | Silence duration to detect speech stop | 1000 |
| tools | Array | No | List of tools available to the model | [] |
| tool_choice | String | No | How the model chooses tools | auto/none/required |
| temperature | Number | No | Model sampling temperature | 0.8 |
| max_output_tokens | String/Integer | No | Maximum tokens per response | "inf"/4096 |
input_audio_buffer.append
Append audio data to the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_456 |
| type | String | No | Event type | input_audio_buffer.append |
| audio | String | No | Base64-encoded audio data | Base64EncodedAudioData |
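The `audio` field carries raw audio bytes encoded as Base64. A minimal sketch of building the event from a PCM16 chunk follows; the helper name and the four sample values are illustrative assumptions.

```python
import base64
import json
import struct


def build_audio_append(pcm16_bytes: bytes) -> str:
    """Serialize one audio chunk as an input_audio_buffer.append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_bytes).decode("ascii"),
    })


# Four samples of 16-bit little-endian mono PCM, used here as dummy input.
samples = struct.pack("<4h", 0, 1000, -1000, 0)
message = build_audio_append(samples)
```

Each microphone chunk would be sent this way as it is captured, followed by `input_audio_buffer.commit` when server-side voice detection is not in use.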
input_audio_buffer.commit
Commit the audio data in the buffer as a user message.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_789 |
| type | String | No | Event type | input_audio_buffer.commit |
input_audio_buffer.clear
Clear all audio data from the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_012 |
| type | String | No | Event type | input_audio_buffer.clear |
conversation.item.create
Add a new conversation item to the conversation.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_345 |
| type | String | No | Event type | conversation.item.create |
| previous_item_id | String | No | New item will be inserted after this ID | null |
| item.id | String | No | Unique identifier for the conversation item | msg_001 |
| item.type | String | No | Type of conversation item | message/function_call/function_call_output |
| item.status | String | No | Status of conversation item | completed/in_progress/incomplete |
| item.role | String | No | Role of message sender | user/assistant/system |
| item.content | Array | No | Message content | [text/audio/transcript] |
| item.call_id | String | No | ID of function call | call_001 |
| item.name | String | No | Name of called function | function_name |
| item.arguments | String | No | Arguments for function call | {"param": "value"} |
| item.output | String | No | Output result of function call | {"result": "value"} |
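As a sketch, a user text message could be added to the conversation like this. The helper name is hypothetical, and the default content type `input_text` is an assumption taken from OpenAI's published schema for user messages; the table above lists only `text`, `audio`, and `transcript`, so the type is left configurable.

```python
import json


def build_user_message(text: str, content_type: str = "input_text") -> dict:
    """Build a conversation.item.create event holding one user text message."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            # Assumption: user text parts use "input_text"; pass "text" if
            # your endpoint expects the value shown in the table instead.
            "content": [{"type": content_type, "text": text}],
        },
    }


payload = json.dumps(build_user_message("Hello"))
```

Creating an item does not by itself produce a reply; a `response.create` event is sent afterwards to trigger generation.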
conversation.item.truncate
Truncate audio content in assistant messages.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_678 |
| type | String | No | Event type | conversation.item.truncate |
| item_id | String | No | ID of assistant message item to truncate | msg_002 |
| content_index | Integer | No | Index of content part to truncate | 0 |
| audio_end_ms | Integer | No | End time point for audio truncation | 1500 |
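When the user interrupts playback, `audio_end_ms` is typically derived from how much audio the client has actually played. The sketch below does that conversion; the helper name is hypothetical and the 24000 Hz default sample rate is an assumption that must match the audio format actually in use.

```python
def build_truncate(
    item_id: str,
    samples_played: int,
    sample_rate: int = 24000,  # assumption: set to your real output sample rate
    content_index: int = 0,
) -> dict:
    """Build a conversation.item.truncate event from the playback position."""
    audio_end_ms = samples_played * 1000 // sample_rate
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": audio_end_ms,
    }


event = build_truncate("msg_002", samples_played=36000)
```

With 36000 samples played at 24000 Hz, the event truncates the item at 1500 ms, matching the example value in the table.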
conversation.item.delete
Delete the specified conversation item from conversation history.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_901 |
| type | String | No | Event type | conversation.item.delete |
| item_id | String | No | ID of conversation item to delete | msg_003 |
response.create
Trigger response generation.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_234 |
| type | String | No | Event type | response.create |
| response.modalities | String array | No | Modality types for response | ["text", "audio"] |
| response.instructions | String | No | Instructions for the model | "Please assist the user." |
| response.voice | String | No | Voice type used by the model | alloy/echo/shimmer |
| response.output_audio_format | String | No | Output audio format | pcm16 |
| response.tools | Array | No | List of tools available to the model | ["type", "name", "description"] |
| response.tool_choice | String | No | How the model chooses tools | auto |
| response.temperature | Number | No | Sampling temperature | 0.7 |
| response.max_output_tokens | Integer/String | No | Maximum output tokens | 150/"inf" |
response.cancel
Cancel ongoing response generation.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Client-generated event identifier | event_567 |
| type | String | No | Event type | response.cancel |
Server Events
error
Event returned when an error occurs.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_890 |
| type | String | No | Event type | error |
| error.type | String | No | Error type | invalid_request_error/server_error |
| error.code | String | No | Error code | invalid_event |
| error.message | String | No | Human-readable error message | "The 'type' field is missing." |
| error.param | String | No | Parameter related to error | null |
| error.event_id | String | No | ID of related event | event_567 |
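A minimal sketch of turning an `error` event into a log line, using only the fields from the table above. The function name and the output format are illustrative choices.

```python
def describe_error(event: dict) -> str:
    """Format an error server event as a single human-readable line."""
    err = event.get("error", {})
    line = f"{err.get('type', 'unknown')}/{err.get('code', 'unknown')}: {err.get('message', '')}"
    # error.event_id points at the client event that caused the failure, if any.
    if err.get("event_id"):
        line += f" (caused by client event {err['event_id']})"
    return line


example = {
    "event_id": "event_890",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_event",
        "message": "The 'type' field is missing.",
        "param": None,
        "event_id": "event_567",
    },
}
summary = describe_error(example)
```

Setting `event_id` on outgoing client events makes the `error.event_id` reference useful for tracing which request failed.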
conversation.item.input_audio_transcription.completed
Returned when input audio transcription is enabled and transcription succeeds.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_2122 |
| type | String | No | Event type | conversation.item.input_audio_transcription.completed |
| item_id | String | No | ID of user message item | msg_003 |
| content_index | Integer | No | Index of content part containing audio | 0 |
| transcript | String | No | Transcribed text content | "Hello, how are you?" |
conversation.item.input_audio_transcription.failed
Returned when input audio transcription is configured but the transcription request for a user message fails.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_2324 |
| type | String | No | Event type | conversation.item.input_audio_transcription.failed |
| item_id | String | No | ID of user message item | msg_003 |
| content_index | Integer | No | Index of content part containing audio | 0 |
| error.type | String | No | Error type | transcription_error |
| error.code | String | No | Error code | audio_unintelligible |
| error.message | String | No | Human-readable error message | "The audio could not be transcribed." |
| error.param | String | No | Parameter related to error | null |
conversation.item.truncated
Returned when client truncates previous assistant audio message item.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_2526 |
| type | String | No | Event type | conversation.item.truncated |
| item_id | String | No | ID of truncated assistant message item | msg_004 |
| content_index | Integer | No | Index of truncated content part | 0 |
| audio_end_ms | Integer | No | Time point when audio was truncated (milliseconds) | 1500 |
conversation.item.deleted
Returned when an item in the conversation is deleted.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_2728 |
| type | String | No | Event type | conversation.item.deleted |
| item_id | String | No | ID of deleted conversation item | msg_005 |
input_audio_buffer.committed
Returned when audio buffer data is committed.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_1121 |
| type | String | No | Event type | input_audio_buffer.committed |
| previous_item_id | String | No | New conversation item will be inserted after this ID | msg_001 |
| item_id | String | No | ID of user message item to be created | msg_002 |
input_audio_buffer.cleared
Returned when the client clears the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_1314 |
| type | String | No | Event type | input_audio_buffer.cleared |
input_audio_buffer.speech_started
In server voice detection mode, returned when voice input is detected.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_1516 |
| type | String | No | Event type | input_audio_buffer.speech_started |
| audio_start_ms | Integer | No | Milliseconds from session start to voice detection | 1000 |
| item_id | String | No | ID of user message item to be created when voice stops | msg_003 |
input_audio_buffer.speech_stopped
In server voice detection mode, returned when voice input stops.

| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_1718 |
| type | String | No | Event type | input_audio_buffer.speech_stopped |
| audio_end_ms | Integer | No | Milliseconds from session start to voice stop detection | 2000 |
| item_id | String | No | ID of user message item to be created | msg_003 |
response.created
Returned when a new response is created.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_2930 |
| type | String | No | Event type | response.created |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Status of response | in_progress |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by response | [] |
| response.usage | Object | No | Usage statistics for response | null |
response.done
Returned when response streaming is complete.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_3132 |
| type | String | No | Event type | response.done |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Final status of response | completed/cancelled/failed/incomplete |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by response | [...] |
| response.usage.total_tokens | Integer | No | Total tokens | 50 |
| response.usage.input_tokens | Integer | No | Input tokens | 20 |
| response.usage.output_tokens | Integer | No | Output tokens | 30 |
response.output_item.added
Returned when a new output item is created during response generation.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_3334 |
| type | String | No | Event type | response.output_item.added |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message/function_call/function_call_output |
| item.status | String | No | Status of output item | in_progress/completed |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content of output item | ["type", "text", "audio", "transcript"] |
response.output_item.done
Returned when output item streaming is complete.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_3536 |
| type | String | No | Event type | response.output_item.done |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message/function_call/function_call_output |
| item.status | String | No | Final status of output item | completed/incomplete |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content of output item | ["type", "text", "audio", "transcript"] |
response.content_part.added
Returned when a new content part is added to assistant message item during response generation.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_3738 |
| type | String | No | Event type | response.content_part.added |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item to add content part to | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text/audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |
response.content_part.done
Returned when content part in assistant message item streaming is complete.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_3940 |
| type | String | No | Event type | response.content_part.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item containing the content part | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text/audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |
response.text.delta
Returned when text value of "text" type content part is updated.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_4142 |
| type | String | No | Event type | response.text.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Text delta update content | "Sure, I can h" |
response.text.done
Returned when "text" type content part text streaming is complete.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_4344 |
| type | String | No | Event type | response.text.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| text | String | No | Final complete text content | "Sure, I can help with that." |
response.audio_transcript.delta
Returned when transcription content of model-generated audio output is updated.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_4546 |
| type | String | No | Event type | response.audio_transcript.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Transcription text delta update content | "Hello, how can I a" |
response.audio_transcript.done
Returned when transcription of model-generated audio output streaming is complete.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_4748 |
| type | String | No | Event type | response.audio_transcript.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| transcript | String | No | Final complete transcribed text of audio | "Hello, how can I assist you today?" |
response.audio.delta
Returned when model-generated audio content is updated.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_4950 |
| type | String | No | Event type | response.audio.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Base64-encoded audio data delta | "Base64EncodedAudioDelta" |
response.audio.done
Returned when model-generated audio is complete.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_5152 |
| type | String | No | Event type | response.audio.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
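The text, transcript, and audio events above all stream their content as deltas that the client has to reassemble. The sketch below accumulates them per content part, keyed by `item_id` and `content_index`. The class name is hypothetical, and the decoded audio is kept as raw bytes in whatever output format the session uses.

```python
import base64
from collections import defaultdict


class StreamAccumulator:
    """Reassemble streamed text, transcript, and audio deltas."""

    TEXT_EVENTS = ("response.text.delta", "response.audio_transcript.delta")

    def __init__(self) -> None:
        self.text = defaultdict(str)     # (item_id, content_index) -> text so far
        self.audio = defaultdict(bytes)  # (item_id, content_index) -> audio bytes so far

    def handle(self, event: dict) -> None:
        """Consume one server event; events of other types are ignored."""
        kind = event.get("type")
        if kind in self.TEXT_EVENTS:
            key = (event["item_id"], event["content_index"])
            self.text[key] += event["delta"]
        elif kind == "response.audio.delta":
            key = (event["item_id"], event["content_index"])
            # Audio deltas arrive Base64-encoded and are decoded before joining.
            self.audio[key] += base64.b64decode(event["delta"])


acc = StreamAccumulator()
for piece in ("Sure, I can h", "elp with that."):
    acc.handle({"type": "response.text.delta", "item_id": "msg_007",
                "content_index": 0, "delta": piece})
```

After the loop, `acc.text[("msg_007", 0)]` holds the joined text. The corresponding `done` events can be used as the signal to flush or play what has been collected.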
Function Calling
response.function_call_arguments.delta
Returned when model-generated function call arguments are updated.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_5354 |
| type | String | No | Event type | response.function_call_arguments.delta |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of message item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| delta | String | No | JSON format function call arguments delta | "{\"location\": \"San\"" |
response.function_call_arguments.done
Returned when model-generated function call arguments streaming is complete.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_5556 |
| type | String | No | Event type | response.function_call_arguments.done |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of message item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| arguments | String | No | Final complete function call arguments (JSON format) | "{\"location\": \"San Francisco\"}" |
Other Status Updates
rate_limits.updated
Triggered after each "response.done" event to indicate updated rate limits.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_5758 |
| type | String | No | Event type | rate_limits.updated |
| rate_limits | Object array | No | List of rate limit information | [{"name": "requests_per_min", "limit": 60, "remaining": 45, "reset_seconds": 35}] |
conversation.created
Returned when conversation is created.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_9101 |
| type | String | No | Event type | conversation.created |
| conversation | Object | No | Conversation resource object | {"id": "conv_001", "object": "realtime.conversation"} |
conversation.item.created
Returned when conversation item is created.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_1920 |
| type | String | No | Event type | conversation.item.created |
| previous_item_id | String | No | ID of previous conversation item | msg_002 |
| item | Object | No | Conversation item object | {"id": "msg_003", "object": "realtime.item", "type": "message", "status": "completed", "role": "user", "content": [{"type": "text", "text": "Hello"}]} |
session.created
Returned when session is created.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_1234 |
| type | String | No | Event type | session.created |
| session | Object | No | Session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]} |
session.updated
Returned when session is updated.
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | No | Unique identifier for server event | event_5678 |
| type | String | No | Event type | session.updated |
| session | Object | No | Updated session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]} |
Rate Limit Event Parameter Table
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| name | String | Yes | Limit name | requests_per_min |
| limit | Integer | Yes | Limit value | 60 |
| remaining | Integer | Yes | Remaining available amount | 45 |
| reset_seconds | Integer | Yes | Reset time (seconds) | 35 |
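A client can use these fields to decide whether to pause before sending more requests. The sketch below is one possible policy, not a documented one: wait only when some limit has no remaining quota, and then wait for the longest reset time. The function name is hypothetical.

```python
def seconds_to_wait(rate_limits: list[dict]) -> int:
    """Return how long to pause, given the rate_limits array of an event."""
    exhausted = [
        limit["reset_seconds"]
        for limit in rate_limits
        if limit["remaining"] <= 0
    ]
    # No exhausted limit means there is no need to wait.
    return max(exhausted, default=0)


current = [{"name": "requests_per_min", "limit": 60, "remaining": 45, "reset_seconds": 35}]
pause = seconds_to_wait(current)
```

With the example values from the table, quota remains and `pause` is 0.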
Function Call Parameter Table
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| type | String | Yes | Function type | function |
| name | String | Yes | Function name | get_weather |
| description | String | No | Function description | Get the current weather |
| parameters | Object | Yes | Function parameter definition | {"type": "object", "properties": {...}} |
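To connect the tool definition with the function calling events, the sketch below declares a tool using the fields from the table and answers a completed call with a `function_call_output` item. The `location` parameter schema, the local `get_weather` implementation, and its canned result are hypothetical. The function name is passed in separately because `response.function_call_arguments.done`, as documented above, carries only `call_id` and `arguments`.

```python
import json

# Tool definition built from the table's fields; the parameter schema is an example.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def get_weather(location: str) -> dict:
    """Hypothetical local implementation that returns canned data."""
    return {"location": location, "forecast": "sunny"}


LOCAL_FUNCTIONS = {"get_weather": get_weather}


def build_function_output(name: str, call_id: str, arguments: str) -> dict:
    """Run the named local function and wrap its result for the model."""
    args = json.loads(arguments)  # arguments arrive as a JSON string
    result = LOCAL_FUNCTIONS[name](**args)
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),
        },
    }


reply = build_function_output("get_weather", "call_001", '{"location": "San Francisco"}')
```

`WEATHER_TOOL` would go into the `tools` array of the session or response configuration. After sending `reply`, a `response.create` event is usually needed so the model continues with the function result.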
Audio Format Parameter Table

| Parameter | Type | Description | Optional Values |
| --- | --- | --- | --- |
| sample_rate | Integer | Sample rate | 8000, 16000, 24000, 44100, 48000 |
| channels | Integer | Number of channels | 1 (mono), 2 (stereo) |
| bits_per_sample | Integer | Bits per sample | 16 (pcm16), 8 (g711) |
| encoding | String | Encoding method | pcm16, g711_ulaw, g711_alaw |
Voice Detection Parameter Table
| Parameter | Type | Description | Default Value | Range |
| --- | --- | --- | --- | --- |
| threshold | Float | VAD activation threshold | 0.5 | 0.0-1.0 |
| prefix_padding_ms | Integer | Voice prefix padding (milliseconds) | 500 | 0-5000 |
| silence_duration_ms | Integer | Silence detection duration (milliseconds) | 1000 | 100-10000 |
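The defaults and ranges above can be checked on the client before a configuration is sent. The sketch below is one way to do that; the helper and constant names are hypothetical, while the numeric values are copied from the table.

```python
VAD_DEFAULTS = {"threshold": 0.5, "prefix_padding_ms": 500, "silence_duration_ms": 1000}
VAD_RANGES = {
    "threshold": (0.0, 1.0),
    "prefix_padding_ms": (0, 5000),
    "silence_duration_ms": (100, 10000),
}


def make_turn_detection(**overrides) -> dict:
    """Return a server_vad config, rejecting values outside the listed ranges."""
    config = {"type": "server_vad", **VAD_DEFAULTS}
    for key, value in overrides.items():
        if key not in VAD_RANGES:
            raise KeyError(f"unknown VAD parameter: {key}")
        low, high = VAD_RANGES[key]
        if not low <= value <= high:
            raise ValueError(f"{key}={value} is outside {low}-{high}")
        config[key] = value
    return config


turn_detection = make_turn_detection(threshold=0.8, silence_duration_ms=700)
```

The returned dictionary can be used as the `turn_detection` value of a `session.update` event.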
Tool Configuration Parameter Table

| Parameter | Type | Description | Optional Values |
| --- | --- | --- | --- |
| tool_choice | String | Tool selection method | auto, none, required |
| tools | Array | Available tools list | [{type, name, description, parameters}] |
Model Configuration Parameter Table
| Parameter | Type | Description | Range/Optional Values | Default Value |
| --- | --- | --- | --- | --- |
| temperature | Float | Sampling temperature | 0.0-2.0 | 1.0 |
| max_output_tokens | Integer/String | Maximum output length | 1-4096/"inf" | "inf" |
| modalities | String array | Response modalities | ["text", "audio"] | ["text"] |
| voice | String | Voice type | alloy, echo, shimmer | alloy |
Event Common Parameter Table
| Parameter | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| event_id | String | Yes | Unique identifier for event | event_123 |
| type | String | Yes | Event type | session.update |
| timestamp | Integer | No | Event timestamp (milliseconds) | 1677649363000 |
Session Status Parameter Table
| Parameter | Type | Description | Optional Values |
| --- | --- | --- | --- |
| status | String | Session status | active, ended, error |
| error | Object | Error information | {"type": "error_type", "message": "error message"} |
| metadata | Object | Session metadata | {"client_id": "web", "session_type": "chat"} |
Conversation Item Status Parameter Table
| Parameter | Type | Description | Optional Values |
| --- | --- | --- | --- |
| status | String | Conversation item status | completed, in_progress, incomplete |
| role | String | Sender role | user, assistant, system |
| type | String | Conversation item type | message, function_call, function_call_output |
Content Type Parameter Table
| Parameter | Type | Description | Optional Values |
| --- | --- | --- | --- |
| type | String | Content type | text, audio, transcript |
| format | String | Content format | plain, markdown, html |
| encoding | String | Encoding method | utf-8, base64 |
Response Status Parameter Table
| Parameter | Type | Description | Optional Values |
| --- | --- | --- | --- |
| status | String | Response status | completed, cancelled, failed, incomplete |
| status_details | Object | Status details | {"reason": "user_cancelled"} |
| usage | Object | Usage statistics | {"total_tokens": 50, "input_tokens": 20, "output_tokens": 30} |
Audio Transcription Parameter Table
| Parameter | Type | Description | Example Value |
| --- | --- | --- | --- |
| enabled | Boolean | Whether transcription is enabled | true |
| model | String | Transcription model | whisper-1 |
| language | String | Transcription language | en, zh, auto |
| prompt | String | Transcription prompt | "Transcript of a conversation" |
Audio Stream Parameter Table
| Parameter | Type | Description | Optional Values |
| --- | --- | --- | --- |
| chunk_size | Integer | Audio chunk size (bytes) | 1024, 2048, 4096 |
| latency | String | Latency mode | low, balanced, high |
| compression | String | Compression method | none, opus, mp3 |
WebRTC Configuration Parameter Table
| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| ice_servers | Array | ICE server list | [{"urls": "stun:stun.l.google.com:19302"}] |
| audio_constraints | Object | Audio constraints | {"echoCancellation": true} |
| connection_timeout | Integer | Connection timeout (milliseconds) | 30000 |