Streaming
Stream responses token-by-token using Server-Sent Events (SSE). Both the OpenAI and Anthropic API formats support streaming.
OpenAI SSE Format
Set "stream": true in the request body to receive a stream of Server-Sent Events. Each event is a line prefixed with data: followed by a JSON object. The stream terminates with data: [DONE].
Set "stream_options": {"include_usage": true} to receive token usage statistics in the final chunk.
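For example, a request body enabling both streaming and usage reporting might look like this (a sketch; model name and message follow the examples below):

```python
import json

# Request body with streaming and per-request usage reporting enabled.
payload = {
    "model": "anthropic/claude-sonnet-4",
    "stream": True,
    # Ask for a final chunk carrying token usage statistics.
    "stream_options": {"include_usage": True},
    "messages": [
        {"role": "user", "content": "Write a haiku about coding"}
    ],
}

print(json.dumps(payload, indent=2))
```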
Chunk Schema
| Field | Type | Description |
|---|---|---|
| id | string | Response ID (same across all chunks) |
| object | string | Always "chat.completion.chunk" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Array with one choice object |
| usage | object | Token usage (only in final chunk when include_usage is true) |
Delta Object
| Field | Type | Description |
|---|---|---|
| role | string | Set in first chunk (usually "assistant") |
| content | string | Text token fragment |
| tool_calls | array | Tool call deltas (partial function name/arguments) |
| reasoning_details | array | Reasoning detail deltas |
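Because content and tool-call arguments arrive as fragments, a client typically concatenates deltas as they stream in. A minimal sketch of that accumulation (the chunk dicts here are hand-written stand-ins for parsed SSE data, not live output):

```python
import json

# Hand-written stand-ins for parsed delta objects; a real stream
# would produce these from the SSE "data:" lines.
deltas = [
    {"role": "assistant", "content": ""},
    {"content": "Hel"},
    {"content": "lo"},
    {"content": "!"},
]

# Concatenate text fragments into the full assistant message.
text = "".join(d.get("content", "") for d in deltas)
print(text)  # Hello!

# Tool-call arguments also stream as string fragments, keyed by index.
tool_deltas = [
    {"tool_calls": [{"index": 0, "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "{\"city\": "}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"Paris\"}"}}]},
]

calls = {}
for d in tool_deltas:
    for tc in d.get("tool_calls", []):
        entry = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
        fn = tc.get("function", {})
        entry["name"] = fn.get("name") or entry["name"]
        entry["arguments"] += fn.get("arguments", "")

# Arguments are only valid JSON once every fragment has arrived.
args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args)  # get_weather {'city': 'Paris'}
```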
Example SSE Stream
```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}

data: [DONE]
```

Examples
```shell
curl https://api.routerhub.ai/v1/chat/completions \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ]
  }'
```

```python
import requests
import json

response = requests.post(
    "https://api.routerhub.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-sonnet-4",
        "stream": True,
        "messages": [
            {"role": "user", "content": "Write a haiku about coding"}
        ],
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: ") and line != "data: [DONE]":
            chunk = json.loads(line[6:])
            # Guard against chunks with an empty choices list
            # (e.g. the usage chunk when include_usage is set)
            # and against a null content field.
            if chunk["choices"]:
                content = chunk["choices"][0]["delta"].get("content") or ""
                print(content, end="", flush=True)
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about coding"}
    ],
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
    model="anthropic/claude-sonnet-4",
    streaming=True,
)

for chunk in llm.stream("Write a haiku about coding"):
    print(chunk.content, end="", flush=True)
```

Anthropic SSE Format
Set "stream": true in the request body. The Anthropic format uses named event types with event: and data: lines.
Event Types
| Event | Description |
|---|---|
| message_start | First event, contains full message object with empty content |
| content_block_start | Start of a content block (text, tool_use, thinking) |
| content_block_delta | Incremental content (text_delta, input_json_delta, thinking_delta) |
| content_block_stop | End of a content block |
| message_delta | Final event with stop_reason and output usage |
| message_stop | Stream termination |
| ping | Keep-alive ping |
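A client reading this stream pairs each event: line with the data: line that follows it, then dispatches on the event name. A minimal parser sketch over raw SSE text (the sample lines below are abbreviated stand-ins for a real stream):

```python
import json

# Abbreviated stand-in for a raw Anthropic-format SSE stream.
raw = """event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hi"}}
event: message_stop
data: {"type":"message_stop"}
"""

text_parts = []
event = None
for line in raw.splitlines():
    if line.startswith("event: "):
        # Remember the event name for the data line that follows.
        event = line[len("event: "):]
    elif line.startswith("data: "):
        data = json.loads(line[len("data: "):])
        # Only text_delta payloads carry displayable text.
        if event == "content_block_delta" and data["delta"]["type"] == "text_delta":
            text_parts.append(data["delta"]["text"])

print("".join(text_parts))  # Hi
```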
Example SSE Stream
```
event: message_start
data: {"type":"message_start","message":{"id":"msg_abc","type":"message","role":"assistant","content":[],"model":"anthropic/claude-sonnet-4","stop_reason":null,"usage":{"input_tokens":25,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}
```

Examples
```shell
curl https://api.routerhub.ai/v1/messages \
  -H "x-api-key: $ROUTERHUB_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ]
  }'
```

```python
import requests
import json

response = requests.post(
    "https://api.routerhub.ai/v1/messages",
    headers={
        "x-api-key": "YOUR_API_KEY",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-sonnet-4",
        "max_tokens": 1024,
        "stream": True,
        "messages": [
            {"role": "user", "content": "Write a haiku about coding"}
        ],
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = json.loads(line[6:])
            # Only text deltas carry a "text" field; skip
            # input_json_delta and thinking_delta fragments.
            if (data["type"] == "content_block_delta"
                    and data["delta"]["type"] == "text_delta"):
                print(data["delta"]["text"], end="", flush=True)
```

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.routerhub.ai",
    api_key="YOUR_API_KEY",
)

with client.messages.stream(
    model="anthropic/claude-sonnet-4",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a haiku about coding"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

Usage in Streaming
Token usage information is available in streaming responses for both API formats:
OpenAI Format
Set "stream_options": {"include_usage": true} in the request. Token usage will appear in the usage field of the final chunk (the last chunk before data: [DONE]).
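Note that the usage-bearing final chunk has an empty choices array, so a client should guard before indexing into it. A sketch of extracting usage from that chunk (the token counts here are illustrative, not real output):

```python
import json

# A final usage-bearing chunk with illustrative values;
# note that its choices list is empty.
final_chunk = json.loads(
    '{"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"created":1700000000,"model":"anthropic/claude-sonnet-4",'
    '"choices":[],'
    '"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}'
)

# Content chunks have one choice; the usage chunk has none.
usage = None
if not final_chunk["choices"] and "usage" in final_chunk:
    usage = final_chunk["usage"]
    print(usage["prompt_tokens"], usage["completion_tokens"])
```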
Anthropic Format
Usage is provided automatically in two events:
- Input usage is included in the message_start event, inside the message.usage object.
- Output usage is included in the message_delta event, inside the usage object.
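Combining the two events into a final usage total can be sketched as follows (the event payloads are trimmed-down stand-ins built from the sample stream above):

```python
# Trimmed-down stand-ins for the two usage-bearing events,
# using the token counts from the sample stream above.
events = [
    ("message_start", {"type": "message_start",
                       "message": {"usage": {"input_tokens": 25, "output_tokens": 0}}}),
    ("message_delta", {"type": "message_delta",
                       "delta": {"stop_reason": "end_turn"},
                       "usage": {"output_tokens": 12}}),
]

usage = {"input_tokens": 0, "output_tokens": 0}
for name, data in events:
    if name == "message_start":
        # Input tokens are known up front.
        usage["input_tokens"] = data["message"]["usage"]["input_tokens"]
    elif name == "message_delta":
        # Output tokens arrive at the end of the stream.
        usage["output_tokens"] = data["usage"]["output_tokens"]

print(usage)  # {'input_tokens': 25, 'output_tokens': 12}
```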