Streaming

Stream responses token-by-token using Server-Sent Events (SSE). Both the OpenAI and Anthropic API formats support streaming.


OpenAI SSE Format

POST /v1/chat/completions

Set "stream": true in the request body to receive a stream of Server-Sent Events. Each event is a line prefixed with data: followed by a JSON object. The stream terminates with data: [DONE].

Set "stream_options": {"include_usage": true} to receive token usage statistics in the final chunk.

Chunk Schema

| Field | Type | Description |
| --- | --- | --- |
| id | string | Response ID (same across all chunks) |
| object | string | Always "chat.completion.chunk" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Array containing a single choice object |
| usage | object | Token usage (present only in the final chunk when include_usage is true) |

Delta Object

| Field | Type | Description |
| --- | --- | --- |
| role | string | Set in the first chunk (usually "assistant") |
| content | string | Text token fragment |
| tool_calls | array | Tool call deltas (partial function name/arguments) |
| reasoning_details | array | Reasoning detail deltas |
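
Because tool_calls deltas arrive as fragments (the function name and arguments are split across chunks, keyed by index), clients must concatenate them before parsing. A minimal sketch, assuming the delta shape shown above; accumulate_tool_calls and the sample deltas are illustrative, not part of the API:

```python
import json

def accumulate_tool_calls(deltas):
    """Merge per-chunk tool_call deltas into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            slot = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function", {})
            if fn.get("name"):
                slot["name"] += fn["name"]
            if fn.get("arguments"):
                slot["arguments"] += fn["arguments"]
    return calls

# Simulated deltas as they might arrive across several chunks:
deltas = [
    {"tool_calls": [{"index": 0, "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "{\"city\": "}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"Paris\"}"}}]},
]
calls = accumulate_tool_calls(deltas)
print(json.loads(calls[0]["arguments"]))  # {'city': 'Paris'}
```

Only parse the accumulated arguments string as JSON once the stream finishes; mid-stream it is usually an incomplete fragment.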

Example SSE Stream

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}

data: [DONE]

Examples

cURL

curl https://api.routerhub.ai/v1/chat/completions \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ]
  }'

Python (requests)

import requests
import json

response = requests.post(
    "https://api.routerhub.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-sonnet-4",
        "stream": True,
        "messages": [
            {"role": "user", "content": "Write a haiku about coding"}
        ],
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: ") and line != "data: [DONE]":
            chunk = json.loads(line[6:])
            content = chunk["choices"][0]["delta"].get("content", "")
            print(content, end="", flush=True)

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about coding"}
    ],
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Python (LangChain)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
    model="anthropic/claude-sonnet-4",
    streaming=True,
)

for chunk in llm.stream("Write a haiku about coding"):
    print(chunk.content, end="", flush=True)

Anthropic SSE Format

POST /v1/messages

Set "stream": true in the request body. The Anthropic format uses named event types with event: and data: lines.
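
Since each Anthropic event is framed as an event: line followed by a data: line and terminated by a blank line, a client needs to pair them up before dispatching. A sketch of that framing step, assuming raw decoded lines as input; iter_sse_events is a hypothetical helper:

```python
def iter_sse_events(lines):
    """Pair each `event:` line with its following `data:` payload."""
    event, data = None, []
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            data.append(line[len("data: "):])
        elif line == "":  # a blank line terminates one event
            if event is not None or data:
                yield event, "\n".join(data)
            event, data = None, []

lines = [
    "event: ping", "data: {\"type\":\"ping\"}", "",
    "event: message_stop", "data: {\"type\":\"message_stop\"}", "",
]
events = list(iter_sse_events(lines))
print(events[1])  # ('message_stop', '{"type":"message_stop"}')
```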

Event Types

| Event | Description |
| --- | --- |
| message_start | First event; contains the full message object with empty content |
| content_block_start | Start of a content block (text, tool_use, thinking) |
| content_block_delta | Incremental content (text_delta, input_json_delta, thinking_delta) |
| content_block_stop | End of a content block |
| message_delta | Final delta, carrying stop_reason and output usage |
| message_stop | Stream termination |
| ping | Keep-alive ping |
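
A minimal dispatch sketch over already-parsed events, mirroring the example stream shown below; handle_event and the state dict are illustrative, and the event payloads are sample data shaped like the table above:

```python
def handle_event(event_type, data, state):
    """Route one parsed SSE event by type and update accumulated state."""
    if event_type == "message_start":
        state["input_tokens"] = data["message"]["usage"]["input_tokens"]
    elif event_type == "content_block_delta":
        delta = data["delta"]
        if delta["type"] == "text_delta":
            state["text"] += delta["text"]
    elif event_type == "message_delta":
        state["stop_reason"] = data["delta"]["stop_reason"]
        state["output_tokens"] = data["usage"]["output_tokens"]
    return state

events = [
    ("message_start", {"message": {"usage": {"input_tokens": 25}}}),
    ("content_block_delta", {"delta": {"type": "text_delta", "text": "Hello"}}),
    ("content_block_delta", {"delta": {"type": "text_delta", "text": "!"}}),
    ("message_delta", {"delta": {"stop_reason": "end_turn"},
                       "usage": {"output_tokens": 12}}),
]
state = {"text": ""}
for etype, data in events:
    handle_event(etype, data, state)
print(state["text"], state["stop_reason"])  # Hello! end_turn
```

Unknown event types (such as ping) simply fall through, which keeps the client forward-compatible with new event types.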

Example SSE Stream

event: message_start
data: {"type":"message_start","message":{"id":"msg_abc","type":"message","role":"assistant","content":[],"model":"anthropic/claude-sonnet-4","stop_reason":null,"usage":{"input_tokens":25,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

Examples

cURL

curl https://api.routerhub.ai/v1/messages \
  -H "x-api-key: $ROUTERHUB_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ]
  }'

Python (requests)

import requests
import json

response = requests.post(
    "https://api.routerhub.ai/v1/messages",
    headers={
        "x-api-key": "YOUR_API_KEY",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-sonnet-4",
        "max_tokens": 1024,
        "stream": True,
        "messages": [
            {"role": "user", "content": "Write a haiku about coding"}
        ],
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = json.loads(line[6:])
            if data["type"] == "content_block_delta":
                print(data["delta"]["text"], end="", flush=True)

Python (Anthropic SDK)

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.routerhub.ai",
    api_key="YOUR_API_KEY",
)

with client.messages.stream(
    model="anthropic/claude-sonnet-4",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a haiku about coding"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Usage in Streaming

Token usage information is available in streaming responses for both API formats:

OpenAI Format

Set "stream_options": {"include_usage": true} in the request. Token usage will appear in the usage field of the final chunk (the last chunk before data: [DONE]).
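
A sketch of collecting both the text and the usage object from raw SSE lines, assuming usage appears only in the final chunk as described above; split_stream and the sample lines are illustrative:

```python
import json

def split_stream(sse_lines):
    """Accumulate text deltas and capture the final usage object."""
    text, usage = "", None
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        if chunk.get("usage"):  # present only in the final chunk
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            text += choice["delta"].get("content") or ""
    return text, usage

lines = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":25,"completion_tokens":12,"total_tokens":37}}',
    "data: [DONE]",
]
text, usage = split_stream(lines)
print(text, usage["total_tokens"])  # Hi 37
```

Iterating over choices (rather than indexing choices[0]) also tolerates a usage-only chunk whose choices array is empty.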

Anthropic Format

Usage is provided automatically in two events: