Streaming

Stream responses token-by-token using Server-Sent Events (SSE). Both the OpenAI and Anthropic API formats support streaming.


OpenAI SSE Format

POST /v1/chat/completions

Set "stream": true in the request body to receive a stream of Server-Sent Events. Each event is a line prefixed with data: followed by a JSON object. The stream terminates with data: [DONE].

Set "stream_options": {"include_usage": true} to receive token usage statistics in the final chunk.

Chunk Schema

| Field | Type | Description |
| --- | --- | --- |
| id | string | Response ID (same across all chunks) |
| object | string | Always "chat.completion.chunk" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Array containing a single choice object |
| usage | object | Token usage (present only in the final chunk when include_usage is true) |

Delta Object

| Field | Type | Description |
| --- | --- | --- |
| role | string | Set in the first chunk (usually "assistant") |
| content | string | Text token fragment |
| tool_calls | array | Tool call deltas (partial function name/arguments) |
| reasoning_details | array | Reasoning detail deltas |
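
Because tool_calls deltas arrive as fragments (the function name and arguments are split across chunks, keyed by index), clients must concatenate them before parsing. A minimal sketch, assuming the delta shape shown above; accumulate_tool_calls and the sample deltas are illustrative, not part of the API:

```python
import json

def accumulate_tool_calls(deltas):
    """Merge per-chunk tool_call deltas into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            slot = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function", {})
            if fn.get("name"):
                slot["name"] += fn["name"]
            if fn.get("arguments"):
                slot["arguments"] += fn["arguments"]
    return calls

# Simulated deltas as they might arrive across several chunks:
deltas = [
    {"tool_calls": [{"index": 0, "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "{\"city\": "}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"Paris\"}"}}]},
]
calls = accumulate_tool_calls(deltas)
print(json.loads(calls[0]["arguments"]))  # {'city': 'Paris'}
```

Only parse the accumulated arguments string as JSON once the stream finishes; mid-stream it is usually an incomplete fragment.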

Example SSE Stream

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1700000000,"model":"anthropic/claude-sonnet-4","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}

data: [DONE]

Examples

cURL

curl https://api.routerhub.ai/v1/chat/completions \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ]
  }'

Python (requests)

import requests
import json

response = requests.post(
    "https://api.routerhub.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-sonnet-4",
        "stream": True,
        "messages": [
            {"role": "user", "content": "Write a haiku about coding"}
        ],
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: ") and line != "data: [DONE]":
            chunk = json.loads(line[6:])
            content = chunk["choices"][0]["delta"].get("content", "")
            print(content, end="", flush=True)

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about coding"}
    ],
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Python (LangChain)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
    model="anthropic/claude-sonnet-4",
    streaming=True,
)

for chunk in llm.stream("Write a haiku about coding"):
    print(chunk.content, end="", flush=True)

Anthropic SSE Format

POST /v1/messages

Set "stream": true in the request body. The Anthropic format uses named event types with event: and data: lines.
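
Since each Anthropic event is framed as an event: line followed by a data: line and terminated by a blank line, a client needs to pair them up before dispatching. A sketch of that framing step, assuming raw decoded lines as input; iter_sse_events is a hypothetical helper:

```python
def iter_sse_events(lines):
    """Pair each `event:` line with its following `data:` payload."""
    event, data = None, []
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            data.append(line[len("data: "):])
        elif line == "":  # a blank line terminates one event
            if event is not None or data:
                yield event, "\n".join(data)
            event, data = None, []

lines = [
    "event: ping", "data: {\"type\":\"ping\"}", "",
    "event: message_stop", "data: {\"type\":\"message_stop\"}", "",
]
events = list(iter_sse_events(lines))
print(events[1])  # ('message_stop', '{"type":"message_stop"}')
```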

Event Types

| Event | Description |
| --- | --- |
| message_start | First event; contains the full message object with empty content |
| content_block_start | Start of a content block (text, tool_use, thinking) |
| content_block_delta | Incremental content (text_delta, input_json_delta, thinking_delta) |
| content_block_stop | End of a content block |
| message_delta | Final delta, carrying stop_reason and output usage |
| message_stop | Stream termination |
| ping | Keep-alive ping |
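
A minimal dispatch sketch over already-parsed events, mirroring the example stream shown below; handle_event and the state dict are illustrative, and the event payloads are sample data shaped like the table above:

```python
def handle_event(event_type, data, state):
    """Route one parsed SSE event by type and update accumulated state."""
    if event_type == "message_start":
        state["input_tokens"] = data["message"]["usage"]["input_tokens"]
    elif event_type == "content_block_delta":
        delta = data["delta"]
        if delta["type"] == "text_delta":
            state["text"] += delta["text"]
    elif event_type == "message_delta":
        state["stop_reason"] = data["delta"]["stop_reason"]
        state["output_tokens"] = data["usage"]["output_tokens"]
    return state

events = [
    ("message_start", {"message": {"usage": {"input_tokens": 25}}}),
    ("content_block_delta", {"delta": {"type": "text_delta", "text": "Hello"}}),
    ("content_block_delta", {"delta": {"type": "text_delta", "text": "!"}}),
    ("message_delta", {"delta": {"stop_reason": "end_turn"},
                       "usage": {"output_tokens": 12}}),
]
state = {"text": ""}
for etype, data in events:
    handle_event(etype, data, state)
print(state["text"], state["stop_reason"])  # Hello! end_turn
```

Unknown event types (such as ping) simply fall through, which keeps the client forward-compatible with new event types.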

Example SSE Stream

event: message_start
data: {"type":"message_start","message":{"id":"msg_abc","type":"message","role":"assistant","content":[],"model":"anthropic/claude-sonnet-4","stop_reason":null,"usage":{"input_tokens":25,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

Examples

cURL

curl https://api.routerhub.ai/v1/messages \
  -H "x-api-key: $ROUTERHUB_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ]
  }'

Python (requests)

import requests
import json

response = requests.post(
    "https://api.routerhub.ai/v1/messages",
    headers={
        "x-api-key": "YOUR_API_KEY",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-sonnet-4",
        "max_tokens": 1024,
        "stream": True,
        "messages": [
            {"role": "user", "content": "Write a haiku about coding"}
        ],
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = json.loads(line[6:])
            if data["type"] == "content_block_delta":
                print(data["delta"]["text"], end="", flush=True)

Python (Anthropic SDK)

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.routerhub.ai",
    api_key="YOUR_API_KEY",
)

with client.messages.stream(
    model="anthropic/claude-sonnet-4",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a haiku about coding"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Usage in Streaming

Token usage information is available in streaming responses for both API formats:

OpenAI Format

Set "stream_options": {"include_usage": true} in the request. Token usage will appear in the usage field of the final chunk (the last chunk before data: [DONE]).
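
A sketch of collecting both the text and the usage object from raw SSE lines, assuming usage appears only in the final chunk as described above; split_stream and the sample lines are illustrative:

```python
import json

def split_stream(sse_lines):
    """Accumulate text deltas and capture the final usage object."""
    text, usage = "", None
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        if chunk.get("usage"):  # present only in the final chunk
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            text += choice["delta"].get("content") or ""
    return text, usage

lines = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":25,"completion_tokens":12,"total_tokens":37}}',
    "data: [DONE]",
]
text, usage = split_stream(lines)
print(text, usage["total_tokens"])  # Hi 37
```

Iterating over choices (rather than indexing choices[0]) also tolerates a usage-only chunk whose choices array is empty.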

Anthropic Format

Usage is provided automatically in two events: