Chat Completions
POST
/v1/chat/completions
Create a chat completion. Compatible with the OpenAI Chat Completions API format.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID (e.g., anthropic/claude-sonnet-4, google/gemini-2.5-pro, openai/gpt-4.1) |
| messages | array | Required | Array of message objects. See Message Object below. |
| max_tokens | integer | Optional | Maximum number of tokens to generate. |
| max_completion_tokens | integer | Optional | Alternative to max_tokens. Used as fallback when max_tokens is not set. |
| temperature | number | Optional | Sampling temperature (0–2). Omit to use the model default. |
| top_p | number | Optional | Nucleus sampling probability mass. Omit to use the model default. |
| stream | boolean | Optional | Stream response via SSE. See the Streaming guide. |
| stream_options | object | Optional | {"include_usage": true} to receive token usage in the final streamed chunk. |
| stop | string or array | Optional | Up to 4 stop sequences. |
| tools | array | Optional | List of tool definitions. See the Tool Calling guide. |
| tool_choice | string or object | Optional | "auto", "none", "required", or {"type":"function","function":{"name":"..."}} |
| parallel_tool_calls | boolean | Optional | Allow the model to make multiple tool calls in parallel. |
| response_format | object | Optional | {"type":"json_object"} or {"type":"json_schema","json_schema":{...}}. See the Structured Output guide. |
| reasoning | object | Optional | {"effort":"low"\|"medium"\|"high", "max_tokens": N, "exclude": bool}. See the Reasoning and Prompt Caching guides. |
| reasoning_effort | string | Optional | Shorthand: "low", "medium", or "high". |
| n | integer | Optional | Number of completions to generate. |
| seed | integer | Optional | Deterministic sampling seed. |
| user | string | Optional | End-user identifier for abuse monitoring. |
| presence_penalty | number | Optional | Presence penalty (−2 to 2). |
| frequency_penalty | number | Optional | Frequency penalty (−2 to 2). |
| logit_bias | object | Optional | Map of token IDs to bias values (−100 to 100). |
| logprobs | boolean | Optional | Return log probabilities of output tokens. |
| top_logprobs | integer | Optional | Number of most likely tokens to return at each position (0–20). Requires logprobs: true. |
| service_tier | string | Optional | Service tier preference. |
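The optional fields above compose freely. As a sketch (the model ID is one of the examples from the table, the prompt is invented, and the actual HTTP call is left commented out), a streamed request that also asks for token usage in the final chunk could be assembled like this:

```python
import json

# Sketch of a request body combining several optional fields from the table
# above. "stream_options" only has an effect when "stream" is true.
payload = {
    "model": "anthropic/claude-sonnet-4",       # example model ID
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
    "temperature": 0.7,
    "stream": True,
    "stream_options": {"include_usage": True},  # usage arrives in the final SSE chunk
}

body = json.dumps(payload)
# requests.post("https://api.routerhub.ai/v1/chat/completions",
#               headers={"Authorization": "Bearer YOUR_API_KEY",
#                        "Content-Type": "application/json"},
#               data=body, stream=True)
```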
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Required | "system", "user", "assistant", "tool", or "developer" |
| content | string or array | Required | Text content or array of content parts (for multimodal inputs such as images). See Vision below. |
| name | string | Optional | An optional name for the participant. |
| tool_calls | array | Optional | Tool calls generated by the model (present in assistant messages). |
| tool_call_id | string | Conditional | Required for role="tool" messages. The ID of the tool call this message is responding to. |
| reasoning_details | array | Optional | Reasoning details from the model's extended thinking process. |
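To show how tool_calls and tool_call_id pair up across messages, here is a sketched message sequence for one tool round trip (the get_weather function and the call ID are made up for illustration):

```python
# Sketch: the assistant called a hypothetical get_weather tool, and the
# follow-up "tool" message answers that specific call by echoing its ID.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",                     # made-up call ID
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": "{\"city\": \"Paris\"}"},
        }],
    },
    # Required for role="tool": tool_call_id names the call being answered.
    {"role": "tool", "tool_call_id": "call_abc123", "content": "18°C, clear"},
]

# Sanity check: the tool message references an ID the assistant emitted.
call_ids = {c["id"]
            for m in messages if m["role"] == "assistant"
            for c in m.get("tool_calls", [])}
assert messages[-1]["tool_call_id"] in call_ids
```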
Response Body
| Field | Type | Description |
|---|---|---|
| id | string | Unique response identifier (e.g., chatcmpl-abc123). |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp of when the response was created. |
| model | string | The model used for the completion. |
| choices | array | Array of choice objects. See Choice Object below. |
| usage | object | Token usage statistics. See Usage Object below. |
| system_fingerprint | string | System fingerprint (if available). |
| service_tier | string | The service tier used for the request. |
Choice Object
| Field | Type | Description |
|---|---|---|
| index | integer | The index of this choice in the array. |
| message | object | The assistant's message, containing role and content fields. |
| finish_reason | string | "stop", "length", "tool_calls", or "content_filter". |
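Callers typically branch on finish_reason. A minimal sketch (the dispositions returned here are placeholders, not prescribed behavior):

```python
def handle_choice(choice: dict) -> str:
    """Map a choice's finish_reason to a short disposition (sketch only)."""
    reason = choice["finish_reason"]
    if reason == "stop":
        # Natural end of generation: the content is complete.
        return choice["message"]["content"]
    if reason == "length":
        # Hit max_tokens: the content is cut off mid-thought.
        return choice["message"]["content"] + " [truncated: raise max_tokens]"
    if reason == "tool_calls":
        # Execute the requested tools, then send a follow-up request.
        return "pending tool execution"
    # "content_filter": output was withheld.
    return "output withheld by the content filter"

print(handle_choice({"finish_reason": "stop",
                     "message": {"role": "assistant", "content": "Hi"}}))
```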
Usage Object
| Field | Type | Description |
|---|---|---|
| prompt_tokens | integer | Number of tokens in the prompt. |
| completion_tokens | integer | Number of tokens in the generated completion. |
| total_tokens | integer | Total tokens used (prompt_tokens + completion_tokens). |
| prompt_tokens_details | object | Breakdown of prompt tokens. Contains cached_tokens (number of tokens served from cache). |
| completion_tokens_details | object | Breakdown of completion tokens. Contains reasoning_tokens (tokens used for internal reasoning). |
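The nested detail objects make simple metrics easy to derive. A sketch (the usage numbers are invented):

```python
# Invented usage object with the shape described in the table above.
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "total_tokens": 1500,
    "prompt_tokens_details": {"cached_tokens": 800},
    "completion_tokens_details": {"reasoning_tokens": 100},
}

# Fraction of the prompt that was served from cache.
cached = usage["prompt_tokens_details"]["cached_tokens"]
cache_hit_rate = cached / usage["prompt_tokens"]

# Completion tokens actually visible in the reply, excluding internal reasoning.
visible_tokens = (usage["completion_tokens"]
                  - usage["completion_tokens_details"]["reasoning_tokens"])

print(f"cache hit rate: {cache_hit_rate:.0%}, visible output tokens: {visible_tokens}")
```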
Examples
Basic Request
cURL

curl https://api.routerhub.ai/v1/chat/completions \
-H "Authorization: Bearer $ROUTERHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"max_tokens": 256
}'

Python (requests)

import requests
response = requests.post(
"https://api.routerhub.ai/v1/chat/completions",
headers={
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "anthropic/claude-sonnet-4",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"max_tokens": 256,
},
)
data = response.json()
print(data["choices"][0]["message"]["content"])

Python (OpenAI SDK)

from openai import OpenAI
client = OpenAI(
base_url="https://api.routerhub.ai/v1",
api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
model="anthropic/claude-sonnet-4",
messages=[
{"role": "user", "content": "What is the capital of France?"}
],
max_tokens=256,
)
print(response.choices[0].message.content)

Python (LangChain)

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="https://api.routerhub.ai/v1",
api_key="YOUR_API_KEY",
model="anthropic/claude-sonnet-4",
max_tokens=256,
)
response = llm.invoke("What is the capital of France?")
print(response.content)

Multi-turn Conversation

cURL

curl https://api.routerhub.ai/v1/chat/completions \
-H "Authorization: Bearer $ROUTERHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-pro",
"messages": [
{"role": "user", "content": "What is photosynthesis?"},
{"role": "assistant", "content": "Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into glucose and oxygen."},
{"role": "user", "content": "What is the chemical equation for it?"}
],
"max_tokens": 512
}'

Python (requests)

import requests
response = requests.post(
"https://api.routerhub.ai/v1/chat/completions",
headers={
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "google/gemini-2.5-pro",
"messages": [
{"role": "user", "content": "What is photosynthesis?"},
{"role": "assistant", "content": "Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into glucose and oxygen."},
{"role": "user", "content": "What is the chemical equation for it?"},
],
"max_tokens": 512,
},
)
print(response.json()["choices"][0]["message"]["content"])

Python (OpenAI SDK)

from openai import OpenAI
client = OpenAI(
base_url="https://api.routerhub.ai/v1",
api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
model="google/gemini-2.5-pro",
messages=[
{"role": "user", "content": "What is photosynthesis?"},
{"role": "assistant", "content": "Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into glucose and oxygen."},
{"role": "user", "content": "What is the chemical equation for it?"},
],
max_tokens=512,
)
print(response.choices[0].message.content)

Python (LangChain)

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
llm = ChatOpenAI(
base_url="https://api.routerhub.ai/v1",
api_key="YOUR_API_KEY",
model="google/gemini-2.5-pro",
max_tokens=512,
)
response = llm.invoke([
HumanMessage(content="What is photosynthesis?"),
AIMessage(content="Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into glucose and oxygen."),
HumanMessage(content="What is the chemical equation for it?"),
])
print(response.content)

System Message

cURL

curl https://api.routerhub.ai/v1/chat/completions \
-H "Authorization: Bearer $ROUTERHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4.1",
"messages": [
{"role": "system", "content": "You are a helpful math tutor. Explain concepts step by step."},
{"role": "user", "content": "Solve: 2x + 5 = 13"}
],
"temperature": 0.2,
"max_tokens": 1024
}'

Python (requests)

import requests
response = requests.post(
"https://api.routerhub.ai/v1/chat/completions",
headers={
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "openai/gpt-4.1",
"messages": [
{"role": "system", "content": "You are a helpful math tutor. Explain concepts step by step."},
{"role": "user", "content": "Solve: 2x + 5 = 13"},
],
"temperature": 0.2,
"max_tokens": 1024,
},
)
print(response.json()["choices"][0]["message"]["content"])

Python (OpenAI SDK)

from openai import OpenAI
client = OpenAI(
base_url="https://api.routerhub.ai/v1",
api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
model="openai/gpt-4.1",
messages=[
{"role": "system", "content": "You are a helpful math tutor. Explain concepts step by step."},
{"role": "user", "content": "Solve: 2x + 5 = 13"},
],
temperature=0.2,
max_tokens=1024,
)
print(response.choices[0].message.content)

Python (LangChain)

from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
llm = ChatOpenAI(
base_url="https://api.routerhub.ai/v1",
api_key="YOUR_API_KEY",
model="openai/gpt-4.1",
temperature=0.2,
max_tokens=1024,
)
response = llm.invoke([
SystemMessage(content="You are a helpful math tutor. Explain concepts step by step."),
HumanMessage(content="Solve: 2x + 5 = 13"),
])
print(response.content)

Vision (Image Input)
Send images to vision-capable models using the image_url content part type. Both HTTP/HTTPS URLs and base64 data URIs are supported.
cURL

curl https://api.routerhub.ai/v1/chat/completions \
-H "Authorization: Bearer $ROUTERHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/photo.jpg"
}
}
]
}
],
"max_tokens": 512
}'

Python (requests)

import requests
response = requests.post(
"https://api.routerhub.ai/v1/chat/completions",
headers={
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "anthropic/claude-sonnet-4",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/photo.jpg"
},
},
],
}
],
"max_tokens": 512,
},
)
print(response.json()["choices"][0]["message"]["content"])

Python (OpenAI SDK)

from openai import OpenAI
client = OpenAI(
base_url="https://api.routerhub.ai/v1",
api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
model="anthropic/claude-sonnet-4",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/photo.jpg"
},
},
],
}
],
max_tokens=512,
)
print(response.choices[0].message.content)

You can also pass images as base64 data URIs:
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,iVBORw0KGgo..."
}
}

Limits: Individual images are limited to 5 MB, with a total of 12 MB of images per request.
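To build such a data URI from a local file, base64-encode the bytes and prefix the MIME type. A sketch (the file path is a placeholder; keep the encoded image under the 5 MB per-image limit):

```python
import base64

def image_to_data_uri(path: str, mime: str = "image/png") -> str:
    """Read an image file and return it as a base64 data URI."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage as an image_url content part (path is hypothetical):
# part = {"type": "image_url",
#         "image_url": {"url": image_to_data_uri("photo.png")}}
```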
Sample Response
{
"id": "chatcmpl-abc123def456",
"object": "chat.completion",
"created": 1709251200,
"model": "anthropic/claude-sonnet-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris. It is the largest city in France and serves as the country's political, economic, and cultural center."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 32,
"total_tokens": 46,
"prompt_tokens_details": {
"cached_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0
}
},
"system_fingerprint": null,
"service_tier": "default"
}