Reasoning
Enable extended thinking for complex tasks. The model can reason through problems step-by-step before generating a final response.
OpenAI Format
Use the /v1/chat/completions endpoint with reasoning parameters to enable extended thinking.
Using reasoning object
Pass a reasoning object in the request body to control the model's thinking behavior:
- `reasoning.effort`: `"none"`, `"minimal"`, `"low"`, `"medium"`, `"high"`, or `"xhigh"` — controls how much thinking the model does
- `reasoning.max_tokens`: integer — set a specific token budget for reasoning (mutually exclusive with `effort`)
- `reasoning.exclude`: boolean — if `true`, reasoning tokens are not included in the response
reasoning.effort and reasoning.max_tokens are mutually exclusive. Do not set both.
reasoning fields
| Field | Type | Description |
|---|---|---|
| effort | string | "none", "minimal", "low", "medium", "high", or "xhigh". Controls reasoning depth |
| max_tokens | integer | Maximum tokens for reasoning. Mutually exclusive with effort |
| exclude | boolean | Exclude reasoning from response (default: false) |
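The mutual-exclusivity rule above can be enforced client-side before a request is sent. The helper below is a hypothetical sketch (`build_reasoning` is not part of any RouterHub SDK); it just assembles a `reasoning` object per the table and rejects invalid combinations:

```python
# Hypothetical helper (not a RouterHub SDK function): builds a `reasoning`
# object and enforces the effort/max_tokens mutual-exclusivity rule.
VALID_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh"}

def build_reasoning(effort=None, max_tokens=None, exclude=False):
    if effort is not None and max_tokens is not None:
        raise ValueError("reasoning.effort and reasoning.max_tokens are mutually exclusive")
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {effort!r}")
    reasoning = {}
    if effort is not None:
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    if exclude:
        reasoning["exclude"] = True
    return reasoning
```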
Using reasoning_effort shorthand
Set reasoning_effort at the top level of the request body: "none", "minimal", "low", "medium", "high", or "xhigh". This is equivalent to reasoning.effort.
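A minimal request-body sketch using the shorthand (the model name and message content are placeholders):

```json
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [{"role": "user", "content": "What is 127 * 389?"}],
  "reasoning_effort": "medium"
}
```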
Response
Reasoning appears in a reasoning_details array on the assistant message:
```json
{
  "role": "assistant",
  "content": "The answer is 42.",
  "reasoning_details": [
    {
      "type": "thinking",
      "text": "Let me work through this step by step..."
    }
  ]
}
```

Usage
Reasoning tokens are tracked in completion_tokens_details.reasoning_tokens.
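A quick sketch of reading that counter from a parsed response. The `response` dict below is a hand-written sample shaped like the usage payload described above (not captured API output), and it assumes reasoning tokens are counted inside `completion_tokens`, as in the OpenAI usage format:

```python
# Sample parsed response (hand-written, not real API output).
response = {
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 78,
        "completion_tokens_details": {"reasoning_tokens": 64},
    }
}

# Reasoning tokens live under completion_tokens_details.reasoning_tokens.
details = response["usage"].get("completion_tokens_details", {})
reasoning_tokens = details.get("reasoning_tokens", 0)

# Visible output tokens, assuming completion_tokens includes reasoning tokens.
visible_tokens = response["usage"]["completion_tokens"] - reasoning_tokens
```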
Anthropic Format
Use the /v1/messages endpoint with the thinking parameter to enable extended thinking.
Using thinking object
Pass a thinking object in the request body:
- `thinking.type`: `"enabled"` or `"disabled"`
- `thinking.budget_tokens`: integer — required when type is `"enabled"`, sets the thinking token budget
thinking fields
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Required | "enabled" or "disabled" |
| budget_tokens | integer | Required when enabled | Token budget for thinking |
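The "required when enabled" constraint can be checked before sending. A hypothetical helper (`build_thinking` is illustrative, not an SDK function) that assembles a `thinking` object per the table:

```python
# Hypothetical helper: builds a `thinking` object and enforces that
# budget_tokens is present whenever thinking is enabled.
def build_thinking(enabled, budget_tokens=None):
    if not enabled:
        return {"type": "disabled"}
    if budget_tokens is None:
        raise ValueError("budget_tokens is required when thinking is enabled")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```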
Response
Thinking appears as content blocks in the response:
```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me work through this step by step..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}
```

Gemini Format
Use the /v1beta/models/{model}:generateContent endpoint with generationConfig.thinkingConfig to enable extended thinking.
Using thinkingConfig object
| Field | Type | Description |
|---|---|---|
| includeThoughts | boolean | When true, the response includes thought parts with thought: true and a thoughtSignature. |
| thinkingBudget | integer | Maximum tokens the model may spend on thinking. |
| thinkingLevel | string | Thinking depth: "LOW", "MEDIUM", or "HIGH". Native Gemini only — dropped on the cross-provider path. |
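A minimal request-body sketch combining these fields (the prompt text is a placeholder):

```json
{
  "contents": [{"role": "user", "parts": [{"text": "What is 127 * 389?"}]}],
  "generationConfig": {
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 1024
    }
  }
}
```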
Response
When includeThoughts is enabled, each thought appears as a part with thought: true, text, and an opaque thoughtSignature:
```json
{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [
        {
          "thought": true,
          "text": "Let me work through this step by step...",
          "thoughtSignature": "Aab..."
        },
        {"text": "The answer is 49,403."}
      ]
    },
    "finishReason": "STOP"
  }],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 14,
    "thoughtsTokenCount": 64,
    "totalTokenCount": 88
  }
}
```

Thought round-trip contract: if you include a thought part on a follow-up turn, echo the `thoughtSignature` back exactly as received. Unsigned replayed function calls and thought parts are dropped silently on Gemini thinking models.
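One way to honor the round-trip contract when building a follow-up turn is to copy parts verbatim and skip any thought part that lacks a signature (since the server would drop it silently anyway). A sketch, with `replayable_parts` as a hypothetical helper name:

```python
# Sketch: prepare parts for replay on a follow-up turn, echoing each
# thoughtSignature exactly as received and skipping unsigned thought parts.
def replayable_parts(parts):
    kept = []
    for part in parts:
        if part.get("thought") and "thoughtSignature" not in part:
            continue  # unsigned thought parts are dropped server-side anyway
        kept.append(dict(part))  # verbatim copy, signature included
    return kept
```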
Cross-Provider Mapping
When the resolved model is Claude or GPT, RouterHub maps thinkingConfig into the internal reasoning config:
- `includeThoughts: true` → reasoning enabled
- `thinkingBudget: N` → reasoning token budget
- `thinkingLevel` → dropped (the internal config only carries enabled + budget)
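The mapping above can be sketched as a pure function. The output shape here (a dict with `enabled` and `budget` keys) is an assumption for illustration; the real internal reasoning config is not documented on this page:

```python
# Sketch of the cross-provider mapping: thinkingConfig -> internal reasoning
# config. The {"enabled", "budget"} output shape is assumed for illustration.
def map_thinking_config(thinking_config):
    return {
        "enabled": bool(thinking_config.get("includeThoughts")),
        "budget": thinking_config.get("thinkingBudget"),
        # thinkingLevel is intentionally dropped on the cross-provider path
    }
```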
Model Support
Reasoning is supported on most models. See the Models page for a full compatibility matrix.
| Family | Models with Reasoning |
|---|---|
| Claude | All Claude 4.x models |
| Gemini | All Gemini 2.5+ and 3.x models |
| GPT | gpt-5, gpt-5.x series |
Examples
```shell
curl https://api.routerhub.ai/v1/chat/completions \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What is 127 * 389? Think step by step."}
    ],
    "reasoning": {
      "effort": "high"
    }
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[
        {"role": "user", "content": "What is 127 * 389? Think step by step."}
    ],
    extra_body={
        "reasoning": {"effort": "high"}
    },
)

# Access reasoning
for detail in response.choices[0].message.reasoning_details or []:
    if detail.type == "thinking":
        print("Thinking:", detail.text)
print("Answer:", response.choices[0].message.content)
```

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.routerhub.ai",
    api_key="YOUR_API_KEY",
)

response = client.messages.create(
    model="anthropic/claude-sonnet-4",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,
    },
    messages=[
        {"role": "user", "content": "What is 127 * 389? Think step by step."}
    ],
)

for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
```

Streaming with Reasoning
Reasoning is fully supported in streaming mode for both the OpenAI and Anthropic formats:

- OpenAI format: reasoning details appear as `reasoning_details` deltas in streamed chunks.
- Anthropic format: thinking appears as `thinking_delta` events in the SSE stream.
See the Streaming guide for full details on consuming streaming responses.
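As an illustration of the OpenAI-format delta shape, the sketch below accumulates reasoning and answer text from streamed chunks. The chunk dicts are hand-written samples shaped like the deltas described above, not captured API output, and `collect_reasoning` is a hypothetical helper:

```python
# Sketch: accumulate thinking text and answer text from OpenAI-format
# streamed chunks, where reasoning arrives as reasoning_details deltas.
def collect_reasoning(chunks):
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for detail in delta.get("reasoning_details", []):
            if detail.get("type") == "thinking":
                thinking.append(detail.get("text", ""))
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)
```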