Reasoning

Enable extended thinking for complex tasks. The model can reason through problems step by step before generating a final response.


OpenAI Format

Use the /v1/chat/completions endpoint with reasoning parameters to enable extended thinking.

Using reasoning object

Pass a reasoning object in the request body to control the model's thinking behavior:
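For example, using the model from the examples later on this page:

```json
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    {"role": "user", "content": "What is 127 * 389? Think step by step."}
  ],
  "reasoning": {
    "effort": "high",
    "exclude": false
  }
}
```

To cap reasoning by token count instead, use max_tokens in place of effort.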

reasoning.effort and reasoning.max_tokens are mutually exclusive. Do not set both.

reasoning fields

| Field | Type | Description |
|---|---|---|
| `effort` | string | `"none"`, `"minimal"`, `"low"`, `"medium"`, `"high"`, or `"xhigh"`. Controls reasoning depth. |
| `max_tokens` | integer | Maximum tokens for reasoning. Mutually exclusive with `effort`. |
| `exclude` | boolean | Exclude reasoning from the response (default: `false`). |

Using reasoning_effort shorthand

Set reasoning_effort at the top level of the request body: "none", "minimal", "low", "medium", "high", or "xhigh". This is equivalent to reasoning.effort.
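For example:

```json
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    {"role": "user", "content": "What is 127 * 389? Think step by step."}
  ],
  "reasoning_effort": "high"
}
```

This behaves identically to passing `"reasoning": {"effort": "high"}`.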

Response

Reasoning appears in a reasoning_details array on the assistant message:

```json
{
  "role": "assistant",
  "content": "The answer is 42.",
  "reasoning_details": [
    {
      "type": "thinking",
      "text": "Let me work through this step by step..."
    }
  ]
}
```

Usage

Reasoning tokens are tracked in completion_tokens_details.reasoning_tokens.
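An illustrative usage object (token counts here are made up, not from a real response):

```json
{
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 215,
    "total_tokens": 238,
    "completion_tokens_details": {
      "reasoning_tokens": 192
    }
  }
}
```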


Anthropic Format

Use the /v1/messages endpoint with the thinking parameter to enable extended thinking.

Using thinking object

Pass a thinking object in the request body:
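For example, mirroring the Anthropic SDK example later on this page:

```json
{
  "model": "anthropic/claude-sonnet-4",
  "max_tokens": 8192,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4096
  },
  "messages": [
    {"role": "user", "content": "What is 127 * 389? Think step by step."}
  ]
}
```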

thinking fields

| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Required | `"enabled"` or `"disabled"` |
| `budget_tokens` | integer | Required when `type` is `"enabled"` | Token budget for thinking |

Response

Thinking appears as content blocks in the response:

```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me work through this step by step..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}
```

Gemini Format

Use the /v1beta/models/{model}:generateContent endpoint with generationConfig.thinkingConfig to enable extended thinking.

Using thinkingConfig object

| Field | Type | Description |
|---|---|---|
| `includeThoughts` | boolean | When `true`, the response includes thought parts with `thought: true` and a `thoughtSignature`. |
| `thinkingBudget` | integer | Maximum tokens the model may spend on thinking. |
| `thinkingLevel` | string | Thinking depth: `"LOW"`, `"MEDIUM"`, or `"HIGH"`. Native Gemini only; dropped on the cross-provider path. |
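For example, to surface thoughts with a token budget (the budget value here is illustrative):

```json
{
  "contents": [
    {"role": "user", "parts": [{"text": "What is 127 * 389? Think step by step."}]}
  ],
  "generationConfig": {
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 1024
    }
  }
}
```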

Response

When includeThoughts is enabled, each thought appears as a part with thought: true, text, and an opaque thoughtSignature:

```json
{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [
        {
          "thought": true,
          "text": "Let me work through this step by step...",
          "thoughtSignature": "Aab..."
        },
        {"text": "The answer is 49,403."}
      ]
    },
    "finishReason": "STOP"
  }],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 14,
    "thoughtsTokenCount": 64,
    "totalTokenCount": 88
  }
}
```

Thought round-trip contract. If you include a thought part on a follow-up turn, echo the thoughtSignature back exactly as received. Unsigned replayed function calls and thought parts are dropped silently on Gemini thinking models.
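For example, replaying a model turn like the response above in a follow-up request keeps the signature verbatim (the follow-up user message is hypothetical, and the signature is truncated here as in the response example):

```json
{
  "contents": [
    {"role": "user", "parts": [{"text": "What is 127 * 389? Think step by step."}]},
    {
      "role": "model",
      "parts": [
        {
          "thought": true,
          "text": "Let me work through this step by step...",
          "thoughtSignature": "Aab..."
        },
        {"text": "The answer is 49,403."}
      ]
    },
    {"role": "user", "parts": [{"text": "Now divide that by 7."}]}
  ]
}
```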

Cross-Provider Mapping

When the resolved model is Claude or GPT, RouterHub maps thinkingConfig into the internal reasoning config. includeThoughts and thinkingBudget carry over; thinkingLevel does not (it is native Gemini only, as noted above).
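A plausible sketch of the mapping, inferred from the field descriptions above rather than from a published spec: a Gemini-style request fragment such as

```json
{
  "generationConfig": {
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 2048,
      "thinkingLevel": "HIGH"
    }
  }
}
```

would translate to roughly

```json
{
  "reasoning": {
    "max_tokens": 2048,
    "exclude": false
  }
}
```

with thinkingLevel dropped on the cross-provider path.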


Model Support

Reasoning is supported on most models. See the Models page for a full compatibility matrix.

| Family | Models with Reasoning |
|---|---|
| Claude | All Claude 4.x models |
| Gemini | All Gemini 2.5+ and 3.x models |
| GPT | gpt-5, gpt-5.x series |

Examples

```shell
curl https://api.routerhub.ai/v1/chat/completions \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What is 127 * 389? Think step by step."}
    ],
    "reasoning": {
      "effort": "high"
    }
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routerhub.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[
        {"role": "user", "content": "What is 127 * 389? Think step by step."}
    ],
    extra_body={
        "reasoning": {"effort": "high"}
    },
)

# reasoning_details is an extension field, so the SDK surfaces it as plain dicts
for detail in getattr(response.choices[0].message, "reasoning_details", None) or []:
    if detail.get("type") == "thinking":
        print("Thinking:", detail.get("text"))

print("Answer:", response.choices[0].message.content)
```

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.routerhub.ai",
    api_key="YOUR_API_KEY",
)

response = client.messages.create(
    model="anthropic/claude-sonnet-4",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,
    },
    messages=[
        {"role": "user", "content": "What is 127 * 389? Think step by step."}
    ],
)

for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
```

Streaming with Reasoning

Reasoning is fully supported in streaming mode for all of the API formats above.

See the Streaming guide for full details on consuming streaming responses.