Generate Content

POST /v1beta/models/{model}:generateContent
POST /v1beta/models/{model}:streamGenerateContent

Native Gemini generateContent API for text generation. Point the Google GenAI SDK at RouterHub and your request is dispatched to any registered text model — Gemini, Claude, or GPT — from a single endpoint shape.

Dispatch model. When the resolved model is backed by Vertex Gemini, your request is passed through to the SDK unchanged, preserving every top-level field (tools, toolConfig, safetySettings, systemInstruction, cachedContent, labels, serviceTier, and thinkingConfig.thinkingLevel). When the resolved model is Claude, GPT, or a generic OpenAI-compatible provider, the request is converted to the internal OpenAI shape before dispatch. Some Gemini-only fields are dropped on that cross-provider path.


Available Models

All registered text models are reachable through this endpoint, regardless of the underlying provider. Use the same model IDs documented in Models.

The google/ prefix is optional when calling Gemini models. Both /v1beta/models/gemini-2.5-pro:generateContent and /v1beta/models/google/gemini-2.5-pro:generateContent resolve the same way. For non-Gemini models, use the full model ID (e.g. anthropic/claude-sonnet-4.5, openai/gpt-5).
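A sketch of that resolution rule, for intuition only — the registry contents and the `resolve_model` helper below are illustrative assumptions, not RouterHub internals:

```python
# Illustrative sketch of the optional google/ prefix rule.
# REGISTERED and resolve_model are assumptions, not RouterHub internals.
REGISTERED = {"google/gemini-2.5-pro", "anthropic/claude-sonnet-4.5", "openai/gpt-5"}

def resolve_model(model_id: str) -> str:
    """Return the registered ID, accepting a bare Gemini name without the prefix."""
    if model_id in REGISTERED:
        return model_id
    prefixed = f"google/{model_id}"
    if prefixed in REGISTERED:
        return prefixed
    raise KeyError(f"unknown model: {model_id}")
```

Both spellings of a Gemini ID resolve to the same registration, while non-Gemini IDs must be fully qualified.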


Authentication

This endpoint supports two authentication methods:

| Method | Header | Example |
| --- | --- | --- |
| Bearer token | Authorization | Authorization: Bearer rh_your_api_key |
| Google-style API key | x-goog-api-key | x-goog-api-key: rh_your_api_key |
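For raw HTTP clients, the two methods differ only in which header carries the key. A minimal sketch (the key value is a placeholder):

```python
# Both methods send the same RouterHub key; pick one per request.
API_KEY = "rh_your_api_key"  # placeholder

bearer_headers = {"Authorization": f"Bearer {API_KEY}"}
google_headers = {"x-goog-api-key": API_KEY}
```

The Google GenAI SDK sends x-goog-api-key on its own when constructed with api_key=..., so no extra header configuration is needed there.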

Request Body

| Field | Type | Description |
| --- | --- | --- |
| contents | array | Required. Array of Content objects describing the conversation history. |
| systemInstruction | object | Optional. System-level instruction as a Content object. Role is ignored — only parts is read. |
| tools | array | Optional. Array of Tool definitions the model may call. |
| toolConfig | object | Optional. ToolConfig controlling how the model uses the declared tools. |
| safetySettings | array | Optional. Gemini safety filter thresholds. Native Gemini only — dropped on the cross-provider path. |
| generationConfig | object | Optional. GenerationConfig: temperature, token limits, response format, thinking configuration, and more. |
| cachedContent | string | Optional. Resource name of a Gemini cached content entry (e.g. projects/my-project/cachedContents/abc123). Native Gemini only. |
| labels | object | Optional. Key–value string metadata for billing breakdown. Vertex Gemini only. |
| serviceTier | string | Optional. Gemini service tier (standard, flex, priority). Native Gemini only. |

Content

| Field | Type | Description |
| --- | --- | --- |
| role | string | Optional. "user" or "model". RouterHub also accepts "function" for legacy tool-response turns. Omit for the first user message. |
| parts | array | Required. Array of Part objects. A part carries exactly one of text, inlineData, fileData, functionCall, or functionResponse. |

Part

| Field | Type | Description |
| --- | --- | --- |
| text | string | Plain text content. |
| inlineData | object | Inline media: {"mimeType": "image/png", "data": "<base64>"}. |
| fileData | object | URI-based media: {"mimeType": "image/png", "fileUri": "gs://..."}. |
| functionCall | object | Model-emitted tool invocation: {"name": "...", "args": {...}}. Appears on role: "model" turns. |
| functionResponse | object | Tool execution result: {"name": "...", "response": {...}}. Appears inside a role: "user" turn per Gemini SDK convention. |
| thought | boolean | true when the part is an extended-thinking block. Paired with text. |
| thoughtSignature | string | Opaque base64 signature required for thought round-trips. Echo it back on follow-up turns exactly as received. See Reasoning. |
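Echoing thought parts back on a follow-up turn can be sketched as plain request dicts. The field names follow the Part table above; the conversation content and signature value are illustrative:

```python
# A model turn that contained a thought part plus its signature.
# The text and signature values are illustrative.
model_turn = {
    "role": "model",
    "parts": [
        {"thought": True, "text": "Let me multiply step by step...",
         "thoughtSignature": "b64signature=="},  # opaque; echo back verbatim
        {"text": "The answer is 49,403."},
    ],
}

# On the follow-up request, replay the model turn unchanged, including
# thoughtSignature, before appending the new user message.
followup_contents = [
    {"role": "user", "parts": [{"text": "What is 127 * 389?"}]},
    model_turn,
    {"role": "user", "parts": [{"text": "Now divide that by 7."}]},
]
```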

Tool

| Field | Type | Description |
| --- | --- | --- |
| functionDeclarations | array | Array of FunctionDeclaration: {name, description, parameters} where parameters is a Gemini-flavoured JSON Schema object. |

ToolConfig

| Field | Type | Description |
| --- | --- | --- |
| functionCallingConfig.mode | string | "AUTO" (default), "ANY" (must call a function), "NONE" (never call), or "VALIDATED" (call-or-text with schema validation). |
| functionCallingConfig.allowedFunctionNames | array | Restrict the model to this subset of declared function names. Required to have a non-empty intersection with tools; otherwise RouterHub returns INVALID_ARGUMENT. |

GenerationConfig

| Field | Type | Description |
| --- | --- | --- |
| temperature | number | Sampling temperature (0.0 – 2.0). |
| topP | number | Nucleus sampling threshold. |
| topK | number | Top-K sampling. |
| maxOutputTokens | integer | Maximum tokens to generate per candidate. |
| stopSequences | array | Strings that stop generation when encountered. |
| candidateCount | integer | Number of response variants to return. Cross-provider path only accepts 1 — higher values are rejected with INVALID_ARGUMENT. Native Gemini accepts the provider's allowed range. |
| responseMimeType | string | Set to "application/json" together with responseSchema for structured output. See Structured Output. |
| responseSchema | object | Gemini Schema describing the expected JSON shape. Translated to OpenAI json_schema for cross-provider dispatch. |
| responseModalities | array | Must not contain "IMAGE" on this route. For image output use Image Generation. |
| thinkingConfig | object | Extended-thinking knobs: includeThoughts, thinkingBudget, thinkingLevel. thinkingLevel is preserved on the native path and dropped cross-provider (where only includeThoughts + thinkingBudget map to our internal reasoning config). See Reasoning. |
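As an example, a structured-output configuration combines responseMimeType with responseSchema. The schema below is illustrative; on the wire it is plain JSON:

```python
import json

# Illustrative structured-output config: JSON mode plus a Gemini-style schema.
generation_config = {
    "temperature": 0.0,
    "maxOutputTokens": 256,
    "responseMimeType": "application/json",
    "responseSchema": {
        "type": "OBJECT",
        "properties": {
            "city": {"type": "STRING"},
            "population": {"type": "INTEGER"},
        },
        "required": ["city", "population"],
    },
}

# The config serializes to plain JSON for the request body.
wire = json.dumps(generation_config)
```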

Response Body

| Field | Type | Description |
| --- | --- | --- |
| candidates | array | One candidate per candidateCount. Each has content (with parts), finishReason (STOP, MAX_TOKENS, SAFETY, OTHER), and optional safetyRatings. |
| modelVersion | string | Model identifier used to serve the request. |
| responseId | string | Server-issued request identifier. |
| usageMetadata | object | Token usage: promptTokenCount, candidatesTokenCount, thoughtsTokenCount, cachedContentTokenCount, totalTokenCount. |
| promptFeedback | object | Present if the prompt was blocked. Contains blockReason and safetyRatings. |

Streaming

POST /v1beta/models/{model}:streamGenerateContent

Call the :streamGenerateContent action to receive the response as Server-Sent Events. Each event is a line of the form data: <partial GenerateContentResponse> followed by a blank line.

Unlike the OpenAI SSE format, the Gemini stream has no [DONE] terminator. The stream simply ends when the connection closes. Detect completion via the presence of finishReason on the final chunk, or by the closed connection itself.

Example SSE stream:

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"Hel"}]}}],"modelVersion":"gemini-2.5-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"lo"}]}}],"modelVersion":"gemini-2.5-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"!"}]},"finishReason":"STOP"}],"modelVersion":"gemini-2.5-pro","usageMetadata":{"promptTokenCount":5,"candidatesTokenCount":3,"totalTokenCount":8}}
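A minimal client-side parser for this stream shape, using only the standard library. The transcript here is canned; a real client would iterate over the HTTP response lines:

```python
import json

# Minimal parser for the Gemini-style SSE stream shown above.
# The transcript is canned; a real client reads lines off the HTTP response.
raw_stream = """\
data: {"candidates":[{"content":{"role":"model","parts":[{"text":"Hel"}]}}]}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"lo"}]},"finishReason":"STOP"}]}
"""

text, finished = [], False
for line in raw_stream.splitlines():
    if not line.startswith("data: "):
        continue  # blank separator lines carry no payload
    chunk = json.loads(line[len("data: "):])
    cand = chunk["candidates"][0]
    for part in cand.get("content", {}).get("parts", []):
        if "text" in part:
            text.append(part["text"])
    # No [DONE] sentinel: completion is signalled by finishReason.
    if cand.get("finishReason"):
        finished = True
```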

Cross-Provider Routing

When the resolved model is not Gemini-backed, RouterHub converts your request into the internal OpenAI shape before dispatching. This lets you use the Google GenAI SDK against Claude, GPT, and generic backends — at the cost of a few Gemini-only fields:

| Field | Behavior |
| --- | --- |
| safetySettings | Dropped silently (no equivalent on Claude / GPT). |
| cachedContent | Dropped silently. See Prompt Caching for per-provider caching. |
| labels | Dropped silently. |
| serviceTier | Dropped silently. |
| generationConfig.thinkingConfig.thinkingLevel | Dropped. includeThoughts and thinkingBudget are mapped to the internal reasoning config. |
| generationConfig.candidateCount > 1 | Rejected with INVALID_ARGUMENT. |
| generationConfig.responseModalities containing "IMAGE" | Rejected with INVALID_ARGUMENT (use the image endpoint). |
| tools / toolConfig | Translated into OpenAI tools and tool_choice. allowedFunctionNames filters the tool list; see Tool Calling for the full mode mapping. |
| functionResponse parts | Accepted inside role: "user" or role: "function" turns. RouterHub mints stable synthetic tool_call_ids internally so function responses bind to their originating call even when the underlying provider requires an explicit ID. |
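The drop/reject behaviour in the table can be approximated client-side to predict what survives conversion. This is a sketch for intuition only, not RouterHub's actual converter:

```python
# Approximation of the cross-provider field handling described above.
# For intuition only; not RouterHub's actual converter.
DROPPED_SILENTLY = {"safetySettings", "cachedContent", "labels", "serviceTier"}

def preview_cross_provider(body: dict) -> dict:
    out = {k: v for k, v in body.items() if k not in DROPPED_SILENTLY}
    gen = dict(out.get("generationConfig", {}))
    if gen.get("candidateCount", 1) > 1:
        raise ValueError("INVALID_ARGUMENT: candidateCount > 1")
    if "IMAGE" in gen.get("responseModalities", []):
        raise ValueError("INVALID_ARGUMENT: IMAGE modality not supported here")
    if "thinkingConfig" in gen:
        # thinkingLevel is dropped; includeThoughts/thinkingBudget survive.
        thinking = {k: v for k, v in gen["thinkingConfig"].items()
                    if k != "thinkingLevel"}
        if thinking:
            gen["thinkingConfig"] = thinking
        else:
            gen.pop("thinkingConfig")
    if gen:
        out["generationConfig"] = gen
    return out
```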

Examples

Non-Streaming — Gemini (native pass-through)

curl https://api.routerhub.ai/v1beta/models/gemini-2.5-pro:generateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is 127 * 389?"}]}
    ],
    "generationConfig": {
      "temperature": 0.2,
      "thinkingConfig": {
        "includeThoughts": true,
        "thinkingBudget": 1024
      }
    }
  }'

Python (google-genai SDK):
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="What is 127 * 389?",
    config=types.GenerateContentConfig(
        temperature=0.2,
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=1024,
        ),
    ),
)

for part in response.candidates[0].content.parts:
    if part.thought:
        print("Thinking:", part.text)
    elif part.text:
        print("Answer:", part.text)

Non-Streaming — Claude / GPT (cross-provider)

Same request shape, different model ID — RouterHub converts to the internal OpenAI shape and dispatches to the resolved backend.

curl https://api.routerhub.ai/v1beta/models/anthropic/claude-sonnet-4.5:generateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Write a haiku about coding"}]}
    ],
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 128}
  }'

Python (google-genai SDK):
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

# Any non-Gemini model works — Claude, GPT, etc.
response = client.models.generate_content(
    model="anthropic/claude-sonnet-4.5",
    contents="Write a haiku about coding",
    config=types.GenerateContentConfig(temperature=0.7, max_output_tokens=128),
)
print(response.candidates[0].content.parts[0].text)

Streaming — streamGenerateContent

curl https://api.routerhub.ai/v1beta/models/gemini-2.5-pro:streamGenerateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Count from 1 to 5."}]}
    ]
  }'

Python (google-genai SDK):
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

stream = client.models.generate_content_stream(
    model="gemini-2.5-pro",
    contents="Count from 1 to 5.",
)

for chunk in stream:
    if chunk.text:
        print(chunk.text, end="", flush=True)

# The final chunk also carries usage_metadata and finish_reason.

Function Calling

Declare tools at the top level and use toolConfig.functionCallingConfig to constrain the model. Function responses are sent back inside a role: "user" turn with a functionResponse part.

curl https://api.routerhub.ai/v1beta/models/gemini-2.5-pro:generateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is the weather in San Francisco?"}]}
    ],
    "tools": [{
      "functionDeclarations": [{
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }]
    }],
    "toolConfig": {
      "functionCallingConfig": {
        "mode": "ANY",
        "allowedFunctionNames": ["get_weather"]
      }
    }
  }'

Python (google-genai SDK):
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a location",
    parameters={
        "type": "OBJECT",
        "properties": {"location": {"type": "STRING"}},
        "required": ["location"],
    },
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="What is the weather in San Francisco?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])],
        tool_config=types.ToolConfig(
            function_calling_config=types.FunctionCallingConfig(
                mode="ANY",
                allowed_function_names=["get_weather"],
            ),
        ),
    ),
)

call = response.candidates[0].content.parts[0].function_call
print(call.name, call.args)  # get_weather {'location': 'San Francisco'}
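The follow-up turn that returns the tool's result can be sketched as plain request dicts, matching the convention described above (the weather payload is illustrative):

```python
# Follow-up request contents after executing get_weather locally.
# The weather payload is illustrative.
followup_contents = [
    {"role": "user",
     "parts": [{"text": "What is the weather in San Francisco?"}]},
    # Replay the model's functionCall turn exactly as received.
    {"role": "model",
     "parts": [{"functionCall": {"name": "get_weather",
                                 "args": {"location": "San Francisco"}}}]},
    # Per Gemini convention, the tool result goes inside a user turn.
    {"role": "user",
     "parts": [{"functionResponse": {"name": "get_weather",
                                     "response": {"tempC": 18, "sky": "fog"}}}]},
]
```

Posting these contents back to the same :generateContent action yields the model's final text answer.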

Sample Response

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {"text": "127 * 389 = 49,403."}
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "modelVersion": "gemini-2.5-pro",
  "responseId": "req_abc123",
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 14,
    "thoughtsTokenCount": 64,
    "totalTokenCount": 88
  }
}

Error Format

Errors follow the Google API error format — distinct from the OpenAI and Anthropic formats used elsewhere.

{
  "error": {
    "code": 400,
    "message": "candidateCount > 1 is not supported when routing to non-Gemini backends",
    "status": "INVALID_ARGUMENT"
  }
}
| HTTP status | Status | Meaning |
| --- | --- | --- |
| 400 | INVALID_ARGUMENT | Validation error: bad field, rejected cross-provider feature, or mismatched allowedFunctionNames. |
| 401 | UNAUTHENTICATED | Missing or invalid API key. |
| 404 | NOT_FOUND | Unknown model, or action other than :generateContent / :streamGenerateContent. |
| 429 | RESOURCE_EXHAUSTED | Rate limit exceeded. |
| 502 | INTERNAL | Downstream provider error. |
| 503 | UNAVAILABLE | Provider temporarily unavailable, or the resolved Gemini model is behind a multi-provider priority chain (rare configuration; see callout below). |
| 504 | DEADLINE_EXCEEDED | Upstream timeout. |

The 503 “gemini-backed model in multi-provider chain not yet supported on /v1beta endpoint” only fires when a Gemini model is configured as one of multiple alternate providers for the same slug — an uncommon setup. Direct Vertex Gemini registrations and multi-account Gemini pools both work on the native path.
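A rough client-side retry policy over these status codes can be sketched as follows; the attempt budget is an assumption, and backoff details belong in Errors:

```python
# Rough retry policy over the status codes above.
# A sketch, not official RouterHub guidance; tune per the Errors page.
RETRYABLE = {429, 502, 503, 504}

def should_retry(http_status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transient statuses until the attempt budget is spent."""
    return http_status in RETRYABLE and attempt < max_attempts
```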

See Errors for the full retry guidance.