Generate Content

POST /v1beta/models/{model}:generateContent
POST /v1beta/models/{model}:streamGenerateContent

Native Gemini generateContent API for text generation. Point the Google GenAI SDK at RouterHub and your request is dispatched to any registered text model — Gemini, Claude, or GPT — from a single endpoint shape.

Dispatch model. When the resolved model is backed by Vertex Gemini, your request is passed through to the SDK unchanged, preserving every top-level field (tools, toolConfig, safetySettings, systemInstruction, cachedContent, labels, serviceTier, and thinkingConfig.thinkingLevel). When the resolved model is Claude, GPT, or a generic OpenAI-compatible provider, the request is converted to the internal OpenAI shape before dispatch. Some Gemini-only fields are dropped on that cross-provider path.


Available Models

All registered text models are reachable through this endpoint, regardless of the underlying provider. Use the same model IDs documented in Models.

The google/ prefix is optional when calling Gemini models. Both /v1beta/models/gemini-2.5-pro:generateContent and /v1beta/models/google/gemini-2.5-pro:generateContent resolve the same way. For non-Gemini models, use the full model ID (e.g. anthropic/claude-sonnet-4.5, openai/gpt-5).
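A sketch of that resolution rule, for intuition only — the registry contents and the `resolve_model` helper below are illustrative assumptions, not RouterHub internals:

```python
# Illustrative sketch of the optional google/ prefix rule.
# REGISTERED and resolve_model are assumptions, not RouterHub internals.
REGISTERED = {"google/gemini-2.5-pro", "anthropic/claude-sonnet-4.5", "openai/gpt-5"}

def resolve_model(model_id: str) -> str:
    """Return the registered ID, accepting a bare Gemini name without the prefix."""
    if model_id in REGISTERED:
        return model_id
    prefixed = f"google/{model_id}"
    if prefixed in REGISTERED:
        return prefixed
    raise KeyError(f"unknown model: {model_id}")
```

Both spellings of a Gemini ID resolve to the same registration, while non-Gemini IDs must be fully qualified.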


Authentication

This endpoint supports two authentication methods:

| Method | Header | Example |
| --- | --- | --- |
| Bearer token | Authorization | Authorization: Bearer rh_your_api_key |
| Google-style API key | x-goog-api-key | x-goog-api-key: rh_your_api_key |
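For raw HTTP clients, the two methods differ only in which header carries the key. A minimal sketch (the key value is a placeholder):

```python
# Both methods send the same RouterHub key; pick one per request.
API_KEY = "rh_your_api_key"  # placeholder

bearer_headers = {"Authorization": f"Bearer {API_KEY}"}
google_headers = {"x-goog-api-key": API_KEY}
```

The Google GenAI SDK sends x-goog-api-key on its own when constructed with api_key=..., so no extra header configuration is needed there.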

Request Body

| Field | Type | Description |
| --- | --- | --- |
| contents | array | Required. Array of Content objects describing the conversation history. |
| systemInstruction | object | Optional. System-level instruction as a Content object. Role is ignored — only parts is read. |
| tools | array | Optional. Array of Tool definitions the model may call. |
| toolConfig | object | Optional. ToolConfig controlling how the model uses the declared tools. |
| safetySettings | array | Optional. Gemini safety filter thresholds. Native Gemini only — dropped on the cross-provider path. |
| generationConfig | object | Optional. GenerationConfig: temperature, token limits, response format, thinking configuration, and more. |
| cachedContent | string | Optional. Resource name of a Gemini cached content entry (e.g. projects/my-project/cachedContents/abc123). Native Gemini only. |
| labels | object | Optional. Key–value string metadata for billing breakdown. Vertex Gemini only. |
| serviceTier | string | Optional. Gemini service tier (standard, flex, priority). Native Gemini only. |

Content

| Field | Type | Description |
| --- | --- | --- |
| role | string | Optional. "user" or "model". RouterHub also accepts "function" for legacy tool-response turns. Omit for the first user message. |
| parts | array | Required. Array of Part objects. A part carries exactly one of text, inlineData, fileData, functionCall, or functionResponse. |

Part

| Field | Type | Description |
| --- | --- | --- |
| text | string | Plain text content. |
| inlineData | object | Inline media: {"mimeType": "image/png", "data": "<base64>"}. |
| fileData | object | URI-based media: {"mimeType": "image/png", "fileUri": "gs://..."}. |
| functionCall | object | Model-emitted tool invocation: {"name": "...", "args": {...}}. Appears on role: "model" turns. |
| functionResponse | object | Tool execution result: {"name": "...", "response": {...}}. Appears inside a role: "user" turn per Gemini SDK convention. |
| thought | boolean | true when the part is an extended-thinking block. Paired with text. |
| thoughtSignature | string | Opaque base64 signature required for thought round-trips. Echo it back on follow-up turns exactly as received. See Reasoning. |
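Echoing thought parts back on a follow-up turn can be sketched as plain request dicts. The field names follow the Part table above; the conversation content and signature value are illustrative:

```python
# A model turn that contained a thought part plus its signature.
# The text and signature values are illustrative.
model_turn = {
    "role": "model",
    "parts": [
        {"thought": True, "text": "Let me multiply step by step...",
         "thoughtSignature": "b64signature=="},  # opaque; echo back verbatim
        {"text": "The answer is 49,403."},
    ],
}

# On the follow-up request, replay the model turn unchanged, including
# thoughtSignature, before appending the new user message.
followup_contents = [
    {"role": "user", "parts": [{"text": "What is 127 * 389?"}]},
    model_turn,
    {"role": "user", "parts": [{"text": "Now divide that by 7."}]},
]
```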

Tool

| Field | Type | Description |
| --- | --- | --- |
| functionDeclarations | array | Array of FunctionDeclaration: {name, description, parameters} where parameters is a Gemini-flavoured JSON Schema object. |

ToolConfig

| Field | Type | Description |
| --- | --- | --- |
| functionCallingConfig.mode | string | "AUTO" (default), "ANY" (must call a function), "NONE" (never call), or "VALIDATED" (call-or-text with schema validation). |
| functionCallingConfig.allowedFunctionNames | array | Restrict the model to this subset of declared function names. Required to have a non-empty intersection with tools; otherwise RouterHub returns INVALID_ARGUMENT. |

GenerationConfig

| Field | Type | Description |
| --- | --- | --- |
| temperature | number | Sampling temperature (0.0 – 2.0). |
| topP | number | Nucleus sampling threshold. |
| topK | number | Top-K sampling. |
| maxOutputTokens | integer | Maximum tokens to generate per candidate. |
| stopSequences | array | Strings that stop generation when encountered. |
| candidateCount | integer | Number of response variants to return. Cross-provider path only accepts 1 — higher values are rejected with INVALID_ARGUMENT. Native Gemini accepts the provider's allowed range. |
| responseMimeType | string | Set to "application/json" together with responseSchema for structured output. See Structured Output. |
| responseSchema | object | Gemini Schema describing the expected JSON shape. Translated to OpenAI json_schema for cross-provider dispatch. |
| responseModalities | array | Must not contain "IMAGE" on this route. For image output use Image Generation. |
| thinkingConfig | object | Extended-thinking knobs: includeThoughts, thinkingBudget, thinkingLevel. thinkingLevel is preserved on the native path and dropped cross-provider (where only includeThoughts + thinkingBudget map to our internal reasoning config). See Reasoning. |
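As an example, a structured-output configuration combines responseMimeType with responseSchema. The schema below is illustrative; on the wire it is plain JSON:

```python
import json

# Illustrative structured-output config: JSON mode plus a Gemini-style schema.
generation_config = {
    "temperature": 0.0,
    "maxOutputTokens": 256,
    "responseMimeType": "application/json",
    "responseSchema": {
        "type": "OBJECT",
        "properties": {
            "city": {"type": "STRING"},
            "population": {"type": "INTEGER"},
        },
        "required": ["city", "population"],
    },
}

# The config serializes to plain JSON for the request body.
wire = json.dumps(generation_config)
```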

Response Body

| Field | Type | Description |
| --- | --- | --- |
| candidates | array | One candidate per candidateCount. Each has content (with parts), finishReason (STOP, MAX_TOKENS, SAFETY, OTHER), and optional safetyRatings. |
| modelVersion | string | Model identifier used to serve the request. |
| responseId | string | Server-issued request identifier. |
| usageMetadata | object | Token usage: promptTokenCount, candidatesTokenCount, thoughtsTokenCount, cachedContentTokenCount, totalTokenCount. |
| promptFeedback | object | Present if the prompt was blocked. Contains blockReason and safetyRatings. |

Streaming

POST /v1beta/models/{model}:streamGenerateContent

Call the :streamGenerateContent action to receive the response as Server-Sent Events. Each event is a line of the form data: <partial GenerateContentResponse> followed by a blank line.

Unlike the OpenAI SSE format, the Gemini stream has no [DONE] terminator. The stream simply ends when the connection closes. Detect completion via the presence of finishReason on the final chunk, or by the closed connection itself.

Example SSE stream:

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"Hel"}]}}],"modelVersion":"gemini-2.5-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"lo"}]}}],"modelVersion":"gemini-2.5-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"!"}]},"finishReason":"STOP"}],"modelVersion":"gemini-2.5-pro","usageMetadata":{"promptTokenCount":5,"candidatesTokenCount":3,"totalTokenCount":8}}
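A minimal client-side parser for this stream shape, using only the standard library. The transcript here is canned; a real client would iterate over the HTTP response lines:

```python
import json

# Minimal parser for the Gemini-style SSE stream shown above.
# The transcript is canned; a real client reads lines off the HTTP response.
raw_stream = """\
data: {"candidates":[{"content":{"role":"model","parts":[{"text":"Hel"}]}}]}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"lo"}]},"finishReason":"STOP"}]}
"""

text, finished = [], False
for line in raw_stream.splitlines():
    if not line.startswith("data: "):
        continue  # blank separator lines carry no payload
    chunk = json.loads(line[len("data: "):])
    cand = chunk["candidates"][0]
    for part in cand.get("content", {}).get("parts", []):
        if "text" in part:
            text.append(part["text"])
    # No [DONE] sentinel: completion is signalled by finishReason.
    if cand.get("finishReason"):
        finished = True
```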

Cross-Provider Routing

When the resolved model is not Gemini-backed, RouterHub converts your request into the internal OpenAI shape before dispatching. This lets you use the Google GenAI SDK against Claude, GPT, and generic backends — at the cost of a few Gemini-only fields:

| Field | Behavior |
| --- | --- |
| safetySettings | Dropped silently (no equivalent on Claude / GPT). |
| cachedContent | Dropped silently. See Prompt Caching for per-provider caching. |
| labels | Dropped silently. |
| serviceTier | Dropped silently. |
| generationConfig.thinkingConfig.thinkingLevel | Dropped. includeThoughts and thinkingBudget are mapped to the internal reasoning config. |
| generationConfig.candidateCount > 1 | Rejected with INVALID_ARGUMENT. |
| generationConfig.responseModalities containing "IMAGE" | Rejected with INVALID_ARGUMENT (use the image endpoint). |
| tools / toolConfig | Translated into OpenAI tools and tool_choice. allowedFunctionNames filters the tool list; see Tool Calling for the full mode mapping. |
| functionResponse parts | Accepted inside role: "user" or role: "function" turns. RouterHub mints stable synthetic tool_call_ids internally so function responses bind to their originating call even when the underlying provider requires an explicit ID. |
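The drop/reject behaviour in the table can be approximated client-side to predict what survives conversion. This is a sketch for intuition only, not RouterHub's actual converter:

```python
# Approximation of the cross-provider field handling described above.
# For intuition only; not RouterHub's actual converter.
DROPPED_SILENTLY = {"safetySettings", "cachedContent", "labels", "serviceTier"}

def preview_cross_provider(body: dict) -> dict:
    out = {k: v for k, v in body.items() if k not in DROPPED_SILENTLY}
    gen = dict(out.get("generationConfig", {}))
    if gen.get("candidateCount", 1) > 1:
        raise ValueError("INVALID_ARGUMENT: candidateCount > 1")
    if "IMAGE" in gen.get("responseModalities", []):
        raise ValueError("INVALID_ARGUMENT: IMAGE modality not supported here")
    if "thinkingConfig" in gen:
        # thinkingLevel is dropped; includeThoughts/thinkingBudget survive.
        thinking = {k: v for k, v in gen["thinkingConfig"].items()
                    if k != "thinkingLevel"}
        if thinking:
            gen["thinkingConfig"] = thinking
        else:
            gen.pop("thinkingConfig")
    if gen:
        out["generationConfig"] = gen
    return out
```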

Examples

Non-Streaming — Gemini (native pass-through)

curl https://api.routerhub.ai/v1beta/models/gemini-2.5-pro:generateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is 127 * 389?"}]}
    ],
    "generationConfig": {
      "temperature": 0.2,
      "thinkingConfig": {
        "includeThoughts": true,
        "thinkingBudget": 1024
      }
    }
  }'

Python (google-genai SDK):
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="What is 127 * 389?",
    config=types.GenerateContentConfig(
        temperature=0.2,
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=1024,
        ),
    ),
)

for part in response.candidates[0].content.parts:
    if part.thought:
        print("Thinking:", part.text)
    elif part.text:
        print("Answer:", part.text)

Non-Streaming — Claude / GPT (cross-provider)

Same request shape, different model ID — RouterHub converts to the internal OpenAI shape and dispatches to the resolved backend.

curl https://api.routerhub.ai/v1beta/models/anthropic/claude-sonnet-4.5:generateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Write a haiku about coding"}]}
    ],
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 128}
  }'

Python (google-genai SDK):
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

# Any non-Gemini model works — Claude, GPT, etc.
response = client.models.generate_content(
    model="anthropic/claude-sonnet-4.5",
    contents="Write a haiku about coding",
    config=types.GenerateContentConfig(temperature=0.7, max_output_tokens=128),
)
print(response.candidates[0].content.parts[0].text)

Streaming — streamGenerateContent

curl https://api.routerhub.ai/v1beta/models/gemini-2.5-pro:streamGenerateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Count from 1 to 5."}]}
    ]
  }'

Python (google-genai SDK):
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

stream = client.models.generate_content_stream(
    model="gemini-2.5-pro",
    contents="Count from 1 to 5.",
)

for chunk in stream:
    if chunk.text:
        print(chunk.text, end="", flush=True)

# The final chunk also carries usage_metadata and finish_reason.

Function Calling

Declare tools at the top level and use toolConfig.functionCallingConfig to constrain the model. Function responses are sent back inside a role: "user" turn with a functionResponse part.

curl https://api.routerhub.ai/v1beta/models/gemini-2.5-pro:generateContent \
  -H "Authorization: Bearer $ROUTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is the weather in San Francisco?"}]}
    ],
    "tools": [{
      "functionDeclarations": [{
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }]
    }],
    "toolConfig": {
      "functionCallingConfig": {
        "mode": "ANY",
        "allowedFunctionNames": ["get_weather"]
      }
    }
  }'

Python (google-genai SDK):
from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://api.routerhub.ai"},
)

get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a location",
    parameters={
        "type": "OBJECT",
        "properties": {"location": {"type": "STRING"}},
        "required": ["location"],
    },
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="What is the weather in San Francisco?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])],
        tool_config=types.ToolConfig(
            function_calling_config=types.FunctionCallingConfig(
                mode="ANY",
                allowed_function_names=["get_weather"],
            ),
        ),
    ),
)

call = response.candidates[0].content.parts[0].function_call
print(call.name, call.args)  # get_weather {'location': 'San Francisco'}
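The follow-up turn that returns the tool's result can be sketched as plain request dicts, matching the convention described above (the weather payload is illustrative):

```python
# Follow-up request contents after executing get_weather locally.
# The weather payload is illustrative.
followup_contents = [
    {"role": "user",
     "parts": [{"text": "What is the weather in San Francisco?"}]},
    # Replay the model's functionCall turn exactly as received.
    {"role": "model",
     "parts": [{"functionCall": {"name": "get_weather",
                                 "args": {"location": "San Francisco"}}}]},
    # Per Gemini convention, the tool result goes inside a user turn.
    {"role": "user",
     "parts": [{"functionResponse": {"name": "get_weather",
                                     "response": {"tempC": 18, "sky": "fog"}}}]},
]
```

Posting these contents back to the same :generateContent action yields the model's final text answer.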

Sample Response

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {"text": "127 * 389 = 49,403."}
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "modelVersion": "gemini-2.5-pro",
  "responseId": "req_abc123",
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 14,
    "thoughtsTokenCount": 64,
    "totalTokenCount": 88
  }
}

Error Format

Errors follow the Google API error format — distinct from the OpenAI and Anthropic formats used elsewhere.

{
  "error": {
    "code": 400,
    "message": "candidateCount > 1 is not supported when routing to non-Gemini backends",
    "status": "INVALID_ARGUMENT"
  }
}
| HTTP status | Status | Meaning |
| --- | --- | --- |
| 400 | INVALID_ARGUMENT | Validation error: bad field, rejected cross-provider feature, or mismatched allowedFunctionNames. |
| 401 | UNAUTHENTICATED | Missing or invalid API key. |
| 404 | NOT_FOUND | Unknown model, or action other than :generateContent / :streamGenerateContent. |
| 429 | RESOURCE_EXHAUSTED | Rate limit exceeded. |
| 502 | INTERNAL | Downstream provider error. |
| 503 | UNAVAILABLE | Provider temporarily unavailable, or the resolved Gemini model is behind a multi-provider priority chain (rare configuration; see callout below). |
| 504 | DEADLINE_EXCEEDED | Upstream timeout. |

The 503 “gemini-backed model in multi-provider chain not yet supported on /v1beta endpoint” only fires when a Gemini model is configured as one of multiple alternate providers for the same slug — an uncommon setup. Direct Vertex Gemini registrations and multi-account Gemini pools both work on the native path.
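A rough client-side retry policy over these status codes can be sketched as follows; the attempt budget is an assumption, and backoff details belong in Errors:

```python
# Rough retry policy over the status codes above.
# A sketch, not official RouterHub guidance; tune per the Errors page.
RETRYABLE = {429, 502, 503, 504}

def should_retry(http_status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transient statuses until the attempt budget is spent."""
    return http_status in RETRYABLE and attempt < max_attempts
```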

See Errors for the full retry guidance.