Introduction
The Gemini Native API uses Google Gemini’s request and response format. Use it with Google’s official clients (e.g. the google-generativeai SDK) or when you need to work directly with Gemini data structures. The API follows the Gemini specification and supports the full feature set: thinking mode, multimodal input, tool calling, Google Search (Grounding), context caching, and image generation.
If you use an OpenAI-compatible client (e.g. OpenAI SDK), use the /v1/chat/completions endpoint instead.
| Aspect | Gemini Native | OpenAI-compatible (/v1/chat/completions) |
|---|---|---|
| Message structure | contents[].parts[] (text / inlineData / fileData) | messages[].content |
| Roles | user / model | user / assistant / system |
| System prompt | systemInstruction.parts | messages with role=system |
| Streaming | streamGenerateContent?alt=sse | stream: true |
| Thinking mode | generationConfig.thinkingConfig or model suffix | Model suffix (e.g. -thinking) |
API endpoints
| Feature | Method | Path |
|---|---|---|
| Text generation (non-streaming) | POST | /v1beta/models/{model}:generateContent |
| Text generation (streaming) | POST | /v1beta/models/{model}:streamGenerateContent?alt=sse |
| Single Embedding | POST | /v1beta/models/{model}:embedContent |
| Batch Embedding | POST | /v1beta/models/{model}:batchEmbedContents |
Replace {model} in the path with the actual model ID, e.g. gemini-2.5-pro, gemini-3-pro-preview.
Authentication
Any of the following is supported:
- Bearer token: Authorization: Bearer sk-xxxxxxxxxx (recommended, consistent with other Nebula endpoints)
- Google-style API key header: x-goog-api-key: sk-xxxxxxxxxx
- URL query parameter: ?key=sk-xxxxxxxxxx
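As an illustration, a minimal Python sketch sending the same request with each auth style (the sk-xxxxxxxxxx key is a placeholder):

```python
import requests

BASE = "https://llm.ai-nebula.com"
KEY = "sk-xxxxxxxxxx"  # placeholder key
url = f"{BASE}/v1beta/models/gemini-2.5-pro:generateContent"
body = {"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}

# 1. Bearer token (recommended)
requests.post(url, json=body, headers={"Authorization": f"Bearer {KEY}"})

# 2. Google-style API key header
requests.post(url, json=body, headers={"x-goog-api-key": KEY})

# 3. Key in the URL query string
requests.post(f"{url}?key={KEY}", json=body)
```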
Request parameters
generateContent / streamGenerateContent
| Parameter | Description |
|---|---|
| contents | List of conversation contents. Each item has role (user or model) and parts. Each part can be {"text": "..."}, {"inlineData": {"mimeType": "...", "data": "base64..."}}, or {"fileData": {"mimeType": "...", "fileUri": "gs://..."}}. |
| generationConfig | Generation config; fields listed below. |
| systemInstruction | System instruction: {"parts": [{"text": "..."}]}. |
| safetySettings | Safety levels, e.g. [{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}]. |
| tools | Tool declarations (function calling); see advanced features. |
| toolConfig | Tool config, e.g. functionCallingConfig.mode: AUTO / ANY / NONE. |
| cachedContent | Context caching ID returned by the API; used to reuse cached context. |

generationConfig fields:
- temperature: 0–2, controls randomness
- topP: nucleus sampling
- topK: top-K sampling
- maxOutputTokens: maximum output tokens
- stopSequences: stop sequences
- responseMimeType: e.g. text/plain
- responseModalities: e.g. ["TEXT"] or ["IMAGE"]
- thinkingConfig: thinking mode (see below)
- imageConfig: image generation config (see below)
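A sketch combining these parameters in one non-streaming request (field values are illustrative):

```python
import requests

url = "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent"
payload = {
    "systemInstruction": {"parts": [{"text": "You are a concise assistant."}]},
    "contents": [
        {"role": "user", "parts": [{"text": "Describe AI in one sentence"}]}
    ],
    "generationConfig": {
        "temperature": 0.7,
        "topP": 0.95,
        "maxOutputTokens": 1024,
        "stopSequences": ["END"],
    },
    "safetySettings": [
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}
    ],
}
resp = requests.post(url, json=payload,
                     headers={"Authorization": "Bearer sk-xxxxxxxxxx"})
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```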
Response format
Non-streaming generateContent returns JSON:
{
"candidates": [
{
"content": {
"parts": [{"text": "Response text"}],
"role": "model"
},
"finishReason": "STOP",
"index": 0,
"safetyRatings": []
}
],
"usageMetadata": {
"promptTokenCount": 10,
"candidatesTokenCount": 20,
"totalTokenCount": 30,
"thoughtsTokenCount": 0,
"cachedContentTokenCount": 0
},
"modelVersion": "gemini-2.5-pro",
"createTime": "2025-01-01T00:00:00Z"
}
The streaming endpoint returns SSE; each line starts with data: and contains a JSON fragment (e.g. candidates[].content.parts).
Basic examples
cURL (non-streaming):
curl -X POST "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"contents": [
{"role": "user", "parts": [{"text": "Describe AI in one sentence"}]}
],
"generationConfig": {
"temperature": 0.7,
"maxOutputTokens": 1024
}
}'
cURL (streaming):
curl -N -X POST "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:streamGenerateContent?alt=sse" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxxxxxxxxx" \
-d '{
"contents": [
{"role": "user", "parts": [{"text": "Describe AI in one sentence"}]}
],
"generationConfig": {"maxOutputTokens": 1024}
}'
Python (google-generativeai):
import google.generativeai as genai
genai.configure(
api_key="sk-xxxxxxxxxx",
transport="rest",
client_options={"api_endpoint": "https://llm.ai-nebula.com"}
)
model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content("Describe AI in one sentence")
print(response.text)
Node.js (@google/generative-ai):
const { GoogleGenerativeAI } = require("@google/generative-ai");

// If the SDK supports a custom baseUrl, set it to https://llm.ai-nebula.com
const genAI = new GoogleGenerativeAI("sk-xxxxxxxxxx");
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });

async function main() {
  const result = await model.generateContent("Describe AI in one sentence");
  console.log(result.response.text());
}
main();
By default, google-generativeai calls Google’s API. To use Nebula, set api_endpoint to https://llm.ai-nebula.com via client_options or environment variables. See your SDK docs for details.
Advanced features
Thinking mode
Supported in three ways:
- generationConfig.thinkingConfig (Gemini 2.5 Pro): use thinkingBudget (token count)
- thinkingConfig.thinkingLevel (Gemini 3 Pro): use LOW / HIGH
- Model suffix: -thinking, -thinking-8192, -nothinking, -thinking-low, -thinking-high

thinkingBudget (2.5 Pro):
{
"contents": [{"role": "user", "parts": [{"text": "Give a geometry problem and solve it step by step"}]}],
"generationConfig": {
"maxOutputTokens": 8192,
"thinkingConfig": {
"includeThoughts": true,
"thinkingBudget": 8192
}
}
}
thinkingLevel (3 Pro):
{
"contents": [{"role": "user", "parts": [{"text": "Give a geometry problem and solve it step by step"}]}],
"generationConfig": {
"maxOutputTokens": 8192,
"thinkingConfig": {
"includeThoughts": true,
"thinkingLevel": "HIGH"
}
}
}
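When includeThoughts is true, thought summaries are typically returned as parts carrying a boolean thought flag. A sketch that separates them from the answer (the thought flag is an assumption to verify against your actual responses):

```python
# `data` is the parsed JSON response from generateContent.
for part in data["candidates"][0]["content"]["parts"]:
    if part.get("thought"):        # assumed flag on thought-summary parts
        print("[thinking]", part.get("text", ""))
    else:
        print("[answer]", part.get("text", ""))
```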
Multimodal input
Mix text and media in contents[].parts:
{
"contents": [
{
"role": "user",
"parts": [
{"text": "Describe this image"},
{
"inlineData": {
"mimeType": "image/jpeg",
"data": "/9j/4AAQSkZJRg..."
}
}
]
}
]
}
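A sketch that builds the inlineData part from a local file (path and MIME type are placeholders):

```python
import base64
import requests

# Read a local image and base64-encode it for inlineData (placeholder path).
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "contents": [{
        "role": "user",
        "parts": [
            {"text": "Describe this image"},
            {"inlineData": {"mimeType": "image/jpeg", "data": b64}},
        ],
    }]
}
url = "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent"
resp = requests.post(url, json=payload,
                     headers={"Authorization": "Bearer sk-xxxxxxxxxx"})
```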
- Image: inlineData with base64 data, or fileData with fileUri (e.g. gs://...)
- Audio: inlineData with mimeType such as audio/mp3

Function calling
Declare callable functions in tools[].functionDeclarations:
{
"contents": [{"role": "user", "parts": [{"text": "What is the weather in Shanghai today?"}]}],
"tools": [
{
"functionDeclarations": [
{
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
]
}
],
"toolConfig": {
"functionCallingConfig": {
"mode": "AUTO",
"allowedFunctionNames": []
}
}
}
The model may return a functionCall part. Run the function yourself, append the result as a functionResponse part in the next request’s contents, and call the API again.
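A sketch of that round trip, assuming the request body above is stored in payload (the get_weather implementation is a placeholder):

```python
import requests

url = "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent"
headers = {"Authorization": "Bearer sk-xxxxxxxxxx"}

def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # placeholder tool implementation

# First call: the model may answer with a functionCall part.
data = requests.post(url, json=payload, headers=headers).json()
content = data["candidates"][0]["content"]
call = next((p["functionCall"] for p in content["parts"]
             if "functionCall" in p), None)

if call:
    result = get_weather(**call.get("args", {}))  # run the tool locally
    payload["contents"].append(content)           # keep the functionCall turn
    payload["contents"].append({
        # Role for tool results varies by API version; "user" is a common choice.
        "role": "user",
        "parts": [{"functionResponse": {"name": call["name"],
                                        "response": {"result": result}}}],
    })
    data = requests.post(url, json=payload, headers=headers).json()

print(data["candidates"][0]["content"]["parts"][0]["text"])
```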
Google Search (Grounding)
When enabled, the model can use real-time web search to improve answers (e.g. weather, news). Add googleSearch to tools:
{
"contents": [{"role": "user", "parts": [{"text": "What is the weather in Beijing today?"}]}],
"tools": [
{
"googleSearch": {}
}
],
"toolConfig": {
"functionCallingConfig": {
"mode": "AUTO"
}
}
}
To use both function calling and Google Search, include googleSearch: {} and functionDeclarations as separate elements in the same tools array. Responses may include retrieval metadata (e.g. groundingMetadata).
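To inspect grounding results, check the candidate for groundingMetadata (a minimal sketch; data is the parsed response, and the sub-fields vary by model version):

```python
meta = data["candidates"][0].get("groundingMetadata")
if meta:
    print(meta)  # explore the raw object; exact sub-fields are not guaranteed
```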
Streaming
Use: POST /v1beta/models/{model}:streamGenerateContent?alt=sse. Request body is the same as generateContent. Response is SSE; each data: line is a JSON chunk.
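A sketch that reads the SSE stream and prints text as it arrives (assumes each data: line holds a complete JSON chunk):

```python
import json
import requests

url = ("https://llm.ai-nebula.com/v1beta/models/"
       "gemini-2.5-pro:streamGenerateContent?alt=sse")
payload = {"contents": [{"role": "user",
                         "parts": [{"text": "Describe AI in one sentence"}]}]}
headers = {"Authorization": "Bearer sk-xxxxxxxxxx"}

with requests.post(url, json=payload, headers=headers, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separator lines between events
        chunk = json.loads(line[len("data: "):])
        # Some chunks (e.g. a final usage chunk) may carry no text parts.
        for cand in chunk.get("candidates", []):
            for part in cand.get("content", {}).get("parts", []):
                print(part.get("text", ""), end="", flush=True)
print()
```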
Context caching
First request does not include cachedContent. If the server returns a cache ID, subsequent requests can send:
{
"cachedContent": "cached-content-id",
"contents": [{"role": "user", "parts": [{"text": "Continue from the context above"}]}]
}
This reduces cost and latency for long repeated context.
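A sketch of the follow-up call, assuming an earlier response supplied the cache ID:

```python
import requests

cache_id = "cached-content-id"  # placeholder: returned by an earlier response
url = "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent"
payload = {
    "cachedContent": cache_id,
    "contents": [{"role": "user",
                  "parts": [{"text": "Continue from the context above"}]}],
}
resp = requests.post(url, json=payload,
                     headers={"Authorization": "Bearer sk-xxxxxxxxxx"})
```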
Image generation (e.g. Gemini 2.5 Flash)
When the model supports image output, configure image options in generationConfig:
{
"contents": [{"role": "user", "parts": [{"text": "Draw a cat"}]}],
"generationConfig": {
"responseModalities": ["IMAGE"],
"imageConfig": {
"aspectRatio": "1:1",
"imageSize": "1K",
"imageOutputOptions": {"mimeType": "image/png"}
}
}
}
Response candidates[].content.parts may include inlineData (e.g. base64 image).
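A sketch that saves the first returned image, assuming the response includes an inlineData part (the output filename is a placeholder):

```python
import base64

# `data` is the parsed JSON response from generateContent.
for part in data["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img = base64.b64decode(part["inlineData"]["data"])
        with open("cat.png", "wb") as f:  # placeholder filename
            f.write(img)
        break
```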
Embedding API
Single: embedContent
Endpoint: POST https://llm.ai-nebula.com/v1beta/models/{model}:embedContent
Request body example:
{
"model": "text-embedding-004",
"content": {
"parts": [{"text": "Text to embed"}]
}
}
Or put model in the path: /v1beta/models/text-embedding-004:embedContent, with body containing only content.
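Called from Python, the single endpoint typically returns an embedding object with a values vector; a sketch (verify the response shape against your deployment):

```python
import requests

url = "https://llm.ai-nebula.com/v1beta/models/text-embedding-004:embedContent"
payload = {"content": {"parts": [{"text": "Text to embed"}]}}
resp = requests.post(url, json=payload,
                     headers={"Authorization": "Bearer sk-xxxxxxxxxx"})
vector = resp.json()["embedding"]["values"]  # assumed response shape
print(len(vector))
```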
Batch: batchEmbedContents
Endpoint: POST https://llm.ai-nebula.com/v1beta/models/{model}:batchEmbedContents
Request body example:
{
"requests": [
{"content": {"parts": [{"text": "First text"}]}},
{"content": {"parts": [{"text": "Second text"}]}}
]
}
The response contains an embeddings array with one entry per request, in the same order.
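A batch sketch under the same assumptions (an embeddings array mirroring the request order):

```python
import requests

url = ("https://llm.ai-nebula.com/v1beta/models/"
       "text-embedding-004:batchEmbedContents")
payload = {"requests": [
    {"content": {"parts": [{"text": "First text"}]}},
    {"content": {"parts": [{"text": "Second text"}]}},
]}
resp = requests.post(url, json=payload,
                     headers={"Authorization": "Bearer sk-xxxxxxxxxx"})
for emb in resp.json()["embeddings"]:  # assumed response shape
    print(len(emb["values"]))
```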
Error handling
Errors are returned as HTTP status codes and JSON body, for example:
{
"error": {
"code": 400,
"message": "Invalid request: ...",
"status": "INVALID_ARGUMENT"
}
}
Common cases:
| Status | Meaning |
|---|---|
| 400 | Invalid request (e.g. missing contents, unsupported parameter) |
| 401 | Authentication failed (invalid or missing API key) |
| 404 | Model not found or wrong path |
| 429 | Rate limited; retry later |
| 500 | Server error |
Parse error.message in your client and handle retries or user messaging accordingly.
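A sketch of client-side handling with retry on 429 (backoff values are illustrative):

```python
import time
import requests

def call_with_retry(url, payload, headers, max_tries=3):
    for attempt in range(max_tries):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code == 429:  # rate limited: back off and retry
            time.sleep(2 ** attempt)
            continue
        if not resp.ok:              # surface error.message to the caller
            err = resp.json().get("error", {})
            raise RuntimeError(f"{resp.status_code}: {err.get('message')}")
        return resp.json()
    raise RuntimeError("rate limited: retries exhausted")
```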
Comparison with the OpenAI-compatible API

| Item | Gemini Native | OpenAI (/v1/chat/completions) |
|---|---|---|
| Base path | /v1beta/models/{model}:generateContent | /v1/chat/completions |
| Auth | Authorization: Bearer sk-xxx or x-goog-api-key | Authorization: Bearer sk-xxx |
| Message format | contents[].parts[] (text/inlineData/fileData) | messages[].content (string or array) |
| System prompt | systemInstruction.parts | messages with role: "system" |
| Streaming | streamGenerateContent?alt=sse | stream: true |
| Thinking | thinkingConfig or model suffix | Model suffix (e.g. -thinking) |
| Tools | tools[].functionDeclarations | tools[].function (OpenAI shape) |
| Typical clients | Google SDK, custom HTTP client | OpenAI SDK, OpenAI-compatible clients |
Use the native endpoint when you rely on Google Gemini tooling or need Gemini-specific fields (e.g. thinkingConfig, native multimodal parts). Use /v1/chat/completions when you want to stay within the OpenAI ecosystem.