Gemini Native (Text)
POST https://llm.ai-nebula.com/v1beta/models/{model}:generateContent
curl --request POST \
  --url https://llm.ai-nebula.com/v1beta/models/{model}:generateContent \
  --header 'Content-Type: application/json' \
  --data '
{
  "contents": [
    {}
  ],
  "generationConfig": {},
  "systemInstruction": {},
  "safetySettings": [
    {}
  ],
  "tools": [
    {}
  ],
  "toolConfig": {},
  "cachedContent": "<string>"
}
'

Introduction

The Gemini Native API uses Google Gemini's request and response format. It is suitable for Google's official clients (e.g. the google-generativeai SDK) or for working directly with Gemini data structures. The API follows the Gemini specification and supports the full feature set, including thinking mode, multimodal input, tool calling, Google Search (Grounding), context caching, and image generation.
If you use an OpenAI-compatible client (e.g. OpenAI SDK), use the /v1/chat/completions endpoint instead.

Difference from OpenAI format

| Aspect | Gemini Native | OpenAI-compatible (/v1/chat/completions) |
| --- | --- | --- |
| Message structure | contents[].parts[] (text / inlineData / fileData) | messages[].content |
| Roles | user / model | user / assistant / system |
| System prompt | systemInstruction.parts | messages with role=system |
| Streaming | streamGenerateContent?alt=sse | stream: true |
| Thinking mode | generationConfig.thinkingConfig or model suffix | Model suffix (e.g. -thinking) |

API endpoints

| Feature | Method | Path |
| --- | --- | --- |
| Text generation (non-streaming) | POST | /v1beta/models/{model}:generateContent |
| Text generation (streaming) | POST | /v1beta/models/{model}:streamGenerateContent?alt=sse |
| Single embedding | POST | /v1beta/models/{model}:embedContent |
| Batch embedding | POST | /v1beta/models/{model}:batchEmbedContents |
Replace {model} in the path with the actual model ID, e.g. gemini-2.5-pro, gemini-3-pro-preview.

Authentication

Any of the following is supported:
Authorization
string
Bearer token: Bearer sk-xxxxxxxxxx (recommended, consistent with other Nebula endpoints)
x-goog-api-key
string
Google-style API key: x-goog-api-key: sk-xxxxxxxxxx
You can also pass the key in the URL: ?key=sk-xxxxxxxxxx.
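For example, a request using the Google-style key header instead of Bearer auth (sk-xxxxxxxxxx is a placeholder key):

curl -X POST "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: sk-xxxxxxxxxx" \
  -d '{"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}'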

Request parameters

generateContent / streamGenerateContent

contents
array
required
List of conversation contents. Each item has role (user or model) and parts. Each part can be: {"text": "..."}, {"inlineData": {"mimeType": "...", "data": "base64..."}}, or {"fileData": {"mimeType": "...", "fileUri": "gs://..."}}.
generationConfig
object
Generation config.
  • temperature: 0–2, randomness
  • topP: nucleus sampling
  • topK: top-K sampling
  • maxOutputTokens: max output tokens
  • stopSequences: stop sequences
  • responseMimeType: e.g. text/plain
  • responseModalities: e.g. ["TEXT"] or ["IMAGE"]
  • thinkingConfig: thinking mode (see below)
  • imageConfig: image generation config (see below)
systemInstruction
object
System instruction: {"parts": [{"text": "..."}]}.
safetySettings
array
Safety levels, e.g. [{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}].
tools
array
Tool declarations (function calling), see advanced features.
toolConfig
object
Tool config, e.g. functionCallingConfig.mode: AUTO / ANY / NONE.
cachedContent
string
Context caching ID returned by the API; used to reuse cached context.
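A sketch of how these parameters fit together in a single request body (values are illustrative; cachedContent is omitted since it only applies after a cache has been created):

{
  "contents": [
    {"role": "user", "parts": [{"text": "Summarize the following article in three bullet points."}]}
  ],
  "systemInstruction": {"parts": [{"text": "You are a concise assistant."}]},
  "generationConfig": {
    "temperature": 0.7,
    "topP": 0.95,
    "maxOutputTokens": 2048,
    "stopSequences": ["END"]
  },
  "safetySettings": [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}
  ]
}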

Response format

Non-streaming generateContent returns JSON:
{
  "candidates": [
    {
      "content": {
        "parts": [{"text": "Response text"}],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": []
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 20,
    "totalTokenCount": 30,
    "thoughtsTokenCount": 0,
    "cachedContentTokenCount": 0
  },
  "modelVersion": "gemini-2.5-pro",
  "createTime": "2025-01-01T00:00:00Z"
}
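For the non-streaming response, client code typically reads the generated text from candidates[0].content.parts. A minimal Python sketch using the requests library (model ID and key are placeholders):

import requests

resp = requests.post(
    "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent",
    headers={"Authorization": "Bearer sk-xxxxxxxxxx"},
    json={"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]},
)
resp.raise_for_status()
data = resp.json()

# Concatenate all text parts of the first candidate
text = "".join(part.get("text", "") for part in data["candidates"][0]["content"]["parts"])
print(text)
print("total tokens:", data["usageMetadata"]["totalTokenCount"])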
The streaming endpoint returns SSE; each line starts with data: and contains a JSON fragment (e.g. candidates[].content.parts).

Basic examples

curl -X POST "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxxxx" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Describe AI in one sentence"}]}
    ],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 1024
    }
  }'
By default, google-generativeai calls Google’s API. To use Nebula, set api_endpoint to https://llm.ai-nebula.com via client_options or environment variables. See your SDK docs for details.
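With the Python SDK, the override might look like this (a minimal sketch; option names can differ between SDK versions, so treat it as illustrative):

import google.generativeai as genai

# Point the SDK at Nebula instead of Google's default endpoint
genai.configure(
    api_key="sk-xxxxxxxxxx",
    transport="rest",
    client_options={"api_endpoint": "https://llm.ai-nebula.com"},
)

model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content("Describe AI in one sentence")
print(response.text)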

Advanced features

Thinking mode

Supported in three ways:
  1. generationConfig.thinkingConfig (Gemini 2.5 Pro): use thinkingBudget (token count)
  2. thinkingConfig.thinkingLevel (Gemini 3 Pro): use LOW / HIGH
  3. Model suffix: -thinking, -thinking-8192, -nothinking, -thinking-low, -thinking-high
{
  "contents": [{"role": "user", "parts": [{"text": "Give a geometry problem and solve it step by step"}]}],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 8192
    }
  }
}
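For Gemini 3 Pro's thinkingLevel (option 2 above), the equivalent request might look like this (illustrative sketch; field support can vary by model):

{
  "contents": [{"role": "user", "parts": [{"text": "Give a geometry problem and solve it step by step"}]}],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingLevel": "HIGH"
    }
  }
}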

Multimodal input

Mix text and media in contents[].parts:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "Describe this image"},
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "/9j/4AAQSkZJRg..."
          }
        }
      ]
    }
  ]
}
  • Image: inlineData with base64 data, or fileData with fileUri (e.g. gs://...)
  • Audio: inlineData with mimeType such as audio/mp3
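As an alternative to inline base64, the same request can reference a hosted file via fileData (sketch; the gs:// URI is a placeholder):

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "Describe this image"},
        {
          "fileData": {
            "mimeType": "image/jpeg",
            "fileUri": "gs://my-bucket/photo.jpg"
          }
        }
      ]
    }
  ]
}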

Tool calling (Function Calling)

{
  "contents": [{"role": "user", "parts": [{"text": "What is the weather in Shanghai today?"}]}],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "get_weather",
          "description": "Get weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      ]
    }
  ],
  "toolConfig": {
    "functionCallingConfig": {
      "mode": "AUTO",
      "allowedFunctionNames": []
    }
  }
}
The model may return a functionCall part; include the corresponding functionResponse in the next contents and send another request.
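In the follow-up request, the model's functionCall is echoed back as a model turn, followed by a user turn carrying the functionResponse. A sketch of the second request's contents (the weather payload is made up; the tools array from the first request is resent unchanged and omitted here for brevity):

{
  "contents": [
    {"role": "user", "parts": [{"text": "What is the weather in Shanghai today?"}]},
    {"role": "model", "parts": [{"functionCall": {"name": "get_weather", "args": {"location": "Shanghai"}}}]},
    {"role": "user", "parts": [{"functionResponse": {"name": "get_weather", "response": {"temperature": "22°C", "condition": "cloudy"}}}]}
  ]
}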

Google Search (Grounding)

When enabled, the model can use real-time web search to improve answers (e.g. weather, news). Add googleSearch to tools:
{
  "contents": [{"role": "user", "parts": [{"text": "What is the weather in Beijing today?"}]}],
  "tools": [
    {
      "googleSearch": {}
    }
  ],
  "toolConfig": {
    "functionCallingConfig": {
      "mode": "AUTO"
    }
  }
}
To use both function calling and Google Search, include googleSearch: {} and functionDeclarations as separate elements in the same tools array. Responses may include retrieval metadata (e.g. groundingMetadata).
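For example, the tools portion of such a combined request might look like this (sketch, reusing the get_weather declaration from above):

"tools": [
  {"googleSearch": {}},
  {
    "functionDeclarations": [
      {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    ]
  }
]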

Streaming

Use: POST /v1beta/models/{model}:streamGenerateContent?alt=sse. Request body is the same as generateContent. Response is SSE; each data: line is a JSON chunk.
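A minimal Python sketch for consuming the SSE stream with the requests library (it assumes each data: line carries a JSON chunk shaped like the non-streaming response):

import json
import requests

resp = requests.post(
    "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-pro:streamGenerateContent?alt=sse",
    headers={"Authorization": "Bearer sk-xxxxxxxxxx"},
    json={"contents": [{"role": "user", "parts": [{"text": "Tell me a short story"}]}]},
    stream=True,
)
resp.raise_for_status()

for line in resp.iter_lines():
    # SSE payload lines look like: data: {...}
    if not line or not line.startswith(b"data: "):
        continue
    chunk = json.loads(line[len(b"data: "):])
    for part in chunk.get("candidates", [{}])[0].get("content", {}).get("parts", []):
        print(part.get("text", ""), end="", flush=True)
print()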

Context caching

The first request does not include cachedContent. If the server returns a cache ID, subsequent requests can pass it back:
{
  "cachedContent": "cached-content-id",
  "contents": [{"role": "user", "parts": [{"text": "Continue from the context above"}]}]
}
This reduces cost and latency for long repeated context.

Image generation (e.g. Gemini 2.5 Flash)

When the model supports image output, configure it via generationConfig:
{
  "contents": [{"role": "user", "parts": [{"text": "Draw a cat"}]}],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "1:1",
      "imageSize": "1K",
      "imageOutputOptions": {"mimeType": "image/png"}
    }
  }
}
Response candidates[].content.parts may include inlineData (e.g. base64 image).
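A Python sketch for saving the returned image to disk (the model ID is a placeholder; it assumes the first inlineData part carries base64 image data, as described above):

import base64
import requests

resp = requests.post(
    "https://llm.ai-nebula.com/v1beta/models/gemini-2.5-flash:generateContent",
    headers={"Authorization": "Bearer sk-xxxxxxxxxx"},
    json={
        "contents": [{"role": "user", "parts": [{"text": "Draw a cat"}]}],
        "generationConfig": {"responseModalities": ["IMAGE"]},
    },
)
resp.raise_for_status()

# Save the first inlineData part (base64) as an image file
for part in resp.json()["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        with open("cat.png", "wb") as f:
            f.write(base64.b64decode(part["inlineData"]["data"]))
        break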

Embedding API

Single: embedContent

Endpoint: POST https://llm.ai-nebula.com/v1beta/models/{model}:embedContent
Request body example:
{
  "model": "text-embedding-004",
  "content": {
    "parts": [{"text": "Text to embed"}]
  }
}
Or put model in the path: /v1beta/models/text-embedding-004:embedContent, with body containing only content.
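The response typically contains a single vector (values here are illustrative):

{
  "embedding": {
    "values": [0.0123, -0.0456, 0.0789]
  }
}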

Batch: batchEmbedContents

Endpoint: POST https://llm.ai-nebula.com/v1beta/models/{model}:batchEmbedContents
Request body example:
{
  "requests": [
    {"content": {"parts": [{"text": "First text"}]}},
    {"content": {"parts": [{"text": "Second text"}]}}
  ]
}
Response is an array, one embedding per request.
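The batch response typically looks like this (values illustrative), with embeddings in the same order as the requests:

{
  "embeddings": [
    {"values": [0.0123, -0.0456, 0.0789]},
    {"values": [0.0231, 0.0654, -0.0987]}
  ]
}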

Error handling

Errors are returned as HTTP status codes and JSON body, for example:
{
  "error": {
    "code": 400,
    "message": "Invalid request: ...",
    "status": "INVALID_ARGUMENT"
  }
}
Common cases:
| Status | Meaning |
| --- | --- |
| 400 | Invalid request (e.g. missing contents, unsupported parameter) |
| 401 | Authentication failed (invalid or missing API key) |
| 404 | Model not found or wrong path |
| 429 | Rate limited; retry later |
| 500 | Server error |
Parse error.message in your client and handle retries or user messaging accordingly.
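A simple retry sketch in Python (backoff values are arbitrary; adjust to your rate limits):

import time
import requests

def generate_with_retry(payload, api_key, model="gemini-2.5-pro", max_retries=3):
    url = f"https://llm.ai-nebula.com/v1beta/models/{model}:generateContent"
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(max_retries + 1):
        resp = requests.post(url, headers=headers, json=payload)
        # Retry on rate limiting or transient server errors
        if resp.status_code in (429, 500) and attempt < max_retries:
            time.sleep(2 ** attempt)
            continue
        if not resp.ok:
            try:
                message = resp.json()["error"]["message"]
            except (ValueError, KeyError):
                message = resp.text
            raise RuntimeError(f"HTTP {resp.status_code}: {message}")
        return resp.json()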

Comparison with OpenAI format

| Item | Gemini Native | OpenAI (/v1/chat/completions) |
| --- | --- | --- |
| Base path | /v1beta/models/{model}:generateContent | /v1/chat/completions |
| Auth | Authorization: Bearer sk-xxx or x-goog-api-key | Authorization: Bearer sk-xxx |
| Message format | contents[].parts[] (text/inlineData/fileData) | messages[].content (string or array) |
| System prompt | systemInstruction.parts | messages with role: "system" |
| Streaming | streamGenerateContent?alt=sse | stream: true |
| Thinking | thinkingConfig or model suffix | Model suffix (e.g. -thinking) |
| Tools | tools[].functionDeclarations | tools[].function (OpenAI shape) |
| Typical clients | Google SDK, custom HTTP client | OpenAI SDK, OpenAI-compatible clients |
Use the native endpoint when you rely on Google Gemini tooling or need Gemini-specific fields (e.g. thinkingConfig, native multimodal parts). Use /v1/chat/completions when you want to stay within the OpenAI ecosystem.