Create Chat Request

POST https://llm.ai-nebula.com/v1/chat/completions
curl --request POST \
  --url https://llm.ai-nebula.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "messages": [
    {"role": "user", "content": "<string>"}
  ],
  "temperature": 0.7,
  "stream": false,
  "max_tokens": 1024,
  "top_p": 1
}
'

Introduction

A universal text chat API for generating conversational responses from OpenAI-compatible large language models. Through a single unified interface, you can call multiple mainstream models, including OpenAI, Claude, DeepSeek, Grok, and Tongyi Qianwen (Qwen).

Authentication

Authorization
string
required
Bearer Token, e.g. Bearer sk-xxxxxxxxxx

Request Parameters

model
string
required
Model identifier, supported models include:
  • OpenAI series: o4-mini, o3-mini, gpt-5.2, gpt-5.1, gpt-4o, gpt-4o-mini, etc.
  • Claude series: claude-opus-4-6, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, etc.
  • DeepSeek series: deepseek-v3-1-250821, deepseek-v3, deepseek-r1, etc.
  • Grok series: grok-4, grok-4-fast-reasoning, grok-3, etc.
  • Gemini series: gemini-3-pro-preview, gemini-3-flash-preview, nano-banana-pro, plus the -thinking / -nothinking / -thinking-<budget> / -thinking-low / -thinking-high variants
  • Domestic (Chinese) models: glm-4.7, qwen3-coder-plus, kimi-k2.5, etc.
messages
array
required
Conversation message list; each element contains a role (user/system/assistant) and content
temperature
number
default:"0.7"
Randomness control, range 0-2; higher values produce more random responses
stream
boolean
default:"false"
Whether to enable streaming output; responses are returned as SSE-format chunked data (see the streaming sketch after this parameter list)
max_tokens
number
Maximum number of tokens to generate, controls response length
top_p
number
Nucleus sampling parameter, 0-1, controls generation diversity
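
Since stream: true delivers the response as SSE chunks, here is a minimal consumption sketch using the OpenAI Python SDK (the base_url points at this gateway; the model name is only an example):

from openai import OpenAI

# Point the stock OpenAI SDK at the gateway.
client = OpenAI(
    api_key="sk-xxxxxxxxxx",
    base_url="https://llm.ai-nebula.com/v1",
)

# With stream=True the SDK returns an iterator over SSE chunks,
# each carrying an incremental delta rather than the full message.
stream = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Briefly introduce artificial intelligence"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()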

Basic Examples

curl -X POST "https://llm.ai-nebula.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxxxx" \
  -d '{
    "model": "claude-opus-4-6",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Briefly introduce artificial intelligence"}
    ],
    "temperature": 0.7
  }'
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "claude-opus-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Artificial intelligence is a branch of computer science that aims to create intelligent machines..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 100,
    "total_tokens": 125
  }
}
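
The same request through the OpenAI Python SDK; a minimal sketch, where only the base_url differs from stock usage:

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxxxx",
    base_url="https://llm.ai-nebula.com/v1",  # gateway endpoint from above
)

resp = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Briefly introduce artificial intelligence"},
    ],
    temperature=0.7,
)

print(resp.choices[0].message.content)
print(resp.usage.total_tokens)  # e.g. 125 in the response above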

Advanced Features

Tool Calling (Functions / Tools)

Supports the OpenAI-compatible tool calling format, applicable to GPT, Claude, DeepSeek, Grok, Tongyi Qianwen, and other models.
curl -X POST "https://llm.ai-nebula.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxxxx" \
  -d '{
    "model": "claude-opus-4-6",
    "messages": [
      {"role": "user", "content": "What'\''s the weather in Shanghai?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather information by city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
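
When the model decides to use a tool, the response carries a tool_calls array instead of plain content; you run the function locally and send its result back as a tool message. A sketch of the round trip with the OpenAI Python SDK (the weather result here is a hypothetical stub):

import json
from openai import OpenAI

client = OpenAI(api_key="sk-xxxxxxxxxx", base_url="https://llm.ai-nebula.com/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information by city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Shanghai?"}]
first = client.chat.completions.create(
    model="claude-opus-4-6", messages=messages, tools=tools, tool_choice="auto"
)

# The model replies with a tool call; parse its JSON arguments.
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Execute the tool yourself (stubbed here), then feed the result back
# as a "tool" message that references the call id.
result = {"city": args["city"], "weather": "sunny", "temp_c": 22}
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

final = client.chat.completions.create(model="claude-opus-4-6", messages=messages)
print(final.choices[0].message.content)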

Structured Output (JSON Schema)

Supports controlling the output format through the response_format parameter, applicable to GPT, Claude, Grok, and other models.
curl -X POST "https://llm.ai-nebula.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxxxx" \
  -d '{
    "model": "claude-opus-4-6",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "Answer",
        "schema": {
          "type": "object",
          "properties": {
            "summary": {"type": "string"}
          },
          "required": ["summary"]
        }
      }
    },
    "messages": [
      {"role": "user", "content": "Return a JSON containing a summary field"}
    ]
  }'
For strict structured output, it is recommended to lower the temperature value (e.g., 0.1-0.3) and set an appropriate max_tokens to improve consistency.
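
Because the message content comes back as a JSON string conforming to the schema, it can be parsed directly; a minimal sketch with the OpenAI Python SDK:

import json
from openai import OpenAI

client = OpenAI(api_key="sk-xxxxxxxxxx", base_url="https://llm.ai-nebula.com/v1")

resp = client.chat.completions.create(
    model="claude-opus-4-6",
    temperature=0.2,  # low temperature improves schema adherence
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Answer",
            "schema": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        },
    },
    messages=[{"role": "user", "content": "Return a JSON containing a summary field"}],
)

# message.content is a JSON string matching the schema above.
data = json.loads(resp.choices[0].message.content)
print(data["summary"])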

Thinking Capability

Some models support thinking capability (Thinking/Reasoning), which can display the reasoning process when generating responses. Different models implement this differently:
DeepSeek models support enabling thinking capability through the thinking field:
curl -X POST "https://llm.ai-nebula.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxxxx" \
  -d '{
    "model": "deepseek-v3-1-250821",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Give a medium-difficulty geometry problem and solve it step by step"}
    ],
    "thinking": {"type": "enabled"}
  }'
  • thinking.type defaults to "disabled"; set it explicitly to "enabled" to turn thinking on
  • The output format of the thinking trace may vary by model version
  • Using stream: true alongside thinking is recommended for a better interactive experience (see the sketch below)
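
With the OpenAI Python SDK, the non-standard thinking field can be passed through extra_body. A hedged sketch that also streams; it assumes the gateway surfaces the reasoning trace on a reasoning_content delta field, as DeepSeek's native API does:

from openai import OpenAI

client = OpenAI(api_key="sk-xxxxxxxxxx", base_url="https://llm.ai-nebula.com/v1")

stream = client.chat.completions.create(
    model="deepseek-v3-1-250821",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Give a medium-difficulty geometry problem and solve it step by step"},
    ],
    stream=True,
    extra_body={"thinking": {"type": "enabled"}},  # non-standard field, sent as-is
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Assumption: reasoning arrives on reasoning_content, the answer on content.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)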

Tongyi Qianwen Extended Features

Tongyi Qianwen models support extended features such as web search and speech recognition. All extended parameters must be placed in the parameters object.
curl -X POST "https://llm.ai-nebula.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxxxx" \
  -d '{
    "model": "qwen3-omni-flash",
    "messages": [
      {"role": "user", "content": "Please first search for recent common misconceptions about Fermat'\''s Last Theorem, then answer"}
    ],
    "stream": true,
    "enable_thinking": true,
    "parameters": {
      "enable_search": true,
      "search_options": {
        "region": "CN",
        "recency_days": 30
      },
      "incremental_output": true
    }
  }'
All Tongyi Qianwen extended parameters (such as enable_search, search_options, asr_options, temperature, top_p) must be placed in the parameters object, not at the top level of the request body.
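
With the OpenAI Python SDK, fields like enable_thinking and parameters are not part of the standard method signature, so pass them through extra_body; a sketch:

from openai import OpenAI

client = OpenAI(api_key="sk-xxxxxxxxxx", base_url="https://llm.ai-nebula.com/v1")

stream = client.chat.completions.create(
    model="qwen3-omni-flash",
    messages=[{"role": "user", "content": "Please first search for recent common misconceptions about Fermat's Last Theorem, then answer"}],
    stream=True,
    extra_body={
        "enable_thinking": True,
        # Qwen extended parameters live inside "parameters", never top level.
        "parameters": {
            "enable_search": True,
            "search_options": {"region": "CN", "recency_days": 30},
            "incremental_output": True,
        },
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)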

Web Search Features

Some models support real-time web search, allowing access to the latest information and including citation sources in responses.

GPT File Input (Responses API)

GPT-5 and other models support file input, which must be called through the /v1/responses endpoint rather than /v1/chat/completions.
You can pass a PDF file by linking to an external URL:
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxxxx",
    # The base URL should stop at /v1; the SDK appends /responses itself.
    base_url="https://llm.ai-nebula.com/v1",
    # Keep the api-version query parameter from the original URL on every request.
    default_query={"api-version": "2025-03-01-preview"},
)

response = client.responses.create(
    model="gpt-5.2",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Analyze this letter and summarize its key points"
                },
                {
                    "type": "input_file",
                    "file_url": "https://www.example.com/document.pdf"
                }
            ]
        }
    ]
)
print(response.output_text)
  • File size limit: Single file not exceeding 50 MB, total size of all files in a single request not exceeding 50 MB
  • Supported models: gpt-4o, gpt-4o-mini, gpt-5-chat, and other models that support text and image input
  • Reasoning models (o1, o3-mini, o4-mini) should also use the /v1/responses endpoint if they need to use reasoning capability

Grok Reasoning Capability

Grok models (especially grok-4-fast-reasoning) support reasoning capability. The usage in the response distinguishes between completion_tokens and reasoning_tokens:
{
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 500,
    "total_tokens": 600,
    "completion_tokens_details": {
      "reasoning_tokens": 300
    }
  }
}
Actual output text token count = completion_tokens - reasoning_tokens (500 - 300 = 200 in the example above).
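
A small helper for reading these fields via the OpenAI Python SDK (a sketch; resp is the return value of any client.chat.completions.create call, and completion_tokens_details may be absent for models without reasoning):

def visible_output_tokens(resp) -> int:
    # completion_tokens includes the hidden reasoning tokens; subtract
    # them to get the tokens of the visible answer (500 - 300 = 200 above).
    details = resp.usage.completion_tokens_details
    reasoning = (details.reasoning_tokens or 0) if details else 0
    return resp.usage.completion_tokens - reasoning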

Response Format

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "claude-opus-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response content..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 100,
    "total_tokens": 125
  }
}

Error Handling

Exception Type      | Trigger Scenario                                     | Return Message
AuthenticationError | Invalid or unauthorized API key                      | Error: Invalid or unauthorized API key
NotFoundError       | Model does not exist or is not supported             | Error: Model [model] does not exist or is not supported
APIConnectionError  | Network interruption or server not responding        | Error: Cannot connect to API server
APIError            | Request format error or other server-side exception  | API request failed: [error details]
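
These names match the exception classes raised by the openai Python SDK, so they can be caught directly; a sketch:

import openai
from openai import OpenAI

client = OpenAI(api_key="sk-xxxxxxxxxx", base_url="https://llm.ai-nebula.com/v1")

try:
    resp = client.chat.completions.create(
        model="claude-opus-4-6",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)
except openai.AuthenticationError as e:
    print(f"Invalid or unauthorized API key: {e}")
except openai.NotFoundError as e:
    print(f"Model does not exist or is not supported: {e}")
except openai.APIConnectionError as e:
    print(f"Cannot connect to API server: {e}")
except openai.APIError as e:  # catch-all for other server-side errors
    print(f"API request failed: {e}")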

Supported Model Series

OpenAI Series

  • GPT-4.1, GPT-4o, GPT-4o Mini, GPT-3.5-turbo
  • Reasoning models: o3-mini, o4-mini (need to use /v1/responses endpoint)

Claude Series (Anthropic)

  • Claude Sonnet 4, Claude 3 Opus, Claude 3 Haiku

DeepSeek Series

  • DeepSeek V3, DeepSeek R1

Grok Series (xAI)

  • Grok-4, Grok-3, Grok-3-fast, Grok-4-fast-reasoning

Tongyi Qianwen Series (Qwen)

  • Qwen3-omni-flash, etc.

Other Models

  • Gemini series, GLM series, Kimi series, etc.
For the complete model list, please see the Model Information Page.

Notes

  • In the messages list, the system role sets model behavior and the user role carries the user's questions
  • Multi-turn conversations require appending the history (including assistant role responses); see the sketch after these notes
  • The Python examples require the openai library: pip install openai
  • Different models may support certain features to different degrees; check the specific model documentation before use
  • Streaming output improves first-token latency and the interactive experience
  • Wrap tool execution in timeout and retry logic so a slow tool does not block the model's response
  • Tongyi Qianwen extended parameters must be placed in the parameters object
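
A minimal multi-turn sketch, carrying the history forward between turns:

from openai import OpenAI

client = OpenAI(api_key="sk-xxxxxxxxxx", base_url="https://llm.ai-nebula.com/v1")

# Keep the running history and append each assistant reply before the next turn.
history = [{"role": "system", "content": "You are a helpful assistant"}]

for question in ["Briefly introduce artificial intelligence", "Give one concrete example"]:
    history.append({"role": "user", "content": question})
    resp = client.chat.completions.create(model="claude-opus-4-6", messages=history)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(answer)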