Chat Completions

The chat completions endpoint is the primary way to generate text with Qubax. It is fully OpenAI-compatible: any client or SDK built for the OpenAI Chat API works once its base URL is set to the Qubax base URL, with no other code changes.

Endpoint

Text

POST https://api.qubax.ai/v1/chat/completions

All requests require an API key passed as a Bearer token in the Authorization header:

Text

Authorization: Bearer $QUBAX_API_KEY
Content-Type: application/json
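
As a minimal sketch of the header setup above, the helper below builds both headers and falls back to the QUBAX_API_KEY environment variable when no key is passed. The name build_headers is hypothetical and chosen for illustration; it is not part of any Qubax or OpenAI SDK.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers for a Qubax API request.

    build_headers is an illustrative helper, not an SDK function.
    """
    # Fall back to the QUBAX_API_KEY environment variable when no key is passed.
    key = api_key or os.environ.get("QUBAX_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set QUBAX_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; the explicit argument is there mainly for tests.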

Request Body

The request body is a JSON object. The table below describes the most commonly used parameters.

Parameter	Type	Required	Description
model	string	Yes	ID of the model to use (e.g. `gpt-5.5`).
messages	array	Yes	Conversation messages, each with a role and content.
temperature	number	No	Sampling temperature, 0–2. Higher is more random. Default 1.
top_p	number	No	Nucleus sampling mass, 0–1. Use this or temperature, not both.
max_tokens	integer	No	Maximum tokens to generate in the completion.
stream	boolean	No	Stream partial deltas as Server-Sent Events. Default `false`.

Note: Other OpenAI-compatible parameters such as stop, n, presence_penalty, and frequency_penalty are also accepted.
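
The parameter table can be turned into a small client-side check before a request is sent. The sketch below assumes a hypothetical helper named build_payload; it only enforces the ranges stated in the table (temperature 0 to 2, top_p 0 to 1) and omits unset optional parameters so the server defaults apply. The server performs its own validation regardless.

```python
import json
from typing import Any, Dict, List, Optional


def build_payload(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Assemble a request body, checking the ranges listed in the table.

    Hypothetical helper for illustration only.
    """
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    payload: Dict[str, Any] = {"model": model, "messages": messages}
    # Only include optional parameters that were explicitly set, so the
    # server-side defaults apply to everything else.
    optional = {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True
    return payload


body = build_payload(
    "gpt-5.5",
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
    max_tokens=256,
)
print(json.dumps(body, indent=2))
```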

Messages Format

The messages array represents the conversation as an ordered list of role-tagged turns. Each message object has a role field and a content field. The supported roles are:

system — sets the assistant's behavior and persona. Usually the first message.
user — a message from the end user.
assistant — a prior response from the model, included to maintain multi-turn context.

Shell

curl https://api.qubax.ai/v1/chat/completions \
  -H "Authorization: Bearer $QUBAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a concise, helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

The equivalent call using the OpenAI Python SDK:

Python

from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a concise, helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)

print(response.choices[0].message.content)
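
Multi-turn context, as described under Messages Format, is kept by resending the whole history with each request. The sketch below shows one way to accumulate that history; start_conversation and add_turn are hypothetical helper names, and the assistant text is hard-coded here where a real exchange would use choices[0].message.content.

```python
from typing import Dict, List

Message = Dict[str, str]


def start_conversation(system_prompt: str) -> List[Message]:
    """Begin a history with the system message first."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended."""
    # Only the three roles listed in this section are accepted by this sketch.
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "content": content}]


history = start_conversation("You are a concise, helpful assistant.")
history = add_turn(history, "user", "What is the capital of France?")
# In a real exchange this text would come from the previous response.
history = add_turn(history, "assistant", "The capital of France is Paris.")
history = add_turn(history, "user", "And what is its population?")

print([m["role"] for m in history])
# prints ['system', 'user', 'assistant', 'user']
```

The resulting list can be passed directly as the messages argument of the next request.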

Response Format

A successful non-streaming request returns a JSON object containing the generated message, token usage, and metadata — matching the OpenAI shape exactly.

JSON

{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1719792000,
  "model": "gpt-5.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 7,
    "total_tokens": 32
  }
}

The generated text lives at choices[0].message.content. The finish_reason field tells you why generation stopped: stop means the model reached a natural stopping point, and length means the max_tokens limit was reached.
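
When working with the raw JSON rather than an SDK object, the same fields can be read with plain dictionary access. This is a minimal sketch using a trimmed copy of the sample response above; extract_reply is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Tuple

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 7, "total_tokens": 32}
}
"""


def extract_reply(body: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (text, truncated) for the first choice.

    truncated is True when finish_reason is "length", meaning the
    max_tokens limit cut the completion short.
    """
    choice = body["choices"][0]
    return choice["message"]["content"], choice["finish_reason"] == "length"


body = json.loads(SAMPLE)
text, truncated = extract_reply(body)
print(text)
print("truncated:", truncated)
print("tokens used:", body["usage"]["total_tokens"])
```

Checking for a length finish is useful when the output feeds a parser, since a truncated completion may be incomplete JSON or an unfinished sentence.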

Error Handling

A failed request returns a standard HTTP status code and a JSON body that describes the problem. The shape mirrors the OpenAI error format, so existing error-handling logic carries over.

JSON

{
  "error": {
    "message": "Invalid model id: gpt-99",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}

Status	Meaning
400	Malformed request body or invalid parameter.
401	Missing or invalid API key.
404	Unknown model id.
429	Rate limit exceeded — see Rate Limits.
500/503	Server-side error; retry with backoff.

Python

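from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(api_key="qbx_live_...", base_url="https://api.qubax.ai/v1")

try:
    response = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except RateLimitError as e:
    # HTTP 429. Caught first because it is a subclass of APIStatusError.
    print("Rate limited; slow down and retry.", e)
except APIStatusError as e:
    # Any other non-2xx response. status_code exists only on status errors.
    print("API error:", e.status_code, e.message)
except APIConnectionError as e:
    # No response was received (network failure or timeout).
    print("Connection error:", e)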
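
The status table recommends retrying server-side errors with backoff. The following is a sketch of that idea under stated assumptions, not a prescribed Qubax policy: the retryable set (429, 500, 503) is taken from the table, while the delay values, the jitter, and the names backoff_delay and call_with_retries are illustrative choices. The send argument is a hypothetical zero-argument callable returning a (status_code, body) pair.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {429, 500, 503}  # taken from the status table above


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential delay in seconds for a zero-based attempt number, capped at cap."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        # Add up to 10% random jitter so concurrent clients do not retry in lockstep.
        delay = backoff_delay(attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


# Simulated sequence of responses; sleeping is stubbed out for the demo.
responses = iter([(503, "unavailable"), (429, "slow down"), (200, "ok")])
status, body = call_with_retries(lambda: next(responses), sleep=lambda s: None)
print(status, body)
# prints 200 ok
```

The OpenAI Python SDK also has a max_retries client option that applies its own retry policy, so a hand-written loop like this is mostly relevant for raw HTTP clients.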
