The chat completions endpoint is the primary way to generate text with Qubax. It is fully OpenAI-compatible, so any client or SDK built for the OpenAI Chat API works unchanged — just point it at the Qubax base URL.
POST https://api.qubax.ai/v1/chat/completions

All requests require an API key passed as a Bearer token in the Authorization header:
Authorization: Bearer $QUBAX_API_KEY
Content-Type: application/json

The request body is a JSON object. The table below describes the most commonly used parameters.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use (e.g. gpt-5.5). |
| messages | array | Yes | Conversation messages, each with a role and content. |
| temperature | number | No | Sampling temperature, 0–2. Higher is more random. Default 1. |
| top_p | number | No | Nucleus sampling mass, 0–1. Use this or temperature, not both. |
| max_tokens | integer | No | Maximum tokens to generate in the completion. |
| stream | boolean | No | Stream partial deltas as Server-Sent Events. Default false. |
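
The ranges in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that, not part of the Qubax API: `build_chat_request` is a hypothetical helper, and rejecting a body that sets both `temperature` and `top_p` is a client-side choice based on the table's advice (the server may well accept both).

```python
def build_chat_request(model, messages, temperature=None, top_p=None,
                       max_tokens=None, stream=False):
    """Assemble a request body, checking the ranges listed in the table.

    Hypothetical helper for illustration; the checks are client-side only.
    """
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and top_p is not None:
        # The table recommends using one or the other, not both.
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    body = {"model": model, "messages": messages}
    # Optional parameters are only included when explicitly set,
    # so the server-side defaults apply otherwise.
    if temperature is not None:
        body["temperature"] = temperature
    if top_p is not None:
        body["top_p"] = top_p
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True
    return body


body = build_chat_request(
    "gpt-5.5",
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
    max_tokens=256,
)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the POST body.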
stop, n, presence_penalty, and frequency_penalty are also accepted.

The messages array represents the conversation as an ordered list of role-tagged turns. Each message object has a role and a content field.

- system: sets the assistant's behavior and persona. Usually the first message.
- user: a message from the end user.
- assistant: a prior response from the model, included to maintain multi-turn context.

curl https://api.qubax.ai/v1/chat/completions \
  -H "Authorization: Bearer $QUBAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a concise, helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

The equivalent call using the OpenAI Python SDK:
from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a concise, helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)

A successful non-streaming request returns a JSON object containing the generated message, token usage, and metadata, matching the OpenAI shape exactly.
{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1719792000,
  "model": "gpt-5.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 7,
    "total_tokens": 32
  }
}

The generated text lives at choices[0].message.content. The finish_reason field tells you why generation stopped: stop means the model stopped naturally, and length means the max_tokens limit was reached.
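
To keep multi-turn context, the reply can be appended to the messages array as an assistant turn before the next request. The following is a minimal sketch that works on the parsed JSON body (a plain dictionary, not an SDK object); `record_reply` is a hypothetical helper name, and the sample response is a trimmed copy of the example above.

```python
def record_reply(history, response):
    """Append the assistant's reply to history and report truncation.

    Returns (text, truncated), where truncated is True when
    finish_reason is "length", i.e. max_tokens cut the reply short.
    """
    choice = response["choices"][0]
    text = choice["message"]["content"]
    history.append({"role": "assistant", "content": text})
    truncated = choice["finish_reason"] == "length"
    return text, truncated


# Sample response, trimmed from the example above.
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris.",
            },
            "finish_reason": "stop",
        }
    ],
}

history = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

text, truncated = record_reply(history, response)
print(text)  # The capital of France is Paris.
```

After this call, `history` holds three turns and can be sent as `messages` together with the next user message. When `truncated` is True, raising `max_tokens` is one option.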
Errors return a standard HTTP status code plus a JSON body that describes the problem. The shape mirrors the OpenAI error format, so existing error-handling logic carries over.
{
  "error": {
    "message": "Invalid model id: gpt-99",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}

| Status | Meaning |
|---|---|
| 400 | Malformed request body or invalid parameter. |
| 401 | Missing or invalid API key. |
| 404 | Unknown model id. |
| 429 | Rate limit exceeded — see Rate Limits. |
| 500/503 | Server-side error; retry with backoff. |
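from openai import OpenAI, APIError, APIStatusError, RateLimitError

client = OpenAI(api_key="qbx_live_...", base_url="https://api.qubax.ai/v1")

try:
    response = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except RateLimitError as e:
    # HTTP 429
    print("Rate limited; slow down and retry.", e)
except APIStatusError as e:
    # Any other non-2xx response; status_code is available here.
    print("API error:", e.status_code, e.message)
except APIError as e:
    # Connection failures and other errors that carry no HTTP status.
    print("Request failed:", e.message)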
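
The status table recommends retrying server-side errors with backoff, and the same approach is commonly used for 429. Below is a minimal sketch of exponential backoff with jitter. It is not tied to any particular HTTP client: `HTTPStatusError`, `call_with_backoff`, and the `send` callable are hypothetical names, and the attempt count and delays are illustrative values rather than Qubax recommendations.

```python
import random
import time

# Statuses the table marks as transient.
RETRYABLE_STATUSES = {429, 500, 503}


class HTTPStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, retrying transient failures.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...)
    with random jitter added so that many clients do not retry in lockstep.
    Non-retryable statuses and the final failed attempt are re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except HTTPStatusError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status_code not in RETRYABLE_STATUSES or last_attempt:
                raise
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 2))
```

In practice, `send` would wrap the actual request and translate a failed response into `HTTPStatusError`. The OpenAI Python SDK also retries some failures on its own (see its `max_retries` client option), so a wrapper like this is mainly useful with a plain HTTP client.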