Qubax enforces rate limits per API key to protect shared infrastructure and ensure fair access for all users. This page explains how limits are applied, how to interpret a 429 response, and how to build a reliable retry strategy.
Limits are tracked against your API key (the qbx_live_... token), not against individual requests or sessions. Every request made with a key draws from that key's shared quota. Two dimensions are enforced together: requests per minute and tokens per minute.
Your specific limits depend on your plan and tier. Every response includes headers that tell you exactly where you stand:
| Header | Meaning |
|---|---|
| x-ratelimit-limit-requests | Max requests per minute for your key. |
| x-ratelimit-remaining-requests | Requests remaining in the current window. |
| x-ratelimit-limit-tokens | Max tokens per minute for your key. |
| x-ratelimit-remaining-tokens | Tokens remaining in the current window. |
| x-ratelimit-reset-requests | Time until the request window resets. |
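
As a minimal sketch of how these headers might be checked after each response, the helper below takes a plain mapping of response headers and reports whether either budget is nearly used up. The function names, the 10% threshold, and the sample values are illustrative assumptions, not part of the Qubax API. The sketch also assumes lowercase header keys and integer header values; how you obtain the headers depends on your HTTP client.

```python
from typing import Mapping, Optional


def _remaining_fraction(headers: Mapping[str, str], kind: str) -> Optional[float]:
    """Return remaining/limit for 'requests' or 'tokens', or None if unavailable."""
    try:
        limit = int(headers[f"x-ratelimit-limit-{kind}"])
        remaining = int(headers[f"x-ratelimit-remaining-{kind}"])
    except (KeyError, ValueError):
        return None
    if limit <= 0:
        return None
    return remaining / limit


def near_limit(headers: Mapping[str, str], threshold: float = 0.1) -> bool:
    """True when either the request or the token budget is at or below `threshold`."""
    fractions = [_remaining_fraction(headers, kind) for kind in ("requests", "tokens")]
    return any(f is not None and f <= threshold for f in fractions)


# Made-up sample values for illustration only.
example = {
    "x-ratelimit-limit-requests": "60",
    "x-ratelimit-remaining-requests": "3",
    "x-ratelimit-limit-tokens": "90000",
    "x-ratelimit-remaining-tokens": "45000",
}
print(near_limit(example))  # True: 3/60 = 0.05 is below the 0.1 threshold
```

Checking both dimensions matters because a key can run out of tokens while it still has requests left, or the other way around.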

If an x-ratelimit-remaining-* value approaches zero, pause briefly before sending the next request.

When you exceed a limit, Qubax responds with HTTP 429 Too Many Requests and a JSON body describing the violation. The response also includes a Retry-After header giving the recommended wait time in seconds.

```json
{
  "error": {
    "message": "Rate limit reached for qbx_live_... at 60 requests per minute.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```

Treat a 429 as a signal to slow down, not as a fatal error. The request was never processed, so it is safe to retry once the window resets.
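
One way to turn a 429 into a wait time is sketched below. It uses only the standard library and assumes you already have the status code, headers, and raw body from your HTTP client. The function name `parse_429`, the lowercase header key, and the 1-second fallback are illustrative assumptions.

```python
import json
from typing import Mapping, Optional, Tuple


def parse_429(status: int, headers: Mapping[str, str], body: str,
              default_wait: float = 1.0) -> Optional[Tuple[float, str]]:
    """Return (seconds_to_wait, error_code) for a 429, or None for other statuses."""
    if status != 429:
        return None
    # Retry-After is documented as a number of seconds; fall back if missing or malformed.
    try:
        wait = float(headers.get("retry-after", ""))
    except ValueError:
        wait = default_wait
    # The error body follows the shape shown above; tolerate unexpected payloads.
    try:
        code = json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        code = "unknown"
    return max(wait, 0.0), code


body = '{"error": {"message": "Rate limit reached", "type": "rate_limit_error", "code": "rate_limit_exceeded"}}'
print(parse_429(429, {"retry-after": "12"}, body))  # (12.0, 'rate_limit_exceeded')
```

Returning `None` for other statuses keeps the decision about non-429 errors with the caller.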
A robust client retries transient failures automatically. The recommended approach is exponential backoff with jitter: double the delay after each failure and add a small random offset so that many clients do not retry in lockstep.
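
```python
import random
import time

from openai import APIConnectionError, APITimeoutError, InternalServerError, OpenAI, RateLimitError

# max_retries=0 turns off the SDK's built-in retries so this loop is the only retry layer.
client = OpenAI(api_key="qbx_live_...", base_url="https://api.qubax.ai/v1", max_retries=0)

MAX_RETRIES = 5
# 429, 5xx, connection errors, and timeouts are treated as transient.
RETRYABLE = (RateLimitError, InternalServerError, APIConnectionError, APITimeoutError)


def retry_after_seconds(error):
    """Return the Retry-After value in seconds, or None if absent or malformed."""
    response = getattr(error, "response", None)  # connection errors carry no response
    if response is None:
        return None
    try:
        return float(response.headers.get("retry-after", ""))
    except ValueError:
        return None


def chat_with_retry(**kwargs):
    delay = 1.0
    for attempt in range(MAX_RETRIES):
        try:
            return client.chat.completions.create(**kwargs)
        except RETRYABLE as e:
            if attempt == MAX_RETRIES - 1:
                raise
            # Respect Retry-After when available, otherwise back off with jitter.
            wait = retry_after_seconds(e)
            if wait is None:
                wait = delay + random.uniform(0, 0.5)
            time.sleep(wait)
            delay *= 2


response = chat_with_retry(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Best practices for production traffic: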
- Retry only on 429, 5xx, timeouts, and connection errors, never on other 4xx client errors like 400 or 401.
- Honor the Retry-After header whenever it is present.
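
The rules above can be condensed into two small helpers. This is a sketch under stated assumptions: the names `is_retryable` and `next_delay`, the 30-second cap, and the 0.5-second jitter range are illustrative choices, not Qubax requirements.

```python
import random
from typing import Optional


def is_retryable(status: Optional[int]) -> bool:
    """429 and 5xx are retryable; None stands for a timeout or connection error."""
    if status is None:
        return True
    return status == 429 or 500 <= status <= 599


def next_delay(attempt: int, retry_after: Optional[float] = None,
               base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return retry_after  # the server's hint takes priority
    # Exponential backoff (base * 2^attempt), capped, plus a small random jitter.
    return min(cap, base * 2 ** attempt) + random.uniform(0, 0.5)
```

Capping the exponential term keeps a long outage from producing multi-minute sleeps, while the jitter spreads retries from many clients across the window.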