Streaming lets you display tokens to your users as soon as they are generated, rather than waiting for the entire response to finish. This dramatically reduces perceived latency in interactive applications. Qubax streams responses using Server-Sent Events (SSE), the same protocol the OpenAI Chat Completions API uses.
When you enable streaming, Qubax keeps the HTTP connection open and emits a series of events. Each event carries a JSON object representing an incremental delta, usually a few tokens of the assistant's message. The stream ends with a special `[DONE]` marker.
Events use standard SSE framing: each event's payload is prefixed with `data:`, and consecutive events are separated by a blank line. Because the wire format matches OpenAI's, OpenAI client libraries that support streaming should work against Qubax without changes.
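The framing rules are simple enough to handle without an SSE library. The following sketch uses only the Python standard library to split a raw SSE body into event payloads. The function name `iter_sse_payloads` is hypothetical, and the parser follows the general SSE specification (several `data:` lines in one event are joined with a newline) rather than anything Qubax-specific. It returns payloads as plain strings, so `[DONE]` comes through unchanged for the caller to handle.

```python
from typing import Iterator


def iter_sse_payloads(raw: str) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in `raw`.

    Hypothetical helper. Events are separated by a blank line; within one
    event, every "data:" line contributes to the payload, multiple data lines
    are joined with "\n", and a single space after the colon is dropped.
    Other fields (event:, id:, ":" comments) are ignored.
    """
    # SSE allows CRLF, LF, or CR line endings; normalise them to LF first.
    lines = raw.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    data_lines: list[str] = []
    for line in lines:
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
    # Lenient: flush a final event even without a trailing blank line
    # (a strict SSE parser would discard it).
    if data_lines:
        yield "\n".join(data_lines)


raw = 'data: {"choices":[{"index":0,"delta":{"content":"Hi"}}]}\n\ndata: [DONE]\n\n'
for payload in iter_sse_payloads(raw):
    print(payload)
# prints:
# {"choices":[{"index":0,"delta":{"content":"Hi"}}]}
# [DONE]
```

Returning raw strings rather than decoded JSON keeps framing separate from interpretation. The caller decides how to treat `[DONE]` and when to call `json.loads`.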
Set `stream` to `true` in the request body. All other parameters behave the same as in a non-streaming request.
```shell
curl -N https://api.qubax.ai/v1/chat/completions \
  -H "Authorization: Bearer $QUBAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Count to 5 slowly."}],
    "stream": true
  }'
```

The `-N` flag turns off curl's output buffering so each event prints as soon as it arrives. The raw response is a stream of SSE events that looks like this:
```text
data: {"id":"chatcmpl_abc","object":"chat.completion.chunk","created":1719792000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl_abc","object":"chat.completion.chunk","created":1719792000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl_abc","object":"chat.completion.chunk","created":1719792000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}

data: [DONE]
```

The OpenAI Python SDK handles SSE parsing for you. Pass `stream=True` and iterate over the returned stream object:
```python
from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)

stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    stream=True,
)

# Each chunk holds an incremental delta; print text as soon as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()
```

The first chunk usually carries only `delta.role` and no content. Subsequent chunks carry text in `delta.content`, so always check that `content` is not `None` before printing.

If you are not using an OpenAI SDK, for example in a custom backend or in a language without first-class support, you can parse the SSE stream yourself. Read the body line by line, strip the `data:` prefix from each line, decode the JSON, and stop when you encounter the literal `[DONE]` sentinel.
```python
import json

import httpx

url = "https://api.qubax.ai/v1/chat/completions"
headers = {
    "Authorization": "Bearer qbx_live_...",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Say hello in three languages."}],
    "stream": True,
}

# httpx's default 5-second timeout may expire before the first token arrives,
# so allow a longer wait for streaming responses.
with httpx.Client(timeout=60.0) as client:
    with client.stream("POST", url, headers=headers, json=payload) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # Skip blank separator lines and any non-data SSE fields.
            if not line.startswith("data:"):
                continue
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                print(delta, end="", flush=True)
print()
```

Key rules when parsing manually:
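- Skip blank lines and lines that do not start with `data:`.
- Treat `data: [DONE]` as the end of the stream.
- Concatenate each `delta.content` to rebuild the full message.
- Check `finish_reason`: it stays `null` while text is still arriving and reports why generation stopped once it does.

Streamed responses do not include a `usage` object. If you need token counts, either disable streaming or count tokens separately on your side.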
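Putting these rules together, the following sketch rebuilds the full assistant message from a sequence of raw SSE lines. It is a minimal illustration, not part of any SDK: `accumulate_stream` is a hypothetical name, it tracks only the choice with `index` 0, and it assumes (as in OpenAI's format) that the chunk ending generation carries a non-null `finish_reason`. Because it accepts any iterable of strings, you could pass it `resp.iter_lines()` from an httpx streaming response as well as a plain list.

```python
import json
from typing import Iterable, Optional, Tuple


def accumulate_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Rebuild the assistant message and finish_reason from raw SSE lines.

    Hypothetical helper: skips non-data lines, stops at `data: [DONE]`,
    concatenates every delta.content for choice 0, and keeps the last
    non-null finish_reason seen for that choice.
    """
    parts: list[str] = []
    finish_reason: Optional[str] = None
    for line in lines:
        # Blank separators and other SSE fields carry no chunk data.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # This sketch only tracks the first choice (index 0).
            if choice.get("index", 0) != 0:
                continue
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Trimmed copies of the sample events shown earlier (id, object, created,
# and model fields omitted).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
text, reason = accumulate_stream(sample)
print(repr(text), reason)  # prints: '1, 2' None
```

`reason` comes back as `None` here because none of the sample events sets `finish_reason`; in a complete stream you would expect the last chunk before `[DONE]` to fill it in.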