Streaming Responses

Streaming lets you display tokens to your users as soon as they are generated, rather than waiting for the entire response to finish. This dramatically reduces perceived latency for interactive applications. Qubax streams responses using Server-Sent Events (SSE), the same protocol OpenAI's Chat Completions API uses.

How Streaming Works

When you enable streaming, Qubax keeps the HTTP connection open and emits a series of events. Each event carries a JSON object representing an incremental delta, usually a few tokens of the assistant's message. The stream ends with a final data: [DONE] event, whose payload is the literal string [DONE] rather than JSON.

On the wire, each event is a line prefixed with data: and followed by a blank line. Because the wire format is identical to OpenAI's, any OpenAI client that supports streaming works against Qubax unchanged.
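The framing described above can be sketched as a small line-based parser. iter_sse_data is a hypothetical helper, not part of any Qubax or OpenAI SDK. It follows the general SSE rules (a blank line ends an event, lines starting with ":" are comments, several data: lines in one event are joined with newlines) and, unlike a strict SSE parser, it also flushes an event left pending when the input ends.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event found in a stream of lines.

    Hypothetical helper: other SSE fields (event, id, retry) are ignored.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE drops one space after the colon
        if field == "data":
            buffer.append(value)
    if buffer:
        # Lenient choice: emit an event not followed by a blank line.
        yield "\n".join(buffer)
```

You could feed it resp.iter_lines() from an httpx streaming response, then call json.loads on every payload except [DONE].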

Enabling Streaming

Set stream to true in the request body. All other parameters behave the same as a non-streaming request.

Shell
# -N (--no-buffer) makes curl print each chunk as soon as it arrives.
curl -N https://api.qubax.ai/v1/chat/completions \
  -H "Authorization: Bearer $QUBAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Count to 5 slowly."}],
    "stream": true
  }'

The raw response is a stream of SSE chunks. An abridged example looks like this:

Text
data: {"id":"chatcmpl_abc","object":"chat.completion.chunk","created":1719792000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl_abc","object":"chat.completion.chunk","created":1719792000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl_abc","object":"chat.completion.chunk","created":1719792000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}
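
...

data: {"id":"chatcmpl_abc","object":"chat.completion.chunk","created":1719792000,"model":"gpt-5.5","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]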

Python Example

The OpenAI SDK handles SSE parsing for you. Pass stream=True and iterate over the returned stream object:

Python
from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)

stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()
ℹ️
The first chunk usually contains only delta.role and no content. Subsequent chunks carry text in delta.content. Always check that content is not None before printing.
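If you also need the complete reply once streaming finishes, one option is to collect the deltas while printing them. collect_reply is a hypothetical helper; it assumes SDK-style chunk objects exposing choices[0].delta.content and choices[0].finish_reason, as the OpenAI Python SDK's chunks do, and it skips the content-less first chunk described above.

```python
from typing import Any, Iterable, List, Optional, Tuple


def collect_reply(chunks: Iterable[Any], echo: bool = True) -> Tuple[str, Optional[str]]:
    """Optionally print each delta as it arrives; return (full_text, finish_reason)."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensive: skip any chunk that carries no choices
        choice = chunk.choices[0]
        content = choice.delta.content
        if content is not None:  # the first chunk usually has only delta.role
            parts.append(content)
            if echo:
                print(content, end="", flush=True)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    if echo:
        print()
    return "".join(parts), finish_reason
```

With the stream from the example above, text, reason = collect_reply(stream) prints the haiku as before and also returns it as a single string.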

Parsing SSE Chunks Manually

If you are not using an OpenAI SDK — for example in a custom backend or a language without first-class support — you can parse the SSE stream yourself. Read the body line by line, strip the data: prefix from each line, decode the JSON, and stop when you encounter the literal [DONE] sentinel.

Python
import json
import httpx

url = "https://api.qubax.ai/v1/chat/completions"
headers = {
    "Authorization": "Bearer qbx_live_...",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Say hello in three languages."}],
    "stream": True,
}

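# Generation can pause between chunks, so raise httpx's default 5-second timeout.
with httpx.Client(timeout=60.0) as client:
    with client.stream("POST", url, headers=headers, json=payload) as resp:
        resp.raise_for_status()  # surface HTTP errors instead of an empty loop
        for line in resp.iter_lines():
            # Skip blank separators and any non-data lines.
            if not line.startswith("data:"):
                continue
            data = line[len("data:"):].strip()
            if data == "[DONE]":  # end-of-stream sentinel, not JSON
                break
            chunk = json.loads(data)
            if not chunk.get("choices"):
                continue
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                print(delta, end="", flush=True)
print()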

Key rules when parsing manually:

  • Ignore blank lines and any line not starting with data:.
  • Treat data: [DONE] as the end of the stream.
  • Concatenate every delta.content to rebuild the full message.
  • The last chunk before data: [DONE] carries a non-null finish_reason.
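The four rules above fit in one function that accepts any iterable of decoded text lines, such as resp.iter_lines() from the httpx example. rebuild_message is a hypothetical name, and the sketch assumes a single choice per chunk (index 0), as in every example on this page.

```python
import json
from typing import Iterable, List, Optional, Tuple


def rebuild_message(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Apply the parsing rules above to decoded SSE lines.

    Returns the concatenated delta.content of the first choice and the
    last non-null finish_reason seen before data: [DONE].
    """
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for line in lines:
        # Rule 1: blank lines and non-data lines are skipped.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        # Rule 2: the [DONE] sentinel ends the stream (it is not JSON).
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        # Rule 3: concatenate every content delta.
        content = choice.get("delta", {}).get("content")
        if content:
            parts.append(content)
        # Rule 4: keep the non-null finish_reason from the closing chunk.
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

Returning finish_reason alongside the text lets you tell whether the model finished normally or the reply was cut short, assuming Qubax reports the reason the same way OpenAI does.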
⚠️
Streaming responses do not include a usage object. If you need token counts, either disable streaming or count tokens separately on your side.