Audio: TTS & Speech-to-Text

Qubax exposes two audio endpoints: one that generates spoken audio from text (Text-to-Speech) and one that transcribes uploaded audio into text (Speech-to-Text). Both are OpenAI-compatible and accept your qbx_live_... key as a Bearer token in the Authorization header.

Text

Authorization: Bearer qbx_live_...

Text-to-Speech

Generate spoken audio from input text. The response body is the raw audio bytes in the requested format (no JSON wrapper).

Text

POST https://api.qubax.ai/v1/audio/speech

Parameter	Type	Required	Description
model	string	Yes	TTS model to use (e.g. `tts-1`).
input	string	Yes	The text to synthesize. Max 4,096 characters.
voice	string	No	Voice preset: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`. Default `alloy`.
response_format	string	No	Output audio format: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`. Default `mp3`.
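
As a rough illustration of how the parameters in the table map onto the request body, the sketch below builds the POST using only Python's standard library. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical and not part of any Qubax SDK; the endpoint URL, parameter names, and defaults are taken from the table above.

```python
import json
import urllib.request

SPEECH_URL = "https://api.qubax.ai/v1/audio/speech"


def build_speech_request(api_key, text, model="tts-1", voice="alloy",
                         response_format="mp3"):
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, out_path, **options):
    """Send the request and write the raw audio bytes to out_path."""
    request = build_speech_request(api_key, text, **options)
    with urllib.request.urlopen(request) as response:
        audio_bytes = response.read()  # raw audio, no JSON wrapper
    with open(out_path, "wb") as out_file:
        out_file.write(audio_bytes)
    return len(audio_bytes)
```

Under these assumptions, `synthesize_to_file("qbx_live_...", "Hello", "speech.wav", response_format="wav")` would request WAV output and save it to speech.wav.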

Python SDK Example

Python

from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world, this is Qubax text-to-speech.",
)

# The SDK exposes the raw bytes via .content
Path("speech.mp3").write_bytes(response.content)
print("Wrote speech.mp3")

cURL Example

Shell

curl https://api.qubax.ai/v1/audio/speech \
  -H "Authorization: Bearer qbx_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world, this is Qubax text-to-speech.",
    "voice": "alloy"
  }' \
  -o speech.mp3

Because the response is raw audio, point your HTTP client's output straight at a file (-o speech.mp3 in cURL, or write response.content to disk in Python).
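
Since input is limited to 4,096 characters, longer text has to be sent as several requests. One possible approach, sketched below, splits the text on sentence boundaries and falls back to a hard cut for any single sentence that exceeds the limit. The function name `split_for_tts` is hypothetical, and the sentence-boundary regex is a simple heuristic rather than anything the API prescribes.

```python
import re

MAX_INPUT_CHARS = 4096  # limit from the parameter table


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; a single sentence longer than the limit
    is cut hard at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. Joining the resulting audio files is left to the caller.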

Speech-to-Text

Transcribe an audio file into text. This endpoint accepts multipart/form-data with a file upload and returns a JSON object containing the transcript.

Text

POST https://api.qubax.ai/v1/audio/transcriptions

Parameter	Type	Required	Description
file	file	Yes	The audio file to transcribe. Max 25 MB.
model	string	Yes	Transcription model to use (e.g. `whisper-1`).

Supported formats
`wav` `mp3` `mp4` `m4a` `ogg` `webm` `flac`

⚠️ The file upload is capped at 25 MB. For longer recordings, split the file before uploading.
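
A quick local check before uploading can catch the two most likely rejections: an unsupported format and an oversized file. The sketch below is one way to do that. The name `check_upload` is hypothetical, the format test looks only at the file extension, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption, since the exact byte count is not stated here.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" means 25 MiB
SUPPORTED_EXTENSIONS = {"wav", "mp3", "mp4", "m4a", "ogg", "webm", "flac"}


def check_upload(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of problems that would likely cause a rejected upload."""
    problems = []
    # Extension-based check only; the file contents are not inspected.
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte cap")
    return problems
```

An empty list means the file passed both checks. Anything else is a reason to convert or split the recording first.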

Python SDK Example

Python

from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)

with open("audio.mp3", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(result.text)

cURL Example

Shell

curl https://api.qubax.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer qbx_live_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Transcription Response

A successful transcription returns a JSON object with the full text:

JSON

{
  "text": "The quick brown fox jumps over the lazy dog."
}

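To receive segments with per-phrase timestamps, add response_format=verbose_json to the form data (-F "response_format=verbose_json" in cURL, or response_format="verbose_json" in the Python SDK). Use srt or vtt instead for caption-ready text.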
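
To show what working with timestamped segments might look like, the sketch below formats a verbose_json body into one line per segment. It assumes the response follows the OpenAI-style shape, with a top-level `segments` list whose items carry `start` and `end` (in seconds) and `text`. That shape is an assumption based on the endpoint being OpenAI-compatible, and the sample payload is hand-written rather than real API output. The function name `format_segments` is hypothetical.

```python
import json


def format_segments(verbose_json_body):
    """Turn a verbose_json transcription body into '[start - end] text' lines."""
    payload = json.loads(verbose_json_body)
    lines = []
    # Assumed shape: {"text": ..., "segments": [{"start": s, "end": s, "text": ...}]}
    for segment in payload.get("segments", []):
        start = float(segment["start"])
        end = float(segment["end"])
        lines.append(f"[{start:.2f}s - {end:.2f}s] {segment['text'].strip()}")
    return lines


# Hand-written sample in the assumed shape, not real API output.
sample = json.dumps({
    "text": "The quick brown fox jumps over the lazy dog.",
    "segments": [
        {"start": 0.0, "end": 1.8, "text": " The quick brown fox"},
        {"start": 1.8, "end": 3.2, "text": " jumps over the lazy dog."},
    ],
})

for line in format_segments(sample):
    print(line)
```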
