Audio: TTS & Speech-to-Text

Qubax exposes two audio endpoints — one for generating spoken audio from text (Text-to-Speech) and one for transcribing uploaded audio into text (Speech-to-Text). Both are OpenAI-compatible and accept your qbx_live_... key as a Bearer token.

Text
Authorization: Bearer qbx_live_...
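
To keep the key out of source code, the header can be built from an environment variable. The following is a minimal sketch, not an official Qubax helper: the variable name `QUBAX_API_KEY` and the function `auth_headers` are hypothetical names chosen for illustration.

```python
import os


def auth_headers(api_key=None):
    """Return the Authorization header for a Qubax request.

    Falls back to the QUBAX_API_KEY environment variable (a hypothetical
    name, not defined by Qubax) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("QUBAX_API_KEY", "")
    if not key:
        raise ValueError("No API key supplied and QUBAX_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be merged into the headers of any HTTP client request.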

Text-to-Speech

Generate spoken audio from input text. The response body is the raw audio bytes in the requested format (no JSON wrapper).

Text
POST https://api.qubax.ai/v1/audio/speech

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | TTS model to use (e.g. tts-1). |
| input | string | Yes | The text to synthesize. Max 4,096 characters. |
| voice | string | No | Voice preset: alloy, echo, fable, onyx, nova, shimmer. Default alloy. |
| response_format | string | No | Output audio format: mp3, opus, aac, flac, wav, pcm. Default mp3. |
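
Text longer than the 4,096-character cap on `input` has to be split across several requests. The sketch below shows one possible way to do that, preferring sentence boundaries. The helper name `chunk_text` is hypothetical, and it assumes the cap counts characters the same way Python's `len()` does, which the documentation does not specify.

```python
import re

MAX_INPUT_CHARS = 4096  # cap on `input` from the parameter table


def chunk_text(text, limit=MAX_INPUT_CHARS):
    """Split text into pieces of at most `limit` characters.

    Sentences are kept whole where possible; a single sentence longer
    than `limit` is hard-wrapped.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any sentence that cannot fit in one request by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the `input` of its own request.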

Python SDK Example

Python
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world, this is Qubax text-to-speech.",
)

# The SDK exposes the raw bytes via .content
Path("speech.mp3").write_bytes(response.content)
print("Wrote speech.mp3")

cURL Example

Shell
curl https://api.qubax.ai/v1/audio/speech \
  -H "Authorization: Bearer qbx_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world, this is Qubax text-to-speech.",
    "voice": "alloy"
  }' \
  -o speech.mp3

Because the response is raw audio, point your HTTP client's output straight at a file (-o speech.mp3 in cURL, or write response.content to disk in Python).
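
The same idea works without the SDK. The sketch below uses only the Python standard library to stream the response body to disk without holding the whole file in memory. The function names `build_speech_request` and `save_speech` are hypothetical. It relies on `urllib` raising `HTTPError` for non-2xx statuses, so an error response is not written out as an audio file.

```python
import json
import shutil
import urllib.request

SPEECH_URL = "https://api.qubax.ai/v1/audio/speech"


def build_speech_request(api_key, text, model="tts-1", voice="alloy",
                         response_format="mp3"):
    """Build the POST request for the speech endpoint without sending it."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, out_path):
    """Send the request and copy the raw audio bytes to out_path in chunks."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, before the file is opened.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

A call would look like `save_speech(build_speech_request("qbx_live_...", "Hello world"), "speech.mp3")`.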

Speech-to-Text

Transcribe an audio file into text. This endpoint accepts multipart/form-data with a file upload and returns a JSON object containing the transcript.

Text
POST https://api.qubax.ai/v1/audio/transcriptions

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | The audio file to transcribe. Max 25 MB. |
| model | string | Yes | Transcription model to use (e.g. whisper-1). |

Supported formats: wav, mp3, mp4, m4a, ogg, webm, flac

⚠️ The file upload is capped at 25 MB. For longer recordings, split the file before uploading.
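
As an illustration of splitting, the sketch below cuts an uncompressed PCM WAV file into parts that each stay under the cap, using only Python's standard `wave` module. It is not a Qubax utility: `split_wav` is a hypothetical helper, it handles WAV input only (compressed formats such as mp3 need an external tool), and it conservatively assumes 25 MB means 25,000,000 bytes, which the documentation does not specify.

```python
import wave
from pathlib import Path

MAX_UPLOAD_BYTES = 25 * 1000 * 1000  # assumed decimal megabytes (conservative)
WAV_HEADER_BYTES = 44                # header size written by the wave module for PCM


def split_wav(src, out_dir, max_bytes=MAX_UPLOAD_BYTES):
    """Split a PCM WAV file into parts no larger than max_bytes each.

    Returns the list of part paths in playback order.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        bytes_per_frame = params.nchannels * params.sampwidth
        # Number of whole frames that fit in one part after the header.
        frames_per_part = (max_bytes - WAV_HEADER_BYTES) // bytes_per_frame
        if frames_per_part <= 0:
            raise ValueError("max_bytes is too small to hold any audio frames")
        index = 0
        while True:
            frames = reader.readframes(frames_per_part)
            if not frames:
                break
            part_path = out_dir / f"{src.stem}_part{index:03d}.wav"
            with wave.open(str(part_path), "wb") as writer:
                # Copy channel count, sample width and rate; the frame count
                # in the header is corrected when the writer closes.
                writer.setparams(params)
                writer.writeframes(frames)
            parts.append(part_path)
            index += 1
    return parts
```

Each part can then be uploaded in its own transcription request and the resulting texts joined in order. Cutting at fixed frame counts may split a word across two parts, so cutting at silences would be a reasonable refinement.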

Python SDK Example

Python
from openai import OpenAI

client = OpenAI(
    api_key="qbx_live_...",
    base_url="https://api.qubax.ai/v1",
)

with open("audio.mp3", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(result.text)

cURL Example

Shell
curl https://api.qubax.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer qbx_live_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Transcription Response

A successful transcription returns a JSON object with the full text:

JSON
{
  "text": "The quick brown fox jumps over the lazy dog."
}

Pass response_format: "verbose_json" in the form data to receive segments with per-phrase timestamps, or "srt" / "vtt" for caption-ready text.
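
To show how timestamped segments might be consumed, the sketch below formats a `verbose_json` response into one readable line per segment. The field names `segments`, `start`, `end` and `text` (times in seconds) are assumptions based on the OpenAI-compatible response shape and are not confirmed by this page. The helper names and the sample data are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def describe_segments(verbose_response):
    """Return one '[start -> end] text' line per segment.

    Assumes each segment is a dict with 'start', 'end' and 'text' keys;
    a response without a 'segments' key yields an empty list.
    """
    lines = []
    for segment in verbose_response.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines


# Hand-written sample for illustration only, not captured from the API.
sample = {
    "text": "The quick brown fox jumps over the lazy dog.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " The quick brown fox"},
        {"start": 2.5, "end": 4.0, "text": " jumps over the lazy dog."},
    ],
}

for line in describe_segments(sample):
    print(line)
```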