Audio & speech

Status: Planned. These endpoints are not callable yet. The shape below is design direction, not a contract.

Audio & speech will cover narration, accessibility, transcription, dialogue audio, and voice catalog workflows.

Planned shape

Speech synthesis (text-to-speech):

POST /v1/audio/speech
Authorization: Bearer <api_key>
Content-Type: application/json
Idempotency-Key: <uuid>
 
{
  "model": "auto",
  "input": "...",
  "voice": "voice_..."
}
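Because the endpoint is not callable yet, the following is only a minimal Python sketch that assembles a request in the planned speech shape without sending it. The host in BASE_URL, the PreparedRequest type, and the build_speech_request helper are illustrative assumptions, not part of the planned API. It also assumes the usual idempotency-key convention: generate one key per logical job and reuse it on retries.

```python
import json
import uuid
from dataclasses import dataclass

# Hypothetical base URL: the page does not name a host.
BASE_URL = "https://api.example.com"


@dataclass(frozen=True)
class PreparedRequest:
    """Illustrative container for a request that has not been sent."""
    method: str
    url: str
    headers: dict
    body: bytes


def build_speech_request(api_key, text, voice, idempotency_key=None, model="auto"):
    """Assemble (but do not send) a request in the planned speech shape."""
    # Assumption: one key per logical job, reused unchanged on retries.
    key = idempotency_key or str(uuid.uuid4())
    payload = {"model": model, "input": text, "voice": voice}
    return PreparedRequest(
        method="POST",
        url=f"{BASE_URL}/v1/audio/speech",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": key,
        },
        body=json.dumps(payload).encode("utf-8"),
    )
```

Keeping construction separate from sending lets the payload shape be unit-tested now and the transport be added once the endpoint ships.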

Transcription (speech-to-text):

POST /v1/audio/transcriptions
Authorization: Bearer <api_key>
Content-Type: application/json
Idempotency-Key: <uuid>
 
{
  "model": "auto",
  "upload_id": "upl_..."
}

Both endpoints return 202 Accepted with a task_id. Voice IDs will come from a future GET /v1/voices discovery endpoint.
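A client will need to pull the task_id out of the 202 response before it can track the job. The sketch below shows one way to do that defensively in Python. It assumes task_id arrives as a top-level string field of a JSON body, which the page does not confirm. The extract_task_id helper and the UnexpectedResponse exception are hypothetical names.

```python
import json


class UnexpectedResponse(Exception):
    """Raised when a response does not match the assumed 202 + task_id shape."""


def extract_task_id(status_code, body):
    """Return task_id from a 202 Accepted response body (str or bytes)."""
    if status_code != 202:
        raise UnexpectedResponse(f"expected 202 Accepted, got {status_code}")
    try:
        data = json.loads(body)
    except (ValueError, TypeError) as exc:
        raise UnexpectedResponse("response body is not valid JSON") from exc
    # Assumption: task_id is a top-level string field of a JSON object.
    task_id = data.get("task_id") if isinstance(data, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise UnexpectedResponse("response has no task_id")
    return task_id
```

Failing loudly on anything other than 202 keeps a shape change in the final contract from passing silently through client code.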

How delivery will work

Audio jobs will use the async task model. Source media for transcription will be provided via uploads and referenced by upload_id.
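The page does not say how task results are retrieved. If the async task model turns out to be poll-based, a client loop might look like the Python sketch below. Everything here is an assumption: the status names in TERMINAL_STATES, the wait_for_task helper, and the caller-supplied fetch_status function that stands in for whatever task lookup the final API exposes.

```python
import time

# Assumed terminal states: the page does not list task statuses.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_task(fetch_status, task_id, interval=1.0, max_interval=30.0,
                  timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(task_id) with exponential backoff until a terminal state.

    fetch_status is a hypothetical callable that returns the current
    status string for a task.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        status = fetch_status(task_id)
        if status in TERMINAL_STATES:
            return status
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_interval)
```

Injecting fetch_status, sleep, and clock keeps the loop testable without network access and makes it easy to swap the lookup once the real task interface is published.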

Prepare now

Classify transcript and voice data as sensitive unless your policy says otherwise.
Validate file formats before upload once uploads become available.
Keep source media, generated audio, and review state in your own records.
Track the Roadmap before wiring client calls.
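Format validation can be prototyped before uploads exist. The Python sketch below checks a file's leading bytes against well-known container signatures (RIFF/WAVE, fLaC, OggS). The ALLOWED_FORMATS set is a placeholder assumption, since the page does not say which formats uploads will accept, and both helper names are hypothetical.

```python
# Placeholder allow-list: the page does not say which formats uploads will accept.
ALLOWED_FORMATS = {"wav", "flac", "ogg"}


def sniff_audio_format(header):
    """Guess a container format from a file's first bytes, or return None."""
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    return None


def validate_before_upload(path):
    """Return the detected format, or raise ValueError if it is not allowed."""
    with open(path, "rb") as f:
        header = f.read(12)
    fmt = sniff_audio_format(header)
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"{path}: unrecognized or disallowed audio format")
    return fmt
```

Checking signatures rather than file extensions catches renamed or truncated files early. Adjust the allow-list once the supported formats are published.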
