How beatra works

The mental model behind statuses, models, retries, errors, and the request mechanisms shared by every capability.

A short read that explains the few things that come up everywhere else in the docs. If you've made one successful call from the Quickstart, this page fills in the rest.

Release status

Every capability and endpoint in these docs carries one of three labels:

Status	Meaning
Available	Safe to build against today; we treat it as a public contract.
Preview	Callable, but fields and behavior may still change. Keep a fallback.
Planned	Product direction. Don't call this from your code.

The full status matrix lives in the Roadmap.

Picking a model

Pass model on every chat request. You have two choices:

model: "auto" — the fastest path. We pick a suitable model for the request. Use this while prototyping or when your product doesn't expose a model choice to users.
A specific model id — pin one when your account exposes a model selector, when you run evals, or when reproducibility matters.

The response body's model field always tells you which model actually handled the request. Log it next to your job records.

If you pin a model, plan for the day it becomes unavailable. The cleanest fallback is to retry the same request with model: "auto".
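As a minimal sketch of that fallback, assuming your HTTP layer raises a distinct exception when the pinned model cannot serve the request: `ModelUnavailable` and the `send` callable are hypothetical names for illustration, not part of the beatra API.

```python
from typing import Callable


class ModelUnavailable(Exception):
    """Hypothetical: raised by `send` when the pinned model cannot serve the request."""


def chat_with_fallback(send: Callable[[dict], dict], payload: dict) -> dict:
    """Try the requested model first; if it is unavailable, retry once with model "auto"."""
    try:
        return send(payload)
    except ModelUnavailable:
        if payload.get("model") == "auto":
            raise  # already on "auto"; there is nothing left to fall back to
        # Same request body, only the model field changes.
        return send({**payload, "model": "auto"})
```

Whichever path succeeds, the response body's model field is still the value to log, since it names the model that actually handled the request.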

Sync vs streaming responses

Text chat is synchronous by default — one HTTP request, one JSON response. For user-facing UIs that need token-by-token rendering, add stream: true to the same request.

The streaming response uses text/event-stream, emits OpenAI-compatible chunks, and ends with a sentinel event:

data: {"choices":[{"delta":{"content":"Hi"}}]}

data: {"choices":[{"delta":{"content":" there"}}]}

data: {"choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Two rules:

Treat a disconnect before data: [DONE] as an incomplete response. Don't act on partial output for irreversible workflows.
A pre-stream failure returns the same JSON error envelope as the non-streaming path — not a half-stream.

The OpenAI Python and Node SDKs both stream natively (stream=True / for await (const chunk of stream)); see Text & chat for code.
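If you read the event stream yourself rather than through an SDK, the first rule can be sketched as below. This is an illustrative parser, not beatra's client code: `collect_stream` is a hypothetical helper that takes the raw lines of the response and reports whether the `data: [DONE]` sentinel was seen.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, bool]:
    """Return (text, complete); complete is True only if `data: [DONE]` arrived."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return "".join(parts), True
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    # The stream ended without the sentinel: treat the text as incomplete.
    return "".join(parts), False
```

A caller would render the text as it arrives but gate any irreversible step on the second value being true.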

The async task model

Music generation uses async tasks in Preview. The same pattern will extend to other long-running capabilities as they become callable:

Create — POST to the capability endpoint with a stable Idempotency-Key. The response returns a task_id.
Track — store the task_id alongside your local job record. Surface a stable status to the end user.
Wait for delivery — poll GET /v1/tasks/{task_id} until the task reaches a terminal state.
Persist — copy final artifacts into your own storage before our retention window expires.
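The wait step can be sketched as a bounded polling loop. The `status` field name and the terminal state names below are assumptions for illustration, since this page does not list them; `get_task` is a hypothetical callable that wraps GET /v1/tasks/{task_id} and returns the parsed JSON body.

```python
import time
from typing import Callable

# Assumption: placeholder state names; check the task reference for the real ones.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def wait_for_task(
    get_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = get_task(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Persisting artifacts would follow once the returned task reports success.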

When a capability needs source media, POST /v1/uploads accepts a single file of up to 100 MB and returns an artifact_id. Reference-audio music generation can pass that artifact_id in the create call. Validate file type, size, and ownership on your side before sending. Map the artifact_id to your customer's record so that cleanup and audit work later.
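A rough sketch of the client-side checks, under stated assumptions: the extension allowlist is hypothetical (this page does not list accepted types), the size limit is read as decimal megabytes, and `upload_problems` and `remember_upload` are illustrative helper names.

```python
import os
from typing import Dict, List

MAX_UPLOAD_BYTES = 100 * 1000 * 1000  # assumption: the 100MB limit read as decimal megabytes
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}  # hypothetical allowlist, not from the beatra docs


def upload_problems(filename: str, size_bytes: int, file_owner: str, requester: str) -> List[str]:
    """Return the reasons a file should not be uploaded; an empty list means it passed."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the upload limit")
    if file_owner != requester:
        problems.append("file does not belong to the requesting customer")
    return problems


# Hypothetical in-memory stand-in for the table that maps uploads to customers.
artifact_owner: Dict[str, str] = {}


def remember_upload(artifact_id: str, customer_id: str) -> None:
    """Record which customer an artifact_id belongs to, for later cleanup and audit."""
    artifact_owner[artifact_id] = customer_id
```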

Customer callbacks (delivery)

Customer callbacks are still Planned. For now, poll task status. The shape is described here so you can design your job table and webhook receiver ahead of time.

Idempotency and request IDs

Two headers carry most of the operational weight:

Idempotency-Key: send a stable key, generated on your own server, on every create-style POST. Reuse it on retries of the same logical operation. Generate a new one when the user's intent changes. The deduplication window is 24 hours.
X-Request-Id: send your own for correlation, or let beatra generate one. Either way, beatra echoes it back in the response header. Store it with the user-visible job record, because it is the fastest support lookup key.
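One way to keep both headers tied to a job is to store them on the local job record. The sketch below assumes a hypothetical `Job` dataclass and uses a UUID as the key format, which this page does not prescribe.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical local job record; one instance per logical operation."""
    prompt: str
    # Created once with the job, so every retry of this job sends the same key.
    idempotency_key: str = field(default_factory=lambda: str(uuid.uuid4()))
    request_ids: List[str] = field(default_factory=list)


def create_headers(job: Job) -> Dict[str, str]:
    """Headers for the create-style POST of this job."""
    return {"Idempotency-Key": job.idempotency_key}


def record_request_id(job: Job, response_headers: Dict[str, str]) -> None:
    """Store the echoed X-Request-Id; header names are matched case-insensitively."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    request_id = lowered.get("x-request-id")
    if request_id:
        job.request_ids.append(request_id)
```

When the user changes what they asked for, create a new `Job` rather than mutating the old one, so the new intent gets a fresh key.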

Errors

Every error from beatra looks the same:

{
  "error": {
    "code": "rate_limited",
    "message": "Request budget exhausted; retry later.",
    "retryable": true,
    "request_id": "req_01J5..."
  }
}

One rule: retry only when retryable is true. Reuse the same Idempotency-Key on each retry. The full list of codes lives in Error codes.
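The retry rule can be sketched as follows. The exponential backoff schedule and attempt limit are illustrative choices rather than documented beatra guidance, and `post` and `BeatraError` are hypothetical names: `post` stands for whatever function sends the request and returns the parsed JSON body.

```python
import time
from typing import Callable


class BeatraError(Exception):
    """Hypothetical wrapper around the `error` object of the envelope."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "request failed"))
        self.error = error


def create_with_retry(
    post: Callable[[dict, dict], dict],
    body: dict,
    idempotency_key: str,
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry only when the envelope says retryable, reusing one Idempotency-Key."""
    headers = {"Idempotency-Key": idempotency_key}
    for attempt in range(max_attempts):
        payload = post(body, headers)
        error = payload.get("error")
        if error is None:
            return payload
        last_attempt = attempt == max_attempts - 1
        if not error.get("retryable") or last_attempt:
            raise BeatraError(error)
        sleep(base_delay_s * 2 ** attempt)  # waits 1s, 2s, 4s with the defaults
    raise ValueError("max_attempts must be at least 1")
```

Catching `BeatraError` gives the caller the `code` and `request_id` fields to store with the job record.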