Text & chat
Build assistants, drafting, classification, and decision tools with the OpenAI-compatible chat API.
Status: Available. POST /v1/chat/completions, including stream: true.
Use text & chat for assistants, drafting, classification, extraction, summarization, and any workflow that produces text from text.
Basic request
Response shape
Two things to keep per request:
- The response body's
modelfield — what actually handled the call. - The response header
X-Request-Id— your fastest support lookup key.
Key parameters
| Field | Type | Notes |
|---|---|---|
model | string | "auto" or an account-enabled model id. |
messages | array | role + content. system first for instructions, then alternating user and assistant. |
stream | boolean | See below. Default false. |
temperature | number | 0.0–2.0. Lower = more deterministic. |
max_tokens | integer | Cap on response length. |
top_p | number | Nucleus sampling. Pass either temperature or top_p, not both. |
Full field reference: Chat API.
Streaming
Set stream: true for token-by-token rendering in UIs. The response is text/event-stream, emits OpenAI-compatible chunks, and ends with data: [DONE].
Treat any disconnect before data: [DONE] as an incomplete response. Don't take irreversible action on partial output. See Sync vs streaming.
Billing
Chat is postpaid. Every successful completion is charged from your wallet at the end of the call. The charge is the LLM-cost figure reported by LiteLLM, multiplied by beatra's per-model markup, expressed in credits.
A worked example using the current pr1-seed defaults:
| Component | Value |
|---|---|
LiteLLM cost (model MiniMax-M2.7) | $0.00200 |
beatra markup (credits_per_litellm_usd for this model) | 1500 |
| Charged | 0.00200 × 1500 = 3.0000 credits |
USD reference (default rate 1000 cr/$) | ≈ $0.0030 |
The response carries the charge under usage.credits and
usage.credits_usd_reference:
USD reference is display-only — it tracks the wallet's
credits_per_usd_ratesetting at the time of the response and is not a price commitment. See billing-model for the full rules.
Error handling
All errors use the standard envelope; branch on retryable. Most common for this endpoint:
invalid_request(400) — fix the body and resendrate_limited(429) — back off, retry with the sameIdempotency-Keymodel_unavailable(503) — retry with"auto"or another approved model id
Full handling rules: Errors & retries.