Text & chat

Build assistants, drafting, classification, and decision tools with the OpenAI-compatible chat API.

Status: Available. POST /v1/chat/completions, including stream: true.

Use text & chat for assistants, drafting, classification, extraction, summarization, and any workflow that produces text from text.

Basic request

curl https://api.beatra.ai/v1/chat/completions \
  -H "Authorization: Bearer $BEATRA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a concise product copywriter."},
      {"role": "user", "content": "Write three onboarding email subject lines."}
    ]
  }'

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.beatra.ai/v1",
    api_key=os.environ["BEATRA_API_KEY"],  # same variable the curl example reads
)
 
resp = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a concise product copywriter."},
        {"role": "user", "content": "Write three onboarding email subject lines."},
    ],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.beatra.ai/v1",
  apiKey: process.env.BEATRA_API_KEY,
});
 
const resp = await client.chat.completions.create({
  model: "auto",
  messages: [
    { role: "system", content: "You are a concise product copywriter." },
    { role: "user", content: "Write three onboarding email subject lines." },
  ],
});
console.log(resp.choices[0].message.content);

Response shape

{
  "id": "chatcmpl_01J5...",
  "model": "<resolved model id>",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "1. Welcome aboard\n2. ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 38, "total_tokens": 62 }
}

Two things to keep per request:

The response body's model field — what actually handled the call.
The response header X-Request-Id — your fastest support lookup key.
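
A minimal sketch of capturing both values for your logs. The `audit_record` helper is hypothetical and not part of any SDK; it assumes you already have the response headers as a mapping and the parsed body as a dict. The sample id and model name are made up for illustration.

```python
from typing import Any, Mapping


def audit_record(headers: Mapping[str, str], body: Mapping[str, Any]) -> dict:
    """Collect the two per-request values worth persisting.

    Hypothetical helper: `headers` is the HTTP response header mapping and
    `body` is the parsed JSON response body.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),  # support lookup key
        "resolved_model": body.get("model"),        # what handled the call
    }


record = audit_record(
    {"X-Request-Id": "req_example_123"},  # made-up id for illustration
    {"id": "chatcmpl_example", "model": "example-model", "choices": []},
)
print(record)
```

With the OpenAI Python SDK, one way to obtain both inputs is `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and `.parse()`.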

Key parameters

Field	Type	Notes
`model`	string	`"auto"` or an account-enabled model id.
`messages`	array	`role` + `content`. `system` first for instructions, then alternating `user` and `assistant`.
`stream`	boolean	See below. Default `false`.
`temperature`	number	`0.0`–`2.0`. Lower = more deterministic.
`max_tokens`	integer	Cap on response length.
`top_p`	number	Nucleus sampling. Pass either `temperature` or `top_p`, not both.
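
The constraints in the table can be checked client-side before a request is sent. The sketch below is one way to do that; `build_chat_body` is a hypothetical helper, and the checks cover only the rules stated in the table (the `temperature` range and the `temperature`/`top_p` exclusivity).

```python
from typing import Optional


def build_chat_body(
    messages: list,
    model: str = "auto",
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> dict:
    """Build a chat request body, enforcing the rules from the table."""
    if temperature is not None and top_p is not None:
        raise ValueError("pass either temperature or top_p, not both")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    body = {"model": model, "messages": messages, "stream": stream}
    # Include optional fields only when the caller actually set them.
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_chat_body(
    [{"role": "user", "content": "Write three onboarding email subject lines."}],
    temperature=0.2,
    max_tokens=120,
)
print(body)
```

The resulting dict can be passed as keyword arguments to `client.chat.completions.create(**body)`.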

Streaming

Set stream: true for token-by-token rendering in UIs. The response is text/event-stream, emits OpenAI-compatible chunks, and ends with data: [DONE].

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.beatra.ai/v1",
    api_key=os.environ["BEATRA_API_KEY"],
)
 
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.beatra.ai/v1",
  apiKey: process.env.BEATRA_API_KEY,
});
 
const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Count to five." }],
  stream: true,
});
 
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Treat any disconnect before data: [DONE] as an incomplete response. Don't take irreversible action on partial output. See Sync vs streaming.
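
The completeness rule can be made explicit in code. The sketch below parses raw `text/event-stream` lines and reports the stream as complete only when `data: [DONE]` arrives. `collect_stream` is a hypothetical helper, and the sample lines are hand-written to mimic OpenAI-compatible chunks.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, bool]:
    """Accumulate streamed text and report whether the stream completed.

    Returns (text, complete). `complete` is True only if the terminating
    `data: [DONE]` line was seen before the input ended.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts), True
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            parts.append(choices[0].get("delta", {}).get("content") or "")
    # Input ended without the terminator: the response is incomplete.
    return "".join(parts), False


sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "1, 2"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", 3"}}]}',
    "",
    "data: [DONE]",
]
text, complete = collect_stream(sample)
partial_text, partial_complete = collect_stream(sample[:2])  # simulated drop
print(text, complete)
print(partial_text, partial_complete)
```

When `complete` is `False`, the accumulated text is still useful for display, but downstream steps that cannot be undone should wait for a full response.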

Billing

Chat is postpaid. Each successful completion is charged to your wallet when the call ends. The charge, expressed in credits, is the LLM cost reported by LiteLLM multiplied by beatra's per-model markup.

A worked example using the current pr1-seed defaults:

Component	Value
LiteLLM cost (model `MiniMax-M2.7`)	`$0.00200`
beatra markup (`credits_per_litellm_usd` for this model)	`1500`
Charged	`0.00200 × 1500 = 3.0000` credits
USD reference (default rate `1000 cr/$`)	`≈ $0.0030`

The response carries the charge under usage.credits and usage.credits_usd_reference:

{
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 42,
    "total_tokens": 60,
    "credits": "3.0000",
    "credits_usd_reference": "0.0030"
  }
}

The USD reference is display-only: it tracks the wallet's credits_per_usd_rate setting at the time of the response and is not a price commitment. See billing-model for the full rules.
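
The arithmetic in the worked example can be reproduced with decimal math. This is a sketch for checking figures, not beatra's billing code: the `charge` function is hypothetical, and the four-decimal formatting and half-up rounding are assumptions inferred from the sample values.

```python
from decimal import Decimal, ROUND_HALF_UP

FOUR_PLACES = Decimal("0.0001")  # assumed precision, matching "3.0000"


def charge(
    litellm_cost_usd: str,
    credits_per_litellm_usd: str,
    credits_per_usd_rate: str = "1000",
) -> tuple:
    """Return (credits, credits_usd_reference) as strings.

    credits       = LiteLLM cost x per-model markup
    usd reference = credits / wallet rate (display-only)
    """
    credits = Decimal(litellm_cost_usd) * Decimal(credits_per_litellm_usd)
    credits = credits.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
    usd_reference = credits / Decimal(credits_per_usd_rate)
    usd_reference = usd_reference.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
    return str(credits), str(usd_reference)


credits, usd_reference = charge("0.00200", "1500")
print(credits, usd_reference)
```

Strings and `Decimal` are used instead of floats so the result matches the string-typed `usage.credits` values without binary rounding noise.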

Error handling

All errors use the standard envelope; branch on retryable. The most common errors for this endpoint:

invalid_request (400) — fix the body and resend
rate_limited (429) — back off, retry with the same Idempotency-Key
model_unavailable (503) — retry with "auto" or another approved model id

Full handling rules: Errors & retries.
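
The sketch below shows one way to branch on retryable while reusing the same Idempotency-Key across attempts. Several details are assumptions, since this page does not spell out the envelope: the error body is taken to look like `{"error": {"code": ..., "retryable": ...}}`, and `send` is a hypothetical callable that performs the HTTP request and returns `(status_code, parsed_body)`. The fake transport at the end exists only to exercise the loop.

```python
import time
import uuid
from typing import Callable, Tuple


def send_with_retries(
    send: Callable[[dict], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send(headers)` until success or a non-retryable error.

    Assumed (not confirmed by this page): error bodies look like
    {"error": {"code": "...", "retryable": bool}}.
    """
    # One key for the whole logical request, reused on every retry.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        status, body = send(headers)
        if status < 400:
            return body
        error = body.get("error", {})
        last_attempt = attempt == max_attempts - 1
        if not error.get("retryable", False) or last_attempt:
            raise RuntimeError(f"{status} {error.get('code', 'unknown_error')}")
        sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError("max_attempts must be at least 1")


# Fake transport: one rate-limit error, then success.
seen_keys = []
responses = iter([
    (429, {"error": {"code": "rate_limited", "retryable": True}}),
    (200, {"choices": []}),
])


def fake_send(headers: dict) -> Tuple[int, dict]:
    seen_keys.append(headers["Idempotency-Key"])
    return next(responses)


result = send_with_retries(fake_send, sleep=lambda seconds: None)
print(result, seen_keys[0] == seen_keys[1])
```

For model_unavailable, a caller might change the model to "auto" before the next attempt; that variation is left out to keep the loop short.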
