Chat completions

OpenAI-compatible chat completions on open-weights models. Same wire shape as POST api.openai.com/v1/chat/completions. Sync, stream, and async modes; pick one per request.

Endpoint

POST https://api.genie.tech/v1/chat/completions
Authorization: Bearer sk-genie-{your-key}
Content-Type: application/json

Try it

In the Try it panel, streaming responses fill the response pane progressively. If the worker fails to load the requested model, the pane shows the error inline (e.g. [error] 500 model failed to load … followed by [finish_reason: error]) instead of staying blank. Silence is the bug, not the feature.

Example request (curl)

curl https://api.genie.tech/v1/chat/completions \
  -H "Authorization: Bearer $GENIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:0.6b",
    "messages": [{"role": "user", "content": "hi"}]
  }'
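
The same request from Python, as a minimal stdlib-only sketch. build_chat_request and send are hypothetical helper names; building the request is kept separate from sending it so the request shape can be inspected without a network call.

```python
import json
import urllib.request

GENIE_URL = "https://api.genie.tech/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a sync chat-completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        GENIE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(req):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To run it: send(build_chat_request(os.environ["GENIE_API_KEY"], "qwen3:0.6b", [{"role": "user", "content": "hi"}])). Non-2xx statuses (such as a 402 budget error) surface as urllib.error.HTTPError, whose .code and .read() carry the status and body.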

Request body

model: model id from GET /v1/models (or use a persona alias: lite, standard, pro, coder, heavy).
messages[]: OpenAI shape: { role, content, name?, tool_call_id? }.
stream: true for SSE deltas (default false, which returns a single sync response).
mode: 'sync' | 'stream' | 'async'. async returns 202 with a job id that you poll via GET /v1/jobs/:id; the result can optionally be delivered to an HMAC-signed webhook on completion.
temperature, top_p, max_tokens: standard sampling.
tools, tool_choice: full OpenAI function-calling shape.
response_format: text | json_object | json_schema.
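
A sketch of a body that uses a persona alias, sampling options, and a tool. It assumes tools follow OpenAI's {"type": "function", "function": {...}} shape, as the endpoint advertises the OpenAI function-calling format; get_weather is a hypothetical tool and build_body is an illustrative helper, not part of any Genie SDK.

```python
import json

VALID_MODES = ("sync", "stream", "async")

# Hypothetical tool, written in OpenAI's function-calling shape.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_body(model, messages, mode="sync", **options):
    """Assemble a request body; extra keyword args (temperature, tools, ...) pass through."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {"model": model, "messages": messages, "mode": mode, **options}


body = build_body(
    "coder",  # persona alias instead of a concrete model id
    [{"role": "user", "content": "What's the weather in Oslo?"}],
    temperature=0.2,
    max_tokens=256,
    tools=[GET_WEATHER_TOOL],
    tool_choice="auto",
)
payload = json.dumps(body)  # ready to send as the POST body
```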

Response shapes

Sync (default)

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1731020000,
  "model": "qwen3:0.6b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Genie runs open-weights ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 28, "total_tokens": 40 }
}
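
A small helper for reading the sync body, assuming the field layout shown above; summarize and Completion are illustrative names. In the sample, total_tokens is the sum of prompt and completion tokens (12 + 28 = 40).

```python
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def summarize(resp):
    """Extract the first choice and the token usage from a sync response body."""
    if resp.get("object") != "chat.completion":
        raise ValueError(f"not a sync completion: {resp.get('object')!r}")
    choice = resp["choices"][0]
    usage = resp["usage"]
    return Completion(
        # In the OpenAI shape, content can be null when the reply is tool_calls.
        text=choice["message"].get("content") or "",
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )
```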

Stream (Server-Sent Events)

data: {"choices":[{"delta":{"role":"assistant"}}]}

data: {"choices":[{"delta":{"content":"Genie"}}]}

data: {"choices":[{"delta":{"content":" runs"}}]}

data: {"choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
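
A sketch of a client-side accumulator for the stream above: it concatenates delta.content fragments and records the finish_reason, stopping at the [DONE] sentinel. It assumes each event carries a single data: line, as in the sample; collect_stream is a hypothetical name.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Accumulate SSE content deltas until [DONE]; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

With urllib, the open response is iterable line by line, so collect_stream(line.decode("utf-8") for line in resp) works on a live stream.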

Async

{
  "id": "chatcmpl-cmoz...",
  "object": "chat.completion.async",
  "status": "queued",
  "poll_url": "/v1/jobs/cmoz...",
  "created": 1731020000
}
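
A polling sketch for async jobs. Only the "queued" status appears in these docs, so treating "running" as a second pending state is an assumption, and any other status is treated as terminal. fetch_job is a caller-supplied function (hypothetical) that performs the authenticated GET on poll_url, presumably resolved against https://api.genie.tech, and returns the decoded JSON.

```python
import time
from typing import Callable, Dict

# Assumption: "queued" is documented; "running" is a guessed intermediate state.
PENDING_STATUSES = {"queued", "running"}


def wait_for_job(
    fetch_job: Callable[[str], Dict],
    poll_url: str,
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll poll_url until the job leaves a pending status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(poll_url)
        if job.get("status") not in PENDING_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{poll_url} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

Injecting sleep keeps the loop testable; supplying a webhook avoids polling altogether.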

Webhook (when supplied): Genie POSTs the full sync-shape body to {webhook.url}, signed with X-Genie-Signature: sha256={hex}, where {hex} is the HMAC-SHA256 of the body keyed with webhook.secret.
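
A receiver-side verification sketch. It assumes the secret is used as UTF-8 bytes and that {hex} is lowercase hex, matching Python's hexdigest(); verify_signature is a hypothetical name. Compute the HMAC over the raw request bytes before any JSON parsing, since re-serialized JSON may not match byte for byte.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Check an X-Genie-Signature header of the form 'sha256=<hex>'."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header_value[len(prefix):])
```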

Budget gating: every request is gated by your org's four-layer cap stack (org → application → member → user-global). When any cap trips, the response is 402 with {code:'budget-cap-hit', capLayer, capWindow, resetsAt}. See Budgets for the full model.
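
A sketch of detecting the budget error and reading its fields. It assumes the 402 body is the JSON object above at the top level and that resetsAt is an ISO 8601 timestamp (the format is not specified here); BudgetCapHit and parse_budget_error are hypothetical names. capLayer is presumably one of the four layers listed above.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BudgetCapHit:
    cap_layer: str
    cap_window: str
    resets_at: datetime


def parse_budget_error(status: int, body: bytes) -> Optional[BudgetCapHit]:
    """Return cap details for a 402 budget-cap-hit response, else None."""
    if status != 402:
        return None
    err = json.loads(body)
    if err.get("code") != "budget-cap-hit":
        return None
    return BudgetCapHit(
        cap_layer=err["capLayer"],
        cap_window=err["capWindow"],
        # Assumption: ISO 8601; "Z" is rewritten for Pythons older than 3.11.
        resets_at=datetime.fromisoformat(err["resetsAt"].replace("Z", "+00:00")),
    )
```

With urllib, catch urllib.error.HTTPError as e and call parse_budget_error(e.code, e.read()); retrying before resets_at will presumably hit the same cap.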