Chat completions
OpenAI-compatible chat completions on open-weights models. Same wire shape as POST api.openai.com/v1/chat/completions. Sync, stream, and async modes; pick one per request.
Endpoint
POST https://api.genie.tech/v1/chat/completions
Authorization: Bearer sk-genie-{your-key}
Content-Type: application/json
Try it
Streaming responses fill the response pane progressively. If the worker fails to load the requested model the pane shows the error inline (e.g. [error] 500 model failed to load … + [finish_reason: error]) instead of staying blank. Silence is the bug, not the feature.
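Outside the widget, a client can assemble a streamed reply the same way the pane does: by concatenating the delta.content fragments from each data: line until data: [DONE], using the stream shape documented under Response shapes. The following is a minimal Python sketch, not an official SDK; collect_stream is a hypothetical helper name, and because this page does not specify how an error appears on the wire, the sketch only reports whatever finish_reason arrives.

```python
import json


def collect_stream(lines):
    """Concatenate delta.content fragments; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
            # Keep the last non-empty finish_reason seen on the stream.
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


# The sample events from the Stream (Server-Sent Events) section.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Genie"}}]}',
    'data: {"choices":[{"delta":{"content":" runs"}}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
text, reason = collect_stream(sample)
```

A caller would typically treat any finish_reason other than "stop" as a signal to inspect the response rather than use the text as-is.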
SDK + curl
curl https://api.genie.tech/v1/chat/completions \
  -H "Authorization: Bearer $GENIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:0.6b",
    "messages": [{"role": "user", "content": "hi"}]
  }'
Request body
model: model id from GET /v1/models (or use a persona alias: lite, standard, pro, coder, heavy).
messages[]: OpenAI shape: { role, content, name?, tool_call_id? }.
stream: true for SSE deltas (default false for sync).
mode: 'sync' | 'stream' | 'async'. async returns 202 + a job id you poll via GET /v1/jobs/:id; optionally HMAC-webhooked on completion.
temperature, top_p, max_tokens: standard sampling.
tools, tool_choice: full OpenAI function-calling shape.
response_format: text | json_object | json_schema.
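To show how these fields fit together, here is a minimal Python sketch that builds the same request as the curl example using only the standard library. build_chat_request and GENIE_URL are hypothetical names chosen for illustration, and the key shown is a placeholder; the sketch only constructs the request and leaves sending to the caller.

```python
import json
import urllib.request

GENIE_URL = "https://api.genie.tech/v1/chat/completions"


def build_chat_request(api_key, model, messages, **options):
    """Build (but do not send) a chat-completions POST request."""
    body = {"model": model, "messages": messages}
    # Drop unset options so the server-side defaults apply.
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        GENIE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "sk-genie-example",  # placeholder key, not a real credential
    "qwen3:0.6b",
    [{"role": "user", "content": "hi"}],
    temperature=0.2,
    max_tokens=64,
)
# To send: urllib.request.urlopen(req), then json.load() the response.
```

Passing optional fields as keyword arguments keeps the required pair (model, messages) explicit while letting any of the documented parameters through unchanged.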
Response shapes
Sync (default)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1731020000,
  "model": "qwen3:0.6b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Genie runs open-weights ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 28, "total_tokens": 40 }
}
Stream (Server-Sent Events)
data: {"choices":[{"delta":{"role":"assistant"}}]}
data: {"choices":[{"delta":{"content":"Genie"}}]}
data: {"choices":[{"delta":{"content":" runs"}}]}
data: {"choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Async
{
  "id": "chatcmpl-cmoz...",
  "object": "chat.completion.async",
  "status": "queued",
  "poll_url": "/v1/jobs/cmoz...",
  "created": 1731020000
}
Webhook (when supplied): POST {webhook.url} with the full sync-shape body, signed with X-Genie-Signature: sha256={hex} over the body using webhook.secret.
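A receiver should check that signature before trusting the payload. The sketch below assumes the signature is an HMAC-SHA256 of the raw request body bytes, keyed with webhook.secret and hex-encoded; the exact construction is not spelled out on this page, so confirm it before relying on this. verify_signature is a hypothetical helper name, and the secret and body are made-up sample values.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Check an X-Genie-Signature header of the form sha256={hex}."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    received = header_value[len(prefix):]
    # Assumption: HMAC-SHA256 over the unmodified body bytes.
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing leaks.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Sample values for illustration only.
secret = "example-webhook-secret"
body = b'{"id":"chatcmpl-example","object":"chat.completion"}'
header = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

ok = verify_signature(body, header, secret)
tampered = verify_signature(body + b" ", header, secret)
```

Verify against the bytes exactly as received; re-serializing the parsed JSON can change whitespace or key order and make a valid signature fail.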
Budget gating: every request is gated by your org's four-layer cap stack (org → application → member → user-global). When any cap trips, the response is 402 with {code:'budget-cap-hit', capLayer, capWindow, resetsAt}. See Budgets for the full model.
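Clients will usually want to tell a budget rejection apart from other failures. Here is a minimal Python sketch of that check; it assumes the four fields sit at the top level of the JSON error body, which this page implies but does not state outright. BudgetCapHit and check_budget are hypothetical names, and the sample field values (including the resetsAt format) are invented for illustration.

```python
import json


class BudgetCapHit(Exception):
    """Raised when a request is rejected by a budget cap (HTTP 402)."""

    def __init__(self, cap_layer, cap_window, resets_at):
        super().__init__(
            f"budget cap hit at layer {cap_layer!r} "
            f"(window {cap_window!r}), resets at {resets_at}"
        )
        self.cap_layer = cap_layer
        self.cap_window = cap_window
        self.resets_at = resets_at


def check_budget(status: int, body_text: str) -> None:
    """Raise BudgetCapHit for a 402 budget-cap-hit response; otherwise do nothing."""
    if status != 402:
        return
    body = json.loads(body_text)
    # Assumption: fields are top-level keys of the error body.
    if body.get("code") == "budget-cap-hit":
        raise BudgetCapHit(body.get("capLayer"), body.get("capWindow"), body.get("resetsAt"))


# Hypothetical sample body; the real values and timestamp format may differ.
sample_body = json.dumps({
    "code": "budget-cap-hit",
    "capLayer": "member",
    "capWindow": "daily",
    "resetsAt": "2025-01-01T00:00:00Z",
})

try:
    check_budget(402, sample_body)
    caught = None
except BudgetCapHit as exc:
    caught = exc
```

Surfacing capLayer and resetsAt to the caller lets an application decide whether to wait for the reset or escalate to whoever owns that layer's budget.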