API reference

POST /v1/chat/completions

OpenAI-compatible chat completions through the gateway. Request and response bodies mirror OpenAI's schema. Extra headers describe what useLLM actually did.

Endpoint

POST https://api.usellm.io/v1/chat/completions

Authentication

Send the workspace's gateway key as a bearer token.

Authorization: Bearer ul_live_XXXXXXXXXXXXXXXXXXXXXXXX

Request body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `model` | string | yes | An alias from `/routes` or a direct provider model id. |
| `messages` | array | yes | Same shape as OpenAI's chat completions: `{ role, content, name? }`. |
| `temperature` | number | no | Passed through verbatim. |
| `max_tokens` | number | no | Passed through; defaults to 1024 if Anthropic is selected and you omit it. |
| `top_p` | number | no | Passed through verbatim. |
| `stop` | string \| string[] | no | OpenAI shape; translated to Anthropic's `stop_sequences` when needed. |
| `stream` | boolean | no | Streaming is not yet supported; pass `false` or omit. Streaming requests currently return `501 not_implemented`. |
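The translations noted in the table can be sketched as a small helper. This is an illustration of the documented behavior (the `stop` → `stop_sequences` rewrite and the 1024-token `max_tokens` default for Anthropic), not the gateway's actual code; `to_anthropic_params` is a hypothetical name.

```python
def to_anthropic_params(body: dict) -> dict:
    """Sketch of how an OpenAI-shaped request body maps to Anthropic
    parameters, per the table above. Hypothetical helper."""
    params = {
        "model": body["model"],
        "messages": body["messages"],
        # max_tokens defaults to 1024 when Anthropic is selected and it's omitted
        "max_tokens": body.get("max_tokens", 1024),
    }
    # temperature and top_p pass through verbatim when present
    if "temperature" in body:
        params["temperature"] = body["temperature"]
    if "top_p" in body:
        params["top_p"] = body["top_p"]
    if "stop" in body:
        stop = body["stop"]
        # OpenAI accepts a string or a list; Anthropic wants a list of stop_sequences
        params["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    return params
```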

Response body

The response is OpenAI-shaped. The `model` field echoes what the caller requested (usually the alias), not the provider's internal model id; that lives in the response headers.

Successful response:

```json
{
  "id": "chatcmpl_3p2c4xz",
  "object": "chat.completion",
  "created": 1715600000,
  "model": "smart",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 4,
    "total_tokens": 16
  }
}
```

Response headers

Every successful response includes useLLM-specific headers describing what actually happened.

| Header | Example | Meaning |
| --- | --- | --- |
| `x-usellm-provider` | `openai` | Which provider served this response. |
| `x-usellm-model` | `gpt-4o` | The actual model id used at the provider (differs from the request body's `model` when an alias is in play). |
| `x-usellm-fallback-used` | `true` | `true` when the primary entry in the route's chain failed and a fallback served the request. |
| `x-usellm-latency-ms` | `284` | Provider call latency. Gateway overhead is typically under 5 ms. |
| `x-usellm-cost-usd` | `0.000420` | Estimated provider cost for this call, computed from the live model registry and frozen at request time on the log row. |
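A caller that wants this metadata as typed values can pull it out of the header map. A minimal sketch; `parse_usellm_headers` is a hypothetical helper, and the values below are the example values from the table:

```python
def parse_usellm_headers(headers: dict) -> dict:
    """Extract the useLLM metadata headers into typed Python values.
    Hypothetical helper for illustration."""
    return {
        "provider": headers.get("x-usellm-provider"),
        "model": headers.get("x-usellm-model"),
        # header values arrive as strings; "true"/"false" for the boolean
        "fallback_used": headers.get("x-usellm-fallback-used") == "true",
        "latency_ms": int(headers.get("x-usellm-latency-ms", "0")),
        "cost_usd": float(headers.get("x-usellm-cost-usd", "0")),
    }
```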

Errors

Errors come back in OpenAI's shape, with a useLLM-specific code:

Example error:

```json
{
  "error": {
    "message": "OpenAI is rate-limiting the provider key.",
    "type": "rate_limited",
    "code": "rate_limited"
  }
}
```

See Errors for the full code reference, status codes, and which errors trigger fallback.
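Since the body is OpenAI-shaped, a client can surface the useLLM `code` with a small check before reading `choices`. A sketch under the assumption that an error body always carries an `error` object as shown above; `GatewayError` and `raise_for_error` are hypothetical names:

```python
class GatewayError(Exception):
    """Carries the useLLM-specific error code alongside the message."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


def raise_for_error(payload: dict) -> None:
    """Raise GatewayError if the response body is an OpenAI-shaped error.
    Hypothetical helper; see the Errors page for the full code reference."""
    err = payload.get("error")
    if err:
        raise GatewayError(err.get("code"), err.get("message"))
```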

Examples

```python
from openai import OpenAI

client = OpenAI(
    api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",
    base_url="https://api.usellm.io/v1",
)

response = client.chat.completions.create(
    model="smart",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Capital of France?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```

See Models for the catalog endpoint OpenAI SDKs hit to enumerate available models.