POST /v1/chat/completions
OpenAI-compatible chat completions through the gateway. Request and response bodies mirror OpenAI's schema. Extra headers describe what useLLM actually did.
Endpoint
```
POST https://api.usellm.io/v1/chat/completions
```
Authentication
Send the workspace's gateway key as a bearer token.
```
Authorization: Bearer ul_live_XXXXXXXXXXXXXXXXXXXXXXXX
```
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
| `model` | string | yes | An alias from `/routes` or a direct provider model id. |
| `messages` | array | yes | Same shape as OpenAI's chat completions: `{ role, content, name? }`. |
| `temperature` | number | no | Passed through verbatim. |
| `max_tokens` | number | no | Passed through; defaults to 1024 if an Anthropic model is selected and you omit it. |
| `top_p` | number | no | Passed through verbatim. |
| `stop` | string \| string[] | no | OpenAI shape; translated to Anthropic's `stop_sequences` when needed. |
| `stream` | boolean | no | Streaming is not yet supported; pass `false` or omit it. Streaming requests currently return `501 not_implemented`. |
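For example, a request body exercising these fields might look like the following (the `smart` alias matches the examples below; the `max_tokens` and `stop` values here are illustrative):

```json
{
  "model": "smart",
  "messages": [
    { "role": "system", "content": "You are concise." },
    { "role": "user", "content": "Capital of France?" }
  ],
  "temperature": 0.2,
  "max_tokens": 256,
  "stop": ["\n\n"],
  "stream": false
}
```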
Response body
The response is OpenAI-shaped. The `model` field echoes what the caller requested (usually the alias), not the provider's internal model id; that lives in the headers.
Successful response
```json
{
"id": "chatcmpl_3p2c4xz",
"object": "chat.completion",
"created": 1715600000,
"model": "smart",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "Hello!" },
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 4,
"total_tokens": 16
}
}
```
Response headers
Every successful response includes useLLM-specific headers describing what actually happened.
| Header | Example | Meaning |
|---|---|---|
| `x-usellm-provider` | `openai` | Which provider served this response. |
| `x-usellm-model` | `gpt-4o` | The actual model id used at the provider (differs from the request body's `model` when an alias is in play). |
| `x-usellm-fallback-used` | `true` | `true` when the primary entry in the route's chain failed and a fallback served the request. |
| `x-usellm-latency-ms` | `284` | Provider call latency in milliseconds. Gateway overhead is typically <5 ms. |
| `x-usellm-cost-usd` | `0.000420` | Estimated provider cost for this call, computed from the live model registry and frozen at request time on the log row. |
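These headers travel on the HTTP response, so you need the raw response to read them. A minimal sketch using the OpenAI Python SDK's `with_raw_response` interface (SDK v1.x), reusing the `smart` alias from the examples below:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",
    base_url="https://api.usellm.io/v1",
)

# with_raw_response exposes HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(
    model="smart",
    messages=[{"role": "user", "content": "Capital of France?"}],
)

completion = raw.parse()  # the usual ChatCompletion object

print(raw.headers.get("x-usellm-provider"))       # e.g. "openai"
print(raw.headers.get("x-usellm-model"))          # e.g. "gpt-4o"
print(raw.headers.get("x-usellm-fallback-used"))  # "true" when a fallback served it
print(raw.headers.get("x-usellm-cost-usd"))       # e.g. "0.000420"
print(completion.choices[0].message.content)
```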
Errors
Errors come back in OpenAI's shape, with a useLLM-specific code:
Example error
```json
{
"error": {
"message": "OpenAI is rate-limiting the provider key.",
"type": "rate_limited",
"code": "rate_limited"
}
}
```
See Errors for the full code reference, status codes, and which errors trigger fallback.
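Because the error envelope matches OpenAI's, the OpenAI Python SDK raises its usual typed exceptions when talking to the gateway. A sketch (the SDK raises `RateLimitError` for HTTP 429 responses; how each useLLM code maps to a status is covered in Errors):

```python
import openai
from openai import OpenAI

client = OpenAI(
    api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",
    base_url="https://api.usellm.io/v1",
)

try:
    client.chat.completions.create(
        model="smart",
        messages=[{"role": "user", "content": "Capital of France?"}],
    )
except openai.RateLimitError as e:
    # e.body holds the parsed error payload, including the useLLM-specific code.
    print(e.status_code, e.body)
except openai.APIStatusError as e:
    # Any other non-2xx response from the gateway.
    print(e.status_code, e.body)
```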
Examples
```python
from openai import OpenAI
client = OpenAI(
api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",
base_url="https://api.usellm.io/v1",
)
response = client.chat.completions.create(
model="smart",
messages=[
{"role": "system", "content": "You are concise."},
{"role": "user", "content": "Capital of France?"},
],
temperature=0.2,
)
print(response.choices[0].message.content)
```
- See Models for the catalog endpoint that OpenAI SDKs hit to enumerate available models.