POST /v1/chat/completions
OpenAI-compatible chat completions through the gateway. Request and response bodies mirror OpenAI's schema. Extra headers describe what useLLM actually did.
Endpoint
```
POST https://api.usellm.io/v1/chat/completions
```
Authentication
Send the workspace's gateway key as a bearer token.
```
Authorization: Bearer ul_live_XXXXXXXXXXXXXXXXXXXXXXXX
```
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
| `model` | string | yes | An alias from `/routes` or a direct provider model id. |
| `messages` | array | yes | Same shape as OpenAI's chat completions: `{ role, content, name? }`. |
| `temperature` | number | no | Passed through verbatim. |
| `max_tokens` | number | no | Passed through; defaults to 1024 if an Anthropic model is selected and you omit it. |
| `top_p` | number | no | Passed through verbatim. |
| `stop` | string \| string[] | no | OpenAI shape; translated to Anthropic's `stop_sequences` when needed. |
| `stream` | boolean | no | Streaming is not yet supported; pass `false` or omit it. Streaming requests currently return `501 not_implemented`. |
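For example, a request body exercising these fields might look like the following (the `smart` alias matches the examples below; the `max_tokens` and `stop` values here are illustrative):

```json
{
  "model": "smart",
  "messages": [
    { "role": "system", "content": "You are concise." },
    { "role": "user", "content": "Capital of France?" }
  ],
  "temperature": 0.2,
  "max_tokens": 256,
  "stop": ["\n\n"],
  "stream": false
}
```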
Response body
The response is OpenAI-shaped. The `model` field echoes what the caller requested (usually the alias), not the provider's internal model id; that lives in the headers.
Successful response
```json
{
"id": "chatcmpl_3p2c4xz",
"object": "chat.completion",
"created": 1715600000,
"model": "smart",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "Hello!" },
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 4,
"total_tokens": 16
}
}
```
Response headers
Every successful response includes useLLM-specific headers describing what actually happened.
| Header | Example | Meaning |
|---|---|---|
| `x-usellm-provider` | `openai` | Which provider served this response. |
| `x-usellm-model` | `gpt-4o` | The actual model id used at the provider (differs from the request body's `model` when an alias is in play). |
| `x-usellm-fallback-used` | `true` | `true` when the primary entry in the route's chain failed and a fallback served the request. |
| `x-usellm-latency-ms` | `284` | Provider call latency in milliseconds. Gateway overhead is typically <5 ms. |
| `x-usellm-cost-usd` | `0.000420` | Estimated provider cost for this call, computed from the live model registry and frozen at request time on the log row. |
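These headers travel on the HTTP response, so you need the raw response to read them. A minimal sketch using the OpenAI Python SDK's `with_raw_response` interface (SDK v1.x), reusing the `smart` alias from the examples below:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",
    base_url="https://api.usellm.io/v1",
)

# with_raw_response exposes HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(
    model="smart",
    messages=[{"role": "user", "content": "Capital of France?"}],
)

completion = raw.parse()  # the usual ChatCompletion object

print(raw.headers.get("x-usellm-provider"))       # e.g. "openai"
print(raw.headers.get("x-usellm-model"))          # e.g. "gpt-4o"
print(raw.headers.get("x-usellm-fallback-used"))  # "true" when a fallback served it
print(raw.headers.get("x-usellm-cost-usd"))       # e.g. "0.000420"
print(completion.choices[0].message.content)
```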
Errors
Errors come back in OpenAI's shape, with a useLLM-specific code:
Example error
```json
{
"error": {
"message": "OpenAI is rate-limiting the provider key.",
"type": "rate_limited",
"code": "rate_limited"
}
}
```
See Errors for the full code reference, status codes, and which errors trigger fallback.
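Because the error envelope matches OpenAI's, the OpenAI Python SDK raises its usual typed exceptions when talking to the gateway. A sketch (the SDK raises `RateLimitError` for HTTP 429 responses; how each useLLM code maps to a status is covered in Errors):

```python
import openai
from openai import OpenAI

client = OpenAI(
    api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",
    base_url="https://api.usellm.io/v1",
)

try:
    client.chat.completions.create(
        model="smart",
        messages=[{"role": "user", "content": "Capital of France?"}],
    )
except openai.RateLimitError as e:
    # e.body holds the parsed error payload, including the useLLM-specific code.
    print(e.status_code, e.body)
except openai.APIStatusError as e:
    # Any other non-2xx response from the gateway.
    print(e.status_code, e.body)
```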
Examples
```python
from openai import OpenAI
client = OpenAI(
api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",
base_url="https://api.usellm.io/v1",
)
response = client.chat.completions.create(
model="smart",
messages=[
{"role": "system", "content": "You are concise."},
{"role": "user", "content": "Capital of France?"},
],
temperature=0.2,
)
print(response.choices[0].message.content)
```
- See Models for the catalog endpoint that OpenAI SDKs hit to enumerate available models.