Routing & aliases
Aliases let you call model: 'smart' instead of model: 'gpt-4o-2024-08-06'. Swap providers and tune fallbacks from the dashboard, not from a deploy.
The model string the gateway sees
When the gateway receives a request, it resolves the model field in this order:
- If the name matches an enabled route alias in your workspace, use the route's primary + fallback chain.
- Otherwise, treat the string as a direct model id. The gateway sniffs the prefix to pick a provider: gpt-*, o*, text-embedding-* → OpenAI; claude-* → Anthropic.
- If neither path matches, you get a 400 model_not_supported.
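The resolution order above can be sketched roughly as follows. This is an illustration under assumptions, not the gateway's actual code: `resolveModel`, `Resolution`, and `PREFIX_RULES` are hypothetical names, the `o*` glob is read literally as "starts with o", and paused routes are not modelled.

```typescript
// Hypothetical types; the gateway's real internals are not documented here.
type Provider = "openai" | "anthropic";

type Resolution =
  | { kind: "route"; alias: string }
  | { kind: "direct"; provider: Provider; model: string }
  | { kind: "error"; status: 400; code: "model_not_supported" };

// Prefix rules taken from the list above, checked in the order given.
const PREFIX_RULES: Array<[RegExp, Provider]> = [
  [/^gpt-/, "openai"],
  [/^o/, "openai"], // literal reading of the o* glob (assumption)
  [/^text-embedding-/, "openai"],
  [/^claude-/, "anthropic"],
];

function resolveModel(model: string, enabledAliases: ReadonlySet<string>): Resolution {
  // Step 1: an enabled alias wins over everything else.
  if (enabledAliases.has(model)) {
    return { kind: "route", alias: model };
  }
  // Step 2: otherwise treat the string as a direct model id and sniff the prefix.
  for (const [pattern, provider] of PREFIX_RULES) {
    if (pattern.test(model)) {
      return { kind: "direct", provider, model };
    }
  }
  // Step 3: nothing matched.
  return { kind: "error", status: 400, code: "model_not_supported" };
}
```

One consequence of this ordering: an alias that happens to look like a model id (say, an alias named gpt-4o) would shadow the direct model of the same name, because the alias check runs first.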
Anatomy of a route
| Field | Meaning |
|---|---|
| Alias | Workspace-unique name. Lowercase letters/digits/dashes/underscores, 1–63 chars, must start with a letter or digit. |
| Primary | Provider + model id tried first. Required. |
| Fallbacks | Ordered list of { provider, model } pairs. Tried in order if the primary fails for a retryable reason (timeout, 429, 5xx, network). |
| Retries per attempt | How many times to retry the same (provider, model) before falling through to the next attempt. 0–5, default 1. |
| Timeout (ms) | Cap on a single provider call. 1,000–120,000 ms, default 30,000. |
| Enabled | Pause a route without deleting it. Paused routes 404 at the gateway resolver. |
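To make the constraints in the table concrete, here is a minimal client-side validation sketch. The `RouteConfig` shape, its field names, and `validateRoute` are assumptions made for illustration; they are not the dashboard's or the gateway's API. Only the limits themselves come from the table.

```typescript
// Hypothetical route shape; field names are illustrative only.
interface RouteTarget {
  provider: string;
  model: string;
}

interface RouteConfig {
  alias: string;
  primary: RouteTarget;
  fallbacks: RouteTarget[];
  retriesPerAttempt: number; // 0-5, default 1
  timeoutMs: number; // 1,000-120,000, default 30,000
  enabled: boolean;
}

// First char: lowercase letter or digit. Then up to 62 more of [a-z0-9_-], so 1-63 chars total.
const ALIAS_PATTERN = /^[a-z0-9][a-z0-9_-]{0,62}$/;

// Returns the names of the fields that break a documented limit (empty array = valid).
function validateRoute(route: RouteConfig): string[] {
  const problems: string[] = [];
  if (!ALIAS_PATTERN.test(route.alias)) {
    problems.push("alias");
  }
  const retries = route.retriesPerAttempt;
  if (!Number.isInteger(retries) || retries < 0 || retries > 5) {
    problems.push("retriesPerAttempt");
  }
  if (route.timeoutMs < 1000 || route.timeoutMs > 120000) {
    problems.push("timeoutMs");
  }
  return problems;
}
```

Checking these limits before submitting a route is optional; the sketch only shows how the documented ranges translate into code.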
Example: a quality-first alias with a cheap escape hatch
On /routes → New route:
| Field | Value |
|---|---|
| Alias | smart |
| Description | Best quality for complex prompts |
| Primary | Anthropic · claude-sonnet-4-5 |
| Fallback 1 | OpenAI · gpt-4o |
| Fallback 2 | OpenAI · gpt-4o-mini |
| Retries per attempt | 1 |
| Timeout | 25,000 ms |
Then call it like any other model:
await client.chat.completions.create({
  // "smart" is the route alias, not a provider model id.
  model: "smart",
  messages: [{ role: "user", content: "Summarise this PDF…" }],
});

If Anthropic times out, the gateway retries once, then fails over to OpenAI gpt-4o, and only then to gpt-4o-mini. The response's x-usellm-fallback-used: true header tells you the chain kicked in.
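For a feel of what this chain costs in the worst case, the sketch below lists the provider calls in order and multiplies by the timeout. It rests on two assumptions: every target in the chain (fallbacks included) gets the same number of retries, and there is no backoff delay between calls. `planCalls` and `Target` are hypothetical names.

```typescript
interface Target {
  provider: string;
  model: string;
}

// Expand a chain into the ordered list of provider calls: one initial try plus the retries, per target.
function planCalls(targets: Target[], retriesPerAttempt: number): Target[] {
  const calls: Target[] = [];
  for (const target of targets) {
    for (let i = 0; i <= retriesPerAttempt; i++) {
      calls.push(target);
    }
  }
  return calls;
}

// The "smart" route from the table above.
const smartChain: Target[] = [
  { provider: "anthropic", model: "claude-sonnet-4-5" },
  { provider: "openai", model: "gpt-4o" },
  { provider: "openai", model: "gpt-4o-mini" },
];

const calls = planCalls(smartChain, 1); // 3 targets x (1 try + 1 retry) = 6 calls
const worstCaseMs = calls.length * 25000; // 6 x 25,000 ms = 150,000 ms
console.log(calls.map((c) => c.model).join(" -> "));
console.log(worstCaseMs);
```

Under those assumptions, a request where every call times out could take up to 150 seconds before the caller sees an error, which is worth weighing when choosing the timeout and retry values.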
Which errors trigger fallback?
Only retryable errors fall through the chain. The intent is to insulate users from transient provider issues, not to paper over real bugs.
- Retryable (consume retries, then fall back): timeout, rate_limited (provider 429), provider_unavailable (5xx, network).
- Not retryable (surface to caller immediately): invalid_request (provider 400), provider_auth (provider 401/403), no_provider_key (missing connection), quota_exceeded (your useLLM plan).
See Errors for the full code reference.
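The retryable / not-retryable split can be illustrated with a small synchronous simulation. This is a sketch, not the gateway's implementation: `runChain`, `CallResult`, and `Outcome` are hypothetical, real provider calls would be asynchronous, and the error code returned when the whole chain is exhausted is an assumption (the last retryable error seen).

```typescript
type ErrorCode =
  | "timeout" | "rate_limited" | "provider_unavailable" // retryable
  | "invalid_request" | "provider_auth" | "no_provider_key" | "quota_exceeded"; // not retryable

const RETRYABLE: ReadonlySet<ErrorCode> = new Set<ErrorCode>([
  "timeout", "rate_limited", "provider_unavailable",
]);

interface ChainTarget {
  provider: string;
  model: string;
}

type CallResult = { ok: true; text: string } | { ok: false; code: ErrorCode };
type Outcome =
  | { ok: true; text: string; servedBy: ChainTarget; fallbackUsed: boolean }
  | { ok: false; code: ErrorCode };

function runChain(
  chain: ChainTarget[],
  retriesPerAttempt: number,
  call: (target: ChainTarget) => CallResult,
): Outcome {
  let lastCode: ErrorCode = "provider_unavailable"; // only returned as-is if the chain is empty
  let position = 0;
  for (const target of chain) {
    for (let tryNo = 0; tryNo <= retriesPerAttempt; tryNo++) {
      const result = call(target);
      if (result.ok) {
        // position > 0 means a fallback, not the primary, served the request.
        return { ok: true, text: result.text, servedBy: target, fallbackUsed: position > 0 };
      }
      if (!RETRYABLE.has(result.code)) {
        return { ok: false, code: result.code }; // surface immediately, no retry, no fallback
      }
      lastCode = result.code; // retryable: use up retries, then move to the next target
    }
    position++;
  }
  return { ok: false, code: lastCode };
}
```

With one retry per attempt, a primary that keeps timing out is called twice before the first fallback is tried, while an invalid_request from the primary ends the request after a single call.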
Editing without redeploys
Your app code keeps calling model: "smart". Pause a route, swap a primary model, add or remove fallbacks — the next request picks up the change. No restart, no env var rollouts.