
Routing & aliases

Aliases let you call `model: "smart"` instead of `model: "gpt-4o-2024-08-06"`. Swap providers and tune fallbacks from the dashboard instead of redeploying.

The model string the gateway sees

When the gateway receives a request, it resolves the model field in this order:

If the name matches an enabled route alias in your workspace, the gateway uses that route's primary model and fallback chain.
Otherwise, it treats the string as a direct model id and picks the provider from its prefix: `gpt-*`, `o*`, `text-embedding-*` → OpenAI; `claude-*` → Anthropic.
If neither step matches, the gateway returns a 400 error with code `model_not_supported`.
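The resolution order above can be sketched as a small function. This is a minimal illustration, not the gateway's code: the types and `resolveModel` are hypothetical, `o*` is matched literally as "starts with o", and the 404 for a paused alias is taken from the Enabled field described under Anatomy of a route.

```ts
// Hypothetical sketch of the gateway's model resolution; not the real implementation.
type Provider = "openai" | "anthropic";

interface RouteRef {
  alias: string;
  enabled: boolean;
}

type Resolution =
  | { kind: "route"; alias: string }
  | { kind: "direct"; provider: Provider; model: string }
  | { kind: "error"; status: 400 | 404; code?: string };

// Prefixes from the list above. `o*` is read literally as "starts with o";
// the real matcher may be stricter.
const PREFIXES: ReadonlyArray<readonly [string, Provider]> = [
  ["gpt-", "openai"],
  ["o", "openai"],
  ["text-embedding-", "openai"],
  ["claude-", "anthropic"],
];

function resolveModel(model: string, routes: RouteRef[]): Resolution {
  // Step 1: a workspace alias takes priority over direct model ids.
  const route = routes.find((r) => r.alias === model);
  if (route && route.enabled) return { kind: "route", alias: route.alias };
  // Assumption based on the Enabled field: a paused alias resolves to a 404.
  if (route) return { kind: "error", status: 404 };

  // Step 2: treat the string as a direct model id and sniff the provider.
  for (const [prefix, provider] of PREFIXES) {
    if (model.startsWith(prefix)) return { kind: "direct", provider, model };
  }

  // Step 3: no alias and no known prefix.
  return { kind: "error", status: 400, code: "model_not_supported" };
}
```

Checking aliases first is what lets a workspace shadow any string, including one that happens to look like a model id.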

Anatomy of a route

Field	Meaning
Alias	Unique within the workspace. Lowercase letters, digits, dashes, and underscores; 1–63 characters; must start with a letter or digit.
Primary	Provider + model id tried first. Required.
Fallbacks	Ordered list of { provider, model } pairs. Each is tried in turn once the previous entry has failed with a retryable error (timeout, 429, 5xx, network) and used up its retries.
Retries per attempt	How many times to retry the same (provider, model) before moving on to the next entry in the chain. 0–5, default 1.
Timeout (ms)	Cap on a single provider call. 1,000–120,000 ms, default 30,000.
Enabled	Pause a route without deleting it. The gateway resolver returns 404 for paused routes.
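If you generate routes from scripts or review them in code, the limits in the table can be checked client-side before saving. The sketch below is an assumption-laden illustration: `RouteConfig` and `validateRoute` are hypothetical names that mirror the table, not the dashboard's actual schema, and workspace-wide alias uniqueness cannot be checked locally.

```ts
// Hypothetical shape mirroring the table above; the real schema may differ.
interface Target {
  provider: string;
  model: string;
}

interface RouteConfig {
  alias: string;
  primary: Target; // required
  fallbacks: Target[]; // tried in order
  retriesPerAttempt?: number; // 0–5, default 1
  timeoutMs?: number; // 1,000–120,000, default 30,000
  enabled?: boolean;
}

// 1–63 chars of [a-z0-9_-]; the first char must be a lowercase letter or digit.
const ALIAS_RE = /^[a-z0-9][a-z0-9_-]{0,62}$/;

// Returns a list of problems; an empty list means the config passes these checks.
function validateRoute(cfg: RouteConfig): string[] {
  const problems: string[] = [];
  if (!ALIAS_RE.test(cfg.alias)) problems.push("alias: invalid characters or length");

  const retries = cfg.retriesPerAttempt ?? 1;
  if (!Number.isInteger(retries) || retries < 0 || retries > 5) {
    problems.push("retriesPerAttempt: must be an integer from 0 to 5");
  }

  const timeout = cfg.timeoutMs ?? 30_000;
  if (timeout < 1_000 || timeout > 120_000) {
    problems.push("timeoutMs: must be between 1,000 and 120,000");
  }
  return problems;
}
```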

Example: a quality-first alias with a cheap escape hatch

On /routes → New route:

Field	Value
Alias	`smart`
Description	Best quality for complex prompts
Primary	Anthropic · claude-sonnet-4-5
Fallback 1	OpenAI · gpt-4o
Fallback 2	OpenAI · gpt-4o-mini
Retries per attempt	1
Timeout	25,000 ms
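Retries and timeouts multiply across the chain, so it can help to work out the worst case for this route. The sketch assumes every call runs into the full timeout and ignores any delay the gateway may add between retries (none is documented here); `worstCase` is a hypothetical helper.

```ts
// Back-of-the-envelope bounds for a route, assuming each entry in the chain
// gets 1 + retriesPerAttempt calls and every call hits the full timeout.
function worstCase(entries: number, retriesPerAttempt: number, timeoutMs: number) {
  const callsPerEntry = 1 + retriesPerAttempt; // first try + retries
  const maxCalls = entries * callsPerEntry;
  return { maxCalls, maxLatencyMs: maxCalls * timeoutMs };
}

// The `smart` route: primary + 2 fallbacks, 1 retry per attempt, 25,000 ms timeout.
const smart = worstCase(3, 1, 25_000);
console.log(smart); // { maxCalls: 6, maxLatencyMs: 150000 }
```

Under these assumptions a fully failing request could take up to 150 seconds, so a client timeout shorter than that may abandon a request the gateway would still have served from a fallback.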

Then call it like any other model:

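Calling an alias

```ts
import OpenAI from "openai";
// Assumption: the client points at the gateway; both env var names are placeholders.
const client = new OpenAI({ baseURL: process.env.USELLM_BASE_URL, apiKey: process.env.USELLM_API_KEY });

await client.chat.completions.create({
  model: "smart", // the route alias, resolved by the gateway
  messages: [{ role: "user", content: "Summarise this PDF…" }],
});
```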

If the Anthropic call times out, the gateway retries it once. If the retry also fails, it falls back to OpenAI gpt-4o (again with one retry), and only if that fails too does it try gpt-4o-mini. When a fallback served the request, the response carries the header `x-usellm-fallback-used: true`.
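To act on that header in application code (for example, to log or count fallbacks), a small helper can read it from a Fetch API `Headers` object. `fallbackUsed` is a hypothetical name, and the sketch assumes a missing header means no fallback, since only the `true` case is documented above.

```ts
// Reads the gateway's fallback header. Header lookup via Headers.get is
// case-insensitive; anything other than "true" (including absence) is false.
function fallbackUsed(headers: Headers): boolean {
  return headers.get("x-usellm-fallback-used")?.toLowerCase() === "true";
}

console.log(fallbackUsed(new Headers({ "x-usellm-fallback-used": "true" }))); // true
console.log(fallbackUsed(new Headers())); // false
```

With the OpenAI Node SDK (v4 and later), appending `.withResponse()` to a request resolves to `{ data, response }`, so `fallbackUsed(response.headers)` can be checked on the raw HTTP response.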

Which errors trigger fallback?

Only retryable errors fall through the chain. The intent is to insulate users from transient provider issues, not to paper over real bugs.

Retryable (consume retries, then fall back): timeout, rate_limited (provider 429), provider_unavailable (5xx, network).
Not retryable (surface to caller immediately): invalid_request (provider 400), provider_auth (provider 401/403), no_provider_key (missing connection), quota_exceeded (your useLLM plan).
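The retry-then-fall-back behaviour described in this section can be sketched as a loop over the chain. Assumptions: `runChain`, `GatewayError`, and the `call` callback are hypothetical stand-ins for the gateway's internals; the real gateway may back off between retries, which this sketch omits; and any code other than the three retryable ones is treated as non-retryable.

```ts
// Retryable codes from the list above; everything else surfaces immediately.
const RETRYABLE = new Set(["timeout", "rate_limited", "provider_unavailable"]);

class GatewayError extends Error {
  readonly code: string;
  constructor(code: string) {
    super(code);
    this.code = code;
  }
}

interface Target {
  provider: string;
  model: string;
}

// `call` stands in for one provider request to a (provider, model) pair.
async function runChain<T>(
  chain: Target[],
  retriesPerAttempt: number,
  call: (t: Target) => Promise<T>,
): Promise<{ result: T; fallbackUsed: boolean }> {
  let lastError: unknown;
  for (let i = 0; i < chain.length; i++) {
    // One first try plus `retriesPerAttempt` retries on the same entry.
    for (let attempt = 0; attempt <= retriesPerAttempt; attempt++) {
      try {
        // Flags a fallback only when a non-primary entry answered.
        return { result: await call(chain[i]), fallbackUsed: i > 0 };
      } catch (err) {
        const code = (err as { code?: unknown } | null)?.code;
        if (typeof code !== "string" || !RETRYABLE.has(code)) throw err; // not retryable
        lastError = err; // retryable: retry, then move to the next entry
      }
    }
  }
  throw lastError; // every entry used up its retries
}
```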

See Errors for the full code reference.

Editing without redeploys

Your app code keeps calling `model: "smart"`. Pause a route, swap the primary model, or add and remove fallbacks, and the next request picks up the change. No restart, no env var rollouts.