Introduction
useLLM is a BYOK (bring your own keys) gateway in front of OpenAI and Anthropic. One OpenAI-compatible endpoint, your provider keys, model aliases, fallbacks, and a dashboard for everything.
Your app talks to useLLM the same way it talks to OpenAI today — change the baseURL and the API key, keep the same request shape. Behind the scenes useLLM uses your provider keys to call OpenAI or Anthropic, retries on transient errors, falls back to a backup model if the primary chain fails, and writes a metadata-only request log for the dashboard.
```python
from openai import OpenAI

client = OpenAI(
    api_key="ul_live_XXXXXXXXXXXXXXXXXXXXXXXX",  # your useLLM gateway key
    base_url="http://localhost:4000/v1",
)

res = client.chat.completions.create(
    model="smart",  # an alias you defined, or any provider model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(res.choices[0].message.content)
```

What you get
- One endpoint for OpenAI + Anthropic models. OpenAI-compatible request and response shape, including tool calls (streaming lands in a follow-up).
- BYOK — provider keys are stored AES-256-GCM encrypted and decrypted only at request time. Token usage is billed to your OpenAI/Anthropic accounts directly. useLLM never resells tokens.
- Model aliases like `smart` or `cheap`. Each alias maps to a primary model plus an ordered fallback chain, configurable from the dashboard with no redeploy.
- Observability — per-request metadata logs feed dashboards for latency, error rate, fallback rate, and estimated provider cost per period.
- Plan quotas enforced at the gateway. Over-quota requests return `429 quota_exceeded`; an upgrade to a higher plan resets the period.
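Because a quota `429` means "out of budget for the period" while a provider `429` usually means "slow down", clients should treat them differently. Here is a minimal client-side sketch of that decision; the `quota_exceeded` code comes from the quota behavior above, but the idea of inspecting a machine-readable error `code` alongside the status is an assumption about the error body:

```python
def classify_gateway_error(status: int, code: str) -> str:
    """Map a gateway error to a client action (illustrative sketch).

    quota_exceeded is the plan-quota 429 described above; retrying it
    is pointless until the period resets or the plan is upgraded.
    """
    if status == 429 and code == "quota_exceeded":
        return "upgrade_plan"            # quota 429: retrying won't help
    if status == 429 or 500 <= status < 600:
        return "retry_with_backoff"      # transient provider-side trouble
    return "fail"                        # e.g. 401 bad key: surface it

# A quota 429 should not be retried:
print(classify_gateway_error(429, "quota_exceeded"))  # → upgrade_plan
```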
How it fits together
Your app sends requests with a ul_live_* gateway key. The gateway authenticates it, looks up the route alias (or maps the model id to a provider), pulls the matching encrypted provider key out of your workspace, and forwards the call. The response comes back OpenAI-shaped, with extra headers describing what really happened.
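Those extra headers are what let you see through the alias at runtime. As a sketch, a client could summarize which provider and model actually served a response — note the `x-ul-*` header names here are hypothetical placeholders, since the source only says extra headers exist:

```python
def summarize_route(headers: dict) -> str:
    """Describe which provider/model served a response.

    Header names (x-ul-*) are invented for illustration; substitute
    whatever the gateway actually sends.
    """
    provider = headers.get("x-ul-provider", "unknown")
    model = headers.get("x-ul-model", "unknown")
    fell_back = headers.get("x-ul-fallback", "false") == "true"
    return f"{provider}/{model}" + (" (fallback)" if fell_back else "")

# e.g. the primary chain failed and a backup model answered:
print(summarize_route({
    "x-ul-provider": "anthropic",
    "x-ul-model": "claude-sonnet-4",
    "x-ul-fallback": "true",
}))  # → anthropic/claude-sonnet-4 (fallback)
```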
Next steps
Run through the Quickstart to send your first request in five minutes, or jump straight to Routing & aliases if you already have keys connected.