Gateway
The gateway is the part of the LLM Service Daemon (LSD) that accepts inference requests, routes them to a model provider, and records the result. It is built on Rama and exposes both OpenAI-compatible and native endpoints.
API surface
Section titled “API surface”| Endpoint | Purpose |
|---|---|
POST /openai/v1/chat/completions |
OpenAI-compatible chat completions |
POST /openai/v1/embeddings |
OpenAI-compatible embeddings |
POST /anthropic/v1/messages |
Anthropic Messages API compatibility |
POST /inference |
LSD’s native inference endpoint (functions, variants, episodes) |
POST /feedback |
Attach metric/boolean/float/comment/demonstration feedback to an inference or episode |
POST /batch_inference, GET /batch_inference/{batch_id} |
Submit and poll batch inference jobs |
GET /status, GET /health |
Liveness/readiness |
GET /metrics |
Prometheus metrics |
Point an existing OpenAI SDK at http://<host>/openai/v1 and it works without code changes.
Routing, retries, and fallback
Section titled “Routing, retries, and fallback”LSD has three independent layers for handling a failing request, each configured at a different level:
-
Provider routing, on the model: an ordered list of providers behind one model name. If the first provider errors, the next one in the list is tried.
[models."grok-4p3"]routing = ["xai", "openrouter"][models."grok-4p3".providers.xai]type = "xai"model_name = "grok-4.3"[models."grok-4p3".providers.openrouter]type = "openrouter"model_name = "x-ai/grok-4.3" -
Retries, on the variant: how many times to retry the same model before giving up on it, with a capped delay between attempts.
[functions.my_function.variants.my_variant.retries]num_retries = 3max_delay_s = 10 -
Cross-model fallback, also on the variant: once
modelhas exhausted its providers and its retries, fall through to an entirely different model.[functions.my_function.variants.my_variant]model = "claude-sonnet-5"fallback = ["kimi-k2p6", "glm-5p2"]
Use LSD’s native functions and variants to reach layers 2 and 3; the OpenAI/Anthropic-compatible endpoints only see layer 1 (a single "model" resolving through its provider list).
Supported providers
Section titled “Supported providers”LSD speaks the native API of each of these providers directly (no separate proxy layer):
Anthropic, AWS Bedrock, AWS SageMaker, Azure OpenAI, DeepSeek, Fireworks, GCP Vertex AI (Anthropic and Gemini), Google AI Studio (Gemini), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, and xAI.
Any OpenAI-compatible self-hosted endpoint (vLLM, SGLang, TGI, and similar) can also be configured directly.
Credentials
Section titled “Credentials”Provider credentials default to each provider’s standard environment variable (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY). You can override the credential location per provider in config if you need to read from a different variable or inject it dynamically.
Auth and rate limiting
Section titled “Auth and rate limiting”Auth is opt-in: set gateway.auth.enabled = true and issue bearer tokens with gateway --create-api-key. The gateway also supports configurable rate-limiting rules; see the [rate_limiting] config section.