Skip to content

Gateway

The gateway is the part of the LLM Service Daemon (LSD) that accepts inference requests, routes them to a model provider, and records the result. It is built on Rama and exposes both OpenAI-compatible and native endpoints.

Endpoint Purpose
POST /openai/v1/chat/completions OpenAI-compatible chat completions
POST /openai/v1/embeddings OpenAI-compatible embeddings
POST /anthropic/v1/messages Anthropic Messages API compatibility
POST /inference LSD’s native inference endpoint (functions, variants, episodes)
POST /feedback Attach metric/boolean/float/comment/demonstration feedback to an inference or episode
POST /batch_inference, GET /batch_inference/{batch_id} Submit and poll batch inference jobs
GET /status, GET /health Liveness/readiness
GET /metrics Prometheus metrics

Point an existing OpenAI SDK at http://<host>/openai/v1 and it works without code changes.

LSD has three independent layers for handling a failing request, each configured at a different level:

  1. Provider routing, on the model: an ordered list of providers behind one model name. If the first provider errors, the next one in the list is tried.

    [models."grok-4p3"]
    routing = ["xai", "openrouter"]
    [models."grok-4p3".providers.xai]
    type = "xai"
    model_name = "grok-4.3"
    [models."grok-4p3".providers.openrouter]
    type = "openrouter"
    model_name = "x-ai/grok-4.3"
  2. Retries, on the variant: how many times to retry the same model before giving up on it, with a capped delay between attempts.

    [functions.my_function.variants.my_variant.retries]
    num_retries = 3
    max_delay_s = 10
  3. Cross-model fallback, also on the variant: once model has exhausted its providers and its retries, fall through to an entirely different model.

    [functions.my_function.variants.my_variant]
    model = "claude-sonnet-5"
    fallback = ["kimi-k2p6", "glm-5p2"]

Use LSD’s native functions and variants to reach layers 2 and 3; the OpenAI/Anthropic-compatible endpoints only see layer 1 (a single "model" resolving through its provider list).

LSD speaks the native API of each of these providers directly (no separate proxy layer):

Anthropic, AWS Bedrock, AWS SageMaker, Azure OpenAI, DeepSeek, Fireworks, GCP Vertex AI (Anthropic and Gemini), Google AI Studio (Gemini), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, and xAI.

Any OpenAI-compatible self-hosted endpoint (vLLM, SGLang, TGI, and similar) can also be configured directly.

Provider credentials default to each provider’s standard environment variable (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY). You can override the credential location per provider in config if you need to read from a different variable or inject it dynamically.

Auth is opt-in: set gateway.auth.enabled = true and issue bearer tokens with gateway --create-api-key. The gateway also supports configurable rate-limiting rules; see the [rate_limiting] config section.