Functions & Variants
The OpenAI/Anthropic-compatible endpoints map one request to one model. The LLM Service Daemon’s (LSD’s) native /inference endpoint adds a layer above that: a function is a named task (e.g. generate_summary), and a variant is one way to accomplish it (a specific model, prompt, and set of parameters). A function can have several variants running side by side, each receiving a share of traffic.
```
POST /inference
{
  "function_name": "generate_summary",
  "input": { "messages": [{ "role": "user", "content": "..." }] }
}
```
LSD picks a variant for you (see Experimentation), or you can pin one with "variant_name".
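As a minimal sketch of how a client might assemble this request body, the helper below builds the JSON payload with and without a pinned variant. The helper name `build_inference_request` is hypothetical, and placing "variant_name" at the top level of the body is an assumption; no network call is made.

```python
import json
from typing import Optional


def build_inference_request(function_name: str, user_text: str,
                            variant_name: Optional[str] = None) -> dict:
    """Build a JSON body for POST /inference (hypothetical helper)."""
    body = {
        "function_name": function_name,
        "input": {"messages": [{"role": "user", "content": user_text}]},
    }
    # Assumption: the pin is a top-level field. Omit it to let LSD choose.
    if variant_name is not None:
        body["variant_name"] = variant_name
    return body


unpinned = build_inference_request("generate_summary", "Summarize this report.")
pinned = build_inference_request("generate_summary", "Summarize this report.",
                                 variant_name="sonnet5")
print(json.dumps(pinned, indent=2))
```

Leaving the pin out by default keeps the choice of variant with LSD, so pinning stays an explicit decision per request.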
Defining a function
```toml
[functions.generate_summary]
type = "chat" # or "json"

[functions.generate_summary.variants.sonnet5]
type = "chat_completion"
model = "claude-sonnet-5"
```
JSON functions additionally require an output_schema and use json_mode_tool_call_config to enforce it.
Variant types
| type | Behavior |
|---|---|
| chat_completion | A single model, prompt templates, and inference parameters. The default and most common variant. |
| best_of_n_sampling | Runs candidates (other variants) in parallel, then an evaluator variant picks the best response. |
| mixture_of_n | Runs candidates in parallel, then a fuser variant combines them into one response. |
| dicl | Dynamic in-context learning: embeds the input, retrieves the k nearest stored examples, and injects them as few-shot context before calling model. |
| chain_of_thought | Deprecated; use chat_completion with reasoning_effort instead. |
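The difference between best_of_n_sampling and mixture_of_n can be illustrated with a short sketch. It is not LSD's implementation: the variants here are plain Python callables standing in for model calls, and every name (`run_candidates`, `pick_longest`, `join_all`, and so on) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Variant = Callable[[str], str]  # stand-in for "call a variant with a prompt"


def run_candidates(candidates: List[Variant], prompt: str) -> List[str]:
    """Call every candidate in parallel; results keep the candidate order."""
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        return list(pool.map(lambda variant: variant(prompt), candidates))


def best_of_n(candidates: List[Variant],
              evaluator: Callable[[str, List[str]], int], prompt: str) -> str:
    """The evaluator returns the index of the response to keep."""
    responses = run_candidates(candidates, prompt)
    return responses[evaluator(prompt, responses)]


def mixture_of_n(candidates: List[Variant],
                 fuser: Callable[[str, List[str]], str], prompt: str) -> str:
    """The fuser writes one new response from all candidate responses."""
    responses = run_candidates(candidates, prompt)
    return fuser(prompt, responses)


# Toy stand-ins for real variants, evaluator, and fuser.
def terse(prompt: str) -> str:
    return "Short summary."


def detailed(prompt: str) -> str:
    return "A longer summary with more supporting detail."


def pick_longest(prompt: str, responses: List[str]) -> int:
    return max(range(len(responses)), key=lambda i: len(responses[i]))


def join_all(prompt: str, responses: List[str]) -> str:
    return " ".join(responses)


best = best_of_n([terse, detailed], pick_longest, "Summarize the report.")
fused = mixture_of_n([terse, detailed], join_all, "Summarize the report.")
```

In both cases the candidates run the same way; only the final step differs. An evaluator selects one existing response, while a fuser produces a new one.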
Reliability: retries and fallback
A chat_completion (and dicl) variant has its own retry policy, independent of the provider-level routing described in Gateway.
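As a rough mental model, a variant-level retry policy and the fallback list configured in this section might interact like the loop below. Several details are assumptions rather than documented behavior: that num_retries counts retries after the first attempt, that the delay grows exponentially up to max_delay_s, and that fallback models get the same retry budget. The function and parameter names other than num_retries and max_delay_s are hypothetical.

```python
import time
from typing import Callable, List, Optional


class AllModelsFailed(Exception):
    """Raised when the primary model and every fallback have failed."""


def call_with_retries_and_fallback(
    models: List[str],                  # primary first, then fallbacks in order
    call_model: Callable[[str], str],   # stand-in for one model request
    num_retries: int = 3,
    max_delay_s: float = 10.0,
    base_delay_s: float = 0.5,          # assumption: not part of the source config
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    last_error: Optional[Exception] = None
    for model in models:
        # One initial attempt plus num_retries retries (an assumption).
        for attempt in range(num_retries + 1):
            try:
                return call_model(model)
            except Exception as error:  # a real client would catch narrower errors
                last_error = error
                if attempt < num_retries:
                    # Assumed exponential backoff, capped at max_delay_s.
                    sleep(min(base_delay_s * 2 ** attempt, max_delay_s))
    raise AllModelsFailed(f"all models failed: {models}") from last_error


attempts: List[str] = []


def flaky_call(model: str) -> str:
    """Toy model call: the primary always fails, the fallback succeeds."""
    attempts.append(model)
    if model == "minimax-m3":
        raise RuntimeError("provider unavailable")
    return f"response from {model}"


result = call_with_retries_and_fallback(
    ["minimax-m3", "gpt-5.5"], flaky_call, sleep=lambda seconds: None
)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. The fallback model is reached only after the primary has used its whole retry budget.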
```toml
[functions.generate_summary.variants.minimax_primary.retries]
num_retries = 3
max_delay_s = 10

[functions.generate_summary.variants.minimax_primary]
model = "minimax-m3"
fallback = ["gpt-5.5"] # tried only after minimax-m3 exhausts retries and provider routing
```
Prompt templates
Templates are MiniJinja files referenced from the variant and invoked through {"type": "template", "name": "..."} input blocks (or the legacy user/assistant/system role-based wrappers). A function can declare a JSON schema per template to validate the template’s input variables.
Structured outputs
JSON functions declare an output_schema; LSD enforces it either through the provider’s native structured-output mode or by wrapping it as an implicit tool call (json_mode = "tool"). The mode is controllable per variant: off, on, strict, or tool.
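To make "enforcing" an output_schema concrete, the sketch below checks a model's raw output against a deliberately tiny subset of JSON Schema (required fields and primitive types only). It is an illustration of the idea, not how LSD or any provider validates output; `check_output` and the example schema are hypothetical.

```python
import json

# JSON Schema type names mapped to Python types (small subset only).
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "array": list, "object": dict}


def check_output(raw: str, output_schema: dict) -> dict:
    """Parse raw model output and reject it if it violates the schema subset."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    for name in output_schema.get("required", []):
        if name not in data:
            raise ValueError(f"missing required field: {name}")
    for name, spec in output_schema.get("properties", {}).items():
        if name not in data or "type" not in spec:
            continue
        value, expected = data[name], _TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and spec["type"] in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            raise ValueError(f"field {name!r} is not of type {spec['type']}")
    return data


schema = {
    "type": "object",
    "properties": {"summary": {"type": "string"}, "word_count": {"type": "integer"}},
    "required": ["summary"],
}

ok = check_output('{"summary": "Quarterly revenue rose.", "word_count": 3}', schema)

try:
    check_output('{"word_count": 3}', schema)  # "summary" is missing
    rejected = False
except ValueError:
    rejected = True
```

A production validator would use a full JSON Schema implementation; the point here is only that output which does not match the declared schema is rejected rather than returned.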
Tool use
Functions can declare tools (by name, referencing [tools.*] config blocks) and a tool_choice: none, auto, required, or a specific tool name. parallel_tool_calls controls whether the model may call more than one tool per turn.
Semantic cache
Functions can opt into a semantic cache ([functions.my_function.semantic_cache]). It stores embeddings in Postgres and serves a cached response when a sufficiently similar input has been seen before.
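The lookup logic behind a semantic cache can be sketched in a few lines. This is an in-memory illustration under stated assumptions: cosine similarity as the measure, a 0.95 threshold, and a Python list in place of Postgres. None of these details, nor the `SemanticCache` class itself, come from LSD's configuration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache: stores (embedding, response) pairs in a list."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold  # assumed similarity cutoff for a hit
        self.entries: List[Tuple[List[float], str]] = []

    def store(self, embedding: Sequence[float], response: str) -> None:
        self.entries.append((list(embedding), response))

    def lookup(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the closest stored response if it is similar enough, else None."""
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.store([1.0, 0.0, 0.0], "cached summary")

hit = cache.lookup([0.99, 0.05, 0.0])   # nearly the same direction as the stored input
miss = cache.lookup([0.0, 1.0, 0.0])    # orthogonal to the stored input
```

The threshold is the main tuning knob. Set too low, it returns cached responses for inputs that only look alike; set too high, the cache rarely hits.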
Built-in evaluators
A function can attach evaluators directly in its config, in addition to the standalone evaluation tooling covered in Evaluations.