Functions & Variants

The OpenAI/Anthropic-compatible endpoints map one request to one model. The LLM Service Daemon’s (LSD’s) native /inference endpoint adds a layer above that: a function is a named task (e.g. generate_summary), and a variant is one way to accomplish it (a specific model, prompt, and set of parameters). A function can have several variants running side by side, each receiving a share of traffic.

POST /inference
{
"function_name": "generate_summary",
"input": { "messages": [{ "role": "user", "content": "..." }] }
}

LSD picks a variant for you (see Experimentation), or you can pin one with "variant_name".
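As a rough illustration of the request shape above, the following Python sketch builds the /inference body and optionally pins a variant. The helper name build_inference_request is hypothetical and not part of LSD; only function_name, input, and variant_name come from the text above.

```python
import json
from typing import Optional


def build_inference_request(
    function_name: str,
    user_content: str,
    variant_name: Optional[str] = None,
) -> dict:
    """Build a JSON-serializable body for POST /inference (hypothetical helper)."""
    body = {
        "function_name": function_name,
        "input": {"messages": [{"role": "user", "content": user_content}]},
    }
    # Leave variant_name out so LSD chooses a variant; include it to pin one.
    if variant_name is not None:
        body["variant_name"] = variant_name
    return body


if __name__ == "__main__":
    pinned = build_inference_request(
        "generate_summary", "Summarize this text.", variant_name="sonnet5"
    )
    # Send this body with the HTTP client of your choice to the /inference
    # URL of your own LSD deployment.
    print(json.dumps(pinned, indent=2))
```

Keeping variant_name out of the body by default leaves variant selection to LSD, which is usually what you want outside of debugging or targeted tests.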

[functions.generate_summary]
type = "chat" # or "json"
[functions.generate_summary.variants.sonnet5]
type = "chat_completion"
model = "claude-sonnet-5"

JSON functions additionally require an output_schema and use json_mode_tool_call_config to enforce it.

| type | Behavior |
| --- | --- |
| chat_completion | A single model, prompt templates, and inference parameters. The default and most common variant. |
| best_of_n_sampling | Runs candidates (other variants) in parallel, then an evaluator variant picks the best response. |
| mixture_of_n | Runs candidates in parallel, then a fuser variant combines them into one response. |
| dicl | Dynamic in-context learning: embeds the input, retrieves the k nearest stored examples, and injects them as few-shot context before calling model. |
| chain_of_thought | Deprecated; use chat_completion with reasoning_effort instead. |
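The two ensemble types, best_of_n_sampling and mixture_of_n, share the same first step and differ only in what happens to the candidate responses. The sketch below is a conceptual model, not LSD's implementation: variants are stood in for by plain Python callables, and the names run_candidates, best_of_n, and mixture_of_n are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Assumption: a "variant" is modeled as a callable from prompt to response.
Variant = Callable[[str], str]


def run_candidates(candidates: List[Variant], prompt: str) -> List[str]:
    """Run every candidate on the same prompt in parallel."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        # map() returns results in the same order as the candidates.
        return list(pool.map(lambda variant: variant(prompt), candidates))


def best_of_n(
    candidates: List[Variant],
    evaluator: Callable[[str, List[str]], int],
    prompt: str,
) -> str:
    """The evaluator returns the index of the response it prefers."""
    responses = run_candidates(candidates, prompt)
    return responses[evaluator(prompt, responses)]


def mixture_of_n(
    candidates: List[Variant],
    fuser: Callable[[str, List[str]], str],
    prompt: str,
) -> str:
    """The fuser combines all candidate responses into a single response."""
    responses = run_candidates(candidates, prompt)
    return fuser(prompt, responses)


if __name__ == "__main__":
    # Toy stand-ins for model-backed variants.
    truncate = lambda p: p[:10]
    shout = lambda p: p.upper()
    pick_longest = lambda p, rs: max(range(len(rs)), key=lambda i: len(rs[i]))
    join_all = lambda p, rs: " | ".join(rs)

    print(best_of_n([truncate, shout], pick_longest, "a long example prompt"))
    print(mixture_of_n([truncate, shout], join_all, "a long example prompt"))
```

In a real deployment the evaluator and fuser are themselves model calls, so both types cost one extra inference on top of the candidates.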

chat_completion and dicl variants each have their own retry policy, independent of the provider-level routing described in Gateway:

[functions.generate_summary.variants.minimax_primary]
model = "minimax-m3"
fallback = ["gpt-5.5"] # tried only after minimax-m3 exhausts retries and provider routing
[functions.generate_summary.variants.minimax_primary.retries]
num_retries = 3
max_delay_s = 10
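The ordering described in the comment (retry the primary model first, then move to the fallback) can be sketched as follows. This is a simplified model under several assumptions: num_retries is taken to mean retries in addition to the first attempt, the delay schedule is exponential backoff capped at max_delay_s, base_delay_s is a made-up parameter, and provider-level routing is not modeled at all. The function name call_with_retries is hypothetical.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_retries(
    call_model: Callable[[str], str],
    models: Sequence[str],
    num_retries: int = 3,
    max_delay_s: float = 10.0,
    base_delay_s: float = 1.0,  # assumption: not a documented LSD setting
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try models in order; each gets one attempt plus num_retries retries."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(num_retries + 1):
            try:
                return call_model(model)
            except Exception as err:  # a real client would catch narrower errors
                last_error = err
                if attempt < num_retries:
                    # Assumed schedule: exponential backoff capped at max_delay_s.
                    sleep(min(base_delay_s * 2 ** attempt, max_delay_s))
    raise RuntimeError("all models failed") from last_error


if __name__ == "__main__":
    def flaky(model: str) -> str:
        # Simulate a primary model that is always unavailable.
        if model == "minimax-m3":
            raise ConnectionError("primary unavailable")
        return f"response from {model}"

    # Pass a no-op sleep so the demo does not actually wait.
    print(call_with_retries(flaky, ["minimax-m3", "gpt-5.5"], sleep=lambda s: None))
```

Injecting sleep as a parameter keeps the retry logic testable without real delays.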

Templates are MiniJinja files referenced from the variant, invoked through {"type": "template", "name": "..."} input blocks (or the legacy user/assistant/system role-based wrappers). A function can declare a JSON schema per template to validate the template’s input variables.

JSON functions declare an output_schema. LSD enforces it either through the provider’s native structured-output mode or by wrapping the schema as an implicit tool call (json_mode = "tool"). The mode is set per variant: off, on, strict, or tool.
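To make the "implicit tool call" idea concrete, here is a minimal sketch of the general technique: the output schema becomes the parameters of a single tool, the model is forced to call that tool, and the call's arguments are read back as the JSON output. The dictionary shapes, the tool name "respond", and both helper names are illustrative assumptions; they are not LSD's internal format or any provider's exact wire format.

```python
import json


def wrap_schema_as_tool(output_schema: dict, tool_name: str = "respond") -> dict:
    """Return request fragments that force a call to one tool whose
    parameters are the function's output schema (hypothetical shape)."""
    return {
        "tools": [
            {
                "name": tool_name,
                "description": "Return the final answer.",
                "parameters": output_schema,
            }
        ],
        # Forcing this specific tool means the model cannot answer in free text.
        "tool_choice": tool_name,
    }


def extract_json_output(tool_call: dict, tool_name: str = "respond") -> dict:
    """Treat the forced tool call's arguments as the function's JSON output."""
    if tool_call.get("name") != tool_name:
        raise ValueError(f"expected a call to {tool_name!r}")
    return json.loads(tool_call["arguments"])


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"summary": {"type": "string"}},
        "required": ["summary"],
    }
    print(json.dumps(wrap_schema_as_tool(schema), indent=2))
    # A simulated model reply, with arguments encoded as a JSON string.
    reply = {"name": "respond", "arguments": '{"summary": "Short recap."}'}
    print(extract_json_output(reply))
```

This approach is useful with providers that support tool calling but lack a native structured-output mode.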

Functions can declare tools (by name, referencing [tools.*] config blocks) and a tool_choice: none, auto, required, or a specific tool name. parallel_tool_calls controls whether the model may call more than one tool per turn.

Functions can opt into a semantic cache ([functions.my_function.semantic_cache]). It stores embeddings in Postgres and serves a cached response when a sufficiently similar input has been seen before.
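The lookup logic behind a semantic cache can be sketched in a few lines. This is a conceptual model only: an in-memory list stands in for Postgres, cosine similarity and the 0.95 threshold are assumptions about what "sufficiently similar" means, and toy_embed is a letter-count stand-in for a real embedding model. The class and function names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for an embedding-backed response cache."""

    def __init__(self, embed: Callable[[str], Sequence[float]], threshold: float = 0.95):
        self.embed = embed
        self.threshold = threshold  # assumed similarity cutoff
        self.entries: List[Tuple[List[float], str]] = []

    def put(self, text: str, response: str) -> None:
        self.entries.append((list(self.embed(text)), response))

    def get(self, text: str) -> Optional[str]:
        """Return the response of the most similar stored input, if close enough."""
        if not self.entries:
            return None
        query = self.embed(text)
        score, response = max(
            ((cosine_similarity(query, vec), resp) for vec, resp in self.entries),
            key=lambda pair: pair[0],
        )
        return response if score >= self.threshold else None


def toy_embed(text: str) -> List[float]:
    # Toy embedding: counts of each lowercase letter a-z. Not a real model.
    lowered = text.lower()
    return [float(lowered.count(chr(ord("a") + i))) for i in range(26)]


if __name__ == "__main__":
    cache = SemanticCache(toy_embed)
    cache.put("summarize the quarterly report", "cached summary")
    print(cache.get("summarise the quarterly report"))
    print(cache.get("xyz"))
```

A linear scan is fine for a sketch; a database-backed cache would typically rely on a vector index for the nearest-neighbor search instead.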

A function can attach evaluators directly in its config, in addition to the standalone evaluation tooling covered in Evaluations.