Functions & Variants

The OpenAI/Anthropic-compatible endpoints map one request to one model. The LLM Service Daemon’s (LSD’s) native /inference endpoint adds a layer above that: a function is a named task (e.g. generate_summary), and a variant is one way to accomplish it (a specific model, prompt, and set of parameters). A function can have several variants running side by side, each receiving a share of traffic.

POST /inference
{
  "function_name": "generate_summary",
  "input": { "messages": [{ "role": "user", "content": "..." }] }
}

LSD picks a variant for you (see Experimentation), or you can pin one with "variant_name".
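
As a rough sketch, the request body above could be assembled in Python as follows. The helper name build_inference_request is hypothetical; only function_name, input, and variant_name come from the text above, and the block stops short of sending anything over HTTP.

```python
import json
from typing import Any, Optional


def build_inference_request(
    function_name: str,
    messages: list[dict[str, Any]],
    variant_name: Optional[str] = None,
) -> dict[str, Any]:
    """Build a JSON-serializable body for POST /inference (hypothetical helper)."""
    body: dict[str, Any] = {
        "function_name": function_name,
        "input": {"messages": messages},
    }
    # Leave variant_name out entirely so that LSD picks the variant itself.
    if variant_name is not None:
        body["variant_name"] = variant_name
    return body


messages = [{"role": "user", "content": "..."}]
unpinned = build_inference_request("generate_summary", messages)
pinned = build_inference_request("generate_summary", messages, variant_name="sonnet5")
print(json.dumps(pinned, indent=2))
```

Pinning is mostly useful for debugging or reproducing a result from one variant; leaving variant_name out keeps the request under LSD's normal variant selection.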

Defining a function

[functions.generate_summary]
type = "chat"  # or "json"

[functions.generate_summary.variants.sonnet5]
type = "chat_completion"
model = "claude-sonnet-5"

JSON functions additionally require an output_schema; the json_mode setting (see Structured outputs) controls how LSD enforces it.

Variant types

`type`	Behavior
`chat_completion`	A single model, prompt templates, and inference parameters. The default and most common variant.
`best_of_n_sampling`	Runs `candidates` (other variants) in parallel, then an evaluator variant picks the best response.
`mixture_of_n`	Runs `candidates` in parallel, then a fuser variant combines them into one response.
`dicl`	Dynamic in-context learning: embeds the input, retrieves the `k` nearest stored examples, and injects them as few-shot context before calling `model`.
`chain_of_thought`	Deprecated; use `chat_completion` with `reasoning_effort` instead.
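
The control flow behind best_of_n_sampling and mixture_of_n can be illustrated with a small Python sketch. This is not LSD's implementation: candidates, the evaluator, and the fuser are plain callables standing in for variants, and every function name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Candidate = Callable[[str], str]  # stand-in for a candidate variant


def run_candidates(candidates: Sequence[Candidate], prompt: str) -> list[str]:
    """Call every candidate concurrently; responses come back in candidate order."""
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        return list(pool.map(lambda candidate: candidate(prompt), candidates))


def best_of_n(
    candidates: Sequence[Candidate],
    evaluator: Callable[[str, list[str]], int],
    prompt: str,
) -> str:
    """The evaluator returns the index of the response it prefers."""
    responses = run_candidates(candidates, prompt)
    return responses[evaluator(prompt, responses)]


def mixture_of_n(
    candidates: Sequence[Candidate],
    fuser: Callable[[str, list[str]], str],
    prompt: str,
) -> str:
    """The fuser sees every response and writes a single combined one."""
    return fuser(prompt, run_candidates(candidates, prompt))


# Toy stand-ins: real candidates, evaluators, and fusers would be model calls.
candidates = [lambda p: p.upper(), lambda p: p[::-1], lambda p: p + "!"]


def pick_longest(prompt: str, responses: list[str]) -> int:
    return max(range(len(responses)), key=lambda i: len(responses[i]))


def join_all(prompt: str, responses: list[str]) -> str:
    return " | ".join(responses)


best = best_of_n(candidates, pick_longest, "abc")
fused = mixture_of_n(candidates, join_all, "abc")
```

The two types share the fan-out step and differ only in the last call: selection returns one candidate's response unchanged, while fusion produces new text.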

Reliability: retries and fallback

Each chat_completion or dicl variant has its own retry policy, independent of the provider-level routing described in Gateway:

[functions.generate_summary.variants.minimax_primary]
model = "minimax-m3"
fallback = ["gpt-5.5"]  # tried only after minimax-m3 exhausts retries and provider routing

[functions.generate_summary.variants.minimax_primary.retries]
num_retries = 3
max_delay_s = 10
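
The ordering in that config (retry the primary model first, then move on to the fallback list) can be sketched in Python. Several details are assumptions rather than documented LSD behavior: num_retries is read as retries after the first attempt, the delay doubles from one second up to max_delay_s, and fallback models get the same retry policy. The function name and the call_model callable are hypothetical.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_retries_and_fallback(
    call_model: Callable[[str], str],
    model: str,
    fallback: Sequence[str] = (),
    num_retries: int = 3,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try `model` with retries, then each fallback model in order."""
    last_error: Optional[Exception] = None
    for name in (model, *fallback):
        delay = 1.0  # assumed starting delay, doubled after each failure
        for attempt in range(num_retries + 1):
            try:
                return call_model(name)
            except Exception as error:
                last_error = error
                if attempt < num_retries:
                    sleep(min(delay, max_delay_s))
                    delay *= 2
    raise RuntimeError("all models failed") from last_error


# Demo with a fake client: the primary always fails, the fallback succeeds.
attempts: list[str] = []
delays: list[float] = []


def flaky(name: str) -> str:
    attempts.append(name)
    if name == "minimax-m3":
        raise TimeoutError("primary unavailable")
    return f"summary from {name}"


result = call_with_retries_and_fallback(
    flaky, "minimax-m3", ["gpt-5.5"], sleep=delays.append
)
```

Passing sleep as a parameter lets the demo record the delays instead of waiting for them.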

Prompt templates

Templates are MiniJinja files referenced from the variant. A request invokes them through {"type": "template", "name": "..."} input blocks, or through the legacy role-based wrappers (user, assistant, system). A function can declare a JSON schema per template to validate that template’s input variables.

Structured outputs

JSON functions declare an output_schema. LSD enforces it either through the provider’s native structured-output mode or by wrapping the schema as an implicit tool call (json_mode = "tool"). The json_mode setting is configured per variant and accepts off, on, strict, or tool.
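
To make "enforcing an output_schema" concrete, here is a minimal Python sketch of the check a client might run on a returned string. It covers only a small subset of JSON Schema (top-level required fields and property types); the check_output helper and the example schema are hypothetical, and a real deployment would more likely rely on a full JSON Schema validator.

```python
import json
from typing import Any

# JSON Schema type names mapped to Python types (subset only).
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_output(raw: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse raw model output and check required fields and property types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        type_name = spec.get("type")
        expected = _TYPES.get(type_name)
        if key not in data or expected is None:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if not isinstance(value, expected) or is_stray_bool:
            raise ValueError(f"field {key!r} is not of type {type_name}")
    return data


schema = {
    "type": "object",
    "required": ["summary"],
    "properties": {"summary": {"type": "string"}, "word_count": {"type": "integer"}},
}
parsed = check_output('{"summary": "Short.", "word_count": 1}', schema)
```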

Tool use

Functions can declare tools (by name, referencing [tools.*] config blocks) and a tool_choice: none, auto, required, or a specific tool name. parallel_tool_calls controls whether the model may call more than one tool per turn.
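
The four tool_choice forms can be told apart with a small Python sketch. The resolve_tool_choice helper and the tool name search_docs are hypothetical, and giving the three keywords precedence over a tool that happens to share their name is an assumption, not documented behavior.

```python
from typing import Iterable, Optional

RESERVED_CHOICES = {"none", "auto", "required"}


def resolve_tool_choice(
    tool_choice: str, declared_tools: Iterable[str]
) -> tuple[str, Optional[str]]:
    """Return (mode, tool_name); tool_name is set only for a specific tool."""
    if tool_choice in RESERVED_CHOICES:
        return tool_choice, None
    if tool_choice in set(declared_tools):
        return "specific", tool_choice
    raise ValueError(f"tool_choice {tool_choice!r} is not a declared tool")


declared = ["search_docs"]
auto_mode = resolve_tool_choice("auto", declared)
forced = resolve_tool_choice("search_docs", declared)
```

Rejecting an undeclared tool name up front means a typo in tool_choice fails during validation instead of reaching the model.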

Semantic cache

Functions can opt into a semantic cache ([functions.my_function.semantic_cache]). It stores embeddings in Postgres and serves a cached response when a sufficiently similar input has been seen before.
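
The lookup rule ("serve a cached response when a sufficiently similar input has been seen before") can be sketched in Python. Everything specific here is an assumption: an in-memory list stands in for Postgres, similarity is measured with cosine similarity, the 0.95 threshold is arbitrary, and the embeddings are hand-written vectors, not output from an embedding model.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy in-memory cache keyed by embedding similarity."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the most similar stored response if it clears the threshold."""
        best_score, best_response = 0.0, None
        for stored, response in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached summary")
hit = cache.get([0.99, 0.05])   # nearly the same direction as the stored vector
miss = cache.get([0.0, 1.0])    # orthogonal to the stored vector
```

The threshold is the main tuning knob: set it too low and unrelated inputs get stale answers, set it too high and the cache rarely hits.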

Built-in evaluators

A function can attach evaluators directly in its config, in addition to the standalone evaluation tooling covered in Evaluations.