LocalAI/.agents/api-endpoints-and-auth.md
Richard Palethorpe 0245b33eab feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801)
2026-05-13 21:57:27 +02:00


API Endpoints and Authentication

This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.

Before you ship a new endpoint or capability surface, re-read the checklist at the bottom of this file. LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.

Architecture overview

Authentication and authorization flow through three layers:

  1. Global auth middleware (auth.Middleware in core/http/auth/middleware.go): applied to every request in core/http/app.go. Handles session cookies, Bearer tokens, API keys, and legacy API keys. Populates auth_user and auth_role in the Echo context.
  2. Feature middleware (auth.RequireFeature): per-feature access control applied to route groups or individual routes. Checks whether the authenticated user has the specific feature enabled.
  3. Admin middleware (auth.RequireAdmin): restricts endpoints to admin users only.

When auth is disabled (no auth DB, no legacy API keys), all middleware becomes pass-through (auth.NoopMiddleware).
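The layering above can be sketched without Echo. This is a minimal illustration using only net/http, not LocalAI's actual middleware: the user type, the lookup resolver, the gate helper and the token values are all hypothetical, and the real auth.Middleware resolves cookies, Bearer tokens and API keys rather than a hard-coded header.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// user is a hypothetical stand-in for the authenticated user record.
type user struct {
	Role     string
	Features map[string]bool
}

type middleware func(http.Handler) http.Handler

// lookup is a hypothetical credential resolver keyed on a Bearer token.
func lookup(r *http.Request) *user {
	switch r.Header.Get("Authorization") {
	case "Bearer admin":
		return &user{Role: "admin", Features: map[string]bool{"my_feature": true}}
	case "Bearer alice":
		return &user{Role: "user", Features: map[string]bool{}}
	}
	return nil
}

// gate builds a middleware that rejects with code unless allow(user) is true.
func gate(code int, allow func(*user) bool) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if !allow(lookup(r)) {
				http.Error(w, http.StatusText(code), code)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

var (
	// Layer 1: any valid credential.
	requireAuth = gate(http.StatusUnauthorized, func(u *user) bool { return u != nil })
	// Layer 3: admins only.
	requireAdmin = gate(http.StatusForbidden, func(u *user) bool { return u != nil && u.Role == "admin" })
)

// requireFeature is layer 2: the user must have the named feature enabled.
func requireFeature(name string) middleware {
	return gate(http.StatusForbidden, func(u *user) bool { return u != nil && u.Features[name] })
}

// noop models the pass-through used when auth is disabled.
func noop(next http.Handler) http.Handler { return next }

// status sends one request through the chain (mws[0] runs first) and returns the status code.
func status(token string, mws ...middleware) int {
	var h http.Handler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	req := httptest.NewRequest(http.MethodGet, "/v1/my-endpoint", nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(status("", requireAuth, requireFeature("my_feature")))
	fmt.Println(status("alice", requireAuth, requireFeature("my_feature")))
	fmt.Println(status("admin", requireAuth, requireFeature("my_feature"), requireAdmin))
	fmt.Println(status("", noop, noop))
}
```

The last call shows the auth-disabled case: with every layer swapped for a pass-through, an anonymous request reaches the handler.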

Adding a new API endpoint

Step 1: Create the handler

Write the endpoint handler in the appropriate package under core/http/endpoints/. Follow existing patterns:

// core/http/endpoints/localai/my_feature.go
func MyFeatureEndpoint(app *application.Application) echo.HandlerFunc {
    return func(c echo.Context) error {
        // Use auth.GetUser(c) to get the authenticated user (may be nil if auth is disabled)
        user := auth.GetUser(c)

        // Your logic here
        return c.JSON(http.StatusOK, result)
    }
}

Step 2: Register routes

Add routes in the appropriate file under core/http/routes/. The file you use depends on the endpoint category:

File               Category
routes/openai.go   OpenAI-compatible API endpoints (/v1/...)
routes/localai.go  LocalAI-specific endpoints (/api/..., /models/..., /backends/...)
routes/agents.go   Agent pool endpoints (/api/agents/...)
routes/auth.go     Auth endpoints (/api/auth/...)
routes/ui_api.go   UI backend API endpoints

Step 3: Apply the right middleware

Choose the appropriate protection level:

No auth required (public)

Exempt paths bypass auth entirely. Add to isExemptPath() in middleware.go or use the /api/auth/ prefix (always exempt). Use sparingly — most endpoints should require auth.

Standard auth (any authenticated user)

The global middleware already handles this. API paths (/api/, /v1/, etc.) automatically require authentication when auth is enabled. You don't need to add any extra middleware.

router.GET("/v1/my-endpoint", myHandler)  // auth enforced by global middleware

Admin only

Pass adminMiddleware to the route. This is set up in app.go and passed to Register*Routes functions:

// In the Register function signature, accept the middleware:
func RegisterMyRoutes(router *echo.Echo, app *application.Application, adminMiddleware echo.MiddlewareFunc) {
    router.POST("/models/apply", myHandler, adminMiddleware)
}

Feature-gated

For endpoints that should be toggleable per-user, use feature middleware. There are two approaches:

Approach A: Route-level middleware (preferred for groups of related endpoints)

// In app.go, create the feature middleware:
myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)

// Pass it to the route registration function:
routes.RegisterMyRoutes(e, app, myFeatureMw)

// In the routes file, apply to a group:
g := e.Group("/api/my-feature", myFeatureMw)
g.GET("", listHandler)
g.POST("", createHandler)

Approach B: RouteFeatureRegistry (preferred for individual OpenAI-compatible endpoints)

Add an entry to RouteFeatureRegistry in core/http/auth/features.go. The RequireRouteFeature global middleware will automatically enforce it:

var RouteFeatureRegistry = []RouteFeature{
    // ... existing entries ...
    {"POST", "/v1/my-endpoint", FeatureMyFeature},
}
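To make the registry idea concrete, here is a rough sketch of how a method-plus-path lookup could resolve the gating feature. It is an assumption about the mechanism, not the real RequireRouteFeature code: the field names, the exact-match comparison, the featureFor helper and the "chat" feature name are hypothetical.

```go
package main

import "fmt"

// routeFeature mirrors the three-value rows shown above; field names are guesses.
type routeFeature struct {
	Method  string
	Pattern string
	Feature string
}

// registry is a hypothetical stand-in for RouteFeatureRegistry.
var registry = []routeFeature{
	{"POST", "/v1/chat/completions", "chat"},
	{"POST", "/v1/my-endpoint", "my_feature"},
}

// featureFor returns the feature gating a request, or "" when no row matches.
// Exact string matching is an assumption made for this sketch.
func featureFor(method, path string) string {
	for _, rf := range registry {
		if rf.Method == method && rf.Pattern == path {
			return rf.Feature
		}
	}
	return ""
}

func main() {
	fmt.Println(featureFor("POST", "/v1/my-endpoint")) // gated by my_feature
	fmt.Println(featureFor("GET", "/v1/my-endpoint") == "") // other methods need their own row
}
```

The second lookup illustrates why the checklist asks for one row per route and method: a row for POST says nothing about GET.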

Adding a new feature

When you need a new toggleable feature (not just a new endpoint under an existing feature):

1. Define the feature constant

Add to core/http/auth/permissions.go:

const (
    // Declare the constant once, in the group that matches its default:
    //   - Agent features (default OFF for new users), or
    //   - API features (default ON for new users)
    FeatureMyFeature = "my_feature"
)

Then add it to the appropriate slice:

// Default OFF — user must be explicitly granted access:
var AgentFeatures = []string{..., FeatureMyFeature}

// Default ON — user has access unless explicitly revoked:
var APIFeatures = []string{..., FeatureMyFeature}
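The default-ON versus default-OFF distinction can be sketched as follows. This is only an illustration of the semantics described in the comments above, under the assumption that explicit per-user settings override the slice-based default; the overrides map, the hasFeature helper and the feature names are hypothetical and do not reflect how permissions are actually stored.

```go
package main

import "fmt"

// Hypothetical stand-ins for the AgentFeatures and APIFeatures slices.
var (
	agentFeatures = []string{"my_agent_feature"} // default OFF
	apiFeatures   = []string{"my_api_feature"}   // default ON
)

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// hasFeature applies an explicit per-user setting when one exists and
// otherwise falls back to the default implied by the slice membership.
func hasFeature(overrides map[string]bool, feature string) bool {
	if v, ok := overrides[feature]; ok {
		return v
	}
	return contains(apiFeatures, feature)
}

func main() {
	newUser := map[string]bool{}
	fmt.Println(hasFeature(newUser, "my_api_feature"))   // on by default
	fmt.Println(hasFeature(newUser, "my_agent_feature")) // off until granted

	granted := map[string]bool{"my_agent_feature": true, "my_api_feature": false}
	fmt.Println(hasFeature(granted, "my_agent_feature")) // explicitly granted
	fmt.Println(hasFeature(granted, "my_api_feature"))   // explicitly revoked
}
```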

2. Add feature metadata

In core/http/auth/features.go, add to the appropriate FeatureMetas function so the admin UI can display it:

func AgentFeatureMetas() []FeatureMeta {
    return []FeatureMeta{
        // ... existing ...
        {FeatureMyFeature, "My Feature", false},  // false = default OFF
    }
}

3. Wire up the middleware

In core/http/app.go:

myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)

Then pass it to the route registration function.

4. Register route-feature mappings (if applicable)

If your feature gates standard API endpoints (like /v1/...), add entries to RouteFeatureRegistry in features.go instead of using per-route middleware.

Accessing the authenticated user in handlers

import "github.com/mudler/LocalAI/core/http/auth"

func MyHandler(c echo.Context) error {
    // Get the user (nil when auth is disabled or the request is unauthenticated)
    user := auth.GetUser(c)
    if user == nil {
        // Handle the unauthenticated case here, or let middleware handle it.
        // Code below must not dereference user without a nil check.
    }

    // Check role (nil-safe, since user may be nil when auth is disabled)
    if user != nil && user.Role == auth.RoleAdmin {
        // admin-specific logic
    }

    // Check feature access programmatically (when you need conditional behavior, not full blocking).
    // db and modelName are assumed to be in scope in the real handler.
    if auth.HasFeatureAccess(db, user, auth.FeatureMyFeature) {
        // feature-specific logic
    }

    // Check model access
    if !auth.IsModelAllowed(db, user, modelName) {
        return c.JSON(http.StatusForbidden, ...)
    }

    return c.NoContent(http.StatusOK)
}

Middleware composition patterns

Middleware can be composed at different levels. Here are the patterns used in the codebase:

Group-level middleware (agents pattern)

// All routes in the group share the middleware
g := e.Group("/api/agents", poolReadyMw, agentsMw)
g.GET("", listHandler)
g.POST("", createHandler)

Per-route middleware (localai pattern)

// Individual routes get middleware as extra arguments
router.POST("/models/apply", applyHandler, adminMiddleware)
router.GET("/metrics", metricsHandler, adminMiddleware)

Middleware slice (openai pattern)

// Build a middleware chain for a handler
chatMiddleware := []echo.MiddlewareFunc{
    usageMiddleware,
    traceMiddleware,
    modelFilterMiddleware,
}
app.POST("/v1/chat/completions", chatHandler, chatMiddleware...)

Error response format

Always use schema.ErrorResponse for auth/permission errors to stay consistent with the OpenAI-compatible API:

return c.JSON(http.StatusForbidden, schema.ErrorResponse{
    Error: &schema.APIError{
        Message: "feature not enabled for your account",
        Code:    http.StatusForbidden,
        Type:    "authorization_error",
    },
})

Use these HTTP status codes:

  • 401 Unauthorized — no valid credentials provided
  • 403 Forbidden — authenticated but lacking permission
  • 429 Too Many Requests — rate limited (auth endpoints)
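As a rough sketch of how these codes and the error body fit together, the following uses local stand-ins for the schema types. The struct field names follow the snippet above, but the JSON tags, the statusFor helper and the order in which the conditions are checked are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// apiError and errorResponse are hypothetical stand-ins for schema.APIError
// and schema.ErrorResponse; the JSON tags are guesses.
type apiError struct {
	Message string `json:"message"`
	Code    int    `json:"code"`
	Type    string `json:"type"`
}

type errorResponse struct {
	Error *apiError `json:"error"`
}

// statusFor picks the status code for an auth outcome.
// Checking the rate limit first is an assumption of this sketch.
func statusFor(authenticated, permitted, rateLimited bool) int {
	switch {
	case rateLimited:
		return http.StatusTooManyRequests
	case !authenticated:
		return http.StatusUnauthorized
	case !permitted:
		return http.StatusForbidden
	}
	return http.StatusOK
}

// forbiddenBody renders the 403 payload from the example above as JSON.
func forbiddenBody(msg string) string {
	b, err := json.Marshal(errorResponse{Error: &apiError{
		Message: msg,
		Code:    http.StatusForbidden,
		Type:    "authorization_error",
	}})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(statusFor(false, false, false)) // no credentials
	fmt.Println(statusFor(true, false, false))  // authenticated, not permitted
	fmt.Println(forbiddenBody("feature not enabled for your account"))
}
```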

Usage tracking

If your endpoint should be tracked for usage (token counts, request counts), add the usageMiddleware to its middleware chain. See core/http/middleware/usage.go and how it's applied in routes/openai.go.

Advertising surfaces — where to register a new capability

Beyond routing and auth, LocalAI publishes its capability surface in four independent places. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.

1. Swagger @Tags annotation (mandatory)

Every handler needs a swagger block so the endpoint appears in /swagger/index.html and in the /api/instructions output. The @Tags value is what groups the endpoint into a capability area:

// MyEndpoint does X.
// @Summary Do X.
// @Tags my-capability
// @Param request body schema.MyRequest true "payload"
// @Success 200 {object} schema.MyResponse "Response"
// @Router /v1/my-endpoint [post]
func MyEndpoint(...) echo.HandlerFunc { ... }

Use an existing tag when the endpoint extends an existing area (e.g. audio, images, face-recognition). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.

After adding endpoints, regenerate the embedded spec so the runtime serves it:

make protogen-go         # ensures gRPC codegen is fresh first
make swagger             # regenerates swagger/swagger.json

2. /api/instructions registry (for new capability areas)

core/http/endpoints/localai/api_instructions.go defines instructionDefs — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").

When to update: only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.

Add an entry to instructionDefs:

{
    Name:        "my-capability",             // URL segment at /api/instructions/my-capability
    Description: "Short sentence describing the capability",
    Tags:        []string{"my-capability"},   // must match swagger @Tags
    Intro:       "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
},

Also bump the expected-length count in api_instructions_test.go and add the name to the ContainElements assertion.

3. capabilities.js symbol (for new model-config FLAG_* flags)

If your feature needs a new FLAG_* usecase flag in core/config/model_config.go (so users can filter gallery models by it, and so /v1/models surfaces it), you need to update all of:

  • Usecase<Name> string constant in core/config/backend_capabilities.go
  • UsecaseInfoMap entry mapping the string to its flag + gRPC method
  • FLAG_<NAME> bitmask in core/config/model_config.go
  • GetAllModelConfigUsecases() map entry (otherwise the YAML loader silently ignores the string)
  • ModalityGroups membership if the flag should affect IsMultimodal() (e.g. realtime_audio is in both speech-input and audio-output groups so a lone flag still reads as multimodal)
  • GuessUsecases() branch listing the backends that own this capability
  • usecaseFilters in core/http/routes/ui_api.go (drives the gallery filter dropdown)
  • Models.jsx FILTERS array + matching filters.<camelCase> i18n key in core/http/react-ui/public/locales/en/models.json
  • core/http/react-ui/src/utils/capabilities.js:
export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'

React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
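The "silently ignores the string" failure mode mentioned in the list above is easy to reproduce in miniature. The sketch below is not the real loader: the usecase type, the flag names, the string keys and the parse helper are hypothetical, and it only assumes that usecase strings are resolved through a map and OR-ed into a bitmask.

```go
package main

import "fmt"

// usecase is a hypothetical bitmask type standing in for the FLAG_* constants.
type usecase uint32

const (
	flagChat usecase = 1 << iota
	flagTTS
	flagMyCapability
)

// parse ORs together the flags for the given names. A name missing from the
// table resolves to the zero value, so it is dropped without any error.
func parse(names []string, table map[string]usecase) usecase {
	var mask usecase
	for _, n := range names {
		mask |= table[n]
	}
	return mask
}

func main() {
	names := []string{"chat", "my_capability"}

	// The new flag exists as a constant but was never added to the lookup table.
	incomplete := map[string]usecase{"chat": flagChat, "tts": flagTTS}
	fmt.Println(parse(names, incomplete)&flagMyCapability != 0) // flag lost

	// With the map entry registered, the same config yields the flag.
	complete := map[string]usecase{"chat": flagChat, "tts": flagTTS, "my_capability": flagMyCapability}
	fmt.Println(parse(names, complete)&flagMyCapability != 0) // flag present
}
```

Nothing fails loudly in the first case, which is why the list above treats each registration point as mandatory rather than optional.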

4. docs/content/ (user-facing documentation)

A new capability deserves its own page under docs/content/features/, plus cross-links from related features and an entry in docs/content/whats-new.md. See the pattern used by face-recognition.md / object-detection.md.

Path protection rules

The global auth middleware classifies paths as API paths or non-API paths:

  • API paths (always require auth when auth is enabled): /api/, /v1/, /models/, /backends/, /backend/, /tts, /vad, /video, /stores/, /system, /ws/, /metrics
  • Exempt paths (never require auth): /api/auth/ prefix, anything in appConfig.PathWithoutAuth
  • Non-API paths (UI, static assets): pass through without auth — the React UI handles login redirects client-side

If you add endpoints under a new top-level path prefix, add it to isAPIPath() in middleware.go to ensure it requires authentication.
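The classification rules above can be approximated in a few lines. This sketch takes the prefix list from the bullet points, but the classify function, its return values and the exact-match handling of the extra exempt paths are assumptions, not the actual isAPIPath() or isExemptPath() implementations.

```go
package main

import (
	"fmt"
	"strings"
)

// apiPrefixes is copied from the list of API paths above.
var apiPrefixes = []string{
	"/api/", "/v1/", "/models/", "/backends/", "/backend/", "/tts",
	"/vad", "/video", "/stores/", "/system", "/ws/", "/metrics",
}

// classify reports how a path would be treated: "exempt", "api" or "non-api".
// extraExempt plays the role of appConfig.PathWithoutAuth; matching it exactly
// (rather than by prefix) is an assumption of this sketch.
func classify(path string, extraExempt []string) string {
	if strings.HasPrefix(path, "/api/auth/") {
		return "exempt"
	}
	for _, p := range extraExempt {
		if path == p {
			return "exempt"
		}
	}
	for _, p := range apiPrefixes {
		if strings.HasPrefix(path, p) {
			return "api"
		}
	}
	return "non-api"
}

func main() {
	fmt.Println(classify("/api/auth/login", nil))          // exempt
	fmt.Println(classify("/v1/chat/completions", nil))     // api
	fmt.Println(classify("/my-new-prefix/do-thing", nil))  // non-api until the prefix is registered
}
```

The last call is the case the paragraph above warns about: an endpoint under an unregistered top-level prefix falls into the non-API bucket and is served without auth.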

Checklist

When adding a new endpoint:

Routing & auth

  • Handler in core/http/endpoints/
  • Route registered in appropriate core/http/routes/ file
  • Auth level chosen: public / standard / admin / feature-gated
  • Entry added to RouteFeatureRegistry in core/http/auth/features.go (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
  • If new feature: constant in permissions.go, added to the right slice (APIFeatures default-ON / AgentFeatures default-OFF), metadata in features.go *FeatureMetas()
  • If feature uses group middleware: wired in core/http/app.go and passed to the route registration function
  • If new path prefix: added to isAPIPath() in middleware.go
  • If token-counting: usageMiddleware added to middleware chain

Advertising surfaces (easy to miss — see the Advertising surfaces section)

  • Swagger block on the handler: @Summary, @Tags, @Param, @Success, @Router
  • If new capability area (new swagger tag): entry in instructionDefs in core/http/endpoints/localai/api_instructions.go + test count bumped in api_instructions_test.go
  • If new FLAG_* usecase flag: matching CAP_* symbol exported from core/http/react-ui/src/utils/capabilities.js
  • docs/content/features/<feature>.md created; cross-links from related feature pages; entry in docs/content/whats-new.md

Quality

  • Error responses use schema.ErrorResponse format (or echo.NewHTTPError with a mapped gRPC status — see the mapBackendError helper in core/http/endpoints/localai/images.go)
  • Tests cover both authenticated and unauthenticated access
  • Swagger regenerated (make swagger) if you changed any @Router/@Tags/@Param annotation

Companion: MCP admin tool surface

Required for admin endpoints. Every new admin endpoint MUST be considered for the MCP admin tool surface — the REST API and the MCP tool catalog can drift silently otherwise, and both the LocalAI Assistant chat modality and the standalone local-ai mcp-server rely on pkg/mcp/localaitools/ to mirror REST.

Two outcomes are acceptable; one is not:

  • Tool added. The new endpoint is something an admin would manage conversationally (install, list, edit, toggle, upgrade). Follow the full checklist in .agents/localai-assistant-mcp.md: add a LocalAIClient interface method, implement it in both inproc and httpapi, register the tool with a Tool* constant, update the skill prompts, and add the route to toolToHTTPRoute in pkg/mcp/localaitools/coverage_test.go.
  • Tool deliberately skipped. The endpoint is internal/diagnostic and adding a chat path would be misleading. Document the decision in the PR description; no code action.
  • Forgot. This breaks the contract. The TestToolHTTPRouteMappingComplete test in pkg/mcp/localaitools is a partial guard (it checks every Tool* has a route mapping), but it does NOT detect new REST endpoints without a tool — that's still a process check on the PR author.
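The asymmetry of that guard can be shown with two small set checks. This is a hypothetical sketch, not the code in coverage_test.go: the function names, the map[string]string shape assumed for the tool-to-route table, and the tool and route names are all made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// unmappedTools returns tools with no route mapping. This is the direction
// the existing test is described as covering.
func unmappedTools(tools []string, toolToRoute map[string]string) []string {
	var missing []string
	for _, t := range tools {
		if _, ok := toolToRoute[t]; !ok {
			missing = append(missing, t)
		}
	}
	sort.Strings(missing)
	return missing
}

// routesWithoutTool returns admin routes that no tool maps to. Checking this
// direction needs an authoritative list of admin routes, which is why it
// remains a manual step for the PR author.
func routesWithoutTool(routes []string, toolToRoute map[string]string) []string {
	covered := map[string]bool{}
	for _, r := range toolToRoute {
		covered[r] = true
	}
	var missing []string
	for _, r := range routes {
		if !covered[r] {
			missing = append(missing, r)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	tools := []string{"install_model"}
	mapping := map[string]string{"install_model": "POST /models/apply"}
	routes := []string{"POST /models/apply", "POST /api/my-admin-thing"}

	fmt.Println(unmappedTools(tools, mapping))      // every tool is mapped
	fmt.Println(routesWithoutTool(routes, mapping)) // the new route has no tool
}
```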

Add to the bottom of the checklist above:

  • If admin: decided whether MCP coverage is needed; if yes, tool registered + map updated; if no, skip-reason in PR description.