LocalAI/core/http/routes/ui_api.go
Richard Palethorpe 0245b33eab feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801)
* feat(liquid-audio): add LFM2.5-Audio any-to-any backend + realtime_audio usecase

Wires LiquidAI's LFM2.5-Audio-1.5B as a self-contained Realtime API model:
single engine handles VAD, transcription, LLM, and TTS in one bidirectional
stream — drop-in alternative to a VAD+STT+LLM+TTS pipeline.

Backend
- backend/python/liquid-audio/ — new Python gRPC backend wrapping the
  `liquid-audio` package. Modes: chat / asr / tts / s2s, voice presets,
  Load/Predict/PredictStream/AudioTranscription/TTS/VAD/AudioToAudioStream/
  Free and StartFineTune/FineTuneProgress/StopFineTune. Runtime monkey-patch
  on `liquid_audio.utils.snapshot_download` so absolute local paths from
  LocalAI's gallery resolve without a HF round-trip. soundfile in place of
  torchaudio.load/save (torchcodec drags NVIDIA NPP we don't bundle).
- backend/backend.proto + pkg/grpc/{backend,client,server,base,embed,
  interface}.go — new AudioToAudioStream RPC mirroring AudioTransformStream
  (config/frame/control oneof in; typed event+pcm+meta out).
- core/services/nodes/{health_mock,inflight}_test.go — add stubs for the
  new RPC to the test fakes.

Config + capabilities
- core/config/backend_capabilities.go — UsecaseRealtimeAudio,
  MethodAudioToAudioStream, UsecaseInfoMap entry, liquid-audio
  BackendCapability row.
- core/config/model_config.go — FLAG_REALTIME_AUDIO bitmask, ModalityGroups
  membership in both speech-input and audio-output groups so a lone flag
  still reads as multimodal, GetAllModelConfigUsecases entry, GuessUsecases
  branch.

Realtime endpoint
- core/http/endpoints/openai/realtime.go — extract prepareRealtimeConfig()
  so the gate is unit-testable; accept realtime_audio models and self-fill
  empty pipeline slots with the model's own name (user-pinned slots win).
- core/http/endpoints/openai/realtime_gate_test.go — six specs covering nil
  cfg, empty pipeline, legacy pipeline, self-contained realtime_audio,
  user-pinned VAD slot, and partial legacy pipeline.
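
The self-fill rule for realtime_audio models can be sketched roughly as
below. This is a minimal illustration, not the code in realtime.go: the
Pipeline struct, its field names, and the selfFill helper are all
hypothetical, and only the rule itself (empty slots take the model's own
name, user-pinned slots win) comes from the description above.

```go
package main

import "fmt"

// Pipeline is a hypothetical stand-in for the four realtime pipeline slots.
type Pipeline struct {
	VAD, Transcription, LLM, TTS string
}

// selfFill returns a copy of p in which every empty slot is set to
// modelName. Slots the user already pinned are left untouched.
func selfFill(p Pipeline, modelName string) Pipeline {
	for _, slot := range []*string{&p.VAD, &p.Transcription, &p.LLM, &p.TTS} {
		if *slot == "" {
			*slot = modelName
		}
	}
	return p
}

func main() {
	// The VAD slot is pinned by the user; the other three are self-filled.
	p := selfFill(Pipeline{VAD: "silero-vad"}, "lfm2.5-audio")
	fmt.Printf("%+v\n", p)
}
```

Because p is passed by value, the caller's original configuration is not
mutated, which keeps the helper easy to test in isolation.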

UI + endpoints
- core/http/routes/ui.go — /api/pipeline-models accepts either a legacy
  VAD+STT+LLM+TTS pipeline or a realtime_audio model; surfaces a
  self_contained flag so the Talk page can collapse the four cards.
- core/http/routes/ui_api.go — realtime_audio in usecaseFilters.
- core/http/routes/ui_pipeline_models_test.go — covers both code paths.
- core/http/react-ui/src/pages/Talk.jsx — self-contained badge instead of
  the four-slot grid; rename Edit Pipeline → Edit Model Config; less
  pipeline-specific wording.
- core/http/react-ui/src/pages/Models.jsx + locales/en/models.json — new
  realtime_audio filter button + i18n.
- core/http/react-ui/src/utils/capabilities.js — CAP_REALTIME_AUDIO.
- core/http/react-ui/src/pages/FineTune.jsx — voice + validation-dataset
  fields, surfaced when backend === liquid-audio, plumbed via
  extra_options on submit/export/import.

Gallery + importer
- gallery/liquid-audio.yaml — config template with known_usecases:
  [realtime_audio, chat, tts, transcript, vad].
- gallery/index.yaml — four model entries (realtime/chat/asr/tts) keyed by
  mode option. Fixed pre-existing `transcribe` typo on the asr entry
  (loader silently dropped the unknown string → entry never surfaced as a
  transcript model).
- gallery/lfm.yaml — function block for the LFM2 Pythonic tool-call format
  `<|tool_call_start|>[name(k="v")]<|tool_call_end|>` matching
  common_chat_params_init_lfm2 in vendored llama.cpp.
- core/gallery/importers/{liquid-audio,liquid-audio_test}.go — detector
  matches LFM2-Audio HF repos (excludes -gguf mirrors); mode/voice
  preferences plumbed through to options.
- core/gallery/importers/importers.go — register LiquidAudioImporter
  before LlamaCPPImporter.
- pkg/functions/parse_lfm2_test.go — seven specs for the response/argument
  regex pair on the LFM2 pythonic format.
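
To make the LFM2 Pythonic tool-call format concrete, here is a small
parsing sketch built on a response/argument regex pair. The two
expressions and the parseToolCall helper are illustrative assumptions,
not the patterns shipped in gallery/lfm.yaml or pkg/functions, and they
only handle string-valued keyword arguments.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// callRe captures the function name and the raw argument list of one call.
	callRe = regexp.MustCompile(`<\|tool_call_start\|>\[(\w+)\((.*?)\)\]<\|tool_call_end\|>`)
	// argRe captures key="value" pairs inside the argument list.
	argRe = regexp.MustCompile(`(\w+)\s*=\s*"([^"]*)"`)
)

// parseToolCall extracts the first tool call found in s.
// ok is false when s contains no call in the expected format.
func parseToolCall(s string) (name string, args map[string]string, ok bool) {
	m := callRe.FindStringSubmatch(s)
	if m == nil {
		return "", nil, false
	}
	args = make(map[string]string)
	for _, kv := range argRe.FindAllStringSubmatch(m[2], -1) {
		args[kv[1]] = kv[2]
	}
	return m[1], args, true
}

func main() {
	out := `<|tool_call_start|>[get_weather(city="Paris", unit="c")]<|tool_call_end|>`
	name, args, ok := parseToolCall(out)
	fmt.Println(name, args, ok)
}
```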

Build matrix
- .github/backend-matrix.yml — seven liquid-audio targets (cuda12, cuda13,
  l4t-cuda-13, hipblas, intel, cpu amd64, cpu arm64). Jetpack r36 cuda-12
  is skipped (Ubuntu 22.04 / Python 3.10 incompatible with liquid-audio's
  3.12 floor).
- backend/index.yaml — anchor + 13 image entries.
- Makefile — .NOTPARALLEL, prepare-test-extra, test-extra,
  docker-build-liquid-audio.

Docs
- .agents/plans/liquid-audio-integration.md — phased plan; PR-D (real
  any-to-any wiring via AudioToAudioStream), PR-E (mid-audio tool-call
  detector), PR-G (GGUF entries once upstream llama.cpp PR #18641 lands)
  remain.
- .agents/api-endpoints-and-auth.md — expand the capability-surface
  checklist with every place a new FLAG_* needs to be registered.

Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): function calling + history cap for any-to-any models

Three pieces, all on the realtime_audio path that just landed:

1. liquid-audio backend (backend/python/liquid-audio/backend.py):
   - _build_chat_state grows a `tools_prelude` arg.
   - new _render_tools_prelude parses request.Tools (the OpenAI Chat
     Completions function array realtime.go already serialises) and
     emits an LFM2 `<|tool_list_start|>…<|tool_list_end|>` system turn
     ahead of the user history. Mirrors gallery/lfm.yaml's `function:`
     template so the model sees the same prompt shape whether served
     via llama-cpp or here. Without this the backend silently dropped
     tools — function calling was wired end-to-end on the Go side but
     the model never saw a tool list.

2. Realtime history cap (core/http/endpoints/openai/realtime.go):
   - Session grows MaxHistoryItems int; default picked by new
     defaultMaxHistoryItems(cfg) — 6 for realtime_audio models (LFM2.5
     1.5B degrades quickly past a handful of turns), 0/unlimited for
     legacy pipelines composing larger LLMs.
   - triggerResponse runs conv.Items through trimRealtimeItems before
     building conversationHistory. Helper walks the cut left if it
     would orphan a function_call_output, so tool result + call pairs
     stay intact.
   - realtime_gate_test.go: specs for defaultMaxHistoryItems and
     trimRealtimeItems (zero cap, under cap, over cap, tool-call pair
     preservation).

3. Talk page (core/http/react-ui/src/pages/Talk.jsx):
   - Reuses the chat page's MCP plumbing — useMCPClient hook,
     ClientMCPDropdown component, same auto-connect/disconnect effect
     pattern. No bespoke tool registry, no new REST endpoints; tools
     come from whichever MCP servers the user toggles on, exactly as
     on the chat page.
   - sendSessionUpdate now passes session.tools=getToolsForLLM(); the
     update re-fires when the active server set changes mid-session.
   - New response.function_call_arguments.done handler executes via
     the hook's executeTool (which round-trips through the MCP client
     SDK), then replies with conversation.item.create
     {type:function_call_output} + response.create so the model
     completes its turn with the tool output. Mirrors chat's
     client-side agentic loop, translated to the realtime wire shape.
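
The history cap from item 2 can be sketched as follows. The Item type
and trimItems function are simplified, hypothetical stand-ins for the
realtime conversation items and trimRealtimeItems; the sketch also
assumes each function_call_output sits directly after its function_call.

```go
package main

import "fmt"

// Item is a hypothetical, simplified conversation item.
type Item struct {
	Type string // "message", "function_call" or "function_call_output"
	Text string
}

// trimItems keeps at most max trailing items; max <= 0 means unlimited.
// If the first kept item would be a function_call_output, the cut moves
// left so the call that produced it is kept as well.
func trimItems(items []Item, max int) []Item {
	if max <= 0 || len(items) <= max {
		return items
	}
	cut := len(items) - max
	for cut > 0 && items[cut].Type == "function_call_output" {
		cut--
	}
	return items[cut:]
}

func main() {
	history := []Item{
		{"message", "what is loaded?"},
		{"function_call", "list_models()"},
		{"function_call_output", `["lfm2.5-audio"]`},
		{"message", "one model is loaded"},
	}
	// A cap of 2 would start at the tool output, so the cut walks left
	// and three items are kept instead.
	for _, it := range trimItems(history, 2) {
		fmt.Println(it.Type)
	}
}
```

Walking the cut left can return slightly more than the cap. The sketch
accepts that rather than handing the model a tool result with no
matching call.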

UI changes require a LocalAI image rebuild (Dockerfile:308-313 bakes
react-ui/dist into the runtime image). Backend.py changes can be
swapped live in /backends/<id>/backend.py + /backend/shutdown.

Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): LocalAI Assistant ("Manage Mode") for the Talk page

Mirrors the chat-page metadata.localai_assistant flow so users can ask the
realtime model what's loaded / installed / configured. Tools are run
server-side via the same in-process MCP holder that powers the chat
modality — no transport switch, no proxy, no new wire protocol.

Wire:
- core/http/endpoints/openai/realtime.go:
  - RealtimeSessionOptions{LocalAIAssistant,IsAdmin}; isCurrentUserAdmin
    helper mirrors chat.go's requireAssistantAccess (no-op when auth
    disabled, else requires auth.RoleAdmin).
  - Session grows AssistantExecutor mcpTools.ToolExecutor.
  - runRealtimeSession, when opts.LocalAIAssistant is set: gate on admin,
    fail closed if DisableLocalAIAssistant or the holder has no tools,
    DiscoverTools and inject into session.Tools, prepend
    holder.SystemPrompt() to instructions.
  - Tool-call dispatch loop: when AssistantExecutor.IsTool(name), run
    ExecuteTool inproc, append a FunctionCallOutput to conv.Items, skip
    the function_call_arguments client emit (the client can't execute
    these — it doesn't know about them). After the loop, if any
    assistant tool ran, trigger another response so the model speaks the
    result. Mirrors chat's agentic loop, driven server-side rather than
    via client round-trip.

- core/http/endpoints/openai/realtime_webrtc.go: RealtimeCallRequest
  gains `localai_assistant` (JSON omitempty). Handshake calls
  isCurrentUserAdmin and builds RealtimeSessionOptions.

- core/http/react-ui/src/pages/Talk.jsx: admin-only "Manage Mode"
  checkbox under the Tools dropdown; passes localai_assistant: true to
  realtimeApi.call's body, captured in the connect callback's deps.

Mirroring chat's pattern means the in-process MCP tools surface "just
works" for the Talk page without exposing a Streamable-HTTP MCP endpoint
(which was the alternative). Clients with their own MCP servers can
still use the existing ClientMCPDropdown path in parallel; the realtime
handler distinguishes them by AssistantExecutor.IsTool() at dispatch
time.

Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): render Manage Mode tool calls in the Talk transcript

Previously the realtime endpoint only emitted response.output_item.added
for the FunctionCall item, and Talk.jsx's switch ignored the event — so
server-side tool runs were invisible in the UI. The model would speak
the result but the user had no way to see what tool was actually
called.

realtime.go: after executing an assistant tool inproc, emit a second
output_item.added/.done pair for the FunctionCallOutput item. Mirrors
the way the chat page displays tool_call + tool_result blocks.

Talk.jsx: handle both response.output_item.added and .done. Render
FunctionCall (with arguments) and FunctionCallOutput (pretty-printed
JSON when possible) as two transcript entries — `tool_call` with the
wrench icon, `tool_result` with the clipboard icon, both in mono-space
secondary-colour. Resets streamingRef after the result so the next
assistant text delta starts a fresh transcript entry instead of
appending to the previous turn.

Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* refactor(realtime): bound the Manage Mode tool-loop + preserve assistant tools

Fallout from a review pass on the Manage Mode patches:

- Bound the server-side agentic loop. triggerResponse used to recurse on
  executedAssistantTool with no cap — a model that kept calling tools
  would blow the goroutine stack. New maxAssistantToolTurns = 10 (mirrors
  useChat.js's maxToolTurns). Public triggerResponse is now a thin shim
  over triggerResponseAtTurn(toolTurn int); recursion increments the
  counter and stops at the cap with an xlog.Warn.

- Preserve Manage Mode tools across client session.update. The handler
  used to blindly overwrite session.Tools, so toggling a client MCP
  server mid-session silently wiped the in-process admin tools. Session
  now caches the original AssistantTools slice at session creation and
  the session.update handler merges them back in (client names win on
  collision — the client is explicit).

- strconv.ParseBool for the localai_assistant query param instead of
  hand-rolled "1" || "true". Mirrors LocalAIAssistantFromMetadata.

- Talk.jsx: render both tool_call and tool_result on
  response.output_item.done instead of splitting them across .added and
  .done. The server's event pairing (added → done) stays correct; the
  UI just doesn't need to inspect both phases of the same item. One
  switch case instead of two, no behavioural change.
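
Two of the fixes above lend themselves to a short sketch: the bounded
tool loop and the tool merge on session.update. Everything here is a
simplified assumption. The Tool type, mergeTools, and respondAtTurn are
hypothetical names, and whether the real cap counts turns in exactly
this way is not stated in the notes.

```go
package main

import "fmt"

// maxToolTurns mirrors the cap of 10 mentioned above.
const maxToolTurns = 10

// respondAtTurn runs step once per turn and recurses while step reports
// that a server-side tool ran, stopping once the cap is reached.
// It returns the turn counter at which the loop ended.
func respondAtTurn(turn int, step func() bool) int {
	if turn >= maxToolTurns {
		return turn // cap reached: stop instead of recursing forever
	}
	if step() {
		return respondAtTurn(turn+1, step)
	}
	return turn
}

// Tool is a hypothetical, simplified tool definition.
type Tool struct{ Name, Source string }

// mergeTools returns the client tools followed by any assistant tools
// whose names the client did not define (client names win on collision).
func mergeTools(client, assistant []Tool) []Tool {
	merged := make([]Tool, 0, len(client)+len(assistant))
	seen := make(map[string]bool, len(client))
	for _, t := range client {
		merged = append(merged, t)
		seen[t.Name] = true
	}
	for _, t := range assistant {
		if !seen[t.Name] {
			merged = append(merged, t)
		}
	}
	return merged
}

func main() {
	calls := 0
	// A model that always asks for another tool is stopped at the cap.
	respondAtTurn(0, func() bool { calls++; return true })
	fmt.Println("tool turns executed:", calls)

	client := []Tool{{"search", "client"}}
	assistant := []Tool{{"search", "assistant"}, {"list_models", "assistant"}}
	fmt.Println(mergeTools(client, assistant))
}
```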

Out of scope (noted for follow-ups): extract a shared assistant-tools
helper between chat.go and realtime.go (duplication is small enough
that two parallel implementations stay readable for now), and an i18n
key for the Manage Mode helper text (Talk.jsx doesn't use i18n
anywhere else yet).

Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* ci(test-extra): wire liquid-audio backend smoke test

The backend ships test.py + a `make test` target and is listed in
backend-matrix.yml, so scripts/changed-backends.js already writes a
`liquid-audio=true|false` output when files under backend/python/liquid-audio/
change. The workflow just wasn't reading it.

- Expose the `liquid-audio` output on the detect-changes job
- Add a tests-liquid-audio job that runs `make` + `make test` in
  backend/python/liquid-audio, gated on the per-backend detect flag

The smoke test covers Health() and LoadModel(mode:finetune); fine-tune mode
short-circuits before any HuggingFace download (backend.py:192), so the
job needs neither weights nor a GPU. The full-inference path remains
gated on LIQUID_AUDIO_MODEL_ID, which CI doesn't set.

The four new Go test files (core/gallery/importers/liquid-audio_test.go,
core/http/endpoints/openai/realtime_gate_test.go,
core/http/routes/ui_pipeline_models_test.go, pkg/functions/parse_lfm2_test.go)
are already picked up by the existing test.yml workflow via `make test` →
`ginkgo -r ./pkg/... ./core/...`; their packages all carry RunSpecs entries.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-13 21:57:27 +02:00

1586 lines
48 KiB
Go

package routes
import (
"cmp"
"context"
"fmt"
"math"
"net/http"
"net/url"
"os"
"slices"
"strconv"
"strings"
"time"
"github.com/google/uuid"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/http/auth"
"github.com/mudler/LocalAI/core/http/endpoints/localai"
"github.com/mudler/LocalAI/core/p2p"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/vram"
"github.com/mudler/LocalAI/pkg/xsysinfo"
"github.com/mudler/xlog"
)
const (
nameSortFieldName = "name"
repositorySortFieldName = "repository"
licenseSortFieldName = "license"
statusSortFieldName = "status"
ascSortOrder = "asc"
multimodalFilterKey = "multimodal"
)
// usecaseFilters maps UI filter keys to ModelConfigUsecase flags for
// capability-based gallery filtering.
var usecaseFilters = map[string]config.ModelConfigUsecase{
config.UsecaseChat: config.FLAG_CHAT,
config.UsecaseImage: config.FLAG_IMAGE,
config.UsecaseVideo: config.FLAG_VIDEO,
config.UsecaseVision: config.FLAG_VISION,
config.UsecaseTTS: config.FLAG_TTS,
config.UsecaseTranscript: config.FLAG_TRANSCRIPT,
config.UsecaseSoundGeneration: config.FLAG_SOUND_GENERATION,
config.UsecaseEmbeddings: config.FLAG_EMBEDDINGS,
config.UsecaseRerank: config.FLAG_RERANK,
config.UsecaseDetection: config.FLAG_DETECTION,
config.UsecaseVAD: config.FLAG_VAD,
config.UsecaseAudioTransform: config.FLAG_AUDIO_TRANSFORM,
config.UsecaseDiarization: config.FLAG_DIARIZATION,
config.UsecaseRealtimeAudio: config.FLAG_REALTIME_AUDIO,
}
// extractHFRepo tries to find a HuggingFace repo ID from model overrides or URLs.
func extractHFRepo(overrides map[string]any, urls []string) string {
if overrides != nil {
if params, ok := overrides["parameters"].(map[string]any); ok {
if modelRef, ok := params["model"].(string); ok {
if repoID, ok := vram.ExtractHFRepoID(modelRef); ok {
return repoID
}
}
}
}
for _, u := range urls {
if repoID, ok := vram.ExtractHFRepoID(u); ok {
return repoID
}
}
return ""
}
// buildEstimateInput creates a vram.ModelEstimateInput from gallery model metadata.
func buildEstimateInput(m *gallery.GalleryModel) vram.ModelEstimateInput {
var input vram.ModelEstimateInput
input.Size = m.Size
if hfRepoID := extractHFRepo(m.Overrides, m.URLs); hfRepoID != "" {
input.HFRepo = hfRepoID
}
for _, f := range m.AdditionalFiles {
if vram.IsWeightFile(f.URI) {
input.Files = append(input.Files, vram.FileInput{URI: f.URI, Size: 0})
}
}
return input
}
// parseContextSizes parses a comma-separated list of context sizes from a query param.
// Returns a default of [8192] if the param is empty or unparseable.
func parseContextSizes(raw string) []uint32 {
if raw == "" {
return []uint32{8192}
}
var sizes []uint32
for _, s := range strings.Split(raw, ",") {
s = strings.TrimSpace(s)
if v, err := strconv.ParseUint(s, 10, 32); err == nil && v > 0 {
sizes = append(sizes, uint32(v))
}
}
if len(sizes) == 0 {
return []uint32{8192}
}
return sizes
}
// metaParentOf returns the name of the auto-resolving (meta) backend that
// declares `name` as one of its hardware-specific variants in its
// CapabilitiesMap, or "" if there is no such parent. The install picker uses
// this to render hints like "CPU build of llama-cpp" without re-walking the
// whole gallery on the client side.
func metaParentOf(name string, backends gallery.GalleryElements[*gallery.GalleryBackend]) string {
for _, b := range backends {
if !b.IsMeta() {
continue
}
for _, concreteName := range b.CapabilitiesMap {
if concreteName == name {
return b.Name
}
}
}
return ""
}
// getDirectorySize returns the total size in bytes of the entries directly
// inside path. It does not recurse into subdirectories, and entries whose
// file info cannot be read are skipped.
func getDirectorySize(path string) (int64, error) {
var totalSize int64
entries, err := os.ReadDir(path)
if err != nil {
return 0, err
}
for _, entry := range entries {
info, err := entry.Info()
if err != nil {
continue
}
if !info.IsDir() {
totalSize += info.Size()
}
}
return totalSize, nil
}
// RegisterUIAPIRoutes registers JSON API routes for the web UI
func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, galleryService *galleryop.GalleryService, opcache *galleryop.OpCache, applicationInstance *application.Application, adminMiddleware echo.MiddlewareFunc) {
// Operations API - Get all current operations (models + backends)
app.GET("/api/operations", func(c echo.Context) error {
processingData, taskTypes := opcache.GetStatus()
operations := []map[string]any{}
for galleryID, jobID := range processingData {
taskType := "installation"
if tt, ok := taskTypes[galleryID]; ok {
taskType = tt
}
status := galleryService.GetStatus(jobID)
progress := 0
isDeletion := false
isQueued := false
isCancelled := false
isCancellable := false
message := ""
if status != nil {
// Skip successfully completed operations
if status.Processed && !status.Cancelled && status.Error == nil {
continue
}
// Skip cancelled operations that are processed (they're done, no need to show)
if status.Processed && status.Cancelled {
continue
}
progress = int(status.Progress)
isDeletion = status.Deletion
isCancelled = status.Cancelled
isCancellable = status.Cancellable
message = status.Message
if isDeletion {
taskType = "deletion"
}
if isCancelled {
taskType = "cancelled"
}
} else {
// Job is queued but hasn't started
isQueued = true
isCancellable = true
message = "Operation queued"
}
// Determine if it's a model or backend
// First check if it was explicitly marked as a backend operation
isBackend := opcache.IsBackendOp(galleryID)
// If not explicitly marked, check if it matches a known backend from the gallery
if !isBackend {
backends, _ := gallery.AvailableBackends(appConfig.BackendGalleries, appConfig.SystemState)
for _, b := range backends {
backendID := fmt.Sprintf("%s@%s", b.Gallery.Name, b.Name)
if backendID == galleryID || b.Name == galleryID {
isBackend = true
break
}
}
}
// Extract display name (remove repo prefix if exists)
displayName := galleryID
if strings.Contains(galleryID, "@") {
parts := strings.Split(galleryID, "@")
if len(parts) > 1 {
displayName = parts[1]
}
}
opData := map[string]any{
"id": galleryID,
"name": displayName,
"fullName": galleryID,
"jobID": jobID,
"progress": progress,
"taskType": taskType,
"isDeletion": isDeletion,
"isBackend": isBackend,
"isQueued": isQueued,
"isCancelled": isCancelled,
"cancellable": isCancellable,
"message": message,
}
if status != nil && status.Error != nil {
opData["error"] = status.Error.Error()
}
operations = append(operations, opData)
}
// Append active file staging operations (distributed mode only)
if d := applicationInstance.Distributed(); d != nil && d.Router != nil {
for modelID, status := range d.Router.StagingTracker().GetAll() {
operations = append(operations, map[string]any{
"id": "staging:" + modelID,
"name": modelID,
"fullName": modelID,
"jobID": "staging:" + modelID,
"progress": int(status.Progress),
"taskType": "staging",
"isDeletion": false,
"isBackend": false,
"isQueued": false,
"isCancelled": false,
"cancellable": false,
"message": status.Message,
"nodeName": status.NodeName,
})
}
}
// Sort operations by progress (ascending), then by ID for stable display order
slices.SortFunc(operations, func(a, b map[string]any) int {
progressA := a["progress"].(int)
progressB := b["progress"].(int)
// Primary sort by progress
if progressA != progressB {
return cmp.Compare(progressA, progressB)
}
// Secondary sort by ID for stability when progress is the same
return cmp.Compare(a["id"].(string), b["id"].(string))
})
return c.JSON(200, map[string]any{
"operations": operations,
})
}, adminMiddleware)
// Cancel operation endpoint (admin only)
app.POST("/api/operations/:jobID/cancel", func(c echo.Context) error {
jobID := c.Param("jobID")
xlog.Debug("API request to cancel operation", "jobID", jobID)
err := galleryService.CancelOperation(jobID)
if err != nil {
xlog.Error("Failed to cancel operation", "error", err, "jobID", jobID)
return c.JSON(http.StatusBadRequest, map[string]any{
"error": err.Error(),
})
}
// Clean up opcache for cancelled operation
opcache.DeleteUUID(jobID)
return c.JSON(200, map[string]any{
"success": true,
"message": "Operation cancelled",
})
}, adminMiddleware)
// Dismiss a failed operation (acknowledge the error and remove it from the list)
app.POST("/api/operations/:jobID/dismiss", func(c echo.Context) error {
jobID := c.Param("jobID")
xlog.Debug("API request to dismiss operation", "jobID", jobID)
// Remove the operation from the opcache so it no longer appears
opcache.DeleteUUID(jobID)
return c.JSON(200, map[string]any{
"success": true,
"message": "Operation dismissed",
})
}, adminMiddleware)
// Model Gallery APIs (admin only)
app.GET("/api/models", func(c echo.Context) error {
term := c.QueryParam("term")
tag := c.QueryParam("tag")
page := c.QueryParam("page")
if page == "" {
page = "1"
}
items := c.QueryParam("items")
if items == "" {
items = "9"
}
models, err := gallery.AvailableGalleryModelsCached(appConfig.Galleries, appConfig.SystemState)
if err != nil {
xlog.Error("could not list models from galleries", "error", err)
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
// Get all available tags
allTags := map[string]struct{}{}
tags := []string{}
for _, m := range models {
for _, t := range m.Tags {
allTags[t] = struct{}{}
}
}
for t := range allTags {
tags = append(tags, t)
}
slices.Sort(tags)
// Get all available backends (before filtering so dropdown always shows all)
allBackendsMap := map[string]struct{}{}
for _, m := range models {
if b := m.Backend; b != "" {
allBackendsMap[b] = struct{}{}
}
}
backendNames := make([]string, 0, len(allBackendsMap))
for b := range allBackendsMap {
backendNames = append(backendNames, b)
}
slices.Sort(backendNames)
// Filter by usecase tags (comma-separated for multi-select).
if tag != "" {
var combinedFlag config.ModelConfigUsecase
hasMultimodal := false
var plainTags []string
for _, t := range strings.Split(tag, ",") {
t = strings.TrimSpace(t)
if t == multimodalFilterKey {
hasMultimodal = true
} else if flag, ok := usecaseFilters[t]; ok {
combinedFlag |= flag
} else if t != "" {
plainTags = append(plainTags, t)
}
}
if hasMultimodal {
models = gallery.FilterGalleryModelsByMultimodal(models)
}
if combinedFlag != config.FLAG_ANY {
models = gallery.FilterGalleryModelsByUsecase(models, combinedFlag)
}
for _, pt := range plainTags {
models = gallery.GalleryElements[*gallery.GalleryModel](models).FilterByTag(pt)
}
}
if term != "" {
models = gallery.GalleryElements[*gallery.GalleryModel](models).Search(term)
}
// Filter by backend if requested
backendFilter := c.QueryParam("backend")
if backendFilter != "" {
var filtered gallery.GalleryElements[*gallery.GalleryModel]
for _, m := range models {
if m.Backend == backendFilter {
filtered = append(filtered, m)
}
}
models = filtered
}
// Get model statuses
processingModelsData, taskTypes := opcache.GetStatus()
// Apply sorting if requested
sortBy := c.QueryParam("sort")
sortOrder := c.QueryParam("order")
if sortOrder == "" {
sortOrder = ascSortOrder
}
switch sortBy {
case nameSortFieldName:
models = gallery.GalleryElements[*gallery.GalleryModel](models).SortByName(sortOrder)
case repositorySortFieldName:
models = gallery.GalleryElements[*gallery.GalleryModel](models).SortByRepository(sortOrder)
case licenseSortFieldName:
models = gallery.GalleryElements[*gallery.GalleryModel](models).SortByLicense(sortOrder)
case statusSortFieldName:
models = gallery.GalleryElements[*gallery.GalleryModel](models).SortByInstalled(sortOrder)
}
pageNum, err := strconv.Atoi(page)
if err != nil || pageNum < 1 {
pageNum = 1
}
itemsNum, err := strconv.Atoi(items)
if err != nil || itemsNum < 1 {
itemsNum = 9
}
totalPages := int(math.Ceil(float64(len(models)) / float64(itemsNum)))
totalModels := len(models)
if pageNum > 0 {
models = models.Paginate(pageNum, itemsNum)
}
// Convert models to JSON-friendly format and deduplicate by ID
modelsJSON := make([]map[string]any, 0, len(models))
seenIDs := make(map[string]bool)
for _, m := range models {
modelID := m.ID()
// Skip duplicate IDs to prevent Alpine.js x-for errors
if seenIDs[modelID] {
xlog.Debug("Skipping duplicate model ID", "modelID", modelID)
continue
}
seenIDs[modelID] = true
currentlyProcessing := opcache.Exists(modelID)
jobID := ""
isDeletionOp := false
if currentlyProcessing {
jobID = opcache.Get(modelID)
status := galleryService.GetStatus(jobID)
if status != nil && status.Deletion {
isDeletionOp = true
}
}
_, trustRemoteCodeExists := m.Overrides["trust_remote_code"]
obj := map[string]any{
"id": modelID,
"name": m.Name,
"description": m.Description,
"icon": m.Icon,
"license": m.License,
"urls": m.URLs,
"tags": m.Tags,
"gallery": m.Gallery.Name,
"installed": m.Installed,
"processing": currentlyProcessing,
"jobID": jobID,
"isDeletion": isDeletionOp,
"trustRemoteCode": trustRemoteCodeExists,
"additionalFiles": m.AdditionalFiles,
"backend": m.Backend,
}
modelsJSON = append(modelsJSON, obj)
}
prevPage := pageNum - 1
nextPage := pageNum + 1
if prevPage < 1 {
prevPage = 1
}
if nextPage > totalPages {
nextPage = totalPages
}
// Calculate installed models count (models with configs + models without configs)
modelConfigs := cl.GetAllModelsConfigs()
modelsWithoutConfig, _ := galleryop.ListModels(cl, ml, config.NoFilterFn, galleryop.LOOSE_ONLY)
installedModelsCount := len(modelConfigs) + len(modelsWithoutConfig)
ramInfo, _ := xsysinfo.GetSystemRAMInfo()
return c.JSON(200, map[string]any{
"models": modelsJSON,
"repositories": appConfig.Galleries,
"allTags": tags,
"allBackends": backendNames,
"processingModels": processingModelsData,
"taskTypes": taskTypes,
"availableModels": totalModels,
"installedModels": installedModelsCount,
"ramTotal": ramInfo.Total,
"ramUsed": ramInfo.Used,
"ramUsagePercent": ramInfo.UsagePercent,
"currentPage": pageNum,
"totalPages": totalPages,
"prevPage": prevPage,
"nextPage": nextPage,
})
}, adminMiddleware)
// Returns installed models with their capability flags for UI filtering
app.GET("/api/models/capabilities", func(c echo.Context) error {
modelConfigs := cl.GetAllModelsConfigs()
modelsWithoutConfig, _ := galleryop.ListModels(cl, ml, config.NoFilterFn, galleryop.LOOSE_ONLY)
type loadedOn struct {
NodeID string `json:"node_id"`
NodeName string `json:"node_name"`
State string `json:"state"`
NodeStatus string `json:"node_status"`
}
type modelCapability struct {
ID string `json:"id"`
Capabilities []string `json:"capabilities"`
Backend string `json:"backend"`
Disabled bool `json:"disabled"`
Pinned bool `json:"pinned"`
// LoadedOn is populated only when the node registry is active
// (distributed mode). Lets the UI show "loaded on worker-1" without
// the operator having to expand every node manually. A nil slice
// means "not in cluster mode" and an empty one "no loaded replicas";
// omitempty drops both from the JSON, so the frontend treats them
// alike as "no distribution info".
LoadedOn []loadedOn `json:"loaded_on,omitempty"`
// Source="registry-only" marks models adopted from the cluster that
// have no local config yet (ghosts that the reconciler discovered).
Source string `json:"source,omitempty"`
}
// Join with the node registry when we have one (distributed mode). A
// single registry fetch + map join beats per-model queries for the
// 100-model case.
var loadedByModel map[string][]loadedOn
if ds := applicationInstance.Distributed(); ds != nil && ds.Registry != nil {
nodeModels, err := ds.Registry.ListAllLoadedModels(c.Request().Context())
if err == nil {
allNodes, _ := ds.Registry.List(c.Request().Context())
nameByID := make(map[string]string, len(allNodes))
statusByID := make(map[string]string, len(allNodes))
for _, n := range allNodes {
nameByID[n.ID] = n.Name
statusByID[n.ID] = n.Status
}
loadedByModel = make(map[string][]loadedOn)
for _, nm := range nodeModels {
loadedByModel[nm.ModelName] = append(loadedByModel[nm.ModelName], loadedOn{
NodeID: nm.NodeID,
NodeName: nameByID[nm.NodeID],
State: nm.State,
NodeStatus: statusByID[nm.NodeID],
})
}
}
}
result := make([]modelCapability, 0, len(modelConfigs)+len(modelsWithoutConfig))
seen := make(map[string]bool, len(modelConfigs)+len(modelsWithoutConfig))
for _, cfg := range modelConfigs {
seen[cfg.Name] = true
result = append(result, modelCapability{
ID: cfg.Name,
Capabilities: cfg.KnownUsecaseStrings,
Backend: cfg.Backend,
Disabled: cfg.IsDisabled(),
Pinned: cfg.IsPinned(),
LoadedOn: loadedByModel[cfg.Name],
})
}
for _, name := range modelsWithoutConfig {
seen[name] = true
result = append(result, modelCapability{
ID: name,
Capabilities: []string{},
LoadedOn: loadedByModel[name],
})
}
// Emit entries for cluster models that have no local config — these
// are the actual ghosts. Without this the operator would have no way
// to see a model the cluster is running if its config file wasn't
// synced to this frontend's filesystem.
for name, loc := range loadedByModel {
if seen[name] {
continue
}
result = append(result, modelCapability{
ID: name,
Capabilities: []string{},
LoadedOn: loc,
Source: "registry-only",
})
}
// Filter by user's model allowlist if auth is enabled
if authDB := applicationInstance.AuthDB(); authDB != nil {
if user := auth.GetUser(c); user != nil && user.Role != auth.RoleAdmin {
perm, err := auth.GetCachedUserPermissions(c, authDB, user.ID)
if err == nil && perm.AllowedModels.Enabled {
allowed := map[string]bool{}
for _, m := range perm.AllowedModels.Models {
allowed[m] = true
}
filtered := make([]modelCapability, 0, len(result))
for _, mc := range result {
if allowed[mc.ID] {
filtered = append(filtered, mc)
}
}
result = filtered
}
}
}
return c.JSON(200, map[string]any{
"data": result,
})
})
// Returns a mapping of backend names to the usecase filter keys they support.
// Used by the gallery frontend to grey out usecase filter buttons when a
// backend is selected.
app.GET("/api/backends/usecases", func(c echo.Context) error {
result := make(map[string][]string, len(config.BackendCapabilities))
for name, cap := range config.BackendCapabilities {
var keys []string
for _, uc := range cap.PossibleUsecases {
if _, ok := usecaseFilters[uc]; ok {
keys = append(keys, uc)
}
}
slices.Sort(keys)
result[name] = keys
}
return c.JSON(200, result)
}, adminMiddleware)
// Returns VRAM/size estimates for a single gallery model at multiple
// context sizes. The frontend calls this per-model so the gallery page
// can load instantly and fill in estimates asynchronously.
// Query params:
//   contexts: comma-separated context sizes (default: 8192)
app.GET("/api/models/estimate/:id", func(c echo.Context) error {
modelID, err := url.QueryUnescape(c.Param("id"))
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{"error": "invalid model ID"})
}
contextSizes := parseContextSizes(c.QueryParam("contexts"))
// Look up the model from the gallery to build the estimate input.
models, err := gallery.AvailableGalleryModelsCached(appConfig.Galleries, appConfig.SystemState)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{"error": err.Error()})
}
model := gallery.FindGalleryElement(models, modelID)
if model == nil {
return c.JSON(http.StatusNotFound, map[string]any{"error": "model not found"})
}
input := buildEstimateInput(model)
if len(input.Files) == 0 && input.HFRepo == "" && input.Size == "" {
return c.JSON(200, vram.MultiContextEstimate{})
}
ctx, cancel := context.WithTimeout(c.Request().Context(), 10*time.Second)
defer cancel()
result, err := vram.EstimateModelMultiContext(ctx, input, contextSizes)
if err != nil {
xlog.Debug("model estimate failed", "model", modelID, "error", err)
return c.JSON(200, vram.MultiContextEstimate{})
}
return c.JSON(200, result)
}, adminMiddleware)
app.POST("/api/models/install/:id", func(c echo.Context) error {
galleryID := c.Param("id")
// URL decode the gallery ID (e.g., "localai%40model" -> "localai@model")
galleryID, err := url.QueryUnescape(galleryID)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid model ID",
})
}
xlog.Debug("API job submitted to install", "galleryID", galleryID)
id, err := uuid.NewUUID()
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
uid := id.String()
opcache.Set(galleryID, uid)
ctx, cancelFunc := context.WithCancel(context.Background())
op := galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig]{
ID: uid,
GalleryElementName: galleryID,
Galleries: appConfig.Galleries,
BackendGalleries: appConfig.BackendGalleries,
Context: ctx,
CancelFunc: cancelFunc,
}
// Store cancellation function immediately so queued operations can be cancelled
galleryService.StoreCancellation(uid, cancelFunc)
go func() {
galleryService.ModelGalleryChannel <- op
}()
return c.JSON(200, map[string]any{
"jobID": uid,
"message": "Installation started",
})
}, adminMiddleware)
app.POST("/api/models/delete/:id", func(c echo.Context) error {
galleryID := c.Param("id")
// URL decode the gallery ID
galleryID, err := url.QueryUnescape(galleryID)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid model ID",
})
}
xlog.Debug("API job submitted to delete", "galleryID", galleryID)
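// Gallery IDs have the form "<gallery>@<name>"; the delete op and
// cl.RemoveModelConfig both take the bare name.
var galleryName = galleryID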
if strings.Contains(galleryID, "@") {
galleryName = strings.Split(galleryID, "@")[1]
}
id, err := uuid.NewUUID()
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
uid := id.String()
opcache.Set(galleryID, uid)
ctx, cancelFunc := context.WithCancel(context.Background())
op := galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig]{
ID: uid,
Delete: true,
GalleryElementName: galleryName,
Galleries: appConfig.Galleries,
BackendGalleries: appConfig.BackendGalleries,
Context: ctx,
CancelFunc: cancelFunc,
}
// Store cancellation function immediately so queued operations can be cancelled
galleryService.StoreCancellation(uid, cancelFunc)
go func() {
galleryService.ModelGalleryChannel <- op
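// Runs as soon as the op has been handed to the channel; it does not
// wait for the deletion itself to finish.
cl.RemoveModelConfig(galleryName)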
}()
return c.JSON(200, map[string]any{
"jobID": uid,
"message": "Deletion started",
})
}, adminMiddleware)
app.POST("/api/models/config/:id", func(c echo.Context) error {
galleryID := c.Param("id")
// URL decode the gallery ID
galleryID, err := url.QueryUnescape(galleryID)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid model ID",
})
}
xlog.Debug("API job submitted to get config", "galleryID", galleryID)
models, err := gallery.AvailableGalleryModelsCached(appConfig.Galleries, appConfig.SystemState)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
model := gallery.FindGalleryElement(models, galleryID)
if model == nil {
return c.JSON(http.StatusNotFound, map[string]any{
"error": "model not found",
})
}
config, err := gallery.GetGalleryConfigFromURL[gallery.ModelConfig](model.URL, appConfig.SystemState.Model.ModelsPath)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
_, err = gallery.InstallModel(context.Background(), appConfig.SystemState, model.Name, &config, model.Overrides, nil, false)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
return c.JSON(200, map[string]any{
"message": "Configuration file saved",
})
}, adminMiddleware)
// Get installed model config as JSON (used by frontend for MCP detection, etc.)
app.GET("/api/models/config-json/:name", func(c echo.Context) error {
modelName := c.Param("name")
if modelName == "" {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "model name is required",
})
}
modelConfig, exists := cl.GetModelConfig(modelName)
if !exists {
return c.JSON(http.StatusNotFound, map[string]any{
"error": "model configuration not found",
})
}
return c.JSON(http.StatusOK, modelConfig)
}, adminMiddleware)
// Config metadata API - returns field metadata for all ~170 config fields
app.GET("/api/models/config-metadata", localai.ConfigMetadataEndpoint(), adminMiddleware)
// Autocomplete providers for config fields (dynamic values only)
app.GET("/api/models/config-metadata/autocomplete/:provider", localai.AutocompleteEndpoint(cl, ml, appConfig), adminMiddleware)
// PATCH config endpoint - partial update using nested JSON merge
app.PATCH("/api/models/config-json/:name", localai.PatchConfigEndpoint(cl, ml, appConfig), adminMiddleware)
// VRAM estimation endpoint
app.POST("/api/models/vram-estimate", localai.VRAMEstimateEndpoint(cl, appConfig), adminMiddleware)
// Get installed model YAML config for the React model editor
app.GET("/api/models/edit/:name", func(c echo.Context) error {
modelName := c.Param("name")
if decoded, err := url.PathUnescape(modelName); err == nil {
modelName = decoded
}
if modelName == "" {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "model name is required",
})
}
modelConfig, exists := cl.GetModelConfig(modelName)
if !exists {
return c.JSON(http.StatusNotFound, map[string]any{
"error": "model configuration not found",
})
}
modelConfigFile := modelConfig.GetModelConfigFile()
if modelConfigFile == "" {
return c.JSON(http.StatusNotFound, map[string]any{
"error": "model configuration file not found",
})
}
configData, err := os.ReadFile(modelConfigFile)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": "failed to read configuration file: " + err.Error(),
})
}
return c.JSON(http.StatusOK, map[string]any{
"config": string(configData),
"name": modelName,
})
}, adminMiddleware)
app.GET("/api/models/job/:uid", func(c echo.Context) error {
jobUID := c.Param("uid")
status := galleryService.GetStatus(jobUID)
if status == nil {
// Job is queued but hasn't started processing yet
return c.JSON(200, map[string]any{
"progress": 0,
"message": "Operation queued",
"galleryElementName": "",
"processed": false,
"deletion": false,
"queued": true,
})
}
response := map[string]any{
"progress": status.Progress,
"message": status.Message,
"galleryElementName": status.GalleryElementName,
"processed": status.Processed,
"deletion": status.Deletion,
"queued": false,
}
if status.Error != nil {
response["error"] = status.Error.Error()
}
if status.Progress == 100 && status.Processed && status.Message == "completed" {
opcache.DeleteUUID(jobUID)
response["completed"] = true
}
return c.JSON(200, response)
}, adminMiddleware)
// Backend Gallery APIs
app.GET("/api/backends", func(c echo.Context) error {
term := c.QueryParam("term")
tag := c.QueryParam("tag")
page := c.QueryParam("page")
if page == "" {
page = "1"
}
items := c.QueryParam("items")
if items == "" {
items = "9"
}
backends, err := gallery.AvailableBackendsUnfiltered(appConfig.BackendGalleries, appConfig.SystemState)
if err != nil {
xlog.Error("could not list backends from galleries", "error", err)
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
// Collect concrete backend names that are referenced by any meta backend's
// CapabilitiesMap. These are the per-capability variants the UI hides by
// default behind "Show all" (the meta backend is the preferred entry).
aliasedByMeta := make(map[string]bool)
for _, b := range backends {
if !b.IsMeta() {
continue
}
for _, concreteName := range b.CapabilitiesMap {
aliasedByMeta[concreteName] = true
}
}
// Use the BackendManager's list to determine installed status.
// In standalone mode this checks the local filesystem; in distributed
// mode it aggregates from all healthy worker nodes.
installedBackends, listErr := galleryService.ListBackends()
if listErr == nil {
for i, b := range backends {
if installedBackends.Exists(b.GetName()) {
backends[i].Installed = true
}
}
}
// Get all available tags
allTags := map[string]struct{}{}
tags := []string{}
for _, b := range backends {
for _, t := range b.Tags {
allTags[t] = struct{}{}
}
}
for t := range allTags {
tags = append(tags, t)
}
slices.Sort(tags)
if tag != "" {
backends = gallery.GalleryElements[*gallery.GalleryBackend](backends).FilterByTag(tag)
}
if term != "" {
backends = gallery.GalleryElements[*gallery.GalleryBackend](backends).Search(term)
}
// Get backend statuses
processingBackendsData, taskTypes := opcache.GetStatus()
// Apply sorting if requested
sortBy := c.QueryParam("sort")
sortOrder := c.QueryParam("order")
if sortOrder == "" {
sortOrder = ascSortOrder
}
switch sortBy {
case nameSortFieldName:
backends = gallery.GalleryElements[*gallery.GalleryBackend](backends).SortByName(sortOrder)
case repositorySortFieldName:
backends = gallery.GalleryElements[*gallery.GalleryBackend](backends).SortByRepository(sortOrder)
case licenseSortFieldName:
backends = gallery.GalleryElements[*gallery.GalleryBackend](backends).SortByLicense(sortOrder)
case statusSortFieldName:
backends = gallery.GalleryElements[*gallery.GalleryBackend](backends).SortByInstalled(sortOrder)
}
pageNum, err := strconv.Atoi(page)
if err != nil || pageNum < 1 {
pageNum = 1
}
itemsNum, err := strconv.Atoi(items)
if err != nil || itemsNum < 1 {
itemsNum = 9
}
totalPages := int(math.Ceil(float64(len(backends)) / float64(itemsNum)))
totalBackends := len(backends)
// pageNum is clamped to >= 1 above, so pagination always applies.
backends = backends.Paginate(pageNum, itemsNum)
// Get dev suffix from SystemState for development backend detection
devSuffix := ""
if appConfig.SystemState != nil {
devSuffix = appConfig.SystemState.BackendDevSuffix
}
// Convert backends to JSON-friendly format and deduplicate by ID
backendsJSON := make([]map[string]any, 0, len(backends))
seenBackendIDs := make(map[string]bool)
for _, b := range backends {
backendID := b.ID()
// Skip duplicate IDs to prevent Alpine.js x-for errors
if seenBackendIDs[backendID] {
xlog.Debug("Skipping duplicate backend ID", "backendID", backendID)
continue
}
seenBackendIDs[backendID] = true
currentlyProcessing := opcache.Exists(backendID)
jobID := ""
isDeletionOp := false
if currentlyProcessing {
jobID = opcache.Get(backendID)
status := galleryService.GetStatus(jobID)
if status != nil && status.Deletion {
isDeletionOp = true
}
}
// Per-node distribution + parent meta lookup for the install picker.
// `nodes` populates the Nodes column on the gallery; `metaBackendFor`
// lets the picker name the parent (e.g. "CPU build of llama-cpp")
// without re-walking the whole gallery on the client.
var perNode []gallery.NodeBackendRef
if installedBackends != nil {
if sb, ok := installedBackends.Get(b.Name); ok {
perNode = sb.Nodes
}
}
backendsJSON = append(backendsJSON, map[string]any{
"id": backendID,
"name": b.Name,
"description": b.Description,
"icon": b.Icon,
"license": b.License,
"urls": b.URLs,
"tags": b.Tags,
"gallery": b.Gallery.Name,
"installed": b.Installed,
"version": b.Version,
"processing": currentlyProcessing,
"jobID": jobID,
"isDeletion": isDeletionOp,
"isMeta": b.IsMeta(),
"isAlias": aliasedByMeta[b.Name],
"isDevelopment": b.IsDevelopment(devSuffix),
"capabilities": b.CapabilitiesMap,
"metaBackendFor": metaParentOf(b.Name, backends),
"nodes": perNode,
})
}
prevPage := pageNum - 1
nextPage := pageNum + 1
if prevPage < 1 {
prevPage = 1
}
if nextPage > totalPages {
nextPage = totalPages
}
// Calculate installed backends count (reuse the already-fetched data)
installedBackendsCount := 0
if listErr == nil {
installedBackendsCount = len(installedBackends)
} else {
// Fallback to local listing if manager listing failed
if localBackends, localErr := gallery.ListSystemBackends(appConfig.SystemState); localErr == nil {
installedBackendsCount = len(localBackends)
}
}
// Get the detected system capability
detectedCapability := ""
if appConfig.SystemState != nil {
detectedCapability = appConfig.SystemState.DetectedCapability()
}
return c.JSON(200, map[string]any{
"backends": backendsJSON,
"repositories": appConfig.BackendGalleries,
"allTags": tags,
"processingBackends": processingBackendsData,
"taskTypes": taskTypes,
"availableBackends": totalBackends,
"installedBackends": installedBackendsCount,
"currentPage": pageNum,
"totalPages": totalPages,
"prevPage": prevPage,
"nextPage": nextPage,
"systemCapability": detectedCapability,
"preferDevelopmentBackends": appConfig.PreferDevelopmentBackends,
})
}, adminMiddleware)
app.POST("/api/backends/install/:id", func(c echo.Context) error {
backendID := c.Param("id")
// URL decode the backend ID
backendID, err := url.QueryUnescape(backendID)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid backend ID",
})
}
xlog.Debug("API job submitted to install backend", "backendID", backendID)
id, err := uuid.NewUUID()
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
uid := id.String()
opcache.SetBackend(backendID, uid)
ctx, cancelFunc := context.WithCancel(context.Background())
op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
ID: uid,
GalleryElementName: backendID,
Galleries: appConfig.BackendGalleries,
Context: ctx,
CancelFunc: cancelFunc,
}
// Store cancellation function immediately so queued operations can be cancelled
galleryService.StoreCancellation(uid, cancelFunc)
go func() {
galleryService.BackendGalleryChannel <- op
}()
return c.JSON(200, map[string]any{
"jobID": uid,
"message": "Backend installation started",
})
}, adminMiddleware)
// Install backend from external source (OCI image, URL, or path)
app.POST("/api/backends/install-external", func(c echo.Context) error {
// Request body structure
type ExternalBackendRequest struct {
URI string `json:"uri"`
Name string `json:"name"`
Alias string `json:"alias"`
}
var req ExternalBackendRequest
if err := c.Bind(&req); err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid request body",
})
}
// Validate required fields
if req.URI == "" {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "uri is required",
})
}
xlog.Debug("API job submitted to install external backend", "uri", req.URI, "name", req.Name, "alias", req.Alias)
id, err := uuid.NewUUID()
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
uid := id.String()
// Key the opcache entry by name when one is provided, otherwise by URI
cacheKey := req.URI
if req.Name != "" {
cacheKey = req.Name
}
opcache.SetBackend(cacheKey, uid)
ctx, cancelFunc := context.WithCancel(context.Background())
op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
ID: uid,
GalleryElementName: req.Name, // May be empty, will be derived during installation
Galleries: appConfig.BackendGalleries,
Context: ctx,
CancelFunc: cancelFunc,
ExternalURI: req.URI,
ExternalName: req.Name,
ExternalAlias: req.Alias,
}
// Store cancellation function immediately so queued operations can be cancelled
galleryService.StoreCancellation(uid, cancelFunc)
go func() {
galleryService.BackendGalleryChannel <- op
}()
return c.JSON(200, map[string]any{
"jobID": uid,
"message": "External backend installation started",
})
}, adminMiddleware)
app.POST("/api/backends/delete/:id", func(c echo.Context) error {
backendID := c.Param("id")
// URL decode the backend ID
backendID, err := url.QueryUnescape(backendID)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid backend ID",
})
}
xlog.Debug("API job submitted to delete backend", "backendID", backendID)
// Backend IDs may carry a "<gallery>@" prefix; the delete op takes the
// bare name.
var backendName = backendID
if strings.Contains(backendID, "@") {
backendName = strings.Split(backendID, "@")[1]
}
id, err := uuid.NewUUID()
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
uid := id.String()
opcache.SetBackend(backendID, uid)
ctx, cancelFunc := context.WithCancel(context.Background())
op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
ID: uid,
Delete: true,
GalleryElementName: backendName,
Galleries: appConfig.BackendGalleries,
Context: ctx,
CancelFunc: cancelFunc,
}
// Store cancellation function immediately so queued operations can be cancelled
galleryService.StoreCancellation(uid, cancelFunc)
go func() {
galleryService.BackendGalleryChannel <- op
}()
return c.JSON(200, map[string]any{
"jobID": uid,
"message": "Backend deletion started",
})
}, adminMiddleware)
app.GET("/api/backends/job/:uid", func(c echo.Context) error {
jobUID := c.Param("uid")
status := galleryService.GetStatus(jobUID)
if status == nil {
// Job is queued but hasn't started processing yet
return c.JSON(200, map[string]any{
"progress": 0,
"message": "Operation queued",
"galleryElementName": "",
"processed": false,
"deletion": false,
"queued": true,
})
}
response := map[string]any{
"progress": status.Progress,
"message": status.Message,
"galleryElementName": status.GalleryElementName,
"processed": status.Processed,
"deletion": status.Deletion,
"queued": false,
}
if status.Error != nil {
response["error"] = status.Error.Error()
}
if status.Progress == 100 && status.Processed && status.Message == "completed" {
opcache.DeleteUUID(jobUID)
response["completed"] = true
}
return c.JSON(200, response)
}, adminMiddleware)
// System Backend Deletion API (for installed backends on index page)
app.POST("/api/backends/system/delete/:name", func(c echo.Context) error {
backendName := c.Param("name")
// URL decode the backend name
backendName, err := url.QueryUnescape(backendName)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid backend name",
})
}
xlog.Debug("API request to delete system backend", "backendName", backendName)
// Use the gallery service's backend manager, which in distributed mode
// fans out deletion to worker nodes via NATS.
if err := galleryService.DeleteBackend(backendName); err != nil {
xlog.Error("Failed to delete backend", "error", err, "backendName", backendName)
return c.JSON(http.StatusInternalServerError, map[string]any{
"error": err.Error(),
})
}
return c.JSON(200, map[string]any{
"success": true,
"message": "Backend deleted successfully",
})
}, adminMiddleware)
// Backend upgrade APIs
app.GET("/api/backends/upgrades", func(c echo.Context) error {
if applicationInstance == nil || applicationInstance.UpgradeChecker() == nil {
return c.JSON(200, map[string]any{})
}
return c.JSON(200, applicationInstance.UpgradeChecker().GetAvailableUpgrades())
}, adminMiddleware)
app.POST("/api/backends/upgrades/check", func(c echo.Context) error {
if applicationInstance == nil || applicationInstance.UpgradeChecker() == nil {
return c.JSON(200, map[string]any{})
}
applicationInstance.UpgradeChecker().TriggerCheck()
return c.JSON(200, applicationInstance.UpgradeChecker().GetAvailableUpgrades())
}, adminMiddleware)
app.POST("/api/backends/upgrade/:name", func(c echo.Context) error {
backendName := c.Param("name")
backendName, err := url.QueryUnescape(backendName)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]any{
"error": "invalid backend name",
})
}
id, err := uuid.NewUUID()
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]any{"error": err.Error()})
}
uid := id.String()
// Register in opcache so the operation shows up in /api/operations
// and the Backends UI can reflect progress on the affected row.
opcache.SetBackend(backendName, uid)
ctx, cancelFunc := context.WithCancel(context.Background())
op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
ID: uid,
GalleryElementName: backendName,
Galleries: appConfig.BackendGalleries,
Upgrade: true,
Context: ctx,
CancelFunc: cancelFunc,
}
// Store cancellation function immediately so queued operations can be cancelled
galleryService.StoreCancellation(uid, cancelFunc)
// Non-blocking send — BackendGalleryChannel is unbuffered and a direct
// send would hang the HTTP handler whenever the worker is busy.
go func() {
galleryService.BackendGalleryChannel <- op
}()
return c.JSON(200, map[string]any{
"jobID": uid,
"uuid": uid,
"statusUrl": fmt.Sprintf("/api/backends/job/%s", uid),
"message": "Backend upgrade started",
})
}, adminMiddleware)
// P2P APIs
app.GET("/api/p2p/workers", func(c echo.Context) error {
llamaNodes := p2p.GetAvailableNodes(p2p.NetworkID(appConfig.P2PNetworkID, p2p.LlamaCPPWorkerID))
mlxNodes := p2p.GetAvailableNodes(p2p.NetworkID(appConfig.P2PNetworkID, p2p.MLXWorkerID))
llamaJSON := make([]map[string]any, 0, len(llamaNodes))
for _, n := range llamaNodes {
llamaJSON = append(llamaJSON, map[string]any{
"name": n.Name,
"id": n.ID,
"tunnelAddress": n.TunnelAddress,
"serviceID": n.ServiceID,
"lastSeen": n.LastSeen,
"isOnline": n.IsOnline(),
})
}
mlxJSON := make([]map[string]any, 0, len(mlxNodes))
for _, n := range mlxNodes {
mlxJSON = append(mlxJSON, map[string]any{
"name": n.Name,
"id": n.ID,
"tunnelAddress": n.TunnelAddress,
"serviceID": n.ServiceID,
"lastSeen": n.LastSeen,
"isOnline": n.IsOnline(),
})
}
return c.JSON(200, map[string]any{
"llama_cpp": map[string]any{
"nodes": llamaJSON,
},
"mlx": map[string]any{
"nodes": mlxJSON,
},
// Keep backward-compatible "nodes" key with llama.cpp workers
"nodes": llamaJSON,
})
}, adminMiddleware)
app.GET("/api/p2p/federation", func(c echo.Context) error {
nodes := p2p.GetAvailableNodes(p2p.NetworkID(appConfig.P2PNetworkID, p2p.FederatedID))
nodesJSON := make([]map[string]any, 0, len(nodes))
for _, n := range nodes {
nodesJSON = append(nodesJSON, map[string]any{
"name": n.Name,
"id": n.ID,
"tunnelAddress": n.TunnelAddress,
"serviceID": n.ServiceID,
"lastSeen": n.LastSeen,
"isOnline": n.IsOnline(),
})
}
return c.JSON(200, map[string]any{
"nodes": nodesJSON,
})
}, adminMiddleware)
app.GET("/api/p2p/stats", func(c echo.Context) error {
llamaCPPNodes := p2p.GetAvailableNodes(p2p.NetworkID(appConfig.P2PNetworkID, p2p.LlamaCPPWorkerID))
federatedNodes := p2p.GetAvailableNodes(p2p.NetworkID(appConfig.P2PNetworkID, p2p.FederatedID))
mlxWorkerNodes := p2p.GetAvailableNodes(p2p.NetworkID(appConfig.P2PNetworkID, p2p.MLXWorkerID))
llamaCPPOnline := 0
for _, n := range llamaCPPNodes {
if n.IsOnline() {
llamaCPPOnline++
}
}
federatedOnline := 0
for _, n := range federatedNodes {
if n.IsOnline() {
federatedOnline++
}
}
mlxWorkersOnline := 0
for _, n := range mlxWorkerNodes {
if n.IsOnline() {
mlxWorkersOnline++
}
}
return c.JSON(200, map[string]any{
"llama_cpp_workers": map[string]any{
"online": llamaCPPOnline,
"total": len(llamaCPPNodes),
},
"federated": map[string]any{
"online": federatedOnline,
"total": len(federatedNodes),
},
"mlx_workers": map[string]any{
"online": mlxWorkersOnline,
"total": len(mlxWorkerNodes),
},
})
}, adminMiddleware)
// Resources API endpoint - unified memory info (GPU if available, otherwise RAM)
app.GET("/api/resources", func(c echo.Context) error {
resourceInfo := xsysinfo.GetResourceInfo()
// Format watchdog interval
watchdogInterval := "2s" // default
if appConfig.WatchDogInterval > 0 {
watchdogInterval = appConfig.WatchDogInterval.String()
}
storageSize, _ := getDirectorySize(appConfig.SystemState.Model.ModelsPath)
response := map[string]any{
"type": resourceInfo.Type, // "gpu" or "ram"
"available": resourceInfo.Available,
"gpus": resourceInfo.GPUs,
"ram": resourceInfo.RAM,
"aggregate": resourceInfo.Aggregate,
"storage_size": storageSize,
"reclaimer_enabled": appConfig.MemoryReclaimerEnabled,
"reclaimer_threshold": appConfig.MemoryReclaimerThreshold,
"watchdog_interval": watchdogInterval,
}
return c.JSON(200, response)
}, adminMiddleware)
if !appConfig.DisableRuntimeSettings {
// Settings API
app.GET("/api/settings", localai.GetSettingsEndpoint(applicationInstance), adminMiddleware)
app.POST("/api/settings", localai.UpdateSettingsEndpoint(applicationInstance), adminMiddleware)
}
// Branding / whitelabeling. The read endpoint and the asset server are
// public so the login screen can render the configured logo and instance
// name before authentication. Mutations are admin-only. See app.go where
// "/api/branding" and "/branding/" are added to PathWithoutAuth.
app.GET("/api/branding", localai.GetBrandingEndpoint(appConfig))
app.GET("/branding/asset/:kind", localai.ServeBrandingAssetEndpoint(appConfig))
app.POST("/api/branding/asset/:kind", localai.UploadBrandingAssetEndpoint(appConfig), adminMiddleware)
app.DELETE("/api/branding/asset/:kind", localai.DeleteBrandingAssetEndpoint(appConfig), adminMiddleware)
}