LocalAI/pkg/mcp/localaitools/client.go
Richard Palethorpe eb32cd9073 feat(realtime): eager blocking pipeline warm-up + /backend/load API (#10662)
Realtime sessions previously lazy-loaded each pipeline sub-model (VAD,
transcription, LLM, TTS) on first use, so every cold session paid a
per-request model-load stall and load errors only surfaced mid-stream.

Warm the whole pipeline eagerly and blockingly at session start
(including the voice-gate speaker-recognition model, which an enforced
gate blocks each utterance on; compaction's summary_model stays lazy
since it only runs off the response path):
- Add backend.PreloadModel / PreloadModelByName as the single load path
  for every modality (no transcription special case; model configs that
  omit the backend are deprecated).
- The realtime session blocks on Model.Warmup and returns a
  model_load_error to the client if any stage fails to load;
  updateSession warms in the background. Opt out per pipeline with
  pipeline.disable_warmup, exposed as a UI toggle via the
  config-metadata registry.

Add a LocalAI-native POST /backend/load (and /v1/backend/load) that
pre-loads a model -- expanding realtime pipelines into their sub-models
-- as the inverse of /backend/shutdown. There is one preload engine
(backend.PreloadStages): the realtime Warmup methods, /backend/load and
the --load-to-memory startup flag all use it, so --load-to-memory now
also expands pipeline models and records load-failure traces. Pipeline
sub-model alias resolution is likewise shared
(ModelConfigLoader.LoadResolvedModelConfig). Surface the endpoint
everywhere an admin manages models:
- MCP admin tool load_model (httpapi + inproc clients, safety/catalog
  prompts, catalog/dispatch tests).
- "Load into memory" action in the React models UI.
- Swagger regenerated; docs moved to the general backend-monitor page
  since it is not realtime-specific.

Fix a Traces UI crash ("json: unsupported value: -Inf"): audio-snippet
RMS/peak now floor at a finite dBFS, and backend-trace data is sanitized
to drop non-finite floats before marshaling. The sanitizer is
copy-on-write -- it runs on every RecordBackendTrace, so containers are
only re-allocated on the paths that actually changed.

Migrate core/http/openresponses_test.go onto the prebuilt mock-backend
the rest of the http suite already uses -- it was the last spec still
pointing at a real HuggingFace model, so it 404'd wherever no vision
backend was built -- and fix its item_reference specs to send the
spec's "id" field instead of "item_id", which the handler never
accepted.
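Why `item_id` was a silent mistake shows up with plain `encoding/json`: unknown keys are ignored by default, so a payload using `item_id` decodes to an empty ID rather than failing. A sketch, assuming a hypothetical `itemReference` decode target (the real handler's type is not in this file); `DisallowUnknownFields` is one standard-library way a test or handler could surface such a mismatch, not necessarily what LocalAI does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// itemReference is a hypothetical decode target; per the spec, the
// referenced item's identifier lives in "id".
type itemReference struct {
	Type string `json:"type"`
	ID   string `json:"id"`
}

// decodeRef decodes body; with strict set, unknown keys become an error.
func decodeRef(body string, strict bool) (itemReference, error) {
	dec := json.NewDecoder(strings.NewReader(body))
	if strict {
		dec.DisallowUnknownFields()
	}
	var ref itemReference
	err := dec.Decode(&ref)
	return ref, err
}

func main() {
	bodies := []string{
		`{"type":"item_reference","id":"msg_123"}`,
		`{"type":"item_reference","item_id":"msg_123"}`,
	}
	for _, body := range bodies {
		lenient, _ := decodeRef(body, false)
		_, strictErr := decodeRef(body, true)
		fmt.Printf("id=%q strict error: %v\n", lenient.ID, strictErr)
	}
}
```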

Assisted-by: Claude:claude-opus-4-8 Claude Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-07-03 18:00:37 +02:00

package localaitools

import (
	"context"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/core/services/modeladmin"
	"github.com/mudler/LocalAI/pkg/vram"
)

// LocalAIClient is the surface tools depend on. It has two implementations:
//
//   - inproc.Client (in-process; calls LocalAI services directly)
//   - httpapi.Client (out-of-process; calls the LocalAI REST API)
//
// Tool handlers and the embedded skill prompts are agnostic to which
// implementation backs the client.
//
// Where the same shape already exists elsewhere in the codebase
// (config.Gallery, gallery.Metadata, schema.KnownBackend, vram.EstimateResult,
// modeladmin.Action/Capability) we surface it directly rather than maintain
// a parallel DTO, keeping the LLM-visible wire format aligned with the
// rest of LocalAI by construction.
type LocalAIClient interface {
	// ---- Models / gallery (read) ----
	GallerySearch(ctx context.Context, q GallerySearchQuery) ([]gallery.Metadata, error)
	ListInstalledModels(ctx context.Context, capability Capability) ([]InstalledModel, error)
	ListGalleries(ctx context.Context) ([]config.Gallery, error)
	GetJobStatus(ctx context.Context, jobID string) (*JobStatus, error)
	GetModelConfig(ctx context.Context, name string) (*ModelConfigView, error)

	// ---- Models / gallery (write) ----
	InstallModel(ctx context.Context, req InstallModelRequest) (jobID string, err error)
	DeleteModel(ctx context.Context, name string) error
	EditModelConfig(ctx context.Context, name string, patch map[string]any) error
	ReloadModels(ctx context.Context) error
	// LoadModel pre-loads a model into memory by name (the inverse of shutting
	// it down). For a realtime pipeline model every configured sub-model is
	// loaded; it returns the model names that became resident.
	LoadModel(ctx context.Context, model string) ([]string, error)
	ImportModelURI(ctx context.Context, req ImportModelURIRequest) (*ImportModelURIResponse, error)

	// ---- Model aliases ----
	// SetAlias creates the alias `name` pointing at `target`, or swaps an
	// existing alias's target. The server validates that `target` is an
	// existing, non-alias, enabled model. Deletion reuses DeleteModel.
	SetAlias(ctx context.Context, name, target string) error
	// ListAliases returns every configured alias and its target.
	ListAliases(ctx context.Context) ([]AliasInfo, error)

	// ---- Backends ----
	// ListBackends returns installed backends. The shape stays a thin
	// localaitools.Backend rather than gallery.SystemBackend because the
	// latter carries filesystem paths (RunFile, Metadata) the LLM
	// shouldn't see.
	ListBackends(ctx context.Context) ([]Backend, error)
	// ListKnownBackends returns the same shape as REST /backends/known.
	ListKnownBackends(ctx context.Context) ([]schema.KnownBackend, error)
	InstallBackend(ctx context.Context, req InstallBackendRequest) (jobID string, err error)
	UpgradeBackend(ctx context.Context, name string) (jobID string, err error)

	// ---- System ----
	SystemInfo(ctx context.Context) (*SystemInfo, error)
	ListNodes(ctx context.Context) ([]Node, error)
	VRAMEstimate(ctx context.Context, req VRAMEstimateRequest) (*vram.EstimateResult, error)

	// ---- State ----
	// ToggleModelState accepts modeladmin.ActionEnable / ActionDisable.
	ToggleModelState(ctx context.Context, name string, action modeladmin.Action) error
	// ToggleModelPinned accepts modeladmin.ActionPin / ActionUnpin.
	ToggleModelPinned(ctx context.Context, name string, action modeladmin.Action) error

	// ---- Branding / whitelabeling ----
	// GetBranding returns the configured instance branding (name, tagline,
	// asset URLs).
	GetBranding(ctx context.Context) (*Branding, error)
	// SetBranding updates the text branding fields. Asset uploads are not
	// exposed over MCP; admins use the Settings UI for binary files.
	SetBranding(ctx context.Context, req SetBrandingRequest) (*Branding, error)

	// ---- Usage / billing ----
	// GetUsageStats returns aggregated token usage. In single-user
	// no-auth mode this reports the synthetic local user's usage. The
	// implementation enforces "admin required to query other users".
	GetUsageStats(ctx context.Context, q UsageStatsQuery) (*UsageStats, error)

	// ---- PII filter ----
	// GetPIIEvents returns recent redaction events. The implementation
	// enforces "admin required" when auth is on. The regex pattern tools
	// were removed; detection policy lives on each detector model's
	// pii_detection block, managed via the model-config tools.
	GetPIIEvents(ctx context.Context, q PIIEventsQuery) ([]PIIEvent, error)

	// ---- Middleware admin ----
	// GetMiddlewareStatus returns the aggregated state surfaced on the
	// /app/middleware page: active PII patterns, per-model resolved
	// enabled state, recent event count, router placeholder.
	GetMiddlewareStatus(ctx context.Context) (*MiddlewareStatus, error)

	// ---- Router (intelligent routing) ----
	// GetRouterDecisions returns recent routing decisions for the
	// /app/middleware Routing tab and for agent-driven introspection.
	// Admin-required when auth is on.
	GetRouterDecisions(ctx context.Context, q RouterDecisionsQuery) ([]RouterDecision, error)
}