LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-05 13:57:28 -04:00

Author	SHA1	Message	Date
Richard Palethorpe	eb32cd9073	feat(realtime): eager blocking pipeline warm-up + /backend/load API (#10662)

Realtime sessions previously lazy-loaded each pipeline sub-model (VAD, transcription, LLM, TTS) on first use, so every cold session paid a per-request model-load stall and load errors only surfaced mid-stream.

Warm the whole pipeline eagerly and blockingly at session start (including the voice-gate speaker-recognition model, which an enforced gate blocks each utterance on; compaction's summary_model stays lazy since it only runs off the response path):

- Add backend.PreloadModel / PreloadModelByName as the single load path for every modality (no transcription special-case; backend-omitted configs are deprecated).
- The realtime session blocks on Model.Warmup and returns a model_load_error to the client if any stage fails to load; updateSession warms in the background. Opt out per pipeline with pipeline.disable_warmup, exposed as a UI toggle via the config-metadata registry.

Add a LocalAI-native POST /backend/load (and /v1/backend/load) that pre-loads a model -- expanding realtime pipelines into their sub-models -- as the inverse of /backend/shutdown. There is one preload engine (backend.PreloadStages): the realtime Warmup methods, /backend/load and the --load-to-memory startup flag all use it, so --load-to-memory now also expands pipeline models and records load-failure traces. Pipeline sub-model alias resolution is likewise shared (ModelConfigLoader.LoadResolvedModelConfig).

Surface the endpoint everywhere an admin manages models:

- MCP admin tool load_model (httpapi + inproc clients, safety/catalog prompts, catalog/dispatch tests).
- "Load into memory" action in the React models UI.
- Swagger regenerated; docs moved to the general backend-monitor page since it is not realtime-specific.

Fix a Traces UI crash ("json: unsupported value: -Inf"): audio-snippet RMS/peak now floor at a finite dBFS, and backend-trace data is sanitized to drop non-finite floats before marshaling. The sanitizer is copy-on-write -- it runs on every RecordBackendTrace, so containers are only re-allocated on the paths that actually changed.

Migrate core/http/openresponses_test.go onto the prebuilt mock-backend the rest of the http suite already uses -- it was the last spec still pointing at a real HuggingFace model, so it 404'd wherever no vision backend was built -- and fix its item_reference specs to send the spec's "id" field instead of "item_id", which the handler never accepted.

Assisted-by: Claude:claude-opus-4-8 Claude Code
Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-07-03 18:00:37 +02:00
LocalAI [bot]	d74f88357e	fix(tests): align openresponses test model name with GGUF-derived naming (#10589) (#10609)

PR #10589 changed repo-root HuggingFace URI imports to name the model after the selected GGUF file rather than the repository. The Open Responses API integration test still requested the old repo-derived name ("Qwen3-VL-2B-Instruct-GGUF"), so every request 404'd on an unknown model and the suite has failed on master since `1a4f68ed4`.

Update testModel to the name the importer now registers for the default q4_k_m quant ("Qwen3-VL-2B-Instruct-Q4_K_M") so the specs resolve the model again. The #10589 behaviour change is intentional; only the stale test needed updating.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 15:41:44 +02:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124)

* feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00
Ettore Di Giacinto	3387bfaee0	feat(api): add support for open responses specification (#8063)

* feat: openresponses Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add ttl settings, fix tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: register cors middleware by default Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* satisfy schema Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Logitbias and logprobs Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add grammar Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* SSE compliance Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tool JSON conversion Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* support background mode Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* swagger Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* drop code. This is handled in the handler Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* background mode for MCP Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-17 22:11:47 +01:00

4 Commits