LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-14 19:58:44 -04:00

Author	SHA1	Message	Date
Richard Palethorpe	085fc53bbc	fix(router): production-ready request router + auto-size batch for embedding/rerank (#10104 ) * fix(router): score classifier production-readiness Conversation trimming runs through the classifier model's chat template and trims by exact token count, sized to the model's n_batch which is now scaled to context so long probes can't crash the backend. Missing chat_message templates are a hard error at router build time. Router- facing factories (Embedder/Scorer/Reranker/TokenCounter) re-resolve ModelConfig per call so a model installed post-startup doesn't bind a stub Backend="" config and silently fall into the loader's auto- iterate path. New 'vector_store' backend trace recorded inside localVectorStore on every Search/Insert — including the backend-load-failure path that previously vanished into an xlog.Warn — with outcome tagging (hit/miss/empty_store/backend_load_error/find_error/insert_error/ok). Companion cleanup drops misleading similarity:0 and input_tokens_count:0 from non-hit and text-mode traces. Gallery local-store-development aliases to 'local-store' so the master image satisfies pkg/model.LocalStoreBackend lookups from the embedding cache. Misc: llama-cpp TokenizeString reads the correct 'prompt' JSON key (the original bug); ModelTokenize nil-guard; non-fatal mitm proxy startup; PII 'route_local' renamed to 'allow' with docs/UI in sync; model-editor footer no longer eats the edit area on small screens; several config-editor template/dropdown/section fixes. Tests: e2e router specs (casual/code-hint + long-conversation trim), vector_store trace specs, lazy-factory specs, gallery dev-alias resolution, Playwright trace badge + scroll regression. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(backend): auto-size batch to context for embedding and rerank models Embedding and rerank models pool over the whole input in a single physical batch (n_ubatch). With batch left at the 512 default, the backend rejects longer inputs with "input is too large to process", silently capping a large-context embedder (e.g. 8k/32k) at 512 tokens. Size n_batch to the context for these single-pass usecases, mirroring the existing FLAG_SCORE behaviour; an explicit batch: still wins. Extracts EffectiveContextSize/EffectiveBatchSize from grpcModelOpts so the effective decode window has one home for other callers to reuse. Adds an e2e-aio regression test that embeds a >512-token input. The AIO embedding model is switched to nomic-embed-text-v1.5 (2048 context) because the previous granite model was capped at 512 tokens and could not exercise the larger batch. Assisted-by: claude-code:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(gallery): raise arch-router scoring output cap via parallel:64 Scoring decodes the whole prompt+candidate in a single llama_decode and reads one logit row per candidate token. The vendored llama.cpp server caps causal output rows at n_parallel, so the default of 1 aborts with GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) on multi-token route labels. Set options: [parallel:64] on both arch-router quant entries to lift the cap; kv_unified (the grpc-server default) keeps the full context per sequence, so this does not split the KV cache. Assisted-by: claude-code:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-12 16:21:15 +02:00
LocalAI [bot]	1198d10b58	fix(traces): cap backend trace Data to keep admin UI responsive (#9960 ) * fix(traces): cap backend trace Data field so the admin UI stays responsive The previous fix (#9946) capped API trace bodies but missed backend traces, which carry the same blast radius: - LLM backend traces store the full chat messages JSON, full response, and full streaming deltas. Every agent-pool reasoning step ships the full RAG-augmented history (50-500 KiB per trace, often 100+ traces queued). - TTS / audio_transform / transcript traces embed a 30s audio snippet as base64, around 1.3 MiB per trace. Both blow the /api/backend-traces JSON past tens of MiB. The admin Traces page then keeps re-downloading and re-parsing the buffer faster than the 5s auto-refresh and stays in the loading state forever, the same symptom the API-side fix addressed. Apply two complementary caps, both honoring LOCALAI_TRACING_MAX_BODY_BYTES: Option A (safety net in core/trace): RecordBackendTrace walks the Data map recursively and replaces any string value larger than the cap with "<truncated: N bytes>". Catches anything a future producer forgets. Option B (head-preserving at the producer): - core/backend/llm.go: TruncateToBytes on messages, response, and chat_deltas content/reasoning_content so the leading content stays readable in the UI. - core/trace/audio_snippet.go: omit audio_wav_base64 when the encoded blob would exceed the cap (truncated base64 is undecodable). The quality metrics still ship and the UI's WaveformPlayer simply skips when the field is absent. TruncateToBytes is bounded to <= maxBytes so Option A leaves the producer's head-preserving output alone instead of replacing it with the bare marker. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * fix(react-ui): expose tracing_max_body_bytes in Settings and Traces panels The setting was already plumbed through env (LOCALAI_TRACING_MAX_BODY_BYTES), CLI flag, and the runtime_settings.json GET/PUT schema, but neither the main Settings page nor the inline Traces panel offered an input for it. Admins hitting the "Traces UI stuck loading" symptom had to know to set an env var or PUT raw JSON to /api/settings to dial the cap. Add a "Max Body Bytes" row next to "Max Items" in both places. Same input type, same disabled-when-tracing-off semantics, placeholder shows the 65536 default so users see what they're inheriting. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * test(react-ui): disambiguate Max Items locator after adding Max Body Bytes The Tracing settings panel now has two number inputs. The previous spec matched 'input[type="number"]' which became ambiguous and triggered a Playwright strict-mode violation in CI. Switch to getByPlaceholder('100') for Max Items and add a parallel spec for the new Max Body Bytes field using getByPlaceholder('65536'). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-23 14:50:40 +02:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124 ) * feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00
Richard Palethorpe	35d509d8e7	feat(ui): Per model backend logs and various fixes (#9028 ) * feat(gallery): Switch to expandable box instead of pop-over and display model files Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(ui, backends): Add individual backend logging Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(ui): Set the context settings from the model config Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-03-18 08:31:26 +01:00
Richard Palethorpe	51eec4e6b8	feat(traces): Add backend traces (#8609 ) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-02-20 23:47:33 +01:00
Ettore Di Giacinto	fc5b9ebfcc	feat(loader): enhance single active backend to support LRU eviction (#7535 ) * feat(loader): refactor single active backend support to LRU This changeset introduces LRU management of loaded backends. Users can set now a maximum number of models to be loaded concurrently, and, when setting LocalAI in single active backend mode we set LRU to 1 for backward compatibility. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-12 12:28:38 +01:00
Ettore Di Giacinto	089efe05fd	feat(backends): add system backend, refactor (#6059 ) - Add a system backend path - Refactor and consolidate system information in system state - Use system state in all the components to figure out the system paths to used whenever needed - Refactor BackendConfig -> ModelConfig. This was otherway misleading as now we do have a backend configuration which is not the model config. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-14 19:38:26 +02:00
Ettore Di Giacinto	2c425e9c69	feat(loader): enhance single active backend by treating as singleton (#5107 ) feat(loader): enhance single active backend by treating at singleton Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-01 20:58:11 +02:00
Dave	3cddf24747	feat: Centralized Request Processing middleware (#3847 ) * squash past, centralize request middleware PR Signed-off-by: Dave Lee <dave@gray101.com> * migrate bruno request files to examples repo Signed-off-by: Dave Lee <dave@gray101.com> * fix Signed-off-by: Dave Lee <dave@gray101.com> * Update tests/e2e-aio/e2e_test.go Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Dave Lee <dave@gray101.com> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2025-02-10 12:06:16 +01:00
Shraddha	03974a4dd4	feat: tokenization with llama.cpp (#4724 ) feat: tokenization Signed-off-by: shraddhazpy <shraddha@shraddhafive.in>	2025-02-02 17:39:43 +00:00
Ettore Di Giacinto	6daef00d30	chore(refactor): drop unnecessary code in loader (#4096 ) * chore: simplify passing options to ModelOptions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(refactor): do not expose internal backend Loader Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-11-08 21:54:25 +01:00
Ettore Di Giacinto	3acd767ac4	chore: simplify model loading (#3715 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-10-02 08:59:06 +02:00
Shraddha	5488fc3bc1	feat: tokenization endpoint (#3710 ) endpoint to access the tokenizer Signed-off-by: shraddhazpy <shraddha@shraddhafive.in> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Dave <dave@gray101.com>	2024-10-02 08:56:18 +02:00

13 Commits