LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-05 07:16:10 -04:00

Author	SHA1	Message	Date
Ettore Di Giacinto	9f41e69bc3	fix(distributed): self-heal stale 'model not loaded' routing In distributed mode the registry can list a model as loaded on a node while the worker has evicted it (autonomous LRU eviction, an out-of-band unload, etc.) yet the backend process survives. The router's cached-node check only verifies the process is alive (probeHealth), so it routes there and inference fails with "<backend>: model not loaded" — and stays broken until the controller restarts and rebuilds its registry. InFlightTrackingClient now reconciles this: when a tracked inference call returns a model-not-loaded error, it drops the stale replica row (RemoveNodeModel) so the next request reloads the model on a healthy node instead of routing back to the evicted one. The original error is returned unchanged; only the registry is corrected. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-04 23:00:50 +00:00
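Below is a minimal Go sketch of the reconciliation step described above. Several names are hypothetical illustrations: the `replicaRemover` interface, the `trackedCall` helper, the substring match on "model not loaded", and the `fakeRegistry` type. This log does not show the real `InFlightTrackingClient` and `RemoveNodeModel` signatures, and they may differ.

```go
package main

import (
	"errors"
	"strings"
)

// replicaRemover is a hypothetical stand-in for the node registry; the real
// RemoveNodeModel signature in LocalAI may differ.
type replicaRemover interface {
	RemoveNodeModel(nodeID, model string) error
}

// isModelNotLoaded reports whether a backend error looks like the
// "<backend>: model not loaded" failure described above.
func isModelNotLoaded(err error) bool {
	return err != nil && strings.Contains(err.Error(), "model not loaded")
}

// trackedCall runs an inference call routed to nodeID. If the worker has
// evicted the model, the stale replica row is dropped so the next request
// reloads the model elsewhere. The original error is returned unchanged.
func trackedCall(reg replicaRemover, nodeID, model string, call func() error) error {
	err := call()
	if isModelNotLoaded(err) {
		// Best effort: a cleanup failure must not mask the inference error.
		_ = reg.RemoveNodeModel(nodeID, model)
	}
	return err
}

// fakeRegistry records removals so the behavior can be observed.
type fakeRegistry struct{ removed []string }

func (f *fakeRegistry) RemoveNodeModel(nodeID, model string) error {
	f.removed = append(f.removed, nodeID+"/"+model)
	return nil
}

// errNotLoaded mimics the error text a backend returns after eviction.
var errNotLoaded = errors.New("llama-cpp: model not loaded")
```

Matching on the error text keeps the check independent of which backend produced it. Cleanup errors are deliberately ignored so the caller always sees the original inference error.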
Adira	ef80a0e825	fix(config): add face/speaker recognition constants and register insightface + speaker-recognition (#10110 ) FLAG_FACE_RECOGNITION and FLAG_SPEAKER_RECOGNITION already existed as ModelConfigUsecase bitmask flags, and GuessUsecases already gate-checks both backends by name, but BackendCapabilities had no entries for either, so the UI could not classify them. Also missing were the Method* constants for the five proto-defined RPCs these backends implement (FaceVerify, FaceAnalyze, VoiceVerify, VoiceEmbed, VoiceAnalyze) and the corresponding Usecase* strings and UsecaseInfoMap entries needed to wire them into the rest of the capability system. Changes: - Add MethodFaceVerify, MethodFaceAnalyze, MethodVoiceVerify, MethodVoiceEmbed, MethodVoiceAnalyze GRPCMethod constants - Add UsecaseFaceRecognition ("face_recognition") and UsecaseSpeakerRecognition ("speaker_recognition") Usecase constants - Add UsecaseInfoMap entries for both new usecases, referencing the existing FLAG_FACE_RECOGNITION and FLAG_SPEAKER_RECOGNITION flags - Register insightface: Embedding + Detect + FaceVerify + FaceAnalyze - Register speaker-recognition: VoiceVerify + VoiceEmbed + VoiceAnalyze Follows up on #10107, which left these two out because they needed new constants first. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-04 21:48:01 +02:00
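The following small Go sketch illustrates how bitmask usecase flags let a capability table classify backends. The type `usecase`, the flag names and their bit positions, the `backendUsecases` table, and `supports` are all hypothetical. They only mirror the insightface and speaker-recognition registrations listed above and are not LocalAI's actual `ModelConfigUsecase` or `BackendCapabilities` definitions.

```go
package main

// usecase is a bitmask of model capabilities; the bit positions here are
// illustrative only.
type usecase uint64

const (
	flagEmbeddings usecase = 1 << iota
	flagDetection
	flagFaceRecognition
	flagSpeakerRecognition
)

// backendUsecases is a hypothetical capability table mapping backend names
// to the usecases they serve.
var backendUsecases = map[string]usecase{
	"insightface":         flagEmbeddings | flagDetection | flagFaceRecognition,
	"speaker-recognition": flagSpeakerRecognition,
}

// supports reports whether backend advertises every bit in want. Unknown
// backends map to the zero mask and therefore support nothing.
func supports(backend string, want usecase) bool {
	return backendUsecases[backend]&want == want
}
```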
LocalAI [bot]	92726f7631	fix(distributed): stage directory-based models to remote nodes (#10175 ) Distributed file-staging treated every model path field (ModelFile, etc.) as a single regular file: it os.Open'd the path and streamed its fd as the HTTP PUT body. For directory-based models — e.g. qwen3-tts-cpp, whose weights and tokenizer ggufs live under one directory referenced by parameters.model — opening the directory succeeds but reading its fd returns EISDIR, so routing the model to a remote NATS worker failed with "read /models/<model>: is a directory". Single-file models were unaffected, so only multi-file pipelines (e.g. the realtime TTS stage) broke. stageModelFiles now detects a directory path field and stages each contained file individually (via the new stageDirectory helper), preserving structure with the existing StagingKeyMapper and rewriting the field to the remote directory (deriving ModelPath as before). countStageableFiles makes the progress total count a directory's files so the staging tracker stays accurate. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-04 18:05:38 +02:00
LocalAI [bot]	994063ba9a	feat(qwen3-tts-cpp): normalize request language for flexible matching (#10174 ) The qwen3-tts.cpp backend honored the request `language` field only via exact lowercase two-letter codes in the C++ language_to_id table, silently defaulting to English for anything else (en-US, EN, english, ...). Add normalizeLanguage() in the Go handler: lowercase + trim, strip the region/locale suffix (en-US, pt_BR, zh-Hans -> en/pt/zh), and resolve common English full names (english -> en). The canonical codes match the existing C++ table, so no C++ change is needed. Covered by a pure-Go Ginkgo spec. Also document the language field and accepted forms under the Qwen3-TTS docs. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-04 17:26:31 +02:00
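The normalization rules above can be sketched in Go as follows: lowercase and trim the input, drop any region/script suffix, then map English language names to codes. The contents of `languageNames` are illustrative only. Apart from `english -> en`, the log does not say which full names the handler accepts.

```go
package main

import "strings"

// languageNames maps English language names to two-letter codes. Only
// "english" is confirmed by the log; the other entries are illustrative.
var languageNames = map[string]string{
	"english":  "en",
	"chinese":  "zh",
	"japanese": "ja",
	"korean":   "ko",
}

// normalizeLanguage lowercases and trims the request language, strips a
// region/locale or script suffix (en-US, pt_BR, zh-Hans -> en, pt, zh), and
// resolves known full names to their codes.
func normalizeLanguage(lang string) string {
	l := strings.ToLower(strings.TrimSpace(lang))
	if i := strings.IndexAny(l, "-_"); i >= 0 {
		l = l[:i]
	}
	if code, ok := languageNames[l]; ok {
		return code
	}
	return l
}
```

Because the output is already a lowercase two-letter code, it can be looked up directly in the existing C++ `language_to_id` table.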
LocalAI [bot]	c1a55cf72d	chore: ⬆️ Update mudler/parakeet.cpp to `b11fe5bca78ad8b342dd559a43d76df3984bb447` (#10167 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 12:07:09 +02:00
LocalAI [bot]	96758841d8	chore: ⬆️ Update predict-woo/qwen3-tts.cpp to `136e5d36c17083da0321fd96512dc7b263f94a44` (#10165 ) ⬆️ Update predict-woo/qwen3-tts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 12:06:55 +02:00
LocalAI [bot]	7a59260621	chore: ⬆️ Update CrispStrobe/CrispASR to `13d54e110e1538e0f0bc3af0680b9ab246cfb48d` (#10145 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 12:06:32 +02:00
LocalAI [bot]	27e63b9a78	feat(tts): support per-request instructions and params (#10172 ) The OpenAI-compatible TTS endpoint accepts an `instructions` field, but it was silently dropped at the HTTP->gRPC boundary: neither schema.TTSRequest nor the gRPC TTSRequest proto carried it, so backends could only read such a value from static YAML options (identical for every request). This blocked per-line emotion/style and, for Qwen3-TTS VoiceDesign, limited a model config to a single designed voice. Plumb a generic per-request instruction string end to end, plus an optional backend-specific params map: - proto: add `optional string instructions` and `map<string,string> params` to TTSRequest. - schema: add Instructions (maps OpenAI `instructions`) and Params (LocalAI extension) to schema.TTSRequest. - core: thread both through ModelTTS/ModelTTSStream via a newTTSRequest helper that attaches instructions only when non-empty (so backends can fall back to YAML when unset); forward them from the /v1/audio/speech handler. - qwen-tts: prefer the per-request instruction over the YAML `instruct` option (used by both mode detection and generation) and merge per-request params. - chatterbox: merge per-request params (coerced to float/int/bool) over YAML options into generate() kwargs. Fully backward compatible: empty instructions fall back to the YAML option and backends that don't support style/voice instructions ignore the field. Closes #10164 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-04 11:45:02 +02:00
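The following Go sketch shows the "attach only when non-empty" rule and the request-over-YAML precedence. `ttsRequest`, `effectiveInstruct`, and `mergeParams` are hypothetical. This `newTTSRequest` has an assumed signature and only stands in for the helper the log names. The qwen-tts and chatterbox backends are Python, so their precedence logic appears here in Go purely for illustration. The sketch relies on one fact: protoc-gen-go generates a proto3 `optional string` field as a `*string`, which is what lets "unset" be told apart from "empty". Chatterbox's type coercion of params is omitted.

```go
package main

// ttsRequest is a hypothetical stand-in for the generated gRPC TTSRequest.
type ttsRequest struct {
	Text         string
	Instructions *string // optional: nil means "not sent"
	Params       map[string]string
}

// newTTSRequest attaches instructions only when non-empty, so backends can
// fall back to their static YAML option when the client sent none.
func newTTSRequest(text, instructions string, params map[string]string) *ttsRequest {
	req := &ttsRequest{Text: text, Params: params}
	if instructions != "" {
		req.Instructions = &instructions
	}
	return req
}

// effectiveInstruct applies the backend-side precedence: the per-request
// instruction first, then the YAML `instruct` option.
func effectiveInstruct(req *ttsRequest, yamlInstruct string) string {
	if req.Instructions != nil {
		return *req.Instructions
	}
	return yamlInstruct
}

// mergeParams overlays per-request params on YAML options; the request wins.
func mergeParams(yamlOpts, reqParams map[string]string) map[string]string {
	out := make(map[string]string, len(yamlOpts)+len(reqParams))
	for k, v := range yamlOpts {
		out[k] = v
	}
	for k, v := range reqParams {
		out[k] = v
	}
	return out
}
```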
LocalAI [bot]	55c0911c23	chore: ⬆️ Update leejet/stable-diffusion.cpp to `1f9ee88e09c258053fa59d5e05e23dfb10fa0b13` (#10166 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 09:34:34 +02:00
LocalAI [bot]	f6cb6ab6d9	chore: ⬆️ Update ggml-org/llama.cpp to `94a220cd6745e6e3f8de62870b66fd5b9bc92700` (#10168 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 09:34:13 +02:00
LocalAI [bot]	9f11b09c6a	chore(model-gallery): ⬆️ update checksum (#10169 ) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 00:32:15 +02:00
LocalAI [bot]	a5c4f822f0	chore: ⬆️ Update antirez/ds4 to `477c0e82e2699b35a65fd0a1ed6fe66b41087dfe` (#10142 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 19:45:23 +02:00
LocalAI [bot]	fb36c262fe	chore(model gallery): 🤖 add 1 new models via gallery agent (#10163 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 19:44:51 +02:00
LocalAI [bot]	0e4e8980e6	chore: ⬆️ Update ggml-org/llama.cpp to `5c394fdc8b564eff6faacc50a139529d875f0e36` (#10143 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 19:44:21 +02:00
Richard Palethorpe	3a932a9803	feat(distributed): Add NATS JWT authentication and TLS/mTLS options (#10159 ) * feat(distributed): NATS JWT auth, TLS/mTLS options, and e2e coverage Mint per-node NATS user JWTs at registration when LOCALAI_NATS_ACCOUNT_SEED is set, and connect workers with scoped credentials from the register response. Add optional LOCALAI_NATS_TLS_CA/CERT/KEY for private CA and mTLS alongside tls:// URLs, plus test-e2e-distributed and NatsJWT container e2e specs. Document JWT setup (nats-auth-setup.sh) and TLS env vars in distributed-mode. Assisted-by: Grok:grok grok-build Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(distributed): correct NATS JWT scoping and harden client auth The JWT-auth path added in 46467cc7 had several gaps that fail silently under LOCALAI_NATS_REQUIRE_AUTH: - Agent-worker minted JWTs did not allow the subjects the agent worker actually subscribes to (jobs.mcp-ci.new and nodes.<id>.backend.stop), so MCP-CI jobs and backend-stop session cleanup were silently dropped. Scope the agent permission set to those subjects. - NATS subscription permission violations were swallowed (Subscribe returned a live-but-dead subscription). Confirm subscriptions with a server round-trip so a denial surfaces synchronously, and log async permission errors. - The backend worker connected anonymously when given a JWT without its paired seed; reject the unpaired credential instead. - The documented service-user permissions in nats-auth-setup.sh omitted prefixcache.>, which the frontend publishes and subscribes; add it. Also: add a credential-provider hook to the messaging client (consumed by the follow-up credential-lifecycle change), drop the always-nil error from NatsMessagingOptions, run go mod tidy (jwt/v2 and nkeys are now direct), and gofmt the feature's files. 
Tests: an agent-JWT e2e spec that connects to the enforcing NATS server and exercises every subscription the agent worker makes, plus permission allow-list coverage unit tests. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(distributed): acquire and auto-refresh worker NATS credentials Workers fetched NATS credentials once at startup, which broke two cases under JWT auth: a worker that registered while still pending admin approval never received a minted JWT (it connected unauthenticated and gave up), and a long-running worker's 24h JWT expired with no way to renew it. Introduce workerregistry.NATSCredentialManager, built on idempotent re-registration (the frontend preserves the node row and mints a fresh JWT each call): - Acquire re-registers through admin approval until the node is approved and credentials are minted (or returns the first success when auth is not required, preserving anonymous-NATS behavior). - RefreshLoop re-registers before the JWT expires (~75% of its lifetime), updating the credentials served to the connection. - Both are bounded (default 100 attempts / consecutive failures) and return an error on exhaustion, so an unapprovable or unrenewable worker exits non-zero and surfaces the problem instead of hanging or drifting toward an expired credential. The messaging client gains WithUserJWTProvider, fetching credentials on each (re)connect so the connection transparently adopts a refreshed JWT when the server expires the old one. RegisterFull exposes the approval status and full response; Register delegates to it. Both the backend worker and the agent worker are wired to this: explicit env credentials are used as-is, minted credentials are acquired-with-wait and refreshed, and a permanent refresh failure shuts the worker down so it restarts and re-acquires. 
Tests cover Acquire (wait-through-pending, bounded give-up, context cancel), RefreshLoop (refresh-before-expiry, bounded failure, no-expiry exit) and jwtExpiry decoding. Docs updated in distributed-mode.md. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-03 19:43:56 +02:00
LocalAI [bot]	9d10418593	fix(parakeet-cpp): convert audio before the non-batched transcribe path (#10161 ) The direct (non-batched) transcription path handed the original upload path straight to the C library via parakeet_capi_transcribe_path_json. That loader only understands 16 kHz mono WAV/PCM, so any other format (MP3, etc.) failed with "parakeet: failed to load audio: <file>". Only the batched path converted the input (via decodeWavMono16k -> utils.AudioToWav). Every other audio backend (whisper, crispasr) converts unconditionally with utils.AudioToWav before handing the file to its engine; the parakeet-cpp fallback was the lone exception. Extract a convertToWavMono16k helper (reused by decodeWavMono16k) that produces a 16 kHz mono WAV in a temp dir, and run the non-batched path through it before calling the C loader. WAV inputs already in the target format are passed through without ffmpeg. Add specs covering the helper (decodable copy + cleanup, and an error on a missing input) that need neither the model, the C library, nor ffmpeg. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-03 15:06:57 +02:00
dependabot[bot]	5470051d4d	chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/transformers (#10158 ) chore(deps): bump grpcio in /backend/python/transformers Bumps [grpcio](https://github.com/grpc/grpc) from 1.80.0 to 1.81.0. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.80.0...v1.81.0) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.81.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 10:38:43 +02:00
LocalAI [bot]	68c5eeebc3	chore: ⬆️ Update ggml-org/whisper.cpp to `610e664ba7cfe3af46125ed1b5a1184fccb51bcd` (#10140 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 10:38:28 +02:00
dependabot[bot]	1531fabe23	chore(deps): bump securego/gosec from 2.22.9 to 2.27.1 (#10147 ) Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.9 to 2.27.1. - [Release notes](https://github.com/securego/gosec/releases) - [Commits](https://github.com/securego/gosec/compare/v2.22.9...v2.27.1) --- updated-dependencies: - dependency-name: securego/gosec dependency-version: 2.27.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 10:38:07 +02:00
LocalAI [bot]	b7673d5b76	chore: ⬆️ Update leejet/stable-diffusion.cpp to `2d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5` (#10144 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 10:37:51 +02:00
dependabot[bot]	b64bdaf406	chore(deps): bump github.com/google/go-containerregistry from 0.21.5 to 0.21.6 (#10149 ) chore(deps): bump github.com/google/go-containerregistry Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.21.5 to 0.21.6. - [Release notes](https://github.com/google/go-containerregistry/releases) - [Commits](https://github.com/google/go-containerregistry/compare/v0.21.5...v0.21.6) --- updated-dependencies: - dependency-name: github.com/google/go-containerregistry dependency-version: 0.21.6 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 10:37:33 +02:00
dependabot[bot]	eebf08ff1d	chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/vllm (#10157 ) Bumps [grpcio](https://github.com/grpc/grpc) from 1.80.0 to 1.81.0. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.80.0...v1.81.0) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.81.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 10:37:16 +02:00
dependabot[bot]	42e51894c3	chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.65.0 to 0.66.0 (#10151 ) chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.65.0 to 0.66.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/exporters/prometheus/v0.65.0...metric/x/v0.66.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/exporters/prometheus dependency-version: 0.66.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 09:14:42 +02:00
LocalAI [bot]	d9ae6481fb	chore: ⬆️ Update mudler/parakeet.cpp to `9edf17c3ada66e0f881dcff155492867db7ac4cf` (#10141 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 08:49:47 +02:00
dependabot[bot]	f1c495a748	chore(deps): bump github.com/mudler/edgevpn from 0.32.2 to 0.34.0 (#10153 ) Bumps [github.com/mudler/edgevpn](https://github.com/mudler/edgevpn) from 0.32.2 to 0.34.0. - [Release notes](https://github.com/mudler/edgevpn/releases) - [Commits](https://github.com/mudler/edgevpn/compare/v0.32.2...v0.34.0) --- updated-dependencies: - dependency-name: github.com/mudler/edgevpn dependency-version: 0.34.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 08:34:16 +02:00
LocalAI [bot]	415b561947	docs: fix distributed-mode diagram (workers use NATS, not PostgreSQL) (#10138 ) docs: fix distributed-mode diagram - workers coordinate via NATS, not PostgreSQL The architecture diagram drew the worker-bound arrows from the PostgreSQL area of the control plane, implying workers connect to PostgreSQL. They do not: PostgreSQL is the frontends' shared state, while workers coordinate over NATS (backend.install events) and receive LoadModel over gRPC from a frontend. Re-route the worker arrows to originate from the NATS chip. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-02 22:05:33 +02:00
Ettore Di Giacinto	e6a0d4c375	Remove diagram from distributed mode documentation Removed ASCII diagram of distributed mode architecture from the documentation. Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-06-02 18:48:12 +02:00
LocalAI [bot]	7e59a5c7c5	docs: architecture & feature diagrams (blueprint style) (#10137 ) * docs: add 'how LocalAI works' architecture diagram Add a blueprint-style architecture diagram: clients -> small core (API, router, WebUI, agents) -> gRPC -> backend processes pulled on demand as OCI images. Place it on the overview page and replace the stale external architecture image on the reference page. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: add blueprint diagrams across feature, distributed & getting-started docs Add 24 architecture/flow/comparison diagrams (PNG + HTML source) under docs/static/images/diagrams/, wired into their docs pages, from an impact-vs-effort audit of the docs. Broaden the API surface on the overview architecture diagram (OpenAI, Anthropic, ElevenLabs, Ollama, and LocalAI's own API) and move the gRPC boundary label clear of the arrows. Pages: distributed mode (architecture, scheduling, ds4 layer-split), distributed inferencing, MLX, realtime, quantization, MCP, agents, mitm & cloud proxy, middleware, reverse-proxy TLS, VRAM, voice & face recognition, reranker, function calling, fine-tuning (recipe + jobs), diarization, audio transform, quickstart, model resolution. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: add composable-core diagram to README hero Commit the composable-core card (small core + on-demand backend tiles) alongside the other diagrams and reference it from the README hero via a repo-relative path, so it renders on GitHub. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: fix composable-core connectors/badge and federated-vs-worker layout - composable-core: thicken the plug-in connectors so they read clearly, and widen the SEPARATE IMAGE badge so its text no longer overflows the box. 
- federated-vs-worker: shorten the WHOLE/SPLIT REQUEST pills to fit, and replace the tangled node-to-node activation arrows with a clean fan-out (request split across all sharded nodes), mirroring the federated panel. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-02 18:43:22 +02:00
LocalAI [bot]	aea954a482	docs: position LocalAI as a composable engine, not a bundle (#10136 ) Reframe the README hero and docs (homepage, overview, FAQ) around the composable architecture: a small core, with backends built as dedicated gRPC services around best-in-class engines, shipped as separate OCI images and pulled on demand. Lead from strength: drop the "36+ backends" kitchen-sink framing and the "All-in-One Complete AI Stack" / "single binary that gives you everything" lines that read as a monolith. - README: small-core differentiator; composable + open/extensible bullets - _index.md: composable tagline; install only what you use - overview.md: core vs on-demand backends; gRPC/OCI mechanics as benefits; bring-your-own model and backend - faq.md: "Do I need to install all the backends?" and "Can I bring my own model or backend?" Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-02 17:34:43 +02:00
Ettore Di Giacinto	595e448714	docs(llama.cpp): note tensor split now works with quantized KV cache (#10135 ) The split_mode: tensor description claimed tensor parallelism requires KV-cache quantization to be disabled. ggml-org/llama.cpp#23792 lifts that restriction by extending the meta backend to preserve shape information through KV-cache flatten/reshape, so cache_type_k/cache_type_v quantization can be combined with -sm tensor on builds that include it. Documentation only: no backend code, grpc-server.cpp comment, or llama.cpp pin changes. Assisted-by: Claude Code:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-02 15:52:23 +02:00
LocalAI [bot]	860f9d63ad	feat(parakeet-cpp): dynamic batching for concurrent transcription requests (#10112 ) * feat(parakeet-cpp): dynamic-batching scheduler (queue + dispatcher) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): dynamic batching for AudioTranscription via batched JSON C-API Drop SingleThread; route unary transcription through the in-process batcher which coalesces concurrent requests into one batched engine call. Streaming stays mutually exclusive via engineMu. Adds batch_max_size / batch_max_wait_ms options (size=1 disables; recommended on CPU). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): tear down dispatcher in Free; log batch config; preallocate; clarify stream lock Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): Ginkgo batcher tests; optional batch C-API binding with per-request fallback The batched JSON C-API symbol exists only in newer libparakeet.so (ABI >= 2); probe it with Dlsym and register optionally so the backend still loads against an older library, falling back to per-request transcription. Rewrites the batcher unit tests as Ginkgo/Gomega specs (forbidigo bans t.Fatal in tests). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): debug-log coalesced batch size in runBatch Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): default batch_max_size to 1 (batching opt-in) Dynamic batching now defaults off (batch_max_size:1, one request at a time). Raise batch_max_size to opt in: it is a large throughput win on GPU under concurrent load, but on CPU and low-concurrency setups it only adds latency, so off is the safer default. 
The startup log now states whether batching is on or off, and the audio-to-text docs are updated to match. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(parakeet-cpp): bump parakeet.cpp to 8a7c482 (batched decode + B=1 fast-path) parakeet.cpp PR #1 merged the batched encoder/decode and the B=1 encoder fast-path to master. Point PARAKEET_VERSION at that commit so the backend builds the batched C-API (parakeet_capi_transcribe_pcm_batch_json) that the dynamic batcher calls; the prior pin (30a3075) predated it, so only the per-request fallback path was exercised. Verified the shared lib builds with the backend's CMake flags and exports the batch symbol. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-02 14:49:02 +02:00
LocalAI [bot]	a5a0b3dc4e	chore: ⬆️ Update CrispStrobe/CrispASR to `05e60432bcb5bc2113f8c395a41e86497c11504a` (#10115 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 14:48:47 +02:00
番茄摔成番茄酱	94eca04c60	fix(nemo): pin texterrors to 1.1.6 for GLIBCXX compatibility (#10134 ) Pin texterrors==1.1.6 before nemo_toolkit[asr] in requirements-cublas13.txt. The texterrors package (a NeMo transitive dependency) contains a compiled C++ extension (texterrors_align.so) that may be built from source during OCI image creation. When built on systems with GCC 14+ (e.g. Ubuntu 24.04), the resulting binary requires GLIBCXX_3.4.32, which is not available in the default LocalAI container (Ubuntu 22.04, GLIBCXX up to 3.4.30). Pinning to 1.1.6 (the latest release) ensures: - Reproducible builds across environments - pip resolves the pre-built manylinux2014 wheel (needs only GLIBCXX_3.4.11) instead of potentially building from source with a newer toolchain Fixes #10056 Signed-off-by: 番茄摔成番茄酱 <fqscfqj@outlook.com>	2026-06-02 14:48:27 +02:00
LocalAI [bot]	35bd485d6a	chore: ⬆️ Update ggml-org/llama.cpp to `5dcb71166686799f0d873eab7386234302d05ecf` (#10128 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 09:06:35 +02:00
LocalAI [bot]	1fe96f8d9a	chore: ⬆️ Update mudler/parakeet.cpp to `8a7c48209d7882a7ce79a6b306270e4703194543` (#10129 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 09:06:19 +02:00
LocalAI [bot]	c508e9d7c6	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7948df8ac1070f5f6881b8d34675821893eb97d6` (#10127 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 09:06:03 +02:00
LocalAI [bot]	55e754fd05	chore: ⬆️ Update ggml-org/whisper.cpp to `23ee03506a91ac3d3f0071b40e66a430eebdfa1d` (#10130 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 01:43:03 +02:00
LocalAI [bot]	a17753f7d1	chore(model-gallery): ⬆️ update checksum (#10131 ) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 23:39:47 +02:00
Zhao73	c61838dba6	docs: fix documentation typos (#10125 ) Correct clear spelling mistakes in documentation without changing behavior. Confidence: high Scope-risk: narrow Tested: git diff --check; uvx codespell on changed files Not-tested: Full docs build not run; text-only changes Assisted-by: Codex:gpt-5 codespell	2026-06-01 14:31:08 +02:00
LocalAI [bot]	7013e13f05	chore: ⬆️ Update ggml-org/llama.cpp to `399739d5c5978351f39e3454bfbfbab4f369088f` (#10119 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 14:24:51 +02:00
Richard Palethorpe	5a0013defe	test(react-ui): add page render-smoke specs, reset the coverage gate (#10122 ) The UI coverage gate was tightened to 0.1pp against a fast-local measurement (39.86% baseline); CI's slower runners measure ~0.9pp lower, so tests-ui-e2e failed there. UI e2e coverage is diffusely non-deterministic and tracks machine speed — a 0.1pp band can't hold across environments. Rather than loosen the gate, raise the floor under it: a render-smoke spec mounts each lazy page (navigate + assert the header renders), covering a dozen previously-untested pages and lifting coverage from ~39% to ~42.7% locally. Restore the tolerance to 0.8pp and set the baseline conservatively (40.0), below the slow-CI floor, so the ratchet holds without flapping. Document the coverage policy — install the git hooks and don't bypass them (no --no-verify, no hand-lowering the baseline or widening the tolerance); raise coverage by adding tests instead; set the UI baseline below the slow-CI floor — in AGENTS.md, CONTRIBUTING.md and .agents/building-and-testing.md. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-01 14:24:36 +02:00
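Below is a sketch of the gate arithmetic. It assumes the check fails when measured coverage falls more than the tolerance below the baseline; the log does not spell out the exact comparison. With the new settings, the floor is 40.0 - 0.8 = 39.2%. The old 0.1pp band over a 39.86% baseline set a floor of 39.76%, and a CI run about 0.9pp below the local measurement would fall under it. `checkCoverage` is a hypothetical name.

```go
package main

import "fmt"

// checkCoverage fails when measured coverage is more than tolerance
// percentage points below baseline. All values are percentages.
func checkCoverage(measured, baseline, tolerance float64) error {
	if floor := baseline - tolerance; measured < floor {
		return fmt.Errorf("coverage %.2f%% is below floor %.2f%% (baseline %.2f%% - %.2fpp)",
			measured, floor, baseline, tolerance)
	}
	return nil
}
```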
LocalAI [bot]	c01ed631d6	refactor(routing): extract replica picker into pkg/clusterrouting (#10123 ) Move ReplicaCandidate and PickBestReplica out of core/services/nodes (which depends on gorm) into a new dependency-light leaf package pkg/clusterrouting, so the p2p federation server can later share the same replica-selection policy without pulling in a database driver. core/services/nodes keeps a type alias and a thin delegator, so every existing reference (the LoadedReplicaStats interface method, the ReplicaCandidate row conversion in registry.go, and the SQL policy-mirror test) compiles and behaves unchanged. This is a pure, behavior-preserving refactor: the full nodes suite, including the policy-mirror spec that pins the SQL ORDER BY to PickBestReplica, stays green. Assisted-by: Claude Code:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-01 09:38:55 +02:00
LocalAI [bot]	d47464cb06	docs: ⬆️ update docs version mudler/LocalAI (#10114 ) ⬆️ Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 08:16:29 +02:00
LocalAI [bot]	63f176346e	chore: ⬆️ Update leejet/stable-diffusion.cpp to `be65ac7511b30379b003626c15224798929e33d4` (#10118 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 00:43:50 +02:00
LocalAI [bot]	af94d08729	chore: ⬆️ Update ggml-org/whisper.cpp to `fe69461618ffc50ba8afa65c25cc6c6e34d4537f` (#10117 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 00:43:34 +02:00
LocalAI [bot]	6795d38f50	chore: ⬆️ Update mudler/parakeet.cpp to `cb45f68068081af01e7092e91b038ee353eb56be` (#10116 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-31 23:57:15 +02:00
Richard Palethorpe	718223f33b	feat(localvqe/audio): v1.3 release and add spectrograms to audio transform UI (#10113 ) * chore(localvqe): update backend to v1.3, add v1.2/v1.3 gallery models Bump the LocalVQE backend pin 72bfb4c6 -> b0f0378a, which adds the v1.2 (1.3 M) and v1.3 (4.8 M) GGUF SHA-256s to the upstream released-models allowlist (and the arch_version=3 loader) so both load without LOCALVQE_ALLOW_UNHASHED. Add gallery entries for localvqe-v1.2-1.3m and localvqe-v1.3-4.8m (SHA-256 verified against the downloaded weights) and update the audio-transform docs to make v1.3 the current default while noting the compact v1.1/v1.2 alternatives. Assisted-by: Claude:claude-opus-4-8 Claude-Code Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(flake): add ffmpeg-headless to the dev shell pkg/utils/ffmpeg_test.go shells out to the `ffmpeg` CLI, and the pre-commit gate runs those tests via `make test-coverage`. Without ffmpeg in the dev shell the gate fails with "executable file not found in $PATH". The headless build provides the CLI without GUI/X deps. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(localvqe): parse WAV by walking RIFF sub-chunks Walk the RIFF chunk list instead of assuming the canonical 44-byte header layout. Real inputs (browser-recorded clips, ffmpeg output with an 18/40-byte extensible `fmt ` chunk or trailing LIST/INFO metadata) would otherwise splice header/metadata bytes into the PCM stream as an audible impulse. Honour the `data` chunk size and validate that both `fmt ` and `data` chunks are present. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(security-headers): allow blob: in connect-src for waveform fetch The waveform renderer XHRs/fetches a freshly-created blob: object URL (e.g. an uploaded or enhanced clip before it has a server URL). 
XHR/fetch of blob: is governed by connect-src, not media-src, so it was blocked by the CSP. Add blob: to connect-src. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(react-ui): add input/output spectrogram view to AudioTransform The transform page only showed time-domain amplitude waveforms, so you could see how loud a clip was but not which frequencies the model touched. Add a time x frequency spectrogram heatmap and render the input and output spectrums side by side, so it's visible which bands the enhancement attenuates (bright input bands that go dark in the output). Computed client-side via a Hann-windowed STFT over both clips (a small dependency-free radix-2 FFT), defaulting to the LocalVQE 512/256 frame geometry. This shows the net input->output spectral change; the model's internal gain mask is not exposed by the backend. - src/utils/fft.js radix-2 FFT - src/hooks/useSpectrogram.js decode + STFT -> normalised dB magnitude grid - src/components/audio/Spectrogram.jsx canvas heatmap (magma colormap) - AudioTransform.jsx dual-spectrogram panel + CSS - e2e spec + UI coverage baseline bump (38.29 -> 39.0; measured ~39.4-40.2) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(react-ui): make UI coverage deterministic, tighten the gate UI e2e line coverage swung ~1pp run-to-run (39.1% <-> 40.2%), which forced a loose 0.8pp tolerance on the monotonic gate — a band wide enough to let a real ~300-line regression through silently. The swing was a bug, not inherent jitter: the 'Create Agent navigates' spec ended on the URL assertion, so AgentCreate.jsx's ~400 lines were collected only when its render happened to beat the coverage teardown. Wait for the page to actually render (assert its heading) so those lines are covered every run. 
With the race gone, repeated runs land within ~0.013pp of each other, so: - tighten UI_COVERAGE_TOLERANCE 0.8 -> 0.1 (noise floor, not a drift band) - set the baseline to the real, reliably-achieved value (39.0 -> 39.86) Localised by running the V8-coverage suite repeatedly and diffing per-file line coverage; AgentCreate.jsx was the sole ~1pp flipper. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-31 23:56:46 +02:00
LocalAI [bot]	39e050d9e2	fix(parakeet-cpp): cublas/hipblas/vulkan builds were silently CPU-only (#10120 ) fix(parakeet-cpp): forward PARAKEET_GGML_* so cublas/hipblas/vulkan builds aren't silently CPU-only parakeet.cpp gates its GGML backends behind PARAKEET_GGML_CUDA/HIP/VULKAN and does set(GGML_CUDA ${PARAKEET_GGML_CUDA} CACHE BOOL "" FORCE), which overwrites a bare -DGGML_CUDA=ON back to OFF. So the backend's BUILD_TYPE=cublas (and hipblas, vulkan) produced a CPU-only libparakeet.so. Forward the PARAKEET_GGML_* options instead. Verified on a GB10 (CUDA 13): the lib now links libcudart/libcublas and registers the CUDA backend, vs a CPU-only lib before. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 23:56:07 +02:00
LocalAI [bot]	c222161291	feat(distributed): resumable file uploads via HTTP Content-Range (#10109 ) Large model GGUFs (multi-GB) transferred between master and worker over flaky / bandwidth-throttled paths (e.g. libp2p relays with byte caps) used to restart from byte 0 on every transport error. This change adds standard HTTP Range/resume semantics to the worker's PUT /v1/files/<key> endpoint and teaches the master-side HTTPFileStager to consult the worker for the last accepted offset and resume from there. Server side (file_transfer_server.go): - PUT now honors Content-Range: bytes <start>-<end>/<total>. The handler validates that <start> matches the current on-disk size; mismatches return 416 with the actual size in X-File-Size. - Mid-upload chunks return 308 Permanent Redirect ("Resume Incomplete") with the new size, so the client can keep going. - An optional X-Content-SHA256 request header binds an upload to a target hash; cross-attempt drift returns 409. On the final chunk the server re-computes SHA-256 and returns 400 if it doesn't match. - HEAD now advertises Accept-Ranges: bytes and Content-Length, and exposes X-Target-SHA256 for in-progress files (so clients can resume only when the partial bytes belong to the file they want to upload). - Legacy PUTs with no Content-Range keep the original truncate-create semantics — zero behavior change on the happy path. Client side (file_stager_http.go): - Pre-PUT HEAD probe reads X-File-Size + X-Target-SHA256 to determine the resume offset. - doUpload seeks to that offset and sends Content-Range + X-Content-SHA256. - Retry loop switches from fixed 3 attempts / 5s-10s-20s backoff to an outer time budget with exponential backoff (1s -> 30s cap), so a 5GB upload over a flaky link can outlast many short disconnects. - 308 and 416 responses are treated as transient: the next iteration re-HEADs to learn the correct offset. Tests: - Two-chunk Content-Range round-trip produces the correct file + sidecar. 
- 416 on a Content-Range/file-size mismatch. - 409 on X-Content-SHA256 drift between chunks. - 400 on final-hash mismatch. - HEAD on a partial upload exposes X-Target-SHA256 (not a misleading hash-of-partial-bytes via X-Content-SHA256). - Pre-existing finished file with a different hash is transparently overwritten when a new PUT starts at byte 0. - End-to-end resume: EnsureRemote against a worker that already holds a partial file transfers only the remainder. - Mid-stream connection drop on attempt #1 is recovered by attempt #2 resuming from the partial offset. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 11:02:20 +00:00
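The server-side resume rule from the Content-Range commit reduces to two steps. First, parse `bytes <start>-<end>/<total>`. Second, compare `<start>` with the bytes already on disk. Below is a minimal Go sketch, not the code in file_transfer_server.go. `parseContentRange` and `chunkStatus` are hypothetical names. The sketch assumes 200 OK for a completed upload, because the commit does not say which success code the handler returns. The SHA-256 checks (409/400) and the X-File-Size response header are left out.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// contentRange is a parsed "bytes <start>-<end>/<total>" header value.
type contentRange struct{ Start, End, Total int64 }

// parseContentRange accepts only the complete-length byte form used by the upload.
func parseContentRange(v string) (contentRange, error) {
	var cr contentRange
	rest, ok := strings.CutPrefix(v, "bytes ")
	if !ok {
		return cr, fmt.Errorf("content-range %q: unsupported unit", v)
	}
	span, totalStr, ok := strings.Cut(rest, "/")
	if !ok {
		return cr, fmt.Errorf("content-range %q: missing total", v)
	}
	startStr, endStr, ok := strings.Cut(span, "-")
	if !ok {
		return cr, fmt.Errorf("content-range %q: missing range", v)
	}
	var err error
	if cr.Start, err = strconv.ParseInt(startStr, 10, 64); err != nil {
		return cr, err
	}
	if cr.End, err = strconv.ParseInt(endStr, 10, 64); err != nil {
		return cr, err
	}
	if cr.Total, err = strconv.ParseInt(totalStr, 10, 64); err != nil {
		return cr, err
	}
	if cr.Start < 0 || cr.End < cr.Start || cr.End >= cr.Total {
		return cr, fmt.Errorf("content-range %q: inconsistent bounds", v)
	}
	return cr, nil
}

// chunkStatus picks the response for one chunk given the current on-disk size.
func chunkStatus(cr contentRange, onDisk int64) int {
	if cr.Start != onDisk {
		// 416: the client must re-HEAD to learn the real offset.
		return http.StatusRequestedRangeNotSatisfiable
	}
	if cr.End+1 < cr.Total {
		// 308 "Resume Incomplete": more chunks are expected.
		return http.StatusPermanentRedirect
	}
	// Final chunk (assumed 200 OK here); the real handler also re-verifies SHA-256.
	return http.StatusOK
}
```

The client treats both 416 and 308 as transient. Its next iteration re-HEADs the file, so a stale offset costs one round trip instead of a restart from byte 0.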
LocalAI [bot]	aa80d4681b	chore: ⬆️ Update ggml-org/llama.cpp to `d6588daa800058dfa54f1d7ea695b1a810c8ae18` (#10093 ) * ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(llama-cpp): skip begin-of-stream null partial in PredictStream Upstream llama.cpp (ggml-org/llama.cpp#23884), pulled in by this bump, now emits an initial "begin" partial whose to_json() returns null. It exists only to signal the HTTP layer to flush 200 status headers before any token is produced. gRPC has no such concept, and PredictStream had no guard: the null result was fed straight into build_reply_from_json, which threw an uncaught exception. That surfaced as a generic "Unexpected error in RPC handling" and the task was cancelled the instant it launched, breaking the PredictStream e2e spec. Skip null results in both the first-result handling and the streaming loop, mirroring upstream's own `if (first_result_json == nullptr)` guard. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 10:26:03 +00:00


6581 Commits