LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-01 11:56:57 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	d641ded194	chore: ⬆️ Update ggml-org/whisper.cpp to `0874de3e8e8e48361dba85c7fe6d176f008bf158` (#10621 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:43:40 +02:00
LocalAI [bot]	40445fff05	chore: ⬆️ Update leejet/stable-diffusion.cpp to `484baa41e5e006c52dcd4addc38c830b9489745f` (#10619 ) * ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(stablediffusion-ggml): adapt to new generate_image() out-param signature leejet/stable-diffusion.cpp@484baa4 changed generate_image() from returning sd_image_t* to returning bool with images_out/num_images_out out-parameters (same pattern already used by generate_video()). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-01 08:32:57 +02:00
LocalAI [bot]	b1af37257d	chore: ⬆️ Update CrispStrobe/CrispASR to `3b93758f9725d400eca82976f895e4cec3f31260` (#10597 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-30 09:17:11 +02:00
LocalAI [bot]	686ce10b54	chore: ⬆️ Update leejet/stable-diffusion.cpp to `3b6c9ca97cfcda8e68e719e6670d06379fcbe943` (#10594 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-30 09:16:21 +02:00
Richard Palethorpe	5d0c43ec6e	feat(realtime): Semantic VAD EOU token (#10444 ) * feat(realtime): EOU-driven semantic_vad turn detection Add a `semantic_vad` turn-detection mode to the realtime API that feeds the transcription model live and decides "the user finished speaking" from the `<EOU>` end-of-utterance token rather than from silence alone. When EOU fires the turn commits immediately (~0.3s); otherwise it falls back to an eagerness-scaled silence threshold (low/med/high = 8/4/2s). Plumbing, bottom to top: - proto: `AudioTranscriptionLive` bidirectional RPC (config-first oneof, mono float PCM @16k, ready-ack / Unimplemented degrade signal) plus `TranscriptResult.eou` for the unary retranscribe gate. - pkg/grpc: client/server/base/embed scaffolding for the bidi stream, modeled on AudioTransformStream; release stream conns on terminal Recv. - parakeet-cpp: live transcription RPC with per-C-call engine locking (one live stream per turn, finalize+free at commit); bump parakeet.cpp to ABI v5 — incremental StreamingMel (no more quadratic per-feed mel recompute that delayed EOU on long turns) and the <EOU>/<EOB> split; strip the literal <EOU>/<EOB> from offline text and set Eou. - core/backend: LiveTranscriptionSession wrapper + pipeline `turn_detection:` config block (type/eagerness/retranscribe). - realtime: semantic_vad integration — live input captions streamed as transcription deltas while the user speaks, EOU-immediate commit with eagerness fallback, optional retranscribe gate (batch re-decode must also end in <EOU> to confirm), clause synthesis off the LLM token callback, and per-turn live-transcription / model_load telemetry. - UI: show the realtime pipeline components as a vertical list. Docs and tests included; opt-in via the pipeline YAML or per-session `session.update`. Non-streaming STT backends degrade to silence-only. Assisted-by: Claude Code:claude-opus-4-8 [Read] [Edit] [Write] [Bash] Assisted-by: Claude Code:claude-fable-5 [Read] [Edit] [Bash] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(realtime): explicit formally-verified state machines + parakeet streaming driver The realtime API had several implicit state machines whose state was inferred from scattered booleans, channels, and five separate mutexes, leaving illegal/inconsistent states reachable. Make them explicit and keep the implementation in step with a formal design; rework the parakeet streaming backend along the same lines. Realtime state machines (M1-M5). Each is a sealed sum-type State/Event/Effect with a total, pure Next(state,event)->(state,[]effect) behind a single-writer Coordinator: M1 conncoord connection lifecycle: VAD toggle + once-only teardown (replaces vadServerStarted + a `done` channel closed from two sites). M2 turncoord turn detection: collapses speechStarted and the live-stream "turn open" flag into one state, so discardTurn can no longer desync them and suppress the next onset. M3 respcoord response coordination: serializes the dual-writer start/cancel so at most one response is live; one response.done per response.create. M4 compactcoord conversation compaction: single-flight (replaces the `compacting atomic.Bool` CAS). M5 ttscoord TTS pipeline: open->closing->closed, idempotent wait(), rejects enqueue-after-close (was a silent drop). The Coordinator/Sink/Next plumbing — only the sealed types and Next differed per machine — is extracted once into core/http/endpoints/openai/coordinator as a generic Coordinator[S,E,F]; each machine keeps its public API via type aliases, so no sink, call-site, or test moved. Hierarchy. session_lifecycle.fizz models M1 as the parent region with its children (M2/M3/M4) as one statechart and asserts ChildrenDieWithParent (conn torn => all children terminal, none start after teardown). respcoord and compactcoord gain an absorbing Terminated state + Shutdown event; conncoord's teardown drives the children terminal. This closes a compaction teardown gap: a fire-and-forget compaction could outlive a torn session — compactionSink now takes a session-scoped cancellable context + WaitGroup and joins the in-flight summarize+evict on shutdown. Formal verification. formal-verification/ holds one authoritative FizzBee spec per machine plus the composition spec, each with an always-assertion and a documented one-line edit that makes the checker fail (verified non-vacuous). scripts/realtime-conformance.sh is fail-closed: all Go conformance suites under -race AND a model-check of every .fizz spec; a missing FizzBee is a hard error (only the loud REALTIME_CONFORMANCE_SKIP_FIZZBEE=1 bypasses it, never in CI). FizzBee is pinned by sha256 and installed via scripts/install-fizzbee.sh into .tools/ (gitignored). Wired as make test-realtime-conformance, a CI workflow, and a pre-commit path filter. Go conformance tests are Ginkgo/Gomega (per the repo's forbidigo lint): transition tables + fixed-seed property walks + concurrent/-race specs, no rapid dependency. Design map: docs/design/realtime-state-machines.md. Parakeet streaming backend. The same treatment applied to the parakeet-cpp streaming paths: - AudioTranscriptionStream returns codes.Unimplemented for non-streaming models instead of decoding offline and emitting it as one delta + final. A client that asked for streaming learns the model cannot stream rather than receiving a batch result shaped like a stream. New grpcerrors.StreamTranscriptionUnsupported carries that signal; the HTTP /v1/audio/transcriptions stream path surfaces it as an SSE error event. Mirrors AudioTranscriptionLive, which already did this. - utteranceBoundary (boundary.go): a single definition of the end-of-utterance latch, replacing three open-coded finalEou toggles. Modelled as a two-valued type so illegal states are unrepresentable. - Shared decode driver (driver.go): streamFeedResult (one per-feed event) + feedChunk (hides the ABI v4 JSON vs text-only split) + feedSlices + flushTail. The feed loop is written once. - AudioTranscriptionLive becomes a bidi adapter: it streams the per-feed {delta,eou,eob,words} the realtime turn detector consumes and a terminal FinalResult carrying only Text. Segments/duration/eou are offline-only and no longer produced (nor read) on the live path; liveTraceState drops the terminal eou and keeps the per-feed eou_events count. - AudioTranscriptionStream + streamJSON merge into one driver-based function; streamSegmenter is generalized to the unified event with a text-only fallback that preserves the legacy (no-words) library's per-utterance segmentation. Verified: build/vet/gofumpt clean, golangci-lint 0 issues, all coordinator and parakeet packages under -race, the fail-closed conformance gate green, and make test-realtime (12 e2e WS+WebRTC). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-30 09:01:22 +02:00
LocalAI [bot]	5b7b914b4f	chore(recon): re-pin voice/face-detect to squashed release commits (+ graph-cache fix) (#10591 ) chore(recon): re-pin voice/face-detect to squashed release commits The voice-detect.cpp and face-detect.cpp engine repos were squashed to a single release commit, which orphaned the previous pins (voice 3d51077, face 06914b0). Re-pin to the new single-commit SHAs (voice 1db1759, face e22260d). These also fold in a real correctness fix: the persistent graph-cache fingerprint now includes op_params, so two structurally identical GGML_OP_CUSTOM graphs (a blocked 3x3 vs a blocked 1x1 strided conv) can no longer false-hit the cache and replay the wrong kernel. voice CI was failing test_blocked/conv1x1_s2 with an out-of-bounds write on the GGML_NATIVE=OFF build; both engine repos are now green and WeSpeaker embed parity is 1.0 vs golden. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-29 18:48:47 +02:00
LocalAI [bot]	baaa0fe94f	chore: ⬆️ Update mudler/face-detect.cpp to `06914b077d52f90d5421299138e7be6bdd06b5e8` (#10580 ) ⬆️ Update mudler/face-detect.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:04:22 +02:00
LocalAI [bot]	c3b5c7c3fa	chore: ⬆️ Update mudler/voice-detect.cpp to `3d510772357538c5182808ac7de2278b84824e24` (#10581 ) ⬆️ Update mudler/voice-detect.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:03:43 +02:00
LocalAI [bot]	135debf9af	chore: ⬆️ Update CrispStrobe/CrispASR to `6b50f76e59700665358a1aabf5295597fa318e06` (#10583 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:03:06 +02:00
LocalAI [bot]	e8c18ae28e	chore: ⬆️ Update leejet/stable-diffusion.cpp to `c1790754d31bec0731ed5fddc9d5b9ff22ee19cd` (#10584 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:02:52 +02:00
LocalAI [bot]	de2ec2f136	feat(backends): add voice-detect + face-detect ggml backends (replace Python insightface/speaker-recognition) (#10441 ) * feat(voice-detect): add Go purego backend for voice-detect.cpp Add backend/go/voice-detect implementing the Backend gRPC voice subset (VoiceEmbed/VoiceVerify/VoiceAnalyze) over libvoicedetect.so via purego, mirroring the parakeet-cpp / omnivoice-cpp backends. The flat voicedetect_capi C ABI is dlopen'd cgo-less; malloc'd string and float-vector returns are owned by Go and released through the matching capi free functions, with the per-ctx last error surfaced into Go errors. Calls are serialized via base.SingleThread since the C context is not reentrant. Proto field mapping: - VoiceEmbed: VoiceEmbedRequest.audio (path) -> embed_path -> Embedding+Model. - VoiceVerify: audio1/audio2 + threshold (<=0 falls back to the verify_threshold option, default 0.25) -> verify_paths -> verified/distance/ threshold/confidence/model/processing_time_ms. - VoiceAnalyze: audio (path) -> analyze_path_json; the JSON age/gender/emotion document maps to a single VoiceAnalysis segment (start/end 0; gender "label" -> dominant_gender with the remaining float scores as the gender map; emotion label/scores -> dominant_emotion/emotion). The Makefile pins voice-detect.cpp to 47546430, clones+builds libvoicedetect.so with ggml static-linked (PIC, GGML_NATIVE off) so dlopen needs no external libggml/libvoicedetect; ldd on the artifact shows only system libs. Ginkgo tests cover option parsing and analyze-JSON mapping; embed/verify smoke specs gate on VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(voice-detect): wire backend into index, gallery and build Register the voice-detect.cpp speaker-recognition + voice-analysis backend (added in Voice-INT-A) into LocalAI's distribution surfaces, mirroring the ced backend (the closest mudler C++/ggml audio analogue): - backend/index.yaml: add the &voicedetect meta-backend (capabilities platform map, no top-level uri) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/metal/rocm/sycl/vulkan/l4t and the -development variants). Referential integrity audited - every alias target resolves. - gallery/index.yaml: add 5 model entries on backend voice-detect - ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker ERes2Net, CAM++ and the wav2vec2 age/gender/emotion analyze model. The engine architecture is read from GGUF metadata (voicedetect.arch) at load. GGUF artifacts are not yet published: each files: entry points at the intended mudler/voice-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes). - .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring ced. - .github/workflows/bump_deps.yaml: track mudler/voice-detect.cpp via VOICEDETECT_VERSION (pin 47546430, = 4754643). - core/config/backend_capabilities.go: register voice-detect in the backend capability map (VoiceVerify/VoiceEmbed/VoiceAnalyze -> speaker_recognition), mirroring speaker-recognition. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(face-detect): add purego Go backend for face-detect.cpp Add the LocalAI Go backend that dlopens libfacedetect.so (the flat facedetect_capi_* C-ABI) via purego, mirroring the sibling voice-detect backend. Implements the Face subset of the Backend gRPC service: - Embeddings(PredictOptions): Images[0] base64 -> temp file -> embed_path -> L2-normalized ArcFace embedding. - Detect(DetectOptions): src -> detect_path_json -> Detection boxes (class_name "face", [x1,y1,x2,y2] -> x/y/w/h). - FaceVerify(FaceVerifyRequest): two images + threshold + anti_spoof -> verify_paths; best-effort img areas via detect. - FaceAnalyze(FaceAnalyzeRequest): img -> analyze_path_json -> per-face age + gender ("M"/"F" normalized to "Man"/"Woman"). The Makefile pins face-detect.cpp to 636a1963 and builds the shared lib with ggml + vendored libjpeg-turbo static (PIC), so the .so is ldd-clean (no libggml) and exports only facedetect_capi_* (no jpeg_ symbols). Gated Ginkgo e2e mirrors voice-detect. Note for the gallery-wiring task: backend registration (index.yaml, gallery, core/config/backend_capabilities.go) is intentionally not touched here. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(voice-detect): replace em dashes in net-new descriptions Project style forbids em/en dashes. Replace the three U+2014 chars introduced by the voice-detect gallery/index wiring with `-`/`:`. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(face-detect): wire backend into index, gallery and build Register the face-detect.cpp face detection / embedding / verification / analysis backend (added in Face-INT-A) into LocalAI's distribution surfaces, mirroring the voice-detect wiring (the closest mudler C++/ggml recognition analogue): - backend/index.yaml: add the &facedetect meta-backend (capabilities platform map, no top-level uri to avoid the meta-backend gotcha) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/ metal/rocm/sycl-f16/sycl-f32/vulkan/l4t and the -development variants), 22 entries. Referential integrity audited: every alias target resolves. - gallery/index.yaml: add 4 model entries on backend face-detect - face-detect-buffalo-l/m/s (insightface SCRFD + ArcFace/MBF, NON-COMMERCIAL) and face-detect-yunet-sface (OpenCV-Zoo YuNet + SFace, APACHE-2.0, the commercial-friendly alternative). The detector/embedder architecture is read from GGUF metadata (facedetect.arch) at load; only the real verify_threshold option is set (0.35 buffalo, 0.363 sface). GGUF artifacts are not yet published: each files: entry points at the intended mudler/face-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes). - core/config/backend_capabilities.go: register face-detect in the backend capability map (Embedding/Detect/FaceVerify/FaceAnalyze -> face_recognition), mirroring insightface. - .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring voice-detect. - .github/workflows/bump_deps.yaml: track mudler/face-detect.cpp via FACEDETECT_VERSION (pin 636a1963). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(recon): voice-detect metal build branch + face-detect gallery usecases Add the missing metal BUILD_TYPE branch to the voice-detect Makefile forwarding -DVOICEDETECT_GGML_METAL=ON, mirroring face-detect, so the darwin metal CI artifact is built with the Metal backend instead of CPU-only. Expand the 4 face-detect gallery models' known_usecases to [face_recognition, detection, embeddings] to match the backend capabilities map and the mirrored insightface-buffalo entries, so auto-selection for /v1/detect and /embeddings works. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(recon): document voice-detect and face-detect ggml backends Document the new standalone C++/ggml biometric backends as the recommended/default option for face and voice recognition, keeping the existing Python insightface / speaker-recognition backends framed as the legacy path. - features/face-recognition.md: add a face-detect (ggml) backend section with the gallery entries (buffalo-l/m/s non-commercial, yunet-sface Apache-2.0), licensing, and verify/detect/analyze quickstart. - features/voice-recognition.md: add a voice-detect (ggml) backend section with the gallery entries (ecapa-tdnn, wespeaker-resnet34, eres2net, campplus speaker recognizers; emotion-wav2vec2 non-commercial analyze head) and quickstart. - reference/compatibility-table.md: add face-detect.cpp and voice-detect.cpp rows to the Vision, Detection & Recognition table. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(gallery): publish recon backend GGUF uris + sha256 Fill in the published HuggingFace GGUF uris and verified sha256 for the 9 recon gallery entries (voice-detect-* and face-detect-), and remove the TODO publish markers. Correct the eres2net, campplus, and emotion-wav2vec2 uris to the actual published filenames. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] feat(gallery): re-embed buffalo anti-spoof + add audeering age/gender voice model Update the 3 buffalo face-detect GGUF sha256 (anti-spoof ensemble now embedded and re-uploaded under the same filenames/uris) and note the FaceVerify anti_spoof request flag in each description. Add a new voice-detect-age-gender-wav2vec2 gallery entry mirroring the emotion model. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(gallery): add face-detect-buffalo-sc and antelopev2 packs Add gallery entries for two newly-published insightface face packs on the face-detect backend: buffalo_sc (smallest pack, SCRFD-500M + small ArcFace) and antelopev2 (higher-accuracy, SCRFD-10G + ArcFace glint360k R100, 512-d). Both are non-commercial research-only. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(recon): honor LocalAI per-model threads in voice/face-detect backends LocalAI spawns one backend process per model and serves requests concurrently, so the engines' own min(hardware_concurrency, 8) default can oversubscribe cores. Forward the per-model Threads value from the gRPC LoadModel options into the engine via VOICEDETECT_THREADS / FACEDETECT_THREADS (read at backend construction) before the capi load. A non-positive Threads is treated as unset, leaving the engine default. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump backend pins to CPU-optimized engine commits voice-detect.cpp -> 0d9c1b3 (radix-2 FFT FBank, threads, flash attn + cached pos-conv); face-detect.cpp -> 523aee1 (thread-gated direct conv, threads). Brings the CPU optimizations into the LocalAI backend builds. GGUF format and parity unchanged, so the published HF GGUFs remain valid. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump backend pins to round-2 CPU-optimized engines voice-detect.cpp -> fe7e6a3 (ERes2Net 1x1->mul_mat, CAM++ layout+context, wav2vec2 conv-LN, ECAPA capture-drop, AVX512 dispatch opt-in); face-detect.cpp -> 9c8adb7 (AVX2 Winograd F(2x2,3x3) for SCRFD/ArcFace 3x3 convs, ArcFace BN-fold). Parity unchanged (cosine=1.0); GGUF format unchanged, HF GGUFs valid. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump backend pins to round-3 Winograd engines voice-detect.cpp -> 45122ec (Winograd F(2x2,3x3) for WeSpeaker/ERes2Net 3x3 convs, -22%/-20% @8t); face-detect.cpp -> cd5c962 (Winograd F(4x4,3x3) for SCRFD large maps, -22% @1t on top of F(2x2), more load-stable). Parity held (cosine=1.0); GGUF format unchanged, HF GGUFs valid. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump backend pins to round-4 Winograd engines (CPU opt complete) voice-detect.cpp -> d2839ca (CAM++ FCM 2D convs through Winograd, -15.5%/-10.3%); face-detect.cpp -> c1db23d (AVX2-vectorized Winograd tile transforms, SCRFD detect -14%/-9.6%). Final CPU optimization round; the conv-kernel lever class is now exhausted (parity held cosine=1.0; GGUF/parity unchanged, HF GGUFs valid). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump face-detect pin to deep-kernel engine (7ae5c4d) face-detect.cpp -> 7ae5c4d: register-blocked winograd-domain GEMM microkernel (2.8x isolated GFLOP/s), AVX-512 zmm evolution behind runtime CPUID dispatch (ship-safe, AVX2 fallback bit-identical), bias/relu fused into the winograd output transform, and SFace Conv+BN fold + bias/PReLU fusion. SCRFD detect ~1.4x faster end-to-end vs the round-4 baseline; parity bit-exact; portable single binary (function-multiversioned, no global -mavx512f). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump voice-detect pin to ECAPA operand-order win (e9c56ae) voice-detect.cpp -> e9c56ae: weight-as-src0 mul_mat order in ECAPA's F32 conv1d_same (routes through tinyBLAS sgemm); ECAPA embed 1.67x @1t / ~1.3x @8t, parity cosine=1.0. Isolated to encoder.cpp (ECAPA-only); ERes2Net/CAM++/WeSpeaker do not call conv1d_same so are provably unaffected. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump pins to FMA-throughput engines (voice f7b9f89, face 2d2d5f0) face -> 2d2d5f0: route ArcFace 3x3 body convs through the AVX-512 winograd microkernel (kWinoMinSize 80->14); ArcFace 1.62x @1t, SCRFD detect to 0.966 of MLAS @1t, no regression. voice -> f7b9f89: runtime-CPUID-dispatched AVX-512 winograd-GEMM microkernel (ship-safe, AVX2 fallback bit-identical); WeSpeaker 1.90x @1t. Parity cosine=1.0 throughout; portable single binaries. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump pins to MLAS-class direct-conv engines (voice 7ecfd07, face be22d67) Hand-tuned nChw16c AVX-512 register-tiled direct-conv microkernel (~263 GFLOP/s, within 6-7% of MLAS per-op efficiency), runtime-CPUID-dispatched + AVX2 fallback, fused bias/relu. voice 7ecfd07: default 3x3-s1 kernel for WeSpeaker (+37%/+32%) + ERes2Net, CAM++ pinned to Winograd. face be22d67: shape-gated to the ArcFace recognizer body (+25-27% @8t); SCRFD detector stays on Winograd (no regression). Parity cosine=1.0 / detect <=1px on AVX-512 + AVX2 paths. Portable single binaries. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump voice pin to Phase-A blocked backbone (f4e7eef) WeSpeaker ResNet34 runs as one nChw16c blocked island (2 reorders/forward vs ~60) on AVX-512, default; per-conv directconv fallback on AVX2. +2.9% @1t / +17-19% @8t vs per-conv directconv, parity cosine=1.0. The conv microkernel is already FMA-bound near peak (~0.86-0.98x MLAS-implied); residual to MLAS is sub-peak edge + non-conv tail, documented in docs/cpu-optimization.md. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump pins to breadth blocked-backbone (voice 7f66871, face d80092b) voice 7f66871: AVX2-vectorized (ymm) blocked island - AVX2-only hosts now run the blocked backbone for WeSpeaker (2.3x over per-conv-AVX2, cosine=1.0); ERes2Net stays per-conv (blocked regresses, opt-in only); CAM++ Winograd-pinned. face d80092b: ArcFace recognizer blocked island, AVX-512 default (-13% @8t, ~0.90x MLAS, the closest conv result), auto per-conv on AVX2; SCRFD untouched on Winograd (0 island invocations during detect). Parity cosine=1.0 / detect <=1px throughout. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump pins to small-spatial + stem conv kernels (voice 99b1804, face 47fdab6) Measured-gap-driven conv kernels: small-spatial (fill the register tile when output width <= tile width) + small-IC stem + strided-1x1/downsample recovery. ArcFace recognizer 0.57 -> 0.70x MLAS @1t (the closest conv model), WeSpeaker 0.65 -> 0.79x @1t. Parity cosine=1.0 / detect <=1px. The OC-block-sharing lever was a measured dead-end (deep stride-1 is L3-weight-bandwidth bound, not read-port bound) and was NOT shipped. Kernel ceiling reached; further gap needs an algorithm-class change (cache-blocked weight-stationary GEMM, or q8 weights). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump pins to GPU persistent-graph + multi-model-safe cache (voice 45d2e6b, face 0a4799a) GPU wins (CUDA/ggml backend, no CPU-path change): persistent per-shape graph+context cache in Backend::compute() eliminates the per-call cudaGraph re-instantiation churn -> wav2vec2 emotion+age-gender now AT GPU parity with torch-cuDNN on GB10 (0.97-0.98x), CAM++ -5.7ms; bit-identical parity. Cache hardened multi-model-safe (invalidate-on-free keyed by the ModelLoader weights buffer) so LocalAI multi-model hosting cannot stale-hit. Conv models still trail cuDNN (im2col-materialization-bound) - cuDNN implicit-GEMM lever next. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump pins to cuDNN-conv-capable engines (voice b6e4356, face 6107a24) Adds the opt-in cuDNN implicit-GEMM conv path (VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN, DEFAULT OFF -> zero build/runtime dep until enabled). On GPU it kills the im2col-materialization bottleneck and reaches torch-cuDNN parity on the spill-bound convs: SCRFD detect 14.8->6.4ms (2.3x, ~parity), WeSpeaker ~parity, ERes2Net beats torch (1.10x); ArcFace/CAM++ neutral (no spill). Parity exact (SCRFD <=1px, cosine=1.0). To USE it in LocalAI, the CUDA backend build must enable the flag AND bundle libcudnn - deferred until a cuDNN-bundled GPU image; flag stays OFF here. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(recon): enable cuDNN conv path on arm64+CUDA13 recon backends The voice-detect.cpp / face-detect.cpp engines have an opt-in cuDNN implicit-GEMM conv path behind VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN (default OFF) that kills im2col on the GPU and reaches torch-cuDNN parity (SCRFD 2.3x, WeSpeaker/ERes2Net parity), measured on the GB10 (arm64, CUDA 13, sm_121a). Enable it for the CUDA build, but only where cuDNN actually ships: the arm64 + CUDA 13 image (GB10/Jetson/L4T). x86 CUDA images carry no cuDNN, so flipping it on globally for BUILD_TYPE=cublas would be a link failure. The Makefiles gate on CUDA_MAJOR_VERSION=13 + arch (TARGETARCH from the matrix/Docker build, uname -m fallback for local builds). backend/Dockerfile.golang already installs the runtime libcudnn9-cuda-13 in the arm64+CUDA13 apt block; add the matching libcudnn9-dev-cuda-13 so the build-time link resolves. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): bump voice-detect pin to ERes2Net blocked-default (30beecd) Defaults VD_ERES2NET_BLOCKED ON: routes the ERes2Net Res2Net body through the blocked nChw16c AVX-512 directconv island instead of the 1x1 mul_mat fast path (CONT-transpose + skinny low-K GEMM). On the shipped GGML_NATIVE=OFF build (ggml mul_mat is AVX2-only) this wins ~2x at every thread count (2.07x@1t, 2.2x@4t, 2.05x@8t); pure-AVX2 fallback still 1.3-1.62x. Parity exact (cosine=1.000000 vs golden), so registered voices + verify/identify thresholds are unaffected. The prior default-OFF rested on a stale comment whose 23pct regression only held on the non-shipping GGML_NATIVE=ON build. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(readme): announce native voice-detect + face-detect backends in Latest News Add a Latest News entry for the new from-scratch C++/ggml biometric backends (voice-detect.cpp + face-detect.cpp) that replace the Python insightface and speaker-recognition backends: no Python/onnxruntime at inference, self-contained GGUF, bit-exact parity, GPU cuDNN parity. Mirrors the parakeet.cpp / locate-anything.cpp native-backend news entries. Refs PR #10441. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(recon): re-pin to the squashed engine release commits The voice-detect.cpp and face-detect.cpp histories were squashed to a single release commit, which orphaned the previous pins (voice 30beecd, face 6107a24). Re-pin to the new single-commit SHAs (voice 3d51077, face 06914b0); the tree is identical, so the backend build is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-28 09:29:08 +02:00
LocalAI [bot]	e68ca109c5	chore: ⬆️ Update CrispStrobe/CrispASR to `6514c9da00b03a2f0f1b49a43fae4f3a01a41844` (#10535 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-28 08:56:24 +02:00
LocalAI [bot]	6740e988d2	chore: ⬆️ Update ggml-org/whisper.cpp to `0ae02cdb2c7317b50991367c165736ce42ed96ac` (#10532 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-28 08:56:06 +02:00
LocalAI [bot]	471e38e4e7	chore: ⬆️ Update leejet/stable-diffusion.cpp to `9956436c925a367daeab097598b1ea1f32d3503f` (#10533 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-28 01:55:44 +02:00
LocalAI [bot]	d11b202dd2	fix(backends): whisper darwin run.sh loads whichever fallback lib exists (.so/.dylib) (#10553 ) fix(backends): whisper darwin run.sh loads whichever fallback lib exists The macOS branch hardcoded WHISPER_LIBRARY=$CURDIR/libgowhisper-fallback.dylib, but the cmake build emits a Mach-O named libgowhisper-fallback.so on darwin, so the Go loader panicked at runtime ("dlopen ...dylib: no such file") and the backend exited ("grpc service not ready") — breaking e.g. the silero-vad-ggml VAD on darwin. Pick whichever of .dylib/.so is present so it is robust to the build's naming either way. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-27 14:07:56 +02:00
LocalAI [bot]	2c96c2d08e	chore: ⬆️ Update mudler/parakeet.cpp to `f469a57270a1cc4554acb15febf60e56619673b9` (#10530 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-27 00:50:51 +02:00
LocalAI [bot]	17c1fc74b2	fix(backends): darwin packaging for silero-vad (last Linux-only Go backend) (#10528 ) fix(backends): darwin packaging for silero-vad silero-vad was the last Go backend with Linux-only darwin packaging: - package.sh fell through to "Could not detect architecture" -> exit 1 on macOS (no Darwin branch), so its darwin image never packaged. - run.sh exported LD_LIBRARY_PATH, which macOS dyld ignores, so the bundled libonnxruntime.dylib couldn't be found at runtime. Add a Darwin branch to package.sh (skip the glibc/ld.so bundling; add an @loader_path/lib rpath so @rpath resolves to package/lib/) and a DYLD_LIBRARY_PATH branch to run.sh — mirroring the piper darwin fix (#10525). Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-26 22:31:06 +02:00
LocalAI [bot]	068d397acf	fix(backends): set rpath on the piper darwin binary so it can load its bundled libs (#10525 ) The metal-darwin-arm64-piper backend crashed at launch on macOS: DYLD "Library missing" Library not loaded: @rpath/libucd.dylib Referenced from: .../piper Reason: no LC_RPATH's found The piper binary links libucd, libespeak-ng, libpiper_phonemize and libonnxruntime via @rpath, but ships with no LC_RPATH, so dyld cannot expand @rpath and aborts before piper runs. The libraries themselves are already bundled in package/lib/ by package.sh. Additionally, package.sh's architecture detection only handled the Linux glibc loaders (/lib64/ld-linux-x86-64.so.2, /lib/ld-linux-aarch64.so.1) and otherwise hit `echo "Error: Could not detect architecture"; exit 1`, so on macOS packaging failed outright. Add a Darwin branch (before the Linux checks) that skips the glibc/ld.so bundling macOS has no use for and instead runs `install_name_tool -add_rpath @loader_path/lib` on the piper binary, so @rpath resolves to the bundled package/lib/ directory. Also mirror sherpa-onnx/opus in run.sh: export DYLD_LIBRARY_PATH on Darwin (LD_LIBRARY_PATH is Linux-only) as a defensive fallback. Validated by hand on Apple Silicon: with the rpath added, piper synthesized a real WAV. The darwin build is validated in CI. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-26 15:10:15 +02:00
LocalAI [bot]	6afe127cd4	fix(backends): make the opus backend build and package on macOS/Darwin (#10523 ) The opus Go backend (WebRTC audio codec) never built on macOS, so the published master-metal-darwin-arm64-opus image shipped source only — no opus binary and no libopusshim — because every step assumed Linux. - Makefile: hardcoded libopusshim.so with no OS handling. Mirror sherpa-onnx: SHIM_EXT=so / dylib on Darwin and build libopusshim.$(SHIM_EXT). On Darwin link the shim with -undefined dynamic_lookup so it resolves opus_encoder_ctl from the already globally-loaded libopus (codec.go dlopens it RTLD_GLOBAL first) instead of baking an absolute Homebrew path into the dylib, keeping the packaged shim relocatable. - run.sh: hardcoded LD_LIBRARY_PATH + libopusshim.so even on macOS. Add a Darwin branch exporting DYLD_LIBRARY_PATH and the .dylib shim, like sherpa-onnx/run.sh. - package.sh: bundle libopusshim.$(SHIM_EXT) and libopus*.dylib (not just .so) into package/lib so the OCI image (which ships package/.) is self-contained on a runtime with no Homebrew; add a Darwin arch branch so it doesn't warn/skip. - backend_build_darwin.yml: install + link opus and pkg-config via brew so the Makefile's `pkg-config opus` resolves on the macOS runner, and cache opus' Cellar dir. Go code is unchanged; darwin build is validated in CI. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-26 11:19:50 +02:00
LocalAI [bot]	253aedff06	chore: ⬆️ Update CrispStrobe/CrispASR to `8f1218141b792b8868861c1af17ba1e361b05dc0` (#10502 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-26 01:08:09 +02:00
LocalAI [bot]	74f07ecc35	fix(backends): quote $CURDIR in run.sh (fixes backends in paths with spaces) (#10519 ) fix(backends): quote $CURDIR in run.sh so backends work in paths with spaces The backend launcher scripts derive their own directory with CURDIR=$(dirname "$(realpath $0)") and then referenced it unquoted as $CURDIR (e.g. [ -f $CURDIR/lib/ld.so ], export LD_LIBRARY_PATH=$CURDIR/lib:..., exec $CURDIR/<binary> "$@"). When a backend is installed under a path that contains a space - notably macOS's ~/Library/Application Support/... - bash word-splits the unquoted $CURDIR, so the test builtin fails with "binary operator expected" and exec tries to run ".../Library/Application", yielding "No such file or directory". The backend never starts, surfacing as a gRPC "service not ready" error and an HTTP 500. Quote $CURDIR (and the realpath "$0") in every affected run.sh; no logic changes. Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 01:02:48 +02:00
LocalAI [bot]	286c508ce0	feat(backends): darwin build for the localvqe backend (acoustic echo cancellation) (#10512 ) feat(backends): darwin build for the localvqe backend LocalVQE (acoustic echo cancellation / noise suppression / dereverberation) already builds on Darwin - its Makefile takes the OS=Darwin branch with GGML_METAL=OFF (upstream is CPU + Vulkan only), producing a native arm64 CPU image. It was just never wired into CI. - .github/backend-matrix.yml: add localvqe to includeDarwin (build-type metal, lang go) - the darwin/arm64 build profile; the backend itself stays CPU. - backend/index.yaml: metal: capability + concrete metal-localvqe(-development) entries pointing at the -metal-darwin-arm64-localvqe images. - backend/go/localvqe/Makefile: note on the existing Darwin branch (also the per-backend change the CI path filter needs to build it here). Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 22:54:36 +02:00
LocalAI [bot]	d1a9d59917	feat(backends): darwin/Metal builds for vision C++/ggml backends (depth-anything, locate-anything, rfdetr-cpp, sam3-cpp) (#10511 ) feat(backends): darwin/Metal builds for the vision C++/ggml backends depth-anything-cpp, locate-anything-cpp, rfdetr-cpp and sam3-cpp already carry a Darwin/Metal path in their Makefiles (GGML_METAL=ON when build-type=metal), but were never wired into CI, so no Metal image was published and Apple Silicon could not install them. - .github/backend-matrix.yml: add the four to includeDarwin (build-type metal, lang go), matching the other go+ggml -cpp Metal entries. - backend/index.yaml: add metal: to each backend's capabilities map (main and -development) plus concrete metal-<backend>(-development) entries pointing at the latest/master -metal-darwin-arm64-<backend> images. - backend/go//Makefile: a one-line note on the existing Darwin branch (also the per-backend change the CI path filter needs to actually build them here). Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 22:07:56 +02:00
LocalAI [bot]	f1e5071321	chore: ⬆️ Update leejet/stable-diffusion.cpp to `8caa3f908ae6d4a4bef531e73b9a969f266a3d1f` (#10493 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:11:31 +02:00
LocalAI [bot]	fae9f6356f	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `9dbe7ea26a01b30fccb117ae5e86807c1dc23d42` (#10499 ) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:10:41 +02:00
LocalAI [bot]	c678530cf0	fix(backends): darwin/metal support across purego Go backends (#10481 ) * fix(parakeet-cpp): darwin/metal support (libparakeet.dylib + DYLD path) The parakeet-cpp backend had no macOS support and panicked at startup on Apple/Metal nodes when purego.Dlopen could not find "libparakeet.so". Fix it across the same four layers the sibling voxtral backend already handles correctly: - main.go: default the dlopen target to libparakeet.dylib on darwin (runtime.GOOS), libparakeet.so elsewhere; PARAKEET_LIBRARY still wins. - Makefile: also stage the built libparakeet.dylib next to the Go sources. - package.sh: accept either the Linux .so[.X.Y] or the macOS .dylib when bundling instead of hard-failing when no .so is present (the macOS case); note that on Darwin only system frameworks are linked. - run.sh: on Darwin set DYLD_LIBRARY_PATH and PARAKEET_LIBRARY to the packaged .dylib; keep LD_LIBRARY_PATH + .so on Linux. Mirrors backend/go/voxtral. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backends): darwin/metal support across purego Go backends The parakeet-cpp fix in the previous commit was an instance of a bug shared by nearly every purego/dlopen Go backend: the dlopen target was hardcoded to a .so name and run.sh exported only LD_LIBRARY_PATH, so the backend panicked at startup on macOS/Apple-Metal nodes (dyld needs the .dylib name and DYLD_LIBRARY_PATH). voxtral was the only backend handling this correctly. Apply the same four-layer fix (mirroring backend/go/voxtral) to the remaining affected backends: whisper, sherpa-onnx, ced, stablediffusion-ggml, vibevoice-cpp, qwen3-tts-cpp, omnivoice-cpp, crispasr, acestep-cpp, locate-anything-cpp, depth-anything-cpp, rfdetr-cpp, sam3-cpp, localvqe Per backend: - main.go (sherpa-onnx: backend.go, two libraries): default the dlopen target to the .dylib on darwin (runtime.GOOS), .so elsewhere; the existing <BACKEND>_LIBRARY env override still wins. - run.sh: on Darwin set DYLD_LIBRARY_PATH and point <BACKEND>_LIBRARY at the packaged .dylib; keep LD_LIBRARY_PATH + the Linux CPU-variant (avx/avx2/avx512) selection unchanged in the else branch. - package.sh: also bundle the .dylib and stop hard-failing when no .so is present (the macOS case). - Makefile: also stage the built .dylib. Notes: - stablediffusion-ggml and acestep-cpp build their lib as a CMake MODULE, which emits .so (not .dylib) on macOS; run.sh prefers .dylib and falls back to .so so both layouts work. - sherpa-onnx was already partly darwin-aware (Makefile/package.sh); only run.sh and the two dlopen defaults needed fixing. Linux behavior is unchanged. Verified gofmt-clean and `CGO_ENABLED=0 go build` for every backend. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:09:18 +02:00
LocalAI [bot]	3c63431e46	chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to `0f37401bebe9b20c0160a888e592108fc1d17607` (#10492 ) ⬆️ Update ServeurpersoCom/omnivoice.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 00:57:58 +02:00
LocalAI [bot]	193d0e6aef	fix(backends): darwin/metal support for supertonic (#10488 ) The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec path, while package.sh hard-failed on any non-Linux host. On macOS dyld has no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib. This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481 landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.) but which omitted supertonic: - run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH at libonnxruntime.dylib; guard the ld.so exec path to Linux only. - package.sh: recognize Darwin instead of erroring out; the bundled .dylib is resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle. - helper.go: platform-native default library extension (dylib on darwin) for the last-resort dlopen fallback. It also wires the darwin CI build and gallery entries, resolving the inconsistency where backend/index.yaml advertised metal for supertonic but no includeDarwin matrix entry built the image: - .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry. - backend/index.yaml: declare metal capabilities and add the concrete metal-supertonic / metal-supertonic-development child entries. The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 22:19:03 +02:00
LocalAI [bot]	0f3b24436d	chore: ⬆️ Update mudler/parakeet.cpp to `89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a` (#10468 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:43 +02:00
LocalAI [bot]	4b6f911835	chore: ⬆️ Update ggml-org/whisper.cpp to `43d78af5be58f41d6ffbc227d608f104577741ea` (#10466 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:14 +02:00
LocalAI [bot]	dba9cd7ca4	chore: ⬆️ Update CrispStrobe/CrispASR to `96b2a6ee31d30389fed8a7ef1a54239b75231ddc` (#10465 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:34 +02:00
LocalAI [bot]	06a7b6cadb	chore: ⬆️ Update leejet/stable-diffusion.cpp to `f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f` (#10457 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:29:07 +02:00
LocalAI [bot]	67c8889866	chore: ⬆️ Update CrispStrobe/CrispASR to `63b57289255267edf66e43e33bc3911e04a2e92d` (#10455 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:28:49 +02:00
LocalAI [bot]	2edc4e25b3	chore: ⬆️ Update ggml-org/whisper.cpp to `bae6bc02b1940bbfb87b6a0299c565e563b916d1` (#10459 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:27:51 +02:00
LocalAI [bot]	7226bb9f30	chore: ⬆️ Update CrispStrobe/CrispASR to `7a8cb80907341c0204bd0488c1244764f4163883` (#10315 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 12:21:58 +02:00
LocalAI [bot]	600dafd20b	feat(ced): sound-event classification backend (CED audio tagger) (#10425 ) * feat(ced): sketch sound-classification backend (CED audio tagger) Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry, footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend. SKETCH (backend skeleton real; core REST wiring + CI/gallery is a checklist in DESIGN.md): - backend/backend.proto: new SoundDetection rpc + SoundClass messages (run `make protogen-go` to regenerate pkg/grpc/proto). - backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h), goced.go (Ced gRPC backend: Load + SoundDetection), Makefile (clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh, package.sh, .gitignore. - DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability registration checklist), gallery/index + CI registration, and a scoping note for the realtime/websocket live-recognition path (sliding-window classify over the existing ws transport + voicegate; the ced C-API per-PCM entry point is already window-friendly). Backend code does not compile until protogen-go regenerates the pb types and a libced.so is built (Makefile clones+builds it). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): REST /v1/audio/classification endpoint + capability registration Wires the ced sound-event classification backend (AudioSet audio tagger) end to end through the REST surface, mirroring the transcription path. - Handler: core/http/endpoints/openai/sound_classification.go parses the multipart audio upload, temp-files it, resolves the model config and calls the SoundDetection RPC; returns {model, detections[]} JSON. - Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection) loads the model and normalizes the proto response into schema types. - Schema: core/schema/sound_classification.go (SoundClassificationResult). - gRPC layer: SoundDetection wired through the LocalAI wrapper (interface, Backend client, Client, embed, server, base default) so the loader-typed client exposes the RPC; proto regenerated via make protogen-go. - Route: POST /v1/audio/classification (+ /audio/classification alias) with the audio/multipart default-model middleware in routes/openai.go. - Capability surfaces: swagger @Tags/@Router on the handler; FLAG_SOUND_ CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap + GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase option; /api/instructions audio area updated; auth RouteFeatureRegistry + FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter + i18n; docs page features/audio-classification.md + whats-new + crosslink. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): realtime sound-event detection over the websocket API When a realtime pipeline configures a sound-classification model, each VAD-committed utterance (the same window the transcription path produces) is also run through the CED sound-event classifier and the scored AudioSet tags are emitted as a new server event. No new backend rpc is needed: the SoundDetection gRPC method already exists on this branch. - config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty) beside Transcription/VAD. - realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the ModelInterface; implement it on wrappedModel and transcriptOnlyModel by calling backend.ModelSoundDetection with the session's sound-classification model config (mirrors how Transcribe dispatches). Load the optional config in newModel / newTranscriptionOnlyModel; nil config keeps it additive. - types: add ConversationItemSoundDetectionEvent (item_id, content_index, detections[]{label,score,index}) with type conversation.item.sound_detection, its ServerEventType constant and MarshalJSON, mirroring the transcription completed event. - realtime: add emitSoundDetection (unary path: classify the committed window, build the event, t.SendEvent) and wire it at the utterance-commit hook right after emitTranscription; gated on session.SoundDetectionEnabled (resolved from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0). Its error is logged via xlog but never aborts the turn. - test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections, classifier error) plus a SoundDetection method on the fakeModel double. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): implement SoundDetection in nodes backend test doubles The SoundDetection method added to the grpc backend interface left two test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so core/services/nodes failed to compile under `go vet`/`go test` (go build missed it: the doubles live in _test.go). Add the method to both, mirroring their existing Detect mock. Repairs CI for the nodes package. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): decouple realtime sound detection from VAD (sound-only sessions) Sound-event detection must activate on sounds, not speech, so it no longer runs through the voice VAD/transcription path. A sound-detection-only pipeline (sound_detection set, no transcription/LLM) now: - is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline stage), - builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS loaded), and - defaults the session to turn_detection none (no VAD) with no transcription stage, so the client drives windowing via input_audio_buffer.commit (option A: client-side sliding window). The per-PCM C-API already supports arbitrary windows. commitUtterance gains a sound-only branch: it emits the conversation.item.sound_detection event (scored AudioSet tags) and stops - no transcription, no LLM response. generateResponse is now guarded on a transcription stage being present, so a sound-only turn never invokes the LLM. Existing transcription/VAD sessions are unchanged (additive). Added a commitUtterance sound-only Ginkgo spec asserting it emits the sound event and neither transcribes nor generates a response. go vet + golangci-lint (new-from-merge-base) clean; openai suite green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): register sound-classification backend in gallery + CI Mechanical backend-image registration for the ced sound-event classifier, mirroring the parakeet-cpp Go/purego backend everywhere it is wired up. - .github/backend-matrix.yml: add the ced build matrix, field-for-field copies of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64, l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan amd64/arm64, rocm hipblas, and the metal darwin entry), changing only backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang. - backend/index.yaml: add the &ced meta anchor (capabilities map per platform) plus ced-development and the per-arch image entries, each uri/mirror tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is intentionally deferred pending the HuggingFace publish (TODO note inline). - scripts/changed-backends.js: add an explicit item.backend === "ced" branch in inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as the parakeet-cpp branch (before the generic golang fallthrough). - .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in backend/go/ced/Makefile so the daily bot bumps the pin. - swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so the existing /v1/audio/classification annotations land in the generated spec. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): server-side windowing for realtime sound detection (option B) Adds an optional server-driven sliding-window classifier so a sound-only realtime client only has to stream audio (no input_audio_buffer.commit): - Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs. When both > 0 on a sound-only session, the server classifies the last window of streamed audio every hop and emits a conversation.item.sound_ detection event; the input buffer is trimmed to one window so a long stream stays bounded. When unset, the session stays client-driven (option A). Runs independent of VAD (sound events are not speech). - handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so it is unit-testable) + writeWindowWAV, which declares the true InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples correctly. Goroutine is started after toggleVAD and torn down with the session (close + wg.Wait). - Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta registry; the earlier realtime commit added pipeline.sound_detection without a registry entry, failing TestAllFieldsHaveRegistryEntries. This fixes that and covers the two new knobs. Tests: classifySoundWindow emits an event + trims the buffer to one window, no-ops on too-little audio; writeWindowWAV declares the given sample rate. go build/vet + golangci-lint (new-from-merge-base) clean; config + openai suites green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0) The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0, converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced + known_usecases: sound_classification) and two gallery/index.yaml entries (ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and removes the now-resolved TODO from backend/index.yaml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add tiny/mini/small GGUF model gallery entries Publishes the rest of the CED family (same architecture, metadata-driven port verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds their f16 + q8_0 gallery entries: ced-tiny (5.5M, edge/Pi-class) f16 11MB / q8_0 6MB ced-mini (9.6M) f16 19MB / q8_0 11MB ced-small (22M) f16 42MB / q8_0 23MB All sha256-pinned. ced-base remains the accuracy default. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8 gallery model entries' urls + file uris accordingly. sha256 and filenames are unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): bump CED_VERSION to the short-clip fix Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip shorter than target_length (~10.11s): time_pos_embed was added at its full 63-frame grid instead of being sliced to the clip's actual time grid, tripping ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s windows) and gated with a short-clip parity test upstream. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive - README.md: add ced.cpp to the "native C/C++/GGML engines developed and maintained by the LocalAI project" table. - docs/content/features/backends.md: add a Sound Classification backend category (sound-event classification / audio tagging) listing ced.cpp. - .agents/adding-backends.md: add a "Documenting the backend" section and two verification-checklist items requiring new backends to be documented in the backends.md category list, and in-house native engines to be added to the README maintained-engines table. This directive was missing. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): repin CED_VERSION to the v0.1.0 release commit ced.cpp history was squashed into a single release commit (tagged v0.1.0), so the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the v0.1.0 release commit, so the backend builds against a commit that exists. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths - sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler. - goced.go: reading a NUL-terminated C string from a libced-owned buffer. #nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since the uintptr is a C-owned malloc'd buffer, not Go-GC memory. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 01:00:28 +02:00
LocalAI [bot]	ce8a3e9266	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `4536dcdce27c3764a93a06d6bf64026b124962f5` (#10431 ) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 01:00:10 +02:00
LocalAI [bot]	1cf1bf32e1	chore: ⬆️ Update leejet/stable-diffusion.cpp to `b12098f5d09fc83da36e65c784f7bdb16a5a5ebf` (#10429 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 00:57:33 +02:00
番茄摔成番茄酱	cf7f9573a2	fix(crispasr): filter garbage words from parakeet word-level timestamps (#10421 ) The parakeet-specific word accessors can return stale initialisation data (model name, binary blobs) for segments with no real speech. Add isValidWord() to filter out words that have: - empty or whitespace-only text - U+FFFD replacement characters (from binary data scrubbing) - negative timestamps - zero duration (end <= start) Also skip empty segments entirely when they have no recognisable content (empty text AND no valid words), preventing spurious subtitle entries like '00:45:33,592 --> 00:45:33,592 parakeet@rH\u000b\ufffdI'. Applies to both AudioTranscription and AudioTranscriptionStream. Signed-off-by: fqscfqj <fqscfqj@outlook.com>	2026-06-21 17:03:33 +02:00
LocalAI [bot]	e19c43cf04	feat(gallery): add Depth Anything V2 models + bump native version (#10413 ) * feat(gallery): add Depth Anything V2 models + bump native version Add Depth Anything V2 (DA2) support to the depth-anything backend. DA2 is depth-only (no camera pose, no confidence) and ships both relative (relative inverse depth) and metric (depth in metres) variants. The Go backend is model-agnostic, so no backend code changes are required — only a native version bump and new gallery entries. - backend/go/depth-anything-cpp/Makefile: pin DEPTHANYTHING_VERSION to the depth-anything.cpp commit that adds the DA2 engine + C-API routing (e3dec57f13a52366bbc4f279ef44804915960a6b, kept alive by the upstream tag da2-support so it survives a squash-merge). - gallery/index.yaml: add 12 DA2 entries (4 base quants, small, large, plus Hypersim indoor and VKITTI outdoor metric models in S/B/L). Metric models carry the metric-depth tag; none carry camera-pose. Assisted-by: Claude:claude-opus-4-8 * chore(depth-anything-cpp): pin to merged DA2 master commit PR #1 (mudler/depth-anything.cpp) merged to master as f4e17de (squash); repoint the pin from the pre-merge commit to the canonical master commit. Assisted-by: Claude:claude-opus-4-8 --------- Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-20 14:56:16 +02:00
LocalAI [bot]	93706fec57	chore: ⬆️ Update mudler/parakeet.cpp to `db755a78d39f789bb7d4e3935158a9e8105dbe36` (#10393 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:37:33 +02:00
LocalAI [bot]	8915f2ab91	chore: ⬆️ Update ggml-org/whisper.cpp to `5ed76e9a079962f1c85cfce44edd325c27ef1f97` (#10396 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:37:06 +02:00
LocalAI [bot]	dd928f0bdd	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `26fcea5468e4069bc72d1f2fcc812c985e7361bb` (#10409 ) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:36:36 +02:00
LocalAI [bot]	c43a752afc	chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to `96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd` (#10408 ) ⬆️ Update ServeurpersoCom/omnivoice.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:36:22 +02:00
番茄摔成番茄酱	72d46c1115	feat(crispasr): add word-level timestamp support (#10403 ) * feat(crispasr): add word-level timestamp support Add word-level timestamp extraction to the crispasr backend by calling the CrispASR C library's word accessor functions that are already exported by libgocraspasr but were not previously bound by the Go wrapper. Two families of word functions are supported: 1. Session-based (get_word_count/text/t0/t1) — works per-segment for whisper-like backends. 2. Parakeet-specific (get_parakeet_word_count/text/t0/t1) — returns a global word list for TDT/CTC/RNNT parakeet models where the session API does not expose per-segment word data. The Go code tries session-based first and falls back to parakeet-specific when the session word count is zero. Depends on #10402 (grpc server Words forwarding) for the words to reach the HTTP response. Signed-off-by: fqscfqj <fqscfqj@outlook.com> * fix(crispasr): use portable sed -i.bak for macOS compatibility BSD sed requires -i '' for in-place editing while GNU sed uses -i. Replace with -i.bak which works on both platforms, then remove the backup file. Signed-off-by: fqscfqj <fqscfqj@outlook.com> --------- Signed-off-by: fqscfqj <fqscfqj@outlook.com>	2026-06-19 21:34:30 +02:00
Richard Palethorpe	606128e4e9	feat(vulkan): make Vulkan backends self-contained on the GPU (#10404 ) Vulkan backends bundled their own loader and ICD manifests but neither the Mesa driver the manifests point at nor a way to make the loader find them, so on a runtime base image without Mesa the loader enumerated zero devices and the GPU silently fell back to CPU (only NVIDIA worked, since its ICD is injected by the container toolkit). - scripts/build/package-gpu-libs.sh: for each installed ICD manifest, bundle the driver .so its library_path names — no hard-coded, platform-dependent soname list — plus that driver's ldd dependencies, skipping manifests whose driver isn't installed. Rewrite each library_path to a bare soname so the bundled driver resolves via the LD_LIBRARY_PATH run.sh already sets. - .docker/install-base-deps.sh, backend/Dockerfile.golang, backend/Dockerfile.python: install mesa-vulkan-drivers in every Vulkan builder so the driver + manifests exist to be packaged (the LunarG SDK ships only the loader and shader tooling). - pkg/model/process.go: when a backend ships vulkan/icd.d/, point the loader at it via VK_DRIVER_FILES/VK_ICD_FILENAMES at launch (no-op otherwise). Covered by pkg/model/process_vulkan_test.go. - backend/go/parakeet-cpp/package.sh: complete the L0 stub (was missing the libc-family ldd walk + GPU-lib packaging) by mirroring whisper, so the vulkan-parakeet image actually bundles its GPU runtime. Assisted-by: Claude Code:claude-opus-4-8 Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-19 17:16:33 +02:00
LocalAI [bot]	91f97f2a54	chore: ⬆️ Update ggml-org/whisper.cpp to `86c40c3bd6fc86f1187fb751d111b49e0fc18e84` (#10382 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 08:34:43 +02:00
LocalAI [bot]	55f9ff6805	chore: ⬆️ Update mudler/parakeet.cpp to `92a5f0306be354c109150fe58ae4cc4f8a21ca45` (#10380 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 08:32:13 +02:00
LocalAI [bot]	dfd5a00e6f	chore: ⬆️ Update ggml-org/whisper.cpp to `9efddafb9153e1fb22bdc3dd3057072c99165ed2` (#10366 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 09:27:52 +02:00
LocalAI [bot]	63be479066	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7f0e728b7d42f2490dfa5dd9539082d904f2f6b2` (#10370 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 09:08:34 +02:00

1 2 3 4 5 ...

373 Commits