LocalAI [bot]
c03e532a18
chore: ⬆️ Update ggml-org/llama.cpp to ae9f8df77882716b1702df2bed8919499e64cc28 ( #7915 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-07 23:24:01 +01:00
Copilot
b2ff1cea2a
feat: enable Vulkan arm64 image builds ( #7912 )
...
* Initial plan
* Add arm64 support for Vulkan builds in Dockerfiles and workflows
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-07 21:49:50 +01:00
Ettore Di Giacinto
b964b3d53e
feat(backends): add moonshine backend for faster transcription ( #7833 )
...
* feat(backends): add moonshine backend for faster transcription
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Add backend to CI, update AGENTS.md from this exercise
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2026-01-07 21:44:35 +01:00
Copilot
fd53978a7b
feat: package GPU libraries inside backend containers for unified base image ( #7891 )
...
* Initial plan
* Add GPU library packaging for isolated backend environments
- Create scripts/build/package-gpu-libs.sh for packaging CUDA, ROCm, SYCL, and Vulkan libraries
- Update llama-cpp, whisper, stablediffusion-ggml package.sh to include GPU libraries
- Update Dockerfile.python to package GPU libraries into Python backends
- Update libbackend.sh to set LD_LIBRARY_PATH for GPU library loading
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
* Address code review feedback: fix variable consistency and quoting
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
* Fix code review issues: improve glob handling and remove redundant variable
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
* Simplify main Dockerfile and workflow to use unified base image
- Remove GPU-specific driver installation from Dockerfile (CUDA, ROCm, Vulkan, Intel)
- Simplify image.yml workflow to build single unified base image for linux/amd64 and linux/arm64
- GPU libraries are now packaged in individual backend containers
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
2026-01-07 15:48:51 +01:00
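The entry above moves GPU libraries out of the base image and into each backend's own container, loaded via LD_LIBRARY_PATH at launch. A minimal Go sketch of that loading idea, assuming a hypothetical `<backendDir>/lib` layout and binary name (the real logic lives in libbackend.sh and the package scripts, not in this code):

```go
// Minimal sketch (assumed paths and binary names, not LocalAI's actual
// launcher code): start a backend process with LD_LIBRARY_PATH pointing at
// GPU libraries packaged inside the backend's own directory, so the base
// image no longer needs system-wide CUDA/ROCm/SYCL/Vulkan installs.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

func runBackend(backendDir string, args ...string) error {
	// Hypothetical layout: <backendDir>/lib holds the packaged GPU libraries.
	libDir := filepath.Join(backendDir, "lib")

	env := os.Environ()
	// Prepend the packaged libraries so the dynamic linker finds them first.
	env = append(env, fmt.Sprintf("LD_LIBRARY_PATH=%s:%s", libDir, os.Getenv("LD_LIBRARY_PATH")))

	cmd := exec.Command(filepath.Join(backendDir, "backend"), args...)
	cmd.Env = env
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	if err := runBackend("/backends/llama-cpp", "--addr", "127.0.0.1:50051"); err != nil {
		fmt.Fprintln(os.Stderr, "backend exited:", err)
	}
}
```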
LocalAI [bot]
23df29fbd3
chore: ⬆️ Update leejet/stable-diffusion.cpp to 9be0b91927dfa4007d053df72dea7302990226bb ( #7895 )
...
⬆️ Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-06 22:18:53 +01:00
LocalAI [bot]
fb9879949c
chore: ⬆️ Update ggml-org/llama.cpp to ccbc84a5374bab7a01f68b129411772ddd8e7c79 ( #7894 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-06 22:18:35 +01:00
Richard Palethorpe
e6ba26c3e7
chore: Update to Ubuntu 24.04 (cont #7423 ) ( #7769 )
...
* ci(workflows): bump GitHub Actions images to Ubuntu 24.04
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* ci(workflows): remove CUDA 11.x support from GitHub Actions (incompatible with ubuntu:24.04)
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* ci(workflows): bump GitHub Actions CUDA support to 12.9
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* build(docker): bump base image to ubuntu:24.04 and adjust Vulkan SDK/packages
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* fix(backend): correct context paths for Python backends in workflows, Makefile and Dockerfile
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore(make): disable parallel backend builds to avoid race conditions
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore(make): export CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION for override
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* build(backend): update backend Dockerfiles to Ubuntu 24.04
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore(backend): add ROCm env vars and default AMDGPU_TARGETS for hipBLAS builds
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore(chatterbox): bump ROCm PyTorch to 2.9.1+rocm6.4 and update index URL; align hipblas requirements
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore: add local-ai-launcher to .gitignore
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* ci(workflows): fix backends GitHub Actions workflows after rebase
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* build(docker): use build-time UBUNTU_VERSION variable
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore(docker): remove libquadmath0 from requirements-stage base image
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore(make): add backends/vllm to .NOTPARALLEL to prevent parallel builds
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* fix(docker): correct CUDA installation steps in backend Dockerfiles
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* chore(backend): update ROCm to 6.4 and align Python hipblas requirements
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for CUDA on arm64 builds
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* build(docker): update base image and backend Dockerfiles for Ubuntu 24.04 compatibility on arm64
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* build(backend): increase timeout for uv installs behind slow networks on backend/Dockerfile.python
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for vibevoice backend
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* ci(workflows): fix failing GitHub Actions runners
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
* fix: Allow FROM_SOURCE to be unset, use upstream Intel images etc.
Signed-off-by: Richard Palethorpe <io@richiejp.com >
* chore(build): rm all traces of CUDA 11
Signed-off-by: Richard Palethorpe <io@richiejp.com >
* chore(build): Add Ubuntu codename as an argument
Signed-off-by: Richard Palethorpe <io@richiejp.com >
---------
Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
Signed-off-by: Richard Palethorpe <io@richiejp.com >
Co-authored-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com >
2026-01-06 15:26:42 +01:00
Ettore Di Giacinto
26c4f80d1b
chore(llama.cpp/flags): simplify conditionals ( #7887 )
...
If ggml handles conditionals correctly, we don't need to handle them here.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2026-01-06 15:02:20 +01:00
coffeerunhobby
5add7b47f5
fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) ( #7864 )
...
* Fix BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge)
Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com >
* Address feedback from review
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com >
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com >
Co-authored-by: Ettore Di Giacinto <mudler@localai.io >
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
2026-01-06 00:13:48 +00:00
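The fix itself is about the compile flags used for the CPU llama.cpp variants, but the underlying idea is matching the executed variant to the host's instruction-set support. A hedged sketch of runtime capability detection with golang.org/x/sys/cpu; the variant names below are hypothetical:

```go
// Illustrative sketch only (variant names are hypothetical): pick a CPU
// backend variant based on detected instruction-set support, so binaries
// built with BMI2/AVX2 are never executed on AVX-only CPUs such as
// Intel Ivy Bridge or Sandy Bridge.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func pickVariant() string {
	switch {
	case cpu.X86.HasAVX512F:
		return "llama-cpp-avx512"
	case cpu.X86.HasAVX2 && cpu.X86.HasBMI2:
		return "llama-cpp-avx2"
	case cpu.X86.HasAVX:
		return "llama-cpp-avx" // must be built without BMI2 instructions
	default:
		return "llama-cpp-fallback"
	}
}

func main() {
	fmt.Println("selected backend variant:", pickVariant())
}
```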
LocalAI [bot]
4f7b6b0bff
chore: ⬆️ Update ggml-org/llama.cpp to e443fbcfa51a8a27b15f949397ab94b5e87b2450 ( #7881 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-05 22:55:40 +01:00
LocalAI [bot]
3a629cea2f
chore: ⬆️ Update ggml-org/whisper.cpp to 679bdb53dbcbfb3e42685f50c7ff367949fd4d48 ( #7879 )
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-05 22:55:16 +01:00
LocalAI [bot]
f917feda29
chore: ⬆️ Update leejet/stable-diffusion.cpp to c5602a676caff5fe5a9f3b76b2bc614faf5121a5 ( #7880 )
...
⬆️ Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-05 22:54:56 +01:00
LocalAI [bot]
9d3da0bed5
chore: ⬆️ Update ggml-org/llama.cpp to 4974bf53cf14073c7b66e1151348156aabd42cb8 ( #7861 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-05 00:10:18 +01:00
LocalAI [bot]
1b063b5595
chore: ⬆️ Update leejet/stable-diffusion.cpp to b90b1ee9cf84ea48b478c674dd2ec6a33fd504d6 ( #7862 )
...
⬆️ Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-04 23:52:01 +01:00
LocalAI [bot]
a7e155240b
chore: ⬆️ Update ggml-org/llama.cpp to e57f52334b2e8436a94f7e332462dfc63a08f995 ( #7848 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-04 10:27:45 +01:00
coffeerunhobby
666d110714
fix: Prevent BMI2 instruction crash on AVX-only CPUs ( #7817 )
...
* Fix: Prevent BMI2 instruction crash on AVX-only CPUs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix: apply no-bmi flags on non-darwin
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com >
Co-authored-by: Ettore Di Giacinto <mudler@localai.io >
2026-01-03 08:36:55 +01:00
LocalAI [bot]
641606ae93
chore: ⬆️ Update ggml-org/llama.cpp to 706e3f93a60109a40f1224eaf4af0d59caa7c3ae ( #7836 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-02 21:26:37 +00:00
Ettore Di Giacinto
5f6c941399
fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path ( #7832 )
...
fix(mmproj): fix loading mmproj in nested sub-dirs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2026-01-02 20:17:30 +01:00
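A hedged sketch of the path-resolution idea behind this fix: resolve a relative mmproj reference against the model file's own directory rather than the configured top-level model path, so a projector stored in a nested sub-directory next to the model is still found. Function and argument names are illustrative, not LocalAI's actual code:

```go
// Illustrative sketch (names and layout assumed): resolve an mmproj file
// relative to the directory containing the model file itself.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveMMProj returns an absolute path for mmproj. Relative values are
// resolved against the model file's directory, not the models root.
func resolveMMProj(modelFile, mmproj string) (string, error) {
	if mmproj == "" {
		return "", nil
	}
	if filepath.IsAbs(mmproj) {
		return mmproj, nil
	}
	candidate := filepath.Join(filepath.Dir(modelFile), mmproj)
	if _, err := os.Stat(candidate); err != nil {
		return "", fmt.Errorf("mmproj not found at %s: %w", candidate, err)
	}
	return candidate, nil
}

func main() {
	// With a real file on disk this returns the resolved absolute path;
	// here it simply demonstrates the call shape.
	p, err := resolveMMProj("/models/vision/subdir/model.gguf", "mmproj.gguf")
	fmt.Println(p, err)
}
```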
LocalAI [bot]
949de04052
chore: ⬆️ Update ggml-org/llama.cpp to ced765be44ce173c374f295b3c6f4175f8fd109b ( #7822 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2026-01-02 08:44:49 +01:00
LocalAI [bot]
bc3e8793ed
chore: ⬆️ Update ggml-org/llama.cpp to 13814eb370d2f0b70e1830cc577b6155b17aee47 ( #7809 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-31 23:04:01 +01:00
LocalAI [bot]
91978bb3a5
chore: ⬆️ Update ggml-org/whisper.cpp to e9898ddfb908ffaa7026c66852a023889a5a7202 ( #7810 )
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-31 22:59:05 +01:00
Ettore Di Giacinto
797f27f09f
feat(UI): image generation improvements ( #7804 )
...
* chore: drop mode from image generation(unused)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* feat(UI): improve image generation front-end
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* feat(UI): only ref images. files is to be deprecated
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* do not override default steps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-12-31 21:59:46 +01:00
LocalAI [bot]
218f3a126a
chore: ⬆️ Update ggml-org/llama.cpp to 0f89d2ecf14270f45f43c442e90ae433fd82dab1 ( #7795 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-31 08:53:41 +01:00
LocalAI [bot]
bc8ec5cb39
chore: ⬆️ Update ggml-org/llama.cpp to c9a3b40d6578f2381a1373d10249403d58c3c5bd ( #7778 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-30 08:27:16 +01:00
Richard Palethorpe
0b80167912
chore: ⬆️ Update leejet/stable-diffusion.cpp to 4ff2c8c74bd17c2cfffe3a01be77743fb3efba2f ( #7771 )
...
* ⬆️ Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: Add KL_OPTIMAL scheduler, pass sampler to default scheduler for LCM and fixup other refactorings from upstream
Signed-off-by: Richard Palethorpe <io@richiejp.com >
* Delete backend/go/stablediffusion-ggml/compile_commands.json
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com >
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
2025-12-29 19:06:35 +01:00
LocalAI [bot]
1a6fd0f7fc
chore: ⬆️ Update ggml-org/llama.cpp to 4ffc47cb2001e7d523f9ff525335bbe34b1a2858 ( #7760 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-28 21:10:39 +00:00
LocalAI [bot]
c95c482f36
chore: ⬆️ Update ggml-org/llama.cpp to a4bf35889eda36d3597cd0f8f333f5b8a2fcaefc ( #7751 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-27 21:09:12 +00:00
LocalAI [bot]
ddf0281785
chore: ⬆️ Update ggml-org/llama.cpp to 7ac8902133da6eb390c4d8368a7d252279123942 ( #7740 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-26 21:44:34 +00:00
LocalAI [bot]
86c68c9623
chore: ⬆️ Update ggml-org/llama.cpp to 85c40c9b02941ebf1add1469af75f1796d513ef4 ( #7731 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-25 21:10:28 +00:00
LocalAI [bot]
2fe6e278c8
chore: ⬆️ Update ggml-org/llama.cpp to c18428423018ed214c004e6ecaedb0cbdda06805 ( #7718 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-25 10:00:40 +01:00
LocalAI [bot]
ae69921d77
chore: ⬆️ Update ggml-org/whisper.cpp to 6114e692136bea917dc88a5eb2e532c3d133d963 ( #7717 )
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-25 10:00:24 +01:00
Ettore Di Giacinto
0a168830ea
chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params ( #7706 )
...
* chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore: update AGENTS.md
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-12-24 00:28:27 +01:00
Ettore Di Giacinto
fc6057a952
chore(deps): bump llama.cpp to '0e1ccf15c7b6d05c720551b537857ecf6194d420' ( #7684 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-12-22 09:50:42 +01:00
Ettore Di Giacinto
8b3e0ebf8a
chore: allow to set local-ai log format, default to custom one ( #7679 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-12-21 21:21:59 +01:00
Ettore Di Giacinto
c37785b78c
chore(refactor): move logging to common package based on slog ( #7668 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-12-21 19:33:13 +01:00
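These two logging changes center on Go's log/slog package. A minimal sketch of a slog setup with a selectable output format; the format names and wiring are assumptions, not LocalAI's actual configuration flags:

```go
// Minimal sketch of a slog-based logger with a selectable output format
// (format names and defaults here are assumptions, not LocalAI's CLI surface).
package main

import (
	"log/slog"
	"os"
)

func newLogger(format string, level slog.Level) *slog.Logger {
	opts := &slog.HandlerOptions{Level: level}
	var h slog.Handler
	switch format {
	case "json":
		h = slog.NewJSONHandler(os.Stderr, opts)
	default: // plain text, standing in for a custom default format
		h = slog.NewTextHandler(os.Stderr, opts)
	}
	return slog.New(h)
}

func main() {
	slog.SetDefault(newLogger("text", slog.LevelDebug))
	slog.Info("starting", "component", "api", "address", ":8080")
	slog.Debug("backend loaded", "name", "llama-cpp")
}
```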
LocalAI [bot]
38cde81ff4
chore: ⬆️ Update ggml-org/llama.cpp to 52ab19df633f3de5d4db171a16f2d9edd2342fec ( #7665 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-20 21:09:15 +00:00
LocalAI [bot]
626057bcca
chore: ⬆️ Update ggml-org/llama.cpp to ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787 ( #7654 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-19 21:15:39 +00:00
LocalAI [bot]
aa0efeb0a8
chore: ⬆️ Update ggml-org/whisper.cpp to 6c22e792cb0ee155b6587ce71a8410c3aeb06949 ( #7644 )
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-19 09:26:41 +01:00
LocalAI [bot]
f25ac00bca
chore: ⬆️ Update ggml-org/llama.cpp to f9ec8858edea4a0ecfea149d6815ebfb5ecc3bcd ( #7642 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-18 21:17:14 +00:00
Richard Palethorpe
c3494a0927
chore: ⬆️ Update leejet/stable-diffusion.cpp to bda7fab9f208dff4b67179a68f694b6ddec13326 ( #7639 )
...
* ⬆️ Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(stablediffusion-ggml): Don't set removed lora model dir
Signed-off-by: Richard Palethorpe <io@richiejp.com >
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com >
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-18 20:52:22 +01:00
Richard Palethorpe
716dba94b4
feat(whisper): Add prompt to condition transcription output ( #7624 )
...
* chore(makefile): Add buildargs for sd and cuda when building backend
Signed-off-by: Richard Palethorpe <io@richiejp.com >
* feat(whisper): Add prompt to condition transcription output
Signed-off-by: Richard Palethorpe <io@richiejp.com >
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com >
2025-12-18 14:40:45 +01:00
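The OpenAI audio API exposes transcription conditioning as a `prompt` form field; whether this PR wires the feature to exactly that field name in LocalAI's OpenAI-compatible endpoint is an assumption. A sketch of sending a conditioning prompt alongside the audio file, under that assumption:

```go
// Hedged sketch: POST an audio file plus a conditioning prompt to an
// OpenAI-compatible transcription endpoint. The "prompt" field name mirrors
// the OpenAI API and is assumed, not confirmed, for LocalAI here.
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func transcribe(url, audioPath, model, prompt string) (string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)

	f, err := os.Open(audioPath)
	if err != nil {
		return "", err
	}
	defer f.Close()

	part, err := w.CreateFormFile("file", audioPath)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(part, f); err != nil {
		return "", err
	}
	_ = w.WriteField("model", model)
	_ = w.WriteField("prompt", prompt) // assumed field, after the OpenAI API
	w.Close()

	resp, err := http.Post(url, w.FormDataContentType(), &buf)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := transcribe("http://localhost:8080/v1/audio/transcriptions",
		"meeting.wav", "whisper-1", "Acme Corp, Kubernetes, LocalAI")
	fmt.Println(out, err)
}
```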
LocalAI [bot]
5515119a7e
chore: ⬆️ Update ggml-org/llama.cpp to d37fc935059211454e9ad2e2a44e8ed78fd6d1ce ( #7629 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-18 09:07:09 +01:00
LocalAI [bot]
4535e7dfc4
chore: ⬆️ Update ggml-org/whisper.cpp to 3e79e73eee32e924fbd34587f2f2ac5a45a26b61 ( #7630 )
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-18 09:06:48 +01:00
LocalAI [bot]
14bb65b57b
chore: ⬆️ Update ggml-org/llama.cpp to ef83fb8601229ff650d952985be47e82d644bfaa ( #7611 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
2025-12-17 08:32:42 +01:00
Ettore Di Giacinto
61afe4ca60
chore: drop darwin-x86_64 support ( #7616 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-12-16 21:22:15 +01:00
blightbow
67baf66555
feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling ( #7556 )
...
* feat(mlx): add thread-safe LRU prompt cache
Port mlx-lm's LRUPromptCache to fix a race condition where concurrent
requests corrupt shared KV cache state. The previous implementation
used a single prompt_cache instance shared across all requests.
Changes:
- Add backend/python/common/mlx_cache.py with ThreadSafeLRUPromptCache
- Modify backend.py to use per-request cache isolation via fetch/insert
- Add prefix matching for cache reuse across similar prompts
- Add LRU eviction (default 10 entries, configurable)
- Add concurrency and cache unit tests
The cache uses a trie-based structure for efficient prefix matching,
allowing prompts that share common prefixes to reuse cached KV states.
Thread safety is provided via threading.Lock.
New configuration options:
- max_cache_entries: Maximum LRU cache entries (default: 10)
- max_kv_size: Maximum KV cache size per entry (default: None)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
Signed-off-by: Blightbow <blightbow@users.noreply.github.com >
* feat(mlx): add min_p and top_k sampler support
Add MinP field to proto (field 52) following the precedent set by
other non-OpenAI sampling parameters like TopK, TailFreeSamplingZ,
TypicalP, and Mirostat.
Changes:
- backend.proto: Add float MinP field for min-p sampling
- backend.py: Extract and pass min_p and top_k to mlx_lm sampler
(top_k was in proto but not being passed)
- test.py: Fix test_sampling_params to use valid proto fields and
switch to MLX-compatible model (mlx-community/Llama-3.2-1B-Instruct)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
Signed-off-by: Blightbow <blightbow@users.noreply.github.com >
* refactor(mlx): move mlx_cache.py from common to mlx backend
The ThreadSafeLRUPromptCache is only used by the mlx backend. After
evaluating mlx-vlm, it was determined that the cache cannot be shared
because mlx-vlm's generate/stream_generate functions don't support
the prompt_cache parameter that mlx_lm provides.
- Move mlx_cache.py from backend/python/common/ to backend/python/mlx/
- Remove sys.path manipulation from backend.py and test.py
- Fix test assertion to expect "MLX model loaded successfully"
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
Signed-off-by: Blightbow <blightbow@users.noreply.github.com >
* test(mlx): add comprehensive cache tests and document upstream behavior
Added comprehensive unit tests (test_mlx_cache.py) covering all cache
operation modes:
- Exact match
- Shorter prefix match
- Longer prefix match with trimming
- No match scenarios
- LRU eviction and access order
- Reference counting and deep copy behavior
- Multi-model namespacing
- Thread safety with data integrity verification
Documents upstream mlx_lm/server.py behavior: single-token prefixes are
deliberately not matched (uses > 0, not >= 0) to allow longer cached
sequences to be preferred for trimming. This is acceptable because real
prompts with chat templates are always many tokens.
Removed weak unit tests from test.py that only verified "no exception
thrown" rather than correctness.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
Signed-off-by: Blightbow <blightbow@users.noreply.github.com >
* chore(mlx): remove unused MinP proto field
The MinP field was added to PredictOptions but is not populated by the
Go frontend/API. The MLX backend uses getattr with a default value,
so it works without the proto field.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
Signed-off-by: Blightbow <blightbow@users.noreply.github.com >
---------
Signed-off-by: Blightbow <blightbow@users.noreply.github.com >
Co-authored-by: Blightbow <blightbow@users.noreply.github.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2025-12-16 11:27:46 +01:00
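The cache design described in this entry (per-request fetch/insert, prefix matching, LRU eviction, a lock for thread safety) lives in backend/python/mlx/mlx_cache.py. The following is a language-agnostic sketch of the same idea in Go, not a port of the Python code; names and structure are illustrative, and per-model namespacing and trimming are omitted:

```go
// Illustrative sketch of a thread-safe LRU prompt cache with prefix matching:
// a request fetches (and takes ownership of) the entry sharing the longest
// token prefix, so concurrent requests never mutate one shared KV state, and
// reinserts its state afterwards, evicting the least recently used entry.
package main

import (
	"fmt"
	"sync"
)

type entry struct {
	tokens []int
	state  any // opaque per-prefix KV cache state
}

type PromptCache struct {
	mu      sync.Mutex
	max     int
	entries []entry // index 0 = least recently used
}

func commonPrefix(a, b []int) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// Fetch removes and returns the entry with the longest shared prefix, giving
// the caller exclusive ownership for the duration of the request.
func (c *PromptCache) Fetch(prompt []int) (entry, int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	best, bestLen := -1, 0
	for i, e := range c.entries {
		if n := commonPrefix(e.tokens, prompt); n > bestLen {
			best, bestLen = i, n
		}
	}
	if best < 0 {
		return entry{}, 0, false
	}
	e := c.entries[best]
	c.entries = append(c.entries[:best], c.entries[best+1:]...)
	return e, bestLen, true
}

// Insert puts the finished request's state back, evicting the LRU entry if full.
func (c *PromptCache) Insert(e entry) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.entries) >= c.max {
		c.entries = c.entries[1:] // drop least recently used
	}
	c.entries = append(c.entries, e)
}

func main() {
	cache := &PromptCache{max: 10}
	cache.Insert(entry{tokens: []int{1, 2, 3, 4}, state: "kv-A"})
	if e, n, ok := cache.Fetch([]int{1, 2, 3, 9}); ok {
		fmt.Printf("reused %d prefix tokens from %v\n", n, e.state)
	}
}
```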
dependabot[bot]
dbd25885c3
chore(deps): bump sentence-transformers from 5.1.0 to 5.2.0 in /backend/python/transformers ( #7594 )
...
chore(deps): bump sentence-transformers in /backend/python/transformers
Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers ) from 5.1.0 to 5.2.0.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases )
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.1.0...v5.2.0 )
---
updated-dependencies:
- dependency-name: sentence-transformers
dependency-version: 5.2.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 09:12:57 +01:00
LocalAI [bot]
7a3b0bbfaa
chore: ⬆️ Update leejet/stable-diffusion.cpp to 200cb6f2ca07e40fa83b610a4e595f4da06ec709 ( #7597 )
...
⬆️ Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-16 08:15:15 +01:00
Ettore Di Giacinto
2387b266d8
chore(llama.cpp): Add Missing llama.cpp Options to gRPC Server ( #7584 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-12-15 21:55:20 +01:00
LocalAI [bot]
0f5cc4c07b
chore: ⬆️ Update ggml-org/llama.cpp to 5c8a717128cc98aa9e5b1c44652f5cf458fd426e ( #7573 )
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com >
2025-12-14 22:21:54 +01:00