LocalAI [bot]
6e5a58ca70
feat: Add Free RPC to backend.proto for VRAM cleanup (#8751)
...
* fix: Add VRAM cleanup when stopping models
- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
to ensure VRAM is properly released when models are unloaded
Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.
* feat: Add Free RPC to backend.proto for VRAM cleanup
- Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
- This RPC is required to properly expose the Free() method
through the gRPC interface for VRAM resource cleanup
Refs: PR #8739
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-03 12:39:06 +01:00
Ettore Di Giacinto
1c8db3846d
chore(faster-qwen3-tts): Add anyio to requirements.txt
...
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-03 09:43:29 +01:00
LocalAI [bot]
d846ad3a84
chore: ⬆️ Update ggml-org/llama.cpp to 4d828bd1ab52773ba9570cc008cf209eb4a8b2f5 (#8727)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-02 23:22:28 +01:00
LocalAI [bot]
2dd4e7cdc3
fix(qwen-tts): ensure all requirements files end with newline (#8724)
...
- Add trailing newline to all requirements*.txt files in qwen-tts backend
- This ensures proper file formatting and prevents potential issues with
package installation tools that expect newline-terminated files
2026-03-02 13:56:11 +01:00
LocalAI [bot]
b61536c0f4
chore: ⬆️ Update ggml-org/llama.cpp to 319146247e643695f94a558e8ae686277dd4f8da (#8707)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-02 10:08:51 +01:00
LocalAI [bot]
8b430c577b
feat: Add debug logging for pocket-tts voice issue #8244 (#8715)
...
Adds debug logging to help investigate the pocket-tts custom voice
discovery issue (Issue #8244). This is a first step toward understanding
how voices are loaded and where the failure occurs.
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-02 09:24:59 +01:00
LocalAI [bot]
ddb36468ed
chore: ⬆️ Update ggml-org/llama.cpp to 05728db18eea59de81ee3a7699739daaf015206b (#8683)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 00:48:26 +01:00
Ettore Di Giacinto
1c5dc83232
chore(deps): bump llama.cpp to 'ecbcb7ea9d3303097519723b264a8b5f1e977028' (#8672)
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-28 00:33:56 +01:00
LocalAI [bot]
73b997686a
chore: ⬆️ Update ggml-org/whisper.cpp to 9453b4b9be9b73adfc35051083f37cefa039acee (#8671)
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-27 21:28:48 +00:00
LocalAI [bot]
dfc6efb88d
feat(backends): add faster-qwen3-tts (#8664)
...
* feat(backends): add faster-qwen3-tts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: this backend is CUDA only
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: add requirements-install.txt with setuptools for build isolation
The faster-qwen3-tts backend requires setuptools to build packages
like sox that have setuptools as a build dependency. This ensures
the build completes successfully in CI.
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-27 08:16:51 +01:00
LocalAI [bot]
8ad40091a6
chore: ⬆️ Update ggml-org/llama.cpp to 723c71064da0908c19683f8c344715fbf6d986fd (#8660)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-26 21:34:47 +00:00
LocalAI [bot]
fb86f6461d
chore: ⬆️ Update ggml-org/llama.cpp to 3769fe6eb70b0a0fbb30b80917f1caae68c902f7 (#8655)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-26 00:05:03 +01:00
Ettore Di Giacinto
b032cf489b
fix(chatterbox): add support for cuda13/aarch64 (#8653)
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-25 21:51:44 +01:00
dependabot[bot]
c4783a0a05
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/vllm (#8635)
...
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)
---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:17:32 +01:00
dependabot[bot]
c44f03b882
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/rerankers (#8636)
...
chore(deps): bump grpcio in /backend/python/rerankers
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)
---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:57 +01:00
dependabot[bot]
eeec92af78
chore(deps): bump sentence-transformers from 5.2.2 to 5.2.3 in /backend/python/transformers (#8638)
...
chore(deps): bump sentence-transformers in /backend/python/transformers
Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.2 to 5.2.3.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.2...v5.2.3)
---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:41 +01:00
dependabot[bot]
842033b8b5
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/transformers (#8640)
...
chore(deps): bump grpcio in /backend/python/transformers
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)
---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:55 +01:00
dependabot[bot]
a2941228a7
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/common/template (#8641)
...
chore(deps): bump grpcio in /backend/python/common/template
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)
---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:43 +01:00
dependabot[bot]
791e6b84ee
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/coqui (#8642)
...
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)
---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:30 +01:00
LocalAI [bot]
1331e23b67
chore: ⬆️ Update ggml-org/llama.cpp to 418dea39cea85d3496c8b04a118c3b17f3940ad8 (#8649)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-25 00:04:48 +00:00
LocalAI [bot]
9a5b5ee8a9
chore: ⬆️ Update ggml-org/llama.cpp to b68a83e641b3ebe6465970b34e99f3f0e0a0b21a (#8628)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-23 22:02:40 +00:00
LocalAI [bot]
f40c8dd0ce
chore: ⬆️ Update ggml-org/llama.cpp to 2b6dfe824de8600c061ef91ce5cc5c307f97112c (#8622)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-23 09:30:58 +00:00
LocalAI [bot]
91f2dd5820
chore: ⬆️ Update ggml-org/llama.cpp to f75c4e8bf52ea480ece07fd3d9a292f1d7f04bc5 (#8619)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-22 13:20:08 +01:00
LocalAI [bot]
fcecc12e57
chore: ⬆️ Update ggml-org/llama.cpp to ba3b9c8844aca35ecb40d31886686326f22d2214 (#8613)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-21 09:57:04 +01:00
LocalAI [bot]
bb0924dff1
chore: ⬆️ Update ggml-org/llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#8612)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-20 23:47:47 +01:00
LocalAI [bot]
b1c434f0fc
chore: ⬆️ Update ggml-org/llama.cpp to 11c325c6e0666a30590cde390d5746a405e536b9 (#8607)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-19 23:32:35 +01:00
LocalAI [bot]
bb42b342de
chore: ⬆️ Update ggml-org/whisper.cpp to 21411d81ea736ed5d9cdea4df360d3c4b60a4adb (#8606)
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-19 23:32:21 +01:00
LocalAI [bot]
e555057f8b
fix: multi-GPU support for Diffusers (Issue #8575) (#8605)
...
* chore: init
* feat: implement multi-GPU support for Diffusers backend (fixes #8575)
---------
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-19 21:35:58 +01:00
Ettore Di Giacinto
dadc7158fb
fix(diffusers): sd_embed is not always available (#8602)
...
sd_embed doesn't seem to play well with MPS and L4T, so it is made optional.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 10:45:17 +01:00
LocalAI [bot]
68c7077491
chore: ⬆️ Update ggml-org/llama.cpp to b55dcdef5dcd74dc75c4921090e928d43453c157 (#8599)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-18 22:33:25 +01:00
LocalAI [bot]
ed832cf0e0
chore: ⬆️ Update ggml-org/llama.cpp to 2b089c77580d347767f440205103e4da8ec33d89 (#8592)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-17 22:35:07 +00:00
Richard Palethorpe
9e692967c3
fix(llama-cpp): Pass parameters when using embedded template (#8590)
...
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-17 18:50:05 +01:00
LocalAI [bot]
067a255435
chore: ⬆️ Update ggml-org/llama.cpp to d612901116ab2066c7923372d4827032ff296bc4 (#8588)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-17 00:57:32 +01:00
LocalAI [bot]
109f29cc24
chore: ⬆️ Update ggml-org/llama.cpp to 27b93cbd157fc4ad94573a1fbc226d3e18ea1bb4 (#8577)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 23:42:36 +01:00
LocalAI [bot]
587e4a21b3
chore: ⬆️ Update antirez/voxtral.c to 134d366c24d20c64b614a3dcc8bda2a6922d077d (#8578)
...
⬆️ Update antirez/voxtral.c
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 23:42:11 +01:00
LocalAI [bot]
3f1f58b2ab
chore: ⬆️ Update ggml-org/whisper.cpp to 364c77f4ca2737e3287652e0e8a8c6dce3231bba (#8576)
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 21:20:04 +00:00
LocalAI [bot]
d784851337
chore: ⬆️ Update ggml-org/llama.cpp to 01d8eaa28d57bfc6d06e30072085ed0ef12e06c5 (#8567)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-14 22:52:32 +01:00
LocalAI [bot]
94df096fb9
fix: pin neutts-air to known working commit (#8566)
...
* chore: init
* fix: pin neutts-air to known working commit
---------
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-14 21:16:37 +01:00
Ettore Di Giacinto
820bd7dd01
fix(ci): try to fix deps for l4t13 on qwen-*
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-14 10:21:23 +01:00
Austen
42cb7bda19
fix(llama-cpp): populate tensor_buft_override buffer so llama-cpp properly performs fit calculations (#8560)
...
fix auto-fit for llama-cpp
2026-02-14 10:07:37 +01:00
Ettore Di Giacinto
2fb9940b8a
fix(voxcpm): pin setuptools (#8556)
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-13 23:44:35 +01:00
LocalAI [bot]
2ff0ad4190
chore: ⬆️ Update ggml-org/llama.cpp to 05a6f0e8946914918758db767f6eb04bc1e38507 (#8553)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-13 22:48:01 +01:00
Ettore Di Giacinto
2fd026e958
fix: update moonshine API, add setuptools to voxcpm requirements (#8541)
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 23:22:37 +01:00
LocalAI [bot]
08718b656e
chore: ⬆️ Update ggml-org/llama.cpp to 338085c69e486b7155e5b03d7b5087e02c0e2528 (#8538)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-12 23:21:53 +01:00
Austen
cff972094c
feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504)
...
* add experimental support for sd_embed-style prompt embedding
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
* add doc equivalent to compel
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
* need to use flux1 embedding function for flux model
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
---------
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
2026-02-11 22:58:19 +01:00
LocalAI [bot]
79a25f7ae9
chore: ⬆️ Update ggml-org/llama.cpp to 4d3daf80f8834e0eb5148efc7610513f1e263653 (#8513)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-11 21:48:39 +00:00
LocalAI [bot]
0ee92317ec
chore: ⬆️ Update ggml-org/llama.cpp to 57487a64c88c152ac72f3aea09bd1cc491b2f61e (#8499)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 21:32:46 +00:00
LocalAI [bot]
743d2d1947
chore: ⬆️ Update ggml-org/whisper.cpp to 764482c3175d9c3bc6089c1ec84df7d1b9537d83 (#8478)
...
⬆️ Update ggml-org/whisper.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:59 +01:00
LocalAI [bot]
df04843f34
chore: ⬆️ Update ggml-org/llama.cpp to 262364e31d1da43596fe84244fba44e94a0de64e (#8479)
...
⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:33 +01:00
LocalAI [bot]
0c040beb59
chore: ⬆️ Update antirez/voxtral.c to c9e8773a2042d67c637fc492c8a655c485354080 (#8477)
...
⬆️ Update antirez/voxtral.c
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 22:20:03 +01:00