Compare commits

..

233 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
4b0dbe8ae0 Initial plan 2026-02-02 21:21:41 +00:00
Ettore Di Giacinto
08b2b8d755 fix(libbackend): do not inject --index-strategy unsafe-best-match to all
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-02 17:13:45 +01:00
Dream
10a1e6c74d feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299)
* feat(proto): add speaker field to TranscriptSegment for diarization

Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx backend for transcription with diarization

Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): register whisperx backend in Makefile

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx meta and image entries to index.yaml

Signed-off-by: eureka928 <meobius123@gmail.com>

* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): unpin torch versions and use CPU index for cpu requirements

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch ROCm variant to fix CI build failure

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch CPU variant to fix uv resolution failure

Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure

uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure

PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution

uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.

Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

* revert: scope hipblas suffix fix to whisperx only

Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

---------

Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:33:12 +01:00
Alex O'Connell
b7585ca738 fix(api): Add missing field in initial OpenAI streaming response (#8341)
Add missing field in initial OpenAI streaming response

Signed-off-by: Alex O'Connell <35843486+acon96@users.noreply.github.com>
2026-02-02 08:30:04 +01:00
LocalAI [bot]
8cae99229c chore: ⬆️ Update ggml-org/llama.cpp to 2634ed207a17db1a54bd8df0555bd8499a6ab691 (#8336)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-01 21:23:57 +00:00
rampa3
04e0f444e1 chore(model gallery): Add Qwen 3 VL 8B thinking & instruct (#8329)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-01 17:34:31 +01:00
rampa3
6f410f4cbe chore(model gallery): Rename downloaded filename for Magistral Small mmproj (#8327)
Rename downloaded filename for Magistral Small mmproj

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-01 17:33:48 +01:00
Ettore Di Giacinto
800f749c7b fix: drop gguf VRAM estimation (now redundant) (#8325)
fix: drop gguf VRAM estimation

Cleanup. This is now handled directly in llama.cpp, no need to estimate from Go.

VRAM estimation in general is tricky, but llama.cpp ( 41ea26144e/src/llama.cpp (L168) ) lately has added an automatic "fitting" of models to VRAM, so we can drop backend-specific GGUF VRAM estimation from our code instead of trying to guess as we already enable it

 397f7f0862/backend/cpp/llama-cpp/grpc-server.cpp (L393)

Fixes: https://github.com/mudler/LocalAI/issues/8302
See: https://github.com/mudler/LocalAI/issues/8302#issuecomment-3830773472
2026-02-01 17:33:28 +01:00
Andres
b6459ddd57 feat(api): Add transcribe response format request parameter & adjust STT backends (#8318)
* WIP response format implementation for audio transcriptions

(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Rework transcript response_format and add more formats

(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Add test and replace go-openai package with official openai go client

(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Fix faster-whisper backend and refactor transcription formatting to also work on CLI

Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-01 17:33:17 +01:00
Ettore Di Giacinto
397f7f0862 fix(ui): take account of reasoning in token count calculation (#8324)
We were skipping reasoning traces when counting tokens, yielding to a
wrong sum count.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-01 10:48:31 +01:00
LocalAI [bot]
234072769c chore(model gallery): 🤖 add 1 new models via gallery agent (#8321)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-01 09:02:05 +01:00
LocalAI [bot]
3445415b3d chore: ⬆️ Update ggml-org/llama.cpp to 41ea26144e55d23f37bb765f88c07588d786567f (#8317)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-31 21:18:31 +00:00
LocalAI [bot]
b05e110aa6 chore: ⬆️ Update ggml-org/llama.cpp to 1488339138d609139c4400d1b80f8a5b1a9a203c (#8306)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-31 08:59:09 +01:00
LocalAI [bot]
e69cba2444 chore: ⬆️ Update ggml-org/whisper.cpp to aa1bc0d1a6dfd70dbb9f60c11df12441e03a9075 (#8305)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-31 08:58:54 +01:00
LocalAI [bot]
f7903597ac chore(model-gallery): ⬆️ update checksum (#8307)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-30 21:27:48 +01:00
LocalAI [bot]
ee76a0cd1c feat(swagger): update swagger (#8304)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-30 21:27:10 +01:00
Ettore Di Giacinto
4ca5b737bf chore(cuda): target 12.8 for 12 to increase compatibility (#8297)
Some datacenter setups might be stuck with the 5.x kernel which doesn't
play well with CUDA >=12.9. To incrase compatibility with the CUDA 12.x
branch, downgrade to 12.8. For newer systems, it is still suggested to
use CUDA 13.x wherever compatible.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-30 12:58:44 +01:00
Ettore Di Giacinto
4077aaf978 chore: re-enable e2e tests, fixups anthropic API tools support (#8296)
* chore(tests): add mock backend e2e tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixup anthropic tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* prepare e2e tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop repetitive tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop specific CI workflow

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixup anthropic issues, move all e2e tests to use mocked backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-30 12:41:50 +01:00
Ettore Di Giacinto
68dd9765a0 feat(tts): add support for streaming mode (#8291)
* feat(tts): add support for streaming mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Send first audio, make sure it's 16

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-30 11:58:01 +01:00
LocalAI [bot]
2c44b06a67 chore: ⬆️ Update ggml-org/llama.cpp to 4fdbc1e4dba428ce0cf9d2ac22232dc170bbca82 (#8283)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-29 23:43:29 +01:00
LocalAI [bot]
7cc90db3e5 chore(model-gallery): ⬆️ update checksum (#8285)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-29 21:51:18 +01:00
Ettore Di Giacinto
1e08e02598 feat(qwen-asr): add support to qwen-asr (#8281)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-29 21:50:35 +01:00
Richard Palethorpe
dd8e74a486 feat(realtime): Add audio conversations (#6245)
* feat(realtime): Add audio conversations

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(realtime): Vendor the updated API and modify for server side

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): Update to the GA realtime API

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore: Document realtime API and add docs to AGENTS.md

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat: Filter reasoning from spoken output

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Send delta and done events for tool calls and audio transcripts

Ensure that content is sent in both deltas and done events for function call arguments and audio transcripts. This fixes compatibility with clients that rely on delta events for parsing.

💘 Generated with Crush

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Improve tool call handling and error reporting

- Refactor Model interface to accept []types.ToolUnion and *types.ToolChoiceUnion
  instead of JSON strings, eliminating unnecessary marshal/unmarshal cycles
- Fix Parameters field handling: support both map[string]any and JSON string formats
- Add PredictConfig() method to Model interface for accessing model configuration
- Add comprehensive debug logging for tool call parsing and function config
- Add missing return statement after prediction error (critical bug fix)
- Add warning logs for NoAction function argument parsing failures
- Improve error visibility throughout generateResponse function

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-29 08:44:53 +01:00
Ettore Di Giacinto
48e08772f3 chore(llama.cpp): bump to 'f6b533d898ce84bae8d9fa8dfc6697ac087800bf' (#8275)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-29 00:22:25 +01:00
LocalAI [bot]
c28c0227c6 chore: ⬆️ Update leejet/stable-diffusion.cpp to e411520407663e1ddf8ff2e5ed4ff3a116fbbc97 (#8274)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-28 21:23:05 +00:00
Richard Palethorpe
856ca2d6b1 fix(qwen3): Be explicit with function calling format (#8265)
Qwen3 4b was using the wrong function format (i.e. using "function"
instead of "name") within the realtime API.

If we specify the function calling format explicitly then it stops it.

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-28 14:44:29 +01:00
Ettore Di Giacinto
9b973b79f6 feat: add VoxCPM tts backend (#8109)
* feat: add VoxCPM tts backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable voxcpm on arm64 cpu

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-28 14:44:04 +01:00
Ettore Di Giacinto
cba8ef4e38 chore: fix backend icons
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-28 09:09:00 +01:00
Ettore Di Giacinto
f729e300d6 chore: fix backend icons
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-28 09:08:03 +01:00
LocalAI [bot]
9916811a79 chore: ⬆️ Update ggml-org/llama.cpp to 2b4cbd2834e427024bc7f935a1f232aecac6679b (#8258)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-28 08:50:16 +01:00
Ettore Di Giacinto
2f7c595cd1 chore(model gallery): add z-image and z-image-turbo for diffusers (#8260)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-27 22:42:10 +01:00
rampa3
73decac746 chore(model gallery): Add mistral-community/pixtral-12b with mmproj (#8245)
Rebased branch add_pixtral on master

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-27 21:43:31 +01:00
Ettore Di Giacinto
ec1598868b feat(vibevoice): add ASR support (#8222)
* feat(vibevoice): add ASR support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): download voice files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run on bigger runner

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* CI can't hold vibevoice

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-27 20:19:22 +01:00
rampa3
93d7e5d4b8 chore(model gallery): Add entry for Magistral Small 1.2 with mmproj (#8248)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-27 16:55:00 +01:00
rampa3
ff5a54b9d1 chore(model gallery): Add entry for Mistral Small 3.1 with mmproj (#8247)
* chore(model gallery): Add entry for Mistral Small 3.1 with mmproj

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

* Use llama-cpp subfolder structure akin to Qwen 3 VL

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

---------

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-27 16:54:14 +01:00
LocalAI [bot]
3c1f823c47 chore: ⬆️ Update ggml-org/llama.cpp to 8f80d1b254aef70a0959e314be368d05debe7294 (#8229)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 21:19:43 +00:00
LocalAI [bot]
4024220d00 chore(model gallery): 🤖 add 1 new models via gallery agent (#8220)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 12:11:24 +01:00
LocalAI [bot]
f76958d761 chore: ⬆️ Update ggml-org/llama.cpp to 0440bfd1605333726ea0fb7a836942660bf2f9a6 (#8216)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 00:50:35 +01:00
LocalAI [bot]
2bd5ca45de chore: ⬆️ Update leejet/stable-diffusion.cpp to 43e829f21966abb96b08c712bccee872dc820914 (#8215)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 00:50:16 +01:00
Ettore Di Giacinto
6804ce1c39 chore(docs): change MEMORY_FILE_PATH to MEMORY_INDEX_PATH
Updated MEMORY_FILE_PATH to MEMORY_INDEX_PATH in memory configuration.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-25 22:14:11 +01:00
Dedy F. Setyawan
d499071bff fix(ui): correctly display selected image model (#8208)
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
2026-01-25 14:54:40 +01:00
Ettore Di Giacinto
26a374b717 chore: drop bark which is unmaintained (#8207)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-25 09:26:40 +01:00
rampa3
980de0e25b chore(model gallery): Add most of not yet present Piper voices from Hugging Face (#8202)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-25 08:56:53 +01:00
Ettore Di Giacinto
4767371aee chore(README): Add links
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 22:49:27 +01:00
Ettore Di Giacinto
131d247b78 chore(README): Update and simplify links
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 22:46:40 +01:00
Ettore Di Giacinto
b2a8a63899 feat(vllm-omni): add new backend (#8188)
* feat(vllm-omni: add new backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* default to py3.12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 22:23:30 +01:00
LocalAI [bot]
05a332cd5f chore: ⬆️ Update ggml-org/llama.cpp to bb02f74c612064947e51d23269a1cf810b67c9a7 (#8196)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-24 21:19:43 +00:00
Ettore Di Giacinto
05904c77f5 chore(exllama): drop backend now almost deprecated (#8186)
exllama2 development has stalled and only old architectures are
supported. exllamav3 is still in development, meanwhile cleaning up
exllama2 from the gallery.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 08:57:37 +01:00
LocalAI [bot]
17783fa7d9 chore: ⬆️ Update leejet/stable-diffusion.cpp to fa61ea744d1a87fa26a63f8a86e45587bc9534d6 (#8184)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-24 08:57:24 +01:00
LocalAI [bot]
4019094111 chore: ⬆️ Update ggml-org/llama.cpp to 557515be1e93ed8939dd8a7c7d08765fdbe8be31 (#8183)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-24 08:57:08 +01:00
Ettore Di Giacinto
ca65fc751a chore(model gallery): add qwen3-tts to model gallery (#8187)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-23 23:06:50 +01:00
LocalAI [bot]
a1e3acc590 docs: ⬆️ update docs version mudler/LocalAI (#8182)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-23 22:03:47 +01:00
Ettore Di Giacinto
a36960e069 fix(qwen-tts): change icon URL in index.yaml
Updated the icon URL for the project.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-23 22:00:14 +01:00
Ettore Di Giacinto
58bb6a29ed Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory" (#8180)
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/ba…"

This reverts commit 5881c82413.
2026-01-23 17:25:04 +01:00
dependabot[bot]
5881c82413 chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory (#8175)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/bark directory: torch.


Updates `torch` from 2.4.1 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-23 15:32:15 +00:00
Ettore Di Giacinto
923ebbb344 feat(qwen-tts): add Qwen-tts backend (#8163)
* feat(qwen-tts): add Qwen-tts backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update intel deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop flash-attn for cuda13

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-23 15:18:41 +01:00
LocalAI [bot]
ea51567b89 chore(model gallery): 🤖 add 1 new models via gallery agent (#8170)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-23 08:19:39 +01:00
LocalAI [bot]
552c62a19c chore: ⬆️ Update leejet/stable-diffusion.cpp to 5e4579c11d0678f9765463582d024e58270faa9c (#8166)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-23 08:18:05 +01:00
Ettore Di Giacinto
c0b21a921b feat: detect thinking support from backend automatically if not explicitly set (#8167)
detect thinking support from backend automatically if not explicitly set

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-23 00:38:28 +01:00
LocalAI [bot]
b10045adc2 chore: ⬆️ Update ggml-org/llama.cpp to a5eaa1d6a3732bc0f460b02b61c95680bba5a012 (#8165)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-22 23:32:05 +00:00
Ettore Di Giacinto
61b5e3b629 chore: drop test file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-22 22:19:38 +00:00
Ettore Di Giacinto
e35d7cb3b3 chore: drop test file
the function now was removed

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-22 21:47:52 +00:00
Ettore Di Giacinto
0fa0ac4797 fix(videogen): drop incomplete endpoint, add GGUF support for LTX-2 (#8160)
* Debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop openai video endpoint (is not complete)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add download button

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-22 14:09:20 +01:00
LocalAI [bot]
be7ed85838 chore(model gallery): 🤖 add 1 new models via gallery agent (#8157)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-22 08:25:40 +01:00
LocalAI [bot]
c12b310028 chore: ⬆️ Update ggml-org/llama.cpp to c301172f660a1fe0b42023da990bf7385d69adb4 (#8151)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-22 00:51:22 +01:00
LocalAI [bot]
0447d5564d chore: ⬆️ Update leejet/stable-diffusion.cpp to 329571131d62d64a4f49e1acbef49ae02544fdcd (#8152)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-22 00:50:41 +01:00
Ettore Di Giacinto
22c0eb5421 chore(diffusers): add 'av' to requirements.txt (#8155)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-21 22:35:00 +01:00
LocalAI [bot]
a0a00fb937 chore(model-gallery): ⬆️ update checksum (#8153)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 21:45:11 +01:00
LocalAI [bot]
6dd44742ea feat(swagger): update swagger (#8150)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 21:44:44 +01:00
Richard Palethorpe
00c72e7d3e fix(tracing): Create trace buffer on first request to enable tracing at runtime (#8148)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-21 18:39:39 +01:00
LocalAI [bot]
d01c335cf6 chore: ⬆️ Update ggml-org/whisper.cpp to 7aa8818647303b567c3a21fe4220b2681988e220 (#8146)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 17:44:01 +01:00
LocalAI [bot]
5687df4535 chore: ⬆️ Update ggml-org/llama.cpp to ad8d85bd94cc86e89d23407bdebf98f2e6510c61 (#8145)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 15:41:36 +00:00
Ettore Di Giacinto
f5fade97e6 chore: drop noisy logs (#8142)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 09:52:20 +01:00
Ettore Di Giacinto
b88ae31e4e chore(model gallery): add flux 2 and flux 2 klein (#8141)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 09:46:33 +01:00
Ettore Di Giacinto
f6daaa7c35 chore(deps): Bump llama.cpp to '1c7cf94b22a9dc6b1d32422f72a627787a4783a3' (#8136)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 00:12:13 +01:00
Ettore Di Giacinto
c491c6ca90 feat(openresponses): Support reasoning blocks (#8133)
* feat(openresponses): support reasoning blocks

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* allow to disable reasoning, refactor common logic

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add option to only strip reasoning

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add configurations for custom reasoning tokens

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 00:11:45 +01:00
Ettore Di Giacinto
34e054f607 fix(reasoning): support models with reasoning without starting thinking tag (#8132)
* chore: extract reasoning to its own package

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make sure we detect thinking tokens from template

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Allow to override via config, add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-20 21:07:59 +01:00
LocalAI [bot]
e886bb291a chore(model gallery): 🤖 add 1 new models via gallery agent (#8128)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-20 12:58:29 +01:00
Ettore Di Giacinto
4bf2f8bbd8 chore(docs): update docs with Anthropic API and openresponses
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-20 09:25:24 +01:00
LocalAI [bot]
d3525b7509 chore: ⬆️ Update ggml-org/llama.cpp to 959ecf7f234dc0bc0cd6829b25cb0ee1481aa78a (#8122)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-19 22:50:47 +01:00
LocalAI [bot]
c8aa821e0e chore: ⬆️ Update leejet/stable-diffusion.cpp to a48b4a3ade9972faf0adcad47e51c6fc03f0e46d (#8121)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-19 22:27:46 +01:00
dependabot[bot]
b3191927ae chore(deps): bump github.com/mudler/cogito from 0.7.2 to 0.8.1 (#8124)
Bumps [github.com/mudler/cogito](https://github.com/mudler/cogito) from 0.7.2 to 0.8.1.
- [Release notes](https://github.com/mudler/cogito/releases)
- [Commits](https://github.com/mudler/cogito/compare/v0.7.2...v0.8.1)

---
updated-dependencies:
- dependency-name: github.com/mudler/cogito
  dependency-version: 0.8.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-19 22:26:26 +01:00
LocalAI [bot]
54c5a2d9ea docs: ⬆️ update docs version mudler/LocalAI (#8120)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-19 21:18:24 +00:00
Ettore Di Giacinto
0279591fec Enable reranking for Qwen3-VL-Reranker-8B
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-19 15:28:58 +01:00
LocalAI [bot]
8845186955 chore: ⬆️ Update leejet/stable-diffusion.cpp to 2efd19978dd4164e387bf226025c9666b6ef35e2 (#8099)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-18 22:40:35 +01:00
LocalAI [bot]
ab8ed24358 chore: ⬆️ Update ggml-org/llama.cpp to 287a33017b32600bfc0e81feeb0ad6e81e0dd484 (#8100)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-18 22:40:14 +01:00
LocalAI [bot]
a021df5a88 feat(swagger): update swagger (#8098)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-18 22:10:06 +01:00
Ettore Di Giacinto
5f403b1631 chore: drop neutts for l4t (#8101)
Builds exhausts CI currently, and there are better backends at this
point in time. We will probably deprecate it in the future.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-18 21:55:56 +01:00
rampa3
897ad1729e chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request (#8082)
* chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

* added missing model config import URL

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

---------

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-18 09:23:07 +01:00
LocalAI [bot]
16a18a2e55 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9565c7f6bd5fcff124c589147b2621244f2c4aa1 (#8086)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 22:12:21 +01:00
Ettore Di Giacinto
3387bfaee0 feat(api): add support for open responses specification (#8063)
* feat: openresponses

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add ttl settings, fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: register cors middleware by default

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* satisfy schema

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Logitbias and logprobs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add grammar

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* SSE compliance

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* tool JSON conversion

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* support background mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* swagger

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop code. This is handled in the handler

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* background mode for MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 22:11:47 +01:00
LocalAI [bot]
1cd33047b4 chore: ⬆️ Update ggml-org/llama.cpp to 2fbde785bc106ae1c4102b0e82b9b41d9c466579 (#8087)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 21:10:18 +00:00
Ettore Di Giacinto
1de045311a chore(ui): add video generation link (#8079)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 09:49:47 +01:00
LocalAI [bot]
5fe9bf9f84 chore: ⬆️ Update ggml-org/whisper.cpp to f53dc74843e97f19f94a79241357f74ad5b691a6 (#8074)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 08:32:53 +01:00
LocalAI [bot]
d4fd0c0609 chore: ⬆️ Update ggml-org/llama.cpp to 388ce822415f24c60fcf164a321455f1e008cafb (#8073)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-16 21:22:33 +00:00
Ettore Di Giacinto
d16722ee13 Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory" (#8072)
Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/pyt…"

This reverts commit 1f10ab39a9.
2026-01-16 20:50:33 +01:00
dependabot[bot]
1f10ab39a9 chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory (#8066)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: [torch](https://github.com/pytorch/pytorch).


Updates `torch` from 2.3.1+cxx11.abi to 2.8.0
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.8.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.8.0
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-16 19:38:12 +00:00
Ettore Di Giacinto
4d36e393d1 fix(ci): use more beefy runner for expensive jobs (#8065)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-16 19:26:40 +01:00
LocalAI [bot]
cb8616c7d1 chore: ⬆️ Update ggml-org/llama.cpp to 785a71008573e2d84728fb0ba9e851d72d3f8fab (#8053)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-15 22:53:17 +01:00
LocalAI [bot]
ff31d50488 chore: ⬆️ Update ggml-org/whisper.cpp to 2eeeba56e9edd762b4b38467bab96c2517163158 (#8052)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-15 22:52:56 +01:00
Divyanshupandey007
1a50717e33 fix: reduce log verbosity for /api/operations polling (#8050)
* fix: reduce log verbosity for /api/operations polling

Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes #7989.

* fix: reduce log verbosity for /api/operations polling

Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes #7989.
2026-01-15 21:13:13 +01:00
LocalAI [bot]
49d6305509 chore: ⬆️ Update ggml-org/llama.cpp to d98b548120eecf98f0f6eaa1ba7e29b3afda9f2e (#8040)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-15 08:39:46 +01:00
Ettore Di Giacinto
d20a113aef fix(functions): do not duplicate function when valid JSON is inside XML tags (#8043)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 23:42:00 +01:00
LocalAI [bot]
cbaa793520 chore: ⬆️ Update ggml-org/whisper.cpp to 47af2fb70f7e4ee1ba40c8bed513760fdfe7a704 (#8039)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 22:12:32 +01:00
Ettore Di Giacinto
6fe3fc880f Update section headers in README.md for clarity
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-14 22:11:58 +01:00
Ettore Di Giacinto
752e641c48 Clarify Docker usage in README
Updated Docker section in README to clarify usage.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-14 22:10:59 +01:00
Ettore Di Giacinto
44d78b4d15 chore(doc): put alert on install.sh until is fixed (#8042)
See: https://github.com/mudler/LocalAI/issues/8032

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 22:08:48 +01:00
Ettore Di Giacinto
64d0a96ba3 feat(ui): add video gen UI (#8020)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 11:43:32 +01:00
Ettore Di Giacinto
b19afc9e64 feat(diffusers): add support to LTX-2 (#8019)
* feat(diffusers): add support to LTX-2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 09:07:30 +01:00
LocalAI [bot]
d6e698876b chore: ⬆️ Update ggml-org/llama.cpp to e4832e3ae4d58ac0ecbdbf4ae055424d6e628c9f (#8015)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 08:09:37 +01:00
LocalAI [bot]
8962205546 chore: ⬆️ Update ggml-org/whisper.cpp to a96310871a3b294f026c3bcad4e715d17b5905fe (#8014)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 08:09:00 +01:00
LocalAI [bot]
eddc460118 chore: ⬆️ Update leejet/stable-diffusion.cpp to 7010bb4dff7bd55b03d35ef9772142c21699eba9 (#8013)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 08:08:31 +01:00
Ettore Di Giacinto
a6ff354c86 feat(tts): add pocket-tts backend (#8018)
* feat(pocket-tts): add new backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-13 23:35:19 +01:00
dependabot[bot]
3a2be4df48 chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 (#8004)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.27.3 to 2.27.5.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.27.3...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 09:06:20 +01:00
dependabot[bot]
4e1f448e86 chore(deps): bump fyne.io/fyne/v2 from 2.7.1 to 2.7.2 (#8003)
Bumps [fyne.io/fyne/v2](https://github.com/fyne-io/fyne) from 2.7.1 to 2.7.2.
- [Release notes](https://github.com/fyne-io/fyne/releases)
- [Changelog](https://github.com/fyne-io/fyne/blob/master/CHANGELOG.md)
- [Commits](https://github.com/fyne-io/fyne/compare/v2.7.1...v2.7.2)

---
updated-dependencies:
- dependency-name: fyne.io/fyne/v2
  dependency-version: 2.7.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:45:58 +01:00
dependabot[bot]
3e0168360a chore(deps): bump github.com/gpustack/gguf-parser-go from 0.22.1 to 0.23.1 (#8001)
chore(deps): bump github.com/gpustack/gguf-parser-go

Bumps [github.com/gpustack/gguf-parser-go](https://github.com/gpustack/gguf-parser-go) from 0.22.1 to 0.23.1.
- [Release notes](https://github.com/gpustack/gguf-parser-go/releases)
- [Commits](https://github.com/gpustack/gguf-parser-go/compare/v0.22.1...v0.23.1)

---
updated-dependencies:
- dependency-name: github.com/gpustack/gguf-parser-go
  dependency-version: 0.23.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:45:35 +01:00
dependabot[bot]
ea4157887b chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 (#8000)
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.38.3 to 1.39.0.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.38.3...v1.39.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:45:18 +01:00
dependabot[bot]
699c50be47 chore(deps): bump github.com/mudler/go-processmanager from 0.0.0-20240820160718-8b802d3ecf82 to 0.1.0 (#7992)
chore(deps): bump github.com/mudler/go-processmanager

Bumps [github.com/mudler/go-processmanager](https://github.com/mudler/go-processmanager) from 0.0.0-20240820160718-8b802d3ecf82 to 0.1.0.
- [Release notes](https://github.com/mudler/go-processmanager/releases)
- [Commits](https://github.com/mudler/go-processmanager/commits/v0.1.0)

---
updated-dependencies:
- dependency-name: github.com/mudler/go-processmanager
  dependency-version: 0.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:44:53 +01:00
dependabot[bot]
94eecc43a3 chore(deps): bump protobuf from 6.33.2 to 6.33.4 in /backend/python/transformers (#7993)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.2 to 6.33.4.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 23:46:32 +00:00
LocalAI [bot]
7e35ec6c4f chore: ⬆️ Update ggml-org/llama.cpp to bcf7546160982f56bc290d2e538544bbc0772f63 (#7991)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-12 21:14:33 +00:00
Ettore Di Giacinto
7891c33cb1 chore(vulkan): bump vulkan-sdk to 1.4.335.0 (#7981)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-12 07:51:26 +01:00
Ettore Di Giacinto
271cc79709 chore(backends): do not bundle cuda target directory (#7982)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-12 07:51:09 +01:00
LocalAI [bot]
3d12d5e70d chore: ⬆️ Update leejet/stable-diffusion.cpp to 885e62ea822e674c6837a8225d2d75f021b97a6a (#7979)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-11 22:44:11 +01:00
LocalAI [bot]
bc180c2638 chore: ⬆️ Update ggml-org/llama.cpp to 0c3b7a9efebc73d206421c99b7eb6b6716231322 (#7978)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-11 22:06:30 +01:00
Ettore Di Giacinto
2de30440fe fix(l4t-12): use pip to install python deps (#7967)
* fix: install only torch/torchvision from jetson index

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: use pip for l4t-12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Revert "fix: install only torch/torchvision from jetson index"

This reverts commit 2d2b020078

* chatterbox needs wheel

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-11 00:21:32 +01:00
Copilot
673a80a578 feat: Filter backend gallery by system capabilities (#7950)
* Initial plan

* Add backend gallery filtering based on system capabilities

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Refactor L4T backend check to come before NVIDIA check

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Refactor: move capabilities business logic to capabilities.go and use constants

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* feat: display system capability in webui and refactor tests

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* chore: rename System/Capability

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor: use getSystemCapabilities in IsBackendCompatible for consistency

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* refactor: keep unused constants private in capabilities.go

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* fix: skip AMD/ROCm and Intel/SYCL tests on darwin

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-10 23:34:01 +01:00
Jon Roeber
2554e9fabe fix(model): do not assume success when deleting a model process (#7963)
* fix(model): do not assume success when deleting a model process

Signed-off-by: Jon Roeber <jon@roeber.dev>

* Update pkg/model/process.go

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Jon Roeber <65431671+jroeber@users.noreply.github.com>

---------

Signed-off-by: Jon Roeber <jon@roeber.dev>
Signed-off-by: Jon Roeber <65431671+jroeber@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-10 23:33:44 +01:00
LocalAI [bot]
5bfc3eebf8 chore: ⬆️ Update ggml-org/llama.cpp to b1377188784f9aea26b8abde56d4aee8c733eec7 (#7965)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 22:24:26 +01:00
LocalAI [bot]
ab893fe302 feat(swagger): update swagger (#7964)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 21:46:23 +01:00
Ettore Di Giacinto
c88074a19e feat(api): support 'reasoning' api field (#7959)
This PR adds support to support the 'reasoning' API field of the OpenAI
spec.

LocalAI now will extract automatically thinking tags in both SSE and
non-SSE mode. The changes are adapted as well to the Chat UI now that
will use the reasoning field to extract the thinking process and display
it in the chat.

This fixes https://github.com/mudler/LocalAI/issues/7944

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-10 19:06:12 +01:00
Copilot
5ca8f0aea0 feat: add tool/function calling support to Anthropic Messages API (#7956)
* Initial plan

* Add tool/function calling schema support to Anthropic Messages API

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add E2E tests for Anthropic tool calling

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Make tool calling tests require model to use tools

- First test now expects hasToolUse to be true with clear error message
- Third test now expects toolUseID to be non-empty (removed conditional)
- Both tests will now fail if model doesn't call the expected tools

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add E2E test for tool calling with streaming responses

- Tests that streaming events are properly emitted (content_block_start/delta/stop)
- Verifies tool_use blocks are accumulated correctly in streaming mode
- Ensures model calls tools and stop_reason is set to tool_use

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 18:44:22 +01:00
LocalAI [bot]
84234e531f chore(model gallery): 🤖 add 1 new models via gallery agent (#7954)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 12:34:23 +01:00
Copilot
4cbf9abfef feat: Add Anthropic Messages API support (#7948)
* Initial plan

* Add Anthropic Messages API support

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix code review comments: add error handling for JSON operations

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix test suite to use existing schema test runner

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add Anthropic e2e tests using anthropic-sdk-go for streaming and non-streaming

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 12:33:05 +01:00
LocalAI [bot]
fdc2c0737c chore: ⬆️ Update ggml-org/llama.cpp to 593da7fa49503b68f9f01700be9f508f1e528992 (#7946)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-09 21:13:04 +00:00
Ettore Di Giacinto
f4b0a304d7 chore(llama.cpp): propagate errors during model load (#7937)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-09 07:52:49 +01:00
Ettore Di Giacinto
d16ec7aa9e chore(deps): Bump llama.cpp to '480160d47297df43b43746294963476fc0a6e10f' (#7933)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-09 07:52:32 +01:00
Ettore Di Giacinto
d699b7ccdc Add backend configuration for Granite embedding model
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-09 00:44:10 +01:00
Ettore Di Giacinto
a4d224dd1b Revert "chore(uv): add --index-strategy=unsafe-first-match to l4t" (#7936)
Revert "chore(uv): add --index-strategy=unsafe-first-match to l4t (#7934)"

This reverts commit f5dee90962.
2026-01-08 23:31:51 +01:00
Ettore Di Giacinto
917c7aa9f3 chore(ci): roll back l4t-cuda12 configurations (#7935)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 23:04:33 +01:00
LocalAI [bot]
5aa66842dd chore: ⬆️ Update leejet/stable-diffusion.cpp to 0e52afc6513cc2dea9a1a017afc4a008d5acf2b0 (#7930)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-08 22:48:46 +01:00
Ettore Di Giacinto
f5dee90962 chore(uv): add --index-strategy=unsafe-first-match to l4t (#7934)
This is because the main index might not contain all the dependencies
for torch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 22:48:03 +01:00
Copilot
06323df457 Optimize GPU library copying to preserve symlinks and avoid duplicates (#7931)
* Initial plan

* Optimize library copying to preserve symlinks and avoid duplicates

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Address code review feedback: extract get_inode helper, use file type detection for sorting

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Simplify implementation by removing inode tracking

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add clarifying comment about basename deduplication

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-08 22:26:48 +01:00
Richard Palethorpe
98f28bf583 chore(docs): Add Crush and VoxInput to the integrations (#7924)
* chore(docs): Add Crush and VoxInput to the integrations

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-08 21:39:25 +01:00
Ettore Di Giacinto
383312b50e chore(l4t-12): do not use python 3.12 (wheels are only for 3.10) (#7928)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 19:00:07 +01:00
Ettore Di Giacinto
b736db4bbe chore(ci): use latest jetpack image for l4t (#7926)
This image is for HW prior Jetpack 7. Jetpack 7 broke compatibility with
older devices (which are still in use) such as AGX Orin or Jetsons.

While we do have l4t-cuda-13 images with sbsa support for new Nvidia
devices (Thor, DGX, etc). For older HW we are forced to keep old images
around as 24.04 does not seem to be supported.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 18:30:59 +01:00
LocalAI [bot]
09bc2e4a00 chore(model gallery): 🤖 add 1 new models via gallery agent (#7922)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-08 11:06:21 +01:00
LocalAI [bot]
c03e532a18 chore: ⬆️ Update ggml-org/llama.cpp to ae9f8df77882716b1702df2bed8919499e64cc28 (#7915)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 23:24:01 +01:00
Ettore Di Giacinto
fcb58ee243 fix(intel): Add ARG for Ubuntu codename in Dockerfile (#7917)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-07 21:55:18 +01:00
Copilot
b2ff1cea2a feat: enable Vulkan arm64 image builds (#7912)
* Initial plan

* Add arm64 support for Vulkan builds in Dockerfiles and workflows

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 21:49:50 +01:00
Ettore Di Giacinto
b964b3d53e feat(backends): add moonshine backend for faster transcription (#7833)
* feat(backends): add moonshine backend for faster transcription

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add backend to CI, update AGENTS.md from this exercise

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 21:44:35 +01:00
LocalAI [bot]
0b26669d0b chore(model gallery): 🤖 add 1 new models via gallery agent (#7916)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 21:43:39 +01:00
Ettore Di Giacinto
5a9698bc69 chore(Dockerfile): restore GPU vendor specific sections (#7911)
Until we figure out https://github.com/mudler/LocalAI/issues/7909

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:34:23 +01:00
Ettore Di Giacinto
1fe0e9f74f chore(ci): restore building of GPU vendor images (#7910)
Until we figure out https://github.com/mudler/LocalAI/issues/7909

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:32:22 +01:00
Ettore Di Giacinto
ffb2dc4666 chore(detection): detect GPU vendor from files present in the system (#7908)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:18:27 +01:00
Ettore Di Giacinto
cfc2225fc7 chore(dockerfile): drop driver-requirements section (#7907)
* chore(dockerfile): drop driver-requirements section

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ci): drop other builds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:18:14 +01:00
Copilot
fd53978a7b feat: package GPU libraries inside backend containers for unified base image (#7891)
* Initial plan

* Add GPU library packaging for isolated backend environments

- Create scripts/build/package-gpu-libs.sh for packaging CUDA, ROCm, SYCL, and Vulkan libraries
- Update llama-cpp, whisper, stablediffusion-ggml package.sh to include GPU libraries
- Update Dockerfile.python to package GPU libraries into Python backends
- Update libbackend.sh to set LD_LIBRARY_PATH for GPU library loading

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Address code review feedback: fix variable consistency and quoting

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix code review issues: improve glob handling and remove redundant variable

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Simplify main Dockerfile and workflow to use unified base image

- Remove GPU-specific driver installation from Dockerfile (CUDA, ROCm, Vulkan, Intel)
- Simplify image.yml workflow to build single unified base image for linux/amd64 and linux/arm64
- GPU libraries are now packaged in individual backend containers

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-07 15:48:51 +01:00
LocalAI [bot]
7abc0242bb chore(model gallery): 🤖 add 1 new models via gallery agent (#7903)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 09:46:36 +01:00
LocalAI [bot]
23df29fbd3 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9be0b91927dfa4007d053df72dea7302990226bb (#7895)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-06 22:18:53 +01:00
LocalAI [bot]
fb9879949c chore: ⬆️ Update ggml-org/llama.cpp to ccbc84a5374bab7a01f68b129411772ddd8e7c79 (#7894)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-06 22:18:35 +01:00
Manish Dewangan
1642b39cb8 [gallery] add JSON schema for gallery model specification (#7890)
Add JSON Schema for gallery model specification

Signed-off-by: devmanishofficial <devmanishofficial@gmail.com>
2026-01-06 22:10:43 +01:00
Richard Palethorpe
e6ba26c3e7 chore: Update to Ubuntu24.04 (cont #7423) (#7769)
* ci(workflows): bump GitHub Actions images to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): remove CUDA 11.x support from GitHub Actions (incompatible with ubuntu:24.04)

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): bump GitHub Actions CUDA support to 12.9

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): bump base image to ubuntu:24.04 and adjust Vulkan SDK/packages

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(backend): correct context paths for Python backends in workflows, Makefile and Dockerfile

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): disable parallel backend builds to avoid race conditions

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): export CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION for override

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): update backend Dockerfiles to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): add ROCm env vars and default AMDGPU_TARGETS for hipBLAS builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(chatterbox): bump ROCm PyTorch to 2.9.1+rocm6.4 and update index URL; align hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore: add local-ai-launcher to .gitignore

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix backends GitHub Actions workflows after rebase

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): use build-time UBUNTU_VERSION variable

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(docker): remove libquadmath0 from requirements-stage base image

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): add backends/vllm to .NOTPARALLEL to prevent parallel builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(docker): correct CUDA installation steps in backend Dockerfiles

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): update ROCm to 6.4 and align Python hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for CUDA on arm64 builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): update base image and backend Dockerfiles for Ubuntu 24.04 compatibility on arm64

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): increase timeout for uv installs behind slow networks on backend/Dockerfile.python

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for vibevoice backend

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix failing GitHub Actions runners

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix: Allow FROM_SOURCE to be unset, use upstream Intel images etc.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): rm all traces of CUDA 11

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): Add Ubuntu codename as an argument

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
2026-01-06 15:26:42 +01:00
Ettore Di Giacinto
26c4f80d1b chore(llama.cpp/flags): simplify conditionals (#7887)
If ggml handle conditionals correctly we don't need to handle it here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-06 15:02:20 +01:00
coffeerunhobby
5add7b47f5 fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) (#7864)
* Fix BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge)

Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>

* Address feedback from review

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-06 00:13:48 +00:00
Ettore Di Giacinto
3244ccc224 chore(image-ui): simplify interface (#7882)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-05 23:20:28 +01:00
LocalAI [bot]
4f7b6b0bff chore: ⬆️ Update ggml-org/llama.cpp to e443fbcfa51a8a27b15f949397ab94b5e87b2450 (#7881)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 22:55:40 +01:00
LocalAI [bot]
3a629cea2f chore: ⬆️ Update ggml-org/whisper.cpp to 679bdb53dbcbfb3e42685f50c7ff367949fd4d48 (#7879)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 22:55:16 +01:00
LocalAI [bot]
f917feda29 chore: ⬆️ Update leejet/stable-diffusion.cpp to c5602a676caff5fe5a9f3b76b2bc614faf5121a5 (#7880)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 22:54:56 +01:00
dependabot[bot]
e2018cdc8f chore(deps): bump github.com/labstack/echo/v4 from 4.14.0 to 4.15.0 (#7875)
Bumps [github.com/labstack/echo/v4](https://github.com/labstack/echo) from 4.14.0 to 4.15.0.
- [Release notes](https://github.com/labstack/echo/releases)
- [Changelog](https://github.com/labstack/echo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/labstack/echo/compare/v4.14.0...v4.15.0)

---
updated-dependencies:
- dependency-name: github.com/labstack/echo/v4
  dependency-version: 4.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 22:54:30 +01:00
Manish Dewangan
a3b8a94187 fix(ui): fix 404 on API menu link by pointing to index.html (#7878)
Signed-off-by: devmanishofficial <devmanishofficial@gmail.com>
2026-01-05 22:54:14 +01:00
dependabot[bot]
41de7d32ad chore(deps): bump dependabot/fetch-metadata from 2.4.0 to 2.5.0 (#7876)
Bumps [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](https://github.com/dependabot/fetch-metadata/compare/v2.4.0...v2.5.0)

---
updated-dependencies:
- dependency-name: dependabot/fetch-metadata
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 20:10:07 +00:00
Richard Palethorpe
93364df0a8 chore(AGENTS.md): Add section to help with building backends (#7871)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-05 18:25:52 +01:00
Ettore Di Giacinto
21c84f432f feat(function): Add tool streaming, XML Tool Call Parsing Support (#7865)
* feat(function): Add XML Tool Call Parsing Support

Extend the function parsing system in LocalAI to support XML-style tool calls, similar to how JSON tool calls are currently parsed. This will allow models that return XML format (like <tool_call><function=name><parameter=key>value</parameter></function></tool_call>) to be properly parsed alongside text content.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* thinking before tool calls, more strict support for corner cases with no tools

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Support streaming tools

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Iterative JSON

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Iterative parsing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Consume JSON marker

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixup

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix pending TODOs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Don't run other parsing with ParseRegex

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-05 18:25:40 +01:00
LocalAI [bot]
9d3da0bed5 chore: ⬆️ Update ggml-org/llama.cpp to 4974bf53cf14073c7b66e1151348156aabd42cb8 (#7861)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 00:10:18 +01:00
LocalAI [bot]
1b063b5595 chore: ⬆️ Update leejet/stable-diffusion.cpp to b90b1ee9cf84ea48b478c674dd2ec6a33fd504d6 (#7862)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-04 23:52:01 +01:00
Ettore Di Giacinto
560bf50299 chore(Makefile): refactor common make targets (#7858)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-04 21:12:50 +01:00
LocalAI [bot]
a7e155240b chore: ⬆️ Update ggml-org/llama.cpp to e57f52334b2e8436a94f7e332462dfc63a08f995 (#7848)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-04 10:27:45 +01:00
LocalAI [bot]
793e4907a2 feat(swagger): update swagger (#7847)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-03 22:09:39 +01:00
Ettore Di Giacinto
d38811560c chore(docs): add opencode, GHA, and realtime voice assistant examples
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 22:03:43 +01:00
Ettore Di Giacinto
33cc0b8e13 fix(chat/ui): record model name in history for consistency (#7845)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 18:05:33 +01:00
lif
4cd95b8a9d fix: Highly inconsistent agent response to cogito agent calling MCP server - Body "Invalid http method" (#7790)
* fix: resolve duplicate MCP route registration causing 50% failure rate

Fixes #7772

The issue was caused by duplicate registration of the MCP endpoint
/mcp/v1/chat/completions in both openai.go and localai.go, leading
to a race condition where requests would randomly hit different
handlers with incompatible behaviors.

Changes:
- Removed duplicate MCP route registration from openai.go
- Kept the localai.MCPStreamEndpoint as the canonical handler
- Added all three MCP route patterns for backward compatibility:
  * /v1/mcp/chat/completions
  * /mcp/v1/chat/completions
  * /mcp/chat/completions
- Added comments to clarify route ownership and prevent future conflicts
- Fixed formatting in ui_api.go

The localai.MCPStreamEndpoint handler is more feature-complete as it
supports both streaming and non-streaming modes, while the removed
openai.MCPCompletionEndpoint only supported synchronous requests.

This eliminates the ~50% failure rate where the cogito library would
receive "Invalid http method" errors when internal HTTP requests were
routed to the wrong handler.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>

* Address feedback from review

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 15:43:23 +01:00
LocalAI [bot]
8c504113a2 chore(model gallery): 🤖 add 1 new models via gallery agent (#7840)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-03 08:42:05 +01:00
coffeerunhobby
666d110714 fix: Prevent BMI2 instruction crash on AVX-only CPUs (#7817)
* Fix: Prevent BMI2 instruction crash on AVX-only CPUs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: apply no-bmi flags on non-darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 08:36:55 +01:00
LocalAI [bot]
641606ae93 chore: ⬆️ Update ggml-org/llama.cpp to 706e3f93a60109a40f1224eaf4af0d59caa7c3ae (#7836)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 21:26:37 +00:00
Ettore Di Giacinto
5f6c941399 fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path (#7832)
fix(mmproj): fix loading mmproj in nested sub-dirs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-02 20:17:30 +01:00
LocalAI [bot]
1639fc6309 chore(model gallery): 🤖 add 1 new models via gallery agent (#7831)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 15:10:00 +01:00
Ettore Di Giacinto
841e8f6d47 fix(image-gen): fix scrolling issues (#7829)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-02 09:05:49 +01:00
LocalAI [bot]
fd152c97c0 chore(model gallery): 🤖 add 1 new models via gallery agent (#7826)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 08:45:43 +01:00
LocalAI [bot]
949de04052 chore: ⬆️ Update ggml-org/llama.cpp to ced765be44ce173c374f295b3c6f4175f8fd109b (#7822)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 08:44:49 +01:00
Ettore Di Giacinto
76cfe1f367 feat(image-gen/UI): move controls to the left, make the page more compact (#7823)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-01 22:07:42 +01:00
LocalAI [bot]
5ee6c1810b feat(swagger): update swagger (#7820)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-01 21:16:38 +01:00
LocalAI [bot]
7db79aadfa chore(model-gallery): ⬆️ update checksum (#7821)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-01 21:16:11 +01:00
nold
dee48679b4 Fix(gallery): Updated checksums for qwen3-vl-30b instruct & thinking (#7819)
* Fix(gallery): SHA256 hashes for qwen3-vl-30b-instruct

Signed-off-by: nold <Nold360@users.noreply.github.com>

* Fix(gallery): SHA256 checksums for qwen3-vl-30b-thinking

Signed-off-by: nold <Nold360@users.noreply.github.com>

---------

Signed-off-by: nold <Nold360@users.noreply.github.com>
2026-01-01 20:33:55 +01:00
LocalAI [bot]
94b47a9310 chore(model gallery): 🤖 add 1 new models via gallery agent (#7816)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-01 19:20:26 +01:00
LocalAI [bot]
bc3e8793ed chore: ⬆️ Update ggml-org/llama.cpp to 13814eb370d2f0b70e1830cc577b6155b17aee47 (#7809)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 23:04:01 +01:00
LocalAI [bot]
91978bb3a5 chore: ⬆️ Update ggml-org/whisper.cpp to e9898ddfb908ffaa7026c66852a023889a5a7202 (#7810)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 22:59:05 +01:00
Ettore Di Giacinto
797f27f09f feat(UI): image generation improvements (#7804)
* chore: drop mode from image generation(unused)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(UI): improve image generation front-end

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(UI): only ref images. files is to be deprecated

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do not override default steps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-31 21:59:46 +01:00
LocalAI [bot]
3f1631aa87 chore(model gallery): 🤖 add 1 new models via gallery agent (#7807)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 19:29:59 +01:00
LocalAI [bot]
dad509637e chore(model gallery): 🤖 add 1 new models via gallery agent (#7801)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 09:18:35 +01:00
LocalAI [bot]
218f3a126a chore: ⬆️ Update ggml-org/llama.cpp to 0f89d2ecf14270f45f43c442e90ae433fd82dab1 (#7795)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 08:53:41 +01:00
Ettore Di Giacinto
be77a845fa fix(gallery agent): change model
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:34:25 +00:00
Ettore Di Giacinto
ca32286022 fix(gallery agent): change model
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:27:48 +00:00
Ettore Di Giacinto
1f592505dd fix(gallery agent): change model
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:22:45 +00:00
Ettore Di Giacinto
b3bc623eb3 fix(gallery agent): fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:18:02 +00:00
Ettore Di Giacinto
e56391cf14 Add individual sponsors acknowledgment in README
Added a section to acknowledge individual sponsors and their contributions.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-30 23:01:22 +01:00
Ettore Di Giacinto
ef3ffe4a4e fix(gallery agent): fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 21:56:54 +00:00
Ettore Di Giacinto
3cffde2cd5 fix(gallery agent): skip model selection if only one
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 21:53:37 +00:00
LocalAI [bot]
234bf7e2ad feat(swagger): update swagger (#7794)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-30 21:05:01 +00:00
lif
ba73d2e759 fix: Failed to download checksums.txt when using launch to install localai (#7788)
* fix: add retry logic and fallback for checksums.txt download

- Add HTTP client with 30s timeout to ReleaseManager
- Implement downloadFileWithRetry with 3 attempts and exponential backoff
- Allow manual checksum placement at ~/.localai/checksums/checksums-<version>.txt
- Continue installation with warning if checksum download/verification fails
- Add test for HTTPClient initialization
- Fix linter error in systray_manager.go

Fixes #7385

Signed-off-by: majiayu000 <1835304752@qq.com>

* fix: add retry logic and improve checksums.txt download handling

This commit addresses issue #7385 by implementing:
- Retry logic (3 attempts) for checksum file downloads
- Fallback to manually placed checksum files
- Option to proceed with installation if checksums unavailable (with warnings)
- Fixed resource leaks in download retry loop
- Added configurable HTTP client with 30s timeout

The installation will now be more resilient to network issues while
maintaining security through checksum verification when available.

Signed-off-by: majiayu000 <1835304752@qq.com>

* fix: check for existing checksum file before downloading

This commit addresses the review feedback from mudler on PR #7788.
The code now checks if there's already a checksum file (either manually
placed or previously downloaded) and honors that, skipping download
entirely in such case.

Changes:
- Check for existing checksum file at ~/.localai/checksums/checksums-<version>.txt first
- Check for existing downloaded checksum file at binary path
- Only attempt to download if no existing checksum file is found
- This prevents unnecessary network requests and honors user-placed checksums

Signed-off-by: majiayu000 <1835304752@qq.com>

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 18:33:44 +01:00
Ettore Di Giacinto
592697216b Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" (#7789)
Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7774)"

This reverts commit 0c16f55b45.
2025-12-30 09:58:13 +01:00
lif
8bd7143a44 fix: propagate validation errors (#7787)
fix: validate MCP configuration in model config

Fixes #7334

The Validate() function was not checking if MCP configuration
(mcp.stdio and mcp.remote) contains valid JSON. This caused
malformed JSON with missing commas to be silently accepted.

Changes:
- Add MCP configuration validation to ModelConfig.Validate()
- Properly report validation errors instead of discarding them
- Add test cases for valid and invalid MCP configurations

The fix ensures that malformed JSON in MCP config sections
will now be caught and reported during validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 09:54:27 +01:00
lif
0d0ef0121c fix: Usage for image generation is incorrect (and causes error in LiteLLM) (#7786)
* fix: Add usage fields to image generation response for OpenAI API compatibility

Fixes #7354

Added input_tokens, output_tokens, and input_tokens_details fields to the
image generation API response to comply with OpenAI's image generation API
specification. This resolves validation errors in LiteLLM and the OpenAI SDK.

Changes:
- Added InputTokensDetails struct with text_tokens and image_tokens fields
- Extended OpenAIUsage struct with input_tokens, output_tokens, and input_tokens_details
- Updated ImageEndpoint to populate usage object with required fields
- Updated InpaintingEndpoint to populate usage object with required fields
- All fields initialized to 0 as per current behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>

* fix: Correct usage field types for image generation API compatibility

Changed InputTokens and OutputTokens from pointer types (*int) to
regular int types to match OpenAI API specification. This fixes
validation errors with LiteLLM and OpenAI SDK when parsing image
generation responses.

Fixes #7354

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 09:53:05 +01:00
lif
d7b2eee08f fix: add nil checks before mergo.Merge to prevent panic in gallery model installation (#7785)
Fixes #7420

Added nil checks before calling mergo.Merge in InstallModelFromGallery and InstallModel
functions to prevent panic when req.Overrides or configOverrides are nil. The panic was
occurring at models.go:248 during Qwen-Image-Edit gallery model download.

Changes:
- Added nil check for req.Overrides before merging in InstallModelFromGallery (line 126)
- Added nil check for configOverrides before merging in InstallModel (line 248)
- Added test case to verify nil configOverrides are handled without panic

Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-30 09:51:45 +01:00
LocalAI [bot]
bc8ec5cb39 chore: ⬆️ Update ggml-org/llama.cpp to c9a3b40d6578f2381a1373d10249403d58c3c5bd (#7778)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-30 08:27:16 +01:00
dependabot[bot]
3f38fecdfc chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.1.0 to 1.2.0 (#7776)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 22:15:29 +01:00
dependabot[bot]
20a4199229 chore(deps): bump github.com/schollz/progressbar/v3 from 3.18.0 to 3.19.0 (#7775)
chore(deps): bump github.com/schollz/progressbar/v3

Bumps [github.com/schollz/progressbar/v3](https://github.com/schollz/progressbar) from 3.18.0 to 3.19.0.
- [Release notes](https://github.com/schollz/progressbar/releases)
- [Commits](https://github.com/schollz/progressbar/compare/v3.18.0...v3.19.0)

---
updated-dependencies:
- dependency-name: github.com/schollz/progressbar/v3
  dependency-version: 3.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 22:15:11 +01:00
Ettore Di Giacinto
ded9955881 chore(ci): do not select models if we have only 1 result
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-29 22:14:14 +01:00
dependabot[bot]
cf78f9a2a8 chore(deps): bump google.golang.org/grpc from 1.77.0 to 1.78.0 (#7777)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.77.0 to 1.78.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.77.0...v1.78.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.78.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 21:03:57 +01:00
dependabot[bot]
0c16f55b45 chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7774)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.9 to 2.22.11.
- [Release notes](https://github.com/securego/gosec/releases)
- [Commits](https://github.com/securego/gosec/compare/v2.22.9...v2.22.11)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.11
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 19:18:29 +00:00
Richard Palethorpe
0b80167912 chore: ⬆️ Update leejet/stable-diffusion.cpp to 4ff2c8c74bd17c2cfffe3a01be77743fb3efba2f (#7771)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix: Add KL_OPTIMAL scheduler, pass sampler to default scheduler for LCM and fixup other refactorings from upstream

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* Delete backend/go/stablediffusion-ggml/compile_commands.json

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-29 19:06:35 +01:00
Richard Palethorpe
99b5c5f156 feat(api): Allow tracing of requests and responses (#7609)
* feat(api): Allow tracing of requests and responses

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(traces): Add traces UI

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-12-29 11:06:06 +01:00
Ettore Di Giacinto
9ab812a8e8 chore(ci): be more precise when detecting existing models (#7767)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-29 10:06:42 +01:00
Ettore Di Giacinto
185a685211 fix(amd-gpu): correctly show total and used vram (#7761)
An example output of `rocm-smi --showproductname --showmeminfo vram --showuniqueid --csv`:

```
device,Unique ID,VRAM Total Memory (B),VRAM Total Used Memory (B),Card Series,Card Model,Card Vendor,Card SKU,Subsystem ID,Device Rev,Node ID,GUID,GFX Version
card0,0x9246____________,17163091968,692142080,Navi 21 [Radeon RX 6800/6800 XT / 6900 XT],0x73bf,Advanced Micro Devices Inc. [AMD/ATI],001,0x2406,0xc1,1,45534,gfx1030
card1,N/A,67108864,26079232,Raphael,0x164e,Advanced Micro Devices Inc. [AMD/ATI],RAPHAEL,0x364e,0xc6,2,52156,gfx1036
```

Total memory is actually showed before the total used memory as can be seen in https://github.com/LostRuins/koboldcpp/issues/1104#issuecomment-2321143507.

This PR fixes https://github.com/mudler/LocalAI/issues/7724

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-29 07:57:07 +01:00
LocalAI [bot]
1a6fd0f7fc chore: ⬆️ Update ggml-org/llama.cpp to 4ffc47cb2001e7d523f9ff525335bbe34b1a2858 (#7760)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-28 21:10:39 +00:00
LocalAI [bot]
c95c482f36 chore: ⬆️ Update ggml-org/llama.cpp to a4bf35889eda36d3597cd0f8f333f5b8a2fcaefc (#7751)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-27 21:09:12 +00:00
Ettore Di Giacinto
21c464c34f fix(cli): import via CLI needs system state (#7746)
pass system state to application config to avoid nil pointer exception
during import.

Fixes: https://github.com/mudler/LocalAI/issues/7728

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-27 11:10:28 +01:00
LocalAI [bot]
ddf0281785 chore: ⬆️ Update ggml-org/llama.cpp to 7ac8902133da6eb390c4d8368a7d252279123942 (#7740)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-26 21:44:34 +00:00
LocalAI [bot]
86c68c9623 chore: ⬆️ Update ggml-org/llama.cpp to 85c40c9b02941ebf1add1469af75f1796d513ef4 (#7731)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 21:10:28 +00:00
Ettore Di Giacinto
c844b7ac58 feat: disable force eviction (#7725)
* feat: allow to set forcing backends eviction while requests are in flight

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: try to make the request sit and retry if eviction couldn't be done

Otherwise calls that in order to pass would need to shutdown other
backends would just fail.

In this way instead we make the request sit and retry eviction until it
succeeds. The thresholds can be configured by the user.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* expose settings to CLI

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-25 14:26:18 +01:00
Ettore Di Giacinto
bb459e671f fix(ui): correctly parse import errors (#7726)
errors are nested

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-25 10:43:12 +01:00
LocalAI [bot]
2fe6e278c8 chore: ⬆️ Update ggml-org/llama.cpp to c18428423018ed214c004e6ecaedb0cbdda06805 (#7718)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 10:00:40 +01:00
LocalAI [bot]
ae69921d77 chore: ⬆️ Update ggml-org/whisper.cpp to 6114e692136bea917dc88a5eb2e532c3d133d963 (#7717)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 10:00:24 +01:00
Ettore Di Giacinto
bf2f95c684 chore(docs): update docs with cuda 13 instructions and the new vibevoice backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-25 10:00:07 +01:00
LocalAI [bot]
94069f2751 docs: ⬆️ update docs version mudler/LocalAI (#7716)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-24 21:06:02 +00:00
369 changed files with 38011 additions and 6214 deletions

View File

@@ -11,6 +11,7 @@ import (
"slices"
"strings"
"github.com/ghodss/yaml"
hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
cogito "github.com/mudler/cogito"
@@ -52,6 +53,11 @@ func cleanTextContent(text string) string {
return stripThinkingTags(strings.TrimRight(result, "\n"))
}
type galleryModel struct {
Name string `yaml:"name"`
Urls []string `yaml:"urls"`
}
// isModelExisting checks if a specific model ID exists in the gallery using text search
func isModelExisting(modelID string) (bool, error) {
indexPath := getGalleryIndexPath()
@@ -60,9 +66,20 @@ func isModelExisting(modelID string) (bool, error) {
return false, fmt.Errorf("failed to read %s: %w", indexPath, err)
}
contentStr := string(content)
// Simple text search - if the model ID appears anywhere in the file, it exists
return strings.Contains(contentStr, modelID), nil
var galleryModels []galleryModel
err = yaml.Unmarshal(content, &galleryModels)
if err != nil {
return false, fmt.Errorf("failed to unmarshal %s: %w", indexPath, err)
}
for _, galleryModel := range galleryModels {
if slices.Contains(galleryModel.Urls, modelID) {
return true, nil
}
}
return false, nil
}
// filterExistingModels removes models that already exist in the gallery
@@ -134,6 +151,11 @@ func getRealReadme(ctx context.Context, repository string) (string, error) {
}
func selectMostInterestingModels(ctx context.Context, searchResult *SearchResult) ([]ProcessedModel, error) {
if len(searchResult.Models) == 1 {
return searchResult.Models, nil
}
// Create a conversation fragment
fragment := cogito.NewEmptyFragment().
AddMessage("user",

View File

@@ -119,14 +119,24 @@ func main() {
}
fmt.Println(result.FormattedOutput)
var models []ProcessedModel
// Use AI agent to select the most interesting models
fmt.Println("Using AI agent to select the most interesting models...")
models, err := selectMostInterestingModels(context.Background(), result)
if err != nil {
fmt.Fprintf(os.Stderr, "Error in model selection: %v\n", err)
// Continue with original result if selection fails
if len(result.Models) > 1 {
fmt.Println("More than one model found (", len(result.Models), "), using AI agent to select the most interesting models")
for _, model := range result.Models {
fmt.Println("Model: ", model.ModelID)
}
// Use AI agent to select the most interesting models
fmt.Println("Using AI agent to select the most interesting models...")
models, err = selectMostInterestingModels(context.Background(), result)
if err != nil {
fmt.Fprintf(os.Stderr, "Error in model selection: %v\n", err)
// Continue with original result if selection fails
models = result.Models
}
} else if len(result.Models) == 1 {
models = result.Models
fmt.Println("Only one model found, using it directly")
}
fmt.Print(models)
@@ -315,7 +325,7 @@ func searchAndProcessModels(searchTerm string, limit int, quantization string) (
outputBuilder.WriteString(fmt.Sprintf(" README Content Preview: %s\n",
processedModel.ReadmeContentPreview))
} else {
continue
fmt.Printf(" Warning: Failed to get real readme: %v\n", err)
}
fmt.Println("Real readme got", readmeContent)

View File

File diff suppressed because it is too large Load Diff

View File

@@ -14,7 +14,7 @@ jobs:
steps:
- name: Dependabot metadata
id: metadata
uses: dependabot/fetch-metadata@v2.4.0
uses: dependabot/fetch-metadata@v2.5.0
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
skip-commit-verification: true

View File

@@ -49,12 +49,12 @@ jobs:
PATH="$PATH:$HOME/go/bin" make protogen-go
- uses: mudler/localai-github-action@v1.1
with:
model: 'qwen3-4b'
model: 'https://huggingface.co/bartowski/Qwen_Qwen3-1.7B-GGUF'
- name: Run gallery agent
env:
#OPENAI_MODEL: ${{ secrets.OPENAI_MODEL }}
OPENAI_MODE: qwen3-4b
OPENAI_MODE: Qwen_Qwen3-1.7B-GGUF
OPENAI_BASE_URL: "http://localhost:8080"
OPENAI_KEY: ${{ secrets.OPENAI_KEY }}
#OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}

View File

@@ -16,7 +16,7 @@ jobs:
strategy:
matrix:
include:
- grpc-base-image: ubuntu:22.04
- grpc-base-image: ubuntu:24.04
runs-on: 'ubuntu-latest'
platforms: 'linux/amd64,linux/arm64'
runs-on: ${{matrix.runs-on}}

View File

@@ -15,7 +15,7 @@ jobs:
strategy:
matrix:
include:
- base-image: intel/oneapi-basekit:2025.2.0-0-devel-ubuntu22.04
- base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
runs-on: 'arc-runner-set'
platforms: 'linux/amd64'
runs-on: ${{matrix.runs-on}}
@@ -53,7 +53,7 @@ jobs:
BASE_IMAGE=${{ matrix.base-image }}
context: .
file: ./Dockerfile
tags: quay.io/go-skynet/intel-oneapi-base:latest
tags: quay.io/go-skynet/intel-oneapi-base:24.04
push: true
target: intel
platforms: ${{ matrix.platforms }}

View File

@@ -1,94 +1,95 @@
---
name: 'build container images tests'
on:
pull_request:
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
image-build:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
# Pushing with all jobs in parallel
# eats the bandwidth of all the nodes
max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
fail-fast: false
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-gpu-nvidia-cuda-13'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2204'
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-hipblas'
base-image: "rocm/dev-ubuntu-22.04:6.4.3"
grpc-base-image: "ubuntu:22.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2204'
- build-type: 'sycl'
platforms: 'linux/amd64'
tag-latest: 'false'
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: 'sycl'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2204'
- build-type: 'vulkan'
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-vulkan-core'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=4 --output-sync=target"
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'false'
tag-suffix: '-nvidia-l4t-arm64-cuda-13'
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'
name: 'build container images tests'
on:
pull_request:
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
image-build:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
# Pushing with all jobs in parallel
# eats the bandwidth of all the nodes
max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
fail-fast: false
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-gpu-nvidia-cuda-13'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-hipblas'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
grpc-base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'sycl'
platforms: 'linux/amd64'
tag-latest: 'false'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
grpc-base-image: "ubuntu:24.04"
tag-suffix: 'sycl'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'vulkan'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'false'
tag-suffix: '-vulkan-core'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
makeflags: "--jobs=4 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'false'
tag-suffix: '-nvidia-l4t-arm64-cuda-13'
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'

View File

@@ -1,187 +1,187 @@
---
name: 'build container images'
on:
push:
branches:
- master
tags:
- '*'
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
hipblas-jobs:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
aio: ${{ matrix.aio }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-hipblas'
base-image: "rocm/dev-ubuntu-22.04:6.4.3"
grpc-base-image: "ubuntu:22.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
aio: "-aio-gpu-hipblas"
ubuntu-version: '2204'
core-image-build:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
aio: ${{ matrix.aio }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
#max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
matrix:
include:
- build-type: ''
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: ''
base-image: "ubuntu:22.04"
runs-on: 'ubuntu-latest'
aio: "-aio-cpu"
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "11"
cuda-minor-version: "7"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-11'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
aio: "-aio-gpu-nvidia-cuda-11"
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-nvidia-cuda-12"
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-nvidia-cuda-13"
ubuntu-version: '2204'
- build-type: 'vulkan'
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-vulkan"
ubuntu-version: '2204'
- build-type: 'intel'
platforms: 'linux/amd64'
tag-latest: 'auto'
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-gpu-intel'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
aio: "-aio-gpu-intel"
ubuntu-version: '2204'
gh-runner:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
aio: ${{ matrix.aio }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'true'
ubuntu-version: "2204"
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-cuda-13'
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'
name: 'build container images'
on:
push:
branches:
- master
tags:
- '*'
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
hipblas-jobs:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
aio: ${{ matrix.aio }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-hipblas'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
grpc-base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
aio: "-aio-gpu-hipblas"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
core-image-build:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
aio: ${{ matrix.aio }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
#max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
matrix:
include:
- build-type: ''
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: ''
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
aio: "-aio-cpu"
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-nvidia-cuda-12"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-nvidia-cuda-13"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'vulkan'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-vulkan"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'intel'
platforms: 'linux/amd64'
tag-latest: 'auto'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
grpc-base-image: "ubuntu:24.04"
tag-suffix: '-gpu-intel'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
aio: "-aio-gpu-intel"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
gh-runner:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
aio: ${{ matrix.aio }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'true'
ubuntu-version: "2204"
ubuntu-codename: 'jammy'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-cuda-13'
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'
ubuntu-codename: 'noble'

View File

@@ -23,7 +23,7 @@ on:
type: string
cuda-minor-version:
description: 'CUDA minor version'
default: "4"
default: "9"
type: string
platforms:
description: 'Platforms'
@@ -61,6 +61,11 @@ on:
required: false
default: '2204'
type: string
ubuntu-codename:
description: 'Ubuntu codename'
required: false
default: 'noble'
type: string
secrets:
dockerUsername:
required: true
@@ -244,6 +249,7 @@ jobs:
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
context: .
file: ./Dockerfile
cache-from: type=gha
@@ -272,6 +278,7 @@ jobs:
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
context: .
file: ./Dockerfile
cache-from: type=gha

View File

@@ -238,7 +238,7 @@ jobs:
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch espeak espeak-ng python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
@@ -247,3 +247,98 @@ jobs:
run: |
make --jobs=5 --output-sync=target -C backend/python/coqui
make --jobs=5 --output-sync=target -C backend/python/coqui test
tests-moonshine:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test moonshine
run: |
make --jobs=5 --output-sync=target -C backend/python/moonshine
make --jobs=5 --output-sync=target -C backend/python/moonshine test
tests-pocket-tts:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test pocket-tts
run: |
make --jobs=5 --output-sync=target -C backend/python/pocket-tts
make --jobs=5 --output-sync=target -C backend/python/pocket-tts test
tests-qwen-tts:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test qwen-tts
run: |
make --jobs=5 --output-sync=target -C backend/python/qwen-tts
make --jobs=5 --output-sync=target -C backend/python/qwen-tts test
tests-qwen-asr:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg sox
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test qwen-asr
run: |
make --jobs=5 --output-sync=target -C backend/python/qwen-asr
make --jobs=5 --output-sync=target -C backend/python/qwen-asr test
tests-voxcpm:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test voxcpm
run: |
make --jobs=5 --output-sync=target -C backend/python/voxcpm
make --jobs=5 --output-sync=target -C backend/python/voxcpm test

56
.github/workflows/tests-e2e.yml vendored Normal file
View File

@@ -0,0 +1,56 @@
---
name: 'E2E Backend Tests'
on:
pull_request:
push:
branches:
- master
tags:
- '*'
concurrency:
group: ci-tests-e2e-backend-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
tests-e2e-backend:
runs-on: ubuntu-latest
strategy:
matrix:
go-version: ['1.25.x']
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go ${{ matrix.go-version }}
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential
- name: Test Backend E2E
run: |
PATH="$PATH:$HOME/go/bin" make build-mock-backend test-e2e
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true

3
.gitignore vendored
View File

@@ -25,6 +25,7 @@ go-bert
# LocalAI build binary
LocalAI
/local-ai
/local-ai-launcher
# prevent above rules from omitting the helm chart
!charts/*
# prevent above rules from omitting the api/localai folder
@@ -35,6 +36,8 @@ LocalAI
models/*
test-models/
test-dir/
tests/e2e-aio/backends
tests/e2e-aio/models
release/

211
AGENTS.md
View File

@@ -2,6 +2,163 @@
Building and testing the project depends on the components involved and the platform where development is taking place. Due to the amount of context required it's usually best not to try building or testing the project unless the user requests it. If you must build the project then inspect the Makefile in the project root and the Makefiles of any backends that are effected by changes you are making. In addition the workflows in .github/workflows can be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker, if the user has not previously specified a preference then ask which they would like to use.
## Building a specified backend
Let's say the user wants to build a particular backend for a given platform. For example let's say they want to build coqui for ROCM/hipblas
- The Makefile has targets like `docker-build-coqui` created with `generate-docker-build-target` at the time of writing. Recently added backends may require a new target.
- At a minimum we need to set the BUILD_TYPE, BASE_IMAGE build-args
- Use .github/workflows/backend.yml as a reference it lists the needed args in the `include` job strategy matrix
- l4t and cublas also requires the CUDA major and minor version
- You can pretty print a command like `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:6.4.4 make docker-build-coqui`
- Unless the user specifies that they want you to run the command, then just print it because not all agent frontends handle long running jobs well and the output may overflow your context
- The user may say they want to build AMD or ROCM instead of hipblas, or Intel instead of SYCL or NVIDIA insted of l4t or cublas. Ask for confirmation if there is ambiguity.
- Sometimes the user may need extra parameters to be added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.
## Adding a New Backend
When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:
### 1. Create Backend Directory Structure
Create the backend directory under the appropriate location:
- **Python backends**: `backend/python/<backend-name>/`
- **Go backends**: `backend/go/<backend-name>/`
- **C++ backends**: `backend/cpp/<backend-name>/`
For Python backends, you'll typically need:
- `backend.py` - Main gRPC server implementation
- `Makefile` - Build configuration
- `install.sh` - Installation script for dependencies
- `protogen.sh` - Protocol buffer generation script
- `requirements.txt` - Python dependencies
- `run.sh` - Runtime script
- `test.py` / `test.sh` - Test files
### 2. Add Build Configurations to `.github/workflows/backend.yml`
Add build matrix entries for each platform/GPU type you want to support. Look at similar backends (e.g., `chatterbox`, `faster-whisper`) for reference.
**Placement in file:**
- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)
**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:6.4.4"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
### 3. Add Backend Metadata to `backend/index.yaml`
**Step 3a: Add Meta Definition**
Add a YAML anchor definition in the `## metas` section (around line 2-300). Look for similar backends to use as a template such as `diffusers` or `chatterbox`
**Step 3b: Add Image Entries**
Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
### 4. Update the Makefile
The Makefile needs to be updated in several places to support building and testing the new backend:
**Step 4a: Add to `.NOTPARALLEL`**
Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:
```makefile
.NOTPARALLEL: ... backends/<backend-name>
```
**Step 4b: Add to `prepare-test-extra`**
Add the backend to the `prepare-test-extra` target (around line 312) to prepare it for testing:
```makefile
prepare-test-extra: protogen-python
...
$(MAKE) -C backend/python/<backend-name>
```
**Step 4c: Add to `test-extra`**
Add the backend to the `test-extra` target (around line 319) to run its tests:
```makefile
test-extra: prepare-test-extra
...
$(MAKE) -C backend/python/<backend-name> test
```
**Step 4d: Add Backend Definition**
Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:
**For Python backends with root context** (like `faster-whisper`, `coqui`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
```
**For Python backends with `./backend` context** (like `chatterbox`, `moonshine`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
```
**For Go backends**:
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
```
**Step 4e: Generate Docker Build Target**
Add an eval call to generate the docker-build target (around line 480-501):
```makefile
$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
```
**Step 4f: Add to `docker-build-backends`**
Add `docker-build-<backend-name>` to the `docker-build-backends` target (around line 507):
```makefile
docker-build-backends: ... docker-build-<backend-name>
```
**Determining the Context:**
- If the backend is in `backend/python/<backend-name>/` and uses `./backend` as context in the workflow file, use `./backend` context
- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
- Check similar backends to determine the correct context
### 5. Verification Checklist
After adding a new backend, verify:
- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/workflows/backend.yml` for all desired platforms
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between workflow file and index.yaml
- [ ] Makefile updated with all 6 required changes (`.NOTPARALLEL`, `prepare-test-extra`, `test-extra`, backend definition, docker-build target eval, `docker-build-backends`)
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
### 6. Example: Adding a Python Backend
For reference, when `moonshine` was added:
- **Files created**: `backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}`
- **Workflow entries**: 3 build configurations (CPU, CUDA 12, CUDA 13)
- **Index entries**: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 × latest/development)
- **Makefile updates**:
- Added to `.NOTPARALLEL` line
- Added to `prepare-test-extra` and `test-extra` targets
- Added `BACKEND_MOONSHINE = moonshine|python|./backend|false|true`
- Added eval for docker-build target generation
- Added `docker-build-moonshine` to `docker-build-backends`
# Coding style
- The project has the following .editorconfig
@@ -77,3 +234,57 @@ When fixing compilation errors after upstream changes:
- HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
## Tool Call Parsing Maintenance
When working on JSON/XML tool call parsing functionality, always check llama.cpp for reference implementation and updates:
### Checking for XML Parsing Changes
1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
- `COMMON_CHAT_FORMAT_GLM_4_5`
- `COMMON_CHAT_FORMAT_MINIMAX_M2`
- `COMMON_CHAT_FORMAT_KIMI_K2`
- `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
- `COMMON_CHAT_FORMAT_APRIEL_1_5`
- `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
- Any new formats added
### Model Configuration Options
Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:
1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
4. **Examples of options to check**:
- `ctx_shift` - Context shifting support
- `parallel_tool_calls` - Parallel tool calling
- `reasoning_format` - Reasoning format options
- Any new flags or parameters
### Implementation Guidelines
1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
3. **Documentation**: Update relevant documentation when adding new formats or options
4. **Backward Compatibility**: Ensure changes don't break existing functionality
### Files to Monitor
- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options
# Documentation
The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.

View File

@@ -78,6 +78,20 @@ LOCALAI_IMAGE_TAG=test LOCALAI_IMAGE=local-ai-aio make run-e2e-aio
We are welcome the contribution of the documents, please open new PR or create a new issue. The documentation is available under `docs/` https://github.com/mudler/LocalAI/tree/master/docs
### Gallery YAML Schema
LocalAI provides a JSON Schema for gallery model YAML files at:
`core/schema/gallery-model.schema.json`
This schema mirrors the internal gallery model configuration and can be used by editors (such as VS Code) to enable autocomplete, validation, and inline documentation when creating or modifying gallery files.
To use it with the YAML language server, add the following comment at the top of a gallery YAML file:
```yaml
# yaml-language-server: $schema=../core/schema/gallery-model.schema.json
```
## Community and Communication
- You can reach out via the Github issue tracker.

View File

@@ -1,6 +1,7 @@
ARG BASE_IMAGE=ubuntu:22.04
ARG BASE_IMAGE=ubuntu:24.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
ARG UBUNTU_CODENAME=noble
FROM ${BASE_IMAGE} AS requirements
@@ -9,7 +10,7 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates curl wget espeak-ng libgomp1 \
ffmpeg && \
ffmpeg libopenblas0 libopenblas-dev sox && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
@@ -23,7 +24,7 @@ ARG SKIP_DRIVERS=false
ARG TARGETARCH
ARG TARGETVARIANT
ENV BUILD_TYPE=${BUILD_TYPE}
ARG UBUNTU_VERSION=2204
ARG UBUNTU_VERSION=2404
RUN mkdir -p /run/localai
RUN echo "default" > /run/localai/capability
@@ -34,11 +35,45 @@ RUN <<EOT bash
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils mesa-vulkan-drivers
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
echo "vulkan" > /run/localai/capability
@@ -141,13 +176,12 @@ ENV PATH=/opt/rocm/bin:${PATH}
# The requirements-core target is common to all images. It should not be placed in requirements-core unless every single build will use it.
FROM requirements-drivers AS build-requirements
ARG GO_VERSION=1.22.6
ARG CMAKE_VERSION=3.26.4
ARG GO_VERSION=1.25.4
ARG CMAKE_VERSION=3.31.10
ARG CMAKE_FROM_SOURCE=false
ARG TARGETARCH
ARG TARGETVARIANT
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
@@ -204,9 +238,10 @@ WORKDIR /build
# https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/APT-Repository-not-working-signatures-invalid/m-p/1599436/highlight/true#M36143
# This is a temporary workaround until Intel fixes their repository
FROM ${INTEL_BASE_IMAGE} AS intel
ARG UBUNTU_CODENAME=noble
RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${UBUNTU_CODENAME}/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
RUN apt-get update && \
apt-get install -y --no-install-recommends \
intel-oneapi-runtime-libs && \

View File

@@ -1,4 +1,4 @@
ARG BASE_IMAGE=ubuntu:22.04
ARG BASE_IMAGE=ubuntu:24.04
FROM ${BASE_IMAGE}

330
Makefile
View File

@@ -1,15 +1,20 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/qwen-asr backends/voxcpm backends/whisperx
GOCMD=go
GOTEST=$(GOCMD) test
GOVET=$(GOCMD) vet
BINARY_NAME=local-ai
LAUNCHER_BINARY_NAME=local-ai-launcher
CUDA_MAJOR_VERSION?=13
CUDA_MINOR_VERSION?=0
UBUNTU_VERSION?=2404
UBUNTU_CODENAME?=noble
GORELEASER?=
export BUILD_TYPE?=
export CUDA_MAJOR_VERSION?=13
export CUDA_MINOR_VERSION?=0
GO_TAGS?=
BUILD_ID?=
@@ -155,7 +160,17 @@ test: test-models/testmodel.ggml protogen-go
########################################################
docker-build-aio:
docker build --build-arg MAKEFLAGS="--jobs=5 --output-sync=target" -t local-ai:tests -f Dockerfile .
docker build \
--build-arg MAKEFLAGS="--jobs=5 --output-sync=target" \
--build-arg BASE_IMAGE=$(BASE_IMAGE) \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg BUILD_TYPE=$(BUILD_TYPE) \
--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
--build-arg GO_TAGS="$(GO_TAGS)" \
-t local-ai:tests -f Dockerfile .
BASE_IMAGE=local-ai:tests DOCKER_AIO_IMAGE=local-ai-aio:test $(MAKE) docker-aio
e2e-aio:
@@ -174,20 +189,29 @@ run-e2e-aio: protogen-go
########################################################
prepare-e2e:
mkdir -p $(TEST_DIR)
cp -rfv $(abspath ./tests/e2e-fixtures)/gpu.yaml $(TEST_DIR)/gpu.yaml
test -e $(TEST_DIR)/ggllm-test-model.bin || wget -q https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q2_K.gguf -O $(TEST_DIR)/ggllm-test-model.bin
docker build --build-arg IMAGE_TYPE=core --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg CUDA_MAJOR_VERSION=12 --build-arg CUDA_MINOR_VERSION=0 -t localai-tests .
docker build \
--build-arg IMAGE_TYPE=core \
--build-arg BUILD_TYPE=$(BUILD_TYPE) \
--build-arg BASE_IMAGE=$(BASE_IMAGE) \
--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
--build-arg GO_TAGS="$(GO_TAGS)" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
-t localai-tests .
run-e2e-image:
ls -liah $(abspath ./tests/e2e-fixtures)
docker run -p 5390:8080 -e MODELS_PATH=/models -e THREADS=1 -e DEBUG=true -d --rm -v $(TEST_DIR):/models --gpus all --name e2e-tests-$(RANDOM) localai-tests
docker run -p 5390:8080 -e MODELS_PATH=/models -e THREADS=1 -e DEBUG=true -d --rm -v $(TEST_DIR):/models --name e2e-tests-$(RANDOM) localai-tests
test-e2e:
test-e2e: build-mock-backend prepare-e2e run-e2e-image
@echo 'Running e2e tests'
BUILD_TYPE=$(BUILD_TYPE) \
LOCALAI_API=http://$(E2E_BRIDGE_IP):5390/v1 \
LOCALAI_API=http://$(E2E_BRIDGE_IP):5390 \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) -v -r ./tests/e2e
$(MAKE) clean-mock-backend
$(MAKE) teardown-e2e
docker rmi localai-tests
teardown-e2e:
rm -rf $(TEST_DIR) || true
@@ -287,19 +311,33 @@ prepare-test-extra: protogen-python
$(MAKE) -C backend/python/diffusers
$(MAKE) -C backend/python/chatterbox
$(MAKE) -C backend/python/vllm
$(MAKE) -C backend/python/vllm-omni
$(MAKE) -C backend/python/vibevoice
$(MAKE) -C backend/python/moonshine
$(MAKE) -C backend/python/pocket-tts
$(MAKE) -C backend/python/qwen-tts
$(MAKE) -C backend/python/qwen-asr
$(MAKE) -C backend/python/voxcpm
$(MAKE) -C backend/python/whisperx
test-extra: prepare-test-extra
$(MAKE) -C backend/python/transformers test
$(MAKE) -C backend/python/diffusers test
$(MAKE) -C backend/python/chatterbox test
$(MAKE) -C backend/python/vllm test
$(MAKE) -C backend/python/vllm-omni test
$(MAKE) -C backend/python/vibevoice test
$(MAKE) -C backend/python/moonshine test
$(MAKE) -C backend/python/pocket-tts test
$(MAKE) -C backend/python/qwen-tts test
$(MAKE) -C backend/python/qwen-asr test
$(MAKE) -C backend/python/voxcpm test
$(MAKE) -C backend/python/whisperx test
DOCKER_IMAGE?=local-ai
DOCKER_AIO_IMAGE?=local-ai-aio
IMAGE_TYPE?=core
BASE_IMAGE?=ubuntu:22.04
BASE_IMAGE?=ubuntu:24.04
docker:
docker build \
@@ -308,24 +346,34 @@ docker:
--build-arg GO_TAGS="$(GO_TAGS)" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
--build-arg BUILD_TYPE=$(BUILD_TYPE) \
--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
-t $(DOCKER_IMAGE) .
docker-cuda11:
docker-cuda12:
docker build \
--build-arg CUDA_MAJOR_VERSION=11 \
--build-arg CUDA_MINOR_VERSION=8 \
--build-arg CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} \
--build-arg CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} \
--build-arg BASE_IMAGE=$(BASE_IMAGE) \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="$(GO_TAGS)" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
--build-arg BUILD_TYPE=$(BUILD_TYPE) \
-t $(DOCKER_IMAGE)-cuda-11 .
--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
-t $(DOCKER_IMAGE)-cuda-12 .
docker-aio:
@echo "Building AIO image with base $(BASE_IMAGE) as $(DOCKER_AIO_IMAGE)"
docker build \
--build-arg BASE_IMAGE=$(BASE_IMAGE) \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
-t $(DOCKER_AIO_IMAGE) -f Dockerfile.aio .
docker-aio-all:
@@ -334,66 +382,31 @@ docker-aio-all:
docker-image-intel:
docker build \
--build-arg BASE_IMAGE=quay.io/go-skynet/intel-oneapi-base:latest \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="$(GO_TAGS)" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
--build-arg BUILD_TYPE=intel -t $(DOCKER_IMAGE) .
--build-arg BUILD_TYPE=intel \
--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
-t $(DOCKER_IMAGE) .
########################################################
## Backends
########################################################
# Pattern rule for standard backends (docker-based)
# This matches all backends that use docker-build-* and docker-save-*
backends/%: docker-build-% docker-save-% build
./local-ai backends install "ocifile://$(abspath ./backend-images/$*.tar)"
backends/diffusers: docker-build-diffusers docker-save-diffusers build
./local-ai backends install "ocifile://$(abspath ./backend-images/diffusers.tar)"
backends/llama-cpp: docker-build-llama-cpp docker-save-llama-cpp build
./local-ai backends install "ocifile://$(abspath ./backend-images/llama-cpp.tar)"
backends/piper: docker-build-piper docker-save-piper build
./local-ai backends install "ocifile://$(abspath ./backend-images/piper.tar)"
backends/stablediffusion-ggml: docker-build-stablediffusion-ggml docker-save-stablediffusion-ggml build
./local-ai backends install "ocifile://$(abspath ./backend-images/stablediffusion-ggml.tar)"
backends/whisper: docker-build-whisper docker-save-whisper build
./local-ai backends install "ocifile://$(abspath ./backend-images/whisper.tar)"
backends/silero-vad: docker-build-silero-vad docker-save-silero-vad build
./local-ai backends install "ocifile://$(abspath ./backend-images/silero-vad.tar)"
backends/local-store: docker-build-local-store docker-save-local-store build
./local-ai backends install "ocifile://$(abspath ./backend-images/local-store.tar)"
backends/huggingface: docker-build-huggingface docker-save-huggingface build
./local-ai backends install "ocifile://$(abspath ./backend-images/huggingface.tar)"
backends/rfdetr: docker-build-rfdetr docker-save-rfdetr build
./local-ai backends install "ocifile://$(abspath ./backend-images/rfdetr.tar)"
backends/kitten-tts: docker-build-kitten-tts docker-save-kitten-tts build
./local-ai backends install "ocifile://$(abspath ./backend-images/kitten-tts.tar)"
backends/kokoro: docker-build-kokoro docker-save-kokoro build
./local-ai backends install "ocifile://$(abspath ./backend-images/kokoro.tar)"
backends/chatterbox: docker-build-chatterbox docker-save-chatterbox build
./local-ai backends install "ocifile://$(abspath ./backend-images/chatterbox.tar)"
# Darwin-specific backends (keep as explicit targets since they have special build logic)
backends/llama-cpp-darwin: build
bash ./scripts/build/llama-cpp-darwin.sh
./local-ai backends install "ocifile://$(abspath ./backend-images/llama-cpp.tar)"
backends/neutts: docker-build-neutts docker-save-neutts build
./local-ai backends install "ocifile://$(abspath ./backend-images/neutts.tar)"
backends/vllm: docker-build-vllm docker-save-vllm build
./local-ai backends install "ocifile://$(abspath ./backend-images/vllm.tar)"
backends/vibevoice: docker-build-vibevoice docker-save-vibevoice build
./local-ai backends install "ocifile://$(abspath ./backend-images/vibevoice.tar)"
build-darwin-python-backend: build
bash ./scripts/build/python-darwin.sh
@@ -423,121 +436,102 @@ backends/stablediffusion-ggml-darwin:
backend-images:
mkdir -p backend-images
docker-build-llama-cpp:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:llama-cpp -f backend/Dockerfile.llama-cpp .
# Backend metadata: BACKEND_NAME | DOCKERFILE_TYPE | BUILD_CONTEXT | PROGRESS_FLAG | NEEDS_BACKEND_ARG
# llama-cpp is special - uses llama-cpp Dockerfile and doesn't need BACKEND arg
BACKEND_LLAMA_CPP = llama-cpp|llama-cpp|.|false|false
docker-build-bark-cpp:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:bark-cpp -f backend/Dockerfile.golang --build-arg BACKEND=bark-cpp .
# Golang backends
BACKEND_PIPER = piper|golang|.|false|true
BACKEND_LOCAL_STORE = local-store|golang|.|false|true
BACKEND_HUGGINGFACE = huggingface|golang|.|false|true
BACKEND_SILERO_VAD = silero-vad|golang|.|false|true
BACKEND_STABLEDIFFUSION_GGML = stablediffusion-ggml|golang|.|--progress=plain|true
BACKEND_WHISPER = whisper|golang|.|false|true
docker-build-piper:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:piper -f backend/Dockerfile.golang --build-arg BACKEND=piper .
# Python backends with root context
BACKEND_RERANKERS = rerankers|python|.|false|true
BACKEND_TRANSFORMERS = transformers|python|.|false|true
BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
BACKEND_COQUI = coqui|python|.|false|true
BACKEND_RFDETR = rfdetr|python|.|false|true
BACKEND_KITTEN_TTS = kitten-tts|python|.|false|true
BACKEND_NEUTTS = neutts|python|.|false|true
BACKEND_KOKORO = kokoro|python|.|false|true
BACKEND_VLLM = vllm|python|.|false|true
BACKEND_VLLM_OMNI = vllm-omni|python|.|false|true
BACKEND_DIFFUSERS = diffusers|python|.|--progress=plain|true
BACKEND_CHATTERBOX = chatterbox|python|.|false|true
BACKEND_VIBEVOICE = vibevoice|python|.|--progress=plain|true
BACKEND_MOONSHINE = moonshine|python|.|false|true
BACKEND_POCKET_TTS = pocket-tts|python|.|false|true
BACKEND_QWEN_TTS = qwen-tts|python|.|false|true
BACKEND_QWEN_ASR = qwen-asr|python|.|false|true
BACKEND_VOXCPM = voxcpm|python|.|false|true
BACKEND_WHISPERX = whisperx|python|.|false|true
docker-build-local-store:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:local-store -f backend/Dockerfile.golang --build-arg BACKEND=local-store .
# Helper function to build docker image for a backend
# Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG)
define docker-build-backend
docker build $(if $(filter-out false,$(4)),$(4)) \
--build-arg BUILD_TYPE=$(BUILD_TYPE) \
--build-arg BASE_IMAGE=$(BASE_IMAGE) \
--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
$(if $(filter true,$(5)),--build-arg BACKEND=$(1)) \
-t local-ai-backend:$(1) -f backend/Dockerfile.$(2) $(3)
endef
docker-build-huggingface:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:huggingface -f backend/Dockerfile.golang --build-arg BACKEND=huggingface .
# Generate docker-build targets from backend definitions
define generate-docker-build-target
docker-build-$(word 1,$(subst |, ,$(1))):
$$(call docker-build-backend,$(word 1,$(subst |, ,$(1))),$(word 2,$(subst |, ,$(1))),$(word 3,$(subst |, ,$(1))),$(word 4,$(subst |, ,$(1))),$(word 5,$(subst |, ,$(1))))
endef
docker-build-rfdetr:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:rfdetr -f backend/Dockerfile.python --build-arg BACKEND=rfdetr ./backend
# Generate all docker-build targets
$(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_PIPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_LOCAL_STORE)))
$(eval $(call generate-docker-build-target,$(BACKEND_HUGGINGFACE)))
$(eval $(call generate-docker-build-target,$(BACKEND_SILERO_VAD)))
$(eval $(call generate-docker-build-target,$(BACKEND_STABLEDIFFUSION_GGML)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_RERANKERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_TRANSFORMERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
$(eval $(call generate-docker-build-target,$(BACKEND_KITTEN_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_NEUTTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_KOKORO)))
$(eval $(call generate-docker-build-target,$(BACKEND_VLLM)))
$(eval $(call generate-docker-build-target,$(BACKEND_VLLM_OMNI)))
$(eval $(call generate-docker-build-target,$(BACKEND_DIFFUSERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_CHATTERBOX)))
$(eval $(call generate-docker-build-target,$(BACKEND_VIBEVOICE)))
$(eval $(call generate-docker-build-target,$(BACKEND_MOONSHINE)))
$(eval $(call generate-docker-build-target,$(BACKEND_POCKET_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN_ASR)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXCPM)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
docker-build-kitten-tts:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:kitten-tts -f backend/Dockerfile.python --build-arg BACKEND=kitten-tts ./backend
# Pattern rule for docker-save targets
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-save-kitten-tts: backend-images
docker save local-ai-backend:kitten-tts -o backend-images/kitten-tts.tar
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-qwen-asr docker-build-voxcpm docker-build-whisperx
docker-save-chatterbox: backend-images
docker save local-ai-backend:chatterbox -o backend-images/chatterbox.tar
########################################################
### Mock Backend for E2E Tests
########################################################
docker-save-vibevoice: backend-images
docker save local-ai-backend:vibevoice -o backend-images/vibevoice.tar
build-mock-backend: protogen-go
$(GOCMD) build -o tests/e2e/mock-backend/mock-backend ./tests/e2e/mock-backend
docker-build-neutts:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:neutts -f backend/Dockerfile.python --build-arg BACKEND=neutts ./backend
docker-save-neutts: backend-images
docker save local-ai-backend:neutts -o backend-images/neutts.tar
docker-build-kokoro:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:kokoro -f backend/Dockerfile.python --build-arg BACKEND=kokoro ./backend
docker-build-vllm:
docker build --build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) --build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:vllm -f backend/Dockerfile.python --build-arg BACKEND=vllm ./backend
docker-save-vllm: backend-images
docker save local-ai-backend:vllm -o backend-images/vllm.tar
docker-save-kokoro: backend-images
docker save local-ai-backend:kokoro -o backend-images/kokoro.tar
docker-save-rfdetr: backend-images
docker save local-ai-backend:rfdetr -o backend-images/rfdetr.tar
docker-save-huggingface: backend-images
docker save local-ai-backend:huggingface -o backend-images/huggingface.tar
docker-save-local-store: backend-images
docker save local-ai-backend:local-store -o backend-images/local-store.tar
docker-build-silero-vad:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:silero-vad -f backend/Dockerfile.golang --build-arg BACKEND=silero-vad .
docker-save-silero-vad: backend-images
docker save local-ai-backend:silero-vad -o backend-images/silero-vad.tar
docker-save-piper: backend-images
docker save local-ai-backend:piper -o backend-images/piper.tar
docker-save-llama-cpp: backend-images
docker save local-ai-backend:llama-cpp -o backend-images/llama-cpp.tar
docker-save-bark-cpp: backend-images
docker save local-ai-backend:bark-cpp -o backend-images/bark-cpp.tar
docker-build-stablediffusion-ggml:
docker build --progress=plain --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) --build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) --build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) -t local-ai-backend:stablediffusion-ggml -f backend/Dockerfile.golang --build-arg BACKEND=stablediffusion-ggml .
docker-save-stablediffusion-ggml: backend-images
docker save local-ai-backend:stablediffusion-ggml -o backend-images/stablediffusion-ggml.tar
docker-build-rerankers:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:rerankers -f backend/Dockerfile.python --build-arg BACKEND=rerankers .
docker-build-transformers:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:transformers -f backend/Dockerfile.python --build-arg BACKEND=transformers .
docker-build-diffusers:
docker build --progress=plain --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:diffusers -f backend/Dockerfile.python --build-arg BACKEND=diffusers ./backend
docker-save-diffusers: backend-images
docker save local-ai-backend:diffusers -o backend-images/diffusers.tar
docker-build-whisper:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) --build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) --build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) -t local-ai-backend:whisper -f backend/Dockerfile.golang --build-arg BACKEND=whisper .
docker-save-whisper: backend-images
docker save local-ai-backend:whisper -o backend-images/whisper.tar
docker-build-faster-whisper:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:faster-whisper -f backend/Dockerfile.python --build-arg BACKEND=faster-whisper .
docker-build-coqui:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:coqui -f backend/Dockerfile.python --build-arg BACKEND=coqui .
docker-build-bark:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:bark -f backend/Dockerfile.python --build-arg BACKEND=bark .
docker-build-chatterbox:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:chatterbox -f backend/Dockerfile.python --build-arg BACKEND=chatterbox ./backend
docker-build-vibevoice:
docker build --progress=plain --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:vibevoice -f backend/Dockerfile.python --build-arg BACKEND=vibevoice ./backend
docker-build-exllama2:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:exllama2 -f backend/Dockerfile.python --build-arg BACKEND=exllama2 .
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-transformers docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-bark docker-build-chatterbox docker-build-vibevoice docker-build-exllama2
clean-mock-backend:
rm -f tests/e2e/mock-backend/mock-backend
########################################################
### END Backends

118
README.md
View File

@@ -51,34 +51,16 @@
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
## 📚🆕 Local Stack Family
## Local Stack Family
🆕 LocalAI is now part of a comprehensive suite of AI tools designed to work together:
Liking LocalAI? LocalAI is part of an integrated suite of AI infrastructure tools, you might also like:
- **[LocalAGI](https://github.com/mudler/LocalAGI)** - AI agent orchestration platform with OpenAI Responses API compatibility and advanced agentic capabilities
- **[LocalRecall](https://github.com/mudler/LocalRecall)** - MCP/REST API knowledge base system providing persistent memory and storage for AI agents
- 🆕 **[Cogito](https://github.com/mudler/cogito)** - Go library for building intelligent, co-operative agentic software and LLM-powered workflows, focusing on improving results for small, open source language models that scales to any LLM. Powers LocalAGI and LocalAI MCP/Agentic capabilities
- 🆕 **[Wiz](https://github.com/mudler/wiz)** - Terminal-based AI agent accessible via Ctrl+Space keybinding. Portable, local-LLM friendly shell assistant with TUI/CLI modes, tool execution with approval, MCP protocol support, and multi-shell compatibility (zsh, bash, fish)
- 🆕 **[SkillServer](https://github.com/mudler/skillserver)** - Simple, centralized skills database for AI agents via MCP. Manages skills as Markdown files with MCP server integration, web UI for editing, Git synchronization, and full-text search capabilities
<table>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalAGI">
<img src="https://raw.githubusercontent.com/mudler/LocalAGI/refs/heads/main/webui/react-ui/public/logo_2.png" width="300" alt="LocalAGI Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalAGI">LocalAGI</a></h3>
<p>A powerful Local AI agent management platform that serves as a drop-in replacement for OpenAI's Responses API, enhanced with advanced agentic capabilities.</p>
</td>
</tr>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalRecall">
<img src="https://raw.githubusercontent.com/mudler/LocalRecall/refs/heads/main/static/localrecall_horizontal.png" width="300" alt="LocalRecall Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalRecall">LocalRecall</a></h3>
<p>A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.</p>
</td>
</tr>
</table>
## Screenshots / Video
@@ -111,6 +93,8 @@
## 💻 Quickstart
> ⚠️ **Note:** The `install.sh` script is currently experiencing issues due to the heavy changes currently undergoing in LocalAI and may produce broken or misconfigured installations. Please use Docker installation (see below) or manual binary installation until [issue #8032](https://github.com/mudler/LocalAI/issues/8032) is resolved.
Run the installer script:
```bash
@@ -128,7 +112,7 @@ For more installation options, see [Installer Options](https://localai.io/instal
> Note: the DMGs are not signed by Apple as quarantined. See https://github.com/mudler/LocalAI/issues/6268 for a workaround, fix is tracked here: https://github.com/mudler/LocalAI/issues/6244
Or run with docker:
### Containers (Docker, podman, ...)
> **💡 Docker Run vs Docker Start**
>
@@ -137,55 +121,59 @@ Or run with docker:
>
> If you've already run LocalAI before and want to start it again, use: `docker start -i local-ai`
### CPU only image:
#### CPU only image:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
```
### NVIDIA GPU Images:
#### NVIDIA GPU Images:
```bash
# CUDA 13.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# CUDA 12.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CUDA 11.7
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11
# NVIDIA Jetson (L4T) ARM64
# CUDA 12 (for Nvidia AGX Orin and similar platforms)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
# CUDA 13 (for Nvidia DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
```
### AMD GPU Images (ROCm):
#### AMD GPU Images (ROCm):
```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
```
### Intel GPU Images (oneAPI):
#### Intel GPU Images (oneAPI):
```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
```
### Vulkan GPU Images:
#### Vulkan GPU Images:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
```
### AIO Images (pre-downloaded models):
#### AIO Images (pre-downloaded models):
```bash
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 13 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
@@ -251,6 +239,7 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🎨 [Image generation](https://localai.io/features/image-generation)
- 🔥 [OpenAI-alike tools API](https://localai.io/features/openai-functions/)
- ⚡ [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
- 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
- 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
@@ -269,39 +258,39 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration
### Text Generation & Language Models
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **llama.cpp** | LLM inference in C/C++ | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12, ROCm, Intel |
| **transformers** | HuggingFace transformers framework | CUDA 11/12, ROCm, Intel, CPU |
| **exllama2** | GPTQ inference library | CUDA 12 |
| **llama.cpp** | LLM inference in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12/13, ROCm, Intel |
| **transformers** | HuggingFace transformers framework | CUDA 12/13, ROCm, Intel, CPU |
| **MLX** | Apple Silicon LLM inference | Metal (M1/M2/M3+) |
| **MLX-VLM** | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) |
### Audio & Speech Processing
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12, ROCm, Intel, CPU |
| **bark** | Text-to-audio generation | CUDA 12, ROCm, Intel |
| **bark-cpp** | C++ implementation of Bark | CUDA, Metal, CPU |
| **coqui** | Advanced TTS with 1100+ languages | CUDA 12, ROCm, Intel, CPU |
| **kokoro** | Lightweight TTS model | CUDA 12, ROCm, Intel, CPU |
| **chatterbox** | Production-grade TTS | CUDA 11/12, CPU |
| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, CPU |
| **coqui** | Advanced TTS with 1100+ languages | CUDA 12/13, ROCm, Intel, CPU |
| **kokoro** | Lightweight TTS model | CUDA 12/13, ROCm, Intel, CPU |
| **chatterbox** | Production-grade TTS | CUDA 12/13, CPU |
| **piper** | Fast neural TTS system | CPU |
| **kitten-tts** | Kitten TTS models | CPU |
| **silero-vad** | Voice Activity Detection | CPU |
| **neutts** | Text-to-speech with voice cloning | CUDA 12, ROCm, CPU |
| **neutts** | Text-to-speech with voice cloning | CUDA 12/13, ROCm, CPU |
| **vibevoice** | Real-time TTS with voice cloning | CUDA 12/13, ROCm, Intel, CPU |
| **pocket-tts** | Lightweight CPU-based TTS | CUDA 12/13, ROCm, Intel, CPU |
| **qwen-tts** | High-quality TTS with custom voice, voice design, and voice cloning | CUDA 12/13, ROCm, Intel, CPU |
### Image & Video Generation
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12, Intel SYCL, Vulkan, CPU |
| **diffusers** | HuggingFace diffusion models | CUDA 11/12, ROCm, Intel, Metal, CPU |
| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12/13, Intel SYCL, Vulkan, CPU |
| **diffusers** | HuggingFace diffusion models | CUDA 12/13, ROCm, Intel, Metal, CPU |
### Specialized AI Tasks
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **rfdetr** | Real-time object detection | CUDA 12, Intel, CPU |
| **rerankers** | Document reranking API | CUDA 11/12, ROCm, Intel, CPU |
| **rfdetr** | Real-time object detection | CUDA 12/13, Intel, CPU |
| **rerankers** | Document reranking API | CUDA 12/13, ROCm, Intel, CPU |
| **local-store** | Vector database | CPU |
| **huggingface** | HuggingFace API integration | API-based |
@@ -309,13 +298,14 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration
| Acceleration Type | Supported Backends | Hardware Support |
|-------------------|-------------------|------------------|
| **NVIDIA CUDA 11** | llama.cpp, whisper, stablediffusion, diffusers, rerankers, bark, chatterbox | Nvidia hardware |
| **NVIDIA CUDA 12** | All CUDA-compatible backends | Nvidia hardware |
| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, bark, neutts | AMD Graphics |
| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, exllama2, coqui, kokoro, bark | Intel Arc, Intel iGPUs |
| **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM, bark-cpp | Apple M1/M2/M3+ |
| **NVIDIA CUDA 13** | All CUDA-compatible backends | Nvidia hardware |
| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, neutts, vibevoice, pocket-tts, qwen-tts | AMD Graphics |
| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, coqui, kokoro, vibevoice, pocket-tts, qwen-tts | Intel Arc, Intel iGPUs |
| **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM | Apple M1/M2/M3+ |
| **Vulkan** | llama.cpp, whisper, stablediffusion | Cross-platform GPUs |
| **NVIDIA Jetson** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI |
| **NVIDIA Jetson (CUDA 12)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (AGX Orin, etc.) |
| **NVIDIA Jetson (CUDA 13)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (DGX Spark) |
| **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support |
### 🔗 Community and integrations
@@ -334,6 +324,10 @@ Agentic Libraries:
MCPs:
- https://github.com/mudler/MCPs
OS Assistant:
- https://github.com/mudler/Keygeist - Keygeist is an AI-powered keyboard operator that listens for key combinations and responds with AI-generated text typed directly into your Linux box.
Model galleries
- https://github.com/go-skynet/model-gallery
@@ -408,6 +402,10 @@ A huge thank you to our generous sponsors who support this project covering CI e
</a>
</p>
### Individual sponsors
A special thanks to individual sponsors that contributed to the project, a full list is in [Github](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler), a special shout out goes to [drikster80](https://github.com/drikster80) for being generous. Thank you everyone!
## 🌟 Star history
[![LocalAI Star history Chart](https://api.star-history.com/svg?repos=go-skynet/LocalAI&type=Date)](https://star-history.com/#go-skynet/LocalAI&Date)

View File

@@ -1,4 +1,4 @@
ARG BASE_IMAGE=ubuntu:22.04
ARG BASE_IMAGE=ubuntu:24.04
FROM ${BASE_IMAGE} AS builder
ARG BACKEND=rerankers
@@ -12,8 +12,8 @@ ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG GO_VERSION=1.22.6
ARG UBUNTU_VERSION=2204
ARG GO_VERSION=1.25.4
ARG UBUNTU_VERSION=2404
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -40,11 +40,45 @@ RUN <<EOT bash
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
@@ -148,6 +182,8 @@ EOT
COPY . /LocalAI
RUN git config --global --add safe.directory /LocalAI
RUN cd /LocalAI && make protogen-go && make -C /LocalAI/backend/go/${BACKEND} build
FROM scratch

View File

@@ -1,4 +1,4 @@
ARG BASE_IMAGE=ubuntu:22.04
ARG BASE_IMAGE=ubuntu:24.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
@@ -10,7 +10,8 @@ FROM ${GRPC_BASE_IMAGE} AS grpc
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG GRPC_VERSION=v1.65.0
ARG CMAKE_FROM_SOURCE=false
ARG CMAKE_VERSION=3.26.4
# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues
ARG CMAKE_VERSION=3.31.10
ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
@@ -26,7 +27,7 @@ RUN apt-get update && \
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}}" = "true" ]; then
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
@@ -50,6 +51,13 @@ RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shall
rm -rf /build
FROM ${BASE_IMAGE} AS builder
ARG CMAKE_FROM_SOURCE=false
ARG CMAKE_VERSION=3.31.10
# We can target specific CUDA ARCHITECTURES like --build-arg CUDA_DOCKER_ARCH='75;86;89;120'
ARG CUDA_DOCKER_ARCH
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
ARG CMAKE_ARGS
ENV CMAKE_ARGS=${CMAKE_ARGS}
ARG BACKEND=rerankers
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
@@ -61,8 +69,8 @@ ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG GO_VERSION=1.22.6
ARG UBUNTU_VERSION=2204
ARG GO_VERSION=1.25.4
ARG UBUNTU_VERSION=2404
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -70,6 +78,7 @@ RUN apt-get update && \
ccache git \
ca-certificates \
make \
pkg-config libcurl4-openssl-dev \
curl unzip \
libssl-dev wget && \
apt-get clean && \
@@ -88,11 +97,45 @@ RUN <<EOT bash
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
@@ -189,7 +232,7 @@ EOT
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}}" = "true" ]; then
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
@@ -205,19 +248,30 @@ COPY --from=grpc /opt/grpc /usr/local
COPY . /LocalAI
## Otherwise just run the normal build
RUN <<EOT bash
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
cd /LocalAI/backend/cpp/llama-cpp && make llama-cpp-fallback && \
make llama-cpp-grpc && make llama-cpp-rpc-server; \
else \
cd /LocalAI/backend/cpp/llama-cpp && make llama-cpp-avx && \
make llama-cpp-avx2 && \
make llama-cpp-avx512 && \
make llama-cpp-fallback && \
make llama-cpp-grpc && \
make llama-cpp-rpc-server; \
fi
RUN <<'EOT' bash
set -euxo pipefail
if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
echo "CMAKE_ARGS(env) = ${CMAKE_ARGS}"
rm -rf /LocalAI/backend/cpp/llama-cpp-*-build
fi
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
cd /LocalAI/backend/cpp/llama-cpp
make llama-cpp-fallback
make llama-cpp-grpc
make llama-cpp-rpc-server
else
cd /LocalAI/backend/cpp/llama-cpp
make llama-cpp-avx
make llama-cpp-avx2
make llama-cpp-avx512
make llama-cpp-fallback
make llama-cpp-grpc
make llama-cpp-rpc-server
fi
EOT

View File

@@ -1,4 +1,4 @@
ARG BASE_IMAGE=ubuntu:22.04
ARG BASE_IMAGE=ubuntu:24.04
FROM ${BASE_IMAGE} AS builder
ARG BACKEND=rerankers
@@ -12,7 +12,7 @@ ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG UBUNTU_VERSION=2204
ARG UBUNTU_VERSION=2404
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -54,11 +54,45 @@ RUN <<EOT bash
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
@@ -142,7 +176,8 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
# Install uv as a system package
RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
ENV PATH="/root/.cargo/bin:${PATH}"
# Increase timeout for uv installs behind slow networks
ENV UV_HTTP_TIMEOUT=180
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
# Install grpcio-tools (the version in 22.04 is too old)
@@ -155,12 +190,18 @@ RUN <<EOT bash
EOT
COPY python/${BACKEND} /${BACKEND}
COPY backend.proto /${BACKEND}/backend.proto
COPY python/common/ /${BACKEND}/common
COPY backend/python/${BACKEND} /${BACKEND}
COPY backend/backend.proto /${BACKEND}/backend.proto
COPY backend/python/common/ /${BACKEND}/common
COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
# Package GPU libraries into the backend's lib directory
RUN mkdir -p /${BACKEND}/lib && \
TARGET_LIB_DIR="/${BACKEND}/lib" BUILD_TYPE="${BUILD_TYPE}" CUDA_MAJOR_VERSION="${CUDA_MAJOR_VERSION}" \
bash /package-gpu-libs.sh "/${BACKEND}/lib"
FROM scratch
ARG BACKEND=rerankers
COPY --from=builder /${BACKEND}/ /

View File

@@ -46,7 +46,7 @@ The backend system provides language-specific Dockerfiles that handle the build
- **vllm**: High-performance LLM inference
- **mlx**: Apple Silicon optimization
- **diffusers**: Stable Diffusion models
- **Audio**: bark, coqui, faster-whisper, kitten-tts
- **Audio**: coqui, faster-whisper, kitten-tts
- **Vision**: mlx-vlm, rfdetr
- **Specialized**: rerankers, chatterbox, kokoro
@@ -55,7 +55,6 @@ The backend system provides language-specific Dockerfiles that handle the build
- **stablediffusion-ggml**: Stable Diffusion in Go with GGML Cpp backend
- **huggingface**: Hugging Face model integration
- **piper**: Text-to-speech synthesis Golang with C bindings using rhaspy/piper
- **bark-cpp**: Bark TTS models Golang with Cpp bindings
- **local-store**: Vector storage backend
#### C++ Backends (`cpp/`)
@@ -65,7 +64,7 @@ The backend system provides language-specific Dockerfiles that handle the build
## Hardware Acceleration Support
### CUDA (NVIDIA)
- **Versions**: CUDA 11.x, 12.x
- **Versions**: CUDA 12.x, 13.x
- **Features**: cuBLAS, cuDNN, TensorRT optimization
- **Targets**: x86_64, ARM64 (Jetson)
@@ -132,8 +131,7 @@ For ARM64/Mac builds, docker can't be used, and the makefile in the respective b
### Build Types
- **`cpu`**: CPU-only optimization
- **`cublas11`**: CUDA 11.x with cuBLAS
- **`cublas12`**: CUDA 12.x with cuBLAS
- **`cublas12`**, **`cublas13`**: CUDA 12.x, 13.x with cuBLAS
- **`hipblas`**: ROCm with rocBLAS
- **`intel`**: Intel oneAPI optimization
- **`vulkan`**: Vulkan-based acceleration
@@ -210,4 +208,4 @@ When contributing to the backend system:
2. **Add Tests**: Include comprehensive test coverage
3. **Document**: Provide clear usage examples
4. **Optimize**: Consider performance and resource usage
5. **Validate**: Test across different hardware targets
5. **Validate**: Test across different hardware targets

View File

@@ -17,6 +17,7 @@ service Backend {
rpc GenerateVideo(GenerateVideoRequest) returns (Result) {}
rpc AudioTranscription(TranscriptRequest) returns (TranscriptResult) {}
rpc TTS(TTSRequest) returns (Result) {}
rpc TTSStream(TTSRequest) returns (stream Reply) {}
rpc SoundGeneration(SoundGenerationRequest) returns (Result) {}
rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
rpc Status(HealthMessage) returns (StatusResponse) {}
@@ -32,6 +33,8 @@ service Backend {
rpc GetMetrics(MetricsRequest) returns (MetricsResponse);
rpc VAD(VADRequest) returns (VADResponse) {}
rpc ModelMetadata(ModelOptions) returns (ModelMetadataResponse) {}
}
// Define the empty request
@@ -296,12 +299,12 @@ message TranscriptSegment {
int64 end = 3;
string text = 4;
repeated int32 tokens = 5;
string speaker = 6;
}
message GenerateImageRequest {
int32 height = 1;
int32 width = 2;
int32 mode = 3;
int32 step = 4;
int32 seed = 5;
string positive_prompt = 6;
@@ -411,3 +414,8 @@ message Detection {
message DetectResponse {
repeated Detection Detections = 1;
}
message ModelMetadataResponse {
bool supports_thinking = 1;
string rendered_template = 2; // The rendered chat template with enable_thinking=true (empty if not applicable)
}

View File

@@ -70,4 +70,4 @@ target_link_libraries(${TARGET} PRIVATE common llama mtmd ${CMAKE_THREAD_LIBS_IN
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
add_dependencies(${TARGET} BUILD_INFO)
endif()
endif()

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15
LLAMA_VERSION?=2634ed207a17db1a54bd8df0555bd8499a6ab691
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=
@@ -7,7 +7,8 @@ BUILD_TYPE?=
NATIVE?=false
ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
TARGET?=--target grpc-server
JOBS?=$(shell nproc)
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
ARCH?=$(shell uname -m)
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
@@ -106,21 +107,21 @@ llama-cpp-avx: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx-build purge
$(info ${GREEN}I llama-cpp build info:avx${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx-build/grpc-server llama-cpp-avx
llama-cpp-fallback: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build purge
$(info ${GREEN}I llama-cpp build info:fallback${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build/grpc-server llama-cpp-fallback
llama-cpp-grpc: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build purge
$(info ${GREEN}I llama-cpp build info:grpc${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/grpc-server llama-cpp-grpc
llama-cpp-rpc-server: llama-cpp-grpc

View File

@@ -23,6 +23,7 @@
#include <grpcpp/health_check_service_interface.h>
#include <regex>
#include <atomic>
#include <mutex>
#include <signal.h>
#include <thread>
@@ -82,8 +83,8 @@ static void start_llama_server(server_context& ctx_server) {
// print sample chat example to make it clear which template is used
// LOG_INF("%s: chat template, chat_template: %s, example_format: '%s'\n", __func__,
// common_chat_templates_source(ctx_server.impl->chat_templates.get()),
// common_chat_format_example(ctx_server.impl->chat_templates.get(), ctx_server.impl->params_base.use_jinja).c_str(), ctx_server.impl->params_base.default_template_kwargs);
// common_chat_templates_source(ctx_server.impl->chat_params.tmpls.get()),
// common_chat_format_example(ctx_server.impl->chat_params.tmpls.get(), ctx_server.impl->params_base.use_jinja).c_str(), ctx_server.impl->params_base.default_template_kwargs);
// Keep the chat templates initialized in load_model() so they can be used when UseTokenizerTemplate is enabled
// Templates will only be used conditionally in Predict/PredictStream when UseTokenizerTemplate is true and Messages are provided
@@ -358,9 +359,7 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.model.path = request->modelfile();
if (!request->mmproj().empty()) {
// get the directory of modelfile
std::string model_dir = params.model.path.substr(0, params.model.path.find_last_of("/\\"));
params.mmproj.path = model_dir + "/"+ request->mmproj();
params.mmproj.path = request->mmproj();
}
// params.model_alias ??
params.model_alias = request->modelfile();
@@ -392,8 +391,9 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
// Initialize fit_params options (can be overridden by options)
// fit_params: whether to auto-adjust params to fit device memory (default: true as in llama.cpp)
params.fit_params = true;
// fit_params_target: target margin per device in bytes (default: 1GB)
params.fit_params_target = 1024 * 1024 * 1024;
// fit_params_target: target margin per device in bytes (default: 1GB per device)
// Initialize as vector with default value for all devices
params.fit_params_target = std::vector<size_t>(llama_max_devices(), 1024 * 1024 * 1024);
// fit_params_min_ctx: minimum context size for fit (default: 4096)
params.fit_params_min_ctx = 4096;
@@ -470,10 +470,28 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
} else if (!strcmp(optname, "fit_params_target") || !strcmp(optname, "fit_target")) {
if (optval != NULL) {
try {
// Value is in MiB, convert to bytes
params.fit_params_target = static_cast<size_t>(std::stoi(optval_str)) * 1024 * 1024;
// Value is in MiB, can be comma-separated list for multiple devices
// Single value is broadcast across all devices
std::string arg_next = optval_str;
const std::regex regex{ R"([,/]+)" };
std::sregex_token_iterator it{ arg_next.begin(), arg_next.end(), regex, -1 };
std::vector<std::string> split_arg{ it, {} };
if (split_arg.size() >= llama_max_devices()) {
// Too many values provided
continue;
}
if (split_arg.size() == 1) {
// Single value: broadcast to all devices
size_t value_mib = std::stoul(split_arg[0]);
std::fill(params.fit_params_target.begin(), params.fit_params_target.end(), value_mib * 1024 * 1024);
} else {
// Multiple values: set per device
for (size_t i = 0; i < split_arg.size() && i < params.fit_params_target.size(); i++) {
params.fit_params_target[i] = std::stoul(split_arg[i]) * 1024 * 1024;
}
}
} catch (const std::exception& e) {
// If conversion fails, keep default value (1GB)
// If conversion fails, keep default value (1GB per device)
}
}
} else if (!strcmp(optname, "fit_params_min_ctx") || !strcmp(optname, "fit_ctx")) {
@@ -688,13 +706,13 @@ private:
public:
BackendServiceImpl(server_context& ctx) : ctx_server(ctx) {}
grpc::Status Health(ServerContext* /*context*/, const backend::HealthMessage* /*request*/, backend::Reply* reply) {
grpc::Status Health(ServerContext* /*context*/, const backend::HealthMessage* /*request*/, backend::Reply* reply) override {
// Implement Health RPC
reply->set_message("OK");
return Status::OK;
}
grpc::Status LoadModel(ServerContext* /*context*/, const backend::ModelOptions* request, backend::Result* result) {
grpc::Status LoadModel(ServerContext* /*context*/, const backend::ModelOptions* request, backend::Result* result) override {
// Implement LoadModel RPC
common_params params;
params_parse(ctx_server, request, params);
@@ -711,11 +729,72 @@ public:
LOG_INF("\n");
LOG_INF("%s\n", common_params_get_system_info(params).c_str());
LOG_INF("\n");
// Capture error messages during model loading
struct error_capture {
std::string captured_error;
std::mutex error_mutex;
ggml_log_callback original_callback;
void* original_user_data;
} error_capture_data;
// Get original log callback
llama_log_get(&error_capture_data.original_callback, &error_capture_data.original_user_data);
// Set custom callback to capture errors
llama_log_set([](ggml_log_level level, const char * text, void * user_data) {
auto* capture = static_cast<error_capture*>(user_data);
// Capture error messages
if (level == GGML_LOG_LEVEL_ERROR) {
std::lock_guard<std::mutex> lock(capture->error_mutex);
// Append error message, removing trailing newlines
std::string msg(text);
while (!msg.empty() && (msg.back() == '\n' || msg.back() == '\r')) {
msg.pop_back();
}
if (!msg.empty()) {
if (!capture->captured_error.empty()) {
capture->captured_error.append("; ");
}
capture->captured_error.append(msg);
}
}
// Also call original callback to preserve logging
if (capture->original_callback) {
capture->original_callback(level, text, capture->original_user_data);
}
}, &error_capture_data);
// load the model
if (!ctx_server.load_model(params)) {
result->set_message("Failed loading model");
bool load_success = ctx_server.load_model(params);
// Restore original log callback
llama_log_set(error_capture_data.original_callback, error_capture_data.original_user_data);
if (!load_success) {
std::string error_msg = "Failed to load model: " + params.model.path;
if (!params.mmproj.path.empty()) {
error_msg += " (with mmproj: " + params.mmproj.path + ")";
}
if (params.speculative.has_dft() && !params.speculative.mparams_dft.path.empty()) {
error_msg += " (with draft model: " + params.speculative.mparams_dft.path + ")";
}
// Add captured error details if available
{
std::lock_guard<std::mutex> lock(error_capture_data.error_mutex);
if (!error_capture_data.captured_error.empty()) {
error_msg += ". Error: " + error_capture_data.captured_error;
} else {
error_msg += ". Model file may not exist or be invalid.";
}
}
result->set_message(error_msg);
result->set_success(false);
return Status::CANCELLED;
return grpc::Status(grpc::StatusCode::INTERNAL, error_msg);
}
// Process grammar triggers now that vocab is available
@@ -803,7 +882,7 @@ public:
std::string prompt_str;
std::vector<raw_buffer> files; // Declare files early so it's accessible in both branches
// Handle chat templates when UseTokenizerTemplate is enabled and Messages are provided
if (request->usetokenizertemplate() && request->messages_size() > 0 && ctx_server.impl->chat_templates != nullptr) {
if (request->usetokenizertemplate() && request->messages_size() > 0 && ctx_server.impl->chat_params.tmpls != nullptr) {
// Convert proto Messages to JSON format compatible with oaicompat_chat_params_parse
json body_json;
json messages_json = json::array();
@@ -1182,12 +1261,7 @@ public:
// Use the same approach as server.cpp: call oaicompat_chat_params_parse
// This handles all template application, grammar merging, etc. automatically
// Files extracted from multimodal content in messages will be added to the files vector
// Create parser options with current chat_templates to ensure tmpls is not null
oaicompat_parser_options parser_opt = ctx_server.impl->oai_parser_opt;
parser_opt.tmpls = ctx_server.impl->chat_templates.get(); // Ensure tmpls is set to current chat_templates
// Update allow_image and allow_audio based on current mctx state
parser_opt.allow_image = ctx_server.impl->mctx ? mtmd_support_vision(ctx_server.impl->mctx) : false;
parser_opt.allow_audio = ctx_server.impl->mctx ? mtmd_support_audio(ctx_server.impl->mctx) : false;
// chat_params already contains tmpls, allow_image, and allow_audio set during model loading
// Debug: Log tools before template processing
if (body_json.contains("tools")) {
@@ -1233,7 +1307,7 @@ public:
}
}
json parsed_data = oaicompat_chat_params_parse(body_json, parser_opt, files);
json parsed_data = oaicompat_chat_params_parse(body_json, ctx_server.impl->chat_params, files);
// Debug: Log tools after template processing
if (parsed_data.contains("tools")) {
@@ -1286,7 +1360,7 @@ public:
// If not using chat templates, extract files from image_data/audio_data fields
// (If using chat templates, files were already extracted by oaicompat_chat_params_parse)
if (!request->usetokenizertemplate() || request->messages_size() == 0 || ctx_server.impl->chat_templates == nullptr) {
if (!request->usetokenizertemplate() || request->messages_size() == 0 || ctx_server.impl->chat_params.tmpls == nullptr) {
const auto &images_data = data.find("image_data");
if (images_data != data.end() && images_data->is_array())
{
@@ -1494,7 +1568,7 @@ public:
return grpc::Status::OK;
}
grpc::Status Predict(ServerContext* context, const backend::PredictOptions* request, backend::Reply* reply) {
grpc::Status Predict(ServerContext* context, const backend::PredictOptions* request, backend::Reply* reply) override {
if (params_base.model.path.empty()) {
return grpc::Status(grpc::StatusCode::FAILED_PRECONDITION, "Model not loaded");
}
@@ -1514,7 +1588,7 @@ public:
std::string prompt_str;
std::vector<raw_buffer> files; // Declare files early so it's accessible in both branches
// Handle chat templates when UseTokenizerTemplate is enabled and Messages are provided
if (request->usetokenizertemplate() && request->messages_size() > 0 && ctx_server.impl->chat_templates != nullptr) {
if (request->usetokenizertemplate() && request->messages_size() > 0 && ctx_server.impl->chat_params.tmpls != nullptr) {
// Convert proto Messages to JSON format compatible with oaicompat_chat_params_parse
json body_json;
json messages_json = json::array();
@@ -1918,12 +1992,7 @@ public:
// Use the same approach as server.cpp: call oaicompat_chat_params_parse
// This handles all template application, grammar merging, etc. automatically
// Files extracted from multimodal content in messages will be added to the files vector
// Create parser options with current chat_templates to ensure tmpls is not null
oaicompat_parser_options parser_opt = ctx_server.impl->oai_parser_opt;
parser_opt.tmpls = ctx_server.impl->chat_templates.get(); // Ensure tmpls is set to current chat_templates
// Update allow_image and allow_audio based on current mctx state
parser_opt.allow_image = ctx_server.impl->mctx ? mtmd_support_vision(ctx_server.impl->mctx) : false;
parser_opt.allow_audio = ctx_server.impl->mctx ? mtmd_support_audio(ctx_server.impl->mctx) : false;
// chat_params already contains tmpls, allow_image, and allow_audio set during model loading
// Debug: Log tools before template processing
if (body_json.contains("tools")) {
@@ -1969,7 +2038,7 @@ public:
}
}
json parsed_data = oaicompat_chat_params_parse(body_json, parser_opt, files);
json parsed_data = oaicompat_chat_params_parse(body_json, ctx_server.impl->chat_params, files);
// Debug: Log tools after template processing
if (parsed_data.contains("tools")) {
@@ -2022,7 +2091,7 @@ public:
// If not using chat templates, extract files from image_data/audio_data fields
// (If using chat templates, files were already extracted by oaicompat_chat_params_parse)
if (!request->usetokenizertemplate() || request->messages_size() == 0 || ctx_server.impl->chat_templates == nullptr) {
if (!request->usetokenizertemplate() || request->messages_size() == 0 || ctx_server.impl->chat_params.tmpls == nullptr) {
const auto &images_data = data.find("image_data");
if (images_data != data.end() && images_data->is_array())
{
@@ -2165,7 +2234,7 @@ public:
return grpc::Status::OK;
}
grpc::Status Embedding(ServerContext* context, const backend::PredictOptions* request, backend::EmbeddingResult* embeddingResult) {
grpc::Status Embedding(ServerContext* context, const backend::PredictOptions* request, backend::EmbeddingResult* embeddingResult) override {
if (params_base.model.path.empty()) {
return grpc::Status(grpc::StatusCode::FAILED_PRECONDITION, "Model not loaded");
}
@@ -2260,7 +2329,7 @@ public:
return grpc::Status::OK;
}
grpc::Status Rerank(ServerContext* context, const backend::RerankRequest* request, backend::RerankResult* rerankResult) {
grpc::Status Rerank(ServerContext* context, const backend::RerankRequest* request, backend::RerankResult* rerankResult) override {
if (!params_base.embedding || params_base.pooling_type != LLAMA_POOLING_TYPE_RANK) {
return grpc::Status(grpc::StatusCode::UNIMPLEMENTED, "This server does not support reranking. Start it with `--reranking` and without `--embedding`");
}
@@ -2346,7 +2415,7 @@ public:
return grpc::Status::OK;
}
grpc::Status TokenizeString(ServerContext* /*context*/, const backend::PredictOptions* request, backend::TokenizationResponse* response) {
grpc::Status TokenizeString(ServerContext* /*context*/, const backend::PredictOptions* request, backend::TokenizationResponse* response) override {
if (params_base.model.path.empty()) {
return grpc::Status(grpc::StatusCode::FAILED_PRECONDITION, "Model not loaded");
}
@@ -2369,7 +2438,7 @@ public:
return grpc::Status::OK;
}
grpc::Status GetMetrics(ServerContext* /*context*/, const backend::MetricsRequest* /*request*/, backend::MetricsResponse* response) {
grpc::Status GetMetrics(ServerContext* /*context*/, const backend::MetricsRequest* /*request*/, backend::MetricsResponse* response) override {
// request slots data using task queue
auto rd = ctx_server.get_response_reader();
@@ -2407,6 +2476,47 @@ public:
response->set_prompt_tokens_processed(res_metrics->n_prompt_tokens_processed_total);
return grpc::Status::OK;
}
grpc::Status ModelMetadata(ServerContext* /*context*/, const backend::ModelOptions* /*request*/, backend::ModelMetadataResponse* response) override {
// Check if model is loaded
if (params_base.model.path.empty()) {
return grpc::Status(grpc::StatusCode::FAILED_PRECONDITION, "Model not loaded");
}
// Check if chat templates are initialized
if (ctx_server.impl->chat_params.tmpls == nullptr) {
// If templates are not initialized, we can't detect thinking support
// Return false as default
response->set_supports_thinking(false);
response->set_rendered_template("");
return grpc::Status::OK;
}
// Detect thinking support using llama.cpp's function
bool supports_thinking = common_chat_templates_support_enable_thinking(ctx_server.impl->chat_params.tmpls.get());
response->set_supports_thinking(supports_thinking);
// Render the template with enable_thinking=true so Go code can detect thinking tokens
// This allows reusing existing detection functions in Go
std::string rendered_template = "";
if (params_base.use_jinja) {
// Render the template with enable_thinking=true to see what the actual prompt looks like
common_chat_templates_inputs dummy_inputs;
common_chat_msg msg;
msg.role = "user";
msg.content = "test";
dummy_inputs.messages = {msg};
dummy_inputs.enable_thinking = true;
dummy_inputs.use_jinja = params_base.use_jinja;
const auto rendered = common_chat_templates_apply(ctx_server.impl->chat_params.tmpls.get(), dummy_inputs);
rendered_template = rendered.prompt;
}
response->set_rendered_template(rendered_template);
return grpc::Status::OK;
}
};

View File

@@ -6,6 +6,7 @@
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
@@ -37,6 +38,15 @@ else
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
# The GPU library packaging script will detect BUILD_TYPE and copy appropriate GPU libraries
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -1,51 +0,0 @@
INCLUDE_PATH := $(abspath ./)
LIBRARY_PATH := $(abspath ./)
AR?=ar
CMAKE_ARGS?=-DGGML_NATIVE=OFF
BUILD_TYPE?=
GOCMD=go
# keep standard at C11 and C++11
CXXFLAGS = -I. -I$(INCLUDE_PATH)/sources/bark.cpp/examples -I$(INCLUDE_PATH)/sources/bark.cpp/encodec.cpp/ggml/include -I$(INCLUDE_PATH)/sources/bark.cpp/spm-headers -I$(INCLUDE_PATH)/sources/bark.cpp -O3 -DNDEBUG -std=c++17 -fPIC
LDFLAGS = -L$(LIBRARY_PATH) -L$(LIBRARY_PATH)/sources/bark.cpp/build/examples -lbark -lstdc++ -lm
# bark.cpp
BARKCPP_REPO?=https://github.com/PABannier/bark.cpp.git
BARKCPP_VERSION?=5d5be84f089ab9ea53b7a793f088d3fbf7247495
# warnings
CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
## bark.cpp
sources/bark.cpp:
git clone --recursive $(BARKCPP_REPO) sources/bark.cpp && \
cd sources/bark.cpp && \
git checkout $(BARKCPP_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
sources/bark.cpp/build/libbark.a: sources/bark.cpp
cd sources/bark.cpp && \
mkdir -p build && \
cd build && \
cmake $(CMAKE_ARGS) .. && \
cmake --build . --config Release
gobark.o:
$(CXX) $(CXXFLAGS) gobark.cpp -o gobark.o -c $(LDFLAGS)
libbark.a: sources/bark.cpp/build/libbark.a gobark.o
cp $(INCLUDE_PATH)/sources/bark.cpp/build/libbark.a ./
$(AR) rcs libbark.a gobark.o
bark-cpp: libbark.a
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH="$(CURDIR)" LIBRARY_PATH=$(CURDIR) \
$(GOCMD) build -v -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o bark-cpp ./
package:
bash package.sh
build: bark-cpp package
clean:
rm -f gobark.o libbark.a

View File

@@ -1,85 +0,0 @@
#include <iostream>
#include <tuple>
#include "bark.h"
#include "gobark.h"
#include "common.h"
#include "ggml.h"
struct bark_context *c;
void bark_print_progress_callback(struct bark_context *bctx, enum bark_encoding_step step, int progress, void *user_data) {
if (step == bark_encoding_step::SEMANTIC) {
printf("\rGenerating semantic tokens... %d%%", progress);
} else if (step == bark_encoding_step::COARSE) {
printf("\rGenerating coarse tokens... %d%%", progress);
} else if (step == bark_encoding_step::FINE) {
printf("\rGenerating fine tokens... %d%%", progress);
}
fflush(stdout);
}
int load_model(char *model) {
// initialize bark context
struct bark_context_params ctx_params = bark_context_default_params();
bark_params params;
params.model_path = model;
// ctx_params.verbosity = verbosity;
ctx_params.progress_callback = bark_print_progress_callback;
ctx_params.progress_callback_user_data = nullptr;
struct bark_context *bctx = bark_load_model(params.model_path.c_str(), ctx_params, params.seed);
if (!bctx) {
fprintf(stderr, "%s: Could not load model\n", __func__);
return 1;
}
c = bctx;
return 0;
}
int tts(char *text,int threads, char *dst ) {
ggml_time_init();
const int64_t t_main_start_us = ggml_time_us();
// generate audio
if (!bark_generate_audio(c, text, threads)) {
fprintf(stderr, "%s: An error occurred. If the problem persists, feel free to open an issue to report it.\n", __func__);
return 1;
}
const float *audio_data = bark_get_audio_data(c);
if (audio_data == NULL) {
fprintf(stderr, "%s: Could not get audio data\n", __func__);
return 1;
}
const int audio_arr_size = bark_get_audio_data_size(c);
std::vector<float> audio_arr(audio_data, audio_data + audio_arr_size);
write_wav_on_disk(audio_arr, dst);
// report timing
{
const int64_t t_main_end_us = ggml_time_us();
const int64_t t_load_us = bark_get_load_time(c);
const int64_t t_eval_us = bark_get_eval_time(c);
printf("\n\n");
printf("%s: load time = %8.2f ms\n", __func__, t_load_us / 1000.0f);
printf("%s: eval time = %8.2f ms\n", __func__, t_eval_us / 1000.0f);
printf("%s: total time = %8.2f ms\n", __func__, (t_main_end_us - t_main_start_us) / 1000.0f);
}
return 0;
}
int unload() {
bark_free(c);
}

View File

@@ -1,52 +0,0 @@
package main
// #cgo CXXFLAGS: -I${SRCDIR}/sources/bark.cpp/ -I${SRCDIR}/sources/bark.cpp/encodec.cpp -I${SRCDIR}/sources/bark.cpp/encodec.cpp/ggml/include -I${SRCDIR}/sources/bark.cpp/examples -I${SRCDIR}/sources/bark.cpp/spm-headers
// #cgo LDFLAGS: -L${SRCDIR}/ -L${SRCDIR}/sources/bark.cpp/build/examples -L${SRCDIR}/sources/bark.cpp/build/encodec.cpp/ggml/src/ -L${SRCDIR}/sources/bark.cpp/build/encodec.cpp/ -lbark -lencodec -lcommon -lggml -lgomp
// #include <gobark.h>
// #include <stdlib.h>
import "C"
import (
"fmt"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
type Bark struct {
base.SingleThread
threads int
}
func (sd *Bark) Load(opts *pb.ModelOptions) error {
sd.threads = int(opts.Threads)
modelFile := C.CString(opts.ModelFile)
defer C.free(unsafe.Pointer(modelFile))
ret := C.load_model(modelFile)
if ret != 0 {
return fmt.Errorf("inference failed")
}
return nil
}
func (sd *Bark) TTS(opts *pb.TTSRequest) error {
t := C.CString(opts.Text)
defer C.free(unsafe.Pointer(t))
dst := C.CString(opts.Dst)
defer C.free(unsafe.Pointer(dst))
threads := C.int(sd.threads)
ret := C.tts(t, threads, dst)
if ret != 0 {
return fmt.Errorf("inference failed")
}
return nil
}

View File

@@ -1,8 +0,0 @@
#ifdef __cplusplus
extern "C" {
#endif
int load_model(char *model);
int tts(char *text,int threads, char *dst );
#ifdef __cplusplus
}
#endif

View File

@@ -1,20 +0,0 @@
package main
// Note: this is started internally by LocalAI and a server is allocated for each model
import (
"flag"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
func main() {
flag.Parse()
if err := grpc.StartServer(*addr, &Bark{}); err != nil {
panic(err)
}
}

View File

@@ -1,41 +0,0 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
# This script is used in the final stage of the Dockerfile
set -e
CURDIR=$(dirname "$(realpath $0)")
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avrf $CURDIR/bark-cpp $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
else
echo "Error: Could not detect architecture"
exit 1
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -1,13 +0,0 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/bark-cpp "$@"
fi
exec $CURDIR/bark-cpp "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=bda7fab9f208dff4b67179a68f694b6ddec13326
STABLEDIFFUSION_GGML_VERSION?=e411520407663e1ddf8ff2e5ed4ff3a116fbbc97
CMAKE_ARGS+=-DGGML_MAX_NAME=128
@@ -28,7 +28,12 @@ else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DSD_HIPBLAS=ON -DGGML_HIPBLAS=ON
ROCM_HOME ?= /opt/rocm
ROCM_PATH ?= /opt/rocm
export CXX=$(ROCM_HOME)/llvm/bin/clang++
export CC=$(ROCM_HOME)/llvm/bin/clang
AMDGPU_TARGETS?=gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102,gfx1200,gfx1201
CMAKE_ARGS+=-DSD_HIPBLAS=ON -DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=$(AMDGPU_TARGETS)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DSD_VULKAN=ON -DGGML_VULKAN=ON
else ifeq ($(OS),Darwin)

View File

@@ -55,6 +55,7 @@ const char* schedulers[] = {
"sgm_uniform",
"simple",
"smoothstep",
"kl_optimal",
"lcm",
};
@@ -147,26 +148,26 @@ static std::string lora_dir_path;
static void build_embedding_vec(const char* embedding_dir) {
embedding_vec.clear();
embedding_strings.clear();
if (!embedding_dir || strlen(embedding_dir) == 0) {
return;
}
if (!std::filesystem::exists(embedding_dir) || !std::filesystem::is_directory(embedding_dir)) {
fprintf(stderr, "Embedding directory does not exist or is not a directory: %s\n", embedding_dir);
return;
}
static const std::vector<std::string> valid_ext = {".pt", ".safetensors", ".gguf"};
for (const auto& entry : std::filesystem::directory_iterator(embedding_dir)) {
if (!entry.is_regular_file()) {
continue;
}
auto path = entry.path();
std::string ext = path.extension().string();
bool valid = false;
for (const auto& e : valid_ext) {
if (ext == e) {
@@ -177,51 +178,51 @@ static void build_embedding_vec(const char* embedding_dir) {
if (!valid) {
continue;
}
std::string name = path.stem().string();
std::string full_path = path.string();
// Store strings in persistent storage
embedding_strings.push_back(name);
embedding_strings.push_back(full_path);
sd_embedding_t item;
item.name = embedding_strings[embedding_strings.size() - 2].c_str();
item.path = embedding_strings[embedding_strings.size() - 1].c_str();
embedding_vec.push_back(item);
fprintf(stderr, "Found embedding: %s -> %s\n", item.name, item.path);
}
fprintf(stderr, "Loaded %zu embeddings from %s\n", embedding_vec.size(), embedding_dir);
}
// Discover LoRA files in directory and build a map of name -> path
static std::map<std::string, std::string> discover_lora_files(const char* lora_dir) {
std::map<std::string, std::string> lora_map;
if (!lora_dir || strlen(lora_dir) == 0) {
fprintf(stderr, "LoRA directory not specified\n");
return lora_map;
}
if (!std::filesystem::exists(lora_dir) || !std::filesystem::is_directory(lora_dir)) {
fprintf(stderr, "LoRA directory does not exist or is not a directory: %s\n", lora_dir);
return lora_map;
}
static const std::vector<std::string> valid_ext = {".safetensors", ".ckpt", ".pt", ".gguf"};
fprintf(stderr, "Discovering LoRA files in: %s\n", lora_dir);
for (const auto& entry : std::filesystem::directory_iterator(lora_dir)) {
if (!entry.is_regular_file()) {
continue;
}
auto path = entry.path();
std::string ext = path.extension().string();
bool valid = false;
for (const auto& e : valid_ext) {
if (ext == e) {
@@ -232,17 +233,17 @@ static std::map<std::string, std::string> discover_lora_files(const char* lora_d
if (!valid) {
continue;
}
std::string name = path.stem().string(); // stem() already removes extension
std::string full_path = path.string();
// Store the name (without extension) -> full path mapping
// This allows users to specify just the name in <lora:name:strength>
lora_map[name] = full_path;
fprintf(stderr, "Found LoRA file: %s -> %s\n", name.c_str(), full_path.c_str());
}
fprintf(stderr, "Discovered %zu LoRA files in %s\n", lora_map.size(), lora_dir);
return lora_map;
}
@@ -264,31 +265,31 @@ static bool is_absolute_path(const std::string& p) {
static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(const std::string& prompt, const char* lora_dir) {
std::vector<sd_lora_t> loras;
std::string cleaned_prompt = prompt;
if (!lora_dir || strlen(lora_dir) == 0) {
fprintf(stderr, "LoRA directory not set, cannot parse LoRAs from prompt\n");
return {loras, cleaned_prompt};
}
// Discover LoRA files for name-based lookup
std::map<std::string, std::string> discovered_lora_map = discover_lora_files(lora_dir);
// Map to accumulate multipliers for the same LoRA (matches upstream)
std::map<std::string, float> lora_map;
std::map<std::string, float> high_noise_lora_map;
static const std::regex re(R"(<lora:([^:>]+):([^>]+)>)");
static const std::vector<std::string> valid_ext = {".pt", ".safetensors", ".gguf"};
std::smatch m;
std::string tmp = prompt;
fprintf(stderr, "Parsing LoRAs from prompt: %s\n", prompt.c_str());
while (std::regex_search(tmp, m, re)) {
std::string raw_path = m[1].str();
const std::string raw_mul = m[2].str();
float mul = 0.f;
try {
mul = std::stof(raw_mul);
@@ -298,14 +299,14 @@ static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(co
fprintf(stderr, "Invalid LoRA multiplier '%s', skipping\n", raw_mul.c_str());
continue;
}
bool is_high_noise = false;
static const std::string prefix = "|high_noise|";
if (raw_path.rfind(prefix, 0) == 0) {
raw_path.erase(0, prefix.size());
is_high_noise = true;
}
std::filesystem::path final_path;
if (is_absolute_path(raw_path)) {
final_path = raw_path;
@@ -334,7 +335,7 @@ static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(co
}
}
}
// Try adding extensions if file doesn't exist
if (!std::filesystem::exists(final_path)) {
bool found = false;
@@ -354,24 +355,24 @@ static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(co
continue;
}
}
// Normalize path (matches upstream)
const std::string key = final_path.lexically_normal().string();
// Accumulate multiplier if same LoRA appears multiple times (matches upstream)
if (is_high_noise) {
high_noise_lora_map[key] += mul;
} else {
lora_map[key] += mul;
}
fprintf(stderr, "Parsed LoRA: path='%s', multiplier=%.2f, is_high_noise=%s\n",
fprintf(stderr, "Parsed LoRA: path='%s', multiplier=%.2f, is_high_noise=%s\n",
key.c_str(), mul, is_high_noise ? "true" : "false");
cleaned_prompt = std::regex_replace(cleaned_prompt, re, "", std::regex_constants::format_first_only);
tmp = m.suffix().str();
}
// Build final LoRA vector from accumulated maps (matches upstream)
// Store all path strings first to ensure they persist
for (const auto& kv : lora_map) {
@@ -380,7 +381,7 @@ static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(co
for (const auto& kv : high_noise_lora_map) {
lora_strings.push_back(kv.first);
}
// Now build the LoRA vector with pointers to the stored strings
size_t string_idx = 0;
for (const auto& kv : lora_map) {
@@ -391,7 +392,7 @@ static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(co
loras.push_back(item);
string_idx++;
}
for (const auto& kv : high_noise_lora_map) {
sd_lora_t item;
item.is_high_noise = true;
@@ -400,7 +401,7 @@ static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(co
loras.push_back(item);
string_idx++;
}
// Clean up extra spaces
std::regex space_regex(R"(\s+)");
cleaned_prompt = std::regex_replace(cleaned_prompt, space_regex, " ");
@@ -413,9 +414,9 @@ static std::pair<std::vector<sd_lora_t>, std::string> parse_loras_from_prompt(co
if (last != std::string::npos) {
cleaned_prompt.erase(last + 1);
}
fprintf(stderr, "Parsed %zu LoRA(s) from prompt. Cleaned prompt: %s\n", loras.size(), cleaned_prompt.c_str());
return {loras, cleaned_prompt};
}
@@ -752,7 +753,7 @@ int load_model(const char *model, char *model_path, char* options[], int threads
}
}
if (scheduler == SCHEDULER_COUNT) {
scheduler = sd_get_default_scheduler(sd_ctx);
scheduler = sd_get_default_scheduler(sd_ctx, sample_method);
fprintf(stderr, "Invalid scheduler, using default: %s\n", schedulers[scheduler]);
}
@@ -787,7 +788,7 @@ sd_img_gen_params_t* sd_img_gen_params_new(void) {
sd_img_gen_params_t *params = (sd_img_gen_params_t *)std::malloc(sizeof(sd_img_gen_params_t));
sd_img_gen_params_init(params);
sd_sample_params_init(&params->sample_params);
sd_easycache_params_init(&params->easycache);
sd_cache_params_init(&params->cache);
params->control_strength = 0.9f;
return params;
}
@@ -819,18 +820,18 @@ void sd_img_gen_params_set_prompts(sd_img_gen_params_t *params, const char *prom
fprintf(stderr, "Note: Found %zu LoRAs in negative prompt (may not be supported)\n", neg_loras.size());
}
cleaned_negative_prompt_storage = cleaned_negative;
// Set the cleaned prompts
params->prompt = cleaned_prompt_storage.c_str();
params->negative_prompt = cleaned_negative_prompt_storage.c_str();
// Set LoRAs in params
params->loras = lora_vec.empty() ? nullptr : lora_vec.data();
params->lora_count = static_cast<uint32_t>(lora_vec.size());
fprintf(stderr, "Set prompts with %zu LoRAs. Original prompt: %s\n", lora_vec.size(), prompt ? prompt : "(null)");
fprintf(stderr, "Cleaned prompt: %s\n", cleaned_prompt_storage.c_str());
// Debug: Verify LoRAs are set correctly
if (params->loras && params->lora_count > 0) {
fprintf(stderr, "DEBUG: LoRAs set in params structure:\n");
@@ -1042,7 +1043,7 @@ int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, cha
fprintf(stderr, "Using %u LoRA(s) in generation:\n", p->lora_count);
for (uint32_t i = 0; i < p->lora_count; i++) {
fprintf(stderr, " LoRA[%u]: path='%s', multiplier=%.2f, is_high_noise=%s\n",
i,
i,
p->loras[i].path ? p->loras[i].path : "(null)",
p->loras[i].multiplier,
p->loras[i].is_high_noise ? "true" : "false");

View File

@@ -6,6 +6,7 @@
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
@@ -50,6 +51,15 @@ else
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
# The GPU library packaging script will detect BUILD_TYPE and copy appropriate GPU libraries
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=6c22e792cb0ee155b6587ce71a8410c3aeb06949
WHISPER_CPP_VERSION?=aa1bc0d1a6dfd70dbb9f60c11df12441e03a9075
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -130,8 +130,9 @@ func (w *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (pb.TranscriptR
segments := []*pb.TranscriptSegment{}
text := ""
for i := range int(segsLen) {
s := CppGetSegmentStart(i)
t := CppGetSegmentEnd(i)
// segment start/end conversion factor taken from https://github.com/ggml-org/whisper.cpp/blob/master/examples/cli/cli.cpp#L895
s := CppGetSegmentStart(i) * (10000000)
t := CppGetSegmentEnd(i) * (10000000)
txt := strings.Clone(CppGetSegmentText(i))
tokens := make([]int32, CppNTokens(i))

View File

@@ -6,6 +6,7 @@
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
@@ -50,6 +51,15 @@ else
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
# The GPU library packaging script will detect BUILD_TYPE and copy appropriate GPU libraries
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

File diff suppressed because it is too large Load Diff

View File

@@ -16,10 +16,8 @@ The Python backends use a unified build system based on `libbackend.sh` that pro
- **transformers** - Hugging Face Transformers framework (PyTorch-based)
- **vllm** - High-performance LLM inference engine
- **mlx** - Apple Silicon optimized ML framework
- **exllama2** - ExLlama2 quantized models
### Audio & Speech
- **bark** - Text-to-speech synthesis
- **coqui** - Coqui TTS models
- **faster-whisper** - Fast Whisper speech recognition
- **kitten-tts** - Lightweight TTS
@@ -85,7 +83,7 @@ runUnittests
The build system automatically detects and configures for different hardware:
- **CPU** - Standard CPU-only builds
- **CUDA** - NVIDIA GPU acceleration (supports CUDA 11/12)
- **CUDA** - NVIDIA GPU acceleration (supports CUDA 12/13)
- **Intel** - Intel XPU/GPU optimization
- **MLX** - Apple Silicon (M1/M2/M3) optimization
- **HIP** - AMD GPU acceleration
@@ -95,8 +93,8 @@ The build system automatically detects and configures for different hardware:
Backends can specify hardware-specific dependencies:
- `requirements.txt` - Base requirements
- `requirements-cpu.txt` - CPU-specific packages
- `requirements-cublas11.txt` - CUDA 11 packages
- `requirements-cublas12.txt` - CUDA 12 packages
- `requirements-cublas13.txt` - CUDA 13 packages
- `requirements-intel.txt` - Intel-optimized packages
- `requirements-mps.txt` - Apple Silicon packages

View File

@@ -1,16 +0,0 @@
# Creating a separate environment for ttsbark project
```
make ttsbark
```
# Testing the gRPC server
```
<The path of your python interpreter> -m unittest test_ttsbark.py
```
For example
```
/opt/conda/envs/bark/bin/python -m unittest extra/grpc/bark/test_ttsbark.py
``````

View File

@@ -1,4 +0,0 @@
transformers
accelerate
torch==2.4.1
torchaudio==2.4.1

View File

@@ -1,5 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
torchaudio==2.4.1+cu118
transformers
accelerate

View File

@@ -1,4 +0,0 @@
torch==2.4.1
torchaudio==2.4.1
transformers
accelerate

View File

@@ -1,5 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.4.1+rocm6.0
torchaudio==2.4.1+rocm6.0
transformers
accelerate

View File

@@ -1,9 +0,0 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.8.10+xpu
torch==2.3.1+cxx11.abi
torchaudio==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
setuptools
transformers
accelerate

View File

@@ -1,4 +0,0 @@
bark==0.1.5
grpcio==1.76.0
protobuf
certifi

View File

@@ -17,4 +17,9 @@ if [ "x${BUILD_PROFILE}" == "xintel" ]; then
fi
EXTRA_PIP_INSTALL_FLAGS+=" --no-build-isolation"
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
USE_PIP=true
fi
installRequirements

View File

@@ -1,8 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.6.0+cu118
torchaudio==2.6.0+cu118
transformers==4.46.3
numpy>=1.24.0,<1.26.0
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster
accelerate

View File

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.6.0+rocm6.1
torchaudio==2.6.0+rocm6.1
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.9.1+rocm6.4
torchaudio==2.9.1+rocm6.4
transformers
numpy>=1.24.0,<1.26.0
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289

View File

@@ -0,0 +1,5 @@
# Build dependencies needed for packages installed from source (e.g., git dependencies)
# When using --no-build-isolation, these must be installed in the venv first
wheel
setuptools
packaging

View File

@@ -1,7 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
torchaudio==2.3.1+cxx11.abi
--extra-index-url https://download.pytorch.org/whl/xpu
torch
torchaudio
transformers
numpy>=1.24.0,<1.26.0
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289

View File

@@ -1,7 +1,7 @@
#!/usr/bin/env bash
set -euo pipefail
#
#
# use the library by adding the following line to a script:
# source $(dirname $0)/../common/libbackend.sh
#
@@ -206,8 +206,8 @@ function init() {
# getBuildProfile will inspect the system to determine which build profile is appropriate:
# returns one of the following:
# - cublas11
# - cublas12
# - cublas13
# - hipblas
# - intel
function getBuildProfile() {
@@ -392,13 +392,13 @@ function runProtogen() {
# - requirements-${BUILD_TYPE}.txt
# - requirements-${BUILD_PROFILE}.txt
#
# BUILD_PROFILE is a more specific version of BUILD_TYPE, ex: cuda-11 or cuda-12
# BUILD_PROFILE is a more specific version of BUILD_TYPE, ex: cuda-12 or cuda-13
# it can also include some options that we do not have BUILD_TYPES for, ex: intel
#
# NOTE: for BUILD_PROFILE==intel, this function does NOT automatically use the Intel python package index.
# you may want to add the following line to a requirements-intel.txt if you use one:
#
# --index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# --index-url https://download.pytorch.org/whl/xpu
#
# If you need to add extra flags into the pip install command you can do so by setting the variable EXTRA_PIP_INSTALL_FLAGS
# before calling installRequirements. For example:
@@ -465,6 +465,14 @@ function startBackend() {
if [ "x${PORTABLE_PYTHON}" == "xtrue" ] || [ -x "$(_portable_python)" ]; then
_makeVenvPortable --update-pyvenv-cfg
fi
# Set up GPU library paths if a lib directory exists
# This allows backends to include their own GPU libraries (CUDA, ROCm, etc.)
if [ -d "${EDIR}/lib" ]; then
export LD_LIBRARY_PATH="${EDIR}/lib:${LD_LIBRARY_PATH:-}"
echo "Added ${EDIR}/lib to LD_LIBRARY_PATH for GPU libraries"
fi
if [ ! -z "${BACKEND_FILE:-}" ]; then
exec "${EDIR}/venv/bin/python" "${BACKEND_FILE}" "$@"
elif [ -e "${MY_DIR}/server.py" ]; then

View File

@@ -1,2 +1,2 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch

View File

@@ -1,5 +1,4 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.8.10+xpu
--extra-index-url https://download.pytorch.org/whl/xpu
torch==2.8.0
oneccl_bind_pt==2.8.0+xpu
optimum[openvino]

View File

@@ -1,4 +1,4 @@
# Creating a separate environment for ttsbark project
# Creating a separate environment for coqui project
```
make coqui

View File

@@ -1,6 +1,6 @@
#!/usr/bin/env python3
"""
This is an extra gRPC server of LocalAI for Bark TTS
This is an extra gRPC server of LocalAI for Coqui TTS
"""
from concurrent import futures
import time

View File

@@ -1,6 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
torchaudio==2.4.1+cu118
transformers==4.48.3
accelerate
coqui-tts

View File

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.4.1+rocm6.0
torchaudio==2.4.1+rocm6.0
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0+rocm6.4
torchaudio==2.8.0+rocm6.4
transformers==4.48.3
accelerate
coqui-tts

View File

@@ -1,8 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
torchaudio==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
--extra-index-url https://download.pytorch.org/whl/xpu
torch==2.8.0+xpu
torchaudio==2.8.0+xpu
optimum[openvino]
setuptools
transformers==4.48.3

View File

@@ -41,6 +41,10 @@ from optimum.quanto import freeze, qfloat8, quantize
from transformers import T5EncoderModel
from safetensors.torch import load_file
# Import LTX-2 specific utilities
from diffusers.pipelines.ltx2.export_utils import encode_video as ltx2_encode_video
from diffusers import LTX2VideoTransformer3DModel, GGUFQuantizationConfig
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
COMPEL = os.environ.get("COMPEL", "0") == "1"
XPU = os.environ.get("XPU", "0") == "1"
@@ -290,6 +294,104 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
pipe.enable_model_cpu_offload()
return pipe
# LTX2ImageToVideoPipeline - needs img2vid flag, CPU offload, and special handling
if pipeline_type == "LTX2ImageToVideoPipeline":
self.img2vid = True
self.ltx2_pipeline = True
# Check if loading from single file (GGUF)
if fromSingleFile and LTX2VideoTransformer3DModel is not None:
_, single_file_ext = os.path.splitext(modelFile)
if single_file_ext == ".gguf":
# Load transformer from single GGUF file with quantization
transformer_kwargs = {}
quantization_config = GGUFQuantizationConfig(compute_dtype=torchType)
transformer_kwargs["quantization_config"] = quantization_config
transformer = LTX2VideoTransformer3DModel.from_single_file(
modelFile,
config=request.Model, # Use request.Model as the config/model_id
subfolder="transformer",
**transformer_kwargs,
)
# Load pipeline with custom transformer
pipe = load_diffusers_pipeline(
class_name="LTX2ImageToVideoPipeline",
model_id=request.Model,
transformer=transformer,
torch_dtype=torchType,
)
else:
# Single file but not GGUF - use standard single file loading
pipe = load_diffusers_pipeline(
class_name="LTX2ImageToVideoPipeline",
model_id=modelFile,
from_single_file=True,
torch_dtype=torchType,
)
else:
# Standard loading from pretrained
pipe = load_diffusers_pipeline(
class_name="LTX2ImageToVideoPipeline",
model_id=request.Model,
torch_dtype=torchType,
variant=variant
)
if not DISABLE_CPU_OFFLOAD:
pipe.enable_model_cpu_offload()
return pipe
# LTX2Pipeline - text-to-video pipeline, needs txt2vid flag, CPU offload, and special handling
if pipeline_type == "LTX2Pipeline":
self.txt2vid = True
self.ltx2_pipeline = True
# Check if loading from single file (GGUF)
if fromSingleFile and LTX2VideoTransformer3DModel is not None:
_, single_file_ext = os.path.splitext(modelFile)
if single_file_ext == ".gguf":
# Load transformer from single GGUF file with quantization
transformer_kwargs = {}
quantization_config = GGUFQuantizationConfig(compute_dtype=torchType)
transformer_kwargs["quantization_config"] = quantization_config
transformer = LTX2VideoTransformer3DModel.from_single_file(
modelFile,
config=request.Model, # Use request.Model as the config/model_id
subfolder="transformer",
**transformer_kwargs,
)
# Load pipeline with custom transformer
pipe = load_diffusers_pipeline(
class_name="LTX2Pipeline",
model_id=request.Model,
transformer=transformer,
torch_dtype=torchType,
)
else:
# Single file but not GGUF - use standard single file loading
pipe = load_diffusers_pipeline(
class_name="LTX2Pipeline",
model_id=modelFile,
from_single_file=True,
torch_dtype=torchType,
)
else:
# Standard loading from pretrained
pipe = load_diffusers_pipeline(
class_name="LTX2Pipeline",
model_id=request.Model,
torch_dtype=torchType,
variant=variant
)
if not DISABLE_CPU_OFFLOAD:
pipe.enable_model_cpu_offload()
return pipe
# ================================================================
# Dynamic pipeline loading - the default path for most pipelines
# Uses the dynamic loader to instantiate any pipeline by class name
@@ -404,6 +506,9 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
fromSingleFile = request.Model.startswith("http") or request.Model.startswith("/") or local
self.img2vid = False
self.txt2vid = False
self.ltx2_pipeline = False
print(f"LoadModel: PipelineType from request: {request.PipelineType}", file=sys.stderr)
# Load pipeline using dynamic loader
# Special cases that require custom initialization are handled first
@@ -414,6 +519,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
torchType=torchType,
variant=variant
)
print(f"LoadModel: After loading - ltx2_pipeline: {self.ltx2_pipeline}, img2vid: {self.img2vid}, txt2vid: {self.txt2vid}, PipelineType: {self.PipelineType}", file=sys.stderr)
if CLIPSKIP and request.CLIPSkip != 0:
self.clip_skip = request.CLIPSkip
@@ -651,14 +758,20 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
try:
prompt = request.prompt
if not prompt:
print(f"GenerateVideo: No prompt provided for video generation.", file=sys.stderr)
return backend_pb2.Result(success=False, message="No prompt provided for video generation")
# Debug: Print raw request values
print(f"GenerateVideo: Raw request values - num_frames: {request.num_frames}, fps: {request.fps}, cfg_scale: {request.cfg_scale}, step: {request.step}", file=sys.stderr)
# Set default values from request or use defaults
num_frames = request.num_frames if request.num_frames > 0 else 81
fps = request.fps if request.fps > 0 else 16
cfg_scale = request.cfg_scale if request.cfg_scale > 0 else 4.0
num_inference_steps = request.step if request.step > 0 else 40
print(f"GenerateVideo: Using values - num_frames: {num_frames}, fps: {fps}, cfg_scale: {cfg_scale}, num_inference_steps: {num_inference_steps}", file=sys.stderr)
# Prepare generation parameters
kwargs = {
"prompt": prompt,
@@ -684,9 +797,86 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
kwargs["end_image"] = load_image(request.end_image)
print(f"Generating video with {kwargs=}", file=sys.stderr)
print(f"GenerateVideo: Pipeline type: {self.PipelineType}, ltx2_pipeline flag: {self.ltx2_pipeline}", file=sys.stderr)
# Generate video frames based on pipeline type
if self.PipelineType == "WanPipeline":
if self.ltx2_pipeline or self.PipelineType in ["LTX2Pipeline", "LTX2ImageToVideoPipeline"]:
# LTX-2 generation with audio (supports both text-to-video and image-to-video)
# Determine if this is text-to-video (no image) or image-to-video (has image)
has_image = bool(request.start_image)
# Remove image-related parameters that might have been added earlier
kwargs.pop("start_image", None)
kwargs.pop("end_image", None)
# LTX2ImageToVideoPipeline uses 'image' parameter for image-to-video
# LTX2Pipeline (text-to-video) doesn't need an image parameter
if has_image:
# Image-to-video: use 'image' parameter
if self.PipelineType == "LTX2ImageToVideoPipeline":
image = load_image(request.start_image)
kwargs["image"] = image
print(f"LTX-2: Using image-to-video mode with image", file=sys.stderr)
else:
# If pipeline type is LTX2Pipeline but we have an image, we can't do image-to-video
return backend_pb2.Result(success=False, message="LTX2Pipeline does not support image-to-video. Use LTX2ImageToVideoPipeline for image-to-video generation.")
else:
# Text-to-video: no image parameter needed
# Ensure no image-related kwargs are present
kwargs.pop("image", None)
print(f"LTX-2: Using text-to-video mode (no image)", file=sys.stderr)
# LTX-2 uses 'frame_rate' instead of 'fps'
frame_rate = float(fps)
kwargs["frame_rate"] = frame_rate
# LTX-2 requires output_type="np" and return_dict=False
kwargs["output_type"] = "np"
kwargs["return_dict"] = False
# Generate video and audio
print(f"LTX-2: Generating with kwargs: {kwargs}", file=sys.stderr)
try:
video, audio = self.pipe(**kwargs)
print(f"LTX-2: Generated video shape: {video.shape}, audio shape: {audio.shape}", file=sys.stderr)
except Exception as e:
print(f"LTX-2: Error during pipe() call: {e}", file=sys.stderr)
traceback.print_exc()
return backend_pb2.Result(success=False, message=f"Error generating video with LTX-2 pipeline: {e}")
# Convert video to uint8 format
video = (video * 255).round().astype("uint8")
video = torch.from_numpy(video)
print(f"LTX-2: Converting video, shape after conversion: {video.shape}", file=sys.stderr)
print(f"LTX-2: Audio sample rate: {self.pipe.vocoder.config.output_sampling_rate}", file=sys.stderr)
print(f"LTX-2: Output path: {request.dst}", file=sys.stderr)
# Use LTX-2's encode_video function which handles audio
try:
ltx2_encode_video(
video[0],
fps=frame_rate,
audio=audio[0].float().cpu(),
audio_sample_rate=self.pipe.vocoder.config.output_sampling_rate,
output_path=request.dst,
)
# Verify file was created and has content
import os
if os.path.exists(request.dst):
file_size = os.path.getsize(request.dst)
print(f"LTX-2: Video file created successfully, size: {file_size} bytes", file=sys.stderr)
if file_size == 0:
return backend_pb2.Result(success=False, message=f"Video file was created but is empty (0 bytes). Check LTX-2 encode_video function.")
else:
return backend_pb2.Result(success=False, message=f"Video file was not created at {request.dst}")
except Exception as e:
print(f"LTX-2: Error encoding video: {e}", file=sys.stderr)
traceback.print_exc()
return backend_pb2.Result(success=False, message=f"Error encoding video: {e}")
return backend_pb2.Result(message="Video generated successfully", success=True)
elif self.PipelineType == "WanPipeline":
# WAN2.2 text-to-video generation
output = self.pipe(**kwargs)
frames = output.frames[0] # WAN2.2 returns frames in this format
@@ -725,11 +915,23 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
output = self.pipe(**kwargs)
frames = output.frames[0]
else:
print(f"GenerateVideo: Pipeline {self.PipelineType} does not match any known video pipeline handler", file=sys.stderr)
return backend_pb2.Result(success=False, message=f"Pipeline {self.PipelineType} does not support video generation")
# Export video
# Export video (for non-LTX-2 pipelines)
print(f"GenerateVideo: Exporting video to {request.dst} with fps={fps}", file=sys.stderr)
export_to_video(frames, request.dst, fps=fps)
# Verify file was created
import os
if os.path.exists(request.dst):
file_size = os.path.getsize(request.dst)
print(f"GenerateVideo: Video file created, size: {file_size} bytes", file=sys.stderr)
if file_size == 0:
return backend_pb2.Result(success=False, message=f"Video file was created but is empty (0 bytes)")
else:
return backend_pb2.Result(success=False, message=f"Video file was not created at {request.dst}")
return backend_pb2.Result(message="Video generated successfully", success=True)
except Exception as err:

View File

@@ -16,8 +16,12 @@ if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
USE_PIP=true
fi
# Use python 3.12 for l4t
if [ "x${BUILD_PROFILE}" == "xl4t12" ] || [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"

View File

@@ -1,12 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
git+https://github.com/huggingface/diffusers
opencv-python
transformers
torchvision==0.22.1
accelerate
compel
peft
sentencepiece
torch==2.7.1
optimum-quanto
ftfy

View File

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.3
torch==2.7.1+rocm6.3
torchvision==0.22.1+rocm6.3
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0+rocm6.4
torchvision==0.23.0+rocm6.4
git+https://github.com/huggingface/diffusers
opencv-python
transformers

View File

@@ -1,8 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.3.110+xpu
torch==2.5.1+cxx11.abi
torchvision==0.20.1+cxx11.abi
oneccl_bind_pt==2.8.0+xpu
--extra-index-url https://download.pytorch.org/whl/xpu
torch
torchvision
optimum[openvino]
setuptools
git+https://github.com/huggingface/diffusers

View File

@@ -3,3 +3,4 @@ grpcio==1.76.0
pillow
protobuf
certifi
av

View File

@@ -1 +0,0 @@
source

View File

@@ -1,17 +0,0 @@
.PHONY: exllama2
exllama2:
bash install.sh
.PHONY: run
run: exllama2
@echo "Running exllama2..."
bash run.sh
@echo "exllama2 run."
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
.PHONY: clean
clean: protogen-clean
$(RM) -r venv source __pycache__

View File

@@ -1,143 +0,0 @@
#!/usr/bin/env python3
import grpc
from concurrent import futures
import time
import backend_pb2
import backend_pb2_grpc
import argparse
import signal
import sys
import os
import glob
from pathlib import Path
import torch
import torch.nn.functional as F
from torch import version as torch_version
from exllamav2.generator import (
ExLlamaV2BaseGenerator,
ExLlamaV2Sampler
)
from exllamav2 import (
ExLlamaV2,
ExLlamaV2Config,
ExLlamaV2Cache,
ExLlamaV2Cache_8bit,
ExLlamaV2Tokenizer,
model_init,
)
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
try:
model_directory = request.ModelFile
config = ExLlamaV2Config()
config.model_dir = model_directory
config.prepare()
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
# Initialize generator
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
self.generator = generator
generator.warmup()
self.model = model
self.tokenizer = tokenizer
self.cache = cache
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(message="Model loaded successfully", success=True)
def Predict(self, request, context):
penalty = 1.15
if request.Penalty != 0.0:
penalty = request.Penalty
settings = ExLlamaV2Sampler.Settings()
settings.temperature = request.Temperature
settings.top_k = request.TopK
settings.top_p = request.TopP
settings.token_repetition_penalty = penalty
settings.disallow_tokens(self.tokenizer, [self.tokenizer.eos_token_id])
tokens = 512
if request.Tokens != 0:
tokens = request.Tokens
output = self.generator.generate_simple(
request.Prompt, settings, tokens)
# Remove prompt from response if present
if request.Prompt in output:
output = output.replace(request.Prompt, "")
return backend_pb2.Result(message=bytes(output, encoding='utf-8'))
def PredictStream(self, request, context):
# Implement PredictStream RPC
# for reply in some_data_generator():
# yield reply
# Not implemented yet
return self.Predict(request, context)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("Server started. Listening on: " + address, file=sys.stderr)
# Define the signal handler function
def signal_handler(sig, frame):
print("Received termination signal. Shutting down...")
server.stop(0)
sys.exit(0)
# Set the signal handlers for SIGINT and SIGTERM
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run the gRPC server.")
parser.add_argument(
"--addr", default="localhost:50051", help="The address to bind the server to."
)
args = parser.parse_args()
serve(args.addr)

View File

@@ -1,21 +0,0 @@
#!/bin/bash
set -e
LIMIT_TARGETS="cublas"
EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
EXLLAMA2_VERSION=c0ddebaaaf8ffd1b3529c2bb654e650bce2f790f
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
installRequirements
git clone https://github.com/turboderp/exllamav2 $MY_DIR/source
pushd ${MY_DIR}/source && git checkout -b build ${EXLLAMA2_VERSION} && popd
# This installs exllamav2 in JIT mode so it will compile the appropriate torch extension at runtime
EXLLAMA_NOCOMPILE= uv pip install ${EXTRA_PIP_INSTALL_FLAGS} ${MY_DIR}/source/

View File

@@ -1,3 +0,0 @@
transformers
accelerate
torch==2.4.1

View File

@@ -1,4 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
transformers
accelerate

View File

@@ -1,3 +0,0 @@
torch==2.4.1
transformers
accelerate

View File

@@ -1,4 +0,0 @@
# This is here to trigger the install script to add --no-build-isolation to the uv pip install commands
# exllama2 does not specify it's build requirements per PEP517, so we need to provide some things ourselves
wheel
setuptools

View File

@@ -1,5 +0,0 @@
grpcio==1.76.0
protobuf
certifi
wheel
setuptools

View File

@@ -1,6 +1,6 @@
#!/usr/bin/env python3
"""
This is an extra gRPC server of LocalAI for Bark TTS
This is an extra gRPC server of LocalAI for Faster Whisper TTS
"""
from concurrent import futures
import time
@@ -40,7 +40,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
device = "mps"
try:
print("Preparing models, please wait", file=sys.stderr)
self.model = WhisperModel(request.Model, device=device, compute_type="float16")
self.model = WhisperModel(request.Model, device=device, compute_type="default")
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
# Implement your logic here for the LoadModel service
@@ -55,11 +55,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
id = 0
for segment in segments:
print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
resultSegments.append(backend_pb2.TranscriptSegment(id=id, start=segment.start, end=segment.end, text=segment.text))
resultSegments.append(backend_pb2.TranscriptSegment(id=id, start=int(segment.start)*1e9, end=int(segment.end)*1e9, text=segment.text))
text += segment.text
id += 1
id += 1
except Exception as err:
print(f"Unexpected {err=}, {type(err)=}", file=sys.stderr)
raise err
return backend_pb2.TranscriptResult(segments=resultSegments, text=text)

View File

@@ -1,9 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
faster-whisper
opencv-python
accelerate
compel
peft
sentencepiece
optimum-quanto

View File

@@ -1,3 +1,3 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch
faster-whisper

View File

@@ -1,6 +1,4 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
--extra-index-url https://download.pytorch.org/whl/xpu
torch
optimum[openvino]
faster-whisper

View File

@@ -16,4 +16,8 @@ if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
USE_PIP=true
fi
installRequirements

View File

@@ -1,7 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.7.1+cu118
torchaudio==2.7.1+cu118
transformers
accelerate
kokoro
soundfile

View File

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.3
torch==2.7.1+rocm6.3
torchaudio==2.7.1+rocm6.3
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0+rocm6.4
torchaudio==2.8.0+rocm6.4
transformers
accelerate
kokoro

View File

@@ -1,8 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.8.10+xpu
torch==2.5.1+cxx11.abi
oneccl_bind_pt==2.8.0+xpu
torchaudio==2.5.1+cxx11.abi
--extra-index-url https://download.pytorch.org/whl/xpu
torch
torchaudio
optimum[openvino]
setuptools
transformers==4.48.3

View File

@@ -0,0 +1,16 @@
.DEFAULT_GOAL := install
.PHONY: install
install:
bash install.sh
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__
test: install
bash test.sh

View File

@@ -1,6 +1,6 @@
#!/usr/bin/env python3
"""
This is an extra gRPC server of LocalAI for Bark TTS
This is an extra gRPC server of LocalAI for Moonshine transcription
"""
from concurrent import futures
import time
@@ -8,11 +8,9 @@ import argparse
import signal
import sys
import os
from scipy.io.wavfile import write as write_wav
import backend_pb2
import backend_pb2_grpc
from bark import SAMPLE_RATE, generate_audio, preload_models
import moonshine_onnx
import grpc
@@ -29,36 +27,52 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
model_name = request.Model
try:
print("Preparing models, please wait", file=sys.stderr)
# download and load all models
preload_models()
# Store the model name for use in transcription
# Model name format: e.g., "moonshine/tiny"
self.model_name = request.Model
print(f"Model name set to: {self.model_name}", file=sys.stderr)
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
# Implement your logic here for the LoadModel service
# Replace this with your desired response
return backend_pb2.Result(message="Model loaded successfully", success=True)
def TTS(self, request, context):
model = request.model
print(request, file=sys.stderr)
def AudioTranscription(self, request, context):
resultSegments = []
text = ""
try:
audio_array = None
if model != "":
audio_array = generate_audio(request.text, history_prompt=model)
# moonshine_onnx.transcribe returns a list of strings
transcriptions = moonshine_onnx.transcribe(request.dst, self.model_name)
# Combine all transcriptions into a single text
if isinstance(transcriptions, list):
text = " ".join(transcriptions)
# Create segments for each transcription in the list
for id, trans in enumerate(transcriptions):
# Since moonshine doesn't provide timing info, we'll create a single segment
# with id and text, using approximate timing
resultSegments.append(backend_pb2.TranscriptSegment(
id=id,
start=0,
end=0,
text=trans
))
else:
audio_array = generate_audio(request.text)
print("saving to", request.dst, file=sys.stderr)
# save audio to disk
write_wav(request.dst, SAMPLE_RATE, audio_array)
print("saved to", request.dst, file=sys.stderr)
print("tts for", file=sys.stderr)
print(request, file=sys.stderr)
# Handle case where it's not a list (shouldn't happen, but be safe)
text = str(transcriptions)
resultSegments.append(backend_pb2.TranscriptSegment(
id=0,
start=0,
end=0,
text=text
))
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(success=True)
print(f"Unexpected {err=}, {type(err)=}", file=sys.stderr)
return backend_pb2.TranscriptResult(segments=[], text="")
return backend_pb2.TranscriptResult(segments=resultSegments, text=text)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
@@ -96,3 +110,4 @@ if __name__ == "__main__":
args = parser.parse_args()
serve(args.addr)

View File

@@ -0,0 +1,12 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
installRequirements

View File

@@ -0,0 +1,12 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
python3 -m grpc_tools.protoc -I../.. -I./ --python_out=. --grpc_python_out=. backend.proto

View File

@@ -0,0 +1,4 @@
grpcio==1.71.0
protobuf
grpcio-tools
useful-moonshine-onnx@git+https://git@github.com/moonshine-ai/moonshine.git#subdirectory=moonshine-onnx

View File

@@ -1,6 +1,4 @@
#!/bin/bash
LIMIT_TARGETS="cublas"
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
@@ -8,4 +6,5 @@ else
source $backend_dir/../common/libbackend.sh
fi
startBackend $@
startBackend $@

View File

@@ -0,0 +1,139 @@
"""
A test script to test the gRPC service for Moonshine transcription
"""
import unittest
import subprocess
import time
import os
import tempfile
import shutil
import backend_pb2
import backend_pb2_grpc
import grpc
class TestBackendServicer(unittest.TestCase):
"""
TestBackendServicer is the class that tests the gRPC service
"""
def setUp(self):
"""
This method sets up the gRPC service by starting the server
"""
self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
time.sleep(10)
def tearDown(self) -> None:
"""
This method tears down the gRPC service by terminating the server
"""
self.service.terminate()
self.service.wait()
def test_server_startup(self):
"""
This method tests if the server starts up successfully
"""
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.Health(backend_pb2.HealthMessage())
self.assertEqual(response.message, b'OK')
except Exception as err:
print(err)
self.fail("Server failed to start")
finally:
self.tearDown()
def test_load_model(self):
"""
This method tests if the model is loaded successfully
"""
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.LoadModel(backend_pb2.ModelOptions(Model="moonshine/tiny"))
self.assertTrue(response.success)
self.assertEqual(response.message, "Model loaded successfully")
except Exception as err:
print(err)
self.fail("LoadModel service failed")
finally:
self.tearDown()
def test_audio_transcription(self):
"""
This method tests if audio transcription works successfully
"""
# Create a temporary directory for the audio file
temp_dir = tempfile.mkdtemp()
audio_file = os.path.join(temp_dir, 'audio.wav')
try:
# Download the audio file to the temporary directory
print(f"Downloading audio file to {audio_file}...")
url = "https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav"
result = subprocess.run(
["wget", "-q", url, "-O", audio_file],
capture_output=True,
text=True
)
if result.returncode != 0:
self.fail(f"Failed to download audio file: {result.stderr}")
# Verify the file was downloaded
if not os.path.exists(audio_file):
self.fail(f"Audio file was not downloaded to {audio_file}")
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
# Load the model first
load_response = stub.LoadModel(backend_pb2.ModelOptions(Model="moonshine/tiny"))
self.assertTrue(load_response.success)
# Perform transcription
transcript_request = backend_pb2.TranscriptRequest(dst=audio_file)
transcript_response = stub.AudioTranscription(transcript_request)
# Print the transcribed text for debugging
print(f"Transcribed text: {transcript_response.text}")
print(f"Number of segments: {len(transcript_response.segments)}")
# Verify response structure
self.assertIsNotNone(transcript_response)
self.assertIsNotNone(transcript_response.text)
# Protobuf repeated fields return a sequence, not a list
self.assertIsNotNone(transcript_response.segments)
# Check if segments is iterable (has length)
self.assertGreaterEqual(len(transcript_response.segments), 0)
# Verify the transcription contains the expected text
expected_text = "This is the micro machine man presenting the most midget miniature"
self.assertIn(
expected_text.lower(),
transcript_response.text.lower(),
f"Expected text '{expected_text}' not found in transcription: '{transcript_response.text}'"
)
# If we got segments, verify they have the expected structure
if len(transcript_response.segments) > 0:
segment = transcript_response.segments[0]
self.assertIsNotNone(segment.text)
self.assertIsInstance(segment.id, int)
else:
# Even if no segments, we should have text
self.assertIsNotNone(transcript_response.text)
self.assertGreater(len(transcript_response.text), 0)
except Exception as err:
print(err)
self.fail("AudioTranscription service failed")
finally:
self.tearDown()
# Clean up the temporary directory
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir)

View File

@@ -0,0 +1,12 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
runUnittests

View File

@@ -26,6 +26,12 @@ fi
EXTRA_PIP_INSTALL_FLAGS+=" --no-build-isolation"
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
USE_PIP=true
fi
git clone https://github.com/neuphonic/neutts-air neutts-air
cp -rfv neutts-air/neuttsair ./

View File

@@ -1,5 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.3
torch==2.8.0+rocm6.3
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0+rocm6.4
transformers==4.56.1
accelerate
librosa==0.11.0

View File

@@ -0,0 +1,23 @@
.PHONY: pocket-tts
pocket-tts:
bash install.sh
.PHONY: run
run: pocket-tts
@echo "Running pocket-tts..."
bash run.sh
@echo "pocket-tts run."
.PHONY: test
test: pocket-tts
@echo "Testing pocket-tts..."
bash test.sh
@echo "pocket-tts tested."
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -0,0 +1,255 @@
#!/usr/bin/env python3
"""
This is an extra gRPC server of LocalAI for Pocket TTS
"""
from concurrent import futures
import time
import argparse
import signal
import sys
import os
import traceback
import scipy.io.wavfile
import backend_pb2
import backend_pb2_grpc
import torch
from pocket_tts import TTSModel
import grpc
def is_float(s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def is_int(s):
"""Check if a string can be converted to int."""
try:
int(s)
return True
except ValueError:
return False
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
BackendServicer is the class that implements the gRPC service
"""
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
# Get device
if torch.cuda.is_available():
print("CUDA is available", file=sys.stderr)
device = "cuda"
else:
print("CUDA is not available", file=sys.stderr)
device = "cpu"
mps_available = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
if mps_available:
device = "mps"
if not torch.cuda.is_available() and request.CUDA:
return backend_pb2.Result(success=False, message="CUDA is not available")
# Normalize potential 'mpx' typo to 'mps'
if device == "mpx":
print("Note: device 'mpx' detected, treating it as 'mps'.", file=sys.stderr)
device = "mps"
# Validate mps availability if requested
if device == "mps" and not torch.backends.mps.is_available():
print("Warning: MPS not available. Falling back to CPU.", file=sys.stderr)
device = "cpu"
self.device = device
options = request.Options
# empty dict
self.options = {}
# The options are a list of strings in this form optname:optvalue
# We are storing all the options in a dict so we can use it later when
# generating the audio
for opt in options:
if ":" not in opt:
continue
key, value = opt.split(":", 1) # Split only on first colon
# if value is a number, convert it to the appropriate type
if is_float(value):
value = float(value)
elif is_int(value):
value = int(value)
elif value.lower() in ["true", "false"]:
value = value.lower() == "true"
self.options[key] = value
# Default voice for caching
self.default_voice_url = self.options.get("default_voice", None)
self._voice_cache = {}
try:
print("Loading Pocket TTS model", file=sys.stderr)
self.tts_model = TTSModel.load_model()
print(f"Model loaded successfully. Sample rate: {self.tts_model.sample_rate}", file=sys.stderr)
# Pre-load default voice if specified
if self.default_voice_url:
try:
print(f"Pre-loading default voice: {self.default_voice_url}", file=sys.stderr)
voice_state = self.tts_model.get_state_for_audio_prompt(self.default_voice_url)
self._voice_cache[self.default_voice_url] = voice_state
print("Default voice loaded successfully", file=sys.stderr)
except Exception as e:
print(f"Warning: Failed to pre-load default voice: {e}", file=sys.stderr)
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(message="Model loaded successfully", success=True)
def _get_voice_state(self, voice_input):
"""
Get voice state from cache or load it.
voice_input can be:
- HuggingFace URL (e.g., hf://kyutai/tts-voices/alba-mackenna/casual.wav)
- Local file path
- None (use default)
"""
# Use default if no voice specified
if not voice_input:
voice_input = self.default_voice_url
if not voice_input:
return None
# Check cache first
if voice_input in self._voice_cache:
return self._voice_cache[voice_input]
# Load voice state
try:
print(f"Loading voice from: {voice_input}", file=sys.stderr)
voice_state = self.tts_model.get_state_for_audio_prompt(voice_input)
self._voice_cache[voice_input] = voice_state
return voice_state
except Exception as e:
print(f"Error loading voice from {voice_input}: {e}", file=sys.stderr)
return None
def TTS(self, request, context):
try:
# Determine voice input
# Priority: request.voice > AudioPath (from ModelOptions) > default
voice_input = None
if request.voice:
voice_input = request.voice
elif hasattr(request, 'AudioPath') and request.AudioPath:
# Use AudioPath as voice file
if os.path.isabs(request.AudioPath):
voice_input = request.AudioPath
elif hasattr(request, 'ModelFile') and request.ModelFile:
model_file_base = os.path.dirname(request.ModelFile)
voice_input = os.path.join(model_file_base, request.AudioPath)
elif hasattr(request, 'ModelPath') and request.ModelPath:
voice_input = os.path.join(request.ModelPath, request.AudioPath)
else:
voice_input = request.AudioPath
# Get voice state
voice_state = self._get_voice_state(voice_input)
if voice_state is None:
return backend_pb2.Result(
success=False,
message=f"Voice not found or failed to load: {voice_input}. Please provide a valid voice URL or file path."
)
# Prepare text
text = request.text.strip()
if not text:
return backend_pb2.Result(
success=False,
message="Text is empty"
)
print(f"Generating audio for text: {text[:50]}...", file=sys.stderr)
# Generate audio
audio = self.tts_model.generate_audio(voice_state, text)
# Audio is a 1D torch tensor containing PCM data
if audio is None or audio.numel() == 0:
return backend_pb2.Result(
success=False,
message="No audio generated"
)
# Save audio to file
output_path = request.dst
if not output_path:
output_path = "/tmp/pocket-tts-output.wav"
# Ensure output directory exists
output_dir = os.path.dirname(output_path)
if output_dir and not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
# Convert torch tensor to numpy and save
audio_numpy = audio.numpy()
scipy.io.wavfile.write(output_path, self.tts_model.sample_rate, audio_numpy)
print(f"Saved audio to {output_path}", file=sys.stderr)
except Exception as err:
print(f"Error in TTS: {err}", file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(success=True)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("Server started. Listening on: " + address, file=sys.stderr)
# Define the signal handler function
def signal_handler(sig, frame):
print("Received termination signal. Shutting down...")
server.stop(0)
sys.exit(0)
# Set the signal handlers for SIGINT and SIGTERM
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run the gRPC server.")
parser.add_argument(
"--addr", default="localhost:50051", help="The address to bind the server to."
)
args = parser.parse_args()
serve(args.addr)

View File

@@ -16,4 +16,15 @@ if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
# Use python 3.12 for l4t
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
fi
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
USE_PIP=true
fi
installRequirements

View File

@@ -0,0 +1,11 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
python3 -m grpc_tools.protoc -I../.. -I./ --python_out=. --grpc_python_out=. backend.proto

Some files were not shown because too many files have changed in this diff Show More