Compare commits

...

148 Commits

Author SHA1 Message Date
Ettore Di Giacinto
e169492543 fix: this backend is CUDA only
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-26 23:08:24 +00:00
Ettore Di Giacinto
51c26f1f39 feat(backends): add faster-qwen3-tts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-26 23:00:59 +00:00
LocalAI [bot]
65082b3a6f fix: Add named volumes for Windows Docker compatibility (#8661)
- Added named volumes (models, images) to docker-compose.yaml
- Added named volumes (models, backends) to .devcontainer/docker-compose-devcontainer.yml
- Changed bind mounts to named volumes for Windows compatibility

Fixes #8455

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-26 23:18:53 +01:00
Ettore Di Giacinto
0483d47674 Change condition for dependabot job in workflow
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-26 23:17:33 +01:00
LocalAI [bot]
8ad40091a6 chore: ⬆️ Update ggml-org/llama.cpp to 723c71064da0908c19683f8c344715fbf6d986fd (#8660)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-26 21:34:47 +00:00
LocalAI [bot]
8bfe458fbc fix: change file permissions from 0600 to 0644 in InstallModel (#8657)
Closes #8119

When installing models from the gallery, files are created with 0600
permissions (owner read/write only), making them unreadable by the
LocalAI server when running as a different user.

This fix changes the permissions to 0644 (owner read/write, group/others
read), allowing the server to read model files regardless of the user
it runs as.

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-26 09:38:54 +01:00
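For reference, a minimal Go sketch of what the permission change amounts to (illustrative code, not the actual InstallModel implementation):

```go
// Minimal sketch: files written with 0600 are readable only by the owner,
// while 0644 lets the LocalAI server read them when it runs as another user.
package main

import (
	"log"
	"os"
)

func writeModelFile(path string, data []byte) error {
	// Previously: os.WriteFile(path, data, 0o600) -> owner-only access.
	// After the fix: 0644 (owner read/write, group/others read).
	return os.WriteFile(path, data, 0o644)
}

func main() {
	if err := writeModelFile("example.gguf", []byte("placeholder")); err != nil {
		log.Fatal(err)
	}
}
```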
Ettore Di Giacinto
657ba8cdad fix(chat): do not send thinking/reasoning messages to the LLM (#8656)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-26 00:06:35 +01:00
LocalAI [bot]
fb86f6461d chore: ⬆️ Update ggml-org/llama.cpp to 3769fe6eb70b0a0fbb30b80917f1caae68c902f7 (#8655)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-26 00:05:03 +01:00
LocalAI [bot]
1027c487a6 fix: reload model after editing YAML config (issue #8647) (#8652)
fix: reload model configuration after editing (issue #8647)

- Add *model.ModelLoader parameter to EditModelEndpoint
- Call ml.ShutdownModel() after saving config to unload the running model
- Model will be reloaded on next inference request with new settings (e.g., context_size)
- Update route registration to pass ml to EditModelEndpoint

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-25 22:18:42 +01:00
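A hedged sketch of the described flow; the ModelLoader stand-in and saveConfig helper below are illustrative, only ShutdownModel is a name taken from the commit message:

```go
package main

import "fmt"

// ModelLoader is a stand-in for LocalAI's model loader; only ShutdownModel,
// named in the commit message, is modelled here (its real signature may differ).
type ModelLoader struct{}

func (ml *ModelLoader) ShutdownModel(name string) error {
	fmt.Println("shutting down", name)
	return nil
}

// saveConfig is a hypothetical helper that persists the edited YAML.
func saveConfig(name string, cfg []byte) error { return nil }

// editModel mirrors the described flow: save the config, then unload the
// running model so the next inference request reloads it with the new
// settings (e.g. context_size).
func editModel(ml *ModelLoader, name string, cfg []byte) error {
	if err := saveConfig(name, cfg); err != nil {
		return err
	}
	return ml.ShutdownModel(name)
}

func main() {
	_ = editModel(&ModelLoader{}, "my-model", []byte("context_size: 8192"))
}
```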
LocalAI [bot]
bb226d1eaa feat(swagger): update swagger (#8654)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-25 21:52:35 +01:00
Ettore Di Giacinto
b032cf489b fix(chatterbox): add support for cuda13/aarch64 (#8653)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-25 21:51:44 +01:00
Copilot
3ac7301f31 Add sample_rate support to TTS API via post-processing resampling (#8650)
* Initial plan

* Add TTS sample_rate support via AudioResample post-processing

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-25 16:36:27 +01:00
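To illustrate the idea of resampling TTS output as a post-processing step, a naive linear-interpolation resampler in Go; this is not the AudioResample implementation the commit refers to:

```go
package main

import "fmt"

// resampleLinear converts audio from srcRate to dstRate by linear
// interpolation. Illustrative only; real resamplers use proper filtering.
func resampleLinear(in []float64, srcRate, dstRate int) []float64 {
	if srcRate == dstRate || len(in) == 0 {
		return in
	}
	ratio := float64(srcRate) / float64(dstRate)
	out := make([]float64, int(float64(len(in))/ratio))
	for i := range out {
		pos := float64(i) * ratio
		j := int(pos)
		if j+1 >= len(in) {
			out[i] = in[len(in)-1]
			continue
		}
		frac := pos - float64(j)
		out[i] = in[j]*(1-frac) + in[j+1]*frac
	}
	return out
}

func main() {
	samples := []float64{0, 0.5, 1, 0.5, 0, -0.5, -1, -0.5}
	fmt.Println(resampleLinear(samples, 24000, 16000)) // 24 kHz -> 16 kHz
}
```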
dependabot[bot]
c4783a0a05 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/vllm (#8635)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:17:32 +01:00
dependabot[bot]
c44f03b882 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/rerankers (#8636)
chore(deps): bump grpcio in /backend/python/rerankers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:57 +01:00
dependabot[bot]
eeec92af78 chore(deps): bump sentence-transformers from 5.2.2 to 5.2.3 in /backend/python/transformers (#8638)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.2 to 5.2.3.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.2...v5.2.3)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:41 +01:00
dependabot[bot]
842033b8b5 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/transformers (#8640)
chore(deps): bump grpcio in /backend/python/transformers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:55 +01:00
dependabot[bot]
a2941228a7 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/common/template (#8641)
chore(deps): bump grpcio in /backend/python/common/template

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:43 +01:00
dependabot[bot]
791e6b84ee chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/coqui (#8642)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:30 +01:00
LocalAI [bot]
d845c39963 docs: add Podman installation documentation (#8646)
* docs: add Podman installation documentation

- Add new podman.md with comprehensive installation and usage guide
- Cover installation on multiple platforms (Ubuntu, Fedora, Arch, macOS, Windows)
- Document GPU support (NVIDIA CUDA, AMD ROCm, Intel, Vulkan)
- Include rootless container configuration
- Document Docker Compose with podman-compose
- Add troubleshooting section for common issues
- Link to Podman documentation in installation index
- Update image references to use Docker Hub and link to docker docs
- Change YAML heredoc to EOF in compose.yaml example
- Add curly brackets to notice shortcode and fix link

Closes #8645

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* docs: merge Docker and Podman docs into unified Containers guide

Following the review comment, we have merged the Docker and Podman
documentation into a single 'Containers' page that covers both container
engines. The Docker and Podman pages now redirect to this unified guide.

Changes:
- Added new docs/content/installation/containers.md with combined Docker/Podman guide
- Updated docs/content/installation/docker.md to redirect to containers
- Updated docs/content/installation/podman.md to redirect to containers
- Updated docs/content/installation/_index.en.md to link to containers

Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* docs: remove podman.md as docs are merged into containers.md

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

---------

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-25 08:13:55 +01:00
LocalAI [bot]
1331e23b67 chore: ⬆️ Update ggml-org/llama.cpp to 418dea39cea85d3496c8b04a118c3b17f3940ad8 (#8649)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-25 00:04:48 +00:00
LocalAI [bot]
36ff2a0138 fix(webui): use different icon for System nav item (#8648)
Change the System nav item icon from fas fa-server to fas fa-desktop
to distinguish it from the Backends nav item which still uses fa-server.

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-24 17:10:58 +01:00
LocalAI [bot]
db6ba4ef07 chore: remove install.sh script and documentation references (#8643)
* chore: remove install.sh script and documentation references

- Delete docs/static/install.sh (broken installer causing issues)
- Remove One-Line Installer section from linux.md documentation
- Remove install.sh references from installation/_index.en.md
- Remove install.sh warning and commands from README.md

Closes #8032

* fix: add missing closing braces to notice shortcode
2026-02-24 08:36:25 +01:00
dependabot[bot]
d19dcac863 chore(deps): bump github.com/mudler/cogito from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1 (#8632)
chore(deps): bump github.com/mudler/cogito

Bumps [github.com/mudler/cogito](https://github.com/mudler/cogito) from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1.
- [Release notes](https://github.com/mudler/cogito/releases)
- [Commits](https://github.com/mudler/cogito/commits/v0.9.1)

---
updated-dependencies:
- dependency-name: github.com/mudler/cogito
  dependency-version: 0.9.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-24 01:29:47 +00:00
dependabot[bot]
fd42675bec chore(deps): bump goreleaser/goreleaser-action from 6 to 7 (#8634)
Bumps [goreleaser/goreleaser-action](https://github.com/goreleaser/goreleaser-action) from 6 to 7.
- [Release notes](https://github.com/goreleaser/goreleaser-action/releases)
- [Commits](https://github.com/goreleaser/goreleaser-action/compare/v6...v7)

---
updated-dependencies:
- dependency-name: goreleaser/goreleaser-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:27:49 +01:00
dependabot[bot]
3391538806 chore(deps): bump actions/stale from 10.1.1 to 10.2.0 (#8633)
Bumps [actions/stale](https://github.com/actions/stale) from 10.1.1 to 10.2.0.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](997185467f...b5d41d4e1d)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-version: 10.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:27:20 +01:00
dependabot[bot]
c4f879c4ea chore(deps): bump github.com/gpustack/gguf-parser-go from 0.23.1 to 0.24.0 (#8631)
chore(deps): bump github.com/gpustack/gguf-parser-go

Bumps [github.com/gpustack/gguf-parser-go](https://github.com/gpustack/gguf-parser-go) from 0.23.1 to 0.24.0.
- [Release notes](https://github.com/gpustack/gguf-parser-go/releases)
- [Commits](https://github.com/gpustack/gguf-parser-go/compare/v0.23.1...v0.24.0)

---
updated-dependencies:
- dependency-name: github.com/gpustack/gguf-parser-go
  dependency-version: 0.24.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:26:41 +01:00
dependabot[bot]
b7e0de54fe chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.22.0 to 1.26.0 (#8630)
chore(deps): bump github.com/anthropics/anthropic-sdk-go

Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.22.0...v1.26.0)

---
updated-dependencies:
- dependency-name: github.com/anthropics/anthropic-sdk-go
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:26:15 +01:00
dependabot[bot]
f0868acdf3 chore(deps): bump fyne.io/fyne/v2 from 2.7.2 to 2.7.3 (#8629)
Bumps [fyne.io/fyne/v2](https://github.com/fyne-io/fyne) from 2.7.2 to 2.7.3.
- [Release notes](https://github.com/fyne-io/fyne/releases)
- [Changelog](https://github.com/fyne-io/fyne/blob/v2.7.3/CHANGELOG.md)
- [Commits](https://github.com/fyne-io/fyne/compare/v2.7.2...v2.7.3)

---
updated-dependencies:
- dependency-name: fyne.io/fyne/v2
  dependency-version: 2.7.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:25:46 +01:00
LocalAI [bot]
9a5b5ee8a9 chore: ⬆️ Update ggml-org/llama.cpp to b68a83e641b3ebe6465970b34e99f3f0e0a0b21a (#8628)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-23 22:02:40 +00:00
Lukas Schaefer
ed0bfb8732 fix: rename json_verbose to verbose_json (#8627)
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
2026-02-23 17:57:06 +00:00
Richard Palethorpe
be84b1d258 feat(traces): Use accordion instead of pop-ups (#8626)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-23 13:07:41 +01:00
Andres
cbedcc9091 fix(api): Downgrade health/readiness check to debug (#8625)
Downgrade health/readiness check to debug

Signed-off-by: Andres Smith <andressmithdev@pm.me>
2026-02-23 11:58:04 +01:00
Andres
e45d63c86e fix(cli): Fix watchdog running constantly and spamming logs (#8624)
* Fix watchdog running constantly and spamming logs

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Update docs

Signed-off-by: Andres Smith <andressmithdev@pm.me>

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
2026-02-23 11:57:28 +01:00
LocalAI [bot]
f40c8dd0ce chore: ⬆️ Update ggml-org/llama.cpp to 2b6dfe824de8600c061ef91ce5cc5c307f97112c (#8622)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-23 09:30:58 +00:00
LocalAI [bot]
559ab99890 docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration (#8621)
* docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration

* chore: revert backend/python/diffusers/README.md to original content

---------

Co-authored-by: Your Name <you@example.com>
2026-02-22 18:17:23 +01:00
LocalAI [bot]
91f2dd5820 chore: ⬆️ Update ggml-org/llama.cpp to f75c4e8bf52ea480ece07fd3d9a292f1d7f04bc5 (#8619)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-22 13:20:08 +01:00
LocalAI [bot]
8250815763 docs: ⬆️ update docs version mudler/LocalAI (#8618)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-21 21:18:40 +00:00
Richard Palethorpe
b1b67b973e fix(realtime): Add functions to conversation history (#8616)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-21 19:03:49 +01:00
LocalAI [bot]
fcecc12e57 chore: ⬆️ Update ggml-org/llama.cpp to ba3b9c8844aca35ecb40d31886686326f22d2214 (#8613)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-21 09:57:04 +01:00
Ettore Di Giacinto
51902df7ba fix: merge openresponses messages (#8615)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-21 09:56:43 +01:00
Ettore Di Giacinto
05f3ae31de chore: drop bark.cpp leftovers from pipelines (#8614)
Update bump_deps.yaml

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-21 09:24:09 +01:00
LocalAI [bot]
bb0924dff1 chore: ⬆️ Update ggml-org/llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#8612)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-20 23:47:47 +01:00
Richard Palethorpe
51eec4e6b8 feat(traces): Add backend traces (#8609)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-20 23:47:33 +01:00
LocalAI [bot]
462c82fad2 docs: ⬆️ update docs version mudler/LocalAI (#8611)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-20 21:19:51 +00:00
Ettore Di Giacinto
352b8aaa1b fix(ui): pass needed values to unbreak the model editor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-20 09:06:17 +01:00
Ettore Di Giacinto
df792d6243 chore(ui): improve navigation and buttons placement (#8608)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 23:41:05 +01:00
LocalAI [bot]
b1c434f0fc chore: ⬆️ Update ggml-org/llama.cpp to 11c325c6e0666a30590cde390d5746a405e536b9 (#8607)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-19 23:32:35 +01:00
LocalAI [bot]
bb42b342de chore: ⬆️ Update ggml-org/whisper.cpp to 21411d81ea736ed5d9cdea4df360d3c4b60a4adb (#8606)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-19 23:32:21 +01:00
LocalAI [bot]
e555057f8b fix: multi-GPU support for Diffusers (Issue #8575) (#8605)
* chore: init

* feat: implement multi-GPU support for Diffusers backend (fixes #8575)

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-19 21:35:58 +01:00
Ettore Di Giacinto
76fba02e56 fix: do not keep track model if not existing (#8603)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 17:18:38 +01:00
Ettore Di Giacinto
dadc7158fb fix(diffusers): sd_embed is not always available (#8602)
It seems sd_embed doesn't play well with MPS and L4T, so it is made optional.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 10:45:17 +01:00
LocalAI [bot]
68c7077491 chore: ⬆️ Update ggml-org/llama.cpp to b55dcdef5dcd74dc75c4921090e928d43453c157 (#8599)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-18 22:33:25 +01:00
Ettore Di Giacinto
b471619ad9 chore(deps): bump cogito and add new options to the agent config (#8601)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 22:10:26 +01:00
LocalAI [bot]
a0476d5567 chore(model-gallery): ⬆️ update checksum (#8600)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-18 21:52:28 +01:00
Ettore Di Giacinto
a2228f1418 fix(ui): improve view on mobile (#8598)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 19:50:59 +01:00
Ettore Di Giacinto
7dd9a155a3 fix(ui): drop duplicated footer import
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 14:38:53 +01:00
Richard Palethorpe
4fe830ff58 fix(realtime): Limit buffer sizes to prevent DoS (#8596)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-18 14:36:43 +01:00
Richard Palethorpe
86b3bc9313 fix(realtime): Better support for thinking models and setting model parameters (#8595)
* fix(realtime): Wrap functions in OpenAI chat completions format

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): Set max tokens from session object

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Find thinking start tag for thinking extraction

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Don't send buffer cleared message when we automatically drop it

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-18 14:36:16 +01:00
Ettore Di Giacinto
2fabdc08e6 feat(ui): left navbar, dark/light theme (#8594)
* feat(ui): left navbar, dark/light theme

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* darker background

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 00:14:39 +01:00
LocalAI [bot]
ed832cf0e0 chore: ⬆️ Update ggml-org/llama.cpp to 2b089c77580d347767f440205103e4da8ec33d89 (#8592)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-17 22:35:07 +00:00
LocalAI [bot]
95db1da309 chore(model-gallery): ⬆️ update checksum (#8593)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-17 21:31:35 +01:00
Richard Palethorpe
9e692967c3 fix(llama-cpp): Pass parameters when using embedded template (#8590)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-17 18:50:05 +01:00
Ettore Di Giacinto
ecba23d44e fix: improve watchdog logic (#8591)
* fix: ensure proper watchdog shutdown and state passing between restarts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add missing watchdog settings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: untrack model if we shut it down successfully

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-17 18:49:22 +01:00
LocalAI [bot]
067a255435 chore: ⬆️ Update ggml-org/llama.cpp to d612901116ab2066c7923372d4827032ff296bc4 (#8588)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-17 00:57:32 +01:00
dependabot[bot]
637ecba382 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.2.0 to 1.3.0 (#8585)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.2.0 to 1.3.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.2.0...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-16 23:04:19 +00:00
dependabot[bot]
46c64e59f5 chore(deps): bump github.com/jaypipes/ghw from 0.22.0 to 0.23.0 (#8587)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.22.0 to 0.23.0.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.23.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-16 21:49:41 +00:00
dependabot[bot]
f806838c37 chore(deps): bump google.golang.org/grpc from 1.78.0 to 1.79.1 (#8583)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.78.0 to 1.79.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.78.0...v1.79.1)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-16 20:20:30 +00:00
Richard Palethorpe
074a982853 fix(gallery): Use YAML v3 to avoid merging maps with incompatible keys (#8580)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-16 14:10:19 +01:00
LocalAI [bot]
109f29cc24 chore: ⬆️ Update ggml-org/llama.cpp to 27b93cbd157fc4ad94573a1fbc226d3e18ea1bb4 (#8577)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 23:42:36 +01:00
LocalAI [bot]
587e4a21b3 chore: ⬆️ Update antirez/voxtral.c to 134d366c24d20c64b614a3dcc8bda2a6922d077d (#8578)
⬆️ Update antirez/voxtral.c

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 23:42:11 +01:00
LocalAI [bot]
3f1f58b2ab chore: ⬆️ Update ggml-org/whisper.cpp to 364c77f4ca2737e3287652e0e8a8c6dce3231bba (#8576)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 21:20:04 +00:00
Ettore Di Giacinto
01eb70caff Fix formatting in README.md for Audio to Text section
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-15 09:57:50 +01:00
LocalAI [bot]
d784851337 chore: ⬆️ Update ggml-org/llama.cpp to 01d8eaa28d57bfc6d06e30072085ed0ef12e06c5 (#8567)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-14 22:52:32 +01:00
Ettore Di Giacinto
1c4e5aa5c0 chore: bump cogito (#8568)
Adapt to new API and drop call to Ask()

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-14 22:52:22 +01:00
LocalAI [bot]
94df096fb9 fix: pin neutts-air to known working commit (#8566)
* chore: init

* fix: pin neutts-air to known working commit

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-14 21:16:37 +01:00
Ettore Di Giacinto
820bd7dd01 fix(ci): try to fix deps for l4t13 on qwen-*
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-14 10:21:23 +01:00
Austen
42cb7bda19 fix(llama-cpp): populate tensor_buft_override buffer so llama-cpp properly performs fit calculations (#8560)
fix auto-fit for llama-cpp
2026-02-14 10:07:37 +01:00
Ettore Di Giacinto
2fb9940b8a fix(voxcpm): pin setuptools (#8556)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-13 23:44:35 +01:00
LocalAI [bot]
2ff0ad4190 chore: ⬆️ Update ggml-org/llama.cpp to 05a6f0e8946914918758db767f6eb04bc1e38507 (#8553)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-13 22:48:01 +01:00
Ettore Di Giacinto
bd12103ed4 chore: compute capabilities once (#8555)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-13 22:23:06 +01:00
LocalAI [bot]
2e17edd72a fix: prevent excessive logging in capability detection (#8552)
Closes #8527.

This PR fixes the excessive logging issue in capability detection by applying the existing capabilityLogged guard to the forced capability run file case.

## Changes
- Apply capabilityLogged flag to forced capability detection logging
- Prevents repeated log messages during backend discovery and gallery operations

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-13 20:00:29 +00:00
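A small sketch of how such a log-once flag typically works; only the capabilityLogged name comes from the commit, the surrounding code is illustrative:

```go
package main

import "fmt"

// capabilityLogged mirrors the flag named in the commit: the detected
// capability is logged only once, even when detection runs repeatedly during
// backend discovery or gallery operations.
var capabilityLogged bool

func detectCapability(forced string) string {
	capability := forced // the real code also auto-detects when nothing is forced
	if !capabilityLogged {
		fmt.Println("detected capability:", capability)
		capabilityLogged = true
	}
	return capability
}

func main() {
	for i := 0; i < 3; i++ {
		detectCapability("nvidia") // the message is printed only on the first call
	}
}
```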
Richard Palethorpe
24aab68b3f feat(gallery): Add nanbeige4.1-3b (#8551)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-13 18:23:44 +01:00
Richard Palethorpe
5bdbb10593 fix(realtime): Send proper image data to backend (#8547)
* fix(realtime): Allow empty parameters

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Just pass base64 string to backend

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-13 18:01:07 +01:00
Ettore Di Giacinto
2fd026e958 fix: update moonshine API, add setuptools to voxcpm requirements (#8541)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 23:22:37 +01:00
LocalAI [bot]
08718b656e chore: ⬆️ Update ggml-org/llama.cpp to 338085c69e486b7155e5b03d7b5087e02c0e2528 (#8538)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-12 23:21:53 +01:00
LocalAI [bot]
7121b189f7 chore(model-gallery): ⬆️ update checksum (#8540)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-12 21:54:33 +01:00
Richard Palethorpe
f6c80a6987 feat(realtime): Allow sending text, image and audio conversation items (#8524)
feat(realtime): Allow sending text and image conversation items

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 19:33:46 +00:00
Ettore Di Giacinto
4a4d65f8e8 chore(model gallery): add vllm-omni models (#8536)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:27:20 +01:00
Ettore Di Giacinto
2858e71606 chore(model gallery): add neutts (#8535)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:17:03 +01:00
Ettore Di Giacinto
088205339c chore(model gallery): add voxcpm, whisperx, moonshine-tiny (#8534)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:13:03 +01:00
Ettore Di Giacinto
8616397d59 chore(model gallery): add nemo-asr (#8533)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:01:42 +01:00
Ettore Di Giacinto
1698f92bd0 Remove URL entry from gallery index
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 17:50:13 +01:00
Ettore Di Giacinto
02c95a57ae Add known use cases for audio processing
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 17:49:54 +01:00
rampa3
2ab6be1d0c chore(model gallery): Add npc-llm-3-8b (#8498)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-12 17:46:25 +01:00
Ettore Di Giacinto
9d78ec1bd8 chore(model gallery): add voxtral (which is only available in development) (#8532)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 17:44:24 +01:00
LocalAI [bot]
b10b85de52 chore: improve log levels verbosity (#8528)
* chore: init for PR

* feat: improve log verbosity per #8449 - demote /api/resources to DEBUG, elevate job events to INFO

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-12 16:24:46 +01:00
Richard Palethorpe
1479bee894 fix(realtime): Sampling and websocket locking (#8521)
* fix(realtime): Use locked websocket for concurrent access

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Use sample rate set in session

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(config): Allow pipelines to have no model parameters

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-12 13:57:34 +01:00
Austen
cff972094c feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504)
* add experimental support for sd_embed-style prompt embedding

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* add doc equivalent to compel

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* need to use flux1 embedding function for flux model

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

---------

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
2026-02-11 22:58:19 +01:00
LocalAI [bot]
79a25f7ae9 chore: ⬆️ Update ggml-org/llama.cpp to 4d3daf80f8834e0eb5148efc7610513f1e263653 (#8513)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-11 21:48:39 +00:00
Richard Palethorpe
7270a98ce5 fix(realtime): Use user provided voice and allow pipeline models to have no backend (#8415)
* fix(realtime): Use the voice provided by the user or none at all

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ui,config): Allow pipeline models to have no backend and use same validation in frontend

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-11 14:18:05 +01:00
LocalAI [bot]
0ee92317ec chore: ⬆️ Update ggml-org/llama.cpp to 57487a64c88c152ac72f3aea09bd1cc491b2f61e (#8499)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 21:32:46 +00:00
LocalAI [bot]
743d2d1947 chore: ⬆️ Update ggml-org/whisper.cpp to 764482c3175d9c3bc6089c1ec84df7d1b9537d83 (#8478)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:59 +01:00
LocalAI [bot]
df04843f34 chore: ⬆️ Update ggml-org/llama.cpp to 262364e31d1da43596fe84244fba44e94a0de64e (#8479)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:33 +01:00
Kolega.dev
780877d1d0 security: validate URLs to prevent SSRF in content fetching endpoints (#8476)
User-supplied URLs passed to GetContentURIAsBase64() and downloadFile()
were fetched without validation, allowing SSRF attacks against internal
services. Added URL validation that blocks private IPs, loopback,
link-local, and cloud metadata endpoints before fetching.

Co-authored-by: kolega.dev <faizan@kolega.ai>
2026-02-10 15:14:14 +01:00
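A hedged sketch of this kind of URL validation; the function below is illustrative and not the exact check added to GetContentURIAsBase64()/downloadFile():

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// validateURL resolves the target host and refuses private, loopback,
// link-local, unspecified and cloud-metadata addresses before fetching a
// user-supplied URL. Illustrative sketch only.
func validateURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return err
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
			ip.IsLinkLocalMulticast() || ip.IsUnspecified() ||
			ip.Equal(net.ParseIP("169.254.169.254")) { // common cloud metadata endpoint
			return fmt.Errorf("refusing to fetch internal address %s", ip)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateURL("http://169.254.169.254/latest/meta-data/")) // rejected
	fmt.Println(validateURL("https://example.com/image.png"))            // allowed if it resolves publicly
}
```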
dependabot[bot]
08eeed61f4 chore(deps): bump github.com/openai/openai-go/v3 from 3.17.0 to 3.19.0 (#8485)
Bumps [github.com/openai/openai-go/v3](https://github.com/openai/openai-go) from 3.17.0 to 3.19.0.
- [Release notes](https://github.com/openai/openai-go/releases)
- [Changelog](https://github.com/openai/openai-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-go/compare/v3.17.0...v3.19.0)

---
updated-dependencies:
- dependency-name: github.com/openai/openai-go/v3
  dependency-version: 3.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 05:41:15 +00:00
dependabot[bot]
5207ff84dc chore(deps): bump github.com/alecthomas/kong from 1.13.0 to 1.14.0 (#8481)
Bumps [github.com/alecthomas/kong](https://github.com/alecthomas/kong) from 1.13.0 to 1.14.0.
- [Commits](https://github.com/alecthomas/kong/compare/v1.13.0...v1.14.0)

---
updated-dependencies:
- dependency-name: github.com/alecthomas/kong
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 04:29:00 +00:00
dependabot[bot]
4ade2e61ab chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.0 to 2.28.1 (#8483)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.28.0 to 2.28.1.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.28.0...v2.28.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 03:15:46 +00:00
dependabot[bot]
818be98314 chore(deps): bump github.com/jaypipes/ghw from 0.21.2 to 0.22.0 (#8484)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.21.2 to 0.22.0.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.21.2...v0.22.0)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 02:02:38 +00:00
dependabot[bot]
056c438452 chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.20.0 to 1.22.0 (#8482)
chore(deps): bump github.com/anthropics/anthropic-sdk-go

Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.20.0 to 1.22.0.
- [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.20.0...v1.22.0)

---
updated-dependencies:
- dependency-name: github.com/anthropics/anthropic-sdk-go
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-09 23:34:36 +00:00
LocalAI [bot]
0c040beb59 chore: ⬆️ Update antirez/voxtral.c to c9e8773a2042d67c637fc492c8a655c485354080 (#8477)
⬆️ Update antirez/voxtral.c

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 22:20:03 +01:00
Ettore Di Giacinto
bf5a1dd840 feat(voxtral): add voxtral backend (#8451)
* feat(voxtral): add voxtral backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* simplify

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-09 09:12:05 +01:00
rampa3
f44200bec8 chore(model gallery): Add Ministral 3 family of models (aside from base versions) (#8467)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-09 09:10:37 +01:00
LocalAI [bot]
3b1b08efd6 chore: ⬆️ Update ggml-org/llama.cpp to e06088da0fa86aa444409f38dff274904931c507 (#8464)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 09:09:32 +01:00
LocalAI [bot]
3d8791067f chore: ⬆️ Update ggml-org/whisper.cpp to 4b23ff249e7f93137cb870b28fb27818e074c255 (#8463)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 09:08:55 +01:00
Austen
da8207b73b feat(stablediffusion-ggml): Improve legacy CPU support for stablediffusion-ggml backend (#8461)
* Port AVX logic from whisper to stablediffusion-ggml

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* disable BMI2 on AVX builds

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

---------

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-08 23:11:33 +00:00
Varun Chawla
aa9ca401fa docs: update model gallery documentation to reference main repository (#8452)
Fixes #8212 - Updated the note about reporting broken models to
reference the main LocalAI repository instead of the outdated,
separate gallery repository.
2026-02-08 22:14:23 +01:00
LocalAI [bot]
e43c0c3ffc docs: ⬆️ update docs version mudler/LocalAI (#8462)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-08 21:12:50 +00:00
LocalAI [bot]
944874d08b chore: ⬆️ Update ggml-org/llama.cpp to 8872ad2125336d209a9911a82101f80095a9831d (#8448)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-07 21:22:18 +00:00
Ettore Di Giacinto
3370d807c2 feat(nemo): add Nemo (only asr for now) backend (#8436)
* feat(nemo): add Nemo (only asr for now) backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(nemo): add Nemo backend without Python version pins (#8438)

* Initial plan

* Remove Python version pins from nemo backend install.sh

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Pin pyarrow to 20.0.0 in nemo requirements

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-07 08:19:37 +01:00
LocalAI [bot]
ae2689936a chore: ⬆️ Update ggml-org/llama.cpp to b83111815e9a79949257e9d4b087206b320a3063 (#8434)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-06 21:22:33 +00:00
Richard Palethorpe
15c12674b6 fix(qwen-asr): Remove contagious slop (DEFAULT_GOAL) from Makefile (#8431)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-06 17:12:45 +01:00
Richard Palethorpe
7fbe1d2e72 chore(models): Add Qwen TTS 0.6b (#8428)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-06 10:30:36 +01:00
Richard Palethorpe
c1d0b10b14 chore(docs): Document using a local model gallery (#8426)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-06 10:28:41 +01:00
Andres
efd552f83e fix(api)!: Stop model prior to deletion (#8422)
* Unload model prior to deletion

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Fix LFM model in gallery

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Remove mistakenly added files

Signed-off-by: Andres Smith <andressmithdev@pm.me>

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
2026-02-06 09:22:10 +01:00
LocalAI [bot]
bcd927da6e chore: ⬆️ Update ggml-org/llama.cpp to 22cae832188a1f08d18bd0a707a4ba5cd03c7349 (#8419)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-06 09:21:33 +01:00
LocalAI [bot]
682ac7e637 chore(model-gallery): ⬆️ update checksum (#8420)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-05 21:34:26 +01:00
LocalAI [bot]
c8d74d35df feat(swagger): update swagger (#8418)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-05 21:33:44 +01:00
Ettore Di Giacinto
a849f285a5 chore(tests): add audio/wav to expected wav file
2026-02-05 20:27:06 +00:00
Ettore Di Giacinto
697f6aa71c feat(audio): set audio content type (#8416)
* feat(audio): set audio content type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 19:14:12 +01:00
Ettore Di Giacinto
218d0526cb fix(qwen-tts): add six dependency
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 18:05:31 +01:00
Ettore Di Giacinto
9bc5ab18fa fix(voxcpm): make sed call unix-compliant
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 17:15:58 +01:00
Ettore Di Giacinto
a9267f391c fix(huggingface): add clean target
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 16:54:41 +01:00
Ettore Di Giacinto
029ae3420d fix(package.sh): drop redundant -a and -R
-a already implies -R

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 16:39:38 +01:00
Ettore Di Giacinto
c0461f32a1 fix: add missing clean targets
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 16:38:16 +01:00
Ettore Di Giacinto
8989d2944e fix: add clean target to local-store
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 14:55:34 +01:00
Ettore Di Giacinto
7aea2add44 Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory" (#8412)
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/re…"

This reverts commit 55e43b3f92.
2026-02-05 14:17:33 +01:00
dependabot[bot]
55e43b3f92 chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory (#8407)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: torch.


Updates `torch` from 2.4.1 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-05 12:37:52 +00:00
Ettore Di Giacinto
53276d28e7 feat(musicgen): add ace-step and UI interface (#8396)
* feat(musicgen): add ace-step and UI interface

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Correctly handle model dir

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop auto-download

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to models, fixup UIs icons

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* l4t13 is incompatible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* avoid pinning version for cuda12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop l4t12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 12:04:53 +01:00
Yaroslav98214
6dbcdb0b9e fix: filter GGUF and GGML files from model list (#8397)
Filter GGUF and GGML files from model list

Skip .gguf/.ggml loose files when listing models and add a test
for .gguf exclusion.

Closes #1077

Signed-off-by: Yaroslav98214 <diakovichyaroslav30@gmail.com>
2026-02-05 10:17:46 +01:00
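A minimal sketch of the described filtering, with illustrative names rather than the actual LocalAI listing code:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// filterModelEntries skips loose .gguf/.ggml files when building the model
// list, since they are raw weight files rather than configured models.
func filterModelEntries(entries []string) []string {
	var out []string
	for _, e := range entries {
		switch strings.ToLower(filepath.Ext(e)) {
		case ".gguf", ".ggml":
			continue // excluded from the listing
		default:
			out = append(out, e)
		}
	}
	return out
}

func main() {
	fmt.Println(filterModelEntries([]string{"phi-4.yaml", "qwen2.gguf", "old.ggml"}))
	// -> [phi-4.yaml]
}
```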
LocalAI [bot]
c30866ba95 chore: ⬆️ Update ggml-org/llama.cpp to b536eb023368701fe3564210440e2df6151c3e65 (#8399)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-04 23:08:08 +01:00
LocalAI [bot]
b413beba2d chore: ⬆️ Update ggml-org/whisper.cpp to 941bdabbe4561bc6de68981aea01bc5ab05781c5 (#8398)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-04 21:20:59 +00:00
Ettore Di Giacinto
9db4df22f3 chore: update torch and torchaudio version specifications for qwen-tts in MPS
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-04 16:55:42 +01:00
Jonas Bernard
5ac50c9348 fix(docs): Promote DEBUG=false in production docker compose (#8390)
fix(docs): Use DEBUG=false in production docker compose

Signed-off-by: Jonas Bernard <public.jbernard@web.de>
2026-02-04 09:35:32 +01:00
Ettore Di Giacinto
5201b58d3e feat(mlx): Add support for CUDA12, CUDA13, L4T, SBSA and CPU (#8380)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-03 23:53:34 +01:00
LocalAI [bot]
8fa6737bdc chore(model gallery): 🤖 add 1 new models via gallery agent (#8381)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-03 22:40:22 +01:00
Ettore Di Giacinto
3039ced287 chore(ci): enlarge sleep startup time
Even if suboptimal (we should really poll until the service is available), this should at least stabilize the tests for now

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-03 22:07:07 +01:00
Ettore Di Giacinto
e7fc604dbc feat(metal): try to extend support to remaining backends (#8374)
* feat(metal): try to extend support to remaining backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* neutts doesn't work

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* split outetts out of transformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove torch pin to whisperx

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-03 21:57:50 +01:00
Richard Palethorpe
5195062e12 fix(realtime): Include noAction function in prompt template and handle tool_choice (#8372)
The realtime endpoint was not passing the noAction "answer" function to the
model in the prompt template, causing the model to always call user-provided
tools even when a direct response was appropriate.

Root cause:
- User tools were added to the funcs list
- TemplateMessages() was called to generate the prompt
- noAction function was only added AFTER templating
- This meant the prompt didn't include the "answer" function, even though
  the grammar did

Fix:
- Move noAction function creation before TemplateMessages() call so it's
  included in both the prompt and grammar
- Add proper tool_choice parameter handling to support "auto", "required",
  "none", and specific function selection
- Match behavior of the standard chat endpoint

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-03 14:30:37 +01:00
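A hedged sketch of the ordering fix described above; the type and function names below are illustrative stand-ins for the realtime endpoint code, with only noAction/"answer" and TemplateMessages taken from the commit message:

```go
package main

import "fmt"

// Function is a stand-in for a tool/function definition.
type Function struct{ Name string }

// templateMessages stands in for TemplateMessages(): the prompt advertises
// every function in the list it is given.
func templateMessages(funcs []Function) string {
	names := ""
	for _, f := range funcs {
		names += f.Name + " "
	}
	return "available functions: " + names
}

// buildPrompt shows the point of the fix: the built-in noAction/"answer"
// function is appended BEFORE templating, so the prompt and the grammar
// agree and the model can choose a direct reply instead of always calling
// a user-provided tool.
func buildPrompt(userTools []Function) string {
	funcs := append([]Function{}, userTools...)
	funcs = append(funcs, Function{Name: "answer"})
	return templateMessages(funcs)
}

func main() {
	fmt.Println(buildPrompt([]Function{{Name: "get_weather"}}))
}
```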
272 changed files with 10580 additions and 3388 deletions

View File

@@ -10,7 +10,8 @@ services:
- 8080:8080
volumes:
- localai_workspace:/workspace
- ../models:/host-models
- models:/host-models
- backends:/host-backends
- ./customization:/devcontainer-customization
command: /bin/sh -c "while sleep 1000; do :; done"
cap_add:
@@ -39,6 +40,9 @@ services:
- GF_SECURITY_ADMIN_PASSWORD=grafana
volumes:
- ./grafana:/etc/grafana/provisioning/datasources
volumes:
prom_data:
localai_workspace:
localai_workspace:
models:
backends:

.env (3 changes)
View File

@@ -26,6 +26,9 @@
## Disables COMPEL (Diffusers)
# COMPEL=0
## Disables SD_EMBED (Diffusers)
# SD_EMBED=0
## Enable/Disable single backend (useful if only one GPU is available)
# LOCALAI_SINGLE_ACTIVE_BACKEND=true

View File

@@ -146,7 +146,7 @@ func getRealReadme(ctx context.Context, repository string) (string, error) {
return "", err
}
content := newFragment.LastMessage().Content
content := result.LastMessage().Content
return cleanTextContent(content), nil
}

View File

@@ -14,6 +14,7 @@ concurrency:
jobs:
backend-jobs:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/backend_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
@@ -104,6 +105,58 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-ace-step'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "ace-step"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-mlx'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "mlx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-mlx-vlm'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "mlx-vlm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-mlx-audio'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "mlx-audio"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
# CUDA 12 builds
- build-type: 'cublas'
cuda-major-version: "12"
@@ -131,6 +184,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-nemo'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "nemo"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -144,6 +210,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-faster-qwen3-tts'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "faster-qwen3-tts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -248,6 +327,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-ace-step'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ace-step"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -300,6 +392,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-outetts'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "outetts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -326,6 +431,45 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-mlx'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "mlx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-mlx-vlm'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "mlx-vlm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-mlx-audio'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "mlx-audio"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -418,6 +562,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-nemo'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "nemo"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -431,6 +588,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-faster-qwen3-tts'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "faster-qwen3-tts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -509,6 +679,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-ace-step'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ace-step"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -548,6 +731,19 @@ jobs:
backend: "qwen-tts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "faster-qwen3-tts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -561,6 +757,19 @@ jobs:
backend: "pocket-tts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-chatterbox'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "chatterbox"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -574,6 +783,45 @@ jobs:
backend: "diffusers"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "mlx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx-vlm'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "mlx-vlm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx-audio'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "mlx-audio"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -639,6 +887,45 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-mlx'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "mlx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-mlx-vlm'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "mlx-vlm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-mlx-audio'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "mlx-audio"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -783,6 +1070,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-ace-step'
runs-on: 'arc-runner-set'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
skip-drivers: 'false'
backend: "ace-step"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
# ROCm additional backends
- build-type: 'hipblas'
cuda-major-version: ""
@@ -823,6 +1123,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-nemo'
runs-on: 'arc-runner-set'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
skip-drivers: 'false'
backend: "nemo"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
@@ -980,6 +1293,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'intel'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-ace-step'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "ace-step"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
@@ -1019,6 +1345,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-faster-qwen3-tts'
runs-on: 'ubuntu-24.04-arm'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
skip-drivers: 'true'
backend: "faster-qwen3-tts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
@@ -1045,6 +1384,45 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-mlx'
runs-on: 'ubuntu-24.04-arm'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
skip-drivers: 'true'
backend: "mlx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-mlx-vlm'
runs-on: 'ubuntu-24.04-arm'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
skip-drivers: 'true'
backend: "mlx-vlm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-mlx-audio'
runs-on: 'ubuntu-24.04-arm'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
skip-drivers: 'true'
backend: "mlx-audio"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
# SYCL additional backends
- build-type: 'intel'
cuda-major-version: ""
@@ -1098,6 +1476,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'intel'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-nemo'
runs-on: 'arc-runner-set'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "nemo"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'intel'
cuda-major-version: ""
cuda-minor-version: ""
@@ -1348,6 +1739,20 @@ jobs:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# voxtral
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-voxtral'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voxtral"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
#silero-vad
- build-type: ''
cuda-major-version: ""
@@ -1523,6 +1928,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-nemo'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "nemo"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
@@ -1539,7 +1957,7 @@ jobs:
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-voxcpm'
runs-on: 'ubuntu-latest'
@@ -1562,6 +1980,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-outetts'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "outetts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
backend-jobs-darwin:
uses: ./.github/workflows/backend_build_darwin.yml
strategy:
@@ -1570,6 +2001,9 @@ jobs:
- backend: "diffusers"
tag-suffix: "-metal-darwin-arm64-diffusers"
build-type: "mps"
- backend: "ace-step"
tag-suffix: "-metal-darwin-arm64-ace-step"
build-type: "mps"
- backend: "mlx"
tag-suffix: "-metal-darwin-arm64-mlx"
build-type: "mps"
@@ -1590,6 +2024,71 @@ jobs:
tag-suffix: "-metal-darwin-arm64-whisper"
build-type: "metal"
lang: "go"
- backend: "voxtral"
tag-suffix: "-metal-darwin-arm64-voxtral"
build-type: "metal"
lang: "go"
- backend: "vibevoice"
tag-suffix: "-metal-darwin-arm64-vibevoice"
build-type: "mps"
- backend: "qwen-asr"
tag-suffix: "-metal-darwin-arm64-qwen-asr"
build-type: "mps"
- backend: "nemo"
tag-suffix: "-metal-darwin-arm64-nemo"
build-type: "mps"
- backend: "qwen-tts"
tag-suffix: "-metal-darwin-arm64-qwen-tts"
build-type: "mps"
- backend: "voxcpm"
tag-suffix: "-metal-darwin-arm64-voxcpm"
build-type: "mps"
- backend: "pocket-tts"
tag-suffix: "-metal-darwin-arm64-pocket-tts"
build-type: "mps"
- backend: "moonshine"
tag-suffix: "-metal-darwin-arm64-moonshine"
build-type: "mps"
- backend: "whisperx"
tag-suffix: "-metal-darwin-arm64-whisperx"
build-type: "mps"
- backend: "rerankers"
tag-suffix: "-metal-darwin-arm64-rerankers"
build-type: "mps"
- backend: "transformers"
tag-suffix: "-metal-darwin-arm64-transformers"
build-type: "mps"
- backend: "kokoro"
tag-suffix: "-metal-darwin-arm64-kokoro"
build-type: "mps"
- backend: "faster-whisper"
tag-suffix: "-metal-darwin-arm64-faster-whisper"
build-type: "mps"
- backend: "coqui"
tag-suffix: "-metal-darwin-arm64-coqui"
build-type: "mps"
- backend: "rfdetr"
tag-suffix: "-metal-darwin-arm64-rfdetr"
build-type: "mps"
- backend: "kitten-tts"
tag-suffix: "-metal-darwin-arm64-kitten-tts"
build-type: "mps"
- backend: "piper"
tag-suffix: "-metal-darwin-arm64-piper"
build-type: "metal"
lang: "go"
- backend: "silero-vad"
tag-suffix: "-metal-darwin-arm64-silero-vad"
build-type: "metal"
lang: "go"
- backend: "local-store"
tag-suffix: "-metal-darwin-arm64-local-store"
build-type: "metal"
lang: "go"
- backend: "huggingface"
tag-suffix: "-metal-darwin-arm64-huggingface"
build-type: "metal"
lang: "go"
with:
backend: ${{ matrix.backend }}
build-type: ${{ matrix.build-type }}

View File

@@ -5,6 +5,7 @@ on:
workflow_dispatch:
jobs:
bump-backends:
if: github.repository == 'mudler/LocalAI'
strategy:
fail-fast: false
matrix:
@@ -17,10 +18,6 @@ jobs:
variable: "WHISPER_CPP_VERSION"
branch: "master"
file: "backend/go/whisper/Makefile"
- repository: "PABannier/bark.cpp"
variable: "BARKCPP_VERSION"
branch: "main"
file: "Makefile"
- repository: "leejet/stable-diffusion.cpp"
variable: "STABLEDIFFUSION_GGML_VERSION"
branch: "master"
@@ -29,6 +26,10 @@ jobs:
variable: "PIPER_VERSION"
branch: "master"
file: "backend/go/piper/Makefile"
- repository: "antirez/voxtral.c"
variable: "VOXTRAL_VERSION"
branch: "main"
file: "backend/go/voxtral/Makefile"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6

View File

@@ -5,6 +5,7 @@ on:
workflow_dispatch:
jobs:
bump-docs:
if: github.repository == 'mudler/LocalAI'
strategy:
fail-fast: false
matrix:

View File

@@ -5,6 +5,7 @@ on:
workflow_dispatch:
jobs:
checksum_check:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- name: Force Install GIT latest

View File

@@ -9,8 +9,8 @@ permissions:
jobs:
dependabot:
if: github.repository == 'mudler/LocalAI' && github.actor == 'dependabot[bot]'
runs-on: ubuntu-latest
if: ${{ github.actor == 'dependabot[bot]' }}
steps:
- name: Dependabot metadata
id: metadata

View File

@@ -12,6 +12,7 @@ concurrency:
jobs:
build-linux:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- name: Clone

View File

@@ -27,6 +27,7 @@ on:
type: string
jobs:
gallery-agent:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- name: Checkout repository

View File

@@ -13,6 +13,7 @@ concurrency:
jobs:
generate_caches:
if: github.repository == 'mudler/LocalAI'
strategy:
matrix:
include:

View File

@@ -12,6 +12,7 @@ concurrency:
jobs:
generate_caches:
if: github.repository == 'mudler/LocalAI'
strategy:
matrix:
include:

View File

@@ -14,6 +14,7 @@
jobs:
hipblas-jobs:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
@@ -50,6 +51,7 @@
ubuntu-codename: 'noble'
core-image-build:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
@@ -136,6 +138,7 @@
ubuntu-codename: 'noble'
gh-runner:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}

View File

@@ -10,8 +10,8 @@ permissions:
actions: write # to dispatch publish workflow
jobs:
dependabot:
if: github.repository == 'mudler/LocalAI' && github.actor == 'localai-bot' && contains(github.event.pull_request.title, 'chore:')
runs-on: ubuntu-latest
if: ${{ github.actor == 'localai-bot' && !contains(github.event.pull_request.title, 'chore(model gallery):') }}
steps:
- name: Checkout repository
uses: actions/checkout@v6

View File

@@ -10,7 +10,7 @@ permissions:
jobs:
notify-discord:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
env:
MODEL_NAME: gemma-3-12b-it-qat
runs-on: ubuntu-latest
@@ -90,7 +90,7 @@ jobs:
connect-timeout-seconds: 180
limit-access-to-actor: true
notify-twitter:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
env:
MODEL_NAME: gemma-3-12b-it-qat
runs-on: ubuntu-latest

View File

@@ -6,6 +6,7 @@ on:
jobs:
notify-discord:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
env:
RELEASE_BODY: ${{ github.event.release.body }}

View File

@@ -18,7 +18,7 @@ jobs:
with:
go-version: 1.23
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
uses: goreleaser/goreleaser-action@v7
with:
version: v2.11.0
args: release --clean

View File

@@ -8,9 +8,10 @@ on:
jobs:
stale:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/stale@997185467fa4f803885201cee163a9f38240193d # v9
- uses: actions/stale@b5d41d4e1d5dceea10e7104786b73624c18a190f # v9
with:
stale-issue-message: 'This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
stale-pr-message: 'This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.'

View File

@@ -323,6 +323,25 @@ jobs:
run: |
make --jobs=5 --output-sync=target -C backend/python/qwen-asr
make --jobs=5 --output-sync=target -C backend/python/qwen-asr test
tests-nemo:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg sox
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test nemo
run: |
make --jobs=5 --output-sync=target -C backend/python/nemo
make --jobs=5 --output-sync=target -C backend/python/nemo test
tests-voxcpm:
runs-on: ubuntu-latest
steps:
@@ -342,3 +361,34 @@ jobs:
run: |
make --jobs=5 --output-sync=target -C backend/python/voxcpm
make --jobs=5 --output-sync=target -C backend/python/voxcpm test
tests-voxtral:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
- name: Setup Go
uses: actions/setup-go@v5
# You can test your matrix by printing the current Go version
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build voxtral
run: |
make --jobs=5 --output-sync=target -C backend/go/voxtral
- name: Test voxtral
run: |
make --jobs=5 --output-sync=target -C backend/go/voxtral test

View File

@@ -5,6 +5,7 @@ on:
workflow_dispatch:
jobs:
swagger:
if: github.repository == 'mudler/LocalAI'
strategy:
fail-fast: false
runs-on: ubuntu-latest

1
CLAUDE.md Symbolic link
View File

@@ -0,0 +1 @@
AGENTS.md

View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/qwen-asr backends/voxcpm backends/whisperx
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/voxtral
GOCMD=go
GOTEST=$(GOCMD) test
@@ -308,6 +308,7 @@ protogen-go-clean:
prepare-test-extra: protogen-python
$(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/outetts
$(MAKE) -C backend/python/diffusers
$(MAKE) -C backend/python/chatterbox
$(MAKE) -C backend/python/vllm
@@ -316,12 +317,16 @@ prepare-test-extra: protogen-python
$(MAKE) -C backend/python/moonshine
$(MAKE) -C backend/python/pocket-tts
$(MAKE) -C backend/python/qwen-tts
$(MAKE) -C backend/python/faster-qwen3-tts
$(MAKE) -C backend/python/qwen-asr
$(MAKE) -C backend/python/nemo
$(MAKE) -C backend/python/voxcpm
$(MAKE) -C backend/python/whisperx
$(MAKE) -C backend/python/ace-step
test-extra: prepare-test-extra
$(MAKE) -C backend/python/transformers test
$(MAKE) -C backend/python/outetts test
$(MAKE) -C backend/python/diffusers test
$(MAKE) -C backend/python/chatterbox test
$(MAKE) -C backend/python/vllm test
@@ -330,9 +335,12 @@ test-extra: prepare-test-extra
$(MAKE) -C backend/python/moonshine test
$(MAKE) -C backend/python/pocket-tts test
$(MAKE) -C backend/python/qwen-tts test
$(MAKE) -C backend/python/faster-qwen3-tts test
$(MAKE) -C backend/python/qwen-asr test
$(MAKE) -C backend/python/nemo test
$(MAKE) -C backend/python/voxcpm test
$(MAKE) -C backend/python/whisperx test
$(MAKE) -C backend/python/ace-step test
DOCKER_IMAGE?=local-ai
DOCKER_AIO_IMAGE?=local-ai-aio
@@ -447,10 +455,12 @@ BACKEND_HUGGINGFACE = huggingface|golang|.|false|true
BACKEND_SILERO_VAD = silero-vad|golang|.|false|true
BACKEND_STABLEDIFFUSION_GGML = stablediffusion-ggml|golang|.|--progress=plain|true
BACKEND_WHISPER = whisper|golang|.|false|true
BACKEND_VOXTRAL = voxtral|golang|.|false|true
# Python backends with root context
BACKEND_RERANKERS = rerankers|python|.|false|true
BACKEND_TRANSFORMERS = transformers|python|.|false|true
BACKEND_OUTETTS = outetts|python|.|false|true
BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
BACKEND_COQUI = coqui|python|.|false|true
BACKEND_RFDETR = rfdetr|python|.|false|true
@@ -465,9 +475,12 @@ BACKEND_VIBEVOICE = vibevoice|python|.|--progress=plain|true
BACKEND_MOONSHINE = moonshine|python|.|false|true
BACKEND_POCKET_TTS = pocket-tts|python|.|false|true
BACKEND_QWEN_TTS = qwen-tts|python|.|false|true
BACKEND_FASTER_QWEN3_TTS = faster-qwen3-tts|python|.|false|true
BACKEND_QWEN_ASR = qwen-asr|python|.|false|true
BACKEND_NEMO = nemo|python|.|false|true
BACKEND_VOXCPM = voxcpm|python|.|false|true
BACKEND_WHISPERX = whisperx|python|.|false|true
BACKEND_ACE_STEP = ace-step|python|.|false|true
# Helper function to build docker image for a backend
# Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG)
@@ -497,8 +510,10 @@ $(eval $(call generate-docker-build-target,$(BACKEND_HUGGINGFACE)))
$(eval $(call generate-docker-build-target,$(BACKEND_SILERO_VAD)))
$(eval $(call generate-docker-build-target,$(BACKEND_STABLEDIFFUSION_GGML)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXTRAL)))
$(eval $(call generate-docker-build-target,$(BACKEND_RERANKERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_TRANSFORMERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
@@ -513,15 +528,18 @@ $(eval $(call generate-docker-build-target,$(BACKEND_VIBEVOICE)))
$(eval $(call generate-docker-build-target,$(BACKEND_MOONSHINE)))
$(eval $(call generate-docker-build-target,$(BACKEND_POCKET_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_QWEN3_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN_ASR)))
$(eval $(call generate-docker-build-target,$(BACKEND_NEMO)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXCPM)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
# Pattern rule for docker-save targets
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-qwen-asr docker-build-voxcpm docker-build-whisperx
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-voxtral
########################################################
### Mock Backend for E2E Tests

View File

@@ -93,16 +93,7 @@ Liking LocalAI? LocalAI is part of an integrated suite of AI infrastructure tool
## 💻 Quickstart
> ⚠️ **Note:** The `install.sh` script is currently experiencing issues due to the heavy changes underway in LocalAI and may produce broken or misconfigured installations. Please use Docker installation (see below) or manual binary installation until [issue #8032](https://github.com/mudler/LocalAI/issues/8032) is resolved.
Run the installer script:
```bash
# Basic installation
curl https://localai.io/install.sh | sh
```
For more installation options, see [Installer Options](https://localai.io/installation/).
### macOS Download:
@@ -203,7 +194,8 @@ local-ai run oci://localai/phi-2:latest
For more information, see [💻 Getting started](https://localai.io/basics/getting_started/index.html), if you are interested in our roadmap items and future enhancements, you can see the [Issues labeled as Roadmap here](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
## 📰 Latest project news
- February 2026: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
- January 2026: **LocalAI 3.10.0** - Major release with Anthropic API support, Open Responses API for stateful agents, video & image generation suite (LTX-2), unified GPU backends, tool streaming & XML parsing, system-aware backend gallery, crash fixes for AVX-only CPUs and AMD VRAM reporting, request tracing, and new backends: **Moonshine** (ultra-fast transcription), **Pocket-TTS** (lightweight TTS). Vulkan arm64 builds now available. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0).
- December 2025: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic fitting of models to multiple GPUS(llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Added Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
- November 2025: Major improvements to the UX. Among these: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245) and [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
- October 2025: 🔌 [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support added for agentic capabilities with external tools
@@ -236,7 +228,7 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A
- 🧩 [Backend Gallery](https://localai.io/backends/): Install/remove backends on the fly, powered by OCI images — fully customizable and API-driven.
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/)
- 🎨 [Image generation](https://localai.io/features/image-generation)
- 🔥 [OpenAI-alike tools API](https://localai.io/features/openai-functions/)
- ⚡ [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
@@ -269,6 +261,7 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration
|---------|-------------|---------------------|
| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, CPU |
| **moonshine** | Ultra-fast transcription engine for low-end devices | CUDA 12/13, Metal, CPU |
| **coqui** | Advanced TTS with 1100+ languages | CUDA 12/13, ROCm, Intel, CPU |
| **kokoro** | Lightweight TTS model | CUDA 12/13, ROCm, Intel, CPU |
| **chatterbox** | Production-grade TTS | CUDA 12/13, CPU |
@@ -279,6 +272,7 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration
| **vibevoice** | Real-time TTS with voice cloning | CUDA 12/13, ROCm, Intel, CPU |
| **pocket-tts** | Lightweight CPU-based TTS | CUDA 12/13, ROCm, Intel, CPU |
| **qwen-tts** | High-quality TTS with custom voice, voice design, and voice cloning | CUDA 12/13, ROCm, Intel, CPU |
| **ace-step** | Music generation from text descriptions, lyrics, or audio samples | CUDA 12/13, ROCm, Intel, Metal, CPU |
### Image & Video Generation
| Backend | Description | Acceleration Support |
@@ -300,11 +294,11 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration
|-------------------|-------------------|------------------|
| **NVIDIA CUDA 12** | All CUDA-compatible backends | Nvidia hardware |
| **NVIDIA CUDA 13** | All CUDA-compatible backends | Nvidia hardware |
| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, neutts, vibevoice, pocket-tts, qwen-tts | AMD Graphics |
| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, coqui, kokoro, vibevoice, pocket-tts, qwen-tts | Intel Arc, Intel iGPUs |
| **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM | Apple M1/M2/M3+ |
| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, neutts, vibevoice, pocket-tts, qwen-tts, ace-step | AMD Graphics |
| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, coqui, kokoro, vibevoice, pocket-tts, qwen-tts, ace-step | Intel Arc, Intel iGPUs |
| **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM, moonshine, ace-step | Apple M1/M2/M3+ |
| **Vulkan** | llama.cpp, whisper, stablediffusion | Cross-platform GPUs |
| **NVIDIA Jetson (CUDA 12)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (AGX Orin, etc.) |
| **NVIDIA Jetson (CUDA 12)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr, ace-step | ARM64 embedded AI (AGX Orin, etc.) |
| **NVIDIA Jetson (CUDA 13)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (DGX Spark) |
| **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support |

View File

@@ -20,7 +20,7 @@ RUN apt-get update && \
build-essential \
git ccache \
ca-certificates \
make cmake wget \
make cmake wget libopenblas-dev \
curl unzip \
libssl-dev && \
apt-get clean && \

View File

@@ -365,6 +365,14 @@ message SoundGenerationRequest {
optional bool sample = 6;
optional string src = 7;
optional int32 src_divisor = 8;
optional bool think = 9;
optional string caption = 10;
optional string lyrics = 11;
optional int32 bpm = 12;
optional string keyscale = 13;
optional string language = 14;
optional string timesignature = 15;
optional bool instrumental = 17;
}
message TokenizationResponse {

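The fields added above wire music-generation options (caption, lyrics, BPM, key scale, time signature, instrumental flag) through the sound-generation gRPC request. As a rough illustration only, a client-side construction in Go might look like the sketch below; the pb import path matches the one used elsewhere in this changeset, but the generated field names (standard protoc-gen-go casing of the proto field names) are assumptions, and optional proto3 scalars map to pointer fields, hence the proto.String/Int32/Bool helpers:

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/proto"

	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)

func main() {
	// Hypothetical ace-step style request using the new optional fields.
	req := &pb.SoundGenerationRequest{
		Caption:       proto.String("retro 80s synthwave with a driving bassline"),
		Lyrics:        proto.String("[instrumental]"),
		Bpm:           proto.Int32(120),
		Keyscale:      proto.String("A minor"),
		Timesignature: proto.String("4/4"),
		Instrumental:  proto.Bool(true),
	}
	fmt.Println(req.String())
}
```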
View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=2634ed207a17db1a54bd8df0555bd8499a6ab691
LLAMA_VERSION?=723c71064da0908c19683f8c344715fbf6d986fd
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -417,6 +417,12 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
// n_ctx_checkpoints: max context checkpoints per slot (default: 8)
params.n_ctx_checkpoints = 8;
// llama memory fit fails if we don't provide a buffer for tensor overrides
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
// decode options. Options are in form optname:optvale, or if booleans only optname.
for (int i = 0; i < request->options_size(); i++) {
std::string opt = request->options(i);
@@ -1255,6 +1261,42 @@ public:
body_json["add_generation_prompt"] = data["add_generation_prompt"];
}
// Pass sampling parameters to body_json so oaicompat_chat_params_parse respects them
// and doesn't overwrite them with defaults in the returned parsed_data
if (data.contains("n_predict")) {
body_json["max_tokens"] = data["n_predict"];
}
if (data.contains("ignore_eos")) {
body_json["ignore_eos"] = data["ignore_eos"];
}
if (data.contains("stop")) {
body_json["stop"] = data["stop"];
}
if (data.contains("temperature")) {
body_json["temperature"] = data["temperature"];
}
if (data.contains("top_p")) {
body_json["top_p"] = data["top_p"];
}
if (data.contains("frequency_penalty")) {
body_json["frequency_penalty"] = data["frequency_penalty"];
}
if (data.contains("presence_penalty")) {
body_json["presence_penalty"] = data["presence_penalty"];
}
if (data.contains("seed")) {
body_json["seed"] = data["seed"];
}
if (data.contains("logit_bias")) {
body_json["logit_bias"] = data["logit_bias"];
}
if (data.contains("top_k")) {
body_json["top_k"] = data["top_k"];
}
if (data.contains("min_p")) {
body_json["min_p"] = data["min_p"];
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)
SRV_DBG("[CONVERSATION DEBUG] PredictStream: Full body_json before oaicompat_chat_params_parse:\n%s\n", body_json.dump(2).c_str());
@@ -1986,6 +2028,42 @@ public:
body_json["add_generation_prompt"] = data["add_generation_prompt"];
}
// Pass sampling parameters to body_json so oaicompat_chat_params_parse respects them
// and doesn't overwrite them with defaults in the returned parsed_data
if (data.contains("n_predict")) {
body_json["max_tokens"] = data["n_predict"];
}
if (data.contains("ignore_eos")) {
body_json["ignore_eos"] = data["ignore_eos"];
}
if (data.contains("stop")) {
body_json["stop"] = data["stop"];
}
if (data.contains("temperature")) {
body_json["temperature"] = data["temperature"];
}
if (data.contains("top_p")) {
body_json["top_p"] = data["top_p"];
}
if (data.contains("frequency_penalty")) {
body_json["frequency_penalty"] = data["frequency_penalty"];
}
if (data.contains("presence_penalty")) {
body_json["presence_penalty"] = data["presence_penalty"];
}
if (data.contains("seed")) {
body_json["seed"] = data["seed"];
}
if (data.contains("logit_bias")) {
body_json["logit_bias"] = data["logit_bias"];
}
if (data.contains("top_k")) {
body_json["top_k"] = data["top_k"];
}
if (data.contains("min_p")) {
body_json["min_p"] = data["min_p"];
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)
SRV_DBG("[CONVERSATION DEBUG] Predict: Full body_json before oaicompat_chat_params_parse:\n%s\n", body_json.dump(2).c_str());

View File

@@ -6,4 +6,7 @@ huggingface:
package:
bash package.sh
build: huggingface package
build: huggingface package
clean:
rm -f huggingface

View File

@@ -8,5 +8,5 @@ set -e
CURDIR=$(dirname "$(realpath $0)")
mkdir -p $CURDIR/package
cp -avrf $CURDIR/huggingface $CURDIR/package/
cp -avf $CURDIR/huggingface $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/

View File

@@ -6,4 +6,7 @@ local-store:
package:
bash package.sh
build: local-store package
build: local-store package
clean:
rm -f local-store

View File

@@ -8,5 +8,5 @@ set -e
CURDIR=$(dirname "$(realpath $0)")
mkdir -p $CURDIR/package
cp -avrf $CURDIR/local-store $CURDIR/package/
cp -avf $CURDIR/local-store $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/

View File

@@ -34,4 +34,7 @@ piper: sources/go-piper sources/go-piper/libpiper_binding.a espeak-ng-data
package:
bash package.sh
build: piper package
build: piper package
clean:
rm -f piper

View File

@@ -10,8 +10,8 @@ CURDIR=$(dirname "$(realpath $0)")
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avrf $CURDIR/piper $CURDIR/package/
cp -avrf $CURDIR/espeak-ng-data $CURDIR/package/
cp -avf $CURDIR/piper $CURDIR/package/
cp -avf $CURDIR/espeak-ng-data $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/sources/go-piper/piper-phonemize/pi/lib/* $CURDIR/package/lib/

View File

@@ -44,4 +44,7 @@ silero-vad: backend-assets/lib/libonnxruntime.so.1
package:
bash package.sh
build: silero-vad package
build: silero-vad package
clean:
rm -f silero-vad

View File

@@ -10,8 +10,8 @@ CURDIR=$(dirname "$(realpath $0)")
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avrf $CURDIR/silero-vad $CURDIR/package/
cp -avrf $CURDIR/run.sh $CURDIR/package/
cp -avf $CURDIR/silero-vad $CURDIR/package/
cp -avf $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/backend-assets/lib/* $CURDIR/package/lib/
# Detect architecture and copy appropriate libraries

View File

@@ -2,5 +2,5 @@ package/
sources/
.cache/
build/
libgosd.so
*.so
stablediffusion-ggml

View File

@@ -66,15 +66,18 @@ sources/stablediffusion-ggml.cpp:
git checkout $(STABLEDIFFUSION_GGML_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
libgosd.so: sources/stablediffusion-ggml.cpp CMakeLists.txt gosd.cpp gosd.h
mkdir -p build && \
cd build && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build/libgosd.so ./
# Detect OS
UNAME_S := $(shell uname -s)
stablediffusion-ggml: main.go gosd.go libgosd.so
# Only build CPU variants on Linux
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgosd-avx.so libgosd-avx2.so libgosd-avx512.so libgosd-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgosd-fallback.so
endif
stablediffusion-ggml: main.go gosd.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o stablediffusion-ggml ./
package: stablediffusion-ggml
@@ -82,5 +85,46 @@ package: stablediffusion-ggml
build: package
clean:
rm -rf libgosd.so build stablediffusion-ggml package sources
clean: purge
rm -rf libgosd*.so stablediffusion-ggml package sources
purge:
rm -rf build*
# Build all variants (Linux only)
ifeq ($(UNAME_S),Linux)
libgosd-avx.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:avx${RESET})
SO_TARGET=libgosd-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosd-custom
rm -rfv build*
libgosd-avx2.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:avx2${RESET})
SO_TARGET=libgosd-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgosd-custom
rm -rfv build*
libgosd-avx512.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:avx512${RESET})
SO_TARGET=libgosd-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgosd-custom
rm -rfv build*
endif
# Build fallback variant (all platforms)
libgosd-fallback.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:fallback${RESET})
SO_TARGET=libgosd-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosd-custom
rm -rfv build*
libgosd-custom: CMakeLists.txt gosd.cpp gosd.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET)
all: stablediffusion-ggml package

View File

@@ -2,6 +2,7 @@ package main
import (
"flag"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -17,7 +18,13 @@ type LibFuncs struct {
}
func main() {
gosd, err := purego.Dlopen("./libgosd.so", purego.RTLD_NOW|purego.RTLD_GLOBAL)
// Get library name from environment variable, default to fallback
libName := os.Getenv("SD_LIBRARY")
if libName == "" {
libName = "./libgosd-fallback.so"
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}

View File

@@ -11,7 +11,7 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libgosd.so $CURDIR/package/
cp -avf $CURDIR/libgosd-*.so $CURDIR/package/
cp -avf $CURDIR/stablediffusion-ggml $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -1,14 +1,52 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgosd-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosd-avx.so ]; then
LIBRARY="$CURDIR/libgosd-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgosd-avx2.so ]; then
LIBRARY="$CURDIR/libgosd-avx2.so"
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgosd-avx512.so ]; then
LIBRARY="$CURDIR/libgosd-avx512.so"
fi
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export SD_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/stablediffusion-ggml "$@"
fi
exec $CURDIR/stablediffusion-ggml "$@"
echo "Using library: $LIBRARY"
exec $CURDIR/stablediffusion-ggml "$@"

9
backend/go/voxtral/.gitignore vendored Normal file
View File

@@ -0,0 +1,9 @@
.cache/
sources/
build/
build-*/
package/
voxtral
*.so
*.dylib
compile_commands.json

View File

@@ -0,0 +1,84 @@
cmake_minimum_required(VERSION 3.12)
if(USE_METAL)
project(govoxtral LANGUAGES C OBJC)
else()
project(govoxtral LANGUAGES C)
endif()
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
# Workaround: CMake + GCC linker depfile generation fails for MODULE libraries
set(CMAKE_C_LINKER_DEPFILE_SUPPORTED FALSE)
# Build voxtral.c as a library
set(VOXTRAL_SOURCES
sources/voxtral.c/voxtral.c
sources/voxtral.c/voxtral_kernels.c
sources/voxtral.c/voxtral_audio.c
sources/voxtral.c/voxtral_encoder.c
sources/voxtral.c/voxtral_decoder.c
sources/voxtral.c/voxtral_tokenizer.c
sources/voxtral.c/voxtral_safetensors.c
)
# Metal GPU acceleration (macOS arm64 only)
if(USE_METAL)
# Generate embedded shader header from .metal source via xxd
add_custom_command(
OUTPUT ${CMAKE_CURRENT_SOURCE_DIR}/sources/voxtral.c/voxtral_shaders_source.h
COMMAND xxd -i voxtral_shaders.metal > voxtral_shaders_source.h
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/sources/voxtral.c
DEPENDS sources/voxtral.c/voxtral_shaders.metal
COMMENT "Generating embedded Metal shaders header"
)
list(APPEND VOXTRAL_SOURCES sources/voxtral.c/voxtral_metal.m)
set_source_files_properties(sources/voxtral.c/voxtral_metal.m PROPERTIES
COMPILE_FLAGS "-fobjc-arc"
)
endif()
add_library(govoxtral MODULE csrc/govoxtral.c ${VOXTRAL_SOURCES})
target_include_directories(govoxtral PRIVATE sources/voxtral.c csrc)
target_compile_options(govoxtral PRIVATE -O3 -ffast-math)
if(USE_METAL)
target_compile_definitions(govoxtral PRIVATE USE_BLAS USE_METAL ACCELERATE_NEW_LAPACK)
target_link_libraries(govoxtral PRIVATE
"-framework Accelerate"
"-framework Metal"
"-framework MetalPerformanceShaders"
"-framework MetalPerformanceShadersGraph"
"-framework Foundation"
"-framework AudioToolbox"
"-framework CoreFoundation"
m
)
# Ensure the generated shader header is built before compiling
target_sources(govoxtral PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/sources/voxtral.c/voxtral_shaders_source.h
)
elseif(USE_OPENBLAS)
# Try to find OpenBLAS; use it if available, otherwise fall back to pure C
find_package(BLAS)
if(BLAS_FOUND)
target_compile_definitions(govoxtral PRIVATE USE_BLAS USE_OPENBLAS)
target_link_libraries(govoxtral PRIVATE ${BLAS_LIBRARIES} m)
target_include_directories(govoxtral PRIVATE /usr/include/openblas)
else()
message(WARNING "OpenBLAS requested but not found, building without BLAS")
target_link_libraries(govoxtral PRIVATE m)
endif()
elseif(APPLE)
# macOS without Metal: use Accelerate framework
target_compile_definitions(govoxtral PRIVATE USE_BLAS ACCELERATE_NEW_LAPACK)
target_link_libraries(govoxtral PRIVATE "-framework Accelerate" m)
else()
target_link_libraries(govoxtral PRIVATE m)
endif()
set_property(TARGET govoxtral PROPERTY C_STANDARD 11)
set_target_properties(govoxtral PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

107
backend/go/voxtral/Makefile Normal file
View File

@@ -0,0 +1,107 @@
.NOTPARALLEL:
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=true
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
# voxtral.c version
VOXTRAL_REPO?=https://github.com/antirez/voxtral.c
VOXTRAL_VERSION?=134d366c24d20c64b614a3dcc8bda2a6922d077d
# Detect OS
UNAME_S := $(shell uname -s)
# Shared library extension
ifeq ($(UNAME_S),Darwin)
SO_EXT=dylib
else
SO_EXT=so
endif
SO_TARGET?=libgovoxtral.$(SO_EXT)
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
ifeq ($(NATIVE),false)
ifneq ($(UNAME_S),Darwin)
CMAKE_ARGS+=-DCMAKE_C_FLAGS="-march=x86-64"
endif
endif
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF
else ifeq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF -DUSE_METAL=ON
else ifeq ($(UNAME_S),Darwin)
# Default on macOS: use Accelerate (no OpenBLAS needed)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF
else
CMAKE_ARGS+=-DUSE_OPENBLAS=ON
endif
# Single library target
ifeq ($(UNAME_S),Darwin)
VARIANT_TARGETS = libgovoxtral.dylib
else
VARIANT_TARGETS = libgovoxtral.so
endif
sources/voxtral.c:
mkdir -p sources/voxtral.c
cd sources/voxtral.c && \
git init && \
git remote add origin $(VOXTRAL_REPO) && \
git fetch origin && \
git checkout $(VOXTRAL_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
voxtral: main.go govoxtral.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o voxtral ./
package: voxtral
bash package.sh
build: package
clean: purge
rm -rf libgovoxtral.so libgovoxtral.dylib package sources/voxtral.c voxtral
purge:
rm -rf build*
# Build single library
ifeq ($(UNAME_S),Darwin)
libgovoxtral.dylib: sources/voxtral.c
$(MAKE) purge
$(info Building voxtral: darwin)
SO_TARGET=libgovoxtral.dylib NATIVE=true $(MAKE) libgovoxtral-custom
rm -rfv build*
else
libgovoxtral.so: sources/voxtral.c
$(MAKE) purge
$(info Building voxtral)
SO_TARGET=libgovoxtral.so $(MAKE) libgovoxtral-custom
rm -rfv build*
endif
libgovoxtral-custom: CMakeLists.txt csrc/govoxtral.c csrc/govoxtral.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
(mv build-$(SO_TARGET)/libgovoxtral.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgovoxtral.dylib ./$(SO_TARGET) 2>/dev/null)
test: voxtral
@echo "Running voxtral tests..."
bash test.sh
@echo "voxtral tests completed."
all: voxtral package

View File

@@ -0,0 +1,62 @@
#include "govoxtral.h"
#include "voxtral.h"
#include "voxtral_audio.h"
#ifdef USE_METAL
#include "voxtral_metal.h"
#endif
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
static vox_ctx_t *ctx = NULL;
static char *last_result = NULL;
static int metal_initialized = 0;
int load_model(const char *model_dir) {
if (ctx != NULL) {
vox_free(ctx);
ctx = NULL;
}
#ifdef USE_METAL
if (!metal_initialized) {
vox_metal_init();
metal_initialized = 1;
}
#endif
ctx = vox_load(model_dir);
if (ctx == NULL) {
fprintf(stderr, "error: failed to load voxtral model from %s\n", model_dir);
return 1;
}
return 0;
}
const char *transcribe(const char *wav_path) {
if (ctx == NULL) {
fprintf(stderr, "error: model not loaded\n");
return "";
}
if (last_result != NULL) {
free(last_result);
last_result = NULL;
}
last_result = vox_transcribe(ctx, wav_path);
if (last_result == NULL) {
fprintf(stderr, "error: transcription failed for %s\n", wav_path);
return "";
}
return last_result;
}
void free_result(void) {
if (last_result != NULL) {
free(last_result);
last_result = NULL;
}
}

View File

@@ -0,0 +1,8 @@
#ifndef GOVOXTRAL_H
#define GOVOXTRAL_H
extern int load_model(const char *model_dir);
extern const char *transcribe(const char *wav_path);
extern void free_result(void);
#endif /* GOVOXTRAL_H */

View File

@@ -0,0 +1,60 @@
package main
import (
"fmt"
"os"
"strings"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/utils"
)
var (
CppLoadModel func(modelDir string) int
CppTranscribe func(wavPath string) string
CppFreeResult func()
)
type Voxtral struct {
base.SingleThread
}
func (v *Voxtral) Load(opts *pb.ModelOptions) error {
if ret := CppLoadModel(opts.ModelFile); ret != 0 {
return fmt.Errorf("failed to load Voxtral model from %s", opts.ModelFile)
}
return nil
}
func (v *Voxtral) AudioTranscription(opts *pb.TranscriptRequest) (pb.TranscriptResult, error) {
dir, err := os.MkdirTemp("", "voxtral")
if err != nil {
return pb.TranscriptResult{}, err
}
defer os.RemoveAll(dir)
convertedPath := dir + "/converted.wav"
if err := utils.AudioToWav(opts.Dst, convertedPath); err != nil {
return pb.TranscriptResult{}, err
}
result := strings.Clone(CppTranscribe(convertedPath))
CppFreeResult()
text := strings.TrimSpace(result)
segments := []*pb.TranscriptSegment{}
if text != "" {
segments = append(segments, &pb.TranscriptSegment{
Id: 0,
Text: text,
})
}
return pb.TranscriptResult{
Segments: segments,
Text: text,
}, nil
}

View File

@@ -0,0 +1,53 @@
package main
// Note: this is started internally by LocalAI and a server is allocated for each model
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("VOXTRAL_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./libgovoxtral.dylib"
} else {
libName = "./libgovoxtral.so"
}
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CppLoadModel, "load_model"},
{&CppTranscribe, "transcribe"},
{&CppFreeResult, "free_result"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, gosd, lf.Name)
}
flag.Parse()
if err := grpc.StartServer(*addr, &Voxtral{}); err != nil {
panic(err)
}
}

View File

@@ -0,0 +1,68 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/voxtral $CURDIR/package/
cp -fv $CURDIR/libgovoxtral-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgovoxtral-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
# OpenBLAS if available
if [ -f /usr/lib/x86_64-linux-gnu/libopenblas.so.0 ]; then
cp -arfLv /usr/lib/x86_64-linux-gnu/libopenblas.so.0 $CURDIR/package/lib/
fi
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
# OpenBLAS if available
if [ -f /usr/lib/aarch64-linux-gnu/libopenblas.so.0 ]; then
cp -arfLv /usr/lib/aarch64-linux-gnu/libopenblas.so.0 $CURDIR/package/lib/
fi
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin — system frameworks linked dynamically, no bundled libs needed"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

backend/go/voxtral/run.sh
View File

@@ -0,0 +1,49 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgovoxtral-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgovoxtral-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgovoxtral-avx.so ]; then
LIBRARY="$CURDIR/libgovoxtral-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgovoxtral-avx2.so ]; then
LIBRARY="$CURDIR/libgovoxtral-avx2.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export VOXTRAL_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it (Linux only)
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/voxtral "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/voxtral "$@"

View File

@@ -0,0 +1,48 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
echo "Running voxtral backend tests..."
# The test requires:
# - VOXTRAL_MODEL_DIR: path to directory containing consolidated.safetensors + tekken.json
# - VOXTRAL_BINARY: path to the voxtral binary (defaults to ./voxtral)
#
# Tests that require the model will be skipped if VOXTRAL_MODEL_DIR is not set.
cd "$CURDIR"
export VOXTRAL_MODEL_DIR="${VOXTRAL_MODEL_DIR:-./voxtral-model}"
if [ ! -d "$VOXTRAL_MODEL_DIR" ]; then
echo "Creating voxtral-model directory for tests..."
mkdir -p "$VOXTRAL_MODEL_DIR"
MODEL_ID="mistralai/Voxtral-Mini-4B-Realtime-2602"
echo "Model: ${MODEL_ID}"
echo ""
# Files to download
FILES=(
"consolidated.safetensors"
"params.json"
"tekken.json"
)
BASE_URL="https://huggingface.co/${MODEL_ID}/resolve/main"
for file in "${FILES[@]}"; do
dest="${VOXTRAL_MODEL_DIR}/${file}"
if [ -f "${dest}" ]; then
echo " [skip] ${file} (already exists)"
else
echo " [download] ${file}..."
curl -L -o "${dest}" "${BASE_URL}/${file}" --progress-bar
echo " [done] ${file}"
fi
done
fi
# Run Go tests
go test -v -timeout 300s ./...
echo "All voxtral tests passed."

View File

@@ -0,0 +1,201 @@
package main
import (
"context"
"fmt"
"io"
"net/http"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
const (
testAddr = "localhost:50051"
sampleAudio = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
startupWait = 5 * time.Second
)
func skipIfNoModel(t *testing.T) string {
t.Helper()
modelDir := os.Getenv("VOXTRAL_MODEL_DIR")
if modelDir == "" {
t.Skip("VOXTRAL_MODEL_DIR not set, skipping test (set to voxtral model directory)")
}
if _, err := os.Stat(filepath.Join(modelDir, "consolidated.safetensors")); os.IsNotExist(err) {
t.Skipf("Model file not found in %s, skipping", modelDir)
}
return modelDir
}
func startServer(t *testing.T) *exec.Cmd {
t.Helper()
binary := os.Getenv("VOXTRAL_BINARY")
if binary == "" {
binary = "./voxtral"
}
if _, err := os.Stat(binary); os.IsNotExist(err) {
t.Skipf("Backend binary not found at %s, skipping", binary)
}
cmd := exec.Command(binary, "--addr", testAddr)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
if err := cmd.Start(); err != nil {
t.Fatalf("Failed to start server: %v", err)
}
time.Sleep(startupWait)
return cmd
}
func stopServer(cmd *exec.Cmd) {
if cmd != nil && cmd.Process != nil {
cmd.Process.Kill()
cmd.Wait()
}
}
func dialGRPC(t *testing.T) *grpc.ClientConn {
t.Helper()
conn, err := grpc.Dial(testAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(
grpc.MaxCallRecvMsgSize(50*1024*1024),
grpc.MaxCallSendMsgSize(50*1024*1024),
),
)
if err != nil {
t.Fatalf("Failed to dial gRPC: %v", err)
}
return conn
}
func downloadFile(url, dest string) error {
resp, err := http.Get(url)
if err != nil {
return fmt.Errorf("HTTP GET failed: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("bad status: %s", resp.Status)
}
f, err := os.Create(dest)
if err != nil {
return err
}
defer f.Close()
_, err = io.Copy(f, resp.Body)
return err
}
func TestServerHealth(t *testing.T) {
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
resp, err := client.Health(context.Background(), &pb.HealthMessage{})
if err != nil {
t.Fatalf("Health check failed: %v", err)
}
if string(resp.Message) != "OK" {
t.Fatalf("Expected OK, got %s", string(resp.Message))
}
}
func TestLoadModel(t *testing.T) {
modelDir := skipIfNoModel(t)
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
resp, err := client.LoadModel(context.Background(), &pb.ModelOptions{
ModelFile: modelDir,
})
if err != nil {
t.Fatalf("LoadModel failed: %v", err)
}
if !resp.Success {
t.Fatalf("LoadModel returned failure: %s", resp.Message)
}
}
func TestAudioTranscription(t *testing.T) {
modelDir := skipIfNoModel(t)
tmpDir, err := os.MkdirTemp("", "voxtral-test")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(tmpDir)
// Download a sample English speech clip (Qwen3-ASR example audio)
audioFile := filepath.Join(tmpDir, "sample.wav")
t.Log("Downloading sample audio...")
if err := downloadFile(sampleAudio, audioFile); err != nil {
t.Fatalf("Failed to download sample audio: %v", err)
}
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
// Load model
loadResp, err := client.LoadModel(context.Background(), &pb.ModelOptions{
ModelFile: modelDir,
})
if err != nil {
t.Fatalf("LoadModel failed: %v", err)
}
if !loadResp.Success {
t.Fatalf("LoadModel returned failure: %s", loadResp.Message)
}
// Transcribe
transcriptResp, err := client.AudioTranscription(context.Background(), &pb.TranscriptRequest{
Dst: audioFile,
})
if err != nil {
t.Fatalf("AudioTranscription failed: %v", err)
}
if transcriptResp == nil {
t.Fatal("AudioTranscription returned nil")
}
t.Logf("Transcribed text: %s", transcriptResp.Text)
t.Logf("Number of segments: %d", len(transcriptResp.Segments))
if transcriptResp.Text == "" {
t.Fatal("Transcription returned empty text")
}
allText := strings.ToLower(transcriptResp.Text)
for _, seg := range transcriptResp.Segments {
allText += " " + strings.ToLower(seg.Text)
}
t.Logf("All text: %s", allText)
if !strings.Contains(allText, "big") {
t.Errorf("Expected 'big' in transcription, got: %s", allText)
}
// The sample audio should contain recognizable speech
if len(allText) < 10 {
t.Errorf("Transcription too short: %q", allText)
}
}

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=aa1bc0d1a6dfd70dbb9f60c11df12441e03a9075
+WHISPER_CPP_VERSION?=21411d81ea736ed5d9cdea4df360d3c4b60a4adb
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -78,7 +78,7 @@ package: whisper
build: package
clean: purge
-rm -rf libgowhisper*.so sources/whisper.cpp whisper
+rm -rf libgowhisper*.so package sources/whisper.cpp whisper
purge:
rm -rf build*
@@ -88,19 +88,19 @@ ifeq ($(UNAME_S),Linux)
libgowhisper-avx.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:avx${RESET})
-SO_TARGET=libgowhisper-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) libgowhisper-custom
+SO_TARGET=libgowhisper-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgowhisper-custom
rm -rfv build*
libgowhisper-avx2.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:avx2${RESET})
-SO_TARGET=libgowhisper-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) libgowhisper-custom
+SO_TARGET=libgowhisper-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgowhisper-custom
rm -rfv build*
libgowhisper-avx512.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:avx512${RESET})
-SO_TARGET=libgowhisper-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) libgowhisper-custom
+SO_TARGET=libgowhisper-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgowhisper-custom
rm -rfv build*
endif
@@ -108,7 +108,7 @@ endif
libgowhisper-fallback.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:fallback${RESET})
-SO_TARGET=libgowhisper-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) libgowhisper-custom
+SO_TARGET=libgowhisper-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgowhisper-custom
rm -rfv build*
libgowhisper-custom: CMakeLists.txt gowhisper.cpp gowhisper.h

View File

File diff suppressed because it is too large.

View File

@@ -0,0 +1,16 @@
.DEFAULT_GOAL := install
.PHONY: install
install:
bash install.sh
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__
test: install
bash test.sh

View File

@@ -0,0 +1,472 @@
#!/usr/bin/env python3
"""
LocalAI ACE-Step Backend
gRPC backend for ACE-Step 1.5 music generation. Aligns with upstream acestep API:
- LoadModel: initializes AceStepHandler (DiT) and LLMHandler, parses Options.
- SoundGeneration: uses create_sample (simple mode), format_sample (optional), then
generate_music from acestep.inference. Writes first output to request.dst.
- Fail hard: no fallback WAV on error; exceptions propagate to gRPC.
"""
from concurrent import futures
import argparse
import shutil
import signal
import sys
import os
import tempfile
import backend_pb2
import backend_pb2_grpc
import grpc
from acestep.inference import (
GenerationParams,
GenerationConfig,
generate_music,
create_sample,
format_sample,
)
from acestep.handler import AceStepHandler
from acestep.llm_inference import LLMHandler
from acestep.model_downloader import ensure_lm_model
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
MAX_WORKERS = int(os.environ.get("PYTHON_GRPC_MAX_WORKERS", "1"))
# Model name -> HuggingFace/ModelScope repo (from upstream api_server.py)
MODEL_REPO_MAPPING = {
"acestep-v15-turbo": "ACE-Step/Ace-Step1.5",
"acestep-5Hz-lm-0.6B": "ACE-Step/Ace-Step1.5",
"acestep-5Hz-lm-1.7B": "ACE-Step/Ace-Step1.5",
"vae": "ACE-Step/Ace-Step1.5",
"Qwen3-Embedding-0.6B": "ACE-Step/Ace-Step1.5",
"acestep-v15-base": "ACE-Step/acestep-v15-base",
"acestep-v15-sft": "ACE-Step/acestep-v15-sft",
"acestep-v15-turbo-shift3": "ACE-Step/acestep-v15-turbo-shift3",
"acestep-5Hz-lm-4B": "ACE-Step/acestep-5Hz-lm-4B",
}
DEFAULT_REPO_ID = "ACE-Step/Ace-Step1.5"
def _is_float(s):
try:
float(s)
return True
except (ValueError, TypeError):
return False
def _is_int(s):
try:
int(s)
return True
except (ValueError, TypeError):
return False
def _parse_timesteps(s):
if s is None or (isinstance(s, str) and not s.strip()):
return None
if isinstance(s, (list, tuple)):
return [float(x) for x in s]
try:
return [float(x.strip()) for x in str(s).split(",") if x.strip()]
except (ValueError, TypeError):
return None
def _parse_options(opts_list):
"""Parse repeated 'key:value' options into a dict. Coerce numeric and bool."""
out = {}
for opt in opts_list or []:
if ":" not in opt:
continue
key, value = opt.split(":", 1)
key = key.strip()
value = value.strip()
if _is_int(value):
out[key] = int(value)
elif _is_float(value):
out[key] = float(value)
elif value.lower() in ("true", "false"):
out[key] = value.lower() == "true"
else:
out[key] = value
return out
def _generate_audio_sync(servicer, payload, dst_path):
"""
Run full ACE-Step pipeline using acestep.inference:
- If sample_mode/sample_query: create_sample() for caption/lyrics/metadata.
- If use_format and caption/lyrics: format_sample().
- Build GenerationParams and GenerationConfig, then generate_music().
Writes the first generated audio to dst_path. Raises on failure.
"""
opts = servicer.options
dit_handler = servicer.dit_handler
llm_handler = servicer.llm_handler
for key, value in opts.items():
if key not in payload:
payload[key] = value
def _opt(name, default):
return opts.get(name, default)
lm_temperature = _opt("temperature", 0.85)
lm_cfg_scale = _opt("lm_cfg_scale", _opt("cfg_scale", 2.0))
lm_top_k = opts.get("top_k")
lm_top_p = _opt("top_p", 0.9)
if lm_top_p is not None and lm_top_p >= 1.0:
lm_top_p = None
inference_steps = _opt("inference_steps", 8)
guidance_scale = _opt("guidance_scale", 7.0)
batch_size = max(1, int(_opt("batch_size", 1)))
use_simple = bool(payload.get("sample_query") or payload.get("text"))
sample_mode = use_simple and (payload.get("thinking") or payload.get("sample_mode"))
sample_query = (payload.get("sample_query") or payload.get("text") or "").strip()
use_format = bool(payload.get("use_format"))
caption = (payload.get("prompt") or payload.get("caption") or "").strip()
lyrics = (payload.get("lyrics") or "").strip()
vocal_language = (payload.get("vocal_language") or "en").strip()
instrumental = bool(payload.get("instrumental"))
bpm = payload.get("bpm")
key_scale = (payload.get("key_scale") or "").strip()
time_signature = (payload.get("time_signature") or "").strip()
audio_duration = payload.get("audio_duration")
if audio_duration is not None:
try:
audio_duration = float(audio_duration)
except (TypeError, ValueError):
audio_duration = None
if sample_mode and llm_handler and getattr(llm_handler, "llm_initialized", False):
parsed_language = None
if sample_query:
for hint in ("english", "en", "chinese", "zh", "japanese", "ja"):
if hint in sample_query.lower():
parsed_language = "en" if hint == "english" or hint == "en" else hint
break
vocal_lang = vocal_language if vocal_language and vocal_language != "unknown" else parsed_language
sample_result = create_sample(
llm_handler=llm_handler,
query=sample_query or "NO USER INPUT",
instrumental=instrumental,
vocal_language=vocal_lang,
temperature=lm_temperature,
top_k=lm_top_k,
top_p=lm_top_p,
use_constrained_decoding=True,
)
if not sample_result.success:
raise RuntimeError(f"create_sample failed: {sample_result.error or sample_result.status_message}")
caption = sample_result.caption or caption
lyrics = sample_result.lyrics or lyrics
bpm = sample_result.bpm
key_scale = sample_result.keyscale or key_scale
time_signature = sample_result.timesignature or time_signature
if sample_result.duration is not None:
audio_duration = sample_result.duration
if getattr(sample_result, "language", None):
vocal_language = sample_result.language
if use_format and (caption or lyrics) and llm_handler and getattr(llm_handler, "llm_initialized", False):
user_metadata = {}
if bpm is not None:
user_metadata["bpm"] = bpm
if audio_duration is not None and float(audio_duration) > 0:
user_metadata["duration"] = int(audio_duration)
if key_scale:
user_metadata["keyscale"] = key_scale
if time_signature:
user_metadata["timesignature"] = time_signature
if vocal_language and vocal_language != "unknown":
user_metadata["language"] = vocal_language
format_result = format_sample(
llm_handler=llm_handler,
caption=caption,
lyrics=lyrics,
user_metadata=user_metadata if user_metadata else None,
temperature=lm_temperature,
top_k=lm_top_k,
top_p=lm_top_p,
use_constrained_decoding=True,
)
if format_result.success:
caption = format_result.caption or caption
lyrics = format_result.lyrics or lyrics
if format_result.duration is not None:
audio_duration = format_result.duration
if format_result.bpm is not None:
bpm = format_result.bpm
if format_result.keyscale:
key_scale = format_result.keyscale
if format_result.timesignature:
time_signature = format_result.timesignature
if getattr(format_result, "language", None):
vocal_language = format_result.language
thinking = bool(payload.get("thinking"))
use_cot_metas = not sample_mode
params = GenerationParams(
task_type=payload.get("task_type", "text2music"),
instruction=payload.get("instruction", "Fill the audio semantic mask based on the given conditions:"),
reference_audio=payload.get("reference_audio_path"),
src_audio=payload.get("src_audio_path"),
audio_codes=payload.get("audio_code_string", ""),
caption=caption,
lyrics=lyrics,
instrumental=instrumental or (not lyrics or str(lyrics).strip().lower() in ("[inst]", "[instrumental]")),
vocal_language=vocal_language or "unknown",
bpm=bpm,
keyscale=key_scale,
timesignature=time_signature,
duration=float(audio_duration) if audio_duration and float(audio_duration) > 0 else -1.0,
inference_steps=inference_steps,
seed=int(payload.get("seed", -1)),
guidance_scale=guidance_scale,
use_adg=bool(payload.get("use_adg")),
cfg_interval_start=float(payload.get("cfg_interval_start", 0.0)),
cfg_interval_end=float(payload.get("cfg_interval_end", 1.0)),
shift=float(payload.get("shift", 1.0)),
infer_method=(payload.get("infer_method") or "ode").strip(),
timesteps=_parse_timesteps(payload.get("timesteps")),
repainting_start=float(payload.get("repainting_start", 0.0)),
repainting_end=float(payload.get("repainting_end", -1)) if payload.get("repainting_end") is not None else -1,
audio_cover_strength=float(payload.get("audio_cover_strength", 1.0)),
thinking=thinking,
lm_temperature=lm_temperature,
lm_cfg_scale=lm_cfg_scale,
lm_top_k=lm_top_k or 0,
lm_top_p=lm_top_p if lm_top_p is not None and lm_top_p < 1.0 else 0.9,
lm_negative_prompt=payload.get("lm_negative_prompt", "NO USER INPUT"),
use_cot_metas=use_cot_metas,
use_cot_caption=bool(payload.get("use_cot_caption", True)),
use_cot_language=bool(payload.get("use_cot_language", True)),
use_constrained_decoding=True,
)
config = GenerationConfig(
batch_size=batch_size,
allow_lm_batch=bool(payload.get("allow_lm_batch", False)),
use_random_seed=bool(payload.get("use_random_seed", True)),
seeds=payload.get("seeds"),
lm_batch_chunk_size=max(1, int(payload.get("lm_batch_chunk_size", 8))),
constrained_decoding_debug=bool(payload.get("constrained_decoding_debug")),
audio_format=(payload.get("audio_format") or "flac").strip() or "flac",
)
save_dir = tempfile.mkdtemp(prefix="ace_step_")
try:
result = generate_music(
dit_handler=dit_handler,
llm_handler=llm_handler if (llm_handler and getattr(llm_handler, "llm_initialized", False)) else None,
params=params,
config=config,
save_dir=save_dir,
progress=None,
)
if not result.success:
raise RuntimeError(result.error or result.status_message or "generate_music failed")
audios = result.audios or []
if not audios:
raise RuntimeError("generate_music returned no audio")
first_path = audios[0].get("path") or ""
if not first_path or not os.path.isfile(first_path):
raise RuntimeError("first generated audio path missing or not a file")
shutil.copy2(first_path, dst_path)
finally:
try:
shutil.rmtree(save_dir, ignore_errors=True)
except Exception:
pass
class BackendServicer(backend_pb2_grpc.BackendServicer):
def __init__(self):
self.model_path = None
self.model_dir = None
self.checkpoint_dir = None
self.project_root = None
self.options = {}
self.dit_handler = None
self.llm_handler = None
def Health(self, request, context):
return backend_pb2.Reply(message=b"OK")
def LoadModel(self, request, context):
try:
self.options = _parse_options(list(getattr(request, "Options", []) or []))
model_path = getattr(request, "ModelPath", None) or ""
model_name = (request.Model or "").strip()
model_file = (getattr(request, "ModelFile", None) or "").strip()
# Model dir: where we store checkpoints (always under LocalAI models path, never backend dir)
if model_path and model_name:
model_dir = os.path.join(model_path, model_name)
elif model_file:
model_dir = model_file
else:
model_dir = os.path.abspath(model_name or ".")
self.model_dir = model_dir
self.checkpoint_dir = os.path.join(model_dir, "checkpoints")
self.project_root = model_dir
self.model_path = os.path.join(self.checkpoint_dir, model_name or os.path.basename(model_dir.rstrip("/\\")))
config_path = model_name or os.path.basename(model_dir.rstrip("/\\"))
os.makedirs(self.checkpoint_dir, exist_ok=True)
self.dit_handler = AceStepHandler()
# Patch handler so it uses our model dir instead of site-packages/checkpoints
self.dit_handler._get_project_root = lambda: self.project_root
device = self.options.get("device", "auto")
use_flash = self.options.get("use_flash_attention", True)
if isinstance(use_flash, str):
use_flash = str(use_flash).lower() in ("1", "true", "yes")
offload = self.options.get("offload_to_cpu", False)
if isinstance(offload, str):
offload = str(offload).lower() in ("1", "true", "yes")
status_msg, ok = self.dit_handler.initialize_service(
project_root=self.project_root,
config_path=config_path,
device=device,
use_flash_attention=use_flash,
compile_model=False,
offload_to_cpu=offload,
offload_dit_to_cpu=bool(self.options.get("offload_dit_to_cpu", False)),
)
if not ok:
return backend_pb2.Result(success=False, message=f"DiT init failed: {status_msg}")
self.llm_handler = None
if self.options.get("init_lm", True):
lm_model = self.options.get("lm_model_path", "acestep-5Hz-lm-0.6B")
# Ensure LM model is downloaded before initializing
try:
from pathlib import Path
lm_success, lm_msg = ensure_lm_model(
model_name=lm_model,
checkpoints_dir=Path(self.checkpoint_dir),
prefer_source=None, # Auto-detect HuggingFace vs ModelScope
)
if not lm_success:
print(f"[ace-step] Warning: LM model download failed: {lm_msg}", file=sys.stderr)
# Continue anyway - LLM initialization will fail gracefully
else:
print(f"[ace-step] LM model ready: {lm_msg}", file=sys.stderr)
except Exception as e:
print(f"[ace-step] Warning: LM model download check failed: {e}", file=sys.stderr)
# Continue anyway - LLM initialization will fail gracefully
self.llm_handler = LLMHandler()
lm_backend = (self.options.get("lm_backend") or "vllm").strip().lower()
if lm_backend not in ("vllm", "pt"):
lm_backend = "vllm"
lm_status, lm_ok = self.llm_handler.initialize(
checkpoint_dir=self.checkpoint_dir,
lm_model_path=lm_model,
backend=lm_backend,
device=device,
offload_to_cpu=offload,
dtype=getattr(self.dit_handler, "dtype", None),
)
if not lm_ok:
self.llm_handler = None
print(f"[ace-step] LM init failed (optional): {lm_status}", file=sys.stderr)
print(f"[ace-step] LoadModel: model={self.model_path}, options={list(self.options.keys())}", file=sys.stderr)
return backend_pb2.Result(success=True, message="Model loaded successfully")
except Exception as err:
return backend_pb2.Result(success=False, message=f"LoadModel error: {err}")
def SoundGeneration(self, request, context):
if not request.dst:
return backend_pb2.Result(success=False, message="request.dst is required")
use_simple = bool(request.text)
if use_simple:
payload = {
"sample_query": request.text or "",
"sample_mode": True,
"thinking": True,
"vocal_language": request.language or request.GetLanguage() or "en",
"instrumental": request.instrumental if request.HasField("instrumental") else False,
}
else:
caption = request.caption or request.GetCaption() or request.text
payload = {
"prompt": caption,
"lyrics": request.lyrics or request.lyrics or "",
"thinking": request.think if request.HasField("think") else False,
"vocal_language": request.language or request.GetLanguage() or "en",
}
if request.HasField("bpm"):
payload["bpm"] = request.bpm
if request.HasField("keyscale") and request.keyscale:
payload["key_scale"] = request.keyscale
if request.HasField("timesignature") and request.timesignature:
payload["time_signature"] = request.timesignature
if request.HasField("duration") and request.duration:
payload["audio_duration"] = int(request.duration) if request.duration else None
if request.src:
payload["src_audio_path"] = request.src
_generate_audio_sync(self, payload, request.dst)
return backend_pb2.Result(success=True, message="Sound generated successfully")
def TTS(self, request, context):
if not request.dst:
return backend_pb2.Result(success=False, message="request.dst is required")
payload = {
"sample_query": request.text,
"sample_mode": True,
"thinking": False,
"vocal_language": (request.language if request.language else "") or "en",
"instrumental": False,
}
_generate_audio_sync(self, payload, request.dst)
return backend_pb2.Result(success=True, message="TTS (music fallback) generated successfully")
def serve(address):
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
("grpc.max_message_length", 50 * 1024 * 1024),
("grpc.max_send_message_length", 50 * 1024 * 1024),
("grpc.max_receive_message_length", 50 * 1024 * 1024),
],
)
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print(f"[ace-step] Server listening on {address}", file=sys.stderr)
def shutdown(sig, frame):
server.stop(0)
sys.exit(0)
signal.signal(signal.SIGINT, shutdown)
signal.signal(signal.SIGTERM, shutdown)
try:
while True:
import time
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--addr", default="localhost:50051", help="Listen address")
args = parser.parse_args()
serve(args.addr)
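
The servicer above is normally started by LocalAI, which then drives it over gRPC. A minimal client sketch for the simple (text-only) path, assuming the generated backend_pb2/backend_pb2_grpc stubs and a backend already listening on localhost:50051, looks roughly like this; model name, options and paths are illustrative:

# Hypothetical gRPC client for the ACE-Step backend above (not part of this diff).
import grpc
import backend_pb2
import backend_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = backend_pb2_grpc.BackendStub(channel)

# LoadModel initializes the DiT handler and, unless init_lm:false, the LM handler.
# Options use the key:value format parsed by _parse_options above.
load = stub.LoadModel(backend_pb2.ModelOptions(
    Model="acestep-v15-turbo",
    Options=["inference_steps:8", "init_lm:true"],
))
assert load.success, load.message

# Simple mode: a non-empty text triggers create_sample() before generate_music(),
# and the first generated track is copied to dst.
gen = stub.SoundGeneration(backend_pb2.SoundGenerationRequest(
    text="upbeat pop song with female vocals",
    dst="/tmp/ace_step_out.flac",
))
assert gen.success, gen.message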

View File

@@ -0,0 +1,26 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
PYTHON_VERSION="3.11"
PYTHON_PATCH="14"
PY_STANDALONE_TAG="20260203"
installRequirements
if [ ! -d ACE-Step-1.5 ]; then
git clone https://github.com/ace-step/ACE-Step-1.5
cd ACE-Step-1.5/
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --no-deps .
else
uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --no-deps .
fi
fi

View File

@@ -0,0 +1,22 @@
--extra-index-url https://download.pytorch.org/whl/cpu
torch
torchaudio
torchvision
# Core dependencies
transformers>=4.51.0,<4.58.0
diffusers
gradio
matplotlib>=3.7.5
scipy>=1.10.1
soundfile>=0.13.1
loguru>=0.7.3
einops>=0.8.1
accelerate>=1.12.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
numba>=0.63.1
vector-quantize-pytorch>=1.27.15
torchcodec>=0.9.1
torchao
modelscope

View File

@@ -0,0 +1,22 @@
--extra-index-url https://download.pytorch.org/whl/cu128
torch
torchaudio
torchvision
# Core dependencies
transformers>=4.51.0,<4.58.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5
scipy>=1.10.1
soundfile>=0.13.1
loguru>=0.7.3
einops>=0.8.1
accelerate>=1.12.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
numba>=0.63.1
vector-quantize-pytorch>=1.27.15
torchcodec>=0.9.1
torchao
modelscope

View File

@@ -0,0 +1,22 @@
--extra-index-url https://download.pytorch.org/whl/cu130
torch
torchaudio
torchvision
# Core dependencies
transformers>=4.51.0,<4.58.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5
scipy>=1.10.1
soundfile>=0.13.1
loguru>=0.7.3
einops>=0.8.1
accelerate>=1.12.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
numba>=0.63.1
vector-quantize-pytorch>=1.27.15
torchcodec>=0.9.1
torchao
modelscope

View File

@@ -0,0 +1,22 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0+rocm6.4
torchaudio
torchvision
# Core dependencies
transformers>=4.51.0,<4.58.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5
scipy>=1.10.1
soundfile>=0.13.1
loguru>=0.7.3
einops>=0.8.1
accelerate>=1.12.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
numba>=0.63.1
vector-quantize-pytorch>=1.27.15
torchcodec>=0.9.1
torchao
modelscope

View File

@@ -0,0 +1,26 @@
--extra-index-url https://download.pytorch.org/whl/xpu
torch
torchaudio
torchvision
# Core dependencies
transformers>=4.51.0,<4.58.0
diffusers
gradio
matplotlib>=3.7.5
scipy>=1.10.1
soundfile>=0.13.1
loguru>=0.7.3
einops>=0.8.1
accelerate>=1.12.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
numba>=0.63.1
vector-quantize-pytorch>=1.27.15
torchcodec>=0.9.1
torchao
modelscope
# LoRA Training dependencies (optional)
peft>=0.7.0
lightning>=2.0.0

View File

@@ -0,0 +1,21 @@
--extra-index-url https://download.pytorch.org/whl/cu130
torch
torchaudio
torchvision
# Core dependencies
transformers>=4.51.0,<4.58.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5
scipy>=1.10.1
soundfile>=0.13.1
loguru>=0.7.3
einops>=0.8.1
accelerate>=1.12.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
numba>=0.63.1
vector-quantize-pytorch>=1.27.15
torchcodec>=0.9.1
torchao
modelscope

View File

@@ -0,0 +1,25 @@
torch
torchaudio
torchvision
# Core dependencies
transformers>=4.51.0,<4.58.0
diffusers
gradio
matplotlib>=3.7.5
scipy>=1.10.1
soundfile>=0.13.1
loguru>=0.7.3
einops>=0.8.1
accelerate>=1.12.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
numba>=0.63.1
vector-quantize-pytorch>=1.27.15
torchcodec>=0.9.1
torchao
modelscope
# LoRA Training dependencies (optional)
peft>=0.7.0
lightning>=2.0.0

View File

@@ -0,0 +1,4 @@
setuptools
grpcio==1.76.0
protobuf
certifi

View File

@@ -0,0 +1,9 @@
#!/bin/bash
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
startBackend $@

View File

@@ -0,0 +1,53 @@
"""
Tests for the ACE-Step gRPC backend.
"""
import os
import tempfile
import unittest
import backend_pb2
import backend_pb2_grpc
import grpc
class TestACEStepBackend(unittest.TestCase):
"""Test Health, LoadModel, and SoundGeneration (minimal; no real model required)."""
@classmethod
def setUpClass(cls):
port = os.environ.get("BACKEND_PORT", "50051")
cls.channel = grpc.insecure_channel(f"localhost:{port}")
cls.stub = backend_pb2_grpc.BackendStub(cls.channel)
@classmethod
def tearDownClass(cls):
cls.channel.close()
def test_health(self):
response = self.stub.Health(backend_pb2.HealthMessage())
self.assertEqual(response.message, b"OK")
def test_load_model(self):
response = self.stub.LoadModel(backend_pb2.ModelOptions(Model="ace-step-test"))
self.assertTrue(response.success, response.message)
def test_sound_generation_minimal(self):
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
dst = f.name
try:
req = backend_pb2.SoundGenerationRequest(
text="upbeat pop song",
model="ace-step-test",
dst=dst,
)
response = self.stub.SoundGeneration(req)
self.assertTrue(response.success, response.message)
self.assertTrue(os.path.exists(dst), f"Output file not created: {dst}")
self.assertGreater(os.path.getsize(dst), 0)
finally:
if os.path.exists(dst):
os.unlink(dst)
if __name__ == "__main__":
unittest.main()

View File

@@ -0,0 +1,19 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
# Start backend in background (use env to avoid port conflict in parallel tests)
export PYTHONUNBUFFERED=1
BACKEND_PORT=${BACKEND_PORT:-50051}
python backend.py --addr "localhost:${BACKEND_PORT}" &
BACKEND_PID=$!
trap "kill $BACKEND_PID 2>/dev/null || true" EXIT
sleep 3
export BACKEND_PORT
runUnittests

View File

@@ -0,0 +1,7 @@
torch
torchaudio
accelerate
numpy>=1.24.0,<1.26.0
transformers
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster

View File

@@ -1,3 +1,3 @@
-grpcio==1.76.0
+grpcio==1.78.1
protobuf
grpcio-tools

View File

@@ -0,0 +1,4 @@
torch==2.7.1
transformers==4.48.3
accelerate
coqui-tts

View File

@@ -1,4 +1,4 @@
-grpcio==1.76.0
+grpcio==1.78.1
protobuf
certifi
packaging==24.1

View File

@@ -115,6 +115,7 @@ Available pipelines: AnimateDiffPipeline, AnimateDiffVideoToVideoPipeline, ...
| Variable | Default | Description |
|----------|---------|-------------|
| `COMPEL` | `0` | Enable Compel for prompt weighting |
| `SD_EMBED` | `0` | Enable sd_embed for prompt weighting |
| `XPU` | `0` | Enable Intel XPU support |
| `CLIPSKIP` | `1` | Enable CLIP skip support |
| `SAFETENSORS` | `1` | Use safetensors format |

View File

@@ -40,6 +40,21 @@ from compel import Compel, ReturnedEmbeddingsType
from optimum.quanto import freeze, qfloat8, quantize
from transformers import T5EncoderModel
from safetensors.torch import load_file
# Try to import sd_embed - it might not always be available
try:
from sd_embed.embedding_funcs import (
get_weighted_text_embeddings_sd15,
get_weighted_text_embeddings_sdxl,
get_weighted_text_embeddings_sd3,
get_weighted_text_embeddings_flux1,
)
SD_EMBED_AVAILABLE = True
except ImportError:
get_weighted_text_embeddings_sd15 = None
get_weighted_text_embeddings_sdxl = None
get_weighted_text_embeddings_sd3 = None
get_weighted_text_embeddings_flux1 = None
SD_EMBED_AVAILABLE = False
# Import LTX-2 specific utilities
from diffusers.pipelines.ltx2.export_utils import encode_video as ltx2_encode_video
@@ -47,6 +62,10 @@ from diffusers import LTX2VideoTransformer3DModel, GGUFQuantizationConfig
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
COMPEL = os.environ.get("COMPEL", "0") == "1"
SD_EMBED = os.environ.get("SD_EMBED", "0") == "1"
# Warn if SD_EMBED is enabled but the module is not available
if SD_EMBED and not SD_EMBED_AVAILABLE:
print("WARNING: SD_EMBED is enabled but sd_embed module is not available. Falling back to standard prompt processing.", file=sys.stderr)
XPU = os.environ.get("XPU", "0") == "1"
CLIPSKIP = os.environ.get("CLIPSKIP", "1") == "1"
SAFETENSORS = os.environ.get("SAFETENSORS", "1") == "1"
@@ -177,7 +196,7 @@ def get_scheduler(name: str, config: dict = {}):
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
-def _load_pipeline(self, request, modelFile, fromSingleFile, torchType, variant):
+def _load_pipeline(self, request, modelFile, fromSingleFile, torchType, variant, device_map=None):
"""
Load a diffusers pipeline dynamically using the dynamic loader.
@@ -191,6 +210,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
fromSingleFile: Whether to use from_single_file() vs from_pretrained()
torchType: The torch dtype to use
variant: Model variant (e.g., "fp16")
device_map: Device mapping strategy (e.g., "auto" for multi-GPU)
Returns:
The loaded pipeline instance
@@ -212,14 +232,14 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
dtype = torch.bfloat16
bfl_repo = os.environ.get("BFL_REPO", "ChuckMcSneed/FLUX.1-dev")
-transformer = FluxTransformer2DModel.from_single_file(modelFile, torch_dtype=dtype)
+transformer = FluxTransformer2DModel.from_single_file(modelFile, torch_dtype=dtype, device_map=device_map)
quantize(transformer, weights=qfloat8)
freeze(transformer)
-text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
+text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype, device_map=device_map)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)
-pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype)
+pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype, device_map=device_map)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2
@@ -232,13 +252,15 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
vae = AutoencoderKLWan.from_pretrained(
request.Model,
subfolder="vae",
-torch_dtype=torch.float32
+torch_dtype=torch.float32,
+device_map=device_map
)
pipe = load_diffusers_pipeline(
class_name="WanPipeline",
model_id=request.Model,
vae=vae,
-torch_dtype=torchType
+torch_dtype=torchType,
+device_map=device_map
)
self.txt2vid = True
return pipe
@@ -248,13 +270,15 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
vae = AutoencoderKLWan.from_pretrained(
request.Model,
subfolder="vae",
-torch_dtype=torch.float32
+torch_dtype=torch.float32,
+device_map=device_map
)
pipe = load_diffusers_pipeline(
class_name="WanImageToVideoPipeline",
model_id=request.Model,
vae=vae,
-torch_dtype=torchType
+torch_dtype=torchType,
+device_map=device_map
)
self.img2vid = True
return pipe
@@ -265,7 +289,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
class_name="SanaPipeline",
model_id=request.Model,
variant="bf16",
-torch_dtype=torch.bfloat16
+torch_dtype=torch.bfloat16,
+device_map=device_map
)
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)
@@ -277,7 +302,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
pipe = load_diffusers_pipeline(
class_name="DiffusionPipeline",
model_id=request.Model,
-torch_dtype=torchType
+torch_dtype=torchType,
+device_map=device_map
)
return pipe
@@ -288,7 +314,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
class_name="StableVideoDiffusionPipeline",
model_id=request.Model,
torch_dtype=torchType,
-variant=variant
+variant=variant,
+device_map=device_map
)
if not DISABLE_CPU_OFFLOAD:
pipe.enable_model_cpu_offload()
@@ -312,6 +339,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
modelFile,
config=request.Model, # Use request.Model as the config/model_id
subfolder="transformer",
device_map=device_map,
**transformer_kwargs,
)
@@ -321,6 +349,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
model_id=request.Model,
transformer=transformer,
torch_dtype=torchType,
device_map=device_map,
)
else:
# Single file but not GGUF - use standard single file loading
@@ -329,6 +358,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
model_id=modelFile,
from_single_file=True,
torch_dtype=torchType,
device_map=device_map,
)
else:
# Standard loading from pretrained
@@ -336,7 +366,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
class_name="LTX2ImageToVideoPipeline",
model_id=request.Model,
torch_dtype=torchType,
-variant=variant
+variant=variant,
+device_map=device_map
)
if not DISABLE_CPU_OFFLOAD:
@@ -361,6 +392,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
modelFile,
config=request.Model, # Use request.Model as the config/model_id
subfolder="transformer",
device_map=device_map,
**transformer_kwargs,
)
@@ -370,6 +402,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
model_id=request.Model,
transformer=transformer,
torch_dtype=torchType,
device_map=device_map,
)
else:
# Single file but not GGUF - use standard single file loading
@@ -378,6 +411,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
model_id=modelFile,
from_single_file=True,
torch_dtype=torchType,
device_map=device_map,
)
else:
# Standard loading from pretrained
@@ -385,7 +419,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
class_name="LTX2Pipeline",
model_id=request.Model,
torch_dtype=torchType,
-variant=variant
+variant=variant,
+device_map=device_map
)
if not DISABLE_CPU_OFFLOAD:
@@ -408,6 +443,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if not fromSingleFile:
load_kwargs["use_safetensors"] = SAFETENSORS
# Add device_map for multi-GPU support (when TensorParallelSize > 1)
if device_map:
load_kwargs["device_map"] = device_map
# Determine pipeline class name - default to AutoPipelineForText2Image
effective_pipeline_type = pipeline_type if pipeline_type else "AutoPipelineForText2Image"
@@ -510,6 +549,13 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
print(f"LoadModel: PipelineType from request: {request.PipelineType}", file=sys.stderr)
# Determine device_map for multi-GPU support based on TensorParallelSize
# When TensorParallelSize > 1, use device_map='auto' to distribute model across GPUs
device_map = None
if hasattr(request, 'TensorParallelSize') and request.TensorParallelSize > 1:
device_map = "auto"
print(f"LoadModel: Multi-GPU mode enabled with TensorParallelSize={request.TensorParallelSize}, using device_map='auto'", file=sys.stderr)
# Load pipeline using dynamic loader
# Special cases that require custom initialization are handled first
self.pipe = self._load_pipeline(
@@ -517,7 +563,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
modelFile=modelFile,
fromSingleFile=fromSingleFile,
torchType=torchType,
-variant=variant
+variant=variant,
+device_map=device_map
)
print(f"LoadModel: After loading - ltx2_pipeline: {self.ltx2_pipeline}, img2vid: {self.img2vid}, txt2vid: {self.txt2vid}, PipelineType: {self.PipelineType}", file=sys.stderr)
@@ -542,7 +589,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if request.ControlNet:
self.controlnet = ControlNetModel.from_pretrained(
-request.ControlNet, torch_dtype=torchType, variant=variant
+request.ControlNet, torch_dtype=torchType, variant=variant, device_map=device_map
)
self.pipe.controlnet = self.controlnet
else:
@@ -581,7 +628,9 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
self.pipe.set_adapters(adapters_name, adapter_weights=adapters_weights)
if device != "cpu":
# Only move pipeline to device if NOT using device_map
# device_map handles device placement automatically
if device_map is None and device != "cpu":
self.pipe.to(device)
if self.controlnet:
self.controlnet.to(device)
@@ -737,6 +786,51 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
kwargs["prompt_embeds"] = conditioning
kwargs["pooled_prompt_embeds"] = pooled
# pass the kwargs dictionary to the self.pipe method
image = self.pipe(
guidance_scale=self.cfg_scale,
**kwargs
).images[0]
elif SD_EMBED and SD_EMBED_AVAILABLE:
if self.PipelineType == "StableDiffusionPipeline":
(
kwargs["prompt_embeds"],
kwargs["negative_prompt_embeds"],
) = get_weighted_text_embeddings_sd15(
pipe = self.pipe,
prompt = prompt,
neg_prompt = request.negative_prompt if hasattr(request, 'negative_prompt') else None,
)
if self.PipelineType == "StableDiffusionXLPipeline":
(
kwargs["prompt_embeds"],
kwargs["negative_prompt_embeds"],
kwargs["pooled_prompt_embeds"],
kwargs["negative_pooled_prompt_embeds"],
) = get_weighted_text_embeddings_sdxl(
pipe = self.pipe,
prompt = prompt,
neg_prompt = request.negative_prompt if hasattr(request, 'negative_prompt') else None
)
if self.PipelineType == "StableDiffusion3Pipeline":
(
kwargs["prompt_embeds"],
kwargs["negative_prompt_embeds"],
kwargs["pooled_prompt_embeds"],
kwargs["negative_pooled_prompt_embeds"],
) = get_weighted_text_embeddings_sd3(
pipe = self.pipe,
prompt = prompt,
neg_prompt = request.negative_prompt if hasattr(request, 'negative_prompt') else None
)
if self.PipelineType == "FluxTransformer2DModel":
(
kwargs["prompt_embeds"],
kwargs["pooled_prompt_embeds"],
) = get_weighted_text_embeddings_flux1(
pipe = self.pipe,
prompt = prompt,
)
image = self.pipe(
guidance_scale=self.cfg_scale,
**kwargs
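
For context, the sd_embed helpers wired in above can also be used on their own; a standalone SDXL sketch (assuming the git+https://github.com/xhinker/sd_embed package, a CUDA device and an SDXL checkpoint are available) would be roughly:

# Standalone sd_embed usage sketch (assumptions: sd_embed installed, SDXL weights
# available locally or from the Hub, CUDA device present).
import torch
from diffusers import StableDiffusionXLPipeline
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sdxl

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# sd_embed returns pre-weighted embeddings, so (emphasis:1.3) syntax and prompts
# longer than the 77-token CLIP window can be used.
prompt_embeds, neg_embeds, pooled, neg_pooled = get_weighted_text_embeddings_sdxl(
    pipe=pipe,
    prompt="a (photorealistic:1.3) mountain lake at sunrise, ultra detailed",
    neg_prompt="blurry, low quality",
)

image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=neg_embeds,
    pooled_prompt_embeds=pooled,
    negative_pooled_prompt_embeds=neg_pooled,
).images[0]
image.save("out.png")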

View File

@@ -5,6 +5,7 @@ transformers
torchvision==0.22.1
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
torch==2.7.1

View File

@@ -5,6 +5,7 @@ transformers
torchvision
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
torch

View File

@@ -5,6 +5,7 @@ transformers
torchvision
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
torch

View File

@@ -8,6 +8,7 @@ opencv-python
transformers
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
optimum-quanto

View File

@@ -0,0 +1,23 @@
.PHONY: faster-qwen3-tts
faster-qwen3-tts:
bash install.sh
.PHONY: run
run: faster-qwen3-tts
@echo "Running faster-qwen3-tts..."
bash run.sh
@echo "faster-qwen3-tts run."
.PHONY: test
test: faster-qwen3-tts
@echo "Testing faster-qwen3-tts..."
bash test.sh
@echo "faster-qwen3-tts tested."
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -0,0 +1,193 @@
#!/usr/bin/env python3
"""
gRPC server of LocalAI for Faster Qwen3-TTS (CUDA graph capture, voice clone only).
"""
from concurrent import futures
import time
import argparse
import signal
import sys
import os
import traceback
import backend_pb2
import backend_pb2_grpc
import torch
import soundfile as sf
import grpc
def is_float(s):
try:
float(s)
return True
except ValueError:
return False
def is_int(s):
try:
int(s)
return True
except ValueError:
return False
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
class BackendServicer(backend_pb2_grpc.BackendServicer):
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
if not torch.cuda.is_available():
return backend_pb2.Result(
success=False,
message="faster-qwen3-tts requires NVIDIA GPU with CUDA"
)
self.options = {}
for opt in request.Options:
if ":" not in opt:
continue
key, value = opt.split(":", 1)
# Check for int before float so integer-valued options keep their type
if is_int(value):
value = int(value)
elif is_float(value):
value = float(value)
elif value.lower() in ["true", "false"]:
value = value.lower() == "true"
self.options[key] = value
model_path = request.Model or "Qwen/Qwen3-TTS-12Hz-0.6B-Base"
self.audio_path = request.AudioPath if hasattr(request, 'AudioPath') and request.AudioPath else None
self.model_file = request.ModelFile if hasattr(request, 'ModelFile') and request.ModelFile else None
self.model_path = request.ModelPath if hasattr(request, 'ModelPath') and request.ModelPath else None
from faster_qwen3_tts import FasterQwen3TTS
print(f"Loading model from: {model_path}", file=sys.stderr)
try:
self.model = FasterQwen3TTS.from_pretrained(model_path)
except Exception as e:
print(f"[ERROR] Loading model: {type(e).__name__}: {e}", file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
return backend_pb2.Result(success=False, message=str(e))
print(f"Model loaded successfully: {model_path}", file=sys.stderr)
return backend_pb2.Result(message="Model loaded successfully", success=True)
def _get_ref_audio_path(self, request):
if not self.audio_path:
return None
if os.path.isabs(self.audio_path):
return self.audio_path
if self.model_file:
model_file_base = os.path.dirname(self.model_file)
ref_path = os.path.join(model_file_base, self.audio_path)
if os.path.exists(ref_path):
return ref_path
if self.model_path:
ref_path = os.path.join(self.model_path, self.audio_path)
if os.path.exists(ref_path):
return ref_path
return self.audio_path
def TTS(self, request, context):
try:
if not request.dst:
return backend_pb2.Result(
success=False,
message="dst (output path) is required"
)
text = request.text.strip()
if not text:
return backend_pb2.Result(
success=False,
message="Text is empty"
)
language = request.language if hasattr(request, 'language') and request.language else None
if not language or language == "":
language = "English"
ref_audio = self._get_ref_audio_path(request)
if not ref_audio:
return backend_pb2.Result(
success=False,
message="AudioPath is required for voice clone (set in LoadModel)"
)
ref_text = self.options.get("ref_text")
if not ref_text and hasattr(request, 'ref_text') and request.ref_text:
ref_text = request.ref_text
if not ref_text:
return backend_pb2.Result(
success=False,
message="ref_text is required for voice clone (set via LoadModel Options, e.g. ref_text:Your reference transcript)"
)
chunk_size = self.options.get("chunk_size")
generation_kwargs = {}
if chunk_size is not None:
generation_kwargs["chunk_size"] = int(chunk_size)
audio_list, sr = self.model.generate_voice_clone(
text=text,
language=language,
ref_audio=ref_audio,
ref_text=ref_text,
**generation_kwargs
)
if audio_list is None or (isinstance(audio_list, list) and len(audio_list) == 0):
return backend_pb2.Result(
success=False,
message="No audio output generated"
)
audio_data = audio_list[0] if isinstance(audio_list, list) else audio_list
sf.write(request.dst, audio_data, sr)
print(f"Saved output to {request.dst}", file=sys.stderr)
except Exception as err:
print(f"Error in TTS: {err}", file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(success=True)
def serve(address):
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024),
('grpc.max_send_message_length', 50 * 1024 * 1024),
('grpc.max_receive_message_length', 50 * 1024 * 1024),
]
)
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("Server started. Listening on: " + address, file=sys.stderr)
def signal_handler(sig, frame):
print("Received termination signal. Shutting down...")
server.stop(0)
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run the gRPC server.")
parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
args = parser.parse_args()
serve(args.addr)
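
Since the backend above only supports voice cloning, a caller has to provide both a reference WAV (AudioPath) and its transcript (ref_text via Options). A hypothetical client sketch, using the same field names as the test file later in this diff but with placeholder paths and transcript, would be:

# Hypothetical gRPC client for the faster-qwen3-tts backend above.
import grpc
import backend_pb2
import backend_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = backend_pb2_grpc.BackendStub(channel)

# AudioPath points at the reference recording; ref_text carries its transcript and
# is read back from self.options in TTS above.
load = stub.LoadModel(backend_pb2.ModelOptions(
    Model="Qwen/Qwen3-TTS-12Hz-0.6B-Base",
    AudioPath="/path/to/reference.wav",
    Options=["ref_text:Transcript of the reference recording."],
))
assert load.success, load.message

tts = stub.TTS(backend_pb2.TTSRequest(
    text="Hello from the cloned voice.",
    language="English",
    dst="/tmp/qwen3_tts_out.wav",
))
assert tts.success, tts.message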

View File

@@ -0,0 +1,13 @@
#!/bin/bash
set -e
EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
installRequirements

View File

@@ -0,0 +1,4 @@
--extra-index-url https://download.pytorch.org/whl/cu121
torch
torchaudio
faster-qwen3-tts

View File

@@ -0,0 +1,4 @@
--extra-index-url https://download.pytorch.org/whl/cu130
torch
torchaudio
faster-qwen3-tts

View File

@@ -0,0 +1,4 @@
--extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu129/
torch
torchaudio
faster-qwen3-tts

View File

@@ -0,0 +1,4 @@
--extra-index-url https://download.pytorch.org/whl/cu130
torch
torchaudio
faster-qwen3-tts

View File

@@ -0,0 +1,8 @@
grpcio==1.71.0
protobuf
certifi
packaging==24.1
soundfile
setuptools
six
sox

View File

@@ -0,0 +1,9 @@
#!/bin/bash
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
startBackend $@

View File

@@ -0,0 +1,104 @@
"""
Tests for the faster-qwen3-tts gRPC backend.
"""
import unittest
import subprocess
import time
import os
import sys
import tempfile
import backend_pb2
import backend_pb2_grpc
import grpc
class TestBackendServicer(unittest.TestCase):
def setUp(self):
self.service = subprocess.Popen(
["python3", "backend.py", "--addr", "localhost:50052"],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
time.sleep(15)
def tearDown(self):
self.service.terminate()
try:
self.service.communicate(timeout=5)
except subprocess.TimeoutExpired:
self.service.kill()
self.service.communicate()
def test_health(self):
with grpc.insecure_channel("localhost:50052") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
reply = stub.Health(backend_pb2.HealthMessage(), timeout=5.0)
self.assertEqual(reply.message, b"OK")
def test_load_model_requires_cuda(self):
with grpc.insecure_channel("localhost:50052") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.LoadModel(
backend_pb2.ModelOptions(
Model="Qwen/Qwen3-TTS-12Hz-0.6B-Base",
CUDA=True,
),
timeout=10.0,
)
self.assertFalse(response.success)
@unittest.skipUnless(
__import__("torch").cuda.is_available(),
"faster-qwen3-tts TTS requires CUDA",
)
def test_tts(self):
import soundfile as sf
try:
with grpc.insecure_channel("localhost:50052") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
ref_audio = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
ref_audio.close()
try:
sr = 22050
duration = 1.0
samples = int(sr * duration)
sf.write(ref_audio.name, [0.0] * samples, sr)
response = stub.LoadModel(
backend_pb2.ModelOptions(
Model="Qwen/Qwen3-TTS-12Hz-0.6B-Base",
AudioPath=ref_audio.name,
Options=["ref_text:Hello world"],
),
timeout=600.0,
)
self.assertTrue(response.success, response.message)
with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as out:
output_path = out.name
try:
tts_response = stub.TTS(
backend_pb2.TTSRequest(
text="Test output.",
dst=output_path,
language="English",
),
timeout=120.0,
)
self.assertTrue(tts_response.success, tts_response.message)
self.assertTrue(os.path.exists(output_path))
self.assertGreater(os.path.getsize(output_path), 0)
finally:
if os.path.exists(output_path):
os.unlink(output_path)
finally:
if os.path.exists(ref_audio.name):
os.unlink(ref_audio.name)
except Exception as err:
self.fail(f"TTS test failed: {err}")
if __name__ == "__main__":
unittest.main()

View File

@@ -0,0 +1,11 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
runUnittests

View File

@@ -0,0 +1,8 @@
torch==2.7.1
faster-whisper
opencv-python
accelerate
compel
peft
sentencepiece
optimum-quanto

View File

@@ -0,0 +1,5 @@
grpcio==1.71.0
protobuf
certifi
packaging==24.1
https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl

View File

@@ -0,0 +1,5 @@
torch==2.7.1
transformers
accelerate
kokoro
soundfile

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-audio
mlx[cpu]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-audio
mlx[cuda12]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-audio
mlx[cuda13]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-audio
mlx[cuda12]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-audio
mlx[cuda13]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
mlx[cpu]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
mlx[cuda12]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
mlx[cuda13]

View File

@@ -0,0 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
mlx[cuda12]

Some files were not shown because too many files have changed in this diff.