Compare commits

...

53 Commits

Author SHA1 Message Date
Ettore Di Giacinto
8fb95686af chore(model gallery): add ibm-granite_granite-4.0-micro (#6376)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-03 10:03:34 +02:00
Ettore Di Giacinto
4132085c01 chore(model gallery): add ibm-granite_granite-4.0-h-micro (#6375)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-03 09:32:20 +02:00
Ettore Di Giacinto
c14f1ffcfd chore(model gallery): add ibm-granite_granite-4.0-h-tiny (#6374)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-03 09:31:00 +02:00
Ettore Di Giacinto
07cca4b69a chore(model gallery): add ibm-granite_granite-4.0-h-small (#6373)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-03 09:28:57 +02:00
LocalAI [bot]
dd927c36f6 chore: ⬆️ Update ggml-org/llama.cpp to d64c8104f090b27b1f99e8da5995ffcfa6b726e2 (#6371)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-02 21:09:00 +00:00
LocalAI [bot]
052f42e926 chore: ⬆️ Update ggml-org/llama.cpp to 1fe4e38cc20af058ed320bd46cac934991190056 (#6368)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-10-02 16:29:57 +02:00
LocalAI [bot]
30d43588ab chore: ⬆️ Update ggml-org/whisper.cpp to 7849aff7a2e1f4234aa31b01a1870906d5431959 (#6367)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-01 21:15:28 +00:00
LocalAI [bot]
d21ec22f74 chore: ⬆️ Update ggml-org/whisper.cpp to 8c0855fd6bb115e113c0dca6255ea05f774d35f7 (#6365)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-01 12:12:27 +02:00
LocalAI [bot]
04fecd634a chore: ⬆️ Update ggml-org/llama.cpp to b2ba81dbe07b6dbea9c96b13346c66973dede32c (#6366)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-30 21:13:23 +00:00
LocalAI [bot]
33c14198db chore: ⬆️ Update ggml-org/llama.cpp to 5f7e166cbf7b9ca928c7fad990098ef32358ac75 (#6355)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-30 14:41:16 +02:00
LocalAI [bot]
967c2727e3 chore: ⬆️ Update ggml-org/whisper.cpp to 32be14f8ebfc0498c2c619182f0d7f4c822d52c4 (#6354)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-30 14:40:59 +02:00
dependabot[bot]
f41f30ad92 chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/exllama2 (#6356)
chore(deps): bump grpcio in /backend/python/exllama2

Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 14:40:41 +02:00
dependabot[bot]
e77340e8a5 chore(deps): bump grpcio from 1.75.0 to 1.75.1 in /backend/python/transformers (#6362)
chore(deps): bump grpcio in /backend/python/transformers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.75.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.75.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 14:40:29 +02:00
dependabot[bot]
d51a3090f7 chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/bark (#6359)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 14:40:16 +02:00
dependabot[bot]
1bf3bc932c chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/vllm (#6357)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 14:40:02 +02:00
dependabot[bot]
564a47da4e chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/common/template (#6358)
chore(deps): bump grpcio in /backend/python/common/template

Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 08:52:36 +02:00
dependabot[bot]
c37ee93ff2 chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/rerankers (#6360)
chore(deps): bump grpcio in /backend/python/rerankers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 08:52:25 +02:00
dependabot[bot]
f4b65db4e7 chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/diffusers (#6361)
chore(deps): bump grpcio in /backend/python/diffusers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 08:52:11 +02:00
Ettore Di Giacinto
f5fa8e6649 Revert "chore(deps): bump transformers from 4.48.3 to 4.56.2 in /backend/python/coqui" (#6363)
Revert "chore(deps): bump transformers from 4.48.3 to 4.56.2 in /backend/pyth…"

This reverts commit 570e39bdcf.
2025-09-30 08:51:49 +02:00
dependabot[bot]
570e39bdcf chore(deps): bump transformers from 4.48.3 to 4.56.2 in /backend/python/coqui (#6330)
chore(deps): bump transformers in /backend/python/coqui

Bumps [transformers](https://github.com/huggingface/transformers) from 4.48.3 to 4.56.2.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.48.3...v4.56.2)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.56.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-29 21:30:29 +00:00
dependabot[bot]
2ebe37b671 chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/coqui (#6353)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-29 20:11:55 +00:00
LocalAI [bot]
dca685f784 chore: ⬆️ Update ggml-org/llama.cpp to bd0af02fc96c2057726f33c0f0daf7bb8f3e462a (#6352)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-28 21:08:50 +00:00
LocalAI [bot]
84ebf2a2c9 chore: ⬆️ Update ggml-org/llama.cpp to 4807e8f96a61b2adccebd5e57444c94d18de7264 (#6350)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-28 00:33:46 +02:00
Ettore Di Giacinto
ce5662ba90 chore(deps): bump llama.cpp to '72b24d96c6888c609d562779a23787304ae4609c' (#6349)
* chore(deps): bump llama.cpp to '72b24d96c6888c609d562779a23787304ae4609c'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable OPENSSL (just introduced upstream)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-27 13:55:51 +02:00
Ettore Di Giacinto
9878f27813 chore(deps): bump llama.cpp to '835b2b915c52bcabcd688d025eacff9a07b65f52' (#6347)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-26 23:26:14 +02:00
jongames
f2b9452ec4 fix: reranking models limited to 512 tokens in llama.cpp backend (#6344)
Fix reranking models being limited to 512 tokens input in llama.cpp backend

Signed-off-by: JonGames <18472148+jongames@users.noreply.github.com>
2025-09-25 23:32:07 +00:00
Ettore Di Giacinto
585da99c52 chore(models): add whisper-turbo via whisper.cpp (#6340)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-25 09:15:06 +02:00
Ettore Di Giacinto
fd4f432079 CI: disable build-testing on PRs against arm64 (#6341)
CI: disable testing on PRs against arm64

Removed configuration for cublas and arm64 platform.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-09-25 09:14:50 +02:00
LocalAI [bot]
238c68c57b chore: ⬆️ Update ggml-org/llama.cpp to 4ae88d07d026e66b41e85afece74e88af54f4e66 (#6339)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-25 08:47:02 +02:00
Ettore Di Giacinto
04fbf5cb82 Change build type and update tag suffix in backend.yml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-09-24 22:08:29 +02:00
Ettore Di Giacinto
c85d559919 feat(chatterbox): support multilingual (#6240)
* feat(chatterbox): support multilingual

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add l4t support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: switch to fork

Until https://github.com/resemble-ai/chatterbox/pull/295 is merged

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-24 18:37:37 +02:00
Ettore Di Giacinto
b5efc4f89e chore(cudss): add cudds to l4t images (#6338)
* chore(cudds): add cudds to l4t images

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add arm64 to CI tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-24 16:46:24 +02:00
Ettore Di Giacinto
3f9c09a4c5 chore(model gallery): add qwen-image-edit-2509 (#6336)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-24 10:05:03 +02:00
dependabot[bot]
4a84660475 chore(deps): bump securego/gosec from 2.22.8 to 2.22.9 (#6324)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.8 to 2.22.9.
- [Release notes](https://github.com/securego/gosec/releases)
- [Changelog](https://github.com/securego/gosec/blob/master/.goreleaser.yml)
- [Commits](https://github.com/securego/gosec/compare/v2.22.8...v2.22.9)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-23 08:26:50 +02:00
LocalAI [bot]
737248256e chore: ⬆️ Update ggml-org/llama.cpp to 1d0125bcf1cbd7195ad0faf826a20bc7cec7d3f4 (#6335)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-22 21:13:34 +00:00
dependabot[bot]
0ae334fc62 chore(deps): bump grpcio from 1.74.0 to 1.75.0 in /backend/python/transformers (#6332)
chore(deps): bump grpcio in /backend/python/transformers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.74.0 to 1.75.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.74.0...v1.75.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.75.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-22 19:53:35 +00:00
Ettore Di Giacinto
36c373b7c9 feat(kokoro): add support for l4t devices (#6322)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-22 10:33:26 +02:00
LocalAI [bot]
6afcb932b7 chore: ⬆️ Update ggml-org/llama.cpp to da30ab5f8696cabb2d4620cdc0aa41a298c54fd6 (#6321)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-21 21:28:27 +00:00
LocalAI [bot]
357bf571a3 docs: ⬆️ update docs version mudler/LocalAI (#6318)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-21 08:40:00 +02:00
LocalAI [bot]
e74ade9ebb chore: ⬆️ Update ggml-org/llama.cpp to 7f766929ca8e8e01dcceb1c526ee584f7e5e1408 (#6319)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-20 21:05:25 +00:00
LocalAI [bot]
f7f26b8efa docs: ⬆️ update docs version mudler/LocalAI (#6315)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-20 09:41:58 +02:00
LocalAI [bot]
75eb98f8bd chore: ⬆️ Update ggml-org/llama.cpp to f432d8d83e7407073634c5e4fd81a3d23a10827f (#6316)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-20 09:41:45 +02:00
LocalAI [bot]
c337e7baf7 chore: ⬆️ Update ggml-org/whisper.cpp to 44fa2f647cf2a6953493b21ab83b50d5f5dbc483 (#6317)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-19 21:14:10 +00:00
Ettore Di Giacinto
660bd45be8 fix(python): make option check uniform across backends (#6314)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-19 19:56:08 +02:00
Ettore Di Giacinto
c27da0a0f6 fix(diffusers): fix float detection (#6313)
There was apparently an oversight, this fixes the float/int detection

Fixes: https://github.com/mudler/LocalAI/issues/6312

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-09-19 19:09:04 +02:00
Ettore Di Giacinto
ac043ed9ba chore(model gallery): add aquif-3.5-a4b-think (#6311)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-19 11:16:50 +02:00
Ettore Di Giacinto
2e0d66a1c8 chore(model gallery): add impish_qwen_14b-1m (#6310)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-19 10:57:33 +02:00
Ettore Di Giacinto
41a0f361eb chore(model gallery): add mistralai_magistral-small-2509 (#6309)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-19 10:48:13 +02:00
LocalAI [bot]
d3c5c02837 docs: ⬆️ update docs version mudler/LocalAI (#6307)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-18 23:48:02 +02:00
LocalAI [bot]
ae3d8fb0c4 chore: ⬆️ Update ggml-org/llama.cpp to 3edd87cd055a45d885fa914d879d36d33ecfc3e1 (#6308)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-18 21:09:14 +00:00
LocalAI [bot]
902e47f0b0 chore: ⬆️ Update ggml-org/llama.cpp to 0320ac5264279d74f8ee91bafa6c90e9ab9bbb91 (#6306)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-18 09:27:18 +02:00
Ettore Di Giacinto
50bb78fd24 Add permissions for issues and actions
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-09-18 09:26:10 +02:00
LocalAI [bot]
542f07ab2d docs: ⬆️ update docs version mudler/LocalAI (#6305)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-17 21:06:50 +00:00
34 changed files with 487 additions and 105 deletions

View File

@@ -489,6 +489,18 @@ jobs:
backend: "diffusers"
dockerfile: "./backend/Dockerfile.python"
context: "./backend"
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-l4t-kokoro'
runs-on: 'ubuntu-24.04-arm'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
skip-drivers: 'true'
backend: "kokoro"
dockerfile: "./backend/Dockerfile.python"
context: "./backend"
# SYCL additional backends
- build-type: 'intel'
cuda-major-version: ""
@@ -870,7 +882,7 @@ jobs:
backend: "rfdetr"
dockerfile: "./backend/Dockerfile.python"
context: "./backend"
- build-type: 'cublas'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
@@ -943,6 +955,18 @@ jobs:
backend: "exllama2"
dockerfile: "./backend/Dockerfile.python"
context: "./backend"
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'true'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-l4t-arm64-chatterbox'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "chatterbox"
dockerfile: "./backend/Dockerfile.python"
context: "./backend"
# runs out of space on the runner
# - build-type: 'hipblas'
# cuda-major-version: ""

View File

@@ -6,7 +6,8 @@ permissions:
contents: write
pull-requests: write
packages: read
issues: write # for Homebrew/actions/post-comment
actions: write # to dispatch publish workflow
jobs:
dependabot:
runs-on: ubuntu-latest

View File

@@ -18,7 +18,7 @@ jobs:
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}
uses: securego/gosec@v2.22.8
uses: securego/gosec@v2.22.9
with:
# we let the report trigger content trigger a failure using the GitHub Security features.
args: '-no-fail -fmt sarif -out results.sarif ./...'

View File

@@ -78,6 +78,16 @@ RUN <<EOT bash
fi
EOT
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu2204-0.6.0_0.6.0-1_arm64.deb && \
dpkg -i cudss-local-tegra-repo-ubuntu2204-0.6.0_0.6.0-1_arm64.deb && \
cp /var/cudss-local-tegra-repo-ubuntu2204-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get -y install cudss
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \

View File

@@ -429,6 +429,9 @@ docker-build-kitten-tts:
docker-save-kitten-tts: backend-images
docker save local-ai-backend:kitten-tts -o backend-images/kitten-tts.tar
docker-save-chatterbox: backend-images
docker save local-ai-backend:chatterbox -o backend-images/chatterbox.tar
docker-build-kokoro:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:kokoro -f backend/Dockerfile.python --build-arg BACKEND=kokoro ./backend

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=8ff206097c2bf3ca1c7aa95f9d6db779fc7bdd68
LLAMA_VERSION?=d64c8104f090b27b1f99e8da5995ffcfa6b726e2
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=
@@ -14,7 +14,7 @@ CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
CMAKE_ARGS+=-DGGML_NATIVE=OFF -DLLAMA_OPENSSL=OFF
endif
# If build type is cublas, then we set -DGGML_CUDA=ON to CMAKE_ARGS automatically
ifeq ($(BUILD_TYPE),cublas)

View File

@@ -231,6 +231,7 @@ static void params_parse(const backend::ModelOptions* request,
params.cpuparams.n_threads = request->threads();
params.n_gpu_layers = request->ngpulayers();
params.n_batch = request->nbatch();
params.n_ubatch = request->nbatch(); // fixes issue with reranking models being limited to 512 tokens (the default n_ubatch size); allows for setting the maximum input amount of tokens thereby avoiding this error "input is too large to process. increase the physical batch size"
// Set params.n_parallel by environment variable (LLAMA_PARALLEL), defaults to 1
//params.n_parallel = 1;
const char *env_parallel = std::getenv("LLAMACPP_PARALLEL");
@@ -801,11 +802,6 @@ public:
return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT, "\"documents\" must be a non-empty string array");
}
// Tokenize the query
auto tokenized_query = tokenize_input_prompts(ctx_server.vocab, ctx_server.mctx, request->query(), /* add_special */ false, true);
if (tokenized_query.size() != 1) {
return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT, "\"query\" must contain only a single prompt");
}
// Create and queue the task
json responses = json::array();
bool error = false;
@@ -817,10 +813,9 @@ public:
documents.push_back(request->documents(i));
}
auto tokenized_docs = tokenize_input_prompts(ctx_server.vocab, ctx_server.mctx, documents, /* add_special */ false, true);
tasks.reserve(tokenized_docs.size());
for (size_t i = 0; i < tokenized_docs.size(); i++) {
auto tmp = format_rerank(ctx_server.vocab, tokenized_query[0], tokenized_docs[i]);
tasks.reserve(documents.size());
for (size_t i = 0; i < documents.size(); i++) {
auto tmp = format_rerank(ctx_server.model, ctx_server.vocab, ctx_server.mctx, request->query(), documents[i]);
server_task task = server_task(SERVER_TASK_TYPE_RERANK);
task.id = ctx_server.queue_tasks.get_new_id();
task.index = i;

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=edea8a9c3cf0eb7676dcdb604991eb2f95c3d984
WHISPER_CPP_VERSION?=7849aff7a2e1f4234aa31b01a1870906d5431959
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -270,6 +270,7 @@
nvidia: "cuda12-kokoro"
intel: "intel-kokoro"
amd: "rocm-kokoro"
nvidia-l4t: "nvidia-l4t-kokoro"
- &coqui
urls:
- https://github.com/idiap/coqui-ai-TTS
@@ -352,6 +353,7 @@
nvidia: "cuda12-chatterbox"
metal: "metal-chatterbox"
default: "cpu-chatterbox"
nvidia-l4t: "nvidia-l4t-arm64-chatterbox"
- &piper
name: "piper"
uri: "quay.io/go-skynet/local-ai-backends:latest-piper"
@@ -1049,6 +1051,7 @@
nvidia: "cuda12-kokoro-development"
intel: "intel-kokoro-development"
amd: "rocm-kokoro-development"
nvidia-l4t: "nvidia-l4t-kokoro-development"
- !!merge <<: *kokoro
name: "cuda11-kokoro-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-kokoro"
@@ -1074,6 +1077,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-kokoro"
mirrors:
- localai/localai-backends:master-gpu-intel-kokoro
- !!merge <<: *kokoro
name: "nvidia-l4t-kokoro"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-l4t-kokoro"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-l4t-kokoro
- !!merge <<: *kokoro
name: "nvidia-l4t-kokoro-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-l4t-kokoro"
mirrors:
- localai/localai-backends:master-gpu-nvidia-l4t-kokoro
- !!merge <<: *kokoro
name: "cuda11-kokoro"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-kokoro"
@@ -1227,6 +1240,7 @@
nvidia: "cuda12-chatterbox-development"
metal: "metal-chatterbox-development"
default: "cpu-chatterbox-development"
nvidia-l4t: "nvidia-l4t-arm64-chatterbox"
- !!merge <<: *chatterbox
name: "cpu-chatterbox"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-chatterbox"
@@ -1237,6 +1251,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-chatterbox"
mirrors:
- localai/localai-backends:master-cpu-chatterbox
- !!merge <<: *chatterbox
name: "nvidia-l4t-arm64-chatterbox"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-l4t-arm64-chatterbox"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-l4t-arm64-chatterbox
- !!merge <<: *chatterbox
name: "nvidia-l4t-arm64-chatterbox-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-l4t-arm64-chatterbox"
mirrors:
- localai/localai-backends:master-gpu-nvidia-l4t-arm64-chatterbox
- !!merge <<: *chatterbox
name: "metal-chatterbox"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-chatterbox"

View File

@@ -1,4 +1,4 @@
bark==0.1.5
grpcio==1.74.0
grpcio==1.75.1
protobuf
certifi

View File

@@ -14,9 +14,23 @@ import backend_pb2_grpc
import torch
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
import grpc
def is_float(s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def is_int(s):
"""Check if a string can be converted to int."""
try:
int(s)
return True
except ValueError:
return False
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
@@ -47,6 +61,28 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if not torch.cuda.is_available() and request.CUDA:
return backend_pb2.Result(success=False, message="CUDA is not available")
options = request.Options
# empty dict
self.options = {}
# The options are a list of strings in this form optname:optvalue
# We are storing all the options in a dict so we can use it later when
# generating the images
for opt in options:
if ":" not in opt:
continue
key, value = opt.split(":")
# if value is a number, convert it to the appropriate type
if is_float(value):
value = float(value)
elif is_int(value):
value = int(value)
elif value.lower() in ["true", "false"]:
value = value.lower() == "true"
self.options[key] = value
self.AudioPath = None
if os.path.isabs(request.AudioPath):
@@ -56,10 +92,14 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
modelFileBase = os.path.dirname(request.ModelFile)
# modify LoraAdapter to be relative to modelFileBase
self.AudioPath = os.path.join(modelFileBase, request.AudioPath)
try:
print("Preparing models, please wait", file=sys.stderr)
self.model = ChatterboxTTS.from_pretrained(device=device)
if "multilingual" in self.options:
# remove key from options
del self.options["multilingual"]
self.model = ChatterboxMultilingualTTS.from_pretrained(device=device)
else:
self.model = ChatterboxTTS.from_pretrained(device=device)
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
# Implement your logic here for the LoadModel service
@@ -68,12 +108,18 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def TTS(self, request, context):
try:
# Generate audio using ChatterboxTTS
kwargs = {}
if "language" in self.options:
kwargs["language_id"] = self.options["language"]
if self.AudioPath is not None:
wav = self.model.generate(request.text, audio_prompt_path=self.AudioPath)
else:
wav = self.model.generate(request.text)
kwargs["audio_prompt_path"] = self.AudioPath
# add options to kwargs
kwargs.update(self.options)
# Generate audio using ChatterboxTTS
wav = self.model.generate(request.text, **kwargs)
# Save the generated audio
ta.save(request.dst, wav, self.model.sr)

View File

@@ -15,5 +15,6 @@ fi
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
EXTRA_PIP_INSTALL_FLAGS+=" --no-build-isolation"
installRequirements

View File

@@ -1,6 +1,8 @@
--extra-index-url https://download.pytorch.org/whl/cpu
accelerate
torch==2.6.0
torchaudio==2.6.0
transformers==4.46.3
chatterbox-tts==0.1.2
torch
torchaudio
transformers
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster
#chatterbox-tts==0.1.4

View File

@@ -2,5 +2,6 @@
torch==2.6.0+cu118
torchaudio==2.6.0+cu118
transformers==4.46.3
chatterbox-tts==0.1.2
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster
accelerate

View File

@@ -1,5 +1,6 @@
torch==2.6.0
torchaudio==2.6.0
transformers==4.46.3
chatterbox-tts==0.1.2
torch
torchaudio
transformers
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster
accelerate

View File

@@ -1,6 +1,7 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.6.0+rocm6.1
torchaudio==2.6.0+rocm6.1
transformers==4.46.3
chatterbox-tts==0.1.2
transformers
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster
accelerate

View File

@@ -2,8 +2,9 @@
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
torchaudio==2.3.1+cxx11.abi
transformers==4.46.3
chatterbox-tts==0.1.2
transformers
# https://github.com/mudler/LocalAI/pull/6240#issuecomment-3329518289
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster
accelerate
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]

View File

@@ -0,0 +1,6 @@
--extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu126/
torch
torchaudio
transformers
chatterbox-tts@git+https://git@github.com/mudler/chatterbox.git@faster
accelerate

View File

@@ -1,3 +1,3 @@
grpcio==1.74.0
grpcio==1.75.1
protobuf
grpcio-tools

View File

@@ -1,4 +1,4 @@
grpcio==1.74.0
grpcio==1.75.1
protobuf
certifi
packaging==24.1

View File

@@ -66,11 +66,20 @@ from diffusers.schedulers import (
)
def is_float(s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def is_int(s):
"""Check if a string can be converted to int."""
try:
int(s)
return True
except ValueError:
return False
# The scheduler list mapping was taken from here: https://github.com/neggles/animatediff-cli/blob/6f336f5f4b5e38e85d7f06f1744ef42d0a45f2a7/src/animatediff/schedulers.py#L39
# Credits to https://github.com/neggles
@@ -177,10 +186,11 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
key, value = opt.split(":")
# if value is a number, convert it to the appropriate type
if is_float(value):
if value.is_integer():
value = int(value)
else:
value = float(value)
value = float(value)
elif is_int(value):
value = int(value)
elif value.lower() in ["true", "false"]:
value = value.lower() == "true"
self.options[key] = value
# From options, extract if present "torch_dtype" and set it to the appropriate type

View File

@@ -1,5 +1,5 @@
setuptools
grpcio==1.74.0
grpcio==1.75.1
pillow
protobuf
certifi

View File

@@ -1,4 +1,4 @@
grpcio==1.74.0
grpcio==1.75.1
protobuf
certifi
wheel

View File

@@ -0,0 +1,7 @@
--extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu126/
torch
torchaudio
transformers
accelerate
kokoro
soundfile

View File

@@ -20,6 +20,21 @@ import soundfile as sf
import numpy as np
import uuid
def is_float(s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def is_int(s):
"""Check if a string can be converted to int."""
try:
int(s)
return True
except ValueError:
return False
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
@@ -32,14 +47,6 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
This backend provides TTS (Text-to-Speech) functionality using MLX-Audio.
"""
def _is_float(self, s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def Health(self, request, context):
"""
Returns a health check message.
@@ -80,11 +87,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
key, value = opt.split(":", 1) # Split only on first colon to handle values with colons
# Convert numeric values to appropriate types
if self._is_float(value):
if float(value).is_integer():
value = int(value)
else:
value = float(value)
if is_float(value):
value = float(value)
elif is_int(value):
value = int(value)
elif value.lower() in ["true", "false"]:
value = value.lower() == "true"

View File

@@ -21,6 +21,21 @@ import io
from PIL import Image
import tempfile
def is_float(s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def is_int(s):
"""Check if a string can be converted to int."""
try:
int(s)
return True
except ValueError:
return False
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
@@ -32,14 +47,6 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
A gRPC servicer that implements the Backend service defined in backend.proto.
"""
def _is_float(self, s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def Health(self, request, context):
"""
Returns a health check message.
@@ -79,12 +86,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
continue
key, value = opt.split(":", 1) # Split only on first colon to handle values with colons
# Convert numeric values to appropriate types
if self._is_float(value):
if float(value).is_integer():
value = int(value)
else:
value = float(value)
if is_float(value):
value = float(value)
elif is_int(value):
value = int(value)
elif value.lower() in ["true", "false"]:
value = value.lower() == "true"

View File

@@ -24,20 +24,27 @@ _ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
def is_float(s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def is_int(s):
"""Check if a string can be converted to int."""
try:
int(s)
return True
except ValueError:
return False
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
A gRPC servicer that implements the Backend service defined in backend.proto.
"""
def _is_float(self, s):
"""Check if a string can be converted to float."""
try:
float(s)
return True
except ValueError:
return False
def Health(self, request, context):
"""
Returns a health check message.
@@ -78,11 +85,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
key, value = opt.split(":", 1) # Split only on first colon to handle values with colons
# Convert numeric values to appropriate types
if self._is_float(value):
if float(value).is_integer():
value = int(value)
else:
value = float(value)
if is_float(value):
value = float(value)
elif is_int(value):
value = int(value)
elif value.lower() in ["true", "false"]:
value = value.lower() == "true"

View File

@@ -1,3 +1,3 @@
grpcio==1.74.0
grpcio==1.75.1
protobuf
certifi

View File

@@ -1,4 +1,4 @@
grpcio==1.74.0
grpcio==1.75.1
protobuf==6.32.0
certifi
setuptools

View File

@@ -1,4 +1,4 @@
grpcio==1.74.0
grpcio==1.75.1
protobuf
certifi
setuptools

View File

@@ -1,3 +1,3 @@
{
"version": "v3.5.0"
"version": "v3.5.4"
}

48
gallery/granite4.yaml Normal file
View File

@@ -0,0 +1,48 @@
---
name: "granite-3.2"
config_file: |
backend: "llama-cpp"
mmap: true
template:
chat_message: |
<|start_of_role|>{{ .RoleName }}<|end_of_role|>
{{ if .FunctionCall -}}
<tool_call>
{{ else if eq .RoleName "tool" -}}
<tool_response>
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if eq .RoleName "tool" -}}
</tool_response>
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
</tool_call>
{{ end -}}
<|end_of_text|>
function: |
<|start_of_role|>system<|end_of_role|>
You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.
Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
{{.Input -}}
<|start_of_role|>assistant<|end_of_role|>
chat: |
{{.Input -}}
<|start_of_role|>assistant<|end_of_role|>
completion: |
{{.Input}}
context_size: 8192
f16: true
stopwords:
- '<|im_end|>'
- '<dummy32000>'
- '</s>'
- '<|end_of_text|>'

View File

@@ -1,4 +1,68 @@
---
- &granite4
url: "github:mudler/LocalAI/gallery/granite4.yaml@master"
name: "ibm-granite_granite-4.0-h-small"
license: apache-2.0
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/639bcaa2445b133a4e942436/CEW-OjXkRkDNmTxSu8Egh.png
tags:
- gguf
- GPU
- CPU
- text-to-text
urls:
- https://huggingface.co/ibm-granite/granite-4.0-h-small
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-h-small-GGUF
description: |
Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-h-small-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-h-small-Q4_K_M.gguf
sha256: c59ce76239bd5794acdbdf88616dfc296247f4e78792a9678d4b3e24966ead69
uri: huggingface://bartowski/ibm-granite_granite-4.0-h-small-GGUF/ibm-granite_granite-4.0-h-small-Q4_K_M.gguf
- !!merge <<: *granite4
name: "ibm-granite_granite-4.0-h-tiny"
urls:
- https://huggingface.co/ibm-granite/granite-4.0-h-tiny
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-h-tiny-GGUF
description: |
Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-h-tiny-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-h-tiny-Q4_K_M.gguf
sha256: 33a689fe7f35b14ebab3ae599b65aaa3ed8548c393373b1b0eebee36c653146f
uri: huggingface://bartowski/ibm-granite_granite-4.0-h-tiny-GGUF/ibm-granite_granite-4.0-h-tiny-Q4_K_M.gguf
- !!merge <<: *granite4
name: "ibm-granite_granite-4.0-h-micro"
urls:
- https://huggingface.co/ibm-granite/granite-4.0-h-micro
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-h-micro-GGUF
description: |
Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-h-micro-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-h-micro-Q4_K_M.gguf
sha256: 48376d61449687a56b3811a418d92cc0e8e77b4d96ec13eb6c9d9503968c9f20
uri: huggingface://bartowski/ibm-granite_granite-4.0-h-micro-GGUF/ibm-granite_granite-4.0-h-micro-Q4_K_M.gguf
- !!merge <<: *granite4
name: "ibm-granite_granite-4.0-micro"
urls:
- https://huggingface.co/ibm-granite/granite-4.0-micro
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-micro-GGUF
description: |
Granite-4.0-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-micro-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-micro-Q4_K_M.gguf
sha256: bd9d7b4795b9dc44e3e81aeae93bb5d8e6b891b7e823be5bf9910ed3ac060baf
uri: huggingface://bartowski/ibm-granite_granite-4.0-micro-GGUF/ibm-granite_granite-4.0-micro-Q4_K_M.gguf
- &ernie
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "baidu_ernie-4.5-21b-a3b-thinking"
@@ -335,7 +399,7 @@
url: "github:mudler/LocalAI/gallery/qwen-image.yaml@master"
urls:
- https://huggingface.co/Qwen/Qwen-Image-Edit
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_logo.png
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png
license: apache-2.0
tags:
- qwen-image
@@ -350,6 +414,26 @@
cuda: true
pipeline_type: QwenImageEditPipeline
enable_parameters: num_inference_steps,image
- !!merge <<: *qwenimage
name: "qwen-image-edit-2509"
url: "github:mudler/LocalAI/gallery/qwen-image.yaml@master"
urls:
- https://huggingface.co/Qwen/Qwen-Image-Edit-2509
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png
license: apache-2.0
tags:
- qwen-image
- gpu
- image-to-image
description: |
Qwen-Image-Edit is a model for image editing, which is based on Qwen-Image.
overrides:
parameters:
model: Qwen/Qwen-Image-Edit-2509
diffusers:
cuda: true
pipeline_type: QwenImageEditPipeline
enable_parameters: num_inference_steps,image
- &gptoss
name: "gpt-oss-20b"
url: "github:mudler/LocalAI/gallery/harmony.yaml@master"
@@ -2638,6 +2722,39 @@
- filename: Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-Q4_K_M.gguf
sha256: 1afefb3b369ea2de191f24fe8ea22cbbb7b412357902f27bd81d693dde35c2d9
uri: huggingface://bartowski/Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-GGUF/Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-Q4_K_M.gguf
- !!merge <<: *qwen3
name: "impish_qwen_14b-1m"
icon: https://huggingface.co/SicariusSicariiStuff/Impish_QWEN_14B-1M/resolve/main/Images/Impish_Qwen_14B.png
urls:
- https://huggingface.co/SicariusSicariiStuff/Impish_QWEN_14B-1M
- https://huggingface.co/mradermacher/Impish_QWEN_14B-1M-GGUF
description: |
Supreme context One million tokens to play with.
Strong Roleplay internet RP format lovers will appriciate it, medium size paragraphs.
Qwen smarts built-in, but naughty and playful Maybe it's even too naughty.
VERY compliant with low censorship.
VERY high IFeval for a 14B RP model: 78.68.
overrides:
parameters:
model: Impish_QWEN_14B-1M.Q4_K_M.gguf
files:
- filename: Impish_QWEN_14B-1M.Q4_K_M.gguf
sha256: d326f2b8f05814ea3943c82498f0cd3cde64859cf03f532855c87fb94b0da79e
uri: huggingface://mradermacher/Impish_QWEN_14B-1M-GGUF/Impish_QWEN_14B-1M.Q4_K_M.gguf
- !!merge <<: *qwen3
name: "aquif-3.5-a4b-think"
urls:
- https://huggingface.co/aquif-ai/aquif-3.5-A4B-Think
- https://huggingface.co/QuantFactory/aquif-3.5-A4B-Think-GGUF
description: |
The aquif-3.5 series is the successor to aquif-3, featuring a simplified naming scheme, expanded Mixture of Experts (MoE) options, and across-the-board performance improvements. This release streamlines model selection while delivering enhanced capabilities across reasoning, multilingual support, and general intelligence tasks.
overrides:
parameters:
model: aquif-3.5-A4B-Think.Q4_K_M.gguf
files:
- filename: aquif-3.5-A4B-Think.Q4_K_M.gguf
sha256: 1650b72ae1acf12b45a702f2ff5f47205552e494f0d910e81cbe40dfba55a6b9
uri: huggingface://QuantFactory/aquif-3.5-A4B-Think-GGUF/aquif-3.5-A4B-Think.Q4_K_M.gguf
- &gemma3
url: "github:mudler/LocalAI/gallery/gemma.yaml@master"
name: "gemma-3-27b-it"
@@ -15175,6 +15292,27 @@
- filename: Impish_Longtail_12B-Q4_K_M.gguf
sha256: 2cf0cacb65d71cfc5b4255f3273ad245bbcb11956a0f9e3aaa0e739df57c90df
uri: huggingface://SicariusSicariiStuff/Impish_Longtail_12B_GGUF/Impish_Longtail_12B-Q4_K_M.gguf
- !!merge <<: *mistral03
name: "mistralai_magistral-small-2509"
urls:
- https://huggingface.co/mistralai/Magistral-Small-2509
- https://huggingface.co/bartowski/mistralai_Magistral-Small-2509-GGUF
description: |
Magistral Small 1.2
Building upon Mistral Small 3.2 (2506), with added reasoning capabilities, undergoing SFT from Magistral Medium traces and RL on top, it's a small, efficient reasoning model with 24B parameters.
Magistral Small can be deployed locally, fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.
Learn more about Magistral in our blog post.
The model was presented in the paper Magistral.
overrides:
parameters:
model: mistralai_Magistral-Small-2509-Q4_K_M.gguf
files:
- filename: mistralai_Magistral-Small-2509-Q4_K_M.gguf
sha256: 1d638bc931de30d29fc73ad439206ff185f76666a096e7ad723866a20f78728d
uri: huggingface://bartowski/mistralai_Magistral-Small-2509-GGUF/mistralai_Magistral-Small-2509-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
@@ -20336,9 +20474,9 @@
- https://huggingface.co/ggerganov/whisper.cpp
overrides:
parameters:
model: ggml-whisper-base.bin
model: ggml-base.bin
files:
- filename: "ggml-whisper-base.bin"
- filename: "ggml-base.bin"
sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
description: |
@@ -20383,11 +20521,20 @@
name: "whisper-large-q5_0"
overrides:
parameters:
model: ggml-large-q5_0.bin
model: ggml-large-v3-q5_0.bin
files:
- filename: "ggml-large-q5_0.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-large-q5_0.bin"
sha256: 3a214837221e4530dbc1fe8d734f302af393eb30bd0ed046042ebf4baf70f6f2
- filename: "ggml-large-v3-q5_0.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-large-v3-q5_0.bin"
sha256: d75795ecff3f83b5faa89d1900604ad8c780abd5739fae406de19f23ecd98ad1
- !!merge <<: *whisper
name: "whisper-medium"
overrides:
parameters:
model: ggml-medium.bin
files:
- filename: "ggml-medium.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-medium.bin"
sha256: 6c14d5adee5f86394037b4e4e8b59f1673b6cee10e3cf0b11bbdbee79c156208
- !!merge <<: *whisper
name: "whisper-medium-q5_0"
overrides:
@@ -20415,15 +20562,6 @@
- filename: "ggml-small.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-small.bin"
sha256: 1be3a9b2063867b937e64e2ec7483364a79917e157fa98c5d94b5c1fffea987b
- !!merge <<: *whisper
name: "whisper-small-en-tdrz"
overrides:
parameters:
model: ggml-small.en-tdrz.bin
files:
- filename: "ggml-small.bin"
uri: "huggingface://akashmjn/tinydiarize-whisper.cpp/ggml-small.en-tdrz.bin"
sha256: ceac3ec06d1d98ef71aec665283564631055fd6129b79d8e1be4f9cc33cc54b4
- !!merge <<: *whisper
name: "whisper-small-en-q5_1"
overrides:
@@ -20496,6 +20634,51 @@
- filename: "ggml-tiny.en-q8_0.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-tiny.en-q8_0.bin"
sha256: 5bc2b3860aa151a4c6e7bb095e1fcce7cf12c7b020ca08dcec0c6d018bb7dd94
- !!merge <<: *whisper
name: "whisper-large"
overrides:
parameters:
model: ggml-large-v3.bin
files:
- filename: "ggml-large-v3.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-large-v3.bin"
sha256: 64d182b440b98d5203c4f9bd541544d84c605196c4f7b845dfa11fb23594d1e2
- !!merge <<: *whisper
name: "whisper-large-q5_0"
overrides:
parameters:
model: ggml-large-v3-q5_0.bin
files:
- filename: "ggml-large-v3-q5_0.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-large-v3-q5_0.bin"
sha256: d75795ecff3f83b5faa89d1900604ad8c780abd5739fae406de19f23ecd98ad1
- !!merge <<: *whisper
name: "whisper-large-turbo"
overrides:
parameters:
model: ggml-large-v3-turbo.bin
files:
- filename: "ggml-large-v3-turbo.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-large-v3-turbo.bin"
sha256: 1fc70f774d38eb169993ac391eea357ef47c88757ef72ee5943879b7e8e2bc69
- !!merge <<: *whisper
name: "whisper-large-turbo-q5_0"
overrides:
parameters:
model: ggml-large-v3-turbo-q5_0.bin
files:
- filename: "ggml-large-v3-turbo-q5_0.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-large-v3-turbo-q5_0.bin"
sha256: 394221709cd5ad1f40c46e6031ca61bce88931e6e088c188294c6d5a55ffa7e2
- !!merge <<: *whisper
name: "whisper-large-turbo-q8_0"
overrides:
parameters:
model: ggml-large-v3-turbo-q8_0.bin
files:
- filename: "ggml-large-v3-turbo-q8_0.bin"
uri: "huggingface://ggerganov/whisper.cpp/ggml-large-v3-turbo-q8_0.bin"
sha256: 317eb69c11673c9de1e1f0d459b253999804ec71ac4c23c17ecf5fbe24e259a1
## Bert embeddings (llama3.2 drop-in)
- !!merge <<: *llama32
name: "bert-embeddings"

View File

@@ -95,6 +95,7 @@ var knownModelsNameSuffixToSkip []string = []string{
".DS_Store",
".",
".safetensors",
".bin",
".partial",
".tar.gz",
}