Compare commits

...

200 Commits

Author SHA1 Message Date
Ettore Di Giacinto
8fea82e68b wire to grpc
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 20:22:31 +02:00
Ettore Di Giacinto
01e2e3dbc3 wip reranking llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 19:52:02 +02:00
Ettore Di Giacinto
61cc76c455 chore(autogptq): drop archived backend (#5214)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 15:52:29 +02:00
Ettore Di Giacinto
8abecb4a18 chore: bump grpc limits to 50MB (#5212)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 08:53:24 +02:00
LocalAI [bot]
8b3f76d8e6 chore: ⬆️ Update ggml-org/llama.cpp to 6408210082cc0a61b992b487be7e2ff2efbb9e36 (#5211)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-18 21:45:48 +00:00
Ettore Di Giacinto
4e0497f1a6 chore(model gallery): add pictor-1338-qwenp-1.5b (#5208)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:47:23 +02:00
Ettore Di Giacinto
ba88c9f451 chore(ci): use gemma-3-12b-it for models notifications (twitter)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-18 10:38:36 +02:00
Ettore Di Giacinto
a598285825 chore(model gallery): add google-gemma-3-27b-it-qat-q4_0-small (#5207)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:35:48 +02:00
Ettore Di Giacinto
cb7a172897 chore(ci): use gemma-3-12b-it for models notifications
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-18 10:20:33 +02:00
Ettore Di Giacinto
771be28dfb ci: use gemma3 for notifications of releases
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-18 10:19:52 +02:00
Ettore Di Giacinto
7d6b3eb42d chore(model gallery): add readyart_amoral-fallen-omega-gemma3-12b (#5206)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:17:39 +02:00
Ettore Di Giacinto
0bb33fab55 chore(model gallery): add ibm-granite_granite-3.3-2b-instruct (#5205)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:15:05 +02:00
Ettore Di Giacinto
e3bf7f77f7 chore(model gallery): add ibm-granite_granite-3.3-8b-instruct (#5204)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 09:59:17 +02:00
LocalAI [bot]
bd1707d339 chore: ⬆️ Update ggml-org/llama.cpp to 2f74c354c0f752ed9aabf7d3a350e6edebd7e744 (#5203)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-17 21:52:12 +00:00
Ettore Di Giacinto
0474804541 fix(ci): remove duplicate entry
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 19:51:21 +02:00
Ettore Di Giacinto
72693b3917 feat(install.sh): allow to uninstall with --uninstall (#5202)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 16:32:23 +02:00
Florian Bachmann
a03b70010f fix(talk): Talk interface sends content-type headers to chatgpt (#5200)
Talk interface sends content-type headers to chatgpt

Signed-off-by: baflo <834350+baflo@users.noreply.github.com>
2025-04-17 15:02:11 +02:00
Ettore Di Giacinto
e3717e5c1a chore(model gallery): add qwen2.5-14b-instruct-1m (#5201)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 10:42:22 +02:00
Ettore Di Giacinto
c8f6858218 chore(ci): add latest images for core (#5198)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 10:00:18 +02:00
Ettore Di Giacinto
06d7cc43ae chore(model gallery): add dreamgen_lucid-v1-nemo (#5196)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 09:10:09 +02:00
Ettore Di Giacinto
f2147cb850 chore(model gallery): add thedrummer_rivermind-12b-v1 (#5195)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 09:02:54 +02:00
Ettore Di Giacinto
75bb9f4c28 chore(model gallery): add menlo_rezero-v0.1-llama-3.2-3b-it-grpo-250404 (#5194)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 09:00:11 +02:00
LocalAI [bot]
a2ef4b1e07 chore: ⬆️ Update ggml-org/llama.cpp to 015022bb53387baa8b23817ac03743705c7d472b (#5192)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-17 08:04:37 +02:00
LocalAI [bot]
161c9fe2db docs: ⬆️ update docs version mudler/LocalAI (#5191)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-16 22:13:49 +02:00
Ettore Di Giacinto
7547463f81 Update quickstart.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-16 08:48:55 +02:00
Gianluca Boiano
32e4dfd47b chore(model gallery): add suno-ai bark-cpp model (#5187)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-04-16 08:22:46 +02:00
Gianluca Boiano
f67e5dec68 fix: bark-cpp: assign FLAG_TTS to bark-cpp backend (#5186)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-04-16 08:21:30 +02:00
LocalAI [bot]
297d54acea chore: ⬆️ Update ggml-org/llama.cpp to 80f19b41869728eeb6a26569957b92a773a2b2c6 (#5183)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-15 22:50:32 +00:00
Ettore Di Giacinto
56f44d448c chore(docs): decrease logo size, minor enhancements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-15 22:00:51 +02:00
Richard Palethorpe
0f0fafacd9 fix(stablediffusion): Avoid overwriting SYCL specific flags from outer make call (#5181)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-15 19:31:25 +02:00
Ettore Di Giacinto
4f239bac89 feat: rebrand - LocalAGI and LocalRecall joins the LocalAI stack family (#5159)
* wip

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update lotusdocs and hugo

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* rephrasing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Latest fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adjust readme section

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-15 17:51:24 +02:00
Ettore Di Giacinto
04d74ac648 chore(model gallery): add m1-32b (#5182)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-15 17:17:17 +02:00
Richard Palethorpe
18c3dc33ee fix(stablediffusion): Pass ROCM LD CGO flags through to recursive make (#5179)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-15 09:27:29 +02:00
LocalAI [bot]
508cfa7369 chore: ⬆️ Update ggml-org/llama.cpp to d6d2c2ab8c8865784ba9fef37f2b2de3f2134d33 (#5178)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-14 23:10:16 +02:00
Ettore Di Giacinto
1f94cddbae chore(model gallery): add nvidia_llama-3.1-8b-ultralong-4m-instruct (#5177)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:30:55 +02:00
Ettore Di Giacinto
21ae7b4cd4 chore(model gallery): add nvidia_llama-3.1-8b-ultralong-1m-instruct (#5176)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:28:09 +02:00
Ettore Di Giacinto
bef22ab547 chore(model gallery): add skywork_skywork-or1-32b-preview (#5175)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:25:43 +02:00
Ettore Di Giacinto
eb04e8cdcf chore(model gallery): add skywork_skywork-or1-math-7b (#5174)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:23:44 +02:00
Ettore Di Giacinto
17e533a086 chore(model gallery): add skywork_skywork-or1-7b-preview (#5173)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:20:20 +02:00
qwerty108109
4fc68409ff Update README.md (#5172)
Modified the README.md to separate out the different docker run commands to make it easier to copy into the terminal.

Signed-off-by: qwerty108109 <97707491+qwerty108109@users.noreply.github.com>
2025-04-14 10:48:10 +02:00
Richard Palethorpe
e587044449 fix(stablediffusion): Avoid GGML commit which causes CUDA compile error (#5170)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-14 09:29:09 +02:00
LocalAI [bot]
1f09db5161 chore: ⬆️ Update ggml-org/llama.cpp to 71e90e8813f90097701e62f7fce137d96ddf41e2 (#5171)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-13 21:46:07 +00:00
Ettore Di Giacinto
05b744f086 chore(model gallery): add daichi-12b (#5169)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 15:53:11 +02:00
Ettore Di Giacinto
89ca4bc02d chore(model gallery): add hamanasu-magnum-4b-i1 (#5168)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:37:59 +02:00
Ettore Di Giacinto
e626aa48a4 chore(model gallery): add hamanasu-adventure-4b-i1 (#5167)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:35:57 +02:00
Ettore Di Giacinto
752b5e0339 chore(model gallery): add mag-picaro-72b (#5166)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:34:14 +02:00
Ettore Di Giacinto
637d72d6e3 chore(model gallery): add lightthinker-qwen (#5165)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:31:05 +02:00
LocalAI [bot]
f3bfec580a chore: ⬆️ Update ggml-org/llama.cpp to bc091a4dc585af25c438c8473285a8cfec5c7695 (#5158)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-13 08:23:41 +00:00
Ettore Di Giacinto
165c1ddff3 chore(model gallery): add tesslate_gradience-t1-3b-preview (#5160)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-12 10:37:40 +02:00
Ettore Di Giacinto
fb83238e9e chore(model gallery): add zyphra_zr1-1.5b (#5157)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-11 10:06:05 +02:00
Ettore Di Giacinto
700bfa41c7 chore(model gallery): add agentica-org_deepcoder-1.5b-preview (#5156)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-11 10:03:59 +02:00
LocalAI [bot]
25bdc350df chore: ⬆️ Update ggml-org/llama.cpp to 64eda5deb9859e87a020e56bab5d2f9ca956f1de (#5155)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-10 21:44:55 +00:00
Richard Palethorpe
1b899e1a68 feat(stablediffusion): Enable SYCL (#5144)
* feat(sycl): Enable SYCL for stable diffusion

This is a pain because we compile with CGO, but SD is compiled with
CMake. I don't think we can easily use CMake to set the linker flags
necessary. Also I could not find pkg-config calls that would fully set
the flags, so some of them are set manually.

See https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
for reference. I also resorted to searching the shared object files in
MKLROOT/lib for the symbols.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ci): Don't set nproc on cmake

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-10 15:20:53 +02:00
Ettore Di Giacinto
3bf13f8c69 chore(model gallery): add soob3123_amoral-cogito-v1-preview-qwen-14b (#5154)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-10 10:07:56 +02:00
Ettore Di Giacinto
7a00729374 chore(model gallery): add trappu_magnum-picaro-0.7-v2-12b (#5153)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-10 10:03:42 +02:00
Ettore Di Giacinto
d484028532 feat(diffusers): add support for Lumina2Text2ImgPipeline (#4806)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-10 09:55:51 +02:00
LocalAI [bot]
0eb7fc2c41 chore: ⬆️ Update ggml-org/llama.cpp to d3bd7193ba66c15963fd1c59448f22019a8caf6e (#5152)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-09 22:01:25 +00:00
Ettore Di Giacinto
a69e30e0c9 chore(model gallery): add agentica-org_deepcoder-14b-preview (#5151)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:55:47 +02:00
Ettore Di Giacinto
9c018e6bff chore(model gallery): add deepcogito_cogito-v1-preview-llama-70b (#5150)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:54:59 +02:00
Ettore Di Giacinto
281e818047 chore(model gallery): add deepcogito_cogito-v1-preview-llama-70b (#5150)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:53:28 +02:00
Ettore Di Giacinto
270f0e2157 chore(model gallery): add deepcogito_cogito-v1-preview-qwen-32b (#5149)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:48:15 +02:00
Ettore Di Giacinto
673e59e76c chore(model gallery): add deepcogito_cogito-v1-preview-llama-3b (#5148)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:42:53 +02:00
LocalAI [bot]
5a8a2adb44 chore: ⬆️ Update ggml-org/llama.cpp to b32efad2bc42460637c3a364c9554ea8217b3d7f (#5146)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-09 15:39:04 +02:00
Ettore Di Giacinto
a7317d23bf chore(model gallery): add deepcogito_cogito-v1-preview-llama-8b (#5147)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 10:02:09 +02:00
Ettore Di Giacinto
2bab9b5fe2 fix: fix gallery name for cogito-v1-preview-qwen-14B
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-08 22:15:32 +02:00
Ettore Di Giacinto
081be3ba7d chore(model gallery): add cogito-v1-preview-qwen-14b (#5145)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 22:04:14 +02:00
Ettore Di Giacinto
25e6f21322 chore(deps): bump llama.cpp to 4ccea213bc629c4eef7b520f7f6c59ce9bbdaca0 (#5143)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 11:26:06 +02:00
Ettore Di Giacinto
b4df1c9cf3 fix(gemma): improve prompt for tool calls (#5142)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 10:12:42 +02:00
Ettore Di Giacinto
4fbd6609f2 chore(model gallery): add meta-llama_llama-4-scout-17b-16e-instruct (#5141)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 10:12:28 +02:00
Ettore Di Giacinto
7387932f89 chore(model gallery): add mensa-beta-14b-instruct-i1 (#5140)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 10:01:24 +02:00
Ettore Di Giacinto
59c37e67b2 chore(model gallery): add eurydice-24b-v2-i1 (#5139)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 09:56:29 +02:00
Ettore Di Giacinto
c09d227647 chore(model gallery): add watt-ai_watt-tool-70b (#5138)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 09:42:49 +02:00
Ettore Di Giacinto
547d322b28 chore(model gallery): add arliai_qwq-32b-arliai-rpr-v (#5137)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 09:40:26 +02:00
dependabot[bot]
a6f0bb410f chore(deps): bump securego/gosec from 2.22.0 to 2.22.3 (#5134)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.0 to 2.22.3.
- [Release notes](https://github.com/securego/gosec/releases)
- [Changelog](https://github.com/securego/gosec/blob/master/.goreleaser.yml)
- [Commits](https://github.com/securego/gosec/compare/v2.22.0...v2.22.3)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-07 21:09:45 +00:00
Ettore Di Giacinto
710f624ecd fix(webui): improve model display, do not block view (#5133)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-07 18:03:25 +02:00
LocalAI [bot]
5018452be7 chore: ⬆️ Update ggml-org/llama.cpp to 916c83bfe7f8b08ada609c3b8e583cf5301e594b (#5130)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-06 21:51:51 +00:00
Ettore Di Giacinto
ece239966f chore: ⬆️ Update ggml-org/llama.cpp to 6bf28f0111ff9f21b3c1b1eace20c590281e7ba6 (#5127)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 14:01:51 +02:00
Ettore Di Giacinto
3b8bc7e64c chore(model gallery): add open-thoughts_openthinker2-7b (#5129)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 10:53:22 +02:00
Ettore Di Giacinto
fc73b2b430 chore(model gallery): add open-thoughts_openthinker2-32b (#5128)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 10:48:21 +02:00
Ettore Di Giacinto
901dba6063 chore(model gallery): add gemma-3-27b-it-qat (#5124)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-05 08:46:49 +02:00
LocalAI [bot]
b88a7a4550 chore: ⬆️ Update ggml-org/llama.cpp to 3e1d29348b5d77269f6931500dd1c1a729d429c8 (#5123)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-04 21:49:53 +00:00
Ettore Di Giacinto
106e40845f chore(model gallery): add katanemo_arch-function-chat-3b (#5122) 2025-04-04 10:45:44 +02:00
Ettore Di Giacinto
0064bec8f5 chore(model gallery): add katanemo_arch-function-chat-1.5b (#5121) 2025-04-04 10:31:44 +02:00
Ettore Di Giacinto
9e6dbb0b5a chore(model gallery): add katanemo_arch-function-chat-7b (#5120)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:29:47 +02:00
Ettore Di Giacinto
d26e61388b chore(model gallery): add tesslate_synthia-s1-27b (#5119)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:27:52 +02:00
Ettore Di Giacinto
31a7084c75 chore(model gallery): add gemma-3-4b-it-qat (#5118)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:23:56 +02:00
Ettore Di Giacinto
128612a6fc chore(model gallery): add gemma-3-12b-it-qat (#5117)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:21:45 +02:00
LocalAI [bot]
6af3f46bc3 chore: ⬆️ Update ggml-org/llama.cpp to c262beddf29f3f3be5bbbf167b56029a19876956 (#5116)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-03 22:59:49 +00:00
Richard Palethorpe
d2cf8ef070 fix(sycl): kernel not found error by forcing -fsycl (#5115)
* chore(sycl): Update oneapi to 2025:1

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(sycl): Pass -fsycl flag as workaround

-fsycl should be set by llama.cpp's cmake file, but something goes wrong
and it doesn't appear to get added

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(build): Speed up llama build by using all CPUs

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-03 16:22:59 +02:00
Ettore Di Giacinto
259ad3cfe6 chore(model gallery): add all-hands_openhands-lm-1.5b-v0.1 (#5114)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:25:46 +02:00
Ettore Di Giacinto
18b320d577 chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c (#5110)
chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:23:14 +02:00
Ettore Di Giacinto
89e151f035 chore(model gallery): add all-hands_openhands-lm-7b-v0.1 (#5113)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:20:20 +02:00
Ettore Di Giacinto
22060f6410 chore(model gallery): add burtenshaw_gemmacoder3-12b (#5112)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:17:57 +02:00
Ettore Di Giacinto
7ee3288460 chore(model gallery): add all-hands_openhands-lm-32b-v0.1 (#5111)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:15:57 +02:00
LocalAI [bot]
cbbc954a8c chore: ⬆️ Update ggml-org/llama.cpp to f423981ac806bf031d83784bcb47d2721bc70f97 (#5108)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-02 09:22:53 +02:00
Ettore Di Giacinto
2c425e9c69 feat(loader): enhance single active backend by treating as singleton (#5107)
feat(loader): enhance single active backend by treating at singleton

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-01 20:58:11 +02:00
LocalAI [bot]
c59975ab05 chore: ⬆️ Update ggml-org/llama.cpp to c80a7759dab10657b9b6c3e87eef988a133b9b6a (#5105)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-01 00:01:34 +02:00
Ettore Di Giacinto
05f7004487 fix: race during stop of active backends (#5106)
* chore: drop double call to stop all backends, refactors

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: do lock when cycling to models to delete

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-01 00:01:10 +02:00
Ettore Di Giacinto
2f9203cd2a chore: drop remoteLibraryURL from kong vars (#5103)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-31 22:48:17 +02:00
LocalAI [bot]
f09b33f2ef docs: ⬆️ update docs version mudler/LocalAI (#5104)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-31 22:48:03 +02:00
Ettore Di Giacinto
65470b0ab1 Update README 2025-03-31 21:51:09 +02:00
Ettore Di Giacinto
9a23fe662b Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-03-31 19:35:34 +02:00
LocalAI [bot]
6d7ac09e96 chore: ⬆️ Update ggml-org/llama.cpp to 4663bd353c61c1136cd8a97b9908755e4ab30cec (#5100)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-30 21:59:30 +00:00
Ettore Di Giacinto
c2a39e3639 fix(llama.cpp): properly handle sigterm (#5099)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-30 18:08:29 +02:00
Ettore Di Giacinto
ae625a4d00 chore(model gallery): add hammer2.0-7b (#5098)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-30 09:50:21 +02:00
Ettore Di Giacinto
7f3a029596 chore(model gallery): add forgotten-abomination-70b-v5.0 (#5097)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-30 09:48:24 +02:00
Ettore Di Giacinto
b34cf00819 chore(model gallery): add galactic-qwen-14b-exp1 (#5096)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-30 09:46:51 +02:00
LocalAI [bot]
d4a10b4300 chore: ⬆️ Update ggml-org/llama.cpp to 0bb2919335d00ff0bc79d5015da95c422de51f03 (#5095)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-29 21:40:45 +00:00
Ettore Di Giacinto
9c74d74f7b feat(gguf): guess default context size from file (#5089)
feat(gguf): guess default config file from files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-29 14:42:14 +01:00
Ettore Di Giacinto
679ee7bea4 chore(model gallery): add chaoticneutrals_very_berry_qwen2_7b (#5093)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-29 12:34:49 +01:00
Ettore Di Giacinto
77d7dc62c4 chore(model gallery): add tesslate_tessa-t1-3b (#5092)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-29 12:15:28 +01:00
Ettore Di Giacinto
699519d1fe chore(model gallery): add tesslate_tessa-t1-7b (#5091)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-29 12:12:01 +01:00
Ettore Di Giacinto
8faf39d34e chore(model gallery): add tesslate_tessa-t1-14b (#5090)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-29 11:58:39 +01:00
Ettore Di Giacinto
5d261a6fcd chore(model gallery): add tesslate_tessa-t1-32b (#5088)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-29 11:53:47 +01:00
Ettore Di Giacinto
22d5727089 chore(model gallery): add tarek07_legion-v2.1-llama-70b (#5087) 2025-03-29 11:27:06 +01:00
LocalAI [bot]
c965197d6f chore: ⬆️ Update ggml-org/llama.cpp to b4ae50810e4304d052e630784c14bde7e79e4132 (#5085)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-28 21:37:18 +00:00
Ettore Di Giacinto
994a6c4939 chore(model gallery): fallen-safeword-70b-r1-v4.1 (#5084) 2025-03-28 15:20:38 +01:00
Ettore Di Giacinto
f926d2a72b chore(model gallery): thoughtless-fallen-abomination-70b-r1-v4.1-i1 (#5083) 2025-03-28 15:11:54 +01:00
Ettore Di Giacinto
ddeb9ed93e chore(model gallery): qwen2.5-14b-instruct-1m-unalign-i1 (#5082) 2025-03-28 15:08:33 +01:00
Ettore Di Giacinto
c7e99c7b59 chore(model gallery): gemma-3-starshine-12b-i1 (#5081) 2025-03-28 14:50:39 +01:00
Ettore Di Giacinto
6fabc92e56 chore(model gallery): add soob3123_amoral-gemma3-12b-v2 (#5080)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-28 14:45:02 +01:00
LocalAI [bot]
4645b3c919 chore: ⬆️ Update ggml-org/llama.cpp to 5dec47dcd411fdf815a3708fd6194e2b13d19006 (#5079)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-27 23:32:33 +00:00
Dave
134fe2705c fix: ensure git-lfs is present (#5078)
devcontainer clean builds had issue with git-lfs -- should this be installed for _all_ images for safety?

Signed-off-by: Dave Lee <dave@gray101.com>
2025-03-27 22:23:28 +01:00
LocalAI [bot]
3cca32ba7e chore: ⬆️ Update ggml-org/llama.cpp to b3298fa47a2d56ae892127ea038942ab1cada190 (#5077)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-27 10:47:07 +01:00
Ettore Di Giacinto
c069e61b26 chore(model gallery): add textsynth-8b-i1 (#5076)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-26 14:40:19 +01:00
Ettore Di Giacinto
7fa159e164 chore(model gallery): add blacksheep-24b-i1 (#5075)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-26 14:37:30 +01:00
Ettore Di Giacinto
5f92025617 chore(model gallery): add gemma-3-glitter-12b-i1 (#5074)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-26 10:19:00 +01:00
LocalAI [bot]
333e1bc732 chore: ⬆️ Update ggml-org/llama.cpp to ef19c71769681a0b3dde6bc90911728376e5d236 (#5073)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-26 09:51:20 +01:00
Ettore Di Giacinto
e90b97c144 chore(model gallery): add alamios_mistral-small-3.1-draft-0.5b (#5071)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-25 10:10:45 +01:00
Ettore Di Giacinto
747eeb1d46 chore(model gallery): add helpingai_helpingai3-raw (#5070)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-25 10:09:00 +01:00
Ettore Di Giacinto
5d2c53abc0 chore(model gallery): add jdineen_llama-3.1-8b-think (#5069)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-25 10:06:24 +01:00
LocalAI [bot]
0b1e721242 chore: ⬆️ Update ggml-org/llama.cpp to c95fa362b3587d1822558f7e28414521075f254f (#5068)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-24 21:37:16 +00:00
Ettore Di Giacinto
8c76a9ce99 chore(model gallery): add dusk_rainbow (#5066)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-24 09:49:32 +01:00
Ettore Di Giacinto
338321af5b chore(model gallery): add eximius_persona_5b (#5065)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-24 09:30:20 +01:00
Ettore Di Giacinto
2774a92484 chore(model gallery): add impish_llama_3b (#5064)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-24 09:27:04 +01:00
LocalAI [bot]
1a6bfb41a1 chore: ⬆️ Update ggml-org/llama.cpp to 77f9c6bbe55fccd9ea567794024cb80943947901 (#5062)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-23 21:37:14 +00:00
Ettore Di Giacinto
314981eaf8 chore(model gallery): add fiendish_llama_3b (#5061)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 10:00:19 +01:00
Ettore Di Giacinto
d7266c633d chore(model gallery): add sicariussicariistuff_x-ray_alpha (#5060)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 09:56:35 +01:00
Ettore Di Giacinto
eb4d5f2b95 chore(model gallery): add mawdistical_mawdistic-nightlife-24b (#5059)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 09:52:50 +01:00
Ettore Di Giacinto
c63b449ad6 chore(model gallery): add huihui-ai_gemma-3-1b-it-abliterated (#5058)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 09:35:05 +01:00
Ettore Di Giacinto
dd4a778c2c chore(model gallery): add thedrummer_fallen-gemma3-27b-v1 (#5057)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 09:32:58 +01:00
Ettore Di Giacinto
a0896d21d6 chore(model gallery): add thedrummer_fallen-gemma3-12b-v1 (#5056)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 09:31:37 +01:00
Ettore Di Giacinto
0e697f951a chore(model gallery): add thedrummer_fallen-gemma3-4b-v1 (#5055)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 09:30:17 +01:00
Ettore Di Giacinto
fa4bb9082d chore(model gallery): add knoveleng_open-rs3 (#5054)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-23 09:27:27 +01:00
LocalAI [bot]
8ff7b15441 chore: ⬆️ Update ggml-org/llama.cpp to ba932dfb50cc694645b1a148c72f8c06ee080b17 (#5053)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-22 22:18:55 +00:00
LocalAI [bot]
dd45f85a20 chore: ⬆️ Update ggml-org/llama.cpp to 4375415b4abf94fb36a5fd15f233ac0ee23c0bd1 (#5052)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-21 21:36:25 +00:00
Ettore Di Giacinto
decdd9e522 chore(model gallery): add luvgpt_phi3-uncensored-chat (#5051)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-21 09:11:07 +01:00
Ettore Di Giacinto
31a21d4a2c chore(model gallery): add sao10k_llama-3.3-70b-vulpecula-r1 (#5050)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-21 09:08:55 +01:00
Ettore Di Giacinto
2c129843a7 chore(model gallery): add qwen-writerdemo-7b-s500-i1 (#5049)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-21 09:03:39 +01:00
LocalAI [bot]
ce71a0bcfb chore: ⬆️ Update ggml-org/llama.cpp to e04643063b3d240b8c0fdba98677dff6ba346784 (#5047)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-20 21:34:51 +00:00
Ettore Di Giacinto
0a32c38317 chore(model gallery): add basic function template for gemma
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-20 09:32:21 +01:00
Ettore Di Giacinto
36f596f260 chore(model gallery): add soob3123_amoral-gemma3-4b (#5046)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-20 09:30:04 +01:00
Ettore Di Giacinto
953552545b chore(model gallery): add samsungsailmontreal_bytecraft (#5045)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-20 09:27:33 +01:00
Ettore Di Giacinto
835e55b1de chore(model gallery): add rootxhacker_apollo-v3-32b (#5044)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-20 09:20:42 +01:00
Ettore Di Giacinto
dcd2921eaa chore(model gallery): add gemma-3-4b-it-uncensored-dbl-x-i1 (#5043)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-20 09:17:20 +01:00
LocalAI [bot]
5e6459fd18 chore: ⬆️ Update ggml-org/llama.cpp to 568013d0cd3d5add37c376b3d5e959809b711fc7 (#5042)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-19 21:47:18 +00:00
Ettore Di Giacinto
50ddb3eb59 chore(model gallery): add nvidia_llama-3_3-nemotron-super-49b-v1 (#5041)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-19 09:37:27 +01:00
Ettore Di Giacinto
5eebfee4b5 chore(model gallery): add gryphe_pantheon-rp-1.8-24b-small-3.1 (#5040)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-19 09:32:47 +01:00
Ettore Di Giacinto
567919ea90 chore(model gallery): add mistralai_mistral-small-3.1-24b-instruct-2503 (#5039)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-19 09:29:23 +01:00
LocalAI [bot]
27a3997530 chore(model-gallery): ⬆️ update checksum (#5036)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-19 09:18:40 +01:00
LocalAI [bot]
192ba2c657 chore: ⬆️ Update ggml-org/llama.cpp to d84635b1b085d54d6a21924e6171688d6e3dfb46 (#5035)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-18 22:23:39 +00:00
Ettore Di Giacinto
92abac9ca8 chore(model gallery): add soob3123_amoral-gemma3-12b (#5034)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-18 09:38:05 +01:00
Ettore Di Giacinto
04ebbbd73a chore(model gallery): add mlabonne_gemma-3-4b-it-abliterated (#5033)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-18 09:36:14 +01:00
Ettore Di Giacinto
55305e0d95 chore(model gallery): add mlabonne_gemma-3-12b-it-abliterated (#5032)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-18 09:32:41 +01:00
Ettore Di Giacinto
67623639e4 chore(model gallery): add mlabonne_gemma-3-27b-it-abliterated (#5031)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-18 09:30:25 +01:00
LocalAI [bot]
cc76def342 chore: ⬆️ Update ggml-org/llama.cpp to b1b132efcba216c873715c483809730bb253f4a1 (#5029)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-17 21:43:15 +00:00
Ettore Di Giacinto
4967fa5928 chore(model gallery): disable gemma3 mmproj
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-17 12:34:21 +01:00
Ettore Di Giacinto
2b98e4ec56 chore(model gallery): update gemma3 URLs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-17 12:22:35 +01:00
Ettore Di Giacinto
fa1d058ee2 chore(model gallery): add mproj files for gemma3 models (#5028)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-17 12:11:46 +01:00
Ettore Di Giacinto
a49a588bfa chore(model gallery): add readyart_forgotten-safeword-70b-3.6 (#5027)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-17 11:50:34 +01:00
LocalAI [bot]
ca7dda61c6 chore: ⬆️ Update ggml-org/llama.cpp to 8ba95dca2065c0073698afdfcda4c8a8f08bf0d9 (#5026)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-16 21:42:17 +00:00
Ettore Di Giacinto
ffedddd76d chore(model gallery): add beaverai_mn-2407-dsk-qwqify-v0.1-12b (#5024)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-16 09:33:19 +01:00
Ettore Di Giacinto
766c76ae8e chore(model gallery): add pocketdoc_dans-sakurakaze-v1.0.0-12b (#5023)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-16 09:29:48 +01:00
LocalAI [bot]
3096ff33e9 chore: ⬆️ Update ggml-org/llama.cpp to f4c3dd5daa3a79f713813cf1aabdc5886071061d (#5022)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-15 21:43:48 +00:00
Ettore Di Giacinto
90a7451da4 chore(model gallery): add allura-org_bigger-body-70b (#5021)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-15 14:43:51 +01:00
LocalAI [bot]
529a4b9ee8 chore: ⬆️ Update ggml-org/llama.cpp to 9f2250ba722738ec0e6ab684636268a79160c854 (#5019)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-14 21:45:54 +00:00
Ettore Di Giacinto
0567e104eb chore(model gallery): add eurollm-9b-instruct (#5017)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-14 09:25:44 +01:00
Ettore Di Giacinto
ecbeacd022 chore(model gallery): add prithivmlmods_viper-coder-32b-elite13 (#5016)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-14 09:20:27 +01:00
Ettore Di Giacinto
2772960e41 chore(model gallery): add nousresearch_deephermes-3-llama-3-3b-preview (#5015)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-14 09:16:17 +01:00
Ettore Di Giacinto
1b694191e2 chore(model gallery): add nousresearch_deephermes-3-mistral-24b-preview (#5014)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-14 09:13:27 +01:00
Ettore Di Giacinto
69578a5f8f chore(model gallery): add models/qgallouedec_gemma-3-27b-it-codeforces-sft (#5013)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-14 09:11:13 +01:00
LocalAI [bot]
7d96cfe72b chore: ⬆️ Update ggml-org/llama.cpp to 84d547554123a62e9ac77107cb20e4f6cc503af4 (#5011)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-13 22:30:17 +00:00
Ettore Di Giacinto
423514a5a5 fix(clip): do not imply GPU offload by default (#5010)
* fix(clip): do not imply GPUs by default

Until a better solution is found upstream, be conservative and default
to GPU.

https://github.com/ggml-org/llama.cpp/pull/12322
https://github.com/ggml-org/llama.cpp/pull/12322#issuecomment-2720970695

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* allow to override gpu via backend options

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-13 15:14:11 +01:00
Ettore Di Giacinto
12568c7d6d chore(model gallery): add gemma-3-1b-it (#5009)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-13 09:48:40 +01:00
Ettore Di Giacinto
8d16a0a536 chore(model gallery): add gemma-3-4b-it (#5008)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-13 09:47:01 +01:00
Ettore Di Giacinto
87ca801f00 chore(model gallery): add gemma-3-12b-it (#5007)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-13 09:44:49 +01:00
Ettore Di Giacinto
e4ecbb6c30 chore(model gallery): add gemma-3-27b-it (#5003)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-13 08:28:28 +01:00
LocalAI [bot]
b1a67de2b9 chore: ⬆️ Update ggml-org/llama.cpp to f08f4b3187b691bb08a8884ed39ebaa94e956707 (#5006)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-13 01:01:30 +00:00
LocalAI [bot]
71a23910fe chore: ⬆️ Update ggml-org/llama.cpp to 80a02aa8588ef167d616f76f1781b104c245ace0 (#5004)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-12 16:26:09 +00:00
LocalAI [bot]
0ede31f9cf chore: ⬆️ Update ggml-org/llama.cpp to 10f2e81809bbb69ecfe64fc8b4686285f84b0c07 (#4996)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-12 14:13:04 +00:00
Ettore Di Giacinto
9f5dcf2d1e feat(aio): update AIO image defaults (#5002)
* feat(aio): update AIO image defaults

cpu:
 - text-to-text: llama3.1
 - embeddings: granite-embeddings
 - vision: moonream2

gpu/intel:
 - text-to-text: localai-functioncall-qwen2.5-7b-v0.5
 - embeddings: granite-embeddings
 - vision: minicpm

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(aio): use minicpm as moondream2 stopped working

https://github.com/ggml-org/llama.cpp/pull/12322#issuecomment-2717483759

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 12:55:06 +01:00
Ettore Di Giacinto
e878556e98 chore(model gallery): add trashpanda-org_qwq-32b-snowdrop-v0 (#5000)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:26:09 +01:00
Ettore Di Giacinto
b096928172 chore(model gallery): add open-r1_olympiccoder-7b (#4999)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:24:35 +01:00
Ettore Di Giacinto
db7442ae67 chore(model gallery): add open-r1_olympiccoder-32b (#4998)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:23:01 +01:00
Ettore Di Giacinto
b6cd430e08 chore(model gallery): add thedrummer_gemmasutra-small-4b-v1 (#4997)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:19:51 +01:00
LocalAI [bot]
478e50cda2 chore: ⬆️ Update ggml-org/llama.cpp to 2c9f833d17bb5b8ea89dec663b072b5420fc5438 (#4991)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-03-11 11:19:03 +00:00
Ettore Di Giacinto
1db2b9943c chore(deps): Bump grpcio to 1.71.0 (#4993)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-11 09:44:21 +01:00
Ettore Di Giacinto
ac41aa8b67 chore(model gallery): add openpipe_deductive-reasoning-qwen-32b (#4995)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-11 09:44:07 +01:00
Ettore Di Giacinto
156a98e2e7 chore(model gallery): add openpipe_deductive-reasoning-qwen-14b (#4994)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-11 09:40:38 +01:00
dependabot[bot]
d88ec1209e chore(deps): Bump docs/themes/hugo-theme-relearn from 4a4b60e to 9a020e7 (#4988)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `4a4b60e` to `9a020e7`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](4a4b60ef04...9a020e7ead)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-11 09:39:04 +01:00
144 changed files with 4341 additions and 1321 deletions

3
.env
View File

@@ -29,6 +29,9 @@
## Enable/Disable single backend (useful if only one GPU is available)
# LOCALAI_SINGLE_ACTIVE_BACKEND=true
# Forces shutdown of the backends if busy (only if LOCALAI_SINGLE_ACTIVE_BACKEND is set)
# LOCALAI_FORCE_BACKEND_SHUTDOWN=true
## Specify a build type. Available: cublas, openblas, clblas.
## cuBLAS: This is a GPU-accelerated version of the complete standard BLAS (Basic Linear Algebra Subprograms) library. It's provided by Nvidia and is part of their CUDA toolkit.
## OpenBLAS: This is an open-source implementation of the BLAS library that aims to provide highly optimized code for various platforms. It includes support for multi-threading and can be compiled to use hardware-specific features for additional performance. OpenBLAS can run on many kinds of hardware, including CPUs from Intel, AMD, and ARM.

View File

@@ -29,10 +29,6 @@ updates:
schedule:
# Check for updates to GitHub Actions every weekday
interval: "weekly"
- package-ecosystem: "pip"
directory: "/backend/python/autogptq"
schedule:
interval: "weekly"
- package-ecosystem: "pip"
directory: "/backend/python/bark"
schedule:

View File

@@ -15,7 +15,7 @@ jobs:
strategy:
matrix:
include:
- base-image: intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04
- base-image: intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04
runs-on: 'ubuntu-latest'
platforms: 'linux/amd64'
runs-on: ${{matrix.runs-on}}

View File

@@ -75,6 +75,7 @@ jobs:
grpc-base-image: "ubuntu:22.04"
runs-on: 'arc-runner-set'
makeflags: "--jobs=3 --output-sync=target"
latest-image: 'latest-gpu-hipblas-core'
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'false'
@@ -251,6 +252,7 @@ jobs:
image-type: 'core'
runs-on: 'arc-runner-set'
makeflags: "--jobs=3 --output-sync=target"
latest-image: 'latest-gpu-intel-f16-core'
- build-type: 'sycl_f32'
platforms: 'linux/amd64'
tag-latest: 'false'
@@ -261,6 +263,7 @@ jobs:
image-type: 'core'
runs-on: 'arc-runner-set'
makeflags: "--jobs=3 --output-sync=target"
latest-image: 'latest-gpu-intel-f32-core'
core-image-build:
uses: ./.github/workflows/image_build.yml
@@ -339,6 +342,7 @@ jobs:
base-image: "ubuntu:22.04"
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
latest-image: 'latest-gpu-nvidia-cuda-12-core'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
@@ -351,17 +355,18 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
latest-image: 'latest-gpu-nvidia-cuda-12-core'
- build-type: 'vulkan'
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-vulkan-ffmpeg-core'
latest-image: 'latest-vulkan-ffmpeg-core'
ffmpeg: 'true'
image-type: 'core'
runs-on: 'arc-runner-set'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
latest-image: 'latest-gpu-vulkan-core'
gh-runner:
uses: ./.github/workflows/image_build.yml
with:

View File

@@ -8,7 +8,7 @@ jobs:
notify-discord:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
env:
MODEL_NAME: hermes-2-theta-llama-3-8b
MODEL_NAME: gemma-3-12b-it
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
@@ -16,7 +16,7 @@ jobs:
fetch-depth: 0 # needed to checkout all branches for this Action to work
- uses: mudler/localai-github-action@v1
with:
model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
model: 'gemma-3-12b-it' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
# Check the PR diff using the current branch and the base branch of the PR
- uses: GrantBirki/git-diff-action@v2.8.0
id: git-diff-action
@@ -87,7 +87,7 @@ jobs:
notify-twitter:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
env:
MODEL_NAME: hermes-2-theta-llama-3-8b
MODEL_NAME: gemma-3-12b-it
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

View File

@@ -14,7 +14,7 @@ jobs:
steps:
- uses: mudler/localai-github-action@v1
with:
model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
model: 'gemma-3-12b-it' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
- name: Summarize
id: summarize
run: |
@@ -60,4 +60,4 @@ jobs:
DISCORD_AVATAR: "https://avatars.githubusercontent.com/u/139863280?v=4"
uses: Ilshidur/action-discord@master
with:
args: ${{ steps.summarize.outputs.message }}
args: ${{ steps.summarize.outputs.message }}

View File

@@ -18,7 +18,7 @@ jobs:
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}
uses: securego/gosec@v2.22.0
uses: securego/gosec@v2.22.3
with:
# we let the report trigger content trigger a failure using the GitHub Security features.
args: '-no-fail -fmt sarif -out results.sarif ./...'

View File

@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -24,6 +24,7 @@ RUN apt-get update && \
ca-certificates \
curl libssl-dev \
git \
git-lfs \
unzip upx-ucl && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
@@ -430,9 +431,6 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMA
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vllm" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/vllm \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "autogptq" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/autogptq \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "bark" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/bark \
; fi && \

View File

@@ -6,7 +6,7 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true
# llama.cpp versions
CPPLLAMA_VERSION?=1e2f78a00450593e2dfa458796fcdd9987300dfc
CPPLLAMA_VERSION?=6408210082cc0a61b992b487be7e2ff2efbb9e36
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
@@ -21,8 +21,8 @@ BARKCPP_REPO?=https://github.com/PABannier/bark.cpp.git
BARKCPP_VERSION?=v1.0.0
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=19d876ee300a055629926ff836489901f734f2b7
STABLEDIFFUSION_GGML_REPO?=https://github.com/richiejp/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=53e3b17eb3d0b5760ced06a1f98320b68b34aaae
ONNX_VERSION?=1.20.0
ONNX_ARCH?=x64
@@ -260,11 +260,7 @@ backend/go/image/stablediffusion-ggml/libsd.a: sources/stablediffusion-ggml.cpp
$(MAKE) -C backend/go/image/stablediffusion-ggml libsd.a
backend-assets/grpc/stablediffusion-ggml: backend/go/image/stablediffusion-ggml/libsd.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/backend/go/image/stablediffusion-ggml/ LIBRARY_PATH=$(CURDIR)/backend/go/image/stablediffusion-ggml/ \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/stablediffusion-ggml ./backend/go/image/stablediffusion-ggml/
ifneq ($(UPX),)
$(UPX) backend-assets/grpc/stablediffusion-ggml
endif
$(MAKE) -C backend/go/image/stablediffusion-ggml CGO_LDFLAGS="$(CGO_LDFLAGS)" stablediffusion-ggml
sources/onnxruntime:
mkdir -p sources/onnxruntime
@@ -509,18 +505,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen faster-whisper-protogen
protogen-python: bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen faster-whisper-protogen
.PHONY: protogen-python-clean
protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean faster-whisper-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
$(MAKE) -C backend/python/autogptq protogen
.PHONY: autogptq-protogen-clean
autogptq-protogen-clean:
$(MAKE) -C backend/python/autogptq protogen-clean
protogen-python-clean: bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean faster-whisper-protogen-clean
.PHONY: bark-protogen
bark-protogen:
@@ -597,7 +585,6 @@ vllm-protogen-clean:
## GRPC
# Note: it is duplicated in the Dockerfile
prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/autogptq
$(MAKE) -C backend/python/bark
$(MAKE) -C backend/python/coqui
$(MAKE) -C backend/python/diffusers
@@ -809,7 +796,8 @@ docker-aio-all:
docker-image-intel:
docker build \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04 \
--progress plain \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.1.0-0-devel-ubuntu24.04 \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="none" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
@@ -817,7 +805,7 @@ docker-image-intel:
docker-image-intel-xpu:
docker build \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04 \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04 \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="none" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \

View File

@@ -1,7 +1,6 @@
<h1 align="center">
<br>
<img height="300" src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"> <br>
LocalAI
<img height="300" src="./core/http/static/logo.png"> <br>
<br>
</h1>
@@ -48,9 +47,58 @@
[![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai)
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API thats compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
![screen](https://github.com/mudler/LocalAI/assets/2420543/20b5ccd2-8393-44f0-aaf6-87a23806381e)
## 📚🆕 Local Stack Family
🆕 LocalAI is now part of a comprehensive suite of AI tools designed to work together:
<table>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalAGI">
<img src="https://raw.githubusercontent.com/mudler/LocalAGI/refs/heads/main/webui/react-ui/public/logo_2.png" width="300" alt="LocalAGI Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalAGI">LocalAGI</a></h3>
<p>A powerful Local AI agent management platform that serves as a drop-in replacement for OpenAI's Responses API, enhanced with advanced agentic capabilities.</p>
</td>
</tr>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalRecall">
<img src="https://raw.githubusercontent.com/mudler/LocalRecall/refs/heads/main/static/localrecall_horizontal.png" width="300" alt="LocalRecall Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalRecall">LocalRecall</a></h3>
<p>A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.</p>
</td>
</tr>
</table>
## Screenshots
| Talk Interface | Generate Audio |
| --- | --- |
| ![Screenshot 2025-03-31 at 12-01-36 LocalAI - Talk](./docs/assets/images/screenshots/screenshot_tts.png) | ![Screenshot 2025-03-31 at 12-01-29 LocalAI - Generate audio with voice-en-us-ryan-low](./docs/assets/images/screenshots/screenshot_tts.png) |
| Models Overview | Generate Images |
| --- | --- |
| ![Screenshot 2025-03-31 at 12-01-20 LocalAI - Models](./docs/assets/images/screenshots/screenshot_gallery.png) | ![Screenshot 2025-03-31 at 12-31-41 LocalAI - Generate images with flux 1-dev](./docs/assets/images/screenshots/screenshot_image.png) |
| Chat Interface | Home |
| --- | --- |
| ![Screenshot 2025-03-31 at 11-57-44 LocalAI - Chat with localai-functioncall-qwen2 5-7b-v0 5](./docs/assets/images/screenshots/screenshot_chat.png) | ![Screenshot 2025-03-31 at 11-57-23 LocalAI API - c2a39e3 (c2a39e3639227cfd94ffffe9f5691239acc275a8)](./docs/assets/images/screenshots/screenshot_home.png) |
| Login | Swarm |
| --- | --- |
|![Screenshot 2025-03-31 at 12-09-59 ](./docs/assets/images/screenshots/screenshot_login.png) | ![Screenshot 2025-03-31 at 12-10-39 LocalAI - P2P dashboard](./docs/assets/images/screenshots/screenshot_p2p.png) |
## 💻 Quickstart
Run the installer script:
@@ -59,17 +107,21 @@ curl https://localai.io/install.sh | sh
```
Or run with docker:
### CPU only image:
```bash
# CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-cpu
# Nvidia GPU:
```
### Nvidia GPU:
```bash
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CPU and GPU image (bigger size):
```
### CPU and GPU image (bigger size):
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
```
### AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
```
@@ -88,10 +140,13 @@ local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
local-ai run oci://localai/phi-2:latest
```
[💻 Getting started](https://localai.io/basics/getting_started/index.html)
For more information, see [💻 Getting started](https://localai.io/basics/getting_started/index.html)
## 📰 Latest project news
- Apr 2025: [LocalAGI](https://github.com/mudler/LocalAGI) and [LocalRecall](https://github.com/mudler/LocalRecall) join the LocalAI family stack.
- Apr 2025: WebUI overhaul, AIO images updates
- Feb 2025: Backend cleanup, Breaking changes, new backends (kokoro, OutelTTS, faster-whisper), Nvidia L4T images
- Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
@@ -105,19 +160,6 @@ local-ai run oci://localai/phi-2:latest
Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
## 🔥🔥 Hot topics (looking for help):
- Multimodal with vLLM and Video understanding: https://github.com/mudler/LocalAI/pull/3729
- Realtime API https://github.com/mudler/LocalAI/issues/3714
- WebUI improvements: https://github.com/mudler/LocalAI/issues/2156
- Backends v2: https://github.com/mudler/LocalAI/issues/1126
- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
- Assistant API: https://github.com/mudler/LocalAI/issues/1273
- Vulkan: https://github.com/mudler/LocalAI/issues/1647
- Anthropic API: https://github.com/mudler/LocalAI/issues/1808
If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22
## 🚀 [Features](https://localai.io/features/)
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
@@ -131,12 +173,10 @@ If you want to help and contribute, issues up for grabs: https://github.com/mudl
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 📈 [Reranker API](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
- [Agentic capabilities](https://github.com/mudler/LocalAGI)
- 🔊 Voice activity detection (Silero-VAD support)
- 🌍 Integrated WebUI!
## 💻 Usage
Check out the [Getting started](https://localai.io/basics/getting_started/index.html) section in our documentation.
### 🔗 Community and integrations

View File

@@ -1,7 +1,7 @@
name: text-embedding-ada-002
embeddings: true
name: text-embedding-ada-002
parameters:
model: huggingface://hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF/llama-3.2-1b-instruct-q4_k_m.gguf
model: huggingface://bartowski/granite-embedding-107m-multilingual-GGUF/granite-embedding-107m-multilingual-f16.gguf
usage: |
You can test this model with curl like this:

View File

@@ -1,101 +1,57 @@
name: gpt-4
mmap: true
parameters:
model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
context_size: 8192
stopwords:
- "<|im_end|>"
- "<dummy32000>"
- "</tool_call>"
- "<|eot_id|>"
- "<|end_of_text|>"
f16: true
function:
# disable injecting the "answer" tool
disable_no_action: true
grammar:
# This allows the grammar to also return messages
mixed_mode: true
# Suffix to add to the grammar
#prefix: '<tool_call>\n'
# Force parallel calls in the grammar
# parallel_calls: true
return_name_in_function_response: true
# Without grammar uncomment the lines below
# Warning: this is relying only on the capability of the
# LLM model to generate the correct function call.
json_regex_match:
- "(?s)<tool_call>(.*?)</tool_call>"
- "(?s)<tool_call>(.*?)"
replace_llm_results:
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
replace_function_results:
# Replace everything that is not JSON array or object
#
- key: '(?s)^[^{\[]*'
value: ""
- key: '(?s)[^}\]]*$'
value: ""
- key: "'([^']*?)'"
value: "_DQUOTE_${1}_DQUOTE_"
- key: '\\"'
value: "__TEMP_QUOTE__"
- key: "\'"
value: "'"
- key: "_DQUOTE_"
value: '"'
- key: "__TEMP_QUOTE__"
value: '"'
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
no_mixed_free_string: true
schema_type: llama3.1 # or JSON is supported too (json)
response_regex:
- <function=(?P<name>\w+)>(?P<arguments>.*)</function>
mmap: true
name: gpt-4
parameters:
model: Hermes-3-Llama-3.2-3B-Q4_K_M.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- <|eot_id|>
- <|end_of_text|>
template:
chat: |
{{.Input -}}
<|im_start|>assistant
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
{{.Input }}
<|start_header_id|>assistant<|end_header_id|>
chat_message: |
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
{{- if .FunctionCall }}
<tool_call>
{{- else if eq .RoleName "tool" }}
<tool_response>
{{- end }}
{{- if .Content}}
{{.Content }}
{{- end }}
{{- if .FunctionCall}}
{{toJson .FunctionCall}}
{{- end }}
{{- if .FunctionCall }}
</tool_call>
{{- else if eq .RoleName "tool" }}
</tool_response>
{{- end }}<|im_end|>
<|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>
{{ if .FunctionCall -}}
{{ else if eq .RoleName "tool" -}}
The Function was executed and the response was:
{{ end -}}
{{ if .Content -}}
{{.Content -}}
{{ else if .FunctionCall -}}
{{ range .FunctionCall }}
[{{.FunctionCall.Name}}({{.FunctionCall.Arguments}})]
{{ end }}
{{ end -}}
<|eot_id|>
completion: |
{{.Input}}
function: |-
<|im_start|>system
You are a function calling AI model.
Here are the available tools:
<tools>
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
</tools>
You should call the tools provided to you sequentially
Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
<scratchpad>
{step-by-step reasoning and plan in bullet points}
</scratchpad>
For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|im_end|>
{{.Input -}}
<|im_start|>assistant
function: |
<|start_header_id|>system<|end_header_id|>
You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the functions can be used, point it out. If the given question lacks the parameters required by the function, also point it out. You should only return the function call in tools call sections.
If you decide to invoke any of the function(s), you MUST put it in the format as follows:
[func_name1(params_name1=params_value1,params_name2=params_value2,...),func_name2(params_name1=params_value1,params_name2=params_value2,...)]
You SHOULD NOT include any other text in the response.
Here is a list of functions in JSON format that you can invoke.
{{toJson .Functions}}
<|eot_id|><|start_header_id|>user<|end_header_id|>
{{.Input}}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
download_files:
- filename: Hermes-3-Llama-3.2-3B-Q4_K_M.gguf
sha256: 2e220a14ba4328fee38cf36c2c068261560f999fadb5725ce5c6d977cb5126b5
uri: huggingface://bartowski/Hermes-3-Llama-3.2-3B-GGUF/Hermes-3-Llama-3.2-3B-Q4_K_M.gguf

View File

@@ -1,31 +1,49 @@
backend: llama-cpp
context_size: 4096
f16: true
mmap: true
mmproj: minicpm-v-2_6-mmproj-f16.gguf
name: gpt-4o
roles:
user: "USER:"
assistant: "ASSISTANT:"
system: "SYSTEM:"
mmproj: bakllava-mmproj.gguf
parameters:
model: bakllava.gguf
model: minicpm-v-2_6-Q4_K_M.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
- <|endoftext|>
template:
chat: |
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
completion: |
{{.Input}}
ASSISTANT:
function: |
<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: bakllava.gguf
uri: huggingface://mys/ggml_bakllava-1/ggml-model-q4_k.gguf
- filename: bakllava-mmproj.gguf
uri: huggingface://mys/ggml_bakllava-1/mmproj-model-f16.gguf
usage: |
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4-vision-preview",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
- filename: minicpm-v-2_6-Q4_K_M.gguf
sha256: 3a4078d53b46f22989adbf998ce5a3fd090b6541f112d7e936eb4204a04100b1
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/ggml-model-Q4_K_M.gguf
- filename: minicpm-v-2_6-mmproj-f16.gguf
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
sha256: 4485f68a0f1aa404c391e788ea88ea653c100d8e98fe572698f701e5809711fd

View File

@@ -1,7 +1,7 @@
embeddings: true
name: text-embedding-ada-002
backend: sentencetransformers
parameters:
model: all-MiniLM-L6-v2
model: huggingface://bartowski/granite-embedding-107m-multilingual-GGUF/granite-embedding-107m-multilingual-f16.gguf
usage: |
You can test this model with curl like this:

View File

@@ -1,101 +1,53 @@
name: gpt-4
mmap: true
parameters:
model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
context_size: 8192
stopwords:
- "<|im_end|>"
- "<dummy32000>"
- "</tool_call>"
- "<|eot_id|>"
- "<|end_of_text|>"
context_size: 4096
f16: true
function:
# disable injecting the "answer" tool
disable_no_action: true
capture_llm_results:
- (?s)<Thought>(.*?)</Thought>
grammar:
# This allows the grammar to also return messages
mixed_mode: true
# Suffix to add to the grammar
#prefix: '<tool_call>\n'
# Force parallel calls in the grammar
# parallel_calls: true
return_name_in_function_response: true
# Without grammar uncomment the lines below
# Warning: this is relying only on the capability of the
# LLM model to generate the correct function call.
json_regex_match:
- "(?s)<tool_call>(.*?)</tool_call>"
- "(?s)<tool_call>(.*?)"
properties_order: name,arguments
json_regex_match:
- (?s)<Output>(.*?)</Output>
replace_llm_results:
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
- key: (?s)<Thought>(.*?)</Thought>
value: ""
replace_function_results:
# Replace everything that is not JSON array or object
#
- key: '(?s)^[^{\[]*'
value: ""
- key: '(?s)[^}\]]*$'
value: ""
- key: "'([^']*?)'"
value: "_DQUOTE_${1}_DQUOTE_"
- key: '\\"'
value: "__TEMP_QUOTE__"
- key: "\'"
value: "'"
- key: "_DQUOTE_"
value: '"'
- key: "__TEMP_QUOTE__"
value: '"'
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
mmap: true
name: gpt-4
parameters:
model: localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
{{- if .FunctionCall }}
<tool_call>
{{- else if eq .RoleName "tool" }}
<tool_response>
{{- end }}
{{- if .Content}}
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{- end }}
{{- if .FunctionCall}}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{- end }}
{{- if .FunctionCall }}
</tool_call>
{{- else if eq .RoleName "tool" }}
</tool_response>
{{- end }}<|im_end|>
{{ end -}}<|im_end|>
completion: |
{{.Input}}
function: |-
function: |
<|im_start|>system
You are a function calling AI model.
Here are the available tools:
<tools>
You are an AI assistant that executes function calls, and these are the tools at your disposal:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
</tools>
You should call the tools provided to you sequentially
Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
<scratchpad>
{step-by-step reasoning and plan in bullet points}
</scratchpad>
For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|im_end|>
<|im_end|>
{{.Input -}}
<|im_start|>assistant
<|im_start|>assistant
download_files:
- filename: localai-functioncall-phi-4-v0.3-q4_k_m.gguf
sha256: 23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5
uri: huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf

View File

@@ -1,35 +1,49 @@
backend: llama-cpp
context_size: 4096
f16: true
mmap: true
mmproj: minicpm-v-2_6-mmproj-f16.gguf
name: gpt-4o
roles:
user: "USER:"
assistant: "ASSISTANT:"
system: "SYSTEM:"
mmproj: llava-v1.6-7b-mmproj-f16.gguf
parameters:
model: llava-v1.6-mistral-7b.Q5_K_M.gguf
temperature: 0.2
top_k: 40
top_p: 0.95
seed: -1
model: minicpm-v-2_6-Q4_K_M.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
- <|endoftext|>
template:
chat: |
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
completion: |
{{.Input}}
ASSISTANT:
function: |
<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: llava-v1.6-mistral-7b.Q5_K_M.gguf
uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q5_K_M.gguf
- filename: llava-v1.6-7b-mmproj-f16.gguf
uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf
usage: |
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4-vision-preview",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
- filename: minicpm-v-2_6-Q4_K_M.gguf
sha256: 3a4078d53b46f22989adbf998ce5a3fd090b6541f112d7e936eb4204a04100b1
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/ggml-model-Q4_K_M.gguf
- filename: minicpm-v-2_6-mmproj-f16.gguf
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
sha256: 4485f68a0f1aa404c391e788ea88ea653c100d8e98fe572698f701e5809711fd

View File

@@ -1,7 +1,7 @@
embeddings: true
name: text-embedding-ada-002
backend: sentencetransformers
parameters:
model: all-MiniLM-L6-v2
model: huggingface://bartowski/granite-embedding-107m-multilingual-GGUF/granite-embedding-107m-multilingual-f16.gguf
usage: |
You can test this model with curl like this:

View File

@@ -1,103 +1,53 @@
name: gpt-4
mmap: false
context_size: 8192
f16: false
parameters:
model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
stopwords:
- "<|im_end|>"
- "<dummy32000>"
- "</tool_call>"
- "<|eot_id|>"
- "<|end_of_text|>"
context_size: 4096
f16: true
function:
# disable injecting the "answer" tool
disable_no_action: true
capture_llm_results:
- (?s)<Thought>(.*?)</Thought>
grammar:
# This allows the grammar to also return messages
mixed_mode: true
# Suffix to add to the grammar
#prefix: '<tool_call>\n'
# Force parallel calls in the grammar
# parallel_calls: true
return_name_in_function_response: true
# Without grammar uncomment the lines below
# Warning: this is relying only on the capability of the
# LLM model to generate the correct function call.
json_regex_match:
- "(?s)<tool_call>(.*?)</tool_call>"
- "(?s)<tool_call>(.*?)"
properties_order: name,arguments
json_regex_match:
- (?s)<Output>(.*?)</Output>
replace_llm_results:
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
- key: (?s)<Thought>(.*?)</Thought>
value: ""
replace_function_results:
# Replace everything that is not JSON array or object
#
- key: '(?s)^[^{\[]*'
value: ""
- key: '(?s)[^}\]]*$'
value: ""
- key: "'([^']*?)'"
value: "_DQUOTE_${1}_DQUOTE_"
- key: '\\"'
value: "__TEMP_QUOTE__"
- key: "\'"
value: "'"
- key: "_DQUOTE_"
value: '"'
- key: "__TEMP_QUOTE__"
value: '"'
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
mmap: true
name: gpt-4
parameters:
model: localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
{{- if .FunctionCall }}
<tool_call>
{{- else if eq .RoleName "tool" }}
<tool_response>
{{- end }}
{{- if .Content}}
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{- end }}
{{- if .FunctionCall}}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{- end }}
{{- if .FunctionCall }}
</tool_call>
{{- else if eq .RoleName "tool" }}
</tool_response>
{{- end }}<|im_end|>
{{ end -}}<|im_end|>
completion: |
{{.Input}}
function: |-
function: |
<|im_start|>system
You are a function calling AI model.
Here are the available tools:
<tools>
You are an AI assistant that executes function calls, and these are the tools at your disposal:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
</tools>
You should call the tools provided to you sequentially
Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
<scratchpad>
{step-by-step reasoning and plan in bullet points}
</scratchpad>
For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|im_end|>
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: localai-functioncall-phi-4-v0.3-q4_k_m.gguf
sha256: 23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5
uri: huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf

View File

@@ -1,35 +1,50 @@
backend: llama-cpp
context_size: 4096
mmap: false
f16: false
f16: true
mmap: true
mmproj: minicpm-v-2_6-mmproj-f16.gguf
name: gpt-4o
roles:
user: "USER:"
assistant: "ASSISTANT:"
system: "SYSTEM:"
mmproj: llava-v1.6-7b-mmproj-f16.gguf
parameters:
model: llava-v1.6-mistral-7b.Q5_K_M.gguf
temperature: 0.2
top_k: 40
top_p: 0.95
seed: -1
model: minicpm-v-2_6-Q4_K_M.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
- <|endoftext|>
template:
chat: |
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
completion: |
{{.Input}}
ASSISTANT:
function: |
<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: llava-v1.6-mistral-7b.Q5_K_M.gguf
uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q5_K_M.gguf
- filename: llava-v1.6-7b-mmproj-f16.gguf
uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf
usage: |
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4-vision-preview",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
- filename: minicpm-v-2_6-Q4_K_M.gguf
sha256: 3a4078d53b46f22989adbf998ce5a3fd090b6541f112d7e936eb4204a04100b1
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/ggml-model-Q4_K_M.gguf
- filename: minicpm-v-2_6-mmproj-f16.gguf
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
sha256: 4485f68a0f1aa404c391e788ea88ea653c100d8e98fe572698f701e5809711fd

View File

@@ -190,11 +190,7 @@ message ModelOptions {
int32 NGQA = 20;
string ModelFile = 21;
// AutoGPTQ
string Device = 22;
bool UseTriton = 23;
string ModelBaseName = 24;
bool UseFastTokenizer = 25;
// Diffusers
string PipelineType = 26;

View File

@@ -2,7 +2,7 @@
## XXX: In some versions of CMake clip wasn't being built before llama.
## This is an hack for now, but it should be fixed in the future.
set(TARGET myclip)
add_library(${TARGET} clip.cpp clip.h llava.cpp llava.h)
add_library(${TARGET} clip.cpp clip.h clip-impl.h llava.cpp llava.h)
install(TARGETS ${TARGET} LIBRARY)
target_include_directories(myclip PUBLIC .)
target_include_directories(myclip PUBLIC ../..)

View File

@@ -8,7 +8,7 @@ ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
TARGET?=--target grpc-server
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
# If build type is cublas, then we set -DGGML_CUDA=ON to CMAKE_ARGS automatically
ifeq ($(BUILD_TYPE),cublas)
@@ -36,11 +36,18 @@ else ifeq ($(OS),Darwin)
endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DCMAKE_CXX_FLAGS="-fsycl" \
-DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DCMAKE_CXX_FLAGS="-fsycl"
endif
llama.cpp:
@@ -77,4 +84,4 @@ ifneq (,$(findstring sycl,$(BUILD_TYPE)))
else
+cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release $(TARGET)
endif
cp llama.cpp/build/bin/grpc-server .
cp llama.cpp/build/bin/grpc-server .

View File

@@ -217,6 +217,7 @@ struct llama_client_slot
bool infill = false;
bool embedding = false;
bool reranker = false;
bool has_next_token = true;
bool truncated = false;
bool stopped_eos = false;
@@ -467,6 +468,7 @@ struct llama_server_context
bool all_slots_are_idle = false;
bool add_bos_token = true;
bool has_eos_token = true;
bool has_gpu = false;
bool grammar_lazy = false;
std::vector<common_grammar_trigger> grammar_triggers;
@@ -508,12 +510,15 @@ struct llama_server_context
bool load_model(const common_params &params_)
{
params = params_;
if (!params.mmproj.empty()) {
if (!params.mmproj.path.empty()) {
multimodal = true;
LOG_INFO("Multi Modal Mode Enabled", {});
clp_ctx = clip_model_load(params.mmproj.c_str(), /*verbosity=*/ 1);
clp_ctx = clip_init(params.mmproj.path.c_str(), clip_context_params {
/* use_gpu */ has_gpu,
/*verbosity=*/ GGML_LOG_LEVEL_INFO,
});
if(clp_ctx == nullptr) {
LOG_ERR("unable to load clip model: %s", params.mmproj.c_str());
LOG_ERR("unable to load clip model: %s", params.mmproj.path.c_str());
return false;
}
@@ -527,10 +532,16 @@ struct llama_server_context
ctx = common_init.context.release();
if (model == nullptr)
{
LOG_ERR("unable to load model: %s", params.model.c_str());
LOG_ERR("unable to load model: %s", params.model.path.c_str());
return false;
}
// Enable reranking if embeddings are enabled - moved after context initialization
if (params.embedding) {
params.reranking = true;
LOG_INFO("Reranking enabled (embeddings are enabled)", {});
}
if (multimodal) {
const int n_embd_clip = clip_n_mmproj_embd(clp_ctx);
const int n_embd_llm = llama_model_n_embd(model);
@@ -1409,7 +1420,59 @@ struct llama_server_context
queue_results.send(res);
}
void request_completion(int task_id, json data, bool infill, bool embedding, int multitask_id)
void send_rerank(llama_client_slot &slot, const llama_batch & batch)
{
task_result res;
res.id = slot.task_id;
res.multitask_id = slot.multitask_id;
res.error = false;
res.stop = true;
float score = -1e6f; // Default score if we fail to get embeddings
if (!params.reranking)
{
LOG_WARNING("reranking disabled", {
{"params.reranking", params.reranking},
});
}
else if (ctx == nullptr)
{
LOG_ERR("context is null, cannot perform reranking");
res.error = true;
}
else
{
for (int i = 0; i < batch.n_tokens; ++i) {
if (!batch.logits[i] || batch.seq_id[i][0] != slot.id) {
continue;
}
const float * embd = llama_get_embeddings_seq(ctx, batch.seq_id[i][0]);
if (embd == NULL) {
embd = llama_get_embeddings_ith(ctx, i);
}
if (embd == NULL) {
LOG("failed to get embeddings");
continue;
}
score = embd[0];
}
}
// Format result as JSON similar to the embedding function
res.result_json = json
{
{"score", score},
{"tokens", slot.num_prompt_tokens}
};
queue_results.send(res);
}
void request_completion(int task_id, json data, bool infill, bool embedding, bool rerank, int multitask_id)
{
task_server task;
task.id = task_id;
@@ -1417,6 +1480,7 @@ struct llama_server_context
task.data = std::move(data);
task.infill_mode = infill;
task.embedding_mode = embedding;
task.reranking_mode = rerank;
task.type = TASK_TYPE_COMPLETION;
task.multitask_id = multitask_id;
@@ -1548,7 +1612,7 @@ struct llama_server_context
subtask_data["prompt"] = subtask_data["prompt"][i];
// subtasks inherit everything else (infill mode, embedding mode, etc.)
request_completion(subtask_ids[i], subtask_data, multiprompt_task.infill_mode, multiprompt_task.embedding_mode, multitask_id);
request_completion(subtask_ids[i], subtask_data, multiprompt_task.infill_mode, multiprompt_task.embedding_mode, multiprompt_task.reranking_mode, multitask_id);
}
}
@@ -1587,6 +1651,7 @@ struct llama_server_context
slot->infill = task.infill_mode;
slot->embedding = task.embedding_mode;
slot->reranker = task.reranking_mode;
slot->task_id = task.id;
slot->multitask_id = task.multitask_id;
@@ -2030,6 +2095,14 @@ struct llama_server_context
continue;
}
if (slot.reranker)
{
send_rerank(slot, batch_view);
slot.release();
slot.i_batch = -1;
continue;
}
completion_token_output result;
const llama_token id = common_sampler_sample(slot.ctx_sampling, ctx, slot.i_batch - i);
@@ -2118,7 +2191,11 @@ static void append_to_generated_text_from_generated_token_probs(llama_server_con
}
std::function<void(int)> shutdown_handler;
inline void signal_handler(int signal) { shutdown_handler(signal); }
inline void signal_handler(int signal) {
exit(1);
}
/////////////////////////////////
////////////////////////////////
@@ -2314,15 +2391,15 @@ static std::string get_all_kv_cache_types() {
}
static void params_parse(const backend::ModelOptions* request,
common_params & params) {
common_params & params, llama_server_context &llama) {
// this is comparable to: https://github.com/ggerganov/llama.cpp/blob/d9b33fe95bd257b36c84ee5769cc048230067d6f/examples/server/server.cpp#L1809
params.model = request->modelfile();
params.model.path = request->modelfile();
if (!request->mmproj().empty()) {
// get the directory of modelfile
std::string model_dir = params.model.substr(0, params.model.find_last_of("/\\"));
params.mmproj = model_dir + "/"+ request->mmproj();
std::string model_dir = params.model.path.substr(0, params.model.path.find_last_of("/\\"));
params.mmproj.path = model_dir + "/"+ request->mmproj();
}
// params.model_alias ??
params.model_alias = request->modelfile();
@@ -2352,6 +2429,20 @@ static void params_parse(const backend::ModelOptions* request,
add_rpc_devices(std::string(llama_grpc_servers));
}
// decode options. Options are in form optname:optvale, or if booleans only optname.
for (int i = 0; i < request->options_size(); i++) {
std::string opt = request->options(i);
char *optname = strtok(&opt[0], ":");
char *optval = strtok(NULL, ":");
if (optval == NULL) {
optval = "true";
}
if (!strcmp(optname, "gpu")) {
llama.has_gpu = true;
}
}
// TODO: Add yarn
if (!request->tensorsplit().empty()) {
@@ -2383,7 +2474,7 @@ static void params_parse(const backend::ModelOptions* request,
scale_factor = request->lorascale();
}
// get the directory of modelfile
std::string model_dir = params.model.substr(0, params.model.find_last_of("/\\"));
std::string model_dir = params.model.path.substr(0, params.model.path.find_last_of("/\\"));
params.lora_adapters.push_back({ model_dir + "/"+request->loraadapter(), scale_factor });
}
params.use_mlock = request->mlock();
@@ -2445,7 +2536,7 @@ public:
grpc::Status LoadModel(ServerContext* context, const backend::ModelOptions* request, backend::Result* result) {
// Implement LoadModel RPC
common_params params;
params_parse(request, params);
params_parse(request, params, llama);
llama_backend_init();
llama_numa_init(params.numa);
@@ -2467,7 +2558,7 @@ public:
json data = parse_options(true, request, llama);
const int task_id = llama.queue_tasks.get_new_id();
llama.queue_results.add_waiting_task_id(task_id);
llama.request_completion(task_id, data, false, false, -1);
llama.request_completion(task_id, data, false, false, false, -1);
while (true)
{
task_result result = llama.queue_results.recv(task_id);
@@ -2521,7 +2612,7 @@ public:
json data = parse_options(false, request, llama);
const int task_id = llama.queue_tasks.get_new_id();
llama.queue_results.add_waiting_task_id(task_id);
llama.request_completion(task_id, data, false, false, -1);
llama.request_completion(task_id, data, false, false, false, -1);
std::string completion_text;
task_result result = llama.queue_results.recv(task_id);
if (!result.error && result.stop) {
@@ -2558,7 +2649,7 @@ public:
json data = parse_options(false, request, llama);
const int task_id = llama.queue_tasks.get_new_id();
llama.queue_results.add_waiting_task_id(task_id);
llama.request_completion(task_id, { {"prompt", data["embeddings"]}, { "n_predict", 0}, {"image_data", ""} }, false, true, -1);
llama.request_completion(task_id, { {"prompt", data["embeddings"]}, { "n_predict", 0}, {"image_data", ""} }, false, true, false, -1);
// get the result
task_result result = llama.queue_results.recv(task_id);
//std::cout << "Embedding result JSON" << result.result_json.dump() << std::endl;
@@ -2590,6 +2681,46 @@ public:
return grpc::Status::OK;
}
grpc::Status Rerank(ServerContext* context, const backend::RerankRequest* request, backend::RerankResult* rerankResult) {
// Create a JSON object with the query and documents
json data = {
{"prompt", request->query()},
{"documents", request->documents()},
{"top_n", request->top_n()}
};
// Generate a new task ID
const int task_id = llama.queue_tasks.get_new_id();
llama.queue_results.add_waiting_task_id(task_id);
// Queue the task with reranking mode enabled
llama.request_completion(task_id, data, false, false, true, -1);
// Get the result
task_result result = llama.queue_results.recv(task_id);
llama.queue_results.remove_waiting_task_id(task_id);
if (!result.error && result.stop) {
// Set usage information
backend::Usage* usage = rerankResult->mutable_usage();
usage->set_total_tokens(result.result_json.value("tokens", 0));
usage->set_prompt_tokens(result.result_json.value("tokens", 0));
// Get the score from the result
float score = result.result_json.value("score", 0.0f);
// Create document results for each input document
for (int i = 0; i < request->documents_size(); i++) {
backend::DocumentResult* doc_result = rerankResult->add_results();
doc_result->set_index(i);
doc_result->set_text(request->documents(i));
doc_result->set_relevance_score(score);
}
}
return grpc::Status::OK;
}
grpc::Status GetMetrics(ServerContext* context, const backend::MetricsRequest* request, backend::MetricsResponse* response) {
llama_client_slot* active_slot = llama.get_active_slot();
@@ -2622,7 +2753,9 @@ void RunServer(const std::string& server_address) {
ServerBuilder builder;
builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
builder.RegisterService(&service);
builder.SetMaxMessageSize(50 * 1024 * 1024); // 50MB
builder.SetMaxSendMessageSize(50 * 1024 * 1024); // 50MB
builder.SetMaxReceiveMessageSize(50 * 1024 * 1024); // 50MB
std::unique_ptr<Server> server(builder.BuildAndStart());
std::cout << "Server listening on " << server_address << std::endl;
server->Wait();
@@ -2631,6 +2764,20 @@ void RunServer(const std::string& server_address) {
int main(int argc, char** argv) {
std::string server_address("localhost:50051");
#if defined (__unix__) || (defined (__APPLE__) && defined (__MACH__))
struct sigaction sigint_action;
sigint_action.sa_handler = signal_handler;
sigemptyset (&sigint_action.sa_mask);
sigint_action.sa_flags = 0;
sigaction(SIGINT, &sigint_action, NULL);
sigaction(SIGTERM, &sigint_action, NULL);
#elif defined (_WIN32)
auto console_ctrl_handler = +[](DWORD ctrl_type) -> BOOL {
return (ctrl_type == CTRL_C_EVENT) ? (signal_handler(SIGINT), true) : false;
};
SetConsoleCtrlHandler(reinterpret_cast<PHANDLER_ROUTINE>(console_ctrl_handler), true);
#endif
// Define long and short options
struct option long_options[] = {
{"addr", required_argument, nullptr, 'a'},

View File

@@ -21,6 +21,7 @@ fi
## XXX: In some versions of CMake clip wasn't being built before llama.
## This is an hack for now, but it should be fixed in the future.
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
cp -rfv llama.cpp/examples/llava/clip-impl.h llama.cpp/examples/grpc-server/clip-impl.h
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h

View File

@@ -61,6 +61,7 @@ struct task_server {
json data;
bool infill_mode = false;
bool embedding_mode = false;
bool reranking_mode = false;
int multitask_id = -1;
};

View File

@@ -8,6 +8,13 @@ ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
# keep standard at C11 and C++11
CXXFLAGS = -I. -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/thirdparty -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/ggml/include -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp -O3 -DNDEBUG -std=c++17 -fPIC
GOCMD?=go
CGO_LDFLAGS?=
# Avoid parent make file overwriting CGO_LDFLAGS which is needed for hipblas
CGO_LDFLAGS_SYCL=
GO_TAGS?=
LD_FLAGS?=
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -21,7 +28,7 @@ else ifeq ($(BUILD_TYPE),openblas)
# If build type is clblas (openCL) we set -DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DGGML_HIP=ON
# If it's OSX, DO NOT embed the metal library - -DGGML_METAL_EMBED_LIBRARY=ON requires further investigation
@@ -36,16 +43,35 @@ else ifeq ($(OS),Darwin)
endif
endif
# ifeq ($(BUILD_TYPE),sycl_f16)
# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON -DSD_SYCL=ON -DGGML_SYCL_F16=ON
# endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DSD_SYCL=ON \
-DGGML_SYCL_F16=ON
CC=icx
CXX=icpx
CGO_LDFLAGS_SYCL += -fsycl -L${DNNLROOT}/lib -ldnnl ${MKLROOT}/lib/intel64/libmkl_sycl.a -fiopenmp -fopenmp-targets=spir64 -lOpenCL
CGO_LDFLAGS_SYCL += $(shell pkg-config --libs mkl-static-lp64-gomp)
CGO_CXXFLAGS += -fiopenmp -fopenmp-targets=spir64
CGO_CXXFLAGS += $(shell pkg-config --cflags mkl-static-lp64-gomp )
endif
# ifeq ($(BUILD_TYPE),sycl_f32)
# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSD_SYCL=ON
# endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DSD_SYCL=ON
CC=icx
CXX=icpx
CGO_LDFLAGS_SYCL += -fsycl -L${DNNLROOT}/lib -ldnnl ${MKLROOT}/lib/intel64/libmkl_sycl.a -fiopenmp -fopenmp-targets=spir64 -lOpenCL
CGO_LDFLAGS_SYCL += $(shell pkg-config --libs mkl-static-lp64-gomp)
CGO_CXXFLAGS += -fiopenmp -fopenmp-targets=spir64
CGO_CXXFLAGS += $(shell pkg-config --cflags mkl-static-lp64-gomp )
endif
# warnings
CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
# CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
# Find all .a archives in ARCHIVE_DIR
# (ggml can have different backends cpu, cuda, etc., each backend generates a .a archive)
@@ -86,11 +112,24 @@ endif
$(MAKE) $(COMBINED_LIB)
gosd.o:
ifneq (,$(findstring sycl,$(BUILD_TYPE)))
+bash -c "source $(ONEAPI_VARS); \
$(CXX) $(CXXFLAGS) gosd.cpp -o gosd.o -c"
else
$(CXX) $(CXXFLAGS) gosd.cpp -o gosd.o -c
endif
libsd.a: gosd.o
cp $(INCLUDE_PATH)/build/libstable-diffusion.a ./libsd.a
$(AR) rcs libsd.a gosd.o
stablediffusion-ggml:
CGO_LDFLAGS="$(CGO_LDFLAGS) $(CGO_LDFLAGS_SYCL)" C_INCLUDE_PATH="$(INCLUDE_PATH)" LIBRARY_PATH="$(LIBRARY_PATH)" \
CC="$(CC)" CXX="$(CXX)" CGO_CXXFLAGS="$(CGO_CXXFLAGS)" \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o ../../../../backend-assets/grpc/stablediffusion-ggml ./
ifneq ($(UPX),)
$(UPX) ../../../../backend-assets/grpc/stablediffusion-ggml
endif
clean:
rm -rf gosd.o libsd.a build $(COMBINED_LIB)
rm -rf gosd.o libsd.a build $(COMBINED_LIB)

View File

@@ -1,17 +0,0 @@
.PHONY: autogptq
autogptq: protogen
bash install.sh
.PHONY: protogen
protogen: backend_pb2_grpc.py backend_pb2.py
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -1,5 +0,0 @@
# Creating a separate environment for the autogptq project
```
make autogptq
```

View File

@@ -1,153 +0,0 @@
#!/usr/bin/env python3
from concurrent import futures
import argparse
import signal
import sys
import os
import time
import base64
import grpc
import backend_pb2
import backend_pb2_grpc
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextGenerationPipeline
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
try:
device = "cuda:0"
if request.Device != "":
device = request.Device
# support loading local model files
model_path = os.path.join(os.environ.get('MODELS_PATH', './'), request.Model)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, trust_remote_code=request.TrustRemoteCode)
# support model `Qwen/Qwen-VL-Chat-Int4`
if "qwen-vl" in request.Model.lower():
self.model_name = "Qwen-VL-Chat"
model = AutoModelForCausalLM.from_pretrained(model_path,
trust_remote_code=request.TrustRemoteCode,
device_map="auto").eval()
else:
model = AutoGPTQForCausalLM.from_quantized(model_path,
model_basename=request.ModelBaseName,
use_safetensors=True,
trust_remote_code=request.TrustRemoteCode,
device=device,
use_triton=request.UseTriton,
quantize_config=None)
self.model = model
self.tokenizer = tokenizer
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(message="Model loaded successfully", success=True)
def Predict(self, request, context):
penalty = 1.0
if request.Penalty != 0.0:
penalty = request.Penalty
tokens = 512
if request.Tokens != 0:
tokens = request.Tokens
top_p = 0.95
if request.TopP != 0.0:
top_p = request.TopP
prompt_images = self.recompile_vl_prompt(request)
compiled_prompt = prompt_images[0]
print(f"Prompt: {compiled_prompt}", file=sys.stderr)
# Implement Predict RPC
pipeline = TextGenerationPipeline(
model=self.model,
tokenizer=self.tokenizer,
max_new_tokens=tokens,
temperature=request.Temperature,
top_p=top_p,
repetition_penalty=penalty,
)
t = pipeline(compiled_prompt)[0]["generated_text"]
print(f"generated_text: {t}", file=sys.stderr)
if compiled_prompt in t:
t = t.replace(compiled_prompt, "")
# house keeping. Remove the image files from /tmp folder
for img_path in prompt_images[1]:
try:
os.remove(img_path)
except Exception as e:
print(f"Error removing image file: {img_path}, {e}", file=sys.stderr)
return backend_pb2.Result(message=bytes(t, encoding='utf-8'))
def PredictStream(self, request, context):
# Implement PredictStream RPC
#for reply in some_data_generator():
# yield reply
# Not implemented yet
return self.Predict(request, context)
def recompile_vl_prompt(self, request):
prompt = request.Prompt
image_paths = []
if "qwen-vl" in self.model_name.lower():
# request.Images is an array which contains base64 encoded images. Iterate the request.Images array, decode and save each image to /tmp folder with a random filename.
# Then, save the image file paths to an array "image_paths".
# read "request.Prompt", replace "[img-%d]" with the image file paths in the order they appear in "image_paths". Save the new prompt to "prompt".
for i, img in enumerate(request.Images):
timestamp = str(int(time.time() * 1000)) # Generate timestamp
img_path = f"/tmp/vl-{timestamp}.jpg" # Use timestamp in filename
with open(img_path, "wb") as f:
f.write(base64.b64decode(img))
image_paths.append(img_path)
prompt = prompt.replace(f"[img-{i}]", "<img>" + img_path + "</img>,")
else:
prompt = request.Prompt
return (prompt, image_paths)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("Server started. Listening on: " + address, file=sys.stderr)
# Define the signal handler function
def signal_handler(sig, frame):
print("Received termination signal. Shutting down...")
server.stop(0)
sys.exit(0)
# Set the signal handlers for SIGINT and SIGTERM
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run the gRPC server.")
parser.add_argument(
"--addr", default="localhost:50051", help="The address to bind the server to."
)
args = parser.parse_args()
serve(args.addr)

View File

@@ -1,14 +0,0 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
installRequirements

View File

@@ -1,2 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118

View File

@@ -1 +0,0 @@
torch==2.4.1

View File

@@ -1,2 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.4.1+rocm6.0

View File

@@ -1,6 +0,0 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
setuptools

View File

@@ -1,6 +0,0 @@
accelerate
auto-gptq==0.7.1
grpcio==1.70.0
protobuf
certifi
transformers

View File

@@ -1,4 +0,0 @@
#!/bin/bash
source $(dirname $0)/../common/libbackend.sh
startBackend $@

View File

@@ -1,6 +0,0 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
runUnittests

View File

@@ -61,7 +61,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(success=True)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()

View File

@@ -1,4 +1,4 @@
bark==0.1.5
grpcio==1.70.0
grpcio==1.71.0
protobuf
certifi

View File

@@ -1,3 +1,3 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
grpcio-tools

View File

@@ -86,7 +86,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(success=True)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()

View File

@@ -1,4 +1,4 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
certifi
packaging==24.1

View File

@@ -19,7 +19,7 @@ import grpc
from diffusers import SanaPipeline, StableDiffusion3Pipeline, StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, \
EulerAncestralDiscreteScheduler, FluxPipeline, FluxTransformer2DModel
from diffusers import StableDiffusionImg2ImgPipeline, AutoPipelineForText2Image, ControlNetModel, StableVideoDiffusionPipeline
from diffusers import StableDiffusionImg2ImgPipeline, AutoPipelineForText2Image, ControlNetModel, StableVideoDiffusionPipeline, Lumina2Text2ImgPipeline
from diffusers.pipelines.stable_diffusion import safety_checker
from diffusers.utils import load_image, export_to_video
from compel import Compel, ReturnedEmbeddingsType
@@ -287,6 +287,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if request.LowVRAM:
self.pipe.enable_model_cpu_offload()
elif request.PipelineType == "Lumina2Text2ImgPipeline":
self.pipe = Lumina2Text2ImgPipeline.from_pretrained(
request.Model,
torch_dtype=torch.bfloat16)
if request.LowVRAM:
self.pipe.enable_model_cpu_offload()
elif request.PipelineType == "SanaPipeline":
self.pipe = SanaPipeline.from_pretrained(
request.Model,
@@ -516,7 +522,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()

View File

@@ -1,5 +1,5 @@
setuptools
grpcio==1.70.0
grpcio==1.71.0
pillow
protobuf
certifi

View File

@@ -105,7 +105,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()

View File

@@ -1,4 +1,4 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
certifi
wheel

View File

@@ -62,7 +62,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.TranscriptResult(segments=resultSegments, text=text)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()

View File

@@ -1,3 +1,3 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
grpcio-tools

View File

@@ -99,7 +99,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(success=True)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()

View File

@@ -1,4 +1,4 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
phonemizer
scipy

View File

@@ -91,7 +91,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.RerankResult(usage=usage, results=results)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()

View File

@@ -1,3 +1,3 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
certifi

View File

@@ -559,7 +559,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
async def serve(address):
# Start asyncio gRPC server
server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
# Add the servicer to the server
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
# Bind the server to the address

View File

@@ -1,4 +1,4 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
certifi
setuptools

View File

@@ -320,7 +320,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
async def serve(address):
# Start asyncio gRPC server
server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
# Add the servicer to the server
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
# Bind the server to the address

View File

@@ -1,4 +1,4 @@
grpcio==1.70.0
grpcio==1.71.0
protobuf
certifi
setuptools

View File

@@ -16,7 +16,7 @@ type Application struct {
func newApplication(appConfig *config.ApplicationConfig) *Application {
return &Application{
backendLoader: config.NewBackendConfigLoader(appConfig.ModelPath),
modelLoader: model.NewModelLoader(appConfig.ModelPath),
modelLoader: model.NewModelLoader(appConfig.ModelPath, appConfig.SingleBackend),
applicationConfig: appConfig,
templatesEvaluator: templates.NewEvaluator(appConfig.ModelPath),
}

View File

@@ -143,7 +143,7 @@ func New(opts ...config.AppOption) (*Application, error) {
}()
}
if options.LoadToMemory != nil {
if options.LoadToMemory != nil && !options.SingleBackend {
for _, m := range options.LoadToMemory {
cfg, err := application.BackendLoader().LoadBackendConfigFileByNameDefaultOptions(m, options)
if err != nil {

View File

@@ -17,6 +17,7 @@ func ModelEmbedding(s string, tokens []int, loader *model.ModelLoader, backendCo
if err != nil {
return nil, err
}
defer loader.Close()
var fn func() ([]float32, error)
switch model := inferenceModel.(type) {

View File

@@ -16,6 +16,7 @@ func ImageGeneration(height, width, mode, step, seed int, positive_prompt, negat
if err != nil {
return nil, err
}
defer loader.Close()
fn := func() error {
_, err := inferenceModel.GenerateImage(

View File

@@ -53,6 +53,7 @@ func ModelInference(ctx context.Context, s string, messages []schema.Message, im
if err != nil {
return nil, err
}
defer loader.Close()
var protoMessages []*proto.Message
// if we are using the tokenizer template, we need to convert the messages to proto messages

View File

@@ -40,10 +40,6 @@ func ModelOptions(c config.BackendConfig, so *config.ApplicationConfig, opts ...
grpcOpts := grpcModelOpts(c)
defOpts = append(defOpts, model.WithLoadGRPCLoadModelOpts(grpcOpts))
if so.SingleBackend {
defOpts = append(defOpts, model.WithSingleActiveBackend())
}
if so.ParallelBackendRequests {
defOpts = append(defOpts, model.EnableParallelRequests)
}
@@ -121,7 +117,7 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
triggers := make([]*pb.GrammarTrigger, 0)
for _, t := range c.FunctionsConfig.GrammarConfig.GrammarTriggers {
triggers = append(triggers, &pb.GrammarTrigger{
Word: t.Word,
Word: t.Word,
})
}
@@ -161,38 +157,33 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
DisableLogStatus: c.DisableLogStatus,
DType: c.DType,
// LimitMMPerPrompt vLLM
LimitImagePerPrompt: int32(c.LimitMMPerPrompt.LimitImagePerPrompt),
LimitVideoPerPrompt: int32(c.LimitMMPerPrompt.LimitVideoPerPrompt),
LimitAudioPerPrompt: int32(c.LimitMMPerPrompt.LimitAudioPerPrompt),
MMProj: c.MMProj,
FlashAttention: c.FlashAttention,
CacheTypeKey: c.CacheTypeK,
CacheTypeValue: c.CacheTypeV,
NoKVOffload: c.NoKVOffloading,
YarnExtFactor: c.YarnExtFactor,
YarnAttnFactor: c.YarnAttnFactor,
YarnBetaFast: c.YarnBetaFast,
YarnBetaSlow: c.YarnBetaSlow,
NGQA: c.NGQA,
RMSNormEps: c.RMSNormEps,
MLock: mmlock,
RopeFreqBase: c.RopeFreqBase,
RopeScaling: c.RopeScaling,
Type: c.ModelType,
RopeFreqScale: c.RopeFreqScale,
NUMA: c.NUMA,
Embeddings: embeddings,
LowVRAM: lowVRAM,
NGPULayers: int32(nGPULayers),
MMap: mmap,
MainGPU: c.MainGPU,
Threads: int32(*c.Threads),
TensorSplit: c.TensorSplit,
// AutoGPTQ
ModelBaseName: c.AutoGPTQ.ModelBaseName,
Device: c.AutoGPTQ.Device,
UseTriton: c.AutoGPTQ.Triton,
UseFastTokenizer: c.AutoGPTQ.UseFastTokenizer,
LimitImagePerPrompt: int32(c.LimitMMPerPrompt.LimitImagePerPrompt),
LimitVideoPerPrompt: int32(c.LimitMMPerPrompt.LimitVideoPerPrompt),
LimitAudioPerPrompt: int32(c.LimitMMPerPrompt.LimitAudioPerPrompt),
MMProj: c.MMProj,
FlashAttention: c.FlashAttention,
CacheTypeKey: c.CacheTypeK,
CacheTypeValue: c.CacheTypeV,
NoKVOffload: c.NoKVOffloading,
YarnExtFactor: c.YarnExtFactor,
YarnAttnFactor: c.YarnAttnFactor,
YarnBetaFast: c.YarnBetaFast,
YarnBetaSlow: c.YarnBetaSlow,
NGQA: c.NGQA,
RMSNormEps: c.RMSNormEps,
MLock: mmlock,
RopeFreqBase: c.RopeFreqBase,
RopeScaling: c.RopeScaling,
Type: c.ModelType,
RopeFreqScale: c.RopeFreqScale,
NUMA: c.NUMA,
Embeddings: embeddings,
LowVRAM: lowVRAM,
NGPULayers: int32(nGPULayers),
MMap: mmap,
MainGPU: c.MainGPU,
Threads: int32(*c.Threads),
TensorSplit: c.TensorSplit,
// RWKV
Tokenizer: c.Tokenizer,
}

View File

@@ -12,10 +12,10 @@ import (
func Rerank(request *proto.RerankRequest, loader *model.ModelLoader, appConfig *config.ApplicationConfig, backendConfig config.BackendConfig) (*proto.RerankResult, error) {
opts := ModelOptions(backendConfig, appConfig)
rerankModel, err := loader.Load(opts...)
if err != nil {
return nil, err
}
defer loader.Close()
if rerankModel == nil {
return nil, fmt.Errorf("could not load rerank model")

View File

@@ -26,10 +26,10 @@ func SoundGeneration(
opts := ModelOptions(backendConfig, appConfig)
soundGenModel, err := loader.Load(opts...)
if err != nil {
return "", nil, err
}
defer loader.Close()
if soundGenModel == nil {
return "", nil, fmt.Errorf("could not load sound generation model")

View File

@@ -20,6 +20,7 @@ func TokenMetrics(
if err != nil {
return nil, err
}
defer loader.Close()
if model == nil {
return nil, fmt.Errorf("could not loadmodel model")

View File

@@ -14,10 +14,10 @@ func ModelTokenize(s string, loader *model.ModelLoader, backendConfig config.Bac
opts := ModelOptions(backendConfig, appConfig)
inferenceModel, err = loader.Load(opts...)
if err != nil {
return schema.TokenizeResponse{}, err
}
defer loader.Close()
predictOptions := gRPCPredictOpts(backendConfig, loader.ModelPath)
predictOptions.Prompt = s

View File

@@ -24,6 +24,7 @@ func ModelTranscription(audio, language string, translate bool, ml *model.ModelL
if err != nil {
return nil, err
}
defer ml.Close()
if transcriptionModel == nil {
return nil, fmt.Errorf("could not load transcription model")

View File

@@ -23,10 +23,10 @@ func ModelTTS(
) (string, *proto.Result, error) {
opts := ModelOptions(backendConfig, appConfig, model.WithDefaultBackendString(model.PiperBackend))
ttsModel, err := loader.Load(opts...)
if err != nil {
return "", nil, err
}
defer loader.Close()
if ttsModel == nil {
return "", nil, fmt.Errorf("could not load tts model %q", backendConfig.Model)

View File

@@ -19,6 +19,8 @@ func VAD(request *schema.VADRequest,
if err != nil {
return nil, err
}
defer ml.Close()
req := proto.VADRequest{
Audio: request.Audio,
}

View File

@@ -38,7 +38,7 @@ type RunCMD struct {
F16 bool `name:"f16" env:"LOCALAI_F16,F16" help:"Enable GPU acceleration" group:"performance"`
Threads int `env:"LOCALAI_THREADS,THREADS" short:"t" help:"Number of threads used for parallel computation. Usage of the number of physical cores in the system is suggested" group:"performance"`
ContextSize int `env:"LOCALAI_CONTEXT_SIZE,CONTEXT_SIZE" default:"512" help:"Default context size for models" group:"performance"`
ContextSize int `env:"LOCALAI_CONTEXT_SIZE,CONTEXT_SIZE" help:"Default context size for models" group:"performance"`
Address string `env:"LOCALAI_ADDRESS,ADDRESS" default:":8080" help:"Bind address for the API server" group:"api"`
CORS bool `env:"LOCALAI_CORS,CORS" help:"" group:"api"`

View File

@@ -74,7 +74,7 @@ func (t *SoundGenerationCMD) Run(ctx *cliContext.Context) error {
AssetsDestination: t.BackendAssetsPath,
ExternalGRPCBackends: externalBackends,
}
ml := model.NewModelLoader(opts.ModelPath)
ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend)
defer func() {
err := ml.StopAllGRPC()

View File

@@ -32,7 +32,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {
}
cl := config.NewBackendConfigLoader(t.ModelsPath)
ml := model.NewModelLoader(opts.ModelPath)
ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend)
if err := cl.LoadBackendConfigsFromPath(t.ModelsPath); err != nil {
return err
}

View File

@@ -41,7 +41,7 @@ func (t *TTSCMD) Run(ctx *cliContext.Context) error {
AudioDir: outputDir,
AssetsDestination: t.BackendAssetsPath,
}
ml := model.NewModelLoader(opts.ModelPath)
ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend)
defer func() {
err := ml.StopAllGRPC()

View File

@@ -50,9 +50,6 @@ type BackendConfig struct {
// LLM configs (GPT4ALL, Llama.cpp, ...)
LLMConfig `yaml:",inline"`
// AutoGPTQ specifics
AutoGPTQ AutoGPTQ `yaml:"autogptq"`
// Diffusers
Diffusers Diffusers `yaml:"diffusers"`
Step int `yaml:"step"`
@@ -176,14 +173,6 @@ type LimitMMPerPrompt struct {
LimitAudioPerPrompt int `yaml:"audio"`
}
// AutoGPTQ is a struct that holds the configuration specific to the AutoGPTQ backend
type AutoGPTQ struct {
ModelBaseName string `yaml:"model_base_name"`
Device string `yaml:"device"`
Triton bool `yaml:"triton"`
UseFastTokenizer bool `yaml:"use_fast_tokenizer"`
}
// TemplateConfig is a struct that holds the configuration of the templating system
type TemplateConfig struct {
// Chat is the template used in the chat completion endpoint
@@ -389,16 +378,6 @@ func (cfg *BackendConfig) SetDefaults(opts ...ConfigLoaderOption) {
cfg.Embeddings = &falseV
}
// Value passed by the top level are treated as default (no implicit defaults)
// defaults are set by the user
if ctx == 0 {
ctx = 1024
}
if cfg.ContextSize == nil {
cfg.ContextSize = &ctx
}
if threads == 0 {
// Threads can't be 0
threads = 4
@@ -420,7 +399,7 @@ func (cfg *BackendConfig) SetDefaults(opts ...ConfigLoaderOption) {
cfg.Debug = &trueV
}
guessDefaultsFromFile(cfg, lo.modelPath)
guessDefaultsFromFile(cfg, lo.modelPath, ctx)
}
func (c *BackendConfig) Validate() bool {
@@ -565,7 +544,7 @@ func (c *BackendConfig) GuessUsecases(u BackendConfigUsecases) bool {
}
}
if (u & FLAG_TTS) == FLAG_TTS {
ttsBackends := []string{"piper", "transformers-musicgen", "parler-tts"}
ttsBackends := []string{"bark-cpp", "parler-tts", "piper", "transformers-musicgen"}
if !slices.Contains(ttsBackends, c.Backend) {
return false
}

253
core/config/gguf.go Normal file
View File

@@ -0,0 +1,253 @@
package config
import (
"strings"
"github.com/rs/zerolog/log"
gguf "github.com/thxcode/gguf-parser-go"
)
type familyType uint8
const (
Unknown familyType = iota
LLaMa3
CommandR
Phi3
ChatML
Mistral03
Gemma
DeepSeek2
)
const (
defaultContextSize = 1024
)
type settingsConfig struct {
StopWords []string
TemplateConfig TemplateConfig
RepeatPenalty float64
}
// default settings to adopt with a given model family
var defaultsSettings map[familyType]settingsConfig = map[familyType]settingsConfig{
Gemma: {
RepeatPenalty: 1.0,
StopWords: []string{"<|im_end|>", "<end_of_turn>", "<start_of_turn>"},
TemplateConfig: TemplateConfig{
Chat: "{{.Input }}\n<start_of_turn>model\n",
ChatMessage: "<start_of_turn>{{if eq .RoleName \"assistant\" }}model{{else}}{{ .RoleName }}{{end}}\n{{ if .Content -}}\n{{.Content -}}\n{{ end -}}<end_of_turn>",
Completion: "{{.Input}}",
},
},
DeepSeek2: {
StopWords: []string{"<end▁of▁sentence>"},
TemplateConfig: TemplateConfig{
ChatMessage: `{{if eq .RoleName "user" -}}User: {{.Content }}
{{ end -}}
{{if eq .RoleName "assistant" -}}Assistant: {{.Content}}<end▁of▁sentence>{{end}}
{{if eq .RoleName "system" -}}{{.Content}}
{{end -}}`,
Chat: "{{.Input -}}\nAssistant: ",
},
},
LLaMa3: {
StopWords: []string{"<|eot_id|>"},
TemplateConfig: TemplateConfig{
Chat: "<|begin_of_text|>{{.Input }}\n<|start_header_id|>assistant<|end_header_id|>",
ChatMessage: "<|start_header_id|>{{ .RoleName }}<|end_header_id|>\n\n{{.Content }}<|eot_id|>",
},
},
CommandR: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input -}}<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
Functions: `<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>
You are a function calling AI model, you can call the following functions:
## Available Tools
{{range .Functions}}
- {"type": "function", "function": {"name": "{{.Name}}", "description": "{{.Description}}", "parameters": {{toJson .Parameters}} }}
{{end}}
When using a tool, reply with JSON, for instance {"name": "tool_name", "arguments": {"param1": "value1", "param2": "value2"}}
<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{.Input -}}`,
ChatMessage: `{{if eq .RoleName "user" -}}
<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if eq .RoleName "system" -}}
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if eq .RoleName "assistant" -}}
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if eq .RoleName "tool" -}}
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if .FunctionCall -}}
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{toJson .FunctionCall}}}<|END_OF_TURN_TOKEN|>
{{- end -}}`,
},
StopWords: []string{"<|END_OF_TURN_TOKEN|>"},
},
Phi3: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input}}\n<|assistant|>",
ChatMessage: "<|{{ .RoleName }}|>\n{{.Content}}<|end|>",
Completion: "{{.Input}}",
},
StopWords: []string{"<|end|>", "<|endoftext|>"},
},
ChatML: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input -}}\n<|im_start|>assistant",
Functions: `<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
{{.Input -}}
<|im_start|>assistant`,
ChatMessage: `<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>`,
},
StopWords: []string{"<|im_end|>", "<dummy32000>", "</s>"},
},
Mistral03: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input -}}",
Functions: `[AVAILABLE_TOOLS] [{{range .Functions}}{"type": "function", "function": {"name": "{{.Name}}", "description": "{{.Description}}", "parameters": {{toJson .Parameters}} }}{{end}} ] [/AVAILABLE_TOOLS]{{.Input }}`,
ChatMessage: `{{if eq .RoleName "user" -}}
[INST] {{.Content }} [/INST]
{{- else if .FunctionCall -}}
[TOOL_CALLS] {{toJson .FunctionCall}} [/TOOL_CALLS]
{{- else if eq .RoleName "tool" -}}
[TOOL_RESULTS] {{.Content}} [/TOOL_RESULTS]
{{- else -}}
{{ .Content -}}
{{ end -}}`,
},
StopWords: []string{"<|im_end|>", "<dummy32000>", "</tool_call>", "<|eot_id|>", "<|end_of_text|>", "</s>", "[/TOOL_CALLS]", "[/ACTIONS]"},
},
}
// this maps well known template used in HF to model families defined above
var knownTemplates = map[string]familyType{
`{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ system_message }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}`: ChatML,
`{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}`: Mistral03,
}
func guessGGUFFromFile(cfg *BackendConfig, f *gguf.GGUFFile, defaultCtx int) {
if defaultCtx == 0 && cfg.ContextSize == nil {
ctxSize := f.EstimateLLaMACppUsage().ContextSize
if ctxSize > 0 {
cSize := int(ctxSize)
cfg.ContextSize = &cSize
} else {
defaultCtx = defaultContextSize
cfg.ContextSize = &defaultCtx
}
}
if cfg.HasTemplate() {
// nothing to guess here
log.Debug().Any("name", cfg.Name).Msgf("guessDefaultsFromFile: %s", "template already set")
return
}
log.Debug().
Any("eosTokenID", f.Tokenizer().EOSTokenID).
Any("bosTokenID", f.Tokenizer().BOSTokenID).
Any("modelName", f.Model().Name).
Any("architecture", f.Architecture().Architecture).Msgf("Model file loaded: %s", cfg.ModelFileName())
// guess the name
if cfg.Name == "" {
cfg.Name = f.Model().Name
}
family := identifyFamily(f)
if family == Unknown {
log.Debug().Msgf("guessDefaultsFromFile: %s", "family not identified")
return
}
// identify template
settings, ok := defaultsSettings[family]
if ok {
cfg.TemplateConfig = settings.TemplateConfig
log.Debug().Any("family", family).Msgf("guessDefaultsFromFile: guessed template %+v", cfg.TemplateConfig)
if len(cfg.StopWords) == 0 {
cfg.StopWords = settings.StopWords
}
if cfg.RepeatPenalty == 0.0 {
cfg.RepeatPenalty = settings.RepeatPenalty
}
} else {
log.Debug().Any("family", family).Msgf("guessDefaultsFromFile: no template found for family")
}
if cfg.HasTemplate() {
return
}
// identify from well known templates first, otherwise use the raw jinja template
chatTemplate, found := f.Header.MetadataKV.Get("tokenizer.chat_template")
if found {
// try to use the jinja template
cfg.TemplateConfig.JinjaTemplate = true
cfg.TemplateConfig.ChatMessage = chatTemplate.ValueString()
}
}
func identifyFamily(f *gguf.GGUFFile) familyType {
// identify from well known templates first
chatTemplate, found := f.Header.MetadataKV.Get("tokenizer.chat_template")
if found && chatTemplate.ValueString() != "" {
if family, ok := knownTemplates[chatTemplate.ValueString()]; ok {
return family
}
}
// otherwise try to identify from the model properties
arch := f.Architecture().Architecture
eosTokenID := f.Tokenizer().EOSTokenID
bosTokenID := f.Tokenizer().BOSTokenID
isYI := arch == "llama" && bosTokenID == 1 && eosTokenID == 2
// WTF! Mistral0.3 and isYi have same bosTokenID and eosTokenID
llama3 := arch == "llama" && eosTokenID == 128009
commandR := arch == "command-r" && eosTokenID == 255001
qwen2 := arch == "qwen2"
phi3 := arch == "phi-3"
gemma := strings.HasPrefix(arch, "gemma") || strings.Contains(strings.ToLower(f.Model().Name), "gemma")
deepseek2 := arch == "deepseek2"
switch {
case deepseek2:
return DeepSeek2
case gemma:
return Gemma
case llama3:
return LLaMa3
case commandR:
return CommandR
case phi3:
return Phi3
case qwen2, isYI:
return ChatML
default:
return Unknown
}
}

View File

@@ -3,147 +3,12 @@ package config
import (
"os"
"path/filepath"
"strings"
"github.com/rs/zerolog/log"
gguf "github.com/thxcode/gguf-parser-go"
)
type familyType uint8
const (
Unknown familyType = iota
LLaMa3
CommandR
Phi3
ChatML
Mistral03
Gemma
DeepSeek2
)
type settingsConfig struct {
StopWords []string
TemplateConfig TemplateConfig
RepeatPenalty float64
}
// default settings to adopt with a given model family
var defaultsSettings map[familyType]settingsConfig = map[familyType]settingsConfig{
Gemma: {
RepeatPenalty: 1.0,
StopWords: []string{"<|im_end|>", "<end_of_turn>", "<start_of_turn>"},
TemplateConfig: TemplateConfig{
Chat: "{{.Input }}\n<start_of_turn>model\n",
ChatMessage: "<start_of_turn>{{if eq .RoleName \"assistant\" }}model{{else}}{{ .RoleName }}{{end}}\n{{ if .Content -}}\n{{.Content -}}\n{{ end -}}<end_of_turn>",
Completion: "{{.Input}}",
},
},
DeepSeek2: {
StopWords: []string{"<end▁of▁sentence>"},
TemplateConfig: TemplateConfig{
ChatMessage: `{{if eq .RoleName "user" -}}User: {{.Content }}
{{ end -}}
{{if eq .RoleName "assistant" -}}Assistant: {{.Content}}<end▁of▁sentence>{{end}}
{{if eq .RoleName "system" -}}{{.Content}}
{{end -}}`,
Chat: "{{.Input -}}\nAssistant: ",
},
},
LLaMa3: {
StopWords: []string{"<|eot_id|>"},
TemplateConfig: TemplateConfig{
Chat: "<|begin_of_text|>{{.Input }}\n<|start_header_id|>assistant<|end_header_id|>",
ChatMessage: "<|start_header_id|>{{ .RoleName }}<|end_header_id|>\n\n{{.Content }}<|eot_id|>",
},
},
CommandR: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input -}}<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
Functions: `<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>
You are a function calling AI model, you can call the following functions:
## Available Tools
{{range .Functions}}
- {"type": "function", "function": {"name": "{{.Name}}", "description": "{{.Description}}", "parameters": {{toJson .Parameters}} }}
{{end}}
When using a tool, reply with JSON, for instance {"name": "tool_name", "arguments": {"param1": "value1", "param2": "value2"}}
<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{.Input -}}`,
ChatMessage: `{{if eq .RoleName "user" -}}
<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if eq .RoleName "system" -}}
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if eq .RoleName "assistant" -}}
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if eq .RoleName "tool" -}}
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{.Content}}<|END_OF_TURN_TOKEN|>
{{- else if .FunctionCall -}}
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{toJson .FunctionCall}}}<|END_OF_TURN_TOKEN|>
{{- end -}}`,
},
StopWords: []string{"<|END_OF_TURN_TOKEN|>"},
},
Phi3: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input}}\n<|assistant|>",
ChatMessage: "<|{{ .RoleName }}|>\n{{.Content}}<|end|>",
Completion: "{{.Input}}",
},
StopWords: []string{"<|end|>", "<|endoftext|>"},
},
ChatML: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input -}}\n<|im_start|>assistant",
Functions: `<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
{{.Input -}}
<|im_start|>assistant`,
ChatMessage: `<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>`,
},
StopWords: []string{"<|im_end|>", "<dummy32000>", "</s>"},
},
Mistral03: {
TemplateConfig: TemplateConfig{
Chat: "{{.Input -}}",
Functions: `[AVAILABLE_TOOLS] [{{range .Functions}}{"type": "function", "function": {"name": "{{.Name}}", "description": "{{.Description}}", "parameters": {{toJson .Parameters}} }}{{end}} ] [/AVAILABLE_TOOLS]{{.Input }}`,
ChatMessage: `{{if eq .RoleName "user" -}}
[INST] {{.Content }} [/INST]
{{- else if .FunctionCall -}}
[TOOL_CALLS] {{toJson .FunctionCall}} [/TOOL_CALLS]
{{- else if eq .RoleName "tool" -}}
[TOOL_RESULTS] {{.Content}} [/TOOL_RESULTS]
{{- else -}}
{{ .Content -}}
{{ end -}}`,
},
StopWords: []string{"<|im_end|>", "<dummy32000>", "</tool_call>", "<|eot_id|>", "<|end_of_text|>", "</s>", "[/TOOL_CALLS]", "[/ACTIONS]"},
},
}
// this maps well known template used in HF to model families defined above
var knownTemplates = map[string]familyType{
`{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ system_message }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}`: ChatML,
`{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}`: Mistral03,
}
func guessDefaultsFromFile(cfg *BackendConfig, modelPath string) {
func guessDefaultsFromFile(cfg *BackendConfig, modelPath string, defaultCtx int) {
if os.Getenv("LOCALAI_DISABLE_GUESSING") == "true" {
log.Debug().Msgf("guessDefaultsFromFile: %s", "guessing disabled with LOCALAI_DISABLE_GUESSING")
return
@@ -154,106 +19,20 @@ func guessDefaultsFromFile(cfg *BackendConfig, modelPath string) {
return
}
if cfg.HasTemplate() {
// nothing to guess here
log.Debug().Any("name", cfg.Name).Msgf("guessDefaultsFromFile: %s", "template already set")
return
}
// We try to guess only if we don't have a template defined already
guessPath := filepath.Join(modelPath, cfg.ModelFileName())
// try to parse the gguf file
f, err := gguf.ParseGGUFFile(guessPath)
if err != nil {
// Only valid for gguf files
log.Debug().Str("filePath", guessPath).Msg("guessDefaultsFromFile: not a GGUF file")
if err == nil {
guessGGUFFromFile(cfg, f, defaultCtx)
return
}
log.Debug().
Any("eosTokenID", f.Tokenizer().EOSTokenID).
Any("bosTokenID", f.Tokenizer().BOSTokenID).
Any("modelName", f.Model().Name).
Any("architecture", f.Architecture().Architecture).Msgf("Model file loaded: %s", cfg.ModelFileName())
// guess the name
if cfg.Name == "" {
cfg.Name = f.Model().Name
}
family := identifyFamily(f)
if family == Unknown {
log.Debug().Msgf("guessDefaultsFromFile: %s", "family not identified")
return
}
// identify template
settings, ok := defaultsSettings[family]
if ok {
cfg.TemplateConfig = settings.TemplateConfig
log.Debug().Any("family", family).Msgf("guessDefaultsFromFile: guessed template %+v", cfg.TemplateConfig)
if len(cfg.StopWords) == 0 {
cfg.StopWords = settings.StopWords
if cfg.ContextSize == nil {
if defaultCtx == 0 {
defaultCtx = defaultContextSize
}
if cfg.RepeatPenalty == 0.0 {
cfg.RepeatPenalty = settings.RepeatPenalty
}
} else {
log.Debug().Any("family", family).Msgf("guessDefaultsFromFile: no template found for family")
}
if cfg.HasTemplate() {
return
}
// identify from well known templates first, otherwise use the raw jinja template
chatTemplate, found := f.Header.MetadataKV.Get("tokenizer.chat_template")
if found {
// try to use the jinja template
cfg.TemplateConfig.JinjaTemplate = true
cfg.TemplateConfig.ChatMessage = chatTemplate.ValueString()
}
}
func identifyFamily(f *gguf.GGUFFile) familyType {
// identify from well known templates first
chatTemplate, found := f.Header.MetadataKV.Get("tokenizer.chat_template")
if found && chatTemplate.ValueString() != "" {
if family, ok := knownTemplates[chatTemplate.ValueString()]; ok {
return family
}
}
// otherwise try to identify from the model properties
arch := f.Architecture().Architecture
eosTokenID := f.Tokenizer().EOSTokenID
bosTokenID := f.Tokenizer().BOSTokenID
isYI := arch == "llama" && bosTokenID == 1 && eosTokenID == 2
// WTF! Mistral0.3 and isYi have same bosTokenID and eosTokenID
llama3 := arch == "llama" && eosTokenID == 128009
commandR := arch == "command-r" && eosTokenID == 255001
qwen2 := arch == "qwen2"
phi3 := arch == "phi-3"
gemma := strings.HasPrefix(arch, "gemma") || strings.Contains(strings.ToLower(f.Model().Name), "gemma")
deepseek2 := arch == "deepseek2"
switch {
case deepseek2:
return DeepSeek2
case gemma:
return Gemma
case llama3:
return LLaMa3
case commandR:
return CommandR
case phi3:
return Phi3
case qwen2, isYI:
return ChatML
default:
return Unknown
cfg.ContextSize = &defaultCtx
}
}

View File

@@ -142,9 +142,9 @@ func API(application *application.Application) (*fiber.App, error) {
httpFS := http.FS(embedDirStatic)
router.Use(favicon.New(favicon.Config{
URL: "/favicon.ico",
URL: "/favicon.svg",
FileSystem: httpFS,
File: "static/favicon.ico",
File: "static/favicon.svg",
}))
router.Use("/static", filesystem.New(filesystem.Config{

View File

@@ -122,15 +122,15 @@ func modelModal(m *gallery.GalleryModel) elem.Node {
"id": modalName(m),
"tabindex": "-1",
"aria-hidden": "true",
"class": "hidden overflow-y-auto overflow-x-hidden fixed top-0 right-0 left-0 z-50 justify-center items-center w-full md:inset-0 h-[calc(100%-1rem)] max-h-full",
"class": "hidden fixed top-0 right-0 left-0 z-50 justify-center items-center w-full md:inset-0 h-full max-h-full bg-gray-900/50",
},
elem.Div(
attrs.Props{
"class": "relative p-4 w-full max-w-2xl max-h-full",
"class": "relative p-4 w-full max-w-2xl h-[90vh] mx-auto mt-[5vh]",
},
elem.Div(
attrs.Props{
"class": "relative p-4 w-full max-w-2xl max-h-full bg-white rounded-lg shadow dark:bg-gray-700",
"class": "relative bg-white rounded-lg shadow dark:bg-gray-700 h-full flex flex-col",
},
// header
elem.Div(
@@ -164,14 +164,13 @@ func modelModal(m *gallery.GalleryModel) elem.Node {
// body
elem.Div(
attrs.Props{
"class": "p-4 md:p-5 space-y-4",
"class": "p-4 md:p-5 space-y-4 overflow-y-auto flex-1 min-h-0",
},
elem.Div(
attrs.Props{
"class": "flex justify-center items-center",
},
elem.Img(attrs.Props{
// "class": "rounded-t-lg object-fit object-center h-96",
"class": "lazy rounded-t-lg max-h-48 max-w-96 object-cover mt-3 entered loaded",
"src": m.Icon,
"loading": "lazy",
@@ -232,7 +231,6 @@ func modelModal(m *gallery.GalleryModel) elem.Node {
),
),
)
}
func modelDescription(m *gallery.GalleryModel) elem.Node {

View File

@@ -21,6 +21,7 @@ func StoresSetEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationConfi
if err != nil {
return err
}
defer sl.Close()
vals := make([][]byte, len(input.Values))
for i, v := range input.Values {
@@ -48,6 +49,7 @@ func StoresDeleteEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationCo
if err != nil {
return err
}
defer sl.Close()
if err := store.DeleteCols(c.Context(), sb, input.Keys); err != nil {
return err
@@ -69,6 +71,7 @@ func StoresGetEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationConfi
if err != nil {
return err
}
defer sl.Close()
keys, vals, err := store.GetCols(c.Context(), sb, input.Keys)
if err != nil {
@@ -100,6 +103,7 @@ func StoresFindEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationConf
if err != nil {
return err
}
defer sl.Close()
keys, vals, similarities, err := store.Find(c.Context(), sb, input.Key, input.Topk)
if err != nil {

View File

@@ -40,7 +40,7 @@ func TestAssistantEndpoints(t *testing.T) {
cl := &config.BackendConfigLoader{}
//configsDir := "/tmp/localai/configs"
modelPath := "/tmp/localai/model"
var ml = model.NewModelLoader(modelPath)
var ml = model.NewModelLoader(modelPath, false)
appConfig := &config.ApplicationConfig{
ConfigsDir: configsDir,

View File

@@ -29,9 +29,9 @@ func Explorer(db *explorer.Database) *fiber.App {
httpFS := http.FS(embedDirStatic)
app.Use(favicon.New(favicon.Config{
URL: "/favicon.ico",
URL: "/favicon.svg",
FileSystem: httpFS,
File: "static/favicon.ico",
File: "static/favicon.svg",
}))
app.Use("/static", filesystem.New(filesystem.Config{

View File

@@ -203,18 +203,10 @@ func mergeOpenAIRequestAndBackendConfig(config *config.BackendConfig, input *sch
config.Diffusers.ClipSkip = input.ClipSkip
}
if input.ModelBaseName != "" {
config.AutoGPTQ.ModelBaseName = input.ModelBaseName
}
if input.NegativePromptScale != 0 {
config.NegativePromptScale = input.NegativePromptScale
}
if input.UseFastTokenizer {
config.UseFastTokenizer = input.UseFastTokenizer
}
if input.NegativePrompt != "" {
config.NegativePrompt = input.NegativePrompt
}

View File

@@ -50,11 +50,10 @@ func RegisterLocalAIRoutes(router *fiber.App,
router.Post("/v1/vad", vadChain...)
// Stores
sl := model.NewModelLoader("")
router.Post("/stores/set", localai.StoresSetEndpoint(sl, appConfig))
router.Post("/stores/delete", localai.StoresDeleteEndpoint(sl, appConfig))
router.Post("/stores/get", localai.StoresGetEndpoint(sl, appConfig))
router.Post("/stores/find", localai.StoresFindEndpoint(sl, appConfig))
router.Post("/stores/set", localai.StoresSetEndpoint(ml, appConfig))
router.Post("/stores/delete", localai.StoresDeleteEndpoint(ml, appConfig))
router.Post("/stores/get", localai.StoresGetEndpoint(ml, appConfig))
router.Post("/stores/find", localai.StoresFindEndpoint(ml, appConfig))
if !appConfig.DisableMetrics {
router.Get("/metrics", localai.LocalAIMetricsEndpoint())

View File

Binary file not shown.

Before

Width:  |  Height:  |  Size: 15 KiB

View File

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 108 KiB

BIN
core/http/static/logo.png Normal file
View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 893 KiB

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 930 KiB

View File

@@ -115,6 +115,7 @@ async function sendTextToChatGPT(text) {
const response = await fetch('v1/chat/completions', {
method: 'POST',
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: getModel(),
messages: conversationHistory

View File

@@ -12,7 +12,7 @@
<div class="max-w-md w-full bg-gray-800/90 border border-gray-700/50 rounded-xl overflow-hidden shadow-xl">
<div class="animation-container">
<div class="text-overlay">
<!-- <i class="fas fa-circle-nodes text-5xl text-blue-400 mb-2"></i> -->
<img src="static/logo.png" alt="LocalAI Logo" class="h-32">
</div>
</div>

View File

@@ -3,7 +3,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{.Title}}</title>
<base href="{{.BaseURL}}" />
<link rel="icon" type="image/x-icon" href="favicon.ico" />
<link rel="shortcut icon" href="static/favicon.svg" type="image/svg">
<link rel="stylesheet" href="static/assets/highlightjs.css" />
<script defer src="static/assets/highlightjs.js"></script>
<script defer src="static/assets/alpine.js"></script>

View File

@@ -4,10 +4,9 @@
<div class="flex items-center">
<!-- Logo Image -->
<a href="./" class="flex items-center group">
<img src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"
<img src="static/logo_horizontal.png"
alt="LocalAI Logo"
class="h-10 mr-3 rounded-lg border border-blue-600/30 shadow-md transition-all duration-300 group-hover:shadow-blue-500/20 group-hover:border-blue-500/50">
<span class="text-white text-xl font-bold bg-clip-text text-transparent bg-gradient-to-r from-blue-400 to-indigo-400">LocalAI</span>
class="h-14 mr-3 brightness-110 transition-all duration-300 group-hover:brightness-125">
</a>
</div>

View File

@@ -4,10 +4,9 @@
<div class="flex items-center">
<!-- Logo Image -->
<a href="./" class="flex items-center group">
<img src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"
<img src="static/logo_horizontal.png"
alt="LocalAI Logo"
class="h-10 mr-3 rounded-lg border border-blue-600/30 shadow-md transition-all duration-300 group-hover:shadow-blue-500/20 group-hover:border-blue-500/50">
<span class="text-white text-xl font-bold bg-clip-text text-transparent bg-gradient-to-r from-blue-400 to-indigo-400">LocalAI</span>
</a>
</div>

View File

@@ -202,7 +202,6 @@ type OpenAIRequest struct {
Backend string `json:"backend" yaml:"backend"`
// AutoGPTQ
ModelBaseName string `json:"model_base_name" yaml:"model_base_name"`
}

View File

@@ -41,8 +41,6 @@ type PredictionOptions struct {
RopeFreqBase float32 `json:"rope_freq_base" yaml:"rope_freq_base"`
RopeFreqScale float32 `json:"rope_freq_scale" yaml:"rope_freq_scale"`
NegativePromptScale float32 `json:"negative_prompt_scale" yaml:"negative_prompt_scale"`
// AutoGPTQ
UseFastTokenizer bool `json:"use_fast_tokenizer" yaml:"use_fast_tokenizer"`
// Diffusers
ClipSkip int `json:"clip_skip" yaml:"clip_skip"`

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 506 KiB

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 170 KiB

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 893 KiB

View File

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 108 KiB

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 132 KiB

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 284 KiB

Some files were not shown because too many files have changed in this diff Show More