Compare commits

508 Commits

Author SHA1 Message Date
Ettore Di Giacinto
3826edb9da chore(deps): bump llama.cpp to '10f2e81809bbb69ecfe64fc8b4686285f84b0c07'
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 09:12:59 +01:00
Ettore Di Giacinto
e878556e98 chore(model gallery): add trashpanda-org_qwq-32b-snowdrop-v0 (#5000)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:26:09 +01:00
Ettore Di Giacinto
b096928172 chore(model gallery): add open-r1_olympiccoder-7b (#4999)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:24:35 +01:00
Ettore Di Giacinto
db7442ae67 chore(model gallery): add open-r1_olympiccoder-32b (#4998)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:23:01 +01:00
Ettore Di Giacinto
b6cd430e08 chore(model gallery): add thedrummer_gemmasutra-small-4b-v1 (#4997)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-12 08:19:51 +01:00
LocalAI [bot]
478e50cda2 chore: ⬆️ Update ggml-org/llama.cpp to 2c9f833d17bb5b8ea89dec663b072b5420fc5438 (#4991)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-03-11 11:19:03 +00:00
Ettore Di Giacinto
1db2b9943c chore(deps): Bump grpcio to 1.71.0 (#4993)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-11 09:44:21 +01:00
Ettore Di Giacinto
ac41aa8b67 chore(model gallery): add openpipe_deductive-reasoning-qwen-32b (#4995)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-11 09:44:07 +01:00
Ettore Di Giacinto
156a98e2e7 chore(model gallery): add openpipe_deductive-reasoning-qwen-14b (#4994)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-11 09:40:38 +01:00
dependabot[bot]
d88ec1209e chore(deps): Bump docs/themes/hugo-theme-relearn from 4a4b60e to 9a020e7 (#4988)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `4a4b60e` to `9a020e7`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](4a4b60ef04...9a020e7ead)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-11 09:39:04 +01:00
dependabot[bot]
fde8dbfc80 chore(deps): Bump appleboy/ssh-action from 1.2.1 to 1.2.2 (#4978)
Bumps [appleboy/ssh-action](https://github.com/appleboy/ssh-action) from 1.2.1 to 1.2.2.
- [Release notes](https://github.com/appleboy/ssh-action/releases)
- [Changelog](https://github.com/appleboy/ssh-action/blob/master/.goreleaser.yaml)
- [Commits](https://github.com/appleboy/ssh-action/compare/v1.2.1...v1.2.2)

---
updated-dependencies:
- dependency-name: appleboy/ssh-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-11 08:29:31 +01:00
Ettore Di Giacinto
879dc73eba Revert "chore(deps): Bump intel-extension-for-pytorch from 2.3.110+xpu to 2.6.10+xpu in /backend/python/diffusers" (#4992)
Revert "chore(deps): Bump intel-extension-for-pytorch from 2.3.110+xpu to 2.6…"

This reverts commit 1dfc52de16.
2025-03-11 08:29:05 +01:00
dependabot[bot]
1dfc52de16 chore(deps): Bump intel-extension-for-pytorch from 2.3.110+xpu to 2.6.10+xpu in /backend/python/diffusers (#4973)
chore(deps): Bump intel-extension-for-pytorch

Bumps intel-extension-for-pytorch from 2.3.110+xpu to 2.6.10+xpu.

---
updated-dependencies:
- dependency-name: intel-extension-for-pytorch
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-10 21:14:43 +00:00
Ettore Di Giacinto
1331129485 fix(routes): do not gate generated artifacts via key (#4971)
fix(routes): do not gate generated images via key

We generate unique uris for images.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-10 15:58:25 +01:00
Ettore Di Giacinto
1cd98062e5 chore(model gallery): add hyperllama3.1-v2-i1 (#4970)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-10 10:52:36 +01:00
Ettore Di Giacinto
9791d9b77a chore(model gallery): add opencrystal-l3-15b-v2.1-i1 (#4969)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-10 10:50:02 +01:00
Ettore Di Giacinto
8956452a45 chore(model gallery): add llmevollama-3.1-8b-v0.1-i1 (#4968)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-10 10:46:49 +01:00
LocalAI [bot]
f3659fa49c chore: ⬆️ Update ggml-org/llama.cpp to 1e2f78a00450593e2dfa458796fcdd9987300dfc (#4966)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-09 21:44:52 +00:00
Ettore Di Giacinto
585f2be793 chore(model gallery): add tower-babel_babel-9b-chat (#4964)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-09 12:09:56 +01:00
LocalAI [bot]
d13f160222 chore: ⬆️ Update ggml-org/llama.cpp to 0fd7ca7a210bd4abc995cd728491043491dbdef7 (#4963)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-08 21:41:26 +00:00
Ettore Di Giacinto
db5495b9d7 chore(model gallery): add goppa-ai_goppa-logillama (#4962)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-08 11:37:40 +01:00
Ettore Di Giacinto
3def1ae232 chore(model gallery): add huihui-ai_qwq-32b-abliterated (#4961)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-08 11:34:37 +01:00
Ettore Di Giacinto
c6ebead8e5 chore(model gallery): add steelskull_l3.3-electra-r1-70b (#4960)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-08 11:23:42 +01:00
LocalAI [bot]
cff4a950e0 chore: ⬆️ Update ggml-org/llama.cpp to 7ab364390f92b0b8d83f69821a536b424838f3f8 (#4959)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-07 22:54:28 +00:00
Ettore Di Giacinto
e4fa894153 fix(llama.cpp): correctly handle embeddings in batches (#4957)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-07 19:29:52 +01:00
Ettore Di Giacinto
69caccfa82 chore(model gallery): add granite embeddings models (#4956)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-06 23:17:40 +01:00
Ettore Di Giacinto
ab50c13160 chore(model gallery): add nomic-embed-text-v1.5 (#4955)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-06 23:10:39 +01:00
LocalAI [bot]
56d4e82b14 chore: ⬆️ Update ggml-org/llama.cpp to 3d652bfddfba09022525067e672c3c145c074649 (#4954)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-06 21:54:14 +00:00
Ettore Di Giacinto
09b5bd48bc chore(model gallery): add rombo-org_rombo-llm-v3.1-qwq-32b (#4953)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-06 10:42:45 +01:00
Ettore Di Giacinto
957dcfb6a9 chore(model gallery): add qwen_qwq-32b (#4952)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-06 10:28:03 +01:00
Ettore Di Giacinto
67f7bffd18 chore(deps): update llama.cpp and sync with upstream changes (#4950)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-06 00:40:58 +01:00
Ettore Di Giacinto
de81b42b49 feat(ui): remove api key handling and small ui adjustments (#4948)
* chore(ui): drop set api key button

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ui): show in-progress installs in model view

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): improve text to image view

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-05 19:37:36 +01:00
Ettore Di Giacinto
06eb7e9fa7 chore(model gallery): add llama-3.3-magicalgirl-2.5-i1 (#4946)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-05 09:35:48 +01:00
Ettore Di Giacinto
45bc1ac566 chore(model gallery): add lolzinventor_meta-llama-3.1-8b-survivev3 (#4945)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-05 09:33:41 +01:00
Ettore Di Giacinto
02aafeff75 chore(model gallery): add llama-3.1-8b-instruct-uncensored-delmat-i1 (#4944)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-05 09:31:09 +01:00
Ettore Di Giacinto
6b46c52789 feat(ui): complete design overhaul (#4942)
This PR entirely changes the UI's look and feel. It updates all
sections and also makes the UI mobile-ready.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-05 08:27:03 +01:00
LocalAI [bot]
d732e261a4 chore: ⬆️ Update ggml-org/llama.cpp to 5bbe6a9fe9a8796a9389c85accec89dbc4d91e39 (#4943)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-04 21:46:40 +00:00
Ettore Di Giacinto
807c574e91 chore(model gallery): add azura-qwen2.5-32b-i1 (#4941)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-04 10:33:15 +01:00
Ettore Di Giacinto
bb171a39b3 chore(model gallery): add llama-3.3-magicalgirl-2 (#4940)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-04 10:07:38 +01:00
Ettore Di Giacinto
941a4fc50e chore(model gallery): add boomer_qwen_72b-i1 (#4939)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-04 10:01:23 +01:00
Ettore Di Giacinto
afe65bd7bf chore(model gallery): add l3.3-geneticlemonade-unleashed-70b-i1 (#4938)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-04 09:56:31 +01:00
Ettore Di Giacinto
6f9762049c chore(model gallery): update qihoo360_tinyr1-32b-preview (#4937)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-04 09:53:23 +01:00
LocalAI [bot]
122970d70d chore: ⬆️ Update ggml-org/llama.cpp to dfd6b2c0be191b3abe2fd9c1b25deff01c6249d8 (#4936)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-03 21:50:31 +00:00
dependabot[bot]
8664b1c7a2 chore(deps): Bump docs/themes/hugo-theme-relearn from 02bba0f to 4a4b60e (#4934)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `02bba0f` to `4a4b60e`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](02bba0f199...4a4b60ef04)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-03 19:56:41 +00:00
Ettore Di Giacinto
c92166f38a chore(model gallery): add steelskull_l3.3-mokume-gane-r1-70b-v1.1 (#4933)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-03 09:22:22 +01:00
LocalAI [bot]
d616058b12 chore: ⬆️ Update ggml-org/llama.cpp to 14dec0c2f29ae56917907dbf2eed6b19438d0a0e (#4932)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-02 22:27:01 +00:00
Ettore Di Giacinto
a7b4001b75 feat: allow to specify a reply prefix (#4931)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-02 16:07:32 +01:00
Ettore Di Giacinto
ff85f01459 chore(model gallery): add thedrummer_fallen-llama-3.3-r1-70b-v1 (#4930)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-02 10:29:28 +01:00
Ettore Di Giacinto
695f81a08b chore(model gallery): add qihoo360_tinyr1-32b-preview (#4929)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-02 10:24:17 +01:00
Ettore Di Giacinto
326be287da chore(model gallery): add ibm-granite_granite-3.2-2b-instruct (#4928)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-02 10:22:35 +01:00
Ettore Di Giacinto
0404d98190 chore(model gallery): add ibm-granite_granite-3.2-8b-instruct (#4927)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-02 10:19:27 +01:00
LocalAI [bot]
0a8ec1eb22 chore: ⬆️ Update ggml-org/llama.cpp to 1782cdfed60952f9ff333fc2ab5245f2be702453 (#4926)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-02 10:02:49 +01:00
Ettore Di Giacinto
d860932dcd fix(chatml): add endoftext stopword
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-01 21:16:10 +01:00
Ettore Di Giacinto
1cb137bd2d fix(deephermes): correct typo
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-03-01 17:07:12 +01:00
Ettore Di Giacinto
3c279e5568 chore(model gallery): add allenai_olmocr-7b-0225-preview (#4924)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-01 09:10:04 +01:00
Ettore Di Giacinto
fb55e3df57 chore(model gallery): add ozone-research_0x-lite (#4923)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-01 09:07:01 +01:00
Ettore Di Giacinto
de46fb6e2e chore(model gallery): add ozone-research_chirp-01 (#4922)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-01 09:05:03 +01:00
Ettore Di Giacinto
d7a0e3c5ea chore(model gallery): add microsoft_phi-4-mini-instruct (#4921)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-01 08:58:01 +01:00
LocalAI [bot]
0533ea817d chore: ⬆️ Update ggml-org/llama.cpp to 06c2b1561d8b882bc018554591f8c35eb04ad30e (#4920)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-28 22:18:14 +00:00
Ettore Di Giacinto
755e4fb5f4 feat(ui): improvements to index and models page (#4918)
- mobile-friendly index
- adjust color palette
- improve search experience

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-28 19:23:32 +01:00
LocalAI [bot]
e4fdde158f chore: ⬆️ Update ggml-org/llama.cpp to b95c8af37ccf169b0a3216b7ed691af0534e5091 (#4916)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-28 00:00:39 +00:00
Ettore Di Giacinto
6d0712fa6d fix(ui): not all models come from the gallery (#4915)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-27 19:12:41 +01:00
Ettore Di Giacinto
bbbb28e3ca fix(models): unify usecases identifications (#4914)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-27 15:51:12 +01:00
Ettore Di Giacinto
3bf2e9d065 fix(ui): not all models have an Icon (#4913)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-27 10:52:19 +01:00
Ettore Di Giacinto
1461fd8777 chore(model gallery): add locutusque_thespis-llama-3.1-8b (#4912)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-27 10:02:44 +01:00
LocalAI [bot]
054860539a chore: ⬆️ Update ggml-org/llama.cpp to a800ae46da2ed7dac236aa6bf2b595da6b6294b5 (#4911)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-26 22:43:49 +00:00
Ettore Di Giacinto
c87870b18e feat(ui): improve chat interface (#4910)
* feat(ui): show more information in the chat view, minor adjustments to model gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): UI improvements

Visual improvements and bugfixes including:
- disable pagination during search
- fix scrolling on new message

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-26 18:27:18 +01:00
Ettore Di Giacinto
5ad2be9c45 feat(ui): small improvements to chat interface (#4907)
- Change chat colors
- Improve layout on small windows

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-26 11:10:40 +01:00
LocalAI [bot]
61a24746a1 chore: ⬆️ Update ggml-org/llama.cpp to d7cfe1ffe0f435d0048a6058d529daf76e072d9c (#4908)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-25 21:58:37 +00:00
Ettore Di Giacinto
d557eb9361 chore(model gallery): add latitudegames_wayfarer-large-70b-llama-3.3 (#4903)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-25 10:21:54 +01:00
Ettore Di Giacinto
a9a1a361a9 chore(model gallery): add perplexity-ai_r1-1776-distill-llama-70b (#4902)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-25 09:59:21 +01:00
Ettore Di Giacinto
12d070af80 chore(model gallery): add sicariussicariistuff_phi-line_14b (#4901)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-25 09:56:44 +01:00
LocalAI [bot]
8d40557bc8 chore: ⬆️ Update ggml-org/llama.cpp to 7a2c913e66353362d7f28d612fd3c9d51a831eda (#4899)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-25 09:51:02 +01:00
dependabot[bot]
5a5f3a899a chore(deps): Bump docs/themes/hugo-theme-relearn from 66bc366 to 02bba0f (#4898)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `66bc366` to `02bba0f`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](66bc366c47...02bba0f199)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-25 09:50:46 +01:00
dependabot[bot]
a2d1f133c8 chore(deps): Bump appleboy/ssh-action from 1.2.0 to 1.2.1 (#4896)
Bumps [appleboy/ssh-action](https://github.com/appleboy/ssh-action) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/appleboy/ssh-action/releases)
- [Changelog](https://github.com/appleboy/ssh-action/blob/master/.goreleaser.yaml)
- [Commits](https://github.com/appleboy/ssh-action/compare/v1.2.0...v1.2.1)

---
updated-dependencies:
- dependency-name: appleboy/ssh-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-24 21:01:39 +00:00
LocalAI [bot]
0ae6420c31 chore: ⬆️ Update ggml-org/llama.cpp to 7ad0779f5de84a68143b2c00ab5dc94a948925d3 (#4890)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-24 11:26:09 +01:00
Ettore Di Giacinto
3a3e05cf18 chore(model gallery): add flux.1dev-abliteratedv2 (#4895)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-24 10:11:32 +01:00
Ettore Di Giacinto
6a20388e25 chore(model gallery): add nohobby_l3.3-prikol-70b-extra (#4894)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-24 09:43:50 +01:00
Ettore Di Giacinto
06c836a937 chore(model gallery): add steelskull_l3.3-san-mai-r1-70b (#4893)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-24 09:41:06 +01:00
Ettore Di Giacinto
049a13fe78 chore(model gallery): add steelskull_l3.3-cu-mai-r1-70b (#4892)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-24 09:39:12 +01:00
Ettore Di Giacinto
30bf6c962f chore(stable-diffusion-ggml): update, adapt upstream changes (#4889)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-23 08:36:41 +01:00
LocalAI [bot]
a72b3a23c3 chore: ⬆️ Update ggml-org/llama.cpp to a28e0d5eb18c18e6a4598286158f427269b1444e (#4887)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-23 08:36:25 +01:00
Ettore Di Giacinto
e9971b168a feat(ui): paginate model gallery (#4886)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-22 21:38:00 +01:00
Ettore Di Giacinto
5b59b5e0c1 chore(model gallery): add steelskull_l3.3-mokume-gane-r1-70b (#4885)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-22 18:58:06 +01:00
Ettore Di Giacinto
8cfd712428 chore(model gallery): add arcee-ai_arcee-maestro-7b-preview (#4884)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-22 11:32:25 +01:00
Ettore Di Giacinto
21f7faa80d chore(model gallery): add ozone-ai_reverb-7b (#4883)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-22 11:28:27 +01:00
Ettore Di Giacinto
a6a0121118 chore(model gallery): add rombo-org_rombo-llm-v3.0-qwen-72b (#4882)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-22 11:19:04 +01:00
LocalAI [bot]
ba66aa33c5 chore: ⬆️ Update ggml-org/llama.cpp to 51f311e057723b7454d0ebe20f545a1a2c4db6b2 (#4881)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-21 21:51:02 +00:00
Ettore Di Giacinto
8fc024a770 chore(model gallery): add pocketdoc_dans-personalityengine-v1.2.0-24b (#4880)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-21 10:00:23 +01:00
Ettore Di Giacinto
52aa9d08aa chore(model gallery): add l3.1-8b-rp-ink (#4879)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-21 09:56:57 +01:00
Ettore Di Giacinto
4c9379c39e chore(model gallery): add smirki_uigen-t1.1-qwen-7b (#4878)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-21 09:54:42 +01:00
Ettore Di Giacinto
0ff2c39364 chore(model gallery): add smirki_uigen-t1.1-qwen-14b (#4877)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-21 09:52:20 +01:00
LocalAI [bot]
1af7e5dc49 chore: ⬆️ Update ggml-org/llama.cpp to c392e5094deaf2d1a7c18683214f007fad3fe42b (#4876)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-20 22:03:52 +00:00
Ettore Di Giacinto
af3bb64e42 fix(coqui): pin transformers (#4875)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-20 16:16:54 +01:00
Ettore Di Giacinto
77281f836e chore(model gallery): add internlm_oreal-7b (#4874)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-20 15:57:21 +01:00
Ettore Di Giacinto
550275811d chore(model gallery): add internlm_oreal-deepseek-r1-distill-qwen-7b (#4873)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-20 15:55:13 +01:00
Ettore Di Giacinto
c27ce6c54d chore(model gallery): add internlm_oreal-32b (#4872)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-20 15:52:28 +01:00
Ettore Di Giacinto
ac4991b069 chore(docs): update sponsor logo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-20 15:31:41 +01:00
Ettore Di Giacinto
25bee71bb8 feat(ui): do also filter tts and image models (#4871)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-20 15:02:18 +01:00
LocalAI [bot]
b993780a3b chore: ⬆️ Update ggml-org/llama.cpp to d04e7163c85a847bc61d58c22f2c503596db7aa8 (#4870)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-20 09:42:57 +01:00
Ettore Di Giacinto
ea0c9f1168 feat(ui): show only text models in the chat interface (#4869)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-19 17:34:30 +01:00
Ettore Di Giacinto
08311f275a chore(model gallery): add sentientagi_dobby-unhinged-llama-3.3-70b (#4868)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-19 10:36:36 +01:00
Ettore Di Giacinto
4de0f2f737 chore(model gallery): add open-r1_openr1-qwen-7b (#4867)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-19 10:04:01 +01:00
Ettore Di Giacinto
42ae807c41 chore(model gallery): add pygmalionai_pygmalion-3-12b (#4866)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-19 10:02:35 +01:00
LocalAI [bot]
94593ba4c3 chore: ⬆️ Update ggml-org/llama.cpp to 63e489c025d61c7ca5ec06c5d10f36e2b76aaa1d (#4865)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-19 09:19:28 +01:00
Brandon Beiler
6a6e1a0ea9 feat(vllm): Additional vLLM config options (Disable logging, dtype, and Per-Prompt media limits) (#4855)
* Adding the following vLLM config options: disable_log_status, dtype, limit_mm_per_prompt

Signed-off-by: TheDropZone <brandonbeiler@gmail.com>

* using " marks in the config.yaml file

Signed-off-by: TheDropZone <brandonbeiler@gmail.com>

* adding in missing colon

Signed-off-by: TheDropZone <brandonbeiler@gmail.com>

---------

Signed-off-by: TheDropZone <brandonbeiler@gmail.com>
2025-02-18 19:27:58 +01:00
Ettore Di Giacinto
5b19af99ff feat(ui): detect model usage and display link (#4864)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-18 19:27:07 +01:00
Ettore Di Giacinto
28fb8e607a chore(model gallery): add nbeerbower_dumpling-qwen2.5-72b (#4862)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-18 12:44:59 +01:00
Ettore Di Giacinto
bb85b6ef00 feat: improve ui models list in the index (#4863)
* feat(ui): improve index

- Redirect to the chat view when clicking on a model

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Display chat icon next to the model

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-18 12:44:44 +01:00
Ettore Di Giacinto
b9b5a635ca chore(model gallery): add nbeerbower_dumpling-qwen2.5-32b-v2 (#4861)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-18 11:53:23 +01:00
Ettore Di Giacinto
131ea5b627 chore(model gallery): add nbeerbower_dumpling-qwen2.5-14b (#4860)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-18 11:51:29 +01:00
Ettore Di Giacinto
fac70e9642 chore(model gallery): add allenai_llama-3.1-tulu-3.1-8b (#4859)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-18 11:49:26 +01:00
Ettore Di Giacinto
7e76ea40fb chore(model gallery): add kubeguru-llama3.2-3b-v0.1 (#4858)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-18 11:47:00 +01:00
LocalAI [bot]
de09ae42ef chore: ⬆️ Update ggml-org/llama.cpp to 73e2ed3ce3492d3ed70193dd09ae8aa44779651d (#4854)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-18 09:11:07 +01:00
Ettore Di Giacinto
6424f0666d chore(deps): Bump edgevpn to v0.30.1 (#4840)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-17 16:51:22 +01:00
Ettore Di Giacinto
f3ae94ca70 chore: update Image generation docs and examples (#4841)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-17 16:51:06 +01:00
LocalAI [bot]
09c9f67a02 chore: ⬆️ Update ggml-org/llama.cpp to 2eea03d86a2d132c8245468c26290ce07a27a8e8 (#4839)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-17 10:55:30 +01:00
Ettore Di Giacinto
c264ca542d fix(ci): update repository for llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-02-17 09:33:34 +01:00
Bas Hulsken
bbf30d416d fix: change initialization order of llama-cpp-avx512 to go before avx2 variant (#4837)
Changed the initialization order of the avx512 version of llama.cpp; it is now tried before the avx2 variant.

Signed-off-by: Bas Hulsken <bhulsken@hotmail.com>
2025-02-17 09:32:21 +01:00
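The ordering change in bbf30d416d can be illustrated with a minimal sketch. It is not LocalAI's actual code: the function `pick_variant`, the `VARIANTS` table, the CPU flag names, and the variant names other than `llama-cpp-avx512` are assumptions made for illustration. The point is only that when candidates are tried in list order, the more capable variant must come first or it is never selected on CPUs that support both.

```python
# Hypothetical candidate table: (variant name, required CPU flag or None).
# Order matters: the first variant whose requirement is met wins,
# so avx512 is listed before avx2.
VARIANTS = [
    ("llama-cpp-avx512", "avx512f"),
    ("llama-cpp-avx2", "avx2"),
    ("llama-cpp-fallback", None),  # illustrative catch-all, no requirement
]


def pick_variant(cpu_flags, variants=VARIANTS):
    """Return the first variant whose required flag is in cpu_flags."""
    for name, required_flag in variants:
        if required_flag is None or required_flag in cpu_flags:
            return name
    raise RuntimeError("no usable variant found")


# A CPU that supports both flags gets the avx512 build only because
# that entry is checked first.
chosen = pick_variant({"avx2", "avx512f"})
```

With the two entries swapped, a CPU advertising both flags would stop at the avx2 variant, which is the situation the commit describes fixing.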
Ettore Di Giacinto
27617a1b06 chore(model gallery): add ozone-ai_0x-lite (#4835)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-16 09:23:26 +01:00
Ettore Di Giacinto
e84081769e chore(ci): cleanup before pulling images again
2025-02-16 09:20:22 +01:00
LocalAI [bot]
20119fc580 docs: ⬆️ update docs version mudler/LocalAI (#4834)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-15 22:45:11 +00:00
Ettore Di Giacinto
09941c0bfb chore(docs): update license year
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-15 18:17:15 +01:00
Ettore Di Giacinto
cabe0f4993 chore(model gallery): add davidbrowne17_llamathink-8b-instruct (#4833)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-15 17:31:46 +01:00
Ettore Di Giacinto
1977c7f190 chore(model gallery): add pygmalionai_eleusis-12b (#4832)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-15 17:21:30 +01:00
Ettore Di Giacinto
061e7c4eae chore(model gallery): add rombo-org_rombo-llm-v3.0-qwen-32b (#4830)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-15 10:58:27 +01:00
LocalAI [bot]
5313e660f6 chore: ⬆️ Update ggerganov/llama.cpp to 300907b2110cc17b4337334dc397e05de2d8f5e0 (#4829)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-14 21:51:49 +00:00
Ettore Di Giacinto
9e32fda304 fix(llama.cpp): improve context shift handling (#4820)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-14 14:55:03 +01:00
Ettore Di Giacinto
83202cae54 chore(model gallery): add nousresearch_deephermes-3-llama-3-8b-preview (#4828)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-14 12:25:00 +01:00
Ettore Di Giacinto
d96addfa9d chore(model gallery): add open-thoughts_openthinker-32b (#4827)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-14 12:03:46 +01:00
Ettore Di Giacinto
a715fe588d chore(model gallery): add sicariussicariistuff_phi-lthy4 (#4826)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-14 11:57:14 +01:00
LocalAI [bot]
2ac4a86bb4 chore: ⬆️ Update ggerganov/llama.cpp to 8a8c4ceb6050bd9392609114ca56ae6d26f5b8f5 (#4825)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-13 21:49:57 +00:00
Ettore Di Giacinto
8670d480a6 chore(model gallery): add nvidia_aceinstruct-72b (#4822)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-13 09:58:34 +01:00
Ettore Di Giacinto
af0b4ff237 chore(ci): update labels
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-02-13 09:58:19 +01:00
Ettore Di Giacinto
e694764065 chore(model gallery): add nvidia_aceinstruct-7b (#4821)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-13 09:44:53 +01:00
Ettore Di Giacinto
f3c27e0381 chore(model gallery): add nvidia_aceinstruct-1.5b (#4819)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-13 09:33:40 +01:00
LocalAI [bot]
bf44319d0d chore: ⬆️ Update ggerganov/llama.cpp to 0fb77f821f6e70ad8b8247a97d1022f0fef78991 (#4814)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-12 22:41:53 +00:00
Ettore Di Giacinto
5b133a640b chore(model gallery): add theskullery_l3.3-exp-unnamed-model-70b-v0.5 (#4813)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-12 11:05:51 +01:00
Ettore Di Giacinto
0030a3fe75 chore(model gallery): add simplescaling_s1.1-32b (#4812)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-12 11:03:05 +01:00
Ettore Di Giacinto
0a748b009e chore(ci): avoid cache hits until the ci gRPC job is fixed
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-12 09:11:40 +01:00
LocalAI [bot]
257e951def chore: ⬆️ Update ggerganov/llama.cpp to 90e4dba461b07e635fd1daf3b491c978c7dd0013 (#4810)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-12 00:13:28 +01:00
LocalAI [bot]
fbd82a2dd0 feat(swagger): update swagger (#4809)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-11 21:54:40 +00:00
Ettore Di Giacinto
5db321dad2 chore(ci): do not always regenerate the cache
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-11 16:36:00 +01:00
Ettore Di Giacinto
f5638a6354 feat(diffusers): allow to override image gen options (#4807)
Use the `options` field in the model config to override the pipeline
kwargs when needed.

This makes it possible to specify, in the model YAML config:

```yaml
options:
- foo:bar
```

Each option is then passed directly as a keyword argument when calling
the diffusers pipeline, e.g.:

```python
pipe(
  foo="bar",
)
```
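
A minimal sketch of how such `key:value` option strings could be turned into keyword arguments. `parse_options` and `fake_pipe` are hypothetical names used only for illustration, not LocalAI's actual functions, and values are kept as plain strings here (the real backend may convert types):

```python
def parse_options(options):
    """Turn ["key:value", ...] into a kwargs dict (hypothetical helper)."""
    kwargs = {}
    for opt in options:
        if ":" not in opt:
            continue  # skip entries that have no key:value separator
        key, value = opt.split(":", 1)  # split only on the first colon
        kwargs[key.strip()] = value.strip()
    return kwargs


def fake_pipe(**kwargs):
    """Stand-in for a diffusers pipeline call; it only echoes its kwargs."""
    return kwargs


# Options as they would appear in the model YAML config.
overrides = parse_options(["foo:bar"])
result = fake_pipe(**overrides)
```

Splitting on the first colon only keeps values that themselves contain a colon intact.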

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-11 10:16:32 +01:00
Ettore Di Giacinto
5f64cc6328 Revert "chore(deps): Bump docs/themes/lotusdocs from f5785a2 to 975da91" (#4808)
Revert "chore(deps): Bump docs/themes/lotusdocs from `f5785a2` to `975da91` (…"

This reverts commit e57b750ca3.
2025-02-11 10:05:57 +01:00
Ettore Di Giacinto
28b10e8804 chore(swagger): update (#4805)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-11 09:51:01 +01:00
Ettore Di Giacinto
3277f5095d chore(model gallery): add agentica-org_deepscaler-1.5b-preview (#4804)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-11 09:47:19 +01:00
Ettore Di Giacinto
fe3ced2919 chore(ci): try again to bump parallelism in grpc jobs
As we moved these out to self-hosted

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-11 09:31:00 +01:00
LocalAI [bot]
45e37a07bb chore: ⬆️ Update ggerganov/llama.cpp to 19b392d58dc08c366d0b29bd3b9c6991fa4e1662 (#4803)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-11 09:17:02 +01:00
dependabot[bot]
e57b750ca3 chore(deps): Bump docs/themes/lotusdocs from f5785a2 to 975da91 (#4801)
Bumps [docs/themes/lotusdocs](https://github.com/colinwilson/lotusdocs) from `f5785a2` to `975da91`.
- [Release notes](https://github.com/colinwilson/lotusdocs/releases)
- [Commits](f5785a2399...975da91e83)

---
updated-dependencies:
- dependency-name: docs/themes/lotusdocs
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-10 22:27:14 +00:00
Ettore Di Giacinto
49df492268 chore(ci): run grpc build on self-hosted
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-10 19:44:50 +01:00
Ettore Di Giacinto
516cd660f1 chore(grpcio): reduce parallelism (#4799)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-10 18:56:13 +01:00
Ettore Di Giacinto
8fd3ace9a1 chore(grpcio): bump to 1.70 (#4798)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-10 18:38:53 +01:00
Ettore Di Giacinto
099469cb05 chore(tests): decrease parallelism for gRPC builds (#4797)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-10 12:59:59 +01:00
Ettore Di Giacinto
6be8c0c618 chore(model gallery): add localai-functioncall-qwen2.5-7b-v0.5 (#4796)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-10 12:07:35 +01:00
Dave
3cddf24747 feat: Centralized Request Processing middleware (#3847)
* squash past, centralize request middleware PR

Signed-off-by: Dave Lee <dave@gray101.com>

* migrate bruno request files to examples repo

Signed-off-by: Dave Lee <dave@gray101.com>

* fix

Signed-off-by: Dave Lee <dave@gray101.com>

* Update tests/e2e-aio/e2e_test.go

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Dave Lee <dave@gray101.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-02-10 12:06:16 +01:00
Ettore Di Giacinto
c330360785 chore(model gallery): add ilsp_llama-krikri-8b-instruct (#4795)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-10 09:54:54 +01:00
LocalAI [bot]
8cd51570e5 chore: ⬆️ Update ggerganov/llama.cpp to 19d3c8293b1f61acbe2dab1d49a17950fd788a4a (#4793)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-09 22:12:01 +00:00
Ettore Di Giacinto
0e7aa5cd15 chore(model gallery): add subtleone_qwen2.5-32b-erudite-writer (#4792)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-09 10:59:46 +01:00
Ettore Di Giacinto
e06a5f49de chore(model gallery): add huihui-ai_deepseek-r1-distill-llama-70b-abliterated (#4790)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-09 10:53:48 +01:00
Dave
fb2f847507 chore: migrate bruno request files to examples repo (#4788)
migrate bruno request files to examples repo

Signed-off-by: Dave Lee <dave@gray101.com>
2025-02-09 10:52:28 +01:00
LocalAI [bot]
e01acc88c9 chore: ⬆️ Update ggerganov/llama.cpp to e6e658319952f7ad269dc11275b9edddc721fc6d (#4787)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-08 21:57:40 +00:00
LocalAI [bot]
7a5912908a chore: ⬆️ Update ggerganov/llama.cpp to d2fe216fb2fb7ca8627618c9ea3a2e7886325780 (#4780)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-08 09:44:34 +01:00
Ettore Di Giacinto
4b1b942a7f chore(model gallery): add sicariussicariistuff_redemption_wind_24b (#4781)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-08 09:04:18 +01:00
Ettore Di Giacinto
230fe0098f chore(model gallery): add cognitivecomputations_dolphin3.0-mistral-24b (#4779)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-07 13:33:24 +01:00
Ettore Di Giacinto
cc163429dc chore(model gallery): add cognitivecomputations_dolphin3.0-r1-mistral-24b (#4778)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-07 13:31:49 +01:00
Ettore Di Giacinto
f670e0a91c chore(model gallery): add nohobby_l3.3-prikol-70b-v0.5 (#4777)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-07 13:29:53 +01:00
LocalAI [bot]
731674eee7 chore: ⬆️ Update ggerganov/llama.cpp to 8a59053f63fffc24e730cd3ea067760abfe4a919 (#4776)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-06 22:02:00 +00:00
Ettore Di Giacinto
cc1f6f913f fix(llama.cpp): disable mirostat as default (#2911)
Although Mirostat can improve the quality of the output, its performance
cost has proven noticeable enough to confuse users about the speed of
LocalAI (see also https://github.com/mudler/LocalAI/issues/2780).

This changeset disables Mirostat by default (it can still be enabled
manually).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
2025-02-06 19:39:59 +01:00
Ettore Di Giacinto
7f90ff7aec chore(llama-ggml): drop deprecated backend (#4775)
The GGML format is now dead. Since the next version of LocalAI already
brings many breaking compatibility changes, take the occasion to also
drop ggml (pre-gguf) support.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-06 18:36:23 +01:00
Ettore Di Giacinto
8d45670e41 fix(openai): consistently return stop reason (#4771)
We were not returning a stop reason when no tool was actually called
(even if specified).

Fixes: https://github.com/mudler/LocalAI/issues/4716

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-06 12:41:08 +01:00
Ettore Di Giacinto
e4b8ddb6a1 chore(model gallery): add black-ink-guild_pernicious_prophecy_70b (#4774)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-06 12:03:59 +01:00
Ettore Di Giacinto
a801561f81 chore(model gallery): add tiger-lab_qwen2.5-32b-instruct-cft (#4773)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-06 12:01:56 +01:00
Ettore Di Giacinto
16ced07102 chore(model gallery): add arliai_llama-3.3-70b-arliai-rpmax-v1.4 (#4772)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-06 11:59:14 +01:00
LocalAI [bot]
d35595372d chore: ⬆️ Update ggerganov/llama.cpp to d774ab3acc4fee41fbed6dbfc192b57d5f79f34b (#4770)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-06 09:02:51 +01:00
LocalAI [bot]
81be192279 chore: ⬆️ Update leejet/stable-diffusion.cpp to d46ed5e184b97c2018dc2e8105925bdb8775e02c (#4769)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-05 23:49:15 +00:00
Ettore Di Giacinto
28a1310890 chore(docs): enhance visibility
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 19:50:32 +01:00
Ettore Di Giacinto
2a702e9ca4 chore(docs): small updates
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 19:49:11 +01:00
Ettore Di Giacinto
3ecaea1b6e chore(docs): update sponsors in the website
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 19:41:55 +01:00
Ettore Di Giacinto
7daf5ac3e3 fix(gallery): do not return overrides and additional config (#4768)
When hitting /models/available we are interested in the model
description, name, and a small amount of metadata. Configuration and
overrides are internals that are required only for installation.

This also fixes a current bug where hitting /models/available fails if
one of the gallery items has overrides with parameters defined.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 18:37:09 +01:00
Ettore Di Giacinto
7bc80c17f8 chore(model gallery): add LocalAI-functioncall-llama3.2-3b-v0.5 (#4766)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 10:19:31 +01:00
Ettore Di Giacinto
1996ceb293 chore(model gallery): add krutrim-ai-labs_krutrim-2-instruct (#4765)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 10:17:05 +01:00
Ettore Di Giacinto
0bc3dc43da chore(model gallery): add rubenroy_gilgamesh-72b (#4764)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 10:13:21 +01:00
Ettore Di Giacinto
3324c4e6cb chore(model gallery): add agi-0_art-skynet-3b (#4763)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-05 10:09:33 +01:00
LocalAI [bot]
7329db4e78 chore: ⬆️ Update ggerganov/llama.cpp to 3ec9fd4b77b6aca03a3c2bf678eae3f9517d6904 (#4762)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-04 21:48:49 +00:00
Ettore Di Giacinto
464686aee6 chore(model gallery): add suayptalha_maestro-10b (#4760)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-04 09:51:54 +01:00
Ettore Di Giacinto
bfa3d4ccff chore(model gallery): add nohobby_l3.3-prikol-70b-v0.4 (#4759)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-04 09:50:18 +01:00
Ettore Di Giacinto
6a91288c8c chore(model gallery): add fblgit_miniclaus-qw1.5b-unamgs-grpo (#4758)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-04 09:45:52 +01:00
dependabot[bot]
96cb407ee0 chore(deps): Bump docs/themes/hugo-theme-relearn from 5bcb9fe to 66bc366 (#4750)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `5bcb9fe` to `66bc366`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](5bcb9fe5e6...66bc366c47)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-04 08:57:19 +01:00
dependabot[bot]
5a19094d3a chore(deps): Bump sentence-transformers from 3.4.0 to 3.4.1 in /backend/python/transformers (#4748)
chore(deps): Bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/UKPLab/sentence-transformers) from 3.4.0 to 3.4.1.
- [Release notes](https://github.com/UKPLab/sentence-transformers/releases)
- [Commits](https://github.com/UKPLab/sentence-transformers/compare/v3.4.0...v3.4.1)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-04 08:56:51 +01:00
LocalAI [bot]
e3b943ffcb chore: ⬆️ Update ggerganov/llama.cpp to 5598f475be3e31430fbe17ebb85654ec90dc201e (#4757)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-04 08:56:11 +01:00
dependabot[bot]
df30d6a482 chore(deps): Bump GrantBirki/git-diff-action from 2.7.0 to 2.8.0 (#4746)
Bumps [GrantBirki/git-diff-action](https://github.com/grantbirki/git-diff-action) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/grantbirki/git-diff-action/releases)
- [Commits](https://github.com/grantbirki/git-diff-action/compare/v2.7.0...v2.8.0)

---
updated-dependencies:
- dependency-name: GrantBirki/git-diff-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-03 22:21:40 +00:00
Ettore Di Giacinto
c3c27b7e3d chore(model gallery): small fixups to llama3.2-fcall template
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-03 17:58:57 +01:00
Ettore Di Giacinto
431716d4d6 fix(gallery): remove box token to llama3.2-fcall
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-02-03 16:10:33 +01:00
Ettore Di Giacinto
d290fd159f chore(model gallery): add LocalAI-functioncall-llama3.2-1b-v0.4 (#4740)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-03 15:55:49 +01:00
Ettore Di Giacinto
051faaf771 chore(model gallery): add uncensoredai_uncensoredlm-deepseek-r1-distill-qwen-14b (#4739)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-03 10:46:47 +01:00
Ettore Di Giacinto
41a2dfb0d9 chore(model gallery): add thedrummer_gemmasutra-pro-27b-v1.1 (#4738)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-03 10:37:24 +01:00
Ettore Di Giacinto
ed0094c3d0 chore(model gallery): add steelskull_l3.3-damascus-r1 (#4737)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-03 10:30:07 +01:00
LocalAI [bot]
52fadeded1 feat(swagger): update swagger (#4735)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-03 10:16:42 +01:00
LocalAI [bot]
a37fa8d9c4 chore: ⬆️ Update ggerganov/llama.cpp to 90f9b88afb6447d3929843a2aa98c0f11074762d (#4736)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-02 22:18:30 +00:00
Shraddha
03974a4dd4 feat: tokenization with llama.cpp (#4724)
feat: tokenization

Signed-off-by: shraddhazpy <shraddha@shraddhafive.in>
2025-02-02 17:39:43 +00:00
Ettore Di Giacinto
1d6afbd65d feat(llama.cpp): Add support to grammar triggers (#4733)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-02 13:25:03 +01:00
LocalAI [bot]
d79f02ea09 chore: ⬆️ Update ggerganov/llama.cpp to 53debe6f3c9cca87e9520a83ee8c14d88977afa4 (#4732)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-02-01 21:45:26 +00:00
Ettore Di Giacinto
ba2f426e3e chore(model gallery): add fuseo1-deekseekr1-qwq-skyt1-32b-preview (#4731)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-01 10:12:15 +01:00
LocalAI [bot]
732042e5c6 chore: ⬆️ Update ggerganov/llama.cpp to aa6fb1321333fae8853d0cdc26bcb5d438e650a1 (#4728)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-31 22:31:00 +00:00
Ettore Di Giacinto
f1763aabf2 chore(model gallery): add taid-llm-1.5b (#4727)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-31 14:53:39 +01:00
Ettore Di Giacinto
e0d90b173b chore(model gallery): add tinyswallow-1.5b-instruct (#4726)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-31 14:49:02 +01:00
Ettore Di Giacinto
ff07612bfa chore(model gallery): add mistral-small-24b-instruct-2501 (#4725)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-31 14:45:42 +01:00
LocalAI [bot]
7badaf78a0 chore: ⬆️ Update ggerganov/llama.cpp to 8b576b6c55bc4e6be898b47522f0ef402b93ef62 (#4722)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-31 11:31:46 +00:00
Ettore Di Giacinto
af41436f1b fix(tests): pin to branch for config used in tests (#4721)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-31 09:57:58 +01:00
LocalAI [bot]
cd5489ce47 chore(model-gallery): ⬆️ update checksum (#4723)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-31 08:51:32 +01:00
Ettore Di Giacinto
60ec2cf751 chore(model gallery): add openthinker-7b (#4720)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-30 16:44:44 +01:00
Ettore Di Giacinto
244f4b564f chore(model gallery): add selene-1-mini-llama-3.1-8b (#4719)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-30 16:42:48 +01:00
Ettore Di Giacinto
f1d6d65417 chore(model gallery): add virtuoso-lite (#4718)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-30 16:38:35 +01:00
Ettore Di Giacinto
72e52c4f6a chore: drop embedded models (#4715)
Since the remote gallery was introduced, embedded models have been
completely superseded by it. To keep the code clean and remove redundant
parts, let's simplify the usage.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-30 00:03:01 +01:00
LocalAI [bot]
1656e1a88e chore: ⬆️ Update ggerganov/llama.cpp to eb7cf15a808d4d7a71eef89cc6a9b96fe82989dc (#4717)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-29 21:45:38 +00:00
Ettore Di Giacinto
7f62b418a4 chore(docs): add documentation for l4t images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-29 15:16:07 +01:00
Maximilian Kenfenheuer
1f4e66d638 chore(model gallery): add specific message templates for llama3.2 based models (#4707)
* chore(model gallery): add specific message templates for llama3.2 based models

Signed-off-by: Maximilian Kenfenheuer <maximilian.kenfenheuer@ksol.it>

* fix: yaml lint in llama3.2-quantized.yaml

Signed-off-by: Maximilian Kenfenheuer <maximilian.kenfenheuer@ksol.it>

* fix: yaml lint in llama3.2-quantized.yaml

Signed-off-by: Maximilian Kenfenheuer <maximilian.kenfenheuer@ksol.it>

---------

Signed-off-by: Maximilian Kenfenheuer <maximilian.kenfenheuer@ksol.it>
2025-01-29 10:19:48 +01:00
Maximilian Kenfenheuer
a37b2c765c docs: update advanced-usage.md to reflect changes in #4700 (#4709)
Signed-off-by: Maximilian Kenfenheuer <maximilian.kenfenheuer@ksol.it>
2025-01-28 22:58:35 +01:00
Maximilian Kenfenheuer
b4b67e00bd refactor: function argument parsing using named regex (#4708)
Signed-off-by: Maximilian Kenfenheuer <maximilian.kenfenheuer@ksol.it>
2025-01-28 22:58:02 +01:00
LocalAI [bot]
91e1ff5a95 chore: ⬆️ Update ggerganov/llama.cpp to cae9fb4361138b937464524eed907328731b81f6 (#4711)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-28 21:45:14 +00:00
dependabot[bot]
d9204ea3b5 chore(deps): Bump dependabot/fetch-metadata from 2.2.0 to 2.3.0 (#4701)
Bumps [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata) from 2.2.0 to 2.3.0.
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](https://github.com/dependabot/fetch-metadata/compare/v2.2.0...v2.3.0)

---
updated-dependencies:
- dependency-name: dependabot/fetch-metadata
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-28 11:50:09 +01:00
LocalAI [bot]
3d0fbcb4f7 chore: ⬆️ Update ggerganov/llama.cpp to a4417ddda98fd0558fb4d802253e68a933704b59 (#4705)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-28 09:13:43 +01:00
dependabot[bot]
03f3df9a82 chore(deps): Bump docs/themes/hugo-theme-relearn from 8dad5ee to 5bcb9fe (#4704)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `8dad5ee` to `5bcb9fe`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](8dad5ee419...5bcb9fe5e6)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-28 09:13:00 +01:00
dependabot[bot]
fff35d5528 chore(deps): Bump sentence-transformers from 3.3.1 to 3.4.0 in /backend/python/transformers (#4702)
chore(deps): Bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/UKPLab/sentence-transformers) from 3.3.1 to 3.4.0.
- [Release notes](https://github.com/UKPLab/sentence-transformers/releases)
- [Commits](https://github.com/UKPLab/sentence-transformers/compare/v3.3.1...v3.4.0)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-27 21:09:50 +00:00
Maximilian Kenfenheuer
539e94db73 feat: function argument parsing using named regex (#4700)
Signed-off-by: Maximilian Kenfenheuer <maximilian.kenfenheuer@ksol.it>
2025-01-27 15:53:05 +00:00
Ettore Di Giacinto
0f4f62cf3c chore(model gallery): add fuseo1-deepseekr1-qwq-32b-preview (#4699)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-27 09:51:06 +01:00
Ettore Di Giacinto
e7cffd7afa chore(model gallery): add fuseo1-deepseekr1-qwen2.5-instruct-32b-preview (#4698)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-27 09:31:47 +01:00
Ettore Di Giacinto
26d790a2b6 chore(model gallery): add fuseo1-deepseekr1-qwen2.5-coder-32b-preview-v0.1 (#4697)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-27 09:28:29 +01:00
Ettore Di Giacinto
5cf838c08d chore(model gallery): add confucius-o1-14b (#4696)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-27 09:26:00 +01:00
LocalAI [bot]
4db8f5cbce chore: ⬆️ Update ggerganov/llama.cpp to 178a7eb952d211b8d4232d5e50ae1b64519172a9 (#4694)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-26 21:44:54 +00:00
Ettore Di Giacinto
3b6b37a81b chore(model gallery): add deepseek-r1-qwen-2.5-32b-ablated (#4693)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-26 10:06:06 +01:00
Ettore Di Giacinto
8f5aa2d9de chore(model gallery): add dumpling-qwen2.5-32b (#4692)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-26 10:03:46 +01:00
Ettore Di Giacinto
a6bc8aa7c7 chore(model gallery): add l3.3-nevoria-r1-70b (#4691)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-26 10:01:37 +01:00
LocalAI [bot]
4ab107bc1a chore: ⬆️ Update ggerganov/llama.cpp to 26771a1491f3a4c3d5b99c4c267b81aca9a7dfa0 (#4690)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-25 21:44:14 +00:00
Ettore Di Giacinto
4c3710a531 chore(model gallery): add chuluun-qwen2.5-72b-v0.08 (#4689)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-25 11:07:31 +01:00
Ettore Di Giacinto
901b06284a chore(model gallery): add art-v0-3b (#4688)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-25 11:06:05 +01:00
Ettore Di Giacinto
8eef5a2c5e chore(model gallery): add lamarck-14b-v0.7 (#4687)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-25 11:04:12 +01:00
Gianluca Boiano
e9cace137b chore(model gallery): update deepseek-r1 prompt template (#4686)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-25 09:04:38 +01:00
LocalAI [bot]
9409c99738 chore: ⬆️ Update ggerganov/llama.cpp to c5d9effb49649db80a52caf5c0626de6f342f526 (#4685)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-24 21:45:54 +00:00
Ettore Di Giacinto
4d44ebc2f2 chore(deps): bump grpcio to 1.70.0 (#4682)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-24 10:18:22 +01:00
Gianluca Boiano
9a1182fa01 chore(model gallery): add flux.1, stablediffusion and whisper icons (#4680)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-24 08:29:02 +01:00
Gianluca Boiano
66e9ef3f33 chore(model gallery): add DeepSeek R1 14b, 32b and 70b (#4679)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-24 08:28:44 +01:00
Ettore Di Giacinto
8282414583 chore(downloader): support hf.co and hf:// URIs (#4677)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-24 08:27:22 +01:00
Gianluca Boiano
d1d7ce83d4 chore(model gallery): add MiniCPM-o-2.6-7.6b (#4676)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-24 08:27:02 +01:00
Ettore Di Giacinto
5177837ab0 chore: detect and enable avx512 builds (#4675)
chore(avx512): add support

Fixes https://github.com/mudler/LocalAI/issues/4662

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-24 08:26:44 +01:00
Ettore Di Giacinto
f9e368b7c4 chore(refactor): group cpu cap detection (#4674)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-23 16:35:44 +01:00
Ettore Di Giacinto
eef80b9880 chore(ci): cleanup tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-23 10:02:57 +01:00
Ettore Di Giacinto
073eaec729 chore(openvoice): drop backend (#4673)
The project (MeloTTS) has been quiet for a long time; newer backends are
much more performant and offer better quality overall.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-23 10:00:36 +01:00
Ettore Di Giacinto
318225f631 chore(parler-tts): drop backend (#4672)
At this point we support more extensive backends that are SOTA and also
support voice cloning and many other features. This backend is
superseded and also poses a significant maintenance burden:
https://github.com/mudler/LocalAI/issues/3941 is still open because its
deps pin old versions of grpc.

Closes https://github.com/mudler/LocalAI/issues/3941

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-23 09:46:16 +01:00
Ettore Di Giacinto
89429a439b feat(transformers): add support to Mamba (#4669)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-23 09:30:47 +01:00
LocalAI [bot]
200fe358f0 chore: ⬆️ Update ggerganov/llama.cpp to 6152129d05870cb38162c422c6ba80434e021e9f (#4668)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-23 08:06:43 +01:00
Ettore Di Giacinto
e426ab7c23 feat(faster-whisper): add backend (#4666)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-23 08:06:18 +01:00
LocalAI [bot]
715071b68d feat(swagger): update swagger (#4667)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-22 21:51:38 +01:00
Peter Cover
a05737c7e4 chore: fix some function names in comment (#4665)
Signed-off-by: petercover <raowanxiang@outlook.com>
2025-01-22 19:35:53 +01:00
Richard Palethorpe
e8eb0b2c50 fix(stores): Stores fixes and testing (#4663)
* fix(stores): Actually check a vector is a unit vector/normalized

Instead of just summing the components to see if they equal 1.0, take
the actual magnitude/p-norm of the vector and check that is
approximately 1.0.

Note that this shouldn't change the order of results except in edge
cases if I am too lax with the precision of the equality
comparison. However it should improve performance for normalized
vectors which were being misclassified.
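
A minimal sketch of the difference between the two checks, assuming the Euclidean (2-)norm and an illustrative tolerance. The function names and the `tol` value are hypothetical and are not taken from the LocalAI stores code:

```python
import math


def sums_to_one(vec, tol=1e-6):
    """Old-style check: only tests that the components add up to 1.0."""
    return math.isclose(sum(vec), 1.0, abs_tol=tol)


def is_unit_vector(vec, tol=1e-6):
    """Check that the Euclidean norm of vec is approximately 1.0."""
    norm = math.sqrt(sum(x * x for x in vec))
    return math.isclose(norm, 1.0, abs_tol=tol)


a = [0.6, 0.8]  # Euclidean norm is 1, but the components sum to 1.4
b = [0.5, 0.5]  # components sum to 1, but the norm is about 0.707
```

Here `a` is normalized yet fails the component-sum test, while `b` passes that test without being normalized, which is the kind of misclassification described above.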

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(stores): Add tests for known results and triangle inequality

This adds some more tests to check the cosine similarity function has
some expected mathematical properties.
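
A sketch of what such property checks can look like. It is an assumption that the triangle inequality is checked on the angular distance `acos(similarity)`, since cosine similarity itself is not a metric; the actual tests in the repository may be structured differently, and all names here are hypothetical:

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def angular_distance(a, b):
    """Angle in radians between a and b; clamps against rounding error."""
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(c)


def check_triangle_inequality(trials=1000, dim=8, seed=0, tol=1e-9):
    """Check d(a, c) <= d(a, b) + d(b, c) on random vector triples."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = ([rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3))
        lhs = angular_distance(a, c)
        rhs = angular_distance(a, b) + angular_distance(b, c)
        if lhs > rhs + tol:
            return False
    return True
```

The known results are the usual ones: identical directions give a similarity of 1, orthogonal vectors give 0, and opposite directions give -1.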

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-01-22 19:35:05 +01:00
Ettore Di Giacinto
e15d29aba2 chore(stablediffusion-ncn): drop in favor of ggml implementation (#4652)
* chore(stablediffusion-ncn): drop in favor of ggml implementation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ci): drop stablediffusion build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): add

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): try to fixup current tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Tests improvements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): use quality to specify step

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): switch to sd-1.5

also increase prep time for downloading models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-22 19:34:16 +01:00
Ettore Di Giacinto
10675ac28e Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-22 18:07:30 +01:00
Ettore Di Giacinto
0ec25b8b07 chore(model gallery): add sd-1.5-ggml and sd-3.5-medium-ggml (#4664)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-22 16:37:20 +01:00
LocalAI [bot]
e81ceff681 chore: ⬆️ Update ggerganov/llama.cpp to 6171c9d25820ccf676b243c172868819d882848f (#4661)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-21 22:04:29 +00:00
Ettore Di Giacinto
6831719e1e chore(model gallery): add deepseek-r1-distill-qwen-7b (#4660)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-21 15:09:36 +01:00
Gianluca Boiano
b264a91b3f chore(model gallery): add Deepseek-R1-Distill models (#4646)
* chore(model gallery): add Deepseek-R1-Distill-Llama-8b

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add Deepseek-R1-Distill-Qwen-1.5b

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

---------

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-21 10:37:05 +01:00
LocalAI [bot]
1a08948e63 chore: ⬆️ Update ggerganov/llama.cpp to aea8ddd5165d525a449e2fc3839db77a71f4a318 (#4657)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-21 08:37:13 +01:00
dependabot[bot]
14a1e02f44 chore(deps): Bump docs/themes/hugo-theme-relearn from 80e448e to 8dad5ee (#4656)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `80e448e` to `8dad5ee`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](80e448e5bd...8dad5ee419)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-20 23:33:40 +00:00
Ettore Di Giacinto
2f09aa1b85 chore(model gallery): add sd-3.5-large-ggml (#4647)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-20 19:04:23 +01:00
Gianluca Boiano
a396040886 chore(model gallery): remove dead icons and update LLAVA and DeepSeek ones (#4645)
* chore(model gallery): update icons and add LLAVA ones

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): fix all complaints related to yamllint

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

---------

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-20 16:13:19 +01:00
Ettore Di Giacinto
aeb1dca52e chore(model gallery): add l3.3-prikol-70b-v0.2 (#4643)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-20 11:03:35 +01:00
Ettore Di Giacinto
83a8d90c52 chore(model gallery): add l3.3-70b-magnum-v4-se (#4642)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-20 10:50:29 +01:00
Ettore Di Giacinto
adebd557ce chore(model gallery): add wayfarer-12b (#4641)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-20 10:45:10 +01:00
Gianluca Boiano
0c0e015b38 chore(model gallery): update icons and add missing ones (#4639)
* chore(model gallery): uniform github URLs for icons

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add icons to phi models

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add icons to QwenLM models

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): update icon for Arcee org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): update icon for Meta org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): update icon url for OpenCoder org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add icon for RWKV org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add icon for IBM-granite org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add icon for OpenBMB org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add icon for KatanemoLabs org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): update icon for Meta-Llama-3.1-8B-Instruct-abliterated

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): update icon for hermes-3-llama-3.1-8b-lorablated

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

* chore(model gallery): add icon for Google org

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>

---------

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-20 10:40:46 +01:00
Gianluca Boiano
390bb3f58b fix(model gallery): minicpm-v-2.6 is based on qwen2 (#4638)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-20 10:35:05 +01:00
Gianluca Boiano
30739d94a4 chore(model gallery): add InternLM3-8b-Q4_K_M (#4637)
chore(model gallery): add InternLM3-8b-Q4_K_M

Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-20 10:34:19 +01:00
LocalAI [bot]
83e2dd5dff chore: ⬆️ Update ggerganov/llama.cpp to 92bc493917d43b83e592349e138b54c90b1c3ea7 (#4640)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-19 22:34:32 +00:00
Ettore Di Giacinto
f496d0113b chore(deps): pin numba
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-19 09:07:56 +01:00
LocalAI [bot]
a752183fb5 chore: ⬆️ Update ggerganov/llama.cpp to a1649cc13f89946322358f92ea268ae1b7b5096c (#4635)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-19 08:38:33 +01:00
LocalAI [bot]
296b97925f chore: ⬆️ Update leejet/stable-diffusion.cpp to 5eb15ef4d022bef4a391de4f5f6556e81fbb5024 (#4636)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-18 22:21:27 +00:00
Gianluca Boiano
d0cc3047dc chore(model gallery): add MiniCPM-V-2.6-8b-q4_K_M (#4633)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-18 18:36:05 +01:00
Gianluca Boiano
032a33de49 chore: remove deprecated tinydream backend (#4631)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-18 18:35:30 +01:00
Ettore Di Giacinto
1e9bf19c8d feat(transformers): merge sentencetransformers backend (#4624)
* merge sentencetransformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add alias to silently redirect sentencetransformers to transformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add alias also for transformers-musicgen

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop from makefile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Move tests from sentencetransformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove sentencetransformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove tests from CI (part of transformers)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Do not always try to load the tokenizer

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix typo

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Tiny adjustments

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-18 18:30:30 +01:00
Gianluca Boiano
4bd8434ae0 fix(docs): add missing -core suffix to sycl images (#4630)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-01-18 15:47:49 +01:00
Ettore Di Giacinto
958f6eb722 chore(llama.cpp): update dependency (#4628)
Update to '3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6' and adapt to upstream changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-18 11:55:13 +01:00
mintyleaf
96306a39a0 chore(docs): extra-Usage and Machine-Tag docs (#4627)
Rename LocalAI-Extra-Usage -> Extra-Usage, add MACHINE_TAG as cli flag option, add docs about extra-usage and machine-tag

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
2025-01-18 08:58:38 +01:00
LocalAI [bot]
895cd7c76a feat(swagger): update swagger (#4625)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-18 08:57:49 +01:00
dependabot[bot]
cbdbe59f16 chore(deps): Bump scipy from 1.14.0 to 1.15.1 in /backend/python/transformers (#4621)
chore(deps): Bump scipy in /backend/python/transformers

Bumps [scipy](https://github.com/scipy/scipy) from 1.14.0 to 1.15.1.
- [Release notes](https://github.com/scipy/scipy/releases)
- [Commits](https://github.com/scipy/scipy/compare/v1.14.0...v1.15.1)

---
updated-dependencies:
- dependency-name: scipy
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-17 22:14:11 +00:00
Ettore Di Giacinto
ee7904f170 feat(transformers): add support to OuteTTS (#4622)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-17 19:33:25 +01:00
Ettore Di Giacinto
a761e01944 chore: alias transformers-musicgen to transformers (#4623)
chore: alias transformers-musicgen to transformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-17 18:16:17 +01:00
mintyleaf
96f8ec0402 feat: add machine tag and inference timings (#4577)
* Add machine tag option, add extraUsage option, grpc-server -> proto -> endpoint extraUsage data is broken for now

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

* remove redundant timing fields, fix broken timings output

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

* use middleware for Machine-Tag only if tag is specified

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

---------

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
2025-01-17 17:05:58 +01:00
Ettore Di Giacinto
8027fdf1c7 feat(transformers): merge musicgen functionalities to a single backend (#4620)
* feat(transformers): merge musicgen functionalities to a single backend

So we optimize space

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* specify type in tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Some adaptations for the MusicgenForConditionalGeneration type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-17 17:01:16 +01:00
Ettore Di Giacinto
212c8e1a6d Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-17 15:11:10 +01:00
LocalAI [bot]
78533d7230 chore: ⬆️ Update ggerganov/llama.cpp to 4dbc8b9cb71876e005724f4e8f73a3544646bcf5 (#4618)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-17 10:25:04 +01:00
Ettore Di Giacinto
b5eeb5c5ab ci(arm64): run in parallel
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-17 10:24:15 +01:00
Ettore Di Giacinto
b147ad0596 ci: try to build for arm64
Try to use the free arm64 runners from Github:
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-17 10:14:26 +01:00
Ettore Di Giacinto
7d0ac1ea3f chore(vall-e-x): Drop backend (#4619)
Many newer architectures are now SOTA and have replaced vall-e-x.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-17 09:35:10 +01:00
Ettore Di Giacinto
d08d97bebf chore(model gallery): fix typo
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-16 22:26:55 +01:00
Ettore Di Giacinto
acb2eb23c8 feat(tts): Add Kokoro backend (#4616)
* feat(kokoro): Add new TTS backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add kokoro to images

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Support combined voices

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Ignore pt and onnx

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add plbert and istfnet

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-16 22:23:09 +01:00
Ettore Di Giacinto
de4aa9fb1d chore(model gallery): add vikhr-qwen-2.5-1.5b-instruct (#4615)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-16 10:09:25 +01:00
Ettore Di Giacinto
560ba6f25e chore(model gallery): add drt-o1-14b (#4614)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-16 10:04:44 +01:00
Ettore Di Giacinto
8131ddd878 chore(model gallery): add uwu-7b-instruct (#4613)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-16 09:58:14 +01:00
LocalAI [bot]
26c3deb673 chore: ⬆️ Update ggerganov/llama.cpp to adc5dd92e8aea98f5e7ac84f6e1bc15de35130b5 (#4612)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-16 00:08:52 +00:00
Ettore Di Giacinto
6d20497d45 chore(model gallery): add lb-reranker-0.5b-v1.0 (#4611)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-15 15:54:12 +01:00
Ettore Di Giacinto
482c6b8be4 chore(model gallery): add l3.3-ms-nevoria-70b (#4610)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-15 15:51:50 +01:00
Ettore Di Giacinto
5bba5edf45 chore(model gallery): add qwerus-7b (#4609)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-15 15:46:45 +01:00
Ettore Di Giacinto
792b866727 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-15 15:46:27 +01:00
LocalAI [bot]
f053f7bde2 chore: ⬆️ Update ggerganov/llama.cpp to b4d92a59a20eea400d8dd30844a339b76210daa0 (#4606)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-14 22:16:33 +00:00
Ettore Di Giacinto
d7dee3a5ec feat(diffusers): add support for Sana pipelines (#4603)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-14 11:13:16 +01:00
Ettore Di Giacinto
b8d74e52b1 chore(model gallery): add steiner-32b-preview (#4602)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-14 09:41:30 +01:00
Ettore Di Giacinto
62abe0d2c9 chore(model gallery): add qwen2.5-72b-rp-ink (#4601)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-14 09:33:19 +01:00
Ettore Di Giacinto
5414c294c4 chore(model gallery): add negative-anubis-70b-v1 (#4600)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-14 09:29:25 +01:00
Ettore Di Giacinto
1b3e89c89c chore(model gallery): add LocalAI-functioncall-phi-4-v0.3 (#4599)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-14 09:27:18 +01:00
Ettore Di Giacinto
69c6e5b192 chore(stablediffusion-ggml): disable sycl optimizations (#4598)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-14 09:17:55 +01:00
LocalAI [bot]
0c02512f15 chore: ⬆️ Update ggerganov/llama.cpp to 504af20ee4eae72080a56d59d744f6774f7901ce (#4597)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-14 09:07:20 +01:00
dependabot[bot]
b0ead0bf12 chore(deps): Bump securego/gosec from 2.21.4 to 2.22.0 (#4594)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.21.4 to 2.22.0.
- [Release notes](https://github.com/securego/gosec/releases)
- [Changelog](https://github.com/securego/gosec/blob/master/.goreleaser.yml)
- [Commits](https://github.com/securego/gosec/compare/v2.21.4...v2.22.0)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-13 21:17:11 +00:00
Ettore Di Giacinto
ab5adf40af chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e… (#4592)
chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e93a'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-13 17:33:06 +01:00
Ettore Di Giacinto
8d82afb595 fix(stablediffusion-ggml): enable oneapi before build (#4593)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-13 10:11:48 +01:00
Ettore Di Giacinto
aea71dd2c6 fix(stablediffusion-ggml): correctly enable sycl (#4591)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 22:07:01 +01:00
Ettore Di Giacinto
9fdb44323d chore(model gallery): add LocalAI-functioncall-phi-4-v0.2 (#4589)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 18:50:41 +01:00
Ettore Di Giacinto
6a299c04a7 feat(stablediffusion-ggml): respect build type (#4581)
* feat(stablediffusion-ggml): respect build type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* combine libraries

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 18:33:51 +01:00
Ettore Di Giacinto
9ce71fe427 fix(gallery): correct UL typo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 11:50:40 +01:00
Ettore Di Giacinto
e8de7b52da chore(model gallery): add LocalAI-functioncall-phi-4-v0.1 (#4588)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 11:26:42 +01:00
Ettore Di Giacinto
1780ccadbc chore(model gallery): add finemath-llama-3b (#4587)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 10:40:26 +01:00
Ettore Di Giacinto
f8cffd05e5 chore(model gallery): add negative_llama_70b (#4586)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 10:36:01 +01:00
Ettore Di Giacinto
b898cd49b5 chore(model gallery): add sky-t1-32b-preview (#4585)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-12 10:33:29 +01:00
LocalAI [bot]
7cd33d10c9 chore: ⬆️ Update ggerganov/llama.cpp to c05e8c9934f94fde49bc1bc9dc51eed282605150 (#4579)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-11 23:25:09 +01:00
Ettore Di Giacinto
cd480dbe5c chore(model gallery): add rombos-qwen2.5-writer-32b (#4584)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-11 23:24:55 +01:00
Ettore Di Giacinto
cb8bf79ada chore(model gallery): add qwq-32b-preview-ideawhiz-v1 (#4583)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-11 22:45:37 +01:00
Ettore Di Giacinto
b206eab80f chore(model gallery): add nightwing3-10b-v0.1 (#4582)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-11 22:41:30 +01:00
LocalAI [bot]
80dc23fab9 chore(model-gallery): ⬆️ update checksum (#4580)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-11 22:23:10 +01:00
LocalAI [bot]
844c0c422d docs: ⬆️ update docs version mudler/LocalAI (#4578)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-11 22:10:45 +01:00
LocalAI [bot]
07655c0c2e chore: ⬆️ Update ggerganov/llama.cpp to ba8a1f9c5b675459c55a83e3f97f10df3a66c788 (#4575)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-10 21:45:11 +00:00
Ettore Di Giacinto
bebfd19b45 chore(model gallery): add phi-3.5-moe-instruct (#4574)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-10 16:59:34 +01:00
Ettore Di Giacinto
6e34430d99 chore(model gallery): add chuluun-qwen2.5-72b-v0.01 (#4573)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-10 16:27:22 +01:00
Ettore Di Giacinto
0d08aaa29b chore(model gallery): add gwq-9b-preview2 (#4572)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-10 16:26:20 +01:00
Ettore Di Giacinto
66f9c06e7d docs: update README with langchain integration
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-10 09:20:33 +01:00
LocalAI [bot]
775adf871f chore: ⬆️ Update ggerganov/llama.cpp to 1204f9727005974587d6fc1dcd4d4f0ead87c856 (#4570)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-09 21:46:19 +00:00
Ettore Di Giacinto
a0fc19a3d6 chore(model gallery): add 70b-l3.3-cirrus-x1 (#4569)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-09 16:28:55 +01:00
Ettore Di Giacinto
7bd18662a7 chore(model gallery): add huatuogpt-o1-7b-v0.1 (#4568)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-09 16:01:57 +01:00
Ettore Di Giacinto
95b0739906 chore(model gallery): add minithinky-v2-1b-llama-3.2 (#4567)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-09 15:58:38 +01:00
Ettore Di Giacinto
cad7e9a1cd chore(model gallery): add 14b-qwen2.5-freya-x1 (#4566)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-09 15:55:56 +01:00
Ettore Di Giacinto
4426efab05 chore(deps): bump edgevpn to v0.29.0 (#4564)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-09 09:33:22 +01:00
Saarthak Verma
6765b17acd feat(dowloader): resume partial downloads (#4537)
* feat(resume downloads): add basic tests

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>

* test(resume downloads): implement file download tc

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>

* test(resume downloads): add resume partial download test

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>

* feat(resume downloads): implement resumable downloads for interrupted transfers

- Adds support for resuming partially downloaded files
- Uses HTTP Range header to continue from last byte position
- Maintains download progress across interruptions
- Preserves partial downloads with .partial extension
- Validates SHA256 checksum after completion
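
The behaviour listed above can be sketched in Go roughly as follows. This is a simplified illustration built only on the standard library, not the downloader from the commit: `resumeDownload` and `demo` are hypothetical names, the test server and payload exist only for the demonstration, and cases such as a `416 Range Not Satisfiable` response are left out.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"time"
)

// resumeDownload is a hypothetical helper: it fetches url into dst, continuing
// from dst+".partial" when that file already holds some bytes.
func resumeDownload(url, dst, wantSHA256 string) error {
	partial := dst + ".partial"
	f, err := os.OpenFile(partial, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	if info.Size() > 0 {
		// Ask only for the bytes that are still missing.
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", info.Size()))
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusPartialContent:
		// Range honoured: append to what is already on disk.
	case http.StatusOK:
		// Range ignored: the body is the whole file, so start over.
		if err := f.Truncate(0); err != nil {
			return err
		}
	default:
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}

	// Verify the checksum before promoting the partial file.
	// (Reads the file into memory for brevity; stream it for large models.)
	data, err := os.ReadFile(partial)
	if err != nil {
		return err
	}
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != wantSHA256 {
		return fmt.Errorf("checksum mismatch: got %s", got)
	}
	return os.Rename(partial, dst)
}

// demo simulates an interrupted transfer against an in-process test server.
func demo() error {
	payload := []byte("0123456789abcdefghijklmnopqrstuvwxyz")
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// http.ServeContent understands Range requests.
		http.ServeContent(w, r, "model.bin", time.Time{}, bytes.NewReader(payload))
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "resume")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	dst := filepath.Join(dir, "model.bin")
	// Pretend the first 10 bytes arrived before the interruption.
	if err := os.WriteFile(dst+".partial", payload[:10], 0o644); err != nil {
		return err
	}
	sum := sha256.Sum256(payload)
	return resumeDownload(srv.URL, dst, hex.EncodeToString(sum[:]))
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("resumed download verified")
}
```

Falling back to a full download when the server answers `200 OK` keeps the sketch safe against servers that ignore the `Range` header.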

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>

* fix(resume downloads): incorrect download percent on front end

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>

* feat(resume download): add range header check tc

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>

* feat(resume download): implement range header check

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>

---------

Signed-off-by: Saarthak Verma <saarthakverma739@gmail.com>
2025-01-09 09:22:52 +01:00
Ettore Di Giacinto
ae1340d59b chore: update labeler.yml to include go files (#4565)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-09 09:20:51 +01:00
Ettore Di Giacinto
fc52f179fe chore(model gallery): add phi-4 (#4562)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-08 23:26:25 +01:00
LocalAI [bot]
4f43a9a162 chore: ⬆️ Update ggerganov/llama.cpp to 8d59d911711b8f1ba9ec57c4b192ccd2628af033 (#4561)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-08 21:43:56 +00:00
Ettore Di Giacinto
20edd44463 chore(model gallery): add dolphin3.0-qwen2.5-3b (#4560)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-08 11:57:25 +01:00
Ettore Di Giacinto
1a4f9d8453 chore(model gallery): add dolphin3.0-qwen2.5-1.5b (#4559)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-08 11:55:44 +01:00
Ettore Di Giacinto
f2dd33b8f4 chore(model gallery): add dolphin3.0-qwen2.5-0.5b (#4558)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-08 11:53:46 +01:00
LocalAI [bot]
25e988868c chore: ⬆️ Update ggerganov/llama.cpp to 53ff6b9b9fb25ed0ec0a213e05534fe7c3d0040f (#4556)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-08 11:51:14 +01:00
Ettore Di Giacinto
ab344e4f47 docs: update compatibility-table.md (#4557)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-07 21:20:44 +01:00
Ettore Di Giacinto
fac7893dd6 chore(model gallery): add dolphin3.0-llama3.2-3b (#4555)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-07 20:27:29 +01:00
Ettore Di Giacinto
9be338cfe4 chore(model gallery): add dolphin3.0-llama3.2-1b (#4554)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-07 20:24:21 +01:00
Ettore Di Giacinto
b4d4f96919 chore(model gallery): add dolphin3.0-llama3.1-8b (#4553)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-07 20:23:35 +01:00
Max Goltzsche
8cc2d01caa feat(ui): path prefix support via HTTP header (#4497)
Makes the web app honour the `X-Forwarded-Prefix` HTTP request header that may be sent by a reverse-proxy in order to inform the app that its public routes contain a path prefix.
For instance, this allows serving the webapp via a reverse-proxy/ingress controller under a path prefix/sub path such as `/localai/` while still being able to use the regular LocalAI routes/paths without a prefix when connecting directly to the LocalAI server.

Changes:
* Add new `StripPathPrefix` middleware to strip the path prefix (provided with the `X-Forwarded-Prefix` HTTP request header) from the request path prior to matching the HTTP route.
* Add a `BaseURL` utility function to build the base URL, honouring the `X-Forwarded-Prefix` HTTP request header.
* Generate the derived base URL into the HTML (`head.html` template) as `<base/>` tag.
* Make all webapp-internal URLs (within HTML+JS) relative in order to make the browser resolve them against the `<base/>` URL specified within each HTML page's header.
* Make font URLs within the CSS files relative to the CSS file.
* Generate redirect location URLs using the new `BaseURL` function.
* Use the new `BaseURL` function to generate absolute URLs within gallery JSON responses.
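
The first two changes can be sketched as follows. LocalAI is built on gofiber; to stay dependency-free this illustration uses Go's standard `net/http` instead, so `stripPathPrefix`, `baseURL`, and `probe` are hypothetical stand-ins rather than the functions added by the commit. It also assumes the header is only set by a trusted reverse proxy.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// stripPathPrefix removes the prefix announced in X-Forwarded-Prefix from the
// request path before the wrapped handler matches routes.
func stripPathPrefix(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		prefix := strings.TrimSuffix(r.Header.Get("X-Forwarded-Prefix"), "/")
		if prefix != "" && strings.HasPrefix(r.URL.Path, prefix+"/") {
			// Clone so the caller's request is left untouched.
			r2 := r.Clone(r.Context())
			r2.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
			r2.URL.RawPath = ""
			r = r2
		}
		next.ServeHTTP(w, r)
	})
}

// baseURL builds the public base URL, including the forwarded prefix if any.
// A page template could emit it in a <base/> tag.
func baseURL(r *http.Request) string {
	scheme := "http"
	if r.TLS != nil {
		scheme = "https"
	}
	base := scheme + "://" + r.Host + "/"
	if prefix := strings.Trim(r.Header.Get("X-Forwarded-Prefix"), "/"); prefix != "" {
		base += prefix + "/"
	}
	return base
}

// probe sends one in-memory request through the middleware and reports the
// path the inner handler saw plus the derived base URL.
func probe(path, prefix string) (routed, base string) {
	h := stripPathPrefix(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		routed, base = r.URL.Path, baseURL(r)
	}))
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if prefix != "" {
		req.Header.Set("X-Forwarded-Prefix", prefix)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)
	return routed, base
}

func main() {
	// Request arriving through a proxy that serves the app under /localai/.
	fmt.Println(probe("/localai/models", "/localai/"))
	// Direct request without the header: the path is left as-is.
	fmt.Println(probe("/models", ""))
}
```

Both requests reach the same `/models` route; only the base URL used for links and redirects differs.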

Closes #3095

TL;DR:
The header-based approach moves the path prefix configuration concern entirely to the reverse-proxy/ingress, as opposed to having to align the path prefix configuration between LocalAI, the reverse-proxy and potentially other internal LocalAI clients.
The gofiber swagger handler already supports path prefixes this way, see e2d9e9916d/swagger.go (L79)

Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
2025-01-07 17:18:21 +01:00
LocalAI [bot]
bf37eebecb chore: ⬆️ Update ggerganov/llama.cpp to ecebbd292d741ac084cf248146b2cfb17002aa1d (#4552)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-07 10:14:07 +01:00
dependabot[bot]
3f0850b58b chore(deps): Bump docs/themes/hugo-theme-relearn from d25f856 to 80e448e (#4549)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `d25f856` to `80e448e`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](d25f856477...80e448e5bd)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-06 20:20:34 +00:00
Ettore Di Giacinto
2ffa89b8b9 chore(model gallery): add 14b-qwen2.5-kunou-v1 (#4547)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-06 10:43:09 +01:00
Ettore Di Giacinto
d43adc0205 chore(model gallery): add triangulum-10b (#4546)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-06 10:42:45 +01:00
Ettore Di Giacinto
78b34505ab chore(model gallery): add 32b-qwen2.5-kunou-v1 (#4545)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-06 10:08:35 +01:00
LocalAI [bot]
e55a1bed59 chore: ⬆️ Update ggerganov/llama.cpp to b56f079e28fda692f11a8b59200ceb815b05d419 (#4544)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-05 21:43:06 +00:00
Ettore Di Giacinto
0d7550ad54 chore(deps): bump grpcio to 1.69.0 (#4543)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-05 15:01:49 +01:00
Ettore Di Giacinto
b5992255ac chore(model gallery): add qwentile2.5-32b-instruct (#4541)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-05 09:22:00 +01:00
Ettore Di Giacinto
e845cc0401 chore(model gallery): add llama-deepsync-3b (#4540)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-05 09:19:05 +01:00
Ettore Di Giacinto
a10033e8a4 chore(model gallery): add experimental-lwd-mirau-rp-14b-iq-imatrix (#4539)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-05 09:12:12 +01:00
LocalAI [bot]
6c6d840e6b chore: ⬆️ Update ggerganov/llama.cpp to 9394bbd484f802ce80d2858033583af3ef700d25 (#4536)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-04 21:43:08 +00:00
Ettore Di Giacinto
a8b3b3d6f4 chore(model gallery): add llama3.1-8b-prm-deepseek-data (#4535)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-04 09:48:34 +01:00
Ettore Di Giacinto
ec66f7e3b1 chore(model gallery): add codepy-deepthink-3b (#4534)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-04 09:45:07 +01:00
Ettore Di Giacinto
05841c2435 chore(model gallery): add drt-o1-7b (#4533)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-04 09:44:14 +01:00
Ettore Di Giacinto
c553d73748 chore(deps): bump llama.cpp to 4b0c638b9 (#4532)
deps(llama.cpp): bump to 4b0c638b9

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-04 09:40:08 +01:00
Ettore Di Giacinto
1006e8a2ed ci: disable arm jobs
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-03 21:58:04 +01:00
Ettore Di Giacinto
9bcfda171b ci: lower concurrent jobs
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-03 20:48:23 +01:00
Ettore Di Giacinto
baee4f7bd5 ci: split jobs
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-03 19:23:05 +01:00
Ettore Di Giacinto
286dc32fe0 ci(arm64): try building on self-hosted
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-01-03 19:18:18 +01:00
Ettore Di Giacinto
36e4c0fcf0 chore(model gallery): add nera_noctis-12b (#4530)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-03 09:10:03 +01:00
LocalAI [bot]
3c21c8789a chore: ⬆️ Update ggerganov/llama.cpp to 2f0ee84b9b02d2a98742308026f060ebdc2423f1 (#4528)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-01-02 21:43:37 +00:00
Ettore Di Giacinto
d9facbcee9 chore(model gallery): add l3.1-purosani-2-8b (#4527)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-02 10:46:11 +01:00
Ettore Di Giacinto
930280ecac chore(model gallery): add sainemo-remix (#4526)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-02 10:46:01 +01:00
Ettore Di Giacinto
3415e6ae74 chore(model gallery): add qwenwify2.5-32b-v4.5 (#4525)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-02 10:45:52 +01:00
Ettore Di Giacinto
f1082f3c6d chore(model gallery): add violet_twilight-v0.2 (#4524)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-01 14:41:48 +01:00
Ettore Di Giacinto
f345f7a795 chore(model gallery): add captain-eris-diogenes_twilight-v0.420-12b (#4523)
chore(model gallery): add captain-eris-diogenes_twilight-v0.420-12b-arm-imatrix

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-01 13:33:39 +01:00
Ettore Di Giacinto
1a2a7a57b3 chore(model gallery): add mn-12b-mag-mell-r1-iq-arm-imatrix (#4522)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-01 13:27:13 +01:00
Ettore Di Giacinto
ae80a2bd24 chore(model gallery): add smallthinker-3b-preview (#4521)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-01 13:26:48 +01:00
LocalAI [bot]
c30ecdd535 chore: ⬆️ Update ggerganov/llama.cpp to 0827b2c1da299805288abbd556d869318f2b121e (#4520)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-31 21:43:29 +00:00
Ettore Di Giacinto
f16c7cef92 chore(model gallery): add q2.5-veltha-14b-0.5 (#4519)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-31 11:23:29 +01:00
Ettore Di Giacinto
e1dd78bcea chore(model gallery): add huatuogpt-o1-8b (#4518)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-31 11:23:07 +01:00
dependabot[bot]
25acb0cbbc chore(deps): Bump docs/themes/hugo-theme-relearn from ec88e24 to d25f856 (#4515)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `ec88e24` to `d25f856`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](ec88e24f46...d25f856477)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-31 11:18:51 +01:00
LocalAI [bot]
7674c80bb6 chore: ⬆️ Update ggerganov/llama.cpp to 716bd6dec3e044e5c325386b5b0483392b24cefe (#4516)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-31 11:18:33 +01:00
Ettore Di Giacinto
e044970a5b chore(model gallery): add qwen2.5-32b-rp-ink (#4517)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-31 11:17:29 +01:00
LocalAI [bot]
639526d207 chore: ⬆️ Update leejet/stable-diffusion.cpp to dcf91f9e0f2cbf9da472ee2a556751ed4bab2d2a (#4509)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-30 21:44:53 +00:00
dependabot[bot]
998ff9fa22 chore(deps): Bump gradio from 3.48.0 to 5.9.1 in /backend/python/openvoice (#4514)
chore(deps): Bump gradio in /backend/python/openvoice

Bumps [gradio](https://github.com/gradio-app/gradio) from 3.48.0 to 5.9.1.
- [Release notes](https://github.com/gradio-app/gradio/releases)
- [Changelog](https://github.com/gradio-app/gradio/blob/main/CHANGELOG.md)
- [Commits](https://github.com/gradio-app/gradio/compare/gradio@3.48.0...gradio@5.9.1)

---
updated-dependencies:
- dependency-name: gradio
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-30 20:05:45 +00:00
LocalAI [bot]
7122c7472e chore: ⬆️ Update ggerganov/llama.cpp to a813badbbdf0d38705f249df7a0c99af5cdee678 (#4512)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-29 21:42:38 +00:00
LocalAI [bot]
671381267a chore: ⬆️ Update ggerganov/llama.cpp to f865ea149d71ef883e3780fced8a20a1464eccf4 (#4510)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-28 21:43:05 +00:00
Ettore Di Giacinto
d1762e098e chore(model gallery): add miscii-14b-1225 (#4508)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-28 10:56:59 +01:00
Ettore Di Giacinto
270d33504b chore(model gallery): add miscii-14b-1028 (#4507)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-28 10:54:47 +01:00
Ettore Di Giacinto
9b0983d027 chore(model gallery): add control-nanuq-8b (#4506)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-28 10:49:53 +01:00
Ettore Di Giacinto
afd0af987d Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-27 15:17:02 +01:00
Ettore Di Giacinto
58524d40c9 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-27 15:13:06 +01:00
Ettore Di Giacinto
2a7222c6aa chore(model gallery): add falcon3-7b-instruct-abliterated (#4504)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-27 11:29:34 +01:00
Ettore Di Giacinto
0093985e7c chore(model gallery): add falcon3-10b-instruct-abliterated (#4503)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-27 11:24:13 +01:00
Ettore Di Giacinto
7f51e2dddf chore(model gallery): add falcon3-3b-instruct-abliterated (#4502)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-27 11:20:10 +01:00
Ettore Di Giacinto
f3bbdef77d chore(model gallery): add falcon3-1b-instruct-abliterated (#4501)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-27 11:15:56 +01:00
LocalAI [bot]
9cbf168dc0 chore: ⬆️ Update ggerganov/llama.cpp to d79d8f39b4da6deca4aea8bf130c6034c482b320 (#4500)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-26 21:44:36 +00:00
Ettore Di Giacinto
9572f0577b chore(model gallery): add teleut-7b-rp (#4499)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-26 10:52:57 +01:00
Ettore Di Giacinto
1a14c7d45a chore(model gallery): add qvq-72b-preview (#4498)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-26 10:47:54 +01:00
LocalAI [bot]
5c29e0cd4d chore: ⬆️ Update ggerganov/llama.cpp to 9ba399dfa7f115effc63d48e6860a94c9faa31b2 (#4496)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-25 21:43:06 +00:00
Ettore Di Giacinto
1a74af1492 chore(model gallery): add llama-3.1-8b-open-sft (#4495)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-25 11:47:33 +01:00
Ettore Di Giacinto
8f6332ab23 chore(model gallery): add dans-personalityengine-v1.1.0-12b (#4494)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-25 11:47:22 +01:00
Ettore Di Giacinto
816ae7a53a chore(model gallery): add fastllama-3.2-1b-instruct (#4493)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-25 11:47:11 +01:00
LocalAI [bot]
1d630e4185 chore(model-gallery): ⬆️ update checksum (#4492)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-24 23:04:50 +01:00
LocalAI [bot]
bc8dd3ad14 chore: ⬆️ Update ggerganov/llama.cpp to 2cd43f4900ba0e34124fdcbf02a7f9df25a10a3d (#4491)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-24 21:44:11 +00:00
Ettore Di Giacinto
b969053701 chore(gallery): re-order
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-24 11:10:56 +01:00
Ettore Di Giacinto
60bf7c9dd7 chore(model gallery): add rombos-llm-70b-llama-3.3 (#4490)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-24 11:04:31 +01:00
Ettore Di Giacinto
d65c10cee7 chore(model gallery): add tqwendo-36b (#4489)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-24 11:04:19 +01:00
Ettore Di Giacinto
6c71698299 chore(model gallery): add l3.3-ms-evalebis-70b (#4488)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-24 10:59:36 +01:00
LocalAI [bot]
c7c275c7c8 chore(model-gallery): ⬆️ update checksum (#4487)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-24 10:56:02 +01:00
LocalAI [bot]
d0adbee75d chore: ⬆️ Update ggerganov/llama.cpp to 32d6ee6385b3fc908b283f509b845f757a6e7206 (#4486)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-24 10:55:30 +01:00
dependabot[bot]
159a7f6df2 chore(deps): Bump docs/themes/hugo-theme-relearn from bd1f3d3 to ec88e24 (#4460)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `bd1f3d3` to `ec88e24`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](bd1f3d3432...ec88e24f46)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-23 22:15:38 +00:00
Ettore Di Giacinto
0eb2911aad chore(llava): update clip.patch (#4453)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-23 19:11:31 +01:00
Ettore Di Giacinto
cab9f88ca4 chore(docs): add nvidia l4t instructions (#4454)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-23 18:59:33 +01:00
Ettore Di Giacinto
a3b675b09e Delete .cirrus.yml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-23 18:31:50 +01:00
Ettore Di Giacinto
6477913e8f chore(ci): increase task timeout
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-23 16:43:32 +01:00
Ettore Di Giacinto
138cd97ce7 chore(ci): try to add CirrusCI to build arm64 images natively
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-23 15:37:57 +01:00
Ettore Di Giacinto
4dd9ac39b0 chore(ci): comment arm64 job until we find a native CI runner (#4452)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-23 12:34:39 +01:00
LocalAI [bot]
23499ddc8a chore: ⬆️ Update ggerganov/llama.cpp to ebdee9478ca7ba65497b9b96f7457698c6ee5115 (#4451)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-22 22:56:41 +00:00
Ettore Di Giacinto
8864156300 chore(nvidia-l4t): add l4t arm64 images (#4449)
chore(nvidia-l4t): add nvidia-l4t arm64 images

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-22 21:29:33 +01:00
Ettore Di Giacinto
478014ca18 feat(Dockerfile): allow to skip driver installation (#4447)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-22 21:28:38 +01:00
Ettore Di Giacinto
d45477b003 chore(model gallery): add llama-3.3-70b-instruct-ablated (#4448)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-22 08:38:47 +01:00
Ettore Di Giacinto
396fb88e33 chore(model gallery): add anubis-70b-v1 (#4446)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-22 08:33:35 +01:00
LocalAI [bot]
a429ec1b3f chore: ⬆️ Update ggerganov/llama.cpp to 5cd85b5e008de2ec398d6596e240187d627561e3 (#4445)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-21 21:44:44 +00:00
Ettore Di Giacinto
5b5fb9c22a chore(model gallery): add orca_mini_v8_1_70b (#4444)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-21 10:39:58 +01:00
LocalAI [bot]
801a87c3a6 chore: ⬆️ Update ggerganov/llama.cpp to eb5c3dc64bd967f2e23c87d9dec195f45468de60 (#4442)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-20 21:48:03 +00:00
Ettore Di Giacinto
badbd212f7 chore(model gallery): add tq2.5-14b-neon-v1 (#4441)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-20 16:11:16 +01:00
Ettore Di Giacinto
c4bbecc4d6 chore(model gallery): add tq2.5-14b-aletheia-v1 (#4440)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-20 16:08:23 +01:00
Ettore Di Giacinto
8a08e9ec67 fix(openvoice): pin numpy before installing torch (#4439)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-20 10:34:23 +01:00
LocalAI [bot]
61e486dbf5 chore: ⬆️ Update ggerganov/llama.cpp to d408bb9268a988c5a60a5746d3a6430386e7604d (#4437)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-19 23:03:47 +00:00
Ettore Di Giacinto
f2f387e1dd fix(openvoice): do not pin numpy (#4438)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-19 21:30:43 +01:00
Ettore Di Giacinto
3be9a08fc9 fix(deps): pin openvoice pytorch/torchaudio (#4436)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-19 18:24:33 +01:00
Ettore Di Giacinto
b325807c60 fix(intel): pin torch and intel-extensions (#4435)
* fix(intel): pin torch version

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(intel): pin intel packages version

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-19 15:39:32 +01:00
jtwolfe
ae9855a39e chore(docs): patch p2p detail in env and docs (#4434)
* Update distributed_inferencing.md

Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>

* Update .env

Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>

* Update distributed_inferencing.md

whoops

Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>

---------

Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-19 15:19:31 +01:00
LocalAI [bot]
9ac62b589f chore: ⬆️ Update ggerganov/llama.cpp to cd920d0ac38ec243605a5a57c50941140a193f9e (#4433)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-19 12:15:30 +00:00
Ettore Di Giacinto
d12660a286 chore(model gallery): add llama-chat-summary-3.2-3b (#4432)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-19 09:56:19 +01:00
LocalAI [bot]
3d3bd2d10f chore: ⬆️ Update ggerganov/llama.cpp to 0bf2d10c5514ff61b99897a4a5054f846e384e1e (#4429)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-19 09:53:49 +01:00
Ettore Di Giacinto
b656d10556 chore(model gallery): add llama-song-stream-3b-instruct (#4431)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-19 09:48:33 +01:00
Ettore Di Giacinto
8c67f38ef6 chore(model gallery): add falcon3-10b-instruct (#4426)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-18 10:36:41 +01:00
Ettore Di Giacinto
4623728cd7 chore(model gallery): add qwen2-vl-72b-instruct (#4425)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-18 10:35:49 +01:00
Ettore Di Giacinto
5f804aa6e8 chore(model gallery): add falcon3-3b-instruct (#4424)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-18 10:32:31 +01:00
Ettore Di Giacinto
f52c6e3a31 chore(model gallery): add falcon3-1b-instruct (#4423)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-18 10:12:06 +01:00
Ettore Di Giacinto
0b4bb7a562 chore(model gallery): add llama-openreviewer-8b (#4422)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-18 09:49:48 +01:00
mintyleaf
2bc4b56a79 feat: stream tokens usage (#4415)
* Use pb.Reply instead of []byte with Reply.GetMessage() in llama grpc to get the proper usage data in reply streaming mode at the last [DONE] frame

* Fix 'hang' on empty message from the start

Seems like that empty message marker trick was unnecessary

---------

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-18 09:48:50 +01:00
LocalAI [bot]
fc920cc58a chore: ⬆️ Update ggerganov/llama.cpp to 081b29bd2a3d91e7772e3910ce223dd63b8d7d26 (#4421)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-17 22:15:14 +00:00
Ettore Di Giacinto
fdb560b8e5 chore(model gallery): add qwq-lcot-7b-instruct (#4419)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-17 10:10:37 +01:00
Ettore Di Giacinto
708cba0c1b chore(llama.cpp): bump, drop penalize_nl (#4418)
deps(llama.cpp): bump, drop penalize_nl

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-17 00:47:52 +01:00
Ettore Di Giacinto
24abf568cb chore(tests): stabilize tts test (#4417)
chore(tests): stabilize test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-17 00:46:48 +01:00
Ettore Di Giacinto
7ca0e2d925 fix(python): remove pin to setuptools, pin python version (#4395)
fix(setuptools): remove pin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-16 10:55:02 +01:00
Ettore Di Giacinto
037e8030bf chore(model gallery): add qwen2-7b-multilingual-rp (#4394)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-16 09:48:33 +01:00
Ettore Di Giacinto
472d11f884 chore(model gallery): add marco-o1-uncensored (#4393)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-16 09:48:23 +01:00
Ettore Di Giacinto
b40d5d12b7 chore(model gallery): add naturallm-7b-instruct (#4392)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-16 09:47:49 +01:00
LocalAI [bot]
6938618e30 chore: ⬆️ Update ggerganov/llama.cpp to a0974156f334acf8af5858d7ede5ab7d7490d415 (#4391)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-15 22:01:44 +00:00
Ettore Di Giacinto
5d9c530eaa fix(gallery): disable default embeddings
Do not always enable embeddings on llama32; let model-specific settings decide

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-15 18:43:39 +01:00
Ettore Di Giacinto
9429a53db7 chore(model gallery): add neumind-math-7b-instruct (#4388)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-15 10:07:56 +01:00
Ettore Di Giacinto
1d6d301370 chore(model gallery): add fusechat-llama-3.1-8b-instruct (#4387)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-15 10:07:42 +01:00
Ettore Di Giacinto
8f2be82667 chore(model gallery): add fusechat-llama-3.2-3b-instruct (#4386)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-15 10:07:30 +01:00
LocalAI [bot]
cca911f3e5 chore: ⬆️ Update ggerganov/llama.cpp to e52aba537a34d51a65cddec6bc6dafc9031edc63 (#4385)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-15 09:59:20 +01:00
Jason Godsey
e37bbbaacc fix: correct gallery/index.yaml (#4384)
Update index.yaml

DBG YAML errors: line 2025: cannot unmarshal !!str `-https:...` into []string

Signed-off-by: Jason Godsey <godsey@users.noreply.github.com>
2024-12-14 21:25:51 +01:00
Ettore Di Giacinto
59cbf38b4b fix(gallery): correct syntax typo
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-12-14 21:21:27 +01:00
Ettore Di Giacinto
432c31d904 chore(model gallery): add chronos-gold-12b-1.0 (#4381)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-14 11:27:25 +01:00
Ettore Di Giacinto
af33483687 chore(model gallery): add fusechat-qwen-2.5-7b-instruct (#4380)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-14 11:27:11 +01:00
Ettore Di Giacinto
5051074845 chore(model gallery): add fusechat-gemma-2-9b-instruct (#4379)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-14 11:26:40 +01:00
Ettore Di Giacinto
fc4a714992 feat(llama.cpp): bump and adapt to upstream changes (#4378)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-14 00:30:52 +01:00
Ettore Di Giacinto
0429e00746 chore(model gallery): add hermes-3-llama-3.2-3b (#4376)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-13 09:51:26 +01:00
Ettore Di Giacinto
73f1f25b9a chore(model gallery): add evathene-v1.3 (#4375)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-13 09:51:13 +01:00
Ettore Di Giacinto
044570fa85 chore(model gallery): add l3.3-ms-evayale-70b (#4374)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-13 09:50:41 +01:00
LocalAI [bot]
37527420de chore: ⬆️ Update ggerganov/llama.cpp to 274ec65af6e54039eb95cb44904af5c945dca1fa (#4372)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-12 21:44:54 +00:00
Ettore Di Giacinto
1854b8c612 chore(model gallery): add l3.3-70b-euryale-v2.3 (#4371)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-12 12:22:48 +01:00
Ettore Di Giacinto
b8824f2ad9 chore(model gallery): add deepthought-8b-llama-v0.01-alpha (#4370)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-12 12:07:57 +01:00
Ettore Di Giacinto
3ab83e91df chore(model gallery): add 72b-qwen2.5-kunou-v1 (#4369)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-12 12:07:41 +01:00
LocalAI [bot]
f2cb261797 chore: ⬆️ Update ggerganov/llama.cpp to 235f6e14bf0ed0211c51aeff14139038ae1000aa (#4366)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-12 09:23:36 +01:00
Ettore Di Giacinto
c85f46a71d chore(model gallery): add sailor2-20b-chat (#4365)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-11 10:55:04 +01:00
Ettore Di Giacinto
75b283d83c chore(model gallery): add sailor2-8b-chat (#4364)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-11 10:51:39 +01:00
Ettore Di Giacinto
1918efdfdd chore(model gallery): add sailor2-1b-chat (#4363)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-11 10:32:18 +01:00
LocalAI [bot]
ec239a0cd0 docs: ⬆️ update docs version mudler/LocalAI (#4359)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-11 10:04:16 +01:00
LocalAI [bot]
b74a936178 chore: ⬆️ Update ggerganov/llama.cpp to dafae66cc242eb766797194d3c85c5e502625623 (#4360)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-10 21:45:42 +00:00
Ettore Di Giacinto
de1ddb8ba6 chore(model gallery): add b-nimita-l3-8b-v0.02 (#4357)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-10 09:42:47 +01:00
Ettore Di Giacinto
272763f625 chore(model gallery): add intellect-1-instruct (#4356)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-10 09:42:37 +01:00
Ettore Di Giacinto
3aff87a5cf chore(model gallery): add qwen2.5-math-14b-instruct (#4355)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-10 09:42:24 +01:00
LocalAI [bot]
885118e863 chore: ⬆️ Update ggerganov/llama.cpp to 26a8406ba9198eb6fdd8329fa717555b4f77f05f (#4353)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-10 09:10:58 +01:00
dependabot[bot]
a03a9b9e51 chore(deps): Bump docs/themes/hugo-theme-relearn from be85052 to bd1f3d3 (#4348)
chore(deps): Bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `be85052` to `bd1f3d3`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](be85052efe...bd1f3d3432)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-09 20:09:26 +00:00
Ettore Di Giacinto
f45d6c746a chore(model gallery): add tulu-3.1-8b-supernova-smart (#4347)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-09 15:58:29 +01:00
Ettore Di Giacinto
5eceb5f67c chore(model gallery): add impish_mind_8b (#4344)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-09 10:24:30 +01:00
Ettore Di Giacinto
a9c0dd3a1e chore(model gallery): add qwen2.5-7b-homeranvita-nerdmix (#4343)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-09 10:24:15 +01:00
LocalAI [bot]
fb17e737f0 docs: ⬆️ update docs version mudler/LocalAI (#4341)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-09 09:19:25 +01:00
LocalAI [bot]
b5a21202ed chore: ⬆️ Update ggerganov/llama.cpp to e52522b8694ae73abf12feb18d29168674aa1c1b (#4342)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-08 22:54:06 +00:00
Ettore Di Giacinto
e147f1bd3e chore(model gallery): add bio-medical-llama-3-8b (#4339)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-08 18:43:26 +01:00
Ettore Di Giacinto
61839efed2 chore(model gallery): add virtuoso-small (#4338)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-08 18:01:25 +01:00
Ettore Di Giacinto
a0fe050055 chore(model gallery): add mn-chunky-lotus-12b (#4337)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-08 18:01:16 +01:00
Ettore Di Giacinto
f943c4b803 Revert "feat: include tokens usage for streamed output" (#4336)
Revert "feat: include tokens usage for streamed output (#4282)"

This reverts commit 0d6c3a7d57.
2024-12-08 17:53:36 +01:00
Ettore Di Giacinto
cea5a0ea42 feat(template): read jinja templates from gguf files (#4332)
* Read jinja templates as fallback

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Move templating out of model loader

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Test TemplateMessages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Set role and content from transformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Tests: be more flexible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* More jinja

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small refactoring and adaptations

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-08 13:50:33 +01:00
LocalAI [bot]
f5e1527a5a chore: ⬆️ Update ggerganov/llama.cpp to 3573fa8e7b7f0865638b52b4e9b4d2006f0558a2 (#4335)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-07 21:51:45 +00:00
Ettore Di Giacinto
7184ca546f chore(model gallery): add llama-3.3-70b-instruct (#4333)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-07 10:39:20 +01:00
LocalAI [bot]
5592f5e820 chore: ⬆️ Update ggerganov/llama.cpp to c5ede3849fc021174862f9c0bf8273808d8f0d39 (#4330)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-06 21:46:51 +00:00
Ettore Di Giacinto
d4c1746c7d feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache (#4329)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-06 10:23:59 +01:00
LocalAI [bot]
88737e1d76 chore: ⬆️ Update ggerganov/llama.cpp to c9c6e01daedac542b174c235872569fce5385982 (#4328)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-06 09:15:21 +01:00
LocalAI [bot]
ba225f660b docs: ⬆️ update docs version mudler/LocalAI (#4327)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-05 21:54:00 +00:00
Ettore Di Giacinto
3127cd1352 chore(docs): update available backends (#4325)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-05 16:57:56 +01:00
PetrFlegr
b90d78d9f6 Updated links of yamls (#4324)
Updated links

Links to deployment*.yaml were changed

Signed-off-by: PetrFlegr <ptrflegr@gmail.com>
2024-12-05 16:06:51 +01:00
Ettore Di Giacinto
b86a3e4fa6 chore(model gallery): add math-iio-7b-instruct (#4323)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-05 10:05:35 +01:00
Ettore Di Giacinto
be907d993f chore(model gallery): add loki-v2.6-8b-1024k (#4321)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-05 10:02:02 +01:00
Ettore Di Giacinto
ab0f8648a3 chore(model gallery): add rp-naughty-v1.0c-8b (#4322)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-05 10:01:49 +01:00
LocalAI [bot]
c226149503 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9578fdcc4632dc3de5565f28e2fb16b7c18f8d48 (#4320)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-05 09:09:11 +01:00
LocalAI [bot]
4a079f893c chore: ⬆️ Update ggerganov/llama.cpp to 59f4db10883a4f3e855cffbf2c3ab68430e95272 (#4319)
⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-12-04 22:19:35 +00:00
373 changed files with 253005 additions and 8283 deletions

View File

@@ -1,23 +0,0 @@
meta {
name: musicgen
type: http
seq: 1
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/sound-generation
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model_id": "facebook/musicgen-small",
"text": "Exciting 80s Newscast Interstitial",
"duration_seconds": 8
}
}

View File

@@ -1,17 +0,0 @@
meta {
name: backend monitor
type: http
seq: 4
}
get {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/backend/monitor
body: json
auth: none
}
body:json {
{
"model": "{{DEFAULT_MODEL}}"
}
}

View File

@@ -1,21 +0,0 @@
meta {
name: backend-shutdown
type: http
seq: 3
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/backend/shutdown
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}"
}
}

View File

@@ -1,5 +0,0 @@
{
"version": "1",
"name": "LocalAI Test Requests",
"type": "collection"
}

View File

@@ -1,6 +0,0 @@
vars {
HOST: localhost
PORT: 8080
DEFAULT_MODEL: gpt-3.5-turbo
PROTOCOL: http://
}
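The removed request files build their URLs and bodies from these four variables through `{{NAME}}` placeholders. As a rough illustration of how such a placeholder expands, here is a minimal Python sketch; the `expand` helper is hypothetical and only approximates the substitution, since the interpolation rules of the tool that consumed these files are not shown in this diff.

```python
import re

# Values copied from the removed environment file above.
ENV_VARS = {
    "HOST": "localhost",
    "PORT": "8080",
    "DEFAULT_MODEL": "gpt-3.5-turbo",
    "PROTOCOL": "http://",
}


def expand(template: str, variables: dict) -> str:
    """Replace each {{NAME}} with variables[NAME]; unknown names are left as-is.

    Hypothetical helper for illustration, not part of LocalAI.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# The URL template used by the removed "get models list" request.
models_url = expand("{{PROTOCOL}}{{HOST}}:{{PORT}}/models", ENV_VARS)
print(models_url)
```

Leaving unknown placeholders untouched is a design choice of this sketch: it makes a missing variable visible in the resulting URL instead of silently producing an empty string.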

View File

@@ -1,11 +0,0 @@
meta {
name: get models list
type: http
seq: 2
}
get {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models
body: none
auth: none
}

View File

@@ -1,25 +0,0 @@
meta {
name: Generate image
type: http
seq: 1
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/images/generations
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"prompt": "<positive prompt>|<negative prompt>",
"model": "model-name",
"step": 51,
"size": "1024x1024",
"image": ""
}
}

View File

@@ -1,24 +0,0 @@
meta {
name: -completions
type: http
seq: 4
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/completions
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}",
"prompt": "function downloadFile(string url, string outputPath) {",
"max_tokens": 256,
"temperature": 0.5
}
}

View File

@@ -1,23 +0,0 @@
meta {
name: -edits
type: http
seq: 5
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/edits
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}",
"input": "What day of the wek is it?",
"instruction": "Fix the spelling mistakes"
}
}

View File

@@ -1,22 +0,0 @@
meta {
name: -embeddings
type: http
seq: 6
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/embeddings
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}",
"input": "A STRANGE GAME.\nTHE ONLY WINNING MOVE IS NOT TO PLAY.\n\nHOW ABOUT A NICE GAME OF CHESS?"
}
}

View File

@@ -1,30 +0,0 @@
meta {
name: chat completion -simple- 1 message-
type: http
seq: 4
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}",
"messages": [
{
"role": "user",
"content": "How could one use friction to cook an egg?"
}
],
"max_tokens": 256,
"temperature": 0.2,
"grammar": ""
}
}
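For readers who relied on this removed request file, the same call can be assembled with the Python standard library. This is a sketch under assumptions: `BASE_URL` and `DEFAULT_MODEL` reuse the values from the removed environment file, `build_chat_request` is a hypothetical helper name, and the empty `grammar` field from the original body is omitted. The request object is only built here, not sent.

```python
import json
import urllib.request

# Assumed to match the removed environment file (http://localhost:8080).
BASE_URL = "http://localhost:8080"
DEFAULT_MODEL = "gpt-3.5-turbo"


def build_chat_request(prompt: str, max_tokens: int = 256,
                       temperature: float = 0.2) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions with one user message."""
    payload = {
        "model": DEFAULT_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("How could one use friction to cook an egg?")
print(request.get_method(), request.full_url)
```

With a server listening on that address, `urllib.request.urlopen(request)` would send it; the response shape is not shown in this diff, so it is not parsed here.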

View File

@@ -1,29 +0,0 @@
meta {
name: chat-completions -long-
type: http
seq: 5
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}",
"messages": [{"role": "system", "content": "You are a helpful, intelligent pirate assistant. Always answer as helpfully as possible, while being a pirate and using pirate language. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."},
{"role": "user", "content": "How could one use electricity to cook an egg?"},
{"role": "assistant",
"content": "Shiver me timbers! Using electricity to cook an egg? Well, matey, I reckon that be a right curious idea! *adjusts spectacles* Now, I ain't sure if this be possible, but I'll do me best to help ye out.\n\nFirst things first, ye gotta understand that electricity be a powerful force, and it can't just cook an egg on its own. Ye see, electricity be like a mighty wind that can make things happen, but it needs somethin' to work with. So, if ye want to use electricity to cook an egg, ye gotta find a way to harness that power and make it do the cookin'.\n\nNow, I know what ye might be thinkin': \"How do I harness the power of electricity to cook an egg?\" Well, matey, there be a few ways to do it. One way be to use a special device called an \"electric frying pan.\" This be a pan that has a built-in heating element that gets hot when ye plug it into a wall socket. When the element gets hot, ye can crack an egg into the pan and watch as it cook"
},
{"role": "user", "content": "I don't have one of those, just a raw wire and plenty of power! How do we get it done?"}],
"max_tokens": 1024,
"temperature": 0.5
}
}

View File

@@ -1,25 +0,0 @@
meta {
name: chat-completions -stream-
type: http
seq: 6
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}",
"messages": [{"role": "user", "content": "Explain how I can set sail on the ocean using only power generated by seagulls?"}],
"max_tokens": 256,
"temperature": 0.9,
"stream": true
}
}
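The only difference from the non-streaming requests is `"stream": true`. The diff does not show what the server sends back, so the following Python sketch assumes the OpenAI-style convention of server-sent `data:` lines carrying JSON chunks and a final `data: [DONE]` marker. The helper name `iter_stream_text` and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an assumed OpenAI-style event stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Made-up sample input, not captured from a real server.
SAMPLE_LINES = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Ahoy"}}]}',
    'data: {"choices": [{"delta": {"content": ", matey"}}]}',
    "data: [DONE]",
]

reply = "".join(iter_stream_text(SAMPLE_LINES))
print(reply)
```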

View File

@@ -1,22 +0,0 @@
meta {
name: add model gallery
type: http
seq: 10
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"url": "file:///home/dave/projects/model-gallery/huggingface/TheBloke__CodeLlama-7B-Instruct-GGML.yaml",
"name": "test"
}
}

View File

@@ -1,21 +0,0 @@
meta {
name: delete model gallery
type: http
seq: 11
}
delete {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"name": "test"
}
}

View File

@@ -1,11 +0,0 @@
meta {
name: list MODELS in galleries
type: http
seq: 7
}
get {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/available
body: none
auth: none
}

View File

@@ -1,11 +0,0 @@
meta {
name: list model GALLERIES
type: http
seq: 8
}
get {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
body: none
auth: none
}

View File

@@ -1,11 +0,0 @@
meta {
name: model delete
type: http
seq: 7
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
body: none
auth: none
}

View File

@@ -1,21 +0,0 @@
meta {
name: model gallery apply -gist-
type: http
seq: 12
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/apply
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"id": "TheBloke__CodeLlama-7B-Instruct-GGML__codellama-7b-instruct.ggmlv3.Q2_K.bin"
}
}

View File

@@ -1,22 +0,0 @@
meta {
name: model gallery apply
type: http
seq: 9
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/apply
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"id": "dave@TheBloke__CodeLlama-7B-Instruct-GGML__codellama-7b-instruct.ggmlv3.Q3_K_S.bin",
"name": "codellama7b"
}
}

View File

Binary file not shown.

View File

@@ -1,16 +0,0 @@
meta {
name: transcribe
type: http
seq: 1
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/audio/transcriptions
body: multipartForm
auth: none
}
body:multipart-form {
file: @file(transcription/gb1.ogg)
model: whisper-1
}

View File

@@ -1,22 +0,0 @@
meta {
name: -tts
type: http
seq: 2
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/tts
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"model": "{{DEFAULT_MODEL}}",
"input": "A STRANGE GAME.\nTHE ONLY WINNING MOVE IS NOT TO PLAY.\n\nHOW ABOUT A NICE GAME OF CHESS?"
}
}

View File

@@ -1,23 +0,0 @@
meta {
name: musicgen
type: http
seq: 2
}
post {
url: {{PROTOCOL}}{{HOST}}:{{PORT}}/tts
body: json
auth: none
}
headers {
Content-Type: application/json
}
body:json {
{
"backend": "transformers-musicgen",
"model": "facebook/musicgen-small",
"input": "80s Synths playing Jazz"
}
}

View File

@@ -7,7 +7,7 @@ services:
args:
- FFMPEG=true
- IMAGE_TYPE=extras
- GO_TAGS=stablediffusion p2p tts
- GO_TAGS=p2p tts
env_file:
- ../.env
ports:

15
.env
View File

@@ -38,12 +38,12 @@
## Uncomment and set to true to enable rebuilding from source
# REBUILD=true
## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## Enable go tags, available: p2p, tts
## p2p: enable distributed inferencing
## tts: enables text-to-speech with go-piper
## (requires REBUILD=true)
#
# GO_TAGS=stablediffusion
# GO_TAGS=p2p
## Path where to store generated images
# LOCALAI_IMAGE_PATH=/tmp/generated/images
@@ -82,6 +82,15 @@
# Enable to allow p2p mode
# LOCALAI_P2P=true
# Enable to use federated mode
# LOCALAI_FEDERATED=true
# Enable to start federation server
# FEDERATED_SERVER=true
# Define to use federation token
# TOKEN=""
### Watchdog settings
###
# Enables watchdog to kill backends that are inactive for too much time

View File

@@ -81,14 +81,6 @@ updates:
directory: "/backend/python/transformers"
schedule:
interval: "weekly"
- package-ecosystem: "pip"
directory: "/backend/python/transformers-musicgen"
schedule:
interval: "weekly"
- package-ecosystem: "pip"
directory: "/backend/python/vall-e-x"
schedule:
interval: "weekly"
- package-ecosystem: "pip"
directory: "/backend/python/vllm"
schedule:

6
.github/labeler.yml vendored
View File

@@ -1,10 +1,14 @@
enhancements:
enhancement:
- head-branch: ['^feature', 'feature']
dependencies:
- any:
- changed-files:
- any-glob-to-any-file: 'Makefile'
- changed-files:
- any-glob-to-any-file: '*.mod'
- changed-files:
- any-glob-to-any-file: '*.sum'
kind/documentation:
- any:

View File

@@ -9,7 +9,7 @@ jobs:
fail-fast: false
matrix:
include:
- repository: "ggerganov/llama.cpp"
- repository: "ggml-org/llama.cpp"
variable: "CPPLLAMA_VERSION"
branch: "master"
- repository: "ggerganov/whisper.cpp"

View File

@@ -14,7 +14,7 @@ jobs:
steps:
- name: Dependabot metadata
id: metadata
uses: dependabot/fetch-metadata@v2.2.0
uses: dependabot/fetch-metadata@v2.3.0
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
skip-commit-verification: true

View File

@@ -33,7 +33,7 @@ jobs:
run: |
CGO_ENABLED=0 make build-api
- name: rm
uses: appleboy/ssh-action@v1.2.0
uses: appleboy/ssh-action@v1.2.2
with:
host: ${{ secrets.EXPLORER_SSH_HOST }}
username: ${{ secrets.EXPLORER_SSH_USERNAME }}
@@ -53,7 +53,7 @@ jobs:
rm: true
target: ./local-ai
- name: restarting
uses: appleboy/ssh-action@v1.2.0
uses: appleboy/ssh-action@v1.2.2
with:
host: ${{ secrets.EXPLORER_SSH_HOST }}
username: ${{ secrets.EXPLORER_SSH_USERNAME }}

View File

@@ -2,9 +2,10 @@ name: 'generate and publish GRPC docker caches'
on:
workflow_dispatch:
push:
branches:
- master
schedule:
# daily at midnight
- cron: '0 0 * * *'
concurrency:
group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
@@ -16,7 +17,7 @@ jobs:
matrix:
include:
- grpc-base-image: ubuntu:22.04
runs-on: 'ubuntu-latest'
runs-on: 'arc-runner-set'
platforms: 'linux/amd64,linux/arm64'
runs-on: ${{matrix.runs-on}}
steps:

View File

@@ -280,6 +280,7 @@ jobs:
makeflags: ${{ matrix.makeflags }}
latest-image: ${{ matrix.latest-image }}
latest-image-aio: ${{ matrix.latest-image-aio }}
skip-drivers: ${{ matrix.skip-drivers }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
@@ -301,6 +302,7 @@ jobs:
latest-image: 'latest-cpu'
latest-image-aio: 'latest-aio-cpu'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
- build-type: 'cublas'
cuda-major-version: "11"
cuda-minor-version: "7"
@@ -312,6 +314,7 @@ jobs:
base-image: "ubuntu:22.04"
runs-on: 'arc-runner-set'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
@@ -323,6 +326,7 @@ jobs:
base-image: "ubuntu:22.04"
runs-on: 'arc-runner-set'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
- build-type: 'cublas'
cuda-major-version: "11"
cuda-minor-version: "7"
@@ -334,6 +338,7 @@ jobs:
runs-on: 'arc-runner-set'
base-image: "ubuntu:22.04"
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
@@ -344,6 +349,7 @@ jobs:
image-type: 'core'
runs-on: 'arc-runner-set'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
- build-type: 'vulkan'
platforms: 'linux/amd64'
@@ -354,4 +360,45 @@ jobs:
image-type: 'core'
runs-on: 'arc-runner-set'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
gh-runner:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
ffmpeg: ${{ matrix.ffmpeg }}
image-type: ${{ matrix.image-type }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
aio: ${{ matrix.aio }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
latest-image: ${{ matrix.latest-image }}
latest-image-aio: ${{ matrix.latest-image-aio }}
skip-drivers: ${{ matrix.skip-drivers }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'false'
tag-suffix: '-nvidia-l4t-arm64-core'
latest-image: 'latest-nvidia-l4t-arm64-core'
ffmpeg: 'true'
image-type: 'core'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'true'

View File

@@ -49,6 +49,10 @@ on:
description: 'FFMPEG'
default: ''
type: string
skip-drivers:
description: 'Skip drivers by default'
default: 'false'
type: string
image-type:
description: 'Image type'
default: ''
@@ -234,6 +238,7 @@ jobs:
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
context: .
file: ./Dockerfile
cache-from: type=gha
@@ -262,6 +267,7 @@ jobs:
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
context: .
file: ./Dockerfile
cache-from: type=gha
@@ -304,6 +310,11 @@ jobs:
tags: ${{ steps.meta_aio_dockerhub.outputs.tags }}
labels: ${{ steps.meta_aio_dockerhub.outputs.labels }}
- name: Cleanup
run: |
docker builder prune -f
docker system prune --force --volumes --all
- name: Latest tag
# run this on branches, when it is a tag and there is a latest-image defined
if: github.event_name != 'pull_request' && inputs.latest-image != '' && github.ref_type == 'tag'
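The `skip-drivers` input added in this workflow is passed to the image build as the `SKIP_DRIVERS` build argument. The Dockerfile hunks elsewhere in this comparison then guard the Vulkan, CuBLAS, clblas and hipblas package installation with a test of the form `[ "${BUILD_TYPE}" = "…" ] && [ "${SKIP_DRIVERS}" = "false" ]`. A minimal Python sketch of that predicate, with a hypothetical function name, makes the string semantics explicit:

```python
# Build types whose driver/library install step is gated in the Dockerfile hunks.
DRIVER_BUILD_TYPES = ("vulkan", "cublas", "clblas", "hipblas")


def installs_drivers(build_type: str, skip_drivers: str = "false") -> bool:
    """Mirror the shell test: the build type must match AND SKIP_DRIVERS must be
    exactly the string "false". Hypothetical helper for illustration only."""
    return build_type in DRIVER_BUILD_TYPES and skip_drivers == "false"


print(installs_drivers("cublas", "false"))
print(installs_drivers("cublas", "true"))
```

Because the comparison is a literal string match, any value other than `false` (including `False` or an empty string) skips the install step, which is worth keeping in mind when setting the matrix value.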

View File

@@ -18,7 +18,7 @@ jobs:
with:
model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
# Check the PR diff using the current branch and the base branch of the PR
- uses: GrantBirki/git-diff-action@v2.7.0
- uses: GrantBirki/git-diff-action@v2.8.0
id: git-diff-action
with:
json_diff_file_output: diff.json
@@ -99,7 +99,7 @@ jobs:
docker run -e -ti -d --name local-ai -p 8080:8080 localai/localai:master-ffmpeg-core run --debug $MODEL_NAME
until [ "`docker inspect -f {{.State.Health.Status}} local-ai`" == "healthy" ]; do echo "Waiting for container to be ready"; docker logs --tail 10 local-ai; sleep 2; done
# Check the PR diff using the current branch and the base branch of the PR
- uses: GrantBirki/git-diff-action@v2.7.0
- uses: GrantBirki/git-diff-action@v2.8.0
id: git-diff-action
with:
json_diff_file_output: diff.json
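The unchanged context line in the second hunk waits for the container by polling `docker inspect` every two seconds until the health status reads `healthy`. The same polling pattern can be sketched in Python. Everything here is illustrative: `wait_until_healthy` is a hypothetical helper, the status source is injected as a callable instead of calling Docker, and `max_attempts` is an addition that the shell loop does not have.

```python
import time
from typing import Callable, Optional


def wait_until_healthy(get_status: Callable[[], str],
                       interval_seconds: float = 2.0,
                       max_attempts: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_status() until it returns "healthy"; return the number of polls.

    max_attempts is an optional safety limit added in this sketch.
    """
    attempts = 0
    while True:
        attempts += 1
        if get_status() == "healthy":
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"not healthy after {attempts} attempts")
        sleep(interval_seconds)


# Fake status sequence standing in for repeated `docker inspect` calls.
fake_statuses = iter(["starting", "starting", "healthy"])
polls = wait_until_healthy(lambda: next(fake_statuses), sleep=lambda _s: None)
print(polls)
```

Injecting `sleep` keeps the sketch fast to run and test; in real use the default `time.sleep` applies.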

View File

@@ -237,40 +237,7 @@ jobs:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true
build-stablediffusion:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- uses: actions/setup-go@v5
with:
go-version: '1.21.x'
cache: false
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends libopencv-dev protobuf-compiler ccache upx-ucl
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
- name: Build stablediffusion
run: |
export PATH=$PATH:$GOPATH/bin
make backend-assets/grpc/stablediffusion
mkdir -p release && cp backend-assets/grpc/stablediffusion release
env:
GO_TAGS: stablediffusion
- uses: actions/upload-artifact@v4
with:
name: stablediffusion
path: release/
- name: Release
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
with:
files: |
release/*
build-macOS-x86_64:
runs-on: macos-13

View File

@@ -18,7 +18,7 @@ jobs:
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}
uses: securego/gosec@v2.21.4
uses: securego/gosec@v2.22.0
with:
# we let the report trigger content trigger a failure using the GitHub Security features.
args: '-no-fail -fmt sarif -out results.sarif ./...'

View File

@@ -35,30 +35,6 @@ jobs:
run: |
make --jobs=5 --output-sync=target -C backend/python/transformers
make --jobs=5 --output-sync=target -C backend/python/transformers test
tests-sentencetransformers:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test sentencetransformers
run: |
make --jobs=5 --output-sync=target -C backend/python/sentencetransformers
make --jobs=5 --output-sync=target -C backend/python/sentencetransformers test
tests-rerankers:
runs-on: ubuntu-latest
steps:
@@ -102,78 +78,27 @@ jobs:
make --jobs=5 --output-sync=target -C backend/python/diffusers
make --jobs=5 --output-sync=target -C backend/python/diffusers test
tests-parler-tts:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user --no-cache-dir grpcio-tools==1.64.1
# tests-transformers-musicgen:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v4
# with:
# submodules: true
# - name: Dependencies
# run: |
# sudo apt-get update
# sudo apt-get install build-essential ffmpeg
# # Install UV
# curl -LsSf https://astral.sh/uv/install.sh | sh
# sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# sudo apt-get install -y libopencv-dev
# pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test parler-tts
run: |
make --jobs=5 --output-sync=target -C backend/python/parler-tts
make --jobs=5 --output-sync=target -C backend/python/parler-tts test
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true
tests-openvoice:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test openvoice
run: |
make --jobs=5 --output-sync=target -C backend/python/openvoice
make --jobs=5 --output-sync=target -C backend/python/openvoice test
tests-transformers-musicgen:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test transformers-musicgen
run: |
make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen
make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen test
# - name: Test transformers-musicgen
# run: |
# make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen
# make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen test
# tests-bark:
# runs-on: ubuntu-latest
@@ -260,26 +185,6 @@ jobs:
# run: |
# make --jobs=5 --output-sync=target -C backend/python/vllm
# make --jobs=5 --output-sync=target -C backend/python/vllm test
tests-vallex:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test vall-e-x
run: |
make --jobs=5 --output-sync=target -C backend/python/vall-e-x
make --jobs=5 --output-sync=target -C backend/python/vall-e-x test
tests-coqui:
runs-on: ubuntu-latest

View File

@@ -100,15 +100,12 @@ jobs:
# The python3-grpc-tools package in 22.04 is too old
pip install --user grpcio-tools
sudo rm -rfv /usr/bin/conda || true
PATH=$PATH:/opt/conda/bin make -C backend/python/sentencetransformers
make -C backend/python/transformers
# Pre-build piper before we start tests in order to have shared libraries in place
make sources/go-piper && \
GO_TAGS="tts" make -C sources/go-piper piper.o && \
sudo cp -rfv sources/go-piper/piper-phonemize/pi/lib/. /usr/lib/ && \
# Pre-build stable diffusion before we install a newer version of abseil (not compatible with stablediffusion-ncn)
PATH="$PATH:/root/go/bin" GO_TAGS="stablediffusion tts" GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build
sudo cp -rfv sources/go-piper/piper-phonemize/pi/lib/. /usr/lib/
env:
CUDA_VERSION: 12-4
- name: Cache grpc
@@ -130,7 +127,7 @@ jobs:
cd grpc && cd cmake/build && sudo make --jobs 5 install
- name: Test
run: |
PATH="$PATH:/root/go/bin" GO_TAGS="stablediffusion tts" make --jobs 5 --output-sync=target test
PATH="$PATH:/root/go/bin" GO_TAGS="tts" make --jobs 5 --output-sync=target test
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19

2
.vscode/launch.json vendored
View File

@@ -26,7 +26,7 @@
"LOCALAI_P2P": "true",
"LOCALAI_FEDERATED": "true"
},
"buildFlags": ["-tags", "stablediffusion p2p tts", "-v"],
"buildFlags": ["-tags", "p2p tts", "-v"],
"envFile": "${workspaceFolder}/.env",
"cwd": "${workspaceRoot}"
}

View File

@@ -15,8 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -69,14 +68,10 @@ ENV PATH=/opt/rocm/bin:${PATH}
# OpenBLAS requirements and stable diffusion
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libopenblas-dev \
libopencv-dev && \
libopenblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Set up OpenCV
RUN ln -s /usr/include/opencv4/opencv2 /usr/include/opencv2
WORKDIR /build
###################################
@@ -115,12 +110,13 @@ FROM requirements-${IMAGE_TYPE} AS requirements-drivers
ARG BUILD_TYPE
ARG CUDA_MAJOR_VERSION=12
ARG CUDA_MINOR_VERSION=0
ARG SKIP_DRIVERS=false
ENV BUILD_TYPE=${BUILD_TYPE}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ]; then
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
@@ -136,7 +132,7 @@ EOT
# CuBLAS requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ]; then
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
@@ -162,7 +158,7 @@ RUN <<EOT bash
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
@@ -170,7 +166,7 @@ RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
@@ -250,7 +246,7 @@ RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shall
FROM requirements-drivers AS builder-base
ARG GO_TAGS="stablediffusion tts p2p"
ARG GO_TAGS="tts p2p"
ARG GRPC_BACKENDS
ARG MAKEFLAGS
ARG LD_FLAGS="-s -w"
@@ -284,35 +280,12 @@ RUN <<EOT bash
fi
EOT
###################################
###################################
# This first portion of builder holds the layers specifically used to build backend-assets/grpc/stablediffusion
# In most cases, builder is the image you should be using - however, this can save build time if one just needs to copy backend-assets/grpc/stablediffusion and nothing else.
FROM builder-base AS builder-sd
# stablediffusion does not tolerate a newer version of abseil, copy only over enough elements to build it
COPY Makefile .
COPY go.mod .
COPY go.sum .
COPY backend/backend.proto ./backend/backend.proto
COPY backend/go/image/stablediffusion ./backend/go/image/stablediffusion
COPY pkg/grpc ./pkg/grpc
COPY pkg/stablediffusion ./pkg/stablediffusion
RUN git init
RUN make sources/go-stable-diffusion
RUN touch prepare-sources
# Actually build the backend
RUN GRPC_BACKENDS=backend-assets/grpc/stablediffusion make backend-assets/grpc/stablediffusion
###################################
###################################
# The builder target compiles LocalAI. This target is not the target that will be uploaded to the registry.
# Adjustments to the build process should likely be made here.
FROM builder-sd AS builder
FROM builder-base AS builder
# Install the pre-built GRPC
COPY --from=grpc /opt/grpc /usr/local
@@ -330,7 +303,7 @@ RUN make prepare
## We only leave the most CPU-optimized variant and the fallback for the cublas/hipblas build
## (both will use CUDA or hipblas for the actual computation)
RUN if [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
SKIP_GRPC_BACKEND="backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2" make build; \
SKIP_GRPC_BACKEND="backend-assets/grpc/llama-cpp-avx512 backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2" make build; \
else \
make build; \
fi
@@ -352,8 +325,6 @@ ARG FFMPEG
COPY --from=grpc /opt/grpc /usr/local
COPY --from=builder-sd /build/backend-assets/grpc/stablediffusion /build/backend-assets/grpc/stablediffusion
COPY .devcontainer-scripts /.devcontainer-scripts
# Add FFmpeg
@@ -426,36 +397,28 @@ COPY --from=builder /build/local-ai ./
# Copy shared libraries for piper
COPY --from=builder /build/sources/go-piper/piper-phonemize/pi/lib/* /usr/lib/
# do not let stablediffusion rebuild (requires an older version of absl)
COPY --from=builder-sd /build/backend-assets/grpc/stablediffusion ./backend-assets/grpc/stablediffusion
# Change the shell to bash so we can use [[ tests below
SHELL ["/bin/bash", "-c"]
# We try to strike a balance between individual layer size (as that affects total push time) and total image size
# Splitting the backends into more groups with fewer items results in a larger image, but a smaller size for the largest layer
# Splitting the backends into fewer groups with more items results in a smaller image, but a larger size for the largest layer
RUN if [[ ( "${IMAGE_TYPE}" == "extras ")]]; then \
apt-get -qq -y install espeak-ng \
; fi
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/coqui \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "parler-tts" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/parler-tts \
if [[ ( "${EXTRA_BACKENDS}" =~ "faster-whisper" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/faster-whisper \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "diffusers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/diffusers \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "transformers-musicgen" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/transformers-musicgen \
; fi
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vall-e-x" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/vall-e-x \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "openvoice" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/openvoice \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "sentencetransformers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/sentencetransformers \
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/kokoro \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "exllama2" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/exllama2 \
@@ -475,9 +438,6 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vllm" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "rerankers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/rerankers \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "mamba" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/mamba \
; fi
# Make sure the models directory exists
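
The `RUN` conditions in this hunk all follow one pattern: a Python backend is built only for `extras` images, and only when `EXTRA_BACKENDS` is empty or mentions the backend's name. A minimal sketch of that predicate is below. The function name `should_build` is hypothetical, and the sketch assumes that bash's `=~` with a quoted right-hand side acts as a literal substring match.

```python
def should_build(backend: str, extra_backends: str, image_type: str) -> bool:
    """Model of:
    [[ ( "$EXTRA_BACKENDS" =~ "<backend>" || -z "$EXTRA_BACKENDS" ) && "$IMAGE_TYPE" == "extras" ]]
    """
    # An empty EXTRA_BACKENDS means "build everything"; otherwise the name
    # must appear somewhere inside the string (substring match, not word match).
    selected = extra_backends == "" or backend in extra_backends
    return selected and image_type == "extras"


if __name__ == "__main__":
    for name in ("coqui", "kokoro", "faster-whisper"):
        print(name, should_build(name, "kokoro faster-whisper", "extras"))
```

Because the match is on substrings, a name contained in another selected name also matches: under this model, `should_build("whisper", "faster-whisper", "extras")` is true.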

@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2023-2024 Ettore Di Giacinto (mudler@localai.io)
Copyright (c) 2023-2025 Ettore Di Giacinto (mudler@localai.io)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

202
Makefile

@@ -6,9 +6,7 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
CPPLLAMA_VERSION?=cc98896db858df7aa40d0e16a505883ef196a482
CPPLLAMA_VERSION?=10f2e81809bbb69ecfe64fc8b4686285f84b0c07
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
@@ -18,21 +16,13 @@ WHISPER_CPP_VERSION?=6266a9f9e56a5b925e9892acf650f3eb1245814d
PIPER_REPO?=https://github.com/mudler/go-piper
PIPER_VERSION?=e10ca041a885d4a8f3871d52924b47792d5e5aa0
# stablediffusion version
STABLEDIFFUSION_REPO?=https://github.com/mudler/go-stable-diffusion
STABLEDIFFUSION_VERSION?=4a3cd6aeae6f66ee57eae9a0075f8c58c3a6a38f
# tinydream version
TINYDREAM_REPO?=https://github.com/M0Rf30/go-tiny-dream
TINYDREAM_VERSION?=c04fa463ace9d9a6464313aa5f9cd0f953b6c057
# bark.cpp
BARKCPP_REPO?=https://github.com/PABannier/bark.cpp.git
BARKCPP_VERSION?=v1.0.0
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=4570715727f35e5a07a76796d823824c8f42206c
STABLEDIFFUSION_GGML_VERSION?=19d876ee300a055629926ff836489901f734f2b7
ONNX_VERSION?=1.20.0
ONNX_ARCH?=x64
@@ -159,7 +149,6 @@ ifeq ($(BUILD_TYPE),hipblas)
LD_LIBRARY_PATH ?= /opt/rocm/lib:/opt/rocm/llvm/lib
export CXX=$(ROCM_HOME)/llvm/bin/clang++
export CC=$(ROCM_HOME)/llvm/bin/clang
# llama-ggml has no hipblas support, so override it here.
export STABLE_BUILD_TYPE=
export GGML_HIP=1
GPU_TARGETS ?= gfx900,gfx906,gfx908,gfx940,gfx941,gfx942,gfx90a,gfx1030,gfx1031,gfx1100,gfx1101
@@ -183,16 +172,6 @@ ifeq ($(STATIC),true)
LD_FLAGS+=-linkmode external -extldflags -static
endif
ifeq ($(findstring stablediffusion,$(GO_TAGS)),stablediffusion)
# OPTIONAL_TARGETS+=go-stable-diffusion/libstablediffusion.a
OPTIONAL_GRPC+=backend-assets/grpc/stablediffusion
endif
ifeq ($(findstring tinydream,$(GO_TAGS)),tinydream)
# OPTIONAL_TARGETS+=go-tiny-dream/libtinydream.a
OPTIONAL_GRPC+=backend-assets/grpc/tinydream
endif
ifeq ($(findstring tts,$(GO_TAGS)),tts)
# OPTIONAL_TARGETS+=go-piper/libpiper_binding.a
# OPTIONAL_TARGETS+=backend-assets/espeak-ng-data
@@ -204,8 +183,8 @@ endif
ALL_GRPC_BACKENDS=backend-assets/grpc/huggingface
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx2
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx512
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-fallback
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-ggml
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-grpc
ALL_GRPC_BACKENDS+=backend-assets/util/llama-cpp-rpc-server
ALL_GRPC_BACKENDS+=backend-assets/grpc/whisper
@@ -239,19 +218,6 @@ endif
all: help
## go-llama.cpp
sources/go-llama.cpp:
mkdir -p sources/go-llama.cpp
cd sources/go-llama.cpp && \
git init && \
git remote add origin $(GOLLAMA_REPO) && \
git fetch origin && \
git checkout $(GOLLAMA_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
sources/go-llama.cpp/libbinding.a: sources/go-llama.cpp
$(MAKE) -C sources/go-llama.cpp BUILD_TYPE=$(STABLE_BUILD_TYPE) libbinding.a
## bark.cpp
sources/bark.cpp:
git clone --recursive $(BARKCPP_REPO) sources/bark.cpp && \
@@ -282,19 +248,6 @@ sources/go-piper:
sources/go-piper/libpiper_binding.a: sources/go-piper
$(MAKE) -C sources/go-piper libpiper_binding.a example/main piper.o
## stable diffusion (onnx)
sources/go-stable-diffusion:
mkdir -p sources/go-stable-diffusion
cd sources/go-stable-diffusion && \
git init && \
git remote add origin $(STABLEDIFFUSION_REPO) && \
git fetch origin && \
git checkout $(STABLEDIFFUSION_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
sources/go-stable-diffusion/libstablediffusion.a: sources/go-stable-diffusion
CPATH="$(CPATH):/usr/include/opencv4" $(MAKE) -C sources/go-stable-diffusion libstablediffusion.a
## stablediffusion (ggml)
sources/stablediffusion-ggml.cpp:
git clone --recursive $(STABLEDIFFUSION_GGML_REPO) sources/stablediffusion-ggml.cpp && \
@@ -302,14 +255,8 @@ sources/stablediffusion-ggml.cpp:
git checkout $(STABLEDIFFUSION_GGML_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
sources/stablediffusion-ggml.cpp/build/libstable-diffusion.a: sources/stablediffusion-ggml.cpp
cd sources/stablediffusion-ggml.cpp && \
mkdir -p build && \
cd build && \
cmake $(CMAKE_ARGS) .. && \
cmake --build . --config Release
backend/go/image/stablediffusion-ggml/libsd.a: sources/stablediffusion-ggml.cpp/build/libstable-diffusion.a
backend/go/image/stablediffusion-ggml/libsd.a: sources/stablediffusion-ggml.cpp
$(MAKE) -C backend/go/image/stablediffusion-ggml build/libstable-diffusion.a
$(MAKE) -C backend/go/image/stablediffusion-ggml libsd.a
backend-assets/grpc/stablediffusion-ggml: backend/go/image/stablediffusion-ggml/libsd.a backend-assets/grpc
@@ -333,19 +280,6 @@ else
mv backend-assets/lib/libonnxruntime.so.$(ONNX_VERSION) backend-assets/lib/libonnxruntime.so.1
endif
## tiny-dream
sources/go-tiny-dream:
mkdir -p sources/go-tiny-dream
cd sources/go-tiny-dream && \
git init && \
git remote add origin $(TINYDREAM_REPO) && \
git fetch origin && \
git checkout $(TINYDREAM_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
sources/go-tiny-dream/libtinydream.a: sources/go-tiny-dream
$(MAKE) -C sources/go-tiny-dream libtinydream.a
## whisper
sources/whisper.cpp:
mkdir -p sources/whisper.cpp
@@ -359,23 +293,17 @@ sources/whisper.cpp:
sources/whisper.cpp/libwhisper.a: sources/whisper.cpp
cd sources/whisper.cpp && $(MAKE) libwhisper.a libggml.a
get-sources: sources/go-llama.cpp sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp sources/go-stable-diffusion sources/go-tiny-dream backend/cpp/llama/llama.cpp
get-sources: sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp backend/cpp/llama/llama.cpp
replace:
$(GOCMD) mod edit -replace github.com/ggerganov/whisper.cpp=$(CURDIR)/sources/whisper.cpp
$(GOCMD) mod edit -replace github.com/ggerganov/whisper.cpp/bindings/go=$(CURDIR)/sources/whisper.cpp/bindings/go
$(GOCMD) mod edit -replace github.com/M0Rf30/go-tiny-dream=$(CURDIR)/sources/go-tiny-dream
$(GOCMD) mod edit -replace github.com/mudler/go-piper=$(CURDIR)/sources/go-piper
$(GOCMD) mod edit -replace github.com/mudler/go-stable-diffusion=$(CURDIR)/sources/go-stable-diffusion
$(GOCMD) mod edit -replace github.com/go-skynet/go-llama.cpp=$(CURDIR)/sources/go-llama.cpp
dropreplace:
$(GOCMD) mod edit -dropreplace github.com/ggerganov/whisper.cpp
$(GOCMD) mod edit -dropreplace github.com/ggerganov/whisper.cpp/bindings/go
$(GOCMD) mod edit -dropreplace github.com/M0Rf30/go-tiny-dream
$(GOCMD) mod edit -dropreplace github.com/mudler/go-piper
$(GOCMD) mod edit -dropreplace github.com/mudler/go-stable-diffusion
$(GOCMD) mod edit -dropreplace github.com/go-skynet/go-llama.cpp
prepare-sources: get-sources replace
$(GOCMD) mod download
@@ -383,11 +311,8 @@ prepare-sources: get-sources replace
## GENERIC
rebuild: ## Rebuilds the project
$(GOCMD) clean -cache
$(MAKE) -C sources/go-llama.cpp clean
$(MAKE) -C sources/whisper.cpp clean
$(MAKE) -C sources/go-stable-diffusion clean
$(MAKE) -C sources/go-piper clean
$(MAKE) -C sources/go-tiny-dream clean
$(MAKE) build
prepare: prepare-sources $(OPTIONAL_TARGETS)
@@ -489,7 +414,7 @@ run: prepare ## run local-ai
test-models/testmodel.ggml:
mkdir test-models
mkdir test-dir
wget -q https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin -O test-models/testmodel.ggml
wget -q https://huggingface.co/RichardErkhov/Qwen_-_Qwen2-1.5B-Instruct-gguf/resolve/main/Qwen2-1.5B-Instruct.Q2_K.gguf -O test-models/testmodel.ggml
wget -q https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin -O test-models/whisper-en
wget -q https://huggingface.co/mudler/all-MiniLM-L6-v2/resolve/main/ggml-model-q4_0.bin -O test-models/bert
wget -q https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav -O test-dir/audio.wav
@@ -501,11 +426,10 @@ prepare-test: grpcs
test: prepare test-models/testmodel.ggml grpcs
@echo 'Running tests'
export GO_TAGS="tts stablediffusion debug"
export GO_TAGS="tts debug"
$(MAKE) prepare-test
HUGGINGFACE_GRPC=$(abspath ./)/backend/python/sentencetransformers/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!llama && !llama-gguf" --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
$(MAKE) test-llama
HUGGINGFACE_GRPC=$(abspath ./)/backend/python/transformers/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!llama-gguf" --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
$(MAKE) test-llama-gguf
$(MAKE) test-tts
$(MAKE) test-stablediffusion
@@ -534,10 +458,6 @@ teardown-e2e:
rm -rf $(TEST_DIR) || true
docker stop $$(docker ps -q --filter ancestor=localai-tests)
test-llama: prepare-test
TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="llama" --flake-attempts $(TEST_FLAKES) -v -r $(TEST_PATHS)
test-llama-gguf: prepare-test
TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="llama-gguf" --flake-attempts $(TEST_FLAKES) -v -r $(TEST_PATHS)
@@ -589,10 +509,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen vall-e-x-protogen vllm-protogen openvoice-protogen
protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen faster-whisper-protogen
.PHONY: protogen-python-clean
protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean vall-e-x-protogen-clean vllm-protogen-clean openvoice-protogen-clean
protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean faster-whisper-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -626,6 +546,14 @@ diffusers-protogen:
diffusers-protogen-clean:
$(MAKE) -C backend/python/diffusers protogen-clean
.PHONY: faster-whisper-protogen
faster-whisper-protogen:
$(MAKE) -C backend/python/faster-whisper protogen
.PHONY: faster-whisper-protogen-clean
faster-whisper-protogen-clean:
$(MAKE) -C backend/python/faster-whisper protogen-clean
.PHONY: exllama2-protogen
exllama2-protogen:
$(MAKE) -C backend/python/exllama2 protogen
@@ -634,14 +562,6 @@ exllama2-protogen:
exllama2-protogen-clean:
$(MAKE) -C backend/python/exllama2 protogen-clean
.PHONY: mamba-protogen
mamba-protogen:
$(MAKE) -C backend/python/mamba protogen
.PHONY: mamba-protogen-clean
mamba-protogen-clean:
$(MAKE) -C backend/python/mamba protogen-clean
.PHONY: rerankers-protogen
rerankers-protogen:
$(MAKE) -C backend/python/rerankers protogen
@@ -650,14 +570,6 @@ rerankers-protogen:
rerankers-protogen-clean:
$(MAKE) -C backend/python/rerankers protogen-clean
.PHONY: sentencetransformers-protogen
sentencetransformers-protogen:
$(MAKE) -C backend/python/sentencetransformers protogen
.PHONY: sentencetransformers-protogen-clean
sentencetransformers-protogen-clean:
$(MAKE) -C backend/python/sentencetransformers protogen-clean
.PHONY: transformers-protogen
transformers-protogen:
$(MAKE) -C backend/python/transformers protogen
@@ -666,37 +578,13 @@ transformers-protogen:
transformers-protogen-clean:
$(MAKE) -C backend/python/transformers protogen-clean
.PHONY: parler-tts-protogen
parler-tts-protogen:
$(MAKE) -C backend/python/parler-tts protogen
.PHONY: kokoro-protogen
kokoro-protogen:
$(MAKE) -C backend/python/kokoro protogen
.PHONY: parler-tts-protogen-clean
parler-tts-protogen-clean:
$(MAKE) -C backend/python/parler-tts protogen-clean
.PHONY: transformers-musicgen-protogen
transformers-musicgen-protogen:
$(MAKE) -C backend/python/transformers-musicgen protogen
.PHONY: transformers-musicgen-protogen-clean
transformers-musicgen-protogen-clean:
$(MAKE) -C backend/python/transformers-musicgen protogen-clean
.PHONY: vall-e-x-protogen
vall-e-x-protogen:
$(MAKE) -C backend/python/vall-e-x protogen
.PHONY: vall-e-x-protogen-clean
vall-e-x-protogen-clean:
$(MAKE) -C backend/python/vall-e-x protogen-clean
.PHONY: openvoice-protogen
openvoice-protogen:
$(MAKE) -C backend/python/openvoice protogen
.PHONY: openvoice-protogen-clean
openvoice-protogen-clean:
$(MAKE) -C backend/python/openvoice protogen-clean
.PHONY: kokoro-protogen-clean
kokoro-protogen-clean:
$(MAKE) -C backend/python/kokoro protogen-clean
.PHONY: vllm-protogen
vllm-protogen:
@@ -713,15 +601,11 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/bark
$(MAKE) -C backend/python/coqui
$(MAKE) -C backend/python/diffusers
$(MAKE) -C backend/python/faster-whisper
$(MAKE) -C backend/python/vllm
$(MAKE) -C backend/python/mamba
$(MAKE) -C backend/python/sentencetransformers
$(MAKE) -C backend/python/rerankers
$(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/transformers-musicgen
$(MAKE) -C backend/python/parler-tts
$(MAKE) -C backend/python/vall-e-x
$(MAKE) -C backend/python/openvoice
$(MAKE) -C backend/python/kokoro
$(MAKE) -C backend/python/exllama2
prepare-test-extra: protogen-python
@@ -791,6 +675,13 @@ backend-assets/grpc/llama-cpp-avx2: backend-assets/grpc backend/cpp/llama/llama.
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) VARIANT="llama-avx2" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-avx2/grpc-server backend-assets/grpc/llama-cpp-avx2
backend-assets/grpc/llama-cpp-avx512: backend-assets/grpc backend/cpp/llama/llama.cpp
cp -rf backend/cpp/llama backend/cpp/llama-avx512
$(MAKE) -C backend/cpp/llama-avx512 purge
$(info ${GREEN}I llama-cpp build info:avx512${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) VARIANT="llama-avx512" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-avx512/grpc-server backend-assets/grpc/llama-cpp-avx512
backend-assets/grpc/llama-cpp-avx: backend-assets/grpc backend/cpp/llama/llama.cpp
cp -rf backend/cpp/llama backend/cpp/llama-avx
$(MAKE) -C backend/cpp/llama-avx purge
@@ -844,13 +735,6 @@ backend-assets/util/llama-cpp-rpc-server: backend-assets/grpc/llama-cpp-grpc
mkdir -p backend-assets/util/
cp -rf backend/cpp/llama-grpc/llama.cpp/build/bin/rpc-server backend-assets/util/llama-cpp-rpc-server
backend-assets/grpc/llama-ggml: sources/go-llama.cpp sources/go-llama.cpp/libbinding.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/sources/go-llama.cpp LIBRARY_PATH=$(CURDIR)/sources/go-llama.cpp \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/llama-ggml ./backend/go/llm/llama-ggml/
ifneq ($(UPX),)
$(UPX) backend-assets/grpc/llama-ggml
endif
backend-assets/grpc/bark-cpp: backend/go/bark/libbark.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/backend/go/bark/ LIBRARY_PATH=$(CURDIR)/backend/go/bark/ \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/bark-cpp ./backend/go/bark/
@@ -865,13 +749,6 @@ ifneq ($(UPX),)
$(UPX) backend-assets/grpc/piper
endif
backend-assets/grpc/stablediffusion: sources/go-stable-diffusion sources/go-stable-diffusion/libstablediffusion.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" CPATH="$(CPATH):$(CURDIR)/sources/go-stable-diffusion/:/usr/include/opencv4" LIBRARY_PATH=$(CURDIR)/sources/go-stable-diffusion/ \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/stablediffusion ./backend/go/image/stablediffusion
ifneq ($(UPX),)
$(UPX) backend-assets/grpc/stablediffusion
endif
backend-assets/grpc/silero-vad: backend-assets/grpc backend-assets/lib/libonnxruntime.so.1
CGO_LDFLAGS="$(CGO_LDFLAGS)" CPATH="$(CPATH):$(CURDIR)/sources/onnxruntime/include/" LIBRARY_PATH=$(CURDIR)/backend-assets/lib \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/silero-vad ./backend/go/vad/silero
@@ -879,13 +756,6 @@ ifneq ($(UPX),)
$(UPX) backend-assets/grpc/silero-vad
endif
backend-assets/grpc/tinydream: sources/go-tiny-dream sources/go-tiny-dream/libtinydream.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" LIBRARY_PATH=$(CURDIR)/go-tiny-dream \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/tinydream ./backend/go/image/tinydream
ifneq ($(UPX),)
$(UPX) backend-assets/grpc/tinydream
endif
backend-assets/grpc/whisper: sources/whisper.cpp sources/whisper.cpp/libwhisper.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS) $(CGO_LDFLAGS_WHISPER)" C_INCLUDE_PATH="$(CURDIR)/sources/whisper.cpp/include:$(CURDIR)/sources/whisper.cpp/ggml/include" LIBRARY_PATH=$(CURDIR)/sources/whisper.cpp \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/whisper ./backend/go/transcribe/whisper
@@ -959,7 +829,7 @@ swagger:
.PHONY: gen-assets
gen-assets:
$(GOCMD) run core/dependencies_manager/manager.go embedded/webui_static.yaml core/http/static/assets
$(GOCMD) run core/dependencies_manager/manager.go webui_static.yaml core/http/static/assets
## Documentation
docs/layouts/_default:

@@ -39,7 +39,7 @@
</p>
<p align="center">
<a href="https://trendshift.io/repositories/1484" target="_blank"><img src="https://trendshift.io/api/badge/repositories/1484" alt="go-skynet%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
<a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>
> :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
@@ -92,19 +92,15 @@ local-ai run oci://localai/phi-2:latest
## 📰 Latest project news
- Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
- Nov 2024: Voice activity detection models (**VAD**) added to the API: https://github.com/mudler/LocalAI/pull/4204
- Oct 2024: examples moved to [LocalAI-examples](https://github.com/mudler/LocalAI-examples)
- Aug 2024: 🆕 FLUX-1, [P2P Explorer](https://explorer.localai.io)
- July 2024: 🔥🔥 🆕 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723
- June 2024: 🆕 You can browse now the model gallery without LocalAI! Check out https://models.localai.io
- June 2024: Support for models from OCI registries: https://github.com/mudler/LocalAI/pull/2628
- July 2024: 🔥🔥 🆕 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723. P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
- May 2024: 🔥🔥 Decentralized P2P llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) 👉 Docs https://localai.io/features/distribute/
- May 2024: 🔥🔥 Openvoice: https://github.com/mudler/LocalAI/pull/2334
- May 2024: 🆕 Function calls without grammars and mixed mode: https://github.com/mudler/LocalAI/pull/2328
- May 2024: 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
- May 2024: Chat, TTS, and Image generation in the WebUI: https://github.com/mudler/LocalAI/pull/2222
- April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121
Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
@@ -113,12 +109,10 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A
- Multimodal with vLLM and Video understanding: https://github.com/mudler/LocalAI/pull/3729
- Realtime API https://github.com/mudler/LocalAI/issues/3714
- 🔥🔥 Distributed, P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
- WebUI improvements: https://github.com/mudler/LocalAI/issues/2156
- Backends v2: https://github.com/mudler/LocalAI/issues/1126
- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
- Assistant API: https://github.com/mudler/LocalAI/issues/1273
- Moderation endpoint: https://github.com/mudler/LocalAI/issues/999
- Vulkan: https://github.com/mudler/LocalAI/issues/1647
- Anthropic API: https://github.com/mudler/LocalAI/issues/1808
@@ -126,10 +120,10 @@ If you want to help and contribute, issues up for grabs: https://github.com/mudl
## 🚀 [Features](https://localai.io/features/)
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `gpt4all.cpp`, ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🎨 [Image generation with stable diffusion](https://localai.io/features/image-generation)
- 🎨 [Image generation](https://localai.io/features/image-generation)
- 🔥 [OpenAI-alike tools API](https://localai.io/features/openai-functions/)
- 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
@@ -137,6 +131,7 @@ If you want to help and contribute, issues up for grabs: https://github.com/mudl
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 📈 [Reranker API](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
- 🔊 Voice activity detection (Silero-VAD support)
- 🌍 Integrated WebUI!
## 💻 Usage
@@ -159,6 +154,7 @@ Model galleries
Other:
- Helm chart https://github.com/go-skynet/helm-charts
- VSCode extension https://github.com/badgooooor/localai-vscode-plugin
- Langchain: https://python.langchain.com/docs/integrations/providers/localai/
- Terminal utility https://github.com/djcopley/ShellOracle
- Local Smart assistant https://github.com/mudler/LocalAGI
- Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-gpt4vision
@@ -216,7 +212,7 @@ A huge thank you to our generous sponsors who support this project covering CI e
<p align="center">
<a href="https://www.spectrocloud.com/" target="blank">
<img height="200" src="https://github.com/go-skynet/LocalAI/assets/2420543/68a6f3cb-8a65-4a4d-99b5-6417a8905512">
<img height="200" src="https://github.com/user-attachments/assets/72eab1dd-8b93-4fc0-9ade-84db49f24962">
</a>
<a href="https://www.premai.io/" target="blank">
<img height="200" src="https://github.com/mudler/LocalAI/assets/2420543/42e4ca83-661e-4f79-8e46-ae43689683d6"> <br>

@@ -1,56 +1,17 @@
name: stablediffusion
backend: stablediffusion
backend: stablediffusion-ggml
cfg_scale: 4.5
options:
- sampler:euler
parameters:
model: stablediffusion_assets
license: "BSD-3"
urls:
- https://github.com/EdVince/Stable-Diffusion-NCNN
- https://github.com/EdVince/Stable-Diffusion-NCNN/blob/main/LICENSE
description: |
Stable Diffusion in NCNN with c++, supported txt2img and img2img
model: stable-diffusion-v1-5-pruned-emaonly-Q4_0.gguf
step: 25
download_files:
- filename: "stablediffusion_assets/AutoencoderKL-256-256-fp16-opt.param"
sha256: "18ca4b66685e21406bcf64c484b3b680b4949900415536d599cc876579c85c82"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-256-256-fp16-opt.param"
- filename: "stablediffusion_assets/AutoencoderKL-512-512-fp16-opt.param"
sha256: "cf45f63aacf3dbbab0f59ed92a6f2c14d9a1801314631cd3abe91e3c85639a20"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-512-512-fp16-opt.param"
- filename: "stablediffusion_assets/AutoencoderKL-base-fp16.param"
sha256: "0254a056dce61b0c27dc9ec1b78b53bcf55315c540f55f051eb841aa992701ba"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-base-fp16.param"
- filename: "stablediffusion_assets/AutoencoderKL-encoder-512-512-fp16.bin"
sha256: "ddcb79a9951b9f91e05e087739ed69da2c1c4ae30ba4168cce350b49d617c9fa"
uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/AutoencoderKL-encoder-512-512-fp16.bin"
- filename: "stablediffusion_assets/AutoencoderKL-fp16.bin"
sha256: "f02e71f80e70252734724bbfaed5c4ddd3a8ed7e61bb2175ff5f53099f0e35dd"
uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/AutoencoderKL-fp16.bin"
- filename: "stablediffusion_assets/FrozenCLIPEmbedder-fp16.bin"
sha256: "1c9a12f4e1dd1b295a388045f7f28a2352a4d70c3dc96a542189a3dd7051fdd6"
uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/FrozenCLIPEmbedder-fp16.bin"
- filename: "stablediffusion_assets/FrozenCLIPEmbedder-fp16.param"
sha256: "471afbe678dd1fd3fe764ef9c6eccaccb0a7d7e601f27b462aa926b20eb368c9"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/FrozenCLIPEmbedder-fp16.param"
- filename: "stablediffusion_assets/log_sigmas.bin"
sha256: "a2089f8aa4c61f9c200feaec541ab3f5c94233b28deb6d5e8bcd974fa79b68ac"
uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/raw/main/x86/linux/assets/log_sigmas.bin"
- filename: "stablediffusion_assets/UNetModel-256-256-MHA-fp16-opt.param"
sha256: "a58c380229f09491776df837b7aa7adffc0a87821dc4708b34535da2e36e3da1"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-256-256-MHA-fp16-opt.param"
- filename: "stablediffusion_assets/UNetModel-512-512-MHA-fp16-opt.param"
sha256: "f12034067062827bd7f43d1d21888d1f03905401acf6c6eea22be23c259636fa"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-512-512-MHA-fp16-opt.param"
- filename: "stablediffusion_assets/UNetModel-base-MHA-fp16.param"
sha256: "696f6975de49f4325b53ce32aff81861a6d6c07cd9ce3f0aae2cc405350af38d"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-base-MHA-fp16.param"
- filename: "stablediffusion_assets/UNetModel-MHA-fp16.bin"
sha256: "d618918d011bfc1f644c0f2a33bf84931bd53b28a98492b0a8ed6f3a818852c3"
uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/UNetModel-MHA-fp16.bin"
- filename: "stablediffusion_assets/vocab.txt"
sha256: "e30e57b6f1e47616982ef898d8922be24e535b4fa3d0110477b3a6f02ebbae7d"
uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/vocab.txt"
- filename: "stable-diffusion-v1-5-pruned-emaonly-Q4_0.gguf"
sha256: "b8944e9fe0b69b36ae1b5bb0185b3a7b8ef14347fe0fa9af6c64c4829022261f"
uri: "huggingface://second-state/stable-diffusion-v1-5-GGUF/stable-diffusion-v1-5-pruned-emaonly-Q4_0.gguf"
usage: |
curl http://localhost:8080/v1/images/generations \

8
aio/cpu/vad.yaml Normal file

@@ -0,0 +1,8 @@
backend: silero-vad
name: silero-vad
parameters:
model: silero-vad.onnx
download_files:
- filename: silero-vad.onnx
uri: https://huggingface.co/onnx-community/silero-vad/resolve/main/onnx/model.onnx
sha256: a4a068cd6cf1ea8355b84327595838ca748ec29a25bc91fc82e6c299ccdc5808
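
Each `download_files` entry pairs a `uri` with a `sha256` digest. As a rough illustration of how a client could check a downloaded file against that digest, here is a small sketch. It is not LocalAI's own implementation, and the helper names `sha256_of_file` and `matches_checksum` are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with the `sha256` value from the YAML entry."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Self-contained demo on a temporary file instead of a real model download.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"demo payload")
    try:
        expected = hashlib.sha256(b"demo payload").hexdigest()
        print(matches_checksum(tmp.name, expected))
    finally:
        os.remove(tmp.name)
```

Reading in chunks keeps memory use flat even for multi-gigabyte model files.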

@@ -129,7 +129,7 @@ detect_gpu
detect_gpu_size
PROFILE="${PROFILE:-$GPU_SIZE}" # default to cpu
export MODELS="${MODELS:-/aio/${PROFILE}/embeddings.yaml,/aio/${PROFILE}/rerank.yaml,/aio/${PROFILE}/text-to-speech.yaml,/aio/${PROFILE}/image-gen.yaml,/aio/${PROFILE}/text-to-text.yaml,/aio/${PROFILE}/speech-to-text.yaml,/aio/${PROFILE}/vision.yaml}"
export MODELS="${MODELS:-/aio/${PROFILE}/embeddings.yaml,/aio/${PROFILE}/rerank.yaml,/aio/${PROFILE}/text-to-speech.yaml,/aio/${PROFILE}/image-gen.yaml,/aio/${PROFILE}/text-to-text.yaml,/aio/${PROFILE}/speech-to-text.yaml,/aio/${PROFILE}/vad.yaml,/aio/${PROFILE}/vision.yaml}"
check_vars
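
The `${MODELS:-…}` expansion keeps a user-supplied `MODELS` value and otherwise builds a comma-separated list of per-profile YAML files, which now includes `vad.yaml`. A sketch of the same fallback logic follows. The function name `default_models` is hypothetical, and the file order is copied from the updated line.

```python
from typing import Optional

AIO_CONFIGS = [
    "embeddings", "rerank", "text-to-speech", "image-gen",
    "text-to-text", "speech-to-text", "vad", "vision",
]


def default_models(profile: str, override: Optional[str] = None) -> str:
    """Mimic `${MODELS:-...}`: use the override unless it is unset or empty."""
    if override:  # `:-` falls back for both unset and empty values
        return override
    return ",".join(f"/aio/{profile}/{name}.yaml" for name in AIO_CONFIGS)


if __name__ == "__main__":
    print(default_models("cpu"))
    print(default_models("cpu", "/models/custom.yaml"))
```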

8
aio/gpu-8g/vad.yaml Normal file

@@ -0,0 +1,8 @@
backend: silero-vad
name: silero-vad
parameters:
model: silero-vad.onnx
download_files:
- filename: silero-vad.onnx
uri: https://huggingface.co/onnx-community/silero-vad/resolve/main/onnx/model.onnx
sha256: a4a068cd6cf1ea8355b84327595838ca748ec29a25bc91fc82e6c299ccdc5808

8
aio/intel/vad.yaml Normal file

@@ -0,0 +1,8 @@
backend: silero-vad
name: silero-vad
parameters:
model: silero-vad.onnx
download_files:
- filename: silero-vad.onnx
uri: https://huggingface.co/onnx-community/silero-vad/resolve/main/onnx/model.onnx
sha256: a4a068cd6cf1ea8355b84327595838ca748ec29a25bc91fc82e6c299ccdc5808

@@ -159,6 +159,12 @@ message Reply {
bytes message = 1;
int32 tokens = 2;
int32 prompt_tokens = 3;
double timing_prompt_processing = 4;
double timing_token_generation = 5;
}
message GrammarTrigger {
string word = 1;
}
message ModelOptions {
@@ -222,6 +228,11 @@ message ModelOptions {
int32 MaxModelLen = 54;
int32 TensorParallelSize = 55;
string LoadFormat = 58;
bool DisableLogStatus = 66;
string DType = 67;
int32 LimitImagePerPrompt = 68;
int32 LimitVideoPerPrompt = 69;
int32 LimitAudioPerPrompt = 70;
string MMProj = 41;
@@ -242,6 +253,11 @@ message ModelOptions {
repeated float LoraScales = 61;
repeated string Options = 62;
string CacheTypeKey = 63;
string CacheTypeValue = 64;
repeated GrammarTrigger GrammarTriggers = 65;
}
message Result {
@@ -345,4 +361,4 @@ message StatusResponse {
message Message {
string role = 1;
string content = 2;
}
}
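
The new `timing_prompt_processing` and `timing_token_generation` fields on `Reply` make it possible to derive throughput on the client side. The sketch below shows one way to do that. It assumes both timings are reported in milliseconds, which the diff does not state. The `Reply` dataclass is a stand-in for the generated protobuf class, and `tokens_per_second` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    """Stand-in for the generated message; field names follow the .proto."""
    tokens: int
    prompt_tokens: int
    timing_prompt_processing: float  # assumed to be milliseconds
    timing_token_generation: float   # assumed to be milliseconds


def tokens_per_second(count: int, elapsed_ms: float) -> Optional[float]:
    """Return tokens/s, or None when no usable timing was reported."""
    if elapsed_ms <= 0:
        return None
    return count * 1000.0 / elapsed_ms


if __name__ == "__main__":
    reply = Reply(tokens=50, prompt_tokens=120,
                  timing_prompt_processing=300.0,
                  timing_token_generation=2000.0)
    print(tokens_per_second(reply.prompt_tokens, reply.timing_prompt_processing))
    print(tokens_per_second(reply.tokens, reply.timing_token_generation))
```

Returning `None` for a zero timing avoids a division error when a backend leaves the fields at their protobuf default of `0`.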

@@ -134,6 +134,32 @@ static std::string tokens_to_output_formatted_string(const llama_context *ctx, c
return out;
}
// Adds an RPC server
// https://github.com/ggerganov/llama.cpp/compare/4dbc8b9cb71876e005724f4e8f73a3544646bcf5..3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6
static void add_rpc_devices(std::string servers) {
auto rpc_servers = string_split<std::string>(servers, ',');
if (rpc_servers.empty()) {
throw std::invalid_argument("no RPC servers specified");
}
ggml_backend_reg_t rpc_reg = ggml_backend_reg_by_name("RPC");
if (!rpc_reg) {
throw std::invalid_argument("failed to find RPC backend");
}
typedef ggml_backend_dev_t (*ggml_backend_rpc_add_device_t)(const char * endpoint);
ggml_backend_rpc_add_device_t ggml_backend_rpc_add_device_fn = (ggml_backend_rpc_add_device_t) ggml_backend_reg_get_proc_address(rpc_reg, "ggml_backend_rpc_add_device");
if (!ggml_backend_rpc_add_device_fn) {
throw std::invalid_argument("failed to find RPC device add function");
}
for (const auto & server : rpc_servers) {
ggml_backend_dev_t dev = ggml_backend_rpc_add_device_fn(server.c_str());
if (dev) {
ggml_backend_device_register(dev);
} else {
throw std::invalid_argument("failed to register RPC device");
}
}
}
// convert a vector of completion_token_output to json
static json probs_vector_to_json(const llama_context *ctx, const std::vector<completion_token_output> &probs)
{
@@ -428,6 +454,7 @@ struct llama_server_context
{
llama_model *model = nullptr;
llama_context *ctx = nullptr;
const llama_vocab * vocab = nullptr;
clip_ctx *clp_ctx = nullptr;
@@ -439,6 +466,10 @@ struct llama_server_context
bool clean_kv_cache = true;
bool all_slots_are_idle = false;
bool add_bos_token = true;
bool has_eos_token = true;
bool grammar_lazy = false;
std::vector<common_grammar_trigger> grammar_triggers;
int32_t n_ctx; // total context for all clients / slots
@@ -492,8 +523,8 @@ struct llama_server_context
}
common_init_result common_init = common_init_from_params(params);
model = common_init.model;
ctx = common_init.context;
model = common_init.model.release();
ctx = common_init.context.release();
if (model == nullptr)
{
LOG_ERR("unable to load model: %s", params.model.c_str());
@@ -502,7 +533,7 @@ struct llama_server_context
if (multimodal) {
const int n_embd_clip = clip_n_mmproj_embd(clp_ctx);
const int n_embd_llm = llama_n_embd(model);
const int n_embd_llm = llama_model_n_embd(model);
if (n_embd_clip != n_embd_llm) {
LOG("%s: embedding dim of the multimodal projector (%d) is not equal to that of LLaMA (%d). Make sure that you use the correct mmproj file.\n", __func__, n_embd_clip, n_embd_llm);
llama_free(ctx);
@@ -511,23 +542,15 @@ struct llama_server_context
}
}
vocab = llama_model_get_vocab(model);
n_ctx = llama_n_ctx(ctx);
add_bos_token = llama_add_bos_token(model);
add_bos_token = llama_vocab_get_add_bos(vocab);
has_eos_token = llama_vocab_eos(vocab) != LLAMA_TOKEN_NULL;
return true;
}
void validate_model_chat_template(server_params & sparams) {
llama_chat_message chat[] = {{"user", "test"}};
std::vector<char> buf(1);
int res = llama_chat_apply_template(model, nullptr, chat, 1, true, buf.data(), buf.size());
if (res < 0) {
LOG_ERR("The chat template comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses", __func__);
sparams.chat_template = "<|im_start|>"; // llama_chat_apply_template only checks if <|im_start|> exist in the template
}
}
llama_client_slot* get_active_slot() {
for (llama_client_slot& slot : slots) {
// Check if the slot is currently processing
@@ -681,12 +704,13 @@ struct llama_server_context
slot->sparams.mirostat = json_value(data, "mirostat", default_sparams.mirostat);
slot->sparams.mirostat_tau = json_value(data, "mirostat_tau", default_sparams.mirostat_tau);
slot->sparams.mirostat_eta = json_value(data, "mirostat_eta", default_sparams.mirostat_eta);
slot->sparams.penalize_nl = json_value(data, "penalize_nl", default_sparams.penalize_nl);
slot->params.n_keep = json_value(data, "n_keep", slot->params.n_keep);
slot->sparams.seed = json_value(data, "seed", default_sparams.seed);
slot->sparams.grammar = json_value(data, "grammar", default_sparams.grammar);
slot->sparams.n_probs = json_value(data, "n_probs", default_sparams.n_probs);
slot->sparams.min_keep = json_value(data, "min_keep", default_sparams.min_keep);
slot->sparams.grammar_triggers = grammar_triggers;
slot->sparams.grammar_lazy = grammar_lazy;
if (slot->n_predict > 0 && slot->params.n_predict > slot->n_predict) {
// Might be better to reject the request with a 400 ?
@@ -726,8 +750,8 @@ struct llama_server_context
slot->prompt = "";
}
if (json_value(data, "ignore_eos", false)) {
slot->sparams.logit_bias.push_back({llama_token_eos(model), -INFINITY});
if (json_value(data, "ignore_eos", false) && has_eos_token) {
slot->sparams.logit_bias.push_back({llama_vocab_eos(vocab), -INFINITY});
}
/*
slot->sparams.penalty_prompt_tokens.clear();
@@ -766,13 +790,13 @@ struct llama_server_context
}
}
*/
slot->sparams.logit_bias.clear();
const auto &logit_bias = data.find("logit_bias");
if (logit_bias != data.end() && logit_bias->is_array())
{
const int n_vocab = llama_n_vocab(model);
const llama_vocab * vocab = llama_model_get_vocab(model);
const int n_vocab = llama_vocab_n_tokens(vocab);
for (const auto &el : *logit_bias)
{
if (el.is_array() && el.size() == 2)
@@ -801,7 +825,7 @@ struct llama_server_context
}
else if (el[0].is_string())
{
auto toks = common_tokenize(model, el[0].get<std::string>(), false);
auto toks = common_tokenize(vocab, el[0].get<std::string>(), false);
for (auto tok : toks)
{
slot->sparams.logit_bias.push_back({tok, bias});
@@ -1131,7 +1155,15 @@ struct llama_server_context
slot.has_next_token = false;
}
if (result.tok == llama_token_eos(model))
if (slot.n_past >= slot.n_ctx) {
slot.truncated = true;
slot.stopped_limit = true;
slot.has_next_token = false;
LOG_VERBOSE("stopped due to running out of context capacity", {});
}
if (result.tok == llama_vocab_eos(vocab) || llama_vocab_is_eog(vocab, result.tok))
{
slot.stopped_eos = true;
slot.has_next_token = false;
@@ -1213,13 +1245,12 @@ struct llama_server_context
{"mirostat", slot.sparams.mirostat},
{"mirostat_tau", slot.sparams.mirostat_tau},
{"mirostat_eta", slot.sparams.mirostat_eta},
{"penalize_nl", slot.sparams.penalize_nl},
{"stop", slot.params.antiprompt},
{"n_predict", slot.params.n_predict},
{"n_keep", params.n_keep},
{"ignore_eos", slot.sparams.ignore_eos},
{"stream", slot.params.stream},
// {"logit_bias", slot.sparams.logit_bias},
// {"logit_bias", slot.sparams.logit_bias},
{"n_probs", slot.sparams.n_probs},
{"min_keep", slot.sparams.min_keep},
{"grammar", slot.sparams.grammar},
@@ -1319,7 +1350,7 @@ struct llama_server_context
queue_results.send(res);
}
void send_embedding(llama_client_slot &slot)
void send_embedding(llama_client_slot &slot, const llama_batch & batch)
{
task_result res;
res.id = slot.task_id;
@@ -1327,7 +1358,7 @@ struct llama_server_context
res.error = false;
res.stop = true;
const int n_embd = llama_n_embd(model);
const int n_embd = llama_model_n_embd(model);
if (!params.embedding)
{
LOG_WARNING("embedding disabled", {
@@ -1341,10 +1372,38 @@ struct llama_server_context
else
{
const float *data = llama_get_embeddings(ctx);
std::vector<float> embedding(data, data + n_embd);
std::vector<float> embd_res(n_embd, 0.0f);
std::vector<std::vector<float>> embedding;
for (int i = 0; i < batch.n_tokens; ++i) {
if (!batch.logits[i] || batch.seq_id[i][0] != slot.id) {
continue;
}
const float * embd = llama_get_embeddings_seq(ctx, batch.seq_id[i][0]);
if (embd == NULL) {
embd = llama_get_embeddings_ith(ctx, i);
}
if (embd == NULL) {
LOG("failed to get embeddings");
continue;
}
// normalize only when there is pooling
// TODO: configurable
if (llama_pooling_type(ctx) != LLAMA_POOLING_TYPE_NONE) {
common_embd_normalize(embd, embd_res.data(), n_embd, 2);
embedding.push_back(embd_res);
} else {
embedding.push_back({ embd, embd + n_embd });
}
}
// OAI compat
res.result_json = json
{
{"embedding", embedding },
{"embedding", embedding[0] },
};
}
queue_results.send(res);
@@ -1426,7 +1485,7 @@ struct llama_server_context
n_eval = n_batch;
}
const int n_embd = llama_n_embd(model);
const int n_embd = llama_model_n_embd(model);
float * embd = img.image_embedding + i * n_embd;
llava_embd_batch llava_batch = llava_embd_batch(embd, n_eval, slot.n_past, 0);
if (llama_decode(ctx, llava_batch.batch))
@@ -1604,17 +1663,17 @@ struct llama_server_context
{
if (slot.is_processing() && system_tokens.size() + slot.cache_tokens.size() >= (size_t) slot.n_ctx)
{
// this check is redundant (for good)
// we should never get here, because generation should have already stopped in process_token()
// START LOCALAI changes
// Temporarily disable context-shifting as it can lead to infinite loops (issue: https://github.com/ggerganov/llama.cpp/issues/3969)
// See: https://github.com/mudler/LocalAI/issues/1333
// Context is exhausted, release the slot
slot.release();
send_final_response(slot);
slot.cache_tokens.clear();
slot.n_past = 0;
slot.truncated = false;
slot.has_next_token = true;
LOG("Context exhausted. Slot %d released (%d tokens in cache)\n", slot.id, (int) slot.cache_tokens.size());
slot.has_next_token = false;
LOG_ERROR("context is exhausted, release the slot", {});
continue;
// END LOCALAI changes
@@ -1707,11 +1766,11 @@ struct llama_server_context
suffix_tokens.erase(suffix_tokens.begin());
}
prefix_tokens.insert(prefix_tokens.begin(), llama_token_prefix(model));
prefix_tokens.insert(prefix_tokens.begin(), llama_token_bos(model)); // always add BOS
prefix_tokens.insert(prefix_tokens.end(), llama_token_suffix(model));
prefix_tokens.insert(prefix_tokens.begin(), llama_vocab_fim_pre(vocab));
prefix_tokens.insert(prefix_tokens.begin(), llama_vocab_bos(vocab)); // always add BOS
prefix_tokens.insert(prefix_tokens.end(), llama_vocab_fim_suf(vocab));
prefix_tokens.insert(prefix_tokens.end(), suffix_tokens.begin(), suffix_tokens.end());
prefix_tokens.push_back(llama_token_middle(model));
prefix_tokens.push_back(llama_vocab_fim_mid(vocab));
prompt_tokens = prefix_tokens;
}
else
@@ -1965,7 +2024,7 @@ struct llama_server_context
// prompt evaluated for embedding
if (slot.embedding)
{
send_embedding(slot);
send_embedding(slot, batch_view);
slot.release();
slot.i_batch = -1;
continue;
@@ -2112,7 +2171,6 @@ json parse_options(bool streaming, const backend::PredictOptions* predict, llama
// slot->sparams.mirostat = json_value(data, "mirostat", default_sparams.mirostat);
// slot->sparams.mirostat_tau = json_value(data, "mirostat_tau", default_sparams.mirostat_tau);
// slot->sparams.mirostat_eta = json_value(data, "mirostat_eta", default_sparams.mirostat_eta);
// slot->sparams.penalize_nl = json_value(data, "penalize_nl", default_sparams.penalize_nl);
// slot->params.n_keep = json_value(data, "n_keep", slot->params.n_keep);
// slot->params.seed = json_value(data, "seed", default_params.seed);
// slot->sparams.grammar = json_value(data, "grammar", default_sparams.grammar);
@@ -2135,7 +2193,6 @@ json parse_options(bool streaming, const backend::PredictOptions* predict, llama
data["mirostat"] = predict->mirostat();
data["mirostat_tau"] = predict->mirostattau();
data["mirostat_eta"] = predict->mirostateta();
data["penalize_nl"] = predict->penalizenl();
data["n_keep"] = predict->nkeep();
data["seed"] = predict->seed();
data["grammar"] = predict->grammar();
@@ -2181,7 +2238,6 @@ json parse_options(bool streaming, const backend::PredictOptions* predict, llama
// llama.params.sparams.mirostat = predict->mirostat();
// llama.params.sparams.mirostat_tau = predict->mirostattau();
// llama.params.sparams.mirostat_eta = predict->mirostateta();
// llama.params.sparams.penalize_nl = predict->penalizenl();
// llama.params.n_keep = predict->nkeep();
// llama.params.seed = predict->seed();
// llama.params.sparams.grammar = predict->grammar();
@@ -2228,6 +2284,35 @@ json parse_options(bool streaming, const backend::PredictOptions* predict, llama
// }
// }
const std::vector<ggml_type> kv_cache_types = {
GGML_TYPE_F32,
GGML_TYPE_F16,
GGML_TYPE_BF16,
GGML_TYPE_Q8_0,
GGML_TYPE_Q4_0,
GGML_TYPE_Q4_1,
GGML_TYPE_IQ4_NL,
GGML_TYPE_Q5_0,
GGML_TYPE_Q5_1,
};
static ggml_type kv_cache_type_from_str(const std::string & s) {
for (const auto & type : kv_cache_types) {
if (ggml_type_name(type) == s) {
return type;
}
}
throw std::runtime_error("Unsupported cache type: " + s);
}
static std::string get_all_kv_cache_types() {
std::ostringstream msg;
for (const auto & type : kv_cache_types) {
msg << ggml_type_name(type) << (&type == &kv_cache_types.back() ? "" : ", ");
}
return msg.str();
}
static void params_parse(const backend::ModelOptions* request,
common_params & params) {
@@ -2241,6 +2326,12 @@ static void params_parse(const backend::ModelOptions* request,
}
// params.model_alias ??
params.model_alias = request->modelfile();
if (!request->cachetypekey().empty()) {
params.cache_type_k = kv_cache_type_from_str(request->cachetypekey());
}
if (!request->cachetypevalue().empty()) {
params.cache_type_v = kv_cache_type_from_str(request->cachetypevalue());
}
params.n_ctx = request->contextsize();
//params.memory_f16 = request->f16memory();
params.cpuparams.n_threads = request->threads();
@@ -2258,7 +2349,7 @@ static void params_parse(const backend::ModelOptions* request,
const char *llama_grpc_servers = std::getenv("LLAMACPP_GRPC_SERVERS");
if (llama_grpc_servers != NULL) {
params.rpc_servers = std::string(llama_grpc_servers);
add_rpc_devices(std::string(llama_grpc_servers));
}
// TODO: Add yarn
@@ -2324,6 +2415,21 @@ static void params_parse(const backend::ModelOptions* request,
if ( request->ropefreqscale() != 0.0f ) {
params.rope_freq_scale = request->ropefreqscale();
}
if (request->grammartriggers_size() > 0) {
LOG_INFO("configuring grammar triggers", {});
llama.grammar_lazy = true;
for (int i = 0; i < request->grammartriggers_size(); i++) {
common_grammar_trigger trigger;
trigger.type = COMMON_GRAMMAR_TRIGGER_TYPE_WORD;
trigger.value = request->grammartriggers(i).word();
// trigger.at_start = request->grammartriggers(i).at_start();
llama.grammar_triggers.push_back(trigger);
LOG_INFO("grammar trigger", {
{ "word", trigger.value },
});
}
}
}
@@ -2384,6 +2490,13 @@ public:
int32_t tokens_evaluated = result.result_json.value("tokens_evaluated", 0);
reply.set_prompt_tokens(tokens_evaluated);
if (result.result_json.contains("timings")) {
double timing_prompt_processing = result.result_json.at("timings").value("prompt_ms", 0.0);
reply.set_timing_prompt_processing(timing_prompt_processing);
double timing_token_generation = result.result_json.at("timings").value("predicted_ms", 0.0);
reply.set_timing_token_generation(timing_token_generation);
}
// Log Request Correlation Id
LOG_VERBOSE("correlation:", {
{ "id", data["correlation_id"] }
@@ -2424,6 +2537,13 @@ public:
reply->set_prompt_tokens(tokens_evaluated);
reply->set_tokens(tokens_predicted);
reply->set_message(completion_text);
if (result.result_json.contains("timings")) {
double timing_prompt_processing = result.result_json.at("timings").value("prompt_ms", 0.0);
reply->set_timing_prompt_processing(timing_prompt_processing);
double timing_token_generation = result.result_json.at("timings").value("predicted_ms", 0.0);
reply->set_timing_token_generation(timing_token_generation);
}
}
else
{
@@ -2458,6 +2578,18 @@ public:
return grpc::Status::OK;
}
grpc::Status TokenizeString(ServerContext* context, const backend::PredictOptions* request, backend::TokenizationResponse* response){
json data = parse_options(false, request, llama);
std::vector<llama_token> tokens = llama.tokenize(data["prompt"],false);
for (int i=0 ; i< tokens.size(); i++){
response->add_tokens(tokens[i]);
}
return grpc::Status::OK;
}
grpc::Status GetMetrics(ServerContext* context, const backend::MetricsRequest* request, backend::MetricsResponse* response) {
llama_client_slot* active_slot = llama.get_active_slot();

View File

@@ -1,13 +1,13 @@
diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp
index 342042ff..224db9b5 100644
index 7f892beb..0517e529 100644
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2419,7 +2419,7 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima
struct ggml_tensor * patches = ggml_graph_get_tensor(gf, "patches");
int* patches_data = (int*)malloc(ggml_nbytes(patches));
for (int i = 0; i < num_patches; i++) {
- patches_data[i] = i + 1;
+ patches_data[i] = i;
}
ggml_backend_tensor_set(patches, patches_data, 0, ggml_nbytes(patches));
free(patches_data);
@@ -2766,7 +2766,7 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima
int patch_offset = ctx->has_class_embedding ? 1 : 0;
int* patches_data = (int*)malloc(ggml_nbytes(patches));
for (int i = 0; i < num_patches; i++) {
- patches_data[i] = i + patch_offset;
+ patches_data[i] = i + 1;
}
ggml_backend_tensor_set(patches, patches_data, 0, ggml_nbytes(patches));
free(patches_data);

View File

@@ -1,5 +1,7 @@
#!/bin/bash
set -e
## Patches
## Apply patches from the `patches` directory
for patch in $(ls patches); do

View File

@@ -2,20 +2,95 @@ INCLUDE_PATH := $(abspath ./)
LIBRARY_PATH := $(abspath ./)
AR?=ar
CMAKE_ARGS?=
BUILD_TYPE?=
ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
# keep standard at C++17 (matches -std=c++17 in CXXFLAGS)
CXXFLAGS = -I. -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/thirdparty -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/ggml/include -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp -O3 -DNDEBUG -std=c++17 -fPIC
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
# If build type is cublas, then we set -DGGML_CUDA=ON to CMAKE_ARGS automatically
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DGGML_CUDA=ON
# If build type is openblas then we set -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
# to CMAKE_ARGS automatically
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
# If build type is clblas (openCL) we set -DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DGGML_HIP=ON
# If it's OSX, DO NOT embed the metal library - -DGGML_METAL_EMBED_LIBRARY=ON requires further investigation
# But if it's OSX without metal, disable it here
else ifeq ($(OS),Darwin)
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
CMAKE_ARGS+=-DGGML_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
TARGET+=--target ggml-metal
endif
endif
# ifeq ($(BUILD_TYPE),sycl_f16)
# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON -DSD_SYCL=ON -DGGML_SYCL_F16=ON
# endif
# ifeq ($(BUILD_TYPE),sycl_f32)
# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSD_SYCL=ON
# endif
# warnings
CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
# Find all .a archives in ARCHIVE_DIR
# (ggml can have different backends cpu, cuda, etc., each backend generates a .a archive)
GGML_ARCHIVE_DIR := build/ggml/src/
ALL_ARCHIVES := $(shell find $(GGML_ARCHIVE_DIR) -type f -name '*.a')
# Name of the single merged library
COMBINED_LIB := libggmlall.a
# Rule to merge all the .a files into one
$(COMBINED_LIB): $(ALL_ARCHIVES)
@echo "Merging all .a into $(COMBINED_LIB)"
rm -f $@
mkdir -p merge-tmp
for a in $(ALL_ARCHIVES); do \
( cd merge-tmp && ar x ../$$a ); \
done
( cd merge-tmp && ar rcs ../$@ *.o )
# Ensure we have a proper index
ranlib $@
# Clean up
rm -rf merge-tmp
build/libstable-diffusion.a:
@echo "Building SD with $(BUILD_TYPE) build type and $(CMAKE_ARGS)"
ifneq (,$(findstring sycl,$(BUILD_TYPE)))
+bash -c "source $(ONEAPI_VARS); \
mkdir -p build && \
cd build && \
cmake $(CMAKE_ARGS) ../../../../../sources/stablediffusion-ggml.cpp && \
cmake --build . --config Release"
else
mkdir -p build && \
cd build && \
cmake $(CMAKE_ARGS) ../../../../../sources/stablediffusion-ggml.cpp && \
cmake --build . --config Release
endif
$(MAKE) $(COMBINED_LIB)
gosd.o:
$(CXX) $(CXXFLAGS) gosd.cpp -o gosd.o -c
libsd.a: gosd.o
cp $(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/build/libstable-diffusion.a ./libsd.a
cp $(INCLUDE_PATH)/build/libstable-diffusion.a ./libsd.a
$(AR) rcs libsd.a gosd.o
clean:
rm -f gosd.o libsd.a
rm -rf gosd.o libsd.a build $(COMBINED_LIB)

View File

@@ -35,6 +35,8 @@ const char* sample_method_str[] = {
"ipndm",
"ipndm_v",
"lcm",
"ddim_trailing",
"tcd",
};
// Names of the sigma schedule overrides, same order as sample_schedule in stable-diffusion.h
@@ -173,6 +175,7 @@ int gen_image(char *text, char *negativeText, int width, int height, int steps,
-1, //clip_skip
cfg_scale, // cfg_scale
3.5f,
0, // eta
width,
height,
sample_method,

View File

@@ -1,7 +1,7 @@
package main
// #cgo CXXFLAGS: -I${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/thirdparty -I${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp -I${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/ggml/include
// #cgo LDFLAGS: -L${SRCDIR}/ -L${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/build/ggml/src/ggml-cpu -L${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/build/ggml/src -lsd -lstdc++ -lm -lggml -lggml-base -lggml-cpu -lgomp
// #cgo LDFLAGS: -L${SRCDIR}/ -lsd -lstdc++ -lm -lggmlall -lgomp
// #include <gosd.h>
// #include <stdlib.h>
import "C"

View File

@@ -1,21 +0,0 @@
package main
// Note: this is started internally by LocalAI and a server is allocated for each model
import (
"flag"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
func main() {
flag.Parse()
if err := grpc.StartServer(*addr, &Image{}); err != nil {
panic(err)
}
}

View File

@@ -1,33 +0,0 @@
package main
// This is a wrapper to satisfy the GRPC service interface
// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
import (
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/stablediffusion"
)
type Image struct {
base.SingleThread
stablediffusion *stablediffusion.StableDiffusion
}
func (image *Image) Load(opts *pb.ModelOptions) error {
var err error
// Note: the Model here is a path to a directory containing the model files
image.stablediffusion, err = stablediffusion.New(opts.ModelFile)
return err
}
func (image *Image) GenerateImage(opts *pb.GenerateImageRequest) error {
return image.stablediffusion.GenerateImage(
int(opts.Height),
int(opts.Width),
int(opts.Mode),
int(opts.Step),
int(opts.Seed),
opts.PositivePrompt,
opts.NegativePrompt,
opts.Dst)
}

View File

@@ -1,21 +0,0 @@
package main
// Note: this is started internally by LocalAI and a server is allocated for each model
import (
"flag"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
func main() {
flag.Parse()
if err := grpc.StartServer(*addr, &Image{}); err != nil {
panic(err)
}
}

View File

@@ -1,32 +0,0 @@
package main
// This is a wrapper to satisfy the GRPC service interface
// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
import (
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/tinydream"
)
type Image struct {
base.SingleThread
tinydream *tinydream.TinyDream
}
func (image *Image) Load(opts *pb.ModelOptions) error {
var err error
// Note: the Model here is a path to a directory containing the model files
image.tinydream, err = tinydream.New(opts.ModelFile)
return err
}
func (image *Image) GenerateImage(opts *pb.GenerateImageRequest) error {
return image.tinydream.GenerateImage(
int(opts.Height),
int(opts.Width),
int(opts.Step),
int(opts.Seed),
opts.PositivePrompt,
opts.NegativePrompt,
opts.Dst)
}

View File

@@ -1,204 +0,0 @@
package main
// This is a wrapper to satisfy the GRPC service interface
// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
import (
"fmt"
"github.com/go-skynet/go-llama.cpp"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
type LLM struct {
base.SingleThread
llama *llama.LLama
}
func (llm *LLM) Load(opts *pb.ModelOptions) error {
ropeFreqBase := float32(10000)
ropeFreqScale := float32(1)
if opts.RopeFreqBase != 0 {
ropeFreqBase = opts.RopeFreqBase
}
if opts.RopeFreqScale != 0 {
ropeFreqScale = opts.RopeFreqScale
}
llamaOpts := []llama.ModelOption{
llama.WithRopeFreqBase(ropeFreqBase),
llama.WithRopeFreqScale(ropeFreqScale),
}
if opts.NGQA != 0 {
llamaOpts = append(llamaOpts, llama.WithGQA(int(opts.NGQA)))
}
if opts.RMSNormEps != 0 {
llamaOpts = append(llamaOpts, llama.WithRMSNormEPS(opts.RMSNormEps))
}
if opts.ContextSize != 0 {
llamaOpts = append(llamaOpts, llama.SetContext(int(opts.ContextSize)))
}
if opts.F16Memory {
llamaOpts = append(llamaOpts, llama.EnableF16Memory)
}
if opts.Embeddings {
llamaOpts = append(llamaOpts, llama.EnableEmbeddings)
}
if opts.NGPULayers != 0 {
llamaOpts = append(llamaOpts, llama.SetGPULayers(int(opts.NGPULayers)))
}
llamaOpts = append(llamaOpts, llama.SetMMap(opts.MMap))
llamaOpts = append(llamaOpts, llama.SetMainGPU(opts.MainGPU))
llamaOpts = append(llamaOpts, llama.SetTensorSplit(opts.TensorSplit))
if opts.NBatch != 0 {
llamaOpts = append(llamaOpts, llama.SetNBatch(int(opts.NBatch)))
} else {
llamaOpts = append(llamaOpts, llama.SetNBatch(512))
}
if opts.NUMA {
llamaOpts = append(llamaOpts, llama.EnableNUMA)
}
if opts.LowVRAM {
llamaOpts = append(llamaOpts, llama.EnabelLowVRAM)
}
model, err := llama.New(opts.ModelFile, llamaOpts...)
llm.llama = model
return err
}
func buildPredictOptions(opts *pb.PredictOptions) []llama.PredictOption {
ropeFreqBase := float32(10000)
ropeFreqScale := float32(1)
if opts.RopeFreqBase != 0 {
ropeFreqBase = opts.RopeFreqBase
}
if opts.RopeFreqScale != 0 {
ropeFreqScale = opts.RopeFreqScale
}
predictOptions := []llama.PredictOption{
llama.SetTemperature(opts.Temperature),
llama.SetTopP(opts.TopP),
llama.SetTopK(int(opts.TopK)),
llama.SetTokens(int(opts.Tokens)),
llama.SetThreads(int(opts.Threads)),
llama.WithGrammar(opts.Grammar),
llama.SetRopeFreqBase(ropeFreqBase),
llama.SetRopeFreqScale(ropeFreqScale),
llama.SetNegativePromptScale(opts.NegativePromptScale),
llama.SetNegativePrompt(opts.NegativePrompt),
}
if opts.PromptCacheAll {
predictOptions = append(predictOptions, llama.EnablePromptCacheAll)
}
if opts.PromptCacheRO {
predictOptions = append(predictOptions, llama.EnablePromptCacheRO)
}
// Expected absolute path
if opts.PromptCachePath != "" {
predictOptions = append(predictOptions, llama.SetPathPromptCache(opts.PromptCachePath))
}
if opts.Mirostat != 0 {
predictOptions = append(predictOptions, llama.SetMirostat(int(opts.Mirostat)))
}
if opts.MirostatETA != 0 {
predictOptions = append(predictOptions, llama.SetMirostatETA(opts.MirostatETA))
}
if opts.MirostatTAU != 0 {
predictOptions = append(predictOptions, llama.SetMirostatTAU(opts.MirostatTAU))
}
if opts.Debug {
predictOptions = append(predictOptions, llama.Debug)
}
predictOptions = append(predictOptions, llama.SetStopWords(opts.StopPrompts...))
if opts.PresencePenalty != 0 {
predictOptions = append(predictOptions, llama.SetPenalty(opts.PresencePenalty))
}
if opts.NKeep != 0 {
predictOptions = append(predictOptions, llama.SetNKeep(int(opts.NKeep)))
}
if opts.Batch != 0 {
predictOptions = append(predictOptions, llama.SetBatch(int(opts.Batch)))
}
if opts.F16KV {
predictOptions = append(predictOptions, llama.EnableF16KV)
}
if opts.IgnoreEOS {
predictOptions = append(predictOptions, llama.IgnoreEOS)
}
if opts.Seed != 0 {
predictOptions = append(predictOptions, llama.SetSeed(int(opts.Seed)))
}
//predictOptions = append(predictOptions, llama.SetLogitBias(c.Seed))
predictOptions = append(predictOptions, llama.SetFrequencyPenalty(opts.FrequencyPenalty))
predictOptions = append(predictOptions, llama.SetMlock(opts.MLock))
predictOptions = append(predictOptions, llama.SetMemoryMap(opts.MMap))
predictOptions = append(predictOptions, llama.SetPredictionMainGPU(opts.MainGPU))
predictOptions = append(predictOptions, llama.SetPredictionTensorSplit(opts.TensorSplit))
predictOptions = append(predictOptions, llama.SetTailFreeSamplingZ(opts.TailFreeSamplingZ))
predictOptions = append(predictOptions, llama.SetTypicalP(opts.TypicalP))
return predictOptions
}
func (llm *LLM) Predict(opts *pb.PredictOptions) (string, error) {
return llm.llama.Predict(opts.Prompt, buildPredictOptions(opts)...)
}
func (llm *LLM) PredictStream(opts *pb.PredictOptions, results chan string) error {
predictOptions := buildPredictOptions(opts)
predictOptions = append(predictOptions, llama.SetTokenCallback(func(token string) bool {
results <- token
return true
}))
go func() {
_, err := llm.llama.Predict(opts.Prompt, predictOptions...)
if err != nil {
fmt.Println("err: ", err)
}
close(results)
}()
return nil
}
func (llm *LLM) Embeddings(opts *pb.PredictOptions) ([]float32, error) {
predictOptions := buildPredictOptions(opts)
if len(opts.EmbeddingTokens) > 0 {
tokens := []int{}
for _, t := range opts.EmbeddingTokens {
tokens = append(tokens, int(t))
}
return llm.llama.TokenEmbeddings(tokens, predictOptions...)
}
return llm.llama.Embeddings(opts.Embeddings, predictOptions...)
}

View File

@@ -1,19 +0,0 @@
package main
import (
"flag"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
func main() {
flag.Parse()
if err := grpc.StartServer(*addr, &LLM{}); err != nil {
panic(err)
}
}

View File

@@ -311,12 +311,16 @@ func (s *Store) StoresGet(opts *pb.StoresGetOptions) (pb.StoresGetResult, error)
}
func isNormalized(k []float32) bool {
var sum float32
var sum float64
for _, v := range k {
sum += v
v64 := float64(v)
sum += v64*v64
}
return sum == 1.0
s := math.Sqrt(sum)
return s >= 0.99 && s <= 1.01
}
// TODO: This we could replace with handwritten SIMD code
@@ -328,7 +332,7 @@ func normalizedCosineSimilarity(k1, k2 []float32) float32 {
dot += k1[i] * k2[i]
}
assert(dot >= -1 && dot <= 1, fmt.Sprintf("dot = %f", dot))
assert(dot >= -1.01 && dot <= 1.01, fmt.Sprintf("dot = %f", dot))
// 2.0 * (1.0 - dot) would be the Euclidean distance
return dot
@@ -418,7 +422,7 @@ func cosineSimilarity(k1, k2 []float32, mag1 float64) float32 {
sim := float32(dot / (mag1 * math.Sqrt(mag2)))
assert(sim >= -1 && sim <= 1, fmt.Sprintf("sim = %f", sim))
assert(sim >= -1.01 && sim <= 1.01, fmt.Sprintf("sim = %f", sim))
return sim
}
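
The `isNormalized` change above replaces a check that summed the raw components and compared the result exactly to `1.0` with one that computes the Euclidean norm and accepts values between 0.99 and 1.01. The Go sketch below restates that idea as a standalone program. `isApproxNormalized` and its `tol` parameter are hypothetical names for illustration; the store code hard-codes the bounds.

```go
package main

import (
	"fmt"
	"math"
)

// isApproxNormalized reports whether the Euclidean norm of k is within
// tol of 1. Accumulating in float64 limits rounding error for long vectors.
func isApproxNormalized(k []float32, tol float64) bool {
	var sum float64
	for _, v := range k {
		v64 := float64(v)
		sum += v64 * v64
	}
	return math.Abs(math.Sqrt(sum)-1.0) <= tol
}

func main() {
	// (0.6, 0.8) has unit length even though its components sum to 1.4.
	fmt.Println(isApproxNormalized([]float32{0.6, 0.8}, 0.01))
	// (0.5, 0.5) sums to 1.0 but has length of about 0.707.
	fmt.Println(isApproxNormalized([]float32{0.5, 0.5}, 0.01))
}
```

The second vector is the kind of input the old component-sum check would have wrongly accepted as normalized.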

View File

@@ -1,5 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
setuptools==75.1.0 # https://github.com/mudler/LocalAI/issues/2406
setuptools

View File

@@ -1,6 +1,6 @@
accelerate
auto-gptq==0.7.1
grpcio==1.68.1
grpcio==1.71.0
protobuf
certifi
transformers

View File

@@ -1,8 +1,9 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
torchaudio
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
torchaudio==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
setuptools==75.1.0 # https://github.com/mudler/LocalAI/issues/2406
setuptools
transformers
accelerate

View File

@@ -1,4 +1,4 @@
bark==0.1.5
grpcio==1.68.1
grpcio==1.71.0
protobuf
certifi

View File

@@ -17,6 +17,9 @@
# LIMIT_TARGETS="cublas12"
# source $(dirname $0)/../common/libbackend.sh
#
PYTHON_VERSION="3.10"
function init() {
# Name of the backend (directory name)
BACKEND_NAME=${PWD##*/}
@@ -88,7 +91,7 @@ function getBuildProfile() {
# always result in an activated virtual environment
function ensureVenv() {
if [ ! -d "${EDIR}/venv" ]; then
uv venv ${EDIR}/venv
uv venv --python ${PYTHON_VERSION} ${EDIR}/venv
echo "virtualenv created"
fi

View File

@@ -1,4 +1,5 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]

View File

@@ -1,3 +1,3 @@
grpcio==1.68.1
grpcio==1.71.0
protobuf
grpcio-tools

View File

@@ -1,4 +1,4 @@
transformers
transformers==4.48.3
accelerate
torch==2.4.1
coqui-tts

View File

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
torchaudio==2.4.1+cu118
transformers
transformers==4.48.3
accelerate
coqui-tts

View File

@@ -1,5 +1,5 @@
torch==2.4.1
torchaudio==2.4.1
transformers
transformers==4.48.3
accelerate
coqui-tts

View File

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.4.1+rocm6.0
torchaudio==2.4.1+rocm6.0
transformers
transformers==4.48.3
accelerate
coqui-tts

View File

@@ -1,9 +1,10 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
torchaudio
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
torchaudio==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
setuptools==75.1.0 # https://github.com/mudler/LocalAI/issues/2406
transformers
setuptools
transformers==4.48.3
accelerate
coqui-tts

View File

@@ -1,4 +1,4 @@
grpcio==1.68.1
grpcio==1.71.0
protobuf
certifi
packaging==24.1

View File

@@ -17,7 +17,7 @@ import backend_pb2_grpc
import grpc
from diffusers import StableDiffusion3Pipeline, StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, \
from diffusers import SanaPipeline, StableDiffusion3Pipeline, StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, \
EulerAncestralDiscreteScheduler, FluxPipeline, FluxTransformer2DModel
from diffusers import StableDiffusionImg2ImgPipeline, AutoPipelineForText2Image, ControlNetModel, StableVideoDiffusionPipeline
from diffusers.pipelines.stable_diffusion import safety_checker
@@ -159,6 +159,18 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
torchType = torch.float16
variant = "fp16"
options = request.Options
# empty dict
self.options = {}
# The options are a list of strings in this form optname:optvalue
# We are storing all the options in a dict so we can use it later when
# generating the images
for opt in options:
key, value = opt.split(":")
self.options[key] = value
local = False
modelFile = request.Model
@@ -275,6 +287,13 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if request.LowVRAM:
self.pipe.enable_model_cpu_offload()
elif request.PipelineType == "SanaPipeline":
self.pipe = SanaPipeline.from_pretrained(
request.Model,
variant="bf16",
torch_dtype=torch.bfloat16)
self.pipe.vae.to(torch.bfloat16)
self.pipe.text_encoder.to(torch.bfloat16)
if CLIPSKIP and request.CLIPSkip != 0:
self.clip_skip = request.CLIPSkip
@@ -434,6 +453,9 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
# create a dictionary of parameters by using the keys from EnableParameters and the values from defaults
kwargs = {key: options.get(key) for key in keys if key in options}
# populate kwargs from self.options.
kwargs.update(self.options)
# Set seed
if request.seed > 0:
kwargs["generator"] = torch.Generator(device=self.device).manual_seed(
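
The option loop added in this file unpacks `opt.split(":")` into exactly two names, so an option with no colon, or a value that itself contains a colon, raises `ValueError`. The sketch below shows one way to split on the first colon only. `parse_options` is a hypothetical helper, not a function from the backend.

```python
def parse_options(options):
    """Turn ["name:value", ...] into a dict, splitting on the first colon only."""
    parsed = {}
    for opt in options:
        key, sep, value = opt.partition(":")
        if not sep or not key:
            # No colon, or an empty option name: skip instead of raising.
            continue
        parsed[key] = value
    return parsed
```

The values stay strings, as they do in the diff, so a caller that forwards them as keyword arguments may still need to convert numeric options itself.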

View File

@@ -1,9 +1,10 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
torchvision
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
torchvision==0.18.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
setuptools==75.1.0 # https://github.com/mudler/LocalAI/issues/2406
setuptools
diffusers
opencv-python
transformers

View File

@@ -1,5 +1,5 @@
setuptools
grpcio==1.68.1
grpcio==1.71.0
pillow
protobuf
certifi

View File

@@ -1,4 +1,4 @@
grpcio==1.68.1
grpcio==1.71.0
protobuf
certifi
wheel

View File

@@ -1,8 +1,9 @@
.DEFAULT_GOAL := install
.PHONY: install
install: protogen
install:
bash install.sh
$(MAKE) protogen
.PHONY: protogen
protogen: backend_pb2_grpc.py backend_pb2.py
@@ -12,14 +13,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
bash protogen.sh
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__
.PHONY: test
test: protogen
@echo "Testing openvoice..."
bash test.sh
@echo "openvoice tested."
rm -rf venv __pycache__

View File

@@ -1,85 +1,65 @@
#!/usr/bin/env python3
"""
Extra gRPC server for HuggingFace SentenceTransformer models.
This is an extra gRPC server of LocalAI for Bark TTS
"""
from concurrent import futures
import time
import argparse
import signal
import sys
import os
import time
import backend_pb2
import backend_pb2_grpc
from faster_whisper import WhisperModel
import grpc
from sentence_transformers import SentenceTransformer
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
COQUI_LANGUAGE = os.environ.get('COQUI_LANGUAGE', None)
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
A gRPC servicer for the backend service.
This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding.
BackendServicer is the class that implements the gRPC service
"""
def Health(self, request, context):
"""
A gRPC method that returns the health status of the backend service.
Args:
request: A HealthRequest object that contains the request parameters.
context: A grpc.ServicerContext object that provides information about the RPC.
Returns:
A Reply object that contains the health status of the backend service.
"""
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
"""
A gRPC method that loads a model into memory.
device = "cpu"
# Get device
# device = "cuda" if request.CUDA else "cpu"
if request.CUDA:
device = "cuda"
Args:
request: A LoadModelRequest object that contains the request parameters.
context: A grpc.ServicerContext object that provides information about the RPC.
Returns:
A Result object that contains the result of the LoadModel operation.
"""
model_name = request.Model
try:
self.model = SentenceTransformer(model_name, trust_remote_code=request.TrustRemoteCode)
print("Preparing models, please wait", file=sys.stderr)
self.model = WhisperModel(request.Model, device=device, compute_type="float16")
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
# Implement your logic here for the LoadModel service
# Replace this with your desired response
return backend_pb2.Result(message="Model loaded successfully", success=True)
def Embedding(self, request, context):
"""
A gRPC method that calculates embeddings for a given sentence.
Args:
request: An EmbeddingRequest object that contains the request parameters.
context: A grpc.ServicerContext object that provides information about the RPC.
Returns:
An EmbeddingResult object that contains the calculated embeddings.
"""
# Implement your logic here for the Embedding service
# Replace this with your desired response
print("Calculated embeddings for: " + request.Embeddings, file=sys.stderr)
sentence_embeddings = self.model.encode(request.Embeddings)
return backend_pb2.EmbeddingResult(embeddings=sentence_embeddings)
def AudioTranscription(self, request, context):
resultSegments = []
text = ""
try:
segments, info = self.model.transcribe(request.dst, beam_size=5, condition_on_previous_text=False)
id = 0
for segment in segments:
print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
resultSegments.append(backend_pb2.TranscriptSegment(id=id, start=segment.start, end=segment.end, text=segment.text))
text += segment.text
id += 1
except Exception as err:
print(f"Unexpected {err=}, {type(err)=}", file=sys.stderr)
return backend_pb2.TranscriptResult(segments=resultSegments, text=text)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))

View File

@@ -0,0 +1,8 @@
faster-whisper
opencv-python
accelerate
compel
peft
sentencepiece
torch==2.4.1
optimum-quanto

View File

@@ -1,5 +1,9 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
faster-whisper
opencv-python
accelerate
sentence-transformers==3.3.1
transformers
compel
peft
sentencepiece
optimum-quanto

View File

@@ -0,0 +1,8 @@
torch==2.4.1
faster-whisper
opencv-python
accelerate
compel
peft
sentencepiece
optimum-quanto

View File

@@ -1,4 +1,3 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
transformers
accelerate
torch==2.4.1+rocm6.0
torch
faster-whisper

View File

@@ -0,0 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
faster-whisper

View File

@@ -0,0 +1,3 @@
grpcio==1.71.0
protobuf
grpcio-tools

View File

@@ -0,0 +1,20 @@
.DEFAULT_GOAL := install
.PHONY: install
install:
bash install.sh
$(MAKE) protogen
.PHONY: protogen
protogen: backend_pb2_grpc.py backend_pb2.py
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
bash protogen.sh
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -1,6 +1,6 @@
#!/usr/bin/env python3
"""
Extra gRPC server for MusicgenForConditionalGeneration models.
Extra gRPC server for Kokoro models.
"""
from concurrent import futures
@@ -8,20 +8,17 @@ import argparse
import signal
import sys
import os
import time
import backend_pb2
import backend_pb2_grpc
import soundfile as sf
import grpc
from scipy.io.wavfile import write as write_wav
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from models import build_model
from kokoro import generate
import torch
SAMPLE_RATE = 22050
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
@@ -59,10 +56,31 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
A Result object that contains the result of the LoadModel operation.
"""
model_name = request.Model
device = "cuda:0" if torch.cuda.is_available() else "cpu"
try:
self.model = ParlerTTSForConditionalGeneration.from_pretrained(model_name).to(device)
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
self.MODEL = build_model(request.ModelFile, device)
options = request.Options
# Find the voice from the options, options are a list of strings in this form optname:optvalue:
VOICE_NAME = None
for opt in options:
if opt.startswith("voice:"):
VOICE_NAME = opt.split(":")[1]
break
if VOICE_NAME is None:
return backend_pb2.Result(success=False, message=f"No voice specified in options")
MODELPATH = request.ModelPath
# If voice name contains a plus, split it and load the two models and combine them
if "+" in VOICE_NAME:
voice1, voice2 = VOICE_NAME.split("+")
voice1 = torch.load(f'{MODELPATH}/{voice1}.pt', weights_only=True).to(device)
voice2 = torch.load(f'{MODELPATH}/{voice2}.pt', weights_only=True).to(device)
self.VOICEPACK = torch.mean(torch.stack([voice1, voice2]), dim=0)
else:
self.VOICEPACK = torch.load(f'{MODELPATH}/{VOICE_NAME}.pt', weights_only=True).to(device)
self.VOICE_NAME = VOICE_NAME
print(f'Loaded voice: {VOICE_NAME}')
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
@@ -70,38 +88,26 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def TTS(self, request, context):
model_name = request.model
voice = request.voice
if voice == "":
voice = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."
if model_name == "":
return backend_pb2.Result(success=False, message="request.model is required")
try:
device = "cuda:0" if torch.cuda.is_available() else "cpu"
input_ids = self.tokenizer(voice, return_tensors="pt").input_ids.to(device)
prompt_input_ids = self.tokenizer(request.text, return_tensors="pt").input_ids.to(device)
generation = self.model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
print("[parler-tts] TTS generated!", file=sys.stderr)
sf.write(request.dst, audio_arr, self.model.config.sampling_rate)
print("[parler-tts] TTS saved to", request.dst, file=sys.stderr)
print("[parler-tts] TTS for", file=sys.stderr)
print(request, file=sys.stderr)
audio, out_ps = generate(self.MODEL, request.text, self.VOICEPACK, lang=self.VOICE_NAME)
print(out_ps)
sf.write(request.dst, audio, SAMPLE_RATE)
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(success=True)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("[parler-tts] Server started. Listening on: " + address, file=sys.stderr)
print("[Kokoro] Server started. Listening on: " + address, file=sys.stderr)
# Define the signal handler function
def signal_handler(sig, frame):
print("[parler-tts] Received termination signal. Shutting down...")
print("[Kokoro] Received termination signal. Shutting down...")
server.stop(0)
sys.exit(0)
@@ -121,5 +127,5 @@ if __name__ == "__main__":
"--addr", default="localhost:50051", help="The address to bind the server to."
)
args = parser.parse_args()
print(f"[parler-tts] startup: {args}", file=sys.stderr)
print(f"[Kokoro] startup: {args}", file=sys.stderr)
serve(args.addr)
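
The `LoadModel` change above reads the voice from a `voice:<name>` option and, when the name contains `+`, averages two voice packs element-wise. The sketch below restates that logic without torch. `parse_voice_option` and `blend` are hypothetical names, plain lists stand in for tensors, the voice names in the usage note are placeholders, and accepting more than two voices is a generalization that the diff does not implement.

```python
def parse_voice_option(options):
    """Return the voice names from a 'voice:<spec>' option, or None if absent."""
    for opt in options:
        key, sep, spec = opt.partition(":")
        if key == "voice" and sep:
            names = [name.strip() for name in spec.split("+") if name.strip()]
            return names or None
    return None


def blend(voicepacks):
    """Element-wise mean of equally sized numeric lists (stand-in for tensors)."""
    if not voicepacks:
        raise ValueError("at least one voice pack is required")
    size = len(voicepacks[0])
    if any(len(pack) != size for pack in voicepacks):
        raise ValueError("voice packs must have the same length")
    return [sum(values) / len(values) for values in zip(*voicepacks)]
```

For example, `parse_voice_option(["voice:voice_a+voice_b"])` returns `["voice_a", "voice_b"]`, and `blend([[1.0, 2.0], [3.0, 4.0]])` returns `[2.0, 3.0]`, which matches what a mean over a stacked pair would give for the same numbers.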

View File

@@ -0,0 +1,524 @@
# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/istftnet.py
# https://github.com/yl4579/StyleTTS2/blob/main/Modules/istftnet.py
from scipy.signal import get_window
from torch.nn import Conv1d, ConvTranspose1d
from torch.nn.utils import weight_norm, remove_weight_norm
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
# https://github.com/yl4579/StyleTTS2/blob/main/Modules/utils.py
def init_weights(m, mean=0.0, std=0.01):
classname = m.__class__.__name__
if classname.find("Conv") != -1:
m.weight.data.normal_(mean, std)
def get_padding(kernel_size, dilation=1):
return int((kernel_size*dilation - dilation)/2)
LRELU_SLOPE = 0.1
class AdaIN1d(nn.Module):
def __init__(self, style_dim, num_features):
super().__init__()
self.norm = nn.InstanceNorm1d(num_features, affine=False)
self.fc = nn.Linear(style_dim, num_features*2)
def forward(self, x, s):
h = self.fc(s)
h = h.view(h.size(0), h.size(1), 1)
gamma, beta = torch.chunk(h, chunks=2, dim=1)
return (1 + gamma) * self.norm(x) + beta
class AdaINResBlock1(torch.nn.Module):
def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5), style_dim=64):
super(AdaINResBlock1, self).__init__()
self.convs1 = nn.ModuleList([
weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[0],
padding=get_padding(kernel_size, dilation[0]))),
weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[1],
padding=get_padding(kernel_size, dilation[1]))),
weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[2],
padding=get_padding(kernel_size, dilation[2])))
])
self.convs1.apply(init_weights)
self.convs2 = nn.ModuleList([
weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
padding=get_padding(kernel_size, 1))),
weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
padding=get_padding(kernel_size, 1))),
weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
padding=get_padding(kernel_size, 1)))
])
self.convs2.apply(init_weights)
self.adain1 = nn.ModuleList([
AdaIN1d(style_dim, channels),
AdaIN1d(style_dim, channels),
AdaIN1d(style_dim, channels),
])
self.adain2 = nn.ModuleList([
AdaIN1d(style_dim, channels),
AdaIN1d(style_dim, channels),
AdaIN1d(style_dim, channels),
])
self.alpha1 = nn.ParameterList([nn.Parameter(torch.ones(1, channels, 1)) for i in range(len(self.convs1))])
self.alpha2 = nn.ParameterList([nn.Parameter(torch.ones(1, channels, 1)) for i in range(len(self.convs2))])
def forward(self, x, s):
for c1, c2, n1, n2, a1, a2 in zip(self.convs1, self.convs2, self.adain1, self.adain2, self.alpha1, self.alpha2):
xt = n1(x, s)
xt = xt + (1 / a1) * (torch.sin(a1 * xt) ** 2) # Snake1D
xt = c1(xt)
xt = n2(xt, s)
xt = xt + (1 / a2) * (torch.sin(a2 * xt) ** 2) # Snake1D
xt = c2(xt)
x = xt + x
return x
def remove_weight_norm(self):
for l in self.convs1:
remove_weight_norm(l)
for l in self.convs2:
remove_weight_norm(l)
class TorchSTFT(torch.nn.Module):
def __init__(self, filter_length=800, hop_length=200, win_length=800, window='hann'):
super().__init__()
self.filter_length = filter_length
self.hop_length = hop_length
self.win_length = win_length
self.window = torch.from_numpy(get_window(window, win_length, fftbins=True).astype(np.float32))
def transform(self, input_data):
forward_transform = torch.stft(
input_data,
self.filter_length, self.hop_length, self.win_length, window=self.window.to(input_data.device),
return_complex=True)
return torch.abs(forward_transform), torch.angle(forward_transform)
def inverse(self, magnitude, phase):
inverse_transform = torch.istft(
magnitude * torch.exp(phase * 1j),
self.filter_length, self.hop_length, self.win_length, window=self.window.to(magnitude.device))
return inverse_transform.unsqueeze(-2) # unsqueeze to stay consistent with conv_transpose1d implementation
def forward(self, input_data):
self.magnitude, self.phase = self.transform(input_data)
reconstruction = self.inverse(self.magnitude, self.phase)
return reconstruction
class SineGen(torch.nn.Module):
""" Definition of sine generator
SineGen(samp_rate, harmonic_num = 0,
sine_amp = 0.1, noise_std = 0.003,
voiced_threshold = 0,
flag_for_pulse=False)
samp_rate: sampling rate in Hz
harmonic_num: number of harmonic overtones (default 0)
sine_amp: amplitude of sine waveform (default 0.1)
noise_std: std of Gaussian noise (default 0.003)
voiced_threshold: F0 threshold for U/V classification (default 0)
flag_for_pulse: this SineGen is used inside PulseGen (default False)
Note: when flag_for_pulse is True, the first time step of a voiced
segment is always sin(np.pi) or cos(0)
"""
def __init__(self, samp_rate, upsample_scale, harmonic_num=0,
sine_amp=0.1, noise_std=0.003,
voiced_threshold=0,
flag_for_pulse=False):
super(SineGen, self).__init__()
self.sine_amp = sine_amp
self.noise_std = noise_std
self.harmonic_num = harmonic_num
self.dim = self.harmonic_num + 1
self.sampling_rate = samp_rate
self.voiced_threshold = voiced_threshold
self.flag_for_pulse = flag_for_pulse
self.upsample_scale = upsample_scale
def _f02uv(self, f0):
# generate uv signal
uv = (f0 > self.voiced_threshold).type(torch.float32)
return uv
def _f02sine(self, f0_values):
""" f0_values: (batchsize, length, dim)
where dim indicates fundamental tone and overtones
"""
# convert to F0 in rad. The integer part n can be ignored
# because 2 * np.pi * n doesn't affect phase
rad_values = (f0_values / self.sampling_rate) % 1
# initial phase noise (no noise for fundamental component)
rand_ini = torch.rand(f0_values.shape[0], f0_values.shape[2], \
device=f0_values.device)
rand_ini[:, 0] = 0
rad_values[:, 0, :] = rad_values[:, 0, :] + rand_ini
# instantaneous phase: sine[t] = sin(2*pi * \sum_{i=1}^{t} rad)
if not self.flag_for_pulse:
# # for normal case
# # To prevent torch.cumsum numerical overflow,
# # it is necessary to add -1 whenever \sum_k=1^n rad_value_k > 1.
# # Buffer tmp_over_one_idx indicates the time step to add -1.
# # This will not change F0 of sine because (x-1) * 2*pi = x * 2*pi
# tmp_over_one = torch.cumsum(rad_values, 1) % 1
# tmp_over_one_idx = (padDiff(tmp_over_one)) < 0
# cumsum_shift = torch.zeros_like(rad_values)
# cumsum_shift[:, 1:, :] = tmp_over_one_idx * -1.0
# phase = torch.cumsum(rad_values, dim=1) * 2 * np.pi
rad_values = torch.nn.functional.interpolate(rad_values.transpose(1, 2),
scale_factor=1/self.upsample_scale,
mode="linear").transpose(1, 2)
# tmp_over_one = torch.cumsum(rad_values, 1) % 1
# tmp_over_one_idx = (padDiff(tmp_over_one)) < 0
# cumsum_shift = torch.zeros_like(rad_values)
# cumsum_shift[:, 1:, :] = tmp_over_one_idx * -1.0
phase = torch.cumsum(rad_values, dim=1) * 2 * np.pi
phase = torch.nn.functional.interpolate(phase.transpose(1, 2) * self.upsample_scale,
scale_factor=self.upsample_scale, mode="linear").transpose(1, 2)
sines = torch.sin(phase)
else:
# If necessary, make sure that the first time step of every
# voiced segment is sin(pi) or cos(0)
# This is used for pulse-train generation
# identify the last time step in unvoiced segments
uv = self._f02uv(f0_values)
uv_1 = torch.roll(uv, shifts=-1, dims=1)
uv_1[:, -1, :] = 1
u_loc = (uv < 1) * (uv_1 > 0)
# get the instantaneous phase
tmp_cumsum = torch.cumsum(rad_values, dim=1)
# different batch needs to be processed differently
for idx in range(f0_values.shape[0]):
temp_sum = tmp_cumsum[idx, u_loc[idx, :, 0], :]
temp_sum[1:, :] = temp_sum[1:, :] - temp_sum[0:-1, :]
# stores the accumulation of i.phase within
# each voiced segment
tmp_cumsum[idx, :, :] = 0
tmp_cumsum[idx, u_loc[idx, :, 0], :] = temp_sum
# rad_values - tmp_cumsum: remove the accumulation of i.phase
# within the previous voiced segment.
i_phase = torch.cumsum(rad_values - tmp_cumsum, dim=1)
# get the sines
sines = torch.cos(i_phase * 2 * np.pi)
return sines
def forward(self, f0):
""" sine_tensor, uv = forward(f0)
input F0: tensor(batchsize=1, length, dim=1)
f0 for unvoiced steps should be 0
output sine_tensor: tensor(batchsize=1, length, dim)
output uv: tensor(batchsize=1, length, 1)
"""
f0_buf = torch.zeros(f0.shape[0], f0.shape[1], self.dim,
device=f0.device)
# fundamental component
fn = torch.multiply(f0, torch.FloatTensor([[range(1, self.harmonic_num + 2)]]).to(f0.device))
# generate sine waveforms
sine_waves = self._f02sine(fn) * self.sine_amp
# generate uv signal
# uv = torch.ones(f0.shape)
# uv = uv * (f0 > self.voiced_threshold)
uv = self._f02uv(f0)
# noise: for unvoiced should be similar to sine_amp
# std = self.sine_amp/3 -> max value ~ self.sine_amp
# . for voiced regions is self.noise_std
noise_amp = uv * self.noise_std + (1 - uv) * self.sine_amp / 3
noise = noise_amp * torch.randn_like(sine_waves)
# first: set the unvoiced part to 0 by uv
# then: additive noise
sine_waves = sine_waves * uv + noise
return sine_waves, uv, noise
class SourceModuleHnNSF(torch.nn.Module):
""" SourceModule for hn-nsf
SourceModule(sampling_rate, harmonic_num=0, sine_amp=0.1,
add_noise_std=0.003, voiced_threshod=0)
sampling_rate: sampling_rate in Hz
harmonic_num: number of harmonic above F0 (default: 0)
sine_amp: amplitude of sine source signal (default: 0.1)
add_noise_std: std of additive Gaussian noise (default: 0.003)
note that amplitude of noise in unvoiced is decided
by sine_amp
voiced_threshold: threshold to set U/V given F0 (default: 0)
Sine_source, noise_source = SourceModuleHnNSF(F0_sampled)
F0_sampled (batchsize, length, 1)
Sine_source (batchsize, length, 1)
noise_source (batchsize, length, 1)
uv (batchsize, length, 1)
"""
def __init__(self, sampling_rate, upsample_scale, harmonic_num=0, sine_amp=0.1,
add_noise_std=0.003, voiced_threshod=0):
super(SourceModuleHnNSF, self).__init__()
self.sine_amp = sine_amp
self.noise_std = add_noise_std
# to produce sine waveforms
self.l_sin_gen = SineGen(sampling_rate, upsample_scale, harmonic_num,
sine_amp, add_noise_std, voiced_threshod)
# to merge source harmonics into a single excitation
self.l_linear = torch.nn.Linear(harmonic_num + 1, 1)
self.l_tanh = torch.nn.Tanh()
def forward(self, x):
"""
Sine_source, noise_source = SourceModuleHnNSF(F0_sampled)
F0_sampled (batchsize, length, 1)
Sine_source (batchsize, length, 1)
noise_source (batchsize, length, 1)
"""
# source for harmonic branch
with torch.no_grad():
sine_wavs, uv, _ = self.l_sin_gen(x)
sine_merge = self.l_tanh(self.l_linear(sine_wavs))
# source for noise branch, in the same shape as uv
noise = torch.randn_like(uv) * self.sine_amp / 3
return sine_merge, noise, uv
def padDiff(x):
return F.pad(F.pad(x, (0,0,-1,1), 'constant', 0) - x, (0,0,0,-1), 'constant', 0)
class Generator(torch.nn.Module):
def __init__(self, style_dim, resblock_kernel_sizes, upsample_rates, upsample_initial_channel, resblock_dilation_sizes, upsample_kernel_sizes, gen_istft_n_fft, gen_istft_hop_size):
super(Generator, self).__init__()
self.num_kernels = len(resblock_kernel_sizes)
self.num_upsamples = len(upsample_rates)
resblock = AdaINResBlock1
self.m_source = SourceModuleHnNSF(
sampling_rate=24000,
upsample_scale=np.prod(upsample_rates) * gen_istft_hop_size,
harmonic_num=8, voiced_threshod=10)
self.f0_upsamp = torch.nn.Upsample(scale_factor=np.prod(upsample_rates) * gen_istft_hop_size)
self.noise_convs = nn.ModuleList()
self.noise_res = nn.ModuleList()
self.ups = nn.ModuleList()
for i, (u, k) in enumerate(zip(upsample_rates, upsample_kernel_sizes)):
self.ups.append(weight_norm(
ConvTranspose1d(upsample_initial_channel//(2**i), upsample_initial_channel//(2**(i+1)),
k, u, padding=(k-u)//2)))
self.resblocks = nn.ModuleList()
for i in range(len(self.ups)):
ch = upsample_initial_channel//(2**(i+1))
for j, (k, d) in enumerate(zip(resblock_kernel_sizes,resblock_dilation_sizes)):
self.resblocks.append(resblock(ch, k, d, style_dim))
c_cur = upsample_initial_channel // (2 ** (i + 1))
if i + 1 < len(upsample_rates): #
stride_f0 = np.prod(upsample_rates[i + 1:])
self.noise_convs.append(Conv1d(
gen_istft_n_fft + 2, c_cur, kernel_size=stride_f0 * 2, stride=stride_f0, padding=(stride_f0+1) // 2))
self.noise_res.append(resblock(c_cur, 7, [1,3,5], style_dim))
else:
self.noise_convs.append(Conv1d(gen_istft_n_fft + 2, c_cur, kernel_size=1))
self.noise_res.append(resblock(c_cur, 11, [1,3,5], style_dim))
self.post_n_fft = gen_istft_n_fft
self.conv_post = weight_norm(Conv1d(ch, self.post_n_fft + 2, 7, 1, padding=3))
self.ups.apply(init_weights)
self.conv_post.apply(init_weights)
self.reflection_pad = torch.nn.ReflectionPad1d((1, 0))
self.stft = TorchSTFT(filter_length=gen_istft_n_fft, hop_length=gen_istft_hop_size, win_length=gen_istft_n_fft)
def forward(self, x, s, f0):
with torch.no_grad():
f0 = self.f0_upsamp(f0[:, None]).transpose(1, 2) # bs,n,t
har_source, noi_source, uv = self.m_source(f0)
har_source = har_source.transpose(1, 2).squeeze(1)
har_spec, har_phase = self.stft.transform(har_source)
har = torch.cat([har_spec, har_phase], dim=1)
for i in range(self.num_upsamples):
x = F.leaky_relu(x, LRELU_SLOPE)
x_source = self.noise_convs[i](har)
x_source = self.noise_res[i](x_source, s)
x = self.ups[i](x)
if i == self.num_upsamples - 1:
x = self.reflection_pad(x)
x = x + x_source
xs = None
for j in range(self.num_kernels):
if xs is None:
xs = self.resblocks[i*self.num_kernels+j](x, s)
else:
xs += self.resblocks[i*self.num_kernels+j](x, s)
x = xs / self.num_kernels
x = F.leaky_relu(x)
x = self.conv_post(x)
spec = torch.exp(x[:,:self.post_n_fft // 2 + 1, :])
phase = torch.sin(x[:, self.post_n_fft // 2 + 1:, :])
return self.stft.inverse(spec, phase)
def fw_phase(self, x, s):
for i in range(self.num_upsamples):
x = F.leaky_relu(x, LRELU_SLOPE)
x = self.ups[i](x)
xs = None
for j in range(self.num_kernels):
if xs is None:
xs = self.resblocks[i*self.num_kernels+j](x, s)
else:
xs += self.resblocks[i*self.num_kernels+j](x, s)
x = xs / self.num_kernels
x = F.leaky_relu(x)
x = self.reflection_pad(x)
x = self.conv_post(x)
spec = torch.exp(x[:,:self.post_n_fft // 2 + 1, :])
phase = torch.sin(x[:, self.post_n_fft // 2 + 1:, :])
return spec, phase
def remove_weight_norm(self):
print('Removing weight norm...')
for l in self.ups:
remove_weight_norm(l)
for l in self.resblocks:
l.remove_weight_norm()
remove_weight_norm(self.conv_pre)
remove_weight_norm(self.conv_post)
class AdainResBlk1d(nn.Module):
def __init__(self, dim_in, dim_out, style_dim=64, actv=nn.LeakyReLU(0.2),
upsample='none', dropout_p=0.0):
super().__init__()
self.actv = actv
self.upsample_type = upsample
self.upsample = UpSample1d(upsample)
self.learned_sc = dim_in != dim_out
self._build_weights(dim_in, dim_out, style_dim)
self.dropout = nn.Dropout(dropout_p)
if upsample == 'none':
self.pool = nn.Identity()
else:
self.pool = weight_norm(nn.ConvTranspose1d(dim_in, dim_in, kernel_size=3, stride=2, groups=dim_in, padding=1, output_padding=1))
def _build_weights(self, dim_in, dim_out, style_dim):
self.conv1 = weight_norm(nn.Conv1d(dim_in, dim_out, 3, 1, 1))
self.conv2 = weight_norm(nn.Conv1d(dim_out, dim_out, 3, 1, 1))
self.norm1 = AdaIN1d(style_dim, dim_in)
self.norm2 = AdaIN1d(style_dim, dim_out)
if self.learned_sc:
self.conv1x1 = weight_norm(nn.Conv1d(dim_in, dim_out, 1, 1, 0, bias=False))
def _shortcut(self, x):
x = self.upsample(x)
if self.learned_sc:
x = self.conv1x1(x)
return x
def _residual(self, x, s):
x = self.norm1(x, s)
x = self.actv(x)
x = self.pool(x)
x = self.conv1(self.dropout(x))
x = self.norm2(x, s)
x = self.actv(x)
x = self.conv2(self.dropout(x))
return x
def forward(self, x, s):
out = self._residual(x, s)
out = (out + self._shortcut(x)) / np.sqrt(2)
return out
class UpSample1d(nn.Module):
def __init__(self, layer_type):
super().__init__()
self.layer_type = layer_type
def forward(self, x):
if self.layer_type == 'none':
return x
else:
return F.interpolate(x, scale_factor=2, mode='nearest')
class Decoder(nn.Module):
def __init__(self, dim_in=512, F0_channel=512, style_dim=64, dim_out=80,
resblock_kernel_sizes = [3,7,11],
upsample_rates = [10, 6],
upsample_initial_channel=512,
resblock_dilation_sizes=[[1,3,5], [1,3,5], [1,3,5]],
upsample_kernel_sizes=[20, 12],
gen_istft_n_fft=20, gen_istft_hop_size=5):
super().__init__()
self.decode = nn.ModuleList()
self.encode = AdainResBlk1d(dim_in + 2, 1024, style_dim)
self.decode.append(AdainResBlk1d(1024 + 2 + 64, 1024, style_dim))
self.decode.append(AdainResBlk1d(1024 + 2 + 64, 1024, style_dim))
self.decode.append(AdainResBlk1d(1024 + 2 + 64, 1024, style_dim))
self.decode.append(AdainResBlk1d(1024 + 2 + 64, 512, style_dim, upsample=True))
self.F0_conv = weight_norm(nn.Conv1d(1, 1, kernel_size=3, stride=2, groups=1, padding=1))
self.N_conv = weight_norm(nn.Conv1d(1, 1, kernel_size=3, stride=2, groups=1, padding=1))
self.asr_res = nn.Sequential(
weight_norm(nn.Conv1d(512, 64, kernel_size=1)),
)
self.generator = Generator(style_dim, resblock_kernel_sizes, upsample_rates,
upsample_initial_channel, resblock_dilation_sizes,
upsample_kernel_sizes, gen_istft_n_fft, gen_istft_hop_size)
def forward(self, asr, F0_curve, N, s):
F0 = self.F0_conv(F0_curve.unsqueeze(1))
N = self.N_conv(N.unsqueeze(1))
x = torch.cat([asr, F0, N], axis=1)
x = self.encode(x, s)
asr_res = self.asr_res(asr)
res = True
for block in self.decode:
if res:
x = torch.cat([x, asr_res, F0, N], axis=1)
x = block(x, s)
if block.upsample_type != "none":
res = False
x = self.generator(x, s, F0_curve)
return x
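
The `get_padding` helper near the top of this file computes the padding that keeps a stride-1 dilated convolution from changing the sequence length, which is what lets the residual blocks add their input back to their output. The sketch below checks that with the output-length formula from the PyTorch `Conv1d` documentation. `conv1d_output_length` is a hypothetical helper written for this illustration.

```python
def get_padding(kernel_size, dilation=1):
    # Same expression as the helper in the diff above.
    return int((kernel_size * dilation - dilation) / 2)


def conv1d_output_length(length, kernel_size, dilation, padding, stride=1):
    """Output length of a 1-D convolution for a given input length."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Kernel sizes and dilations taken from the Decoder defaults in the diff.
for kernel_size in (3, 7, 11):
    for dilation in (1, 3, 5):
        padding = get_padding(kernel_size, dilation)
        assert conv1d_output_length(100, kernel_size, dilation, padding) == 100
```

The length is preserved exactly only when `dilation * (kernel_size - 1)` is even, which holds for the odd kernel sizes used here.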

View File

@@ -0,0 +1,166 @@
# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/kokoro.py
import phonemizer
import re
import torch
import numpy as np
def split_num(num):
num = num.group()
if '.' in num:
return num
elif ':' in num:
h, m = [int(n) for n in num.split(':')]
if m == 0:
return f"{h} o'clock"
elif m < 10:
return f'{h} oh {m}'
return f'{h} {m}'
year = int(num[:4])
if year < 1100 or year % 1000 < 10:
return num
left, right = num[:2], int(num[2:4])
s = 's' if num.endswith('s') else ''
if 100 <= year % 1000 <= 999:
if right == 0:
return f'{left} hundred{s}'
elif right < 10:
return f'{left} oh {right}{s}'
return f'{left} {right}{s}'
def flip_money(m):
m = m.group()
bill = 'dollar' if m[0] == '$' else 'pound'
if m[-1].isalpha():
return f'{m[1:]} {bill}s'
elif '.' not in m:
s = '' if m[1:] == '1' else 's'
return f'{m[1:]} {bill}{s}'
b, c = m[1:].split('.')
s = '' if b == '1' else 's'
c = int(c.ljust(2, '0'))
coins = f"cent{'' if c == 1 else 's'}" if m[0] == '$' else ('penny' if c == 1 else 'pence')
return f'{b} {bill}{s} and {c} {coins}'
def point_num(num):
a, b = num.group().split('.')
return ' point '.join([a, ' '.join(b)])
def normalize_text(text):
text = text.replace(chr(8216), "'").replace(chr(8217), "'")
text = text.replace('«', chr(8220)).replace('»', chr(8221))
text = text.replace(chr(8220), '"').replace(chr(8221), '"')
text = text.replace('(', '«').replace(')', '»')
for a, b in zip('、。!,:;?', ',.!,:;?'):
text = text.replace(a, b+' ')
text = re.sub(r'[^\S \n]', ' ', text)
text = re.sub(r' +', ' ', text)
text = re.sub(r'(?<=\n) +(?=\n)', '', text)
text = re.sub(r'\bD[Rr]\.(?= [A-Z])', 'Doctor', text)
text = re.sub(r'\b(?:Mr\.|MR\.(?= [A-Z]))', 'Mister', text)
text = re.sub(r'\b(?:Ms\.|MS\.(?= [A-Z]))', 'Miss', text)
text = re.sub(r'\b(?:Mrs\.|MRS\.(?= [A-Z]))', 'Mrs', text)
text = re.sub(r'\betc\.(?! [A-Z])', 'etc', text)
text = re.sub(r'(?i)\b(y)eah?\b', r"\1e'a", text)
text = re.sub(r'\d*\.\d+|\b\d{4}s?\b|(?<!:)\b(?:[1-9]|1[0-2]):[0-5]\d\b(?!:)', split_num, text)
text = re.sub(r'(?<=\d),(?=\d)', '', text)
text = re.sub(r'(?i)[$£]\d+(?:\.\d+)?(?: hundred| thousand| (?:[bm]|tr)illion)*\b|[$£]\d+\.\d\d?\b', flip_money, text)
text = re.sub(r'\d*\.\d+', point_num, text)
text = re.sub(r'(?<=\d)-(?=\d)', ' to ', text)
text = re.sub(r'(?<=\d)S', ' S', text)
text = re.sub(r"(?<=[BCDFGHJ-NP-TV-Z])'?s\b", "'S", text)
text = re.sub(r"(?<=X')S\b", 's', text)
text = re.sub(r'(?:[A-Za-z]\.){2,} [a-z]', lambda m: m.group().replace('.', '-'), text)
text = re.sub(r'(?i)(?<=[A-Z])\.(?=[A-Z])', '-', text)
return text.strip()
def get_vocab():
_pad = "$"
_punctuation = ';:,.!?¡¿—…"«»“” '
_letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
_letters_ipa = "ɑɐɒæɓʙβɔɕçɗɖðʤəɘɚɛɜɝɞɟʄɡɠɢʛɦɧħɥʜɨɪʝɭɬɫɮʟɱɯɰŋɳɲɴøɵɸθœɶʘɹɺɾɻʀʁɽʂʃʈʧʉʊʋⱱʌɣɤʍχʎʏʑʐʒʔʡʕʢǀǁǂǃˈˌːˑʼʴʰʱʲʷˠˤ˞↓↑→↗↘'̩'"
symbols = [_pad] + list(_punctuation) + list(_letters) + list(_letters_ipa)
dicts = {}
for i in range(len((symbols))):
dicts[symbols[i]] = i
return dicts
VOCAB = get_vocab()
def tokenize(ps):
return [i for i in map(VOCAB.get, ps) if i is not None]
phonemizers = dict(
a=phonemizer.backend.EspeakBackend(language='en-us', preserve_punctuation=True, with_stress=True),
b=phonemizer.backend.EspeakBackend(language='en-gb', preserve_punctuation=True, with_stress=True),
)
def phonemize(text, lang, norm=True):
if norm:
text = normalize_text(text)
ps = phonemizers[lang].phonemize([text])
ps = ps[0] if ps else ''
# https://en.wiktionary.org/wiki/kokoro#English
ps = ps.replace('kəkˈoːɹoʊ', 'kˈoʊkəɹoʊ').replace('kəkˈɔːɹəʊ', 'kˈəʊkəɹəʊ')
ps = ps.replace('ʲ', 'j').replace('r', 'ɹ').replace('x', 'k').replace('ɬ', 'l')
ps = re.sub(r'(?<=[a-zɹː])(?=hˈʌndɹɪd)', ' ', ps)
ps = re.sub(r' z(?=[;:,.!?¡¿—…"«»“” ]|$)', 'z', ps)
if lang == 'a':
ps = re.sub(r'(?<=nˈaɪn)ti(?!ː)', 'di', ps)
ps = ''.join(filter(lambda p: p in VOCAB, ps))
return ps.strip()
def length_to_mask(lengths):
mask = torch.arange(lengths.max()).unsqueeze(0).expand(lengths.shape[0], -1).type_as(lengths)
mask = torch.gt(mask+1, lengths.unsqueeze(1))
return mask
@torch.no_grad()
def forward(model, tokens, ref_s, speed):
device = ref_s.device
tokens = torch.LongTensor([[0, *tokens, 0]]).to(device)
input_lengths = torch.LongTensor([tokens.shape[-1]]).to(device)
text_mask = length_to_mask(input_lengths).to(device)
bert_dur = model.bert(tokens, attention_mask=(~text_mask).int())
d_en = model.bert_encoder(bert_dur).transpose(-1, -2)
s = ref_s[:, 128:]
d = model.predictor.text_encoder(d_en, s, input_lengths, text_mask)
x, _ = model.predictor.lstm(d)
duration = model.predictor.duration_proj(x)
duration = torch.sigmoid(duration).sum(axis=-1) / speed
pred_dur = torch.round(duration).clamp(min=1).long()
pred_aln_trg = torch.zeros(input_lengths.item(), pred_dur.sum().item())
c_frame = 0
for i in range(pred_aln_trg.size(0)):
pred_aln_trg[i, c_frame:c_frame + pred_dur[0,i].item()] = 1
c_frame += pred_dur[0,i].item()
en = d.transpose(-1, -2) @ pred_aln_trg.unsqueeze(0).to(device)
F0_pred, N_pred = model.predictor.F0Ntrain(en, s)
t_en = model.text_encoder(tokens, input_lengths, text_mask)
asr = t_en @ pred_aln_trg.unsqueeze(0).to(device)
return model.decoder(asr, F0_pred, N_pred, ref_s[:, :128]).squeeze().cpu().numpy()
def generate(model, text, voicepack, lang='a', speed=1, ps=None):
ps = ps or phonemize(text, lang)
tokens = tokenize(ps)
if not tokens:
return None
elif len(tokens) > 510:
tokens = tokens[:510]
print('Truncated to 510 tokens')
ref_s = voicepack[len(tokens)]
out = forward(model, tokens, ref_s, speed)
ps = ''.join(next(k for k, v in VOCAB.items() if i == v) for i in tokens)
return out, ps
def generate_full(model, text, voicepack, lang='a', speed=1, ps=None):
ps = ps or phonemize(text, lang)
tokens = tokenize(ps)
if not tokens:
return None
outs = []
loop_count = len(tokens)//510 + (1 if len(tokens) % 510 != 0 else 0)
for i in range(loop_count):
ref_s = voicepack[len(tokens[i*510:(i+1)*510])]
out = forward(model, tokens[i*510:(i+1)*510], ref_s, speed)
outs.append(out)
outs = np.concatenate(outs)
ps = ''.join(next(k for k, v in VOCAB.items() if i == v) for i in tokens)
return outs, ps
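In `forward` above, the predicted per-token durations are expanded into a 0/1 alignment matrix (`pred_aln_trg`), which is then matrix-multiplied with the token features so that each token is repeated across its frames. A minimal pure-Python sketch of that expansion follows. `build_alignment` and the sample durations are hypothetical, and plain lists stand in for tensors.

```python
def build_alignment(durations):
    """Return a len(durations) x sum(durations) matrix of 0/1 values.

    Row i holds 1 over the consecutive frames assigned to token i and 0 elsewhere.
    """
    total_frames = sum(durations)
    rows = []
    start = 0
    for duration in durations:
        row = [0] * total_frames
        for frame in range(start, start + duration):
            row[frame] = 1
        rows.append(row)
        start += duration
    return rows

# Hypothetical durations (in frames) for three tokens.
alignment = build_alignment([2, 1, 3])
for row in alignment:
    print(row)
```

Each column contains exactly one 1, so multiplying a `[channels, tokens]` feature matrix by this matrix copies each token's feature vector into every frame that token covers.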

@@ -0,0 +1,373 @@
# https://github.com/yl4579/StyleTTS2/blob/main/models.py
# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/models.py
from istftnet import AdaIN1d, Decoder
from munch import Munch
from pathlib import Path
from plbert import load_plbert
from torch.nn.utils import weight_norm, spectral_norm
import json
import numpy as np
import os
import os.path as osp
import torch
import torch.nn as nn
import torch.nn.functional as F
class LinearNorm(torch.nn.Module):
def __init__(self, in_dim, out_dim, bias=True, w_init_gain='linear'):
super(LinearNorm, self).__init__()
self.linear_layer = torch.nn.Linear(in_dim, out_dim, bias=bias)
torch.nn.init.xavier_uniform_(
self.linear_layer.weight,
gain=torch.nn.init.calculate_gain(w_init_gain))
def forward(self, x):
return self.linear_layer(x)
class LayerNorm(nn.Module):
def __init__(self, channels, eps=1e-5):
super().__init__()
self.channels = channels
self.eps = eps
self.gamma = nn.Parameter(torch.ones(channels))
self.beta = nn.Parameter(torch.zeros(channels))
def forward(self, x):
x = x.transpose(1, -1)
x = F.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
return x.transpose(1, -1)
class TextEncoder(nn.Module):
def __init__(self, channels, kernel_size, depth, n_symbols, actv=nn.LeakyReLU(0.2)):
super().__init__()
self.embedding = nn.Embedding(n_symbols, channels)
padding = (kernel_size - 1) // 2
self.cnn = nn.ModuleList()
for _ in range(depth):
self.cnn.append(nn.Sequential(
weight_norm(nn.Conv1d(channels, channels, kernel_size=kernel_size, padding=padding)),
LayerNorm(channels),
actv,
nn.Dropout(0.2),
))
# self.cnn = nn.Sequential(*self.cnn)
self.lstm = nn.LSTM(channels, channels//2, 1, batch_first=True, bidirectional=True)
def forward(self, x, input_lengths, m):
x = self.embedding(x) # [B, T, emb]
x = x.transpose(1, 2) # [B, emb, T]
m = m.to(input_lengths.device).unsqueeze(1)
x.masked_fill_(m, 0.0)
for c in self.cnn:
x = c(x)
x.masked_fill_(m, 0.0)
x = x.transpose(1, 2) # [B, T, chn]
input_lengths = input_lengths.cpu().numpy()
x = nn.utils.rnn.pack_padded_sequence(
x, input_lengths, batch_first=True, enforce_sorted=False)
self.lstm.flatten_parameters()
x, _ = self.lstm(x)
x, _ = nn.utils.rnn.pad_packed_sequence(
x, batch_first=True)
x = x.transpose(-1, -2)
x_pad = torch.zeros([x.shape[0], x.shape[1], m.shape[-1]])
x_pad[:, :, :x.shape[-1]] = x
x = x_pad.to(x.device)
x.masked_fill_(m, 0.0)
return x
def inference(self, x):
x = self.embedding(x)
x = x.transpose(1, 2)
for c in self.cnn: # self.cnn is an nn.ModuleList, which cannot be called directly
x = c(x)
x = x.transpose(1, 2)
self.lstm.flatten_parameters()
x, _ = self.lstm(x)
return x
def length_to_mask(self, lengths):
mask = torch.arange(lengths.max()).unsqueeze(0).expand(lengths.shape[0], -1).type_as(lengths)
mask = torch.gt(mask+1, lengths.unsqueeze(1))
return mask
class UpSample1d(nn.Module):
def __init__(self, layer_type):
super().__init__()
self.layer_type = layer_type
def forward(self, x):
if self.layer_type == 'none':
return x
else:
return F.interpolate(x, scale_factor=2, mode='nearest')
class AdainResBlk1d(nn.Module):
def __init__(self, dim_in, dim_out, style_dim=64, actv=nn.LeakyReLU(0.2),
upsample='none', dropout_p=0.0):
super().__init__()
self.actv = actv
self.upsample_type = upsample
self.upsample = UpSample1d(upsample)
self.learned_sc = dim_in != dim_out
self._build_weights(dim_in, dim_out, style_dim)
self.dropout = nn.Dropout(dropout_p)
if upsample == 'none':
self.pool = nn.Identity()
else:
self.pool = weight_norm(nn.ConvTranspose1d(dim_in, dim_in, kernel_size=3, stride=2, groups=dim_in, padding=1, output_padding=1))
def _build_weights(self, dim_in, dim_out, style_dim):
self.conv1 = weight_norm(nn.Conv1d(dim_in, dim_out, 3, 1, 1))
self.conv2 = weight_norm(nn.Conv1d(dim_out, dim_out, 3, 1, 1))
self.norm1 = AdaIN1d(style_dim, dim_in)
self.norm2 = AdaIN1d(style_dim, dim_out)
if self.learned_sc:
self.conv1x1 = weight_norm(nn.Conv1d(dim_in, dim_out, 1, 1, 0, bias=False))
def _shortcut(self, x):
x = self.upsample(x)
if self.learned_sc:
x = self.conv1x1(x)
return x
def _residual(self, x, s):
x = self.norm1(x, s)
x = self.actv(x)
x = self.pool(x)
x = self.conv1(self.dropout(x))
x = self.norm2(x, s)
x = self.actv(x)
x = self.conv2(self.dropout(x))
return x
def forward(self, x, s):
out = self._residual(x, s)
out = (out + self._shortcut(x)) / np.sqrt(2)
return out
class AdaLayerNorm(nn.Module):
def __init__(self, style_dim, channels, eps=1e-5):
super().__init__()
self.channels = channels
self.eps = eps
self.fc = nn.Linear(style_dim, channels*2)
def forward(self, x, s):
x = x.transpose(-1, -2)
x = x.transpose(1, -1)
h = self.fc(s)
h = h.view(h.size(0), h.size(1), 1)
gamma, beta = torch.chunk(h, chunks=2, dim=1)
gamma, beta = gamma.transpose(1, -1), beta.transpose(1, -1)
x = F.layer_norm(x, (self.channels,), eps=self.eps)
x = (1 + gamma) * x + beta
return x.transpose(1, -1).transpose(-1, -2)
class ProsodyPredictor(nn.Module):
def __init__(self, style_dim, d_hid, nlayers, max_dur=50, dropout=0.1):
super().__init__()
self.text_encoder = DurationEncoder(sty_dim=style_dim,
d_model=d_hid,
nlayers=nlayers,
dropout=dropout)
self.lstm = nn.LSTM(d_hid + style_dim, d_hid // 2, 1, batch_first=True, bidirectional=True)
self.duration_proj = LinearNorm(d_hid, max_dur)
self.shared = nn.LSTM(d_hid + style_dim, d_hid // 2, 1, batch_first=True, bidirectional=True)
self.F0 = nn.ModuleList()
self.F0.append(AdainResBlk1d(d_hid, d_hid, style_dim, dropout_p=dropout))
self.F0.append(AdainResBlk1d(d_hid, d_hid // 2, style_dim, upsample=True, dropout_p=dropout))
self.F0.append(AdainResBlk1d(d_hid // 2, d_hid // 2, style_dim, dropout_p=dropout))
self.N = nn.ModuleList()
self.N.append(AdainResBlk1d(d_hid, d_hid, style_dim, dropout_p=dropout))
self.N.append(AdainResBlk1d(d_hid, d_hid // 2, style_dim, upsample=True, dropout_p=dropout))
self.N.append(AdainResBlk1d(d_hid // 2, d_hid // 2, style_dim, dropout_p=dropout))
self.F0_proj = nn.Conv1d(d_hid // 2, 1, 1, 1, 0)
self.N_proj = nn.Conv1d(d_hid // 2, 1, 1, 1, 0)
def forward(self, texts, style, text_lengths, alignment, m):
d = self.text_encoder(texts, style, text_lengths, m)
batch_size = d.shape[0]
text_size = d.shape[1]
# predict duration
input_lengths = text_lengths.cpu().numpy()
x = nn.utils.rnn.pack_padded_sequence(
d, input_lengths, batch_first=True, enforce_sorted=False)
m = m.to(text_lengths.device).unsqueeze(1)
self.lstm.flatten_parameters()
x, _ = self.lstm(x)
x, _ = nn.utils.rnn.pad_packed_sequence(
x, batch_first=True)
x_pad = torch.zeros([x.shape[0], m.shape[-1], x.shape[-1]])
x_pad[:, :x.shape[1], :] = x
x = x_pad.to(x.device)
duration = self.duration_proj(nn.functional.dropout(x, 0.5, training=self.training))
en = (d.transpose(-1, -2) @ alignment)
return duration.squeeze(-1), en
def F0Ntrain(self, x, s):
x, _ = self.shared(x.transpose(-1, -2))
F0 = x.transpose(-1, -2)
for block in self.F0:
F0 = block(F0, s)
F0 = self.F0_proj(F0)
N = x.transpose(-1, -2)
for block in self.N:
N = block(N, s)
N = self.N_proj(N)
return F0.squeeze(1), N.squeeze(1)
def length_to_mask(self, lengths):
mask = torch.arange(lengths.max()).unsqueeze(0).expand(lengths.shape[0], -1).type_as(lengths)
mask = torch.gt(mask+1, lengths.unsqueeze(1))
return mask
class DurationEncoder(nn.Module):
def __init__(self, sty_dim, d_model, nlayers, dropout=0.1):
super().__init__()
self.lstms = nn.ModuleList()
for _ in range(nlayers):
self.lstms.append(nn.LSTM(d_model + sty_dim,
d_model // 2,
num_layers=1,
batch_first=True,
bidirectional=True,
dropout=dropout))
self.lstms.append(AdaLayerNorm(sty_dim, d_model))
self.dropout = dropout
self.d_model = d_model
self.sty_dim = sty_dim
def forward(self, x, style, text_lengths, m):
masks = m.to(text_lengths.device)
x = x.permute(2, 0, 1)
s = style.expand(x.shape[0], x.shape[1], -1)
x = torch.cat([x, s], axis=-1)
x.masked_fill_(masks.unsqueeze(-1).transpose(0, 1), 0.0)
x = x.transpose(0, 1)
input_lengths = text_lengths.cpu().numpy()
x = x.transpose(-1, -2)
for block in self.lstms:
if isinstance(block, AdaLayerNorm):
x = block(x.transpose(-1, -2), style).transpose(-1, -2)
x = torch.cat([x, s.permute(1, -1, 0)], axis=1)
x.masked_fill_(masks.unsqueeze(-1).transpose(-1, -2), 0.0)
else:
x = x.transpose(-1, -2)
x = nn.utils.rnn.pack_padded_sequence(
x, input_lengths, batch_first=True, enforce_sorted=False)
block.flatten_parameters()
x, _ = block(x)
x, _ = nn.utils.rnn.pad_packed_sequence(
x, batch_first=True)
x = F.dropout(x, p=self.dropout, training=self.training)
x = x.transpose(-1, -2)
x_pad = torch.zeros([x.shape[0], x.shape[1], m.shape[-1]])
x_pad[:, :, :x.shape[-1]] = x
x = x_pad.to(x.device)
return x.transpose(-1, -2)
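def inference(self, x, style):
# NOTE: self.embedding, self.pos_encoder and self.transformer_encoder are never defined
# in this class's __init__, so calling this method would raise AttributeError.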
x = self.embedding(x.transpose(-1, -2)) * np.sqrt(self.d_model)
style = style.expand(x.shape[0], x.shape[1], -1)
x = torch.cat([x, style], axis=-1)
src = self.pos_encoder(x)
output = self.transformer_encoder(src).transpose(0, 1)
return output
def length_to_mask(self, lengths):
mask = torch.arange(lengths.max()).unsqueeze(0).expand(lengths.shape[0], -1).type_as(lengths)
mask = torch.gt(mask+1, lengths.unsqueeze(1))
return mask
# https://github.com/yl4579/StyleTTS2/blob/main/utils.py
def recursive_munch(d):
if isinstance(d, dict):
return Munch((k, recursive_munch(v)) for k, v in d.items())
elif isinstance(d, list):
return [recursive_munch(v) for v in d]
else:
return d
def build_model(path, device):
config = Path(__file__).parent / 'config.json'
assert config.exists(), f'Config path incorrect: config.json not found at {config}'
with open(config, 'r') as r:
args = recursive_munch(json.load(r))
assert args.decoder.type == 'istftnet', f'Unknown decoder type: {args.decoder.type}'
decoder = Decoder(dim_in=args.hidden_dim, style_dim=args.style_dim, dim_out=args.n_mels,
resblock_kernel_sizes = args.decoder.resblock_kernel_sizes,
upsample_rates = args.decoder.upsample_rates,
upsample_initial_channel=args.decoder.upsample_initial_channel,
resblock_dilation_sizes=args.decoder.resblock_dilation_sizes,
upsample_kernel_sizes=args.decoder.upsample_kernel_sizes,
gen_istft_n_fft=args.decoder.gen_istft_n_fft, gen_istft_hop_size=args.decoder.gen_istft_hop_size)
text_encoder = TextEncoder(channels=args.hidden_dim, kernel_size=5, depth=args.n_layer, n_symbols=args.n_token)
predictor = ProsodyPredictor(style_dim=args.style_dim, d_hid=args.hidden_dim, nlayers=args.n_layer, max_dur=args.max_dur, dropout=args.dropout)
bert = load_plbert()
bert_encoder = nn.Linear(bert.config.hidden_size, args.hidden_dim)
for parent in [bert, bert_encoder, predictor, decoder, text_encoder]:
for child in parent.children():
if isinstance(child, nn.RNNBase):
child.flatten_parameters()
model = Munch(
bert=bert.to(device).eval(),
bert_encoder=bert_encoder.to(device).eval(),
predictor=predictor.to(device).eval(),
decoder=decoder.to(device).eval(),
text_encoder=text_encoder.to(device).eval(),
)
for key, state_dict in torch.load(path, map_location='cpu', weights_only=True)['net'].items():
assert key in model, key
try:
model[key].load_state_dict(state_dict)
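except RuntimeError:
# Keys did not match; retry after dropping the first 7 characters of each key (presumably a 'module.' prefix).
state_dict = {k[7:]: v for k, v in state_dict.items()}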
model[key].load_state_dict(state_dict, strict=False)
return model
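The fallback in `build_model` retries loading with `k[7:]`, which drops the first seven characters of every key. A plausible reading, and only an assumption here, is that such checkpoints carry the `module.` prefix that `torch.nn.DataParallel` adds to parameter names. The sketch below shows a stricter variant that strips the prefix only when it is present. `strip_prefix` and the sample keys are hypothetical.

```python
PREFIX = "module."  # assumed prefix; it is 7 characters long, matching the k[7:] slice

def strip_prefix(state_dict, prefix=PREFIX):
    """Return a copy of state_dict whose keys have `prefix` removed when they start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Hypothetical keys; the integer values stand in for tensors.
cleaned = strip_prefix({"module.lstm.weight": 1, "linear.bias": 2})
print(cleaned)
```

Unlike an unconditional slice, this leaves keys without the prefix untouched, so a partly prefixed checkpoint would not have unrelated keys truncated.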

@@ -0,0 +1,16 @@
# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/plbert.py
# https://github.com/yl4579/StyleTTS2/blob/main/Utils/PLBERT/util.py
from transformers import AlbertConfig, AlbertModel
class CustomAlbert(AlbertModel):
def forward(self, *args, **kwargs):
# Call the original forward method
outputs = super().forward(*args, **kwargs)
# Only return the last_hidden_state
return outputs.last_hidden_state
def load_plbert():
plbert_config = {'vocab_size': 178, 'hidden_size': 768, 'num_attention_heads': 12, 'intermediate_size': 2048, 'max_position_embeddings': 512, 'num_hidden_layers': 12, 'dropout': 0.1}
albert_base_configuration = AlbertConfig(**plbert_config)
bert = CustomAlbert(albert_base_configuration)
return bert

@@ -0,0 +1,6 @@
#!/bin/bash
set -e
source "$(dirname "$0")/../common/libbackend.sh"
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto

Some files were not shown because too many files have changed in this diff