Compare commits

340 Commits

Author SHA1 Message Date
Ettore Di Giacinto
d65c16e364 debug 2024-06-14 08:42:15 +02:00
LocalAI [bot]
25f45827ab ⬆️ Update ggerganov/whisper.cpp (#2565)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-14 00:26:51 +00:00
LocalAI [bot]
f322f7c62d ⬆️ Update ggerganov/llama.cpp (#2564)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-13 23:47:50 +00:00
Ettore Di Giacinto
06351cbbb4 feat(binary): support extracted bundled libs on darwin (#2563)
When offering fallback libs, use the proper env var for darwin

Note: this does not include the libraries themselves, but only sets the
proper env var for the libs to be picked up on darwin.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-13 22:59:42 +02:00
Ettore Di Giacinto
8f952d90b0 feat(guesser): identify gemma models (#2561)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-13 19:12:37 +02:00
Ettore Di Giacinto
7b205510f9 feat(gallery): uniform download from CLI (#2559)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-13 16:12:46 +02:00
LocalAI [bot]
f183fec232 ⬆️ Update ggerganov/llama.cpp (#2554)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-13 08:34:32 +00:00
Ettore Di Giacinto
91f48b2143 docs(gallery): lazy-load images (#2557)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-13 01:05:24 +02:00
Ettore Di Giacinto
f404580256 docs: bump go version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-13 00:49:51 +02:00
Ettore Di Giacinto
882556d4db feat(gallery): show available models in website, allow local-ai models install to install from galleries (#2555)
* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* gen a static page instead (we force DNS redirects to it)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(gallery): install models from CLI, unify install

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Uniform graphic of model page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Makefile: update targets

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Slightly enhance gallery view

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-13 00:47:16 +02:00
LocalAI [bot]
f8382adbf7 ⬆️ Update ggerganov/llama.cpp (#2551)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-12 08:54:00 +00:00
LocalAI [bot]
80298f94fa ⬆️ Update ggerganov/whisper.cpp (#2552)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-12 07:39:21 +00:00
Ettore Di Giacinto
0f8b489346 models(gallery): add badger-lambda-llama-3-8b (#2550)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-11 19:11:42 +02:00
Ettore Di Giacinto
154694462e models(gallery): add duloxetine (#2549)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-11 19:06:26 +02:00
Ettore Di Giacinto
347317d5d2 models(gallery): add average_normie_v3.69_8b-iq-imatrix (#2548)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-11 19:05:27 +02:00
Ettore Di Giacinto
d40722d2fa models(gallery): add llama-salad-8x8b (#2547)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-11 18:40:16 +02:00
Ettore Di Giacinto
7b12300f15 models(gallery): add l3-aethora-15b (#2546)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-11 18:31:13 +02:00
Ettore Di Giacinto
3c50abffdd models(gallery): add hathor-l3-8b-v.01-iq-imatrix (#2545)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-11 16:37:27 +02:00
Ettore Di Giacinto
2eb2ed84ab models(gallery): add llama3-8B-aifeifei-1.2-iq-imatrix (#2544)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-11 10:54:21 +02:00
LocalAI [bot]
5da10fb769 ⬆️ Update ggerganov/llama.cpp (#2540)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-11 00:59:17 +00:00
LocalAI [bot]
bec883e3ff ⬆️ Update ggerganov/whisper.cpp (#2539)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-10 23:32:32 +00:00
Ettore Di Giacinto
14b41be057 feat(detection): detect by template in gguf file, add qwen2, phi, mistral and chatml (#2536)
feat(detection): detect by template in gguf file, add qwen and chatml

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-10 22:58:04 +02:00
reid41
aff2acacf9 Add integrations (#2535)
* update integrations

* update integrations1
2024-06-10 19:18:47 +02:00
Rene Leonhardt
b4d4c0a18f chore(deps): Update Dockerfile (#2532)
Signed-off-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
2024-06-10 08:40:02 +00:00
LocalAI [bot]
3a5f2283ea ⬆️ Update ggerganov/llama.cpp (#2531)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-09 23:15:59 +00:00
Ettore Di Giacinto
d9109ffafb feat(defaults): add defaults for Command-R models (#2529)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-09 20:00:16 +02:00
Ettore Di Giacinto
d7e137295a feat(util): add util command to print GGUF information (#2528)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-09 19:27:42 +02:00
Ettore Di Giacinto
6c087ae743 feat(arm64): enable single-binary builds (#2490)
* ci: try to build for arm64

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Allow to skip hipblas on make dist

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* use arm64 cross compiler

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* correctly target go arm64

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* create a separate target

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cross-compile grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add Protobuf include dirs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* temp disable CUDA build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* aarch64 builds: Reduce backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Even fewer backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Even fewer backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(startup): allow to load libs from extracted assets

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* makefile: set arch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-09 15:11:37 +02:00
LocalAI [bot]
88af1033d6 ⬆️ Update ggerganov/llama.cpp (#2524)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-08 23:53:35 +00:00
Ettore Di Giacinto
e96d2d7667 feat(ui): add page to talk with voice, transcription, and tts (#2520)
* feat(ui): add page to talk with voice, transcription, and tts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Enhance graphics and status reporting

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Better UX by blocking invalid actions

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-09 00:03:26 +02:00
Ettore Di Giacinto
aae7ad9d73 feat(llama.cpp): guess model defaults from file (#2522)
* wip: guess information from gguf file

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* update go mod

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Identify llama3

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Do not try to guess the name, as reading gguf files can be expensive

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Allow to disable guessing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-08 22:13:02 +02:00
LocalAI [bot]
23b3d22525 ⬆️ Update ggerganov/llama.cpp (#2518)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-07 23:35:16 +00:00
Ettore Di Giacinto
603d81dda1 feat(install): add install.sh for quick installs (#2489)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-07 22:30:41 +02:00
LocalAI [bot]
a21a52d384 models(gallery): ⬆️ update checksum (#2519)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-07 22:17:25 +02:00
Dave
219078a5e0 test: e2e /reranker endpoint (#2211)
Create a simple e2e test for the /reranker api \\ go mod tidy

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-07 18:45:52 +00:00
Ettore Di Giacinto
3b7a78adda fix(stream): do not break channel consumption (#2517)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-07 17:20:42 +02:00
Sertaç Özercan
0d62594099 fix: fix chat webui response parsing (#2515)
fix: fix chat webui

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-06-07 17:20:31 +02:00
Dave
d38e9090df experiment: -j4 for build-linux: (#2514)
experiment: set -j4 to see if things go faster, while we wait for a proper fix from mudler

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-07 11:22:28 +02:00
Ettore Di Giacinto
b049805c9b ci: run release build on self-hosted runners (#2505) 2024-06-06 22:16:34 -04:00
LocalAI [bot]
0f9b58f2cf ⬆️ Update ggerganov/llama.cpp (#2508)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-06 23:48:17 +00:00
LocalAI [bot]
0f134d557e ⬆️ Update ggerganov/whisper.cpp (#2507)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-06 23:21:25 +00:00
Ettore Di Giacinto
2676e127ae models(gallery): add llama3-8b-feifei-1.0-iq-imatrix (#2511)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-07 00:17:59 +02:00
Ettore Di Giacinto
270d4f8413 models(gallery): add rawr_llama3_8b-iq-imatrix (#2510)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-07 00:12:11 +02:00
Ettore Di Giacinto
2d79cee8cb models(gallery): add llama3-8B-aifeifei-1.0-iq-imatrix (#2509)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-07 00:07:20 +02:00
Ettore Di Giacinto
4c9623f50d deps(whisper): update, add libcufft-dev (#2501)
* arrow_up: Update ggerganov/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(build): add libcufft-dev

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-06 08:41:04 +02:00
Ettore Di Giacinto
596cf76135 build(intel): bundle intel variants in single-binary (#2494)
* wip: try to build also intel variants

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add dependencies

* Select automatically intel backend

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-06 08:40:51 +02:00
LocalAI [bot]
a293aa1b79 ⬆️ Update ggerganov/llama.cpp (#2493)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-06 00:02:51 +00:00
Ettore Di Giacinto
c4eb02c80f models(gallery): add l3-8b-stheno-v3.2-iq-imatrix (#2500)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 23:46:59 +02:00
Ettore Di Giacinto
9c9198ff08 models(gallery): add Llama-3-Yggdrasil-2.0-8B (#2499)
models(gallery): add Llama-3-Yggdrasil-2.0-8B-GGUF

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 23:42:23 +02:00
Ettore Di Giacinto
83c79d5453 models(gallery): add llama-3-instruct-8b-SimPO-ExPO (#2498)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 23:37:59 +02:00
Ettore Di Giacinto
88fd000065 models(gallery): add phi-3-4x4b (#2497)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 23:29:15 +02:00
Ettore Di Giacinto
956d652314 models(gallery): add nyun (#2496)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 23:22:58 +02:00
Ettore Di Giacinto
9ce2b4d71f models(gallery): add dolphin-2.9.2-phi-3-Medium-abliterated (#2495)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 23:14:43 +02:00
Ettore Di Giacinto
4e974cb4fc models(gallery): add dolphin-2.9.2-Phi-3-Medium (#2492)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 19:17:20 +02:00
Dave
d072835796 feat:OpaqueErrors to hide error information (#2486)
* adds a new configuration option to hide all error message information from http requests
---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-05 08:45:24 +02:00
Ettore Di Giacinto
17cf6c4a4d feat(amdgpu): try to build in single binary (#2485)
* feat(amdgpu): try to build in single binary

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Release space from worker

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 08:44:15 +02:00
LocalAI [bot]
fab3e711ff ⬆️ Update ggerganov/llama.cpp (#2487)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-04 23:11:28 +00:00
Dave
4e1463fec2 feat: fiber CSRF (#2482)
new config option - enables or disables the fiber csrf middleware

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-04 19:43:46 +00:00
Dave
2fc6fe806b fix: pkg/downloader should respect basePath for file:// urls (#2481)
* pass basePath down to pkg/downloader

Signed-off-by: Dave Lee <dave@gray101.com>

* enforce

Signed-off-by: Dave Lee <dave@gray101.com>

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-04 14:32:47 +00:00
Ettore Di Giacinto
bdd6769b2d feat(default): use number of physical cores as default (#2483)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-04 15:23:29 +02:00
Ettore Di Giacinto
1ffee9989f README: update sponsors list (#2476)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-06-04 15:23:00 +02:00
Dave
34ab442ce9 toil: bump grpc version (#2480)
bump the grpc package version

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-04 08:39:19 +02:00
LocalAI [bot]
67aa31faad ⬆️ Update ggerganov/llama.cpp (#2477)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-03 23:09:24 +00:00
fakezeta
6ef78ef7f6 bugfix: CUDA acceleration not working (#2475)
* bugfix: CUDA acceleration not working

CUDA not working after #2286.
Refactored the code to be more polished

* Update requirements.txt

Missing imports

Signed-off-by: fakezeta <fakezeta@gmail.com>

* Update requirements.txt

Signed-off-by: fakezeta <fakezeta@gmail.com>

---------

Signed-off-by: fakezeta <fakezeta@gmail.com>
2024-06-03 22:41:42 +02:00
Ettore Di Giacinto
daa7544d9c Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-06-03 19:55:01 +02:00
Ettore Di Giacinto
34527737bb feat(webui): enhance card visibility (#2473)
Keep the description text from cluttering the card, and highlight the model
names

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-03 17:07:26 +02:00
Ettore Di Giacinto
148adebe16 docs: fix p2p commands (#2472)
Also change icons on GPT vision page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-03 16:58:53 +02:00
Ettore Di Giacinto
bae2a649fd models(gallery): add new poppy porpoise versions (#2471)
models(gallery): add new poppy porpoise versions

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-03 15:44:52 +02:00
Ettore Di Giacinto
90945ebab3 models(gallery): add fimbulvetr iqmatrix version (#2470)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-03 15:44:27 +02:00
fakezeta
4a239a4bff feat(transformers): various enhancements to the transformers backend (#2468)
update transformers

* Handle Temperature = 0 as greedy search
* Handle custom words as stop words
* Implement KV cache
* Phi 3 no longer requires trust_remote_code: true
2024-06-03 08:52:55 +02:00
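The first item in the commit above, treating a temperature of 0 as greedy search, can be sketched roughly as follows. This is a minimal illustration in plain Python and not the transformers backend's actual code; `pick_token` and its parameters are hypothetical names introduced here.

```python
import math
import random


def pick_token(logits, temperature, rng=None):
    """Return the index of the next token (hypothetical helper)."""
    if temperature <= 0:
        # Greedy search: scaling by 1/0 is undefined, so take the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    # Sample an index in proportion to the softmax weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

With a temperature of 0 the call is deterministic and always returns the highest-scoring index; any positive temperature falls through to sampling.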
LocalAI [bot]
5ddaa19914 ⬆️ Update ggerganov/llama.cpp (#2467)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-02 21:34:29 +00:00
Ettore Di Giacinto
77d752a481 fix(gemma): correctly format the template
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-06-02 10:51:58 +02:00
Ettore Di Giacinto
29ff51c12a Update gemma stopwords
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-06-02 01:26:41 +02:00
Ettore Di Giacinto
c0744899c9 models(gallery): add gemma-2b (#2466)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-02 01:15:06 +02:00
LocalAI [bot]
c9092ad39c models(gallery): ⬆️ update checksum (#2463)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-01 23:13:02 +00:00
LocalAI [bot]
b588cae70e ⬆️ Update ggerganov/llama.cpp (#2465)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-01 22:31:32 +00:00
LocalAI [bot]
fb0f188c93 feat(swagger): update swagger (#2464)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-01 22:04:01 +00:00
Chakib Benziane
b99182c8d4 TTS API improvements (#2308)
* update doc on COQUI_LANGUAGE env variable

Signed-off-by: blob42 <contact@blob42.xyz>

* return errors from tts gRPC backend

Signed-off-by: blob42 <contact@blob42.xyz>

* handle speaker_id and language in coqui TTS backend

Signed-off-by: blob42 <contact@blob42.xyz>

* TTS endpoint: add optional language parameter

Signed-off-by: blob42 <contact@blob42.xyz>

* tts fix: empty language string breaks non-multilingual models

Signed-off-by: blob42 <contact@blob42.xyz>

* allow tts param definition in config file

- consolidate TTS options under `tts` config entry

Signed-off-by: blob42 <contact@blob42.xyz>

* tts: update doc

Signed-off-by: blob42 <contact@blob42.xyz>

---------

Signed-off-by: blob42 <contact@blob42.xyz>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-06-01 18:26:27 +00:00
Ettore Di Giacinto
95c65d67f5 models(gallery): add all whisper variants (#2462)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-01 20:04:03 +02:00
Ettore Di Giacinto
c603b95ac7 ci: pin build-time protoc (#2461)
ci: pin protoc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-01 18:59:15 +02:00
Ettore Di Giacinto
13cfa6de0a models(gallery): add Neural SOVLish Devil (#2460)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-01 12:54:58 +02:00
Ettore Di Giacinto
0560c6fd57 models(gallery): add poppy porpoise 1.0 (#2459)
models(gallery): add poppy porpoise 1.0

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-01 12:54:37 +02:00
Ettore Di Giacinto
f24dddae42 models(gallery): add ultron (#2456)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-01 00:09:51 +02:00
LocalAI [bot]
06b461b061 ⬆️ Update ggerganov/llama.cpp (#2453)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-01 00:09:26 +02:00
Ettore Di Giacinto
e50a7ba879 models(gallery): add llama3-11b (#2455)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-01 00:03:57 +02:00
Ettore Di Giacinto
3b2bce1fc9 models(gallery): add anjir (#2454)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-01 00:03:46 +02:00
LocalAI [bot]
3fe7e9f678 ⬆️ Update ggerganov/whisper.cpp (#2452)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-31 21:59:48 +00:00
LocalAI [bot]
654b661688 models(gallery): ⬆️ update checksum (#2451)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-31 21:58:54 +00:00
Ettore Di Giacinto
7f387fb238 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-31 22:59:51 +02:00
Ettore Di Giacinto
5d31e5269d feat(functions): allow response_regex to be a list (#2447)
feat(functions): allow regex match to be a list

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-31 22:52:02 +02:00
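As a rough illustration of what accepting a list of regexes can mean in practice, the sketch below tries each pattern in order and returns the named groups of the first one that matches. It is an assumption-laden example, not LocalAI's implementation: `match_first`, the two patterns, and the group names `function` and `arguments` are chosen here for illustration, and the commit does not say whether matching stops at the first hit.

```python
import re


def match_first(patterns, text):
    """Try each regex in order; return the named groups of the first match."""
    for pattern in patterns:
        found = re.search(pattern, text)
        if found:
            return found.groupdict()
    return None


# Hypothetical patterns for two output styles a model might produce.
patterns = [
    r"<tool>(?P<function>\w+)\((?P<arguments>.*)\)</tool>",
    r"(?P<function>\w+)\s*\((?P<arguments>.*)\)",
]
result = match_first(patterns, 'get_weather({"city": "Rome"})')
```

Ordering the list from most to least specific lets a strict pattern win when it applies while a looser one acts as a fallback.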
Ettore Di Giacinto
ff8a6962cd build(Makefile): add back single target to build native llama-cpp (#2448)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-31 18:35:33 +02:00
Ettore Di Giacinto
10c64dbb55 models(gallery): add mopeymule (#2449)
* models(gallery): add mopeymule

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: try to fix workflow

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-31 18:08:39 +02:00
Ettore Di Giacinto
3f7212c660 feat(functions): better free string matching, allow to expect strings after JSON (#2445)
Now allow any non-character, as both suffix and prefix, when mixed grammars are enabled

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-31 09:36:27 +02:00
LocalAI [bot]
5dc6bace49 ⬆️ Update ggerganov/whisper.cpp (#2443)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-30 22:18:55 +00:00
LocalAI [bot]
3cd5918ae6 ⬆️ Update ggerganov/llama.cpp (#2444)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-30 22:09:42 +00:00
Ettore Di Giacinto
5b75bf16c7 models(gallery): add Codestral (#2442)
models(gallery): add Codestral

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-30 18:50:26 +02:00
LocalAI [bot]
0c40f545d4 feat(swagger): update swagger (#2436)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-30 08:11:05 +00:00
LocalAI [bot]
b2fc92daa7 ⬆️ Update ggerganov/whisper.cpp (#2438)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-30 06:07:28 +00:00
LocalAI [bot]
0787797961 ⬆️ Update ggerganov/llama.cpp (#2437)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-29 23:15:36 +00:00
Ettore Di Giacinto
2ba9e27bcf models(gallery): add neuraldaredevil (#2439)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-30 00:15:52 +02:00
Prajwal S Nayak
4d98dd9ce7 feat(image): support response_type in the OpenAI API request (#2347)
* Change response_format type to string to match OpenAI Spec

Signed-off-by: prajwal <prajwalnayak7@gmail.com>

* updated response_type type to interface

Signed-off-by: prajwal <prajwalnayak7@gmail.com>

* feat: correctly parse generic struct

Signed-off-by: mudler <mudler@localai.io>

* add tests

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: prajwal <prajwalnayak7@gmail.com>
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: mudler <mudler@localai.io>
2024-05-29 14:40:54 +02:00
LocalAI [bot]
087bceccac ⬆️ Update ggerganov/llama.cpp (#2433)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-28 21:55:03 +00:00
Ettore Di Giacinto
7064697ce5 models(gallery): add halu (#2434)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-28 23:13:50 +02:00
Ettore Di Giacinto
0b99be73b3 models(gallery): add una-thepitbull (#2435)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-28 23:13:28 +02:00
Ettore Di Giacinto
669cd06dd9 feat(functions): allow parallel calls with mixed/no grammars (#2432)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-28 21:06:09 +02:00
Ettore Di Giacinto
2bbc52fcc8 feat(build): add arm64 core containers (#2421)
ci: add arm64 container images

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-28 10:34:59 +02:00
LocalAI [bot]
577888f3c0 ⬆️ Update ggerganov/llama.cpp (#2428)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-27 22:02:49 +00:00
LocalAI [bot]
1c80f628ff ⬆️ Update ggerganov/whisper.cpp (#2427)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-27 21:28:36 +00:00
Ettore Di Giacinto
10430a00bd feat(hipblas): extend default hipblas GPU_TARGETS (#2426)
Makefile: extend default hipblas GPU_TARGETS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-27 22:35:11 +02:00
Ettore Di Giacinto
9f5c274321 feat(images): do not install python deps in the core image (#2425)
do not install python deps in the core image

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-27 22:07:48 +02:00
Ettore Di Giacinto
d075dc44dd ci: push test images when building PRs (#2424)
ci: try to push image

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-27 22:07:35 +02:00
Ettore Di Giacinto
be8ffbdfcf ci(grpc-cache): also arm64 (#2423)
grpc-cache: also arm64

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-27 17:23:34 +02:00
Ettore Di Giacinto
eaf653f3d3 models(gallery): add iterative-dpo, fix minicpm (#2422)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-27 17:17:04 +02:00
LocalAI [bot]
e9c28a1ed7 ⬆️ Update ggerganov/llama.cpp (#2419)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-26 21:32:05 +00:00
cryptk
ba984c7097 fix: pin version of setuptools for intel builds to work around #2406 (#2414)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-26 18:27:07 +00:00
Ettore Di Giacinto
ff1f9125ed models(gallery): add stheno-mahou (#2418)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-26 20:12:40 +02:00
Ettore Di Giacinto
2c82058548 models(gallery): add cream-phi-13b (#2417)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-26 20:11:57 +02:00
cryptk
16433d2e8e fix: install pytorch from proper index for hipblas builds (#2413)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-26 18:05:52 +00:00
Ettore Di Giacinto
345047ed7c models(gallery): add alpha centauri (#2416)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-26 20:04:26 +02:00
Ettore Di Giacinto
6343758f9c models(gallery): add poppy porpoise 0.85 (#2415)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-26 19:59:49 +02:00
Ettore Di Giacinto
135208806c models(gallery): add minicpm (#2412)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-26 15:58:19 +02:00
Ettore Di Giacinto
3280de7adf models(gallery): add Mahou (#2411)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-26 15:43:31 +02:00
Ettore Di Giacinto
db3113c5c8 fix(watcher): do not emit fatal errors (#2410)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-26 14:48:30 +02:00
LocalAI [bot]
593fb62bf0 ⬆️ Update ggerganov/llama.cpp (#2409)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-26 08:43:50 +00:00
LocalAI [bot]
480834f75b ⬆️ Update ggerganov/whisper.cpp (#2408)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-26 08:05:15 +00:00
Sertaç Özercan
3200a6655e fix: gpu fetch device info (#2403)
* fix: gpu fetch device info

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* use pciutils package

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-26 09:56:06 +02:00
Ettore Di Giacinto
b90cdced59 docs: rewording
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-25 20:18:25 +02:00
Ettore Di Giacinto
fc3502b56f docs: rewording
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-25 20:17:04 +02:00
Ettore Di Giacinto
785adc1ed5 docs: update title
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-25 16:13:48 +02:00
Ettore Di Giacinto
e25fc656c9 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-25 16:13:04 +02:00
Ettore Di Giacinto
bb3ec56de3 docs: add distributed inferencing docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-25 16:12:08 +02:00
Ettore Di Giacinto
785c54e7b0 models(gallery): add Mirai Nova (#2405)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-25 16:11:01 +02:00
Ettore Di Giacinto
003b43f6fc Update quickstart.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-25 10:18:20 +02:00
LocalAI [bot]
663488b6bd ⬆️ Update docs version mudler/LocalAI (#2398)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-25 10:08:35 +02:00
Ettore Di Giacinto
e1d6b706f4 Update quickstart.md (#2404)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-25 10:08:23 +02:00
Sertaç Özercan
29615576fb ci: fix sd release (#2400)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-25 09:33:50 +02:00
LocalAI [bot]
f8cea16c03 ⬆️ Update ggerganov/llama.cpp (#2399)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-24 21:52:13 +00:00
Ettore Di Giacinto
e0187c2a1a ci: do not tag latest on AIO automatically
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-24 09:41:13 +02:00
Ettore Di Giacinto
b76d2fe68a Update quickstart.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-24 09:02:59 +02:00
Ettore Di Giacinto
ee4f722bf8 models(gallery): add aya-35b (#2391)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-23 23:51:34 +02:00
LocalAI [bot]
dce63237f2 ⬆️ Update ggerganov/llama.cpp (#2360)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-23 21:02:13 +00:00
Dave
0b637465d9 refactor: Minor improvements to BackendConfigLoader (#2353)
some minor renames and refactorings within BackendConfigLoader - make things more consistent, remove underused code, rename things for clarity

Signed-off-by: Dave Lee <dave@gray101.com>
2024-05-23 22:48:12 +02:00
Mauro Morales
114f549f5e Add warning for running the binary on MacOS (#2389)
2024-05-23 22:40:55 +02:00
Ettore Di Giacinto
ea330d452d models(gallery): add mistral-0.3 and command-r, update functions (#2388)
* models(gallery): add mistral-0.3 and command-r, update functions

Also add disable_parallel_new_lines to disable newlines in the JSON
output when forcing parallel tools. Some models (like mistral) might be
very sensitive to that when used for function calling.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(gallery): add aya-23-8b

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-23 19:16:08 +02:00
Valentin Fröhlich
eb11a46a73 Add Home Assistant Integration (#2387)
Add https://github.com/valentinfrlch/ha-gpt4vision to Home Assistant Integration section

gpt4vision uses LocalAI's API to send images along with a prompt and return the model's output.

Signed-off-by: Valentin Fröhlich <85313672+valentinfrlch@users.noreply.github.com>
2024-05-23 15:21:01 +02:00
LocalAI [bot]
b57e14d65c models(gallery): ⬆️ update checksum (#2386)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-23 08:42:45 +02:00
Sertaç Özercan
7efa8e75d4 fix: stablediffusion binary (#2385)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-23 08:34:37 +02:00
Ettore Di Giacinto
7551369abe Update checksum_checker.sh
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-23 08:33:58 +02:00
LocalAI [bot]
79915bcd11 models(gallery): ⬆️ update checksum (#2383)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-23 01:10:15 +00:00
LocalAI [bot]
c8d7d14a37 ⬆️ Update go-skynet/go-bert.cpp (#1225)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-22 23:42:38 +00:00
LocalAI [bot]
c56bc0de98 ⬆️ Update ggerganov/whisper.cpp (#2361)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-23 01:02:57 +02:00
Ettore Di Giacinto
3a9408363b deps(llama.cpp): update and adapt API changes (#2381)
deps(llama.cpp): update and rename function

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-23 01:02:11 +02:00
Ettore Di Giacinto
21a12c2cdd ci(checksum_checker): do get sha from hf API when available (#2380)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-22 23:51:02 +02:00
Ettore Di Giacinto
371d0cc1f7 ci: generate specific image for intel builds (#2374)
ci: fix intel images until they are fixed upstream

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-22 23:35:39 +02:00
Ettore Di Giacinto
23fa92bec0 models(gallery): add hercules and helpingAI (#2376)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-22 22:42:41 +02:00
Ettore Di Giacinto
f91e4e5c03 ci: correctly build p2p in GO_TAGS (#2369)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-22 10:15:36 +02:00
Ettore Di Giacinto
6cbe6a4f99 models(gallery): add phi-3-medium-4k-instruct (#2367)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-22 08:32:30 +02:00
Ettore Di Giacinto
491e1d752b feat(functions): relax mixedgrammars (#2365)
* feat(functions): relax mixedgrammars

Extend the functionality further: when mixed mode is enabled,
also tolerate both strings and JSON in the result. In this case we make
sure that the JSON can be correctly parsed.

This also updates the examples and the gallery model to configure the
grammar.

The changeset also breaks the current function/grammar configuration, as it
now reserves a stanza in the YAML config.

For example:

```yaml
function:
  grammar:
    # This allows the grammar to also return messages
    mixed_mode: true
    # Prefix to add to the grammar
    # prefix: '<tool_call>\n'
    # Force parallel calls in the grammar
    # parallel_calls: true
```
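
A minimal sketch of what tolerating both strings and JSON in a result could
look like for a consumer of the output. This is not the LocalAI implementation
(which is written in Go); `parse_mixed_result` is a hypothetical helper used
only for illustration:

```python
import json

def parse_mixed_result(result):
    """Classify a model result as a JSON function call or a free-form message.

    Hypothetical helper: returns ("function_call", obj) when the result parses
    as a JSON object, and ("message", text) otherwise.
    """
    try:
        parsed = json.loads(result)
    except json.JSONDecodeError:
        # Not valid JSON: treat the whole result as a plain message.
        return ("message", result)
    if isinstance(parsed, dict):
        return ("function_call", parsed)
    # Valid JSON but not an object (e.g. a bare number): keep it as text.
    return ("message", result)

kind_json, payload = parse_mixed_result('{"name": "search", "arguments": {"q": "llama"}}')
kind_text, text = parse_mixed_result("Sure, here is a summary.")
```

The point of the sketch is only that a JSON parse failure is not an error in
mixed mode; the result is then handled as a regular message.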

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor, add a way to disable mixed json and freestring

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix linting issues

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-22 00:14:16 +02:00
nold
1542c58466 fix(gallery): checksum Meta-Llama-3-70B-Instruct.Q4_K_M.gguf - #2364 (#2366)
Signed-off-by: Gerrit Pannek <nold@gnu.one>
2024-05-21 21:51:48 +02:00
Ettore Di Giacinto
1a3dedece0 dependencies(grpcio): bump to fix CI issues (#2362)
feat(grpcio): bump to fix CI issues

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-21 14:33:47 +02:00
Ettore Di Giacinto
a58ff00ab1 models(gallery): add stheno (#2358)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 19:18:14 +02:00
Ettore Di Giacinto
fdb45153fe feat(llama.cpp): Totally decentralized, private, distributed, p2p inference (#2343)
* feat(llama.cpp): Enable decentralized, distributed inference

https://github.com/mudler/LocalAI/pull/2324 introduced distributed inferencing, thanks to
@rgerganov's implementation in https://github.com/ggerganov/llama.cpp/pull/6829 in upstream llama.cpp, so
it is now possible to distribute the workload to remote llama.cpp gRPC servers.

This changeset now uses mudler/edgevpn to establish a secure, distributed network between the nodes using a shared token.
The token is generated automatically when starting the server with the `--p2p` flag, and can be used by starting the workers
with `local-ai worker p2p-llama-cpp-rpc` by passing the token via environment variable (TOKEN) or with args (--token).

As per how mudler/edgevpn works, a network is established between the server and the workers with DHT and mDNS discovery protocols;
the llama.cpp rpc server is automatically started and exposed to the underlying p2p network so the API server can connect to it.

When the HTTP server is started, it will discover the workers in the network and automatically create the port-forwards to the service locally.
Then llama.cpp is configured to use the services.

This feature is behind the "p2p" GO_TAGS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* go mod tidy

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: add p2p tag

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* better message

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 19:17:59 +02:00
Ettore Di Giacinto
16474bfb40 build: add sha (#2356)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 18:02:19 +02:00
Ettore Di Giacinto
5a6d120a56 feat(functions): don't use yaml.MapSlice (#2354)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 08:31:06 +02:00
Ettore Di Giacinto
7a480bb16f models(gallery): add LocalAI-Llama3-8b-Function-Call-v0.2-GGUF (#2355)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 00:59:17 +02:00
LocalAI [bot]
053531e434 ⬆️ Update ggerganov/whisper.cpp (#2352)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-19 22:23:02 +00:00
LocalAI [bot]
b7ab4f25d9 ⬆️ Update ggerganov/llama.cpp (#2351)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-19 22:22:03 +00:00
Ettore Di Giacinto
73566a2bb2 feat(functions): allow to use JSONRegexMatch unconditionally (#2349)
* feat(functions): allow to use JSONRegexMatch unconditionally

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(functions): make json_regex_match a list

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 18:24:49 +02:00
Ettore Di Giacinto
8ccd5ab040 feat(webui): statically embed js/css assets (#2348)
* feat(webui): statically embed js/css assets

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* update font assets

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 18:24:27 +02:00
Ettore Di Giacinto
5a3db730b9 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-19 16:37:10 +02:00
Ettore Di Giacinto
8ad669339e add openvoice backend (#2334)
Wip openvoice
2024-05-19 16:27:08 +02:00
Ettore Di Giacinto
a10a952085 models(gallery): update poppy porpoise mmproj (#2346)
models(gallery): update poppy porpoise mmproj

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 13:26:02 +02:00
Ettore Di Giacinto
b37447cac5 models(gallery): add master-yi (#2345)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 13:25:29 +02:00
Ettore Di Giacinto
f2d182a2eb models(gallery): add anita (#2344)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 13:25:16 +02:00
lenaxia
6b6c8cdd5f feat(functions): Enable true regex replacement for the regexReplacement option (#2341)
* Adding regex capabilities to ParseFunctionCall replacement

Signed-off-by: Lenaxia <github@47north.lat>

* Adding tests for the regex replace in ParseFunctionCall

Signed-off-by: Lenaxia <github@47north.lat>

* Fixing tests and adding a test case to validate double quote replacement works

Signed-off-by: Lenaxia <github@47north.lat>

* Make Regex replacement stable, drop lookaheads

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Lenaxia <github@47north.lat>
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Lenaxia <github@47north.lat>
Co-authored-by: mudler <mudler@localai.io>
2024-05-19 01:29:10 +02:00
LocalAI [bot]
5f35e85e86 ⬆️ Update ggerganov/llama.cpp (#2342)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-18 21:06:29 +00:00
Ettore Di Giacinto
02f1b477df feat(functions): simplify parsing, read functions as list (#2340)
Signed-off-by: mudler <mudler@localai.io>
2024-05-18 09:35:28 +02:00
LocalAI [bot]
9ab8f8f5e0 ⬆️ Update ggerganov/llama.cpp (#2339)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-17 21:13:01 +00:00
LocalAI [bot]
9a255d6453 ⬆️ Update ggerganov/llama.cpp (#2337)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-16 21:53:19 +00:00
Ettore Di Giacinto
e0ef9e2bb9 models(gallery): add yi 6/9b, sqlcoder, sfr-iterative-dpo (#2335)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-16 20:05:20 +02:00
cryptk
86627b27f7 fix: add setuptools to all requirements-intel.txt files for python backends (#2333)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-16 19:15:46 +02:00
LocalAI [bot]
4e92569d45 ⬆️ Update ggerganov/whisper.cpp (#2329)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-15 22:24:06 +00:00
Ettore Di Giacinto
f7508e3888 models(gallery): add hermes-2-theta-llama-3-8b (#2331)
Signed-off-by: mudler <mudler@localai.io>
2024-05-16 00:22:32 +02:00
Aleksandr Oleinikov
badfc16df1 fix(gallery) Correct llama3-8b-instruct model file (#2330)
Correct llama3-8b-instruct model file

This must be a mistake because the config tries to use a model file that is different from the one actually being downloaded.
I assumed the downloaded file is what should be used so I corrected the specified model file to that

Signed-off-by: Aleksandr Oleinikov <10602045+tannisroot@users.noreply.github.com>
2024-05-16 00:22:05 +02:00
LocalAI [bot]
b584dcf18a ⬆️ Update ggerganov/llama.cpp (#2316)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-15 22:20:37 +00:00
Ettore Di Giacinto
4c845fb47d Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-15 23:56:52 +02:00
Ettore Di Giacinto
07c0559d06 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-15 23:56:22 +02:00
Ettore Di Giacinto
beb598e4f9 feat(functions): mixed JSON BNF grammars (#2328)
feat(functions): support mixed JSON BNF grammar

This PR provides new options to control how functions are extracted from
the LLM output, and also provides more control over how JSON grammars can
be used (also in combination).

New YAML settings introduced:

- `grammar_message`: when enabled, the generated grammar can also
  produce plain strings and not only JSON objects. This allows the LLM to
  either respond freely or use JSON.
- `grammar_prefix`: prefixes a string to the JSON grammar
  definition.
- `replace_results`: a map of strings to replace in the LLM
  result.
As an example, consider the following settings for Hermes-2-Pro-Mistral,
which allow extracting both JSON results coming from the model, and the
ones coming from the grammar:

```yaml
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to add to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # Without grammar uncomment the lines below
  # Warning: this is relying only on the capability of the
  # LLM model to generate the correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
```

Note: To disable grammar usage entirely in the example above, uncomment the
`no_grammar` and `json_regex_match` lines.
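
As a rough illustration of how `json_regex_match` and `replace_results` could
act on a model result, here is a minimal sketch. It is not the LocalAI
implementation (which is written in Go); the helper name, the plain string
replacement, and the order of the two steps are assumptions made for the
example:

```python
import json
import re

# Values mirroring the YAML settings shown above.
JSON_REGEX_MATCH = r"(?s)<tool_call>(.*?)</tool_call>"
REPLACE_RESULTS = {"<tool_call>": "", "'": '"'}

def extract_function_call(llm_output):
    """Hypothetical helper: pull a JSON function call out of raw LLM text."""
    match = re.search(JSON_REGEX_MATCH, llm_output)
    # Fall back to the whole output when the tags are missing.
    candidate = match.group(1) if match else llm_output
    # Apply the replacements as plain string substitutions (assumption).
    for old, new in REPLACE_RESULTS.items():
        candidate = candidate.replace(old, new)
    return json.loads(candidate.strip())

raw = "<tool_call>\n{'name': 'get_weather', 'arguments': {'city': 'Rome'}}\n</tool_call>"
call = extract_function_call(raw)
```

Replacing single quotes with double quotes is what makes the single-quoted
output above parse as JSON; it would also rewrite apostrophes inside string
values, so it only suits models whose output needs that correction.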

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-15 20:03:18 +02:00
Ettore Di Giacinto
c89271b2e4 feat(llama.cpp): add distributed llama.cpp inferencing (#2324)
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: let tweak how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring, allow disable auto-detection of backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <mudler@localai.io>

* ci: add ccache

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-15 01:17:02 +02:00
Ettore Di Giacinto
29909666c3 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-15 00:33:16 +02:00
LocalAI [bot]
566b5cf2ee ⬆️ Update ggerganov/whisper.cpp (#2326)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-14 21:17:46 +00:00
Sertaç Özercan
a670318a9f feat: auto select llama-cpp cuda runtime (#2306)
* auto select cpu variant

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix metal

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix path

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* cuda

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* auto select cuda

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* update test

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* select CUDA backend only if present

Signed-off-by: mudler <mudler@localai.io>

* ci: keep cuda bin in path

Signed-off-by: mudler <mudler@localai.io>

* Makefile: make dist now builds also cuda

Signed-off-by: mudler <mudler@localai.io>

* Keep pushing fallback in case auto-flagset/nvidia fails

There could be other reasons for which the default binary may fail. For example, we might have detected an Nvidia GPU,
but the user might not have the drivers/CUDA libraries installed on the system, so it would fail to start.

We keep the llama.cpp fallback at the end of the llama.cpp backends so that loading can fall back to it in case things go wrong

Signed-off-by: mudler <mudler@localai.io>

* Do not build cuda on MacOS

Signed-off-by: mudler <mudler@localai.io>

* cleanup

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: mudler <mudler@localai.io>
2024-05-14 19:40:18 +02:00
Ettore Di Giacinto
84e2407afa feat(functions): allow to set JSON matcher (#2319)
Signed-off-by: mudler <mudler@localai.io>
2024-05-14 09:39:20 +02:00
Ettore Di Giacinto
c4186f13c3 feat(functions): support models with no grammar and no regex (#2315)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-14 00:32:32 +02:00
LocalAI [bot]
4ac7956f68 ⬆️ Update ggerganov/whisper.cpp (#2317)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-13 22:25:14 +00:00
Ettore Di Giacinto
e49ea0123b feat(llama.cpp): add flash_attention and no_kv_offloading (#2310)
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 19:07:51 +02:00
Ettore Di Giacinto
7123d07456 models(gallery): add orthocopter (#2313)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:45:58 +02:00
Ettore Di Giacinto
2db22087ae models(gallery): add lumimaidv2 (#2312)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:44:44 +02:00
Ettore Di Giacinto
fa7b2aee9c models(gallery): add Bunny-llama (#2311)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:44:25 +02:00
Ettore Di Giacinto
4d70b6fb2d models(gallery): add aura-llama-Abliterated (#2309)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:44:10 +02:00
Sertaç Özercan
e2c3ffb09b feat: auto select llama-cpp cpu variant (#2305)
* auto select cpu variant

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix metal

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix path

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-13 11:37:52 +02:00
LocalAI [bot]
b4cb22f444 ⬆️ Update ggerganov/llama.cpp (#2303)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-12 21:18:59 +00:00
LocalAI [bot]
5534b13903 feat(swagger): update swagger (#2302)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-12 21:00:18 +00:00
fakezeta
5b79bd04a7 add setuptools for openvino (#2301)
2024-05-12 19:31:43 +00:00
Ettore Di Giacinto
9d8c705fd9 feat(ui): display number of available models for installation (#2298)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 14:24:36 +02:00
Ettore Di Giacinto
310b2171be models(gallery): add llama-3-refueled (#2297)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 09:39:58 +02:00
Ettore Di Giacinto
98af0b5d85 models(gallery): add jsl-medllama-3-8b-v2.0 (#2296)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 09:38:05 +02:00
Ettore Di Giacinto
ca14f95d2c models(gallery): add l3-chaoticsoliloquy-v1.5-4x8b (#2295)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 09:37:55 +02:00
Ikko Eltociear Ashimine
1b69b338c0 docs: Update semantic-todo/README.md (#2294)
seperate -> separate

Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-05-12 09:02:11 +02:00
cryptk
88942e4761 fix: add missing openvino/optimum/etc libraries for Intel, fixes #2289 (#2292)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-12 09:01:45 +02:00
Ettore Di Giacinto
efa32a2677 feat(grammar): support models with specific construct (#2291)
When enabling grammar with functions, it might be useful to
allow more flexibility to support models that are fine-tuned to return
function calls of the form { "name": "function_name", "arguments": {...} }
rather than { "function": "function_name", "arguments": {...} }.

This might call for a more generic approach later on, but for the moment we can easily support both,
as we just have to specify different types.

If needed, we can expand on this later
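
A minimal sketch of the two shapes described above, seen from the side of
code that consumes the result. `normalize_call` is a hypothetical helper, not
part of LocalAI; it only shows that both forms carry the same information:

```python
import json

def normalize_call(raw):
    """Hypothetical helper: accept either call shape, return (name, arguments)."""
    obj = json.loads(raw)
    # Prefer the "name" key, fall back to the "function" key.
    name = obj.get("name", obj.get("function"))
    if name is None:
        raise ValueError("no function name found in call")
    return name, obj.get("arguments", {})

by_name = normalize_call('{"name": "get_weather", "arguments": {"city": "Rome"}}')
by_function = normalize_call('{"function": "get_weather", "arguments": {"city": "Rome"}}')
```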

Signed-off-by: mudler <mudler@localai.io>
2024-05-12 01:13:22 +02:00
LocalAI [bot]
dfc420706c ⬆️ Update ggerganov/llama.cpp (#2290)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-11 21:16:34 +00:00
cryptk
e2de8a88f7 feat: create bash library to handle install/run/test of python backends (#2286)
* feat: create bash library to handle install/run/test of python backends

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* chore: minor cleanup

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: remove incorrect LIMIT_TARGETS from parler-tts

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: update runUnittests to handle running tests from a custom test file

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* chore: document runUnittests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-11 18:32:46 +02:00
Ettore Di Giacinto
7f4febd6c2 models(gallery): add Llama-3-8B-Instruct-abliterated (#2288)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-11 10:10:57 +02:00
LocalAI [bot]
93e581dfd0 ⬆️ Update ggerganov/llama.cpp (#2285)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-10 21:09:22 +00:00
Ettore Di Giacinto
cf513efa78 Update openai-functions.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-10 17:09:51 +02:00
Ettore Di Giacinto
9e8b34427a Update openai-functions.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-10 17:05:16 +02:00
Ettore Di Giacinto
88d0aa1e40 docs: update function docs
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-10 17:03:56 +02:00
Ettore Di Giacinto
9b09eb005f build: do not specify a BUILD_ID by default (#2284)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-10 16:01:55 +02:00
Ettore Di Giacinto
4db41b71f3 models(gallery): add aloe (#2283)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-10 16:01:47 +02:00
cryptk
28a421cb1d feat: migrate python backends from conda to uv (#2215)
* feat: migrate diffusers backend from conda to uv

  - replace conda with UV for diffusers install (prototype for all
    extras backends)
  - add ability to build docker with one/some/all extras backends
    instead of all or nothing

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate autogptq bark coqui from conda to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: convert exllama over to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate exllama2 to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate mamba to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate parler to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate petals to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: fix tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate rerankers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate sentencetransformers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: install uv for tests-linux

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: make sure file exists before installing on intel images

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers backend to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers-musicgen to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vall-e-x to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vllm to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add uv install to the rest of test-extra.yml

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust file perms on all install/run/test scripts

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add missing accelerate dependencies

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add some more missing dependencies to python backends

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: parler tests venv py dir fix

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: correct filename for transformers-musicgen tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust the pwd for valle tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: cleanup and optimization work for uv migration

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add setuptools to requirements-install for mamba

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: more size optimization work

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: make installs and tests more consistent, cleanup some deps

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: cleanup

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: mamba backend is cublas only

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: uncomment lines in makefile

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-10 15:08:08 +02:00
LocalAI [bot]
e6768097f4 ⬆️ Update docs version mudler/LocalAI (#2280)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-10 09:10:00 +02:00
LocalAI [bot]
18a04246fa ⬆️ Update ggerganov/llama.cpp (#2281)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-09 22:18:49 +00:00
LocalAI [bot]
f69de3be0d models(gallery): ⬆️ update checksum (#2278)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-09 12:21:24 +00:00
Ettore Di Giacinto
650ae620c5 ci: get latest git version
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 11:33:16 +02:00
Ettore Di Giacinto
6a209cbef6 ci: get file name correctly in checksum_checker.sh
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 10:57:23 +02:00
Ettore Di Giacinto
9786bb826d ci: try to fix checksum_checker.sh
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 09:34:07 +02:00
Ettore Di Giacinto
9b4c6f348a Update checksum_checker.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 00:57:22 +02:00
Ettore Di Giacinto
cb6ddb21ec Update checksum_checker.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 00:55:48 +02:00
Ettore Di Giacinto
0baacca605 Update checksum_checker.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 00:54:35 +02:00
Ettore Di Giacinto
222d714ec7 Update checksum_checker.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 00:51:57 +02:00
Ettore Di Giacinto
fd2d89d37b Update checksum_checker.sh
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 00:43:16 +02:00
Ettore Di Giacinto
6440b608dc Update checksum_checker.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 00:42:48 +02:00
Ettore Di Giacinto
1937118eab Update checksum_checker.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-09 00:34:56 +02:00
Ettore Di Giacinto
bc272d1e4b ci: add checksum checker pipeline (#2274)
Signed-off-by: mudler <mudler@localai.io>
2024-05-09 00:31:27 +02:00
LocalAI [bot]
d651f390cd ⬆️ Update ggerganov/whisper.cpp (#2273)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-08 22:11:10 +00:00
Ettore Di Giacinto
ea777f8716 models(gallery): update SHA for einstein
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-08 23:40:58 +02:00
LocalAI [bot]
eca5200fbd ⬆️ Update ggerganov/llama.cpp (#2272)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-08 21:34:56 +00:00
Ettore Di Giacinto
0809e9e7a0 models(gallery): fix openbiollm typo
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-08 23:19:43 +02:00
LocalAI [bot]
b66baa3db6 ⬆️ Update docs version mudler/LocalAI (#2271)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-08 21:10:30 +00:00
Ettore Di Giacinto
6eb77f0d3a models(gallery): add tiamat (#2269)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-08 19:59:42 +02:00
Ettore Di Giacinto
b20354b3ad models(gallery): add aurora (#2270)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-08 19:55:39 +02:00
Ettore Di Giacinto
d6f76c75e1 models(gallery): add kunocchini (#2268)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-08 19:52:08 +02:00
Ettore Di Giacinto
ed4f412f1c models(gallery): add lumimaid variant (#2267)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-08 19:51:53 +02:00
Ettore Di Giacinto
5bf56e01aa models(gallery): add tess (#2266)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-08 19:51:44 +02:00
Ettore Di Giacinto
5ff5f0b393 fix(ux): fix small glitches (#2265)
also drop duplicates for displaying in-progress model ops

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-08 19:34:33 +02:00
Ettore Di Giacinto
6559ac11b1 feat(ui): prompt for chat, support vision, enhancements (#2259)
* feat(ui): allow to set system prompt for chat

Make also the models in the index clickable, and display as table

Fixes #2257

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(vision): support also png with base64 input

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): support vision and upload of files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* display the processed image

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make trust remote code stand out

Signed-off-by: mudler <mudler@localai.io>

* feat(ui): track in progress job across index/model gallery

Signed-off-by: mudler <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-08 00:42:34 +02:00
Ettore Di Giacinto
02ec546dd6 models(gallery): Add Soliloquy (#2260)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-08 00:14:19 +02:00
LocalAI [bot]
995aa5ed21 ⬆️ Update ggerganov/llama.cpp (#2263)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-07 21:39:12 +00:00
Michael Mior
e28ba4b807 Add missing Homebrew dependencies (#2256)
Signed-off-by: Michael Mior <michael.mior@gmail.com>
Signed-off-by: Michael Mior <mmior@mail.rit.edu>
2024-05-07 16:34:30 +00:00
Daniel
d1e3436de5 Update readme: add ShellOracle to community integrations (#2254)
Signed-off-by: Daniel Copley <djcopley@users.noreply.github.com>
2024-05-07 08:39:58 +02:00
Dave
d3ddc9e4aa UI: flag trust_remote_code to users // favicon support (#2253)
* attempt to indicate trust_remote_code in some way

* bonus: favicon support!

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-05-07 08:39:23 +02:00
fakezeta
fea9522982 fix: OpenVINO winograd always disabled (#2252)
Winograd convolutions were always disabled giving error when inference device was CPU.
This commit implement logic to disable Winograd convolutions only if CPU or NPU are declared.
2024-05-07 08:38:58 +02:00
Ettore Di Giacinto
fe055d4b36 feat(webui): ux improvements (#2247)
* ux: change welcome when there are no models installed

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ux: filter

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ux: show tags in filter

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* wip

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make tags clickable

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* allow to delete models from the list

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ui: display icon of installed models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* gallery: remove gallery file when removing model

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(gallery): show a re-install button

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make filter buttons, rename Gallery field

Signed-off-by: mudler <mudler@localai.io>

* show again buttons at end of operations

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-07 01:17:07 +02:00
LocalAI [bot]
581b894789 ⬆️ Update ggerganov/llama.cpp (#2255)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-06 21:28:07 +00:00
Ettore Di Giacinto
477655f6e6 models(gallery): average_norrmie reupload
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-06 19:56:24 +02:00
fakezeta
169d8d21ff gallery: Added some OpenVINO models (#2249)
* Added some OpenVINO models

Added Phi-3 trust_remote_code: true
Added Hermes 2 Pro Llama3
Added Multilingual-E5-base embedding model with OpenVINO acceleration (CPU and XPU)
Added all-MiniLM-L6-v2 with OpenVINO acceleration (CPU and XPU)

* Added Remote Code for phi, fixed error on Yamllint

* update openvino.yaml

I need to go to rest: today is not my day...
2024-05-06 10:52:05 +02:00
LocalAI [bot]
c5475020fe ⬆️ Update ggerganov/llama.cpp (#2251)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-05 21:16:00 +00:00
Dave
b52ff1249f test: check the response URL during image gen in app_test.go (#2248)
test: actually check the response URL from image gen

Signed-off-by: Dave Lee <dave@gray101.com>
2024-05-05 18:46:33 +00:00
Ettore Di Giacinto
c5798500cb feat(single-build): generate single binaries for releases (#2246)
* feat(single-build): generate single binaries for releases

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop old targets

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-05 17:20:51 +02:00
Ettore Di Giacinto
67ad3532ec Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-05 15:45:55 +02:00
Ettore Di Giacinto
5cb96fe7df models(gallery): add openbiollm (#2245)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-05 15:19:46 +02:00
Ettore Di Giacinto
810e8e5855 models(gallery): add lumimaid (#2244)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-05 15:19:33 +02:00
Ettore Di Giacinto
f3bcc648e7 models(gallery): add icon for instruct-coder
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-05 12:20:06 +02:00
Ettore Di Giacinto
3096566333 models(gallery): poppy porpoise fix
correct mmproj URL

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-05 11:56:07 +02:00
Ettore Di Giacinto
f50c6a4e88 models(gallery): update poppy porpoise (#2243)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-05 11:19:09 +02:00
Ettore Di Giacinto
ab4ee54855 models(gallery): add llama3-instruct-coder (#2242)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-05 11:18:50 +02:00
Ettore Di Giacinto
f2d35062d4 models(gallery): moondream2 fixups
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-05 10:49:04 +02:00
Ettore Di Giacinto
b69ff46c7e feat(startup): show CPU/GPU information with --debug (#2241)
Signed-off-by: mudler <mudler@localai.io>
2024-05-05 09:10:23 +02:00
Ettore Di Giacinto
117c9873e1 fix(webui): display small navbar with smaller screens (#2240)
Signed-off-by: mudler <mudler@localai.io>
2024-05-04 23:38:39 +02:00
LocalAI [bot]
17e94fbcb1 ⬆️ Update ggerganov/llama.cpp (#2239)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-04 21:26:22 +00:00
Ettore Di Giacinto
92f7feb874 models(gallery): add llama3-llava (#2238)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-04 22:43:11 +02:00
Ettore Di Giacinto
b70e2bffa3 models(gallery): add moondream2 (#2237)
* models(gallery): add moondream2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(gallery): fix typo for TTS models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(gallery): add base config for moondream2 and icon

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* linter fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-04 18:29:04 +02:00
nold
06c43ca285 fix(gallery): hermes-2-pro-llama3 models checksum changed (#2236)
fix(gallery): hermes-2-pro-llama3 models checksum

Signed-off-by: Gerrit Pannek <nold@gnu.one>
2024-05-04 17:59:54 +02:00
Ettore Di Giacinto
530bec9c64 feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232)
* feat(initializer): do not specify backends to autoload

We can simply try to autoload the backends extracted in the asset dir.
This will allow to build variants of the same backend (for e.g. with different instructions sets),
so to have a single binary for all the variants.

Signed-off-by: mudler <mudler@localai.io>

* refactor(prepare): refactor out llama.cpp prepare steps

Make it so are idempotent and that we can re-build

Signed-off-by: mudler <mudler@localai.io>

* [TEST] feat(build): build noavx version along

Signed-off-by: mudler <mudler@localai.io>

* build: make build parallel

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build: do not override CMAKE_ARGS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build: add fallback variant

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(huggingface-langchain): fail if no token is set

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(huggingface-langchain): rename

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: do not autoload local-store

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: give priority between the listed backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: mudler <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-04 17:56:12 +02:00
fakezeta
fa10302dd2 docs: updated Transformer parameters description (#2234)
updated Transformer parameters
2024-05-04 10:45:25 +02:00
Ettore Di Giacinto
54faaa87ea fix(webui): correct documentation URL for text2img (#2233)
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
2024-05-04 00:25:13 +00:00
dependabot[bot]
daba8a85f9 build(deps): bump tqdm from 4.65.0 to 4.66.3 in /examples/langchain/langchainpy-localai-example in the pip group across 1 directory (#2231)
build(deps): bump tqdm

Bumps the pip group with 1 update in the /examples/langchain/langchainpy-localai-example directory: [tqdm](https://github.com/tqdm/tqdm).


Updates `tqdm` from 4.65.0 to 4.66.3
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.65.0...v4.66.3)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-03 23:15:06 +00:00
LocalAI [bot]
ac0f3d6e82 ⬆️ Update ggerganov/whisper.cpp (#2230)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-03 22:16:26 +00:00
LocalAI [bot]
da0b6a89ae ⬆️ Update ggerganov/llama.cpp (#2229)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-03 21:39:28 +00:00
LocalAI [bot]
929a68c06d ⬆️ Update docs version mudler/LocalAI (#2228)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-03 21:18:11 +00:00
cryptk
a0aa5d01a1 feat: update ROCM and use smaller image (#2196)
* feat: update ROCM and use smaller image

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add call to ldconfig to fix AMDs broken library packages

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-03 18:46:49 +02:00
Ettore Di Giacinto
dc834cc9d2 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-03 09:42:35 +02:00
Ettore Di Giacinto
b58274b8a2 feat(ui): support multilineand style ul (#2226)
* feat(ui/chat): handle multiline in the input field

Signed-off-by: mudler <mudler@localai.io>

* feat(ui/chat): correctly display multiline messages

Signed-off-by: mudler <mudler@localai.io>

* feat(ui/chat): add list style

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: mudler <mudler@localai.io>
2024-05-03 00:43:02 +02:00
Ettore Di Giacinto
a31d00d904 feat(aio): switch to llama3-based for LLM (#2225)
Signed-off-by: mudler <mudler@localai.io>
2024-05-03 00:41:45 +02:00
LocalAI [bot]
2cc1bd85af ⬆️ Update ggerganov/llama.cpp (#2224)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-02 21:23:40 +00:00
Ettore Di Giacinto
2c5a46bc34 feat(ux): Add chat, tts, and image-gen pages to the WebUI (#2222)
* feat(webui): Add chat page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(webui): Add image-gen page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(webui): Add tts page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-02 21:14:10 +02:00
Ettore Di Giacinto
f7f8b4804b models(gallery): Add Hermes-2-Pro-Llama-3-8B-GGUF (#2218)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-02 18:31:13 +02:00
Ettore Di Giacinto
e5bd9a76c7 models(gallery): add wizardlm2 (#2209)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-02 18:31:02 +02:00
fakezeta
4690b534e0 feat: user defined inference device for CUDA and OpenVINO (#2212)
user defined inference device

configuration via main_gpu parameter
2024-05-02 09:54:29 +02:00
LocalAI [bot]
6a7a7996bb ⬆️ Update ggerganov/llama.cpp (#2213)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-01 21:19:44 +00:00
Ettore Di Giacinto
962ebbaf77 models(gallery): fixup phi-3 sha
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-01 23:06:58 +02:00
LocalAI [bot]
f90d56d371 ⬆️ Update ggerganov/llama.cpp (#2203)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-30 21:53:31 +00:00
Ettore Di Giacinto
445cfd4db3 models(gallery): add guillaumetell (#2195)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-30 23:24:41 +02:00
Ettore Di Giacinto
b24d44dc56 models(gallery): add suzume-llama-3-8B-multilingual-gguf (#2194)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-30 23:24:28 +02:00
Ettore Di Giacinto
cd31f8d865 models(gallery): add lexifun (#2193)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-30 23:24:13 +02:00
Chris Jowett
970cb3a219 chore: update go-stablediffusion to latest commit with Make jobserver fix
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-30 15:59:28 -05:00
cryptk
f7aabf1b50 fix: bring everything onto the same GRPC version to fix tests (#2199)
fix: more places where we are installing grpc that need a version specified
fix: attempt to fix metal tests
fix: metal/brew is forcing an update, they don't have 1.58 available anymore

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-30 19:12:15 +00:00
fakezeta
e38610e521 feat: OpenVINO acceleration for embeddings in transformer backend (#2190)
OpenVINO acceleration for embeddings

New argument type: OVModelForFeatureExtraction
2024-04-30 10:13:04 +02:00
cryptk
3754f154ee feat: organize Dockerfile into distinct sections (#2181)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-30 10:12:19 +02:00
LocalAI [bot]
29d7812344 ⬆️ Update ggerganov/whisper.cpp (#2188)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-29 22:16:04 +00:00
cryptk
5fd46175dc fix: ensure GNUMake jobserver is passed through to whisper.cpp build (#2187)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-29 16:40:50 -05:00
LocalAI [bot]
52a268c38c ⬆️ Update ggerganov/llama.cpp (#2189)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-29 21:36:30 +00:00
dependabot[bot]
53c3842bc2 build(deps): bump dependabot/fetch-metadata from 2.0.0 to 2.1.0 (#2186)
Bumps [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata) from 2.0.0 to 2.1.0.
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](https://github.com/dependabot/fetch-metadata/compare/v2.0.0...v2.1.0)

---
updated-dependencies:
- dependency-name: dependabot/fetch-metadata
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 21:12:37 +00:00
Dave
c4f958e11b refactor(application): introduce application global state (#2072)
* start breaking up the giant channel refactor now that it's better understood - easier to merge bites

Signed-off-by: Dave Lee <dave@gray101.com>

* add concurrency and base64 back in, along with new base64 tests.

Signed-off-by: Dave Lee <dave@gray101.com>

* Automatic rename of whisper.go's Result to TranscriptResult

Signed-off-by: Dave Lee <dave@gray101.com>

* remove pkg/concurrency - significant changes coming in split 2

Signed-off-by: Dave Lee <dave@gray101.com>

* fix comments

Signed-off-by: Dave Lee <dave@gray101.com>

* add list_model service as another low-risk service to get it out of the way

Signed-off-by: Dave Lee <dave@gray101.com>

* split backend config loader into seperate file from the actual config struct. No changes yet, just reduce cognative load with smaller files of logical blocks

Signed-off-by: Dave Lee <dave@gray101.com>

* rename state.go ==> application.go

Signed-off-by: Dave Lee <dave@gray101.com>

* fix lost import?

Signed-off-by: Dave Lee <dave@gray101.com>

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-29 17:42:37 +00:00
Ettore Di Giacinto
147440b39b docs: add reference for concurrent requests
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-29 18:31:50 +02:00
Ettore Di Giacinto
baff5ff8c2 models(gallery): add openvino models (#2184)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-29 18:17:47 +02:00
Ettore Di Giacinto
ea13863221 models(gallery): add llama3-32k (#2183)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-29 18:17:39 +02:00
cryptk
93ca56086e update go-tinydream to latest commit (#2182)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-29 15:17:09 +02:00
Dave
11c48a0004 fix: security scanner warning noise: error handlers part 2 (#2145)
check off a few more error handlers

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-29 15:11:42 +02:00
fakezeta
b7ea9602f5 fix: undefined symbol: iJIT_NotifyEvent in import torch ##2153 (#2179)
* add  extra index to Intel repository

* Update install.sh
2024-04-29 15:11:09 +02:00
Dave
982dc6a2bd fix: github bump_docs.sh regex to drop emoji and other text (#2180)
fix: bump_docs regex

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-29 03:55:29 +00:00
Sijia Lu
74d903acca [Documentations] Removed invalid numberings from troubleshooting mac (#2174)
* updated troubleshooting mac

Signed-off-by: LeonSijiaLu <leonsijialu1@gmail.com>

* prepend -

Signed-off-by: LeonSijiaLu <leonsijialu1@gmail.com>

---------

Signed-off-by: LeonSijiaLu <leonsijialu1@gmail.com>
2024-04-29 02:21:51 +00:00
LocalAI [bot]
5fef3b0ff1 ⬆️ Update ggerganov/whisper.cpp (#2177)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-28 22:32:45 +00:00
Ettore Di Giacinto
0674893649 Update .env
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-28 23:56:10 +02:00
Ettore Di Giacinto
e8d44447ad feat(gallery): support model deletion (#2173)
* feat(gallery): op now supports deletion of models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Wire things with WebUI(WIP)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor improvements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-28 23:42:46 +02:00
Ettore Di Giacinto
a24cd4fda0 docs: enhance and condense few sections (#2178)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-28 23:41:59 +02:00
LocalAI [bot]
01860674c4 ⬆️ Update ggerganov/llama.cpp (#2176)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-28 21:41:12 +00:00
cryptk
987b7ad42d feat: only keep the build artifacts from the grpc build (#2172)
* feat: only keep the build artifacts from the grpc build

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: remove separate Cache GRPC build step

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: remove docker inspect step, it is leftover from previous debugging

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-28 19:24:16 +00:00
cryptk
21974fe1d3 fix: swap to WHISPER_CUDA per deprecation message from whisper.cpp (#2170)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-28 17:51:53 +00:00
Sijia Lu
26e1892521 Issue-1720: Updated Build on mac documentations (#2171)
updated build on macs documentation

Signed-off-by: LeonSijiaLu <leonsijialu1@gmail.com>
2024-04-28 19:38:02 +02:00
Ettore Di Giacinto
a78cd67737 Update quickstart.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-28 19:30:23 +02:00
Ettore Di Giacinto
5e243ceaeb docs: update gallery, add rerankers (#2166)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-28 15:54:15 +02:00
QuinnPiers
1a0a6f60a7 docs: update model-gallery.md with correct gallery file (#2163)
* Update model-gallery.md with correct gallery file

The readme points to a file that hasn't been updated in months so when there are announcements about new models, user's won't get them pointing to the old file. Point to the updated files instead.

Signed-off-by: QuinnPiers <167640194+QuinnPiers@users.noreply.github.com>

* Update model-gallery.md

second pass with more understanding

Signed-off-by: QuinnPiers <167640194+QuinnPiers@users.noreply.github.com>

* Update model-gallery.md

Signed-off-by: QuinnPiers <167640194+QuinnPiers@users.noreply.github.com>

* Update model-gallery.md

Signed-off-by: QuinnPiers <167640194+QuinnPiers@users.noreply.github.com>

---------

Signed-off-by: QuinnPiers <167640194+QuinnPiers@users.noreply.github.com>
2024-04-28 12:34:15 +02:00
Ettore Di Giacinto
3179c019af Revert "⬆️ Update docs version mudler/LocalAI" (#2165)
* Revert "⬆️ Update docs version mudler/LocalAI (#2149)"

This reverts commit 56d843c263.

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-28 12:32:49 +02:00
Ettore Di Giacinto
a8089494fd models(gallery): add biomistral-7b (#2161)
* models(gallery): add biomistral-7b

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* add <|end_of_text|> to llama3 as stopword

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-28 12:04:04 +02:00
Ettore Di Giacinto
a248ede222 models(gallery): add Undi95/Llama-3-LewdPlay-8B-evo-GGUF (#2160)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-28 12:02:50 +02:00
Ettore Di Giacinto
0f0ae13ad0 models(gallery): add poppy porpoise (#2158)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-28 12:01:01 +02:00
Ettore Di Giacinto
773d5d23d5 models(gallery): add solana (#2157)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-28 11:57:22 +02:00
LocalAI [bot]
c3982212f9 ⬆️ Update ggerganov/llama.cpp (#2159)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-27 21:32:43 +00:00
Ettore Di Giacinto
7e6bf6e7a1 ci: add auto-label rule for gallery in labeler.yml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-27 19:52:26 +02:00
cryptk
9fc0135991 feat: cleanup Dockerfile and make final image a little smaller (#2146)
* feat: cleanup Dockerfile and make final image a little smaller

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add build-essential to final stage

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: more GRPC cache misses

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: correct for another cause of GRPC cache misses

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: generate new GRPC cache automatically if needed

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: use new GRPC_MAKEFLAGS build arg in GRPC cache generation

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-27 19:48:20 +02:00
Ettore Di Giacinto
164be58445 Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-04-27 18:10:58 +02:00
Ettore Di Giacinto
1f8461767d models(gallery): add average_normie (#2155)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-27 17:59:31 +02:00
Ettore Di Giacinto
935f4c23f6 models(gallery): add SOVL (#2154)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-27 17:53:56 +02:00
Ettore Di Giacinto
4c97406f2b models(gallery): add Einstein v6.1 (#2152)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-27 12:30:15 +02:00
Ettore Di Giacinto
fb2a05ff43 feat(gallery): display job status also during navigation (#2151)
* feat(gallery): keep showing progress also when refreshing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(intel-gpu): better defaults

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: make it thread-safe

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-04-27 09:08:33 +02:00
LocalAI [bot]
030d555995 ⬆️ Update ggerganov/llama.cpp (#2150)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-27 02:18:28 +00:00
LocalAI [bot]
56d843c263 ⬆️ Update docs version mudler/LocalAI (#2149)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-04-26 23:03:10 +00:00
Dave
2dc1fa2474 fix: config_file_watcher.go - root all file reads for safety (#2144)
callHandler() now has all file access rooted within DynamicConfigDir

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-26 16:46:35 +00:00
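
The message of commit 2dc1fa2474 above says file access in callHandler() is now rooted within DynamicConfigDir, but the log does not show how. The following is a minimal sketch of the general idea only, written in Python rather than the project's Go, and it is not the project's implementation: the function name `resolve_within` and the example file names are hypothetical.

```python
import os
import tempfile


def resolve_within(root, relative_path):
    """Resolve relative_path under root, refusing anything that escapes root.

    Hypothetical helper illustrating "rooted" file access; not LocalAI code.
    """
    root_real = os.path.realpath(root)
    # realpath collapses '..' segments and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(root_real, relative_path))
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"{relative_path!r} escapes the configuration directory")
    return candidate


with tempfile.TemporaryDirectory() as config_dir:
    # A file name inside the directory resolves normally.
    print(resolve_within(config_dir, "api_keys.json"))
    # A traversal attempt is rejected instead of being opened.
    try:
        resolve_within(config_dir, "../outside.json")
    except ValueError as err:
        print("rejected:", err)
```

Resolving the path before comparing it matters: a plain string-prefix check on the unresolved path would accept `config/../outside.json`.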
396 changed files with 34107 additions and 3978 deletions

View File

@@ -6,6 +6,11 @@ examples/chatbot-ui/models
examples/rwkv/models
examples/**/models
Dockerfile*
__pycache__
# SonarQube
.scannerwork
.scannerwork
# backend virtual environments
**/venv
backend/python/**/source

9
.env
View File

@@ -10,7 +10,7 @@
#
## Define galleries.
## models will to install will be visible in `/models/available`
-# LOCALAI_GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}]
+# LOCALAI_GALLERIES=[{"name":"localai", "url":"github:mudler/LocalAI/gallery/index.yaml@master"}]
## CORS settings
# LOCALAI_CORS=true
@@ -71,6 +71,11 @@
### Define the number of parallel LLAMA.cpp workers (Defaults to 1)
# LLAMACPP_PARALLEL=1
### Define a list of GRPC Servers for llama-cpp workers to distribute the load
# https://github.com/ggerganov/llama.cpp/pull/6829
# https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md
# LLAMACPP_GRPC_SERVERS=""
### Enable to run parallel requests
# LOCALAI_PARALLEL_REQUESTS=true
@@ -86,4 +91,4 @@
# LOCALAI_WATCHDOG_BUSY=true
#
# Time in duration format (e.g. 1h30m) after which a backend is considered busy
# LOCALAI_WATCHDOG_BUSY_TIMEOUT=5m
# LOCALAI_WATCHDOG_BUSY_TIMEOUT=5m
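
The `LOCALAI_GALLERIES` value in the .env hunk above is a JSON array of objects with `name` and `url` keys. As a rough sketch of how such a value can be decoded, the snippet below parses the example string from the diff. The `github:owner/repo/path@ref` layout is inferred from that example alone, and the helper name `split_github_uri` and the `main` fallback ref are assumptions made for illustration.

```python
import json

# Value copied from the commented-out example in the .env diff.
raw = '[{"name":"localai", "url":"github:mudler/LocalAI/gallery/index.yaml@master"}]'


def split_github_uri(uri, default_ref="main"):
    """Split 'github:owner/repo/path@ref' into its parts.

    Assumption: the layout is inferred from the example value only, and the
    default ref is a placeholder chosen for this sketch.
    """
    prefix = "github:"
    if not uri.startswith(prefix):
        raise ValueError(f"not a github: URI: {uri}")
    path_part, _, ref = uri[len(prefix):].partition("@")
    owner, repo, file_path = path_part.split("/", 2)
    return {"owner": owner, "repo": repo, "path": file_path, "ref": ref or default_ref}


galleries = json.loads(raw)
for gallery in galleries:
    print(gallery["name"], split_github_uri(gallery["url"]))
```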

View File

@@ -2,6 +2,6 @@
set -xe
REPO=$1
-LATEST_TAG=$(curl -s "https://api.github.com/repos/$REPO/releases/latest" | jq -r '.name')
+LATEST_TAG=$(curl -s "https://api.github.com/repos/$REPO/releases/latest" | jq -r '.tag_name')
cat <<< $(jq ".version = \"$LATEST_TAG\"" docs/data/version.json) > docs/data/version.json
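
The hunk above switches the jq filter from `.name` to `.tag_name`. In the GitHub releases API, `name` is the free-form release title while `tag_name` is the git tag, so only the latter is reliably a bare version string. A small sketch of the same selection and update, using a hypothetical, trimmed-down payload (the values are made up for illustration):

```python
import json

# Hypothetical release payload; a real API response carries many more fields.
sample_release = json.loads(
    '{"name": "Release v1.2.3 with extra title text", "tag_name": "v1.2.3"}'
)


def latest_tag(release):
    """Return the git tag of a release object, like `jq -r '.tag_name'`."""
    return release["tag_name"]


# Mirror of the `jq ".version = ..."` update applied to docs/data/version.json.
version_doc = {"version": "v0.0.0"}
version_doc["version"] = latest_tag(sample_release)
print(json.dumps(version_doc))
```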

126
.github/checksum_checker.sh vendored Normal file
View File

@@ -0,0 +1,126 @@
#!/bin/bash
# This scripts needs yq and huggingface_hub to be installed
# to install hugingface_hub run pip install huggingface_hub
# Path to the input YAML file
input_yaml=$1
# Function to download file and check checksum using Python
function check_and_update_checksum() {
model_name="$1"
file_name="$2"
uri="$3"
old_checksum="$4"
idx="$5"
# Download the file and calculate new checksum using Python
new_checksum=$(python3 -c "
import hashlib
from huggingface_hub import hf_hub_download, get_paths_info
import requests
import sys
import os

uri = '$uri'
file_name = uri.split('/')[-1]

# Function to parse the URI and determine download method
def parse_uri(uri):
    if uri.startswith('huggingface://'):
        repo_id = uri.split('://')[1]
        return 'huggingface', repo_id.rsplit('/', 1)[0]
    elif 'huggingface.co' in uri:
        parts = uri.split('/resolve/')
        if len(parts) > 1:
            repo_path = parts[0].split('https://huggingface.co/')[-1]
            return 'huggingface', repo_path
    return 'direct', uri

def calculate_sha256(file_path):
    sha256_hash = hashlib.sha256()
    with open(file_path, 'rb') as f:
        for byte_block in iter(lambda: f.read(4096), b''):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()

download_type, repo_id_or_url = parse_uri(uri)

new_checksum = None

# Decide download method based on URI type
if download_type == 'huggingface':
    # Use HF API to pull sha
    for file in get_paths_info(repo_id_or_url, [file_name], repo_type='model'):
        try:
            new_checksum = file.lfs.sha256
            break
        except Exception as e:
            print(f'Error from Hugging Face Hub: {str(e)}', file=sys.stderr)
            sys.exit(2)
    if new_checksum is None:
        try:
            file_path = hf_hub_download(repo_id=repo_id_or_url, filename=file_name)
        except Exception as e:
            print(f'Error from Hugging Face Hub: {str(e)}', file=sys.stderr)
            sys.exit(2)
else:
    response = requests.get(repo_id_or_url)
    if response.status_code == 200:
        with open(file_name, 'wb') as f:
            f.write(response.content)
        file_path = file_name
    elif response.status_code == 404:
        print(f'File not found: {response.status_code}', file=sys.stderr)
        sys.exit(2)
    else:
        print(f'Error downloading file: {response.status_code}', file=sys.stderr)
        sys.exit(1)

if new_checksum is None:
    new_checksum = calculate_sha256(file_path)
    print(new_checksum)
    os.remove(file_path)
else:
    print(new_checksum)
")
# Capture the exit status of the Python snippet right away, before another command overwrites $?
result=$?
if [[ $result -eq 2 ]]; then
echo "File not found, deleting entry for $file_name..."
# yq eval -i "del(.[$idx].files[] | select(.filename == \"$file_name\"))" "$input_yaml"
return
elif [[ $result -ne 0 ]]; then
echo "Error downloading file $file_name. Skipping..."
return
elif [[ "$new_checksum" == "" ]]; then
echo "Error calculating checksum for $file_name. Skipping..."
return
fi
echo "Checksum for $file_name: $new_checksum"
# Compare and update the YAML file if checksums do not match
if [[ "$old_checksum" != "$new_checksum" ]]; then
echo "Checksum mismatch for $file_name. Updating..."
yq eval -i "del(.[$idx].files[] | select(.filename == \"$file_name\").sha256)" "$input_yaml"
yq eval -i "(.[$idx].files[] | select(.filename == \"$file_name\")).sha256 = \"$new_checksum\"" "$input_yaml"
else
echo "Checksum match for $file_name. No update needed."
fi
}
# Read the YAML and process each file
len=$(yq eval '. | length' "$input_yaml")
for ((i=0; i<$len; i++))
do
name=$(yq eval ".[$i].name" "$input_yaml")
files_len=$(yq eval ".[$i].files | length" "$input_yaml")
for ((j=0; j<$files_len; j++))
do
filename=$(yq eval ".[$i].files[$j].filename" "$input_yaml")
uri=$(yq eval ".[$i].files[$j].uri" "$input_yaml")
checksum=$(yq eval ".[$i].files[$j].sha256" "$input_yaml")
echo "Checking model $name, file $filename. URI = $uri, Checksum = $checksum"
check_and_update_checksum "$name" "$filename" "$uri" "$checksum" "$i"
done
done
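
The two helpers embedded in the script's `python3 -c` snippet can be tried on their own. The following is a minimal sketch, assuming the same URI rules as the script; the repository names, URLs, and the `block_size` parameter are hypothetical placeholders for illustration, and no network access or `huggingface_hub` call is involved.

```python
import hashlib
import os
import tempfile


def parse_uri(uri):
    """Return (download_type, repo_id_or_url), following the script's rules."""
    if uri.startswith("huggingface://"):
        # huggingface://<owner>/<repo>/<file> -> "<owner>/<repo>"
        return "huggingface", uri.split("://")[1].rsplit("/", 1)[0]
    if "huggingface.co" in uri:
        parts = uri.split("/resolve/")
        if len(parts) > 1:
            # https://huggingface.co/<owner>/<repo>/resolve/... -> "<owner>/<repo>"
            return "huggingface", parts[0].split("https://huggingface.co/")[-1]
    # Anything else is fetched as a plain URL
    return "direct", uri


def calculate_sha256(file_path, block_size=4096):
    """Hash a file block by block so the whole file is never held in memory."""
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as f:
        for byte_block in iter(lambda: f.read(block_size), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()


if __name__ == "__main__":
    # Hypothetical URIs, used only to show the three cases
    for example in (
        "huggingface://example-owner/example-repo/model.gguf",
        "https://huggingface.co/example-owner/example-repo/resolve/main/model.gguf",
        "https://example.com/files/model.bin",
    ):
        print(example, "->", parse_uri(example))

    # Hash a small temporary file to show the streaming checksum
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example payload")
    print(calculate_sha256(tmp.name))
    os.remove(tmp.name)
```

Reading in fixed-size blocks matters here because gallery files are model weights that can be several gigabytes; the digest is the same as hashing the whole file at once.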

.github/ci/modelslist.go

@@ -0,0 +1,297 @@
package main
import (
"fmt"
"html/template"
"io/ioutil"
"os"
"gopkg.in/yaml.v3"
)
var modelPageTemplate string = `
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LocalAI models</title>
<link href="https://cdnjs.cloudflare.com/ajax/libs/flowbite/2.3.0/flowbite.min.css" rel="stylesheet" />
<script src="https://cdn.jsdelivr.net/npm/vanilla-lazyload@19.1.3/dist/lazyload.min.js"></script>
<link
rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.8.0/build/styles/default.min.css"
/>
<script
defer
src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.8.0/build/highlight.min.js"
></script>
<script
defer
src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"
></script>
<script
defer
src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"
></script>
<script
defer
src="https://cdn.jsdelivr.net/npm/dompurify@3.0.6/dist/purify.min.js"
></script>
<link href="/static/general.css" rel="stylesheet" />
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=Roboto:wght@400;500&display=swap" rel="stylesheet">
<link
href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700,900&display=swap"
rel="stylesheet" />
<link
rel="stylesheet"
href="https://cdn.jsdelivr.net/npm/tw-elements/css/tw-elements.min.css" />
<script src="https://cdn.tailwindcss.com/3.3.0"></script>
<script>
tailwind.config = {
darkMode: "class",
theme: {
fontFamily: {
sans: ["Roboto", "sans-serif"],
body: ["Roboto", "sans-serif"],
mono: ["ui-monospace", "monospace"],
},
},
corePlugins: {
preflight: false,
},
};
</script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.1.1/css/all.min.css">
<script src="https://unpkg.com/htmx.org@1.9.12" integrity="sha384-ujb1lZYygJmzgSwoxRggbCHcjc0rB2XoQrxeTUQyRjrOnlCoYta87iKBWq3EsdM2" crossorigin="anonymous"></script>
</head>
<body class="bg-gray-900 text-gray-200">
<div class="flex flex-col min-h-screen">
<nav class="bg-gray-800 shadow-lg">
<div class="container mx-auto px-4 py-4">
<div class="flex items-center justify-between">
<div class="flex items-center">
<a href="/" class="text-white text-xl font-bold"><img src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd" alt="LocalAI Logo" class="h-10 mr-3 border-2 border-gray-300 shadow rounded"></a>
<a href="/" class="text-white text-xl font-bold">LocalAI</a>
</div>
<!-- Menu button for small screens -->
<div class="lg:hidden">
<button id="menu-toggle" class="text-gray-400 hover:text-white focus:outline-none">
<i class="fas fa-bars fa-lg"></i>
</button>
</div>
<!-- Navigation links -->
<div class="hidden lg:flex lg:items-center lg:justify-end lg:flex-1 lg:w-0">
<a href="https://localai.io" class="text-gray-400 hover:text-white px-3 py-2 rounded" target="_blank" ><i class="fas fa-book-reader pr-2"></i> Documentation</a>
</div>
</div>
<!-- Collapsible menu for small screens -->
<div class="hidden lg:hidden" id="mobile-menu">
<div class="pt-4 pb-3 border-t border-gray-700">
<a href="https://localai.io" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1" target="_blank" ><i class="fas fa-book-reader pr-2"></i> Documentation</a>
</div>
</div>
</div>
</nav>
<style>
.is-hidden {
display: none;
}
</style>
<div class="container mx-auto px-4 flex-grow">
<div class="models mt-12">
<h2 class="text-center text-3xl font-semibold text-gray-100">
LocalAI model gallery list </h2><br>
<h2 class="text-center text-3xl font-semibold text-gray-100">
🖼️ Available {{.AvailableModels}} models <a href="https://localai.io/models/" target="_blank" >
<i class="fas fa-circle-info pr-2"></i>
</a></h2>
<h3>
Refer to <a href="https://localai.io/models" target=_blank> Model gallery</a> for more information on how to use the models with LocalAI.
You can install models with the CLI command <code>local-ai models install &lt;model-name&gt;</code>, or by using the WebUI.
</h3>
<input class="form-control appearance-none block w-full mt-5 px-3 py-2 text-base font-normal text-gray-300 pb-2 mb-5 bg-gray-800 bg-clip-padding border border-solid border-gray-600 rounded transition ease-in-out m-0 focus:text-gray-300 focus:bg-gray-900 focus:border-blue-500 focus:outline-none" type="search"
id="searchbox" placeholder="Live search keyword..">
<div class="dark grid grid-cols-1 grid-rows-1 md:grid-cols-3 block rounded-lg shadow-secondary-1 dark:bg-surface-dark">
{{ range $_, $model := .Models }}
<div class="box me-4 mb-2 block rounded-lg bg-white shadow-secondary-1 dark:bg-gray-800 dark:bg-surface-dark dark:text-white text-surface pb-2">
<div>
{{ $icon := "https://upload.wikimedia.org/wikipedia/commons/6/65/No-Image-Placeholder.svg" }}
{{ if $model.Icon }}
{{ $icon = $model.Icon }}
{{ end }}
<div class="flex justify-center items-center">
<img data-src="{{ $icon }}" alt="{{$model.Name}}" class="rounded-t-lg max-h-48 max-w-96 object-cover mt-3 lazy">
</div>
<div class="p-6 text-surface dark:text-white">
<h5 class="mb-2 text-xl font-medium leading-tight">{{$model.Name}}</h5>
<p class="mb-4 text-base truncate">{{ $model.Description }}</p>
</div>
<div class="px-6 pt-4 pb-2">
<!-- Modal toggle -->
<button data-modal-target="{{ $model.Name}}-modal" data-modal-toggle="{{ $model.Name }}-modal" class="block text-white bg-blue-700 hover:bg-blue-800 focus:ring-4 focus:outline-none focus:ring-blue-300 font-medium rounded-lg text-sm px-5 py-2.5 text-center dark:bg-blue-600 dark:hover:bg-blue-700 dark:focus:ring-blue-800" type="button">
More info
</button>
<!-- Main modal -->
<div id="{{ $model.Name}}-modal" tabindex="-1" aria-hidden="true" class="hidden overflow-y-auto overflow-x-hidden fixed top-0 right-0 left-0 z-50 justify-center items-center w-full md:inset-0 h-[calc(100%-1rem)] max-h-full">
<div class="relative p-4 w-full max-w-2xl max-h-full">
<!-- Modal content -->
<div class="relative bg-white rounded-lg shadow dark:bg-gray-700">
<!-- Modal header -->
<div class="flex items-center justify-between p-4 md:p-5 border-b rounded-t dark:border-gray-600">
<h3 class="text-xl font-semibold text-gray-900 dark:text-white">
{{ $model.Name}}
</h3>
<button type="button" class="text-gray-400 bg-transparent hover:bg-gray-200 hover:text-gray-900 rounded-lg text-sm w-8 h-8 ms-auto inline-flex justify-center items-center dark:hover:bg-gray-600 dark:hover:text-white" data-modal-hide="{{$model.Name}}-modal">
<svg class="w-3 h-3" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 14 14">
<path stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="m1 1 6 6m0 0 6 6M7 7l6-6M7 7l-6 6"/>
</svg>
<span class="sr-only">Close modal</span>
</button>
</div>
<!-- Modal body -->
<div class="p-4 md:p-5 space-y-4">
<div class="flex justify-center items-center">
<img data-src="{{ $icon }}" alt="{{$model.Name}}" class="lazy rounded-t-lg max-h-48 max-w-96 object-cover mt-3">
</div>
<p class="text-base leading-relaxed text-gray-500 dark:text-gray-400">
{{ $model.Description }}
</p>
<p class="text-base leading-relaxed text-gray-500 dark:text-gray-400">
To install the model with the CLI, run: <br>
<code> local-ai models install {{$model.Name}} </code> <br>
<hr>
See also <a href="https://localai.io/models/" target="_blank" >
Installation <i class="fas fa-circle-info pr-2"></i>
</a> to see how to install models with the REST API.
</p>
<p class="text-base leading-relaxed text-gray-500 dark:text-gray-400">
<ul>
{{ range $_, $u := $model.URLs }}
<li><a href="{{ $u }}" target=_blank><i class="fa-solid fa-link"></i> {{ $u }}</a></li>
{{ end }}
</ul>
</p>
</div>
<!-- Modal footer -->
<div class="flex items-center p-4 md:p-5 border-t border-gray-200 rounded-b dark:border-gray-600">
<button data-modal-hide="{{ $model.Name}}-modal" type="button" class="py-2.5 px-5 ms-3 text-sm font-medium text-gray-900 focus:outline-none bg-white rounded-lg border border-gray-200 hover:bg-gray-100 hover:text-blue-700 focus:z-10 focus:ring-4 focus:ring-gray-100 dark:focus:ring-gray-700 dark:bg-gray-800 dark:text-gray-400 dark:border-gray-600 dark:hover:text-white dark:hover:bg-gray-700">Close</button>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
{{ end }}
</div>
</div>
</div>
<script>
var lazyLoadInstance = new LazyLoad({
// Your custom settings go here
});
let cards = document.querySelectorAll('.box')
function liveSearch() {
let search_query = document.getElementById("searchbox").value;
//Use innerText if all contents are visible
//Use textContent for including hidden elements
for (var i = 0; i < cards.length; i++) {
if(cards[i].textContent.toLowerCase()
.includes(search_query.toLowerCase())) {
cards[i].classList.remove("is-hidden");
} else {
cards[i].classList.add("is-hidden");
}
}
}
//A little delay
let typingTimer;
let typeInterval = 500;
let searchInput = document.getElementById('searchbox');
searchInput.addEventListener('keyup', () => {
clearTimeout(typingTimer);
typingTimer = setTimeout(liveSearch, typeInterval);
});
</script>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/flowbite/2.3.0/flowbite.min.js"></script>
</body>
</html>
`
type GalleryModel struct {
Name string `json:"name" yaml:"name"`
URLs []string `json:"urls" yaml:"urls"`
Icon string `json:"icon" yaml:"icon"`
Description string `json:"description" yaml:"description"`
}
func main() {
// the path to the gallery index is the only argument
if len(os.Args) < 2 {
fmt.Fprintln(os.Stderr, "Usage: modelslist <gallery-index.yaml>")
os.Exit(1)
}
// read the YAML file which contains the models
f, err := ioutil.ReadFile(os.Args[1])
if err != nil {
// report on stderr: stdout carries the rendered HTML
fmt.Fprintln(os.Stderr, "Error reading file:", err)
os.Exit(1)
}
models := []*GalleryModel{}
err = yaml.Unmarshal(f, &models)
if err != nil {
fmt.Fprintln(os.Stderr, "Error unmarshaling YAML:", err)
os.Exit(1)
}
// render the template
data := struct {
Models []*GalleryModel
AvailableModels int
}{
Models: models,
AvailableModels: len(models),
}
tmpl := template.Must(template.New("modelPage").Parse(modelPageTemplate))
err = tmpl.Execute(os.Stdout, data)
if err != nil {
fmt.Fprintln(os.Stderr, "Error executing template:", err)
os.Exit(1)
}
}

.github/labeler.yml

@@ -8,6 +8,11 @@ kind/documentation:
- changed-files:
- any-glob-to-any-file: '*.md'
area/ai-model:
- any:
- changed-files:
- any-glob-to-any-file: 'gallery/*'
examples:
- any:
- changed-files:
@@ -16,4 +21,4 @@ examples:
ci:
- any:
- changed-files:
- any-glob-to-any-file: '.github/*'
- any-glob-to-any-file: '.github/*'

.github/workflows/checksum_checker.yaml

@@ -0,0 +1,47 @@
name: Check if checksums are up-to-date
on:
schedule:
- cron: 0 20 * * *
workflow_dispatch:
jobs:
checksum_check:
runs-on: arc-runner-set
steps:
- name: Force Install GIT latest
run: |
sudo apt-get update \
&& sudo apt-get install -y software-properties-common \
&& sudo apt-get update \
&& sudo add-apt-repository -y ppa:git-core/ppa \
&& sudo apt-get update \
&& sudo apt-get install -y git
- uses: actions/checkout@v4
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install -y pip wget
sudo pip install --upgrade pip
pip install huggingface_hub
- name: 'Setup yq'
uses: dcarbone/install-yq-action@v1.1.1
with:
version: 'v4.43.1'
download-compressed: true
force: true
- name: Checksum checker 🔧
run: |
export HF_HOME=/hf_cache
sudo mkdir /hf_cache
sudo chmod 777 /hf_cache
bash .github/checksum_checker.sh gallery/index.yaml
- name: Create Pull Request
uses: peter-evans/create-pull-request@v6
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI
commit-message: ':arrow_up: Checksum updates in gallery/index.yaml'
title: 'models(gallery): :arrow_up: update checksum'
branch: "update/checksum"
body: Updating checksums in gallery/index.yaml
signoff: true

View File

@@ -14,7 +14,7 @@ jobs:
steps:
- name: Dependabot metadata
id: metadata
-uses: dependabot/fetch-metadata@v2.0.0
+uses: dependabot/fetch-metadata@v2.1.0
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
skip-commit-verification: true

View File

@@ -1,7 +1,10 @@
name: 'generate and publish GRPC docker caches'
on:
-  - workflow_dispatch
+  workflow_dispatch:
+  push:
+    branches:
+      - master
concurrency:
group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
@@ -14,7 +17,7 @@ jobs:
include:
- grpc-base-image: ubuntu:22.04
runs-on: 'ubuntu-latest'
-platforms: 'linux/amd64'
+platforms: 'linux/amd64,linux/arm64'
runs-on: ${{matrix.runs-on}}
steps:
- name: Release space from worker
@@ -80,11 +83,12 @@ jobs:
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
build-args: |
GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
-MAKEFLAGS=--jobs=4 --output-sync=target
-GRPC_VERSION=v1.58.0
+GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
+GRPC_VERSION=v1.64.0
context: .
file: ./Dockerfile
cache-to: type=gha,ignore-error=true
cache-from: type=gha
target: grpc
platforms: ${{ matrix.platforms }}
push: false

View File

@@ -0,0 +1,59 @@
name: 'generate and publish intel docker caches'
on:
workflow_dispatch:
push:
branches:
- master
concurrency:
group: intel-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
generate_caches:
strategy:
matrix:
include:
- base-image: intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04
runs-on: 'ubuntu-latest'
platforms: 'linux/amd64'
runs-on: ${{matrix.runs-on}}
steps:
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Login to quay
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
password: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
- name: Checkout
uses: actions/checkout@v4
- name: Cache Intel images
uses: docker/build-push-action@v5
with:
builder: ${{ steps.buildx.outputs.name }}
build-args: |
BASE_IMAGE=${{ matrix.base-image }}
context: .
file: ./Dockerfile
tags: quay.io/go-skynet/intel-oneapi-base:latest
push: true
target: intel
platforms: ${{ matrix.platforms }}

View File

@@ -61,14 +61,14 @@ jobs:
tag-suffix: '-hipblas'
ffmpeg: 'false'
image-type: 'extras'
-base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+base-image: "rocm/dev-ubuntu-22.04:6.1"
grpc-base-image: "ubuntu:22.04"
runs-on: 'arc-runner-set'
makeflags: "--jobs=3 --output-sync=target"
- build-type: 'sycl_f16'
platforms: 'linux/amd64'
tag-latest: 'false'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: 'sycl-f16-ffmpeg'
ffmpeg: 'true'
@@ -110,7 +110,7 @@ jobs:
- build-type: 'sycl_f16'
platforms: 'linux/amd64'
tag-latest: 'false'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: 'sycl-f16-ffmpeg-core'
ffmpeg: 'true'

View File

@@ -129,7 +129,7 @@ jobs:
ffmpeg: 'true'
image-type: 'extras'
aio: "-aio-gpu-hipblas"
-base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+base-image: "rocm/dev-ubuntu-22.04:6.1"
grpc-base-image: "ubuntu:22.04"
latest-image: 'latest-gpu-hipblas'
latest-image-aio: 'latest-aio-gpu-hipblas'
@@ -141,14 +141,14 @@ jobs:
tag-suffix: '-hipblas'
ffmpeg: 'false'
image-type: 'extras'
-base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+base-image: "rocm/dev-ubuntu-22.04:6.1"
grpc-base-image: "ubuntu:22.04"
runs-on: 'arc-runner-set'
makeflags: "--jobs=3 --output-sync=target"
- build-type: 'sycl_f16'
platforms: 'linux/amd64'
tag-latest: 'auto'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-sycl-f16-ffmpeg'
ffmpeg: 'true'
@@ -161,7 +161,7 @@ jobs:
- build-type: 'sycl_f32'
platforms: 'linux/amd64'
tag-latest: 'auto'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-sycl-f32-ffmpeg'
ffmpeg: 'true'
@@ -175,7 +175,7 @@ jobs:
- build-type: 'sycl_f16'
platforms: 'linux/amd64'
tag-latest: 'false'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-sycl-f16-core'
ffmpeg: 'false'
@@ -185,7 +185,7 @@ jobs:
- build-type: 'sycl_f32'
platforms: 'linux/amd64'
tag-latest: 'false'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-sycl-f32-core'
ffmpeg: 'false'
@@ -195,7 +195,7 @@ jobs:
- build-type: 'sycl_f16'
platforms: 'linux/amd64'
tag-latest: 'false'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-sycl-f16-ffmpeg-core'
ffmpeg: 'true'
@@ -205,7 +205,7 @@ jobs:
- build-type: 'sycl_f32'
platforms: 'linux/amd64'
tag-latest: 'false'
-base-image: "intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04"
+base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-sycl-f32-ffmpeg-core'
ffmpeg: 'true'
@@ -218,7 +218,7 @@ jobs:
tag-suffix: '-hipblas-ffmpeg-core'
ffmpeg: 'true'
image-type: 'core'
-base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+base-image: "rocm/dev-ubuntu-22.04:6.1"
grpc-base-image: "ubuntu:22.04"
runs-on: 'arc-runner-set'
makeflags: "--jobs=3 --output-sync=target"
@@ -228,7 +228,7 @@ jobs:
tag-suffix: '-hipblas-core'
ffmpeg: 'false'
image-type: 'core'
-base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+base-image: "rocm/dev-ubuntu-22.04:6.1"
grpc-base-image: "ubuntu:22.04"
runs-on: 'arc-runner-set'
makeflags: "--jobs=3 --output-sync=target"
@@ -260,7 +260,7 @@ jobs:
matrix:
include:
- build-type: ''
-platforms: 'linux/amd64'
+platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-ffmpeg-core'
ffmpeg: 'true'

View File

@@ -136,6 +136,7 @@ jobs:
- name: Docker meta
id: meta
if: github.event_name != 'pull_request'
uses: docker/metadata-action@v5
with:
images: |
@@ -148,7 +149,20 @@ jobs:
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.tag-suffix }}
- name: Docker meta for PR
id: meta_pull_request
if: github.event_name == 'pull_request'
uses: docker/metadata-action@v5
with:
images: |
ttl.sh/localai-ci-pr-${{ github.event.number }}
tags: |
type=ref,event=branch
type=semver,pattern={{raw}}
type=sha
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.tag-suffix }}
- name: Docker meta AIO (quay.io)
if: inputs.aio != ''
id: meta_aio
@@ -174,7 +188,6 @@ jobs:
type=ref,event=branch
type=semver,pattern={{raw}}
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.aio }}
- name: Set up QEMU
@@ -201,30 +214,15 @@ jobs:
username: ${{ secrets.quayUsername }}
password: ${{ secrets.quayPassword }}
- name: Cache GRPC
- name: Build and push
uses: docker/build-push-action@v5
if: github.event_name != 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
build-args: |
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.58.0
context: .
file: ./Dockerfile
cache-from: type=gha
target: grpc
platforms: ${{ inputs.platforms }}
push: false
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Build and push
uses: docker/build-push-action@v5
with:
builder: ${{ steps.buildx.outputs.name }}
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
@@ -232,6 +230,9 @@ jobs:
FFMPEG=${{ inputs.ffmpeg }}
IMAGE_TYPE=${{ inputs.image-type }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.64.0
MAKEFLAGS=${{ inputs.makeflags }}
context: .
file: ./Dockerfile
@@ -240,15 +241,39 @@ jobs:
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Inspect image
if: github.event_name != 'pull_request'
### Start testing image
- name: Build and push
uses: docker/build-push-action@v5
if: github.event_name == 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
FFMPEG=${{ inputs.ffmpeg }}
IMAGE_TYPE=${{ inputs.image-type }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.64.0
MAKEFLAGS=${{ inputs.makeflags }}
context: .
file: ./Dockerfile
cache-from: type=gha
platforms: ${{ inputs.platforms }}
push: true
tags: ${{ steps.meta_pull_request.outputs.tags }}
labels: ${{ steps.meta_pull_request.outputs.labels }}
- name: Testing image
if: github.event_name == 'pull_request'
run: |
docker pull localai/localai:${{ steps.meta.outputs.version }}
docker image inspect localai/localai:${{ steps.meta.outputs.version }}
docker pull quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
docker image inspect quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
echo "Image is available at ttl.sh/localai-ci-pr-${{ github.event.number }}:${{ steps.meta_pull_request.outputs.version }}" >> $GITHUB_STEP_SUMMARY
## End testing image
- name: Build and push AIO image
if: inputs.aio != ''
uses: docker/build-push-action@v5

View File

@@ -1,11 +1,11 @@
name: Build and Release
on:
on:
- push
- pull_request
env:
-GRPC_VERSION: v1.58.0
+GRPC_VERSION: v1.64.0
permissions:
contents: write
@@ -15,20 +15,8 @@ concurrency:
cancel-in-progress: true
jobs:
build-linux:
strategy:
matrix:
include:
- build: 'avx2'
defines: ''
- build: 'avx'
defines: '-DLLAMA_AVX2=OFF'
- build: 'avx512'
defines: '-DLLAMA_AVX512=ON'
- build: 'cuda12'
defines: ''
- build: 'cuda11'
defines: ''
build-linux-arm:
runs-on: ubuntu-latest
steps:
- name: Clone
@@ -39,22 +27,153 @@ jobs:
with:
go-version: '1.21.x'
cache: false
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg protobuf-compiler ccache
sudo apt-get install -qy binutils-aarch64-linux-gnu gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
- name: Install CUDA Dependencies
run: |
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/cross-linux-aarch64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-cross-aarch64 cuda-nvcc-cross-aarch64-${CUDA_VERSION} libcublas-cross-aarch64-${CUDA_VERSION}
env:
CUDA_VERSION: 12-4
- name: Cache grpc
id: cache-grpc
uses: actions/cache@v4
with:
path: grpc
key: ${{ runner.os }}-arm-grpc-${{ env.GRPC_VERSION }}
- name: Build grpc
if: steps.cache-grpc.outputs.cache-hit != 'true'
run: |
git clone --recurse-submodules -b ${{ env.GRPC_VERSION }} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
cd grpc && mkdir -p cmake/build && cd cmake/build && cmake -DgRPC_INSTALL=ON \
-DgRPC_BUILD_TESTS=OFF \
../.. && sudo make --jobs 5 --output-sync=target
- name: Install gRPC
run: |
GNU_HOST=aarch64-linux-gnu
C_COMPILER_ARM_LINUX=$GNU_HOST-gcc
CXX_COMPILER_ARM_LINUX=$GNU_HOST-g++
CROSS_TOOLCHAIN=/usr/$GNU_HOST
CROSS_STAGING_PREFIX=$CROSS_TOOLCHAIN/stage
CMAKE_CROSS_TOOLCHAIN=/tmp/arm.toolchain.cmake
# https://cmake.org/cmake/help/v3.13/manual/cmake-toolchains.7.html#cross-compiling-for-linux
echo "set(CMAKE_SYSTEM_NAME Linux)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_SYSTEM_PROCESSOR arm)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_STAGING_PREFIX $CROSS_STAGING_PREFIX)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_SYSROOT ${CROSS_TOOLCHAIN}/sysroot)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_C_COMPILER /usr/bin/$C_COMPILER_ARM_LINUX)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_CXX_COMPILER /usr/bin/$CXX_COMPILER_ARM_LINUX)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)" >> $CMAKE_CROSS_TOOLCHAIN && \
echo "set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)" >> $CMAKE_CROSS_TOOLCHAIN
GRPC_DIR=$PWD/grpc
cd grpc && cd cmake/build && sudo make --jobs 5 --output-sync=target install && \
GRPC_CROSS_BUILD_DIR=$GRPC_DIR/cmake/cross_build && \
mkdir -p $GRPC_CROSS_BUILD_DIR && \
cd $GRPC_CROSS_BUILD_DIR && \
cmake -DCMAKE_TOOLCHAIN_FILE=$CMAKE_CROSS_TOOLCHAIN \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=$CROSS_TOOLCHAIN/grpc_install \
../.. && \
sudo make -j`nproc` install
- name: Build
id: build
run: |
GNU_HOST=aarch64-linux-gnu
C_COMPILER_ARM_LINUX=$GNU_HOST-gcc
CXX_COMPILER_ARM_LINUX=$GNU_HOST-g++
CROSS_TOOLCHAIN=/usr/$GNU_HOST
CROSS_STAGING_PREFIX=$CROSS_TOOLCHAIN/stage
CMAKE_CROSS_TOOLCHAIN=/tmp/arm.toolchain.cmake
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@8ba23be9613c672d40ae261d2a1335d639bdd59b
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.0
export PATH=$PATH:$GOPATH/bin
export PATH=/usr/local/cuda/bin:$PATH
GO_TAGS=p2p GOOS=linux GOARCH=arm64 CMAKE_ARGS="-DProtobuf_INCLUDE_DIRS=$CROSS_STAGING_PREFIX/include -DProtobuf_DIR=$CROSS_STAGING_PREFIX/lib/cmake/protobuf -DgRPC_DIR=$CROSS_STAGING_PREFIX/lib/cmake/grpc -DCMAKE_TOOLCHAIN_FILE=$CMAKE_CROSS_TOOLCHAIN -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++" make dist-cross-linux-arm64
- uses: actions/upload-artifact@v4
with:
name: LocalAI-linux-arm64
path: release/
- name: Release
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
with:
files: |
release/*
build-linux:
runs-on: arc-runner-set
steps:
- name: Force Install GIT latest
run: |
sudo apt-get update \
&& sudo apt-get install -y software-properties-common \
&& sudo apt-get update \
&& sudo add-apt-repository -y ppa:git-core/ppa \
&& sudo apt-get update \
&& sudo apt-get install -y git
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- uses: actions/setup-go@v5
with:
go-version: '1.21.x'
cache: false
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg protobuf-compiler
- name: Install CUDA Dependencies
if: ${{ matrix.build == 'cuda12' || matrix.build == 'cuda11' }}
sudo apt-get install -y wget curl build-essential ffmpeg protobuf-compiler ccache cmake
- name: Intel Dependencies
run: |
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install -y intel-basekit
- name: Install CUDA Dependencies
run: |
if [ "${{ matrix.build }}" == "cuda12" ]; then
export CUDA_VERSION=12-3
else
export CUDA_VERSION=11-7
fi
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-nvcc-${CUDA_VERSION} libcublas-dev-${CUDA_VERSION}
env:
CUDA_VERSION: 12-3
- name: "Install Hipblas"
env:
ROCM_VERSION: "6.1"
AMDGPU_VERSION: "6.1"
run: |
set -ex
sudo apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ca-certificates curl libnuma-dev gnupg
curl -sL https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
printf "deb [arch=amd64] https://repo.radeon.com/rocm/apt/$ROCM_VERSION/ jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
printf "deb [arch=amd64] https://repo.radeon.com/amdgpu/$AMDGPU_VERSION/ubuntu jammy main" | sudo tee /etc/apt/sources.list.d/amdgpu.list
printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y \
hipblas-dev rocm-dev \
rocblas-dev
sudo apt-get clean
sudo rm -rf /var/lib/apt/lists/*
sudo ldconfig
- name: Cache grpc
id: cache-grpc
uses: actions/cache@v4
@@ -73,23 +192,17 @@ jobs:
cd grpc && cd cmake/build && sudo make --jobs 5 --output-sync=target install
- name: Build
id: build
env:
CMAKE_ARGS: "${{ matrix.defines }}"
BUILD_ID: "${{ matrix.build }}"
run: |
-go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
-go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
+go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@8ba23be9613c672d40ae261d2a1335d639bdd59b
+go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.0
export PATH=$PATH:$GOPATH/bin
if [ "${{ matrix.build }}" == "cuda12" ] || [ "${{ matrix.build }}" == "cuda11" ]; then
export BUILD_TYPE=cublas
export PATH=/usr/local/cuda/bin:$PATH
make dist
else
STATIC=true make dist
fi
export PATH=/usr/local/cuda/bin:$PATH
export PATH=/opt/rocm/bin:$PATH
source /opt/intel/oneapi/setvars.sh
GO_TAGS=p2p make -j4 dist
- uses: actions/upload-artifact@v4
with:
-name: LocalAI-linux-${{ matrix.build }}
+name: LocalAI-linux
path: release/
- name: Release
uses: softprops/action-gh-release@v2
@@ -111,58 +224,21 @@ jobs:
cache: false
- name: Dependencies
run: |
-sudo apt-get install -y --no-install-recommends libopencv-dev protobuf-compiler
-go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
-go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
+sudo apt-get update
+sudo apt-get install -y --no-install-recommends libopencv-dev protobuf-compiler ccache
+go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@8ba23be9613c672d40ae261d2a1335d639bdd59b
+go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.0
- name: Build stablediffusion
run: |
export PATH=$PATH:$GOPATH/bin
make backend-assets/grpc/stablediffusion
mkdir -p release && cp backend-assets/grpc/stablediffusion release
env:
GO_TAGS: stablediffusion
- uses: actions/upload-artifact@v4
with:
name: stablediffusion
path: release/
build-macOS:
strategy:
matrix:
include:
- build: 'avx2'
defines: ''
- build: 'avx'
defines: '-DLLAMA_AVX2=OFF'
- build: 'avx512'
defines: '-DLLAMA_AVX512=ON'
runs-on: macOS-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- uses: actions/setup-go@v5
with:
go-version: '1.21.x'
cache: false
- name: Dependencies
run: |
brew install protobuf grpc
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
- name: Build
id: build
env:
CMAKE_ARGS: "${{ matrix.defines }}"
BUILD_ID: "${{ matrix.build }}"
run: |
export C_INCLUDE_PATH=/usr/local/include
export CPLUS_INCLUDE_PATH=/usr/local/include
export PATH=$PATH:$GOPATH/bin
make dist
- uses: actions/upload-artifact@v4
with:
name: LocalAI-MacOS-${{ matrix.build }}
path: release/
- name: Release
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
@@ -170,19 +246,14 @@ jobs:
files: |
release/*
build-macOS-arm64:
strategy:
matrix:
include:
- build: 'avx2'
defines: ''
- build: 'avx'
defines: '-DLLAMA_AVX2=OFF'
- build: 'avx512'
defines: '-DLLAMA_AVX512=ON'
runs-on: macos-14
steps:
- name: Setup tmate session if tests fail
uses: mxschmitt/action-tmate@v3.18
with:
connect-timeout-seconds: 180
limit-access-to-actor: true
- name: Clone
uses: actions/checkout@v4
with:
@@ -194,21 +265,18 @@ jobs:
- name: Dependencies
run: |
brew install protobuf grpc
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@8ba23be9613c672d40ae261d2a1335d639bdd59b
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.0
- name: Build
id: build
env:
CMAKE_ARGS: "${{ matrix.defines }}"
BUILD_ID: "${{ matrix.build }}"
run: |
export C_INCLUDE_PATH=/usr/local/include
export CPLUS_INCLUDE_PATH=/usr/local/include
export PATH=$PATH:$GOPATH/bin
make dist
GO_TAGS=p2p make dist
- uses: actions/upload-artifact@v4
with:
name: LocalAI-MacOS-arm64-${{ matrix.build }}
name: LocalAI-MacOS-arm64
path: release/
- name: Release
uses: softprops/action-gh-release@v2


@@ -25,22 +25,14 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools
pip install --user grpcio-tools==1.64.0
sudo rm -rfv /usr/bin/conda || true
- name: Test transformers
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/transformers
make --jobs=5 --output-sync=target -C backend/python/transformers test
@@ -55,22 +47,14 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools
pip install --user grpcio-tools==1.64.0
sudo rm -rfv /usr/bin/conda || true
- name: Test sentencetransformers
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/sentencetransformers
make --jobs=5 --output-sync=target -C backend/python/sentencetransformers test
@@ -86,22 +70,14 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools
sudo rm -rfv /usr/bin/conda || true
pip install --user grpcio-tools==1.64.0
- name: Test rerankers
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/rerankers
make --jobs=5 --output-sync=target -C backend/python/rerankers test
@@ -115,25 +91,16 @@ jobs:
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools
sudo rm -rfv /usr/bin/conda || true
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user grpcio-tools==1.64.0
- name: Test diffusers
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/diffusers
make --jobs=5 --output-sync=target -C backend/python/diffusers test
make --jobs=5 --output-sync=target -C backend/python/diffusers
make --jobs=5 --output-sync=target -C backend/python/diffusers test
tests-parler-tts:
runs-on: ubuntu-latest
@@ -146,24 +113,38 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools
sudo rm -rfv /usr/bin/conda || true
pip install --user grpcio-tools==1.64.0
- name: Test parler-tts
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/parler-tts
make --jobs=5 --output-sync=target -C backend/python/parler-tts test
tests-openvoice:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v4
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools==1.64.0
- name: Test openvoice
run: |
make --jobs=5 --output-sync=target -C backend/python/openvoice
make --jobs=5 --output-sync=target -C backend/python/openvoice test
tests-transformers-musicgen:
runs-on: ubuntu-latest
@@ -176,22 +157,14 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools
sudo rm -rfv /usr/bin/conda || true
pip install --user grpcio-tools==1.64.0
- name: Test transformers-musicgen
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen
make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen test
@@ -208,22 +181,14 @@ jobs:
# run: |
# sudo apt-get update
# sudo apt-get install build-essential ffmpeg
# curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
# sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
# gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
# sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
# sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
# sudo apt-get update && \
# sudo apt-get install -y conda
# # Install UV
# curl -LsSf https://astral.sh/uv/install.sh | sh
# sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# sudo apt-get install -y libopencv-dev
# pip install --user grpcio-tools
# sudo rm -rfv /usr/bin/conda || true
# pip install --user grpcio-tools==1.64.0
# - name: Test petals
# run: |
# export PATH=$PATH:/opt/conda/bin
# make --jobs=5 --output-sync=target -C backend/python/petals
# make --jobs=5 --output-sync=target -C backend/python/petals test
@@ -280,22 +245,14 @@ jobs:
# run: |
# sudo apt-get update
# sudo apt-get install build-essential ffmpeg
# curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
# sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
# gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
# sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
# sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
# sudo apt-get update && \
# sudo apt-get install -y conda
# # Install UV
# curl -LsSf https://astral.sh/uv/install.sh | sh
# sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# sudo apt-get install -y libopencv-dev
# pip install --user grpcio-tools
# sudo rm -rfv /usr/bin/conda || true
# pip install --user grpcio-tools==1.64.0
# - name: Test bark
# run: |
# export PATH=$PATH:/opt/conda/bin
# make --jobs=5 --output-sync=target -C backend/python/bark
# make --jobs=5 --output-sync=target -C backend/python/bark test
@@ -313,20 +270,13 @@ jobs:
# run: |
# sudo apt-get update
# sudo apt-get install build-essential ffmpeg
# curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
# sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
# gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
# sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
# sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
# sudo apt-get update && \
# sudo apt-get install -y conda
# # Install UV
# curl -LsSf https://astral.sh/uv/install.sh | sh
# sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# sudo apt-get install -y libopencv-dev
# pip install --user grpcio-tools
# sudo rm -rfv /usr/bin/conda || true
# pip install --user grpcio-tools==1.64.0
# - name: Test vllm
# run: |
# export PATH=$PATH:/opt/conda/bin
# make --jobs=5 --output-sync=target -C backend/python/vllm
# make --jobs=5 --output-sync=target -C backend/python/vllm test
tests-vallex:
@@ -340,20 +290,13 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
sudo apt-get install -y libopencv-dev
pip install --user grpcio-tools
sudo rm -rfv /usr/bin/conda || true
pip install --user grpcio-tools==1.64.0
- name: Test vall-e-x
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/vall-e-x
make --jobs=5 --output-sync=target -C backend/python/vall-e-x test
@@ -368,19 +311,11 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
sudo apt-get install -y ca-certificates cmake curl patch espeak espeak-ng python3-pip
pip install --user grpcio-tools
sudo rm -rfv /usr/bin/conda || true
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user grpcio-tools==1.64.0
- name: Test coqui
run: |
export PATH=$PATH:/opt/conda/bin
make --jobs=5 --output-sync=target -C backend/python/coqui
make --jobs=5 --output-sync=target -C backend/python/coqui test
make --jobs=5 --output-sync=target -C backend/python/coqui
make --jobs=5 --output-sync=target -C backend/python/coqui test


@@ -10,7 +10,7 @@ on:
- '*'
env:
GRPC_VERSION: v1.58.0
GRPC_VERSION: v1.64.0
concurrency:
group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
@@ -57,7 +57,7 @@ jobs:
df -h
- name: Clone
uses: actions/checkout@v4
with:
with:
submodules: true
- name: Setup Go ${{ matrix.go-version }}
uses: actions/setup-go@v5
@@ -78,6 +78,8 @@ jobs:
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake patch python3-pip unzip
sudo apt-get install -y libopencv-dev
@@ -85,8 +87,14 @@ jobs:
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-nvcc-${CUDA_VERSION} libcublas-dev-${CUDA_VERSION}
export CUDACXX=/usr/local/cuda/bin/nvcc
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.0
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@8ba23be9613c672d40ae261d2a1335d639bdd59b
# The python3-grpc-tools package in 22.04 is too old
pip install --user grpcio-tools
@@ -100,6 +108,8 @@ jobs:
sudo cp -rfv sources/go-piper/piper-phonemize/pi/lib/. /usr/lib/ && \
# Pre-build stable diffusion before we install a newer version of abseil (not compatible with stablediffusion-ncn)
PATH="$PATH:/root/go/bin" GO_TAGS="stablediffusion tts" GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build
env:
CUDA_VERSION: 12-3
- name: Cache grpc
id: cache-grpc
uses: actions/cache@v4
@@ -164,11 +174,11 @@ jobs:
df -h
- name: Clone
uses: actions/checkout@v4
with:
with:
submodules: true
- name: Build images
run: |
docker build --build-arg FFMPEG=true --build-arg IMAGE_TYPE=core --build-arg MAKEFLAGS="--jobs=5 --output-sync=target" -t local-ai:tests -f Dockerfile .
docker build --build-arg FFMPEG=true --build-arg IMAGE_TYPE=extras --build-arg EXTRA_BACKENDS=rerankers --build-arg MAKEFLAGS="--jobs=5 --output-sync=target" -t local-ai:tests -f Dockerfile .
BASE_IMAGE=local-ai:tests DOCKER_AIO_IMAGE=local-ai-aio:test make docker-aio
- name: Test
run: |
@@ -190,7 +200,7 @@ jobs:
steps:
- name: Clone
uses: actions/checkout@v4
with:
with:
submodules: true
- name: Setup Go ${{ matrix.go-version }}
uses: actions/setup-go@v5
@@ -203,7 +213,7 @@ jobs:
- name: Dependencies
run: |
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc
pip install --user grpcio-tools
pip install --user grpcio-tools==1.64.0
- name: Test
run: |
export C_INCLUDE_PATH=/usr/local/include

.gitignore

@@ -6,6 +6,9 @@ get-sources
prepare-sources
/backend/cpp/llama/grpc-server
/backend/cpp/llama/llama.cpp
/backend/cpp/llama-*
*.log
go-ggml-transformers
go-gpt2
@@ -39,6 +42,7 @@ backend-assets/*
!backend-assets/.keep
prepare
/ggml-metal.metal
docs/static/gallery.html
# Protobuf generated files
*.pb.go
@@ -46,4 +50,7 @@ prepare
*pb2_grpc.py
# SonarQube
.scannerwork
.scannerwork
# backend virtual environments
**/venv


@@ -1,44 +1,40 @@
ARG IMAGE_TYPE=extras
ARG BASE_IMAGE=ubuntu:22.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
# extras or core
# The requirements-core target is common to all images. Nothing should be placed in requirements-core unless every single build will use it.
FROM ${BASE_IMAGE} AS requirements-core
USER root
ARG GO_VERSION=1.21.7
ARG BUILD_TYPE
ARG CUDA_MAJOR_VERSION=11
ARG CUDA_MINOR_VERSION=7
ARG GO_VERSION=1.22.4
ARG TARGETARCH
ARG TARGETVARIANT
ENV BUILD_TYPE=${BUILD_TYPE}
ENV DEBIAN_FRONTEND=noninteractive
ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,petals:/build/backend/python/petals/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,exllama:/build/backend/python/exllama/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,petals:/build/backend/python/petals/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,exllama:/build/backend/python/exllama/run.sh,openvoice:/build/backend/python/openvoice/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
ARG GO_TAGS="stablediffusion tinydream tts"
RUN apt-get update && \
apt-get install -y ca-certificates curl python3-pip unzip && apt-get clean
apt-get install -y --no-install-recommends \
build-essential \
ccache \
ca-certificates \
cmake \
curl \
git \
unzip && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install Go
RUN curl -L -s https://go.dev/dl/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz | tar -C /usr/local -xz
ENV PATH $PATH:/usr/local/go/bin
ENV PATH $PATH:/root/go/bin:/usr/local/go/bin
# Install grpc compilers
ENV PATH $PATH:/root/go/bin
RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@latest && \
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
# Install protobuf (the version in 22.04 is too old)
RUN curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
# Install grpcio-tools (the version in 22.04 is too old)
RUN pip install --user grpcio-tools
RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.1 && \
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
COPY --chmod=644 custom-ca-certs/* /usr/local/share/ca-certificates/
RUN update-ca-certificates
@@ -47,16 +43,6 @@ RUN update-ca-certificates
RUN echo "Target Architecture: $TARGETARCH"
RUN echo "Target Variant: $TARGETVARIANT"
# CuBLAS requirements
RUN if [ "${BUILD_TYPE}" = "cublas" ]; then \
apt-get install -y software-properties-common && \
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && apt-get clean \
; fi
# Cuda
ENV PATH /usr/local/cuda/bin:${PATH}
@@ -64,10 +50,12 @@ ENV PATH /usr/local/cuda/bin:${PATH}
ENV PATH /opt/rocm/bin:${PATH}
# OpenBLAS requirements and stable diffusion
RUN apt-get install -y \
libopenblas-dev \
libopencv-dev \
&& apt-get clean
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libopenblas-dev \
libopencv-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Set up OpenCV
RUN ln -s /usr/include/opencv4/opencv2 /usr/include/opencv2
@@ -80,58 +68,162 @@ RUN test -n "$TARGETARCH" \
###################################
###################################
# The requirements-extras target is for any builds with IMAGE_TYPE=extras. Nothing should be placed in this target unless every IMAGE_TYPE=extras build will use it.
FROM requirements-core AS requirements-extras
RUN apt install -y gpg && \
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list && \
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list && \
apt-get update && \
apt-get install -y conda && apt-get clean
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.cargo/bin:${PATH}"
RUN apt-get install -y python3-pip && apt-get clean
RUN pip install --upgrade pip
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
RUN apt-get install -y espeak-ng espeak && apt-get clean
RUN apt-get update && \
apt-get install -y --no-install-recommends \
espeak-ng \
espeak \
python3-pip \
python-is-python3 \
python3-dev \
python3-venv && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
pip install --upgrade pip
RUN if [ ! -e /usr/bin/python ]; then \
ln -s /usr/bin/python3 /usr/bin/python \
# Install grpcio-tools (the version in 22.04 is too old)
RUN pip install --user grpcio-tools
###################################
###################################
# The requirements-drivers target is for BUILD_TYPE specific items. If you need to install something specific to CUDA, or specific to ROCM, it goes here.
# This target will be built on top of requirements-core or requirements-extras as determined by the IMAGE_TYPE build-arg
FROM requirements-${IMAGE_TYPE} AS requirements-drivers
ARG BUILD_TYPE
ARG CUDA_MAJOR_VERSION=11
ARG CUDA_MINOR_VERSION=8
ENV BUILD_TYPE=${BUILD_TYPE}
# CuBLAS requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
RUN if [ "${BUILD_TYPE}" = "cublas" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils && \
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
ldconfig \
; fi
###################################
###################################
# Temporary workaround for Intel's repository to work correctly
# https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/APT-Repository-not-working-signatures-invalid/m-p/1599436/highlight/true#M36143
# This is a temporary workaround until Intel fixes their repository
FROM ${INTEL_BASE_IMAGE} AS intel
RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
###################################
###################################
# The grpc target does one thing: it builds and installs GRPC. This is in its own layer so that it can be effectively cached by CI.
# You probably don't need to change anything here, and if you do, make sure that CI is adjusted so that the cache continues to work.
FROM ${GRPC_BASE_IMAGE} AS grpc
ARG MAKEFLAGS
ARG GRPC_VERSION=v1.58.0
# This is a bit of a hack, but it's required in order to be able to effectively cache this layer in CI
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG GRPC_VERSION=v1.64.2
ENV MAKEFLAGS=${MAKEFLAGS}
ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
WORKDIR /build
RUN apt-get update && \
apt-get install -y build-essential cmake git && \
apt-get install -y --no-install-recommends \
ca-certificates \
build-essential \
cmake \
git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc
WORKDIR /build/grpc/cmake/build
RUN cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF ../.. && \
make
# We install GRPC to a different prefix here so that we can copy in only the build artifacts later
# saves several hundred MB on the final docker image size vs copying in the entire GRPC source tree
# and running make install in the target container
RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
mkdir -p /build/grpc/cmake/build && \
cd /build/grpc/cmake/build && \
cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX:PATH=/opt/grpc ../.. && \
make && \
make install && \
rm -rf /build
###################################
###################################
FROM requirements-${IMAGE_TYPE} AS builder
# The builder target compiles LocalAI. This target is not the target that will be uploaded to the registry.
# Adjustments to the build process should likely be made here.
FROM requirements-drivers AS builder
ARG GO_TAGS="stablediffusion tts"
ARG GO_TAGS="stablediffusion tts p2p"
ARG GRPC_BACKENDS
ARG MAKEFLAGS
@@ -148,46 +240,51 @@ COPY . .
COPY .git .
RUN echo "GO_TAGS: $GO_TAGS"
RUN apt-get update && \
apt-get install -y build-essential cmake git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN make prepare
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
apt-get update && \
apt-get install -y libclblast-dev && \
apt-get clean \
; fi
# We need protoc installed, and the version in 22.04 is too old. We will create one as part of installing the GRPC build below
# but that will also bring in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only
# here so that we can generate the grpc code for the stablediffusion build
RUN <<EOT bash
if [ "amd64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
EOT
# stablediffusion does not tolerate a newer version of abseil, build it first
RUN GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build
COPY --from=grpc /build/grpc ./grpc/
WORKDIR /build/grpc/cmake/build
RUN make install
# Install the pre-built GRPC
COPY --from=grpc /opt/grpc /usr/local
# Rebuild with defaults backends
WORKDIR /build
RUN make build
RUN if [ ! -d "/build/sources/go-piper/piper-phonemize/pi/lib/" ]; then \
mkdir -p /build/sources/go-piper/piper-phonemize/pi/lib/ \
touch /build/sources/go-piper/piper-phonemize/pi/lib/keep \
mkdir -p /build/sources/go-piper/piper-phonemize/pi/lib/ \
touch /build/sources/go-piper/piper-phonemize/pi/lib/keep \
; fi
###################################
###################################
FROM requirements-${IMAGE_TYPE}
# This is the final target. The result of this target will be the image uploaded to the registry.
# If you cannot find a more suitable place for an addition, this layer is a suitable place for it.
FROM requirements-drivers
ARG FFMPEG
ARG BUILD_TYPE
ARG TARGETARCH
ARG IMAGE_TYPE=extras
ARG EXTRA_BACKENDS
ARG MAKEFLAGS
ENV BUILD_TYPE=${BUILD_TYPE}
@@ -199,25 +296,16 @@ ARG CUDA_MAJOR_VERSION=11
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV NVIDIA_REQUIRE_CUDA="cuda>=${CUDA_MAJOR_VERSION}.0"
ENV NVIDIA_VISIBLE_DEVICES=all
ENV PIP_CACHE_PURGE=true
# Add FFmpeg
RUN if [ "${FFMPEG}" = "true" ]; then \
apt-get install -y ffmpeg && apt-get clean \
apt-get update && \
apt-get install -y --no-install-recommends \
ffmpeg && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
# Add OpenCL
RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
apt-get update && \
apt-get install -y libclblast1 && \
apt-get clean \
; fi
RUN apt-get update && \
apt-get install -y cmake git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
WORKDIR /build
# we start fresh & re-copy all assets because `make build` does not clean up nicely after itself
@@ -227,9 +315,9 @@ WORKDIR /build
COPY . .
COPY --from=builder /build/sources ./sources/
COPY --from=grpc /build/grpc ./grpc/
COPY --from=grpc /opt/grpc /usr/local
RUN make prepare-sources && cd /build/grpc/cmake/build && make install && rm -rf /build/grpc
RUN make prepare-sources
# Copy the binary
COPY --from=builder /build/local-ai ./
@@ -240,51 +328,61 @@ COPY --from=builder /build/sources/go-piper/piper-phonemize/pi/lib/* /usr/lib/
# do not let stablediffusion rebuild (requires an older version of absl)
COPY --from=builder /build/backend-assets/grpc/stablediffusion ./backend-assets/grpc/stablediffusion
## Duplicated from Makefile to avoid having a big layer that's hard to push
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/autogptq \
# Change the shell to bash so we can use [[ tests below
SHELL ["/bin/bash", "-c"]
# We try to strike a balance between individual layer size (as that affects total push time) and total image size
# Splitting the backends into more groups with fewer items results in a larger image, but a smaller size for the largest layer
# Splitting the backends into fewer groups with more items results in a smaller image, but a larger size for the largest layer
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/coqui \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "parler-tts" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/parler-tts \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "diffusers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/diffusers \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "transformers-musicgen" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/transformers-musicgen \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "exllama1" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/exllama \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/bark \
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vall-e-x" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/vall-e-x \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "openvoice" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/openvoice \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "petals" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/petals \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "sentencetransformers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/sentencetransformers \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "exllama2" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/exllama2 \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "transformers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/transformers \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/diffusers \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/vllm \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/mamba \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/sentencetransformers \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/rerankers \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/transformers \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/vall-e-x \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/exllama \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/exllama2 \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/petals \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/transformers-musicgen \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/parler-tts \
; fi
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
make -C backend/python/coqui \
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vllm" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/vllm \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "autogptq" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/autogptq \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "bark" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/bark \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "rerankers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/rerankers \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "mamba" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/mamba \
; fi
# Make sure the models directory exists
@@ -293,7 +391,7 @@ RUN mkdir -p /build/models
# Define the health check command
HEALTHCHECK --interval=1m --timeout=10m --retries=10 \
CMD curl -f ${HEALTHCHECK_ENDPOINT} || exit 1
VOLUME /build/models
EXPOSE 8080
ENTRYPOINT [ "/build/entrypoint.sh" ]
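
The `EXTRA_BACKENDS` checks in the Dockerfile above all share one condition: build a backend when the image type is `extras` and either `EXTRA_BACKENDS` is empty or it contains the backend name. The sketch below isolates that condition in a helper so it can be tried outside a Docker build. The function name `should_build` and the sample values are hypothetical and not part of the repository; the `[[ ... ]]` expression is copied in shape from the `RUN` lines and needs bash, which is why the Dockerfile switches `SHELL` first.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Dockerfile: reproduces the test used in
# the RUN lines above so its behaviour can be checked in isolation.
# Args: backend name, EXTRA_BACKENDS value, IMAGE_TYPE value.
should_build() {
  local backend="$1" extra_backends="$2" image_type="$3"
  # A quoted right-hand side of =~ is matched as a literal substring.
  [[ ( "$extra_backends" =~ "$backend" || -z "$extra_backends" ) && "$image_type" == "extras" ]]
}

# Sample selection (assumed values, for illustration only).
for b in coqui transformers vllm; do
  if should_build "$b" "coqui transformers-musicgen" extras; then
    echo "build $b"
  else
    echo "skip $b"
  fi
done
# prints:
#   build coqui
#   build transformers
#   skip vllm
```

Because the match is a substring match, selecting `transformers-musicgen` also satisfies the check for `transformers`, as the second output line shows. Whether that overlap is intended is not stated in the source.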

185
Makefile

@@ -5,7 +5,7 @@ BINARY_NAME=local-ai
# llama.cpp versions
GOLLAMA_STABLE_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
CPPLLAMA_VERSION?=46e12c4692a37bdd31a0432fc5153d7d22bc7f72
CPPLLAMA_VERSION?=172c8256840ffd882ab9992ecedbb587d9b21f15
# gpt4all version
GPT4ALL_REPO?=https://github.com/nomic-ai/gpt4all
@@ -16,19 +16,19 @@ RWKV_REPO?=https://github.com/donomii/go-rwkv.cpp
RWKV_VERSION?=661e7ae26d442f5cfebd2a0881b44e8c55949ec6
# whisper.cpp version
WHISPER_CPP_VERSION?=858452d58dba3acdc3431c9bced2bb8cfd9bf418
WHISPER_CPP_VERSION?=b29b3b29240aac8b71ce8e5a4360c1f1562ad66f
# bert.cpp version
BERT_VERSION?=6abe312cded14042f6b7c3cd8edf082713334a4d
BERT_VERSION?=710044b124545415f555e4260d16b146c725a6e4
# go-piper version
PIPER_VERSION?=9d0100873a7dbb0824dfea40e8cec70a1b110759
# stablediffusion version
STABLEDIFFUSION_VERSION?=433ea6d9b64d9d08067324a757ef07040ea29568
STABLEDIFFUSION_VERSION?=4a3cd6aeae6f66ee57eae9a0075f8c58c3a6a38f
# tinydream version
TINYDREAM_VERSION?=22a12a4bc0ac5455856f28f3b771331a551a4293
TINYDREAM_VERSION?=c04fa463ace9d9a6464313aa5f9cd0f953b6c057
export BUILD_TYPE?=
export STABLE_BUILD_TYPE?=$(BUILD_TYPE)
@@ -38,7 +38,7 @@ CGO_LDFLAGS?=
CGO_LDFLAGS_WHISPER?=
CUDA_LIBPATH?=/usr/local/cuda/lib64/
GO_TAGS?=
BUILD_ID?=git
BUILD_ID?=
TEST_DIR=/tmp/test
@@ -70,7 +70,7 @@ UNAME_S := $(shell uname -s)
endif
ifeq ($(OS),Darwin)
ifeq ($(OSX_SIGNING_IDENTITY),)
OSX_SIGNING_IDENTITY := $(shell security find-identity -v -p codesigning | grep '"' | head -n 1 | sed -E 's/.*"(.*)"/\1/')
endif
@@ -99,8 +99,8 @@ endif
ifeq ($(BUILD_TYPE),cublas)
CGO_LDFLAGS+=-lcublas -lcudart -L$(CUDA_LIBPATH)
export LLAMA_CUBLAS=1
export WHISPER_CUBLAS=1
CGO_LDFLAGS_WHISPER+=-L$(CUDA_LIBPATH)/stubs/ -lcuda
export WHISPER_CUDA=1
CGO_LDFLAGS_WHISPER+=-L$(CUDA_LIBPATH)/stubs/ -lcuda -lcufft
endif
ifeq ($(BUILD_TYPE),hipblas)
@@ -112,7 +112,7 @@ ifeq ($(BUILD_TYPE),hipblas)
# llama-ggml has no hipblas support, so override it here.
export STABLE_BUILD_TYPE=
export WHISPER_HIPBLAS=1
GPU_TARGETS ?= gfx900,gfx90a,gfx1030,gfx1031,gfx1100
GPU_TARGETS ?= gfx900,gfx906,gfx908,gfx940,gfx941,gfx942,gfx90a,gfx1030,gfx1031,gfx1100,gfx1101
AMDGPU_TARGETS ?= "$(GPU_TARGETS)"
CMAKE_ARGS+=-DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS="$(AMDGPU_TARGETS)" -DGPU_TARGETS="$(GPU_TARGETS)"
CGO_LDFLAGS += -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -L${ROCM_HOME}/lib/llvm/lib
@@ -152,10 +152,14 @@ ifeq ($(findstring tts,$(GO_TAGS)),tts)
OPTIONAL_GRPC+=backend-assets/grpc/piper
endif
ALL_GRPC_BACKENDS=backend-assets/grpc/langchain-huggingface
ALL_GRPC_BACKENDS=backend-assets/grpc/huggingface
ALL_GRPC_BACKENDS+=backend-assets/grpc/bert-embeddings
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx2
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-fallback
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-ggml
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-grpc
ALL_GRPC_BACKENDS+=backend-assets/util/llama-cpp-rpc-server
ALL_GRPC_BACKENDS+=backend-assets/grpc/gpt4all
ALL_GRPC_BACKENDS+=backend-assets/grpc/rwkv
ALL_GRPC_BACKENDS+=backend-assets/grpc/whisper
@@ -240,7 +244,7 @@ sources/whisper.cpp:
cd sources/whisper.cpp && git checkout -b build $(WHISPER_CPP_VERSION) && git submodule update --init --recursive --depth 1
sources/whisper.cpp/libwhisper.a: sources/whisper.cpp
cd sources/whisper.cpp && make libwhisper.a
cd sources/whisper.cpp && $(MAKE) libwhisper.a
get-sources: sources/go-llama.cpp sources/gpt4all sources/go-piper sources/go-rwkv.cpp sources/whisper.cpp sources/go-bert.cpp sources/go-stable-diffusion sources/go-tiny-dream
@@ -293,6 +297,7 @@ clean: ## Remove build related file
rm -rf backend-assets/*
$(MAKE) -C backend/cpp/grpc clean
$(MAKE) -C backend/cpp/llama clean
rm -rf backend/cpp/llama-* || true
$(MAKE) dropreplace
$(MAKE) protogen-clean
rmdir pkg/grpc/proto || true
@@ -311,14 +316,44 @@ build: prepare backend-assets grpcs ## Build the project
CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o $(BINARY_NAME) ./
build-minimal:
BUILD_GRPC_FOR_BACKEND_LLAMA=true GRPC_BACKENDS=backend-assets/grpc/llama-cpp GO_TAGS=none $(MAKE) build
BUILD_GRPC_FOR_BACKEND_LLAMA=true GRPC_BACKENDS="backend-assets/grpc/llama-cpp-avx2" GO_TAGS=none $(MAKE) build
build-api:
BUILD_GRPC_FOR_BACKEND_LLAMA=true BUILD_API_ONLY=true GO_TAGS=none $(MAKE) build
dist: build
dist:
STATIC=true $(MAKE) backend-assets/grpc/llama-cpp-avx2
ifeq ($(OS),Darwin)
$(info ${GREEN}I Skip CUDA/hipblas build on MacOS${RESET})
else
$(MAKE) backend-assets/grpc/llama-cpp-cuda
$(MAKE) backend-assets/grpc/llama-cpp-hipblas
$(MAKE) backend-assets/grpc/llama-cpp-sycl_f16
$(MAKE) backend-assets/grpc/llama-cpp-sycl_f32
endif
$(MAKE) build
mkdir -p release
# if BUILD_ID is empty, then we don't append it to the binary name
ifeq ($(BUILD_ID),)
cp $(BINARY_NAME) release/$(BINARY_NAME)-$(OS)-$(ARCH)
shasum -a 256 release/$(BINARY_NAME)-$(OS)-$(ARCH) > release/$(BINARY_NAME)-$(OS)-$(ARCH).sha256
else
cp $(BINARY_NAME) release/$(BINARY_NAME)-$(BUILD_ID)-$(OS)-$(ARCH)
shasum -a 256 release/$(BINARY_NAME)-$(BUILD_ID)-$(OS)-$(ARCH) > release/$(BINARY_NAME)-$(BUILD_ID)-$(OS)-$(ARCH).sha256
endif
dist-cross-linux-arm64:
CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_NATIVE=off" GRPC_BACKENDS="backend-assets/grpc/llama-cpp-fallback backend-assets/grpc/llama-cpp-grpc backend-assets/util/llama-cpp-rpc-server" \
$(MAKE) build
mkdir -p release
# if BUILD_ID is empty, then we don't append it to the binary name
ifeq ($(BUILD_ID),)
cp $(BINARY_NAME) release/$(BINARY_NAME)-$(OS)-arm64
shasum -a 256 release/$(BINARY_NAME)-$(OS)-arm64 > release/$(BINARY_NAME)-$(OS)-arm64.sha256
else
cp $(BINARY_NAME) release/$(BINARY_NAME)-$(BUILD_ID)-$(OS)-arm64
shasum -a 256 release/$(BINARY_NAME)-$(BUILD_ID)-$(OS)-arm64 > release/$(BINARY_NAME)-$(BUILD_ID)-$(OS)-arm64.sha256
endif
osx-signed: build
codesign --deep --force --sign "$(OSX_SIGNING_IDENTITY)" --entitlements "./Entitlements.plist" "./$(BINARY_NAME)"
@@ -428,7 +463,7 @@ protogen-clean: protogen-go-clean protogen-python-clean
.PHONY: protogen-go
protogen-go:
mkdir -p pkg/grpc/proto
protoc -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
backend/backend.proto
.PHONY: protogen-go-clean
@@ -437,10 +472,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama-protogen exllama2-protogen mamba-protogen petals-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen vall-e-x-protogen vllm-protogen
protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama-protogen exllama2-protogen mamba-protogen petals-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen vall-e-x-protogen vllm-protogen openvoice-protogen
.PHONY: protogen-python-clean
protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama-protogen-clean exllama2-protogen-clean mamba-protogen-clean petals-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean vall-e-x-protogen-clean vllm-protogen-clean
protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama-protogen-clean exllama2-protogen-clean mamba-protogen-clean petals-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean vall-e-x-protogen-clean vllm-protogen-clean openvoice-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -554,6 +589,14 @@ vall-e-x-protogen:
vall-e-x-protogen-clean:
$(MAKE) -C backend/python/vall-e-x protogen-clean
.PHONY: openvoice-protogen
openvoice-protogen:
$(MAKE) -C backend/python/openvoice protogen
.PHONY: openvoice-protogen-clean
openvoice-protogen-clean:
$(MAKE) -C backend/python/openvoice protogen-clean
.PHONY: vllm-protogen
vllm-protogen:
$(MAKE) -C backend/python/vllm protogen
@@ -577,6 +620,7 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/transformers-musicgen
$(MAKE) -C backend/python/parler-tts
$(MAKE) -C backend/python/vall-e-x
$(MAKE) -C backend/python/openvoice
$(MAKE) -C backend/python/exllama
$(MAKE) -C backend/python/petals
$(MAKE) -C backend/python/exllama2
@@ -616,8 +660,8 @@ backend-assets/grpc/gpt4all: sources/gpt4all sources/gpt4all/gpt4all-bindings/go
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/sources/gpt4all/gpt4all-bindings/golang/ LIBRARY_PATH=$(CURDIR)/sources/gpt4all/gpt4all-bindings/golang/ \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/gpt4all ./backend/go/llm/gpt4all/
backend-assets/grpc/langchain-huggingface: backend-assets/grpc
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/langchain-huggingface ./backend/go/llm/langchain/
backend-assets/grpc/huggingface: backend-assets/grpc
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/huggingface ./backend/go/llm/langchain/
backend/cpp/llama/llama.cpp:
LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/llama llama.cpp
@@ -629,7 +673,7 @@ ADDED_CMAKE_ARGS=-Dabsl_DIR=${INSTALLED_LIB_CMAKE}/absl \
-Dutf8_range_DIR=${INSTALLED_LIB_CMAKE}/utf8_range \
-DgRPC_DIR=${INSTALLED_LIB_CMAKE}/grpc \
-DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=${INSTALLED_PACKAGES}/include
backend/cpp/llama/grpc-server:
build-llama-cpp-grpc-server:
# Conditionally build grpc for the llama backend to use if needed
ifdef BUILD_GRPC_FOR_BACKEND_LLAMA
$(MAKE) -C backend/cpp/grpc build
@@ -638,19 +682,84 @@ ifdef BUILD_GRPC_FOR_BACKEND_LLAMA
PATH="${INSTALLED_PACKAGES}/bin:${PATH}" \
CMAKE_ARGS="${CMAKE_ARGS} ${ADDED_CMAKE_ARGS}" \
LLAMA_VERSION=$(CPPLLAMA_VERSION) \
$(MAKE) -C backend/cpp/llama grpc-server
$(MAKE) -C backend/cpp/${VARIANT} grpc-server
else
echo "BUILD_GRPC_FOR_BACKEND_LLAMA is not defined."
LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/llama grpc-server
LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/${VARIANT} grpc-server
endif
backend-assets/grpc/llama-cpp: backend-assets/grpc backend/cpp/llama/grpc-server
cp -rfv backend/cpp/llama/grpc-server backend-assets/grpc/llama-cpp
# This target is for manually building a variant with auto-detected flags
backend-assets/grpc/llama-cpp: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-cpp
$(MAKE) -C backend/cpp/llama-cpp purge
$(info ${GREEN}I llama-cpp build info:avx2${RESET})
$(MAKE) VARIANT="llama-cpp" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-cpp/grpc-server backend-assets/grpc/llama-cpp
backend-assets/grpc/llama-cpp-avx2: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-avx2
$(MAKE) -C backend/cpp/llama-avx2 purge
$(info ${GREEN}I llama-cpp build info:avx2${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_AVX=on -DLLAMA_AVX2=on -DLLAMA_AVX512=off -DLLAMA_FMA=on -DLLAMA_F16C=on" $(MAKE) VARIANT="llama-avx2" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-avx2/grpc-server backend-assets/grpc/llama-cpp-avx2
backend-assets/grpc/llama-cpp-avx: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-avx
$(MAKE) -C backend/cpp/llama-avx purge
$(info ${GREEN}I llama-cpp build info:avx${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off" $(MAKE) VARIANT="llama-avx" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-avx/grpc-server backend-assets/grpc/llama-cpp-avx
backend-assets/grpc/llama-cpp-fallback: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-fallback
$(MAKE) -C backend/cpp/llama-fallback purge
$(info ${GREEN}I llama-cpp build info:fallback${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off" $(MAKE) VARIANT="llama-fallback" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-fallback/grpc-server backend-assets/grpc/llama-cpp-fallback
# TODO: every binary should have its own folder instead, so can have different metal implementations
ifeq ($(BUILD_TYPE),metal)
cp backend/cpp/llama/llama.cpp/build/bin/default.metallib backend-assets/grpc/
cp backend/cpp/llama-fallback/llama.cpp/build/bin/default.metallib backend-assets/grpc/
endif
backend-assets/grpc/llama-cpp-cuda: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-cuda
$(MAKE) -C backend/cpp/llama-cuda purge
$(info ${GREEN}I llama-cpp build info:cuda${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off -DLLAMA_CUDA=ON" $(MAKE) VARIANT="llama-cuda" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-cuda/grpc-server backend-assets/grpc/llama-cpp-cuda
backend-assets/grpc/llama-cpp-hipblas: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-hipblas
$(MAKE) -C backend/cpp/llama-hipblas purge
$(info ${GREEN}I llama-cpp build info:hipblas${RESET})
BUILD_TYPE="hipblas" $(MAKE) VARIANT="llama-hipblas" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-hipblas/grpc-server backend-assets/grpc/llama-cpp-hipblas
backend-assets/grpc/llama-cpp-sycl_f16: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-sycl_f16
$(MAKE) -C backend/cpp/llama-sycl_f16 purge
$(info ${GREEN}I llama-cpp build info:sycl_f16${RESET})
BUILD_TYPE="sycl_f16" $(MAKE) VARIANT="llama-sycl_f16" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-sycl_f16/grpc-server backend-assets/grpc/llama-cpp-sycl_f16
backend-assets/grpc/llama-cpp-sycl_f32: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-sycl_f32
$(MAKE) -C backend/cpp/llama-sycl_f32 purge
$(info ${GREEN}I llama-cpp build info:sycl_f32${RESET})
BUILD_TYPE="sycl_f32" $(MAKE) VARIANT="llama-sycl_f32" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-sycl_f32/grpc-server backend-assets/grpc/llama-cpp-sycl_f32
backend-assets/grpc/llama-cpp-grpc: backend-assets/grpc
cp -rf backend/cpp/llama backend/cpp/llama-grpc
$(MAKE) -C backend/cpp/llama-grpc purge
$(info ${GREEN}I llama-cpp build info:grpc${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_RPC=ON -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off" $(MAKE) VARIANT="llama-grpc" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-grpc/grpc-server backend-assets/grpc/llama-cpp-grpc
backend-assets/util/llama-cpp-rpc-server: backend-assets/grpc/llama-cpp-grpc
mkdir -p backend-assets/util/
cp -rf backend/cpp/llama-grpc/llama.cpp/build/bin/rpc-server backend-assets/util/llama-cpp-rpc-server
backend-assets/grpc/llama-ggml: sources/go-llama.cpp sources/go-llama.cpp/libbinding.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/sources/go-llama.cpp LIBRARY_PATH=$(CURDIR)/sources/go-llama.cpp \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/llama-ggml ./backend/go/llm/llama-ggml/
@@ -693,7 +802,7 @@ docker:
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
--build-arg BUILD_TYPE=$(BUILD_TYPE) \
-t $(DOCKER_IMAGE) .
docker-aio:
@echo "Building AIO image with base $(BASE_IMAGE) as $(DOCKER_AIO_IMAGE)"
docker build \
@@ -724,3 +833,25 @@ docker-image-intel-xpu:
.PHONY: swagger
swagger:
swag init -g core/http/app.go --output swagger
.PHONY: gen-assets
gen-assets:
$(GOCMD) run core/dependencies_manager/manager.go embedded/webui_static.yaml core/http/static/assets
## Documentation
docs/layouts/_default:
mkdir -p docs/layouts/_default
docs/static/gallery.html: docs/layouts/_default
$(GOCMD) run ./.github/ci/modelslist.go ./gallery/index.yaml > docs/static/gallery.html
docs/public: docs/layouts/_default docs/static/gallery.html
cd docs && hugo --minify
docs-clean:
rm -rf docs/public
rm -rf docs/static/gallery.html
.PHONY: docs
docs: docs/static/gallery.html
cd docs && hugo serve
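
The `dist` and `dist-cross-linux-arm64` targets above name the release binary differently depending on whether `BUILD_ID` is empty. A minimal shell sketch of that naming rule follows; the function `release_name` and the sample build ID are hypothetical, and the OS and architecture are passed in as plain arguments here instead of being taken from the Makefile's `$(OS)` and `$(ARCH)`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Makefile: mirrors the ifeq ($(BUILD_ID),)
# branches in the dist targets.
# Args: binary name, build id (may be empty), OS, architecture.
release_name() {
  local binary="$1" build_id="$2" os="$3" arch="$4"
  if [ -z "$build_id" ]; then
    # Empty BUILD_ID: the ID segment is omitted from the file name.
    printf '%s-%s-%s\n' "$binary" "$os" "$arch"
  else
    printf '%s-%s-%s-%s\n' "$binary" "$build_id" "$os" "$arch"
  fi
}

release_name local-ai "" Linux x86_64        # prints local-ai-Linux-x86_64
release_name local-ai abc123 Linux arm64     # prints local-ai-abc123-Linux-arm64
```

In the Makefile the matching checksum file is the same name with a `.sha256` suffix, written by `shasum -a 256`.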


@@ -46,19 +46,35 @@
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic...) API specifications for local AI inferencing. It allows you to run LLMs and generate images, audio, and more, locally or on-prem with consumer-grade hardware, supporting multiple model families. It does not require a GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
![screen](https://github.com/mudler/LocalAI/assets/2420543/20b5ccd2-8393-44f0-aaf6-87a23806381e)
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# Alternative images:
# - if you have an Nvidia GPU:
# docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# - without preconfigured models
# docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# - without preconfigured models for Nvidia GPUs
# docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
```
[💻 Getting started](https://localai.io/basics/getting_started/index.html)
## 🔥🔥 Hot topics / Roadmap
[Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
- 🔥🔥 Decentralized llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) 👉 Docs https://localai.io/features/distribute/
- 🔥🔥 Openvoice: https://github.com/mudler/LocalAI/pull/2334
- 🆕 Function calls without grammars and mixed mode: https://github.com/mudler/LocalAI/pull/2328
- 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
- Chat, TTS, and Image generation in the WebUI: https://github.com/mudler/LocalAI/pull/2222
- Reranker API: https://github.com/mudler/LocalAI/pull/2121
- Gallery WebUI: https://github.com/mudler/LocalAI/pull/2104
- llama3: https://github.com/mudler/LocalAI/discussions/2076
- Parler-TTS: https://github.com/mudler/LocalAI/pull/2027
- Openvino support: https://github.com/mudler/LocalAI/pull/1892
- Vector store: https://github.com/mudler/LocalAI/pull/1795
- All-in-one container image: https://github.com/mudler/LocalAI/issues/1855
Hot topics (looking for contributors):
- WebUI improvements: https://github.com/mudler/LocalAI/issues/2156
- Backends v2: https://github.com/mudler/LocalAI/issues/1126
- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
- Assistant API: https://github.com/mudler/LocalAI/issues/1273
@@ -67,29 +83,19 @@ Hot topics (looking for contributors):
If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22
## 💻 [Getting started](https://localai.io/basics/getting_started/index.html)
For a detailed step-by-step introduction, refer to the [Getting Started](https://localai.io/basics/getting_started/index.html) guide.
For those in a hurry, here's a straightforward one-liner to launch a LocalAI AIO(All-in-one) Image using `docker`:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# or, if you have an Nvidia GPU:
# docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
```
## 🚀 [Features](https://localai.io/features/)
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `gpt4all.cpp`, ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🎨 [Image generation with stable diffusion](https://localai.io/features/image-generation)
- 🔥 [OpenAI functions](https://localai.io/features/openai-functions/) 🆕
- 🔥 [OpenAI-alike tools API](https://localai.io/features/openai-functions/)
- 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
- 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
- 🆕 [Vision API](https://localai.io/features/gpt-vision/)
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 📈 [Reranker API](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
## 💻 Usage
@@ -103,6 +109,7 @@ Build and deploy custom containers:
WebUIs:
- https://github.com/Jirubizu/localai-admin
- https://github.com/go-skynet/LocalAI-frontend
- QA-Pilot(An interactive chat project that leverages LocalAI LLMs for rapid understanding and navigation of GitHub code repository) https://github.com/reid41/QA-Pilot
Model galleries
- https://github.com/go-skynet/model-gallery
@@ -110,17 +117,19 @@ Model galleries
Other:
- Helm chart https://github.com/go-skynet/helm-charts
- VSCode extension https://github.com/badgooooor/localai-vscode-plugin
- Terminal utility https://github.com/djcopley/ShellOracle
- Local Smart assistant https://github.com/mudler/LocalAGI
- Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation
- Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-gpt4vision
- Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
- Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
- Shell-Pilot(Interact with LLM using LocalAI models via pure shell scripts on your Linux or MacOS system) https://github.com/reid41/shell-pilot
- Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
- Examples: https://github.com/mudler/LocalAI/tree/master/examples/
### 🔗 Resources
- 🆕 New! [LLM finetuning guide](https://localai.io/docs/advanced/fine-tuning/)
- [LLM finetuning guide](https://localai.io/docs/advanced/fine-tuning/)
- [How to build locally](https://localai.io/basics/build/index.html)
- [How to install in Kubernetes](https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes)
- [Projects integrating LocalAI](https://localai.io/docs/integrations/)
@@ -128,7 +137,8 @@ Other:
## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)
- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/ai/answers/tiZMDoZzZV6TLxgDXNBnFE/deploying-helm-charts-on-aws-eks)
- 🆕 [Run LocalAI on Jetson Nano Devkit](https://mudler.pm/posts/local-ai-jetson-nano-devkit/)
- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/blog/low-code-llm-apps-with-local-ai-flowise-and-pulumi/)
- [Run LocalAI on AWS](https://staleks.hashnode.dev/installing-localai-on-aws-ec2-instance)
- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
- [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
@@ -155,17 +165,16 @@ If you utilize this repository, data in a downstream project, please consider ci
Support the project by becoming [a backer or sponsor](https://github.com/sponsors/mudler). Your logo will show up here with a link to your website.
A huge thank you to our generous sponsors who support this project:
A huge thank you to our generous sponsors who support this project covering CI expenses, and our [Sponsor list](https://github.com/sponsors/mudler):
| ![Spectro Cloud logo_600x600px_transparent bg](https://github.com/go-skynet/LocalAI/assets/2420543/68a6f3cb-8a65-4a4d-99b5-6417a8905512) |
|:-----------------------------------------------:|
| [Spectro Cloud](https://www.spectrocloud.com/) |
| Spectro Cloud kindly supports LocalAI by providing GPU and computing resources to run tests on lamdalabs! |
And a huge shout-out to individuals sponsoring the project by donating hardware or backing the project.
- [Sponsor list](https://github.com/sponsors/mudler)
- JDAM00 (donating HW for the CI)
<p align="center">
<a href="https://www.spectrocloud.com/" target="blank">
<img height="200" src="https://github.com/go-skynet/LocalAI/assets/2420543/68a6f3cb-8a65-4a4d-99b5-6417a8905512">
</a>
<a href="https://www.premai.io/" target="blank">
<img height="200" src="https://github.com/mudler/LocalAI/assets/2420543/42e4ca83-661e-4f79-8e46-ae43689683d6"> <br>
</a>
</p>
## 🌟 Star history
@@ -175,7 +184,7 @@ And a huge shout-out to individuals sponsoring the project by donating hardware
LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/).
MIT - Author Ettore Di Giacinto
MIT - Author Ettore Di Giacinto <mudler@localai.io>
## 🙇 Acknowledgements

@@ -1,9 +1,64 @@
name: gpt-4
mmap: true
parameters:
model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q2_K.gguf
model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
context_size: 8192
stopwords:
- "<|im_end|>"
- "<dummy32000>"
- "</tool_call>"
- "<|eot_id|>"
- "<|end_of_text|>"
function:
# disable injecting the "answer" tool
disable_no_action: true
grammar:
# This allows the grammar to also return messages
mixed_mode: true
# Suffix to add to the grammar
#prefix: '<tool_call>\n'
# Force parallel calls in the grammar
# parallel_calls: true
return_name_in_function_response: true
# Without grammar uncomment the lines below
# Warning: this is relying only on the capability of the
# LLM model to generate the correct function call.
json_regex_match:
- "(?s)<tool_call>(.*?)</tool_call>"
- "(?s)<tool_call>(.*?)"
replace_llm_results:
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
replace_function_results:
# Replace everything that is not JSON array or object
#
- key: '(?s)^[^{\[]*'
value: ""
- key: '(?s)[^}\]]*$'
value: ""
- key: "'([^']*?)'"
value: "_DQUOTE_${1}_DQUOTE_"
- key: '\\"'
value: "__TEMP_QUOTE__"
- key: "\'"
value: "'"
- key: "_DQUOTE_"
value: '"'
- key: "__TEMP_QUOTE__"
value: '"'
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
{{- if .FunctionCall }}
@@ -22,38 +77,25 @@ template:
{{- else if eq .RoleName "tool" }}
</tool_response>
{{- end }}<|im_end|>
# https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF#prompt-format-for-function-calling
function: |
completion: |
{{.Input}}
function: |-
<|im_start|>system
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
You are a function calling AI model.
Here are the available tools:
<tools>
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
</tools>
Use the following pydantic model json schema for each tool call you will make:
{'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
You should call the tools provided to you sequentially
Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
<scratchpad>
{step-by-step reasoning and plan in bullet points}
</scratchpad>
For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
<tool_call>
{'arguments': <args-dict>, 'name': <function-name>}
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|im_end|>
{{.Input -}}
<|im_start|>assistant
<tool_call>
chat: |
{{.Input -}}
<|im_start|>assistant
completion: |
{{.Input}}
context_size: 4096
f16: true
stopwords:
- <|im_end|>
- <dummy32000>
- "\n</tool_call>"
- "\n\n\n"
usage: |
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "How are you doing?"}],
"temperature": 0.1
}'
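
The `replace_function_results` entries above read as an ordered chain of regex substitutions that turns a Python-style, single-quoted tool call into parseable JSON. The sketch below replays that chain with Python's `re` module purely as an illustration. It assumes the entries are applied in the listed order; `clean_function_result` and the sample input are hypothetical names, not part of LocalAI, and Go's `${1}` group reference is written `\g<1>` here.

```python
import json
import re

# Ordered (pattern, replacement) pairs mirroring the YAML entries above.
REPLACEMENTS = [
    (r"(?s)^[^{\[]*", ""),                      # drop text before the first { or [
    (r"(?s)[^}\]]*$", ""),                      # drop text after the last } or ]
    (r"'([^']*?)'", r"_DQUOTE_\g<1>_DQUOTE_"),  # mark single-quoted strings
    (r'\\"', "__TEMP_QUOTE__"),                 # protect backslash-escaped double quotes
    (r"\'", "'"),                               # single quote (a no-op in this Python rendering)
    (r"_DQUOTE_", '"'),                         # turn the markers into double quotes
    (r"__TEMP_QUOTE__", '"'),                   # restore the protected quotes
    (r"(?s)<scratchpad>.*</scratchpad>", ""),   # drop scratchpad content
]


def clean_function_result(text):
    """Apply every substitution in order (hypothetical helper)."""
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    return text


raw = "Sure, calling it now: {'arguments': {'city': 'Rome'}, 'name': 'get_weather'} done"
cleaned = clean_function_result(raw)
call = json.loads(cleaned)
print(cleaned)
```

The temporary `_DQUOTE_` and `__TEMP_QUOTE__` markers keep quotes that were just converted apart from quotes that were already escaped in the model output.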

@@ -1,9 +1,64 @@
name: gpt-4
mmap: true
parameters:
model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q6_K.gguf
model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
context_size: 8192
stopwords:
- "<|im_end|>"
- "<dummy32000>"
- "</tool_call>"
- "<|eot_id|>"
- "<|end_of_text|>"
function:
# disable injecting the "answer" tool
disable_no_action: true
grammar:
# This allows the grammar to also return messages
mixed_mode: true
# Suffix to add to the grammar
#prefix: '<tool_call>\n'
# Force parallel calls in the grammar
# parallel_calls: true
return_name_in_function_response: true
# Without grammar uncomment the lines below
# Warning: this is relying only on the capability of the
# LLM model to generate the correct function call.
json_regex_match:
- "(?s)<tool_call>(.*?)</tool_call>"
- "(?s)<tool_call>(.*?)"
replace_llm_results:
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
replace_function_results:
# Replace everything that is not JSON array or object
#
- key: '(?s)^[^{\[]*'
value: ""
- key: '(?s)[^}\]]*$'
value: ""
- key: "'([^']*?)'"
value: "_DQUOTE_${1}_DQUOTE_"
- key: '\\"'
value: "__TEMP_QUOTE__"
- key: "\'"
value: "'"
- key: "_DQUOTE_"
value: '"'
- key: "__TEMP_QUOTE__"
value: '"'
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
{{- if .FunctionCall }}
@@ -22,38 +77,25 @@ template:
{{- else if eq .RoleName "tool" }}
</tool_response>
{{- end }}<|im_end|>
# https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF#prompt-format-for-function-calling
function: |
completion: |
{{.Input}}
function: |-
<|im_start|>system
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
You are a function calling AI model.
Here are the available tools:
<tools>
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
</tools>
Use the following pydantic model json schema for each tool call you will make:
{'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
You should call the tools provided to you sequentially
Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
<scratchpad>
{step-by-step reasoning and plan in bullet points}
</scratchpad>
For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
<tool_call>
{'arguments': <args-dict>, 'name': <function-name>}
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|im_end|>
{{.Input -}}
<|im_start|>assistant
<tool_call>
chat: |
{{.Input -}}
<|im_start|>assistant
completion: |
{{.Input}}
context_size: 4096
f16: true
stopwords:
- <|im_end|>
- <dummy32000>
- "\n</tool_call>"
- "\n\n\n"
usage: |
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "How are you doing?"}],
"temperature": 0.1
}'
<|im_start|>assistant

@@ -1,10 +1,66 @@
name: gpt-4
mmap: false
context_size: 8192
f16: false
parameters:
model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q6_K.gguf
model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
stopwords:
- "<|im_end|>"
- "<dummy32000>"
- "</tool_call>"
- "<|eot_id|>"
- "<|end_of_text|>"
function:
# disable injecting the "answer" tool
disable_no_action: true
grammar:
# This allows the grammar to also return messages
mixed_mode: true
# Suffix to add to the grammar
#prefix: '<tool_call>\n'
# Force parallel calls in the grammar
# parallel_calls: true
return_name_in_function_response: true
# Without grammar uncomment the lines below
# Warning: this is relying only on the capability of the
# LLM model to generate the correct function call.
json_regex_match:
- "(?s)<tool_call>(.*?)</tool_call>"
- "(?s)<tool_call>(.*?)"
replace_llm_results:
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
replace_function_results:
# Replace everything that is not JSON array or object
#
- key: '(?s)^[^{\[]*'
value: ""
- key: '(?s)[^}\]]*$'
value: ""
- key: "'([^']*?)'"
value: "_DQUOTE_${1}_DQUOTE_"
- key: '\\"'
value: "__TEMP_QUOTE__"
- key: "\'"
value: "'"
- key: "_DQUOTE_"
value: '"'
- key: "__TEMP_QUOTE__"
value: '"'
# Drop the scratchpad content from responses
- key: "(?s)<scratchpad>.*</scratchpad>"
value: ""
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
{{- if .FunctionCall }}
@@ -23,37 +79,25 @@ template:
{{- else if eq .RoleName "tool" }}
</tool_response>
{{- end }}<|im_end|>
# https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF#prompt-format-for-function-calling
function: |
completion: |
{{.Input}}
function: |-
<|im_start|>system
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
You are a function calling AI model.
Here are the available tools:
<tools>
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
</tools>
Use the following pydantic model json schema for each tool call you will make:
{'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
You should call the tools provided to you sequentially
Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
<scratchpad>
{step-by-step reasoning and plan in bullet points}
</scratchpad>
For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
<tool_call>
{'arguments': <args-dict>, 'name': <function-name>}
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|im_end|>
{{.Input -}}
<|im_start|>assistant
<tool_call>
chat: |
{{.Input -}}
<|im_start|>assistant
completion: |
{{.Input}}
context_size: 4096
stopwords:
- <|im_end|>
- "\n</tool_call>"
- <dummy32000>
- "\n\n\n"
usage: |
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "How are you doing?"}],
"temperature": 0.1
}'
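
The two `json_regex_match` patterns in this config can be tried out in isolation. The following sketch uses Python's `re` module and assumes a "first pattern that matches wins" policy, which the diff itself does not confirm; `extract_tool_calls` and the sample replies are hypothetical. In Python, at least, the second pattern ends in a lazy group with nothing after it, so it captures an empty string.

```python
import json
import re

# Patterns copied from json_regex_match above.
PATTERNS = [
    r"(?s)<tool_call>(.*?)</tool_call>",
    r"(?s)<tool_call>(.*?)",
]


def extract_tool_calls(text):
    """Return the captured groups of the first pattern that matches anything."""
    for pattern in PATTERNS:
        found = re.findall(pattern, text)
        if found:
            return found
    return []


reply = (
    "<scratchpad>plan</scratchpad>\n"
    '<tool_call>\n{"arguments": {"city": "Rome"}, "name": "get_weather"}\n</tool_call>'
)
payloads = extract_tool_calls(reply)
call = json.loads(payloads[0])

# An unclosed tag falls through to the second pattern.
unclosed = extract_tool_calls('<tool_call>{"name": "x"}')
print(call, unclosed)
```

Whether the fallback pattern is useful for an unclosed `<tool_call>` therefore depends on how the regex engine and the calling code treat the empty capture.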

@@ -212,6 +212,9 @@ message ModelOptions {
float YarnBetaSlow = 47;
string Type = 49;
bool FlashAttention = 56;
bool NoKVOffload = 57;
}
message Result {
@@ -263,6 +266,7 @@ message TTSRequest {
string model = 2;
string dst = 3;
string voice = 4;
optional string language = 5;
}
message TokenizationResponse {

@@ -43,35 +43,27 @@ llama.cpp:
llama.cpp/examples/grpc-server: llama.cpp
mkdir -p llama.cpp/examples/grpc-server
cp -r $(abspath ./)/CMakeLists.txt llama.cpp/examples/grpc-server/
cp -r $(abspath ./)/grpc-server.cpp llama.cpp/examples/grpc-server/
cp -rfv $(abspath ./)/json.hpp llama.cpp/examples/grpc-server/
cp -rfv $(abspath ./)/utils.hpp llama.cpp/examples/grpc-server/
echo "add_subdirectory(grpc-server)" >> llama.cpp/examples/CMakeLists.txt
## XXX: In some versions of CMake clip wasn't being built before llama.
## This is a hack for now, but it should be fixed in the future.
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h
cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp
bash prepare.sh
rebuild:
cp -rfv $(abspath ./)/CMakeLists.txt llama.cpp/examples/grpc-server/
cp -rfv $(abspath ./)/grpc-server.cpp llama.cpp/examples/grpc-server/
cp -rfv $(abspath ./)/json.hpp llama.cpp/examples/grpc-server/
bash prepare.sh
rm -rf grpc-server
$(MAKE) grpc-server
clean:
rm -rf llama.cpp
purge:
rm -rf llama.cpp/build
rm -rf llama.cpp/examples/grpc-server
rm -rf grpc-server
clean: purge
rm -rf llama.cpp
grpc-server: llama.cpp llama.cpp/examples/grpc-server
@echo "Building grpc-server with $(BUILD_TYPE) build type and $(CMAKE_ARGS)"
ifneq (,$(findstring sycl,$(BUILD_TYPE)))
bash -c "source $(ONEAPI_VARS); \
cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release"
cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && $(MAKE)"
else
cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release
cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && $(MAKE)
endif
cp llama.cpp/build/bin/grpc-server .

@@ -791,7 +791,7 @@ struct llama_server_context
sampler_names.emplace_back(sampler_name);
}
}
slot->sparams.samplers_sequence = sampler_types_from_names(sampler_names, false);
slot->sparams.samplers_sequence = llama_sampling_types_from_names(sampler_names, false);
}
else
{
@@ -1146,7 +1146,7 @@ struct llama_server_context
std::vector<std::string> samplers_sequence;
for (const auto &sampler_type : slot.sparams.samplers_sequence)
{
samplers_sequence.emplace_back(sampler_type_to_name_string(sampler_type));
samplers_sequence.emplace_back(llama_sampling_type_to_str(sampler_type));
}
return json {
@@ -2217,6 +2217,12 @@ static void params_parse(const backend::ModelOptions* request,
} else {
params.n_parallel = 1;
}
const char *llama_grpc_servers = std::getenv("LLAMACPP_GRPC_SERVERS");
if (llama_grpc_servers != NULL) {
params.rpc_servers = std::string(llama_grpc_servers);
}
// TODO: Add yarn
if (!request->tensorsplit().empty()) {
@@ -2254,6 +2260,9 @@ static void params_parse(const backend::ModelOptions* request,
}
params.use_mlock = request->mlock();
params.use_mmap = request->mmap();
params.flash_attn = request->flashattention();
params.no_kv_offload = request->nokvoffload();
params.embedding = request->embeddings();
if (request->ropescaling() == "none") { params.rope_scaling_type = LLAMA_ROPE_SCALING_TYPE_NONE; }

@@ -0,0 +1,20 @@
#!/bin/bash
cp -r CMakeLists.txt llama.cpp/examples/grpc-server/
cp -r grpc-server.cpp llama.cpp/examples/grpc-server/
cp -rfv json.hpp llama.cpp/examples/grpc-server/
cp -rfv utils.hpp llama.cpp/examples/grpc-server/
if grep -q "grpc-server" llama.cpp/examples/CMakeLists.txt; then
echo "grpc-server already added"
else
echo "add_subdirectory(grpc-server)" >> llama.cpp/examples/CMakeLists.txt
fi
## XXX: In some versions of CMake clip wasn't being built before llama.
## This is a hack for now, but it should be fixed in the future.
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h
cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp
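
The `grep -q` guard in the script above makes the `add_subdirectory(grpc-server)` registration idempotent, so `rebuild` can call the script repeatedly without duplicating the line. A minimal Python sketch of the same check-then-append pattern follows; `register_subdirectory` is a hypothetical helper, and the temporary file stands in for `llama.cpp/examples/CMakeLists.txt`.

```python
import tempfile
from pathlib import Path


def register_subdirectory(cmake_lists, name):
    """Append add_subdirectory(name) unless name already appears in the file.

    Returns True when the line was appended, False when it was already there.
    """
    existing = cmake_lists.read_text() if cmake_lists.exists() else ""
    if name in existing:  # same substring test as `grep -q "grpc-server"`
        return False
    with cmake_lists.open("a") as fh:
        fh.write(f"add_subdirectory({name})\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "CMakeLists.txt"
    target.write_text("add_subdirectory(main)\n")
    first = register_subdirectory(target, "grpc-server")
    second = register_subdirectory(target, "grpc-server")
    contents = target.read_text()

print(first, second)
```

Like the shell version, the check is a plain substring match, so any existing mention of the name (even in a comment) suppresses the append.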

@@ -4,6 +4,7 @@ package main
// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
import (
"fmt"
"os"
"github.com/go-skynet/LocalAI/pkg/grpc/base"
pb "github.com/go-skynet/LocalAI/pkg/grpc/proto"
@@ -18,9 +19,14 @@ type LLM struct {
}
func (llm *LLM) Load(opts *pb.ModelOptions) error {
llm.langchain, _ = langchain.NewHuggingFace(opts.Model)
var err error
hfToken := os.Getenv("HUGGINGFACEHUB_API_TOKEN")
if hfToken == "" {
return fmt.Errorf("no huggingface token provided")
}
llm.langchain, err = langchain.NewHuggingFace(opts.Model, hfToken)
llm.model = opts.Model
return nil
return err
}
func (llm *LLM) Predict(opts *pb.PredictOptions) (string, error) {

@@ -29,8 +29,8 @@ func audioToWav(src, dst string) error {
return nil
}
func Transcript(model whisper.Model, audiopath, language string, threads uint) (schema.Result, error) {
res := schema.Result{}
func Transcript(model whisper.Model, audiopath, language string, threads uint) (schema.TranscriptionResult, error) {
res := schema.TranscriptionResult{}
dir, err := os.MkdirTemp("", "whisper")
if err != nil {

@@ -21,6 +21,6 @@ func (sd *Whisper) Load(opts *pb.ModelOptions) error {
return err
}
func (sd *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (schema.Result, error) {
func (sd *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (schema.TranscriptionResult, error) {
return Transcript(sd.whisper, opts.Dst, opts.Language, uint(opts.Threads))
}

@@ -1,6 +1,6 @@
.PHONY: autogptq
autogptq: protogen
$(MAKE) -C ../common-env/transformers
bash install.sh
.PHONY: protogen
protogen: backend_pb2_grpc.py backend_pb2.py
@@ -10,4 +10,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

@@ -1,93 +0,0 @@
####
# Attention! This file is abandoned.
# Please use the ../common-env/transformers/transformers.yml file to manage dependencies.
###
name: autogptq
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.27.0
- aiohttp==3.8.5
- aiosignal==1.3.1
- async-timeout==4.0.3
- attrs==23.1.0
- auto-gptq==0.7.1
- certifi==2023.7.22
- charset-normalizer==3.3.0
- datasets==2.14.5
- dill==0.3.7
- filelock==3.12.4
- frozenlist==1.4.0
- fsspec==2023.6.0
- grpcio==1.59.0
- huggingface-hub==0.16.4
- idna==3.4
- jinja2==3.1.2
- markupsafe==2.1.3
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx==3.1
- numpy==1.26.0
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- optimum==1.17.1
- packaging==23.2
- pandas==2.1.1
- peft==0.5.0
- protobuf==4.24.4
- psutil==5.9.5
- pyarrow==13.0.0
- python-dateutil==2.8.2
- pytz==2023.3.post1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- rouge==1.0.1
- safetensors>=0.3.3
- six==1.16.0
- sympy==1.12
- tokenizers==0.14.0
- tqdm==4.66.1
- torch==2.2.1
- torchvision==0.17.1
- transformers==4.34.0
- transformers_stream_generator==0.0.5
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==2.0.6
- xxhash==3.4.1
- yarl==1.9.2

@@ -0,0 +1,14 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
installRequirements

@@ -0,0 +1,2 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch

@@ -0,0 +1,5 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
optimum[openvino]
setuptools==69.5.1 # https://github.com/mudler/LocalAI/issues/2406

@@ -0,0 +1,7 @@
accelerate
auto-gptq==0.7.1
grpcio==1.64.0
protobuf
torch
certifi
transformers

@@ -1,14 +1,4 @@
#!/bin/bash
source $(dirname $0)/../common/libbackend.sh
##
## A bash script wrapper that runs the autogptq server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate transformers
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/autogptq.py $@
startBackend $@

@@ -0,0 +1,6 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
runUnittests

@@ -1,6 +1,6 @@
.PHONY: ttsbark
ttsbark: protogen
$(MAKE) -C ../common-env/transformers
bash install.sh
.PHONY: run
run: protogen
@@ -22,4 +22,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

backend/python/bark/install.sh Executable file

@@ -0,0 +1,14 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
installRequirements

@@ -0,0 +1,3 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch
torchaudio

@@ -0,0 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
torchaudio
optimum[openvino]
setuptools==69.5.1 # https://github.com/mudler/LocalAI/issues/2406

@@ -0,0 +1,6 @@
accelerate
bark==0.1.5
grpcio==1.64.0
protobuf
certifi
transformers

@@ -1,14 +1,4 @@
#!/bin/bash
source $(dirname $0)/../common/libbackend.sh
##
## A bash script wrapper that runs the ttsbark server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate transformers
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/ttsbark.py $@
startBackend $@

@@ -18,7 +18,7 @@ class TestBackendServicer(unittest.TestCase):
"""
This method sets up the gRPC service by starting the server
"""
self.service = subprocess.Popen(["python3", "ttsbark.py", "--addr", "localhost:50051"])
self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
time.sleep(10)
def tearDown(self) -> None:

backend/python/bark/test.sh Normal file → Executable file

@@ -1,11 +1,6 @@
#!/bin/bash
##
## A bash script wrapper that runs the bark server with conda
set -e
# Activate conda environment
source activate transformers
source $(dirname $0)/../common/libbackend.sh
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python -m unittest $DIR/test.py
runUnittests

@@ -1,21 +0,0 @@
CONDA_ENV_PATH = "transformers.yml"
ifeq ($(BUILD_TYPE), cublas)
CONDA_ENV_PATH = "transformers-nvidia.yml"
endif
ifeq ($(BUILD_TYPE), hipblas)
CONDA_ENV_PATH = "transformers-rocm.yml"
endif
# Intel GPU are supposed to have dependencies installed in the main python
# environment, so we skip conda installation for SYCL builds.
# https://github.com/intel/intel-extension-for-pytorch/issues/538
ifneq (,$(findstring sycl,$(BUILD_TYPE)))
export SKIP_CONDA=1
endif
.PHONY: transformers
transformers:
@echo "Installing $(CONDA_ENV_PATH)..."
bash install.sh $(CONDA_ENV_PATH)

@@ -1,44 +0,0 @@
#!/bin/bash
set -ex
SKIP_CONDA=${SKIP_CONDA:-0}
REQUIREMENTS_FILE=$1
# Check if environment exist
conda_env_exists(){
! conda list --name "${@}" >/dev/null 2>/dev/null
}
if [ $SKIP_CONDA -eq 1 ]; then
echo "Skipping conda environment installation"
else
export PATH=$PATH:/opt/conda/bin
if conda_env_exists "transformers" ; then
echo "Creating virtual environment..."
conda env create --name transformers --file $REQUIREMENTS_FILE
echo "Virtual environment created."
else
echo "Virtual environment already exists."
fi
fi
if [ -d "/opt/intel" ]; then
# Intel GPU: If the directory exists, we assume we are using the intel image
# (no conda env)
# https://github.com/intel/intel-extension-for-pytorch/issues/538
pip install intel-extension-for-transformers datasets sentencepiece tiktoken neural_speed optimum[openvino]
fi
# If we didn't skip conda, activate the environment
# to install FlashAttention
if [ $SKIP_CONDA -eq 0 ]; then
source activate transformers
fi
if [[ $REQUIREMENTS_FILE =~ -nvidia.yml$ ]]; then
#TODO: FlashAttention is supported on nvidia and ROCm, but ROCm install can't be done this easily
pip install flash-attn --no-build-isolation
fi
if [ "$PIP_CACHE_PURGE" = true ] ; then
pip cache purge
fi

@@ -1,125 +0,0 @@
name: transformers
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.27.0
- aiohttp==3.8.5
- aiosignal==1.3.1
- async-timeout==4.0.3
- auto-gptq==0.7.1
- attrs==23.1.0
- bark==0.1.5
- bitsandbytes==0.43.0
- boto3==1.28.61
- botocore==1.31.61
- certifi==2023.7.22
- TTS==0.22.0
- charset-normalizer==3.3.0
- datasets==2.14.5
- sentence-transformers==2.5.1 # Updated Version
- sentencepiece==0.1.99
- dill==0.3.7
- einops==0.7.0
- encodec==0.1.1
- filelock==3.12.4
- frozenlist==1.4.0
- fsspec==2023.6.0
- funcy==2.0
- grpcio==1.59.0
- huggingface-hub
- idna==3.4
- jinja2==3.1.2
- jmespath==1.0.1
- markupsafe==2.1.3
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx
- numpy==1.26.0
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- optimum==1.17.1
- packaging==23.2
- pandas
- peft==0.5.0
- protobuf==4.24.4
- psutil==5.9.5
- pyarrow==13.0.0
- python-dateutil==2.8.2
- pytz==2023.3.post1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- rouge==1.0.1
- s3transfer==0.7.0
- safetensors>=0.4.1
- scipy==1.12.0 # Updated Version
- six==1.16.0
- sympy==1.12
- tokenizers
- torch==2.1.2
- torchvision==0.16.2
- torchaudio==2.1.2
- tqdm==4.66.1
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==1.26.17
- xxhash==3.4.1
- yarl==1.9.2
- soundfile
- langid
- wget
- unidecode
- pyopenjtalk-prebuilt
- pypinyin
- inflect
- cn2an
- jieba
- eng_to_ipa
- openai-whisper
- matplotlib
- gradio==3.41.2
- nltk
- sudachipy
- sudachidict_core
- vocos
- vllm>=0.4.0
- transformers>=4.38.2 # Updated Version
- transformers_stream_generator==0.0.5
- xformers==0.0.23.post1
- rerankers[transformers]
- pydantic
prefix: /opt/conda/envs/transformers

View File

@@ -1,113 +0,0 @@
name: transformers
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- --pre
- --extra-index-url https://download.pytorch.org/whl/nightly/
- accelerate==0.27.0
- auto-gptq==0.7.1
- aiohttp==3.8.5
- aiosignal==1.3.1
- async-timeout==4.0.3
- attrs==23.1.0
- bark==0.1.5
- boto3==1.28.61
- botocore==1.31.61
- certifi==2023.7.22
- TTS==0.22.0
- charset-normalizer==3.3.0
- datasets==2.14.5
- sentence-transformers==2.5.1 # Updated Version
- sentencepiece==0.1.99
- dill==0.3.7
- einops==0.7.0
- encodec==0.1.1
- filelock==3.12.4
- frozenlist==1.4.0
- fsspec==2023.6.0
- funcy==2.0
- grpcio==1.59.0
- huggingface-hub
- idna==3.4
- jinja2==3.1.2
- jmespath==1.0.1
- markupsafe==2.1.3
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx
- numpy==1.26.0
- packaging==23.2
- pandas
- peft==0.5.0
- protobuf==4.24.4
- psutil==5.9.5
- pyarrow==13.0.0
- python-dateutil==2.8.2
- pytz==2023.3.post1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- rouge==1.0.1
- s3transfer==0.7.0
- safetensors>=0.4.1
- scipy==1.12.0 # Updated Version
- six==1.16.0
- sympy==1.12
- tokenizers
- torch
- torchaudio
- tqdm==4.66.1
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==1.26.17
- xxhash==3.4.1
- yarl==1.9.2
- soundfile
- langid
- wget
- unidecode
- optimum==1.17.1
- pyopenjtalk-prebuilt
- pypinyin
- inflect
- cn2an
- jieba
- eng_to_ipa
- openai-whisper
- matplotlib
- gradio==3.41.2
- nltk
- sudachipy
- sudachidict_core
- vocos
- vllm>=0.4.0
- transformers>=4.38.2 # Updated Version
- transformers_stream_generator==0.0.5
- xformers==0.0.23.post1
- rerankers[transformers]
- pydantic
prefix: /opt/conda/envs/transformers

View File

@@ -1,118 +0,0 @@
name: transformers
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.27.0
- aiohttp==3.8.5
- aiosignal==1.3.1
- auto-gptq==0.7.1
- async-timeout==4.0.3
- attrs==23.1.0
- bark==0.1.5
- boto3==1.28.61
- botocore==1.31.61
- certifi==2023.7.22
- coloredlogs==15.0.1
- TTS==0.22.0
- charset-normalizer==3.3.0
- datasets==2.14.5
- sentence-transformers==2.5.1 # Updated Version
- sentencepiece==0.1.99
- dill==0.3.7
- einops==0.7.0
- encodec==0.1.1
- filelock==3.12.4
- frozenlist==1.4.0
- fsspec==2023.6.0
- funcy==2.0
- grpcio==1.59.0
- huggingface-hub
- humanfriendly==10.0
- idna==3.4
- jinja2==3.1.2
- jmespath==1.0.1
- markupsafe==2.1.3
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx
- numpy==1.26.0
- onnx==1.15.0
- openvino==2024.1.0
- openvino-telemetry==2024.1.0
- optimum[openvino]==1.19.1
- optimum-intel==1.16.1
- packaging==23.2
- pandas
- peft==0.5.0
- protobuf==4.24.4
- psutil==5.9.5
- pyarrow==13.0.0
- python-dateutil==2.8.2
- pytz==2023.3.post1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- rouge==1.0.1
- s3transfer==0.7.0
- safetensors>=0.4.1
- scipy==1.12.0 # Updated Version
- six==1.16.0
- sympy==1.12
- tokenizers
- torch==2.1.2
- torchvision==0.16.2
- torchaudio==2.1.2
- tqdm==4.66.1
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==1.26.17
- xxhash==3.4.1
- yarl==1.9.2
- soundfile
- langid
- wget
- unidecode
- pyopenjtalk-prebuilt
- pypinyin
- inflect
- cn2an
- jieba
- eng_to_ipa
- openai-whisper
- matplotlib
- gradio==3.41.2
- nltk
- sudachipy
- sudachidict_core
- vocos
- vllm>=0.4.0
- transformers>=4.38.2 # Updated Version
- transformers_stream_generator==0.0.5
- xformers==0.0.23.post1
- rerankers[transformers]
- pydantic
prefix: /opt/conda/envs/transformers

View File

@@ -0,0 +1,213 @@
# init handles the setup of the library
#
# use the library by adding the following line to a script:
# source $(dirname $0)/../common/libbackend.sh
#
# If you want to limit what targets a backend can be used on, set the variable LIMIT_TARGETS to a
# space separated list of valid targets BEFORE sourcing the library, for example to only allow a backend
# to be used on CUDA and CPU backends:
#
# LIMIT_TARGETS="cublas cpu"
# source $(dirname $0)/../common/libbackend.sh
#
# You can use any valid BUILD_TYPE or BUILD_PROFILE, if you need to limit a backend to CUDA 12 only:
#
# LIMIT_TARGETS="cublas12"
# source $(dirname $0)/../common/libbackend.sh
#
function init() {
BACKEND_NAME=${PWD##*/}
MY_DIR=$(realpath `dirname $0`)
BUILD_PROFILE=$(getBuildProfile)
# If a backend has defined a list of valid build profiles...
if [ ! -z "${LIMIT_TARGETS}" ]; then
isValidTarget=$(checkTargets ${LIMIT_TARGETS})
if [ ${isValidTarget} != true ]; then
echo "${BACKEND_NAME} can only be used on the following targets: ${LIMIT_TARGETS}"
exit 0
fi
fi
echo "Initializing libbackend for ${BACKEND_NAME}"
}
# getBuildProfile will inspect the system to determine which build profile is appropriate:
# returns one of the following (or plain cublas, any other BUILD_TYPE value, or cpu as the fallback):
# - cublas11
# - cublas12
# - hipblas
# - intel
function getBuildProfile() {
# First check if we are a cublas build, and if so report the correct build profile
if [ x"${BUILD_TYPE}" == "xcublas" ]; then
if [ ! -z ${CUDA_MAJOR_VERSION} ]; then
# If we have been given a CUDA version, we trust it
echo ${BUILD_TYPE}${CUDA_MAJOR_VERSION}
else
# We don't know what version of cuda we are, so we report ourselves as a generic cublas
echo ${BUILD_TYPE}
fi
return 0
fi
# If /opt/intel exists, then we are doing an intel/ARC build
if [ -d "/opt/intel" ]; then
echo "intel"
return 0
fi
# For any other value of BUILD_TYPE, we don't need any special handling/discovery
if [ ! -z ${BUILD_TYPE} ]; then
echo ${BUILD_TYPE}
return 0
fi
# If there is no BUILD_TYPE set at all, report a build profile of cpu, since we aren't building for any GPU target
echo "cpu"
}
# ensureVenv makes sure that the venv for the backend both exists, and is activated.
#
# This function is idempotent, so you can call it as many times as you want and it will
# always result in an activated virtual environment
function ensureVenv() {
if [ ! -d "${MY_DIR}/venv" ]; then
uv venv ${MY_DIR}/venv
echo "virtualenv created"
fi
if [ "x${VIRTUAL_ENV}" != "x${MY_DIR}/venv" ]; then
source ${MY_DIR}/venv/bin/activate
echo "virtualenv activated"
fi
echo "activated virtualenv has been ensured"
}
# installRequirements looks for several requirements files and if they exist runs the install for them in order
#
# - requirements-install.txt
# - requirements.txt
# - requirements-${BUILD_TYPE}.txt
# - requirements-${BUILD_PROFILE}.txt
#
# BUILD_PROFILE is a more specific version of BUILD_TYPE, ex: cublas11 or cublas12
# it can also include some options that we do not have BUILD_TYPES for, ex: intel
#
# NOTE: for BUILD_PROFILE==intel, this function does NOT automatically use the Intel python package index.
# you may want to add the following line to a requirements-intel.txt if you use one:
#
# --index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
#
# If you need to add extra flags into the pip install command you can do so by setting the variable EXTRA_PIP_INSTALL_FLAGS
# before calling installRequirements. For example:
#
# source $(dirname $0)/../common/libbackend.sh
# EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
# installRequirements
function installRequirements() {
ensureVenv
# These are the requirements files we will attempt to install, in order
declare -a requirementFiles=(
"${MY_DIR}/requirements-install.txt"
"${MY_DIR}/requirements.txt"
"${MY_DIR}/requirements-${BUILD_TYPE}.txt"
)
if [ "x${BUILD_TYPE}" != "x${BUILD_PROFILE}" ]; then
requirementFiles+=("${MY_DIR}/requirements-${BUILD_PROFILE}.txt")
fi
for reqFile in ${requirementFiles[@]}; do
if [ -f ${reqFile} ]; then
echo "starting requirements install for ${reqFile}"
uv pip install ${EXTRA_PIP_INSTALL_FLAGS} --requirement ${reqFile}
echo "finished requirements install for ${reqFile}"
fi
done
}
# startBackend discovers and runs the backend GRPC server
#
# You can specify a specific backend file to execute by setting BACKEND_FILE before calling startBackend.
# example:
#
# source ../common/libbackend.sh
# BACKEND_FILE="${MY_DIR}/source/backend.py"
# startBackend $@
#
# valid filenames for autodiscovered backend servers are:
# - server.py
# - backend.py
# - ${BACKEND_NAME}.py
function startBackend() {
ensureVenv
if [ ! -z ${BACKEND_FILE} ]; then
python ${BACKEND_FILE} $@
elif [ -e "${MY_DIR}/server.py" ]; then
python ${MY_DIR}/server.py $@
elif [ -e "${MY_DIR}/backend.py" ]; then
python ${MY_DIR}/backend.py $@
elif [ -e "${MY_DIR}/${BACKEND_NAME}.py" ]; then
python ${MY_DIR}/${BACKEND_NAME}.py $@
fi
}
# runUnittests discovers and runs python unittests
#
# You can specify a specific test file to use by setting TEST_FILE before calling runUnittests.
# example:
#
# source ../common/libbackend.sh
# TEST_FILE="${MY_DIR}/source/test.py"
# runUnittests $@
#
# by default a file named test.py in the backend's directory will be used
function runUnittests() {
ensureVenv
if [ ! -z ${TEST_FILE} ]; then
testDir=$(dirname `realpath ${TEST_FILE}`)
testFile=$(basename ${TEST_FILE})
pushd ${testDir}
python -m unittest ${testFile}
popd
elif [ -f "${MY_DIR}/test.py" ]; then
pushd ${MY_DIR}
python -m unittest test.py
popd
else
echo "no tests defined for ${BACKEND_NAME}"
fi
}
##################################################################################
# Below here are helper functions not intended to be used outside of the library #
##################################################################################
# checkTargets determines if the current BUILD_TYPE or BUILD_PROFILE is in a list of valid targets
function checkTargets() {
# Collect all provided targets into a variable and...
targets=$@
# ...convert it into an array
declare -a targets=($targets)
for target in ${targets[@]}; do
if [ "x${BUILD_TYPE}" == "x${target}" ]; then
echo true
return 0
fi
if [ "x${BUILD_PROFILE}" == "x${target}" ]; then
echo true
return 0
fi
done
echo false
}
init
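
The resolution order used by getBuildProfile can be restated as a small pure function, which makes the precedence easier to check by hand. This is a minimal sketch, not the library's code: `resolve_profile` and its three positional arguments are hypothetical, and the `/opt/intel` directory test is replaced by an explicit yes/no argument so the logic can run anywhere.

```shell
#!/bin/bash
# Hypothetical restatement of getBuildProfile (not part of libbackend.sh).
#   $1: value of BUILD_TYPE (may be empty)
#   $2: value of CUDA_MAJOR_VERSION (may be empty)
#   $3: "yes" if /opt/intel exists, anything else otherwise
resolve_profile() {
    local build_type="$1" cuda_major="$2" has_intel_dir="$3"

    # cublas wins first; a known CUDA major version refines it to cublas11/cublas12
    if [ "${build_type}" = "cublas" ]; then
        echo "${build_type}${cuda_major}"
        return 0
    fi
    # the Intel check comes before the generic BUILD_TYPE passthrough
    if [ "${has_intel_dir}" = "yes" ]; then
        echo "intel"
        return 0
    fi
    # any other non-empty BUILD_TYPE is reported unchanged
    if [ -n "${build_type}" ]; then
        echo "${build_type}"
        return 0
    fi
    # nothing set at all: CPU-only build
    echo "cpu"
}

resolve_profile "cublas" "12" "no"   # prints cublas12
resolve_profile "cublas" "" "no"     # prints cublas
resolve_profile "hipblas" "" "no"    # prints hipblas
resolve_profile "" "" "yes"          # prints intel
resolve_profile "" "" "no"           # prints cpu
```

Note that, as in the library, a non-cublas BUILD_TYPE is overridden by the Intel check whenever the Intel directory is present.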

View File

@@ -0,0 +1,19 @@
.DEFAULT_GOAL := install
.PHONY: install
install: protogen
bash install.sh
.PHONY: protogen
protogen: backend_pb2_grpc.py backend_pb2.py
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -0,0 +1,4 @@
#!/usr/bin/env python3
import grpc
import backend_pb2
import backend_pb2_grpc

View File

@@ -0,0 +1,14 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
installRequirements
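
To see which requirements files installRequirements would pick up for a given backend, the candidate ordering can be reproduced without installing anything. The following is a sketch under stated assumptions: `list_requirement_files` is a hypothetical helper that only prints the file names that exist, in the order the library documents, and it takes the directory, BUILD_TYPE and BUILD_PROFILE as arguments instead of reading globals.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of libbackend.sh): prints the names of
# the requirements files that exist, in the order they would be installed.
#   $1: backend directory, $2: BUILD_TYPE, $3: BUILD_PROFILE
list_requirement_files() {
    local dir="$1" build_type="$2" build_profile="$3"
    local -a candidates=(
        "${dir}/requirements-install.txt"
        "${dir}/requirements.txt"
        "${dir}/requirements-${build_type}.txt"
    )
    # the profile-specific file is only added when it differs from the type
    if [ "${build_type}" != "${build_profile}" ]; then
        candidates+=("${dir}/requirements-${build_profile}.txt")
    fi
    local f
    for f in "${candidates[@]}"; do
        if [ -f "${f}" ]; then
            basename "${f}"
        fi
    done
}

# Demo in a throwaway directory
demo_dir=$(mktemp -d)
touch "${demo_dir}/requirements.txt" "${demo_dir}/requirements-cublas12.txt"
list_requirement_files "${demo_dir}" "cublas" "cublas12"
rm -rf "${demo_dir}"
```

Quoting the array expansion (`"${candidates[@]}"`) keeps paths with spaces intact, which the unquoted loop in the library would split.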

View File

@@ -0,0 +1,2 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch

View File

@@ -0,0 +1,4 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
optimum[openvino]

View File

@@ -0,0 +1,2 @@
grpcio==1.64.0
protobuf

View File

@@ -0,0 +1,4 @@
#!/bin/bash
source $(dirname $0)/../common/libbackend.sh
startBackend $@
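
The lookup that startBackend performs before launching Python can be sketched on its own. This is an illustration only: `find_backend_file` is a hypothetical name, it prints the chosen path instead of executing it, and unlike the library it returns a non-zero status when nothing is found (the library falls through silently in that case).

```shell
#!/bin/bash
# Hypothetical restatement of the startBackend lookup (not part of libbackend.sh).
#   $1: backend directory, $2: backend name, $3: explicit BACKEND_FILE override (may be empty)
find_backend_file() {
    local dir="$1" name="$2" override="$3"

    # an explicit override always wins
    if [ -n "${override}" ]; then
        echo "${override}"
        return 0
    fi
    # otherwise try the autodiscovered names in the documented order
    local candidate
    for candidate in "server.py" "backend.py" "${name}.py"; do
        if [ -e "${dir}/${candidate}" ]; then
            echo "${dir}/${candidate}"
            return 0
        fi
    done
    # assumption of this sketch: signal "not found" to the caller
    return 1
}

# Demo in a throwaway directory containing only backend.py
demo_dir=$(mktemp -d)
touch "${demo_dir}/backend.py"
find_backend_file "${demo_dir}" "coqui" ""
rm -rf "${demo_dir}"
```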

View File

@@ -0,0 +1,6 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
runUnittests

View File

@@ -1,6 +1,6 @@
.PHONY: coqui
coqui: protogen
$(MAKE) -C ../common-env/transformers
bash install.sh
.PHONY: run
run: protogen
@@ -22,4 +22,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -66,7 +66,21 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def TTS(self, request, context):
try:
self.tts.tts_to_file(text=request.text, speaker_wav=self.AudioPath, language=COQUI_LANGUAGE, file_path=request.dst)
# if model is multilingual, add language from request or env as fallback
lang = request.language or COQUI_LANGUAGE
if lang == "":
lang = None
if self.tts.is_multi_lingual and lang is None:
return backend_pb2.Result(success=False, message=f"Model is multi-lingual, but no language was provided")
# if model is multi-speaker, use speaker_wav or the speaker_id from request.voice
if self.tts.is_multi_speaker and self.AudioPath is None and request.voice is None:
return backend_pb2.Result(success=False, message=f"Model is multi-speaker, but no speaker was provided")
if self.tts.is_multi_speaker and request.voice is not None:
self.tts.tts_to_file(text=request.text, speaker=request.voice, language=lang, file_path=request.dst)
else:
self.tts.tts_to_file(text=request.text, speaker_wav=self.AudioPath, language=lang, file_path=request.dst)
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(success=True)

14
backend/python/coqui/install.sh Executable file
View File

@@ -0,0 +1,14 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
installRequirements
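
The effect of appending to EXTRA_PIP_INSTALL_FLAGS can be previewed by printing the command line instead of running it. A minimal sketch, assuming a hypothetical `pip_install_command` helper that takes the build profile, a requirements file and any caller-supplied flags as arguments; it only echoes the command and never invokes uv.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of libbackend.sh or install.sh).
#   $1: BUILD_PROFILE, $2: requirements file, $3: flags already set by the caller
pip_install_command() {
    local build_profile="$1" req_file="$2" extra_flags="$3"

    # mirror the install.sh branch: Intel builds get two additional flags
    if [ "${build_profile}" = "intel" ]; then
        extra_flags+=" --upgrade --index-strategy=unsafe-first-match"
    fi
    # extra_flags is deliberately unquoted so it splits into separate arguments
    echo uv pip install ${extra_flags} --requirement "${req_file}"
}

pip_install_command "cpu" "requirements.txt" ""
pip_install_command "intel" "requirements-intel.txt" "--no-build-isolation"
```

Because the script uses `+=`, flags set by the caller before the Intel branch are kept and the Intel-specific flags are added after them.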

View File

@@ -0,0 +1,3 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch
torchaudio

View File

@@ -0,0 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
torchaudio
optimum[openvino]
setuptools==69.5.1 # https://github.com/mudler/LocalAI/issues/2406

View File

@@ -0,0 +1,6 @@
accelerate
TTS==0.22.0
grpcio==1.64.0
protobuf
certifi
transformers

View File

@@ -1,14 +1,4 @@
#!/bin/bash
source $(dirname $0)/../common/libbackend.sh
##
## A bash script wrapper that runs the ttsbark server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate transformers
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/coqui_server.py $@
startBackend $@

View File

@@ -18,7 +18,7 @@ class TestBackendServicer(unittest.TestCase):
"""
This method sets up the gRPC service by starting the server
"""
self.service = subprocess.Popen(["python3", "coqui_server.py", "--addr", "localhost:50051"])
self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
time.sleep(10)
def tearDown(self) -> None:

11
backend/python/coqui/test.sh Normal file → Executable file
View File

@@ -1,11 +1,6 @@
#!/bin/bash
##
## A bash script wrapper that runs the bark server with conda
set -e
# Activate conda environment
source activate transformers
source $(dirname $0)/../common/libbackend.sh
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python -m unittest $DIR/test.py
runUnittests

View File

@@ -13,8 +13,7 @@ endif
.PHONY: diffusers
diffusers: protogen
@echo "Installing $(CONDA_ENV_PATH)..."
bash install.sh $(CONDA_ENV_PATH)
bash install.sh
.PHONY: run
run: protogen
@@ -33,4 +32,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -1,65 +0,0 @@
name: diffusers
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- --pre
- --extra-index-url https://download.pytorch.org/whl/nightly/
- accelerate>=0.11.0
- certifi==2023.7.22
- charset-normalizer==3.3.0
- compel==2.0.2
- diffusers==0.24.0
- filelock==3.12.4
- fsspec==2023.9.2
- grpcio==1.59.0
- huggingface-hub>=0.19.4
- idna==3.4
- importlib-metadata==6.8.0
- jinja2==3.1.2
- markupsafe==2.1.3
- mpmath==1.3.0
- networkx==3.1
- numpy==1.26.0
- omegaconf
- packaging==23.2
- pillow==10.0.1
- protobuf==4.24.4
- psutil==5.9.5
- pyparsing==3.1.1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- safetensors==0.4.0
- sympy==1.12
- tqdm==4.66.1
- transformers>=4.25.1
- triton==2.1.0
- typing-extensions==4.8.0
- urllib3==2.0.6
- zipp==3.17.0
- torch
- opencv-python
prefix: /opt/conda/envs/diffusers

View File

@@ -1,75 +0,0 @@
name: diffusers
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate>=0.11.0
- certifi==2023.7.22
- charset-normalizer==3.3.0
- compel==2.0.2
- diffusers==0.24.0
- filelock==3.12.4
- fsspec==2023.9.2
- grpcio==1.59.0
- huggingface-hub>=0.19.4
- idna==3.4
- importlib-metadata==6.8.0
- jinja2==3.1.2
- markupsafe==2.1.3
- mpmath==1.3.0
- networkx==3.1
- numpy==1.26.0
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- omegaconf
- packaging==23.2
- pillow==10.0.1
- protobuf==4.24.4
- psutil==5.9.5
- pyparsing==3.1.1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- safetensors==0.4.0
- sympy==1.12
- torch==2.1.0
- tqdm==4.66.1
- transformers>=4.25.1
- triton==2.1.0
- typing-extensions==4.8.0
- urllib3==2.0.6
- zipp==3.17.0
- opencv-python
prefix: /opt/conda/envs/diffusers

View File

@@ -1,50 +1,14 @@
#!/bin/bash
set -ex
set -e
SKIP_CONDA=${SKIP_CONDA:-0}
source $(dirname $0)/../common/libbackend.sh
# Check if environment exist
conda_env_exists(){
! conda list --name "${@}" >/dev/null 2>/dev/null
}
if [ $SKIP_CONDA -eq 1 ]; then
echo "Skipping conda environment installation"
else
export PATH=$PATH:/opt/conda/bin
if conda_env_exists "diffusers" ; then
echo "Creating virtual environment..."
conda env create --name diffusers --file $1
echo "Virtual environment created."
else
echo "Virtual environment already exists."
fi
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
if [ -d "/opt/intel" ]; then
# Intel GPU: If the directory exists, we assume we are using the Intel image
# https://github.com/intel/intel-extension-for-pytorch/issues/538
pip install torch==2.1.0a0 \
torchvision==0.16.0a0 \
torchaudio==2.1.0a0 \
intel-extension-for-pytorch==2.1.10+xpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install google-api-python-client \
grpcio \
grpcio-tools \
diffusers==0.24.0 \
transformers>=4.25.1 \
accelerate \
compel==2.0.2 \
Pillow
fi
if [ "$PIP_CACHE_PURGE" = true ] ; then
if [ $SKIP_CONDA -ne 1 ]; then
# Activate conda environment
source activate diffusers
fi
pip cache purge
fi
installRequirements

View File

@@ -0,0 +1,3 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch
torchvision

View File

@@ -0,0 +1,6 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch
torch
torchvision
optimum[openvino]
setuptools==69.5.1 # https://github.com/mudler/LocalAI/issues/2406

View File

@@ -0,0 +1,10 @@
accelerate
compel
diffusers
grpcio==1.64.0
opencv-python
pillow
protobuf
torch
transformers
certifi

View File

@@ -1,19 +1,4 @@
#!/bin/bash
source $(dirname $0)/../common/libbackend.sh
##
## A bash script wrapper that runs the diffusers server with conda
if [ -d "/opt/intel" ]; then
# Assumes we are using the Intel oneAPI container image
# https://github.com/intel/intel-extension-for-pytorch/issues/538
export XPU=1
else
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate diffusers
fi
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/backend_diffusers.py $@
startBackend $@

View File

@@ -18,7 +18,7 @@ class TestBackendServicer(unittest.TestCase):
"""
This method sets up the gRPC service by starting the server
"""
self.service = subprocess.Popen(["python3", "backend_diffusers.py", "--addr", "localhost:50051"])
self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
def tearDown(self) -> None:
"""

14
backend/python/diffusers/test.sh Normal file → Executable file
View File

@@ -1,14 +1,6 @@
#!/bin/bash
set -e
##
## A bash script wrapper that runs the diffusers server with conda
source $(dirname $0)/../common/libbackend.sh
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate diffusers
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python -m unittest $DIR/test.py
runUnittests

1
backend/python/exllama/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
source

View File

@@ -18,4 +18,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
$(RM) -r venv source __pycache__

View File

@@ -14,9 +14,9 @@ import torch
import torch.nn.functional as F
from torch import version as torch_version
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from source.tokenizer import ExLlamaTokenizer
from source.generator import ExLlamaGenerator
from source.model import ExLlama, ExLlamaCache, ExLlamaConfig
_ONE_DAY_IN_SECONDS = 60 * 60 * 24

View File

@@ -1,56 +0,0 @@
name: exllama
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- filelock==3.12.4
- fsspec==2023.9.2
- grpcio==1.59.0
- jinja2==3.1.2
- markupsafe==2.1.3
- mpmath==1.3.0
- networkx==3.1
- ninja==1.11.1
- protobuf==4.24.4
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- safetensors==0.3.2
- sentencepiece==0.1.99
- sympy==1.12
- torch==2.1.0
- triton==2.1.0
- typing-extensions==4.8.0
- numpy
prefix: /opt/conda/envs/exllama

View File

@@ -1,32 +1,13 @@
#!/bin/bash
set -ex
set -e
export PATH=$PATH:/opt/conda/bin
LIMIT_TARGETS="cublas"
if [ "$BUILD_TYPE" != "cublas" ]; then
echo "[exllama] Attention!!! Nvidia GPU is required - skipping installation"
exit 0
fi
source $(dirname $0)/../common/libbackend.sh
# Check if environment exist
conda_env_exists(){
! conda list --name "${@}" >/dev/null 2>/dev/null
}
installRequirements
if conda_env_exists "exllama" ; then
echo "Creating virtual environment..."
conda env create --name exllama --file $1
echo "Virtual environment created."
else
echo "Virtual environment already exists."
fi
git clone https://github.com/turboderp/exllama $MY_DIR/source
uv pip install ${BUILD_ISOLATION_FLAG} --requirement ${MY_DIR}/source/requirements.txt
source activate exllama
git clone https://github.com/turboderp/exllama $CONDA_PREFIX/exllama && pushd $CONDA_PREFIX/exllama && pip install -r requirements.txt && popd
cp -rfv $CONDA_PREFIX/exllama/* ./
if [ "$PIP_CACHE_PURGE" = true ] ; then
pip cache purge
fi
cp -v ./*py $MY_DIR/source/
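
LIMIT_TARGETS is matched against both BUILD_TYPE and BUILD_PROFILE, so a generic entry such as cublas also admits cublas11 and cublas12 builds. The check can be sketched as a predicate; `target_allowed` is a hypothetical name, and it reports the result through its exit status rather than by echoing true or false as checkTargets does.

```shell
#!/bin/bash
# Hypothetical restatement of checkTargets (not part of libbackend.sh).
#   $1: BUILD_TYPE, $2: BUILD_PROFILE, remaining arguments: allowed targets
target_allowed() {
    local build_type="$1" build_profile="$2"
    shift 2

    local target
    for target in "$@"; do
        # a target matches if it equals either the type or the profile
        if [ "${target}" = "${build_type}" ] || [ "${target}" = "${build_profile}" ]; then
            return 0
        fi
    done
    return 1
}

# A cublas12 build is accepted by a backend limited to "cublas"
if target_allowed "cublas" "cublas12" "cublas"; then
    echo "allowed"
fi

# A CPU-only build (empty BUILD_TYPE, profile cpu) is rejected by the same limit
if ! target_allowed "" "cpu" "cublas"; then
    echo "skipped"
fi
```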

View File

@@ -0,0 +1,6 @@
grpcio==1.64.0
protobuf
torch
transformers
certifi
setuptools

View File

@@ -1,15 +1,7 @@
#!/bin/bash
LIMIT_TARGETS="cublas"
BACKEND_FILE="${MY_DIR}/source/backend.py"
##
## A bash script wrapper that runs the exllama server with conda
export PATH=$PATH:/opt/conda/bin
source $(dirname $0)/../common/libbackend.sh
# Activate conda environment
source activate exllama
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR
python $DIR/exllama.py $@
startBackend $@

6
backend/python/exllama/test.sh Executable file
View File

@@ -0,0 +1,6 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
runUnittests

1
backend/python/exllama2/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
source

View File

@@ -1,6 +1,5 @@
.PHONY: exllama2
exllama2: protogen
$(MAKE) -C ../common-env/transformers
bash install.sh
.PHONY: run
@@ -17,4 +16,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
$(RM) -r venv source __pycache__

View File

@@ -1,57 +0,0 @@
name: exllama2
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- filelock==3.12.4
- fsspec==2023.9.2
- grpcio==1.59.0
- markupsafe==2.1.3
- mpmath==1.3.0
- networkx==3.1
- protobuf==4.24.4
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- pandas
- numpy
- ninja
- fastparquet
- torch>=2.1.0
- safetensors>=0.3.2
- sentencepiece>=0.1.97
- pygments
- websockets
- regex
prefix: /opt/conda/envs/exllama2

@@ -1,32 +1,16 @@
#!/bin/bash
set -e
##
## A bash script installs the required dependencies of VALL-E-X and prepares the environment
export SHA=c0ddebaaaf8ffd1b3529c2bb654e650bce2f790f
if [ "$BUILD_TYPE" != "cublas" ]; then
echo "[exllamav2] Attention!!! Nvidia GPU is required - skipping installation"
exit 0
fi
LIMIT_TARGETS="cublas"
EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
EXLLAMA2_VERSION=c0ddebaaaf8ffd1b3529c2bb654e650bce2f790f
export PATH=$PATH:/opt/conda/bin
source activate transformers
source $(dirname $0)/../common/libbackend.sh
echo $CONDA_PREFIX
installRequirements
git clone https://github.com/turboderp/exllamav2 $CONDA_PREFIX/exllamav2
git clone https://github.com/turboderp/exllamav2 $MY_DIR/source
pushd ${MY_DIR}/source && git checkout -b build ${EXLLAMA2_VERSION} && popd
pushd $CONDA_PREFIX/exllamav2
git checkout -b build $SHA
# TODO: this needs to be pinned within the conda environments
pip install -r requirements.txt
popd
cp -rfv $CONDA_PREFIX/exllamav2/* ./
if [ "$PIP_CACHE_PURGE" = true ] ; then
pip cache purge
fi
# This installs exllamav2 in JIT mode so it will compile the appropriate torch extension at runtime
EXLLAMA_NOCOMPILE= uv pip install ${EXTRA_PIP_INSTALL_FLAGS} ${MY_DIR}/source/

@@ -0,0 +1,4 @@
# This is here to trigger the install script to add --no-build-isolation to the uv pip install commands
# exllama2 does not specify its build requirements per PEP 517, so we need to provide some things ourselves
wheel
setuptools

@@ -0,0 +1,7 @@
accelerate
grpcio==1.64.0
protobuf
certifi
torch
wheel
setuptools

@@ -1,16 +1,6 @@
#!/bin/bash
LIMIT_TARGETS="cublas"
##
## A bash script wrapper that runs the exllama server with conda
source $(dirname $0)/../common/libbackend.sh
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate transformers
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR
python $DIR/exllama2_backend.py $@
startBackend $@

@@ -0,0 +1,6 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
runUnittests

@@ -1,7 +1,6 @@
.PHONY: mamba
mamba: protogen
$(MAKE) -C ../common-env/transformers
bash install.sh
bash install.sh
.PHONY: run
run: protogen
@@ -23,4 +22,8 @@ protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
$(RM) -r venv __pycache__

Some files were not shown because too many files have changed in this diff