The binary is now named "llama-cpp-rpc-server" for p2p workers.
We also decrease the default token rotation interval, in this way
peer discovery is much more responsive.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: split remaining backends and drop embedded backends
- Drop silero-vad, huggingface, and stores backend from embedded
binaries
- Refactor Makefile and Dockerfile to avoid building grpc backends
- Drop golang code that was used to embed backends
- Simplify building by using goreleaser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): be specific with llama-cpp backend templates
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(docs): update
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): minor fixes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop all ffmpeg references
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: run protogen-go
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Always enable p2p mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update gorelease file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(stores): do not always load
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix linting issues
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Mac OS fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Build llama.cpp separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Start to try to attach some tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add git and small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: correctly autoload external backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to run AIO tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Slightly update the Makefile helps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt auto-bumper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to run linux test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add llama-cpp into build pipelines
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add default capability (for cpu)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop llama-cpp specific logic from the backend loader
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* drop grpc install in ci for tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Pass by backends path for tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Build protogen at start
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(tests): set backends path consistently
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly configure the backends path
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to build for darwin
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Compile for metal on arm64/darwin
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to run build off from cross-arch
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to the backend index nvidia-l4t and cpu's llama-cpp backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Build also darwin-x86 for llama-cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Disable arm64 builds temporary
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Test backend build on PR
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup build backend reusable workflow
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pass by skip drivers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use crane
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Skip drivers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* x86 darwin
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add packaging step for llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix leftover from bark-cpp extraction
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to fix hipblas build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(backends): add metas in the gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: correctly handle aliases and metas with same names
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
We can't use the mutate.Extract written bytes as current status as that
will be bigger than the compressed image size. Image manifest don't have
any guarantee of the type of artifact (can be compressed or not) when
showing the layer size.
Split the extraction process in two parts: Downloading and extracting as
a flattened system, in this way we can display the status of downloading
and extracting accordingly.
This change also fixes a small nuance in detecting installed backends,
now it's more consistent and looks if a metadata.json and/or a path with
a `run.sh` file is present.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
We can't use the mutate.Extract written bytes as current status as that
will be bigger than the compressed image size. Image manifest don't have
any guarantee of the type of artifact (can be compressed or not) when
showing the layer size.
Split the extraction process in two parts: Downloading and extracting as
a flattened system, in this way we can display the status of downloading
and extracting accordingly.
This change also fixes a small nuance in detecting installed backends,
now it's more consistent and looks if a metadata.json and/or a path with
a `run.sh` file is present.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend gallery): add meta packages
So we can have meta packages such as "vllm" that automatically installs
the corresponding package depending on the GPU that is being currently
detected in the system.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: use a metadata file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: Add backend gallery
This PR add support to manage backends as similar to models. There is
now available a backend gallery which can be used to install and remove
extra backends.
The backend gallery can be configured similarly as a model gallery, and
API calls allows to install and remove new backends in runtime, and as
well during the startup phase of LocalAI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backends docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip: Backend Dockerfile for python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: drop extras images, build python backends separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup on all backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tweaks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop old backends leftovers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move dockerfile upper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix proto
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Feature dropped for consistency - we prefer model galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing packages in the build image
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* exllama is ponly available on cublas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pin torch on chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups to index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
* Install accellerators deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add target arch
* Add cuda minor version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted runners
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: use quay for test images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups for vllm and chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups on CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chatterbox is only available for nvidia
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify CI builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt test, use qwen3
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(model gallery): add jina-reranker-v1-tiny-en-gguf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use reranker from llama.cpp in AIO images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Limit concurrent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Support file inputs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: support multiple files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show preview of files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(realtime): Initial Realtime API implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: go mod tidy
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Implement transcription only mode for realtime API
Reduce the scope of the real time API for the initial realease and make
transcription only mode functional.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(build): Build backends on a separate layer to speed up core only changes
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* fix(embed): use go-rice for large backend assets
Golang embed FS has a hard limit that we might exceed when providing
many binary alternatives.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* simplify golang deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): switch to testcontainers and print logs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(tests): do not build a test binary
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* small fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>