* feat: allow to set forcing backends eviction while requests are in flight
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: try to make the request sit and retry if eviction couldn't be done
Otherwise calls that in order to pass would need to shutdown other
backends would just fail.
In this way instead we make the request sit and retry eviction until it
succeeds. The thresholds can be configured by the user.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose settings to CLI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(loader): refactor single active backend support to LRU
This changeset introduces LRU management of loaded backends. Users can
set now a maximum number of models to be loaded concurrently, and, when
setting LocalAI in single active backend mode we set LRU to 1 for
backward compatibility.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): add watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not re-read env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Some refactor, move other settings to runtime (p2p)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add API Keys handling
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to disable runtime settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show MCP toggle in index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop context default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* move downloader out
* separate startup functions for preloading configuration files
* docs: add popular model examples
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* shorteners
* Add llava
* Add mistral-openorca
* Better link to build section
* docs: update
* fixup
* Drop code dups
* Minor fixups
* Apply suggestions from code review
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* ci: try to cache gRPC build during tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: do not build all images for tests, just necessary
* ci: cache gRPC also in release pipeline
* fixes
* Update model_preload_test.go
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat: allow to pass by models via args
* expose it also as an env/arg
* docs: enhancements to build/requirements
* do not display status always
* print download status
* not all mesages are debug
* Use cuda in transformers if available
tensorflow probably needs a different check.
Signed-off-by: Erich Schubert <kno10@users.noreply.github.com>
* feat: expose CUDA at top level
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests: add to tests and create workflow for py extra backends
* doc: update note on how to use core images
---------
Signed-off-by: Erich Schubert <kno10@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Erich Schubert <kno10@users.noreply.github.com>