Add a routing middleware stack and a cloud-proxy backend.
* cloud-proxy: a Go gRPC backend that forwards OpenAI- and
Anthropic-shaped chat requests to upstream providers, with an
optional translate mode (OpenAI request -> Anthropic /v1/messages
-> OpenAI response) and full tool-calling support.
* routing: admission control, content-aware model routing
(embedding cache + classifier + rerank + Arch-Router score),
PII detection/redaction (regex + NER) with streaming filter and
OpenAI/Anthropic adapters, and a per-user/per-key billing recorder
backed by GORM or in-memory storage.
* middleware: UsageMiddleware records usage via the billing recorder,
plus admission, route-model, usage-stamp and trace middlewares.
* observability: BackendTrace ring buffer stores full request bodies
(capped), MITM proxy emits structured trace events, and router
classifier decisions surface at /api/router/decide.
* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).
* UI: cloud-proxy model-editor fields, classifier system-prompt and
score-normalization config, and a Traces page rendering request
bodies.
Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: add distributed mode (experimental)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute correctly time boundaries avoiding re-triggering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood of healthy checks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(transformers): bump to >5.0
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: refactor to use generic model loading
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(metal): try to extend support to remaining backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* neutts doesn't work
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* split outetts out of transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove torch pin to whisperx
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: allow to install with pip
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make the backend to build and actually work
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* List models from system only
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add script to build darwin python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Run protogen in libbackend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect if mps is available across python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI: try to build backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Index mlx-vlm
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove mlx-vlm
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop CI test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* merge sentencetransformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add alias to silently redirect sentencetransformers to transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add alias also for transformers-musicgen
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop from makefile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move tests from sentencetransformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove sentencetransformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove tests from CI (part of transformers)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not always try to load the tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix typo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tiny adjustments
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(transformers): merge musicgen functionalities to a single backend
So we optimize space
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* specify type in tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Some adaptations for the MusicgenForConditionalGeneration type
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* bugfix: CUDA acceleration not working
CUDA not working after #2286.
Refactored the code to be more polish
* Update requirements.txt
Missing imports
Signed-off-by: fakezeta <fakezeta@gmail.com>
* Update requirements.txt
Signed-off-by: fakezeta <fakezeta@gmail.com>
---------
Signed-off-by: fakezeta <fakezeta@gmail.com>
update transformers
*Handle Temperature = 0 as greedy search
*Handle custom works as stop words
*Implement KV cache
*Phi 3 no more requires trust_remote_code: true