- Strict monotonic Go coverage gate (make test-coverage-check, 45% baseline)
run in CI; fixes ginkgo dropping all-but-one coverprofile across multiple
recursive roots, builds with -tags auth, and folds in the in-process
tests/e2e suite via --coverpkg.
- React UI e2e coverage (make test-ui-coverage: vite-plugin-istanbul + nyc,
nix-provided Chromium) plus e2e specs for 6 previously-untested pages, and a
UI coverage gate (make test-ui-coverage-check) with a small tolerance since
e2e line coverage jitters ~0.5pp run-to-run.
- pre-commit hook: lint + coverage on Go changes, Playwright e2e + UI coverage
gate on react-ui changes; install with make install-hooks.
- New Go handler tests (settings, branding), hermetic base64 download test.
- fix(ui): model editor reads vram_display (snake_case), so the VRAM estimate
renders again; covered by a regression test.
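The gate logic in the bullets above can be sketched as follows. This is a minimal illustration, assuming each gate compares a measured total against a stored baseline, with the UI gate allowing a small tolerance. `checkCoverage`, the sample percentages and the UI baseline are hypothetical; the real Makefile targets may work differently, and the sketch does not model ratcheting the baseline upwards.

```go
package main

import (
	"fmt"
	"os"
)

// checkCoverage returns an error when measured coverage falls below the
// baseline by more than tolerance. All values are percentage points.
// Hypothetical helper for illustration; not taken from the repository.
func checkCoverage(measured, baseline, tolerance float64) error {
	if measured+tolerance < baseline {
		return fmt.Errorf("coverage %.2f%% is below baseline %.2f%% (tolerance %.2fpp)",
			measured, baseline, tolerance)
	}
	return nil
}

func main() {
	// Go gate: strict, no tolerance, 45% baseline as stated in the message.
	if err := checkCoverage(45.3, 45.0, 0); err != nil {
		fmt.Println("go gate:", err)
		os.Exit(1)
	}
	// UI gate: 0.5pp tolerance to absorb e2e jitter; the 60% baseline is made up.
	if err := checkCoverage(59.7, 60.0, 0.5); err != nil {
		fmt.Println("ui gate:", err)
		os.Exit(1)
	}
	fmt.Println("coverage gates passed")
}
```

Keeping the tolerance as an explicit parameter lets the Go gate stay strict (tolerance 0) while the UI gate shares the same comparison.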
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Add a routing middleware stack and a cloud-proxy backend.
* cloud-proxy: a Go gRPC backend that forwards OpenAI- and
Anthropic-shaped chat requests to upstream providers, with an
optional translate mode (OpenAI request -> Anthropic /v1/messages
-> OpenAI response) and full tool-calling support.
* routing: admission control, content-aware model routing
(embedding cache + classifier + rerank + Arch-Router score),
PII detection/redaction (regex + NER) with streaming filter and
OpenAI/Anthropic adapters, and a per-user/per-key billing recorder
backed by GORM or in-memory storage.
* middleware: UsageMiddleware records usage via the billing recorder,
plus admission, route-model, usage-stamp and trace middlewares.
* observability: BackendTrace ring buffer stores full request bodies
(capped), MITM proxy emits structured trace events, and router
classifier decisions surface at /api/router/decide.
* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).
* UI: cloud-proxy model-editor fields, classifier system-prompt and
score-normalization config, and a Traces page rendering request
bodies.
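As an illustration of the "ring buffer stores full request bodies (capped)" idea from the observability bullet, here is a minimal sketch. The `traceRing` type, its methods and the byte-based cap are assumptions made for this example; they are not the BackendTrace implementation.

```go
package main

import "fmt"

// traceRing is a fixed-capacity ring buffer of request bodies.
// Hypothetical type for illustration only.
type traceRing struct {
	entries []string
	next    int  // index of the slot that will be written next
	full    bool // true once the buffer has wrapped at least once
	maxBody int  // cap on stored body size, in bytes
}

// newTraceRing creates a ring; capacity must be positive.
func newTraceRing(capacity, maxBody int) *traceRing {
	return &traceRing{entries: make([]string, capacity), maxBody: maxBody}
}

// Add stores a body, truncating it to maxBody bytes and overwriting the
// oldest entry once the ring is full. Byte truncation may split a
// multi-byte UTF-8 character; a real implementation might handle that.
func (r *traceRing) Add(body string) {
	if len(body) > r.maxBody {
		body = body[:r.maxBody]
	}
	r.entries[r.next] = body
	r.next = (r.next + 1) % len(r.entries)
	if r.next == 0 {
		r.full = true
	}
}

// Snapshot returns the stored bodies, oldest first.
func (r *traceRing) Snapshot() []string {
	if !r.full {
		return append([]string(nil), r.entries[:r.next]...)
	}
	out := make([]string, 0, len(r.entries))
	out = append(out, r.entries[r.next:]...)
	out = append(out, r.entries[:r.next]...)
	return out
}

func main() {
	ring := newTraceRing(2, 8)
	ring.Add("first request") // stored as "first re"
	ring.Add("second")
	ring.Add("third") // overwrites the oldest entry
	fmt.Println(ring.Snapshot()) // prints "[second third]"
}
```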
Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore: ignore local .worktrees directory
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(openai): stream usage non-zero when tools are enabled
The streaming chat-completions worker for tool-bearing requests
(processTools in core/http/endpoints/openai/chat.go) never forwarded the
cumulative TokenUsage from ComputeChoices to the chunks it placed on the
responses channel. The outer streaming loop's running usage tracker
therefore stayed at the zero value, and the include_usage trailer
reported {prompt_tokens:0, completion_tokens:0, total_tokens:0} whenever
the request carried a `tools` array. Without tools, the alternative
`process` path stamps Usage on every chunk, so that path was unaffected.
Forward the final TokenUsage via a usage-only sentinel chunk (empty
Choices, populated Usage) emitted right before close(responses). The
outer loop's per-chunk Usage capture moves above the empty-Choices skip
so the sentinel updates the tracker without ever reaching the wire,
keeping the existing OpenAI spec contract (intermediate chunks carry no
`usage` field, and the deferred-final-chunk helpers remain Usage-free
per the regression test for issue #8546).
Adds streamUsageFromTokenUsage, usageSentinelChunk, and
applyChunkToUsage helpers with focused Ginkgo coverage plus a flow-level
test that mirrors the outer-loop sequence.
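The sentinel flow described above can be sketched roughly as follows. The types are simplified stand-ins, and although `usageSentinelChunk` borrows the helper name from this message, its signature and body here are assumptions, as are `worker` and `drain`.

```go
package main

import "fmt"

// usage and chunk are simplified stand-ins for the real schema types.
type usage struct{ Prompt, Completion int }

type chunk struct {
	Choices []string
	Usage   *usage
}

// usageSentinelChunk builds a usage-only chunk: empty Choices, populated
// Usage. The signature is assumed for this sketch.
func usageSentinelChunk(u usage) chunk { return chunk{Usage: &u} }

// worker mimics a tool-bearing stream worker: it sends content chunks and
// then the sentinel right before the channel is closed.
func worker(responses chan<- chunk) {
	defer close(responses)
	responses <- chunk{Choices: []string{"hel"}}
	responses <- chunk{Choices: []string{"lo"}}
	responses <- usageSentinelChunk(usage{Prompt: 7, Completion: 2})
}

// drain mimics the outer loop: Usage is captured before the empty-Choices
// skip, so the sentinel updates the tracker but never reaches the wire.
func drain(responses <-chan chunk) (wire []string, final usage) {
	for c := range responses {
		if c.Usage != nil {
			final = *c.Usage
		}
		if len(c.Choices) == 0 {
			continue // sentinel: captured above, not forwarded
		}
		wire = append(wire, c.Choices...)
	}
	return wire, final
}

func main() {
	responses := make(chan chunk)
	go worker(responses)
	wire, final := drain(responses)
	fmt.Println(wire, final.Prompt+final.Completion) // prints "[hel lo] 9"
}
```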
Fixes #9927
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
* refactor(openai): return final TokenUsage from stream workers
Replace the usage-only sentinel SSE chunk introduced in the previous
commit with a plain return value. The streaming workers process and
processTools (now extracted as package-level processStream and
processStreamWithTools) return (backend.TokenUsage, error); the outer
ChatEndpoint loop reads the cumulative counts off the existing `ended`
channel (now carrying streamWorkerResult{usage, err}) and builds the
include_usage trailer from a normal Go value after the LOOP exits.
This drops the empty-Choices "skip but capture Usage" rule from the
outer loop and removes the usageSentinelChunk / applyChunkToUsage
helpers entirely. The SSE responses channel is back to a single
purpose: wire chunks only.
processStream and processStreamWithTools move into chat_stream_workers.go
so they can be exercised directly from tests. The chat_stream_usage_test.go
suite now drives the workers with a mocked backend.ModelInferenceFunc
and asserts on the returned TokenUsage. The regression coverage for
issue #9927 is therefore behavioral: reverting the fix (discarding
ComputeChoices' usage return) makes the assertions fail with concrete
count mismatches.
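A rough sketch of the return-value shape described above: the worker returns its usage, and the caller reads it from an `ended` channel after the responses channel is drained. `streamWorkerResult`, `processStream` and `ended` are names from this message, but the simplified signatures, the `tokenUsage` type and the `runStream` helper are assumptions for illustration.

```go
package main

import "fmt"

// tokenUsage is a simplified stand-in for backend.TokenUsage.
type tokenUsage struct{ Prompt, Completion int }

// streamWorkerResult is what the ended channel carries in this sketch.
type streamWorkerResult struct {
	usage tokenUsage
	err   error
}

// processStream mimics a stream worker: it sends wire chunks only and
// returns the cumulative usage as a plain Go value.
func processStream(responses chan<- string) (tokenUsage, error) {
	for _, tok := range []string{"hel", "lo"} {
		responses <- tok
	}
	return tokenUsage{Prompt: 7, Completion: 2}, nil
}

// runStream is a hypothetical outer loop: it forwards every chunk, then
// reads the worker's result once the stream has ended.
func runStream(worker func(chan<- string) (tokenUsage, error)) ([]string, tokenUsage, error) {
	responses := make(chan string)
	ended := make(chan streamWorkerResult, 1)
	go func() {
		u, err := worker(responses)
		close(responses)
		ended <- streamWorkerResult{usage: u, err: err}
	}()

	var wire []string
	for tok := range responses {
		wire = append(wire, tok) // the responses channel carries wire chunks only
	}
	res := <-ended
	return wire, res.usage, res.err
}

func main() {
	wire, u, err := runStream(processStream)
	fmt.Println(wire, u.Prompt+u.Completion, err) // prints "[hel lo] 9 <nil>"
}
```

Because the usage travels as a return value, a test can call the worker directly and assert on the counts without inspecting any SSE chunk.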
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
AIO images are lagging behind, and they take effort to maintain. The wizard
and model installation have been simplified massively, so AIO images
have lost their purpose.
This allows us to focus on the main images and relieves
stress on CI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(realtime): WebRTC support
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(tracing): Show full LLM opts and deltas
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* WIP response format implementation for audio transcriptions
(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Rework transcript response_format and add more formats
(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Add test and replace go-openai package with official openai go client
(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Fix faster-whisper backend and refactor transcription formatting to also work on CLI
Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
---------
Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Add launcher (WIP)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update gomod
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanup, focus on systray
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Separate launcher from main
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add a way to identify the binary version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Implement save config, and start on boot
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Save installed version as metadata
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Stop LocalAI on quit
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix goreleaser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Check first if binary is there
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not show version if we don't have it
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to build on CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use fyne package
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to release
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fyne.Do
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show WEBUI button only if LocalAI is started
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Default to localhost
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Show rel notes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update logo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small improvements and fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to fix e2e tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: split remaining backends and drop embedded backends
- Drop silero-vad, huggingface, and stores backend from embedded
binaries
- Refactor Makefile and Dockerfile to avoid building grpc backends
- Drop golang code that was used to embed backends
- Simplify building by using goreleaser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): be specific with llama-cpp backend templates
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(docs): update
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): minor fixes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop all ffmpeg references
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: run protogen-go
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Always enable p2p mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update gorelease file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(stores): do not always load
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix linting issues
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Mac OS fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Build llama.cpp separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Start to try to attach some tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add git and small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: correctly autoload external backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to run AIO tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Slightly update the Makefile help
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt auto-bumper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to run linux test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add llama-cpp into build pipelines
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add default capability (for cpu)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop llama-cpp specific logic from the backend loader
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* drop grpc install in ci for tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Pass by backends path for tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Build protogen at start
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(tests): set backends path consistently
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly configure the backends path
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to build for darwin
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Compile for metal on arm64/darwin
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to run build off from cross-arch
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to the backend index nvidia-l4t and cpu's llama-cpp backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Build also darwin-x86 for llama-cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Disable arm64 builds temporarily
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Test backend build on PR
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup build backend reusable workflow
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pass by skip drivers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use crane
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Skip drivers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* x86 darwin
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add packaging step for llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix leftover from bark-cpp extraction
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to fix hipblas build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(bark-cpp): add new bark.cpp backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* build on linux only for now
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* track bark.cpp in CI bumps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop old entries from bumper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* No need to test rwkv specifically, now part of llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* stash initial fixes, attempt to open branch inside container
Signed-off-by: Dave Lee <dave@gray101.com>
* add yq, from inside DC
Signed-off-by: Dave Lee <dave@gray101.com>
* stash progress, rebuild container
Signed-off-by: Dave Lee <dave@gray101.com>
* snap
Signed-off-by: Dave Lee <dave@gray101.com>
* split builder into builder-sd, will speed up devcontainer build times and potentially help caching in other situations.
Signed-off-by: Dave Lee <dave@gray101.com>
* fix yq
Signed-off-by: Dave Lee <dave@gray101.com>
* fix paths
Signed-off-by: Dave Lee <dave@gray101.com>
* fix paths - new folder to bypass the .dockerignore which _should_ exclude the other files
Signed-off-by: Dave Lee <dave@gray101.com>
* fix
Signed-off-by: Dave Lee <dave@gray101.com>
* fix ]
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* gen a static page instead (we force DNS redirects to it)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): install models from CLI, unify install
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Uniform graphic of model page
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Makefile: update targets
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Slightly enhance gallery view
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* models(gallery): add mistral-0.3 and command-r, update functions
Also add disable_parallel_new_lines to disable newlines in the JSON
output when forcing parallel tools. Some models (like mistral) might be
very sensitive to that when being used for function calling.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* models(gallery): add aya-23-8b
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* core 1
* api/openai/files fix
* core 2 - core/config
* move over core api.go and tests to the start of core/http
* move over localai specific endpoints to core/http, begin the service/endpoint split there
* refactor big chunk on the plane
* refactor chunk 2 on plane, next step: port and modify changes to request.go
* easy fixes for request.go, major changes not done yet
* lintfix
* json tag lintfix?
* gitignore and .keep files
* strange fix attempt: rename the config dir?
* refactor: rename llama-stable to llama-ggml
* Makefile: get sources in sources/
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup path
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup sources
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups sd
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* update SD
* fixup
* fixup: create piper libdir also when not built
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix make target on linux test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>