diff --git a/.agents/building-and-testing.md b/.agents/building-and-testing.md index 04df3426e..3cf85c0dc 100644 --- a/.agents/building-and-testing.md +++ b/.agents/building-and-testing.md @@ -38,9 +38,12 @@ The React UI (`core/http/react-ui/`) has **no component/unit tests** — its onl - **Browser:** the flake dev shell ships `chromium` and exports `PLAYWRIGHT_CHROMIUM_PATH`; `playwright.config.js` uses it via `launchOptions.executablePath`, and the Makefile skips `playwright install` when it's set. This avoids Playwright's downloaded browser, which can't resolve system libs (`libglib-2.0`, …) on NixOS. In CI (no `PLAYWRIGHT_CHROMIUM_PATH`) the Makefile falls back to `playwright install --with-deps chromium`. - The app is a React SPA, so coverage accumulates across in-app navigation within a test; a full `page.goto`/reload resets it. - `.nycrc.json` uses `all: true`, so **every `src/**` file is in the report**, including 0%-coverage ones — that's how you spot features with no test at all (sort the HTML report or `coverage-summary.json` by line% ascending). -- **UI coverage gate:** `make test-ui-coverage-check` runs the suite then `scripts/ui-coverage-check.sh`, failing if total line coverage drops more than `UI_COVERAGE_TOLERANCE` (default **1.0pp**) below `core/http/react-ui/coverage-baseline.txt`. `make test-ui-coverage-baseline` regenerates the baseline. **Why a tolerance (unlike the strict Go gate):** UI e2e line coverage is *non-deterministic* — async/debounced paths (e.g. the VRAM estimate's 500ms debounce) make identical specs vary ~0.5pp run-to-run, so a zero-tolerance gate would flake. Keep the tolerance just above the observed jitter. Run in CI (`tests-ui-e2e.yml`) and pre-commit on `core/http/react-ui/` changes. +- **UI coverage gate:** `make test-ui-coverage-check` runs the suite then `scripts/ui-coverage-check.sh`, failing if total line coverage drops more than `UI_COVERAGE_TOLERANCE` below `core/http/react-ui/coverage-baseline.txt`. `make test-ui-coverage-baseline` regenerates the baseline. Runs in CI (`tests-ui-e2e.yml`) and pre-commit on `core/http/react-ui/` changes. +- **Why it has a tolerance (unlike the strict Go gate):** UI e2e coverage is *non-deterministic*. Specs that assert on state and end while async/lazy render work is still in flight collect those lines only when the render beats the coverage teardown — so the total drifts with machine speed/load (a fast local box reads higher than a slow CI runner), diffusely across many specs. The tolerance absorbs that drift, so set the baseline *below* the slow-CI floor, never to a fast-local `make test-ui-coverage-baseline` number, or CI flaps. +- **Raising coverage is cheap:** a *render-smoke* spec (navigate to a route, assert its header renders) mounts a lazy page and runs its full render + initial effects, capturing most of its lines in a few lines of test — see `e2e/page-render-smoke.spec.js`. Auth is disabled in the test server (`isAdmin=true`), so `RequireAdmin`/`RequireFeature` routes render without a mock. The most *deterministic* win is removing a race: make a spec `await` a rendered element before ending (see `e2e/agents.spec.js` → AgentCreate) so its lines count every run. -Rules: -- The gate is **strict — there is no tolerance**. Any decrease fails, regardless of how many lines a PR adds or deletes. `covermode=atomic` makes line coverage deterministic, so there's no run-to-run jitter to excuse. -- When a change legitimately **raises** coverage, run `make test-coverage-baseline` and **commit** the updated `coverage-baseline.txt` so the ratchet moves up. Never lower the baseline by hand. -- If you can't get coverage back to baseline, the fix is to **add tests**, not to edit the baseline. +Rules (both gates): +- **Install the hooks:** `make install-hooks` once per clone so lint + coverage run pre-commit. Don't lean on CI for what the hook catches. +- **Don't work around the gate:** never `git commit --no-verify`, and never hand-lower a baseline or widen a tolerance to turn a red gate green. The ratchet only moves up. +- If a change drops coverage, **add tests** (sort `coverage-summary.json` by line% ascending to find untested code) rather than editing the baseline. When coverage legitimately rises, commit the regenerated baseline (`make test-coverage-baseline` / `test-ui-coverage-baseline`). +- The Go gate is **strict — no tolerance**; `covermode=atomic` keeps it deterministic. The UI gate keeps a small tolerance only because its e2e coverage isn't. diff --git a/AGENTS.md b/AGENTS.md index 1d7e29e9c..9f397e613 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -35,6 +35,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants] ## Quick Reference +- **Git hooks & coverage gates**: Run `make install-hooks` once per clone so the pre-commit lint + coverage gates run. **Never bypass them with `git commit --no-verify`, and never lower a coverage baseline or widen a gate's tolerance to turn a red gate green** — the coverage ratchet only moves up. If a change drops coverage, add tests to raise it (e.g. render-smoke specs). See [.agents/building-and-testing.md](.agents/building-and-testing.md). - **Logging**: Use `github.com/mudler/xlog` (same API as slog) - **Go style**: Prefer `any` over `interface{}` - **Comments**: Explain *why*, not *what* diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index df1c78909..c45e269b2 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -266,6 +266,12 @@ The e2e tests run LocalAI in a Docker container and exercise the API: make test-e2e ``` +### React UI tests and coverage + +The React UI (`core/http/react-ui/`) is covered by Playwright e2e specs, gated by a **monotonic line-coverage ratchet** (`make test-ui-coverage-check`, run in CI and pre-commit). The metric is non-deterministic — a fast local box reads higher than a slow CI runner for the same code — so a small tolerance is unavoidable. + +**If your change lowers UI coverage, raise it back by adding specs — do not widen the tolerance or hand-lower the baseline.** A *render-smoke* spec (navigate to a page, assert its header is visible) cheaply covers an entire lazy page. See `core/http/react-ui/e2e/page-render-smoke.spec.js` and the full policy in [.agents/building-and-testing.md](.agents/building-and-testing.md#react-ui-coverage). + ### Running E2E container tests These tests build a standard LocalAI Docker image and run it with pre-configured model configs to verify that most endpoints work correctly: diff --git a/README.md b/README.md index 2f4a2960d..907c681b9 100644 --- a/README.md +++ b/README.md @@ -31,12 +31,18 @@ **LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required. -- **Drop-in API compatibility** — OpenAI, Anthropic, ElevenLabs APIs -- **36+ backends** — llama.cpp, vLLM, transformers, whisper, diffusers, MLX... -- **Any hardware** — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only -- **Multi-user ready** — API key auth, user quotas, role-based access -- **Built-in AI agents** — autonomous agents with tool use, RAG, MCP, and skills -- **Privacy-first** — your data never leaves your infrastructure +**A small core, not a bundle.** Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use. + +- **Composable by design**: backends are separate and pulled on demand, so you install only what your model needs +- **Open and extensible**: load any model, or build your own backend in any language against an open interface +- **Drop-in API compatibility**: OpenAI, Anthropic, and ElevenLabs APIs across every backend +- **Any model, any modality**: LLMs, vision, voice, image, and video behind one API +- **Any hardware**: NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only +- **Multi-user ready**: API key auth, user quotas, role-based access +- **Built-in AI agents**: autonomous agents with tool use, RAG, MCP, and skills +- **Privacy-first**: your data never leaves your infrastructure + +![A small LocalAI core with backends (llama.cpp, vLLM, MLX, whisper.cpp, stable-diffusion, kokoro, parakeet.cpp...) plugged in as separate on-demand images](docs/static/images/diagrams/composable-core.png) Created by [Ettore Di Giacinto](https://github.com/mudler) and maintained by the [LocalAI team](#team). diff --git a/backend/cpp/llama-cpp/Makefile b/backend/cpp/llama-cpp/Makefile index b80e8b99a..0d90361a4 100644 --- a/backend/cpp/llama-cpp/Makefile +++ b/backend/cpp/llama-cpp/Makefile @@ -1,5 +1,5 @@ -LLAMA_VERSION?=d6588daa800058dfa54f1d7ea695b1a810c8ae18 +LLAMA_VERSION?=5dcb71166686799f0d873eab7386234302d05ecf LLAMA_REPO?=https://github.com/ggerganov/llama.cpp CMAKE_ARGS?= diff --git a/backend/go/crispasr/Makefile b/backend/go/crispasr/Makefile index 3d57067b0..390c5dfd3 100644 --- a/backend/go/crispasr/Makefile +++ b/backend/go/crispasr/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # CrispASR version (release tag) CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR -CRISPASR_VERSION?=v0.6.11 +CRISPASR_VERSION?=05e60432bcb5bc2113f8c395a41e86497c11504a SO_TARGET?=libgocrispasr.so CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF diff --git a/backend/go/parakeet-cpp/Makefile b/backend/go/parakeet-cpp/Makefile index de0989640..8e8e9d897 100644 --- a/backend/go/parakeet-cpp/Makefile +++ b/backend/go/parakeet-cpp/Makefile @@ -1,6 +1,6 @@ # parakeet-cpp backend Makefile. # -# Upstream pin lives below as PARAKEET_VERSION?=cb45f68068081af01e7092e91b038ee353eb56be +# Upstream pin lives below as PARAKEET_VERSION?=9edf17c3ada66e0f881dcff155492867db7ac4cf # (.github/bump_deps.sh) can find and update it - matches the # whisper.cpp / ds4 / vibevoice-cpp convention. # @@ -15,7 +15,7 @@ # That's what the L0 smoke test uses. The default target below does the # proper clone-at-pin + cmake build so CI doesn't need a side-checkout. -PARAKEET_VERSION?=cb45f68068081af01e7092e91b038ee353eb56be +PARAKEET_VERSION?=9edf17c3ada66e0f881dcff155492867db7ac4cf PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp GOCMD?=go diff --git a/backend/go/parakeet-cpp/batcher.go b/backend/go/parakeet-cpp/batcher.go new file mode 100644 index 000000000..4a7c169e7 --- /dev/null +++ b/backend/go/parakeet-cpp/batcher.go @@ -0,0 +1,79 @@ +package main + +import "time" + +// batchRequest is one in-flight unary transcription waiting to be batched. +// In production pcm/decoder are set; tag is an opaque marker used by tests. +type batchRequest struct { + pcm []float32 + decoder int32 + tag string + reply chan batchReply +} + +// batchReply carries one per-item JSON object string (an element of the C-API's +// JSON array) or an error back to the waiting handler goroutine. +type batchReply struct { + json string + err error +} + +// batcher coalesces concurrent batchRequests into batched runBatch calls. A +// single run() goroutine is the sole caller of runBatch, so runBatch (which in +// production calls the thread-unsafe C engine) is never entered concurrently. +type batcher struct { + submit chan *batchRequest + maxSize int + maxWait time.Duration + runBatch func(reqs []*batchRequest) // must deliver a reply to every req +} + +func newBatcher(maxSize int, maxWait time.Duration, runBatch func([]*batchRequest)) *batcher { + if maxSize < 1 { + maxSize = 1 + } + return &batcher{ + submit: make(chan *batchRequest), + maxSize: maxSize, + maxWait: maxWait, + runBatch: runBatch, + } +} + +// run is the dispatcher loop: accumulate submitted requests until either maxSize +// is reached or maxWait elapses since the first queued request, then dispatch. +// Exits when stop is closed (draining any partially-filled batch first). +func (b *batcher) run(stop <-chan struct{}) { + for { + var first *batchRequest + select { + case first = <-b.submit: + case <-stop: + return + } + batch := []*batchRequest{first} + + // maxSize==1 disables batching: dispatch immediately (passthrough). + if b.maxSize == 1 { + b.runBatch(batch) + continue + } + + timer := time.NewTimer(b.maxWait) + fill: + for len(batch) < b.maxSize { + select { + case r := <-b.submit: + batch = append(batch, r) + case <-timer.C: + break fill + case <-stop: + timer.Stop() + b.runBatch(batch) + return + } + } + timer.Stop() + b.runBatch(batch) + } +} diff --git a/backend/go/parakeet-cpp/batcher_test.go b/backend/go/parakeet-cpp/batcher_test.go new file mode 100644 index 000000000..e51122ee5 --- /dev/null +++ b/backend/go/parakeet-cpp/batcher_test.go @@ -0,0 +1,108 @@ +package main + +import ( + "sync" + "time" + + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +var _ = Describe("batcher", func() { + echoReply := func(reqs []*batchRequest) { + for _, r := range reqs { + r.reply <- batchReply{json: r.tag} + } + } + + It("coalesces concurrent submits into batches", func() { + var mu sync.Mutex + var sizes []int + run := func(reqs []*batchRequest) { + mu.Lock() + sizes = append(sizes, len(reqs)) + mu.Unlock() + echoReply(reqs) + } + b := newBatcher(4, 50*time.Millisecond, run) + stop := make(chan struct{}) + go b.run(stop) + defer close(stop) + + const N = 4 + var wg sync.WaitGroup + got := make([]string, N) + for i := 0; i < N; i++ { + wg.Add(1) + go func(i int) { + defer wg.Done() + rep := make(chan batchReply, 1) + b.submit <- &batchRequest{tag: string(rune('a' + i)), reply: rep} + got[i] = (<-rep).json + }(i) + } + wg.Wait() + + mu.Lock() + defer mu.Unlock() + total, maxBatch := 0, 0 + for _, s := range sizes { + total += s + if s > maxBatch { + maxBatch = s + } + } + Expect(total).To(Equal(N)) + Expect(maxBatch).To(BeNumerically(">=", 2), "expected at least one batch to coalesce >1 request") + }) + + It("dispatches when max size is reached", func() { + dispatched := make(chan int, 8) + run := func(reqs []*batchRequest) { + dispatched <- len(reqs) + echoReply(reqs) + } + b := newBatcher(2, time.Hour, run) // huge window: only size can trigger + stop := make(chan struct{}) + go b.run(stop) + defer close(stop) + for i := 0; i < 2; i++ { + rep := make(chan batchReply, 1) + b.submit <- &batchRequest{tag: "x", reply: rep} + go func(rep chan batchReply) { <-rep }(rep) + } + Eventually(dispatched, "2s").Should(Receive(Equal(2))) + }) + + It("dispatches when the wait window elapses", func() { + dispatched := make(chan int, 8) + run := func(reqs []*batchRequest) { + dispatched <- len(reqs) + echoReply(reqs) + } + b := newBatcher(8, 20*time.Millisecond, run) // size unreachable; window fires + stop := make(chan struct{}) + go b.run(stop) + defer close(stop) + rep := make(chan batchReply, 1) + b.submit <- &batchRequest{tag: "x", reply: rep} + go func() { <-rep }() + Eventually(dispatched, "2s").Should(Receive(Equal(1))) + }) + + It("bypasses batching when max size is 1", func() { + dispatched := make(chan int, 8) + run := func(reqs []*batchRequest) { + dispatched <- len(reqs) + echoReply(reqs) + } + b := newBatcher(1, time.Hour, run) // size 1 => immediate dispatch + stop := make(chan struct{}) + go b.run(stop) + defer close(stop) + rep := make(chan batchReply, 1) + b.submit <- &batchRequest{tag: "x", reply: rep} + go func() { <-rep }() + Eventually(dispatched, "2s").Should(Receive(Equal(1))) + }) +}) diff --git a/backend/go/parakeet-cpp/goparakeetcpp.go b/backend/go/parakeet-cpp/goparakeetcpp.go index f8d49f058..969962a76 100644 --- a/backend/go/parakeet-cpp/goparakeetcpp.go +++ b/backend/go/parakeet-cpp/goparakeetcpp.go @@ -7,13 +7,17 @@ import ( "fmt" "os" "path/filepath" + "strconv" "strings" + "sync" + "time" "unsafe" "github.com/go-audio/wav" "github.com/mudler/LocalAI/pkg/grpc/base" pb "github.com/mudler/LocalAI/pkg/grpc/proto" "github.com/mudler/LocalAI/pkg/utils" + "github.com/mudler/xlog" "google.golang.org/grpc/codes" "google.golang.org/grpc/status" ) @@ -34,6 +38,15 @@ var ( CppFreeString func(s uintptr) CppLastError func(ctx uintptr) string + // Batched JSON transcription: takes a concatenated float buffer of clips + // plus their per-clip sample counts (sum(nSamples)==len(samplesConcat)) + // and returns a malloc'd char* JSON ARRAY of per-clip {"text","words", + // "tokens"} objects (uintptr, freed via CppFreeString). purego passes the + // Go slices as the base pointer of their backing array (kept alive for the + // call), matching the CppStreamFeed pcm []float32 binding pattern; the C + // side reads them as const float*/const int*. + CppTranscribePcmBatchJSON func(ctx uintptr, samplesConcat []float32, nSamples []int32, nClips int32, sampleRate int32, decoder int32) uintptr + // Cache-aware streaming (RNN-T) entry points. stream_begin returns 0 for // non-streaming models. feed/finalize return a malloc'd char* (uintptr, // freed via CppFreeString); feed writes 1 to *eouOut on an /. @@ -77,11 +90,18 @@ type transcriptToken struct { } // ParakeetCpp owns a single loaded parakeet_ctx. The C engine is a -// thread-unsafe singleton (mirrors whisper.cpp / vibevoice.cpp), so we -// serialize calls through base.SingleThread. +// thread-unsafe singleton (mirrors whisper.cpp / vibevoice.cpp). Rather than +// serialize every call through base.SingleThread, we route unary +// transcription through an in-process batcher (its sole dispatcher goroutine +// is the only caller of the engine on that path) and guard the shared engine +// with engineMu so a streaming session and a batched-unary dispatch never +// touch it concurrently. type ParakeetCpp struct { - base.SingleThread - ctxPtr uintptr + base.Base + ctxPtr uintptr + engineMu sync.Mutex // sole guard of the one C engine (dispatcher + streaming) + bat *batcher + batStop chan struct{} } // Load is the LocalAI gRPC entry point for LoadModel: it calls @@ -100,13 +120,103 @@ func (p *ParakeetCpp) Load(opts *pb.ModelOptions) error { return fmt.Errorf("parakeet-cpp: parakeet_capi_load failed for %q", opts.ModelFile) } p.ctxPtr = ctx + + // Dynamic batching knobs (model YAML options:, key:value form). Batching is + // OFF by default (batch_max_size:1): each request runs on its own. On GPU, + // raising batch_max_size coalesces concurrent requests into one batched + // engine call and improves throughput under load; leave it at 1 on CPU and + // for low-concurrency setups, where batching only adds latency. + maxSize := optInt(opts, "batch_max_size", 1) + maxWaitMs := optInt(opts, "batch_max_wait_ms", 15) + if maxWaitMs < 0 { + maxWaitMs = 0 + } + if CppTranscribePcmBatchJSON != nil { + p.batStop = make(chan struct{}) + p.bat = newBatcher(maxSize, time.Duration(maxWaitMs)*time.Millisecond, p.runBatch) + go p.bat.run(p.batStop) // dispatcher runs until Free closes batStop + if maxSize > 1 { + xlog.Info("parakeet-cpp: dynamic batching enabled", + "batch_max_size", maxSize, "batch_max_wait_ms", maxWaitMs) + } else { + xlog.Info("parakeet-cpp: dynamic batching off (batch_max_size=1); " + + "set batch_max_size>1 to coalesce concurrent requests on GPU") + } + } else { + xlog.Info("parakeet-cpp: batched C-API not present in libparakeet.so; " + + "batching disabled, using per-request transcription") + } return nil } -// AudioTranscription runs parakeet_capi_transcribe_path_json on the wav at -// opts.Dst with the default decoder (decoder=0, which selects the right head -// per architecture: transducer for tdt/rnnt/hybrid, CTC for ctc) and shapes -// the per-word timestamps into a LocalAI TranscriptResult. +// optInt reads an integer model option (key:value form) from ModelOptions, +// returning def when absent or unparseable. The options array carries the +// model YAML's options: entries (see core/config; siblings such as +// acestep-cpp parse the same key:value form via strings.Cut on ":"). +func optInt(opts *pb.ModelOptions, key string, def int) int { + for _, o := range opts.GetOptions() { + k, v, ok := strings.Cut(o, ":") + if ok && strings.TrimSpace(k) == key { + if n, err := strconv.Atoi(strings.TrimSpace(v)); err == nil { + return n + } + } + } + return def +} + +// runBatch is the dispatcher's batch handler and the ONLY caller of the C +// engine on the unary path. It concatenates the batch PCM, calls the batched +// JSON C-API under engineMu, splits the JSON array, and replies to each request. +func (p *ParakeetCpp) runBatch(reqs []*batchRequest) { + // Observability: the actual coalesced batch size per engine call. Debug-level + // so it stays silent in normal operation but lets operators confirm/tune batching. + xlog.Debug("parakeet-cpp: dispatching batch", "size", len(reqs)) + nSamples := make([]int32, len(reqs)) + total := 0 + for i, r := range reqs { + nSamples[i] = int32(len(r.pcm)) + total += len(r.pcm) + } + concat := make([]float32, 0, total) + for _, r := range reqs { + concat = append(concat, r.pcm...) + } + var dec int32 + if len(reqs) > 0 { + dec = reqs[0].decoder + } + p.engineMu.Lock() + cstr := CppTranscribePcmBatchJSON(p.ctxPtr, concat, nSamples, int32(len(reqs)), 16000, dec) + p.engineMu.Unlock() + if cstr == 0 { + err := fmt.Errorf("parakeet-cpp: batch transcribe failed: %s", CppLastError(p.ctxPtr)) + for _, r := range reqs { + r.reply <- batchReply{err: err} + } + return + } + raw := goStringFromCPtr(cstr) + CppFreeString(cstr) + var docs []json.RawMessage + if err := json.Unmarshal([]byte(raw), &docs); err != nil || len(docs) != len(reqs) { + e := fmt.Errorf("parakeet-cpp: batch json: got %d results for %d reqs (%v)", len(docs), len(reqs), err) + for _, r := range reqs { + r.reply <- batchReply{err: e} + } + return + } + for i, r := range reqs { + r.reply <- batchReply{json: string(docs[i])} + } +} + +// AudioTranscription decodes the wav at opts.Dst to 16 kHz mono PCM and +// submits it to the in-process batcher, which coalesces concurrent requests +// into a single batched engine call (parakeet_capi_transcribe_pcm_batch_json) +// with the default decoder (decoder=0, which selects the right head per +// architecture: transducer for tdt/rnnt/hybrid, CTC for ctc) and shapes the +// per-word timestamps into a LocalAI TranscriptResult. // // Parakeet emits word- and token-level timestamps but no native segment // boundaries, so we synthesise a single whole-clip segment spanning the first @@ -118,7 +228,7 @@ func (p *ParakeetCpp) Load(opts *pb.ModelOptions) error { // translate/diarize/prompt/temperature/language/threads are not applicable to // parakeet and are ignored; streaming is handled by AudioTranscriptionStream // (L2). -func (p *ParakeetCpp) AudioTranscription(_ context.Context, opts *pb.TranscriptRequest) (pb.TranscriptResult, error) { +func (p *ParakeetCpp) AudioTranscription(ctx context.Context, opts *pb.TranscriptRequest) (pb.TranscriptResult, error) { if p.ctxPtr == 0 { return pb.TranscriptResult{}, errors.New("parakeet-cpp: model not loaded") } @@ -126,61 +236,74 @@ func (p *ParakeetCpp) AudioTranscription(_ context.Context, opts *pb.TranscriptR return pb.TranscriptResult{}, errors.New("parakeet-cpp: TranscriptRequest.dst (audio path) is required") } - cstr := CppTranscribePathJSON(p.ctxPtr, opts.Dst, 0) - if cstr == 0 { - msg := CppLastError(p.ctxPtr) - if msg == "" { - msg = "unknown error" + // Fallback when the batched C-API is unavailable: transcribe directly from + // the file path (original behavior, no batching). + if p.bat == nil { + cstr := CppTranscribePathJSON(p.ctxPtr, opts.Dst, 0) + if cstr == 0 { + return pb.TranscriptResult{}, fmt.Errorf("parakeet-cpp: transcribe_path_json failed: %s", CppLastError(p.ctxPtr)) } - return pb.TranscriptResult{}, fmt.Errorf("parakeet-cpp: transcribe_path_json failed: %s", msg) + raw := goStringFromCPtr(cstr) + CppFreeString(cstr) + var doc transcriptJSON + if err := json.Unmarshal([]byte(raw), &doc); err != nil { + return pb.TranscriptResult{}, fmt.Errorf("parakeet-cpp: decode transcript json: %w", err) + } + return transcriptResultFromDoc(doc, opts), nil } - raw := goStringFromCPtr(cstr) - CppFreeString(cstr) - + // Batched path: decode to PCM, submit to the batcher, wait for this request's + // JSON element. The dispatcher is the sole engine caller on this path; both + // sends honour ctx cancellation. + pcm, _, err := decodeWavMono16k(opts.Dst) + if err != nil { + return pb.TranscriptResult{}, err + } + rep := make(chan batchReply, 1) + select { + case p.bat.submit <- &batchRequest{pcm: pcm, decoder: 0, reply: rep}: + case <-ctx.Done(): + return pb.TranscriptResult{}, status.Error(codes.Canceled, "transcription cancelled") + } + var res batchReply + select { + case res = <-rep: + case <-ctx.Done(): + return pb.TranscriptResult{}, status.Error(codes.Canceled, "transcription cancelled") + } + if res.err != nil { + return pb.TranscriptResult{}, res.err + } var doc transcriptJSON - if err := json.Unmarshal([]byte(raw), &doc); err != nil { + if err := json.Unmarshal([]byte(res.json), &doc); err != nil { return pb.TranscriptResult{}, fmt.Errorf("parakeet-cpp: decode transcript json: %w", err) } + return transcriptResultFromDoc(doc, opts), nil +} +// transcriptResultFromDoc maps a decoded transcriptJSON to a TranscriptResult, +// synthesising a single whole-clip segment and attaching word timings only when +// the caller requested word granularity. Shared by the batched and direct paths. +func transcriptResultFromDoc(doc transcriptJSON, opts *pb.TranscriptRequest) pb.TranscriptResult { text := strings.TrimSpace(doc.Text) - words := make([]*pb.TranscriptWord, 0, len(doc.Words)) for _, w := range doc.Words { - words = append(words, &pb.TranscriptWord{ - Start: secondsToNanos(w.Start), - End: secondsToNanos(w.End), - Text: w.W, - }) + words = append(words, &pb.TranscriptWord{Start: secondsToNanos(w.Start), End: secondsToNanos(w.End), Text: w.W}) } - tokens := make([]int32, 0, len(doc.Tokens)) for _, t := range doc.Tokens { tokens = append(tokens, t.ID) } - - // Single whole-clip segment, spanning the first word start to the last - // word end (0/0 when the clip produced no words). var segStart, segEnd int64 if len(words) > 0 { segStart = words[0].Start segEnd = words[len(words)-1].End } - seg := &pb.TranscriptSegment{ - Id: 0, - Start: segStart, - End: segEnd, - Text: text, - Tokens: tokens, - } + seg := &pb.TranscriptSegment{Id: 0, Start: segStart, End: segEnd, Text: text, Tokens: tokens} if wordsRequested(opts.TimestampGranularities) { seg.Words = words } - - return pb.TranscriptResult{ - Text: text, - Segments: []*pb.TranscriptSegment{seg}, - }, nil + return pb.TranscriptResult{Text: text, Segments: []*pb.TranscriptSegment{seg}} } // wordsRequested reports whether the caller asked for word-level timestamps. @@ -243,6 +366,14 @@ func (p *ParakeetCpp) AudioTranscriptionStream(ctx context.Context, opts *pb.Tra return nil } defer CppStreamFree(stream) + // The C engine is a single shared context: a streaming session and a batched + // unary dispatch must never touch it at once, so hold engineMu for the whole + // stream. This lock is intentionally taken AFTER the non-streaming fallback + // above returns: that fallback goes through AudioTranscription -> the batcher + // -> runBatch, which itself acquires engineMu, so locking here first would + // deadlock. Do not hoist this lock above the fallback. + p.engineMu.Lock() + defer p.engineMu.Unlock() data, duration, err := decodeWavMono16k(opts.Dst) if err != nil { @@ -362,6 +493,12 @@ func decodeWavMono16k(path string) ([]float32, float32, error) { // Free releases the underlying parakeet_ctx. Called by LocalAI when the // model is unloaded. func (p *ParakeetCpp) Free() error { + // Stop the dispatcher before releasing the engine so no in-flight runBatch + // can touch a freed ctx (close leak / use-after-free on reload). + if p.batStop != nil { + close(p.batStop) + p.batStop = nil + } if p.ctxPtr != 0 { CppFree(p.ctxPtr) p.ctxPtr = 0 diff --git a/backend/go/parakeet-cpp/goparakeetcpp_test.go b/backend/go/parakeet-cpp/goparakeetcpp_test.go index 9ce425139..c059eb4bf 100644 --- a/backend/go/parakeet-cpp/goparakeetcpp_test.go +++ b/backend/go/parakeet-cpp/goparakeetcpp_test.go @@ -43,6 +43,9 @@ func ensureLibLoaded() { purego.RegisterLibFunc(&CppFree, lib, "parakeet_capi_free") purego.RegisterLibFunc(&CppTranscribePath, lib, "parakeet_capi_transcribe_path") purego.RegisterLibFunc(&CppTranscribePathJSON, lib, "parakeet_capi_transcribe_path_json") + if sym, err := purego.Dlsym(lib, "parakeet_capi_transcribe_pcm_batch_json"); err == nil && sym != 0 { + purego.RegisterLibFunc(&CppTranscribePcmBatchJSON, lib, "parakeet_capi_transcribe_pcm_batch_json") + } purego.RegisterLibFunc(&CppStreamBegin, lib, "parakeet_capi_stream_begin") purego.RegisterLibFunc(&CppStreamFeed, lib, "parakeet_capi_stream_feed") purego.RegisterLibFunc(&CppStreamFinalize, lib, "parakeet_capi_stream_finalize") diff --git a/backend/go/parakeet-cpp/main.go b/backend/go/parakeet-cpp/main.go index a8fd7fe3b..32d94b7b1 100644 --- a/backend/go/parakeet-cpp/main.go +++ b/backend/go/parakeet-cpp/main.go @@ -58,6 +58,13 @@ func main() { purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name) } + // The batched-JSON entry point exists only in newer libparakeet.so (ABI >= 2). + // Probe with Dlsym and register only if present, so the backend still loads + // against an older library (it falls back to per-request transcription). + if sym, err := purego.Dlsym(lib, "parakeet_capi_transcribe_pcm_batch_json"); err == nil && sym != 0 { + purego.RegisterLibFunc(&CppTranscribePcmBatchJSON, lib, "parakeet_capi_transcribe_pcm_batch_json") + } + fmt.Fprintf(os.Stderr, "[parakeet-cpp] ABI=%d\n", CppAbiVersion()) flag.Parse() diff --git a/backend/go/stablediffusion-ggml/Makefile b/backend/go/stablediffusion-ggml/Makefile index b23d1caf4..ca13f6fa1 100644 --- a/backend/go/stablediffusion-ggml/Makefile +++ b/backend/go/stablediffusion-ggml/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # stablediffusion.cpp (ggml) STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp -STABLEDIFFUSION_GGML_VERSION?=be65ac7511b30379b003626c15224798929e33d4 +STABLEDIFFUSION_GGML_VERSION?=7948df8ac1070f5f6881b8d34675821893eb97d6 CMAKE_ARGS+=-DGGML_MAX_NAME=128 diff --git a/backend/go/whisper/Makefile b/backend/go/whisper/Makefile index d71e32bcb..261fbe84c 100644 --- a/backend/go/whisper/Makefile +++ b/backend/go/whisper/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # whisper.cpp version WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp -WHISPER_CPP_VERSION?=fe69461618ffc50ba8afa65c25cc6c6e34d4537f +WHISPER_CPP_VERSION?=23ee03506a91ac3d3f0071b40e66a430eebdfa1d SO_TARGET?=libgowhisper.so CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF diff --git a/backend/python/nemo/requirements-cublas13.txt b/backend/python/nemo/requirements-cublas13.txt index 50c18d53e..8c996c10b 100644 --- a/backend/python/nemo/requirements-cublas13.txt +++ b/backend/python/nemo/requirements-cublas13.txt @@ -1,3 +1,4 @@ --extra-index-url https://download.pytorch.org/whl/cu130 torch +texterrors==1.1.6 nemo_toolkit[asr] diff --git a/core/http/react-ui/coverage-baseline.txt b/core/http/react-ui/coverage-baseline.txt index b4be1a3b7..b2e09eeb0 100644 --- a/core/http/react-ui/coverage-baseline.txt +++ b/core/http/react-ui/coverage-baseline.txt @@ -1 +1 @@ -39.86 \ No newline at end of file +40.0 \ No newline at end of file diff --git a/core/http/react-ui/e2e/page-render-smoke.spec.js b/core/http/react-ui/e2e/page-render-smoke.spec.js new file mode 100644 index 000000000..40cfa1897 --- /dev/null +++ b/core/http/react-ui/e2e/page-render-smoke.spec.js @@ -0,0 +1,40 @@ +import { test, expect } from './coverage-fixtures.js' + +// Render-smoke coverage. Each page is lazy-loaded and runs its full render + +// initial effects on mount, so a bare visit captures the bulk of a page's +// lines — cheap, real coverage for pages that have no dedicated spec yet. +// +// This is the project's preferred way to keep the UI coverage gate green: +// raise the floor by covering more, rather than loosening the gate's +// tolerance (see CONTRIBUTING.md → "React UI coverage"). Auth is disabled in +// the test server, so RequireAdmin/RequireFeature resolve to isAdmin=true and +// every gated route renders without an auth/capability mock. +// +// Asserts the page mounted (its .page-title header is visible) and that it did +// not bounce to a gate redirect (/login or back to /app home). +const PAGES = [ + ['/app/talk', 'Talk'], + ['/app/usage', 'Usage'], + ['/app/account', 'Account'], + ['/app/studio', 'Studio'], + ['/app/manage', 'Manage'], + ['/app/backends', 'Backends'], + ['/app/settings', 'Settings'], + ['/app/nodes', 'Nodes'], + ['/app/face', 'Face recognition'], + ['/app/voice', 'Voice recognition'], + ['/app/fine-tune', 'Fine-tuning'], + ['/app/quantize', 'Quantize'], +] + +test.describe('Page render smoke', () => { + for (const [path, label] of PAGES) { + test(`renders ${label} (${path})`, async ({ page }) => { + await page.goto(path) + // .page-title for the normal header; .empty-state-title for pages that + // render a gated/empty state (e.g. Account when auth is disabled). + await expect(page.locator('.page-title, .empty-state-title').first()).toBeVisible({ timeout: 15_000 }) + await expect(page).toHaveURL(new RegExp(path.replace(/\//g, '\\/') + '$')) + }) + } +}) diff --git a/docs/content/_index.md b/docs/content/_index.md index d24cae7f3..d6410ffe1 100644 --- a/docs/content/_index.md +++ b/docs/content/_index.md @@ -1,10 +1,10 @@ +++ title = "LocalAI" -description = "The free, OpenAI, Anthropic alternative. Your All-in-One Complete AI Stack" +description = "The free, OpenAI and Anthropic alternative. A small, composable AI stack: run any model locally and install only what you use." type = "home" +++ -**The free, OpenAI, Anthropic alternative. Your All-in-One Complete AI Stack** - Run powerful language models, autonomous agents, and document intelligence **locally** on your hardware. +**The free, OpenAI and Anthropic alternative. A small, composable AI stack.** - Run powerful language models, autonomous agents, and document intelligence **locally** on your hardware. A lean core that pulls model backends on demand, so you install only what you use. **No cloud, no limits, no compromise.** diff --git a/docs/content/advanced/advanced-usage.md b/docs/content/advanced/advanced-usage.md index 7742eb29a..9b99eba80 100644 --- a/docs/content/advanced/advanced-usage.md +++ b/docs/content/advanced/advanced-usage.md @@ -273,7 +273,7 @@ A list of the environment variable that tweaks parallelism is the following: ``` ### Python backends GRPC max workers ### Default number of workers for GRPC Python backends. -### This actually controls wether a backend can process multiple requests or not. +### This actually controls whether a backend can process multiple requests or not. ### Define the number of parallel LLAMA.cpp workers (Defaults to 1) diff --git a/docs/content/advanced/fine-tuning.md b/docs/content/advanced/fine-tuning.md index f6c529bf3..4708a2c05 100644 --- a/docs/content/advanced/fine-tuning.md +++ b/docs/content/advanced/fine-tuning.md @@ -5,6 +5,8 @@ title = "Fine-tuning LLMs for text generation" weight = 22 +++ +![Fine-tuning recipe: from dataset to a servable GGUF via LoRA fine-tune and merge](/images/diagrams/finetune-recipe.png) + {{% notice note %}} Section under construction {{% /notice %}} diff --git a/docs/content/advanced/reverse-proxy-tls.md b/docs/content/advanced/reverse-proxy-tls.md index 57eb73689..24af55c62 100644 --- a/docs/content/advanced/reverse-proxy-tls.md +++ b/docs/content/advanced/reverse-proxy-tls.md @@ -4,6 +4,8 @@ description: Configure LocalAI behind a TLS termination reverse proxy (HAProxy, weight: 100 --- +![TLS at the edge: terminate TLS at the reverse proxy and forward headers so LocalAI emits correct https URLs](/images/diagrams/reverse-proxy-tls.png) + # TLS Reverse Proxy Configuration When running LocalAI behind a TLS termination reverse proxy, the Web UI may fail to load static assets (CSS, JS) correctly because the application doesn't automatically detect that it's being served over HTTPS. This guide explains how to properly configure your reverse proxy to work with LocalAI. diff --git a/docs/content/advanced/vram-management.md b/docs/content/advanced/vram-management.md index 3b7620e80..ee7c346be 100644 --- a/docs/content/advanced/vram-management.md +++ b/docs/content/advanced/vram-management.md @@ -5,6 +5,8 @@ weight = 22 url = '/advanced/vram-management' +++ +![VRAM management: least-recently-used eviction and concurrency-group anti-affinity keep hot models warm](/images/diagrams/vram-eviction.png) + When running multiple models in LocalAI, especially on systems with limited GPU memory (VRAM), you may encounter situations where loading a new model fails because there isn't enough available VRAM. LocalAI provides several mechanisms to automatically manage model memory allocation and prevent VRAM exhaustion: 1. **Max Active Backends (LRU Eviction)**: Limit the number of loaded models, evicting the least recently used when the limit is reached diff --git a/docs/content/faq.md b/docs/content/faq.md index afb6459e3..879e304ed 100644 --- a/docs/content/faq.md +++ b/docs/content/faq.md @@ -12,6 +12,22 @@ url = "/faq/" Here are answers to some of the most common questions. +### Do I need to install all the backends? + +No. You install only the backends your models use. LocalAI's core is a single binary (or container) that provides the OpenAI-compatible API, request routing, the web UI, and agents. Each inference backend (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX, and others) is a separate artifact, installed only when a model needs it. + +In practice: + +- **You install one backend, not all of them.** Run a model with `local-ai run ` and the matching backend is pulled automatically; nothing else is downloaded. +- **Each backend is purpose-built for its engine.** LocalAI builds a dedicated gRPC backend around each engine, so every one stays independently optimized without a single binary trying to support every model architecture at once. +- **You manage backends individually** with `local-ai backends list/install/uninstall` or from the web UI. + +The catalog's breadth is optionality: you only ever run what your models use. + +### Can I bring my own model or backend? + +Yes. You can load any compatible model, not just the ones in the gallery. And because every backend talks to the core over a simple gRPC interface, you can write your own backend in any language and plug it in, exactly how the built-in backends work. Nothing about the core is closed off, which gives you the flexibility to run precisely the stack you want. + ### How do I get models? Most gguf-based models should work, but newer models may require additions to the API. If a model doesn't work, please feel free to open up issues. However, be cautious about downloading models from the internet and directly onto your machine, as there may be security vulnerabilities in lama.cpp or ggml that could be maliciously exploited. Some models can be found on Hugging Face: https://huggingface.co/models?search=gguf, or models from gpt4all are compatible too: https://github.com/nomic-ai/gpt4all. diff --git a/docs/content/features/agents.md b/docs/content/features/agents.md index b2eee5093..e6fd1d0e9 100644 --- a/docs/content/features/agents.md +++ b/docs/content/features/agents.md @@ -5,6 +5,8 @@ weight = 21 url = '/features/agents' +++ +![The in-process agent loop: agents call LocalAI's own chat API in a loop, streaming progress over SSE](/images/diagrams/agents-loop.png) + LocalAI includes a built-in agent platform powered by [LocalAGI](https://github.com/mudler/LocalAGI). Agents are autonomous AI entities that can reason, use tools, maintain memory, and interact with external services — all running locally as part of the LocalAI process. ## Overview diff --git a/docs/content/features/audio-diarization.md b/docs/content/features/audio-diarization.md index 8505ec97c..36d9437dc 100644 --- a/docs/content/features/audio-diarization.md +++ b/docs/content/features/audio-diarization.md @@ -5,6 +5,8 @@ weight = 17 url = "/features/audio-diarization/" +++ +![Diarization: segment, embed, and cluster (or a single ASR pass) into speaker-labelled segments](/images/diagrams/diarization-pipeline.png) + Speaker diarization answers the question **"who spoke when?"** — given an audio clip with multiple speakers, it returns time-stamped segments labelled with a stable speaker ID (`SPEAKER_00`, `SPEAKER_01`, …). LocalAI exposes this through the `/v1/audio/diarization` endpoint, modelled after `/v1/audio/transcriptions`. Two backends are supported today: diff --git a/docs/content/features/audio-to-text.md b/docs/content/features/audio-to-text.md index c786c9e7c..22e7d2529 100644 --- a/docs/content/features/audio-to-text.md +++ b/docs/content/features/audio-to-text.md @@ -187,6 +187,22 @@ curl http://localhost:8080/v1/audio/transcriptions \ For real-time use, load a cache-aware streaming model (e.g. `realtime_eou_120m-v1-*.gguf`) and pass `-F stream=true`. Deltas are emitted as the audio is decoded, with end-of-utterance events closing each segment. +### Dynamic batching + +The backend can coalesce concurrent transcription requests into a single batched engine call, which improves throughput on GPU when many requests arrive at once. Batching is **off by default** (`batch_max_size:1`, one request at a time); raise it to opt in. Two `options:` knobs control it: + +```yaml +name: parakeet-110m +backend: parakeet-cpp +parameters: + model: tdt_ctc-110m-f16.gguf +options: +- batch_max_size:8 # max requests coalesced into one batch (default 1 = off) +- batch_max_wait_ms:15 # how long to wait to fill a batch, in ms (default 15) +``` + +By default each request runs on its own. Raise `batch_max_size` (for example 4 to 16) to enable batching; it pays off on GPU under concurrent load, where coalescing the per-step decode GEMMs across requests is a large throughput win. Leave it at 1 on CPU and for low-concurrency setups, where batching only adds latency. Batching only affects concurrent unary requests; streaming sessions always run on their own. + ## See also - [Audio Transform]({{< relref "audio-transform.md" >}}) — clean up the audio (echo cancellation, noise suppression, dereverberation) before passing it to a transcription model. diff --git a/docs/content/features/audio-transform.md b/docs/content/features/audio-transform.md index 511b2e3d7..21ed9a7be 100644 --- a/docs/content/features/audio-transform.md +++ b/docs/content/features/audio-transform.md @@ -5,6 +5,8 @@ weight = 17 url = "/features/audio-transform/" +++ +![Audio transform: two inputs (mic plus reference) become one cleaned output; interleaved-stereo on the wire](/images/diagrams/audio-transform-io.png) + The audio-transform endpoints take **audio in** and emit **audio out**, optionally conditioned on a second reference audio signal. The category is generic by design — concrete operations include joint **acoustic echo cancellation + diff --git a/docs/content/features/cloud-proxy.md b/docs/content/features/cloud-proxy.md index 1c870a930..d7976dc94 100644 --- a/docs/content/features/cloud-proxy.md +++ b/docs/content/features/cloud-proxy.md @@ -7,6 +7,8 @@ tags = ["Proxy", "Cloud", "Routing", "Advanced"] categories = ["Features"] +++ +![Cloud proxy: a local API call is proxied to a hosted model while PII is redacted out and back](/images/diagrams/cloud-proxy-sequence.png) + LocalAI can forward chat-completion and Anthropic Messages requests to an external provider instead of running them through the local gRPC backend pipeline. Configure a model with `backend: cloud-proxy` and a `proxy.upstream_url`, diff --git a/docs/content/features/distributed-mode.md b/docs/content/features/distributed-mode.md index af5f74645..de50cba3e 100644 --- a/docs/content/features/distributed-mode.md +++ b/docs/content/features/distributed-mode.md @@ -13,28 +13,7 @@ Distributed mode requires authentication enabled with a **PostgreSQL** database ## Architecture Overview -``` - ┌─────────────────┐ - │ Load Balancer │ - └────────┬────────┘ - │ - ┌──────────────┼──────────────┐ - │ │ │ - ┌───────▼──────┐ ┌────▼─────┐ ┌─────▼──────┐ - │ Frontend #1 │ │ Frontend │ │ Frontend #N│ - │ (LocalAI) │ │ #2 │ │ (LocalAI) │ - └──────┬───────┘ └────┬─────┘ └─────┬──────┘ - │ │ │ - ┌───────▼──────────────▼──────────────▼───────┐ - │ PostgreSQL + NATS │ - │ (node registry, jobs, coordination) │ - └───────┬──────────────┬──────────────┬───────┘ - │ │ │ - ┌──────▼──────┐ ┌────▼─────┐ ┌─────▼──────┐ - │ Worker #1 │ │ Worker │ │ Worker #N │ - │ (generic) │ │ #2 │ │ (generic) │ - └─────────────┘ └──────────┘ └────────────┘ -``` +![Distributed mode architecture: a load balancer fronts stateless SmartRouter frontends backed by a shared NATS/PostgreSQL/S3 plane, with generic workers running per-model gRPC backends](/images/diagrams/distributed-mode-arch.png) **Frontends** are stateless LocalAI instances that receive API requests and route them to worker nodes via the **SmartRouter**. All frontends share state through PostgreSQL and coordinate via NATS. @@ -42,6 +21,8 @@ Distributed mode requires authentication enabled with a **PostgreSQL** database ### Scheduling Algorithm +![SmartRouter scheduling: idle-first placement that checks for an already-loaded node, then free VRAM, then an idle node, then preemptive LRU eviction, ending in backend.install and LoadModel](/images/diagrams/smartrouter-scheduling.png) + The SmartRouter uses **idle-first** scheduling with **preemptive eviction**: 1. If the model is already loaded on a node → use it (per-model gRPC address) 2. If no node has the model → prefer nodes with enough free VRAM @@ -432,6 +413,8 @@ This is **not** routed through the SmartRouter: it is a model-internal split, co ### Topology +![ds4 layer-split topology: workers dial in to the coordinator and own higher layer ranges, the inverse of llama.cpp RPC where the main server dials out to rpc-servers](/images/diagrams/ds4-layer-split.png) + ds4 uses a **coordinator/worker** split: - The **coordinator** owns tokenization, sampling, the prompt, and a low layer range (e.g. `0:19`). It is LocalAI's ds4 backend and **listens** on a host/port. Workers dial into it. diff --git a/docs/content/features/distributed_inferencing.md b/docs/content/features/distributed_inferencing.md index 3df597822..dc635a9f9 100644 --- a/docs/content/features/distributed_inferencing.md +++ b/docs/content/features/distributed_inferencing.md @@ -5,6 +5,8 @@ weight = 15 url = "/features/distribute/" +++ +![Federated vs worker mode: federated routes a whole request to one node; worker shards one model across nodes](/images/diagrams/federated-vs-worker.png) + {{% notice tip %}} Looking for production-grade horizontal scaling with PostgreSQL and NATS? See [Distributed Mode]({{% relref "features/distributed-mode" %}}). {{% /notice %}} diff --git a/docs/content/features/face-recognition.md b/docs/content/features/face-recognition.md index 34dc366fc..ecc3e7213 100644 --- a/docs/content/features/face-recognition.md +++ b/docs/content/features/face-recognition.md @@ -5,6 +5,8 @@ weight = 14 url = "/features/face-recognition/" +++ +![Face recognition: 1:N match against a vector store, with an anti-spoofing liveness gate that can veto a verification](/images/diagrams/face-recognition-flow.png) + LocalAI supports face recognition through the `insightface` backend: face verification (1:1), face identification (1:N) against a built-in vector store, face embedding, face detection, demographic analysis diff --git a/docs/content/features/fine-tuning.md b/docs/content/features/fine-tuning.md index adb04fe96..1c4b44591 100644 --- a/docs/content/features/fine-tuning.md +++ b/docs/content/features/fine-tuning.md @@ -5,6 +5,8 @@ weight = 18 url = '/features/fine-tuning/' +++ +![The fine-tune job lifecycle: create, train with SSE progress, then export to LoRA, merged, or GGUF](/images/diagrams/finetune-job-lifecycle.png) + LocalAI supports fine-tuning LLMs directly through the API and Web UI. Fine-tuning is powered by pluggable backends that implement a generic gRPC interface, allowing support for different training frameworks and model types. ## Supported Backends diff --git a/docs/content/features/image-generation.md b/docs/content/features/image-generation.md index e35b7fbf0..bb9748dd9 100644 --- a/docs/content/features/image-generation.md +++ b/docs/content/features/image-generation.md @@ -199,7 +199,7 @@ Pipelines types available: ##### Advanced: Additional parameters -Additional arbitrarly parameters can be specified in the option field in key/value separated by `:`: +Additional arbitrary parameters can be specified in the option field in key/value separated by `:`: ```yaml name: animagine-xl @@ -207,7 +207,7 @@ options: - "cfg_scale:6" ``` -**Note**: There is no complete parameter list. Any parameter can be passed arbitrarly and is passed to the model directly as argument to the pipeline. Different pipelines/implementations support different parameters. +**Note**: There is no complete parameter list. Any parameter can be passed arbitrarily and is passed to the model directly as argument to the pipeline. Different pipelines/implementations support different parameters. The example above, will result in the following python code when generating images: @@ -342,4 +342,4 @@ diffusers: ```bash (echo -n '{"prompt": "spiderman surfing","size": "512x512","model":"txt2vid"}') | curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations -``` \ No newline at end of file +``` diff --git a/docs/content/features/mcp.md b/docs/content/features/mcp.md index 55f3226e3..ed1cda503 100644 --- a/docs/content/features/mcp.md +++ b/docs/content/features/mcp.md @@ -7,6 +7,8 @@ tags = ["MCP", "Agents", "Tools", "Advanced"] categories = ["Features"] +++ +![Server-side vs client-side MCP: the model's tool loop runs on the server or in the browser](/images/diagrams/mcp-server-vs-client.png) + LocalAI now supports the **Model Context Protocol (MCP)**, enabling powerful agentic capabilities by connecting AI models to external tools and services. This feature allows your LocalAI models to interact with various MCP servers, providing access to real-time data, APIs, and specialized tools. diff --git a/docs/content/features/middleware.md b/docs/content/features/middleware.md index ee4ef9d4a..84b8fb382 100644 --- a/docs/content/features/middleware.md +++ b/docs/content/features/middleware.md @@ -7,6 +7,8 @@ tags = ["Routing", "Privacy", "PII", "Middleware", "Advanced"] categories = ["Features"] +++ +![The request lifecycle: one shared hook chain for auth, model routing, and PII, with decision and event logs](/images/diagrams/middleware-lifecycle.png) + LocalAI ships a request-middleware layer that sits between the HTTP API and the backend dispatcher. Two subsystems share that layer because they share the same lifecycle hook: **PII filtering** scans the request body before it diff --git a/docs/content/features/mitm-proxy.md b/docs/content/features/mitm-proxy.md index 4c0428df4..e5eb22acd 100644 --- a/docs/content/features/mitm-proxy.md +++ b/docs/content/features/mitm-proxy.md @@ -7,6 +7,8 @@ tags = ["Proxy", "MITM", "Privacy", "Routing", "Advanced"] categories = ["Features"] +++ +![MITM proxy: allowlisted hosts are decrypted and scanned, everything else is a blind TCP tunnel](/images/diagrams/mitm-intercept.png) + LocalAI can act as a local HTTPS proxy that **redacts PII from your Claude Code, OpenAI Codex CLI, or any HTTPS client** without holding their API keys. The proxy intercepts only the LLM API endpoints you allowlist (default: diff --git a/docs/content/features/mlx-distributed.md b/docs/content/features/mlx-distributed.md index 9e20474fd..307f7d612 100644 --- a/docs/content/features/mlx-distributed.md +++ b/docs/content/features/mlx-distributed.md @@ -5,6 +5,8 @@ weight = 18 url = '/features/mlx-distributed/' +++ +![MLX pipeline-parallel inference: layers split across ranks, rank 0 coordinates, activations flow down the ring](/images/diagrams/mlx-pipeline.png) + MLX distributed inference allows you to split large language models across multiple Apple Silicon Macs (or other devices) for joint inference. Unlike federation (which distributes whole requests), MLX distributed splits a single model's layers across machines so they all participate in every forward pass. ## How It Works diff --git a/docs/content/features/openai-functions.md b/docs/content/features/openai-functions.md index 9596fb5cb..4893cd4ef 100644 --- a/docs/content/features/openai-functions.md +++ b/docs/content/features/openai-functions.md @@ -6,6 +6,8 @@ weight = 17 url = "/features/openai-functions/" +++ +![Function calling: one tool-call request shape, each backend's native parser extracts the calls](/images/diagrams/tool-call-parsers.png) + LocalAI supports running the OpenAI [functions and tools API](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools) across multiple backends. The OpenAI request shape is the same regardless of which backend runs your model — LocalAI is responsible for extracting structured tool calls from the model's output before returning the response. ![localai-functions-1](https://github.com/ggerganov/llama.cpp/assets/2420543/5bd15da2-78c1-4625-be90-1e938e6823f1) diff --git a/docs/content/features/openai-realtime.md b/docs/content/features/openai-realtime.md index 57a7fe597..8dba6d419 100644 --- a/docs/content/features/openai-realtime.md +++ b/docs/content/features/openai-realtime.md @@ -4,6 +4,8 @@ title: "Realtime API" weight: 60 --- +![The realtime voice loop: VAD to STT to LLM to TTS, over WebSocket or WebRTC](/images/diagrams/realtime-pipeline.png) + LocalAI supports the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime) which enables low-latency, multi-modal conversations (voice and text) over WebSocket. To use the Realtime API, you need to configure a pipeline model that defines the components for Voice Activity Detection (VAD), Transcription (STT), Language Model (LLM), and Text-to-Speech (TTS). diff --git a/docs/content/features/quantization.md b/docs/content/features/quantization.md index c78e4c6d8..cadebb50d 100644 --- a/docs/content/features/quantization.md +++ b/docs/content/features/quantization.md @@ -5,6 +5,8 @@ weight = 19 url = '/features/quantization/' +++ +![From an HF model to a quantized GGUF: convert to f16, then quantize, tracked as a job](/images/diagrams/quantization-flow.png) + LocalAI supports model quantization directly through the API and Web UI. Quantization converts HuggingFace models to GGUF format and compresses them to smaller sizes for efficient inference with llama.cpp. {{% notice note %}} diff --git a/docs/content/features/reranker.md b/docs/content/features/reranker.md index bf830d768..b171bcfa0 100644 --- a/docs/content/features/reranker.md +++ b/docs/content/features/reranker.md @@ -6,6 +6,8 @@ weight = 11 url = "/features/reranker/" +++ +![Two-stage retrieval: a fast retriever finds candidates, a cross-encoder reorders them by relevance](/images/diagrams/reranker-pipeline.png) + A **reranking** model, often referred to as a cross-encoder, is a core component in the two-stage retrieval systems used in information retrieval and natural language processing tasks. Given a query and a set of documents, it will output similarity scores. diff --git a/docs/content/features/text-generation.md b/docs/content/features/text-generation.md index b39377e73..c09717a3f 100644 --- a/docs/content/features/text-generation.md +++ b/docs/content/features/text-generation.md @@ -516,7 +516,7 @@ The `llama.cpp` backend supports additional configuration options that can be sp | `cache_idle_slots` or `idle_slots_cache` | boolean | On a new task, save the previous slot's KV state into the prompt cache (and clear the slot) so a later request with the same prefix can warm-load it. Default: `true`. Auto-disabled by the server if `kv_unified=false` or `cache_ram=0`. | `cache_idle_slots:false` | | `n_ctx_checkpoints` or `ctx_checkpoints` | integer | Maximum number of context checkpoints per slot (used for partial-prefix recovery, e.g. SWA). Default: `32`. | `ctx_checkpoints:16` | | `checkpoint_min_step` or `checkpoint_min_spacing` (aliases: `checkpoint_every_nt`, `checkpoint_every_n_tokens`) | integer | Minimum spacing in tokens between context checkpoints. `0` disables the minimum-spacing gate. Default: `256`. (Renamed upstream from `checkpoint_every_nt`; semantics shifted from a fixed cadence to a minimum spacing.) | `checkpoint_min_step:1024` | -| `split_mode` or `sm` | string | How to split the model across multiple GPUs: `none` (single GPU only), `layer` (default — split layers and KV across GPUs), `row` (split rows across GPUs), `tensor` (experimental tensor parallelism — requires `flash_attention: true`, no KV-cache quantization, manually set `context_size`, and a llama.cpp build that includes [#19378](https://github.com/ggml-org/llama.cpp/pull/19378)). | `split_mode:tensor` | +| `split_mode` or `sm` | string | How to split the model across multiple GPUs: `none` (single GPU only), `layer` (default — split layers and KV across GPUs), `row` (split rows across GPUs), `tensor` (experimental tensor parallelism, requires `flash_attention: true`, manually set `context_size`, and a llama.cpp build that includes [#19378](https://github.com/ggml-org/llama.cpp/pull/19378); it historically also required KV-cache quantization to be disabled, but [#23792](https://github.com/ggml-org/llama.cpp/pull/23792) lifts that restriction so `cache_type_k`/`cache_type_v` quantization can be combined with tensor parallelism on builds that include it). | `split_mode:tensor` | **Example configuration with options:** @@ -897,7 +897,7 @@ The backend will automatically download the required files in order to run the m - `OVModelForCausalLM` requires OpenVINO IR [Text Generation](https://huggingface.co/models?library=openvino&pipeline_tag=text-generation) models from Hugging face - `OVModelForFeatureExtraction` works with any Safetensors Transformer [Feature Extraction](https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers,safetensors) model from Huggingface (Embedding Model) -Please note that streaming is currently not implemente in `AutoModelForCausalLM` for Intel GPU. +Please note that streaming is currently not implemented in `AutoModelForCausalLM` for Intel GPU. AMD GPU support is not implemented. Although AMD CPU is not officially supported by OpenVINO there are reports that it works: YMMV. @@ -1008,4 +1008,4 @@ template: completion: | {{.Input}} -``` \ No newline at end of file +``` diff --git a/docs/content/features/voice-recognition.md b/docs/content/features/voice-recognition.md index 4e6ccc389..20728a28f 100644 --- a/docs/content/features/voice-recognition.md +++ b/docs/content/features/voice-recognition.md @@ -5,6 +5,8 @@ weight = 15 url = "/features/voice-recognition/" +++ +![Voice recognition: register, identify, and forget voiceprints in a vector store, for 1:1 verify or 1:N identify](/images/diagrams/voice-recognition-flow.png) + LocalAI supports voice (speaker) recognition through the `speaker-recognition` backend: speaker verification (1:1), speaker identification (1:N) against a built-in vector store, speaker diff --git a/docs/content/getting-started/models.md b/docs/content/getting-started/models.md index 3382d723a..cf949f715 100644 --- a/docs/content/getting-started/models.md +++ b/docs/content/getting-started/models.md @@ -6,6 +6,8 @@ icon = "hub" description = "Learn how to install, configure, and manage models in LocalAI" +++ +![Model resolution: many sources converge on one resolve, auto-detect backend, load, and serve path](/images/diagrams/model-resolution.png) + This section covers everything you need to know about installing and configuring models in LocalAI. You'll learn multiple methods to get models running. ## Prerequisites diff --git a/docs/content/getting-started/quickstart.md b/docs/content/getting-started/quickstart.md index 5dad07ca3..8c23f0fe6 100644 --- a/docs/content/getting-started/quickstart.md +++ b/docs/content/getting-started/quickstart.md @@ -6,6 +6,8 @@ url = '/basics/getting_started/' icon = "rocket_launch" +++ +![Quickstart journey: install, start LocalAI, pick a model, then chat or curl the API](/images/diagrams/quickstart-journey.png) + **LocalAI** is a free, open-source alternative to OpenAI (Anthropic, etc.), functioning as a drop-in replacement REST API for local inferencing. It allows you to run [LLMs]({{% relref "features/text-generation" %}}), generate images, and produce audio, all locally or on-premises with consumer-grade hardware, supporting multiple model families and architectures. LocalAI comes with a **built-in web interface** for chatting with models, managing installations, configuring AI agents, and more — no extra tools needed. diff --git a/docs/content/overview.md b/docs/content/overview.md index aec385a2d..387eb3f10 100644 --- a/docs/content/overview.md +++ b/docs/content/overview.md @@ -11,7 +11,9 @@ icon = "info" +++ -LocalAI is your complete AI stack for running AI models locally. It's designed to be simple, efficient, and accessible, providing a drop-in replacement for OpenAI's API while keeping your data private and secure. +LocalAI is a composable AI stack for running models locally: a small core that speaks the OpenAI and Anthropic APIs, with each model backend added only when you need it. It's simple, efficient, and private by default, and a drop-in replacement that keeps your data on your own hardware. + +![How LocalAI works: clients speak one API to a small core, which routes each request over gRPC to separate backend processes pulled on demand](/images/diagrams/architecture-overview.png) ## Why LocalAI? @@ -21,22 +23,26 @@ In today's AI landscape, privacy, control, and flexibility are paramount. LocalA - **Complete Control**: Run models on your terms, with your hardware - **Open Source**: MIT licensed and community-driven - **Flexible Deployment**: From laptops to servers, with or without GPUs -- **Extensible**: Add new models and features as needed +- **Composable by design**: A small core, not a bundle. Backends are separate and installed on demand, so you only run what you use ## What's Included -LocalAI is a single binary (or container) that gives you everything you need: +The LocalAI core is a single small binary (or container). It gives you everything you need to serve models, and pulls each model backend on demand, so you install only what you use: - **OpenAI-compatible API** — Drop-in replacement for OpenAI, Anthropic, and Open Responses APIs - **Built-in Web Interface** — Chat, model management, agent creation, image generation, and system monitoring - **AI Agents** — Create autonomous agents with MCP (Model Context Protocol) tool support, directly from the UI -- **Multiple Model Support** — LLMs, image generation, text-to-speech, speech-to-text, vision, embeddings, and more +- **Any Model, Any Modality**: LLMs, image and video, text-to-speech, speech-to-text, vision, and embeddings, each on its own backend, pulled automatically when you load a model - **GPU Acceleration** — Automatic detection and support for NVIDIA, AMD, Intel, and Vulkan GPUs - **Distributed Mode** — Scale horizontally with worker nodes, P2P federation, and model sharding - **No GPU Required** — Runs on CPU with consumer-grade hardware LocalAI integrates [LocalAGI](https://github.com/mudler/LocalAGI) (agent platform) and [LocalRecall](https://github.com/mudler/LocalRecall) (semantic memory) as built-in libraries — no separate installation needed. +Each backend is a dedicated gRPC service that LocalAI builds around a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX, and more), exposing it through the unified API. Backends ship as standard OCI images and run as isolated processes, so each one can be installed, upgraded, or removed without touching the core, can even run on a separate machine, and a fault in one never brings down the rest. + +Because the backend contract is a simple gRPC interface, the system is open: bring your own model, or write a custom backend in any language and plug it in, exactly how the built-in backends work. This is what keeps the core small and gives you the flexibility to run precisely the stack you want, instead of compiling every engine into one binary. + ## Getting Started LocalAI can be installed in several ways. **Docker is the recommended installation method** for most users as it provides the easiest setup and works across all platforms. diff --git a/docs/content/reference/architecture.md b/docs/content/reference/architecture.md index 9f701bc5d..b0aa7e81c 100644 --- a/docs/content/reference/architecture.md +++ b/docs/content/reference/architecture.md @@ -9,7 +9,7 @@ LocalAI is an API written in Go that serves as an OpenAI shim, enabling software LocalAI uses a mixture of backends written in various languages (C++, Golang, Python, ...). You can check [the model compatibility table]({{%relref "reference/compatibility-table" %}}) to learn about all the components of LocalAI. -![localai](https://github.com/go-skynet/localai-website/assets/2420543/6492e685-8282-4217-9daa-e229a31548bc) +![How LocalAI works: clients speak one API to a small core, which routes each request over gRPC to separate backend processes pulled on demand](/images/diagrams/architecture-overview.png) ## Backstory diff --git a/docs/content/whats-new.md b/docs/content/whats-new.md index 8a393b4b4..e93fd6483 100644 --- a/docs/content/whats-new.md +++ b/docs/content/whats-new.md @@ -105,7 +105,7 @@ It is now possible for single-devices with one GPU to specify `--single-active-b #### Resources management -Thanks to the continous community efforts (another cool contribution from {{< github "dave-gray101" >}} ) now it's possible to shutdown a backend programmatically via the API. +Thanks to the continuous community efforts (another cool contribution from {{< github "dave-gray101" >}} ) now it's possible to shutdown a backend programmatically via the API. There is an ongoing effort in the community to better handling of resources. See also the [🔥Roadmap](https://localai.io/#-hot-topics--roadmap). #### New how-to section diff --git a/docs/static/images/diagrams/agents-loop.html b/docs/static/images/diagrams/agents-loop.html new file mode 100644 index 000000000..d051aa3c3 --- /dev/null +++ b/docs/static/images/diagrams/agents-loop.html @@ -0,0 +1,166 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Agents
+

The in-process agent loop

+
+
+
SELF
+
hosted
+
+
+
+
+
Agents call LocalAI's own chat API in a loop; progress streams back over SSE.
+
localai.io/features/agents
+
+
+ + + diff --git a/docs/static/images/diagrams/agents-loop.png b/docs/static/images/diagrams/agents-loop.png new file mode 100644 index 000000000..662a32714 Binary files /dev/null and b/docs/static/images/diagrams/agents-loop.png differ diff --git a/docs/static/images/diagrams/architecture-overview.html b/docs/static/images/diagrams/architecture-overview.html new file mode 100644 index 000000000..359330d2a --- /dev/null +++ b/docs/static/images/diagrams/architecture-overview.html @@ -0,0 +1,146 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Architecture
+

How LocalAI works

+
+
+
ONE API
+
many engines
+
+
+
+
+
Clients speak one API. The core routes each request. Every backend is a separate process, pulled only when a model needs it.
+
localai.io/docs/overview
+
+
+ + + diff --git a/docs/static/images/diagrams/architecture-overview.png b/docs/static/images/diagrams/architecture-overview.png new file mode 100644 index 000000000..e538795fd Binary files /dev/null and b/docs/static/images/diagrams/architecture-overview.png differ diff --git a/docs/static/images/diagrams/audio-transform-io.html b/docs/static/images/diagrams/audio-transform-io.html new file mode 100644 index 000000000..5d18820a5 --- /dev/null +++ b/docs/static/images/diagrams/audio-transform-io.html @@ -0,0 +1,172 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Audio Transform
+

Near + far in, clean out

+
+
+
STEREO
+
wire
+
+
+
+
+
Two inputs (mic + reference) transform to one cleaned output; interleaved-stereo on the wire.
+
localai.io/features/audio-transform
+
+
+ + + diff --git a/docs/static/images/diagrams/audio-transform-io.png b/docs/static/images/diagrams/audio-transform-io.png new file mode 100644 index 000000000..16053e25c Binary files /dev/null and b/docs/static/images/diagrams/audio-transform-io.png differ diff --git a/docs/static/images/diagrams/cloud-proxy-sequence.html b/docs/static/images/diagrams/cloud-proxy-sequence.html new file mode 100644 index 000000000..1d6851b4b --- /dev/null +++ b/docs/static/images/diagrams/cloud-proxy-sequence.html @@ -0,0 +1,157 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Cloud Proxy
+

Local API, cloud model

+
+
+
PII
+
filtered
+
+
+
+
+
Proxy to a hosted model while PII is redacted on the way out and on the way back.
+
localai.io/features/cloud-proxy
+
+
+ + + diff --git a/docs/static/images/diagrams/cloud-proxy-sequence.png b/docs/static/images/diagrams/cloud-proxy-sequence.png new file mode 100644 index 000000000..4661057f9 Binary files /dev/null and b/docs/static/images/diagrams/cloud-proxy-sequence.png differ diff --git a/docs/static/images/diagrams/composable-core.html b/docs/static/images/diagrams/composable-core.html new file mode 100644 index 000000000..8802a601e --- /dev/null +++ b/docs/static/images/diagrams/composable-core.html @@ -0,0 +1,149 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Architecture
+

One small core.
Backends you plug in.

+
+
+
ONLY WHAT
+
you actually run
+
+
+ +
+ +
+
Run a model and the right engine is pulled automatically.
Each backend is its own image, optimized for one job. Install nothing you don't use.
+
localai.io
+
+
+ + + + diff --git a/docs/static/images/diagrams/composable-core.png b/docs/static/images/diagrams/composable-core.png new file mode 100644 index 000000000..2f2108862 Binary files /dev/null and b/docs/static/images/diagrams/composable-core.png differ diff --git a/docs/static/images/diagrams/diarization-pipeline.html b/docs/static/images/diagrams/diarization-pipeline.html new file mode 100644 index 000000000..12868b265 --- /dev/null +++ b/docs/static/images/diagrams/diarization-pipeline.html @@ -0,0 +1,161 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Diarization
+

Who spoke when

+
+
+
RTTM
+
out
+
+
+
+
+
Segment, embed, and cluster - or a single ASR pass - into speaker-labelled segments.
+
localai.io/features/audio-diarization
+
+
+ + + diff --git a/docs/static/images/diagrams/diarization-pipeline.png b/docs/static/images/diagrams/diarization-pipeline.png new file mode 100644 index 000000000..21898d2e4 Binary files /dev/null and b/docs/static/images/diagrams/diarization-pipeline.png differ diff --git a/docs/static/images/diagrams/distributed-mode-arch.html b/docs/static/images/diagrams/distributed-mode-arch.html new file mode 100644 index 000000000..f1e49b708 --- /dev/null +++ b/docs/static/images/diagrams/distributed-mode-arch.html @@ -0,0 +1,170 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Distributed Mode
+

One control plane, many workers

+
+
+
SCALE
+
out
+
+
+
+
+
Stateless frontends, a shared NATS/Postgres plane, and generic workers running per-model backends.
+
localai.io/features/distributed-mode
+
+
+ + + diff --git a/docs/static/images/diagrams/distributed-mode-arch.png b/docs/static/images/diagrams/distributed-mode-arch.png new file mode 100644 index 000000000..52117732e Binary files /dev/null and b/docs/static/images/diagrams/distributed-mode-arch.png differ diff --git a/docs/static/images/diagrams/ds4-layer-split.html b/docs/static/images/diagrams/ds4-layer-split.html new file mode 100644 index 000000000..0afe8bdc7 --- /dev/null +++ b/docs/static/images/diagrams/ds4-layer-split.html @@ -0,0 +1,164 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · ds4 layer split
+

Workers dial in

+
+
+
LAYER
+
split
+
+
+
+
+
ds4 workers connect to the coordinator (llama.cpp RPC dials the other direction).
+
localai.io/features/distributed-mode
+
+
+ + + diff --git a/docs/static/images/diagrams/ds4-layer-split.png b/docs/static/images/diagrams/ds4-layer-split.png new file mode 100644 index 000000000..57cd12031 Binary files /dev/null and b/docs/static/images/diagrams/ds4-layer-split.png differ diff --git a/docs/static/images/diagrams/face-recognition-flow.html b/docs/static/images/diagrams/face-recognition-flow.html new file mode 100644 index 000000000..21b367b7d --- /dev/null +++ b/docs/static/images/diagrams/face-recognition-flow.html @@ -0,0 +1,160 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Face Recognition
+

Identify, with a liveness gate

+
+
+
1:N
+
+ live
+
+
+
+
+
1:N match against a vector store; anti-spoofing can veto a verification.
+
localai.io/features/face-recognition
+
+
+ + + diff --git a/docs/static/images/diagrams/face-recognition-flow.png b/docs/static/images/diagrams/face-recognition-flow.png new file mode 100644 index 000000000..387caf63b Binary files /dev/null and b/docs/static/images/diagrams/face-recognition-flow.png differ diff --git a/docs/static/images/diagrams/federated-vs-worker.html b/docs/static/images/diagrams/federated-vs-worker.html new file mode 100644 index 000000000..e8badc25c --- /dev/null +++ b/docs/static/images/diagrams/federated-vs-worker.html @@ -0,0 +1,194 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Distributed
+

Federated vs worker mode

+
+
+
TWO
+
modes
+
+
+
+
+
Federated routes whole requests to one node; worker shards one model across machines.
+
localai.io/features/distributed_inferencing
+
+
+ + + diff --git a/docs/static/images/diagrams/federated-vs-worker.png b/docs/static/images/diagrams/federated-vs-worker.png new file mode 100644 index 000000000..3d0e054e7 Binary files /dev/null and b/docs/static/images/diagrams/federated-vs-worker.png differ diff --git a/docs/static/images/diagrams/finetune-job-lifecycle.html b/docs/static/images/diagrams/finetune-job-lifecycle.html new file mode 100644 index 000000000..b71389d40 --- /dev/null +++ b/docs/static/images/diagrams/finetune-job-lifecycle.html @@ -0,0 +1,158 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Fine-tuning jobs
+

The fine-tune job lifecycle

+
+
+
SSE
+
progress
+
+
+
+
+
Create, train with live SSE progress, then export to LoRA, merged, or GGUF.
+
localai.io/features/fine-tuning
+
+
+ + + diff --git a/docs/static/images/diagrams/finetune-job-lifecycle.png b/docs/static/images/diagrams/finetune-job-lifecycle.png new file mode 100644 index 000000000..1e686e51f Binary files /dev/null and b/docs/static/images/diagrams/finetune-job-lifecycle.png differ diff --git a/docs/static/images/diagrams/finetune-recipe.html b/docs/static/images/diagrams/finetune-recipe.html new file mode 100644 index 000000000..e99e9da35 --- /dev/null +++ b/docs/static/images/diagrams/finetune-recipe.html @@ -0,0 +1,144 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Fine-tuning
+

Train, merge, deploy

+
+
+
LoRA
+
to GGUF
+
+
+
+
+
From dataset to a servable GGUF, via LoRA fine-tune and merge.
+
localai.io/advanced/fine-tuning
+
+
+ + + diff --git a/docs/static/images/diagrams/finetune-recipe.png b/docs/static/images/diagrams/finetune-recipe.png new file mode 100644 index 000000000..5a5409458 Binary files /dev/null and b/docs/static/images/diagrams/finetune-recipe.png differ diff --git a/docs/static/images/diagrams/mcp-server-vs-client.html b/docs/static/images/diagrams/mcp-server-vs-client.html new file mode 100644 index 000000000..64c523a48 --- /dev/null +++ b/docs/static/images/diagrams/mcp-server-vs-client.html @@ -0,0 +1,183 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · MCP
+

Server-side vs client-side tools

+
+
+
TWO
+
loops
+
+
+
+
+
The model's tool loop runs on the server, or in the browser - same chat API.
+
localai.io/features/mcp
+
+
+ + + diff --git a/docs/static/images/diagrams/mcp-server-vs-client.png b/docs/static/images/diagrams/mcp-server-vs-client.png new file mode 100644 index 000000000..ab3be8b67 Binary files /dev/null and b/docs/static/images/diagrams/mcp-server-vs-client.png differ diff --git a/docs/static/images/diagrams/middleware-lifecycle.html b/docs/static/images/diagrams/middleware-lifecycle.html new file mode 100644 index 000000000..b5b9c7d0c --- /dev/null +++ b/docs/static/images/diagrams/middleware-lifecycle.html @@ -0,0 +1,159 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Middleware
+

The request lifecycle

+
+
+
HOOK
+
chain
+
+
+
+
+
One shared hook chain: auth, model routing, and PII, with decision and event logs.
+
localai.io/features/middleware
+
+
+ + + diff --git a/docs/static/images/diagrams/middleware-lifecycle.png b/docs/static/images/diagrams/middleware-lifecycle.png new file mode 100644 index 000000000..ac3321b5d Binary files /dev/null and b/docs/static/images/diagrams/middleware-lifecycle.png differ diff --git a/docs/static/images/diagrams/mitm-intercept.html b/docs/static/images/diagrams/mitm-intercept.html new file mode 100644 index 000000000..70638d904 --- /dev/null +++ b/docs/static/images/diagrams/mitm-intercept.html @@ -0,0 +1,185 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · MITM Proxy
+

Inspect what you allow, tunnel the rest

+
+
+
TLS
+
selective
+
+
+
+
+
Allowlisted hosts are decrypted and scanned; everything else is a blind TCP tunnel.
+
localai.io/features/mitm-proxy
+
+
+ + + diff --git a/docs/static/images/diagrams/mitm-intercept.png b/docs/static/images/diagrams/mitm-intercept.png new file mode 100644 index 000000000..a7a3df17d Binary files /dev/null and b/docs/static/images/diagrams/mitm-intercept.png differ diff --git a/docs/static/images/diagrams/mlx-pipeline.html b/docs/static/images/diagrams/mlx-pipeline.html new file mode 100644 index 000000000..16577e44b --- /dev/null +++ b/docs/static/images/diagrams/mlx-pipeline.html @@ -0,0 +1,134 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · MLX Distributed
+

Pipeline-parallel across ranks

+
+
+
RING
+
TCP
+
+
+
+
+
Layers split across ranks; rank 0 coordinates, activations flow down the ring.
+
localai.io/features/mlx-distributed
+
+
+ + + diff --git a/docs/static/images/diagrams/mlx-pipeline.png b/docs/static/images/diagrams/mlx-pipeline.png new file mode 100644 index 000000000..66534a444 Binary files /dev/null and b/docs/static/images/diagrams/mlx-pipeline.png differ diff --git a/docs/static/images/diagrams/model-resolution.html b/docs/static/images/diagrams/model-resolution.html new file mode 100644 index 000000000..c8473ae3f --- /dev/null +++ b/docs/static/images/diagrams/model-resolution.html @@ -0,0 +1,148 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Models
+

Many sources, one load path

+
+
+
AUTO
+
detect
+
+
+
+
+
However you point at a model, it lands on the same resolve → backend → load path.
+
localai.io/getting-started/models
+
+
+ + + diff --git a/docs/static/images/diagrams/model-resolution.png b/docs/static/images/diagrams/model-resolution.png new file mode 100644 index 000000000..7ccbabed3 Binary files /dev/null and b/docs/static/images/diagrams/model-resolution.png differ diff --git a/docs/static/images/diagrams/quantization-flow.html b/docs/static/images/diagrams/quantization-flow.html new file mode 100644 index 000000000..d07569a04 --- /dev/null +++ b/docs/static/images/diagrams/quantization-flow.html @@ -0,0 +1,180 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Quantization
+

From HF model to quantized GGUF

+
+
+
GGUF
+
q4..q8
+
+
+
+
+
Convert first, then quantize - tracked as a job from queued to completed.
+
localai.io/features/quantization
+
+
+ + + diff --git a/docs/static/images/diagrams/quantization-flow.png b/docs/static/images/diagrams/quantization-flow.png new file mode 100644 index 000000000..5b3380ce1 Binary files /dev/null and b/docs/static/images/diagrams/quantization-flow.png differ diff --git a/docs/static/images/diagrams/quickstart-journey.html b/docs/static/images/diagrams/quickstart-journey.html new file mode 100644 index 000000000..9ce5a78f7 --- /dev/null +++ b/docs/static/images/diagrams/quickstart-journey.html @@ -0,0 +1,135 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Quickstart
+

Install, run, serve

+
+
+
QUICK
+
start
+
+
+
+
+
From install to your first /v1 call in three steps.
+
localai.io/basics/getting_started
+
+
+ + + diff --git a/docs/static/images/diagrams/quickstart-journey.png b/docs/static/images/diagrams/quickstart-journey.png new file mode 100644 index 000000000..e4bd3ebfa Binary files /dev/null and b/docs/static/images/diagrams/quickstart-journey.png differ diff --git a/docs/static/images/diagrams/realtime-pipeline.html b/docs/static/images/diagrams/realtime-pipeline.html new file mode 100644 index 000000000..ea9de73e6 --- /dev/null +++ b/docs/static/images/diagrams/realtime-pipeline.html @@ -0,0 +1,139 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Realtime API
+

The realtime voice loop

+
+
+
WS
+
/ WebRTC
+
+
+
+
+
Voice in, voice out: VAD → STT → LLM → TTS, over WebSocket or WebRTC.
+
localai.io/features/openai-realtime
+
+
+ + + diff --git a/docs/static/images/diagrams/realtime-pipeline.png b/docs/static/images/diagrams/realtime-pipeline.png new file mode 100644 index 000000000..064d02ebc Binary files /dev/null and b/docs/static/images/diagrams/realtime-pipeline.png differ diff --git a/docs/static/images/diagrams/reranker-pipeline.html b/docs/static/images/diagrams/reranker-pipeline.html new file mode 100644 index 000000000..3a574959e --- /dev/null +++ b/docs/static/images/diagrams/reranker-pipeline.html @@ -0,0 +1,171 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Reranker
+

Two-stage retrieval

+
+
+
RE
+
rank
+
+
+
+
+
A fast retriever finds candidates; the cross-encoder reorders them by true relevance.
+
localai.io/features/reranker
+
+
+ + + diff --git a/docs/static/images/diagrams/reranker-pipeline.png b/docs/static/images/diagrams/reranker-pipeline.png new file mode 100644 index 000000000..c618392fa Binary files /dev/null and b/docs/static/images/diagrams/reranker-pipeline.png differ diff --git a/docs/static/images/diagrams/reverse-proxy-tls.html b/docs/static/images/diagrams/reverse-proxy-tls.html new file mode 100644 index 000000000..a00210801 --- /dev/null +++ b/docs/static/images/diagrams/reverse-proxy-tls.html @@ -0,0 +1,175 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Deployment
+

TLS at the edge

+
+
+
X-FWD
+
headers
+
+
+
+
+
Terminate TLS at the proxy; forwarded headers let LocalAI emit correct https asset URLs.
+
localai.io/docs
+
+
+ + + diff --git a/docs/static/images/diagrams/reverse-proxy-tls.png b/docs/static/images/diagrams/reverse-proxy-tls.png new file mode 100644 index 000000000..092a708b3 Binary files /dev/null and b/docs/static/images/diagrams/reverse-proxy-tls.png differ diff --git a/docs/static/images/diagrams/smartrouter-scheduling.html b/docs/static/images/diagrams/smartrouter-scheduling.html new file mode 100644 index 000000000..1e9e0e9a7 --- /dev/null +++ b/docs/static/images/diagrams/smartrouter-scheduling.html @@ -0,0 +1,171 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · SmartRouter
+

How the router places a request

+
+
+
IDLE
+
first
+
+
+
+
+
Idle-first placement with preemptive least-recently-used eviction.
+
localai.io/features/distributed-mode
+
+
+ + + diff --git a/docs/static/images/diagrams/smartrouter-scheduling.png b/docs/static/images/diagrams/smartrouter-scheduling.png new file mode 100644 index 000000000..b375e5341 Binary files /dev/null and b/docs/static/images/diagrams/smartrouter-scheduling.png differ diff --git a/docs/static/images/diagrams/tool-call-parsers.html b/docs/static/images/diagrams/tool-call-parsers.html new file mode 100644 index 000000000..6cf5e0fdb --- /dev/null +++ b/docs/static/images/diagrams/tool-call-parsers.html @@ -0,0 +1,142 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Function calling
+

Same request, any backend

+
+
+
TOOLS
+
native
+
+
+
+
+
One tool-call request shape; each backend's native parser extracts the calls.
+
localai.io/features/openai-functions
+
+
+ + + diff --git a/docs/static/images/diagrams/tool-call-parsers.png b/docs/static/images/diagrams/tool-call-parsers.png new file mode 100644 index 000000000..0f5c194a9 Binary files /dev/null and b/docs/static/images/diagrams/tool-call-parsers.png differ diff --git a/docs/static/images/diagrams/voice-recognition-flow.html b/docs/static/images/diagrams/voice-recognition-flow.html new file mode 100644 index 000000000..2af32f18e --- /dev/null +++ b/docs/static/images/diagrams/voice-recognition-flow.html @@ -0,0 +1,158 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · Voice Recognition
+

Register, identify, forget

+
+
+
1:N
+
match
+
+
+
+
+
Voiceprints in a vector store: 1:1 verify, or 1:N identify.
+
localai.io/features/voice-recognition
+
+
+ + + diff --git a/docs/static/images/diagrams/voice-recognition-flow.png b/docs/static/images/diagrams/voice-recognition-flow.png new file mode 100644 index 000000000..edc73c089 Binary files /dev/null and b/docs/static/images/diagrams/voice-recognition-flow.png differ diff --git a/docs/static/images/diagrams/vram-eviction.html b/docs/static/images/diagrams/vram-eviction.html new file mode 100644 index 000000000..58ae93adf --- /dev/null +++ b/docs/static/images/diagrams/vram-eviction.html @@ -0,0 +1,197 @@ + + + + + + + + + + +
+
+
+
+
LocalAI · VRAM
+

Load, evict, reuse

+
+
+
LRU
+
evict
+
+
+
+
+
Least-recently-used eviction keeps the hottest models warm within your VRAM budget.
+
localai.io/advanced/vram-management
+
+
+ + + diff --git a/docs/static/images/diagrams/vram-eviction.png b/docs/static/images/diagrams/vram-eviction.png new file mode 100644 index 000000000..831511572 Binary files /dev/null and b/docs/static/images/diagrams/vram-eviction.png differ diff --git a/gallery/index.yaml b/gallery/index.yaml index 97b0d472f..fbdbb643f 100644 --- a/gallery/index.yaml +++ b/gallery/index.yaml @@ -31833,7 +31833,6 @@ - filename: parakeet-cpp/tdt_ctc-1.1b-f16.gguf uri: huggingface://mudler/parakeet-cpp-gguf/tdt_ctc-1.1b-f16.gguf sha256: cd53f64eefac2623a12f2f118ef50b56622dc3012f42c815c6adf0d08292f387 - - name: parakeet-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32076,6 +32075,7 @@ files: - filename: voxtral-mini-3b-2507-q4_k.gguf uri: huggingface://cstr/voxtral-mini-3b-2507-GGUF/voxtral-mini-3b-2507-q4_k.gguf + sha256: 306088d884e36aa512aa41ea66087b9fd7f3e11e1568ccf6ca5df12dc97595a2 - name: voxtral4b-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32098,6 +32098,7 @@ files: - filename: voxtral-mini-4b-realtime-q4_k.gguf uri: huggingface://cstr/voxtral-mini-4b-realtime-GGUF/voxtral-mini-4b-realtime-q4_k.gguf + sha256: 7dda1dba692f18c9d30a6064943b92c562853b399e96320929d2e1399c9d41cc - name: granite-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32120,6 +32121,7 @@ files: - filename: granite-speech-4.0-1b-q4_k.gguf uri: huggingface://cstr/granite-speech-4.0-1b-GGUF/granite-speech-4.0-1b-q4_k.gguf + sha256: 4ab89d22379b0286033d5c958d7d0759860c4cb9e8ce81cab2e9272303321301 - name: granite-4.1-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32142,6 +32144,7 @@ files: - filename: granite-speech-4.1-2b-q4_k.gguf uri: huggingface://cstr/granite-speech-4.1-2b-GGUF/granite-speech-4.1-2b-q4_k.gguf + sha256: d2fd66c801c37eb12b9ae1792994e406ce5a53ff0c864cc8cfe33f91d8eb7920 - name: granite-4.1-plus-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32164,6 +32167,7 @@ files: - filename: granite-speech-4.1-2b-plus-q4_k.gguf uri: huggingface://cstr/granite-speech-4.1-2b-plus-GGUF/granite-speech-4.1-2b-plus-q4_k.gguf + sha256: 797ad005c53305d4fdea1fadd7baa62bd3310a3e2975c7964e48c76a41198dd4 - name: granite-4.1-nar-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32186,6 +32190,7 @@ files: - filename: granite-speech-4.1-2b-nar-q4_k.gguf uri: huggingface://cstr/granite-speech-4.1-2b-nar-GGUF/granite-speech-4.1-2b-nar-q4_k.gguf + sha256: 7ffa9fd63b20c72cdc72c114631d5f6dfc2d81bf0e1e5255c350a9b6826f2ba4 - name: qwen3-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32208,6 +32213,7 @@ files: - filename: qwen3-asr-0.6b-q4_k.gguf uri: huggingface://cstr/qwen3-asr-0.6b-GGUF/qwen3-asr-0.6b-q4_k.gguf + sha256: 4c67426908a518c28c24bc780df27175fcf84ce4d6dbd678133a4531904bbcc9 - name: qwen3-1.7b-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32230,6 +32236,7 @@ files: - filename: qwen3-asr-1.7b-q4_k.gguf uri: huggingface://cstr/qwen3-asr-1.7b-GGUF/qwen3-asr-1.7b-q4_k.gguf + sha256: 1f1d26ee044f0f041b0a7bfcf6d560996103c951acbde6eb48ccb24e7edfc69c - name: cohere-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32252,6 +32259,7 @@ files: - filename: cohere-transcribe-q4_k.gguf uri: huggingface://cstr/cohere-transcribe-03-2026-GGUF/cohere-transcribe-q4_k.gguf + sha256: 2931fc0ac6d6708eef5389aadf1ebd5eec7b8e764bac385be585e910c0e7b410 - name: wav2vec2-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32274,6 +32282,7 @@ files: - filename: wav2vec2-xlsr-en-q4_k.gguf uri: huggingface://cstr/wav2vec2-large-xlsr-53-english-GGUF/wav2vec2-xlsr-en-q4_k.gguf + sha256: e28e4131af7eb4cc2dc2c15464801f4a6437a5f7cd51f45e5b12883ef7e8bc8f - name: wav2vec2-de-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32296,6 +32305,7 @@ files: - filename: wav2vec2-large-xlsr-53-german-q4_k.gguf uri: huggingface://cstr/wav2vec2-large-xlsr-53-german-GGUF/wav2vec2-large-xlsr-53-german-q4_k.gguf + sha256: d134f7470d6b1f24a47fd165840697340b5259dc93b7d35cf43e14fb0d0213e7 - name: vibevoice-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32318,6 +32328,7 @@ files: - filename: vibevoice-asr-q4_k.gguf uri: huggingface://cstr/vibevoice-asr-GGUF/vibevoice-asr-q4_k.gguf + sha256: f1e87bb5c25dd469b495759e59c4554c4e8ec254f36c5c659737ff3e61ace982 - name: vibevoice-tts-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32339,6 +32350,7 @@ files: - filename: vibevoice-realtime-0.5b-q4_k.gguf uri: huggingface://cstr/vibevoice-realtime-0.5b-GGUF/vibevoice-realtime-0.5b-q4_k.gguf + sha256: e3244986d8939a9a8f65701196efbfe3f8b81afd307b29f434fe259b9c411ef1 - name: chatterbox-tts-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32362,8 +32374,10 @@ files: - filename: chatterbox-t3-q8_0.gguf uri: huggingface://cstr/chatterbox-GGUF/chatterbox-t3-q8_0.gguf + sha256: 7b2da930c27df7e43d17a077bb58433b1bc33474ad66d781f715a7125f65d075 - filename: chatterbox-s3gen-q8_0.gguf uri: huggingface://cstr/chatterbox-GGUF/chatterbox-s3gen-q8_0.gguf + sha256: 6bbb93b892deeea73330cf773218e776e4bd0cf6ba71f60ef4dba72c922d0b3b - name: qwen3-tts-customvoice-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32389,8 +32403,10 @@ files: - filename: qwen3-tts-12hz-0.6b-customvoice-q8_0.gguf uri: huggingface://cstr/qwen3-tts-0.6b-customvoice-GGUF/qwen3-tts-12hz-0.6b-customvoice-q8_0.gguf + sha256: 5227dcbc4df7c5533341d111cc469fa491a48e722b23dd10f553181b52dff2d9 - filename: qwen3-tts-tokenizer-12hz.gguf uri: huggingface://cstr/qwen3-tts-tokenizer-12hz-GGUF/qwen3-tts-tokenizer-12hz.gguf + sha256: 70dc95dbfdd9aa5d9d406236ff771d061bf17b0cda02a72513953355606e719b - name: orpheus-tts-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32415,8 +32431,10 @@ files: - filename: orpheus-3b-base-q8_0.gguf uri: huggingface://cstr/orpheus-3b-base-GGUF/orpheus-3b-base-q8_0.gguf + sha256: 380e891d72adee9ad7db7b6f8626f737d1285a7cf8c98d256d70094182ed0615 - filename: snac-24khz.gguf uri: huggingface://cstr/snac-24khz-GGUF/snac-24khz.gguf + sha256: b4b044631df62ececa86ab080516b3e619cd8f93caabd5f6758c7eae14981bd8 - name: hubert-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32441,6 +32459,7 @@ files: - filename: hubert-large-ls960-ft-q4_k.gguf uri: huggingface://cstr/hubert-large-ls960-ft-GGUF/hubert-large-ls960-ft-q4_k.gguf + sha256: 7cfd627da224e0c77b466e27bb10613fe834e7156cf5a58de9ad7885ba5af937 - name: data2vec-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32465,6 +32484,7 @@ files: - filename: data2vec-audio-base-960h-q4_k.gguf uri: huggingface://cstr/data2vec-audio-960h-GGUF/data2vec-audio-base-960h-q4_k.gguf + sha256: 93b6ab01f1f83525157d797a385a3e9e014c6761d3e974351363adc452a86f7e - name: glm-asr-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32489,6 +32509,7 @@ files: - filename: glm-asr-nano-q4_k.gguf uri: huggingface://cstr/glm-asr-nano-GGUF/glm-asr-nano-q4_k.gguf + sha256: 2e4f3360f69e7f7dfd24127305583ea16629975c643a771f8603ca04c6ab50d4 - name: kyutai-stt-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32513,6 +32534,7 @@ files: - filename: kyutai-stt-1b-q4_k.gguf uri: huggingface://cstr/kyutai-stt-1b-GGUF/kyutai-stt-1b-q4_k.gguf + sha256: 32937b2c337e8b8b1bfd68bc90f07a1dbc9fcdfd5e7099dc770e15f0cbff512e - name: firered-asr-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32537,6 +32559,7 @@ files: - filename: firered-asr2-aed-q4_k.gguf uri: huggingface://cstr/firered-asr2-aed-GGUF/firered-asr2-aed-q4_k.gguf + sha256: c5f40fe5b467296395027c7397d87043a39e3223fcd049056ed5ba88974e9e0d - name: moonshine-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32562,8 +32585,10 @@ files: - filename: moonshine-tiny-q4_k.gguf uri: huggingface://cstr/moonshine-tiny-GGUF/moonshine-tiny-q4_k.gguf + sha256: 333bb4a7df0c51da04fa2694fdc944936e75e79e57745c7ac3fd11f3176a8368 - filename: tokenizer.bin uri: huggingface://cstr/moonshine-tiny-GGUF/tokenizer.bin + sha256: 0e90e02b765a10f0fa35b7d67877df29dd22a1fd4890899c9b1b203a19bc8999 - name: moonshine-de-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32590,8 +32615,10 @@ files: - filename: moonshine-base-de-fidoriel-q4_k.gguf uri: huggingface://cstr/moonshine-base-de-fidoriel-GGUF/moonshine-base-de-fidoriel-q4_k.gguf + sha256: 6ce0bec4248720d3474ee80db2b35dbac8e5608106a47fe8853fc36a6d77aeb8 - filename: tokenizer.bin uri: huggingface://cstr/moonshine-base-de-fidoriel-GGUF/tokenizer.bin + sha256: 0e90e02b765a10f0fa35b7d67877df29dd22a1fd4890899c9b1b203a19bc8999 - name: moonshine-tiny-de-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32618,8 +32645,10 @@ files: - filename: moonshine-tiny-de-fidoriel-q4_k.gguf uri: huggingface://cstr/moonshine-tiny-de-fidoriel-GGUF/moonshine-tiny-de-fidoriel-q4_k.gguf + sha256: cc2a94570dae9c9996d6c27c3b0d307973d08b43802a271922fb583f0a2afc71 - filename: tokenizer.bin uri: huggingface://cstr/moonshine-tiny-de-fidoriel-GGUF/tokenizer.bin + sha256: 0e90e02b765a10f0fa35b7d67877df29dd22a1fd4890899c9b1b203a19bc8999 - name: moonshine-streaming-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32645,8 +32674,10 @@ files: - filename: moonshine-streaming-tiny-q4_k.gguf uri: huggingface://cstr/moonshine-streaming-tiny-GGUF/moonshine-streaming-tiny-q4_k.gguf + sha256: 46bf62ab1323da8ff3cf3936b62c08980590396a324bb822c91e38e821d972cc - filename: tokenizer.bin uri: huggingface://cstr/moonshine-streaming-tiny-GGUF/tokenizer.bin + sha256: 0e90e02b765a10f0fa35b7d67877df29dd22a1fd4890899c9b1b203a19bc8999 - name: mimo-asr-crispasr url: github:mudler/LocalAI/gallery/virtual.yaml@master urls: @@ -32672,5 +32703,7 @@ files: - filename: mimo-asr-q4_k.gguf uri: huggingface://cstr/mimo-asr-GGUF/mimo-asr-q4_k.gguf + sha256: 12dbc7cc7a20c7add6ff00bf8b12bca1c46304e0100a5c5a6e74bdecfc57a306 - filename: mimo-tokenizer-q4_k.gguf uri: huggingface://cstr/mimo-tokenizer-GGUF/mimo-tokenizer-q4_k.gguf + sha256: 3f3a903b10294ead4ef6a4afec035639fd2113b1d307d42f649a97cc85670e3f diff --git a/go.mod b/go.mod index 0bb00e30d..60e169977 100644 --- a/go.mod +++ b/go.mod @@ -37,14 +37,14 @@ require ( github.com/microcosm-cc/bluemonday v1.0.27 github.com/modelcontextprotocol/go-sdk v1.5.0 github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b - github.com/mudler/edgevpn v0.32.2 + github.com/mudler/edgevpn v0.34.0 github.com/mudler/go-processmanager v0.1.1 github.com/mudler/memory v0.0.0-20260406210934-424c1ecf2cf8 github.com/mudler/xlog v0.0.6 github.com/nats-io/nats.go v1.52.0 github.com/ollama/ollama v0.20.4 github.com/onsi/ginkgo/v2 v2.29.0 - github.com/onsi/gomega v1.40.0 + github.com/onsi/gomega v1.41.0 github.com/openai/openai-go/v3 v3.26.0 github.com/otiai10/copy v1.14.1 github.com/otiai10/openaigo v1.7.0 @@ -63,10 +63,10 @@ require ( github.com/testcontainers/testcontainers-go/modules/nats v0.42.0 github.com/testcontainers/testcontainers-go/modules/postgres v0.42.0 github.com/timbutler/zxcvbn v1.0.4 - go.opentelemetry.io/otel v1.43.0 - go.opentelemetry.io/otel/exporters/prometheus v0.65.0 - go.opentelemetry.io/otel/metric v1.43.0 - go.opentelemetry.io/otel/sdk/metric v1.43.0 + go.opentelemetry.io/otel v1.44.0 + go.opentelemetry.io/otel/exporters/prometheus v0.66.0 + go.opentelemetry.io/otel/metric v1.44.0 + go.opentelemetry.io/otel/sdk/metric v1.44.0 google.golang.org/grpc v1.80.0 google.golang.org/protobuf v1.36.11 gopkg.in/yaml.v3 v3.0.1 @@ -123,7 +123,7 @@ require ( github.com/go-openapi/validate v0.25.1 // indirect github.com/go-viper/mapstructure/v2 v2.4.0 // indirect github.com/google/certificate-transparency-go v1.3.2 // indirect - github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.7 // indirect + github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0 // indirect github.com/in-toto/attestation v1.1.2 // indirect github.com/in-toto/in-toto-golang v0.9.0 // indirect github.com/invopop/jsonschema v0.13.0 // indirect @@ -155,7 +155,7 @@ require ( github.com/transparency-dev/merkle v0.0.2 // indirect github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect go.mongodb.org/mongo-driver v1.17.6 // indirect - google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409 // indirect + google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57 // indirect sigs.k8s.io/yaml v1.6.0 // indirect ) @@ -325,7 +325,7 @@ require ( github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e // indirect github.com/yosida95/uritemplate/v3 v3.0.2 // indirect go.opentelemetry.io/auto/sdk v1.2.1 // indirect - go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.65.0 // indirect + go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0 // indirect go.uber.org/mock v0.5.2 // indirect go.yaml.in/yaml/v2 v2.4.4 go.yaml.in/yaml/v3 v3.0.4 // indirect @@ -351,7 +351,7 @@ require ( github.com/beorn7/perks v1.0.1 // indirect github.com/c-robinson/iplib v1.0.8 // indirect github.com/cenkalti/backoff/v4 v4.3.0 // indirect - github.com/cespare/xxhash/v2 v2.3.0 // indirect + github.com/cespare/xxhash/v2 v2.3.0 github.com/containerd/cgroups v1.1.0 // indirect github.com/containerd/continuity v0.4.4 // indirect github.com/containerd/errdefs v1.0.0 // indirect @@ -392,10 +392,10 @@ require ( github.com/henvic/httpretty v0.1.4 // indirect github.com/huandu/xstrings v1.5.0 // indirect github.com/huin/goupnp v1.3.0 // indirect - github.com/ipfs/boxo v0.37.0 // indirect + github.com/ipfs/boxo v0.39.0 // indirect github.com/ipfs/go-cid v0.6.1 // indirect github.com/ipfs/go-datastore v0.9.1 // indirect - github.com/ipfs/go-log/v2 v2.9.1 // indirect + github.com/ipfs/go-log/v2 v2.9.2 // indirect github.com/ipld/go-ipld-prime v0.23.0 // indirect github.com/jackpal/go-nat-pmp v1.0.2 // indirect github.com/jaypipes/pcidb v1.1.1 // indirect @@ -407,9 +407,9 @@ require ( github.com/libp2p/go-cidranger v1.1.0 // indirect github.com/libp2p/go-flow-metrics v0.3.0 // indirect github.com/libp2p/go-libp2p-asn-util v0.4.1 // indirect - github.com/libp2p/go-libp2p-kad-dht v0.39.0 // indirect + github.com/libp2p/go-libp2p-kad-dht v0.40.0 // indirect github.com/libp2p/go-libp2p-kbucket v0.8.0 // indirect - github.com/libp2p/go-libp2p-pubsub v0.15.0 // indirect + github.com/libp2p/go-libp2p-pubsub v0.16.0 // indirect github.com/libp2p/go-libp2p-record v0.3.1 // indirect github.com/libp2p/go-libp2p-routing-helpers v0.7.5 // indirect github.com/libp2p/go-msgio v0.3.0 // indirect @@ -421,7 +421,7 @@ require ( github.com/mailru/easyjson v0.9.0 // indirect github.com/marten-seemann/tcp v0.0.0-20210406111302-dfbc87cc63fd // indirect github.com/mattn/go-colorable v0.1.14 // indirect - github.com/mattn/go-isatty v0.0.20 // indirect + github.com/mattn/go-isatty v0.0.22 // indirect github.com/mattn/go-runewidth v0.0.17 // indirect github.com/miekg/dns v1.1.72 // indirect github.com/mikioh/tcpinfo v0.0.0-20190314235526-30a79bb1804b // indirect @@ -487,25 +487,25 @@ require ( github.com/yuin/goldmark-emoji v1.0.6 // indirect github.com/yusufpapurcu/wmi v1.2.4 // indirect go.opencensus.io v0.24.0 // indirect - go.opentelemetry.io/otel/sdk v1.43.0 // indirect - go.opentelemetry.io/otel/trace v1.43.0 // indirect + go.opentelemetry.io/otel/sdk v1.44.0 // indirect + go.opentelemetry.io/otel/trace v1.44.0 // indirect go.uber.org/dig v1.19.0 // indirect go.uber.org/fx v1.24.0 // indirect go.uber.org/multierr v1.11.0 // indirect - go.uber.org/zap v1.27.1 // indirect + go.uber.org/zap v1.28.0 // indirect golang.org/x/crypto v0.51.0 golang.org/x/exp v0.0.0-20260410095643-746e56fc9e2f // indirect golang.org/x/mod v0.35.0 // indirect golang.org/x/sync v0.20.0 - golang.org/x/sys v0.44.0 // indirect + golang.org/x/sys v0.45.0 // indirect golang.org/x/term v0.43.0 golang.org/x/text v0.37.0 // indirect golang.org/x/tools v0.44.0 // indirect golang.zx2c4.com/wintun v0.0.0-20230126152724-0fa3db229ce2 // indirect golang.zx2c4.com/wireguard v0.0.0-20250521234502-f333402bd9cb // indirect - golang.zx2c4.com/wireguard/windows v0.5.3 // indirect + golang.zx2c4.com/wireguard/windows v0.6.1 // indirect gonum.org/v1/gonum v0.17.0 // indirect - google.golang.org/genproto/googleapis/rpc v0.0.0-20260128011058-8636f8732409 // indirect + google.golang.org/genproto/googleapis/rpc v0.0.0-20260209200024-4cfbd4190f57 // indirect gopkg.in/fsnotify.v1 v1.4.7 // indirect gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 // indirect howett.net/plist v1.0.2-0.20250314012144-ee69052608d9 // indirect diff --git a/go.sum b/go.sum index c949ea155..4864f1537 100644 --- a/go.sum +++ b/go.sum @@ -649,8 +649,8 @@ github.com/gpustack/gguf-parser-go v0.24.0/go.mod h1:y4TwTtDqFWTK+xvprOjRUh+dowg github.com/grpc-ecosystem/go-grpc-middleware v1.4.0 h1:UH//fgunKIs4JdUbpDl1VZCDaL56wXCB/5+wF6uHfaI= github.com/grpc-ecosystem/go-grpc-middleware v1.4.0/go.mod h1:g5qyo/la0ALbONm6Vbp88Yd8NsDy6rZz+RcrMPxvld8= github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod h1:BDjrQk3hbvj6Nolgz8mAMFbcEtjT1g+wF4CSlocrBnw= -github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.7 h1:X+2YciYSxvMQK0UZ7sg45ZVabVZBeBuvMkmuI2V3Fak= -github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.7/go.mod h1:lW34nIZuQ8UDPdkon5fmfp2l3+ZkQ2me/+oecHYLOII= +github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0 h1:HWRh5R2+9EifMyIHV7ZV+MIZqgz+PMpZ14Jynv3O2Zs= +github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0/go.mod h1:JfhWUomR1baixubs02l85lZYYOm7LV6om4ceouMv45c= github.com/hack-pad/go-indexeddb v0.3.2 h1:DTqeJJYc1usa45Q5r52t01KhvlSN02+Oq+tQbSBI91A= github.com/hack-pad/go-indexeddb v0.3.2/go.mod h1:QvfTevpDVlkfomY498LhstjwbPW6QC4VC/lxYb0Kom0= github.com/hack-pad/safejs v0.1.0 h1:qPS6vjreAqh2amUqj4WNG1zIw7qlRQJ9K10eDKMCnE8= @@ -722,8 +722,8 @@ github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2 github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw= github.com/invopop/jsonschema v0.13.0 h1:KvpoAJWEjR3uD9Kbm2HWJmqsEaHt8lBUpd0qHcIi21E= github.com/invopop/jsonschema v0.13.0/go.mod h1:ffZ5Km5SWWRAIN6wbDXItl95euhFz2uON45H2qjYt+0= -github.com/ipfs/boxo v0.37.0 h1:2E3mZvydMI2t5IkAgtkmZ3sGsld0oS7o3I+xyzDk6uI= -github.com/ipfs/boxo v0.37.0/go.mod h1:8yyiRn54F2CsW13n0zwXEPrVsZix/gFj9SYIRYMZ6KE= +github.com/ipfs/boxo v0.39.0 h1:u9jLf5pLx5SWROXjHtj8VMvv+iDlMbiTyZ/vVTQ4VhI= +github.com/ipfs/boxo v0.39.0/go.mod h1:k9YCvMjytFguMHndEiGdCGMMj4b7CkdOT44vtgAxOdk= github.com/ipfs/go-block-format v0.2.3 h1:mpCuDaNXJ4wrBJLrtEaGFGXkferrw5eqVvzaHhtFKQk= github.com/ipfs/go-block-format v0.2.3/go.mod h1:WJaQmPAKhD3LspLixqlqNFxiZ3BZ3xgqxxoSR/76pnA= github.com/ipfs/go-cid v0.6.1 h1:T5TnNb08+ueovG76Z5gx1L4Y7QOaGTXHg1F6raWFxIc= @@ -735,10 +735,10 @@ github.com/ipfs/go-detect-race v0.0.1/go.mod h1:8BNT7shDZPo99Q74BpGMK+4D8Mn4j46U github.com/ipfs/go-log v1.0.5 h1:2dOuUCB1Z7uoczMWgAyDck5JLb72zHzrMnGnCNNbvY8= github.com/ipfs/go-log v1.0.5/go.mod h1:j0b8ZoR+7+R99LD9jZ6+AJsrzkPbSXbZfGakb5JPtIo= github.com/ipfs/go-log/v2 v2.1.3/go.mod h1:/8d0SH3Su5Ooc31QlL1WysJhvyOTDCjcCZ9Axpmri6g= -github.com/ipfs/go-log/v2 v2.9.1 h1:3JXwHWU31dsCpvQ+7asz6/QsFJHqFr4gLgQ0FWteujk= -github.com/ipfs/go-log/v2 v2.9.1/go.mod h1:evFx7sBiohUN3AG12mXlZBw5hacBQld3ZPHrowlJYoo= -github.com/ipfs/go-test v0.2.3 h1:Z/jXNAReQFtCYyn7bsv/ZqUwS6E7iIcSpJ2CuzCvnrc= -github.com/ipfs/go-test v0.2.3/go.mod h1:QW8vSKkwYvWFwIZQLGQXdkt9Ud76eQXRQ9Ao2H+cA1o= +github.com/ipfs/go-log/v2 v2.9.2 h1:O/5BB0elpkRILvT24rCJ5976wWd7u0nJ436T3rdYdc4= +github.com/ipfs/go-log/v2 v2.9.2/go.mod h1:RziRwwXWhndlk8L75RnEe0zeAYaq2heKtEMc3jqUov0= +github.com/ipfs/go-test v0.3.0 h1:0Y4Uve3tp9HI+2lIJjfOliOrOgv/YpXg/l1y3P4DEYE= +github.com/ipfs/go-test v0.3.0/go.mod h1:JK+U8pRpATZb7lsYNSJlCj3WYB3cFfWIbI6nWRM/GFk= github.com/ipld/go-ipld-prime v0.23.0 h1:csqdPZH60BsTC+AZrv7fpa27v+09I/oTqyHYYYE27eE= github.com/ipld/go-ipld-prime v0.23.0/go.mod h1:46YCFSFNFBJHPjB0pfMuv7Ly7df2eChpkpyPo5SE0bA= github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM= @@ -839,12 +839,12 @@ github.com/libp2p/go-libp2p v0.48.0 h1:h2BrLAgrj7X8bEN05K7qmrjpNHYA+6tnsGRdprjTn github.com/libp2p/go-libp2p v0.48.0/go.mod h1:Q1fBZNdmC2Hf82husCTfkKJVfHm2we5zk+NWmOGEmWk= github.com/libp2p/go-libp2p-asn-util v0.4.1 h1:xqL7++IKD9TBFMgnLPZR6/6iYhawHKHl950SO9L6n94= github.com/libp2p/go-libp2p-asn-util v0.4.1/go.mod h1:d/NI6XZ9qxw67b4e+NgpQexCIiFYJjErASrYW4PFDN8= -github.com/libp2p/go-libp2p-kad-dht v0.39.0 h1:mww38eBYiUvdsu+Xl/GLlBC0Aa8M+5HAwvafkFOygAM= -github.com/libp2p/go-libp2p-kad-dht v0.39.0/go.mod h1:Po2JugFEkDq9Vig/JXtc153ntOi0q58o4j7IuITCOVs= +github.com/libp2p/go-libp2p-kad-dht v0.40.0 h1:as8U7Y1RX9CTKCBiFBHWKZ6tSS+rE+6WNz+H1+M+wbo= +github.com/libp2p/go-libp2p-kad-dht v0.40.0/go.mod h1:iLUjII47u3/HjxyhucI2lhsl29lrzlAs/ym16+H40jE= github.com/libp2p/go-libp2p-kbucket v0.8.0 h1:QAK7RzKJpYe+EuSEATAaaHYMYLkPDGC18m9jxPLnU8s= github.com/libp2p/go-libp2p-kbucket v0.8.0/go.mod h1:JMlxqcEyKwO6ox716eyC0hmiduSWZZl6JY93mGaaqc4= -github.com/libp2p/go-libp2p-pubsub v0.15.0 h1:cG7Cng2BT82WttmPFMi50gDNV+58K626m/wR00vGL1o= -github.com/libp2p/go-libp2p-pubsub v0.15.0/go.mod h1:lr4oE8bFgQaifRcoc2uWhWWiK6tPdOEKpUuR408GFN4= +github.com/libp2p/go-libp2p-pubsub v0.16.0 h1:j7G2C8kJwkcAQqYR7Wmq3d75d3Sgw/N0Hhiv0dVx7OY= +github.com/libp2p/go-libp2p-pubsub v0.16.0/go.mod h1:lr4oE8bFgQaifRcoc2uWhWWiK6tPdOEKpUuR408GFN4= github.com/libp2p/go-libp2p-record v0.3.1 h1:cly48Xi5GjNw5Wq+7gmjfBiG9HCzQVkiZOUZ8kUl+Fg= github.com/libp2p/go-libp2p-record v0.3.1/go.mod h1:T8itUkLcWQLCYMqtX7Th6r7SexyUJpIyPgks757td/E= github.com/libp2p/go-libp2p-routing-helpers v0.7.5 h1:HdwZj9NKovMx0vqq6YNPTh6aaNzey5zHD7HeLJtq6fI= @@ -885,8 +885,8 @@ github.com/mattn/go-colorable v0.1.14/go.mod h1:6LmQG8QLFO4G5z1gPvYEzlUgJ2wF+stg github.com/mattn/go-isatty v0.0.3/go.mod h1:M+lRXTBqGeGNdLjl/ufCoiOlB5xdOkqRJdNxMWT7Zi4= github.com/mattn/go-isatty v0.0.16/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM= github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y= -github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY= -github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y= +github.com/mattn/go-isatty v0.0.22 h1:j8l17JJ9i6VGPUFUYoTUKPSgKe/83EYU2zBC7YNKMw4= +github.com/mattn/go-isatty v0.0.22/go.mod h1:ZXfXG4SQHsB/w3ZeOYbR0PrPwLy+n6xiMrJlRFqopa4= github.com/mattn/go-runewidth v0.0.9/go.mod h1:H031xJmbD/WCDINGzjvQ9THkh0rPKHF+m2gUSrubnMI= github.com/mattn/go-runewidth v0.0.12/go.mod h1:RAqKPSqVFrSLVXbA8x7dzmKdmGzieGRCM46jaSJTDAk= github.com/mattn/go-runewidth v0.0.17 h1:78v8ZlW0bP43XfmAfPsdXcoNCelfMHsDmd/pkENfrjQ= @@ -972,8 +972,8 @@ github.com/mudler/LocalAGI v0.0.0-20260508125235-37810d918a87 h1:az+2umaD/sT1rRv github.com/mudler/LocalAGI v0.0.0-20260508125235-37810d918a87/go.mod h1:x77p9W1zKZr+W+UcEwg8/qdp00p4XXOI69wE7WlXZc0= github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b h1:A74T2Lauvg61KodYqsjTYDY05kPLcW+efVZjd23dghU= github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b/go.mod h1:6sfja3lcu2nWRzEc0wwqGNu/eCG3EWgij+8s7xyUeQ4= -github.com/mudler/edgevpn v0.32.2 h1:umTPyyZgkom/A81Bk4HbP0p1ZSEU5EFPW3Bg+YPxI8A= -github.com/mudler/edgevpn v0.32.2/go.mod h1:UaMc8MORbcRsAjuO5gVJj9Bn3Nq2AP5U9NTb6epVyv8= +github.com/mudler/edgevpn v0.34.0 h1:qDrD/rCPFY/FdURbXudIZWihVKY4VOX3nMn3CcbeQEU= +github.com/mudler/edgevpn v0.34.0/go.mod h1:yki7uMi5LR9gSMrw8PdPieuxsrk8BLV2Ui7VBEmbbIA= github.com/mudler/go-piper v0.0.0-20241023091659-2494246fd9fc h1:RxwneJl1VgvikiX28EkpdAyL4yQVnJMrbquKospjHyA= github.com/mudler/go-piper v0.0.0-20241023091659-2494246fd9fc/go.mod h1:O7SwdSWMilAWhBZMK9N9Y/oBDyMMzshE3ju8Xkexwig= github.com/mudler/go-processmanager v0.1.1 h1:c/1NRZOZpW8HuFv9RhBG57nQu1oDMRomEHedwBFMlrw= @@ -1044,8 +1044,8 @@ github.com/onsi/ginkgo v1.16.5 h1:8xi0RTUf59SOSfEtZMvwTvXYMzG4gV23XVHOZiXNtnE= github.com/onsi/ginkgo v1.16.5/go.mod h1:+E8gABHa3K6zRBolWtd+ROzc/U5bkGt0FwiG042wbpU= github.com/onsi/ginkgo/v2 v2.29.0 h1:rfh+ZFjgJhYWRoIqVf3Uwx/W20yLrcrE2h2GmYVRaag= github.com/onsi/ginkgo/v2 v2.29.0/go.mod h1:+aXOY+vzZ5mu2iI2HpTZUPmM//oQfsNFX6gU9kNcA44= -github.com/onsi/gomega v1.40.0 h1:Vtol0e1MghCD2ZVIilPDIg44XSL9l2QAn8ZNaljWcJc= -github.com/onsi/gomega v1.40.0/go.mod h1:M/Uqpu/8qTjtzCLUA2zJHX9Iilrau25x1PdoSRbWh5A= +github.com/onsi/gomega v1.41.0 h1:OwKp4pXNgVxf6sCplzYo794OFNuoL2q2SBMU5NSWOjA= +github.com/onsi/gomega v1.41.0/go.mod h1:M/Uqpu/8qTjtzCLUA2zJHX9Iilrau25x1PdoSRbWh5A= github.com/openai/openai-go/v3 v3.26.0 h1:bRt6H/ozMNt/dDkN4gobnLqaEGrRGBzmbVs0xxJEnQE= github.com/openai/openai-go/v3 v3.26.0/go.mod h1:cdufnVK14cWcT9qA1rRtrXx4FTRsgbDPW7Ia7SS5cZo= github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U= @@ -1417,20 +1417,22 @@ go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y= go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.63.0 h1:YH4g8lQroajqUwWbq/tr2QX1JFmEXaDLgG+ew9bLMWo= go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.63.0/go.mod h1:fvPi2qXDqFs8M4B4fmJhE92TyQs9Ydjlg3RvfUp+NbQ= -go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.65.0 h1:7iP2uCb7sGddAr30RRS6xjKy7AZ2JtTOPA3oolgVSw8= -go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.65.0/go.mod h1:c7hN3ddxs/z6q9xwvfLPk+UHlWRQyaeR1LdgfL/66l0= -go.opentelemetry.io/otel v1.43.0 h1:mYIM03dnh5zfN7HautFE4ieIig9amkNANT+xcVxAj9I= -go.opentelemetry.io/otel v1.43.0/go.mod h1:JuG+u74mvjvcm8vj8pI5XiHy1zDeoCS2LB1spIq7Ay0= -go.opentelemetry.io/otel/exporters/prometheus v0.65.0 h1:jOveH/b4lU9HT7y+Gfamf18BqlOuz2PWEvs8yM7Q6XE= -go.opentelemetry.io/otel/exporters/prometheus v0.65.0/go.mod h1:i1P8pcumauPtUI4YNopea1dhzEMuEqWP1xoUZDylLHo= -go.opentelemetry.io/otel/metric v1.43.0 h1:d7638QeInOnuwOONPp4JAOGfbCEpYb+K6DVWvdxGzgM= -go.opentelemetry.io/otel/metric v1.43.0/go.mod h1:RDnPtIxvqlgO8GRW18W6Z/4P462ldprJtfxHxyKd2PY= -go.opentelemetry.io/otel/sdk v1.43.0 h1:pi5mE86i5rTeLXqoF/hhiBtUNcrAGHLKQdhg4h4V9Dg= -go.opentelemetry.io/otel/sdk v1.43.0/go.mod h1:P+IkVU3iWukmiit/Yf9AWvpyRDlUeBaRg6Y+C58QHzg= -go.opentelemetry.io/otel/sdk/metric v1.43.0 h1:S88dyqXjJkuBNLeMcVPRFXpRw2fuwdvfCGLEo89fDkw= -go.opentelemetry.io/otel/sdk/metric v1.43.0/go.mod h1:C/RJtwSEJ5hzTiUz5pXF1kILHStzb9zFlIEe85bhj6A= -go.opentelemetry.io/otel/trace v1.43.0 h1:BkNrHpup+4k4w+ZZ86CZoHHEkohws8AY+WTX09nk+3A= -go.opentelemetry.io/otel/trace v1.43.0/go.mod h1:/QJhyVBUUswCphDVxq+8mld+AvhXZLhe+8WVFxiFff0= +go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0 h1:OyrsyzuttWTSur2qN/Lm0m2a8yqyIjUVBZcxFPuXq2o= +go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0/go.mod h1:C2NGBr+kAB4bk3xtMXfZ94gqFDtg/GkI7e9zqGh5Beg= +go.opentelemetry.io/otel v1.44.0 h1:JjwHmHpA4iZ3wBxluu2fbbE7j4kqlE8jXyAyPXH7HqU= +go.opentelemetry.io/otel v1.44.0/go.mod h1:BMgjTHL9WPRlRjL2oZCBTL4whCGtXch2H4BhOPIAyYc= +go.opentelemetry.io/otel/exporters/prometheus v0.66.0 h1:vkrK8PAznv2NKt2r+kdu252ccGzkEqLc2aSXbQIALYQ= +go.opentelemetry.io/otel/exporters/prometheus v0.66.0/go.mod h1:V/UB6D3vMF/UBOL5igAsAYnk1nG/bzYYTzvsB16cy7o= +go.opentelemetry.io/otel/metric v1.44.0 h1:1w0gILTcHdr3YI+ixLyjemwrVnsMURbTZFrSYCdDdmc= +go.opentelemetry.io/otel/metric v1.44.0/go.mod h1:8O7hanEPBNgEMmybD3s2VBKcgWOCsA6tzHBPODAiquo= +go.opentelemetry.io/otel/metric/x v0.66.0 h1:YkCrx1zLOChi9ZcZ6euupOcsgzbVlec7D/xoEU1+cTA= +go.opentelemetry.io/otel/metric/x v0.66.0/go.mod h1:d1+BDj9t96do0/1LoU1ayfCv79ZgNE41qbhBvnMOBZk= +go.opentelemetry.io/otel/sdk v1.44.0 h1:nHYwb9lK+fJPU/dnT6s7W7Z8itMWyqrnVfbheVYrZ58= +go.opentelemetry.io/otel/sdk v1.44.0/go.mod h1:Osuydd3Se74nqjAKxid74N5eC+jfEqfTegHRnq58oK0= +go.opentelemetry.io/otel/sdk/metric v1.44.0 h1:3LlKgI+VjbVsjNRFZJZAJ30WjXC5VkNRks6si09iEfI= +go.opentelemetry.io/otel/sdk/metric v1.44.0/go.mod h1:5B5pMARnXxKhltooO4xUuCBorl65a4EpnTalObqOigA= +go.opentelemetry.io/otel/trace v1.44.0 h1:jxF5CsGYCe74MCRx2X4g7WsY/VBKRqqpNvXlX/6gtIk= +go.opentelemetry.io/otel/trace v1.44.0/go.mod h1:oLl1jrMQAVo6v3GAggN+1VH9VIz9iUSvW53sW1Q8PIE= go.starlark.net v0.0.0-20250417143717-f57e51f710eb h1:zOg9DxxrorEmgGUr5UPdCEwKqiqG0MlZciuCuA3XiDE= go.starlark.net v0.0.0-20250417143717-f57e51f710eb/go.mod h1:YKMCv9b1WrfWmeqdV5MAuEHWsu5iC+fe6kYl2sQjdI8= go.step.sm/crypto v0.74.0 h1:/APBEv45yYR4qQFg47HA8w1nesIGcxh44pGyQNw6JRA= @@ -1452,8 +1454,8 @@ go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN8 go.uber.org/tools v0.0.0-20190618225709-2cfd321de3ee/go.mod h1:vJERXedbb3MVM5f9Ejo0C68/HhF8uaILCdgjnY+goOA= go.uber.org/zap v1.16.0/go.mod h1:MA8QOfq0BHJwdXa996Y4dYkAqRKB8/1K1QMMZVaNZjQ= go.uber.org/zap v1.17.0/go.mod h1:MXVU+bhUf/A7Xi2HNOnopQOrmycQ5Ih87HtOu4q5SSo= -go.uber.org/zap v1.27.1 h1:08RqriUEv8+ArZRYSTXy1LeBScaMpVSTBhCeaZYfMYc= -go.uber.org/zap v1.27.1/go.mod h1:GB2qFLM7cTU87MWRP2mPIjqfIDnGu+VIO4V/SdhGo2E= +go.uber.org/zap v1.28.0 h1:IZzaP1Fv73/T/pBMLk4VutPl36uNC+OSUh3JLG3FIjo= +go.uber.org/zap v1.28.0/go.mod h1:rDLpOi171uODNm/mxFcuYWxDsqWSAVkFdX4XojSKg/Q= go.yaml.in/yaml/v2 v2.4.4 h1:tuyd0P+2Ont/d6e2rl3be67goVK4R6deVxCUX5vyPaQ= go.yaml.in/yaml/v2 v2.4.4/go.mod h1:gMZqIpDtDqOfM0uNfy0SkpRhvUryYH0Z6wdMYcacYXQ= go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc= @@ -1674,8 +1676,8 @@ golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= -golang.org/x/sys v0.44.0 h1:ildZl3J4uzeKP07r2F++Op7E9B29JRUy+a27EibtBTQ= -golang.org/x/sys v0.44.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw= +golang.org/x/sys v0.45.0 h1:dO4czNzziLiiXplLQgBCEpCvXQ3dnkn0SdaZSYdQ+FY= +golang.org/x/sys v0.45.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw= golang.org/x/telemetry v0.0.0-20240228155512-f48c80bd79b2/go.mod h1:TeRTkGYfJXctD9OcfyVLyj2J3IxLnKwHJR8f4D8a3YE= golang.org/x/telemetry v0.0.0-20260409153401-be6f6cb8b1fa h1:efT73AJZfAAUV7SOip6pWGkwJDzIGiKBZGVzHYa+ve4= golang.org/x/telemetry v0.0.0-20260409153401-be6f6cb8b1fa/go.mod h1:kHjTxDEnAu6/Nl9lDkzjWpR+bmKfxeiRuSDlsMb70gE= @@ -1785,8 +1787,8 @@ golang.zx2c4.com/wintun v0.0.0-20230126152724-0fa3db229ce2 h1:B82qJJgjvYKsXS9jeu golang.zx2c4.com/wintun v0.0.0-20230126152724-0fa3db229ce2/go.mod h1:deeaetjYA+DHMHg+sMSMI58GrEteJUUzzw7en6TJQcI= golang.zx2c4.com/wireguard v0.0.0-20250521234502-f333402bd9cb h1:whnFRlWMcXI9d+ZbWg+4sHnLp52d5yiIPUxMBSt4X9A= golang.zx2c4.com/wireguard v0.0.0-20250521234502-f333402bd9cb/go.mod h1:rpwXGsirqLqN2L0JDJQlwOboGHmptD5ZD6T2VmcqhTw= -golang.zx2c4.com/wireguard/windows v0.5.3 h1:On6j2Rpn3OEMXqBq00QEDC7bWSZrPIHKIus8eIuExIE= -golang.zx2c4.com/wireguard/windows v0.5.3/go.mod h1:9TEe8TJmtwyQebdFwAkEWOPr3prrtqm+REGFifP60hI= +golang.zx2c4.com/wireguard/windows v0.6.1 h1:XMaKojH1Hs/raMrmnir4n35nTvzvWj7NmSYzHn2F4qU= +golang.zx2c4.com/wireguard/windows v0.6.1/go.mod h1:04aqInu5GYuTFvMuDw/rKBAF7mHrltW/3rekpfbbZDM= gonum.org/v1/gonum v0.17.0 h1:VbpOemQlsSMrYmn7T2OUvQ4dqxQXU+ouZFQsZOx50z4= gonum.org/v1/gonum v0.17.0/go.mod h1:El3tOrEuMpv2UdMrbNlKEh9vd86bmQ6vqIcDwxEOc1E= google.golang.org/api v0.4.0/go.mod h1:8k5glujaEP+g9n7WNsDg8QP6cUVNI86fCNMcbazEtwE= @@ -1865,10 +1867,10 @@ google.golang.org/genproto v0.0.0-20210402141018-6c239bbf2bb1/go.mod h1:9lPAdzaE google.golang.org/genproto v0.0.0-20210602131652-f16073e35f0c/go.mod h1:UODoCrxHCcBojKKwX1terBiRUaqAsFqJiF615XL43r0= google.golang.org/genproto v0.0.0-20250922171735-9219d122eba9 h1:LvZVVaPE0JSqL+ZWb6ErZfnEOKIqqFWUJE2D0fObSmc= google.golang.org/genproto v0.0.0-20250922171735-9219d122eba9/go.mod h1:QFOrLhdAe2PsTp3vQY4quuLKTi9j3XG3r6JPPaw7MSc= -google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409 h1:merA0rdPeUV3YIIfHHcH4qBkiQAc1nfCKSI7lB4cV2M= -google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409/go.mod h1:fl8J1IvUjCilwZzQowmw2b7HQB2eAuYBabMXzWurF+I= -google.golang.org/genproto/googleapis/rpc v0.0.0-20260128011058-8636f8732409 h1:H86B94AW+VfJWDqFeEbBPhEtHzJwJfTbgE2lZa54ZAQ= -google.golang.org/genproto/googleapis/rpc v0.0.0-20260128011058-8636f8732409/go.mod h1:j9x/tPzZkyxcgEFkiKEEGxfvyumM01BEtsW8xzOahRQ= +google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57 h1:JLQynH/LBHfCTSbDWl+py8C+Rg/k1OVH3xfcaiANuF0= +google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57/go.mod h1:kSJwQxqmFXeo79zOmbrALdflXQeAYcUbgS7PbpMknCY= +google.golang.org/genproto/googleapis/rpc v0.0.0-20260209200024-4cfbd4190f57 h1:mWPCjDEyshlQYzBpMNHaEof6UX1PmHcaUODUywQ0uac= +google.golang.org/genproto/googleapis/rpc v0.0.0-20260209200024-4cfbd4190f57/go.mod h1:j9x/tPzZkyxcgEFkiKEEGxfvyumM01BEtsW8xzOahRQ= google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c= google.golang.org/grpc v1.20.1/go.mod h1:10oTOabMzJvdu6/UiuZezV6QK5dSlG84ov/aaiqXj38= google.golang.org/grpc v1.21.1/go.mod h1:oYelfM1adQP15Ek0mdvEgi9Df8B9CZIaU1084ijfRaM= diff --git a/scripts/ui-coverage-check.sh b/scripts/ui-coverage-check.sh index 33a43748c..9d24df7ee 100755 --- a/scripts/ui-coverage-check.sh +++ b/scripts/ui-coverage-check.sh @@ -4,28 +4,33 @@ # # Compares the total line coverage in an nyc coverage-summary.json against a # committed baseline and fails (exit 1) if it dropped by more than -# UI_COVERAGE_TOLERANCE percentage points (default 0.1). The React UI e2e suite +# UI_COVERAGE_TOLERANCE percentage points (default 0.8). The React UI e2e suite # drives the real app, so a removed feature or deleted spec shows up as a # coverage drop here. # -# The tolerance exists only to absorb the irreducible measurement noise floor, -# NOT to permit regression. UI e2e coverage USED to swing ~1pp run-to-run, which -# forced a loose 0.8pp band — but that swing was a bug, not inherent jitter: a -# spec that navigated to a route and ended on the URL assertion let the target -# component's render race the coverage teardown, so ~400 lines were collected -# only when the render won (see e2e/agents.spec.js → AgentCreate). With that race -# fixed, repeated runs land within ~0.013pp (a handful of lines) of each other, -# so the band is tightened to 0.1pp — enough for the noise floor, tight enough -# that a real ~40-line regression still trips the gate. If a future run wobbles -# more, fix the racing spec (await a rendered element) rather than loosening this. +# Why the band is this wide: UI e2e line coverage is NOT deterministic. Many +# specs assert on state and end while async/lazy render work is still in flight, +# so those lines are collected only when the render beats the coverage teardown +# — and that depends on machine speed/load. The effect is diffuse (spread across +# dozens of specs, no single dominant file) and tracks the runner: a quiet local +# box measures ~0.9pp higher than a slow/loaded CI runner for the SAME tree +# (observed: 39.9% local vs 39.0% CI). The tolerance absorbs that spread; setting +# it tighter (it was briefly 0.1pp, calibrated to a lucky fast-local cluster) +# makes CI flap. # -# When coverage rises meaningfully, regenerate and commit the baseline with: -# make test-ui-coverage-baseline +# The principled way to tighten this is to remove the variance at the source — +# make each racing spec await a rendered element before ending (e2e/agents.spec.js +# → AgentCreate fixed the single biggest one) — NOT to chase the baseline up to a +# fast-machine high or loosen further. Keep the baseline conservatively at or +# below the slow-runner floor so the band catches real regressions, not jitter. +# +# When coverage rises meaningfully AND reproducibly (check on a slow/CI-like run), +# regenerate and commit the baseline with: make test-ui-coverage-baseline set -eu summary="${1:?usage: ui-coverage-check.sh SUMMARY_JSON BASELINE_FILE}" baseline_file="${2:?usage: ui-coverage-check.sh SUMMARY_JSON BASELINE_FILE}" -tolerance="${UI_COVERAGE_TOLERANCE:-0.1}" +tolerance="${UI_COVERAGE_TOLERANCE:-0.8}" if [ ! -f "$summary" ]; then echo "ui-coverage-check: coverage summary not found: $summary" >&2