Compare commits

41 Commits

Author SHA1 Message Date
Ettore Di Giacinto
87d5734c33 fix(config): gate parallel-slot default on per-device VRAM too (#10485)
The first #10485 fix (#10494) made the Blackwell physical-batch boost
per-device/context-aware, which neutralized the big compute-buffer OOM, but
the reporter's 2x16 GiB consumer Blackwell still OOM'd. Tracing the post-fix
log: the model now loads its weights, builds the main context and warms up
fine, and dies only on the *last* allocation — the MTP draft context's 800 MiB
KV cache on the tighter device.

#10411 changed only two defaults: the physical batch (now gated) and a
VRAM-scaled parallel-slot count. The KV cache is unified (n_ctx_seq == full
context proves slots share the budget, so parallel doesn't multiply KV), but
n_seq_max=4 still adds per-slot compute-graph / context-checkpoint / output
scratch. On a device packed ~99% by a 27B model spanning both cards, that
overhead is the few-hundred-MiB straw — which is why reverting #10411 (and only
#10411) restores a working load.

Gate the parallel-slot default on the same per-device headroom predicate as the
batch boost: when a large context already fills a single card
(largeContextForDevice), keep n_parallel=1. A user running one big-context model
that barely fits across two consumer GPUs is not serving four concurrent
tenants. Small contexts and large unified-memory devices (GB10) keep full
concurrency. Applied on both the single-host path and the distributed router.

Also make the auto-tuning visible and reversible (the debugging here needed
DEBUG logs and a git bisect):

  - Log the effective performance-relevant runtime options at INFO once per
    model load ("effective runtime tuning …": context, n_batch, n_gpu_layers,
    parallel, flash_attention, f16) so an admin can see what will run and pin or
    override any value in the model YAML.
  - LOCALAI_DISABLE_HARDWARE_DEFAULTS=true skips the hardware auto-tuning
    entirely (mirrors LOCALAI_DISABLE_GUESSING) for stock llama.cpp behavior.
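
A minimal sketch of the two decisions above, assuming a boolean-style parse of
the environment variable. hardwareDefaultsDisabled and parallelDefault are
hypothetical names for illustration, tierSlots stands in for the VRAM-scaled
slot tier, and the boolean argument stands in for the largeContextForDevice
predicate; the real code paths may be shaped differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// hardwareDefaultsDisabled reports whether the opt-out variable is set to a
// true-like value. Parsing with strconv.ParseBool is an assumption.
func hardwareDefaultsDisabled(getenv func(string) string) bool {
	v, err := strconv.ParseBool(getenv("LOCALAI_DISABLE_HARDWARE_DEFAULTS"))
	return err == nil && v
}

// parallelDefault keeps a single slot when a large context already fills one
// card, and otherwise returns the VRAM-scaled tier unchanged.
func parallelDefault(tierSlots int, largeContextForDevice bool) int {
	if largeContextForDevice {
		return 1
	}
	return tierSlots
}

func main() {
	if hardwareDefaultsDisabled(os.Getenv) {
		fmt.Println("hardware auto-tuning skipped")
		return
	}
	fmt.Println("parallel:", parallelDefault(4, true))
}
```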

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
2026-06-25 12:57:19 +00:00
LocalAI [bot]
693e3eec05 chore(model gallery): 🤖 add 1 new models via gallery agent (#10505)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:52 +02:00
LocalAI [bot]
f1e5071321 chore: ⬆️ Update leejet/stable-diffusion.cpp to 8caa3f908ae6d4a4bef531e73b9a969f266a3d1f (#10493)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:31 +02:00
LocalAI [bot]
93d6255de3 chore: ⬆️ Update ggml-org/llama.cpp to 8be759e6f70d629638a7eb70db3824cbdcea370b (#10501)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:17 +02:00
LocalAI [bot]
fe4f425fb5 fix: correct scheme/host on self-referential URLs behind an HTTPS reverse proxy (#10482) (#10504)
* fix(http): harden BaseURL proxy scheme/host detection

Split comma-separated X-Forwarded-Proto and honor the RFC 7239 Forwarded
header so generated links use https behind common reverse-proxy setups.
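
As an illustration only, scheme detection along these lines could look like the
sketch below. proxyScheme is a hypothetical helper that takes the raw header
values; checking X-Forwarded-Proto before Forwarded is an assumption, and the
plain comma split does not handle quoted values that themselves contain commas.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyScheme returns the scheme announced by a reverse proxy, or "" when
// neither header carries one. Arguments are the raw header values.
func proxyScheme(xForwardedProto, forwarded string) string {
	// X-Forwarded-Proto may be a chain such as "https, http"; the first
	// element describes the hop closest to the client.
	first, _, _ := strings.Cut(xForwardedProto, ",")
	if s := strings.ToLower(strings.TrimSpace(first)); s != "" {
		return s
	}
	// RFC 7239 element: for=192.0.2.1;proto=https;host=example.com
	element, _, _ := strings.Cut(forwarded, ",")
	for _, pair := range strings.Split(element, ";") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if ok && strings.EqualFold(key, "proto") {
			return strings.ToLower(strings.Trim(value, `"`))
		}
	}
	return ""
}

func main() {
	fmt.Println(proxyScheme("https, http", ""))
	fmt.Println(proxyScheme("", `for=192.0.2.1;proto="https";host=example.com`))
}
```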

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(http): honor explicit external base URL in BaseURL

When _external_base_url is set in the request context it dictates the
origin (scheme+host+port); the proxy path prefix is still appended.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(config): generalize LOCALAI_BASE_URL to ExternalBaseURL

LOCALAI_BASE_URL now sets a single instance-wide external base URL used
for OAuth callbacks and all self-referential links. A Pre middleware
stamps it into the request context for middleware.BaseURL.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: document LOCALAI_BASE_URL and reverse-proxy headers

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(http): cover parseForwarded edge cases; clarify base-url flag group

Adds direct unit coverage for quoted/malformed/multi-element Forwarded
headers and regroups the external base URL flag away from auth-only.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:10:59 +02:00
LocalAI [bot]
fae9f6356f chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 9dbe7ea26a01b30fccb117ae5e86807c1dc23d42 (#10499)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:10:41 +02:00
LocalAI [bot]
066abf82c0 feat(llama-cpp): cpu_moe/n_cpu_moe options + generic upstream-flag passthrough (#10490)
* feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options

Mirror the existing draft_cpu_moe/draft_n_cpu_moe siblings for the main
model, matching upstream --cpu-moe / --n-cpu-moe (common/arg.cpp). Lets
users keep MoE expert weights on CPU to manage VRAM on large MoE models.

Closes part of #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): forward unknown '-' options to upstream arg parser

Any options: entry starting with '-' is collected and passed verbatim to
llama.cpp's own common_params_parse (LLAMA_EXAMPLE_SERVER) at the end of
params_parse, so every upstream llama-server flag works without a new
hand-wired branch. Passthrough runs last and wins on overlap; n_parallel is
snapshotted to survive parser_init's SERVER reset, and help/usage/completion
flags are skipped to avoid exiting the backend.
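
The real passthrough lives in the C++ backend; the Go sketch below only
illustrates the partitioning idea. splitOptions, flagName and the exitFlags
set are hypothetical, the "key:value" option shape is an assumption, and the
skip list simply reuses flags named in this commit series.

```go
package main

import (
	"fmt"
	"strings"
)

// exitFlags lists flags that would make the upstream parser print and exit.
var exitFlags = map[string]bool{
	"--help": true, "--usage": true, "--version": true,
	"--license": true, "--list-devices": true, "--cache-list": true,
}

// flagName returns the text before the first space, ':' or '='.
func flagName(opt string) string {
	if i := strings.IndexAny(opt, " :="); i >= 0 {
		return opt[:i]
	}
	return opt
}

// splitOptions separates hand-wired named options from entries that start
// with '-' and should be forwarded verbatim, dropping exit-style flags.
func splitOptions(options []string) (named, passthrough []string) {
	for _, opt := range options {
		if !strings.HasPrefix(opt, "-") {
			named = append(named, opt)
			continue
		}
		if exitFlags[flagName(opt)] {
			continue
		}
		passthrough = append(passthrough, opt)
	}
	return named, passthrough
}

func main() {
	named, pass := splitOptions([]string{"cpu_moe:true", "--override-kv", "--help"})
	fmt.Println(named, pass)
}
```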

Closes #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(llama-cpp): document cpu_moe/n_cpu_moe and option passthrough

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama-cpp): terminate tensor/kv override vectors after passthrough

The tensor_buft_overrides padding and the kv/draft override terminators
ran before the generic option passthrough, so a passthrough flag
(--cpu-moe, --override-tensor, --override-kv, ...) appended a real entry
after the null sentinel - tripping the model loader's
back().pattern == nullptr assertion (crash) or being silently dropped.
Move all three termination/padding blocks to the end of params_parse,
after both the named-option loop and common_params_parse have pushed
their real entries. Also widen the exit()-flag skip list so --version,
--license, --list-devices and --cache-list cannot terminate the backend.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:10:08 +02:00
LocalAI [bot]
a7fec9a49d feat(backends): add darwin/metal (MPS) build for trl (#10487)
* feat(backends): add darwin/metal (MPS) build for trl

Authors backend/python/trl/requirements-mps.txt and wires trl into the
darwin CI matrix and gallery so the MPS training path can be built and
validated on Apple Silicon. The MPS variant installs plain PyPI torch
wheels (MPS-capable on macOS arm64) and the trl training stack; bitsandbytes
is omitted as it is a CUDA-only dependency with poor Apple Silicon support.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(trl): guard uv-only --index-strategy for the pip/darwin path

The darwin/MPS build installs with pip (USE_PIP=true), which rejects the
uv-only --index-strategy flag and failed the darwin backend build. Add it
only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:09:36 +02:00
LocalAI [bot]
c678530cf0 fix(backends): darwin/metal support across purego Go backends (#10481)
* fix(parakeet-cpp): darwin/metal support (libparakeet.dylib + DYLD path)

The parakeet-cpp backend had no macOS support and panicked at startup on
Apple/Metal nodes when purego.Dlopen could not find "libparakeet.so".
Fix it across the same four layers the sibling voxtral backend already
handles correctly:

- main.go: default the dlopen target to libparakeet.dylib on darwin
  (runtime.GOOS), libparakeet.so elsewhere; PARAKEET_LIBRARY still wins.
- Makefile: also stage the built libparakeet.dylib next to the Go sources.
- package.sh: accept either the Linux .so[.X.Y] or the macOS .dylib when
  bundling instead of hard-failing when no .so is present (the macOS case);
  note that on Darwin only system frameworks are linked.
- run.sh: on Darwin set DYLD_LIBRARY_PATH and PARAKEET_LIBRARY to the
  packaged .dylib; keep LD_LIBRARY_PATH + .so on Linux.

Mirrors backend/go/voxtral.
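
The main.go rule can be sketched as follows. libraryPath is a hypothetical
helper name; the library names and the PARAKEET_LIBRARY override come from the
description above, and the OS is passed in as a parameter so the choice is
easy to exercise on any host.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// libraryPath picks the dlopen target: an explicit override wins, then the
// platform-native default (.dylib on darwin, .so elsewhere).
func libraryPath(goos, override string) string {
	if override != "" {
		return override
	}
	if goos == "darwin" {
		return "libparakeet.dylib"
	}
	return "libparakeet.so"
}

func main() {
	fmt.Println(libraryPath(runtime.GOOS, os.Getenv("PARAKEET_LIBRARY")))
}
```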

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(backends): darwin/metal support across purego Go backends

The parakeet-cpp fix in the previous commit was an instance of a bug
shared by nearly every purego/dlopen Go backend: the dlopen target was
hardcoded to a .so name and run.sh exported only LD_LIBRARY_PATH, so the
backend panicked at startup on macOS/Apple-Metal nodes (dyld needs the
.dylib name and DYLD_LIBRARY_PATH). voxtral was the only backend handling
this correctly.

Apply the same four-layer fix (mirroring backend/go/voxtral) to the
remaining affected backends:

  whisper, sherpa-onnx, ced, stablediffusion-ggml, vibevoice-cpp,
  qwen3-tts-cpp, omnivoice-cpp, crispasr, acestep-cpp, locate-anything-cpp,
  depth-anything-cpp, rfdetr-cpp, sam3-cpp, localvqe

Per backend:
- main.go (sherpa-onnx: backend.go, two libraries): default the dlopen
  target to the .dylib on darwin (runtime.GOOS), .so elsewhere; the
  existing <BACKEND>_LIBRARY env override still wins.
- run.sh: on Darwin set DYLD_LIBRARY_PATH and point <BACKEND>_LIBRARY at
  the packaged .dylib; keep LD_LIBRARY_PATH + the Linux CPU-variant
  (avx/avx2/avx512) selection unchanged in the else branch.
- package.sh: also bundle the .dylib and stop hard-failing when no .so is
  present (the macOS case).
- Makefile: also stage the built .dylib.

Notes:
- stablediffusion-ggml and acestep-cpp build their lib as a CMake MODULE,
  which emits .so (not .dylib) on macOS; run.sh prefers .dylib and falls
  back to .so so both layouts work.
- sherpa-onnx was already partly darwin-aware (Makefile/package.sh); only
  run.sh and the two dlopen defaults needed fixing.

Linux behavior is unchanged. Verified gofmt-clean and
`CGO_ENABLED=0 go build` for every backend.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:09:18 +02:00
LocalAI [bot]
3c63431e46 chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to 0f37401bebe9b20c0160a888e592108fc1d17607 (#10492)
⬆️ Update ServeurpersoCom/omnivoice.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 00:57:58 +02:00
LocalAI [bot]
3f647a2764 chore: ⬆️ Update ikawrakow/ik_llama.cpp to d5507e33ae7ee2b7b41475f08044d3bde3b839ee (#10498)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 00:57:42 +02:00
LocalAI [bot]
f88981cdce feat(ui): data-driven hardware model recommendations + gallery surfacing (#10500)
* feat(ui): make hardware starter models data-driven

The empty-state starter widget recommended from a hardcoded list, which
drifts as the gallery evolves. Add useRecommendedModels: it queries the
live gallery for chat-capable models (their natural curated order, since
the gallery exposes no popularity signal), estimates size/VRAM for the top
candidates via the existing estimate endpoint, and ranks by hardware fit -
smallest on CPU-only boxes, largest-that-fits on GPUs.

StarterModels now renders those live picks and keeps the curated static
list only as an offline/trimmed-gallery fallback.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): recommend models for your hardware in the gallery

Hardware-aware recommendations were only shown on the first-run empty
state. Surface them on the main Models gallery too: a dismissible
"Recommended for your hardware" strip at the top, sharing the
useRecommendedModels fit-ranking with the starter widget. CPU-only boxes
get small models; GPUs get the largest picks that fit VRAM, with size and
VRAM shown per card. One-click install; dismissal persists per browser.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): gpu-mid tier + NVIDIA NVFP4 model recommendations

Refine the hardware recommendation tiers and curated picks:

- Add a gpu-mid tier (8-24GB VRAM) between gpu-small and gpu-large, so
  ~27B-class models are suggested separately from the 30B+ large tier.
- Detect NVIDIA GPUs (resources.gpus[].vendor) and, on NVIDIA only, prefer
  NVFP4 + MTP variants (Blackwell-optimised); NVFP4 models are filtered out
  of recommendations on non-NVIDIA hardware where they can't run. This
  applies to both the live ranking and the static fallback, with an NVFP4
  badge shown on those picks.
- Refresh the curated fallback to current models: Gemma-4 QAT Q4 builds at
  every tier, low qwen3.5 (4B distilled / 9B) on CPU/small, qwen3.6-27b
  and MTP variants at mid, qwen3.6/qwen3.5 35B-A3B apex/distilled at large.
  All names verified against gallery/index.yaml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 00:22:45 +02:00
LocalAI [bot]
0d6de15ae9 fix(config): per-device VRAM headroom for Blackwell defaults (#10485) (#10494)
The hardware-tuned defaults from #10411 were measured on a GB10 / DGX Spark
(128 GiB unified memory) and over-provisioned multi-GPU consumer Blackwell
(e.g. 2x16 GiB RTX 50-series) into CUDA OOM during model init:

  - The Blackwell physical batch (512 -> 2048) sets both n_batch and n_ubatch.
    The compute buffer scales ~n_ubatch * n_ctx and is allocated PER DEVICE
    (it can't be split across GPUs), so a large context turns ub2048 into
    multi-GiB of scratch that must fit one 16 GiB card.
  - The VRAM-scaled parallel-slot default tiered off TotalAvailableVRAM(),
    which SUMS all GPUs (2x16 -> "32 GiB" -> 8 slots), but the allocations
    are per-device.

Make both decisions per-device and context-aware:

  - xsysinfo.MinPerGPUVRAM() reports the smallest device's VRAM; localGPU()
    uses it so the parallel tier and batch guard reason about one card.
  - PhysicalBatchForContext(gpu, ctx) raises the batch only when the extra
    compute buffer fits VRAM/4 at this model's context (16 GiB crosses over
    ~174k ctx, 32 GiB ~349k; GB10 reports system RAM so it still clears it).
  - Apply hardware defaults AFTER runBackendHooks in SetDefaults so the
    GGUF-guessed context is resolved before the batch decision.
  - The distributed router gates the node batch the same way.

Unified-memory devices (GB10, Apple) report system RAM as their single
device's VRAM, so they keep the prefill win.
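
A rough sketch of the sum-versus-minimum distinction and the VRAM/4 guard.
The function names are illustrative, not the xsysinfo API, and the 24 KiB of
extra compute buffer per context token is an assumed constant back-derived
from the crossover points quoted above (4 GiB / 24 KiB is about 174k tokens);
the real estimate may be computed differently.

```go
package main

import "fmt"

const gib = uint64(1) << 30

// extraBytesPerCtxToken is an ASSUMED cost of the batch boost per context
// token, inferred from the ~174k / ~349k crossovers in the commit message.
const extraBytesPerCtxToken uint64 = 24 << 10

// totalVRAM sums all devices, which overstates what one card can hold.
func totalVRAM(gpus []uint64) uint64 {
	var sum uint64
	for _, g := range gpus {
		sum += g
	}
	return sum
}

// minPerGPUVRAM returns the smallest device's VRAM, or 0 with no devices.
func minPerGPUVRAM(gpus []uint64) uint64 {
	if len(gpus) == 0 {
		return 0
	}
	min := gpus[0]
	for _, g := range gpus[1:] {
		if g < min {
			min = g
		}
	}
	return min
}

// boostFits reports whether the extra compute buffer stays within a quarter
// of one device's VRAM at the given context size.
func boostFits(perDeviceVRAM, ctx uint64) bool {
	return ctx*extraBytesPerCtxToken <= perDeviceVRAM/4
}

func main() {
	gpus := []uint64{16 * gib, 16 * gib}
	fmt.Println(totalVRAM(gpus)/gib, minPerGPUVRAM(gpus)/gib)
	fmt.Println(boostFits(minPerGPUVRAM(gpus), 200000))
}
```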


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 00:07:48 +02:00
LocalAI [bot]
5c3d48ab50 feat(ui): usage & UX enhancements (last-used model, polling, starter models, usage cost, a11y) (#10496)
* feat(ui): remember last-used model per capability

ModelSelector auto-selected the first option whenever the bound value was
empty or stale, so every visit to the Home chat box, Image, TTS or Talk
pages reset the choice to whatever sorted first. Persist the user's pick
in localStorage keyed by capability and prefer it on auto-select when the
model is still available, falling back to the first option otherwise.

Because every modality picker funnels through ModelSelector, this fixes
the friction everywhere at once. External-options callers pass no
capability and keep the previous first-item behaviour.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add visibility-aware polling hook

The app had 26 hand-rolled setInterval polls, none of which paused when
the browser tab was hidden, so backgrounded dashboards kept hitting the
server every few seconds for data nobody was looking at.

Add usePolling: runs immediately, polls on a fixed interval, pauses while
document.hidden, fires a catch-up poll on return, and guards against
overlapping slow requests. Route useResources (the highest-frequency
shared poll) through it. Further callers can be migrated incrementally.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): hardware-aware starter models on empty home

A fresh install dropped admins straight into a 1000+ model gallery with
no guidance. Add a StarterModels widget to the empty-state wizard that
recommends a small, curated set tuned to the detected hardware:

- CPU-only machines (no GPU VRAM) are steered to genuinely small models
  (1-4B, Q4) that stay responsive without a GPU.
- GPU machines get suggestions scaled to available VRAM.

Curated names are real gallery entries, intersected against the live
gallery at render time so a trimmed/custom gallery degrades gracefully.
Install is one click via the existing model-install API.

Also routes Home's cluster and system-info polls through usePolling so a
backgrounded home page stops fetching.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): optional token-cost estimates on usage dashboard

The usage dashboard tracked tokens but had no monetary view. Multi-user
deployments that bill back or budget compute had to export and compute
cost elsewhere.

Add an opt-in pricing control: admins set $ per 1M prompt/completion
tokens (stored per-browser). When set, an estimated-cost summary card and
per-model / per-user cost columns appear, computed from recorded token
counts. The entire cost surface stays hidden until a price is entered, so
the default view is unchanged. Cost is clearly labelled an estimate -
LocalAI itself has no notion of price.
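
The arithmetic is simple enough to show directly. The dashboard itself is UI
code; the Go sketch below only illustrates the formula, and the pricing type
and its field names are hypothetical.

```go
package main

import "fmt"

// pricing holds admin-entered prices in $ per 1M tokens.
type pricing struct {
	PromptPerMillion     float64
	CompletionPerMillion float64
}

// configured reports whether any price was entered; the cost surface stays
// hidden otherwise.
func (p pricing) configured() bool {
	return p.PromptPerMillion > 0 || p.CompletionPerMillion > 0
}

// estimate converts recorded token counts into an estimated cost.
func (p pricing) estimate(promptTokens, completionTokens int64) float64 {
	return float64(promptTokens)/1e6*p.PromptPerMillion +
		float64(completionTokens)/1e6*p.CompletionPerMillion
}

func main() {
	p := pricing{PromptPerMillion: 0.5, CompletionPerMillion: 1.5}
	if p.configured() {
		fmt.Printf("estimated cost: $%.2f\n", p.estimate(2000000, 1000000))
	}
}
```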

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ui): label icon-only send buttons for screen readers

The chat and agent-chat send buttons were a bare paper-plane icon with
no accessible name, so screen readers announced only "button". Add an
aria-label/title ("Send message") and mark the icon aria-hidden. An audit
of all icon-only buttons found these were the only two unlabeled controls;
the rest already carry visible text.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 23:30:08 +02:00
LocalAI [bot]
764b0352b9 docs: ⬆️ update docs version mudler/LocalAI (#10491)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 23:18:24 +02:00
LocalAI [bot]
75ba2daba1 chore(model-gallery): ⬆️ update checksum (#10495)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 23:18:04 +02:00
LocalAI [bot]
62b14fd635 feat(backends): add darwin/metal build for liquid-audio (#10486)
* feat(backends): add darwin/metal build for liquid-audio

Wire the already-MPS-ready liquid-audio backend (it ships
requirements-mps.txt) into the darwin CI matrix and the gallery so
metal-darwin-arm64 images are built and selectable.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* ci(liquid-audio): trigger darwin build via requirements-mps note

The changed-backends path filter only builds a backend when a file under
its directory changes. The metal wiring lived in index.yaml + the matrix,
so the darwin job was skipped. Add a documenting comment to the MPS
requirements so CI actually exercises the darwin build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(liquid-audio): guard uv-only --index-strategy for the pip/darwin path

Same fix as trl: the darwin/MPS build installs with pip (USE_PIP=true), which
rejects the uv-only --index-strategy flag and failed the darwin backend build.
Add it only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 23:16:27 +02:00
LocalAI [bot]
193d0e6aef fix(backends): darwin/metal support for supertonic (#10488)
The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and
packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed
ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec
path, while package.sh hard-failed on any non-Linux host. On macOS dyld has
no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib.

This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481
landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.),
a change that left out supertonic:

- run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH
  at libonnxruntime.dylib; guard the ld.so exec path to Linux only.
- package.sh: recognize Darwin instead of erroring out; the bundled .dylib is
  resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle.
- helper.go: platform-native default library extension (dylib on darwin) for
  the last-resort dlopen fallback.

It also wires the darwin CI build and gallery entries, resolving the
inconsistency where backend/index.yaml advertised metal for supertonic but no
includeDarwin matrix entry built the image:

- .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry.
- backend/index.yaml: declare metal capabilities and add the concrete
  metal-supertonic / metal-supertonic-development child entries.

The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX
Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:19:03 +02:00
LocalAI [bot]
482314c623 fix(realtime): resolve model aliases for pipeline sub-models (#10484)
Realtime pipeline sub-models (llm/transcription/tts/vad/sound-detection)
were loaded via cl.LoadModelConfigFileByName without alias resolution,
unlike top-level API requests which resolve aliases in
core/http/middleware/request.go. So a pipeline that references an alias
(e.g. `pipeline.llm: default`, where `default` is an alias for a real
LLM) reached model loading as the alias stub with an empty Backend.

This was silently broken on a single host (it failed downstream) and was a
hard error in distributed/p2p mode:

    routing model : loading model default: ... installing backend on
    node X: backend name is empty

Fix by routing every pipeline sub-model load through a small helper that
follows a single alias hop (mirroring the top-level resolution), so
non-alias sub-models behave identically and aliased ones get the
target's full config (Backend, Model, ...).
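
A single-hop resolution of this kind might look like the sketch below. The
ModelConfig fields, the registry type and the model names are placeholders for
illustration and do not reflect the real config loader's API.

```go
package main

import "fmt"

// ModelConfig is a stand-in for a loaded model configuration.
type ModelConfig struct {
	Name    string
	Alias   string // non-empty when this entry is only an alias stub
	Backend string
	Model   string
}

type registry map[string]ModelConfig

// resolve loads a config by name and follows at most one alias hop, so
// non-alias entries are returned unchanged.
func (r registry) resolve(name string) (ModelConfig, error) {
	cfg, ok := r[name]
	if !ok {
		return ModelConfig{}, fmt.Errorf("model %q not found", name)
	}
	if cfg.Alias == "" {
		return cfg, nil
	}
	target, ok := r[cfg.Alias]
	if !ok {
		return ModelConfig{}, fmt.Errorf("alias %q points at unknown model %q", name, cfg.Alias)
	}
	return target, nil
}

func main() {
	r := registry{
		"default": {Name: "default", Alias: "my-llm"},
		"my-llm":  {Name: "my-llm", Backend: "llama-cpp", Model: "my-llm.gguf"},
	}
	cfg, err := r.resolve("default")
	fmt.Println(cfg.Backend, err)
}
```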

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 21:50:44 +02:00
Dedy F. Setyawan
e8ae88a2a0 i18n(id): update and complete Indonesian translations (#10480)
- translate remaining English strings in chat, common, home, and media locales.
- fix typo and improve wording consistency (e.g., klaster -> kluster, otomasi -> automasi).

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
2026-06-24 18:35:21 +02:00
Richard Palethorpe
e1994579f8 fix(pii): load default detectors at startup + add LOCALAI_PII_DEFAULT_DETECTORS (#10474)
pii_default_detectors was applied to the live config only by a live
POST /api/settings (ApplyRuntimeSettings) — neither the startup loader nor
the config file watcher read it back. So after a restart the persisted
default detectors were dropped, and the cloud-proxy MITM listener (which
resolves each intercept host's detectors once at start via ResolvePIIPolicy)
came up with an empty set and forwarded intercepted traffic unredacted, even
though the MITM model had pii.enabled:true and the defaults were on disk.
Request-side default redaction broke the same way.

- startup.go: loadRuntimeSettingsFromFile now applies pii_default_detectors,
  before startMITMIfConfigured, with env > file precedence.
- config_file_watcher.go: apply pii_default_detectors on live file edits,
  matching the existing env-guard pattern used for the other fields.
- settings endpoint: rebuild the MITM listener when pii_default_detectors
  changes (its per-host detector map is frozen at listener start), not only
  on a mitm_listen change — so toggling a default detector takes effect on
  cloud-proxy traffic immediately.
- new LOCALAI_PII_DEFAULT_DETECTORS env var / CLI flag (WithPIIDefaultDetectors)
  so the default detector set can be pinned at boot for immutable deployments.
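
The env > file precedence can be pictured with a small sketch. It assumes the
variable holds a comma-separated list and that a set-but-empty variable means
an empty detector set; both are assumptions, and effectiveDetectors and the
detector name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// effectiveDetectors returns the env-pinned detector list when the variable
// is set, and the persisted file value otherwise.
func effectiveDetectors(envValue string, envSet bool, fromFile []string) []string {
	if !envSet {
		return fromFile
	}
	var out []string
	for _, d := range strings.Split(envValue, ",") {
		if d = strings.TrimSpace(d); d != "" {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	env, set := os.LookupEnv("LOCALAI_PII_DEFAULT_DETECTORS")
	fmt.Println(effectiveDetectors(env, set, []string{"detector-from-file"}))
}
```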

Assisted-by: Claude:claude-opus-4-8 Claude-Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-06-24 11:08:57 +02:00
LocalAI [bot]
e5620989dd refactor(distributed): make in-flight tracking coverage a compile-time contract (#10476)
PR #10475 fixed SoundDetection in-flight tracking, but the underlying trap
remains: InFlightTrackingClient embedded the whole grpc.Backend interface
"for passthrough of untracked methods", so any newly added inference method
is silently satisfied by the embedded passthrough and never wrapped with
track(). That leaves onFirstComplete unfired and in-flight stuck at 1 - the
exact SoundDetection bug, waiting to recur for the next backend method.

Close the gap at the type level instead of relying on reviewers to remember:

- Split grpc.Backend into two composed sub-interfaces: InferenceBackend
  (methods that are one discrete inference call and must be tracked) and
  ControlBackend (control-plane calls plus the streaming constructors whose
  work spans the returned stream, safe to pass through). The classification
  now lives next to the interface it documents.
- InFlightTrackingClient embeds only grpc.ControlBackend and implements every
  InferenceBackend method explicitly, delegating to an inner InferenceBackend.
  A `var _ grpc.Backend = (*InFlightTrackingClient)(nil)` assertion makes the
  package fail to compile if any inference method is left unwrapped.

Now adding a method to InferenceBackend without a matching wrapper is a build
error (at the assertion and every call site: "does not implement grpc.Backend
(missing method X)"), not a silent runtime leak - and the obvious fix is to copy
a neighbouring wrapper, which calls track(). No runtime guard or reviewer
vigilance required.

Pure refactor: the composed Backend interface is identical to the old flat
one, so all implementers and consumers are unaffected (verified with a full
`go build ./...`). Behaviour is unchanged; the existing nodes suite passes.
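
The pattern can be reduced to a toy example. The interface names follow the
description above, but the method signatures, trackingClient and fakeBackend
are simplified stand-ins, not the real grpc package. Removing either wrapper
method makes the `var _ Backend` line fail to compile.

```go
package main

import "fmt"

// InferenceBackend: discrete inference calls that must be tracked.
type InferenceBackend interface {
	Predict(prompt string) string
	SoundDetection(audio string) string
}

// ControlBackend: control-plane calls that are safe to pass through.
type ControlBackend interface {
	Health() bool
}

// Backend is the composed interface seen by callers.
type Backend interface {
	InferenceBackend
	ControlBackend
}

// trackingClient embeds only ControlBackend, so inference methods are never
// satisfied implicitly and each one needs an explicit wrapper.
type trackingClient struct {
	ControlBackend
	inner   InferenceBackend
	tracked int
}

func (c *trackingClient) track(call func() string) string {
	c.tracked++
	return call()
}

func (c *trackingClient) Predict(prompt string) string {
	return c.track(func() string { return c.inner.Predict(prompt) })
}

func (c *trackingClient) SoundDetection(audio string) string {
	return c.track(func() string { return c.inner.SoundDetection(audio) })
}

// Compile-time contract: fails to build if a wrapper is missing.
var _ Backend = (*trackingClient)(nil)

type fakeBackend struct{}

func (fakeBackend) Predict(prompt string) string       { return "reply:" + prompt }
func (fakeBackend) SoundDetection(audio string) string { return "events:" + audio }
func (fakeBackend) Health() bool                       { return true }

func main() {
	c := &trackingClient{ControlBackend: fakeBackend{}, inner: fakeBackend{}}
	c.Predict("hi")
	c.SoundDetection("clip")
	fmt.Println(c.tracked, c.Health())
}
```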


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 11:08:29 +02:00
LocalAI [bot]
fc618dcee6 fix(distributed): track in-flight for SoundDetection requests (#10475)
The distributed router wraps backend clients in InFlightTrackingClient so
the eviction logic knows which replicas are actively serving. Every
inference method must be wrapped: track() increments in-flight on entry
and decrements (plus fires onFirstComplete, which releases the load-time
reservation) on return.

SoundDetection was added after the tracking client and never got a
wrapper, so its calls fell through to the embedded passthrough Backend.
The increment/decrement never ran and, critically, onFirstComplete never
fired, so the reservation set at model load was never released - leaving
in-flight stuck at 1 and the replica permanently ineligible for eviction.
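
To make the failure mode concrete, here is a simplified model of the counter.
The tracker type is hypothetical, and modelling the load-time reservation as
an initial in-flight count of 1 that onFirstComplete releases is an assumption
drawn from the description above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tracker models in-flight accounting for one replica.
type tracker struct {
	inFlight        atomic.Int64
	once            sync.Once
	onFirstComplete func()
}

// newTracker starts with the load-time reservation held (in-flight = 1).
func newTracker() *tracker {
	t := &tracker{}
	t.inFlight.Store(1)
	t.onFirstComplete = func() { t.inFlight.Add(-1) } // release reservation
	return t
}

// track increments on entry, decrements on return, and fires
// onFirstComplete exactly once after the first completed call.
func (t *tracker) track(call func() error) error {
	t.inFlight.Add(1)
	defer func() {
		t.inFlight.Add(-1)
		t.once.Do(t.onFirstComplete)
	}()
	return call()
}

// InFlight returns the current count.
func (t *tracker) InFlight() int64 { return t.inFlight.Load() }

func main() {
	tr := newTracker()
	fmt.Println("after load:", tr.InFlight())
	_ = tr.track(func() error { return nil })
	fmt.Println("after first call:", tr.InFlight())
}
```

A method that bypasses track() never reaches the deferred block, so in this
model the count would stay at 1 indefinitely.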

Wrap SoundDetection like the other non-LLM methods and cover it in the
"non-LLM inference methods track in-flight" table test.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 10:13:37 +02:00
LocalAI [bot]
e6042080c0 fix(agents): URL-decode collection/agent name path params (#10443) (#10471)
fix(agents): URL-decode collection/agent name path params

Collection and agent names carry a "legacy-api-key:" prefix, so the ':'
arrives percent-encoded as %3A in the request path. Echo routes such
paths via URL.RawPath and stores the matched path-param value still
escaped, so c.Param("name") returned "legacy-api-key%3ALiteraryResearch"
and the store lookup 404'd ("collection not found").

This was second-order fallout of #10375/#10387: once colons became valid
in names, the URL-decode gap surfaced on every name-bearing endpoint.

Add a decodedParam helper that url.PathUnescape's the param (falling back
to the raw value on invalid encoding) and wire it into all collection
endpoints and the agent :name endpoints, which share the identical
prefix. The entry endpoints already unescaped c.Param("*"); this closes
the same gap for :name.
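
A dependency-free sketch of the helper's core. The real helper reads the value
from the Echo context; this version takes the raw string directly, which is a
simplification for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodedParam percent-decodes a path parameter and falls back to the raw
// value when the encoding is invalid.
func decodedParam(raw string) string {
	if decoded, err := url.PathUnescape(raw); err == nil {
		return decoded
	}
	return raw
}

func main() {
	fmt.Println(decodedParam("legacy-api-key%3ALiteraryResearch"))
	fmt.Println(decodedParam("bad%zz"))
}
```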

Fixes #10443


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 09:42:09 +02:00
LocalAI [bot]
0f3b24436d chore: ⬆️ Update mudler/parakeet.cpp to 89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a (#10468)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:43 +02:00
LocalAI [bot]
4b6f911835 chore: ⬆️ Update ggml-org/whisper.cpp to 43d78af5be58f41d6ffbc227d608f104577741ea (#10466)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:14 +02:00
LocalAI [bot]
a5e28942a6 chore: ⬆️ Update ggml-org/llama.cpp to be4a6a63eb2b848e19c277bdcf2bd399e8af76d9 (#10467)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:54 +02:00
LocalAI [bot]
dba9cd7ca4 chore: ⬆️ Update CrispStrobe/CrispASR to 96b2a6ee31d30389fed8a7ef1a54239b75231ddc (#10465)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:34 +02:00
LocalAI [bot]
c93190de50 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 7ccf1d209588962b96eacca325b37e9b3e8faf5e (#10456)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:13 +02:00
LocalAI [bot]
4dbf69f889 chore(model gallery): 🤖 add 1 new models via gallery agent (#10472)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 00:00:26 +02:00
LocalAI [bot]
deb430f3ec chore(model-gallery): ⬆️ update checksum (#10469)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 23:15:47 +02:00
LocalAI [bot]
dd8c8778e2 chore(model gallery): 🤖 add 1 new models via gallery agent (#10464)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 15:43:21 +02:00
LocalAI [bot]
06a7b6cadb chore: ⬆️ Update leejet/stable-diffusion.cpp to f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f (#10457)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:29:07 +02:00
LocalAI [bot]
67c8889866 chore: ⬆️ Update CrispStrobe/CrispASR to 63b57289255267edf66e43e33bc3911e04a2e92d (#10455)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:49 +02:00
LocalAI [bot]
1d49041c85 chore: ⬆️ Update ggml-org/llama.cpp to 73618f27a801c0b8614ceaf3547d3c2a99baae14 (#10458)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:09 +02:00
LocalAI [bot]
2edc4e25b3 chore: ⬆️ Update ggml-org/whisper.cpp to bae6bc02b1940bbfb87b6a0299c565e563b916d1 (#10459)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:27:51 +02:00
Richard Palethorpe
7888067914 fix(settings): merge partial /api/settings updates instead of overwriting (#10463)
POST /api/settings rebuilt runtime_settings.json from only the request
body, so a focused admin page that submits a single field wiped every
other persisted setting. The Middleware proxy tab (mitm_listen) and
detector table (pii_default_detectors), plus the MCP SetBranding tool
(instance_name/instance_tagline), all POST partial bodies; the
no-omitempty api_keys and pii_default_detectors fields even round-tripped
as JSON null.

Read the persisted settings and overlay only the fields the request set
(RuntimeSettings.MergeNonNil) before writing. Every field is a pointer, so
the reflection-based merge is total over the struct and any field added
later is preserved automatically. Absent or null fields are now kept;
clearing a setting is done by sending its explicit empty/zero value
(api_keys [], mitm_listen "", etc.), unchanged from before. The full
Settings page sends every field, so its Save behaves identically.
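
A rough sketch of the overlay semantics described above, under stated assumptions: the settings struct is a trimmed-down, hypothetical stand-in for RuntimeSettings, and mergeNonNil is an illustrative free function, not the actual RuntimeSettings.MergeNonNil method. It only shows why all-pointer fields make a reflection-based merge total over the struct.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// settings is a hypothetical subset of the persisted runtime settings.
// Every field is a pointer, so "absent" (nil) differs from "explicitly empty".
type settings struct {
	InstanceName *string   `json:"instance_name"`
	MitmListen   *string   `json:"mitm_listen"`
	APIKeys      *[]string `json:"api_keys"`
}

// mergeNonNil copies every non-nil pointer field of src onto dst.
// Fields added to the struct later are covered without further changes.
func mergeNonNil(dst, src *settings) {
	d := reflect.ValueOf(dst).Elem()
	s := reflect.ValueOf(src).Elem()
	for i := 0; i < s.NumField(); i++ {
		if f := s.Field(i); f.Kind() == reflect.Ptr && !f.IsNil() {
			d.Field(i).Set(f)
		}
	}
}

func main() {
	name, listen := "my-instance", ":8080"
	persisted := settings{InstanceName: &name, MitmListen: &listen}

	// A partial body: mitm_listen is cleared explicitly, api_keys is null,
	// and instance_name is absent.
	var patch settings
	body := []byte(`{"mitm_listen": "", "api_keys": null}`)
	if err := json.Unmarshal(body, &patch); err != nil {
		panic(err)
	}

	mergeNonNil(&persisted, &patch)
	// instance_name is kept, mitm_listen is now empty, api_keys stays nil.
	fmt.Printf("%q %q %v\n", *persisted.InstanceName, *persisted.MitmListen, persisted.APIKeys == nil)
}
```

In this sketch a JSON null decodes to a nil pointer and is therefore skipped, which matches the "absent or null fields are now kept" behavior described in the commit message.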

Assisted-by: Claude:claude-opus-4-8 Claude-Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-23 13:27:34 +02:00
LocalAI [bot]
9eedbf537a chore(model gallery): 🤖 add 1 new models via gallery agent (#10461)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 08:04:46 +02:00
LocalAI [bot]
69c16481c8 fix(test): update e2e UpdateProgress calls for new cancellable arg (#10460)
PR #10454 added a `cancellable bool` parameter to GalleryStore.UpdateProgress
but missed two callers under tests/e2e/distributed, breaking the build on
master (golangci-lint and tests-e2e-backend both failed to compile with
"not enough arguments in call to ... UpdateProgress").

Pass cancellable=true (both ops are downloading installs, which are
cancellable) and assert the flag is persisted, exercising the new behavior.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 23:45:22 +02:00
LocalAI [bot]
56f8a6623f fix(galleryop): persist cancellable so restarted in-flight ops stay cancellable (#10454)
In distributed mode a model/backend install marks OpStatus.Cancellable=true
while downloading, but the gallery_operations row never recorded it:
UpdateStatus persisted only progress/status and Create left the cancellable
column at its zero value. After a replica restart Hydrate rebuilt the op with
cancellable=false, /api/operations reported false, and the UI hid the cancel
button - the orphaned op then lingered until the 30-minute stale reaper
expired it ("stays there on restart, can't cancel, after a bit it expires").

Persist the flag on every progress tick and at row creation (installs are
cancellable, deletes are not), and clear it on terminal transitions. A
rehydrated in-flight op is now cancellable, so an admin can dismiss the
orphaned op immediately instead of waiting out the reaper. The functional
cancel path already survived restart (CancelOperation persists store.Cancel
even with no live CancelFunc); this restores the UI affordance that drives it.
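
The persistence rule above can be illustrated with a small in-memory model. Everything here is a hypothetical sketch: opRow, store, and the method signatures are invented for illustration and do not reflect the real GalleryStore API or the gallery_operations schema.

```go
package main

import "fmt"

// opRow is a hypothetical stand-in for one persisted operation row.
type opRow struct {
	Progress    float64
	Status      string
	Cancellable bool
}

// store is an in-memory stand-in for the database table.
type store struct{ rows map[string]opRow }

func newStore() *store { return &store{rows: map[string]opRow{}} }

// Create records the flag at row creation: installs are cancellable,
// deletes are not.
func (s *store) Create(id string, isInstall bool) {
	s.rows[id] = opRow{Status: "queued", Cancellable: isInstall}
}

// UpdateProgress persists the flag on every progress tick.
func (s *store) UpdateProgress(id string, progress float64, cancellable bool) {
	r := s.rows[id]
	r.Progress, r.Status, r.Cancellable = progress, "downloading", cancellable
	s.rows[id] = r
}

// Finish clears the flag on a terminal transition.
func (s *store) Finish(id, status string) {
	r := s.rows[id]
	r.Status, r.Cancellable = status, false
	s.rows[id] = r
}

// Hydrate rebuilds the operation view from the persisted row, as a
// restarted replica would.
func (s *store) Hydrate(id string) opRow { return s.rows[id] }

func main() {
	s := newStore()
	s.Create("op-1", true)
	s.UpdateProgress("op-1", 0.4, true)
	// After a simulated restart the in-flight op is still cancellable.
	fmt.Println(s.Hydrate("op-1").Cancellable)
	s.Finish("op-1", "completed")
	// A finished op is no longer cancellable.
	fmt.Println(s.Hydrate("op-1").Cancellable)
}
```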


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 22:41:16 +02:00
Ettore Di Giacinto
4755d676a3 Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar)" (#10453)
Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top…"

This reverts commit 9d54a599b0.
2026-06-22 21:59:05 +02:00
146 changed files with 3010 additions and 1029 deletions

@@ -4974,6 +4974,12 @@ includeDarwin:
- backend: "kitten-tts"
tag-suffix: "-metal-darwin-arm64-kitten-tts"
build-type: "mps"
- backend: "trl"
tag-suffix: "-metal-darwin-arm64-trl"
build-type: "mps"
- backend: "liquid-audio"
tag-suffix: "-metal-darwin-arm64-liquid-audio"
build-type: "mps"
- backend: "piper"
tag-suffix: "-metal-darwin-arm64-piper"
build-type: "metal"
@@ -4990,6 +4996,10 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-sherpa-onnx"
build-type: "metal"
lang: "go"
- backend: "supertonic"
tag-suffix: "-metal-darwin-arm64-supertonic"
build-type: "metal"
lang: "go"
- backend: "local-store"
tag-suffix: "-metal-darwin-arm64-local-store"
build-type: "metal"

@@ -1,5 +1,5 @@
-IK_LLAMA_VERSION?=6c00e87ac84404af588ad2e65935bd6f079c696f
+IK_LLAMA_VERSION?=d5507e33ae7ee2b7b41475f08044d3bde3b839ee
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

@@ -1,5 +1,5 @@
-LLAMA_VERSION?=7c082bc417bbe53210a83df4ba5b49e18ce6193c
+LLAMA_VERSION?=8be759e6f70d629638a7eb70db3824cbdcea370b
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

@@ -37,6 +37,7 @@
#include "backend.pb.h"
#include "backend.grpc.pb.h"
#include "common.h"
#include "arg.h"
#include "chat-auto-parser.h"
#include <getopt.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>
@@ -592,6 +593,10 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.checkpoint_min_step = 256;
#endif
// Raw upstream llama-server flags collected from any option entry that
// starts with '-'. Applied once after the loop via common_params_parse.
std::vector<std::string> extra_argv;
// Decode options. Each option has the form optname:optval, or just optname for booleans.
for (int i = 0; i < request->options_size(); i++) {
std::string opt = request->options(i);
@@ -1080,6 +1085,31 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
} catch (...) {}
}
// --- main model MoE on CPU (upstream --cpu-moe / --n-cpu-moe) ---
} else if (!strcmp(optname, "cpu_moe")) {
// Bool-style flag: keep all MoE expert weights on CPU.
const bool enable = (optval == NULL) ||
optval_str == "true" || optval_str == "1" || optval_str == "yes" ||
optval_str == "on" || optval_str == "enabled";
if (enable) {
params.tensor_buft_overrides.push_back(llm_ffn_exps_cpu_override());
}
} else if (!strcmp(optname, "n_cpu_moe")) {
if (optval != NULL) {
try {
int n = std::stoi(optval_str);
if (n < 0) n = 0;
// Keep override-name storage alive for the lifetime of the
// params struct (mirrors upstream arg.cpp's function-local static).
static std::list<std::string> buft_overrides_main;
for (int i = 0; i < n; ++i) {
buft_overrides_main.push_back(llm_ffn_exps_block_regex(i));
params.tensor_buft_overrides.push_back(
{buft_overrides_main.back().c_str(), ggml_backend_cpu_buffer_type()});
}
} catch (...) {}
}
// --- draft model tensor buffer overrides (upstream --spec-draft-override-tensor) ---
} else if (!strcmp(optname, "draft_override_tensor") || !strcmp(optname, "spec_draft_override_tensor")) {
// Format: <tensor regex>=<buffer type>,<tensor regex>=<buffer type>,...
@@ -1111,6 +1141,30 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
else { cur.push_back(c); }
}
if (!cur.empty()) flush(cur);
// --- generic passthrough: any entry starting with '-' is a raw
// upstream llama-server flag, forwarded verbatim to the parser. ---
} else if (optname[0] == '-') {
std::string flag = optname;
// These flags make upstream's parser exit() (printing usage /
// completion), which would kill the backend process. Skip them.
if (flag == "-h" || flag == "--help" || flag == "--usage" ||
flag == "--version" || flag == "--license" ||
flag == "--list-devices" || flag == "-cl" ||
flag == "--cache-list" ||
flag.rfind("--completion", 0) == 0) {
fprintf(stderr,
"[llama-cpp] ignoring passthrough flag that would exit: %s\n",
flag.c_str());
} else {
extra_argv.push_back(flag);
// Preserve the whole value after the first ':' so embedded
// colons (e.g. host:port) survive strtok's truncation of optval.
auto colon = opt.find(':');
if (colon != std::string::npos) {
extra_argv.push_back(opt.substr(colon + 1));
}
}
}
}
@@ -1146,27 +1200,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
}
}
if (!params.kv_overrides.empty()) {
params.kv_overrides.emplace_back();
params.kv_overrides.back().key[0] = 0;
}
// tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
// Real entries are pushed during option parsing; here we pad/terminate so the
// model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
// and so llama_params_fit has the placeholder slots it requires.
{
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
// Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
// the main-model handling above.
if (!params.speculative.draft.tensor_buft_overrides.empty()) {
params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
}
// TODO: Add yarn
if (!request->tensorsplit().empty()) {
@@ -1259,6 +1292,69 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.sampling.grammar_triggers.push_back(std::move(trigger));
}
}
// Apply any raw upstream flags last so an explicit passthrough flag wins
// over the LocalAI-resolved field it maps to (e.g. --ctx-size beats
// context_size). This is the same parser llama-server itself uses.
if (!extra_argv.empty()) {
// common_params_parser_init resets a few fields for the SERVER example
// (n_parallel -> -1, use_color). Snapshot n_parallel so an unrelated
// passthrough flag can't silently clobber LocalAI's resolved value.
const int saved_n_parallel = params.n_parallel;
std::vector<char *> argv;
std::string prog = "llama-server";
argv.push_back(prog.data());
for (auto & a : extra_argv) {
argv.push_back(a.data());
}
// ctx_arg.params is a reference, so this overlays the given flags onto
// `params` in place. Returns false on a recoverable parse error (and
// self-restores params); may exit() on a hard error, exactly as
// passing the same bad flag to llama-server would.
if (!common_params_parse((int)argv.size(), argv.data(), params,
LLAMA_EXAMPLE_SERVER)) {
fprintf(stderr,
"[llama-cpp] failed to parse passthrough options; ignoring them\n");
}
// Restore n_parallel unless a passthrough flag explicitly set it
// (parser_init's reset sentinel for SERVER is -1).
if (params.n_parallel == -1) {
params.n_parallel = saved_n_parallel;
}
}
// Terminate/pad the override vectors only after BOTH the named-option loop
// and the generic passthrough (common_params_parse above) have pushed their
// real entries, so back() is the null sentinel the model loader asserts on.
// Running these before the passthrough let a passthrough flag (--cpu-moe,
// --override-tensor, --override-kv, ...) append a real entry after the
// sentinel: a GGML_ASSERT crash for tensor_buft_overrides, a silent drop for
// kv_overrides. Double-termination is harmless (the while is a no-op if the
// passthrough parse already padded; an extra trailing null is ignored).
if (!params.kv_overrides.empty()) {
params.kv_overrides.emplace_back();
params.kv_overrides.back().key[0] = 0;
}
// tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
// Real entries are pushed during option parsing; here we pad/terminate so the
// model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
// and so llama_params_fit has the placeholder slots it requires.
{
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
// Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
// the main-model handling above.
if (!params.speculative.draft.tensor_buft_overrides.empty()) {
params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}

@@ -117,7 +117,8 @@ libgoacestepcpp-custom: CMakeLists.txt cpp/goacestepcpp.cpp cpp/goacestepcpp.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target goacestepcpp && \
cd .. && \
mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgoacestepcpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: acestep-cpp
@echo "Running acestep-cpp tests..."

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,7 +23,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("ACESTEP_LIBRARY")
if libName == "" {
libName = "./libgoacestepcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgoacestepcpp-fallback.dylib"
} else {
libName = "./libgoacestepcpp-fallback.so"
}
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -13,6 +13,7 @@ mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/acestep-cpp $CURDIR/package/
cp -fv $CURDIR/libgoacestepcpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgoacestepcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

@@ -12,9 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single library variant (Metal or Accelerate). The goacestepcpp
# target is built as a CMake MODULE, which emits a .dylib for a SHARED
# build but a .so for a MODULE build on Apple, so prefer .dylib and fall
# back to .so.
LIBRARY="$CURDIR/libgoacestepcpp-fallback.dylib"
if [ ! -e "$LIBRARY" ]; then
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
fi
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgoacestepcpp-avx.so ]; then
@@ -36,9 +46,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ACESTEP_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

@@ -57,6 +57,7 @@ libced.so: sources/ced.cpp
cmake -B sources/ced.cpp/build-shared -S sources/ced.cpp $(CMAKE_ARGS)
cmake --build sources/ced.cpp/build-shared --config Release -j$(JOBS)
cp -fv sources/ced.cpp/build-shared/libced.so* ./ 2>/dev/null || true
cp -fv sources/ced.cpp/build-shared/libced.dylib ./ 2>/dev/null || true
cp -fv sources/ced.cpp/include/ced_capi.h ./
ced-grpc: libced.so main.go goced.go

@@ -12,6 +12,7 @@ import (
"flag"
"fmt"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ type libFunc struct {
func main() {
libName := os.Getenv("CED_LIBRARY")
if libName == "" {
libName = "libced.so"
if runtime.GOOS == "darwin" {
libName = "libced.dylib"
} else {
libName = "libced.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {

@@ -15,10 +15,12 @@ mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/ced-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libced.so not found in $CURDIR, run 'make' first" >&2
cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || true
cp -avf "$CURDIR"/libced.dylib "$CURDIR/package/lib/" 2>/dev/null || true
if ! ls "$CURDIR"/package/lib/libced.* >/dev/null 2>&1; then
echo "ERROR: libced shared library not found in $CURDIR, run 'make' first" >&2
exit 1
}
fi
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."

@@ -3,7 +3,12 @@ set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
export CED_LIBRARY="$CURDIR/lib/libced.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
fi
# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the sibling backends).

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# CrispASR version (release tag)
CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=7a8cb80907341c0204bd0488c1244764f4163883
+CRISPASR_VERSION?=96b2a6ee31d30389fed8a7ef1a54239b75231ddc
SO_TARGET?=libgocrispasr.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -75,7 +75,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgocrispasr-avx.so libgocrispasr-avx2.so libgocrispasr-avx512.so libgocrispasr-fallback.so
else
VARIANT_TARGETS = libgocrispasr-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgocrispasr-fallback.dylib
endif
crispasr: main.go gocrispasr.go $(VARIANT_TARGETS)
@@ -87,7 +88,7 @@ package: crispasr
build: package
clean: purge
rm -rf libgocrispasr*.so package sources/CrispASR crispasr
rm -rf libgocrispasr*.so libgocrispasr*.dylib package sources/CrispASR crispasr
purge:
rm -rf build*
@@ -118,13 +119,21 @@ libgocrispasr-fallback.so: sources/CrispASR
SO_TARGET=libgocrispasr-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
rm -rfv build*
# Build fallback variant as a dylib (Darwin)
libgocrispasr-fallback.dylib: sources/CrispASR
$(MAKE) purge
$(info ${GREEN}I crispasr build info:fallback (dylib)${RESET})
SO_TARGET=libgocrispasr-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
rm -rfv build*
libgocrispasr-custom: CMakeLists.txt cpp/crispasr_shim.cpp cpp/crispasr_shim.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgocrispasr.dylib ./$(SO_TARGET) 2>/dev/null)
test: crispasr
CGO_ENABLED=0 $(GOCMD) test -v ./...

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("CRISPASR_LIBRARY")
if libName == "" {
libName = "./libgocrispasr-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgocrispasr-fallback.dylib"
} else {
libName = "./libgocrispasr-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/crispasr $CURDIR/package/
cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/
cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgocrispasr-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgocrispasr-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgocrispasr-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgocrispasr-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgocrispasr-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgocrispasr-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export CRISPASR_LIBRARY=$LIBRARY
# Point piper's espeak-ng phonemizer at the bundled voice data. The variable

@@ -77,7 +77,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libdepthanythingcpp-fallback.so
VARIANT_TARGETS = libdepthanythingcpp-fallback.dylib
endif
depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS)
@@ -89,7 +89,7 @@ package: depth-anything-cpp
build: package
clean: purge
rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources
rm -rf libdepthanythingcpp*.so libdepthanythingcpp*.dylib depth-anything-cpp package sources
purge:
rm -rf build*
@@ -116,11 +116,19 @@ libdepthanythingcpp-avx512.so: sources/depth-anything.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
libdepthanythingcpp-fallback.dylib: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
else
libdepthanythingcpp-fallback.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
endif
libdepthanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -128,7 +136,8 @@ libdepthanythingcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libdepthanything.dylib ./$(SO_TARGET) 2>/dev/null)
all: depth-anything-cpp package

@@ -9,6 +9,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("DEPTHANYTHING_LIBRARY")
if libName == "" {
libName = "./libdepthanythingcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libdepthanythingcpp-fallback.dylib"
} else {
libName = "./libdepthanythingcpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/
cp -fv $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libdepthanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export DEPTHANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

@@ -67,8 +67,9 @@ $(LIB_SENTINEL): sources/LocalVQE
# that the loader picks at runtime. We must build every target — the
# default `--target localvqe_shared` drops these. CMAKE_LIBRARY_OUTPUT_DIRECTORY
# routes all of them into build/bin; copy them out next to the binary.
cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.so* .
cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/bin/liblocalvqe.dylib . 2>/dev/null || cp -P build/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.dylib .
cp -P build/bin/libggml*.so* . 2>/dev/null || true
cp -P build/bin/libggml*.dylib . 2>/dev/null || true
touch $(LIB_SENTINEL)
liblocalvqe.so: $(LIB_SENTINEL)

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("LOCALVQE_LIBRARY")
if libName == "" {
libName = "./liblocalvqe.so"
if runtime.GOOS == "darwin" {
libName = "./liblocalvqe.dylib"
} else {
libName = "./liblocalvqe.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -15,7 +15,9 @@ cp -avf $CURDIR/localvqe $CURDIR/package/
# liblocalvqe.so* (with SOVERSION symlinks) and the libggml-*.so runtime
# variants — LocalVQE picks the matching CPU variant at load time.
cp -P $CURDIR/liblocalvqe.so* $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/liblocalvqe.dylib $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/libggml*.so* $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/libggml*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

@@ -10,8 +10,19 @@ CURDIR=$(dirname "$(realpath $0)")
# exec'ing the binary.
cd "$CURDIR"
export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
if [ "$(uname)" = "Darwin" ]; then
# macOS: LocalVQE is built as a SHARED library, so dyld needs the .dylib +
# DYLD_LIBRARY_PATH. Prefer .dylib and fall back to .so just in case.
export DYLD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$DYLD_LIBRARY_PATH
LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.dylib
if [ ! -e "$LOCALVQE_LIBRARY" ]; then
LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
fi
export LOCALVQE_LIBRARY
else
export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
fi
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"

@@ -70,7 +70,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = liblocateanythingcpp-avx.so liblocateanythingcpp-avx2.so liblocateanythingcpp-avx512.so liblocateanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = liblocateanythingcpp-fallback.so
VARIANT_TARGETS = liblocateanythingcpp-fallback.dylib
endif
locate-anything-cpp: main.go golocateanythingcpp.go $(VARIANT_TARGETS)
@@ -82,7 +82,7 @@ package: locate-anything-cpp
build: package
clean: purge
rm -rf liblocateanythingcpp*.so locate-anything-cpp package sources
rm -rf liblocateanythingcpp*.so liblocateanythingcpp*.dylib locate-anything-cpp package sources
purge:
rm -rf build*
@@ -109,11 +109,19 @@ liblocateanythingcpp-avx512.so: sources/locate-anything.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
liblocateanythingcpp-fallback.dylib: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
else
liblocateanythingcpp-fallback.so: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
endif
liblocateanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -121,7 +129,8 @@ liblocateanythingcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/liblocateanythingcpp.dylib ./$(SO_TARGET) 2>/dev/null)
all: locate-anything-cpp package

@@ -9,6 +9,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("LOCATEANYTHING_LIBRARY")
if libName == "" {
libName = "./liblocateanythingcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./liblocateanythingcpp-fallback.dylib"
} else {
libName = "./liblocateanythingcpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/
cp -fv $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/liblocateanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/locate-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LOCATEANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
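With the removed and added lines interleaved, the resulting control flow of this launcher is hard to read: on Darwin it picks the single `.dylib` and exports `DYLD_LIBRARY_PATH`, otherwise it starts from the `.so` fallback, probes CPU flags, and exports `LD_LIBRARY_PATH`. The sketch below shows only the library selection, under stated assumptions. The function name `select_library` is hypothetical, inputs are passed as arguments instead of being read from `uname` and `/proc/cpuinfo`, and because the hunk elides the avx2 and avx512 checks, the variant/flag pairs and the "later match wins" order are assumptions.

```shell
#!/bin/sh
# Sketch of the selection order after this change (hypothetical helper).
# $1 = OS name as printed by uname, $2 = directory holding the libraries,
# $3 = space-separated CPU flags
select_library() {
    os=$1
    dir=$2
    flags=$3
    if [ "$os" = "Darwin" ]; then
        # macOS ships a single dylib variant; no CPU probing.
        echo "$dir/liblocateanythingcpp-fallback.dylib"
        return 0
    fi
    lib="$dir/liblocateanythingcpp-fallback.so"
    # Assumed variant:flag pairs; a later match overrides an earlier one.
    for pair in avx:avx avx2:avx2 avx512:avx512f; do
        variant=${pair%%:*}
        flag=${pair##*:}
        case " $flags " in
            *" $flag "*)
                # Only switch if the variant was actually packaged.
                if [ -e "$dir/liblocateanythingcpp-$variant.so" ]; then
                    lib="$dir/liblocateanythingcpp-$variant.so"
                fi
                ;;
        esac
    done
    echo "$lib"
}
```

Keeping the probe behind the non-Darwin branch matters because `/proc/cpuinfo` does not exist on macOS.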

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# omnivoice.cpp version
OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp
OMNIVOICE_VERSION?=96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd
OMNIVOICE_VERSION?=0f37401bebe9b20c0160a888e592108fc1d17607
SO_TARGET?=libgomnivoicecpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,7 +65,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so
else
VARIANT_TARGETS = libgomnivoicecpp-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgomnivoicecpp-fallback.dylib
endif
omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS)
@@ -77,7 +78,7 @@ package: omnivoice-cpp
build: package
clean: purge
rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp
rm -rf libgomnivoicecpp*.so libgomnivoicecpp*.dylib package sources/omnivoice.cpp omnivoice-cpp
purge:
rm -rf build*
@@ -106,13 +107,20 @@ libgomnivoicecpp-fallback.so: sources/omnivoice.cpp
SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.so
# Build fallback variant as a dylib (Darwin)
libgomnivoicecpp-fallback.dylib: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgomnivoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.dylib
libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \
cd .. && \
mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgomnivoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: omnivoice-cpp
@echo "Running omnivoice-cpp tests..."

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("OMNIVOICE_LIBRARY")
if libName == "" {
libName = "./libgomnivoicecpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgomnivoicecpp-fallback.dylib"
} else {
libName = "./libgomnivoicecpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgomnivoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export OMNIVOICE_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
# (.github/bump_deps.sh) can find and update it - matches the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go
@@ -74,6 +74,7 @@ libparakeet.so: sources/parakeet.cpp
cmake -B sources/parakeet.cpp/build-shared -S sources/parakeet.cpp $(CMAKE_ARGS)
cmake --build sources/parakeet.cpp/build-shared --config Release -j$(JOBS)
cp -fv sources/parakeet.cpp/build-shared/libparakeet.so* ./ 2>/dev/null || true
cp -fv sources/parakeet.cpp/build-shared/libparakeet.dylib ./ 2>/dev/null || true
cp -fv sources/parakeet.cpp/include/parakeet_capi.h ./
parakeet-cpp-grpc: libparakeet.so main.go goparakeetcpp.go

@@ -2,15 +2,17 @@ package main
// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads libparakeet.so via purego and registers the flat C-API entry
// points declared in parakeet_capi.h. The library name can be overridden
// with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY / VIBEVOICECPP_LIBRARY
// convention in the sibling backends); the default looks for the .so next
// to this binary.
// Loads the parakeet shared library via purego and registers the flat
// C-API entry points declared in parakeet_capi.h. The library name can be
// overridden with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY /
// VIBEVOICECPP_LIBRARY convention in the sibling backends); the default
// looks next to this binary for libparakeet.so on Linux and
// libparakeet.dylib on macOS.
import (
"flag"
"fmt"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,7 +30,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("PARAKEET_LIBRARY")
if libName == "" {
libName = "libparakeet.so"
if runtime.GOOS == "darwin" {
libName = "libparakeet.dylib"
} else {
libName = "libparakeet.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -16,12 +16,15 @@ mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/parakeet-cpp-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
# libparakeet.so + any soname symlinks (libparakeet.so.X[.Y]). purego.Dlopen
# resolves it via LD_LIBRARY_PATH, which run.sh points at lib/.
cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libparakeet.so not found in $CURDIR, run 'make' first" >&2
# libparakeet shared lib + any soname symlinks. On Linux this is
# libparakeet.so[.X.Y]; on macOS it is libparakeet.dylib. purego.Dlopen
# resolves it via the *_LIBRARY_PATH that run.sh points at lib/.
cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || true
cp -avf "$CURDIR"/libparakeet.dylib "$CURDIR/package/lib/" 2>/dev/null || true
if ! ls "$CURDIR"/package/lib/libparakeet.* >/dev/null 2>&1; then
echo "ERROR: libparakeet shared library not found in $CURDIR, run 'make' first" >&2
exit 1
}
fi
# Detect architecture and copy the core runtime libs libparakeet.so links
# against, plus the matching dynamic loader as lib/ld.so.
@@ -48,7 +51,7 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin"
echo "Detected Darwin — system frameworks linked dynamically, no bundled libs needed"
else
echo "Error: Could not detect architecture"
exit 1
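The packaging change above replaces "fail if `libparakeet.so` is missing" with "copy whatever exists, then fail only if nothing was copied", which lets the same script run on Linux and macOS. A minimal sketch of that copy-then-verify step, assuming a hypothetical helper `package_parakeet_lib` that takes the source and destination directories as arguments (the real script works on `$CURDIR` directly):

```shell
#!/bin/sh
# Hypothetical helper mirroring the copy-then-verify step: copy whichever
# libparakeet artifacts exist, then fail only if none ended up in the package.
# $1 = source directory, $2 = destination lib directory
package_parakeet_lib() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    # Either copy may legitimately find nothing, so neither is fatal alone.
    cp -af "$src"/libparakeet.so* "$dest/" 2>/dev/null || true
    cp -af "$src"/libparakeet.dylib "$dest/" 2>/dev/null || true
    # The real check: at least one libparakeet.* must now be in place.
    if ! ls "$dest"/libparakeet.* >/dev/null 2>&1; then
        echo "ERROR: libparakeet shared library not found in $src" >&2
        return 1
    fi
}
```

Verifying the destination rather than the source keeps a single error path for both platforms.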

@@ -3,11 +3,17 @@ set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.so"
fi
# If a self-contained ld.so was packaged, route through it so the
# packaged libc / libstdc++ are used instead of the host's (matches the
# whisper backend's runtime layout).
# whisper backend's runtime layout). Linux only.
if [ -f "$CURDIR/lib/ld.so" ]; then
echo "Using lib/ld.so"
exec "$CURDIR/lib/ld.so" "$CURDIR/parakeet-cpp-grpc" "$@"

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# qwentts.cpp version
QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp
QWEN3TTS_CPP_VERSION?=4536dcdce27c3764a93a06d6bf64026b124962f5
QWEN3TTS_CPP_VERSION?=9dbe7ea26a01b30fccb117ae5e86807c1dc23d42
SO_TARGET?=libgoqwen3ttscpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,8 +65,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgoqwen3ttscpp-avx.so libgoqwen3ttscpp-avx2.so libgoqwen3ttscpp-avx512.so libgoqwen3ttscpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgoqwen3ttscpp-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgoqwen3ttscpp-fallback.dylib
endif
qwen3-tts-cpp: main.go goqwen3ttscpp.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: qwen3-tts-cpp
build: package
clean: purge
rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp
rm -rf libgoqwen3ttscpp*.so libgoqwen3ttscpp*.dylib package sources/qwentts.cpp qwen3-tts-cpp
purge:
rm -rf build*
@@ -110,13 +110,20 @@ libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp
SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.so
# Build fallback variant as a dylib (Darwin)
libgoqwen3ttscpp-fallback.dylib: sources/qwentts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgoqwen3ttscpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.dylib
libgoqwen3ttscpp-custom: CMakeLists.txt cpp/goqwen3ttscpp.cpp cpp/goqwen3ttscpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target goqwen3ttscpp && \
cd .. && \
mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgoqwen3ttscpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: qwen3-tts-cpp
@echo "Running qwen3-tts-cpp tests..."

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("QWEN3TTS_LIBRARY")
if libName == "" {
libName = "./libgoqwen3ttscpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgoqwen3ttscpp-fallback.dylib"
} else {
libName = "./libgoqwen3ttscpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/qwen3-tts-cpp $CURDIR/package/
cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgoqwen3ttscpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgoqwen3ttscpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgoqwen3ttscpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export QWEN3TTS_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

@@ -71,7 +71,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = librfdetrcpp-avx.so librfdetrcpp-avx2.so librfdetrcpp-avx512.so librfdetrcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = librfdetrcpp-fallback.so
VARIANT_TARGETS = librfdetrcpp-fallback.dylib
endif
rfdetr-cpp: main.go gorfdetrcpp.go $(VARIANT_TARGETS)
@@ -83,7 +83,7 @@ package: rfdetr-cpp
build: package
clean: purge
rm -rf librfdetrcpp*.so rfdetr-cpp package sources
rm -rf librfdetrcpp*.so librfdetrcpp*.dylib rfdetr-cpp package sources
purge:
rm -rf build*
@@ -110,11 +110,19 @@ librfdetrcpp-avx512.so: sources/rt-detr.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
librfdetrcpp-fallback.dylib: sources/rt-detr.cpp
rm -rfv build-$@
$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
rm -rfv build-$@
else
librfdetrcpp-fallback.so: sources/rt-detr.cpp
rm -rfv build-$@
$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
rm -rfv build-$@
endif
librfdetrcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -122,7 +130,8 @@ librfdetrcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/librfdetrcpp.dylib ./$(SO_TARGET) 2>/dev/null)
all: rfdetr-cpp package

@@ -9,6 +9,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("RFDETR_LIBRARY")
if libName == "" {
libName = "./librfdetrcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./librfdetrcpp-fallback.dylib"
} else {
libName = "./librfdetrcpp-fallback.so"
}
}
rfdetrLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/librfdetrcpp-*.so $CURDIR/package/
cp -fv $CURDIR/librfdetrcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/librfdetrcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/rfdetr-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/librfdetrcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/librfdetrcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/librfdetrcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/librfdetrcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/librfdetrcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export RFDETR_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

@@ -66,7 +66,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgosam3-avx.so libgosam3-avx2.so libgosam3-avx512.so libgosam3-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgosam3-fallback.so
VARIANT_TARGETS = libgosam3-fallback.dylib
endif
sam3-cpp: main.go gosam3.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: sam3-cpp
build: package
clean: purge
rm -rf libgosam3*.so sam3-cpp package sources
rm -rf libgosam3*.so libgosam3*.dylib sam3-cpp package sources
purge:
rm -rf build*
@@ -105,11 +105,19 @@ libgosam3-avx512.so: sources/sam3.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
libgosam3-fallback.dylib: sources/sam3.cpp
$(MAKE) purge
$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
SO_TARGET=libgosam3-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
rm -rfv build*
else
libgosam3-fallback.so: sources/sam3.cpp
$(MAKE) purge
$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
SO_TARGET=libgosam3-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
rm -rfv build*
endif
libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
mkdir -p build-$(SO_TARGET) && \
@@ -117,6 +125,7 @@ libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgosam3.dylib ./$(SO_TARGET) 2>/dev/null)
all: sam3-cpp package

@@ -3,6 +3,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("SAM3_LIBRARY")
if libName == "" {
libName = "./libgosam3-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgosam3-fallback.dylib"
} else {
libName = "./libgosam3-fallback.so"
}
}
gosamLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libgosam3-*.so $CURDIR/package/
cp -fv $CURDIR/libgosam3-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgosam3-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/sam3-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgosam3-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgosam3-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgosam3-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosam3-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgosam3-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export SAM3_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

@@ -7,6 +7,7 @@ import (
"fmt"
"os"
"path/filepath"
"runtime"
"strconv"
"strings"
"sync"
@@ -238,11 +239,19 @@ func loadSherpaLibs() error {
func loadSherpaLibsOnce() error {
shimLib := os.Getenv("SHERPA_SHIM_LIBRARY")
if shimLib == "" {
shimLib = "libsherpa-shim.so"
if runtime.GOOS == "darwin" {
shimLib = "libsherpa-shim.dylib"
} else {
shimLib = "libsherpa-shim.so"
}
}
capiLib := os.Getenv("SHERPA_ONNX_LIBRARY")
if capiLib == "" {
capiLib = "libsherpa-onnx-c-api.so"
if runtime.GOOS == "darwin" {
capiLib = "libsherpa-onnx-c-api.dylib"
} else {
capiLib = "libsherpa-onnx-c-api.so"
}
}
shim, err := purego.Dlopen(shimLib, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -3,7 +3,13 @@ set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export SHERPA_SHIM_LIBRARY=$CURDIR/lib/libsherpa-shim.dylib
export SHERPA_ONNX_LIBRARY=$CURDIR/lib/libsherpa-onnx-c-api.dylib
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=b12098f5d09fc83da36e65c784f7bdb16a5a5ebf
STABLEDIFFUSION_GGML_VERSION?=8caa3f908ae6d4a4bef531e73b9a969f266a3d1f
CMAKE_ARGS+=-DGGML_MAX_NAME=128
@@ -131,6 +131,7 @@ libgosd-custom: CMakeLists.txt cpp/gosd.cpp cpp/gosd.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgosd.dylib ./$(SO_TARGET) 2>/dev/null)
all: stablediffusion-ggml package

@@ -3,6 +3,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("SD_LIBRARY")
if libName == "" {
libName = "./libgosd-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgosd-fallback.dylib"
} else {
libName = "./libgosd-fallback.so"
}
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -12,6 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libgosd-*.so $CURDIR/package/
cp -fv $CURDIR/libgosd-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/stablediffusion-ggml $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

@@ -12,9 +12,18 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgosd-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single library variant (Metal or Accelerate). The gosd target is
# built as a CMake MODULE; on Apple, CMake emits a .dylib for a SHARED
# library but a .so for a MODULE one, so prefer .dylib and fall back to .so.
LIBRARY="$CURDIR/libgosd-fallback.dylib"
if [ ! -e "$LIBRARY" ]; then
LIBRARY="$CURDIR/libgosd-fallback.so"
fi
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgosd-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosd-avx.so ]; then
@@ -36,9 +45,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgosd-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export SD_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
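The Darwin branch of this launcher differs from the other backends: it prefers `libgosd-fallback.dylib` but falls back to `libgosd-fallback.so` when the `.dylib` is absent, since the extension depends on how CMake built the target. A small sketch of that "first existing candidate" choice, assuming a hypothetical helper `first_existing` (not in the script). When no candidate exists it returns the last one, matching the script, which assigns the `.so` path without checking it.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate path that exists, or the
# last candidate if none does (so the caller still gets a path to report).
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Nothing matched: fall through to the last candidate tried.
    echo "$candidate"
}
```

In the launcher this would read `LIBRARY=$(first_existing "$CURDIR/libgosd-fallback.dylib" "$CURDIR/libgosd-fallback.so")`.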

@@ -16,6 +16,7 @@ import (
"os"
"path/filepath"
"regexp"
"runtime"
"strings"
"time"
"unicode"
@@ -943,7 +944,13 @@ func InitializeONNXRuntime() error {
}
}
if libPath == "" {
libPath = "/usr/local/lib/libonnxruntime.so"
// LocalAI: default to the platform-native shared library
// extension when nothing else is found (dyld vs ld.so).
if runtime.GOOS == "darwin" {
libPath = "/usr/local/lib/libonnxruntime.dylib"
} else {
libPath = "/usr/local/lib/libonnxruntime.so"
}
}
}
ort.SetSharedLibraryPath(libPath)

@@ -32,6 +32,10 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
# macOS: dyld resolves the bundled .dylib via DYLD_LIBRARY_PATH (set in
# run.sh); there is no ld.so loader nor glibc to bundle.
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1

@@ -3,12 +3,19 @@ set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
if [ "$(uname)" = "Darwin" ]; then
# macOS uses dyld: there is no ld.so loader, and the search path env
# var is DYLD_LIBRARY_PATH. ONNX Runtime ships as a .dylib here.
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.dylib
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
fi
fi
exec $CURDIR/supertonic "$@"
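After this change the bundled-loader path is only considered off macOS: on Darwin the script sets `DYLD_LIBRARY_PATH` and runs the binary directly, while elsewhere it routes through `lib/ld.so` when that file was packaged. The sketch below prints the command the launcher would `exec` rather than running it. The function name `launch_command` is hypothetical, and the OS name is passed in so the logic can be tried on any host.

```shell
#!/bin/sh
# Hypothetical helper: print the command line the launcher would exec.
# $1 = OS name as printed by uname, $2 = package directory
launch_command() {
    os=$1
    dir=$2
    if [ "$os" != "Darwin" ] && [ -f "$dir/lib/ld.so" ]; then
        # Non-macOS host with a bundled loader: run the binary through it.
        echo "$dir/lib/ld.so $dir/supertonic"
    else
        # macOS (dyld, no ld.so) or no bundled loader: run the binary directly.
        echo "$dir/supertonic"
    fi
}
```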

@@ -70,8 +70,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgovibevoicecpp-avx.so libgovibevoicecpp-avx2.so libgovibevoicecpp-avx512.so libgovibevoicecpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgovibevoicecpp-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgovibevoicecpp-fallback.dylib
endif
vibevoice-cpp: main.go govibevoicecpp.go $(VARIANT_TARGETS)
@@ -83,7 +83,7 @@ package: vibevoice-cpp
build: package
clean: purge
rm -rf libgovibevoicecpp*.so package sources/vibevoice.cpp vibevoice-cpp
rm -rf libgovibevoicecpp*.so libgovibevoicecpp*.dylib package sources/vibevoice.cpp vibevoice-cpp
purge:
rm -rf build*
@@ -119,13 +119,21 @@ libgovibevoicecpp-fallback.so: sources/vibevoice.cpp
SO_TARGET=libgovibevoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
rm -rfv build*
# Build fallback variant as a dylib (Darwin)
libgovibevoicecpp-fallback.dylib: sources/vibevoice.cpp
$(MAKE) purge
$(info ${GREEN}I vibevoice-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgovibevoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
rm -rfv build*
libgovibevoicecpp-custom: CMakeLists.txt cpp/govibevoicecpp.cpp cpp/govibevoicecpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target govibevoicecpp && \
cd .. && \
mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgovibevoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: vibevoice-cpp
@echo "Running vibevoice-cpp tests..."

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("VIBEVOICECPP_LIBRARY")
if libName == "" {
libName = "./libgovibevoicecpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgovibevoicecpp-fallback.dylib"
} else {
libName = "./libgovibevoicecpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/vibevoice-cpp $CURDIR/package/
cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgovibevoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

@@ -11,9 +11,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgovibevoicecpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgovibevoicecpp-avx.so ]; then
@@ -34,9 +38,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgovibevoicecpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export VIBEVOICECPP_LIBRARY=$LIBRARY
if [ -f $CURDIR/lib/ld.so ]; then

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=5ed76e9a079962f1c85cfce44edd325c27ef1f97
WHISPER_CPP_VERSION?=43d78af5be58f41d6ffbc227d608f104577741ea
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -117,6 +117,7 @@ libgowhisper-custom: CMakeLists.txt cpp/gowhisper.cpp cpp/gowhisper.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET)
mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgowhisper.dylib ./$(SO_TARGET:.so=.dylib)
all: whisper package
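The whisper rule handles the fallback differently from the other backends in this compare: instead of moving the `.dylib` to `$(SO_TARGET)` unchanged, it uses the GNU make substitution reference `$(SO_TARGET:.so=.dylib)`, which rewrites a trailing `.so` in the target name to `.dylib`. A shell sketch of that rewrite for a single word, assuming a hypothetical helper `to_dylib_name`. Like the make construct, it leaves names that do not end in `.so` untouched.

```shell
#!/bin/sh
# Hypothetical helper: mimic make's $(VAR:.so=.dylib) for a single word.
# A trailing ".so" becomes ".dylib"; any other name is returned unchanged.
to_dylib_name() {
    case $1 in
        *.so) echo "${1%.so}.dylib" ;;
        *)    echo "$1" ;;
    esac
}
```

So a target of `libgowhisper-fallback.so` ends up on disk as `libgowhisper-fallback.dylib` when only the `.dylib` artifact exists, which is the name the Darwin branch of the launcher looks for.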

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,7 +23,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("WHISPER_LIBRARY")
if libName == "" {
libName = "./libgowhisper-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgowhisper-fallback.dylib"
} else {
libName = "./libgowhisper-fallback.so"
}
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)


@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/whisper $CURDIR/package/
cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/
cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgowhisper-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries


@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgowhisper-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgowhisper-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgowhisper-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgowhisper-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgowhisper-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export WHISPER_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it


@@ -1284,6 +1284,7 @@
nvidia-cuda-13: "cuda13-liquid-audio"
nvidia-cuda-12: "cuda12-liquid-audio"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio"
metal: "metal-liquid-audio"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png
- &qwen-tts
urls:
@@ -1569,6 +1570,7 @@
- TTS
capabilities:
default: "cpu-supertonic"
metal: "metal-supertonic"
- !!merge <<: *neutts
name: "neutts-development"
capabilities:
@@ -4612,6 +4614,7 @@
nvidia-cuda-13: "cuda13-liquid-audio-development"
nvidia-cuda-12: "cuda12-liquid-audio-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio-development"
metal: "metal-liquid-audio-development"
- !!merge <<: *liquid-audio
name: "cpu-liquid-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-liquid-audio"
@@ -4622,6 +4625,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-liquid-audio"
mirrors:
- localai/localai-backends:master-cpu-liquid-audio
- !!merge <<: *liquid-audio
name: "metal-liquid-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-liquid-audio"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-liquid-audio
- !!merge <<: *liquid-audio
name: "metal-liquid-audio-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-liquid-audio"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-liquid-audio
- !!merge <<: *liquid-audio
name: "cuda12-liquid-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-liquid-audio"
@@ -5282,6 +5295,7 @@
nvidia: "cuda12-trl"
nvidia-cuda-12: "cuda12-trl"
nvidia-cuda-13: "cuda13-trl"
metal: "metal-trl"
## TRL backend images
- !!merge <<: *trl
name: "cpu-trl"
@@ -5313,6 +5327,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-trl"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-trl
- !!merge <<: *trl
name: "metal-trl"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-trl"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-trl
- !!merge <<: *trl
name: "metal-trl-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-trl"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-trl
## llama.cpp quantization backend
- &llama-cpp-quantization
name: "llama-cpp-quantization"
@@ -5484,6 +5508,7 @@
name: "supertonic-development"
capabilities:
default: "cpu-supertonic-development"
metal: "metal-supertonic-development"
- !!merge <<: *supertonic
name: "cpu-supertonic"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-supertonic"
@@ -5494,3 +5519,13 @@
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-supertonic"
mirrors:
- localai/localai-backends:master-cpu-supertonic
- !!merge <<: *supertonic
name: "metal-supertonic"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-supertonic"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-supertonic
- !!merge <<: *supertonic
name: "metal-supertonic-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-supertonic"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-supertonic


@@ -14,5 +14,11 @@ else
fi
# liquid-audio's torch wheels are large; allow upgrades to satisfy transitive pins
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade"
# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip
# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add
# it on the uv path; Linux/CUDA resolution is unchanged.
if [ "x${USE_PIP:-}" != "xtrue" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match"
fi
installRequirements


@@ -1,3 +1,4 @@
# MPS (Apple Silicon / Metal) build profile - installed by the darwin CI job.
torch>=2.8.0
torchaudio>=2.8.0
torchcodec>=0.9.1


@@ -8,7 +8,13 @@ else
source $backend_dir/../common/libbackend.sh
fi
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade"
# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip
# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add
# it when uv is the installer, keeping the Linux/CUDA resolution unchanged.
if [ "x${USE_PIP:-}" != "xtrue" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match"
fi
installRequirements
# Fetch convert_hf_to_gguf.py and gguf package from the same llama.cpp version


@@ -0,0 +1,12 @@
torch==2.10.0
trl
peft
datasets>=3.0.0
transformers>=4.56.2
accelerate>=1.4.0
huggingface-hub>=1.3.0
sentencepiece
# Note: bitsandbytes is intentionally omitted on MPS. It is only used by the
# CUDA (cublas) variants for 8-bit/4-bit quantization and has poor support on
# Apple Silicon. torch here uses the plain PyPI wheels, which ship MPS support
# on macOS arm64.


@@ -215,6 +215,7 @@ func readRuntimeSettingsJson(startupAppConfig config.ApplicationConfig) fileHand
envBackendGalleries := slices.Equal(appConfig.BackendGalleries, startupAppConfig.BackendGalleries)
envAutoloadGalleries := appConfig.AutoloadGalleries == startupAppConfig.AutoloadGalleries
envAutoloadBackendGalleries := appConfig.AutoloadBackendGalleries == startupAppConfig.AutoloadBackendGalleries
envPIIDefaultDetectors := slices.Equal(appConfig.PIIDefaultDetectors, startupAppConfig.PIIDefaultDetectors)
envAgentJobRetentionDays := appConfig.AgentJobRetentionDays == startupAppConfig.AgentJobRetentionDays
envForceEvictionWhenBusy := appConfig.ForceEvictionWhenBusy == startupAppConfig.ForceEvictionWhenBusy
envLRUEvictionMaxRetries := appConfig.LRUEvictionMaxRetries == startupAppConfig.LRUEvictionMaxRetries
@@ -335,6 +336,15 @@ func readRuntimeSettingsJson(startupAppConfig config.ApplicationConfig) fileHand
if settings.AutoloadBackendGalleries != nil && !envAutoloadBackendGalleries {
appConfig.AutoloadBackendGalleries = *settings.AutoloadBackendGalleries
}
if settings.PIIDefaultDetectors != nil && !envPIIDefaultDetectors {
// Request-side default redaction reads this live via
// ResolvePIIPolicy, so a file edit takes effect on the next chat
// request. The MITM listener resolves its per-host detector map
// once at start, so a raw file edit reaches cloud-proxy traffic
// only after a restart or a POST /api/settings (which rebuilds
// the listener) — the admin UI uses the latter.
appConfig.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...)
}
if settings.AutoUpgradeBackends != nil {
appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
}


@@ -109,6 +109,52 @@ var _ = Describe("loadRuntimeSettingsFromFile", func() {
})
})
// Instance-wide default PII detectors. The file is the only source (no
// env var), and the loader runs immediately before startMITMIfConfigured,
// so a regression here means the cloud-proxy MITM listener resolves an
// empty detector set at boot and forwards intercepted traffic unredacted —
// even though pii_default_detectors is on disk and the MITM model has PII
// enabled. It also breaks request-side default redaction the same way.
Describe("PII default detectors", func() {
It("loads pii_default_detectors from the file", func() {
cfg := &config.ApplicationConfig{DynamicConfigsDir: seedSettings(`{"pii_default_detectors": ["privacy-filter-nemotron", "secret-filter"]}`)}
loadRuntimeSettingsFromFile(cfg)
Expect(cfg.PIIDefaultDetectors).To(Equal([]string{"privacy-filter-nemotron", "secret-filter"}))
})
It("does not override an env/CLI-set value (LOCALAI_PII_DEFAULT_DETECTORS)", func() {
cfg := &config.ApplicationConfig{
DynamicConfigsDir: seedSettings(`{"pii_default_detectors": ["from-file"]}`),
PIIDefaultDetectors: []string{"from-env"}, // simulate WithPIIDefaultDetectors(env)
}
loadRuntimeSettingsFromFile(cfg)
Expect(cfg.PIIDefaultDetectors).To(Equal([]string{"from-env"}), "env var must win over the persisted file value")
})
})
// The live file watcher applies pii_default_detectors on a runtime change
// the same way it handles galleries/threads/etc.: env-set values (current
// == startup snapshot) are left alone, otherwise the file value is applied
// to the live config so request-side default redaction picks it up without
// a restart.
Describe("file watcher: pii_default_detectors", func() {
It("applies a changed file value to the live config", func() {
startup := config.ApplicationConfig{} // no env baseline
live := &config.ApplicationConfig{PIIDefaultDetectors: []string{"old"}}
handler := readRuntimeSettingsJson(startup)
Expect(handler([]byte(`{"pii_default_detectors":["new-a","new-b"]}`), live)).To(Succeed())
Expect(live.PIIDefaultDetectors).To(Equal([]string{"new-a", "new-b"}))
})
It("leaves an env-controlled value untouched", func() {
startup := config.ApplicationConfig{PIIDefaultDetectors: []string{"from-env"}}
live := &config.ApplicationConfig{PIIDefaultDetectors: []string{"from-env"}}
handler := readRuntimeSettingsJson(startup)
Expect(handler([]byte(`{"pii_default_detectors":["from-file"]}`), live)).To(Succeed())
Expect(live.PIIDefaultDetectors).To(Equal([]string{"from-env"}), "env-controlled detectors must not be overwritten by the file")
})
})
// The Agent Pool block has a mix of zero and non-zero defaults
// (Enabled=true, EmbeddingModel="granite-...", MaxChunkingSize=400,
// VectorEngine="chromem", AgentHubURL="https://agenthub.localai.io").


@@ -750,6 +750,20 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) {
options.MITMListen = *settings.MITMListen
}
// Instance-wide default PII detectors. LOCALAI_PII_DEFAULT_DETECTORS (via
// WithPIIDefaultDetectors) wins when set; otherwise the file is the source
// — apply it only when the env/CLI left the value empty, mirroring the
// "env > file" precedence used for the other fields. This must land before
// startMITMIfConfigured (called right after this loader): the cloud-proxy
// listener resolves each intercept host's detectors once at start via
// ResolvePIIPolicy, and a MITM model that names no detectors of its own
// falls back to these defaults. Without it the listener (and request-side
// default redaction) starts with an empty detector set and forwards
// traffic unredacted even though pii_default_detectors is on disk.
if settings.PIIDefaultDetectors != nil && len(options.PIIDefaultDetectors) == 0 {
options.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...)
}
// Backend upgrade flags
if settings.AutoUpgradeBackends != nil {
if !options.AutoUpgradeBackends {
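The "env > file" precedence described in the loader comment can be sketched in isolation. This is an illustrative reduction, not the project's code: `applyFileDetectors` is a hypothetical name, and the two arguments stand in for `options.PIIDefaultDetectors` and `settings.PIIDefaultDetectors`. It keeps the same two properties as the hunk: the file value applies only when the env/CLI left the slice empty, and the applied value is a copy so later edits to the parsed settings cannot alias the live config.

```go
package main

import "fmt"

// applyFileDetectors is a hypothetical stand-in for the loader branch.
// current is the value set by env/CLI (empty when unset); fromFile is the
// optional value parsed from the settings file (nil when the key is absent).
func applyFileDetectors(current []string, fromFile *[]string) []string {
	if fromFile == nil || len(current) != 0 {
		return current // key absent from the file, or env/CLI already set it
	}
	// Copy rather than share the backing array with the parsed settings.
	return append([]string(nil), (*fromFile)...)
}

func main() {
	file := []string{"privacy-filter-nemotron", "secret-filter"}

	fmt.Println(applyFileDetectors(nil, &file))                  // file applies
	fmt.Println(applyFileDetectors([]string{"from-env"}, &file)) // env wins
}
```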


@@ -140,7 +140,7 @@ type RunCMD struct {
OIDCIssuer string `env:"LOCALAI_OIDC_ISSUER" help:"OIDC issuer URL for auto-discovery" group:"auth"`
OIDCClientID string `env:"LOCALAI_OIDC_CLIENT_ID" help:"OIDC Client ID (auto-enables auth)" group:"auth"`
OIDCClientSecret string `env:"LOCALAI_OIDC_CLIENT_SECRET" help:"OIDC Client Secret" group:"auth"`
AuthBaseURL string `env:"LOCALAI_BASE_URL" help:"Base URL for OAuth callbacks (e.g. http://localhost:8080)" group:"auth"`
ExternalBaseURL string `env:"LOCALAI_BASE_URL" help:"External base URL of this instance (e.g. https://localhost:8080). Used for OAuth callbacks and self-referential links (generated images/videos, job status). When unset, derived from X-Forwarded-Proto/Host or Forwarded headers." group:"api"`
AuthAdminEmail string `env:"LOCALAI_ADMIN_EMAIL" help:"Email address to auto-promote to admin role" group:"auth"`
AuthRegistrationMode string `env:"LOCALAI_REGISTRATION_MODE" default:"open" help:"Registration mode: 'open' (default), 'approval', or 'invite' (invite code required)" group:"auth"`
DisableLocalAuth bool `env:"LOCALAI_DISABLE_LOCAL_AUTH" default:"false" help:"Disable local email/password registration and login (use with OAuth/OIDC-only setups)" group:"auth"`
@@ -181,6 +181,8 @@ type RunCMD struct {
// Cloud-proxy MITM listener (off by default).
MITMListen string `env:"LOCALAI_MITM_LISTEN" help:"Address (host:port) for the cloudproxy MITM listener. Empty = disabled. Clients set HTTPS_PROXY=http://<this>:<port>. Intercept hosts are declared per-model via the model YAML mitm.hosts: block; create one from the Add Model UI." group:"middleware"`
MITMCADir string `env:"LOCALAI_MITM_CA_DIR" type:"path" help:"Directory holding the MITM proxy CA cert + key. Defaults to <data-path>/mitm-ca." group:"middleware"`
PIIDefaultDetectors []string `env:"LOCALAI_PII_DEFAULT_DETECTORS" help:"Instance-wide default PII/secret detector model names applied to any PII-enabled model (chiefly cloud-proxy / MITM models) that names no pii.detectors of its own. Comma-separated, e.g. privacy-filter-nemotron,secret-filter. Takes precedence over the value persisted via the Middleware UI." group:"middleware"`
}
func (r *RunCMD) Run(ctx *cliContext.Context) error {
@@ -243,6 +245,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
config.WithAPIAddress(r.Address),
config.WithMITMListen(r.MITMListen),
config.WithMITMCADir(r.MITMCADir),
config.WithPIIDefaultDetectors(r.PIIDefaultDetectors),
config.WithAgentJobRetentionDays(r.AgentJobRetentionDays),
config.WithLlamaCPPTunnelCallback(func(tunnels []string) {
tunnelEnvVar := strings.Join(tunnels, ",")
@@ -500,9 +503,6 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
opts = append(opts, config.WithAuthOIDCClientID(r.OIDCClientID))
opts = append(opts, config.WithAuthOIDCClientSecret(r.OIDCClientSecret))
}
if r.AuthBaseURL != "" {
opts = append(opts, config.WithAuthBaseURL(r.AuthBaseURL))
}
if r.AuthAdminEmail != "" {
opts = append(opts, config.WithAuthAdminEmail(r.AuthAdminEmail))
}
@@ -520,6 +520,12 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
}
}
// Applied unconditionally: the external base URL governs all self-referential
// links (not just OAuth callbacks), so it must take effect even when auth is off.
if r.ExternalBaseURL != "" {
opts = append(opts, config.WithExternalBaseURL(r.ExternalBaseURL))
}
if idleWatchDog || busyWatchDog {
opts = append(opts, config.EnableWatchDog)
if idleWatchDog {


@@ -49,6 +49,13 @@ type ApplicationConfig struct {
P2PNetworkID string
Federated bool
// ExternalBaseURL is the externally visible base URL of this instance
// (scheme+host[:port]), set via LOCALAI_BASE_URL. When non-empty it is
// authoritative for every self-referential URL LocalAI emits (OAuth
// callbacks, generated image/video links, async job StatusURLs),
// overriding proxy-header detection. Empty = derive from request headers.
ExternalBaseURL string
// DisableStats turns off per-request token tracking. By default the
// routing module's billing recorder runs in every mode (including
// no-auth single-user) so dashboards and `/api/usage` are immediately
@@ -196,7 +203,6 @@ type AuthConfig struct {
OIDCIssuer string // OIDC issuer URL for auto-discovery (e.g. https://accounts.google.com)
OIDCClientID string
OIDCClientSecret string
BaseURL string // for OAuth callback URLs (e.g. "http://localhost:8080")
AdminEmail string // auto-promote to admin on login
RegistrationMode string // "open", "approval" (default when empty), "invite"
DisableLocalAuth bool // disable local email/password registration and login
@@ -712,6 +718,18 @@ func WithMITMCADir(dir string) AppOption {
}
}
// WithPIIDefaultDetectors sets the instance-wide default PII/secret detector
// model names applied to any PII-enabled model (chiefly cloud-proxy / MITM
// models) that names no pii.detectors of its own. CLI/env:
// LOCALAI_PII_DEFAULT_DETECTORS. Empty leaves the value to
// runtime_settings.json / the Middleware UI; a non-empty value takes
// precedence over the file (env > file).
func WithPIIDefaultDetectors(detectors []string) AppOption {
return func(o *ApplicationConfig) {
o.PIIDefaultDetectors = detectors
}
}
func WithDynamicConfigDir(dynamicConfigsDir string) AppOption {
return func(o *ApplicationConfig) {
o.DynamicConfigsDir = dynamicConfigsDir
@@ -938,9 +956,9 @@ func WithAuthGitHubClientSecret(clientSecret string) AppOption {
}
}
func WithAuthBaseURL(baseURL string) AppOption {
func WithExternalBaseURL(url string) AppOption {
return func(o *ApplicationConfig) {
o.Auth.BaseURL = baseURL
o.ExternalBaseURL = url
}
}


@@ -2,6 +2,7 @@ package config
import (
"fmt"
"os"
"strconv"
"strings"
@@ -9,6 +10,19 @@ import (
"github.com/mudler/xlog"
)
// HardwareDefaultsDisabled reports whether hardware auto-tuning is turned off via
// LOCALAI_DISABLE_HARDWARE_DEFAULTS=true (mirrors LOCALAI_DISABLE_GUESSING). When
// set, ApplyHardwareDefaults and the distributed router's node tuning are
// skipped entirely, so the backend runs llama.cpp's stock batch/parallel
// behavior — an escape hatch for users who want predictable, un-tuned defaults.
func HardwareDefaultsDisabled() bool {
// Read directly like the sibling LOCALAI_DISABLE_GUESSING toggle in
// hooks_llamacpp.go: these config-layer heuristic switches run deep in the
// defaults pipeline with no ApplicationConfig in scope to plumb through.
//nolint:forbidigo // config-layer heuristic toggle, mirrors LOCALAI_DISABLE_GUESSING
return os.Getenv("LOCALAI_DISABLE_HARDWARE_DEFAULTS") == "true"
}
// Hardware-driven model-config defaults.
//
// This sits alongside the other config overriders (ApplyInferenceDefaults for
@@ -54,8 +68,35 @@ func (g GPU) IsNVIDIABlackwell() bool {
return maj >= 12
}
// Compute-buffer headroom guard for the raised physical batch.
//
// Raising n_ubatch grows the CUDA *compute buffer* (the scratch for the forward
// graph), which is allocated PER DEVICE — it does not benefit from a second GPU
// the way weights or KV (which are split across devices) do. The buffer scales
// ~linearly with n_ubatch * n_ctx, so a large context turns the GB10-tuned
// ub2048 into multi-GiB of extra scratch that must fit on a SINGLE card. On a
// 16 GiB consumer Blackwell with a 200k context that overflows (issue #10485),
// even though the GB10 it was measured on (128 GiB unified memory) had room.
//
// These constants size a conservative guard: only raise the batch when the
// extra scratch fits the per-device VRAM ceiling.
const (
// computeBufferBytesPerCell approximates the CUDA compute-buffer cost of one
// (n_ubatch * n_ctx) cell. Derived from an observed allocation (ub2048 *
// ctx204800 ~= 4.5 GiB => ~11 B/cell) and rounded up to 16 for margin, since
// the real cost also grows with model width (heads / embedding dim) which we
// don't know at config time.
computeBufferBytesPerCell = 16
// blackwellBatchHeadroomDivisor caps the extra compute buffer from raising the
// physical batch at VRAM/divisor. /4 keeps the bulk of a device for weights +
// KV, which already dominate VRAM use.
blackwellBatchHeadroomDivisor = 4
)
// PhysicalBatch returns the canonical physical batch (n_batch/n_ubatch) for the
// given hardware, used when the model config leaves batch unset.
// given hardware class, ignoring context/VRAM headroom. Use
// PhysicalBatchForContext when a model context and per-device VRAM are known
// (the load paths) so the raised batch can't overflow a single device.
func PhysicalBatch(g GPU) int {
if g.IsNVIDIABlackwell() {
return BlackwellPhysicalBatch
@@ -63,6 +104,51 @@ func PhysicalBatch(g GPU) int {
return DefaultPhysicalBatch
}
// PhysicalBatchForContext is PhysicalBatch gated on per-device VRAM headroom for
// the given context: it only raises the batch above the conservative default
// when the extra compute buffer (which is allocated on a single device and grows
// with n_ubatch * n_ctx) fits within blackwellBatchHeadroomDivisor of the GPU's
// VRAM. g.VRAM must be the PER-DEVICE ceiling (the smallest device on a
// multi-GPU host), not the summed total — the compute buffer can't be split.
//
// VRAM 0 (unknown) stays conservative rather than risk a per-device OOM; the
// GB10 / unified-memory path reports system RAM, so it still clears the guard.
func PhysicalBatchForContext(g GPU, ctx int) int {
if !g.IsNVIDIABlackwell() {
return DefaultPhysicalBatch
}
if g.VRAM == 0 {
return DefaultPhysicalBatch
}
if largeContextForDevice(g, ctx) {
return DefaultPhysicalBatch
}
return BlackwellPhysicalBatch
}
// largeContextForDevice reports whether the given context is large relative to
// the per-device VRAM ceiling — the shared "tight single-model fit" signal that
// suppresses BOTH throughput-oriented defaults (the Blackwell batch boost and
// the concurrency slot count). It sizes the extra compute-buffer scratch a
// raised batch would need at this context (which grows ~n_ubatch * n_ctx and
// is allocated per device) and asks whether it overflows a fraction of the
// device VRAM; when it does, the device has no headroom to spend on throughput
// and the conservative defaults must hold (issue #10485).
//
// g.VRAM must be the PER-DEVICE ceiling (the smallest device on a multi-GPU
// host). VRAM 0 (unknown) is treated as not-large so detection gaps don't
// silently disable the defaults.
func largeContextForDevice(g GPU, ctx int) bool {
if g.VRAM == 0 {
return false
}
if ctx <= 0 {
ctx = DefaultContextSize
}
extra := uint64(ctx) * uint64(BlackwellPhysicalBatch-DefaultPhysicalBatch) * computeBufferBytesPerCell
return extra > g.VRAM/blackwellBatchHeadroomDivisor
}
// IsManagedPhysicalBatch reports whether n is a value PhysicalBatch assigns.
// Callers that re-tune a value chosen by an upstream host (the distributed
// router correcting the frontend's guess) use this to avoid clobbering an
@@ -99,17 +185,50 @@ func DefaultParallelSlots(g GPU) int {
}
}
// EnsureParallelOption appends a VRAM-scaled "parallel:N" backend option when the
// model doesn't already set one (and the GPU warrants concurrency). Returns the
// possibly-extended options. Shared by the single-host config path
// (ApplyHardwareDefaults) and the distributed router (per selected node).
func EnsureParallelOption(opts []string, gpu GPU) []string {
if slots := DefaultParallelSlots(gpu); slots > 1 && !hasParallelOption(opts) {
// ParallelSlotsForContext is DefaultParallelSlots gated on per-device VRAM
// headroom for the given context. A large context already claims most of a
// single device's VRAM (the KV cache plus the per-slot compute/checkpoint
// scratch that scales with n_seq_max), so defaulting multiple slots there
// pushes a tight single-model fit into per-device CUDA OOM (issue #10485): the
// model loads but the final allocation (e.g. an MTP draft context's KV cache)
// overflows the tighter card by a few hundred MiB. Returns 1 (no concurrency)
// in that tight regime, otherwise the VRAM-scaled DefaultParallelSlots.
//
// g.VRAM must be the PER-DEVICE ceiling (smallest device on a multi-GPU host).
// It shares largeContextForDevice with the batch boost so both throughput
// defaults are suppressed together; the GB10 / unified-memory path reports
// system RAM and so keeps full concurrency even at large contexts.
func ParallelSlotsForContext(g GPU, ctx int) int {
slots := DefaultParallelSlots(g)
if slots <= 1 || g.VRAM == 0 {
return slots
}
if largeContextForDevice(g, ctx) {
return 1
}
return slots
}
// EnsureParallelOptionForContext appends a VRAM-scaled "parallel:N" backend
// option when the model doesn't already set one and the GPU warrants (and has
// headroom for) concurrency at this context. Returns the possibly-extended
// options. Shared by the single-host config path (ApplyHardwareDefaults) and
// the distributed router (per selected node).
func EnsureParallelOptionForContext(opts []string, gpu GPU, ctx int) []string {
if slots := ParallelSlotsForContext(gpu, ctx); slots > 1 && !hasParallelOption(opts) {
return append(opts, fmt.Sprintf("parallel:%d", slots))
}
return opts
}
// EnsureParallelOption is EnsureParallelOptionForContext with no known context
// (defaults to DefaultContextSize, which clears the headroom gate on any device
// large enough to warrant concurrency). Kept for callers without a model
// context.
func EnsureParallelOption(opts []string, gpu GPU) []string {
return EnsureParallelOptionForContext(opts, gpu, 0)
}
// hasParallelOption reports whether the model already sets parallel/n_parallel
// so we never override an explicit value (helper shared with serving_defaults.go).
func hasParallelOption(opts []string) bool {
@@ -122,7 +241,12 @@ func hasParallelOption(opts []string) bool {
// deterministic device — detection does a live nvidia-smi call.
var localGPU = func() GPU {
vendor, _ := xsysinfo.DetectGPUVendor()
vram, _ := xsysinfo.TotalAvailableVRAM()
// Use the SMALLEST device's VRAM, not the summed total: the parallel-slot
// tier and the batch headroom guard both reason about what fits on a single
// card, and per-device compute buffers can't be split across GPUs. Summing
// two 16 GiB cards into "32 GiB" is what over-provisioned multi-GPU hosts
// into OOM (issue #10485).
vram, _ := xsysinfo.MinPerGPUVRAM()
return GPU{
Vendor: vendor,
ComputeCapability: xsysinfo.NVIDIAComputeCapability(),
@@ -134,25 +258,36 @@ var localGPU = func() GPU {
// and were left unset by the user. Currently: a larger physical batch on
// Blackwell. Explicit config always wins (we only touch zero values).
func ApplyHardwareDefaults(cfg *ModelConfig, gpu GPU) {
if cfg == nil {
if cfg == nil || HardwareDefaultsDisabled() {
return
}
if cfg.Batch == 0 && gpu.IsNVIDIABlackwell() {
cfg.Batch = BlackwellPhysicalBatch
xlog.Debug("[hardware_defaults] Blackwell GPU: defaulting physical batch",
"batch", cfg.Batch, "compute_cap", gpu.ComputeCapability)
// Raise the physical batch on Blackwell only when the resulting compute
// buffer fits the per-device VRAM at THIS model's context. Leaving Batch at 0
// (rather than writing the default 512) preserves the downstream single-pass
// sizing in core/backend.EffectiveBatchSize for embedding/score/rerank.
ctx := DefaultContextSize
if cfg.ContextSize != nil {
ctx = *cfg.ContextSize
}
if cfg.Batch == 0 {
if PhysicalBatchForContext(gpu, ctx) == BlackwellPhysicalBatch {
cfg.Batch = BlackwellPhysicalBatch
xlog.Debug("[hardware_defaults] Blackwell GPU: defaulting physical batch",
"batch", cfg.Batch, "compute_cap", gpu.ComputeCapability, "context", ctx, "vram_gib", gpu.VRAM>>30)
}
}
// Enable concurrent serving by default on a capable GPU: without this the
// llama.cpp backend runs n_parallel=1 and serializes multi-user requests
// (continuous batching stays off). Unified KV means the slots share the
// context budget, so this is concurrency without extra KV memory. Explicit
// parallel/n_parallel in the model options always wins.
// context budget, but a context large enough to fill a single device leaves
// no room for the per-slot scratch, so the slot count is gated on per-device
// headroom too (issue #10485). Explicit parallel/n_parallel always wins.
if before := len(cfg.Options); true {
cfg.Options = EnsureParallelOption(cfg.Options, gpu)
cfg.Options = EnsureParallelOptionForContext(cfg.Options, gpu, ctx)
if len(cfg.Options) > before {
xlog.Debug("[hardware_defaults] defaulting parallel slots for concurrent serving",
"option", cfg.Options[len(cfg.Options)-1], "vram_gib", gpu.VRAM>>30)
"option", cfg.Options[len(cfg.Options)-1], "context", ctx, "vram_gib", gpu.VRAM>>30)
}
}
}
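The headroom predicate shared by the batch boost and the parallel-slot default can be worked through numerically. The sketch below re-implements the arithmetic of `largeContextForDevice` as a standalone program. The batch values are assumptions inferred from this diff (512 from the "default 512" comment, 2048 from the test title "raises an unset batch to 2048"), and `defaultContext` is a placeholder because the value of `DefaultContextSize` is not shown here. `slotsForContext` takes the VRAM-tier slot count as an argument since the tiers of `DefaultParallelSlots` are also outside this hunk.

```go
package main

import "fmt"

const (
	gib = uint64(1) << 30

	defaultBatch    = 512  // assumed value of DefaultPhysicalBatch
	blackwellBatch  = 2048 // assumed value of BlackwellPhysicalBatch
	bytesPerCell    = 16   // computeBufferBytesPerCell in the diff
	headroomDivisor = 4    // blackwellBatchHeadroomDivisor in the diff
	defaultContext  = 4096 // placeholder; the real DefaultContextSize is not shown
)

// largeContext mirrors the predicate: the extra compute-buffer scratch from
// raising the batch must fit within VRAM/headroomDivisor on a single device.
func largeContext(vram uint64, ctx int) bool {
	if vram == 0 {
		return false // unknown VRAM is treated as not-large
	}
	if ctx <= 0 {
		ctx = defaultContext
	}
	extra := uint64(ctx) * uint64(blackwellBatch-defaultBatch) * bytesPerCell
	return extra > vram/headroomDivisor
}

// slotsForContext applies the same gate to a precomputed VRAM-tier slot count.
func slotsForContext(tierSlots int, vram uint64, ctx int) int {
	if tierSlots <= 1 || vram == 0 {
		return tierSlots
	}
	if largeContext(vram, ctx) {
		return 1
	}
	return tierSlots
}

func main() {
	cases := []struct {
		name string
		vram uint64
		ctx  int
	}{
		{"16 GiB card, 200k ctx", 16 * gib, 204800},
		{"16 GiB card, 8k ctx", 16 * gib, 8192},
		{"GB10 unified memory, 200k ctx", 119 * gib, 204800},
	}
	for _, c := range cases {
		fmt.Printf("%-30s large=%v slots=%d\n",
			c.name, largeContext(c.vram, c.ctx), slotsForContext(4, c.vram, c.ctx))
	}
}
```

With these constants the 200k-context case needs 204800 × 1536 × 16 bytes, about 4.7 GiB of extra scratch, which exceeds the 4 GiB budget of a 16 GiB card but sits well under the 29.75 GiB budget of a 119 GiB unified-memory device.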


@@ -9,26 +9,37 @@ import (
// GPU. The detection seam (localGPU) is injected so the path is deterministic
// without a real GPU.
var _ = Describe("SetDefaults hardware defaults (single-instance)", func() {
const gib = uint64(1) << 30
var orig func() GPU
BeforeEach(func() { orig = localGPU })
AfterEach(func() { localGPU = orig })
It("sets the physical batch on a local Blackwell GPU", func() {
localGPU = func() GPU { return GPU{ComputeCapability: "12.1"} }
It("sets the physical batch on a local Blackwell GPU with headroom", func() {
localGPU = func() GPU { return GPU{ComputeCapability: "12.1", VRAM: 119 * gib} }
cfg := &ModelConfig{}
cfg.SetDefaults()
Expect(cfg.Batch).To(Equal(BlackwellPhysicalBatch))
})
It("leaves batch unset when a large context would overflow the device", func() {
// Regression guard for issue #10485: 16 GiB consumer Blackwell + ~200k ctx.
localGPU = func() GPU { return GPU{ComputeCapability: "12.0", VRAM: 16 * gib} }
ctx := 204800
cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}}
cfg.SetDefaults()
Expect(cfg.Batch).To(Equal(0))
})
It("leaves batch unset on a non-Blackwell local GPU", func() {
localGPU = func() GPU { return GPU{ComputeCapability: "8.9"} }
localGPU = func() GPU { return GPU{ComputeCapability: "8.9", VRAM: 119 * gib} }
cfg := &ModelConfig{}
cfg.SetDefaults()
Expect(cfg.Batch).To(Equal(0))
})
It("never overrides an explicit batch", func() {
localGPU = func() GPU { return GPU{ComputeCapability: "12.1"} }
localGPU = func() GPU { return GPU{ComputeCapability: "12.1", VRAM: 119 * gib} }
cfg := &ModelConfig{}
cfg.Batch = 1024
cfg.SetDefaults()


@@ -7,6 +7,8 @@ import (
)
var _ = Describe("Hardware-driven config defaults", func() {
const gib = uint64(1) << 30
DescribeTable("GPU.IsNVIDIABlackwell (sm_12x consumer family)",
func(cc string, want bool) {
Expect(GPU{ComputeCapability: cc}.IsNVIDIABlackwell()).To(Equal(want))
@@ -35,29 +37,69 @@ var _ = Describe("Hardware-driven config defaults", func() {
})
})
Describe("PhysicalBatchForContext (per-device VRAM headroom)", func() {
It("raises the batch when the compute buffer fits the device", func() {
// 16 GiB Blackwell with a small context: the extra scratch is tiny.
Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.0", VRAM: 16 * gib}, 8192)).
To(Equal(BlackwellPhysicalBatch))
})
It("keeps the default batch when a large context would overflow one device", func() {
// The issue #10485 case: 16 GiB consumer Blackwell, ~200k context.
Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.0", VRAM: 16 * gib}, 204800)).
To(Equal(DefaultPhysicalBatch))
})
It("still raises the batch on a large unified-memory device (GB10)", func() {
// GB10 reports system RAM (~119 GiB) as its single device's VRAM.
Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.1", VRAM: 119 * gib}, 204800)).
To(Equal(BlackwellPhysicalBatch))
})
It("stays conservative when VRAM is unknown", func() {
Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.1"}, 8192)).
To(Equal(DefaultPhysicalBatch))
})
It("never raises the batch on non-Blackwell", func() {
Expect(PhysicalBatchForContext(GPU{ComputeCapability: "9.0", VRAM: 80 * gib}, 8192)).
To(Equal(DefaultPhysicalBatch))
})
})
Describe("ApplyHardwareDefaults", func() {
It("raises an unset batch to 2048 on Blackwell", func() {
It("raises an unset batch to 2048 on Blackwell with headroom", func() {
cfg := &ModelConfig{}
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1"})
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
Expect(cfg.Batch).To(Equal(BlackwellPhysicalBatch))
})
It("leaves batch unset when a large context would overflow one device", func() {
// Regression guard for issue #10485: 16 GiB card + ~200k context.
ctx := 204800
cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}}
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.0", VRAM: 16 * gib})
Expect(cfg.Batch).To(Equal(0))
})
It("leaves batch unset on non-Blackwell", func() {
cfg := &ModelConfig{}
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "9.0"})
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "9.0", VRAM: 119 * gib})
Expect(cfg.Batch).To(Equal(0))
})
It("never overrides an explicit batch", func() {
cfg := &ModelConfig{}
cfg.Batch = 1024
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1"})
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
Expect(cfg.Batch).To(Equal(1024))
})
It("no-ops on nil", func() {
Expect(func() { ApplyHardwareDefaults(nil, GPU{ComputeCapability: "12.1"}) }).ToNot(Panic())
})
})
const gib = uint64(1) << 30
It("applies nothing when hardware defaults are disabled via env", func() {
GinkgoT().Setenv("LOCALAI_DISABLE_HARDWARE_DEFAULTS", "true")
Expect(HardwareDefaultsDisabled()).To(BeTrue())
cfg := &ModelConfig{}
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
Expect(cfg.Batch).To(Equal(0))
Expect(cfg.Options).To(BeEmpty())
})
})
DescribeTable("DefaultParallelSlots (by VRAM)",
func(vramGiB uint64, want int) {
@@ -72,12 +114,46 @@ var _ = Describe("Hardware-driven config defaults", func() {
Entry("unknown 0", uint64(0), 1),
)
Describe("ParallelSlotsForContext (per-device VRAM headroom)", func() {
It("keeps the VRAM-scaled slot count when the context fits the device", func() {
// 16 GiB card, small context: plenty of room for concurrency.
Expect(ParallelSlotsForContext(GPU{VRAM: 16 * gib}, 8192)).To(Equal(4))
})
It("drops to a single slot when a large context already fills the device", func() {
// Regression guard for issue #10485: 16 GiB consumer Blackwell, ~200k
// context. Even with unified KV, the per-slot compute/checkpoint
// scratch from 4 slots is the straw that overflows the tighter device.
Expect(ParallelSlotsForContext(GPU{VRAM: 16 * gib}, 204800)).To(Equal(1))
})
It("keeps concurrency on a large unified-memory device (GB10)", func() {
// GB10 reports system RAM (~119 GiB): a 200k context leaves headroom.
Expect(ParallelSlotsForContext(GPU{VRAM: 119 * gib}, 204800)).To(Equal(8))
})
It("keeps concurrency on a big datacenter card with a large context", func() {
// 80 GiB A100: 200k context is a small fraction, concurrency stays.
Expect(ParallelSlotsForContext(GPU{VRAM: 80 * gib}, 204800)).To(Equal(8))
})
It("stays a single slot on small/unknown VRAM regardless of context", func() {
Expect(ParallelSlotsForContext(GPU{VRAM: 2 * gib}, 8192)).To(Equal(1))
Expect(ParallelSlotsForContext(GPU{}, 8192)).To(Equal(1))
})
})
Describe("ApplyHardwareDefaults parallel slots", func() {
It("adds a VRAM-scaled parallel option on a capable GPU", func() {
cfg := &ModelConfig{}
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
Expect(cfg.Options).To(ContainElement("parallel:8"))
})
It("adds no parallel option when a large context already fills one device", func() {
// Regression guard for issue #10485: 16 GiB card + ~200k context. The
// model barely fits; defaulting concurrency tips the tighter GPU into
// CUDA OOM during the final (MTP draft) KV allocation.
ctx := 204800
cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}}
ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.0", VRAM: 16 * gib})
Expect(cfg.Options).ToNot(ContainElement(ContainSubstring("parallel")))
})
It("scales the slot count down with VRAM", func() {
cfg := &ModelConfig{}
ApplyHardwareDefaults(cfg, GPU{VRAM: 24 * gib})

@@ -1204,11 +1204,6 @@ func (cfg *ModelConfig) SetDefaults(opts ...ConfigLoaderOption) {
// This ensures gallery-installed and runtime-loaded models get optimal parameters.
ApplyInferenceDefaults(cfg, cfg.Name, cfg.Model)
// Apply hardware-driven defaults (e.g. a larger physical batch on Blackwell).
// Uses the local GPU here; in distributed mode the router re-applies the same
// heuristics for the selected node's GPU before loading. Explicit config wins.
ApplyHardwareDefaults(cfg, localGPU())
// Apply serving-policy defaults (device-independent): cross-request prefix
// caching. Propagates to distributed nodes via the model options.
ApplyServingDefaults(cfg)
@@ -1247,6 +1242,16 @@ func (cfg *ModelConfig) SetDefaults(opts ...ConfigLoaderOption) {
cfg.ContextSize = &ctx
}
runBackendHooks(cfg, lo.modelPath)
// Apply hardware-driven defaults (e.g. a larger physical batch on Blackwell)
// LAST, after the context size is fully resolved (explicit config, LoadOptions,
// then the GGUF guess inside runBackendHooks): the Blackwell batch guard sizes
// the per-device compute buffer against this model's context, so it must see
// the final value, not a pre-guess nil. Uses the local GPU here; in distributed
// mode the router re-applies the same heuristics for the selected node's GPU
// before loading. Explicit config always wins.
ApplyHardwareDefaults(cfg, localGPU())
cfg.syncKnownUsecasesFromString()
}
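
The ordering requirement in the comment above can be illustrated with a minimal sketch. Every name and number here is hypothetical: `gateBatch`, `largeContext`, the 64 KiB-per-token cost and the half-of-VRAM threshold are assumptions for illustration only, not the project's actual predicate. Only the boosted value 2048 and the "0 means unset" convention come from the tests in this diff. The sketch shows that a gate evaluated before the context size is resolved sees a nil context and boosts the batch, while the same gate evaluated afterwards declines.

```go
package main

import "fmt"

const gib = uint64(1) << 30

const (
	boostedBatch         = 2048     // value the tests in this diff expect on Blackwell
	assumedBytesPerToken = 64 << 10 // ASSUMPTION: illustrative per-token scratch cost
)

// largeContext is a hypothetical headroom predicate: the context counts as
// "large" when its assumed cost exceeds half of one device's VRAM. Unknown
// VRAM (0) is treated conservatively as "large".
func largeContext(vram uint64, ctx int) bool {
	if vram == 0 {
		return true
	}
	return uint64(ctx)*assumedBytesPerToken > vram/2
}

// gateBatch returns the boosted batch when there is headroom, or 0 to leave
// the batch unset. A nil context is read as 0 tokens, which is exactly why
// the gate must run after the context size has been resolved.
func gateBatch(vram uint64, ctx *int) int {
	resolved := 0
	if ctx != nil {
		resolved = *ctx
	}
	if largeContext(vram, resolved) {
		return 0
	}
	return boostedBatch
}

func main() {
	var ctx *int
	early := gateBatch(16*gib, ctx) // context not resolved yet

	n := 204800
	ctx = &n
	late := gateBatch(16*gib, ctx) // context resolved to ~200k

	fmt.Println(early, late) // prints "2048 0"
}
```

Under these assumptions the early call would have applied the boost to a configuration that cannot hold it, which is the failure mode the reordering is meant to avoid.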

@@ -5,6 +5,7 @@ import (
"errors"
"os"
"path/filepath"
"reflect"
)
// runtimeSettingsFile is the on-disk filename inside DynamicConfigsDir.
@@ -33,6 +34,35 @@ func (o *ApplicationConfig) ReadPersistedSettings() (RuntimeSettings, error) {
return settings, nil
}
// MergeNonNil overlays every set (non-nil) field of overlay onto the
// receiver, leaving the receiver's value untouched wherever overlay left a
// field unset. Every RuntimeSettings field is a pointer precisely so "set"
// can be told apart from "absent" (see the type doc), which makes this a
// faithful partial update: a caller that submits only the field it owns
// changes exactly that field and never clobbers unrelated settings.
//
// This is the read-modify-write contract the persistence helpers exist for.
// UpdateSettingsEndpoint reads the on-disk settings, merges the request body
// on top, and writes the result — so a focused admin page that POSTs only its
// own field (the Middleware page sends only mitm_listen; the detector table
// only pii_default_detectors) no longer nulls every other setting.
//
// Reflection keeps the merge total over the struct: a field added to
// RuntimeSettings later is merged automatically, so the persistence path can
// never silently drop a new setting the way a hand-maintained field list
// would. Non-pointer fields (none today) are skipped — they cannot express
// "absent", so the receiver wins.
func (s *RuntimeSettings) MergeNonNil(overlay RuntimeSettings) {
dst := reflect.ValueOf(s).Elem()
src := reflect.ValueOf(overlay)
for i := 0; i < src.NumField(); i++ {
f := src.Field(i)
if f.Kind() == reflect.Pointer && !f.IsNil() {
dst.Field(i).Set(f)
}
}
}
// WritePersistedSettings serialises the given RuntimeSettings to
// runtime_settings.json with restricted permissions (it may carry API
// keys and P2P tokens).
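
As a self-contained illustration of the merge contract described in the MergeNonNil doc comment, the sketch below applies the same reflection loop to a toy struct. The `settings` type, its three fields and the `mergeNonNil` helper are hypothetical stand-ins, not the real `RuntimeSettings`; the loop body mirrors the one shown in the diff.

```go
package main

import (
	"fmt"
	"reflect"
)

// settings is a hypothetical stand-in for RuntimeSettings: every field is a
// pointer so that "absent" (nil) can be told apart from "set".
type settings struct {
	Listen    *string
	Detectors *[]string
	Enabled   *bool
}

// mergeNonNil copies every non-nil pointer field of overlay onto dst and
// leaves dst untouched wherever overlay is nil.
func mergeNonNil(dst *settings, overlay settings) {
	d := reflect.ValueOf(dst).Elem()
	s := reflect.ValueOf(overlay)
	for i := 0; i < s.NumField(); i++ {
		f := s.Field(i)
		if f.Kind() == reflect.Pointer && !f.IsNil() {
			d.Field(i).Set(f)
		}
	}
}

func strPtr(s string) *string { return &s }

func main() {
	on := true
	base := settings{
		Listen:    strPtr(":9000"),
		Detectors: &[]string{"det-a"},
		Enabled:   &on,
	}
	// A focused page sends only the field it owns.
	mergeNonNil(&base, settings{Listen: strPtr(":8443")})
	fmt.Println(*base.Listen, *base.Detectors, *base.Enabled) // prints ":8443 [det-a] true"
}
```

Note that a pointer to an empty slice is non-nil, so an explicit empty list still overwrites the stored value; only an omitted (nil) field is preserved.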

@@ -12,6 +12,7 @@ import (
)
func strPtr(s string) *string { return &s }
func boolPtr(b bool) *bool { return &b }
var _ = Describe("RuntimeSettings persistence helpers", func() {
var (
@@ -51,6 +52,47 @@ var _ = Describe("RuntimeSettings persistence helpers", func() {
})
})
// MergeNonNil is the partial-update primitive UpdateSettingsEndpoint
// relies on: a focused admin page POSTs only the field it owns, and the
// handler reads the on-disk settings and overlays the request on top.
// Without it, the body would be written verbatim and every field the
// caller omitted would be nulled (the reported regression: changing
// mitm_listen wiped the galleries, api keys, watchdog config, etc.).
Describe("MergeNonNil partial update", func() {
It("overlays set fields and preserves unset ones", func() {
base := config.RuntimeSettings{
MITMListen: strPtr(":9000"),
Galleries: &[]config.Gallery{{Name: "g1", URL: "http://example/g1"}},
WatchdogIdleEnabled: boolPtr(true),
ApiKeys: &[]string{"persisted-key"},
PIIDefaultDetectors: &[]string{"det-a"},
}
// Simulate the Middleware proxy tab: only mitm_listen is sent.
overlay := config.RuntimeSettings{MITMListen: strPtr(":8443")}
base.MergeNonNil(overlay)
Expect(base.MITMListen).ToNot(BeNil())
Expect(*base.MITMListen).To(Equal(":8443"), "set field should be overlaid")
// Everything the overlay left unset must survive untouched.
Expect(base.Galleries).ToNot(BeNil(), "galleries were clobbered")
Expect(*base.Galleries).To(HaveLen(1))
Expect(base.WatchdogIdleEnabled).ToNot(BeNil())
Expect(*base.WatchdogIdleEnabled).To(BeTrue())
Expect(base.ApiKeys).ToNot(BeNil(), "api_keys were clobbered")
Expect(*base.ApiKeys).To(Equal([]string{"persisted-key"}))
Expect(base.PIIDefaultDetectors).ToNot(BeNil(), "pii_default_detectors were clobbered")
Expect(*base.PIIDefaultDetectors).To(Equal([]string{"det-a"}))
})
It("lets an explicit empty slice clear a field", func() {
base := config.RuntimeSettings{PIIDefaultDetectors: &[]string{"det-a"}}
base.MergeNonNil(config.RuntimeSettings{PIIDefaultDetectors: &[]string{}})
Expect(base.PIIDefaultDetectors).ToNot(BeNil())
Expect(*base.PIIDefaultDetectors).To(BeEmpty(), "an explicit empty slice should clear, not preserve")
})
})
// MITM round trip pins the contract that loadRuntimeSettingsFromFile
// MITM listener address must survive a write/read round trip so the
// next process restart can bring the listener back up. (Intercept

@@ -149,6 +149,18 @@ func API(application *application.Application) (*echo.Echo, error) {
// Middleware - StripPathPrefix must be registered early as it uses Rewrite which runs before routing
e.Pre(httpMiddleware.StripPathPrefix())
// Stamp the configured external base URL into each request context so
// middleware.BaseURL can treat it as authoritative for self-referential
// links. Registered as Pre so it runs before routing and handlers.
if extBaseURL := application.ApplicationConfig().ExternalBaseURL; extBaseURL != "" {
e.Pre(func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
c.Set("_external_base_url", extBaseURL)
return next(c)
}
})
}
e.Pre(middleware.RemoveTrailingSlash())
if application.ApplicationConfig().MachineTag != "" {

@@ -70,7 +70,7 @@ func UploadToCollectionEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
file, err := c.FormFile("file")
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "file required"})
@@ -116,7 +116,7 @@ func ListCollectionEntriesEndpoint(app *application.Application) echo.HandlerFun
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
entries, err := svc.ListCollectionEntriesForUser(userID, c.Param("name"))
entries, err := svc.ListCollectionEntriesForUser(userID, decodedParam(c, "name"))
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -139,7 +139,7 @@ func GetCollectionEntryContentEndpoint(app *application.Application) echo.Handle
if err != nil {
entry = entryParam
}
content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, c.Param("name"), entry)
content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, decodedParam(c, "name"), entry)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -164,7 +164,7 @@ func SearchCollectionEndpoint(app *application.Application) echo.HandlerFunc {
if err := c.Bind(&payload); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
}
results, err := svc.SearchCollectionForUser(userID, c.Param("name"), payload.Query, payload.MaxResults)
results, err := svc.SearchCollectionForUser(userID, decodedParam(c, "name"), payload.Query, payload.MaxResults)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -182,7 +182,7 @@ func ResetCollectionEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
if err := svc.ResetCollectionForUser(userID, c.Param("name")); err != nil {
if err := svc.ResetCollectionForUser(userID, decodedParam(c, "name")); err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
@@ -202,7 +202,7 @@ func DeleteCollectionEntryEndpoint(app *application.Application) echo.HandlerFun
if err := c.Bind(&payload); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
}
remaining, err := svc.DeleteCollectionEntryForUser(userID, c.Param("name"), payload.Entry)
remaining, err := svc.DeleteCollectionEntryForUser(userID, decodedParam(c, "name"), payload.Entry)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -230,7 +230,7 @@ func AddCollectionSourceEndpoint(app *application.Application) echo.HandlerFunc
if payload.UpdateInterval < 1 {
payload.UpdateInterval = 60
}
if err := svc.AddCollectionSourceForUser(userID, c.Param("name"), payload.URL, payload.UpdateInterval); err != nil {
if err := svc.AddCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL, payload.UpdateInterval); err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
@@ -250,7 +250,7 @@ func RemoveCollectionSourceEndpoint(app *application.Application) echo.HandlerFu
if err := c.Bind(&payload); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
}
if err := svc.RemoveCollectionSourceForUser(userID, c.Param("name"), payload.URL); err != nil {
if err := svc.RemoveCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL); err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
}
return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -267,7 +267,7 @@ func GetCollectionEntryRawFileEndpoint(app *application.Application) echo.Handle
if err != nil {
entry = entryParam
}
fpath, err := svc.GetCollectionEntryFilePathForUser(userID, c.Param("name"), entry)
fpath, err := svc.GetCollectionEntryFilePathForUser(userID, decodedParam(c, "name"), entry)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -282,7 +282,7 @@ func ListCollectionSourcesEndpoint(app *application.Application) echo.HandlerFun
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
sources, err := svc.ListCollectionSourcesForUser(userID, c.Param("name"))
sources, err := svc.ListCollectionSourcesForUser(userID, decodedParam(c, "name"))
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})

@@ -0,0 +1,49 @@
package localai
import (
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// Regression for #10443: agent/collection names carry a "legacy-api-key:"
// prefix, so the ':' is percent-encoded as %3A in the request path. Echo routes
// such paths via URL.RawPath and stores the path-param value still escaped, so
// handlers must URL-decode it before looking the collection up in the store -
// otherwise the lookup sees "legacy-api-key%3ALiteraryResearch" and 404s.
var _ = Describe("decodedParam", func() {
var e *echo.Echo
BeforeEach(func() {
e = echo.New()
})
// route runs a request through Echo's real router so the path param is
// populated exactly as it would be in production, then returns the decoded
// value the handler would observe.
route := func(rawPath string) string {
var got string
e.GET("/api/agents/collections/:name/upload", func(c echo.Context) error {
got = decodedParam(c, "name")
return c.NoContent(http.StatusOK)
})
req := httptest.NewRequest(http.MethodGet, rawPath, nil)
rec := httptest.NewRecorder()
e.ServeHTTP(rec, req)
Expect(rec.Code).To(Equal(http.StatusOK))
return got
}
It("decodes a percent-encoded colon in the collection name", func() {
got := route("/api/agents/collections/legacy-api-key%3ALiteraryResearch/upload")
Expect(got).To(Equal("legacy-api-key:LiteraryResearch"))
})
It("leaves an unencoded name untouched", func() {
got := route("/api/agents/collections/PlainCollection/upload")
Expect(got).To(Equal("PlainCollection"))
})
})
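
The decode-with-fallback behaviour this spec exercises can be reproduced without Echo. The sketch below is a minimal standalone version that operates on a plain string; `decodeOrRaw` is a hypothetical name, and the third input is an assumed example of malformed percent-encoding that is not part of the spec above.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeOrRaw percent-decodes a path segment and falls back to the raw value
// when the input is not valid percent-encoding.
func decodeOrRaw(raw string) string {
	if decoded, err := url.PathUnescape(raw); err == nil {
		return decoded
	}
	return raw
}

func main() {
	fmt.Println(decodeOrRaw("legacy-api-key%3ALiteraryResearch")) // legacy-api-key:LiteraryResearch
	fmt.Println(decodeOrRaw("PlainCollection"))                   // PlainCollection
	fmt.Println(decodeOrRaw("bad%zzname"))                        // bad%zzname (invalid escape, raw value kept)
}
```

`url.PathUnescape` is used rather than `url.QueryUnescape` because the latter also turns `+` into a space, which would alter names that legitimately contain a plus sign.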

@@ -6,6 +6,7 @@ import (
"io"
"maps"
"net/http"
"net/url"
"os"
"path/filepath"
"slices"
@@ -33,6 +34,22 @@ func getUserID(c echo.Context) string {
return user.ID
}
// decodedParam returns the named path parameter, URL-decoding it.
//
// Echo routes a request via URL.RawPath whenever the path contains
// percent-encoded characters (e.g. %3A for ':'), and in that case stores the
// matched path-param value raw/escaped. Agent and collection names carry a
// "legacy-api-key:" prefix, so the ':' arrives as %3A and the raw param no
// longer matches the stored name. Callers must unescape before lookups.
// Falls back to the raw value if it isn't valid percent-encoding.
func decodedParam(c echo.Context, name string) string {
raw := c.Param(name)
if decoded, err := url.PathUnescape(raw); err == nil {
return decoded
}
return raw
}
// isAdminUser returns true if the authenticated user has admin role.
func isAdminUser(c echo.Context) bool {
user := auth.GetUser(c)
@@ -127,7 +144,7 @@ func GetAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
statuses := svc.ListAgentsForUser(userID)
active, exists := statuses[name]
@@ -142,7 +159,7 @@ func UpdateAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
var cfg state.AgentConfig
if err := c.Bind(&cfg); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
@@ -161,7 +178,7 @@ func DeleteAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
if err := svc.DeleteAgentForUser(userID, name); err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
}
@@ -173,7 +190,7 @@ func GetAgentConfigEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
cfg := svc.GetAgentConfigForUser(userID, name)
if cfg == nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": "Agent not found"})
@@ -186,7 +203,7 @@ func PauseAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
if err := svc.PauseAgentForUser(userID, c.Param("name")); err != nil {
if err := svc.PauseAgentForUser(userID, decodedParam(c, "name")); err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -197,7 +214,7 @@ func ResumeAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
if err := svc.ResumeAgentForUser(userID, c.Param("name")); err != nil {
if err := svc.ResumeAgentForUser(userID, decodedParam(c, "name")); err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -208,7 +225,7 @@ func GetAgentStatusEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
history := svc.GetAgentStatusForUser(userID, name)
if history == nil {
@@ -241,7 +258,7 @@ func GetAgentObservablesEndpoint(app *application.Application) echo.HandlerFunc
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
history, err := svc.GetAgentObservablesForUser(userID, name)
if err != nil {
@@ -261,7 +278,7 @@ func ClearAgentObservablesEndpoint(app *application.Application) echo.HandlerFun
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
if err := svc.ClearAgentObservablesForUser(userID, name); err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
@@ -273,7 +290,7 @@ func ChatWithAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
var payload struct {
Message string `json:"message"`
}
@@ -302,7 +319,7 @@ func AgentSSEEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
// Try local SSE manager first
manager := svc.GetSSEManagerForUser(userID, name)
@@ -334,7 +351,7 @@ func ExportAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
data, err := svc.ExportAgentForUser(userID, name)
if err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})

@@ -4,8 +4,6 @@ import (
"encoding/json"
"io"
"net/http"
"os"
"path/filepath"
"time"
"github.com/labstack/echo/v4"
@@ -110,6 +108,18 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
})
}
// Read whatever is already persisted: it is both the source of truth
// for branding asset filenames (below) and the base we merge this
// request onto before writing. A read failure must not let a Save
// silently discard the existing settings — surface it instead.
persisted, err := appConfig.ReadPersistedSettings()
if err != nil {
return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
Success: false,
Error: "Failed to read existing settings: " + err.Error(),
})
}
// Branding asset filenames are owned exclusively by
// /api/branding/asset/{kind} (upload/delete). The Settings page also
// round-trips them via GET /api/settings, but its local state is stale
@@ -118,11 +128,9 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
// at page open. Replace whatever the body sent for these three fields
// with the values currently on disk so /api/settings can never
// regress them.
if existing, err := appConfig.ReadPersistedSettings(); err == nil {
settings.LogoFile = existing.LogoFile
settings.LogoHorizontalFile = existing.LogoHorizontalFile
settings.FaviconFile = existing.FaviconFile
}
settings.LogoFile = persisted.LogoFile
settings.LogoHorizontalFile = persisted.LogoHorizontalFile
settings.FaviconFile = persisted.FaviconFile
// The UI reads ApiKeys from GET /api/settings, which already returns the
// merged env+runtime list. When the user clicks Save, the same merged
@@ -145,16 +153,17 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
settings.ApiKeys = &runtimeOnly
}
settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
settingsJSON, err := json.MarshalIndent(settings, "", " ")
if err != nil {
return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
Success: false,
Error: "Failed to marshal settings: " + err.Error(),
})
}
if err := os.WriteFile(settingsFile, settingsJSON, 0600); err != nil {
// Persist as a partial update: overlay only the fields this request set
// onto the settings already on disk. Focused admin pages POST just the
// keys they own (the Middleware proxy tab sends only mitm_listen; the
// detector table only pii_default_detectors), so writing the request
// body verbatim would null every unrelated setting (the no-omitempty
// api_keys / pii_default_detectors fields even round-trip as JSON
// null). The full Settings page still round-trips every field, so its
// Save is unchanged.
toPersist := persisted
toPersist.MergeNonNil(settings)
if err := appConfig.WritePersistedSettings(toPersist); err != nil {
return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
Success: false,
Error: "Failed to write settings file: " + err.Error(),
@@ -262,7 +271,14 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
}
}
if settings.MITMListen != nil {
// Rebuild the MITM listener when its address OR the instance-wide
// default detectors change. The per-host detector map is resolved once
// at listener start (startMITMLocked → ResolvePIIPolicy), so a
// default-detector change is otherwise invisible to cloud-proxy traffic
// until the next restart — an admin toggling a default detector would
// see no redaction. RestartMITM is a no-op when the listener is
// disabled (empty address).
if settings.MITMListen != nil || settings.PIIDefaultDetectors != nil {
if err := app.RestartMITM(); err != nil {
xlog.Error("Failed to restart MITM proxy", "error", err)
return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{

@@ -52,6 +52,10 @@ var _ = Describe("Settings endpoints", func() {
// Settings are persisted here; set after construction since there's no
// dedicated AppOption for it.
app.ApplicationConfig().DynamicConfigsDir = tmp
// Contain the MITM CA inside tmp too. The partial-save spec flips
// mitm_listen, which starts the listener and writes a CA; without this
// it defaults to ./mitm-ca and litters the package source tree.
app.ApplicationConfig().MITMCADir = filepath.Join(tmp, "mitm-ca")
e = echo.New()
e.GET("/api/settings", GetSettingsEndpoint(app))
@@ -109,6 +113,57 @@ var _ = Describe("Settings endpoints", func() {
Expect(err).ToNot(HaveOccurred())
})
// Regression: a focused admin page (the Middleware proxy tab) POSTs only
// the one field it owns — mitm_listen. The old handler wrote the request
// body verbatim, so every other persisted setting was dropped (and
// api_keys / pii_default_detectors, which lack omitempty, were written as
// null). A partial POST must now merge onto what is already on disk.
It("preserves unrelated persisted settings when a partial POST sets only mitm_listen", func() {
// First save establishes a fuller settings file (as the full Settings
// page would): galleries, an API key, and the MITM listener. The
// listener restart binds a real socket, so use 127.0.0.1:0 for an
// ephemeral free port rather than a fixed one that may be in use.
rec := post(`{"mitm_listen":"127.0.0.1:0","galleries":[{"name":"g1","url":"http://example/g1"}],"api_keys":["k1"],"pii_default_detectors":["det-a"]}`)
Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
// The Middleware proxy tab then changes only the listen address — the
// exact partial body that nulled everything else before the fix.
rec = post(`{"mitm_listen":"127.0.0.1:0"}`)
Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
raw, err := os.ReadFile(filepath.Join(tmp, "runtime_settings.json"))
Expect(err).ToNot(HaveOccurred())
var ondisk config.RuntimeSettings
Expect(json.Unmarshal(raw, &ondisk)).To(Succeed())
Expect(ondisk.MITMListen).ToNot(BeNil())
Expect(*ondisk.MITMListen).To(Equal("127.0.0.1:0"), "the changed field should be saved")
Expect(ondisk.Galleries).ToNot(BeNil(), "galleries were clobbered by the partial save")
Expect(*ondisk.Galleries).To(HaveLen(1))
Expect(ondisk.ApiKeys).ToNot(BeNil(), "api_keys were nulled by the partial save")
Expect(*ondisk.ApiKeys).To(Equal([]string{"k1"}))
Expect(ondisk.PIIDefaultDetectors).ToNot(BeNil(), "pii_default_detectors were nulled by the partial save")
Expect(*ondisk.PIIDefaultDetectors).To(Equal([]string{"det-a"}))
})
// The MITM listener resolves its per-host PII detectors once at start
// (startMITMLocked → ResolvePIIPolicy), and the handler used to restart it
// only when mitm_listen changed. So an admin toggling a default detector
// (the Middleware detector table POSTs only pii_default_detectors) left
// cloud-proxy traffic unredacted until the next reboot. A
// pii_default_detectors change must now rebuild the listener.
It("rebuilds the MITM listener when only pii_default_detectors changes", func() {
rec := post(`{"mitm_listen":"127.0.0.1:0"}`)
Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
srv1 := app.MITMServer()
Expect(srv1).ToNot(BeNil(), "listener should be running after mitm_listen is set")
rec = post(`{"pii_default_detectors":["det-a"]}`)
Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
Expect(app.MITMServer()).ToNot(BeIdenticalTo(srv1),
"a default-detector change must restart the listener so it picks up the new detectors")
})
// Residual #9125: enabling the watchdog from a cold (off) state via the
// React master toggle must start the live watchdog immediately, without a
// restart. The toggle posts watchdog_idle_enabled/busy_enabled=true while

@@ -432,7 +432,7 @@ func loadSoundDetectionConfig(pipeline *config.Pipeline, cl *config.ModelConfigL
if pipeline.SoundDetection == "" {
return nil, nil
}
cfg, err := cl.LoadModelConfigFileByName(pipeline.SoundDetection, ml.ModelPath)
cfg, err := loadPipelineSubModel(cl, pipeline.SoundDetection, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load sound detection config: %w", err)
}
@@ -443,7 +443,7 @@ func loadSoundDetectionConfig(pipeline *config.Pipeline, cl *config.ModelConfigL
}
func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) (Model, *config.ModelConfig, error) {
cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath)
cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath)
if err != nil {
return nil, nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -453,7 +453,7 @@ func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfig
return nil, nil, fmt.Errorf("failed to validate config: %w", err)
}
cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath)
cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath)
if err != nil {
return nil, nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -542,11 +542,30 @@ func buildRealtimeRoutingContext(a *application.Application, sessionID string) *
}
}
// loadPipelineSubModel loads a pipeline sub-model config by name and follows a
// single alias hop, so a pipeline that references an alias (e.g. `llm: default`)
// gets the alias target's full config (Backend, Model, ...) rather than the
// alias stub with an empty Backend. Without this the alias survives unresolved
// into model loading and fails downstream — notably in distributed mode with
// "backend name is empty". Mirrors the top-level alias resolution in
// core/http/middleware/request.go.
func loadPipelineSubModel(cl *config.ModelConfigLoader, name, modelPath string) (*config.ModelConfig, error) {
cfg, err := cl.LoadModelConfigFileByName(name, modelPath)
if err != nil {
return nil, err
}
resolved, _, err := cl.ResolveAlias(cfg)
if err != nil {
return nil, err
}
return resolved, nil
}
// newModel loads and returns either a wrapped model or a model that supports audio-to-audio.
func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, evaluator *templates.Evaluator, routing *RealtimeRoutingContext) (Model, error) {
xlog.Debug("Creating new model pipeline model", "pipeline", pipeline)
cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath)
cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -557,7 +576,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
}
// TODO: Do we always need a transcription model? It can be disabled. Note that any-to-any instruction-following models don't transcribe as such, so if transcription is required it is a separate process.
cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath)
cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -589,7 +608,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
xlog.Debug("Loading a wrapped model")
// Otherwise we want to return a wrapped model, which is a "virtual" model that re-uses other models to perform operations
cfgLLM, err := cl.LoadModelConfigFileByName(pipeline.LLM, ml.ModelPath)
cfgLLM, err := loadPipelineSubModel(cl, pipeline.LLM, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -604,7 +623,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
applyPipelineReasoning(cfgLLM, *pipeline)
applyPipelineThinking(cfgLLM, *pipeline)
cfgTTS, err := cl.LoadModelConfigFileByName(pipeline.TTS, ml.ModelPath)
cfgTTS, err := loadPipelineSubModel(cl, pipeline.TTS, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)


@@ -0,0 +1,52 @@
package openai
import (
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
)
// loadPipelineSubModel must resolve a pipeline sub-model that references an
// alias (e.g. `llm: default`) one hop to the alias target's full config — so
// the effective backend is the target's backend, not the empty backend of the
// alias stub. This mirrors the top-level alias resolution done in
// core/http/middleware/request.go, which the realtime pipeline previously
// skipped (failing in distributed mode with "backend name is empty").
var _ = Describe("loadPipelineSubModel", func() {
It("resolves a sub-model alias one hop to the target's config", func() {
tmpDir := GinkgoT().TempDir()
// A real model config with a concrete backend.
realLLM := `name: real-llm
backend: llama-cpp
parameters:
model: real-llm.gguf
`
Expect(os.WriteFile(filepath.Join(tmpDir, "real-llm.yaml"), []byte(realLLM), 0644)).To(Succeed())
// An alias pointing at the real model.
aliasCfg := `name: default
alias: real-llm
`
Expect(os.WriteFile(filepath.Join(tmpDir, "default.yaml"), []byte(aliasCfg), 0644)).To(Succeed())
cl := config.NewModelConfigLoader(tmpDir)
Expect(cl.LoadModelConfigsFromPath(tmpDir)).To(Succeed())
// Resolving the alias must follow the hop to the target's full config.
resolved, err := loadPipelineSubModel(cl, "default", tmpDir)
Expect(err).NotTo(HaveOccurred())
Expect(resolved.IsAlias()).To(BeFalse())
Expect(resolved.Backend).To(Equal("llama-cpp"))
// A non-alias name must load unchanged.
direct, err := loadPipelineSubModel(cl, "real-llm", tmpDir)
Expect(err).NotTo(HaveOccurred())
Expect(direct.Backend).To(Equal("llama-cpp"))
Expect(direct.Name).To(Equal("real-llm"))
})
})
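
The single-hop alias resolution exercised by the spec above can be illustrated without LocalAI's loader. The following is a minimal sketch only: `modelConfig`, `registry`, and `resolveOneHop` are hypothetical names invented for illustration, not LocalAI's API (the real helper delegates to `cl.ResolveAlias`, whose internals are not shown in this diff). Rejecting alias-to-alias chains is an assumption of the sketch, not confirmed behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// modelConfig is a hypothetical stand-in for a model config: an alias stub
// sets Alias and leaves Backend empty.
type modelConfig struct {
	Name    string
	Backend string
	Alias   string
}

// registry maps config names to configs (hypothetical; not LocalAI's loader).
type registry map[string]modelConfig

// resolveOneHop looks up name and, if it is an alias, follows exactly one hop
// to the target. Rejecting alias chains is an assumption of this sketch.
func resolveOneHop(r registry, name string) (modelConfig, error) {
	cfg, ok := r[name]
	if !ok {
		return modelConfig{}, fmt.Errorf("model %q not found", name)
	}
	if cfg.Alias == "" {
		return cfg, nil // not an alias: return the config unchanged
	}
	target, ok := r[cfg.Alias]
	if !ok {
		return modelConfig{}, fmt.Errorf("alias %q points at unknown model %q", name, cfg.Alias)
	}
	if target.Alias != "" {
		return modelConfig{}, errors.New("alias chains are not followed")
	}
	return target, nil
}

func main() {
	r := registry{
		"real-llm": {Name: "real-llm", Backend: "llama-cpp"},
		"default":  {Name: "default", Alias: "real-llm"},
	}
	cfg, err := resolveOneHop(r, "default")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Name, cfg.Backend) // prints "real-llm llama-cpp"
}
```

The caller receives the target's config with a non-empty backend, which is the property the spec asserts with `resolved.Backend`.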


@@ -55,17 +55,70 @@ func BasePathPrefix(c echo.Context) string {
// The returned URL is guaranteed to end with `/`.
// The method should be used in conjunction with the StripPathPrefix middleware.
func BaseURL(c echo.Context) string {
// An explicit external base URL (LOCALAI_BASE_URL) is authoritative for
// the origin. The proxy-derived path prefix is still appended so a
// reverse-proxy mount point keeps working. Trailing slashes are
// normalized via BasePathPrefix, which always starts and ends with "/".
if ext, ok := c.Get("_external_base_url").(string); ok && ext != "" {
return strings.TrimRight(ext, "/") + BasePathPrefix(c)
}
fwdProto, fwdHost := parseForwarded(c.Request().Header.Get("Forwarded"))
scheme := "http"
if c.Request().Header.Get("X-Forwarded-Proto") == "https" {
switch {
case c.Request().TLS != nil:
scheme = "https"
} else if c.Request().TLS != nil {
case strings.EqualFold(firstToken(c.Request().Header.Get("X-Forwarded-Proto")), "https"):
scheme = "https"
case strings.EqualFold(fwdProto, "https"):
scheme = "https"
}
host := c.Request().Host
if forwardedHost := c.Request().Header.Get("X-Forwarded-Host"); forwardedHost != "" {
host = forwardedHost
} else if fwdHost != "" {
host = fwdHost
}
return scheme + "://" + host + BasePathPrefix(c)
}
// firstToken returns the first comma-separated token of v, trimmed of spaces.
// Reverse-proxy chains can emit X-Forwarded-Proto as "https,http"; only the
// first hop (closest to the client) is meaningful for scheme detection.
func firstToken(v string) string {
if i := strings.IndexByte(v, ','); i >= 0 {
v = v[:i]
}
return strings.TrimSpace(v)
}
// parseForwarded extracts the proto and host directives from the first element
// of an RFC 7239 Forwarded header (e.g. `for=x;proto=https;host=h, for=y`).
// Values may be quoted. Returns empty strings when absent or malformed so the
// caller can fall through to other signals.
func parseForwarded(header string) (proto, host string) {
if header == "" {
return "", ""
}
// Only the first element (closest proxy to the client) matters here.
if i := strings.IndexByte(header, ','); i >= 0 {
header = header[:i]
}
for _, directive := range strings.Split(header, ";") {
key, value, ok := strings.Cut(strings.TrimSpace(directive), "=")
if !ok {
continue
}
value = strings.Trim(strings.TrimSpace(value), `"`)
switch strings.ToLower(strings.TrimSpace(key)) {
case "proto":
proto = value
case "host":
host = value
}
}
return proto, host
}
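
The scheme precedence in `BaseURL` (direct TLS, then `X-Forwarded-Proto`, then the `Forwarded` proto directive) can be tried in isolation. This is a standalone sketch using only the standard library: `firstElement`, `forwardedParam`, and `detectScheme` are hypothetical helpers written for illustration and are not the functions in this diff. One difference to note: the sketch returns the first matching directive, whereas `parseForwarded` above keeps the last one if a directive is repeated.

```go
package main

import (
	"fmt"
	"strings"
)

// firstElement returns the text before the first comma, trimmed of spaces.
func firstElement(v string) string {
	head, _, _ := strings.Cut(v, ",")
	return strings.TrimSpace(head)
}

// forwardedParam returns the value of key (e.g. "proto") from the first
// element of an RFC 7239 Forwarded header, or "" when it is absent.
func forwardedParam(header, key string) string {
	for _, directive := range strings.Split(firstElement(header), ";") {
		k, v, ok := strings.Cut(strings.TrimSpace(directive), "=")
		if ok && strings.EqualFold(strings.TrimSpace(k), key) {
			return strings.Trim(strings.TrimSpace(v), `"`)
		}
	}
	return ""
}

// detectScheme reports "https" when any signal says so, checked in order:
// direct TLS, X-Forwarded-Proto (first token), Forwarded proto directive.
func detectScheme(isTLS bool, xForwardedProto, forwarded string) string {
	switch {
	case isTLS,
		strings.EqualFold(firstElement(xForwardedProto), "https"),
		strings.EqualFold(forwardedParam(forwarded, "proto"), "https"):
		return "https"
	}
	return "http"
}

func main() {
	fmt.Println(detectScheme(false, "https,http", ""))                              // https
	fmt.Println(detectScheme(false, "", `for=192.0.2.1;proto="https";host=h.test`)) // https
	fmt.Println(detectScheme(false, "http", ""))                                    // http
}
```

Comparing with `strings.EqualFold` rather than `==` is what lets values such as `HTTPS` from a proxy still count as https.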


@@ -135,4 +135,138 @@ var _ = Describe("BaseURL", func() {
Entry("missing leading slash", "evil"),
)
})
Context("scheme detection hardening", func() {
It("treats comma-separated X-Forwarded-Proto as https when first token is https", func() {
app := echo.New()
actualURL := ""
app.GET("/x", func(c echo.Context) error {
actualURL = BaseURL(c)
return nil
})
req := httptest.NewRequest("GET", "/x", nil)
req.Header.Set("X-Forwarded-Proto", "https,http")
rec := httptest.NewRecorder()
app.ServeHTTP(rec, req)
Expect(actualURL).To(Equal("https://example.com/"))
})
It("derives https from the RFC 7239 Forwarded proto directive", func() {
app := echo.New()
actualURL := ""
app.GET("/x", func(c echo.Context) error {
actualURL = BaseURL(c)
return nil
})
req := httptest.NewRequest("GET", "/x", nil)
req.Header.Set("Forwarded", "for=192.0.2.1;proto=https;host=proxy.example")
rec := httptest.NewRecorder()
app.ServeHTTP(rec, req)
Expect(actualURL).To(Equal("https://proxy.example/"))
})
It("prefers X-Forwarded-Host over the Forwarded host directive", func() {
app := echo.New()
actualURL := ""
app.GET("/x", func(c echo.Context) error {
actualURL = BaseURL(c)
return nil
})
req := httptest.NewRequest("GET", "/x", nil)
req.Header.Set("X-Forwarded-Host", "xfh.example")
req.Header.Set("Forwarded", "host=fwd.example;proto=https")
rec := httptest.NewRecorder()
app.ServeHTTP(rec, req)
Expect(actualURL).To(Equal("https://xfh.example/"))
})
})
Context("explicit external base URL override", func() {
It("uses the configured origin over conflicting forwarded headers", func() {
app := echo.New()
actualURL := ""
app.GET("/x", func(c echo.Context) error {
c.Set("_external_base_url", "https://192.168.0.13:34567")
actualURL = BaseURL(c)
return nil
})
req := httptest.NewRequest("GET", "/x", nil)
req.Header.Set("X-Forwarded-Proto", "http")
req.Header.Set("X-Forwarded-Host", "internal:8080")
rec := httptest.NewRecorder()
app.ServeHTTP(rec, req)
Expect(actualURL).To(Equal("https://192.168.0.13:34567/"))
})
It("combines the configured origin with a detected path prefix", func() {
app := echo.New()
actualURL := ""
app.GET("/hello", func(c echo.Context) error {
c.Set("_original_path", "/localai/hello")
c.Set("_external_base_url", "https://ext.example")
actualURL = BaseURL(c)
return nil
})
req := httptest.NewRequest("GET", "/hello", nil)
rec := httptest.NewRecorder()
app.ServeHTTP(rec, req)
Expect(actualURL).To(Equal("https://ext.example/localai/"))
})
It("ignores an empty override", func() {
app := echo.New()
actualURL := ""
app.GET("/x", func(c echo.Context) error {
c.Set("_external_base_url", "")
actualURL = BaseURL(c)
return nil
})
req := httptest.NewRequest("GET", "/x", nil)
rec := httptest.NewRecorder()
app.ServeHTTP(rec, req)
Expect(actualURL).To(Equal("http://example.com/"))
})
})
Context("parseForwarded helper", func() {
It("parses unquoted proto and host", func() {
proto, host := parseForwarded("for=192.0.2.1;proto=https;host=h.example")
Expect(proto).To(Equal("https"))
Expect(host).To(Equal("h.example"))
})
It("strips quotes around values", func() {
proto, host := parseForwarded(`proto="https";host="h.example"`)
Expect(proto).To(Equal("https"))
Expect(host).To(Equal("h.example"))
})
It("uses only the first element of a multi-element header", func() {
proto, host := parseForwarded("proto=https;host=first.example, proto=http;host=second.example")
Expect(proto).To(Equal("https"))
Expect(host).To(Equal("first.example"))
})
It("returns empty strings for an empty header", func() {
proto, host := parseForwarded("")
Expect(proto).To(BeEmpty())
Expect(host).To(BeEmpty())
})
It("skips directives without a value", func() {
proto, host := parseForwarded("proto;host=h.example")
Expect(proto).To(BeEmpty())
Expect(host).To(Equal("h.example"))
})
})
Context("firstToken helper", func() {
It("returns the whole trimmed string when there is no comma", func() {
Expect(firstToken(" https ")).To(Equal("https"))
})
It("returns the first trimmed token when there is a comma", func() {
Expect(firstToken("https , http")).To(Equal("https"))
})
})
})
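
The override cases tested above reduce to one rule: a non-empty external base URL replaces the header-derived origin, and the path prefix is appended either way. A minimal sketch of that rule follows; `joinBaseURL` and its parameters are hypothetical, and it assumes, as the `BaseURL` comment states for `BasePathPrefix`, that the prefix always starts and ends with "/".

```go
package main

import (
	"fmt"
	"strings"
)

// joinBaseURL picks the origin and appends the path prefix.
// external is the configured override (may be empty), derivedOrigin is the
// scheme://host built from request headers, and pathPrefix is assumed to
// start and end with "/".
func joinBaseURL(external, derivedOrigin, pathPrefix string) string {
	origin := derivedOrigin
	if external != "" {
		// Trim trailing slashes so the prefix does not produce "//".
		origin = strings.TrimRight(external, "/")
	}
	return origin + pathPrefix
}

func main() {
	fmt.Println(joinBaseURL("https://ext.example/", "http://internal:8080", "/localai/"))
	// prints "https://ext.example/localai/"
	fmt.Println(joinBaseURL("", "http://example.com", "/"))
	// prints "http://example.com/"
}
```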


@@ -1,100 +0,0 @@
import { test, expect } from './coverage-fixtures.js'
// These specs stub /api/features and /api/auth/status per cell. The test server
// disables auth (isAdmin=true) and reports its own features, so we intercept
// before navigation to simulate each role x mode cell.
function stubFeatures(page, features) {
return page.route('**/api/features', route =>
route.fulfill({ contentType: 'application/json', body: JSON.stringify(features) }))
}
function stubNoP2P(page) {
// P2P token endpoint returns empty -> p2pEnabled=false.
return page.route('**/api/p2p/token', route =>
route.fulfill({ contentType: 'text/plain', body: '' }))
}
test.describe('Adaptive landing (HomeRoute)', () => {
test('admin + distributed redirects /app to Nodes', async ({ page }) => {
await stubFeatures(page, { distributed: true })
await stubNoP2P(page)
await page.goto('/app')
await expect(page).toHaveURL(/\/app\/nodes$/)
await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 })
})
test('admin + single-node stays on Home', async ({ page }) => {
await stubFeatures(page, { distributed: false })
await stubNoP2P(page)
await page.goto('/app')
await expect(page).toHaveURL(/\/app$/)
await expect(page.locator('.home-greeting')).toBeVisible({ timeout: 15_000 })
})
})
test.describe('Adaptive sidebar', () => {
test('distributed pins the Cluster group with Nodes at the top', async ({ page }) => {
await stubFeatures(page, { distributed: true })
await stubNoP2P(page)
await page.goto('/app/chat') // any in-app page so the sidebar is mounted
const pinned = page.locator('.sidebar-nav .sidebar-section-items').first()
await expect(pinned.getByText('Nodes', { exact: false })).toBeVisible({ timeout: 15_000 })
})
test('single-node does not pin a Cluster group', async ({ page }) => {
await stubFeatures(page, { distributed: false })
await stubNoP2P(page)
await page.goto('/app/chat')
// Nodes is reachable only via the Operate rail, not pinned at the top.
await expect(page.locator('.sidebar-nav')).toBeVisible({ timeout: 15_000 })
await expect(page.locator('.sidebar-nav .sidebar-section-items').first()
.getByText('Nodes', { exact: false })).toHaveCount(0)
})
})
test.describe('Top navbar', () => {
test('admin sees the mode pill and settings cog', async ({ page }) => {
await stubFeatures(page, { distributed: true })
await stubNoP2P(page)
await page.goto('/app/chat')
await expect(page.locator('.top-navbar__mode')).toBeVisible({ timeout: 15_000 })
await expect(page.locator('.top-navbar__icon[aria-label]')).not.toHaveCount(0)
})
test('admin-via-chat jump shows when localai_assistant is enabled', async ({ page }) => {
await stubFeatures(page, { distributed: false, localai_assistant: true })
await stubNoP2P(page)
await page.goto('/app/chat')
await expect(page.locator('.top-navbar__assistant')).toBeVisible({ timeout: 15_000 })
})
test('admin-via-chat jump hidden when localai_assistant is off', async ({ page }) => {
await stubFeatures(page, { distributed: false, localai_assistant: false })
await stubNoP2P(page)
await page.goto('/app/chat')
await expect(page.locator('.top-navbar__assistant')).toHaveCount(0)
})
})
test.describe('Token usage meter', () => {
test('renders when admin usage has data', async ({ page }) => {
await stubFeatures(page, { distributed: false })
await stubNoP2P(page)
await page.route('**/api/auth/admin/usage**', route =>
route.fulfill({ contentType: 'application/json',
body: JSON.stringify({ buckets: [{ total_tokens: 1234 }] }) }))
await page.goto('/app/chat')
await expect(page.locator('.top-navbar__meter')).toBeVisible({ timeout: 15_000 })
})
test('hidden when admin usage is empty (graceful degrade)', async ({ page }) => {
await stubFeatures(page, { distributed: false })
await stubNoP2P(page)
await page.route('**/api/auth/admin/usage**', route =>
route.fulfill({ contentType: 'application/json', body: JSON.stringify({ buckets: [] }) }))
await page.goto('/app/chat')
await expect(page.locator('.top-navbar')).toBeVisible({ timeout: 15_000 })
await expect(page.locator('.top-navbar__meter')).toHaveCount(0)
})
})


@@ -86,6 +86,7 @@
"input": {
"placeholder": "Message...",
"attachFile": "Attach file",
"send": "Send message",
"stopGenerating": "Stop generating",
"canvasTitle": "Canvas — extract code blocks and media into a side panel for preview, copy, and download",
"canvasLabel": "Canvas",


@@ -77,6 +77,21 @@
"noModelsTitle": "No Models Available",
"noModelsBody": "There are no models installed yet. Ask your administrator to set up models so you can start chatting."
},
"starters": {
"title": "Recommended for your hardware",
"tier": {
"cpu": "CPU-only",
"gpu-small": "GPU",
"gpu-mid": "GPU",
"gpu-large": "GPU"
},
"cpuNote": "No GPU detected — these small models stay responsive on CPU.",
"gpuNote": "Picked to fit your available VRAM with room for context.",
"install": "Install",
"installing": "Installing",
"installStarted": "Installing {{model}}…",
"installFailed": "Install failed: {{message}}"
},
"connect": {
"title": "One endpoint, every API",
"subtitle": "LocalAI serves its own full API — image & video generation, depth, object detection, reranking, audio, face & voice recognition, and realtime voice over WebRTC and WebSocket. On top of that, a drop-in compatibility layer lets any app built for OpenAI, Anthropic, Ollama or OpenAI Responses talk to it unchanged.",


@@ -2,6 +2,16 @@
"title": "Install Models",
"subtitle": "Browse and install AI models from the gallery",
"models": "Models",
"recommended": {
"title": "Recommended for your hardware",
"cpuNote": "No GPU detected - small models that stay responsive on CPU.",
"gpuNote": "Sized to fit your available VRAM with room for context.",
"install": "Install",
"installing": "Installing",
"installStarted": "Installing {{model}}…",
"installFailed": "Install failed: {{message}}",
"dismiss": "Dismiss recommendations"
},
"stats": {
"available": "Available",
"installed": "Installed"


@@ -12,16 +12,6 @@
"accountSettings": "Account settings",
"account": "Account",
"accountFor": "Account: {{name}}",
"topbar": {
"label": "Top bar",
"modeDistributed": "Distributed",
"modeSwarm": "Swarm",
"modeSingle": "Single-node",
"pickModel": "Models",
"adminViaChat": "Admin via chat",
"tokensToday": "Tokens today",
"usageDetail": "View usage detail"
},
"sections": {
"create": "Create",
"recognition": "Recognition",


@@ -45,7 +45,7 @@
},
"scheduling": {
"title": "Penjadwalan",
"subtitle": "Aturan penempatan model dan replika di seluruh klaster"
"subtitle": "Aturan penempatan model dan replika di seluruh kluster"
},
"p2p": {
"title": "Komputasi AI Terdistribusi",
@@ -86,4 +86,4 @@
"title": "Penjelajah",
"subtitle": "Jelajahi file dan konfigurasi"
}
}
}


@@ -72,7 +72,7 @@
"actions": {
"copy": "Salin",
"regenerate": "Hasilkan ulang",
"jumpToLatest": "Jump to latest"
"jumpToLatest": "Lompat ke terbaru"
},
"streaming": {
"transferring": "Mentransfer model...",
@@ -115,4 +115,4 @@
"clearAll": "Hapus semua",
"deleteAllTitle": "Hapus semua percakapan"
}
}
}


@@ -1,8 +1,8 @@
{
"unsaved": {
"title": "Discard unsaved changes?",
"message": "You have unsaved changes that will be lost if you leave this page.",
"leave": "Leave"
"title": "Buang perubahan yang belum disimpan?",
"message": "Anda memiliki perubahan yang belum disimpan. Perubahan tersebut akan hilang jika Anda meninggalkan halaman ini.",
"leave": "Tinggalkan Halaman"
},
"actions": {
"save": "Simpan",


@@ -7,15 +7,15 @@
"resourceGpu": "GPU",
"resourceRam": "RAM",
"greeting": {
"morning": "Good morning",
"afternoon": "Good afternoon",
"evening": "Good evening",
"night": "Working late"
"morning": "Selamat pagi",
"afternoon": "Selamat siang",
"evening": "Selamat malam",
"night": "Selamat lembur"
},
"statusLine": {
"modelsLoaded_one": "{{count}} model loaded",
"modelsLoaded_other": "{{count}} models loaded",
"noModelsLoaded": "No models loaded",
"modelsLoaded_one": "{{count}} model dimuat",
"modelsLoaded_other": "{{count}} model dimuat",
"noModelsLoaded": "Tidak ada model yang dimuat",
"nodes_one": "{{count}} node",
"nodes_other": "{{count}} nodes"
},
@@ -79,14 +79,14 @@
},
"connect": {
"title": "Satu endpoint, semua API",
"subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. Di atas itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.",
"subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. Selain itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.",
"nativeTitle": "API native",
"compatTitle": "Kompatibilitas drop-in",
"apiReference": "Referensi API lengkap",
"copy": "Salin",
"copied": "Disalin",
"browse": "Browse the API",
"hide": "Hide endpoints",
"dismiss": "Dismiss"
"browse": "Jelajahi API",
"hide": "Sembunyikan endpoint",
"dismiss": "Abaikan"
}
}

Some files were not shown because too many files have changed in this diff.