Compare commits

..

92 Commits

Author SHA1 Message Date
Ettore Di Giacinto
3bafc2e1ab feat(ui): canvas fullscreen toggle + keyboard tab navigation
The canvas header gains a fullscreen toggle (expands the panel to cover the
viewport; resize handle hidden while fullscreen). The artifact tab strip is now
a proper ARIA tablist with roving tabindex and Left/Right arrow-key navigation.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
8991b94a80 feat(ui): jump-to-latest pill when scrolled up in chat
When the user scrolls away from the bottom of a conversation, a floating
"Jump to latest" pill appears (sticky, centered above the composer); clicking
it smooth-scrolls to the newest message and re-pins auto-scroll. Resets on
chat switch. Adds the chat.actions.jumpToLatest i18n key to all locales.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
5510262aab feat(ui): image attachments show a thumbnail in the composer
Staged image attachments now preview as a 28px thumbnail (from their data URL)
instead of a bare file icon; other types keep the icon. File names truncate and
the remove button gets an aria-label.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
eb8b9b125d feat(ui): per-message code blocks get a language header + copy button
Chat code blocks now render inside a framed block with a header showing the
language and a copy button (delegated handler, copies the block and flips to a
check briefly). Decoration + highlighting run from a MutationObserver scoped to
the messages container, which fires reliably for streamed responses AND for
chats loaded/switched from storage - the prior render-keyed effect missed the
load path (code was left unhighlighted on reload). The observer disconnects
while mutating so it does not retrigger on its own edits.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
a5a4701f05 feat(ui): canvas drag-to-resize + slide-in, fix hooks order, typed download
Canvas was a fixed pane; make it a workbench:
- Drag the panel's left edge to resize (clamped 360px..75vw), persisted to
  localStorage, double-click to reset; hidden and full-width on narrow screens.
- Slide-in/fade on open via canvasSlideIn (honored by reduced-motion).
- Fix a rules-of-hooks bug: the `if (!current) return null` early return sat
  above useEffect, so the hook count changed when artifacts emptied. All hooks
  now run unconditionally before the guard.
- Downloads use the artifact language's real extension + MIME (a Python
  artifact saves as .py, not .txt) via extensionForLanguage.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
a753f82823 feat(ui): ThemeToggle i18n + animated icon, drop transition:all
The theme toggle hard-coded its English tooltip; route it through the existing
nav switchToLightMode/switchToDarkMode keys and add an aria-label. The sun/moon
icon now replays a small rotate+fade on theme change (keyed remount; honored by
the global reduced-motion block). Replace the .theme-toggle `transition: all`
with explicit properties.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
67e2f2caba test(ui): home greeting asserts sans font, not the dropped serif
The greeting render-smoke still asserted Fraunces; update it to assert the
Geist sans display font (and not Fraunces), matching the all-sans direction.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
30245d7bec refactor(ui): adopt PageHeader on ops/admin/media pages (batch B)
Replace hand-rolled .page-header title blocks with the shared editorial
PageHeader component across 14 pages (Manage, Middleware, Models,
NodeBackendLogs, Nodes, P2P, SkillEdit, Skills, Sound, Traces, TTS, Usage,
Users, VideoGen). Title/subtitle move into PageHeader; header-own action
clusters (Models stats+buttons, Skills search+buttons) move into the actions
slot. Tabs, filters, stat cards, ResourceMonitor and page body stay as
siblings. Eyebrow is left to auto-derive from the route.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:41:46 +00:00
Ettore Di Giacinto
f1132f9143 refactor(ui): adopt PageHeader on agent/media/import/backend pages (batch A)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
a24cc84f7d feat(ui): editorial PageHeader with section eyebrow + scroll-to-top on nav
PageHeader now derives its eyebrow from the route's section/console (Build /
Operate / Create) via sectionKeyForPath, so pages get a consistent, meaningful
eyebrow with no per-page wiring (override with the eyebrow prop, suppress with
eyebrow={null}). Settings adopts it as the first consumer.

Also fix a navigation scroll bug: the default layout uses the document as its
scroll container and route changes did not reset it, so navigating the console
rail from a scrolled page landed mid-view. App now scrolls to top on pathname
change.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
7cca8352cb refactor(ui): declutter Home - discoverable + dismissable API, vertical balance
Home felt overloaded and top-heavy. Three changes from review:
- The API endpoint catalog (12 endpoints) is collapsed by default behind a
  "Browse the API" disclosure; only the base URL + copy stay visible, so the
  catalog is discoverable without dominating the page.
- The whole connect card is dismissable (x): dismissing unmounts it so the
  vertical space is recovered, and the choice is remembered (localStorage).
- .home-page now fills its column and vertically centers its content when
  there is slack, so sparse states (no models / card dismissed) read as a
  balanced launcher instead of content jammed at the top. Overflow-safe:
  tall content flows from the top and scrolls.

Adds connect.browse / connect.hide / connect.dismiss i18n keys to all locales.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
8caf93fb3a test(e2e): add --host flag to ui-test-server
Allow binding the e2e/preview server to an arbitrary address (e.g.
0.0.0.0 to review the UI from another device on the LAN). Defaults to
127.0.0.1 so existing e2e behavior is unchanged.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
e8cac3506b refactor(ui): all-sans editorial headings + tint-only active nav
Per design review, pivot the heading strategy from hybrid-serif to a
refined grotesk: drop the Fraunces dependency, token, and import; page
titles, the Home greeting, and section/empty-state titles now use Geist
at semibold with the editorial fluid sizing and tight tracking. No serif
anywhere.

Active sidebar item is now a tint-only treatment (accent text + tinted
background); the left accent rail is removed and the shared base
.nav-item.active inset bar is suppressed in the sidebar (as the console
rail already does). Update the design-system e2e specs to assert the
sans display font and the tinted-background active state.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
f9a130e446 fix(ui): single focus ring (no double-ring) + neutralize stagger delay under reduced motion
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
76f3e032ae feat(ui): Home loaded-models skeleton list, button hierarchy, EmptyState wizard
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
c7d9dbda6b feat(ui): Home editorial header + status line (north-star redesign)
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
0838d7d3e6 feat(ui): StatusPill primitive + time-of-day greeting helper
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
abd8162aad feat(ui): PageHeader + SectionHeading editorial primitives
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
b73b93850c feat(ui): Skeleton shimmer primitive
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
f31f311e38 feat(ui): EmptyState primitive with serif title
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
24623d52e8 refactor(ui): route boot fallback through the LoadingSpinner primitive
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
6ef88c78f2 refactor(ui): fix dead token refs + dedupe toggle to one primitive
Migrate all .toggle-slider consumers (Users, Chat, AgentChat) to the
canonical BEM toggle primitive and delete the legacy duplicate CSS block.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
7d501f1ba1 feat(ui): orchestrated page reveal + stagger motion primitives
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
53af1807b6 feat(ui): un-overload accent — nav rail, stronger focus ring, neutral hover
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
b5d8992531 feat(ui): serif display tier + section-heading typography scale
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Ettore Di Giacinto
9110002592 feat(ui): add Fraunces variable serif + --font-serif token
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:40:37 +00:00
Tai An
c3b3336654 fix(whisperx): use whisperx.diarize.DiarizationPipeline with token kwarg (#10389)
Signed-off-by: Anai-Guo <antai12232931@outlook.com>
2026-06-18 18:50:37 +02:00
LocalAI [bot]
c4cd86bb15 chore: bump localrecall to fix PostgreSQL collection name with ':' (#10375) (#10387)
chore: bump localrecall to include PostgreSQL table-name sanitization fix

Pulls mudler/localrecall#48, which makes sanitizeTableName allowlist valid
identifier characters so collection names containing ':' (e.g. the per-user
"legacy-api-key:<agent>" namespace) no longer break PostgreSQL CREATE TABLE
with "syntax error at or near ':'".

Fixes #10375

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 17:05:52 +02:00
LocalAI [bot]
13f59f0822 docs: document the privacy-filter.cpp backend (#10386)
docs: document the privacy-filter.cpp backend in README and compatibility table

The privacy-filter.cpp backend (#10360) was registered in backend/index.yaml
and referenced from the PII feature docs, but was missing from the backend
catalog surfaces. Add it to the README "Backends built by us" table, the
compatibility table (Utilities & Other, CPU/CUDA 13/Vulkan), and the backend
type list in the backends feature doc.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 15:07:01 +02:00
Richard Palethorpe
3fa7b2955c feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)
Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see
backup/pii-ner-tier-engine-prerebase). Net change:

- privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter
  PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan).
  TokenClassify moves off the patched llama.cpp path onto this backend.
- PII filter reworked to be NER-centric (encoder/NER detection tier scanning
  whole conversations as one document), with a recreated bounded restricted-
  regex secret-matching pattern detector tier alongside it (per-model
  pii_detection.builtins / .patterns + core/services/routing/piipattern).
- Detection labelled by source (ner vs pattern); backend trace / confidence /
  debug observability; analyze/redact exposed as a synchronous API.
- Instance-wide default detector policy + per-usecase default-on; request
  filtering extended to completions, embeddings, edits & Ollama.
- React UI: NER-centric PII editor, detector-models table, pattern/builtins
  editor, middleware default-policy UI.
- Gallery: privacy-filter-multilingual token-classify model + NER install
  filter; token_classify known_usecase; batch sized to context for NER models.
  privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13
  meta + image entries with a capabilities map) matching its CI matrix jobs,
  and an /import-model auto-detect importer (PrivacyFilterImporter, narrow
  privacy-filter GGUF detection) replacing the prior pref-only registration.

Reconciled against master's independent evolution:

- Dropped master's PIIPatternOverrides feature (global-pattern runtime
  overrides + /api/pii/patterns API + runtime_settings.json persistence). The
  per-model NER + pattern-detector design supersedes it; it was built on the
  global redactor pattern set this branch replaced.
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score):
  removed the patch and restored master's grpc-server.cpp Score RPC (direct
  llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's
  model_config validation forbidding score + chat/completion/embeddings on
  llama-cpp. token_classify is unaffected (it runs on the privacy-filter
  backend, not llama-cpp).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-18 11:45:22 +01:00
LocalAI [bot]
c133ca39dc chore: ⬆️ Update ggml-org/llama.cpp to f3e182816421c648188b5eab269853bf1531d950 (#10379)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 11:43:23 +02:00
LocalAI [bot]
757822cd74 chore(model gallery): 🤖 add 1 new models via gallery agent (#10384)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 08:51:30 +02:00
LocalAI [bot]
91f97f2a54 chore: ⬆️ Update ggml-org/whisper.cpp to 86c40c3bd6fc86f1187fb751d111b49e0fc18e84 (#10382)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 08:34:43 +02:00
LocalAI [bot]
55f9ff6805 chore: ⬆️ Update mudler/parakeet.cpp to 92a5f0306be354c109150fe58ae4cc4f8a21ca45 (#10380)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 08:32:13 +02:00
LocalAI [bot]
88726f2da4 fix(react-ui): restore sidebar collapse in dev + stop Talk page auto-scroll (#10383)
The sidebar collapse toggle silently no-op'd in dev builds. toggleCollapse
ran its side effects (localStorage write + sidebar-collapse dispatch) inside
the setCollapsed updater. StrictMode double-invokes updaters in dev to surface
impurity, and the synchronous dispatch re-entered setState from the
App/Sidebar listeners mid-update, so the toggle never committed. Production
builds don't double-invoke, which is why only the dev server was affected.
Compute next from current state and move the persist + broadcast into the
handler body so the updater is pure.

Also fix the Talk page anchoring to the transcript box on load. The transcript
is its own overflow container, but scrollIntoView bubbles to every scrollable
ancestor including the window, yanking the whole page down on mount. Scroll
the transcript container directly instead.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 00:48:56 +02:00
LocalAI [bot]
5c2ae7857a chore: ⬆️ Update antirez/ds4 to 80ebbc396aee40eedc1d829222f3362d10fa4c6c (#10378)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 00:32:13 +02:00
LocalAI [bot]
4af360300f chore: ⬆️ Update ikawrakow/ik_llama.cpp to 71af16a6b7f6fb7315b346b4a51aad530599c3f5 (#10381)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 00:12:25 +02:00
LocalAI [bot]
5ac864dbed feat(ui): console-based navigation + drop-in API endpoint section (#10377)
* feat(ui): restructure sidebar into Create/Recognition/Build tiers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): preserve exact sidebar gating for agent items and fine-tune/quantize

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* i18n(ui): add nav tier + console keys to all locales

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add grouped admin console via pathless layout route

Wrap the existing admin pages in a pathless AdminConsoleLayout route so
they keep their exact flat URLs while gaining a grouped left rail
(Inference / Cluster / Observability / Access / System). Rail item gating
mirrors the sidebar (adminOnly / authOnly / feature + /api/features). The
layout forwards the App-level outlet context (addToast) to the wrapped
pages, which read it via useOutletContext().

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): fold Audio Transform into Studio as a tab

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(ui): update e2e specs for tiered nav + admin console

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): gate embedded Studio transform view on audio_transform feature

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): visual polish + console-ize Build/Recognition tiers

Generalize the one-off admin console into a reusable ConsoleLayout driven by
a shared consoleConfig (single source of truth for the rail, its gating, and
the sidebar entry that opens it — removes the prior rail/sidebar drift).

- Promote Install Models to the top menu next to Home.
- Build and Operate are now console tiers (secondary rail); Create stays inline.
- Fold Recognition (Faces/Voices) into the Build console as a group alongside
  Automation and Training so it no longer feels split off.
- Style the console rail as a panel (header, grouped dividers, rounded active
  pills) with a hover nudge; sidebar items become inset rounded pills. The rail
  slide-in plays only when entering a console, not on item-to-item sub-nav
  (which remounts the layout), so switching no longer flashes the menu. All
  token-based (light + dark), respects reduced-motion.
- Add a delayed RouteFallback loader so lazy routes no longer flash blank;
  scoped inside ConsoleLayout so the rail stays put while the body loads.
- Update e2e specs for the new structure (.console-* classes, console entries).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): persist console layout across sub-nav + add drop-in endpoint section

- Keep the page-transition key stable within a console (derived from the
  shared console config) so the ConsoleLayout and its rail persist across
  item-to-item navigation instead of remounting — fixes the submenu flash.
  Cache /api/features across mounts and play the rail entrance animation only
  when actually entering a console.
- Add a "One endpoint, every API" section to Home: leads with LocalAI's own
  native API (images, video, realtime voice over WebRTC/WS, depth, object
  detection, rerank, audio/TTS, face & voice recognition) plus a Full API
  reference link, then the drop-in compatibility layer (OpenAI, Anthropic,
  Ollama, OpenAI Responses) with the live copyable base URL. All 7 locales.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): revert Middleware nav label rename (keep Middleware in all locales)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-18 00:09:17 +02:00
LocalAI [bot]
9b57dcb721 docs: document all available backends and add "built by us" list (#10376)
Bring the Backend & Model Compatibility Table up to the full set of
backends published in backend/index.yaml (60+), organized by modality
with per-backend acceleration targets. Add an "Available Backends"
pointer and expand the backend-type list in the backends feature doc.

Update the README backend count to 60+ and add a "Backends built by us"
section listing the native C/C++/GGML engines maintained by the LocalAI
project (parakeet.cpp, voxtral.c, vibevoice.cpp, rf-detr.cpp,
locate-anything.cpp, depth-anything.cpp, LocalVQE, local-store).


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-17 20:39:09 +02:00
LocalAI [bot]
95e7149c87 chore: ⬆️ Update ggml-org/llama.cpp to 74ade52741203e5c8f81eaf06a96cb1cfe15f2a3 (#10368)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 13:25:29 +02:00
LocalAI [bot]
fd26c8c753 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 064d23a6f816d50491d8c9b35a0cafe546eaf4b5 (#10367)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 13:25:14 +02:00
LocalAI [bot]
e60c094a7d feat(ds4): SSD streaming + quality engine options, 128GB DeepSeek gallery models (#10374)
feat(ds4): wire SSD streaming + quality engine options, add 128GB DeepSeek gallery models

The ds4 backend zero-initialized ds4_engine_options and exposed none of the
engine's tunable knobs, so SSD streaming (run a model larger than RAM by
streaming routed MoE experts from the GGUF on SSD) and the quality/perf knobs
were unreachable from LocalAI model YAMLs.

Map ModelOptions.Options onto ds4_engine_options through a declarative table
(kEngineOptSpecs + apply_engine_option) instead of per-field branches: the
struct is fixed C with no reflection, so the field set is enumerated once and a
future knob is a one-line table row. Two fields use ds4's own typed parsers
(GiB budgets, cache-experts count-or-NGB). Bare flags (e.g. "ssd_streaming")
mean true; path-type options (mtp_path, expert_profile_path,
directional_steering_file) resolve relative to the model directory so a gallery
entry can reference a companion file by bare filename. mtp_draft/mtp_margin are
now validated rather than parsed with throwing std::stoi/std::stof.

Add gallery entries for the 128 GB class:
- deepseek-v4-flash-q2-q4 (~91 GB, mixed q2/q4, fits RAM, higher quality)
- deepseek-v4-flash-q4-ssd (~153 GB full 4-bit, runs on 128 GB via SSD streaming)
- deepseek-v4-flash-q2-mtp (~81 GB + MTP speculative draft weights)
- deepseek-v4-pro-q2-ssd (~433 GB Pro, experimental SSD streaming)

SSD streaming is Metal (Darwin) only; the options are inert on CUDA/CPU.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-17 10:30:06 +02:00
LocalAI [bot]
159df8e2ef feat(swagger): update swagger (#10365)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 09:32:17 +02:00
LocalAI [bot]
de299ca101 chore(model-gallery): ⬆️ update checksum (#10371)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 09:28:47 +02:00
LocalAI [bot]
980ec4a311 chore: ⬆️ Update antirez/ds4 to cafc134f78a5a1890d98808d3102f4313573a1bc (#10369)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 09:28:19 +02:00
LocalAI [bot]
dfd5a00e6f chore: ⬆️ Update ggml-org/whisper.cpp to 9efddafb9153e1fb22bdc3dd3057072c99165ed2 (#10366)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 09:27:52 +02:00
LocalAI [bot]
63be479066 chore: ⬆️ Update leejet/stable-diffusion.cpp to 7f0e728b7d42f2490dfa5dd9539082d904f2f6b2 (#10370)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 09:08:34 +02:00
LocalAI [bot]
4c6750fe6b feat(depth): metric-large + nested metric model gallery entries (#10363)
* feat(depth): add depth-anything-3-metric-large gallery entry

DA3METRIC-LARGE (ViT-L) single-file metric-scale depth + sky, served by the
existing depth-anything backend (same single-GGUF path as mono-large). GGUF
published at mudler/depth-anything.cpp-gguf.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(depth): serve nested metric model (two-file load)

The DA3 nested model needs both branches (anyview GIANT + metric ViT-L) loaded
together. Wire it through the backend:
- Load reads a 'metric_model:<file>' entry from ModelOptions.Options and, when
  present, calls da_capi_load_nested(anyview, metric) instead of da_capi_load
  (registers the new abi-4 symbol; helper optionValue + unit test).
- gallery: depth-anything-3-nested (model=anyview, options=metric branch, both
  GGUFs fetched) for metric-scale depth + pose.
- bump depth-anything.cpp pin to cce5edc (abi 4 / da_capi_load_nested).

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 22:03:58 +02:00
LocalAI [bot]
a6e1c6d0b3 fix(docs): use relearn notice shortcode instead of unsupported alert (#10364)
The Hugo relearn theme does not provide an "alert" shortcode, so the
docs deploy failed at the Build site step:

  failed to extract shortcode: template for shortcode "alert" not found
  docs/content/features/image-generation.md:106

Convert the vae_decode_only note to the theme-supported notice shortcode
used everywhere else in the docs.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 21:06:20 +02:00
LocalAI [bot]
294170d3ed feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery (#10352)
* feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery

Mirrors the locate-anything-cpp backend to register a new depth-anything
backend that wraps the Depth Anything 3 ggml port (depth-anything.cpp) via
purego (cgo-less, no Python at inference).

- backend/go/depth-anything-cpp/: gRPC backend (Load + Predict + GenerateImage),
  purego binding to the da_capi_* C ABI, CMake/Makefile/run/package/test scripts
  building depth-anything.cpp's DA_SHARED static .so per CPU variant.
- backend/index.yaml: depth-anything backend meta + all hardware-variant
  capability entries (cpu/cuda12/cuda13/intel-sycl-f32+f16/vulkan/nvidia-l4t).
- gallery/index.yaml: 8 Depth Anything 3 GGUF models (base q4_k/q8_0/f16/f32,
  small, large, giant, mono-large).
- .github/backend-matrix.yml: one build entry per hardware variant.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(depth): typed Depth RPC + REST endpoint exposing full DA3 data

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(depth): pin depth-anything.cpp to e0b6814 (ABI 3 dense C-API)

The Depth RPC handler calls da_capi_depth_dense / da_capi_points (C-API ABI 3);
pin the native build to the commit that exports them.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(depth): pin depth-anything.cpp to v0.1.0 release (b515c31)

Repoint the native version from the now-orphaned e0b6814 to the
b515c31 release commit, kept alive by the upstream v0.1.0 tag.
C-API is unchanged (da_capi_abi_version == 3).

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(depth): wire depth-anything-cpp into build, CI bump, and importer

The backend dir, gallery index, and CI build-matrix were present but the
backend was never wired into the integration points that adding-backends.md
requires:

- root Makefile: add to .NOTPARALLEL, the test-extra chain, a BACKEND_*
  definition, the docker-build target eval, and docker-build-backends
  (mirrors parakeet-cpp; the backend's own Makefile already documented that
  its `test` target is driven by test-extra).
- bump_deps.yaml: register the DEPTHANYTHING_VERSION pin so the daily
  auto-bump bot tracks mudler/depth-anything.cpp master (it cannot see an
  unregistered Makefile pin).
- import form: add a preference-only KnownBackend entry so depth-anything is
  selectable at /import-model (mirrors sam3-cpp; no reliable GGUF auto-detect
  signal, so pref-only per the doc's default).

changed-backends.js needs no entry: the generic golang suffix branch already
resolves backend/go/depth-anything-cpp/.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(depth): auto-detect importer for depth-anything GGUFs

Replace the preference-only entry with a real auto-detect importer
(mirrors parakeet-cpp / locate-anything):

- DepthAnythingImporter matches a .gguf whose name carries a
  depth-anything token (depth-anything-<size>-<quant>.gguf), so
  /import-model recognises mudler/depth-anything.cpp-gguf repos and direct
  GGUF URLs without an explicit backend preference. preferences.backend=
  "depth-anything" still forces it.
- Registered before LlamaCPPImporter so its GGUF bundles aren't claimed by
  the generic .gguf importer; the narrow name match means it cannot claim
  arbitrary llama GGUFs or the upstream safetensors PyTorch repos.
- Multi-quant repos pick the smallest quant by default (q4_k -> ... -> f32,
  depth stays >0.998 corr even at q4_k); quantizations preference overrides.
- Drops the now-redundant knownPrefOnlyBackends entry (importer-backed
  backends are not listed there, matching parakeet-cpp).
- Table-driven Ginkgo test covers detection, negative cases (llama GGUF,
  upstream safetensors), default/override/fallback quant pick, and direct
  URL import. 10/10 specs pass.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(depth): check conn.Close error in grpc Depth client (errcheck)

The new Depth() client method used a bare `defer conn.Close()`. golangci-lint
runs with new-from-merge-base, so although the 39 sibling methods use the same
bare form (grandfathered), the newly added line trips errcheck. Drop the result
explicitly to satisfy the linter.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8

* fix(depth): bump depth-anything.cpp to v0.1.1 (embeddable CMake)

v0.1.0 (b515c31) used ${CMAKE_SOURCE_DIR} for its include dirs, which
points at the parent project when built via add_subdirectory() as this
backend does, so the container build failed with missing stb_image.h /
da_gguf_keys.h. v0.1.1 (2d42897) switches to project-relative paths.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8

* fix(depth): resolve gosec findings in the backend wrapper

The code-scanning gate flagged three new failure-level alerts in
godepthanythingcpp.go (gosec runs with -no-fail; GitHub gates on new alerts):

- G301: export dirs were created with 0o755. Tighten to 0o750 (no world
  access needed for backend-written export output).
- G304: writeDepthPNG creates req.GetDst(). That path is chosen by the
  LocalAI core as the intended output destination (same pattern every
  image backend uses), not attacker input, so annotate with #nosec G304
  and document why.

The remaining G103 "audit unsafe" notes on the unsafe.Slice C-buffer copies
are warning-level (the same purego interop whisper/parakeet use) and do not
gate the check, per the supertonic exclusion precedent in secscan.yaml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8

* fix(depth): bump depth-anything.cpp to v0.1.2 (CUDA cross-build arch)

v0.1.1 forced CMAKE_CUDA_ARCHITECTURES=native, which breaks the GPU-less
l4t/cublas CI builds (nvcc "Unsupported gpu architecture 'compute_'" on
CMake 3.22). v0.1.2 (442eea4) drops the override and lets ggml pick its
default cross-build arch list.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 16:28:28 +02:00
LocalAI [bot]
1ab61a0875 feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)
* feat(config): add chat_template_kwargs model field + resolver

Adds the ChatTemplateKwargs model-config map and RequestMetadata carrier,
plus ResolveChatTemplateKwargs which layers the config map under coerced
request metadata. Foundation for generic jinja chat-template kwargs (issue #10329).

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(backend): forward resolved chat_template_kwargs blob to backends

gRPCPredictOpts now merges per-request client metadata over the server-derived
enable_thinking/reasoning_effort (reaching all backends via the standalone keys)
and serialises the resolved chat_template_kwargs map into a JSON blob for
llama.cpp, written last so a client cannot clobber it. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(http): wire request metadata to config.RequestMetadata

The OpenAI request metadata field was parsed but unused; stamp it onto the
per-request ModelConfig so gRPCPredictOpts forwards it as chat_template_kwargs
overrides. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): generic chat_template_kwargs merge (drop per-key blocks)

Replace the per-key enable_thinking/reasoning_effort handling in both the
streaming and non-streaming chat paths with a single block that parses the
chat_template_kwargs JSON blob resolved by the Go layer and merges every key
into body_json. New jinja template levers (e.g. preserve_thinking) now need
no C++ change. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: document custom chat_template_kwargs (model + per-request)

Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(backend): pin reasoning_effort as a string in the chat_template_kwargs blob

Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(http): e2e guard pinning chat_template_kwargs forwarded to gRPC

Adds an ECHO_PREDICT_METADATA marker to the mock-backend that echoes the
received PredictOptions.Metadata, and an app_test.go spec that drives a real
/v1/chat/completions request (model chat_template_kwargs + per-request metadata
override) and asserts the exact metadata + chat_template_kwargs blob the REST
layer forwards to gRPC. Locks the REST->gRPC contract against regressions. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(config): grandfather chat_template_kwargs in registry coverage

chat_template_kwargs is a free-form map[string]any (like engine_args, already
on the list), not a scalar the config UI registry can surface, so it is exempt
from the registry-entry requirement. Fixes the TestAllFieldsHaveRegistryEntries
failure introduced by the new field. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 12:16:34 +02:00
LocalAI [bot]
f44034021e chore: ⬆️ Update leejet/stable-diffusion.cpp to 5a34bc7f6e0621dd2f899daa64476eac667d7ed3 (#10335)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(stablediffusion-ggml): adapt gosd.cpp to upstream sd_ctx_params_t API

The bump to 5a34bc7 restructured sd_ctx_params_t: the boolean CPU-offload
knobs (offload_params_to_cpu, keep_clip_on_cpu, keep_vae_on_cpu,
keep_control_net_on_cpu) were replaced by backend assignment specs
(backend/params_backend), and vae_decode_only / free_params_immediately
were dropped entirely. The build broke with "no member named ..." on
every arch.

Translate the legacy options we still accept from gallery configs into
the new backend assignment specs, mirroring prepare_backend_assignments()
in the upstream CLI, so offload_params_to_cpu / keep_*_on_cpu keep
working. vae_decode_only is parsed and ignored for config compatibility.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(stablediffusion-ggml): expose backend/params placement options

The upstream bump introduced new sd_ctx_params_t fields for device and
memory placement (backend, params_backend, rpc_servers, max_vram,
stream_layers) plus PuLID-Flux weights (pulid_weights_path). Wire them up
as backend options so models can be split across CPU/GPU/disk/RPC:

- backend: per-component compute placement (e.g. clip=cpu,vae=cuda0)
- params_backend: per-component weight storage incl. disk mmap
- max_vram / stream_layers: graph-cut segmented parameter offload budget
- rpc_servers: offload compute to remote RPC servers
- pulid_weights_path: PuLID-Flux identity injection

The legacy keep_*_on_cpu / offload_params_to_cpu booleans now seed and
compose with the explicit backend/params_backend specs, matching upstream
prepare_backend_assignments(). Option values are taken as everything after
the first ':' so colon-bearing values (rpc_servers host:port) survive
parsing. Documented the new options in the image-generation guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(stablediffusion-ggml): distributed RPC across ggml workers

Enable the ggml RPC backend (-DSD_RPC=ON) so image generation can be
sharded across remote rpc-server workers. The ggml rpc-server is
backend-agnostic, so this reuses the exact same worker pool as the
llama.cpp backend - one set of `local-ai worker llama-cpp-rpc` /
`p2p-llama-cpp-rpc` workers accelerates both text and image generation.

RPC servers are selected by precedence:
- the explicit `rpc_servers` option, else
- the LLAMACPP_GRPC_SERVERS env var, which LocalAI's p2p worker mode
  populates automatically with discovered workers (the backend inherits
  it from the parent process env), so distributed image generation needs
  no per-model configuration.

Documented manual and p2p setup in the image-generation guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 12:15:45 +02:00
LocalAI [bot]
6b9f1bd4b3 chore: ⬆️ Update antirez/ds4 to e34a8086693ba7ca5cfabd2b9028ee52f0bfac2e (#10350)
* ⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(ds4): add Homebrew include/lib prefix for Darwin grpc-proto build

The darwin/metal ds4 backend job runs for the first time on this bump
(it was skipped on prior ds4 PRs) and fails compiling backend.pb.cc with
'google/protobuf/runtime_version.h' file not found.

hw_grpc_proto links neither protobuf::libprotobuf nor gRPC::grpc++, so
the generated proto sources rely on default system include paths. That
works on Linux (/usr/include) but not on macOS, where Homebrew installs
under /opt/homebrew. Add the Homebrew prefix to include/link dirs on
Darwin, mirroring the llama-cpp backend that already builds on Darwin CI.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): install nlohmann-json on Darwin CI for ds4 backend

After the protobuf include-path fix the ds4 darwin build advances to
compiling dsml_renderer.cpp, which includes <nlohmann/json.hpp> and
#errors when absent. On Linux the header comes from apt nlohmann-json3-dev
in the build image; the macOS runner had no equivalent. Add the
header-only nlohmann-json formula to the shared Darwin backend brew
install/link list and Homebrew cache, alongside the existing deps.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): build proper OCI image tar for Darwin backend

The darwin packaging referenced scripts/build/oci-pack.sh, which was
never added to the tree, so it fell back to a plain 'tar' that omits
manifest.json. 'local-ai backends install' then rejects the tarball
with 'file manifest.json not found in tar'.

Use './local-ai util create-oci-image' (already built by the 'build'
prerequisite of the backends/ds4-darwin target), mirroring
llama-cpp-darwin.sh, to emit a real OCI image the installer accepts.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 09:59:50 +02:00
github-actions[bot]
416f871bea chore: bump inference defaults from unsloth (#10358)
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-16 09:59:36 +02:00
LocalAI [bot]
8bd2df8f68 fix(launcher): truncate download status labels to stop progress dialog blowout (#10357)
fix(launcher): truncate download status labels to stop dialog blowout

The download progress windows place a ProgressBar and a status Label in the
same VBox. On failure the status label is set to "Download failed: <error>",
and the error commonly contains a long, unbreakable URL/path. A Fyne label
with default settings reports its MinSize as the full single-line text width,
so a long message stretches the window — and the progress bar sharing the
VBox — arbitrarily wide (fixes #10355).

Set Truncation = fyne.TextTruncateEllipsis on the four affected status labels
(the main-window status label plus the status label in each of the three
showDownloadProgress implementations). Truncation collapses the label's
MinSize to roughly one character plus the ellipsis regardless of content, so
the window keeps its intended size. TextWrapWord is not enough because it
cannot break a spaceless URL. The full error text remains visible via the
dialog.ShowError call already present in each path.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 09:42:07 +02:00
neo
6799d802d3 docs: add translated README links (#10353) 2026-06-16 09:06:49 +02:00
LocalAI [bot]
40cc549882 fix(ci): track ServeurpersoCom/qwentts.cpp for QWEN3TTS_CPP_VERSION bumps (#10356)
The qwen3-tts backend migrated from predict-woo/qwen3-tts.cpp to
ServeurpersoCom/qwentts.cpp (the Makefile QWEN3TTS_REPO already points
there), but the bump_deps matrix still tracked the old repo. That made
the nightly bumper open PRs (e.g. #10334) against the wrong upstream.
Point the matrix entry at the new repo and its master branch.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 09:04:52 +02:00
LocalAI [bot]
3d295adfa8 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 2f524850a1f67716bc0ba80ffa30ce39c5b8bd5f (#10336)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-06-16 09:04:35 +02:00
LocalAI [bot]
4fa2064875 chore: ⬆️ Update ggml-org/llama.cpp to 7dad2f1a17d65b5e2034c277125bc9f97573a779 (#10337)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-16 08:22:26 +02:00
LocalAI [bot]
cb74399b3a chore: ⬆️ Update ggml-org/whisper.cpp to 0ec0845110dc934911dc48e8c5beb5ad3189b3f3 (#10349)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-16 08:22:10 +02:00
dependabot[bot]
2388686369 chore(deps): bump grpcio from 1.81.0 to 1.81.1 in /backend/python/vllm (#10347)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.81.0 to 1.81.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.81.0...v1.81.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.81.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-15 22:57:38 +02:00
LocalAI [bot]
edc61053aa fix(gallery): hide broken Gemma 4 QAT MTP entries (#10348)
The Gemma 4 QAT MTP assistant-head gallery entries currently fail to load in the stock llama.cpp backend with unknown architecture errors. Hide them until the assistant GGUFs are verified against the supported backend path.

Assisted-by: Codex:GPT-5 [gh] [git]

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-15 22:57:19 +02:00
Dedy F. Setyawan
9ba8521e7e feat(react-ui): localize models and fix 'Import' typo (#10341)
* feat(react-ui): localize SearchableSelect component

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>

* feat(react-ui): localize ModelSelector component

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>

* fix(react-ui): dynamically localize back navigation caption to match page title

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>

* feat(react-ui): localize back navigation state on Models page

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>

* feat(react-ui): localize ModelEditor page

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>

* fix(react-ui): fix Indonesian typo 'Import' to 'Impor' in importModel locale

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>

---------

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-06-15 18:26:27 +02:00
LocalAI [bot]
51c23197ed docs: ⬆️ update docs version mudler/LocalAI (#10333)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-15 16:55:34 +02:00
LocalAI [bot]
2df2876db2 feat(supertonic): add Supertonic ONNX TTS backend (CPU) (#10342)
* feat(supertonic): vendor upstream Go TTS pipeline (helper.go)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(supertonic): add gRPC backend (Load/TTS/TTSStream, CPU)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(supertonic): satisfy unused linter (use onnxProvider; exclude vendored helper.go)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(supertonic): unit tests for resolvers + gated end-to-end synthesis

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* style(supertonic): gofmt backend.go comment block

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(supertonic): add Makefile, run.sh, package.sh (CPU build)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build(supertonic): wire backend into root Makefile

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(supertonic): check ort.DestroyEnvironment return (errcheck)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(supertonic): resolve voice_styles as sibling of onnx dir; guard trim; test voice

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(supertonic): add CPU build matrix + gallery index entries

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(supertonic): expose as pref-only importable backend

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(supertonic): add Supertonic/supertonic-3 TTS model to the gallery

16 files (4 onnx + tts.json + unicode_indexer.json + 10 voice styles)
from HF Supertone/supertonic-3, served via the supertonic backend.
Defaults to voice F1; onnx/ + sibling voice_styles/ layout matches the
backend's resolveVoicesDir.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(meta): register pipeline.max_history_items config field

Pre-existing on master: the field was added without a registry entry,
failing TestAllFieldsHaveRegistryEntries (core/config/meta). Add the
entry so it renders properly in the model-config UI.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(secscan): exclude vendored supertonic backend from gosec

helper.go is vendored from supertone-inc/supertonic; its G304/G404/G104
findings are inherent to upstream and the math/rand use is correct for
flow-matching noise (crypto/rand would be wrong).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-15 16:54:11 +02:00
LocalAI [bot]
f648f07b13 chore: ⬆️ Update ggml-org/llama.cpp to 4988f6e866057afd130c1515ecef0c9bab9a15f8 (#10280)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-14 21:53:25 +02:00
LocalAI [bot]
1dedb5277c feat(gallery): add all Italian + all UK English sherpa-onnx Piper voices (#10332)
Expands sherpa-onnx Piper TTS coverage in the model gallery. Previously only
5 single-speaker Piper voices shipped (it_IT-paola, en_US-amy, es_ES-davefx,
fr_FR-siwis, de_DE-thorsten). This adds 19 entries:

Italian (it_IT): dii-high, miro-high, riccardo-x_low.
UK English (en_GB): alan (low+medium), alba-medium, aru-medium, cori
(high+medium), dii-high, jenny_dioco-medium, miro-high,
northern_english_male-medium, semaine-medium, southern_english_female
(low+medium), southern_english_male-medium, vctk-medium, sweetbbak-amy.

Each entry mirrors the existing Piper block (sherpa-onnx-tts.yaml base config).
sha256, ONNX path, sample rate and speaker count were read from the actual
release tarballs; licenses and source URLs were taken from each archive's
MODEL_CARD/README rather than assumed:

- dii/miro voices are OpenVoiceOS models under CC BY-NC-SA 4.0 (non-commercial),
  labelled as such in both the license field and description.
- cori is LibriVox public-domain (cc0-1.0); OpenSLR-83 voices are CC BY-SA 4.0;
  alba/vctk are CC BY 4.0.
- vctk (109), aru (12) and semaine (4) are multi-speaker; tagged accordingly
  with a note to select the speaker via the numeric voice id.

The legacy underscore-named southern_english_female_medium duplicate is
intentionally skipped. No backend change is needed: sherpa-onnx auto-detects
single-speaker VITS vs Kokoro, and each tarball ships its own espeak-ng-data.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-14 18:33:44 +02:00
LocalAI [bot]
7d2a762b53 feat(realtime): configurable pipeline.max_history_items (#10331)
Composed realtime pipelines (VAD+STT+LLM+TTS) defaulted to unlimited history,
so a long-running session grew every turn and fed the whole conversation to the
LLM until its context window filled. Add an optional pipeline.max_history_items
to cap the trailing items per turn; explicit value (including 0=unlimited) wins
over the per-model-type default. Self-contained any-to-any models keep their
6-item default.

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 18:13:09 +02:00
LocalAI [bot]
61cde6fd77 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 5f917a64b391b7d31839845153a473a65f630458 (#10240)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-14 16:46:49 +02:00
LocalAI [bot]
ca1668dd85 feat(swagger): update swagger (#10278)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-14 16:46:34 +02:00
LocalAI [bot]
fdc352a618 fix(settings): start watchdog on cold-enable from the React UI (#9125) (#10287)
fix(watchdog): start the live watchdog on a cold enable from Settings (#9125)

The React Settings "Enable Watchdog" master toggle only ever writes the
idle/busy flags; watchdog_enabled is vestigial in that UI. The live
start/stop decision in UpdateSettingsEndpoint keyed off the raw, stale
watchdog_enabled request field, so a cold enable (idle/busy=true,
watchdog_enabled=false) called StopWatchdog() and the watchdog stayed
stopped until the next restart - at which point startup re-derived it
from the idle flag. Net: enabling the watchdog appeared to do nothing.

Derive the run-state from idle||busy as the single source of truth,
mirroring the startup invariant:

- ApplyRuntimeSettings now sets WatchDog = idle||busy whenever either
  field is present (so a full disable also brings it down), while an API
  client posting only watchdog_enabled keeps its explicit value.
- Add ApplicationConfig.WatchdogShouldRun() mirroring startWatchdog's
  gating (idle/busy, LRU eviction, memory reclaimer); the /api/settings
  handler uses it to decide start vs stop.
- Belt-and-suspenders: the Settings.jsx master toggle also writes
  watchdog_enabled = idle||busy.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-14 16:46:14 +02:00
LocalAI [bot]
692970e507 chore: ⬆️ Update leejet/stable-diffusion.cpp to 276025e054555166ec419413c6748ca79986ee93 (#10313)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-14 16:44:05 +02:00
LocalAI [bot]
e046a7749f chore(model gallery): 🤖 add 1 new models via gallery agent (#10328)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-14 16:43:32 +02:00
LocalAI [bot]
e5c95e0449 fix(distributed): stage backend companion assets to remote nodes (#10330)
A model whose ModelFile is a single file (e.g. sherpa-onnx VITS/piper: the
.onnx) failed to load on remote worker nodes because the sibling assets the
backend resolves from the model dir — tokens.txt, lexicon.txt, the
espeak-ng-data / dict directories, Kokoro's voices.bin — were never staged.
Only the declared ModelFile was shipped, so the worker hit "failed to create
sherpa-onnx TTS engine" and TTS produced no audio.

Lean on the existing option-path staging instead of hardcoding filenames:

- stageGenericOptions now also resolves an option value relative to the model's
  own directory (not just the frontend models dir), so a shared config can
  declare companions with bare names regardless of whether Model includes a
  subdirectory; and it expands directory-valued options (e.g. espeak-ng-data)
  file-by-file rather than handing a directory fd to the stager.
- gallery/sherpa-onnx-tts.yaml declares the companion assets as option paths
  (tokens, lexicon, espeak-ng-data, voices.bin, dict, per-lang lexicons). The
  backend ignores these keys and keeps resolving siblings from the model dir;
  they exist only so distributed staging ships them. Absent files are skipped.

Adds router_optionstage_test.go covering file + directory companion staging via
the model-dir fallback.

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 16:42:59 +02:00
LocalAI [bot]
4d3d54d61b test(e2e): live-server voice-recognition gate test (#10324)
Add mock-backend VoiceEmbed/VoiceVerify (deterministic DC-offset speaker
discrimination) and a verify-mode gated realtime pipeline, then drive the
real HTTP/WS stack: an authorized speaker reaches response.done while an
unauthorized one is dropped before the LLM with a speaker_not_authorized
event.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 23:54:27 +02:00
LocalAI [bot]
36e3419203 chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.23.0 (#10314)
⬆️ Update vllm-project/vllm cu130 wheel

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-13 23:39:10 +02:00
LocalAI [bot]
4ec6e3221e feat(realtime): gate realtime pipeline voice models behind voice recognition (#10319)
* feat(realtime): add pipeline voice_recognition gate config schema

Add the PipelineVoiceRecognition config block that gates a realtime
pipeline behind speaker verification (identify against the voice
registry, or verify against reference audios), with Normalize defaults
and Validate enum/shape checks. Register the new fields in the config
meta registry so the UI renders them with proper labels/components
(required by the registry-coverage gate).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(realtime): range-check voice gate threshold and floor UI min

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* feat(realtime): add cosineDistance helper for voice gate

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* feat(realtime): add voiceGate identify-mode authorization

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* test(realtime): cover voice gate fail-closed error paths

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* feat(realtime): add voiceGate verify-mode authorization

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* feat(realtime): add voiceGate decide policy helper

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* feat(realtime): add newVoiceGate constructor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* feat(realtime): gate pipeline responses behind voice recognition

Run speaker verification concurrently with transcription and join on a
hard barrier before generateResponse, so unauthorized utterances never
reach the LLM, tools, or TTS. Supports identify (registry) and verify
(reference) modes with multiple authorized speakers, per-utterance or
first-utterance checking, and drop-with-event or silent-drop on reject.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(realtime): harden voice gate goroutine lifecycle

Only launch the verification goroutine on the transcription path and
drain it before the temp WAV is removed on the transcription-error
return, so an in-flight backend read never races the deferred cleanup.
Drop the write-only voiceMatched field; log the matched speaker instead.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* docs(realtime): document the voice_recognition pipeline gate

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(realtime): fail closed on an incomplete voice_recognition block

A present voice_recognition block with no model previously disabled the
gate silently, authorizing every speaker. Treat block presence as the
intent signal and reject an empty model in Validate, so the session is
refused instead of running unprotected.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* test(realtime): integration-test the voice gate through commitUtterance

Drive the real commitUtterance path (gate goroutine, hard join before the
LLM, reject event, when:first session trust) with the existing
transport/model doubles: authorized speakers reach a full response,
unauthorized ones are dropped before the LLM with a speaker_not_authorized
event, backend errors fail closed, drop_silent stays quiet, and when:first
trusts the session after one match.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 23:38:08 +02:00
LocalAI [bot]
4bb592cf91 feat(qwen3-tts-cpp): migrate to ServeurpersoCom/qwentts.cpp (streaming, speakers, voice design) (#10316)
* feat(qwen3-tts-cpp): repoint upstream to ServeurpersoCom/qwentts.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(qwen3-tts-cpp): flatten qt_* ABI into qt3_* purego shim

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(qwen3-tts-cpp): build shim against upstream qwen-core static lib

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(qwen3-tts-cpp): add option/language/voice/sampling parsing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(qwen3-tts-cpp): add 24kHz WAV encode/decode/stream-header helpers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(qwen3-tts-cpp): purego backend with streaming, speakers, voice design

Map TTSRequest onto qwentts.cpp: instructions->instruct, voice->named
speaker or clone-reference path, params map->ref_text + sampling. Add
TTSStream over the qt chunk callback.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(qwen3-tts-cpp): unit specs + build-gated TTS/TTSStream e2e

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(qwen3-tts-cpp): close defensive PCM-free gap on zero-sample result

Register CppPCMFree before the n<=0 guard so a non-null buffer with zero
samples cannot leak (the C contract returns NULL on failure, so this is
defensive). Raised in code review.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(qwen3-tts-cpp): advertise TTSStream capability

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(qwen3-tts-cpp): update backend index metadata for qwentts.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(gallery): qwentts.cpp models - base/customvoice/voicedesign, Q8_0 & Q4_K_M

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(qwen3-tts-cpp): release note for qwentts.cpp migration

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(qwen3-tts-cpp): cover audio_path voice-cloning fallback

Add resolveRequest unit specs (config audio_path used as the clone
reference when Voice is empty; per-request audio Voice overrides it; a
named-speaker Voice does not trigger cloning) plus a real-inference e2e
that clones from audio_path (confirmed ref_spk_emb=yes in the pipeline).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(qwen3-tts-cpp): drop the release-note doc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 23:09:59 +02:00
Ettore Di Giacinto
3e838c0cff docs: add realtime voice demo example and refresh README news
Add the localai-org/localai-realtime-demo Go client to the README
Examples list and to the realtime docs (integrations + realtime feature
page). Refresh the Latest News section with June 2026 highlights pulled
from history since v4.3.0: realtime pipeline streaming, the parakeet.cpp
and CrispASR speech work, new backends (locate-anything.cpp, Ideogram4,
llama.cpp video input), and distributed-mode hardening.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-13 20:10:22 +00:00
moduvoice
36b4a81d1e feat(i18n): add Korean (ko) translation (#10312)
Add a full Korean locale (core/http/react-ui/public/locales/ko/, 13 namespaces,
840 keys, full parity with en/) and register ko in SUPPORTED_LANGUAGES
(core/http/react-ui/src/i18n/index.js). All i18next {{interpolation}} and
_one/_other plural keys preserved; brand/model names kept untranslated.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: moduvoice <moduvoicr77@gmail.com>
2026-06-13 21:58:50 +02:00
LocalAI [bot]
0854932a25 feat(omnivoice-cpp): add OmniVoice TTS backend (file + streaming, voice cloning + voice design) (#10310)
* feat(omnivoice-cpp): add C wrapper + CMake/Makefile build over OmniVoice ov_* ABI

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(omnivoice-cpp): add option/language parsing + WAV framing helpers with tests

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(omnivoice-cpp): wire purego binding with TTS + streaming TTSStream

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build(omnivoice-cpp): wire backend into root Makefile

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(omnivoice-cpp): add build matrix entries + dep-bump registration

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(omnivoice-cpp): register backend meta + image entries

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(omnivoice-cpp): expose as preference-only importable backend

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(gallery): add omnivoice-cpp TTS models (Q8_0 default + BF16 HQ)

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(omnivoice-cpp): document the OmniVoice TTS backend

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(omnivoice-cpp): add env-gated e2e for TTS + streaming

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(omnivoice-cpp): honor tts.audio_path/tts.voice config as default cloning reference

The model config tts.audio_path (ModelOptions.AudioPath) and tts.voice now
provide a default voice-cloning reference used when a request omits Voice, so a
cloned voice can be pinned in the model YAML instead of passed per request. A
per-request voice still overrides. Paths resolve relative to the model dir.

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(omnivoice-cpp): add missing omnivoice-cpp-development backend meta

Mirrors the whisper/vibevoice convention: a -development meta aggregating the
master-tagged image variants (the production meta and per-variant prod+dev image
entries already existed; only the development meta aggregator was missing).

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 21:28:46 +02:00
LocalAI [bot]
203410871b feat(sherpa-onnx): add Kokoro TTS + multilingual Piper voices (#10309)
Wire the Kokoro model family into the sherpa-onnx backend (which only
supported VITS/Piper before) and add gallery voices for Italian, English,
Spanish, French and German plus a multilingual Kokoro model.

- csrc/shim.{c,h}: kokoro_* config setters (model/voices/tokens/data_dir/
  dict_dir/lexicon/lang/length_scale) mirroring the VITS path, with the
  matching frees in tts_config_free.
- backend.go: loadTTS now detects a Kokoro model (a voices.bin beside the
  ONNX) and routes to configureKokoroTTS, otherwise configureVitsTTS.
  Kokoro picks up espeak-ng-data, the jieba dict and the per-language
  lexicons (only one English variant, to avoid tens of thousands of
  duplicate-word warnings at load); the language= option hints the lang.
- backend_test.go: functional test for isKokoroModel detection.
- gallery: 5 Piper VITS voices (it_IT-paola, en_US-amy, es_ES-davefx,
  fr_FR-siwis, de_DE-thorsten) + kokoro-multi-lang-v1.0, served through
  sherpa-onnx-tts.yaml with native streaming TTS.

Verified by building the backend and synthesizing with a real Piper and
Kokoro model (31/31 specs pass, including real-model synth smokes).


Assisted-by: Claude:claude-opus-4-8 gofmt golangci-lint go-test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 21:27:27 +02:00
LocalAI [bot]
7637f8cf1b feat(distributed): declarative per-model scheduling via env/args (#10308)
* feat(distributed): add SpreadAll column and authoritative scheduling seeding

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): parse declarative model scheduling config (env/file)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): reconcile spread_all to one replica per matching node

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): wire LOCALAI_MODEL_SCHEDULING env/args and startup seeding

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): expose spread_all on the scheduling API endpoint

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): add spread-to-all-nodes mode to the scheduling UI

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(distributed): document LOCALAI_MODEL_SCHEDULING env/args

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(distributed): clarify replica modes and all-nodes spread in scheduling config

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 18:31:06 +02:00
LocalAI [bot]
f0e001b7f8 fix(xsysinfo): container-aware total RAM detection (cgroup/lxcfs) (#8059) (#10288)
fix(xsysinfo): make reported system RAM total cgroup/lxcfs-aware (#8059)

GetSystemRAMInfo derived Total from memory.TotalMemory(), which on Linux
uses syscall.Sysinfo().Totalram - the HOST kernel total. lxcfs/LXD does
NOT virtualize that value, while MemAvailable (used for Free/Available)
IS virtualized. Inside an LXD/container with a 128Gi host but a ~10Gi
container view this produced Total=128Gi, Available=10Gi => Used=118Gi,
reporting ~92% RAM usage on an idle container.

Derive Total instead from the minimum of all non-zero, non-unlimited
candidates: cgroup v2 memory.max, cgroup v1 memory.limit_in_bytes (the
kernel unlimited sentinel is ignored), /proc/meminfo MemTotal (which
lxcfs virtualizes), and the syscall.Sysinfo total as the bare-metal
fallback. On bare metal every candidate is unlimited or equals the host
total, so behavior is unchanged.

The selection/parsing lives in a pure function chooseTotalMemory(...)
taking file CONTENTS, unit-tested without a real LXD host; OS file
reads stay in a thin wrapper.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 18:13:06 +02:00
pos-ei-don
cf9debf4eb model: fix case-insensitive suffix matching and skip .bak files in ListFilesInModelPath (#10306)
model: skip .bak files and fix case-insensitive suffix matching in ListFilesInModelPath
2026-06-13 17:46:46 +02:00
LocalAI [bot]
e1556aa1dc fix(react-ui): make agent chat timestamps format-agnostic (#9867) (#10290)
fix(agents): make React agent chat timestamps format-agnostic

The agent SSE bridge emits the json_message timestamp in three different
encodings depending on deploy mode: an RFC3339 string (standalone agent
pool), Unix milliseconds (local dispatcher), and Unix nanoseconds (the
older NATS path). The React AgentChat handler passed data.timestamp
straight through, so the standalone string and any numeric value outside
the millisecond range rendered as "Invalid Timestamp" or a constant
epoch-ish time.

Add a small pure helper, normalizeTimestampMs, that accepts an RFC3339
string or a numeric epoch in s/ms/us/ns and returns JS milliseconds,
falling back to Date.now() on null/empty/unparseable input. Use it in
the json_message handler so the rendered time is correct regardless of
which backend path produced it.

Fixes #9867


Assisted-by: claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 11:05:21 +02:00
LocalAI [bot]
53cbb578a9 chore(model gallery): 🤖 add 1 new models via gallery agent (#10304)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-13 11:03:03 +02:00
LocalAI [bot]
99c8205740 fix(react-ui): stop Talk pipeline overflow and center collapsed-rail avatar (#10305)
Two small visual fixes in the React UI:

- Talk page pipeline summary: the four-column grid used
  `repeat(4, 1fr)`, which resolves to `minmax(auto, 1fr)` so each track
  refuses to shrink below the min-content width of its `nowrap` model
  name. Long names (e.g. a verbose GGUF LLM id) blew the grid out past
  the container despite the per-cell ellipsis styling. Switching to
  `minmax(0, 1fr)` lets the tracks shrink and the ellipsis take effect.

- Sidebar user avatar: the desktop collapsed look centers the avatar via
  `.sidebar.collapsed .sidebar-user{-link}` rules, but the tablet
  icon-rail (640-1023px) collapses visually through `.sidebar:not(.open)`
  without necessarily carrying the `.collapsed` class, so the avatar kept
  its left-aligned negative margins and looked misaligned. Mirror the
  centering rules under `.sidebar:not(.open)`.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 11:02:48 +02:00
LocalAI [bot]
d7162b9f89 ci(darwin): build the ds4 backend for darwin/arm64 (metal) (#10303)
The gallery has metal-ds4 / metal-ds4-development entries, and the build
recipe exists (make backends/ds4-darwin, special-cased in
backend_build_darwin.yml), but ds4 was never listed in the darwin matrix,
so no metal-darwin-arm64-ds4 image was ever published and the entries
dangled.

- Add ds4 to the darwin matrix (includeDarwin), mirroring the llama-cpp
  form (the reusable workflow builds it via 'make backends/ds4-darwin').
- Fix inferBackendPathDarwin in scripts/changed-backends.js to map ds4 to
  backend/cpp/ds4/ (like llama-cpp): ds4 is C++ but the matrix entry carries
  lang=go, so without this its darwin build would only ever run on a release
  (FORCE_ALL), never incrementally when backend/cpp/ds4 changes.

sherpa-onnx and speaker-recognition are already in the darwin matrix on
master and are not changed here.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 11:02:32 +02:00
LocalAI [bot]
3351b62c91 chore(model gallery): 🤖 add 1 new models via gallery agent (#10302)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-13 10:59:23 +02:00
LocalAI [bot]
0eca930b8d fix(gallery): correct meta-backend definitions for platform auto-selection (#10299)
fix(gallery): correct meta-backend definitions in backend/index.yaml

Backends that ship per-platform images must be meta backends (a capabilities
map and NO uri) so the right variant is auto-selected per platform - mirroring
llama-cpp/whisper. Several entries were misdefined; fixed here:

- Concrete base + metal sibling (could not select the Apple Silicon variant):
  silero-vad, piper, kitten-tts, local-store (+ their -development). Converted
  each anchor to a meta and added the cpu-<name> concrete.
- mlx family (mlx, mlx-vlm, mlx-audio, mlx-distributed + -development): anchor
  had both a uri AND a capabilities map, so IsMeta() was false and the map was
  ignored (always resolved to the metal-darwin image); the metal-<name> target
  did not exist. Removed the uri and added the missing metal-<name> concretes.
- Dangling capability targets: diffusers/kokoro nvidia-l4t-cuda-12 repointed to
  the existing nvidia-l4t-<name> concrete; coqui nvidia-cuda-13 key removed
  (no cuda13-coqui image).
- locate-anything: the meta existed but its concrete entries were never added,
  so it was un-installable on every platform. Added the full concrete set plus
  the locate-anything-development meta, mirroring rfdetr-cpp. Image tags grounded
  against the published quay.io tags.
- trl (cuda12/13): repointed the stale 'cublas-cuda12/13-trl' image tags to the
  actually-published 'gpu-nvidia-cuda-12/13-trl' tags (fixes #9236).

Assisted-by: claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-13 10:43:14 +02:00
LocalAI [bot]
81ab62e874 chore(model gallery): 🤖 add 1 new models via gallery agent (#10298)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-13 09:58:11 +02:00
389 changed files with 25685 additions and 6627 deletions

View File

@@ -44,6 +44,39 @@ maps to `DS4_THINK_HIGH`. We pass the chosen mode to `ds4_chat_append_assistant_
via `ModelOptions.Options[] = "kv_cache_dir:/some/path"`. Format is **our own** -
NOT bit-compatible with ds4-server's KVC files (interop is a follow-up plan).
## Engine options (LoadModel)
`LoadModel` maps `ModelOptions.Options[]` (`"key:value"`, from model-YAML
`options:`) onto `ds4_engine_options` through a **declarative table**
(`kEngineOptSpecs` + `apply_engine_option` in `grpc-server.cpp`). The struct is
plain C with no reflection, so the field set is enumerated once in the table;
adding a future engine knob is a one-line table row, not a new branch. Unknown
keys are ignored (back-compat). A bare flag (`ssd_streaming` with no value)
means `true`. Path-type values (`mtp_path`, `expert_profile_path`,
`directional_steering_file`) resolve **relative to the model directory**, so a
gallery entry can reference a companion file it downloaded by bare filename;
absolute values pass through. `ds4_role` / `ds4_layers` / `ds4_listen` /
`ds4_route_timeout` / `kv_cache_dir` keep their dedicated handling (validation
+ coordinator wiring) and are not in the table.
Wired keys: `mtp_path`, `mtp_draft`, `mtp_margin`, `prefill_chunk`,
`power_percent`, `warm_weights`, `quality`, `ssd_streaming`,
`ssd_streaming_cold`, `ssd_streaming_preload_experts`,
`ssd_streaming_cache_experts` (count or `NGB`, sets both experts+bytes via
`ds4_parse_streaming_cache_experts_arg`), `simulate_used_memory` (`NGB` via
`ds4_parse_gib_arg`), `expert_profile_path`, `directional_steering_file`,
`directional_steering_attn`, `directional_steering_ffn`.
## SSD streaming (running models larger than RAM)
ds4's **SSD streaming** keeps non-routed weights resident and streams routed MoE
experts from the GGUF on cache misses, turning "does it fit in RAM" into a speed
spectrum. **Metal (Darwin) only** - it is a no-op on CUDA/CPU. Enable with
`options: ["ssd_streaming"]`; size the routed-expert cache with
`ssd_streaming_cache_experts:NGB` (omit for ds4's automatic 80%-of-working-set
budget). Gallery entries built on this: `deepseek-v4-flash-q4-ssd` (153 GB Flash
on a 128 GB Mac) and `deepseek-v4-pro-q2-ssd` (433 GB Pro, experimental).
## Build matrix
| Build | Where | Notes |

View File

@@ -31,6 +31,15 @@ backend/python/**/source
backend/cpp/llama-cpp/llama.cpp
backend/cpp/llama-cpp-*-build
# privacy-filter: same in-place pattern. The Makefile fetches privacy-filter.cpp
# at the pinned commit (or symlinks a PRIVACY_FILTER_SRC checkout for local dev).
# A stale dir/symlink COPY'd into the image makes the clone step fail (dangling
# symlink) or compile against the wrong commit, so keep host build state out.
backend/cpp/privacy-filter/privacy-filter.cpp
backend/cpp/privacy-filter/build
backend/cpp/privacy-filter/grpc-server
backend/cpp/privacy-filter/package
# Rust backend build output (sources are tracked; target/ is generated)
backend/rust/*/target

View File

@@ -716,6 +716,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-depth-anything-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -781,6 +794,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-omnivoice-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -1569,6 +1595,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-depth-anything-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -1608,6 +1647,19 @@ include:
backend: "locate-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-depth-anything-cpp'
base-image: "ubuntu:24.04"
ubuntu-version: '2404'
runs-on: 'ubuntu-24.04-arm'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -1712,6 +1764,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-omnivoice-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -1751,6 +1816,19 @@ include:
backend: "qwen3-tts-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-omnivoice-cpp'
base-image: "ubuntu:24.04"
ubuntu-version: '2404'
runs-on: 'ubuntu-24.04-arm'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -2592,6 +2670,78 @@ include:
dockerfile: "./backend/Dockerfile.ds4"
context: "./"
ubuntu-version: '2404'
# privacy-filter: PII/NER token classifier (per-arch native -> manifest merge).
# Every variant builds FROM a prebuilt quay.io/go-skynet/ci-cache:base-grpc-*
# image (gRPC + cmake + protoc + conditional CUDA/Vulkan already installed),
# exactly like llama-cpp — no toolchain is installed in Dockerfile.privacy-filter.
# builder-base-image makes the workflow use the Dockerfile's builder-prebuilt
# stage; without it (local builds) the builder-fromsource stage runs the same
# .docker/install-base-deps.sh.
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-privacy-filter'
builder-base-image: 'quay.io/go-skynet/ci-cache:base-grpc-amd64'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "privacy-filter"
dockerfile: "./backend/Dockerfile.privacy-filter"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-privacy-filter'
builder-base-image: 'quay.io/go-skynet/ci-cache:base-grpc-arm64'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "privacy-filter"
dockerfile: "./backend/Dockerfile.privacy-filter"
context: "./"
ubuntu-version: '2404'
# Vulkan: base-grpc-vulkan-amd64 carries the SDK. arm64 vulkan is a one-line
# add once amd64 is proven in CI.
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-privacy-filter'
builder-base-image: 'quay.io/go-skynet/ci-cache:base-grpc-vulkan-amd64'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "privacy-filter"
dockerfile: "./backend/Dockerfile.privacy-filter"
context: "./"
ubuntu-version: '2404'
# CUDA: base-grpc-cuda-13-amd64 carries the toolkit; BUILD_TYPE=cublas ->
# -DPF_CUDA=ON. cuda-12 and arm64/l4t are one-line adds once cuda-13 amd64 is
# proven in CI.
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-privacy-filter'
builder-base-image: 'quay.io/go-skynet/ci-cache:base-grpc-cuda-13-amd64'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "privacy-filter"
dockerfile: "./backend/Dockerfile.privacy-filter"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
@@ -2859,6 +3009,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-depth-anything-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
@@ -2872,6 +3035,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-depth-anything-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
@@ -2885,6 +3061,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f16-depth-anything-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
@@ -2899,6 +3088,20 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-depth-anything-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
@@ -2913,6 +3116,20 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-depth-anything-cpp'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
@@ -3019,6 +3236,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-depth-anything-cpp'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "depth-anything-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2204'
# whisper
- build-type: ''
cuda-major-version: ""
@@ -3483,6 +3713,35 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# omnivoice-cpp
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-omnivoice-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-omnivoice-cpp'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
@@ -3496,6 +3755,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-omnivoice-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
@@ -3509,6 +3781,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f16-omnivoice-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
@@ -3523,6 +3808,20 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-omnivoice-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
@@ -3537,6 +3836,20 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-omnivoice-cpp'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
@@ -3550,6 +3863,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-omnivoice-cpp'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2204'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
@@ -3563,6 +3889,19 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-omnivoice-cpp'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
runs-on: 'ubuntu-latest'
skip-drivers: 'false'
backend: "omnivoice-cpp"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# vibevoice-cpp
- build-type: ''
cuda-major-version: ""
@@ -4342,6 +4681,36 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# supertonic CPU (amd64)
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-supertonic'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "supertonic"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# supertonic CPU (arm64)
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-supertonic'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "supertonic"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# Darwin matrix (consumed by backend-jobs-darwin).
includeDarwin:
@@ -4393,6 +4762,10 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-qwen3-tts-cpp"
build-type: "metal"
lang: "go"
- backend: "omnivoice-cpp"
tag-suffix: "-metal-darwin-arm64-omnivoice-cpp"
build-type: "metal"
lang: "go"
- backend: "vibevoice-cpp"
tag-suffix: "-metal-darwin-arm64-vibevoice-cpp"
build-type: "metal"
@@ -4475,3 +4848,6 @@ includeDarwin:
- backend: "speaker-recognition"
tag-suffix: "-metal-darwin-arm64-speaker-recognition"
build-type: "mps"
- backend: "ds4"
tag-suffix: "-metal-darwin-arm64-ds4"
lang: "go"

View File

@@ -98,6 +98,7 @@ jobs:
/opt/homebrew/Cellar/hiredis
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
- name: Dependencies
@@ -109,7 +110,10 @@ jobs:
# Without explicitly installing them, a brew cache-hit run restores
# ccache's Cellar dir but skips installing those transitive deps,
# and ccache fails at runtime with `dyld: Library not loaded`.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd
# nlohmann-json is header-only and required by the ds4 backend
# (dsml_renderer.cpp includes <nlohmann/json.hpp>); on Linux it comes
# from the apt-installed nlohmann-json3-dev in the build image.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json
# Force-reinstall ccache so brew re-validates its full runtime-dep
# closure on every run. This is the durable fix: when the upstream
# ccache formula gains a new transitive dep (as it has multiple times
@@ -128,7 +132,7 @@ jobs:
# and decides "already installed" without re-linking, so on a cache-
# hit run the formulas aren't on PATH. Force-link them; --overwrite
# tolerates pre-existing symlinks from earlier installs.
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd 2>/dev/null || true
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json 2>/dev/null || true
- name: Save Homebrew cache
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
@@ -148,6 +152,7 @@ jobs:
/opt/homebrew/Cellar/hiredis
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
# ---- ccache for llama.cpp CMake builds ----

View File

@@ -26,6 +26,10 @@ jobs:
variable: "DS4_VERSION"
branch: "main"
file: "backend/cpp/ds4/Makefile"
- repository: "localai-org/privacy-filter.cpp"
variable: "PRIVACY_FILTER_VERSION"
branch: "master"
file: "backend/cpp/privacy-filter/Makefile"
- repository: "ggml-org/whisper.cpp"
variable: "WHISPER_CPP_VERSION"
branch: "master"
@@ -38,6 +42,10 @@ jobs:
variable: "PARAKEET_VERSION"
branch: "master"
file: "backend/go/parakeet-cpp/Makefile"
- repository: "mudler/depth-anything.cpp"
variable: "DEPTHANYTHING_VERSION"
branch: "master"
file: "backend/go/depth-anything-cpp/Makefile"
- repository: "leejet/stable-diffusion.cpp"
variable: "STABLEDIFFUSION_GGML_VERSION"
branch: "master"
@@ -66,10 +74,14 @@ jobs:
variable: "LOCATEANYTHING_VERSION"
branch: "master"
file: "backend/go/locate-anything-cpp/Makefile"
- repository: "predict-woo/qwen3-tts.cpp"
- repository: "ServeurpersoCom/qwentts.cpp"
variable: "QWEN3TTS_CPP_VERSION"
branch: "main"
branch: "master"
file: "backend/go/qwen3-tts-cpp/Makefile"
- repository: "ServeurpersoCom/omnivoice.cpp"
variable: "OMNIVOICE_VERSION"
branch: "master"
file: "backend/go/omnivoice-cpp/Makefile"
- repository: "localai-org/vibevoice.cpp"
variable: "VIBEVOICE_CPP_VERSION"
branch: "master"

View File

@@ -21,7 +21,10 @@ jobs:
uses: securego/gosec@v2.27.1
with:
# we let the report trigger content trigger a failure using the GitHub Security features.
args: '-no-fail -fmt sarif -out results.sarif ./...'
# backend/go/supertonic is excluded: it vendors upstream supertone-inc/supertonic
# (helper.go), whose findings (G304 model-file loads, G404 math/rand for flow-matching
# noise, G104 unhandled errors) are inherent to that upstream code, not ours to rewrite.
args: '-no-fail -exclude-dir=backend/go/supertonic -fmt sarif -out results.sarif ./...'
- name: Upload SARIF file
if: ${{ github.actor != 'dependabot[bot]' }}
uses: github/codeql-action/upload-sarif@v4

View File

@@ -74,6 +74,8 @@ linters:
paths:
# Upstream whisper.cpp source tree fetched by the whisper backend Makefile.
- 'backend/go/whisper/sources'
# Vendored upstream supertonic pipeline (supertone-inc/supertonic go/helper.go).
- 'backend/go/supertonic/helper.go'
- 'docs/'
rules:
# CLI entry points: kong's `env:"..."` tag is the legitimate env→struct

View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter
GOCMD=go
GOTEST=$(GOCMD) test
@@ -595,6 +595,8 @@ test-extra: prepare-test-extra
$(MAKE) -C backend/rust/kokoros test
$(MAKE) -C backend/go/rfdetr-cpp test
$(MAKE) -C backend/go/locate-anything-cpp test
$(MAKE) -C backend/go/depth-anything-cpp test
$(MAKE) -C backend/go/supertonic test
##
## End-to-end gRPC tests that exercise a built backend container image.
@@ -1162,6 +1164,10 @@ BACKEND_TURBOQUANT = turboquant|turboquant|.|false|false
# Single-model; hardware-only validation lives at tests/e2e-backends/
# (BACKEND_BINARY mode); see docs/superpowers/plans/2026-05-11-ds4-backend.md.
BACKEND_DS4 = ds4|ds4|.|false|false
# privacy-filter wraps the standalone privacy-filter.cpp GGML engine (the
# openai-privacy-filter PII/NER token classifier) — the TokenClassify RPC for
# the PII redactor tier, on stock ggml with no llama.cpp carry-patches.
BACKEND_PRIVACY_FILTER = privacy-filter|privacy-filter|.|false|false
# Golang backends
BACKEND_PIPER = piper|golang|.|false|true
@@ -1173,13 +1179,16 @@ BACKEND_STABLEDIFFUSION_GGML = stablediffusion-ggml|golang|.|--progress=plain|tr
BACKEND_WHISPER = whisper|golang|.|false|true
BACKEND_CRISPASR = crispasr|golang|.|false|true
BACKEND_PARAKEET_CPP = parakeet-cpp|golang|.|false|true
BACKEND_DEPTH_ANYTHING_CPP = depth-anything-cpp|golang|.|false|true
BACKEND_VOXTRAL = voxtral|golang|.|false|true
BACKEND_ACESTEP_CPP = acestep-cpp|golang|.|false|true
BACKEND_QWEN3_TTS_CPP = qwen3-tts-cpp|golang|.|false|true
BACKEND_OMNIVOICE_CPP = omnivoice-cpp|golang|.|false|true
BACKEND_VIBEVOICE_CPP = vibevoice-cpp|golang|.|false|true
BACKEND_LOCALVQE = localvqe|golang|.|false|true
BACKEND_OPUS = opus|golang|.|false|true
BACKEND_SHERPA_ONNX = sherpa-onnx|golang|.|false|true
BACKEND_SUPERTONIC = supertonic|golang|.|false|true
# Python backends with root context
BACKEND_RERANKERS = rerankers|python|.|false|true
@@ -1253,6 +1262,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_IK_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_TURBOQUANT)))
$(eval $(call generate-docker-build-target,$(BACKEND_DS4)))
$(eval $(call generate-docker-build-target,$(BACKEND_PRIVACY_FILTER)))
$(eval $(call generate-docker-build-target,$(BACKEND_PIPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_LOCAL_STORE)))
$(eval $(call generate-docker-build-target,$(BACKEND_CLOUD_PROXY)))
@@ -1262,6 +1272,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_STABLEDIFFUSION_GGML)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_CRISPASR)))
$(eval $(call generate-docker-build-target,$(BACKEND_PARAKEET_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_DEPTH_ANYTHING_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXTRAL)))
$(eval $(call generate-docker-build-target,$(BACKEND_OPUS)))
$(eval $(call generate-docker-build-target,$(BACKEND_RERANKERS)))
@@ -1294,6 +1305,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACESTEP_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN3_TTS_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_OMNIVOICE_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_VIBEVOICE_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_LOCALVQE)))
$(eval $(call generate-docker-build-target,$(BACKEND_MLX)))
@@ -1306,12 +1318,13 @@ $(eval $(call generate-docker-build-target,$(BACKEND_KOKOROS)))
$(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_SHERPA_ONNX)))
$(eval $(call generate-docker-build-target,$(BACKEND_SUPERTONIC)))
# Pattern rule for docker-save targets
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-ds4 docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-crispasr docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-liquid-audio docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-rfdetr-cpp docker-build-qwen3-tts-cpp docker-build-vibevoice-cpp docker-build-localvqe docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx docker-build-cloud-proxy
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-ds4 docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-crispasr docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-liquid-audio docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-rfdetr-cpp docker-build-qwen3-tts-cpp docker-build-omnivoice-cpp docker-build-vibevoice-cpp docker-build-localvqe docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx docker-build-cloud-proxy docker-build-supertonic docker-build-depth-anything-cpp docker-build-privacy-filter
########################################################
### Mock Backend for E2E Tests

View File

@@ -29,6 +29,18 @@
<a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>
<!-- Keep these links, translations synced daily. -->
<p align="center">
<a href="https://zdoc.app/de/mudler/LocalAI">Deutsch</a> |
<a href="https://zdoc.app/es/mudler/LocalAI">Español</a> |
<a href="https://zdoc.app/fr/mudler/LocalAI">français</a> |
<a href="https://zdoc.app/ja/mudler/LocalAI">日本語</a> |
<a href="https://zdoc.app/ko/mudler/LocalAI">한국어</a> |
<a href="https://zdoc.app/pt/mudler/LocalAI">Português</a> |
<a href="https://zdoc.app/ru/mudler/LocalAI">Русский</a> |
<a href="https://zdoc.app/zh/mudler/LocalAI">中文</a>
</p>
**LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
**A small core, not a bundle.** Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.
@@ -165,6 +177,10 @@ For more details, see the [Getting Started guide](https://localai.io/basics/gett
## Latest News
- **June 2026**: New [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (a tiny Go client for the Realtime API with a full talk-back voice loop and tool calling), plus [streaming of the realtime LLM / TTS / transcription pipeline stages](https://github.com/mudler/LocalAI/pull/10176) and [configurable WebRTC ICE candidates](https://github.com/mudler/LocalAI/pull/10231).
- **June 2026**: Big speech push: the [parakeet.cpp](https://github.com/mudler/parakeet.cpp) ASR engine gains [NeMo-faithful segment timestamps](https://github.com/mudler/LocalAI/pull/10207), a [multilingual streaming Nemotron-3.5 model](https://github.com/mudler/LocalAI/pull/10199), [dynamic batching for concurrent transcription](https://github.com/mudler/LocalAI/pull/10112) and [CUDA graphs](https://github.com/mudler/LocalAI/pull/10273); the new [CrispASR backend](https://github.com/mudler/LocalAI/pull/10099) adds multi-architecture ASR + TTS, and [60 Piper TTS voices across 42 languages](https://github.com/mudler/LocalAI/pull/10296) land in the gallery (plus [per-request TTS instructions and params](https://github.com/mudler/LocalAI/pull/10172)).
- **June 2026**: New backends and models: [locate-anything.cpp](https://github.com/mudler/LocalAI/pull/10264) for open-vocabulary object detection via ggml, [Ideogram4 image generation](https://github.com/mudler/LocalAI/pull/10201) in stablediffusion-ggml, [llama.cpp video input](https://github.com/mudler/LocalAI/pull/10216), and the [Gemma 4 QAT family with MTP speculative-decoding pairs](https://github.com/mudler/LocalAI/pull/10215). Plus an [interactive CLI chat mode](https://github.com/mudler/LocalAI/pull/10226) and [RAG source citations in agent responses](https://github.com/mudler/LocalAI/pull/10228).
- **June 2026**: Distributed mode hardening: [prefix-cache-aware routing](https://github.com/mudler/LocalAI/pull/10071), a [production-ready request router with auto-sized embedding/rerank batches](https://github.com/mudler/LocalAI/pull/10104), [ds4 layer-split distributed inference](https://github.com/mudler/LocalAI/pull/10098), [NATS JWT auth + TLS/mTLS](https://github.com/mudler/LocalAI/pull/10159), and [resumable file uploads](https://github.com/mudler/LocalAI/pull/10109).
- **May 2026**: **LocalAI 4.3.0** - `llama.cpp` [prompt cache on by default](https://github.com/mudler/LocalAI/pull/9925) (repeated system prompts collapse from minutes to seconds), [keyless cosign signing of backend OCI images](https://github.com/mudler/LocalAI/pull/9823), [per-API-key + per-user usage attribution](https://github.com/mudler/LocalAI/pull/9920), Distributed v3 with [per-request replica routing](https://github.com/mudler/LocalAI/pull/9968). [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.3.0)
- **May 2026**: **LocalAI 4.2.0** - LocalAI sees and hears: [voice recognition](https://github.com/mudler/LocalAI/pull/9500), [face recognition + antispoofing liveness](https://github.com/mudler/LocalAI/pull/9480), speaker diarization. Plus [drop-in Ollama API](https://github.com/mudler/LocalAI/pull/9284), [video generation](https://github.com/mudler/LocalAI/pull/9420), redesigned UI with i18n + admin-configurable branding, vLLM at feature parity with llama.cpp, and 11 new backends. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.2.0)
- **April 2026**: **LocalAI 4.1.0** - LocalAI becomes a control tower: distributed cluster mode with VRAM-aware smart routing + autoscaling, multi-user platform with OIDC and API keys, per-user quotas with predictive analytics, in-UI fine-tuning with TRL (auto-export to GGUF), on-the-fly quantization backend, visual pipeline editor. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.1.0)
@@ -204,10 +220,26 @@ For older news and full release notes, see [GitHub Releases](https://github.com/
## Supported Backends & Acceleration
LocalAI supports **36+ backends** including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
LocalAI supports **60+ backends** including llama.cpp, vLLM, SGLang, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).
### Backends built by us
Most backends wrap a best-in-class upstream engine. A handful of them are native C/C++/GGML engines (no Python at inference) developed and maintained by the LocalAI project itself:
| Backend | What it does |
|---------|-------------|
| [parakeet.cpp](https://github.com/mudler/parakeet.cpp) | C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription |
| [voxtral.c](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C |
| [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp) | Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization |
| [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) | Native RF-DETR object detection and instance segmentation |
| [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) |
| [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose estimation |
| [privacy-filter.cpp](https://github.com/localai-org/privacy-filter.cpp) | Standalone GGML PII/NER token-classification engine powering LocalAI's PII redaction tier |
| [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
| [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |
## Resources
- [Documentation](https://localai.io/)
@@ -217,7 +249,7 @@ See the full [Backend & Model Compatibility Table](https://localai.io/model-comp
- [Integrations & community projects](https://localai.io/docs/integrations/)
- [Installation video walkthrough](https://www.youtube.com/watch?v=cMVNnlqwfw4)
- [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
- [Examples](https://github.com/mudler/LocalAI-examples)
- [Examples](https://github.com/mudler/LocalAI-examples) — including the [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (Go client for the Realtime API with tool calling)
## Team

View File

@@ -0,0 +1,109 @@
ARG BASE_IMAGE=ubuntu:24.04
# BUILDER_BASE_IMAGE defaults to BASE_IMAGE so the Dockerfile parses when no
# prebuilt base is supplied; the builder-prebuilt stage is only entered when
# BUILDER_TARGET=builder-prebuilt, so the fallback content is harmless
# (BuildKit prunes the unreferenced builder).
ARG BUILDER_BASE_IMAGE=${BASE_IMAGE}
# BUILDER_TARGET selects which builder stage the scratch image copies from.
# Declared before any FROM so it is usable in `FROM ${BUILDER_TARGET}`. The
# backend_build workflow sets it to builder-prebuilt when the matrix entry
# provides builder-base-image, else builder-fromsource (the local default).
ARG BUILDER_TARGET=builder-fromsource
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""
# privacy-filter: standalone GGML engine for the openai-privacy-filter PII/NER
# token classifier, wrapped as a LocalAI gRPC backend.
#
# Mirrors backend/Dockerfile.llama-cpp: the build toolchain (gRPC + cmake +
# protoc + conditional CUDA/Vulkan) comes from the shared
# .docker/install-base-deps.sh (from-source path) or a prebuilt
# quay.io/go-skynet/ci-cache:base-grpc-* image (CI path) — nothing GPU-specific
# is hand-rolled here. BUILD_TYPE selects the engine backend in the Makefile:
# "" = cpu, "cublas" -> -DPF_CUDA=ON, "vulkan" -> -DPF_VULKAN=ON.
# ============================================================================
# Stage: builder-fromsource — self-contained build. Runs the same install
# script backend/Dockerfile.base-grpc-builder runs, so this path is
# bit-equivalent to the prebuilt base. Used when BUILDER_TARGET=builder-fromsource
# (the default; local `make backends/privacy-filter`).
# ============================================================================
FROM ${BASE_IMAGE} AS builder-fromsource
ARG BUILD_TYPE
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG CMAKE_FROM_SOURCE=false
# CUDA Toolkit 13.x needs CMake 3.31.9+ for correct toolchain/arch detection.
ARG CMAKE_VERSION=3.31.10
ARG GRPC_VERSION=v1.65.0
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG SKIP_DRIVERS=false
ARG TARGETARCH
ARG UBUNTU_VERSION=2404
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
ENV BUILD_TYPE=${BUILD_TYPE} \
CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} \
CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} \
CMAKE_FROM_SOURCE=${CMAKE_FROM_SOURCE} \
CMAKE_VERSION=${CMAKE_VERSION} \
GRPC_VERSION=${GRPC_VERSION} \
GRPC_MAKEFLAGS=${GRPC_MAKEFLAGS} \
SKIP_DRIVERS=${SKIP_DRIVERS} \
TARGETARCH=${TARGETARCH} \
UBUNTU_VERSION=${UBUNTU_VERSION} \
APT_MIRROR=${APT_MIRROR} \
APT_PORTS_MIRROR=${APT_PORTS_MIRROR} \
DEBIAN_FRONTEND=noninteractive
# CUDA on PATH (a no-op when CUDA is not installed, e.g. cpu/vulkan builds).
ENV PATH=/usr/local/cuda/bin:${PATH}
WORKDIR /build
# apt deps + cmake + protoc + gRPC + conditional CUDA/Vulkan, all from the
# shared script (the source of truth that base-grpc-builder also runs).
RUN --mount=type=bind,source=.docker/install-base-deps.sh,target=/usr/local/sbin/install-base-deps \
--mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
bash /usr/local/sbin/install-base-deps
# install-base-deps installs gRPC under /opt/grpc; copy it to /usr/local so the
# backend's find_package(gRPC CONFIG) resolves it at the canonical prefix.
RUN cp -a /opt/grpc/. /usr/local/
COPY . /LocalAI
RUN --mount=type=cache,target=/root/.ccache,id=privacy-filter-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
make -C /LocalAI/backend/cpp/privacy-filter BUILD_TYPE=${BUILD_TYPE} NATIVE=false grpc-server package
# ============================================================================
# Stage: builder-prebuilt — FROM a prebuilt
# quay.io/go-skynet/ci-cache:base-grpc-* image (gRPC at /opt/grpc + apt deps +
# CUDA/Vulkan already installed). Used in CI when the matrix entry sets
# builder-base-image.
# ============================================================================
FROM ${BUILDER_BASE_IMAGE} AS builder-prebuilt
ARG BUILD_TYPE
ARG TARGETARCH
ENV BUILD_TYPE=${BUILD_TYPE}
# CUDA on PATH (a no-op for the cpu/vulkan base images).
ENV PATH=/usr/local/cuda/bin:${PATH}
# Mirror builder-fromsource: the base-grpc image installs gRPC to /opt/grpc but
# does not copy it to /usr/local.
RUN cp -a /opt/grpc/. /usr/local/
COPY . /LocalAI
RUN --mount=type=cache,target=/root/.ccache,id=privacy-filter-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
make -C /LocalAI/backend/cpp/privacy-filter BUILD_TYPE=${BUILD_TYPE} NATIVE=false grpc-server package
# ============================================================================
# Final stage — copy the package output from the selected builder. BuildKit
# does not expand variables in `COPY --from=`, so alias the chosen builder to a
# fixed stage name first.
# ============================================================================
FROM ${BUILDER_TARGET} AS builder
FROM scratch
COPY --from=builder /LocalAI/backend/cpp/privacy-filter/package/. ./

View File

@@ -24,6 +24,7 @@ service Backend {
rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
rpc Status(HealthMessage) returns (StatusResponse) {}
rpc Detect(DetectOptions) returns (DetectResponse) {}
rpc Depth(DepthRequest) returns (DepthResponse) {}
rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}
rpc VoiceVerify(VoiceVerifyRequest) returns (VoiceVerifyResponse) {}
@@ -670,6 +671,35 @@ message DetectResponse {
repeated Detection Detections = 1;
}
// --- Depth estimation messages (Depth Anything 3) ---
message DepthRequest {
string src = 1; // input image (filesystem path or base64-encoded payload)
string dst = 2; // optional output directory for exports (glb/colmap)
bool include_depth = 3; // return the per-pixel metric depth map
bool include_confidence = 4; // return the per-pixel confidence map (DualDPT)
bool include_pose = 5; // return camera extrinsics/intrinsics (DualDPT)
bool include_sky = 6; // return the per-pixel sky map (mono models)
bool include_points = 7; // back-project to a 3D point cloud (DualDPT)
float points_conf_thresh = 8; // keep points with confidence >= this threshold
repeated string exports = 9; // requested exports: "glb", "colmap"
}
message DepthResponse {
int32 width = 1; // processed depth-map width
int32 height = 2; // processed depth-map height
repeated float depth = 3; // width*height row-major metric depth
repeated float confidence = 4; // width*height row-major confidence (DualDPT)
repeated float sky = 5; // width*height row-major sky map (mono)
repeated float extrinsics = 6; // 12 floats, 3x4 row-major (world-to-camera)
repeated float intrinsics = 7; // 9 floats, 3x3 row-major
int32 num_points = 8; // number of 3D points
repeated float points = 9; // num_points*3 xyz, world space
bytes point_colors = 10; // num_points*3 uint8 rgb
repeated string export_paths = 11; // paths written for the requested exports
bool is_metric = 12; // depth is in metric units
}
// --- Face recognition messages ---
message FacialArea {

View File

@@ -9,6 +9,22 @@ option(DS4_NATIVE "Compile with -march=native / -mcpu=native" ON)
set(DS4_GPU "cpu" CACHE STRING "GPU backend: cpu, cuda, or metal")
set(DS4_DIR "${CMAKE_CURRENT_SOURCE_DIR}/ds4" CACHE PATH "Path to cloned ds4 source")
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
# Homebrew installs protobuf/grpc under a non-default prefix. The generated
# backend.pb.cc / backend.grpc.pb.cc pull in google/protobuf and grpcpp
# headers, but the hw_grpc_proto library links neither target, so on macOS
# the headers (e.g. google/protobuf/runtime_version.h) are never on the
# compiler's include path. Add the Homebrew prefix globally, matching the
# llama-cpp backend which builds on Darwin CI.
if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "arm64")
set(HOMEBREW_DEFAULT_PREFIX "/opt/homebrew")
else()
set(HOMEBREW_DEFAULT_PREFIX "/usr/local")
endif()
link_directories("${HOMEBREW_DEFAULT_PREFIX}/lib")
include_directories("${HOMEBREW_DEFAULT_PREFIX}/include")
endif()
find_package(Threads REQUIRED)
find_package(Protobuf CONFIG QUIET)
if(NOT Protobuf_FOUND)

View File

@@ -1,10 +1,10 @@
# ds4 backend Makefile.
#
# Upstream pin lives below as DS4_VERSION?=d881f2a05e8ff6bec001315a36b794b4aa310173
# Upstream pin lives below as DS4_VERSION?=80ebbc396aee40eedc1d829222f3362d10fa4c6c
# (.github/bump_deps.sh) can find and update it - matches the
# llama-cpp / ik-llama-cpp / turboquant convention.
DS4_VERSION?=d881f2a05e8ff6bec001315a36b794b4aa310173
DS4_VERSION?=80ebbc396aee40eedc1d829222f3362d10fa4c6c
DS4_REPO?=https://github.com/antirez/ds4
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))

View File

@@ -25,6 +25,8 @@ extern "C" {
#include <chrono>
#include <climits>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ctime>
@@ -105,6 +107,130 @@ static bool parse_layers_spec(const std::string &spec, ds4_distributed_layers *o
return true;
}
// Parse a boolean LoadModel option. An empty value (a bare flag-style option
// like "ssd_streaming" with no colon) means true so model YAMLs can write
// options: ["ssd_streaming"] to enable a switch.
static bool parse_bool_option(const std::string &s, bool *out) {
if (s.empty() || s == "true" || s == "1" || s == "yes" || s == "on") { *out = true; return true; }
if (s == "false" || s == "0" || s == "no" || s == "off") { *out = false; return true; }
return false;
}
// Table-driven mapping from LoadModel option keys to ds4_engine_options fields.
// ds4_engine_options is a fixed C struct with no reflection, so the field set
// is enumerated once here; adding a future engine knob is a one-line table
// entry rather than a new branch in LoadModel. Two fields need ds4's own typed
// parsers (Gib, CacheExperts) so a plain string passthrough can't cover them.
enum class DsOptType { Bool, Int, Uint, Float, Str, Gib, CacheExperts };
struct DsOptSpec {
const char *key;
DsOptType type;
size_t off; // byte offset into ds4_engine_options
size_t off2; // second offset (CacheExperts writes experts + bytes)
bool is_path; // Str values: resolve a relative value against the model dir
};
static const DsOptSpec kEngineOptSpecs[] = {
{"mtp_path", DsOptType::Str, offsetof(ds4_engine_options, mtp_path), 0, true},
{"mtp_draft", DsOptType::Int, offsetof(ds4_engine_options, mtp_draft_tokens), 0},
{"mtp_margin", DsOptType::Float, offsetof(ds4_engine_options, mtp_margin), 0},
{"prefill_chunk", DsOptType::Uint, offsetof(ds4_engine_options, prefill_chunk), 0},
{"power_percent", DsOptType::Int, offsetof(ds4_engine_options, power_percent), 0},
{"warm_weights", DsOptType::Bool, offsetof(ds4_engine_options, warm_weights), 0},
{"quality", DsOptType::Bool, offsetof(ds4_engine_options, quality), 0},
{"ssd_streaming", DsOptType::Bool, offsetof(ds4_engine_options, ssd_streaming), 0},
{"ssd_streaming_cold", DsOptType::Bool, offsetof(ds4_engine_options, ssd_streaming_cold), 0},
{"ssd_streaming_preload_experts", DsOptType::Uint, offsetof(ds4_engine_options, ssd_streaming_preload_experts), 0},
{"ssd_streaming_cache_experts", DsOptType::CacheExperts, offsetof(ds4_engine_options, ssd_streaming_cache_experts),
offsetof(ds4_engine_options, ssd_streaming_cache_bytes)},
{"simulate_used_memory", DsOptType::Gib, offsetof(ds4_engine_options, simulate_used_memory_bytes), 0},
{"expert_profile_path", DsOptType::Str, offsetof(ds4_engine_options, expert_profile_path), 0, true},
{"directional_steering_file", DsOptType::Str, offsetof(ds4_engine_options, directional_steering_file), 0, true},
{"directional_steering_attn", DsOptType::Float, offsetof(ds4_engine_options, directional_steering_attn), 0},
{"directional_steering_ffn", DsOptType::Float, offsetof(ds4_engine_options, directional_steering_ffn), 0},
};
// Apply a single key:value LoadModel option to the engine options struct.
// Unknown keys are ignored (back-compat: callers pass mixed option sets).
// String values are copied into `storage`, whose elements the engine reads by
// pointer during ds4_engine_open; `storage` MUST have reserved capacity so
// push_back never reallocates and dangles an earlier c_str(). Returns false
// with `err` set when a recognized key has an invalid value.
static bool apply_engine_option(ds4_engine_options *opt, const std::string &key,
const std::string &val, const std::string &model_dir,
std::vector<std::string> &storage, std::string &err) {
const DsOptSpec *spec = nullptr;
for (const auto &s : kEngineOptSpecs) {
if (key == s.key) { spec = &s; break; }
}
if (!spec) return true; // unknown key: ignore
char *base = reinterpret_cast<char *>(opt);
switch (spec->type) {
case DsOptType::Bool: {
bool b = false;
if (!parse_bool_option(val, &b)) { err = key + " must be true/false"; return false; }
*reinterpret_cast<bool *>(base + spec->off) = b;
return true;
}
case DsOptType::Int: {
char *end = nullptr;
long v = std::strtol(val.c_str(), &end, 10);
if (val.empty() || !end || *end != '\0') { err = key + " must be an integer"; return false; }
*reinterpret_cast<int *>(base + spec->off) = static_cast<int>(v);
return true;
}
case DsOptType::Uint: {
char *end = nullptr;
long v = std::strtol(val.c_str(), &end, 10);
if (val.empty() || !end || *end != '\0' || v < 0 || v > static_cast<long>(UINT32_MAX)) {
err = key + " must be a non-negative integer"; return false;
}
*reinterpret_cast<uint32_t *>(base + spec->off) = static_cast<uint32_t>(v);
return true;
}
case DsOptType::Float: {
char *end = nullptr;
float f = std::strtof(val.c_str(), &end);
if (val.empty() || !end || *end != '\0') { err = key + " must be a number"; return false; }
*reinterpret_cast<float *>(base + spec->off) = f;
return true;
}
case DsOptType::Str: {
// Resolve a relative path option (e.g. mtp_path: a sibling GGUF the
// gallery downloaded next to the model) against the model directory, so
// YAMLs reference companion files by name. Absolute values pass through.
if (spec->is_path && !model_dir.empty() && !val.empty() && val.front() != '/') {
storage.push_back(model_dir + "/" + val);
} else {
storage.push_back(val);
}
*reinterpret_cast<const char **>(base + spec->off) = storage.back().c_str();
return true;
}
case DsOptType::Gib: {
uint64_t bytes = 0;
if (!ds4_parse_gib_arg(val.c_str(), &bytes)) {
err = key + " must be a GiB value, e.g. 64GB"; return false;
}
*reinterpret_cast<uint64_t *>(base + spec->off) = bytes;
return true;
}
case DsOptType::CacheExperts: {
uint32_t experts = 0;
uint64_t bytes = 0;
if (!ds4_parse_streaming_cache_experts_arg(val.c_str(), &experts, &bytes)) {
err = key + " must be a positive expert count or a <number>GB budget"; return false;
}
*reinterpret_cast<uint32_t *>(base + spec->off) = experts;
*reinterpret_cast<uint64_t *>(base + spec->off2) = bytes;
return true;
}
}
return true;
}
// When acting as a distributed coordinator, block until the worker route
// covers all layers (ds4_session_distributed_route_ready == 1) or the timeout
// elapses. Returns an empty string on success, or an error message to return
@@ -476,39 +602,10 @@ public:
return GStatus::OK;
}
std::string mtp_path;
int mtp_draft = 0;
float mtp_margin = 3.0f;
std::string ds4_role, ds4_layers, ds4_listen;
for (const auto &opt : request->options()) {
auto [k, v] = split_option(opt);
if (k == "mtp_path") mtp_path = v;
else if (k == "mtp_draft") mtp_draft = std::stoi(v);
else if (k == "mtp_margin") mtp_margin = std::stof(v);
else if (k == "kv_cache_dir") g_kv_cache_dir = v;
else if (k == "ds4_role") ds4_role = v;
else if (k == "ds4_layers") ds4_layers = v;
else if (k == "ds4_listen") ds4_listen = v;
else if (k == "ds4_route_timeout") {
if (!parse_positive_int(v, &g_route_timeout_sec)) {
result->set_success(false);
result->set_message("ds4: ds4_route_timeout must be a positive integer");
return GStatus::OK;
}
}
}
g_kv_cache.SetDir(g_kv_cache_dir);
ds4_engine_options opt = {};
opt.model_path = model_path.c_str();
opt.mtp_path = mtp_path.empty() ? nullptr : mtp_path.c_str();
opt.n_threads = request->threads() > 0 ? request->threads() : 0;
opt.mtp_draft_tokens = mtp_draft;
opt.mtp_margin = mtp_margin;
opt.directional_steering_file = nullptr;
opt.warm_weights = false;
opt.quality = false;
opt.mtp_margin = 3.0f; // ds4 default; overridable via the mtp_margin option
#if defined(DS4_NO_GPU)
opt.backend = DS4_BACKEND_CPU;
@@ -518,6 +615,46 @@ public:
opt.backend = DS4_BACKEND_CUDA;
#endif
// Stable storage for string-valued engine options. The engine reads
// these by pointer during ds4_engine_open, so the std::string backing
// store must outlive the call and not reallocate; reserve up front so
// push_back keeps every prior c_str() valid. Static + clear() reuses
// the buffer across LoadModel calls (the old engine is closed above).
static std::vector<std::string> s_opt_strings;
s_opt_strings.clear();
s_opt_strings.reserve(sizeof(kEngineOptSpecs) / sizeof(kEngineOptSpecs[0]));
// Directory of the main model, used to resolve relative path options.
std::string model_dir;
if (auto slash = model_path.find_last_of('/'); slash != std::string::npos) {
model_dir = model_path.substr(0, slash);
}
std::string ds4_role, ds4_layers, ds4_listen;
for (const auto &o : request->options()) {
auto [k, v] = split_option(o);
if (k == "kv_cache_dir") { g_kv_cache_dir = v; continue; }
else if (k == "ds4_role") { ds4_role = v; continue; }
else if (k == "ds4_layers") { ds4_layers = v; continue; }
else if (k == "ds4_listen") { ds4_listen = v; continue; }
else if (k == "ds4_route_timeout") {
if (!parse_positive_int(v, &g_route_timeout_sec)) {
result->set_success(false);
result->set_message("ds4: ds4_route_timeout must be a positive integer");
return GStatus::OK;
}
continue;
}
std::string err;
if (!apply_engine_option(&opt, k, v, model_dir, s_opt_strings, err)) {
result->set_success(false);
result->set_message("ds4: " + err);
return GStatus::OK;
}
}
g_kv_cache.SetDir(g_kv_cache_dir);
// Coordinator wiring. 'ds4_role:coordinator' enables layer-split
// distributed inference: this process listens on ds4_listen and owns
// the ds4_layers slice; workers dial in (see `local-ai worker

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=e6f8112f3ba126eed3ff5b30cdd08085414a7516
IK_LLAMA_VERSION?=71af16a6b7f6fb7315b346b4a51aad530599c3f5
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=4c6595503fe45d5a39f88d194e270f64c7424677
LLAMA_VERSION?=f3e182816421c648188b5eab269853bf1531d950
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -1922,25 +1922,27 @@ public:
body_json["min_p"] = data["min_p"];
}
// Pass enable_thinking via chat_template_kwargs (where oaicompat_chat_params_parse reads it)
// Forward the chat_template_kwargs the Go layer resolved (model config
// chat_template_kwargs + per-request metadata: enable_thinking,
// reasoning_effort, preserve_thinking, ...). One generic merge replaces
// the previous per-key handling - new template levers need no C++ change.
// oaicompat_chat_params_parse reads these from body_json.
const auto& metadata = request->metadata();
auto et_it = metadata.find("enable_thinking");
if (et_it != metadata.end()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
auto ctk_it = metadata.find("chat_template_kwargs");
if (ctk_it != metadata.end() && !ctk_it->second.empty()) {
try {
json ctk = json::parse(ctk_it->second);
if (ctk.is_object()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
for (auto& el : ctk.items()) {
body_json["chat_template_kwargs"][el.key()] = el.value();
}
}
} catch (const std::exception & e) {
SRV_WRN("failed to parse chat_template_kwargs metadata: %s\n", e.what());
}
body_json["chat_template_kwargs"]["enable_thinking"] = (et_it->second == "true");
}
// Pass reasoning_effort via chat_template_kwargs too: the lever
// jinja templates like gpt-oss (Harmony) / LFM2.5 read, distinct
// from enable_thinking which those templates ignore.
auto re_it = metadata.find("reasoning_effort");
if (re_it != metadata.end() && !re_it->second.empty()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
body_json["chat_template_kwargs"]["reasoning_effort"] = re_it->second;
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)
@@ -2756,25 +2758,26 @@ public:
body_json["min_p"] = data["min_p"];
}
// Pass enable_thinking via chat_template_kwargs (where oaicompat_chat_params_parse reads it)
// Forward the chat_template_kwargs the Go layer resolved (model config
// chat_template_kwargs + per-request metadata: enable_thinking,
// reasoning_effort, preserve_thinking, ...). One generic merge replaces
// the previous per-key handling - new template levers need no C++ change.
const auto& predict_metadata = request->metadata();
auto predict_et_it = predict_metadata.find("enable_thinking");
if (predict_et_it != predict_metadata.end()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
auto predict_ctk_it = predict_metadata.find("chat_template_kwargs");
if (predict_ctk_it != predict_metadata.end() && !predict_ctk_it->second.empty()) {
try {
json ctk = json::parse(predict_ctk_it->second);
if (ctk.is_object()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
for (auto& el : ctk.items()) {
body_json["chat_template_kwargs"][el.key()] = el.value();
}
}
} catch (const std::exception & e) {
SRV_WRN("failed to parse chat_template_kwargs metadata: %s\n", e.what());
}
body_json["chat_template_kwargs"]["enable_thinking"] = (predict_et_it->second == "true");
}
// Pass reasoning_effort via chat_template_kwargs too: the lever
// jinja templates like gpt-oss (Harmony) / LFM2.5 read, distinct
// from enable_thinking which those templates ignore.
auto predict_re_it = predict_metadata.find("reasoning_effort");
if (predict_re_it != predict_metadata.end() && !predict_re_it->second.empty()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
body_json["chat_template_kwargs"]["reasoning_effort"] = predict_re_it->second;
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)

9
backend/cpp/privacy-filter/.gitignore vendored Normal file
View File

@@ -0,0 +1,9 @@
/privacy-filter.cpp
build/
package/
grpc-server
*.o
backend.pb.cc
backend.pb.h
backend.grpc.pb.cc
backend.grpc.pb.h

View File

@@ -0,0 +1,69 @@
cmake_minimum_required(VERSION 3.21)
project(privacy-filter-grpc-server LANGUAGES CXX C)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(TARGET grpc-server)
# Path to the privacy-filter.cpp engine sources. The Makefile arranges for this
# to exist (clone of a pinned commit, or a symlink to PRIVACY_FILTER_SRC).
set(PRIVACY_FILTER_DIR "${CMAKE_CURRENT_SOURCE_DIR}/privacy-filter.cpp"
CACHE PATH "Path to the privacy-filter.cpp engine source tree")
find_package(Threads REQUIRED)
find_package(Protobuf CONFIG QUIET)
if(NOT Protobuf_FOUND)
find_package(Protobuf REQUIRED)
endif()
find_package(gRPC CONFIG QUIET)
if(NOT gRPC_FOUND)
# Ubuntu's apt-installed grpc++ does not ship a CMake config - fall back.
find_library(GRPCPP_LIB grpc++ REQUIRED)
find_library(GRPCPP_REFLECTION_LIB grpc++_reflection REQUIRED)
add_library(gRPC::grpc++ INTERFACE IMPORTED)
set_target_properties(gRPC::grpc++ PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_LIB}")
add_library(gRPC::grpc++_reflection INTERFACE IMPORTED)
set_target_properties(gRPC::grpc++_reflection PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_REFLECTION_LIB}")
endif()
find_program(_PROTOC NAMES protoc REQUIRED)
find_program(_GRPC_CPP_PLUGIN NAMES grpc_cpp_plugin REQUIRED)
get_filename_component(HW_PROTO "${CMAKE_CURRENT_SOURCE_DIR}/../../backend.proto" ABSOLUTE)
get_filename_component(HW_PROTO_PATH "${HW_PROTO}" PATH)
set(HW_PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.cc")
set(HW_PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.h")
set(HW_GRPC_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.cc")
set(HW_GRPC_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.h")
add_custom_command(
OUTPUT "${HW_PROTO_SRCS}" "${HW_PROTO_HDRS}" "${HW_GRPC_SRCS}" "${HW_GRPC_HDRS}"
COMMAND ${_PROTOC}
ARGS --grpc_out "${CMAKE_CURRENT_BINARY_DIR}"
--cpp_out "${CMAKE_CURRENT_BINARY_DIR}"
-I "${HW_PROTO_PATH}"
--plugin=protoc-gen-grpc="${_GRPC_CPP_PLUGIN}"
"${HW_PROTO}"
DEPENDS "${HW_PROTO}")
add_library(hw_grpc_proto STATIC
${HW_GRPC_SRCS} ${HW_GRPC_HDRS}
${HW_PROTO_SRCS} ${HW_PROTO_HDRS})
target_include_directories(hw_grpc_proto PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
# Build only the pf static lib (+ ggml) from the engine tree — no CLI/bench/tests.
# PF_VULKAN is honored when passed on the cmake command line (it lands in the
# shared cache the engine reads).
set(PF_BUILD_TOOLS OFF CACHE BOOL "" FORCE)
set(PF_BUILD_TESTS OFF CACHE BOOL "" FORCE)
add_subdirectory(${PRIVACY_FILTER_DIR} ${CMAKE_CURRENT_BINARY_DIR}/privacy-filter.cpp)
add_executable(${TARGET} grpc-server.cpp)
target_link_libraries(${TARGET} PRIVATE
pf
hw_grpc_proto
gRPC::grpc++
gRPC::grpc++_reflection
protobuf::libprotobuf
Threads::Threads)

View File

@@ -0,0 +1,77 @@
# privacy-filter backend Makefile.
#
# Wraps the standalone privacy-filter.cpp GGML engine (the openai-privacy-filter
# PII/NER token classifier) as a LocalAI gRPC backend. The engine source is
# fetched at the pin below — .github/workflows/bump_deps.yaml finds and updates
# PRIVACY_FILTER_VERSION, matching the llama-cpp / ds4 convention.
#
# Local development: point at a working checkout instead of cloning, e.g.
# make PRIVACY_FILTER_SRC=$HOME/c/privacy-filter.cpp grpc-server
PRIVACY_FILTER_VERSION?=646342f7a59c6b7d195185eac60bad762e572f1d
PRIVACY_FILTER_REPO?=https://github.com/localai-org/privacy-filter.cpp
PRIVACY_FILTER_SRC?=
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
BUILD_DIR := build
BUILD_TYPE ?=
NATIVE ?= false
JOBS ?= $(shell nproc 2>/dev/null || echo 4)
CMAKE_ARGS ?= -DCMAKE_BUILD_TYPE=Release
# GPU backends; the default (cpu) needs no extra flags. 'cublas' is LocalAI's
# name for the CUDA build (matches llama-cpp / ds4), mapping to the engine's
# GGML_CUDA path; 'vulkan' selects the ggml Vulkan backend.
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS += -DPF_CUDA=ON
endif
ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS += -DPF_VULKAN=ON
endif
# Portable binaries for distribution: disable -march=native unless asked.
ifneq ($(NATIVE),true)
CMAKE_ARGS += -DGGML_NATIVE=OFF
endif
.PHONY: grpc-server package clean purge test all
all: grpc-server
# Provide the engine sources at ./privacy-filter.cpp. With PRIVACY_FILTER_SRC
# set we symlink a local checkout (instant, no network); otherwise we clone the
# pinned commit and its ggml submodule. The directory/symlink is the target, so
# make only does this once — run 'make purge && make' to refetch after a bump.
privacy-filter.cpp:
ifneq ($(PRIVACY_FILTER_SRC),)
ln -sfn $(abspath $(PRIVACY_FILTER_SRC)) privacy-filter.cpp
else
mkdir -p privacy-filter.cpp
cd privacy-filter.cpp && \
git init -q && \
git remote add origin $(PRIVACY_FILTER_REPO) && \
git fetch --depth 1 origin $(PRIVACY_FILTER_VERSION) && \
git checkout FETCH_HEAD && \
git submodule update --init --recursive --depth 1
endif
grpc-server: privacy-filter.cpp
@echo "Building privacy-filter grpc-server ($(BUILD_TYPE)) with $(CMAKE_ARGS)"
mkdir -p $(BUILD_DIR)
cd $(BUILD_DIR) && cmake $(CMAKE_ARGS) $(CURRENT_MAKEFILE_DIR) && cmake --build . --config Release -j $(JOBS)
cp $(BUILD_DIR)/grpc-server grpc-server
package: grpc-server
bash package.sh
test:
@echo "privacy-filter backend: parity/regression coverage lives in the engine repo"
clean:
rm -rf $(BUILD_DIR) grpc-server package
# 'privacy-filter.cpp' may be a symlink (PRIVACY_FILTER_SRC) — rm without a
# trailing slash removes the link, never the linked-to checkout.
purge: clean
rm -rf privacy-filter.cpp

View File

@@ -0,0 +1,210 @@
// privacy-filter LocalAI gRPC backend.
//
// Thin shim over privacy-filter.cpp's flat C API (include/pf.h): a standalone
// GGML engine for the openai-privacy-filter token-classification model family
// (PII NER). It replaces the llama.cpp-patched TokenClassify path for this one
// model family — same GGUF files, no llama.cpp carry-patches.
//
// Only the RPCs the PII tier needs are implemented: LoadModel, TokenClassify,
// plus Health / Status / Free. Everything else inherits the generated base
// class default (UNIMPLEMENTED).
#include "backend.pb.h"
#include "backend.grpc.pb.h"
#include "pf.h"
#include <grpcpp/grpcpp.h>
#include <grpcpp/server.h>
#include <grpcpp/server_builder.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>
#include <atomic>
#include <chrono>
#include <csignal>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
using grpc::Server;
using grpc::ServerBuilder;
using grpc::ServerContext;
// NOTE: do NOT alias grpc::Status as Status — the Status RPC method below would
// shadow the type and break the other method signatures. Use GStatus instead.
using GStatus = ::grpc::Status;
using grpc::StatusCode;
namespace {
// The engine is single-model-per-process: LocalAI spawns one backend process
// per loaded model. g_mu guards (re)load against in-flight classification.
std::mutex g_mu;
pf_ctx * g_ctx = nullptr;
std::atomic<Server *> g_server{nullptr};
// Resolve the device string the engine expects ("cpu" / "gpu" / "cuda" /
// "vulkan", optionally ":N"). Priority: an explicit "device:..." in
// ModelOptions.Options, then a non-zero NGPULayers as a coarse "use the GPU"
// signal, else CPU. "gpu" lets the engine pick whichever GPU backend this
// binary was compiled with (CUDA or Vulkan), so the same config works on
// either build; pin "device:cuda"/"device:vulkan" to be explicit.
std::string resolve_device(const backend::ModelOptions * opts) {
for (const auto & o : opts->options()) {
const std::string prefix = "device:";
if (o.rfind(prefix, 0) == 0) {
return o.substr(prefix.size());
}
}
if (opts->ngpulayers() > 0) {
return "gpu";
}
return "cpu";
}
class PrivacyFilterBackend final : public backend::Backend::Service {
public:
GStatus Health(ServerContext *, const backend::HealthMessage *,
backend::Reply * reply) override {
reply->set_message("OK");
return GStatus::OK;
}
GStatus Status(ServerContext *, const backend::HealthMessage *,
backend::StatusResponse * response) override {
std::lock_guard<std::mutex> lock(g_mu);
response->set_state(g_ctx ? backend::StatusResponse::READY
: backend::StatusResponse::UNINITIALIZED);
return GStatus::OK;
}
GStatus LoadModel(ServerContext *, const backend::ModelOptions * request,
backend::Result * result) override {
std::lock_guard<std::mutex> lock(g_mu);
// ModelFile is the absolute path LocalAI resolves; Model is the bare
// name. Prefer the former, fall back to the latter.
const std::string path =
!request->modelfile().empty() ? request->modelfile() : request->model();
if (path.empty()) {
result->set_success(false);
result->set_message("no model path supplied");
return GStatus::OK;
}
const std::string device = resolve_device(request);
if (g_ctx) { pf_free(g_ctx); g_ctx = nullptr; }
pf_ctx * ctx = pf_load(path.c_str(), device.c_str(), request->threads());
const char * err = pf_last_error(ctx);
if (err) {
result->set_success(false);
result->set_message(std::string("privacy-filter load failed: ") + err);
pf_free(ctx);
return GStatus::OK;
}
// ContextSize, when set, becomes the per-forward window. The engine
// ignores values that are too small to window (<= 2*halo) and just
// runs a single forward, so passing it through is always safe.
if (request->contextsize() > 0) {
pf_set_window(ctx, request->contextsize());
}
g_ctx = ctx;
result->set_success(true);
result->set_message("privacy-filter loaded (" + device + ")");
return GStatus::OK;
}
GStatus TokenClassify(ServerContext *, const backend::TokenClassifyRequest * request,
backend::TokenClassifyResponse * response) override {
std::lock_guard<std::mutex> lock(g_mu);
if (!g_ctx) {
return GStatus(StatusCode::FAILED_PRECONDITION, "Model not loaded");
}
const std::string & text = request->text();
if (text.empty()) {
return GStatus::OK; // no text -> no entities
}
pf_entity * ents = nullptr;
size_t n = 0;
if (pf_classify(g_ctx, text.data(), text.size(), request->threshold(), &ents, &n) != 0) {
const char * err = pf_last_error(g_ctx);
return GStatus(StatusCode::INTERNAL,
std::string("TokenClassify failed: ") + (err ? err : "unknown"));
}
// Byte offsets are into the original UTF-8 text; the engine already
// applied the threshold and whitespace-trimmed span edges.
for (size_t i = 0; i < n; i++) {
backend::TokenClassifyEntity * ent = response->add_entities();
ent->set_entity_group(ents[i].label ? ents[i].label : "");
ent->set_start(ents[i].start);
ent->set_end(ents[i].end);
ent->set_score(ents[i].score);
ent->set_text(text.substr((size_t) ents[i].start,
(size_t) (ents[i].end - ents[i].start)));
}
pf_entities_free(ents, n);
return GStatus::OK;
}
GStatus Free(ServerContext *, const backend::HealthMessage *,
backend::Result * result) override {
std::lock_guard<std::mutex> lock(g_mu);
if (g_ctx) { pf_free(g_ctx); g_ctx = nullptr; }
result->set_success(true);
return GStatus::OK;
}
};
void RunServer(const std::string & addr) {
PrivacyFilterBackend service;
grpc::EnableDefaultHealthCheckService(true);
grpc::reflection::InitProtoReflectionServerBuilderPlugin();
ServerBuilder builder;
builder.AddListeningPort(addr, grpc::InsecureServerCredentials());
builder.RegisterService(&service);
builder.SetMaxReceiveMessageSize(64 * 1024 * 1024);
builder.SetMaxSendMessageSize(64 * 1024 * 1024);
std::unique_ptr<Server> server(builder.BuildAndStart());
if (!server) {
std::cerr << "privacy-filter grpc-server: failed to bind " << addr << "\n";
std::exit(1);
}
g_server = server.get();
std::cerr << "privacy-filter grpc-server listening on " << addr << "\n";
server->Wait();
}
void signal_handler(int) {
if (auto * srv = g_server.load()) {
srv->Shutdown(std::chrono::system_clock::now() + std::chrono::seconds(3));
}
}
} // namespace
int main(int argc, char * argv[]) {
std::string addr = "127.0.0.1:50051";
for (int i = 1; i < argc; ++i) {
std::string a = argv[i];
const std::string addr_flag = "--addr=";
if (a.rfind(addr_flag, 0) == 0) addr = a.substr(addr_flag.size());
else if (a == "--addr" && i + 1 < argc) addr = argv[++i];
else if (a == "--help" || a == "-h") {
std::cout << "Usage: grpc-server --addr=HOST:PORT\n";
return 0;
}
}
std::signal(SIGINT, signal_handler);
std::signal(SIGTERM, signal_handler);
RunServer(addr);
return 0;
}

View File

@@ -0,0 +1,39 @@
#!/bin/bash
# Assemble package/ for the from-scratch backend image: the grpc-server binary,
# run.sh, the dynamic loader, and every shared library the binary needs.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/grpc-server" "$CURDIR/package/"
cp -rfv "$CURDIR/run.sh" "$CURDIR/package/"
# The dynamic loader, renamed to lib/ld.so so run.sh can invoke it explicitly
# (makes the image independent of the host's glibc layout).
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
else
echo "package.sh: unknown architecture" >&2; exit 1
fi
# Bundle the binary's transitive shared deps (libstdc++, libgomp, and the apt
# grpc++/protobuf/absl stack) by walking ldd — robust to whichever of those are
# linked shared vs static. The loader line (no "=>") is skipped; ld.so above
# already covers it.
ldd "$CURDIR/grpc-server" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u | \
while read -r so; do
[ -f "$so" ] && cp -arfLv "$so" "$CURDIR/package/lib/"
done
# Vulkan loader / GPU libs when building the GPU variant.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "privacy-filter package contents:"
ls -lah "$CURDIR/package/" "$CURDIR/package/lib/"

View File

@@ -0,0 +1,9 @@
#!/bin/bash
# Entry point for the privacy-filter backend image / BACKEND_BINARY mode.
set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
if [ -f "$CURDIR/lib/ld.so" ]; then
exec "$CURDIR/lib/ld.so" "$CURDIR/grpc-server" "$@"
fi
exec "$CURDIR/grpc-server" "$@"

View File

@@ -0,0 +1,7 @@
sources/
build*/
package/
libdepthanythingcpp*.so
depth-anything-cpp
test-models/
test-data/

View File

@@ -0,0 +1,28 @@
cmake_minimum_required(VERSION 3.18)
project(libdepthanythingcpp LANGUAGES C CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
# Static-link ggml into the depth-anything shared library so the resulting .so
# has no runtime dependency on an external libggml — only on
# libc/libstdc++/libgomp, which the LocalAI package step bundles into the
# docker image.
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build static libraries" FORCE)
# depth-anything.cpp build switches: skip CLI/tests, but build libdepthanything
# itself as a SHARED library (DA_SHARED) while ggml stays static
# (BUILD_SHARED_LIBS OFF above). The da_capi_* C ABI is compiled into
# src/da_capi.cpp and re-exported by that shared library, so no extra MODULE
# wrapper is needed (unlike locate-anything.cpp).
set(DA_BUILD_CLI OFF CACHE BOOL "Disable depth-anything CLI" FORCE)
set(DA_BUILD_TESTS OFF CACHE BOOL "Disable depth-anything tests" FORCE)
set(DA_SHARED ON CACHE BOOL "Build libdepthanything as a shared lib" FORCE)
add_subdirectory(./sources/depth-anything.cpp)
# Emit libdepthanything.so into the top-level build dir so the Makefile can
# rename it to the per-variant libdepthanythingcpp-<variant>.so.
set_target_properties(depthanything PROPERTIES
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

View File

@@ -0,0 +1,139 @@
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=false
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1)
# depth-anything.cpp. Pin to a specific commit for a stable build; a squash
# merge upstream can orphan a branch, so the native version is pinned by SHA.
# This SHA adds the nested two-file metric C-API (abi_version 4,
# da_capi_load_nested) required by the depth-anything-3-nested gallery model;
# tag it (e.g. v0.1.3) upstream to keep the SHA alive.
DEPTHANYTHING_REPO?=https://github.com/mudler/depth-anything.cpp.git
DEPTHANYTHING_VERSION?=cce5edc395fd1843806093d7ccc0c8b0d0b97b72
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# Forward LocalAI's BUILD_TYPE to the matching ggml backend switch. depth-anything.cpp
# force-sets GGML_CUDA/GGML_VULKAN/GGML_METAL from its own DA_GGML_* options, so
# those must be toggled via the DA_GGML_* names (a bare -DGGML_CUDA=ON would be
# overridden); the remaining ggml switches pass straight through.
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DGGML_CUDA=ON -DDA_GGML_CUDA=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON
else ifeq ($(BUILD_TYPE),hipblas)
ROCM_HOME ?= /opt/rocm
ROCM_PATH ?= /opt/rocm
export CXX=$(ROCM_HOME)/llvm/bin/clang++
export CC=$(ROCM_HOME)/llvm/bin/clang
AMDGPU_TARGETS?=gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1200,gfx1201
CMAKE_ARGS+=-DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=$(AMDGPU_TARGETS)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DDA_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
CMAKE_ARGS+=-DGGML_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
CMAKE_ARGS+=-DDA_GGML_METAL=ON
endif
endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx
endif
sources/depth-anything.cpp:
mkdir -p sources && \
git clone --recursive $(DEPTHANYTHING_REPO) sources/depth-anything.cpp && \
cd sources/depth-anything.cpp && \
git checkout $(DEPTHANYTHING_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
# Detect OS
UNAME_S := $(shell uname -s)
# Only build CPU variants on Linux
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libdepthanythingcpp-fallback.so
endif
depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o depth-anything-cpp ./
package: depth-anything-cpp
bash package.sh
build: package
clean: purge
rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources
purge:
rm -rf build*
# Build all variants (Linux only)
ifeq ($(UNAME_S),Linux)
libdepthanythingcpp-avx.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:avx${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
libdepthanythingcpp-avx2.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:avx2${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
libdepthanythingcpp-avx512.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:avx512${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
endif
# Build fallback variant (all platforms)
libdepthanythingcpp-fallback.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
libdepthanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET)
all: depth-anything-cpp package
# `test` is invoked by the top-level Makefile's `test-extra` target. It builds
# the backend binary + the fallback shared library (needed for dlopen at
# runtime), then runs test.sh which downloads a small GGUF + a test image and
# exercises the gRPC Load/Predict wire path via the Go smoke test in
# main_test.go.
test: depth-anything-cpp libdepthanythingcpp-fallback.so
bash test.sh

View File

@@ -0,0 +1,556 @@
package main
// godepthanythingcpp.go - gRPC handlers (Load, Predict, GenerateImage) for the
// depth-anything-cpp backend, wrapping the Depth Anything 3 ggml C-API
// (libdepthanythingcpp-<variant>.so) via purego.
//
// Embeds base.SingleThread to default the unimplemented RPCs to "not supported"
// and to serialize calls — the C side shares a ggml graph allocator and is NOT
// reentrant, so all inference must run one-at-a-time.
//
// Depth has no native OpenAI endpoint, so the model is exposed two ways:
//
// - GenerateImage(src, dst): run depth on the src image and write a
// min-max-normalised grayscale depth PNG to dst.
// - Predict(images[0]): run depth+pose and return a JSON blob with the depth
// dimensions, depth stats and the camera extrinsics (3x4) / intrinsics (3x3).
import (
"encoding/base64"
"encoding/json"
"fmt"
"image"
"image/png"
"math"
"os"
"path/filepath"
"strings"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// C-API function pointers, registered in main.go via purego. The da_capi_*
// symbols live inside libdepthanything (src/da_capi.cpp) and are re-exported by
// the DA_SHARED build.
var (
// da_capi_load(const char* gguf_path, int n_threads) -> da_ctx* (0 = fail)
CapiLoad func(gguf string, nThreads int32) uintptr
// da_capi_load_nested(const char* anyview_gguf, const char* metric_gguf,
// int n_threads) -> da_ctx* (0 = fail). The returned ctx serves the nested
// metric model: depth/pose calls produce final metric-scale depth + scaled pose.
CapiLoadNested func(anyview string, metric string, nThreads int32) uintptr
// da_capi_free(da_ctx* ctx) — safe on a 0 handle.
CapiFree func(handle uintptr)
// da_capi_last_error(da_ctx* ctx) -> const char* (owned by ctx, "" if none).
// purego marshals the returned C string into a Go string (a copy), so we
// never free it.
CapiLastError func(handle uintptr) string
// da_capi_depth_path(ctx, image_path, out_h*, out_w*) -> float* depth map
// (row-major H*W); nil on error. Caller frees via da_capi_free_floats.
CapiDepthPath func(handle uintptr, imagePath string, outH *int32, outW *int32) *float32
// da_capi_free_floats(float* p)
CapiFreeFloats func(p *float32)
// da_capi_pose_path(ctx, image_path, out_ext[12], out_intr[9]) -> 0 ok, -1 err
CapiPosePath func(handle uintptr, imagePath string, outExt *float32, outIntr *float32) int32
// da_capi_depth_dense(ctx, image_path, out_h*, out_w*, out_depth**, out_conf**,
// out_sky**, out_ext[12], out_intr[9], out_is_metric*) -> 0 ok, -1 err.
// Each non-NULL out_depth/out_conf/out_sky receives a malloc'd float[H*W] (free
// via da_capi_free_floats); buffers the model doesn't produce are set NULL.
CapiDepthDense func(handle uintptr, imagePath string,
outH, outW *int32,
outDepth, outConf, outSky **float32,
outExt, outIntr *float32,
outIsMetric *int32) int32
// da_capi_points(ctx, image_path, conf_thresh, out_n*, out_xyz**, out_rgb**) ->
// 0 ok, -1 err. *out_xyz = malloc'd float[3*N] (free via da_capi_free_floats),
// *out_rgb = malloc'd uint8[3*N] (free via da_capi_free_bytes).
CapiPoints func(handle uintptr, imagePath string, confThresh float32,
outN *int32, outXyz **float32, outRgb **byte) int32
// da_capi_free_bytes(unsigned char* p)
CapiFreeBytes func(p *byte)
// da_capi_export_glb(ctx, image_path, out_glb) -> 0 ok, -1 err
CapiExportGlb func(handle uintptr, imagePath string, outGlb string) int32
// da_capi_export_colmap(ctx, image_path, out_dir, binary) -> 0 ok, -1 err
CapiExportColmap func(handle uintptr, imagePath string, outDir string, binary int32) int32
)
type DepthAnythingCpp struct {
base.SingleThread
handle uintptr
}
// Load loads the GGUF model at opts.ModelFile (joined with opts.ModelPath if
// relative) and stores the da_ctx handle for later inference calls.
func (r *DepthAnythingCpp) Load(opts *pb.ModelOptions) error {
modelFile := opts.ModelFile
if modelFile == "" {
modelFile = opts.Model
}
if modelFile == "" {
return fmt.Errorf("depth-anything-cpp: ModelFile is empty")
}
resolve := func(name string) string {
if filepath.IsAbs(name) {
return name
}
return filepath.Join(opts.ModelPath, name)
}
modelPath := resolve(modelFile)
if _, err := os.Stat(modelPath); err != nil {
return fmt.Errorf("depth-anything-cpp: model file not found: %s: %w", modelPath, err)
}
// Nested metric models are a two-file pair: the main model is the anyview
// (GIANT) branch and the metric (ViT-L + DPT/sky) branch is named via a
// "metric_model:<filename>" entry in opts.Options. When present we load both
// branches so the engine runs the nested metric alignment.
metricFile := optionValue(opts.Options, "metric_model")
threads := opts.Threads
if threads <= 0 {
threads = 4
}
// Release previous model if any (re-Load).
if r.handle != 0 {
CapiFree(r.handle)
r.handle = 0
}
var h uintptr
if metricFile != "" {
metricPath := resolve(metricFile)
if _, err := os.Stat(metricPath); err != nil {
return fmt.Errorf("depth-anything-cpp: metric_model file not found: %s: %w", metricPath, err)
}
h = CapiLoadNested(modelPath, metricPath, threads)
if h == 0 {
if msg := CapiLastError(0); msg != "" {
return fmt.Errorf("depth-anything-cpp: da_capi_load_nested failed for %s + %s: %s", modelPath, metricPath, msg)
}
return fmt.Errorf("depth-anything-cpp: da_capi_load_nested failed for %s + %s", modelPath, metricPath)
}
} else {
h = CapiLoad(modelPath, threads)
if h == 0 {
// da_capi_last_error needs a ctx; on a failed load we have none (it
// returns "" for a null ctx), so the text is best-effort.
if msg := CapiLastError(0); msg != "" {
return fmt.Errorf("depth-anything-cpp: da_capi_load failed for %s: %s", modelPath, msg)
}
return fmt.Errorf("depth-anything-cpp: da_capi_load failed for %s", modelPath)
}
}
r.handle = h
return nil
}
// optionValue returns the value of the first "key:value" entry in opts whose key
// matches (case-sensitive), or "" if absent. Mirrors how other LocalAI backends
// read ModelOptions.Options.
func optionValue(opts []string, key string) string {
prefix := key + ":"
for _, o := range opts {
if strings.HasPrefix(o, prefix) {
return strings.TrimSpace(o[len(prefix):])
}
}
return ""
}
// depthResult is the JSON payload returned by Predict.
type depthResult struct {
DepthW int `json:"depth_w"`
DepthH int `json:"depth_h"`
DepthMin float32 `json:"depth_min"`
DepthMax float32 `json:"depth_max"`
Extrinsics [12]float32 `json:"extrinsics"` // 3x4 row-major
Intrinsics [9]float32 `json:"intrinsics"` // 3x3 row-major
}
// Predict runs depth+pose on the first supplied image and returns depth
// statistics + camera pose as a JSON string. LocalAI wraps the string into the
// Reply.Message of the gRPC response. The image in Images[0] may be a
// filesystem path or a base64-encoded payload.
func (r *DepthAnythingCpp) Predict(opts *pb.PredictOptions) (string, error) {
imgs := opts.GetImages()
if len(imgs) == 0 {
return "", fmt.Errorf("depth-anything-cpp: Predict requires an image in Images[]")
}
imgPath, cleanup, err := materializeImage(imgs[0])
if err != nil {
return "", fmt.Errorf("depth-anything-cpp: %w", err)
}
defer cleanup()
depth, h, w, ext, intr, err := r.runDepthPose(imgPath)
if err != nil {
return "", err
}
dmin, dmax := minMax(depth)
payload, err := json.Marshal(depthResult{
DepthW: w, DepthH: h,
DepthMin: dmin, DepthMax: dmax,
Extrinsics: ext, Intrinsics: intr,
})
if err != nil {
return "", fmt.Errorf("depth-anything-cpp: marshal: %w", err)
}
return string(payload), nil
}
// GenerateImage runs depth on req.Src and writes a normalised grayscale depth
// PNG to req.Dst.
func (r *DepthAnythingCpp) GenerateImage(req *pb.GenerateImageRequest) error {
if req.GetSrc() == "" {
return fmt.Errorf("depth-anything-cpp: GenerateImage requires src")
}
if req.GetDst() == "" {
return fmt.Errorf("depth-anything-cpp: GenerateImage requires dst")
}
imgPath, cleanup, err := materializeImage(req.GetSrc())
if err != nil {
return fmt.Errorf("depth-anything-cpp: %w", err)
}
defer cleanup()
depth, h, w, _, _, err := r.runDepthPose(imgPath)
if err != nil {
return err
}
return writeDepthPNG(req.GetDst(), depth, h, w)
}
// Depth is the typed Depth RPC. It runs the Depth Anything 3 pipeline on the
// request's src image and fills a DepthResponse honoring the include_* flags and
// exports: per-pixel metric depth + confidence (DualDPT) or depth + sky (mono),
// camera extrinsics/intrinsics, an optional back-projected 3D point cloud and
// glb/COLMAP exports. The src may be a filesystem path or a base64 payload.
func (r *DepthAnythingCpp) Depth(in *pb.DepthRequest) (pb.DepthResponse, error) {
// Accumulate into locals and return a single composite literal at the end:
// returning a named pb.DepthResponse value would copy its embedded mutex
// (go vet copylocks).
if r.handle == 0 {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: model not loaded")
}
if in.GetSrc() == "" {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: Depth requires src")
}
imgPath, cleanup, err := materializeImage(in.GetSrc())
if err != nil {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: %w", err)
}
defer cleanup()
// Dense per-pixel output + pose. Pass buffer pointers only for the
// requested maps so the native side can skip unrequested work; ext/intr
// must always point at 12/9 floats per the C ABI.
var (
h, w, isMetric int32
depthPtr, confPtr *float32
skyPtr *float32
ext [12]float32
intr [9]float32
pDepth, pConf, pSky **float32
)
if in.GetIncludeDepth() {
pDepth = &depthPtr
}
if in.GetIncludeConfidence() {
pConf = &confPtr
}
if in.GetIncludeSky() {
pSky = &skyPtr
}
rc := CapiDepthDense(r.handle, imgPath, &h, &w, pDepth, pConf, pSky, &ext[0], &intr[0], &isMetric)
if rc != 0 {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: da_capi_depth_dense failed (rc=%d): %s", rc, r.lastError())
}
n := int(h) * int(w)
var (
depth, conf, sky []float32
extrinsics, intrinsic []float32
numPoints int32
points []float32
pointColors []byte
exportPaths []string
)
if depthPtr != nil {
depth = copyFloats(depthPtr, n)
CapiFreeFloats(depthPtr)
}
if confPtr != nil {
conf = copyFloats(confPtr, n)
CapiFreeFloats(confPtr)
}
if skyPtr != nil {
sky = copyFloats(skyPtr, n)
CapiFreeFloats(skyPtr)
}
if in.GetIncludePose() {
extrinsics = append([]float32(nil), ext[:]...)
intrinsic = append([]float32(nil), intr[:]...)
}
// 3D point cloud (DualDPT / pose-capable models only).
if in.GetIncludePoints() {
var (
np int32
xyzPtr *float32
rgbPtr *byte
)
if rc := CapiPoints(r.handle, imgPath, in.GetPointsConfThresh(), &np, &xyzPtr, &rgbPtr); rc != 0 {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: da_capi_points failed (rc=%d): %s", rc, r.lastError())
}
numPoints = np
if xyzPtr != nil {
points = copyFloats(xyzPtr, int(np)*3)
CapiFreeFloats(xyzPtr)
}
if rgbPtr != nil {
pointColors = copyBytes(rgbPtr, int(np)*3)
CapiFreeBytes(rgbPtr)
}
}
// Exports (glb / colmap). They are written under in.Dst (a directory); a
// temp dir is used when Dst is empty.
if len(in.GetExports()) > 0 {
exportPaths, err = r.runExports(imgPath, in.GetDst(), in.GetExports())
if err != nil {
return pb.DepthResponse{}, err
}
}
return pb.DepthResponse{
Width: w,
Height: h,
Depth: depth,
Confidence: conf,
Sky: sky,
Extrinsics: extrinsics,
Intrinsics: intrinsic,
NumPoints: numPoints,
Points: points,
PointColors: pointColors,
ExportPaths: exportPaths,
IsMetric: isMetric != 0,
}, nil
}
// runExports writes the requested exports for imgPath into dstDir and returns
// the written paths. Supported exports: "glb", "colmap".
func (r *DepthAnythingCpp) runExports(imgPath, dstDir string, exports []string) ([]string, error) {
if dstDir == "" {
tmp, err := os.MkdirTemp("", "depth-anything-export-*")
if err != nil {
return nil, fmt.Errorf("depth-anything-cpp: mkdir export dir: %w", err)
}
dstDir = tmp
} else if err := os.MkdirAll(dstDir, 0o750); err != nil {
return nil, fmt.Errorf("depth-anything-cpp: mkdir %s: %w", dstDir, err)
}
var paths []string
for _, exp := range exports {
switch exp {
case "glb":
out := filepath.Join(dstDir, "pointcloud.glb")
if rc := CapiExportGlb(r.handle, imgPath, out); rc != 0 {
return nil, fmt.Errorf("depth-anything-cpp: da_capi_export_glb failed (rc=%d): %s", rc, r.lastError())
}
paths = append(paths, out)
case "colmap":
out := filepath.Join(dstDir, "colmap")
if err := os.MkdirAll(out, 0o750); err != nil {
return nil, fmt.Errorf("depth-anything-cpp: mkdir %s: %w", out, err)
}
if rc := CapiExportColmap(r.handle, imgPath, out, 1); rc != 0 {
return nil, fmt.Errorf("depth-anything-cpp: da_capi_export_colmap failed (rc=%d): %s", rc, r.lastError())
}
paths = append(paths, out)
default:
return nil, fmt.Errorf("depth-anything-cpp: unknown export %q (want glb|colmap)", exp)
}
}
return paths, nil
}
// copyFloats copies n float32 values from a C heap pointer into a fresh Go
// slice so the C buffer can be freed afterwards.
func copyFloats(p *float32, n int) []float32 {
if p == nil || n <= 0 {
return nil
}
src := unsafe.Slice(p, n)
out := make([]float32, n)
copy(out, src)
return out
}
// copyBytes copies n bytes from a C heap pointer into a fresh Go slice.
func copyBytes(p *byte, n int) []byte {
if p == nil || n <= 0 {
return nil
}
src := unsafe.Slice(p, n)
out := make([]byte, n)
copy(out, src)
return out
}
// runDepthPose runs depth estimation then pose recovery on an image file. It
// returns the row-major depth map (length h*w), its dimensions, the 3x4
// extrinsics (12 floats) and 3x3 intrinsics (9 floats).
// runDepthPose returns depth + camera pose via two C-API calls (depth then pose).
// For a nested metric model both calls run the full two-branch pipeline, so this
// path infers twice; the typed Depth RPC (single da_capi_depth_dense call) is the
// efficient path for nested models.
func (r *DepthAnythingCpp) runDepthPose(imagePath string) (depth []float32, h, w int, ext [12]float32, intr [9]float32, err error) {
if r.handle == 0 {
err = fmt.Errorf("depth-anything-cpp: model not loaded")
return
}
var ch, cw int32
ptr := CapiDepthPath(r.handle, imagePath, &ch, &cw)
if ptr == nil {
err = fmt.Errorf("depth-anything-cpp: da_capi_depth_path failed: %s", r.lastError())
return
}
h, w = int(ch), int(cw)
n := h * w
if n > 0 {
src := unsafe.Slice(ptr, n)
depth = make([]float32, n)
copy(depth, src)
}
CapiFreeFloats(ptr)
if rc := CapiPosePath(r.handle, imagePath, &ext[0], &intr[0]); rc != 0 {
err = fmt.Errorf("depth-anything-cpp: da_capi_pose_path failed (rc=%d): %s", rc, r.lastError())
return
}
return
}
// lastError returns the context's last error string, or "" if none.
func (r *DepthAnythingCpp) lastError() string {
if CapiLastError == nil || r.handle == 0 {
return ""
}
return CapiLastError(r.handle)
}
// materializeImage returns a filesystem path for an image argument that may be
// either an existing path or a base64-encoded payload. When the input is
// base64 it is decoded into a temp file; cleanup removes it (no-op for a path).
func materializeImage(arg string) (path string, cleanup func(), err error) {
cleanup = func() {}
if _, statErr := os.Stat(arg); statErr == nil {
return arg, cleanup, nil
}
// Strip an optional data URL prefix (data:image/...;base64,<payload>).
b64 := arg
if i := indexComma(b64); i >= 0 && hasDataPrefix(b64) {
b64 = b64[i+1:]
}
data, decErr := base64.StdEncoding.DecodeString(b64)
if decErr != nil {
return "", cleanup, fmt.Errorf("image is neither an existing path nor valid base64: %v", decErr)
}
f, tErr := os.CreateTemp("", "depth-anything-*.img")
if tErr != nil {
return "", cleanup, tErr
}
if _, wErr := f.Write(data); wErr != nil {
_ = f.Close()
_ = os.Remove(f.Name())
return "", cleanup, wErr
}
_ = f.Close()
name := f.Name()
return name, func() { _ = os.Remove(name) }, nil
}
func hasDataPrefix(s string) bool {
return len(s) >= 5 && s[:5] == "data:"
}
func indexComma(s string) int {
for i := 0; i < len(s); i++ {
if s[i] == ',' {
return i
}
}
return -1
}
// writeDepthPNG min-max normalises a depth map and writes it as an 8-bit
// grayscale PNG. Near = bright (255), far = dark (0), matching the usual
// depth-map convention for inverse-depth-like outputs.
func writeDepthPNG(dst string, depth []float32, h, w int) error {
if h <= 0 || w <= 0 || len(depth) < h*w {
return fmt.Errorf("depth-anything-cpp: writeDepthPNG: bad dims h=%d w=%d len=%d", h, w, len(depth))
}
dmin, dmax := minMax(depth)
span := dmax - dmin
if span <= 0 || math.IsNaN(float64(span)) {
span = 1
}
img := image.NewGray(image.Rect(0, 0, w, h))
for y := 0; y < h; y++ {
for x := 0; x < w; x++ {
v := depth[y*w+x]
n := (v - dmin) / span // 0..1
if math.IsNaN(float64(n)) {
n = 0
}
if n < 0 {
n = 0
} else if n > 1 {
n = 1
}
img.Pix[y*img.Stride+x] = uint8(n * 255)
}
}
// dst is the gRPC-provided output path chosen by the LocalAI core (the
// intended write destination for the rendered depth map), not
// attacker-controlled input, so the variable path is expected here.
f, err := os.Create(dst) // #nosec G304
if err != nil {
return err
}
defer func() { _ = f.Close() }()
return png.Encode(f, img)
}
func minMax(v []float32) (mn, mx float32) {
if len(v) == 0 {
return 0, 0
}
mn, mx = v[0], v[0]
for _, x := range v {
if math.IsNaN(float64(x)) || math.IsInf(float64(x), 0) {
continue
}
if x < mn {
mn = x
}
if x > mx {
mx = x
}
}
return mn, mx
}

View File

@@ -0,0 +1,62 @@
package main
// main.go - entry point for the depth-anything-cpp gRPC backend.
//
// Dlopens libdepthanythingcpp-<variant>.so via purego at the path in
// DEPTHANYTHING_LIBRARY (set by run.sh based on /proc/cpuinfo), registers the
// da_capi_* C ABI symbols, then starts the gRPC server.
import (
"flag"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("DEPTHANYTHING_LIBRARY")
if libName == "" {
libName = "./libdepthanythingcpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CapiLoad, "da_capi_load"},
{&CapiLoadNested, "da_capi_load_nested"},
{&CapiFree, "da_capi_free"},
{&CapiLastError, "da_capi_last_error"},
{&CapiDepthPath, "da_capi_depth_path"},
{&CapiFreeFloats, "da_capi_free_floats"},
{&CapiPosePath, "da_capi_pose_path"},
{&CapiDepthDense, "da_capi_depth_dense"},
{&CapiPoints, "da_capi_points"},
{&CapiFreeBytes, "da_capi_free_bytes"},
{&CapiExportGlb, "da_capi_export_glb"},
{&CapiExportColmap, "da_capi_export_colmap"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
flag.Parse()
if err := grpc.StartServer(*addr, &DepthAnythingCpp{}); err != nil {
panic(err)
}
}

View File

@@ -0,0 +1,167 @@
package main
// main_test.go - end-to-end smoke test for the depth-anything-cpp gRPC backend.
//
// Spawns the compiled depth-anything-cpp binary on a free local port, dials it
// via gRPC, and exercises LoadModel + Predict against the test fixtures
// downloaded by test.sh: the small (vits) f32 GGUF of Depth Anything 3 and a
// real photo. Asserts that Predict returns a JSON payload with a positive
// depth-map width/height.
//
// The spec Skip()s cleanly if its fixtures (the model, the test image, the
// built binary, or the fallback .so) are missing, so the test target stays
// usable on a fresh checkout / on CI runners where the model hasn't been
// downloaded.
import (
"context"
"encoding/base64"
"encoding/json"
"fmt"
"net"
"os"
"os/exec"
"path/filepath"
"testing"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
func TestDepth(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "depth-anything-cpp backend smoke suite")
}
// freePort grabs an ephemeral TCP port and immediately releases it so the
// spawned backend can bind to it. There is a tiny TOCTOU window here but in
// practice it's adequate for a smoke test on a quiet runner.
func freePort() int {
l, err := net.Listen("tcp", "127.0.0.1:0")
Expect(err).ToNot(HaveOccurred(), "freePort listen")
port := l.Addr().(*net.TCPAddr).Port
Expect(l.Close()).To(Succeed())
return port
}
// startBackend spawns the depth-anything-cpp binary on the given port and waits
// until it accepts TCP connections (up to 10s). It mirrors how main.go resolves
// the purego library: the DEPTHANYTHING_LIBRARY env var points the dlopen at the
// freshly built fallback .so. The returned cleanup func kills the process.
func startBackend(port int) func() {
binary, err := filepath.Abs("./depth-anything-cpp")
Expect(err).ToNot(HaveOccurred())
if _, err := os.Stat(binary); err != nil {
Skip(fmt.Sprintf("backend binary not built: %s (run `make depth-anything-cpp` first)", binary))
}
libPath, err := filepath.Abs("./libdepthanythingcpp-fallback.so")
Expect(err).ToNot(HaveOccurred())
if _, err := os.Stat(libPath); err != nil {
Skip(fmt.Sprintf("fallback library not built: %s (run `make libdepthanythingcpp-fallback.so` first)", libPath))
}
addr := fmt.Sprintf("127.0.0.1:%d", port)
cmd := exec.Command(binary, "--addr", addr)
cmd.Env = append(os.Environ(), "DEPTHANYTHING_LIBRARY="+libPath)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
Expect(cmd.Start()).To(Succeed())
cleanup := func() {
if cmd.Process != nil {
_ = cmd.Process.Kill()
_, _ = cmd.Process.Wait()
}
}
deadline := time.Now().Add(10 * time.Second)
for time.Now().Before(deadline) {
c, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
if err == nil {
_ = c.Close()
return cleanup
}
time.Sleep(200 * time.Millisecond)
}
cleanup()
Fail(fmt.Sprintf("backend did not become ready on %s within 10s", addr))
return func() {}
}
// loadTestImage reads the test image downloaded by test.sh and returns its
// base64-encoded content (one of the wire formats accepted by Predict).
func loadTestImage() string {
imgPath, err := filepath.Abs("test-data/test.jpg")
Expect(err).ToNot(HaveOccurred())
imgBytes, err := os.ReadFile(imgPath)
if err != nil {
Skip(fmt.Sprintf("test image not present: %s (run test.sh first)", imgPath))
}
return base64.StdEncoding.EncodeToString(imgBytes)
}
// dialBackend opens a gRPC client connection to the spawned backend.
func dialBackend(port int) (pb.BackendClient, func()) {
addr := fmt.Sprintf("127.0.0.1:%d", port)
conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
Expect(err).ToNot(HaveOccurred())
return pb.NewBackendClient(conn), func() { _ = conn.Close() }
}
// modelPathOrSkip resolves the model file under ./test-models/ and Skip()s the
// current spec if it's missing (not present on a fresh checkout / on CI runners
// without the download).
func modelPathOrSkip(name string) string {
modelDir, err := filepath.Abs("test-models")
Expect(err).ToNot(HaveOccurred())
modelPath := filepath.Join(modelDir, name)
if _, err := os.Stat(modelPath); err != nil {
Skip(fmt.Sprintf("model not present: %s (run test.sh first)", modelPath))
}
return modelPath
}
var _ = Describe("depth-anything-cpp backend", func() {
It("runs depth+pose against a known-good image", func() {
modelPath := modelPathOrSkip("depth-anything-small-f32.gguf")
imgB64 := loadTestImage()
port := freePort()
cleanup := startBackend(port)
defer cleanup()
client, closeConn := dialBackend(port)
defer closeConn()
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Minute)
defer cancel()
loadResp, err := client.LoadModel(ctx, &pb.ModelOptions{
Model: "depth-anything-small-f32.gguf",
ModelFile: modelPath,
Threads: 4,
})
Expect(err).ToNot(HaveOccurred(), "LoadModel")
Expect(loadResp.GetSuccess()).To(BeTrue(), "LoadModel reported failure: %s", loadResp.GetMessage())
// Predict runs depth+pose and returns the JSON depthResult in Reply.Message.
reply, err := client.Predict(ctx, &pb.PredictOptions{
Images: []string{imgB64},
})
Expect(err).ToNot(HaveOccurred(), "Predict")
var res depthResult
Expect(json.Unmarshal(reply.GetMessage(), &res)).To(Succeed(), "Predict returned non-JSON: %q", string(reply.GetMessage()))
Expect(res.DepthW).To(BeNumerically(">", 0), "depth width should be positive")
Expect(res.DepthH).To(BeNumerically(">", 0), "depth height should be positive")
_, _ = fmt.Fprintf(GinkgoWriter, "depth OK: %dx%d min=%.3f max=%.3f\n",
res.DepthW, res.DepthH, res.DepthMin, res.DepthMax)
})
})

View File

@@ -0,0 +1,64 @@
package main
// nested_e2e_test.go - e2e smoke for the nested two-file metric model. Loads the
// anyview branch as the main model and points the metric branch via the
// "metric_model:<file>" option (exactly as the depth-anything-3-nested gallery
// entry does), then exercises the typed Depth RPC and asserts a metric depth map.
//
// Skips cleanly unless both nested GGUFs are present under ./test-models/ and the
// backend binary + fallback .so are built.
import (
"context"
"fmt"
"path/filepath"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("depth-anything-cpp nested metric model", func() {
It("loads the two-file pair via the metric_model option and returns metric depth", func() {
anyviewPath := modelPathOrSkip("depth-anything-nested-anyview.gguf")
_ = modelPathOrSkip("depth-anything-nested-metric.gguf")
imgB64 := loadTestImage()
port := freePort()
cleanup := startBackend(port)
defer cleanup()
client, closeConn := dialBackend(port)
defer closeConn()
ctx, cancel := context.WithTimeout(context.Background(), 25*time.Minute)
defer cancel()
loadResp, err := client.LoadModel(ctx, &pb.ModelOptions{
Model: "depth-anything-nested-anyview.gguf",
ModelFile: anyviewPath,
ModelPath: filepath.Dir(anyviewPath),
Options: []string{"metric_model:depth-anything-nested-metric.gguf"},
Threads: 8,
})
Expect(err).ToNot(HaveOccurred(), "LoadModel(nested)")
Expect(loadResp.GetSuccess()).To(BeTrue(), "LoadModel reported failure: %s", loadResp.GetMessage())
resp, err := client.Depth(ctx, &pb.DepthRequest{
Src: imgB64,
IncludeDepth: true,
IncludePose: true,
})
Expect(err).ToNot(HaveOccurred(), "Depth(nested)")
Expect(resp.GetWidth()).To(BeNumerically(">", 0), "depth width")
Expect(resp.GetHeight()).To(BeNumerically(">", 0), "depth height")
Expect(resp.GetIsMetric()).To(BeTrue(), "nested output must be metric")
Expect(len(resp.GetDepth())).To(Equal(int(resp.GetWidth())*int(resp.GetHeight())), "dense depth length")
Expect(len(resp.GetExtrinsics())).To(Equal(12), "extrinsics 3x4")
Expect(resp.GetIntrinsics()[0]).To(BeNumerically(">", 0), "fx > 0")
_, _ = fmt.Fprintf(GinkgoWriter, "nested depth OK: %dx%d is_metric=%v fx=%.2f\n",
resp.GetWidth(), resp.GetHeight(), resp.GetIsMetric(), resp.GetIntrinsics()[0])
})
})

View File

@@ -0,0 +1,20 @@
package main
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = DescribeTable("optionValue",
func(opts []string, key, want string) {
Expect(optionValue(opts, key)).To(Equal(want))
},
Entry("present", []string{"foo:bar", "metric_model:m.gguf"}, "metric_model", "m.gguf"),
Entry("absent", []string{"foo:bar"}, "metric_model", ""),
Entry("nil", []string(nil), "metric_model", ""),
Entry("trims space", []string{"metric_model: m.gguf "}, "metric_model", "m.gguf"),
Entry("value with colon", []string{"metric_model:a:b.gguf"}, "metric_model", "a:b.gguf"),
Entry("first wins", []string{"metric_model:first.gguf", "metric_model:second.gguf"}, "metric_model", "first.gguf"),
Entry("empty value", []string{"metric_model:"}, "metric_model", ""),
Entry("prefix not key", []string{"metric_model_extra:x"}, "metric_model", ""),
)

View File

@@ -0,0 +1,59 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/
cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -0,0 +1,52 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx2.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx2.so"
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx512.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
fi
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export DEPTHANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/depth-anything-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/depth-anything-cpp "$@"

View File

@@ -0,0 +1,45 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
echo "Running depth-anything-cpp backend tests..."
# Test model from the mudler/depth-anything.cpp-gguf HuggingFace repo. The small
# (vits) f32 GGUF is the lightest backbone (~131 MB), so it keeps the download
# cheap. It is resumed with `curl -C -` and skipped entirely if already present.
DEPTHANYTHING_MODEL_DIR="${DEPTHANYTHING_MODEL_DIR:-$CURDIR/test-models}"
DEPTHANYTHING_MODEL_FILE="${DEPTHANYTHING_MODEL_FILE:-depth-anything-small-f32.gguf}"
DEPTHANYTHING_MODEL_URL="${DEPTHANYTHING_MODEL_URL:-https://huggingface.co/mudler/depth-anything.cpp-gguf/resolve/main/depth-anything-small-f32.gguf}"
mkdir -p "$DEPTHANYTHING_MODEL_DIR"
if [ ! -f "$DEPTHANYTHING_MODEL_DIR/$DEPTHANYTHING_MODEL_FILE" ]; then
echo "Downloading depth-anything small f32 model (~131 MB)..."
# -C - resumes a partial download so an interrupted run doesn't restart from 0.
curl -L -C - -o "$DEPTHANYTHING_MODEL_DIR/$DEPTHANYTHING_MODEL_FILE" "$DEPTHANYTHING_MODEL_URL" --progress-bar
fi
# Use a real photo (people + cars) from the upstream rf-detr.cpp repo (~46 KB).
# Depth estimation needs real content; a synthetic image would be degenerate.
TEST_IMAGE_DIR="$CURDIR/test-data"
TEST_IMAGE_FILE="$TEST_IMAGE_DIR/test.jpg"
TEST_IMAGE_URL="${TEST_IMAGE_URL:-https://raw.githubusercontent.com/mudler/rf-detr.cpp/main/tests/fixtures/ci/test_image.jpg}"
mkdir -p "$TEST_IMAGE_DIR"
if [ ! -f "$TEST_IMAGE_FILE" ]; then
echo "Downloading test image..."
curl -L -o "$TEST_IMAGE_FILE" "$TEST_IMAGE_URL" --progress-bar
fi
echo "depth-anything-cpp test setup complete."
echo " model: $DEPTHANYTHING_MODEL_DIR/$DEPTHANYTHING_MODEL_FILE"
echo " test image: $TEST_IMAGE_FILE"
# Run the Go smoke test: spawns the backend binary on a free port, calls
# LoadModel + Predict via gRPC against the downloaded GGUF + image.
echo ""
echo "Running Go smoke test..."
cd "$CURDIR"
go test -v -timeout 30m ./...

17
backend/go/omnivoice-cpp/.gitignore vendored Normal file
View File

@@ -0,0 +1,17 @@
# Fetched upstream sources
sources/
# CMake build directories
build*/
# Compiled shared libraries
*.so
# Compiled backend binary
omnivoice-cpp
# Packaging output
package/
# Downloaded e2e models
omnivoice-models/

View File

@@ -0,0 +1,53 @@
cmake_minimum_required(VERSION 3.14)
project(gomnivoicecpp LANGUAGES C CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(OMNIVOICE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/sources/omnivoice.cpp)
# Override upstream's CMAKE_CUDA_ARCHITECTURES before add_subdirectory.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES "75-virtual;80-virtual;86-real;89-real")
endif()
# Add the upstream project. Its own CMakeLists adds ggml + builds
# omnivoice-core (STATIC, contains src/omnivoice.cpp i.e. the ov_* impl).
# EXCLUDE_FROM_ALL keeps its CLI tools/tests from building unless referenced.
add_subdirectory(${OMNIVOICE_DIR} omnivoice EXCLUDE_FROM_ALL)
# Upstream generates version.h into its own CMAKE_CURRENT_BINARY_DIR and adds
# the top-level ${CMAKE_BINARY_DIR} to omnivoice-core's include path. When the
# project is nested under add_subdirectory those two directories differ
# (<build>/omnivoice vs <build>), so omnivoice.cpp cannot find version.h. Point
# omnivoice-core at the subproject binary dir where version.h is actually
# generated. (Fix lives here, never in the fetched upstream checkout.)
target_include_directories(omnivoice-core PRIVATE ${CMAKE_BINARY_DIR}/omnivoice)
add_library(gomnivoicecpp MODULE cpp/gomnivoicecpp.cpp)
target_link_libraries(gomnivoicecpp PRIVATE omnivoice-core)
target_include_directories(gomnivoicecpp PRIVATE ${OMNIVOICE_DIR}/src)
target_include_directories(gomnivoicecpp SYSTEM PRIVATE ${OMNIVOICE_DIR}/ggml/include)
# Link GPU backends if the upstream ggml created them.
foreach(backend blas cuda metal vulkan sycl)
if(TARGET ggml-${backend})
target_link_libraries(gomnivoicecpp PRIVATE ggml-${backend})
if(backend STREQUAL "cuda")
find_package(CUDAToolkit QUIET)
if(CUDAToolkit_FOUND)
target_link_libraries(gomnivoicecpp PRIVATE CUDA::cudart)
endif()
endif()
endif()
endforeach()
if(MSVC)
target_compile_options(gomnivoicecpp PRIVATE /W4 /wd4100 /wd4505)
else()
target_compile_options(gomnivoicecpp PRIVATE -Wall -Wextra
-Wno-unused-parameter -Wno-unused-function)
endif()
set_property(TARGET gomnivoicecpp PROPERTY CXX_STANDARD 17)
set_target_properties(gomnivoicecpp PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

View File

@@ -0,0 +1,122 @@
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=false
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1)
# omnivoice.cpp version
OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp
OMNIVOICE_VERSION?=2603355a5dfacae5cfc33531d5d0933221843509
SO_TARGET?=libgomnivoicecpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DGGML_CUDA=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DGGML_HIPBLAS=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON
else ifeq ($(OS),Darwin)
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
CMAKE_ARGS+=-DGGML_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
endif
endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx
endif
sources/omnivoice.cpp:
mkdir -p sources/omnivoice.cpp
cd sources/omnivoice.cpp && \
git init && \
git remote add origin $(OMNIVOICE_REPO) && \
git fetch origin && \
git checkout $(OMNIVOICE_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
# Detect OS
UNAME_S := $(shell uname -s)
# Only build CPU variants on Linux
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so
else
VARIANT_TARGETS = libgomnivoicecpp-fallback.so
endif
omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o omnivoice-cpp ./
package: omnivoice-cpp
bash package.sh
build: package
clean: purge
rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp
purge:
rm -rf build*
.NOTPARALLEL:
ifeq ($(UNAME_S),Linux)
libgomnivoicecpp-avx.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:avx${RESET})
SO_TARGET=libgomnivoicecpp-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-avx.so
libgomnivoicecpp-avx2.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:avx2${RESET})
SO_TARGET=libgomnivoicecpp-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-avx2.so
libgomnivoicecpp-avx512.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:avx512${RESET})
SO_TARGET=libgomnivoicecpp-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-avx512.so
endif
libgomnivoicecpp-fallback.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:fallback${RESET})
SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.so
libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \
cd .. && \
mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET)
test: omnivoice-cpp
@echo "Running omnivoice-cpp tests..."
bash test.sh
@echo "omnivoice-cpp tests completed."
all: omnivoice-cpp package

View File

@@ -0,0 +1,129 @@
package main
import (
"bytes"
"encoding/binary"
"fmt"
"os"
"runtime"
"github.com/go-audio/audio"
"github.com/go-audio/wav"
)
const omnivoiceSampleRate = 24000
// wavHeader24k returns a 44-byte WAV header for a streaming 24 kHz mono 16-bit
// PCM stream, with placeholder (0xFFFFFFFF) sizes since the total length is
// unknown up front. Emitted as the first chunk of TTSStream so the HTTP layer
// receives a self-describing WAV (the gRPC TTSStream path never sets Message,
// so the backend owns the header - see core/backend/tts.go:ModelTTSStream).
func wavHeader24k() []byte {
var buf bytes.Buffer
w := func(v any) { _ = binary.Write(&buf, binary.LittleEndian, v) }
buf.WriteString("RIFF")
w(uint32(0xFFFFFFFF))
buf.WriteString("WAVE")
buf.WriteString("fmt ")
w(uint32(16)) // Subchunk1Size
w(uint16(1)) // PCM
w(uint16(1)) // mono
w(uint32(omnivoiceSampleRate)) // sample rate
w(uint32(omnivoiceSampleRate * 2)) // byte rate = SR * blockAlign
w(uint16(2)) // block align (16-bit mono)
w(uint16(16)) // bits per sample
buf.WriteString("data")
w(uint32(0xFFFFFFFF))
return buf.Bytes()
}
// floatToPCM16LE clamps each sample to [-1,1] and encodes it as little-endian
// signed 16-bit PCM.
func floatToPCM16LE(samples []float32) []byte {
out := make([]byte, len(samples)*2)
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
v := int16(s * 32767)
out[i*2] = byte(v)
out[i*2+1] = byte(v >> 8)
}
return out
}
// writeWAV24k writes samples as a finalized 24 kHz mono 16-bit WAV at dst.
func writeWAV24k(dst string, samples []float32) error {
f, err := os.Create(dst)
if err != nil {
return fmt.Errorf("omnivoice: create %q: %w", dst, err)
}
enc := wav.NewEncoder(f, omnivoiceSampleRate, 16, 1, 1)
ints := make([]int, len(samples))
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
ints[i] = int(s * 32767)
}
b := &audio.IntBuffer{
Format: &audio.Format{NumChannels: 1, SampleRate: omnivoiceSampleRate},
Data: ints,
SourceBitDepth: 16,
}
if err := enc.Write(b); err != nil {
_ = enc.Close()
_ = f.Close()
return fmt.Errorf("omnivoice: encode WAV: %w", err)
}
if err := enc.Close(); err != nil {
_ = f.Close()
return fmt.Errorf("omnivoice: finalize WAV: %w", err)
}
return f.Close()
}
// readWAVAsFloat decodes a WAV file (any sample rate/channels) to a mono
// float32 slice in [-1,1] for use as reference audio. OmniVoice expects 24 kHz;
// callers should supply 24 kHz reference clips.
func readWAVAsFloat(path string) ([]float32, error) {
f, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("omnivoice: open ref %q: %w", path, err)
}
defer func() { _ = f.Close() }()
dec := wav.NewDecoder(f)
buf, err := dec.FullPCMBuffer()
if err != nil {
return nil, fmt.Errorf("omnivoice: decode ref %q: %w", path, err)
}
ch := int(buf.Format.NumChannels)
if ch < 1 {
ch = 1
}
bitDepth := int(buf.SourceBitDepth)
if bitDepth == 0 {
bitDepth = 16
}
scale := float32(int64(1) << uint(bitDepth-1))
n := len(buf.Data) / ch
out := make([]float32, n)
for i := 0; i < n; i++ {
// Downmix to mono by averaging channels.
var acc int
for c := 0; c < ch; c++ {
acc += buf.Data[i*ch+c]
}
out[i] = float32(acc) / float32(ch) / scale
}
return out, nil
}
// runtimeKeepAlive prevents the GC from reclaiming the reference-audio slice
// while its backing pointer is in use across the C call.
func runtimeKeepAlive(v any) { runtime.KeepAlive(v) }

View File

@@ -0,0 +1,166 @@
#include "gomnivoicecpp.h"
#include "ggml-backend.h"
#include "omnivoice.h"
#include <cstdio>
#include <cstdlib>
#include <cstring>
static ov_context *g_ctx = nullptr;
static void ggml_log_cb(enum ggml_log_level level, const char *log,
void * /*data*/) {
if (!log)
return;
const char *lvl = "?????";
switch (level) {
case GGML_LOG_LEVEL_DEBUG: lvl = "DEBUG"; break;
case GGML_LOG_LEVEL_INFO: lvl = "INFO"; break;
case GGML_LOG_LEVEL_WARN: lvl = "WARN"; break;
case GGML_LOG_LEVEL_ERROR: lvl = "ERROR"; break;
default: break;
}
fprintf(stderr, "[%-5s] %s", lvl, log);
fflush(stderr);
}
int omni_load(const char *model_path, const char *codec_path, int use_fa,
int clamp_fp16) {
ggml_log_set(ggml_log_cb, nullptr);
ggml_backend_load_all();
if (!model_path || model_path[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: model_path is required\n");
return 1;
}
if (!codec_path || codec_path[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: codec_path is required\n");
return 2;
}
ov_init_params p;
ov_init_default_params(&p);
p.model_path = model_path;
p.codec_path = codec_path;
p.use_fa = use_fa != 0;
p.clamp_fp16 = clamp_fp16 != 0;
fprintf(stderr, "[omnivoice-cpp] Loading model=%s codec=%s\n", model_path,
codec_path);
g_ctx = ov_init(&p);
if (!g_ctx) {
fprintf(stderr, "[omnivoice-cpp] FATAL: ov_init failed: %s\n",
ov_last_error());
return 3;
}
fprintf(stderr, "[omnivoice-cpp] Model loaded (%s)\n", ov_version());
return 0;
}
// Fill an ov_tts_params from the flat wrapper arguments.
static void fill_params(ov_tts_params *tp, const char *text, const char *lang,
const char *instruct, const float *ref_samples,
int ref_n, const char *ref_text, long long seed,
int denoise) {
ov_tts_default_params(tp);
tp->text = text ? text : "";
tp->lang = lang ? lang : "";
if (instruct && instruct[0] != '\0')
tp->instruct = instruct;
if (ref_samples && ref_n > 0) {
tp->ref_audio_24k = ref_samples;
tp->ref_n_samples = ref_n;
if (ref_text && ref_text[0] != '\0')
tp->ref_text = ref_text;
tp->denoise = denoise != 0;
}
if (seed >= 0)
tp->mg_seed = (uint64_t)seed;
}
float *omni_tts(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, int *out_n) {
if (out_n)
*out_n = 0;
if (!g_ctx) {
fprintf(stderr, "[omnivoice-cpp] ERROR: model not loaded\n");
return nullptr;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: text is required\n");
return nullptr; // omni_tts: out_n already 0
}
ov_tts_params tp;
fill_params(&tp, text, lang, instruct, ref_samples, ref_n, ref_text, seed,
denoise);
ov_audio out = {0};
enum ov_status rc = ov_synthesize(g_ctx, &tp, &out);
if (rc != OV_STATUS_OK || out.n_samples <= 0 || !out.samples) {
fprintf(stderr, "[omnivoice-cpp] ERROR: synthesize failed (rc=%d): %s\n",
(int)rc, ov_last_error());
ov_audio_free(&out);
return nullptr;
}
// Copy into a plain malloc buffer the Go side can free symmetrically via
// omni_pcm_free; then release the ov_audio-owned buffer.
size_t bytes = (size_t)out.n_samples * sizeof(float);
float *buf = (float *)malloc(bytes);
if (!buf) {
fprintf(stderr, "[omnivoice-cpp] ERROR: malloc(%zu) failed\n", bytes);
ov_audio_free(&out);
return nullptr;
}
memcpy(buf, out.samples, bytes);
if (out_n)
*out_n = out.n_samples;
ov_audio_free(&out);
return buf;
}
int omni_tts_stream(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, omni_pcm_chunk_cb cb,
void *user_data) {
if (!g_ctx) {
fprintf(stderr, "[omnivoice-cpp] ERROR: model not loaded\n");
return 1;
}
if (!cb) {
fprintf(stderr, "[omnivoice-cpp] ERROR: stream callback is null\n");
return 2;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: text is required\n");
return 4;
}
ov_tts_params tp;
fill_params(&tp, text, lang, instruct, ref_samples, ref_n, ref_text, seed,
denoise);
// ov_audio_chunk_cb has the identical signature to omni_pcm_chunk_cb
// (bool vs int return are ABI-compatible; non-zero == true).
tp.on_chunk = (ov_audio_chunk_cb)cb;
tp.on_chunk_user_data = user_data;
ov_audio out = {0}; // stays empty in streaming mode
enum ov_status rc = ov_synthesize(g_ctx, &tp, &out);
ov_audio_free(&out);
if (rc != OV_STATUS_OK && rc != OV_STATUS_CANCELLED) {
fprintf(stderr, "[omnivoice-cpp] ERROR: stream synth failed (rc=%d): %s\n",
(int)rc, ov_last_error());
return 3;
}
return 0;
}
void omni_pcm_free(float *p) { free(p); }
void omni_unload(void) {
if (g_ctx) {
ov_free(g_ctx);
g_ctx = nullptr;
}
}

View File

@@ -0,0 +1,38 @@
#pragma once
#include <cstdint>
extern "C" {
// Streaming PCM chunk callback. samples is mono float PCM at 24 kHz, valid
// only for the duration of the call. Return non-zero to continue, 0 to abort.
typedef int (*omni_pcm_chunk_cb)(const float *samples, int n_samples,
void *user_data);
// Load the LM (model_path) + codec (codec_path) GGUFs. use_fa / clamp_fp16
// map to ov_init_params. Returns 0 on success, non-zero on failure.
int omni_load(const char *model_path, const char *codec_path, int use_fa,
int clamp_fp16);
// Synthesize to a malloc'd float PCM buffer (caller frees via omni_pcm_free).
// ref_samples != null && ref_n > 0 => voice cloning (ref_text optional).
// instruct != null && non-empty => voice design. seed < 0 keeps the default
// MaskGIT seed. denoise toggles the <|denoise|> marker (only with a reference).
// Writes the sample count to *out_n. Returns NULL on failure (out_n set to 0).
float *omni_tts(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, int *out_n);
// Streaming synthesis: cb is invoked per PCM chunk as audio is produced.
// Same reference/design/seed semantics as omni_tts. Returns 0 on success.
int omni_tts_stream(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, omni_pcm_chunk_cb cb,
void *user_data);
// Free a buffer returned by omni_tts.
void omni_pcm_free(float *p);
// Release the OmniVoice context.
void omni_unload(void);
}

View File

@@ -0,0 +1,74 @@
package main
import (
"os"
"strings"
"github.com/ebitengine/purego"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func ttsReq(text, voice string, lang *string, dst string) *pb.TTSRequest {
return &pb.TTSRequest{Text: text, Voice: voice, Language: lang, Dst: dst}
}
var _ = Describe("OmniVoice e2e", Label("e2e"), func() {
var loaded bool
BeforeEach(func() {
modelPath := os.Getenv("OMNIVOICE_MODEL")
codecPath := os.Getenv("OMNIVOICE_CODEC")
if modelPath == "" || codecPath == "" {
Skip("OMNIVOICE_MODEL / OMNIVOICE_CODEC not set; skipping e2e")
}
if !loaded {
lib := os.Getenv("OMNIVOICE_LIBRARY")
if lib == "" {
lib = "./libgomnivoicecpp-fallback.so"
}
h, err := purego.Dlopen(lib, purego.RTLD_NOW|purego.RTLD_GLOBAL)
Expect(err).ToNot(HaveOccurred())
purego.RegisterLibFunc(&CppLoad, h, "omni_load")
purego.RegisterLibFunc(&CppTTS, h, "omni_tts")
purego.RegisterLibFunc(&CppTTSStream, h, "omni_tts_stream")
purego.RegisterLibFunc(&CppPCMFree, h, "omni_pcm_free")
purego.RegisterLibFunc(&CppUnload, h, "omni_unload")
Expect(CppLoad(modelPath, codecPath, 0, 0)).To(Equal(0))
loaded = true
}
})
It("synthesizes a WAV file via TTS", func() {
b := &OmnivoiceCpp{opts: loadOptions{seed: 42, denoise: true}}
dst := GinkgoT().TempDir() + "/out.wav"
lang := "en"
err := b.TTS(ttsReq("Hello world.", "", &lang, dst))
Expect(err).ToNot(HaveOccurred())
fi, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(fi.Size()).To(BeNumerically(">", int64(44)))
})
It("streams audio chunks via TTSStream", func() {
b := &OmnivoiceCpp{opts: loadOptions{seed: 42, denoise: true}}
results := make(chan []byte, 1024)
lang := "en"
done := make(chan error, 1)
go func() { done <- b.TTSStream(ttsReq("Hello there, streaming test.", "", &lang, ""), results) }()
var chunks int
var first []byte
for c := range results {
if chunks == 0 {
first = c
}
chunks++
}
Expect(<-done).ToNot(HaveOccurred())
Expect(chunks).To(BeNumerically(">=", 2))
Expect(string(first[0:4])).To(Equal("RIFF"))
Expect(strings.HasPrefix(string(first[8:12]), "WAVE")).To(BeTrue())
})
})

View File

@@ -0,0 +1,246 @@
package main
import (
"fmt"
"os"
"path/filepath"
"strings"
"sync"
"unsafe"
"github.com/ebitengine/purego"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
var (
// omni_load(model_path, codec_path, use_fa, clamp_fp16) int
CppLoad func(modelPath, codecPath string, useFA, clampFP16 int) int
// omni_tts(text, lang, instruct, ref_samples, ref_n, ref_text, seed, denoise, out_n) -> float* (uintptr)
CppTTS func(text, lang, instruct string, refSamples unsafe.Pointer, refN int,
refText string, seed int64, denoise int, outN unsafe.Pointer) uintptr
// omni_tts_stream(text, lang, instruct, ref_samples, ref_n, ref_text, seed, denoise, cb, user) int
CppTTSStream func(text, lang, instruct string, refSamples unsafe.Pointer, refN int,
refText string, seed int64, denoise int, cb uintptr, user uintptr) int
CppPCMFree func(ptr uintptr)
CppUnload func()
)
type OmnivoiceCpp struct {
base.SingleThread
opts loadOptions
// audioPath is the model-config reference voice (tts.audio_path), used as
// the default voice-cloning reference when a request does not set Voice.
audioPath string
}
func (o *OmnivoiceCpp) Load(opts *pb.ModelOptions) error {
model := opts.ModelFile
if model == "" {
model = opts.ModelPath
}
if !filepath.IsAbs(model) && opts.ModelPath != "" {
model = filepath.Join(opts.ModelPath, model)
}
o.opts = parseOptions(opts.Options)
// Resolve the codec/tokenizer GGUF: explicit option, else auto-discover a
// *tokenizer*.gguf sibling of the base model.
codec := o.opts.codecPath
if codec != "" && !filepath.IsAbs(codec) {
codec = filepath.Join(filepath.Dir(model), codec)
}
if codec == "" {
codec = discoverTokenizer(filepath.Dir(model))
}
if codec == "" {
return fmt.Errorf("omnivoice: no codec/tokenizer GGUF found; set option 'tokenizer:<file>'")
}
o.opts.codecPath = codec
// tts.audio_path (ModelOptions.AudioPath) is the config-level voice-cloning
// reference: a default reference WAV used when a request omits Voice.
// Resolved relative to the model directory like the codec.
o.audioPath = opts.AudioPath
if o.audioPath != "" && !filepath.IsAbs(o.audioPath) {
o.audioPath = filepath.Join(filepath.Dir(model), o.audioPath)
}
useFA := boolToInt(o.opts.useFA)
clamp := boolToInt(o.opts.clampFP16)
fmt.Fprintf(os.Stderr, "[omnivoice-cpp] Load model=%s codec=%s use_fa=%d clamp_fp16=%d\n",
model, codec, useFA, clamp)
if rc := CppLoad(model, codec, useFA, clamp); rc != 0 {
return fmt.Errorf("omnivoice: failed to load model (rc=%d)", rc)
}
return nil
}
// discoverTokenizer returns the first *tokenizer*.gguf in dir, or "".
func discoverTokenizer(dir string) string {
entries, err := os.ReadDir(dir)
if err != nil {
return ""
}
for _, e := range entries {
name := strings.ToLower(e.Name())
if strings.Contains(name, "tokenizer") && strings.HasSuffix(name, ".gguf") {
return filepath.Join(dir, e.Name())
}
}
return ""
}
func boolToInt(b bool) int {
if b {
return 1
}
return 0
}
// refAudio loads the reference WAV (voice cloning) if voice points to a file.
// Returns nil if no cloning (empty or non-path - voice design uses Instructions).
func (o *OmnivoiceCpp) refAudio(voice string) ([]float32, error) {
v := strings.TrimSpace(voice)
if v == "" {
return nil, nil
}
if _, err := os.Stat(v); err != nil {
return nil, nil
}
return readWAVAsFloat(v)
}
// refAudioFor resolves the cloning reference for a request: the per-request
// Voice takes precedence, falling back to the model-config audio_path. Empty
// result means no cloning (voice design via Instructions still applies).
func (o *OmnivoiceCpp) refAudioFor(req *pb.TTSRequest) ([]float32, error) {
voice := strings.TrimSpace(req.Voice)
if voice == "" {
voice = o.audioPath
}
return o.refAudio(voice)
}
func reqParam(req *pb.TTSRequest, key string) string {
if req.Params == nil {
return ""
}
return req.Params[key]
}
func (o *OmnivoiceCpp) seedFor(req *pb.TTSRequest) int64 {
if s := reqParam(req, "seed"); s != "" {
var n int64
if _, err := fmt.Sscan(s, &n); err == nil {
return n
}
}
return o.opts.seed
}
func optStr(p *string) string {
if p == nil {
return ""
}
return *p
}
func (o *OmnivoiceCpp) TTS(req *pb.TTSRequest) error {
if req.Dst == "" {
return fmt.Errorf("omnivoice: TTS requires a destination path")
}
lang := normalizeLanguage(optStr(req.Language))
instruct := optStr(req.Instructions)
refText := reqParam(req, "ref_text")
seed := o.seedFor(req)
ref, err := o.refAudioFor(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
}
var n int32
ptr := CppTTS(req.Text, lang, instruct, refPtr, len(ref), refText, seed,
boolToInt(o.opts.denoise), unsafe.Pointer(&n))
runtimeKeepAlive(ref)
if ptr == 0 || n <= 0 {
return fmt.Errorf("omnivoice: synthesis failed")
}
defer CppPCMFree(ptr)
src := unsafe.Slice((*float32)(unsafe.Pointer(ptr)), int(n)) //nolint:govet // C-allocated PCM, copied out before free
out := make([]float32, int(n))
copy(out, src)
return writeWAV24k(req.Dst, out)
}
// streamState carries the active TTSStream channel to the single shared C
// callback. base.SingleThread serializes TTS/TTSStream, so one global slot is
// safe and avoids leaking a purego callback per request (purego callbacks
// cannot be freed and are capped).
var (
streamMu sync.Mutex
streamChan chan []byte
streamCbOnce sync.Once
streamCbPtr uintptr
)
// streamCallback is registered once and forwards each PCM chunk to streamChan.
func streamCallback(samples *float32, nSamples int32, _ uintptr) uintptr {
if nSamples <= 0 || samples == nil || streamChan == nil {
return 1 // continue
}
src := unsafe.Slice(samples, int(nSamples))
cp := make([]float32, int(nSamples)) // copy out of C memory before returning
copy(cp, src)
streamChan <- floatToPCM16LE(cp)
return 1 // continue
}
func (o *OmnivoiceCpp) TTSStream(req *pb.TTSRequest, results chan []byte) error {
defer close(results)
if req.Text == "" {
return fmt.Errorf("omnivoice: TTSStream requires text")
}
streamCbOnce.Do(func() {
streamCbPtr = purego.NewCallback(streamCallback)
})
lang := normalizeLanguage(optStr(req.Language))
instruct := optStr(req.Instructions)
refText := reqParam(req, "ref_text")
seed := o.seedFor(req)
ref, err := o.refAudioFor(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
}
// Emit the WAV header first so the HTTP layer gets a self-describing stream.
results <- wavHeader24k()
streamMu.Lock()
streamChan = results
rc := CppTTSStream(req.Text, lang, instruct, refPtr, len(ref), refText, seed,
boolToInt(o.opts.denoise), streamCbPtr, 0)
streamChan = nil
streamMu.Unlock()
runtimeKeepAlive(ref)
if rc != 0 {
return fmt.Errorf("omnivoice: streaming synthesis failed (rc=%d)", rc)
}
return nil
}

View File

@@ -0,0 +1,90 @@
package main
import (
"bytes"
"encoding/binary"
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestOmnivoiceCpp(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "omnivoice-cpp suite")
}
var _ = Describe("normalizeLanguage", func() {
DescribeTable("maps caller language to OmniVoice codes",
func(in, want string) {
Expect(normalizeLanguage(in)).To(Equal(want))
},
Entry("empty stays empty", "", ""),
Entry("english full name", "English", "en"),
Entry("chinese full name", "Chinese", "zh"),
Entry("locale suffix stripped", "en-US", "en"),
Entry("underscore locale", "zh_CN", "zh"),
Entry("already a code", "en", "en"),
Entry("unknown passes through normalized", "xx", "xx"),
)
})
var _ = Describe("parseOptions", func() {
It("extracts codec, use_fa, clamp_fp16, seed, denoise", func() {
o := parseOptions([]string{
"tokenizer:tok.gguf",
"use_fa:true",
"clamp_fp16:true",
"seed:7",
"denoise:false",
"unknown:ignored",
})
Expect(o.codecPath).To(Equal("tok.gguf"))
Expect(o.useFA).To(BeTrue())
Expect(o.clampFP16).To(BeTrue())
Expect(o.seed).To(Equal(int64(7)))
Expect(o.denoise).To(BeFalse())
})
It("accepts codec: as an alias for tokenizer:", func() {
o := parseOptions([]string{"codec:c.gguf"})
Expect(o.codecPath).To(Equal("c.gguf"))
})
It("defaults seed to -1 and denoise to true", func() {
o := parseOptions(nil)
Expect(o.seed).To(Equal(int64(-1)))
Expect(o.denoise).To(BeTrue())
})
})
var _ = Describe("wavHeader24k", func() {
It("emits a 44-byte streaming WAV header at 24 kHz mono 16-bit", func() {
h := wavHeader24k()
Expect(h).To(HaveLen(44))
Expect(string(h[0:4])).To(Equal("RIFF"))
Expect(string(h[8:12])).To(Equal("WAVE"))
Expect(string(h[12:16])).To(Equal("fmt "))
Expect(string(h[36:40])).To(Equal("data"))
var sampleRate uint32
Expect(binary.Read(bytes.NewReader(h[24:28]), binary.LittleEndian, &sampleRate)).To(Succeed())
Expect(sampleRate).To(Equal(uint32(24000)))
})
})
var _ = Describe("floatToPCM16LE", func() {
It("clamps and converts float PCM to little-endian int16 bytes", func() {
b := floatToPCM16LE([]float32{0, 1.0, -1.0, 2.0, -2.0})
Expect(b).To(HaveLen(10)) // 5 samples * 2 bytes
read := func(off int) int16 {
var v int16
_ = binary.Read(bytes.NewReader(b[off:off+2]), binary.LittleEndian, &v)
return v
}
Expect(read(0)).To(Equal(int16(0)))
Expect(read(2)).To(Equal(int16(32767)))
Expect(read(4)).To(Equal(int16(-32767)))
Expect(read(6)).To(Equal(int16(32767))) // clamped from 2.0
Expect(read(8)).To(Equal(int16(-32767))) // clamped from -2.0
})
})

View File

@@ -0,0 +1,48 @@
package main
// Note: this is started internally by LocalAI and a server is allocated for each model
import (
"flag"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
libName := os.Getenv("OMNIVOICE_LIBRARY")
if libName == "" {
libName = "./libgomnivoicecpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CppLoad, "omni_load"},
{&CppTTS, "omni_tts"},
{&CppTTSStream, "omni_tts_stream"},
{&CppPCMFree, "omni_pcm_free"},
{&CppUnload, "omni_unload"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
flag.Parse()
if err := grpc.StartServer(*addr, &OmnivoiceCpp{}); err != nil {
panic(err)
}
}

View File

@@ -0,0 +1,74 @@
package main
import (
"strconv"
"strings"
)
// loadOptions holds the parsed model-level options for OmniVoice.
type loadOptions struct {
codecPath string
useFA bool
clampFP16 bool
seed int64
denoise bool
}
func splitOption(o string) (key, value string, ok bool) {
i := strings.Index(o, ":")
if i < 0 {
return "", "", false
}
return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}
// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. Defaults: seed -1 (engine default), denoise true.
func parseOptions(opts []string) loadOptions {
o := loadOptions{seed: -1, denoise: true}
for _, oo := range opts {
key, value, ok := splitOption(oo)
if !ok {
continue
}
switch key {
case "tokenizer", "codec":
o.codecPath = value
case "use_fa":
o.useFA = value == "true" || value == "1"
case "clamp_fp16":
o.clampFP16 = value == "true" || value == "1"
case "seed":
if n, err := strconv.ParseInt(value, 10, 64); err == nil {
o.seed = n
}
case "denoise":
o.denoise = value == "true" || value == "1"
}
}
return o
}
// languageNameAliases maps full language names to OmniVoice codes. OmniVoice's
// lang hint accepts "" (auto), "en", "zh" per the upstream convention; other
// codes pass through and the engine treats unknown hints as auto.
var languageNameAliases = map[string]string{
"english": "en",
"chinese": "zh",
}
// normalizeLanguage lowercases, trims, strips a region/locale suffix, and
// resolves common full names. Empty stays empty so the engine auto-detects.
func normalizeLanguage(lang string) string {
lang = strings.ToLower(strings.TrimSpace(lang))
if lang == "" {
return ""
}
if i := strings.IndexAny(lang, "-_."); i >= 0 {
lang = lang[:i]
}
if code, ok := languageNameAliases[lang]; ok {
return code
}
return lang
}

View File

@@ -0,0 +1,64 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
# This script is used in the final stage of the Dockerfile
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

52
backend/go/omnivoice-cpp/run.sh Executable file
View File

@@ -0,0 +1,52 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx2.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx2.so"
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx512.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
fi
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export OMNIVOICE_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/omnivoice-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/omnivoice-cpp "$@"

View File

@@ -0,0 +1,30 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
cd "$CURDIR"
echo "Running omnivoice-cpp backend tests..."
if [ -z "$OMNIVOICE_MODEL" ]; then
MODEL_DIR="./omnivoice-models"
mkdir -p "$MODEL_DIR"
REPO_ID="Serveurperso/OmniVoice-GGUF"
BASE_URL="https://huggingface.co/${REPO_ID}/resolve/main"
FILES=( "omnivoice-base-Q4_K_M.gguf" "omnivoice-tokenizer-Q4_K_M.gguf" )
for file in "${FILES[@]}"; do
dest="${MODEL_DIR}/${file}"
if [ -f "${dest}" ]; then
echo " [skip] ${file}"
else
echo " [download] ${file}..."
curl -L -o "${dest}" "${BASE_URL}/${file}" --progress-bar
fi
done
export OMNIVOICE_MODEL="${MODEL_DIR}/omnivoice-base-Q4_K_M.gguf"
export OMNIVOICE_CODEC="${MODEL_DIR}/omnivoice-tokenizer-Q4_K_M.gguf"
fi
go test -v -timeout 1200s .
echo "All omnivoice-cpp e2e tests passed."

View File

@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=b8012f11e5269126eddb7f4fd02f891a2ccc29b0
# Upstream pin lives below as PARAKEET_VERSION?=92a5f0306be354c109150fe58ae4cc4f8a21ca45
# (.github/bump_deps.sh) can find and update it - matches the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=b8012f11e5269126eddb7f4fd02f891a2ccc29b0
PARAKEET_VERSION?=92a5f0306be354c109150fe58ae4cc4f8a21ca45
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go

View File

@@ -3,35 +3,36 @@ project(goqwen3ttscpp LANGUAGES C CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(QWEN3TTS_DIR ${CMAKE_CURRENT_SOURCE_DIR}/sources/qwen3-tts.cpp)
set(QWENTTS_DIR ${CMAKE_CURRENT_SOURCE_DIR}/sources/qwentts.cpp)
# Override upstream's CMAKE_CUDA_ARCHITECTURES before add_subdirectory.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES "75-virtual;80-virtual;86-real;89-real")
endif()
# Build ggml from the upstream's submodule FIRST, so that ggml/ggml-base/ggml-cpu
# CMake targets exist when the upstream project references them by name.
# The upstream CMakeLists.txt uses target_link_libraries(... ggml ggml-base ggml-cpu)
# with target_link_directories pointing at a pre-built ggml/build/. By adding ggml
# as a subdirectory here, CMake resolves those names as targets instead.
add_subdirectory(${QWEN3TTS_DIR}/ggml ggml EXCLUDE_FROM_ALL)
# Add the upstream project. Its own CMakeLists adds ggml + cpp-httplib + yyjson
# and builds qwen-core (STATIC, the qt_* impl). EXCLUDE_FROM_ALL keeps its CLI
# tools / tts-server / tests from building unless referenced.
add_subdirectory(${QWENTTS_DIR} qwentts EXCLUDE_FROM_ALL)
# Now add the upstream project
add_subdirectory(${QWEN3TTS_DIR} qwen3tts EXCLUDE_FROM_ALL)
# Upstream generates version.h into its own CMAKE_CURRENT_BINARY_DIR and adds
# the top-level ${CMAKE_BINARY_DIR} to qwen-core's include path. Under
# add_subdirectory those two dirs differ (<build>/qwentts vs <build>), so
# qwen.cpp cannot find version.h. Point qwen-core at the subproject binary dir
# where version.h is actually generated. (Fix lives here, never in the fetched
# upstream checkout.)
target_include_directories(qwen-core PRIVATE ${CMAKE_BINARY_DIR}/qwentts)
add_library(goqwen3ttscpp MODULE cpp/goqwen3ttscpp.cpp)
target_link_libraries(goqwen3ttscpp PRIVATE qwen3_tts)
target_link_libraries(goqwen3ttscpp PRIVATE qwen-core)
target_include_directories(goqwen3ttscpp PRIVATE ${QWEN3TTS_DIR}/src)
target_include_directories(goqwen3ttscpp SYSTEM PRIVATE ${QWEN3TTS_DIR}/ggml/include)
target_include_directories(goqwen3ttscpp PRIVATE ${QWENTTS_DIR}/src)
target_include_directories(goqwen3ttscpp SYSTEM PRIVATE ${QWENTTS_DIR}/ggml/include)
# Link GPU backends if available
foreach(backend blas cuda metal vulkan)
# Link GPU backends if the upstream ggml created them.
foreach(backend blas cuda metal vulkan sycl)
if(TARGET ggml-${backend})
target_link_libraries(goqwen3ttscpp PRIVATE ggml-${backend})
string(TOUPPER ${backend} BACKEND_UPPER)
target_compile_definitions(goqwen3ttscpp PRIVATE QWEN3TTS_HAVE_${BACKEND_UPPER})
if(backend STREQUAL "cuda")
find_package(CUDAToolkit QUIET)
if(CUDAToolkit_FOUND)
@@ -44,12 +45,8 @@ endforeach()
if(MSVC)
target_compile_options(goqwen3ttscpp PRIVATE /W4 /wd4100 /wd4505)
else()
target_compile_options(goqwen3ttscpp PRIVATE -Wall -Wextra -Wshadow -Wconversion
-Wno-unused-parameter -Wno-unused-function -Wno-sign-conversion)
endif()
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 9.0)
target_link_libraries(goqwen3ttscpp PRIVATE stdc++fs)
target_compile_options(goqwen3ttscpp PRIVATE -Wall -Wextra
-Wno-unused-parameter -Wno-unused-function)
endif()
set_property(TARGET goqwen3ttscpp PROPERTY CXX_STANDARD 17)

View File

@@ -6,9 +6,9 @@ GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1)
# qwen3-tts.cpp version
QWEN3TTS_REPO?=https://github.com/predict-woo/qwen3-tts.cpp
QWEN3TTS_CPP_VERSION?=136e5d36c17083da0321fd96512dc7b263f94a44
# qwentts.cpp version
QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp
QWEN3TTS_CPP_VERSION?=0bf4a18b22e8bb8718d95294e9f7f45c0d4270a4
SO_TARGET?=libgoqwen3ttscpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -49,9 +49,9 @@ ifeq ($(BUILD_TYPE),sycl_f32)
-DCMAKE_CXX_COMPILER=icpx
endif
sources/qwen3-tts.cpp:
mkdir -p sources/qwen3-tts.cpp
cd sources/qwen3-tts.cpp && \
sources/qwentts.cpp:
mkdir -p sources/qwentts.cpp
cd sources/qwentts.cpp && \
git init && \
git remote add origin $(QWEN3TTS_REPO) && \
git fetch origin && \
@@ -78,7 +78,7 @@ package: qwen3-tts-cpp
build: package
clean: purge
rm -rf libgoqwen3ttscpp*.so package sources/qwen3-tts.cpp qwen3-tts-cpp
rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp
purge:
rm -rf build*
@@ -88,24 +88,24 @@ purge:
# Build all variants (Linux only)
ifeq ($(UNAME_S),Linux)
libgoqwen3ttscpp-avx.so: sources/qwen3-tts.cpp
libgoqwen3ttscpp-avx.so: sources/qwentts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:avx${RESET})
SO_TARGET=libgoqwen3ttscpp-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-avx.so
libgoqwen3ttscpp-avx2.so: sources/qwen3-tts.cpp
libgoqwen3ttscpp-avx2.so: sources/qwentts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:avx2${RESET})
SO_TARGET=libgoqwen3ttscpp-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-avx2.so
libgoqwen3ttscpp-avx512.so: sources/qwen3-tts.cpp
libgoqwen3ttscpp-avx512.so: sources/qwentts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:avx512${RESET})
SO_TARGET=libgoqwen3ttscpp-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-avx512.so
endif
# Build fallback variant (all platforms)
libgoqwen3ttscpp-fallback.so: sources/qwen3-tts.cpp
libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:fallback${RESET})
SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.so

View File

@@ -0,0 +1,128 @@
package main
import (
"bytes"
"encoding/binary"
"fmt"
"os"
"runtime"
"github.com/go-audio/audio"
"github.com/go-audio/wav"
)
const qwen3ttsSampleRate = 24000
// wavHeader24k returns a 44-byte WAV header for a streaming 24 kHz mono 16-bit
// PCM stream, with placeholder (0xFFFFFFFF) sizes since the total length is
// unknown up front. Emitted as the first chunk of TTSStream so the HTTP layer
// receives a self-describing WAV (the gRPC TTSStream path never sets Message,
// so the backend owns the header - see core/backend/tts.go:ModelTTSStream).
func wavHeader24k() []byte {
var buf bytes.Buffer
w := func(v any) { _ = binary.Write(&buf, binary.LittleEndian, v) }
buf.WriteString("RIFF")
w(uint32(0xFFFFFFFF))
buf.WriteString("WAVE")
buf.WriteString("fmt ")
w(uint32(16)) // Subchunk1Size
w(uint16(1)) // PCM
w(uint16(1)) // mono
w(uint32(qwen3ttsSampleRate)) // sample rate
w(uint32(qwen3ttsSampleRate * 2)) // byte rate = SR * blockAlign
w(uint16(2)) // block align (16-bit mono)
w(uint16(16)) // bits per sample
buf.WriteString("data")
w(uint32(0xFFFFFFFF))
return buf.Bytes()
}
// floatToPCM16LE clamps each sample to [-1,1] and encodes it as little-endian
// signed 16-bit PCM.
func floatToPCM16LE(samples []float32) []byte {
out := make([]byte, len(samples)*2)
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
v := int16(s * 32767)
out[i*2] = byte(v)
out[i*2+1] = byte(v >> 8)
}
return out
}
// writeWAV24k writes samples as a finalized 24 kHz mono 16-bit WAV at dst.
func writeWAV24k(dst string, samples []float32) error {
f, err := os.Create(dst)
if err != nil {
return fmt.Errorf("qwen3-tts: create %q: %w", dst, err)
}
enc := wav.NewEncoder(f, qwen3ttsSampleRate, 16, 1, 1)
ints := make([]int, len(samples))
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
ints[i] = int(s * 32767)
}
b := &audio.IntBuffer{
Format: &audio.Format{NumChannels: 1, SampleRate: qwen3ttsSampleRate},
Data: ints,
SourceBitDepth: 16,
}
if err := enc.Write(b); err != nil {
_ = enc.Close()
_ = f.Close()
return fmt.Errorf("qwen3-tts: encode WAV: %w", err)
}
if err := enc.Close(); err != nil {
_ = f.Close()
return fmt.Errorf("qwen3-tts: finalize WAV: %w", err)
}
return f.Close()
}
// readWAVAsFloat decodes a WAV file (any sample rate/channels) to a mono
// float32 slice in [-1,1] for use as cloning reference audio. qwentts expects
// 24 kHz; callers should supply 24 kHz reference clips.
func readWAVAsFloat(path string) ([]float32, error) {
f, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("qwen3-tts: open ref %q: %w", path, err)
}
defer func() { _ = f.Close() }()
dec := wav.NewDecoder(f)
buf, err := dec.FullPCMBuffer()
if err != nil {
return nil, fmt.Errorf("qwen3-tts: decode ref %q: %w", path, err)
}
ch := int(buf.Format.NumChannels)
if ch < 1 {
ch = 1
}
bitDepth := int(buf.SourceBitDepth)
if bitDepth == 0 {
bitDepth = 16
}
scale := float32(int64(1) << uint(bitDepth-1))
n := len(buf.Data) / ch
out := make([]float32, n)
for i := 0; i < n; i++ {
var acc int
for c := 0; c < ch; c++ {
acc += buf.Data[i*ch+c]
}
out[i] = float32(acc) / float32(ch) / scale
}
return out, nil
}
// runtimeKeepAlive prevents the GC from reclaiming the reference-audio slice
// while its backing pointer is in use across the C call.
func runtimeKeepAlive(v any) { runtime.KeepAlive(v) }

View File

@@ -0,0 +1,54 @@
package main
import (
"path/filepath"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// These specs pin the voice-selection logic in resolveRequest, in particular
// the config-level audio_path (tts.audio_path -> ModelOptions.AudioPath) being
// used as the default voice-cloning reference. No model/C library is needed:
// resolveRequest only reads the reference WAV via readWAVAsFloat (pure Go).
var _ = Describe("resolveRequest voice/clone selection", func() {
var dir, refWav string
BeforeEach(func() {
dir = GinkgoT().TempDir()
refWav = filepath.Join(dir, "ref.wav")
// 0.5s of non-silent 24kHz mono audio as a clone reference.
samples := make([]float32, qwen3ttsSampleRate/2)
for i := range samples {
samples[i] = 0.1
}
Expect(writeWAV24k(refWav, samples)).To(Succeed())
})
It("uses the config audio_path as the clone reference when Voice is empty", func() {
q := &Qwen3TtsCpp{audioPath: refWav}
_, _, speaker, _, ref, _, err := q.resolveRequest(&pb.TTSRequest{Text: "hi"})
Expect(err).ToNot(HaveOccurred())
Expect(speaker).To(BeEmpty())
Expect(len(ref)).To(Equal(qwen3ttsSampleRate / 2))
})
It("lets a per-request audio Voice override audio_path", func() {
other := filepath.Join(dir, "other.wav")
Expect(writeWAV24k(other, make([]float32, 100))).To(Succeed())
q := &Qwen3TtsCpp{audioPath: refWav}
_, _, speaker, _, ref, _, err := q.resolveRequest(&pb.TTSRequest{Text: "hi", Voice: other})
Expect(err).ToNot(HaveOccurred())
Expect(speaker).To(BeEmpty())
Expect(len(ref)).To(Equal(100))
})
It("does not trigger audio_path cloning for a named-speaker Voice", func() {
q := &Qwen3TtsCpp{audioPath: refWav}
_, _, speaker, _, ref, _, err := q.resolveRequest(&pb.TTSRequest{Text: "hi", Voice: "serena"})
Expect(err).ToNot(HaveOccurred())
Expect(speaker).To(Equal("serena"))
Expect(ref).To(BeNil())
})
})

View File

@@ -1,161 +1,191 @@
#include "goqwen3ttscpp.h"
#include "ggml-backend.h"
#include "qwen3_tts.h"
#include "qwen.h"
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
using namespace qwen3_tts;
static qt_context *g_ctx = nullptr;
// Global engine (loaded once, reused across requests)
static Qwen3TTS *g_engine = nullptr;
static bool g_loaded = false;
static int g_threads = 4;
static void ggml_log_cb(enum ggml_log_level level, const char *log, void *data) {
const char *level_str;
static void ggml_log_cb(enum ggml_log_level level, const char *log,
void * /*data*/) {
if (!log)
return;
const char *lvl = "?????";
switch (level) {
case GGML_LOG_LEVEL_DEBUG:
level_str = "DEBUG";
break;
case GGML_LOG_LEVEL_INFO:
level_str = "INFO";
break;
case GGML_LOG_LEVEL_WARN:
level_str = "WARN";
break;
case GGML_LOG_LEVEL_ERROR:
level_str = "ERROR";
break;
default:
level_str = "?????";
break;
case GGML_LOG_LEVEL_DEBUG: lvl = "DEBUG"; break;
case GGML_LOG_LEVEL_INFO: lvl = "INFO"; break;
case GGML_LOG_LEVEL_WARN: lvl = "WARN"; break;
case GGML_LOG_LEVEL_ERROR: lvl = "ERROR"; break;
default: break;
}
fprintf(stderr, "[%-5s] ", level_str);
fputs(log, stderr);
fprintf(stderr, "[%-5s] %s", lvl, log);
fflush(stderr);
}
// Map language string to language_id token used by the model
static int language_to_id(const char *lang) {
if (!lang || lang[0] == '\0')
return 2050; // default: English
std::string l(lang);
if (l == "en")
return 2050;
if (l == "ru")
return 2069;
if (l == "zh")
return 2055;
if (l == "ja")
return 2058;
if (l == "ko")
return 2064;
if (l == "de")
return 2053;
if (l == "fr")
return 2061;
if (l == "es")
return 2054;
if (l == "it")
return 2056;
if (l == "pt")
return 2057;
fprintf(stderr, "[qwen3-tts-cpp] Unknown language '%s', defaulting to English\n",
lang);
return 2050;
}
int load_model(const char *model_dir, int n_threads) {
int qt3_load(const char *talker_path, const char *codec_path, int use_fa,
int clamp_fp16) {
ggml_log_set(ggml_log_cb, nullptr);
ggml_backend_load_all();
if (n_threads <= 0)
n_threads = 4;
g_threads = n_threads;
fprintf(stderr, "[qwen3-tts-cpp] Loading models from %s (threads=%d)\n",
model_dir, n_threads);
g_engine = new Qwen3TTS();
if (!g_engine->load_models(model_dir)) {
fprintf(stderr, "[qwen3-tts-cpp] FATAL: failed to load models from %s\n",
model_dir);
delete g_engine;
g_engine = nullptr;
if (!talker_path || talker_path[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: talker_path is required\n");
return 1;
}
g_loaded = true;
fprintf(stderr, "[qwen3-tts-cpp] Models loaded successfully\n");
return 0;
}
int synthesize(const char *text, const char *ref_audio_path, const char *dst,
const char *language, float temperature, float top_p,
int top_k, float repetition_penalty, int max_audio_tokens,
int n_threads) {
if (!g_loaded || !g_engine) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: models not loaded\n");
return 1;
}
if (!text || !dst) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: text and dst are required\n");
if (!codec_path || codec_path[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: codec_path is required\n");
return 2;
}
tts_params params;
params.max_audio_tokens = max_audio_tokens > 0 ? max_audio_tokens : 4096;
params.temperature = temperature;
params.top_p = top_p;
params.top_k = top_k;
params.repetition_penalty = repetition_penalty;
params.n_threads = n_threads > 0 ? n_threads : g_threads;
params.language_id = language_to_id(language);
qt_init_params p;
qt_init_default_params(&p);
p.talker_path = talker_path;
p.codec_path = codec_path;
p.use_fa = use_fa != 0;
p.clamp_fp16 = clamp_fp16 != 0;
fprintf(stderr, "[qwen3-tts-cpp] Synthesizing: text='%.50s%s', lang_id=%d, "
"temp=%.2f, threads=%d\n",
text, (strlen(text) > 50 ? "..." : ""), params.language_id,
temperature, params.n_threads);
fprintf(stderr, "[qwen3-tts-cpp] Loading talker=%s codec=%s\n", talker_path,
codec_path);
tts_result result;
bool has_ref = ref_audio_path && ref_audio_path[0] != '\0';
if (has_ref) {
fprintf(stderr, "[qwen3-tts-cpp] Voice cloning with ref: %s\n",
ref_audio_path);
result = g_engine->synthesize_with_voice(text, ref_audio_path, params);
} else {
result = g_engine->synthesize(text, params);
}
if (!result.success) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: synthesis failed: %s\n",
result.error_msg.c_str());
g_ctx = qt_init(&p);
if (!g_ctx) {
fprintf(stderr, "[qwen3-tts-cpp] FATAL: qt_init failed: %s\n",
qt_last_error());
return 3;
}
int n_samples = (int)result.audio.size();
if (n_samples == 0) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: synthesis produced no samples\n");
return 4;
}
fprintf(stderr,
"[qwen3-tts-cpp] Synthesis done: %d samples (%.2fs @ 24kHz)\n",
n_samples, (float)n_samples / 24000.0f);
if (!save_audio_file(dst, result.audio, result.sample_rate)) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: failed to write %s\n", dst);
return 5;
}
fprintf(stderr, "[qwen3-tts-cpp] Wrote %s\n", dst);
fprintf(stderr, "[qwen3-tts-cpp] Model loaded (%s)\n", qt_version());
return 0;
}
// Fill a qt_tts_params from the flat wrapper arguments. Unset/zero scalars keep
// the qt defaults (temperature 0.9, top_k 50, top_p 1.0, rep 1.05, max 2048).
static void fill_params(qt_tts_params *tp, const char *text, const char *lang,
const char *instruct, const char *speaker,
const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens) {
qt_tts_default_params(tp);
tp->text = text ? text : "";
if (lang && lang[0] != '\0')
tp->lang = lang; // else keep default NULL -> auto
if (instruct && instruct[0] != '\0')
tp->instruct = instruct;
if (speaker && speaker[0] != '\0')
tp->speaker = speaker;
if (ref_samples && ref_n > 0) {
tp->ref_audio_24k = ref_samples;
tp->ref_n_samples = ref_n;
if (ref_text && ref_text[0] != '\0')
tp->ref_text = ref_text;
}
if (seed >= 0)
tp->seed = (int64_t)seed; // else default -1 (random)
if (temperature > 0.0f)
tp->temperature = temperature;
if (top_k > 0)
tp->top_k = top_k;
if (top_p > 0.0f)
tp->top_p = top_p;
if (repetition_penalty > 0.0f)
tp->repetition_penalty = repetition_penalty;
if (max_new_tokens > 0)
tp->max_new_tokens = max_new_tokens;
}
float *qt3_tts(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, int *out_n) {
if (out_n)
*out_n = 0;
if (!g_ctx) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: model not loaded\n");
return nullptr;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: text is required\n");
return nullptr;
}
qt_tts_params tp;
fill_params(&tp, text, lang, instruct, speaker, ref_samples, ref_n,
ref_text, seed, temperature, top_k, top_p, repetition_penalty,
max_new_tokens);
qt_audio out = {0};
enum qt_status rc = qt_synthesize(g_ctx, &tp, &out);
if (rc != QT_STATUS_OK || out.n_samples <= 0 || !out.samples) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: synthesize failed (rc=%d): %s\n",
(int)rc, qt_last_error());
qt_audio_free(&out);
return nullptr;
}
// Copy into a plain malloc buffer the Go side frees via qt3_pcm_free.
size_t bytes = (size_t)out.n_samples * sizeof(float);
float *buf = (float *)malloc(bytes);
if (!buf) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: malloc(%zu) failed\n", bytes);
qt_audio_free(&out);
return nullptr;
}
memcpy(buf, out.samples, bytes);
if (out_n)
*out_n = out.n_samples;
qt_audio_free(&out);
return buf;
}
int qt3_tts_stream(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, qt3_chunk_cb cb, void *user_data) {
if (!g_ctx) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: model not loaded\n");
return 1;
}
if (!cb) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: stream callback is null\n");
return 2;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: text is required\n");
return 4;
}
qt_tts_params tp;
fill_params(&tp, text, lang, instruct, speaker, ref_samples, ref_n,
ref_text, seed, temperature, top_k, top_p, repetition_penalty,
max_new_tokens);
// qt_audio_chunk_cb has the identical signature to qt3_chunk_cb
// (bool vs int return are ABI-compatible; non-zero == true).
tp.on_chunk = (qt_audio_chunk_cb)cb;
tp.on_chunk_user_data = user_data;
qt_audio out = {0}; // stays empty in streaming mode
enum qt_status rc = qt_synthesize(g_ctx, &tp, &out);
qt_audio_free(&out);
if (rc != QT_STATUS_OK && rc != QT_STATUS_CANCELLED) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: stream synth failed (rc=%d): %s\n",
(int)rc, qt_last_error());
return 3;
}
return 0;
}
void qt3_pcm_free(float *p) { free(p); }
void qt3_unload(void) {
if (g_ctx) {
qt_free(g_ctx);
g_ctx = nullptr;
}
}
int qt3_n_speakers(void) { return g_ctx ? qt_n_speakers(g_ctx) : 0; }
const char *qt3_speaker_name(int i) {
return g_ctx ? qt_speaker_name(g_ctx, i) : nullptr;
}

View File

@@ -1,12 +1,47 @@
#pragma once
#include <cstddef>
#include <cstdint>
extern "C" {
int load_model(const char *model_dir, int n_threads);
int synthesize(const char *text, const char *ref_audio_path, const char *dst,
const char *language, float temperature, float top_p,
int top_k, float repetition_penalty, int max_audio_tokens,
int n_threads);
// Streaming PCM chunk callback. samples is mono float PCM at 24 kHz, valid
// only for the duration of the call. Return non-zero to continue, 0 to abort.
typedef int (*qt3_chunk_cb)(const float *samples, int n_samples,
void *user_data);
// Load the talker + codec/tokenizer GGUFs. use_fa / clamp_fp16 map to
// qt_init_params (the qt ABI exposes no thread count; ggml uses its own
// default). Returns 0 on success, non-zero on failure.
int qt3_load(const char *talker_path, const char *codec_path, int use_fa,
int clamp_fp16);
// Synthesize to a malloc'd float PCM buffer (caller frees via qt3_pcm_free).
// The synthesis mode (base / custom_voice / voice_design) is auto-detected by
// qt from the talker GGUF; speaker is honoured only for custom_voice, instruct
// for voice_design / custom_voice, and ref_samples (+ optional ref_text) drive
// base-mode cloning. qt enforces the rules and we surface qt_last_error() on
// QT_STATUS_MODE_INVALID. Writes the sample count to *out_n. Returns NULL on
// failure (out_n set to 0).
float *qt3_tts(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, int *out_n);
// Streaming synthesis: cb is invoked per PCM chunk as audio is produced. Same
// param semantics as qt3_tts. Returns 0 on success.
int qt3_tts_stream(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, qt3_chunk_cb cb, void *user_data);
// Free a buffer returned by qt3_tts.
void qt3_pcm_free(float *p);
// Release the qt context.
void qt3_unload(void);
// Named-speaker introspection (custom_voice models). Returns 0 / NULL when no
// model is loaded or the index is out of range.
int qt3_n_speakers(void);
const char *qt3_speaker_name(int i);
}

View File

@@ -0,0 +1,95 @@
package main
import (
"math"
"os"
"strings"
"github.com/ebitengine/purego"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func ttsReq(text, voice string, lang *string, dst string) *pb.TTSRequest {
return &pb.TTSRequest{Text: text, Voice: voice, Language: lang, Dst: dst}
}
var _ = Describe("qwen3-tts-cpp e2e", Label("e2e"), func() {
var loaded bool
BeforeEach(func() {
modelPath := os.Getenv("QWEN3TTS_MODEL")
codecPath := os.Getenv("QWEN3TTS_CODEC")
if modelPath == "" || codecPath == "" {
Skip("QWEN3TTS_MODEL / QWEN3TTS_CODEC not set; skipping e2e")
}
if !loaded {
lib := os.Getenv("QWEN3TTS_LIBRARY")
if lib == "" {
lib = "./libgoqwen3ttscpp-fallback.so"
}
h, err := purego.Dlopen(lib, purego.RTLD_NOW|purego.RTLD_GLOBAL)
Expect(err).ToNot(HaveOccurred())
purego.RegisterLibFunc(&CppLoad, h, "qt3_load")
purego.RegisterLibFunc(&CppTTS, h, "qt3_tts")
purego.RegisterLibFunc(&CppTTSStream, h, "qt3_tts_stream")
purego.RegisterLibFunc(&CppPCMFree, h, "qt3_pcm_free")
purego.RegisterLibFunc(&CppUnload, h, "qt3_unload")
Expect(CppLoad(modelPath, codecPath, 1, 0)).To(Equal(0))
loaded = true
}
})
It("synthesizes a WAV file via TTS", func() {
b := &Qwen3TtsCpp{opts: loadOptions{seed: 42, useFA: true}}
dst := GinkgoT().TempDir() + "/out.wav"
lang := "english"
err := b.TTS(ttsReq("Hello world.", "", &lang, dst))
Expect(err).ToNot(HaveOccurred())
fi, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(fi.Size()).To(BeNumerically(">", int64(44)))
})
It("streams audio chunks via TTSStream", func() {
b := &Qwen3TtsCpp{opts: loadOptions{seed: 42, useFA: true}}
results := make(chan []byte, 1024)
lang := "english"
done := make(chan error, 1)
go func() { done <- b.TTSStream(ttsReq("Hello there, streaming test.", "", &lang, ""), results) }()
var chunks int
var first []byte
for c := range results {
if chunks == 0 {
first = c
}
chunks++
}
Expect(<-done).ToNot(HaveOccurred())
Expect(chunks).To(BeNumerically(">=", 2))
Expect(string(first[0:4])).To(Equal("RIFF"))
Expect(strings.HasPrefix(string(first[8:12]), "WAVE")).To(BeTrue())
})
It("clones a voice from the config audio_path reference", func() {
// 1s of 24kHz mono audio as a clone reference; the base model carries
// a speaker encoder, so audio_path drives x-vector voice cloning.
ref := GinkgoT().TempDir() + "/ref.wav"
samples := make([]float32, qwen3ttsSampleRate)
for i := range samples {
samples[i] = float32(0.05 * math.Sin(float64(i)*0.06))
}
Expect(writeWAV24k(ref, samples)).To(Succeed())
b := &Qwen3TtsCpp{opts: loadOptions{seed: 42, useFA: true}, audioPath: ref}
dst := GinkgoT().TempDir() + "/clone.wav"
lang := "english"
// Empty Voice -> the config audio_path is used as the clone reference.
Expect(b.TTS(ttsReq("Cloned voice test.", "", &lang, dst))).To(Succeed())
fi, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(fi.Size()).To(BeNumerically(">", int64(44)))
})
})

View File

@@ -5,108 +5,225 @@ import (
"os"
"path/filepath"
"strings"
"sync"
"unsafe"
"github.com/ebitengine/purego"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
var (
CppLoadModel func(modelDir string, nThreads int) int
CppSynthesize func(text, refAudioPath, dst, language string,
temperature, topP float32, topK int,
repetitionPenalty float32, maxAudioTokens, nThreads int) int
// qt3_load(talker_path, codec_path, use_fa, clamp_fp16) int
CppLoad func(talkerPath, codecPath string, useFA, clampFP16 int) int
// qt3_tts(text, lang, instruct, speaker, ref_samples, ref_n, ref_text,
// seed, temperature, top_k, top_p, rep_pen, max_new, out_n) -> float*
CppTTS func(text, lang, instruct, speaker string, refSamples unsafe.Pointer,
refN int, refText string, seed int64, temperature float32, topK int,
topP, repPen float32, maxNew int, outN unsafe.Pointer) uintptr
// qt3_tts_stream(..., cb, user) int
CppTTSStream func(text, lang, instruct, speaker string, refSamples unsafe.Pointer,
refN int, refText string, seed int64, temperature float32, topK int,
topP, repPen float32, maxNew int, cb uintptr, user uintptr) int
CppPCMFree func(ptr uintptr)
CppUnload func()
)
type Qwen3TtsCpp struct {
base.SingleThread
threads int
}
// languageNameAliases maps common full language names to the canonical
// two-letter code understood by the C++ language_to_id table.
var languageNameAliases = map[string]string{
"english": "en",
"russian": "ru",
"chinese": "zh",
"japanese": "ja",
"korean": "ko",
"german": "de",
"french": "fr",
"spanish": "es",
"italian": "it",
"portuguese": "pt",
}
// normalizeLanguage coerces a caller-supplied language into the canonical code
// the model expects. It lowercases, trims, strips any region/locale suffix
// (en-US, en_US, ja.JP -> en/ja), and resolves common full names (english -> en).
// An empty input stays empty so the C++ side applies its English default; an
// unrecognized value is returned normalized so C++ can log it and default.
func normalizeLanguage(lang string) string {
lang = strings.ToLower(strings.TrimSpace(lang))
if lang == "" {
return ""
}
// Strip region/locale suffix: keep the segment before the first separator.
if i := strings.IndexAny(lang, "-_."); i >= 0 {
lang = lang[:i]
}
if code, ok := languageNameAliases[lang]; ok {
return code
}
return lang
opts loadOptions
// audioPath is the model-config reference voice (tts.audio_path), the
// default clone reference when a request omits an audio Voice.
audioPath string
}
func (q *Qwen3TtsCpp) Load(opts *pb.ModelOptions) error {
// ModelFile is the model directory path (containing GGUF files)
modelDir := opts.ModelFile
if modelDir == "" {
modelDir = opts.ModelPath
model := opts.ModelFile
if model == "" {
model = opts.ModelPath
}
if !filepath.IsAbs(model) && opts.ModelPath != "" {
model = filepath.Join(opts.ModelPath, model)
}
// Resolve relative paths
if !filepath.IsAbs(modelDir) && opts.ModelPath != "" {
modelDir = filepath.Join(opts.ModelPath, modelDir)
q.opts = parseOptions(opts.Options)
// Resolve the codec/tokenizer GGUF: explicit option, else auto-discover a
// *tokenizer*.gguf sibling of the talker model.
codec := q.opts.codecPath
if codec != "" && !filepath.IsAbs(codec) {
codec = filepath.Join(filepath.Dir(model), codec)
}
if codec == "" {
codec = discoverTokenizer(filepath.Dir(model))
}
if codec == "" {
return fmt.Errorf("qwen3-tts: no codec/tokenizer GGUF found; set option 'tokenizer:<file>'")
}
q.opts.codecPath = codec
q.audioPath = opts.AudioPath
if q.audioPath != "" && !filepath.IsAbs(q.audioPath) {
q.audioPath = filepath.Join(filepath.Dir(model), q.audioPath)
}
threads := int(opts.Threads)
if threads <= 0 {
threads = 4
useFA := boolToInt(q.opts.useFA)
clamp := boolToInt(q.opts.clampFP16)
fmt.Fprintf(os.Stderr, "[qwen3-tts-cpp] Load talker=%s codec=%s use_fa=%d clamp_fp16=%d\n",
model, codec, useFA, clamp)
if rc := CppLoad(model, codec, useFA, clamp); rc != 0 {
return fmt.Errorf("qwen3-tts: failed to load model (rc=%d)", rc)
}
q.threads = threads
fmt.Fprintf(os.Stderr, "[qwen3-tts-cpp] Loading models from: %s (threads=%d)\n", modelDir, threads)
if ret := CppLoadModel(modelDir, threads); ret != 0 {
return fmt.Errorf("failed to load qwen3-tts model (error code: %d)", ret)
}
return nil
}
// discoverTokenizer returns the first *tokenizer*.gguf in dir, or "".
func discoverTokenizer(dir string) string {
entries, err := os.ReadDir(dir)
if err != nil {
return ""
}
for _, e := range entries {
name := strings.ToLower(e.Name())
if strings.Contains(name, "tokenizer") && strings.HasSuffix(name, ".gguf") {
return filepath.Join(dir, e.Name())
}
}
return ""
}
func boolToInt(b bool) int {
if b {
return 1
}
return 0
}
func optStr(p *string) string {
if p == nil {
return ""
}
return *p
}
// resolveRequest derives the synthesis inputs from a TTSRequest:
// language, instruct, speaker, ref-audio samples, ref-text and sampling.
func (q *Qwen3TtsCpp) resolveRequest(req *pb.TTSRequest) (lang, instruct, speaker, refText string, ref []float32, s sampling, err error) {
lang = normalizeLanguage(optStr(req.Language))
instruct = optStr(req.Instructions)
var refPath string
speaker, refPath = resolveVoice(req.Voice)
if refPath == "" && speaker == "" && q.audioPath != "" {
// No per-request voice: fall back to the config clone reference.
refPath = q.audioPath
}
if refPath != "" {
ref, err = readWAVAsFloat(refPath)
if err != nil {
return
}
}
if req.Params != nil {
refText = req.Params["ref_text"]
}
s = parseSampling(req.Params, q.opts.seed)
return
}
func (q *Qwen3TtsCpp) TTS(req *pb.TTSRequest) error {
text := req.Text
voice := req.Voice // reference audio path for voice cloning (empty = no cloning)
dst := req.Dst
language := ""
if req.Language != nil {
language = normalizeLanguage(*req.Language)
if req.Dst == "" {
return fmt.Errorf("qwen3-tts: TTS requires a destination path")
}
if req.Text == "" {
return fmt.Errorf("qwen3-tts: TTS requires text")
}
lang, instruct, speaker, refText, ref, s, err := q.resolveRequest(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
}
// Synthesis parameters with sensible defaults
temperature := float32(0.9)
topP := float32(0.8)
topK := 50
repetitionPenalty := float32(1.05)
maxAudioTokens := 4096
var n int32
ptr := CppTTS(req.Text, lang, instruct, speaker, refPtr, len(ref), refText,
s.seed, s.temperature, s.topK, s.topP, s.repPen, s.maxNew, unsafe.Pointer(&n))
runtimeKeepAlive(ref)
if ptr == 0 {
return fmt.Errorf("qwen3-tts: synthesis failed")
}
// Register the free as soon as we own a non-null buffer, so the n<=0 guard
// below cannot leak it (defensive: the C contract returns NULL on failure).
defer CppPCMFree(ptr)
if n <= 0 {
return fmt.Errorf("qwen3-tts: synthesis produced no samples")
}
src := unsafe.Slice((*float32)(unsafe.Pointer(ptr)), int(n)) //nolint:govet // C-allocated PCM, copied out before free
out := make([]float32, int(n))
copy(out, src)
return writeWAV24k(req.Dst, out)
}
if ret := CppSynthesize(text, voice, dst, language,
temperature, topP, topK, repetitionPenalty,
maxAudioTokens, q.threads); ret != 0 {
return fmt.Errorf("failed to synthesize audio (error code: %d)", ret)
// streamState carries the active TTSStream channel to the single shared C
// callback. base.SingleThread serializes TTS/TTSStream, so one global slot is
// safe and avoids leaking a purego callback per request (purego callbacks
// cannot be freed and are capped).
var (
streamMu sync.Mutex
streamChan chan []byte
streamCbOnce sync.Once
streamCbPtr uintptr
)
// streamCallback is registered once and forwards each PCM chunk to streamChan.
func streamCallback(samples *float32, nSamples int32, _ uintptr) uintptr {
if nSamples <= 0 || samples == nil || streamChan == nil {
return 1 // continue
}
src := unsafe.Slice(samples, int(nSamples))
cp := make([]float32, int(nSamples)) // copy out of C memory before returning
copy(cp, src)
streamChan <- floatToPCM16LE(cp)
return 1 // continue
}
func (q *Qwen3TtsCpp) TTSStream(req *pb.TTSRequest, results chan []byte) error {
defer close(results)
if req.Text == "" {
return fmt.Errorf("qwen3-tts: TTSStream requires text")
}
streamCbOnce.Do(func() {
streamCbPtr = purego.NewCallback(streamCallback)
})
lang, instruct, speaker, refText, ref, s, err := q.resolveRequest(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
}
// Emit the WAV header first so the HTTP layer gets a self-describing stream.
results <- wavHeader24k()
streamMu.Lock()
streamChan = results
rc := CppTTSStream(req.Text, lang, instruct, speaker, refPtr, len(ref), refText,
s.seed, s.temperature, s.topK, s.topP, s.repPen, s.maxNew, streamCbPtr, 0)
streamChan = nil
streamMu.Unlock()
runtimeKeepAlive(ref)
if rc != 0 {
return fmt.Errorf("qwen3-tts: streaming synthesis failed (rc=%d)", rc)
}
return nil
}

View File

@@ -1,53 +0,0 @@
package main
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestLanguageNormalization(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "qwen3-tts-cpp language normalization")
}
var _ = Describe("normalizeLanguage", func() {
DescribeTable("maps caller input to the canonical model language code",
func(input, expected string) {
Expect(normalizeLanguage(input)).To(Equal(expected))
},
// Canonical codes pass through unchanged
Entry("canonical en", "en", "en"),
Entry("canonical zh", "zh", "zh"),
Entry("canonical pt", "pt", "pt"),
// Case-insensitive
Entry("uppercase", "EN", "en"),
Entry("mixed case", "Ja", "ja"),
// Surrounding whitespace
Entry("trims whitespace", " en ", "en"),
// Region/locale stripping
Entry("BCP-47 region", "en-US", "en"),
Entry("underscore region", "en_US", "en"),
Entry("dotted locale", "ja.JP", "ja"),
Entry("region + case", "ZH-CN", "zh"),
// Full-name aliases
Entry("english name", "english", "en"),
Entry("chinese name cased", "Chinese", "zh"),
Entry("japanese name", "japanese", "ja"),
Entry("russian name", "russian", "ru"),
Entry("portuguese name", "portuguese", "pt"),
// Empty stays empty (C++ applies the English default)
Entry("empty", "", ""),
Entry("whitespace only", " ", ""),
// Unknown values pass through normalized so C++ can log + default
Entry("unknown code", "klingon", "klingon"),
Entry("unknown with region", "xx-YY", "xx"),
)
})

View File

@@ -19,24 +19,25 @@ type LibFuncs struct {
}
func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("QWEN3TTS_LIBRARY")
if libName == "" {
libName = "./libgoqwen3ttscpp-fallback.so"
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CppLoadModel, "load_model"},
{&CppSynthesize, "synthesize"},
{&CppLoad, "qt3_load"},
{&CppTTS, "qt3_tts"},
{&CppTTSStream, "qt3_tts_stream"},
{&CppPCMFree, "qt3_pcm_free"},
{&CppUnload, "qt3_unload"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, gosd, lf.Name)
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
flag.Parse()

View File

@@ -0,0 +1,161 @@
package main
import (
"strconv"
"strings"
)
// loadOptions holds the parsed model-level options.
type loadOptions struct {
codecPath string
useFA bool
clampFP16 bool
seed int64
}
// sampling holds per-request generation parameters with qt defaults applied.
type sampling struct {
temperature float32
topK int
topP float32
repPen float32
maxNew int
seed int64
}
func splitOption(o string) (key, value string, ok bool) {
i := strings.Index(o, ":")
if i < 0 {
return "", "", false
}
return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}
func parseBool(v string) bool { return v == "true" || v == "1" }
// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. Defaults: use_fa true (qt default; CPU still uses the F32 chain),
// seed -1 (engine random).
func parseOptions(opts []string) loadOptions {
o := loadOptions{useFA: true, seed: -1}
for _, oo := range opts {
key, value, ok := splitOption(oo)
if !ok {
continue
}
switch key {
case "tokenizer", "codec":
o.codecPath = value
case "use_fa":
o.useFA = parseBool(value)
case "clamp_fp16":
o.clampFP16 = parseBool(value)
case "seed":
if n, err := strconv.ParseInt(value, 10, 64); err == nil {
o.seed = n
}
}
}
return o
}
// languageAliases maps codes / locales / full names to the upstream qwentts
// language names. "auto" (and empty) map to "" so the engine auto-detects.
var languageAliases = map[string]string{
"en": "english", "english": "english",
"zh": "chinese", "chinese": "chinese", "mandarin": "chinese",
"ja": "japanese", "japanese": "japanese",
"ko": "korean", "korean": "korean",
"de": "german", "german": "german",
"fr": "french", "french": "french",
"es": "spanish", "spanish": "spanish",
"it": "italian", "italian": "italian",
"pt": "portuguese", "portuguese": "portuguese",
"ru": "russian", "russian": "russian",
"auto": "",
}
// normalizeLanguage lowercases, trims, strips a region/locale suffix
// (en-US -> en), and resolves to the qwentts language name. Empty stays empty
// (engine auto-detects); an unknown value passes through normalized.
func normalizeLanguage(lang string) string {
lang = strings.ToLower(strings.TrimSpace(lang))
if lang == "" {
return ""
}
if i := strings.IndexAny(lang, "-_."); i >= 0 {
lang = lang[:i]
}
if v, ok := languageAliases[lang]; ok {
return v
}
return lang
}
var refAudioExts = []string{".wav", ".flac", ".mp3", ".ogg", ".m4a"}
// resolveVoice interprets the request Voice field: a value ending in a known
// audio extension is a clone-reference path; anything else is a named speaker
// (custom_voice). Empty input yields no speaker and no reference.
func resolveVoice(voice string) (speaker, refPath string) {
v := strings.TrimSpace(voice)
if v == "" {
return "", ""
}
lower := strings.ToLower(v)
for _, ext := range refAudioExts {
if strings.HasSuffix(lower, ext) {
return "", v
}
}
return v, ""
}
func parseFloat32(v string, def float32) float32 {
if v == "" {
return def
}
f, err := strconv.ParseFloat(v, 32)
if err != nil {
return def
}
return float32(f)
}
func parseInt(v string, def int) int {
if v == "" {
return def
}
n, err := strconv.Atoi(v)
if err != nil {
return def
}
return n
}
func parseInt64(v string, def int64) int64 {
if v == "" {
return def
}
n, err := strconv.ParseInt(v, 10, 64)
if err != nil {
return def
}
return n
}
// parseSampling reads per-request sampling params from the TTSRequest params
// map, applying qt defaults (matching qt_tts_default_params).
func parseSampling(params map[string]string, defaultSeed int64) sampling {
s := sampling{temperature: 0.9, topK: 50, topP: 1.0, repPen: 1.05, maxNew: 2048, seed: defaultSeed}
if params == nil {
return s
}
s.temperature = parseFloat32(params["temperature"], s.temperature)
s.topK = parseInt(params["top_k"], s.topK)
s.topP = parseFloat32(params["top_p"], s.topP)
s.repPen = parseFloat32(params["repetition_penalty"], s.repPen)
s.maxNew = parseInt(params["max_new_tokens"], s.maxNew)
s.seed = parseInt64(params["seed"], s.seed)
return s
}

View File

@@ -1,173 +1,136 @@
package main
import (
"context"
"os"
"os/exec"
"path/filepath"
"bytes"
"encoding/binary"
"testing"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
const (
testAddr = "localhost:50051"
startupWait = 5 * time.Second
)
func skipIfNoModel(t *testing.T) string {
t.Helper()
modelDir := os.Getenv("QWEN3TTS_MODEL_DIR")
if modelDir == "" {
t.Skip("QWEN3TTS_MODEL_DIR not set, skipping test (set to directory with GGUF models)")
}
if _, err := os.Stat(filepath.Join(modelDir, "qwen3-tts-0.6b-f16.gguf")); os.IsNotExist(err) {
t.Skipf("TTS model file not found in %s, skipping", modelDir)
}
if _, err := os.Stat(filepath.Join(modelDir, "qwen3-tts-tokenizer-f16.gguf")); os.IsNotExist(err) {
t.Skipf("Tokenizer model file not found in %s, skipping", modelDir)
}
return modelDir
func TestQwen3TtsCpp(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "qwen3-tts-cpp suite")
}
func startServer(t *testing.T) *exec.Cmd {
t.Helper()
binary := os.Getenv("QWEN3TTS_BINARY")
if binary == "" {
binary = "./qwen3-tts-cpp"
}
if _, err := os.Stat(binary); os.IsNotExist(err) {
t.Skipf("Backend binary not found at %s, skipping", binary)
}
cmd := exec.Command(binary, "--addr", testAddr)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
if err := cmd.Start(); err != nil {
t.Fatalf("Failed to start server: %v", err)
}
time.Sleep(startupWait)
return cmd
}
func stopServer(cmd *exec.Cmd) {
if cmd != nil && cmd.Process != nil {
cmd.Process.Kill()
cmd.Wait()
}
}
func dialGRPC(t *testing.T) *grpc.ClientConn {
t.Helper()
conn, err := grpc.Dial(testAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(
grpc.MaxCallRecvMsgSize(50*1024*1024),
grpc.MaxCallSendMsgSize(50*1024*1024),
),
var _ = Describe("normalizeLanguage", func() {
DescribeTable("maps caller language to qwentts language names",
func(in, want string) {
Expect(normalizeLanguage(in)).To(Equal(want))
},
Entry("empty stays empty", "", ""),
Entry("auto maps to empty", "auto", ""),
Entry("english full name", "English", "english"),
Entry("english code", "en", "english"),
Entry("locale suffix stripped", "en-US", "english"),
Entry("underscore locale", "zh_CN", "chinese"),
Entry("mandarin alias", "mandarin", "chinese"),
Entry("japanese already full", "japanese", "japanese"),
Entry("unknown passes through normalized", "xx", "xx"),
)
if err != nil {
t.Fatalf("Failed to dial gRPC: %v", err)
}
return conn
}
})
func TestServerHealth(t *testing.T) {
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
resp, err := client.Health(context.Background(), &pb.HealthMessage{})
if err != nil {
t.Fatalf("Health check failed: %v", err)
}
if string(resp.Message) != "OK" {
t.Fatalf("Expected OK, got %s", string(resp.Message))
}
}
func TestLoadModel(t *testing.T) {
modelDir := skipIfNoModel(t)
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
resp, err := client.LoadModel(context.Background(), &pb.ModelOptions{
ModelFile: modelDir,
Threads: 4,
var _ = Describe("resolveVoice", func() {
It("treats a bare token as a named speaker", func() {
sp, ref := resolveVoice("serena")
Expect(sp).To(Equal("serena"))
Expect(ref).To(BeEmpty())
})
if err != nil {
t.Fatalf("LoadModel failed: %v", err)
}
if !resp.Success {
t.Fatalf("LoadModel returned failure: %s", resp.Message)
}
}
func TestTTS(t *testing.T) {
modelDir := skipIfNoModel(t)
tmpDir, err := os.MkdirTemp("", "qwen3tts-test")
if err != nil {
t.Fatal(err)
}
t.Cleanup(func() { os.RemoveAll(tmpDir) })
outputFile := filepath.Join(tmpDir, "output.wav")
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
// Load models
loadResp, err := client.LoadModel(context.Background(), &pb.ModelOptions{
ModelFile: modelDir,
Threads: 4,
It("treats an audio path as a clone reference (case-insensitive ext)", func() {
sp, ref := resolveVoice("/x/ref.WAV")
Expect(sp).To(BeEmpty())
Expect(ref).To(Equal("/x/ref.WAV"))
})
if err != nil {
t.Fatalf("LoadModel failed: %v", err)
}
if !loadResp.Success {
t.Fatalf("LoadModel returned failure: %s", loadResp.Message)
}
// Synthesize speech
language := "en"
_, err = client.TTS(context.Background(), &pb.TTSRequest{
Text: "Hello, this is a test of the Qwen3 text to speech system.",
Dst: outputFile,
Language: &language,
It("recognizes mp3/flac/ogg/m4a", func() {
for _, p := range []string{"a.mp3", "b.flac", "c.ogg", "d.m4a"} {
sp, ref := resolveVoice(p)
Expect(sp).To(BeEmpty())
Expect(ref).To(Equal(p))
}
})
if err != nil {
t.Fatalf("TTS failed: %v", err)
}
It("returns empty for empty input", func() {
sp, ref := resolveVoice(" ")
Expect(sp).To(BeEmpty())
Expect(ref).To(BeEmpty())
})
})
// Verify output file exists and has content
info, err := os.Stat(outputFile)
if os.IsNotExist(err) {
t.Fatal("Output audio file was not created")
}
if err != nil {
t.Fatalf("Failed to stat output file: %v", err)
}
var _ = Describe("parseOptions", func() {
It("extracts codec, use_fa, clamp_fp16, seed", func() {
o := parseOptions([]string{
"tokenizer:tok.gguf", "use_fa:false", "clamp_fp16:true",
"seed:7", "unknown:ignored",
})
Expect(o.codecPath).To(Equal("tok.gguf"))
Expect(o.useFA).To(BeFalse())
Expect(o.clampFP16).To(BeTrue())
Expect(o.seed).To(Equal(int64(7)))
})
It("accepts codec: as an alias for tokenizer:", func() {
Expect(parseOptions([]string{"codec:c.gguf"}).codecPath).To(Equal("c.gguf"))
})
It("defaults use_fa true and seed -1", func() {
o := parseOptions(nil)
Expect(o.useFA).To(BeTrue())
Expect(o.seed).To(Equal(int64(-1)))
})
})
t.Logf("Output file size: %d bytes", info.Size())
var _ = Describe("parseSampling", func() {
It("applies qt defaults when params are absent", func() {
s := parseSampling(nil, -1)
Expect(s.temperature).To(BeNumerically("~", 0.9, 1e-6))
Expect(s.topK).To(Equal(50))
Expect(s.topP).To(BeNumerically("~", 1.0, 1e-6))
Expect(s.repPen).To(BeNumerically("~", 1.05, 1e-6))
Expect(s.maxNew).To(Equal(2048))
Expect(s.seed).To(Equal(int64(-1)))
})
It("reads overrides and falls back to default seed", func() {
s := parseSampling(map[string]string{
"temperature": "0.5", "top_k": "10", "top_p": "0.8",
"repetition_penalty": "1.2", "max_new_tokens": "512",
}, 99)
Expect(s.temperature).To(BeNumerically("~", 0.5, 1e-6))
Expect(s.topK).To(Equal(10))
Expect(s.topP).To(BeNumerically("~", 0.8, 1e-6))
Expect(s.repPen).To(BeNumerically("~", 1.2, 1e-6))
Expect(s.maxNew).To(Equal(512))
Expect(s.seed).To(Equal(int64(99)))
})
It("reads an explicit seed override", func() {
Expect(parseSampling(map[string]string{"seed": "123"}, -1).seed).To(Equal(int64(123)))
})
})
// WAV header is 44 bytes minimum; any real audio should be much larger
if info.Size() < 1000 {
t.Errorf("Output file too small (%d bytes), expected real audio data", info.Size())
}
}
var _ = Describe("wavHeader24k", func() {
It("emits a 44-byte streaming WAV header at 24 kHz mono 16-bit", func() {
h := wavHeader24k()
Expect(h).To(HaveLen(44))
Expect(string(h[0:4])).To(Equal("RIFF"))
Expect(string(h[8:12])).To(Equal("WAVE"))
Expect(string(h[12:16])).To(Equal("fmt "))
Expect(string(h[36:40])).To(Equal("data"))
var sampleRate uint32
Expect(binary.Read(bytes.NewReader(h[24:28]), binary.LittleEndian, &sampleRate)).To(Succeed())
Expect(sampleRate).To(Equal(uint32(24000)))
})
})
var _ = Describe("floatToPCM16LE", func() {
It("clamps and converts float PCM to little-endian int16 bytes", func() {
b := floatToPCM16LE([]float32{0, 1.0, -1.0, 2.0, -2.0})
Expect(b).To(HaveLen(10))
read := func(off int) int16 {
var v int16
_ = binary.Read(bytes.NewReader(b[off:off+2]), binary.LittleEndian, &v)
return v
}
Expect(read(0)).To(Equal(int16(0)))
Expect(read(2)).To(Equal(int16(32767)))
Expect(read(4)).To(Equal(int16(-32767)))
Expect(read(6)).To(Equal(int16(32767))) // clamped from 2.0
Expect(read(8)).To(Equal(int16(-32767))) // clamped from -2.0
})
})

View File

@@ -2,51 +2,30 @@
set -e
CURDIR=$(dirname "$(realpath $0)")
cd "$CURDIR"
echo "Running qwen3-tts-cpp backend tests..."
# The test requires:
# - QWEN3TTS_MODEL_DIR: path to directory containing GGUF model files
# - QWEN3TTS_BINARY: path to the qwen3-tts-cpp binary (defaults to ./qwen3-tts-cpp)
#
# Tests that require the model will be skipped if QWEN3TTS_MODEL_DIR is not set
# or the directory does not contain the required model files.
cd "$CURDIR"
# Only auto-download models when QWEN3TTS_MODEL_DIR is not explicitly set
if [ -z "$QWEN3TTS_MODEL_DIR" ]; then
export QWEN3TTS_MODEL_DIR="./qwen3-tts-models"
if [ ! -d "$QWEN3TTS_MODEL_DIR" ]; then
echo "Creating qwen3-tts-models directory for tests..."
mkdir -p "$QWEN3TTS_MODEL_DIR"
REPO_ID="endo5501/qwen3-tts.cpp"
echo "Repository: ${REPO_ID}"
echo ""
# Files to download (smallest model for testing)
FILES=(
"qwen3-tts-0.6b-f16.gguf"
"qwen3-tts-tokenizer-f16.gguf"
)
BASE_URL="https://huggingface.co/${REPO_ID}/resolve/main"
for file in "${FILES[@]}"; do
dest="${QWEN3TTS_MODEL_DIR}/${file}"
if [ -f "${dest}" ]; then
echo " [skip] ${file} (already exists)"
else
echo " [download] ${file}..."
curl -L -o "${dest}" "${BASE_URL}/${file}" --progress-bar
echo " [done] ${file}"
fi
done
fi
# Auto-download a small model pair only when QWEN3TTS_MODEL is not set.
if [ -z "$QWEN3TTS_MODEL" ]; then
MODEL_DIR="./qwen3-tts-models"
mkdir -p "$MODEL_DIR"
REPO_ID="Serveurperso/Qwen3-TTS-GGUF"
BASE_URL="https://huggingface.co/${REPO_ID}/resolve/main"
FILES=( "qwen-talker-0.6b-base-Q4_K_M.gguf" "qwen-tokenizer-12hz-Q4_K_M.gguf" )
for file in "${FILES[@]}"; do
dest="${MODEL_DIR}/${file}"
if [ -f "${dest}" ]; then
echo " [skip] ${file}"
else
echo " [download] ${file}..."
curl -L -o "${dest}" "${BASE_URL}/${file}" --progress-bar
fi
done
export QWEN3TTS_MODEL="${MODEL_DIR}/qwen-talker-0.6b-base-Q4_K_M.gguf"
export QWEN3TTS_CODEC="${MODEL_DIR}/qwen-tokenizer-12hz-Q4_K_M.gguf"
fi
# Run Go tests
go test -v -timeout 600s .
go test -v -timeout 1200s .
echo "All qwen3-tts-cpp tests passed."

View File

@@ -62,7 +62,7 @@ var (
shimVadConfigSetDebug func(uintptr, int32)
shimCreateVad func(uintptr, float32) uintptr
// TTS (offline, VITS) config
// TTS (offline, VITS/Piper and Kokoro) config
shimTtsConfigNew func() uintptr
shimTtsConfigFree func(uintptr)
shimTtsConfigSetVitsModel func(uintptr, string)
@@ -76,6 +76,14 @@ var (
shimTtsConfigSetDebug func(uintptr, int32)
shimTtsConfigSetProvider func(uintptr, string)
shimTtsConfigSetMaxNumSentences func(uintptr, int32)
shimTtsConfigSetKokoroModel func(uintptr, string)
shimTtsConfigSetKokoroVoices func(uintptr, string)
shimTtsConfigSetKokoroTokens func(uintptr, string)
shimTtsConfigSetKokoroDataDir func(uintptr, string)
shimTtsConfigSetKokoroDictDir func(uintptr, string)
shimTtsConfigSetKokoroLexicon func(uintptr, string)
shimTtsConfigSetKokoroLang func(uintptr, string)
shimTtsConfigSetKokoroLengthScale func(uintptr, float32)
shimCreateOfflineTts func(uintptr) uintptr
// Offline recognizer config
@@ -101,37 +109,37 @@ var (
shimCreateOfflineRecognizer func(uintptr) uintptr
// Online recognizer config
shimOnlineRecogConfigNew func() uintptr
shimOnlineRecogConfigFree func(uintptr)
shimOnlineRecogConfigSetTransducerEncoder func(uintptr, string)
shimOnlineRecogConfigSetTransducerDecoder func(uintptr, string)
shimOnlineRecogConfigSetTransducerJoiner func(uintptr, string)
shimOnlineRecogConfigSetTokens func(uintptr, string)
shimOnlineRecogConfigSetNumThreads func(uintptr, int32)
shimOnlineRecogConfigSetDebug func(uintptr, int32)
shimOnlineRecogConfigSetProvider func(uintptr, string)
shimOnlineRecogConfigSetFeatSampleRate func(uintptr, int32)
shimOnlineRecogConfigSetFeatFeatureDim func(uintptr, int32)
shimOnlineRecogConfigSetDecodingMethod func(uintptr, string)
shimOnlineRecogConfigSetEnableEndpoint func(uintptr, int32)
shimOnlineRecogConfigNew func() uintptr
shimOnlineRecogConfigFree func(uintptr)
shimOnlineRecogConfigSetTransducerEncoder func(uintptr, string)
shimOnlineRecogConfigSetTransducerDecoder func(uintptr, string)
shimOnlineRecogConfigSetTransducerJoiner func(uintptr, string)
shimOnlineRecogConfigSetTokens func(uintptr, string)
shimOnlineRecogConfigSetNumThreads func(uintptr, int32)
shimOnlineRecogConfigSetDebug func(uintptr, int32)
shimOnlineRecogConfigSetProvider func(uintptr, string)
shimOnlineRecogConfigSetFeatSampleRate func(uintptr, int32)
shimOnlineRecogConfigSetFeatFeatureDim func(uintptr, int32)
shimOnlineRecogConfigSetDecodingMethod func(uintptr, string)
shimOnlineRecogConfigSetEnableEndpoint func(uintptr, int32)
shimOnlineRecogConfigSetRule1MinTrailingSilence func(uintptr, float32)
shimOnlineRecogConfigSetRule2MinTrailingSilence func(uintptr, float32)
shimOnlineRecogConfigSetRule3MinUtteranceLength func(uintptr, float32)
shimCreateOnlineRecognizer func(uintptr) uintptr
shimCreateOnlineRecognizer func(uintptr) uintptr
// Result accessors. Pointer returns use unsafe.Pointer so Go's
// vet checker doesn't flag them — the returned memory is C-owned,
// not subject to Go GC motion.
shimWaveSampleRate func(uintptr) int32
shimWaveNumSamples func(uintptr) int32
shimWaveSamples func(uintptr) unsafe.Pointer
shimOfflineResultText func(uintptr) unsafe.Pointer
shimOnlineResultText func(uintptr) unsafe.Pointer
shimGeneratedAudioSampleRate func(uintptr) int32
shimGeneratedAudioN func(uintptr) int32
shimGeneratedAudioSamples func(uintptr) unsafe.Pointer
shimSpeechSegmentStart func(uintptr) int32
shimSpeechSegmentN func(uintptr) int32
shimWaveSampleRate func(uintptr) int32
shimWaveNumSamples func(uintptr) int32
shimWaveSamples func(uintptr) unsafe.Pointer
shimOfflineResultText func(uintptr) unsafe.Pointer
shimOnlineResultText func(uintptr) unsafe.Pointer
shimGeneratedAudioSampleRate func(uintptr) int32
shimGeneratedAudioN func(uintptr) int32
shimGeneratedAudioSamples func(uintptr) unsafe.Pointer
shimSpeechSegmentStart func(uintptr) int32
shimSpeechSegmentN func(uintptr) int32
// TTS streaming callback trampoline
shimTtsGenerateWithCallback func(tts uintptr, text string, sid int32, speed float32, cb uintptr, ud uintptr) uintptr
@@ -161,13 +169,13 @@ var (
// pointer returned by the shim or `unsafe.Pointer(&slice[0])` from Go.
var (
// VAD
sherpaVadAcceptWaveform func(vad uintptr, samples unsafe.Pointer, n int32)
sherpaVadReset func(vad uintptr)
sherpaVadFlush func(vad uintptr)
sherpaVadEmpty func(vad uintptr) int32
sherpaVadFront func(vad uintptr) uintptr
sherpaVadPop func(vad uintptr)
sherpaDestroySpeechSegment func(seg uintptr)
sherpaVadAcceptWaveform func(vad uintptr, samples unsafe.Pointer, n int32)
sherpaVadReset func(vad uintptr)
sherpaVadFlush func(vad uintptr)
sherpaVadEmpty func(vad uintptr) int32
sherpaVadFront func(vad uintptr) uintptr
sherpaVadPop func(vad uintptr)
sherpaDestroySpeechSegment func(seg uintptr)
// Wave IO
sherpaReadWave func(filename string) uintptr
@@ -175,11 +183,11 @@ var (
sherpaWriteWave func(samples unsafe.Pointer, n int32, sampleRate int32, filename string) int32
// Offline ASR
sherpaCreateOfflineStream func(rec uintptr) uintptr
sherpaDestroyOfflineStream func(stream uintptr)
sherpaAcceptWaveformOffline func(stream uintptr, sr int32, samples unsafe.Pointer, n int32)
sherpaDecodeOfflineStream func(rec uintptr, stream uintptr)
sherpaGetOfflineStreamResult func(stream uintptr) uintptr
sherpaCreateOfflineStream func(rec uintptr) uintptr
sherpaDestroyOfflineStream func(stream uintptr)
sherpaAcceptWaveformOffline func(stream uintptr, sr int32, samples unsafe.Pointer, n int32)
sherpaDecodeOfflineStream func(rec uintptr, stream uintptr)
sherpaGetOfflineStreamResult func(stream uintptr) uintptr
sherpaDestroyOfflineRecognizerResult func(result uintptr)
// Online ASR
@@ -195,21 +203,21 @@ var (
sherpaOnlineStreamInputFinished func(stream uintptr)
// TTS
sherpaOfflineTtsGenerate func(tts uintptr, text string, sid int32, speed float32) uintptr
sherpaOfflineTtsGenerate func(tts uintptr, text string, sid int32, speed float32) uintptr
sherpaDestroyOfflineTtsGeneratedAudio func(audio uintptr)
sherpaOfflineTtsSampleRate func(tts uintptr) int32
sherpaOfflineTtsSampleRate func(tts uintptr) int32
// Offline speaker diarization. Result handle owns the segment-array
// pointer returned by ResultSortByStartTime; destroy the segment
// array first, then the result, then (at backend Free()) the diarizer.
sherpaDestroyOfflineSpeakerDiarization func(sd uintptr)
sherpaOfflineSpeakerDiarizationGetSampleRate func(sd uintptr) int32
sherpaOfflineSpeakerDiarizationProcess func(sd uintptr, samples unsafe.Pointer, n int32) uintptr
sherpaOfflineSpeakerDiarizationResultGetNumSegments func(result uintptr) int32
sherpaOfflineSpeakerDiarizationResultGetNumSpeakers func(result uintptr) int32
sherpaOfflineSpeakerDiarizationResultSortByStartTime func(result uintptr) uintptr
sherpaOfflineSpeakerDiarizationDestroySegment func(segs uintptr)
sherpaDestroyOfflineSpeakerDiarizationResult func(result uintptr)
sherpaDestroyOfflineSpeakerDiarization func(sd uintptr)
sherpaOfflineSpeakerDiarizationGetSampleRate func(sd uintptr) int32
sherpaOfflineSpeakerDiarizationProcess func(sd uintptr, samples unsafe.Pointer, n int32) uintptr
sherpaOfflineSpeakerDiarizationResultGetNumSegments func(result uintptr) int32
sherpaOfflineSpeakerDiarizationResultGetNumSpeakers func(result uintptr) int32
sherpaOfflineSpeakerDiarizationResultSortByStartTime func(result uintptr) uintptr
sherpaOfflineSpeakerDiarizationDestroySegment func(segs uintptr)
sherpaDestroyOfflineSpeakerDiarizationResult func(result uintptr)
)
var (
@@ -278,6 +286,14 @@ func loadSherpaLibsOnce() error {
{&shimTtsConfigSetDebug, "sherpa_shim_tts_config_set_debug"},
{&shimTtsConfigSetProvider, "sherpa_shim_tts_config_set_provider"},
{&shimTtsConfigSetMaxNumSentences, "sherpa_shim_tts_config_set_max_num_sentences"},
{&shimTtsConfigSetKokoroModel, "sherpa_shim_tts_config_set_kokoro_model"},
{&shimTtsConfigSetKokoroVoices, "sherpa_shim_tts_config_set_kokoro_voices"},
{&shimTtsConfigSetKokoroTokens, "sherpa_shim_tts_config_set_kokoro_tokens"},
{&shimTtsConfigSetKokoroDataDir, "sherpa_shim_tts_config_set_kokoro_data_dir"},
{&shimTtsConfigSetKokoroDictDir, "sherpa_shim_tts_config_set_kokoro_dict_dir"},
{&shimTtsConfigSetKokoroLexicon, "sherpa_shim_tts_config_set_kokoro_lexicon"},
{&shimTtsConfigSetKokoroLang, "sherpa_shim_tts_config_set_kokoro_lang"},
{&shimTtsConfigSetKokoroLengthScale, "sherpa_shim_tts_config_set_kokoro_length_scale"},
{&shimCreateOfflineTts, "sherpa_shim_create_offline_tts"},
{&shimOfflineRecogConfigNew, "sherpa_shim_offline_recog_config_new"},
@@ -688,21 +704,14 @@ func (s *SherpaBackend) loadTTS(opts *pb.ModelOptions) error {
cfg := shimTtsConfigNew()
defer shimTtsConfigFree(cfg)
shimTtsConfigSetVitsModel(cfg, modelFile)
if tokensPath := filepath.Join(modelDir, "tokens.txt"); fileExists(tokensPath) {
shimTtsConfigSetVitsTokens(cfg, tokensPath)
// Kokoro models ship a voices style file alongside the ONNX, whereas
// VITS/Piper voices do not. That presence is what tells the two model
// families apart, since both arrive as a plain *.onnx in modelDir.
if isKokoroModel(modelDir) {
s.configureKokoroTTS(cfg, opts, modelFile, modelDir)
} else {
s.configureVitsTTS(cfg, opts, modelFile, modelDir)
}
if lexiconPath := filepath.Join(modelDir, "lexicon.txt"); fileExists(lexiconPath) {
shimTtsConfigSetVitsLexicon(cfg, lexiconPath)
}
if dataDir := filepath.Join(modelDir, "espeak-ng-data"); dirExists(dataDir) {
shimTtsConfigSetVitsDataDir(cfg, dataDir)
}
shimTtsConfigSetVitsNoiseScale(cfg, findOptionFloat(opts, optionTtsNoiseScale, 0.667))
shimTtsConfigSetVitsNoiseScaleW(cfg, findOptionFloat(opts, optionTtsNoiseScaleW, 0.8))
shimTtsConfigSetVitsLengthScale(cfg, findOptionFloat(opts, optionTtsLengthScale, 1.0))
threads := int32(1)
if opts.Threads != 0 {
@@ -723,6 +732,80 @@ func (s *SherpaBackend) loadTTS(opts *pb.ModelOptions) error {
return nil
}
// kokoroVoicesFile is the speaker-style bank that ships with Kokoro models and
// is absent from VITS/Piper voices; its presence is how loadTTS tells them apart.
const kokoroVoicesFile = "voices.bin"
// isKokoroModel reports whether modelDir holds a Kokoro model (a voices file
// next to the ONNX) rather than a VITS/Piper single-speaker model.
func isKokoroModel(modelDir string) bool {
return fileExists(filepath.Join(modelDir, kokoroVoicesFile))
}
// configureVitsTTS wires a VITS/Piper single-speaker model into cfg: the ONNX
// plus the optional tokens, lexicon and espeak-ng-data found beside it.
func (s *SherpaBackend) configureVitsTTS(cfg uintptr, opts *pb.ModelOptions, modelFile, modelDir string) {
shimTtsConfigSetVitsModel(cfg, modelFile)
if tokensPath := filepath.Join(modelDir, "tokens.txt"); fileExists(tokensPath) {
shimTtsConfigSetVitsTokens(cfg, tokensPath)
}
if lexiconPath := filepath.Join(modelDir, "lexicon.txt"); fileExists(lexiconPath) {
shimTtsConfigSetVitsLexicon(cfg, lexiconPath)
}
if dataDir := filepath.Join(modelDir, "espeak-ng-data"); dirExists(dataDir) {
shimTtsConfigSetVitsDataDir(cfg, dataDir)
}
shimTtsConfigSetVitsNoiseScale(cfg, findOptionFloat(opts, optionTtsNoiseScale, 0.667))
shimTtsConfigSetVitsNoiseScaleW(cfg, findOptionFloat(opts, optionTtsNoiseScaleW, 0.8))
shimTtsConfigSetVitsLengthScale(cfg, findOptionFloat(opts, optionTtsLengthScale, 1.0))
}
// configureKokoroTTS wires a Kokoro model into cfg: the ONNX, its voices bank,
// tokens, and the optional espeak-ng-data / jieba dict / lexicon assets the
// multi-lingual packs ship. A language hint comes from the `language=` option.
func (s *SherpaBackend) configureKokoroTTS(cfg uintptr, opts *pb.ModelOptions, modelFile, modelDir string) {
shimTtsConfigSetKokoroModel(cfg, modelFile)
shimTtsConfigSetKokoroVoices(cfg, filepath.Join(modelDir, kokoroVoicesFile))
if tokensPath := filepath.Join(modelDir, "tokens.txt"); fileExists(tokensPath) {
shimTtsConfigSetKokoroTokens(cfg, tokensPath)
}
if dataDir := filepath.Join(modelDir, "espeak-ng-data"); dirExists(dataDir) {
shimTtsConfigSetKokoroDataDir(cfg, dataDir)
}
if dictDir := filepath.Join(modelDir, "dict"); dirExists(dictDir) {
shimTtsConfigSetKokoroDictDir(cfg, dictDir)
}
// Multi-lingual Kokoro ships per-language lexicons; the C API takes them as
// a single comma-separated list. US and GB English overlap almost entirely,
// so pass only one (US preferred) to avoid tens of thousands of "duplicated
// word" warnings at load; non-English lexicons (e.g. zh) are additive.
var lexicons []string
addLexicon := func(name string) {
if p := filepath.Join(modelDir, name); fileExists(p) {
lexicons = append(lexicons, p)
}
}
if fileExists(filepath.Join(modelDir, "lexicon-us-en.txt")) {
addLexicon("lexicon-us-en.txt")
} else {
addLexicon("lexicon-gb-en.txt")
}
addLexicon("lexicon-zh.txt")
addLexicon("lexicon.txt")
if len(lexicons) > 0 {
shimTtsConfigSetKokoroLexicon(cfg, strings.Join(lexicons, ","))
}
if lang := findOptionValue(opts, optionLanguage, ""); lang != "" {
shimTtsConfigSetKokoroLang(cfg, lang)
}
shimTtsConfigSetKokoroLengthScale(cfg, findOptionFloat(opts, optionTtsLengthScale, 1.0))
}
func fileExists(p string) bool {
info, err := os.Stat(p)
return err == nil && !info.IsDir()
@@ -1252,7 +1335,7 @@ type ttsStreamState struct {
var (
ttsStates sync.Map // uint64 → *ttsStreamState
ttsNextID atomic.Uint64
ttsCallbackPtr uintptr // purego.NewCallback return; registered in loadSherpaLibs
ttsCallbackPtr uintptr // purego.NewCallback return; registered in loadSherpaLibs
)
// ttsStreamCallback is invoked by sherpa-onnx for each PCM chunk VITS

View File

@@ -124,6 +124,20 @@ var _ = Describe("Sherpa-ONNX", func() {
Entry("empty", "", false),
Entry("other", "other", false),
)
It("isKokoroModel detects a voices file beside the ONNX", func() {
dir, err := os.MkdirTemp("", "sherpa-kokoro-*")
Expect(err).NotTo(HaveOccurred())
defer func() { _ = os.RemoveAll(dir) }()
// A bare VITS/Piper directory (ONNX only) is not Kokoro.
Expect(os.WriteFile(filepath.Join(dir, "model.onnx"), []byte("x"), 0o600)).To(Succeed())
Expect(isKokoroModel(dir)).To(BeFalse())
// Adding the Kokoro voices bank flips detection on.
Expect(os.WriteFile(filepath.Join(dir, kokoroVoicesFile), []byte("x"), 0o600)).To(Succeed())
Expect(isKokoroModel(dir)).To(BeTrue())
})
})
Context("option parsing", func() {

View File

@@ -79,6 +79,13 @@ void sherpa_shim_tts_config_free(void *h) {
free((char *)c->model.vits.tokens);
free((char *)c->model.vits.lexicon);
free((char *)c->model.vits.data_dir);
free((char *)c->model.kokoro.model);
free((char *)c->model.kokoro.voices);
free((char *)c->model.kokoro.tokens);
free((char *)c->model.kokoro.data_dir);
free((char *)c->model.kokoro.dict_dir);
free((char *)c->model.kokoro.lexicon);
free((char *)c->model.kokoro.lang);
free((char *)c->model.provider);
free(c);
}
@@ -117,6 +124,34 @@ void sherpa_shim_tts_config_set_max_num_sentences(void *h, int32_t v) {
((SherpaOnnxOfflineTtsConfig *)h)->max_num_sentences = v;
}
// Kokoro multi-speaker / multi-lingual TTS. Distinct ONNX + a voices style
// file (voices.bin) instead of VITS' single-speaker graph; espeak-ng-data,
// lexicon and a language hint are optional refinements.
void sherpa_shim_tts_config_set_kokoro_model(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.model, v);
}
void sherpa_shim_tts_config_set_kokoro_voices(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.voices, v);
}
void sherpa_shim_tts_config_set_kokoro_tokens(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.tokens, v);
}
void sherpa_shim_tts_config_set_kokoro_data_dir(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.data_dir, v);
}
void sherpa_shim_tts_config_set_kokoro_dict_dir(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.dict_dir, v);
}
void sherpa_shim_tts_config_set_kokoro_lexicon(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.lexicon, v);
}
void sherpa_shim_tts_config_set_kokoro_lang(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.lang, v);
}
void sherpa_shim_tts_config_set_kokoro_length_scale(void *h, float v) {
((SherpaOnnxOfflineTtsConfig *)h)->model.kokoro.length_scale = v;
}
void *sherpa_shim_create_offline_tts(void *h) {
return (void *)SherpaOnnxCreateOfflineTts(
(const SherpaOnnxOfflineTtsConfig *)h);

View File

@@ -37,7 +37,7 @@ void sherpa_shim_vad_config_set_provider(void *cfg, const char *v);
void sherpa_shim_vad_config_set_debug(void *cfg, int32_t v);
void *sherpa_shim_create_vad(void *cfg, float buffer_size_seconds);
// --- Offline TTS config (VITS path — the only TTS family the backend uses) ---
// --- Offline TTS config (VITS/Piper and Kokoro model families) ---
void *sherpa_shim_tts_config_new(void);
void sherpa_shim_tts_config_free(void *cfg);
void sherpa_shim_tts_config_set_vits_model(void *cfg, const char *v);
@@ -51,6 +51,14 @@ void sherpa_shim_tts_config_set_num_threads(void *cfg, int32_t v);
void sherpa_shim_tts_config_set_debug(void *cfg, int32_t v);
void sherpa_shim_tts_config_set_provider(void *cfg, const char *v);
void sherpa_shim_tts_config_set_max_num_sentences(void *cfg, int32_t v);
void sherpa_shim_tts_config_set_kokoro_model(void *cfg, const char *v);
void sherpa_shim_tts_config_set_kokoro_voices(void *cfg, const char *v);
void sherpa_shim_tts_config_set_kokoro_tokens(void *cfg, const char *v);
void sherpa_shim_tts_config_set_kokoro_data_dir(void *cfg, const char *v);
void sherpa_shim_tts_config_set_kokoro_dict_dir(void *cfg, const char *v);
void sherpa_shim_tts_config_set_kokoro_lexicon(void *cfg, const char *v);
void sherpa_shim_tts_config_set_kokoro_lang(void *cfg, const char *v);
void sherpa_shim_tts_config_set_kokoro_length_scale(void *cfg, float v);
void *sherpa_shim_create_offline_tts(void *cfg);
// --- Offline recognizer config (Whisper / Paraformer / SenseVoice / Omnilingual) ---

View File

@@ -8,10 +8,16 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=19bdfe22d255d5b4dff39d449318b9bc5ea2317f
STABLEDIFFUSION_GGML_VERSION?=7f0e728b7d42f2490dfa5dd9539082d904f2f6b2
CMAKE_ARGS+=-DGGML_MAX_NAME=128
# Enable the ggml RPC backend so generation can be sharded across remote
# rpc-server workers (the same backend-agnostic ggml rpc-server used by the
# llama.cpp backend). Servers are selected via the `rpc_servers` option or the
# LLAMACPP_GRPC_SERVERS env var (populated automatically in p2p worker mode).
CMAKE_ARGS+=-DSD_RPC=ON
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif

View File

@@ -391,10 +391,18 @@ int load_model(const char *model, char *model_path, char* options[], int threads
const char *control_net_path = "";
const char *embedding_dir = "";
const char *photo_maker_path = "";
const char *pulid_weights_path = "";
const char *tensor_type_rules = "";
char *lora_dir = model_path;
bool vae_decode_only = true;
// Upstream backend/parameter placement specs (see docs/.../stablediffusion).
// Empty means "leave at upstream default" (nullptr).
const char *backend_arg = "";
const char *params_backend_arg = "";
const char *rpc_servers_arg = "";
const char *max_vram_arg = "";
bool stream_layers = false;
int n_threads = threads;
enum sd_type_t wtype = SD_TYPE_COUNT;
enum rng_type_t rng_type = CUDA_RNG;
@@ -418,7 +426,9 @@ int load_model(const char *model, char *model_path, char* options[], int threads
// If options is not NULL, parse options
for (int i = 0; options[i] != NULL; i++) {
const char *optname = strtok(options[i], ":");
const char *optval = strtok(NULL, ":");
// Take everything after the first ':' as the value so values may
// themselves contain colons (e.g. rpc_servers host:port lists).
const char *optval = strtok(NULL, "");
if (optval == NULL) {
optval = "true";
}
@@ -490,9 +500,21 @@ int load_model(const char *model, char *model_path, char* options[], int threads
}
}
if (!strcmp(optname, "photo_maker_path")) photo_maker_path = strdup(optval);
if (!strcmp(optname, "pulid_weights_path")) pulid_weights_path = strdup(optval);
if (!strcmp(optname, "tensor_type_rules")) tensor_type_rules = strdup(optval);
if (!strcmp(optname, "vae_decode_only")) vae_decode_only = (strcmp(optval, "true") == 0 || strcmp(optval, "1") == 0);
// Backend / parameter placement specs (see prepare_backend_assignments
// in the upstream CLI). These compose with the legacy keep_*_on_cpu /
// offload_params_to_cpu booleans below.
if (!strcmp(optname, "backend")) backend_arg = strdup(optval);
if (!strcmp(optname, "params_backend")) params_backend_arg = strdup(optval);
if (!strcmp(optname, "rpc_servers")) rpc_servers_arg = strdup(optval);
if (!strcmp(optname, "max_vram")) max_vram_arg = strdup(optval);
if (!strcmp(optname, "stream_layers")) stream_layers = (strcmp(optval, "true") == 0 || strcmp(optval, "1") == 0);
// vae_decode_only is still accepted for backwards compatibility with
// existing gallery configs, but upstream dropped the option (the model
// now decides), so it is parsed and ignored.
if (!strcmp(optname, "offload_params_to_cpu")) offload_params_to_cpu = (strcmp(optval, "true") == 0 || strcmp(optval, "1") == 0);
if (!strcmp(optname, "keep_clip_on_cpu")) keep_clip_on_cpu = (strcmp(optval, "true") == 0 || strcmp(optval, "1") == 0);
if (!strcmp(optname, "keep_control_net_on_cpu")) keep_control_net_on_cpu = (strcmp(optval, "true") == 0 || strcmp(optval, "1") == 0);
@@ -591,20 +613,48 @@ int load_model(const char *model, char *model_path, char* options[], int threads
ctx_params.embeddings = embedding_vec.empty() ? NULL : embedding_vec.data();
ctx_params.embedding_count = static_cast<uint32_t>(embedding_vec.size());
ctx_params.photo_maker_path = photo_maker_path;
if (strlen(pulid_weights_path) > 0) ctx_params.pulid_weights_path = pulid_weights_path;
ctx_params.tensor_type_rules = tensor_type_rules;
ctx_params.vae_decode_only = vae_decode_only;
// XXX: Setting to true causes a segfault on the second run
ctx_params.free_params_immediately = false;
ctx_params.n_threads = n_threads;
ctx_params.rng_type = rng_type;
ctx_params.keep_clip_on_cpu = keep_clip_on_cpu;
if (wtype != SD_TYPE_COUNT) ctx_params.wtype = wtype;
if (sampler_rng_type != RNG_TYPE_COUNT) ctx_params.sampler_rng_type = sampler_rng_type;
if (prediction != PREDICTION_COUNT) ctx_params.prediction = prediction;
if (lora_apply_mode != LORA_APPLY_MODE_COUNT) ctx_params.lora_apply_mode = lora_apply_mode;
ctx_params.offload_params_to_cpu = offload_params_to_cpu;
ctx_params.keep_control_net_on_cpu = keep_control_net_on_cpu;
ctx_params.keep_vae_on_cpu = keep_vae_on_cpu;
// Backend / parameter placement specs. Upstream replaced the boolean
// CPU-offload knobs (offload_params_to_cpu, keep_clip_on_cpu, keep_vae_on_cpu,
// keep_control_net_on_cpu) with these specs. Seed from the explicit
// backend/params_backend options, then prepend the legacy boolean-derived
// assignments, mirroring prepare_backend_assignments() in the upstream CLI.
// These strings must outlive new_sd_ctx() below.
std::string backend_spec = backend_arg;
std::string params_backend_spec = params_backend_arg;
auto prepend_spec = [](std::string& spec, const char* assignment) {
spec = spec.empty() ? std::string(assignment) : std::string(assignment) + "," + spec;
};
if (offload_params_to_cpu) prepend_spec(params_backend_spec, "*=cpu");
if (keep_clip_on_cpu) prepend_spec(backend_spec, "te=cpu");
if (keep_vae_on_cpu) prepend_spec(backend_spec, "vae=cpu");
if (keep_control_net_on_cpu) prepend_spec(backend_spec, "controlnet=cpu");
if (!backend_spec.empty()) ctx_params.backend = backend_spec.c_str();
if (!params_backend_spec.empty()) ctx_params.params_backend = params_backend_spec.c_str();
// RPC servers: prefer the explicit option, otherwise fall back to the
// LLAMACPP_GRPC_SERVERS env var. LocalAI's p2p worker mode populates that
// var with discovered ggml rpc-server workers (shared with the llama.cpp
// backend), so distributed image generation works with no extra config.
if (strlen(rpc_servers_arg) > 0) {
ctx_params.rpc_servers = rpc_servers_arg;
} else {
const char* env_rpc_servers = std::getenv("LLAMACPP_GRPC_SERVERS");
if (env_rpc_servers != NULL && strlen(env_rpc_servers) > 0) {
ctx_params.rpc_servers = env_rpc_servers;
}
}
// max_vram: GiB budget or per-backend spec for graph-cut segmented param
// offload ("0" = disabled, "-1" = auto). stream_layers only has effect when
// max_vram is set.
if (strlen(max_vram_arg) > 0) ctx_params.max_vram = max_vram_arg;
ctx_params.stream_layers = stream_layers;
ctx_params.diffusion_flash_attn = diffusion_flash_attn;
ctx_params.tae_preview_only = tae_preview_only;
ctx_params.diffusion_conv_direct = diffusion_conv_direct;

4
backend/go/supertonic/.gitignore vendored Normal file
View File

@@ -0,0 +1,4 @@
/supertonic
/sources/
/backend-assets/
/package/

View File

@@ -0,0 +1,62 @@
CURRENT_DIR=$(abspath ./)
GOCMD=go
ONNX_VERSION?=1.24.4
ONNX_ARCH?=x64
ONNX_OS?=linux
ifneq (,$(findstring aarch64,$(shell uname -m)))
ONNX_ARCH=aarch64
endif
ifeq ($(OS),Darwin)
ONNX_OS=osx
ifneq (,$(findstring arm64,$(shell uname -m)))
ONNX_ARCH=arm64
else
ONNX_ARCH=x86_64
endif
endif
# CUDA 12 ships as -gpu, CUDA 13 as -gpu_cuda13 (underscore). CPU has no suffix.
ifeq ($(BUILD_TYPE),cublas)
ONNX_PROVIDER=cuda
ifeq ($(CUDA_MAJOR_VERSION),13)
ONNX_VARIANT=-gpu_cuda13
else
ONNX_VARIANT=-gpu
endif
else
ONNX_VARIANT=
ONNX_PROVIDER=cpu
endif
sources/onnxruntime:
mkdir -p sources/onnxruntime
curl -L https://github.com/microsoft/onnxruntime/releases/download/v$(ONNX_VERSION)/onnxruntime-$(ONNX_OS)-$(ONNX_ARCH)$(ONNX_VARIANT)-$(ONNX_VERSION).tgz \
-o sources/onnxruntime/onnxruntime.tgz
cd sources/onnxruntime && tar -xf onnxruntime.tgz --strip-components=1 && rm onnxruntime.tgz
backend-assets/lib: sources/onnxruntime
mkdir -p backend-assets/lib
cp -rfLv sources/onnxruntime/lib/* backend-assets/lib/
supertonic: backend-assets/lib
CGO_ENABLED=1 $(GOCMD) build \
-ldflags "$(LD_FLAGS) -X main.onnxProvider=$(ONNX_PROVIDER)" \
-tags "$(GO_TAGS)" -o supertonic ./
package:
bash package.sh
build: supertonic package
# Tests need only the Go toolchain (gcc); yalue dlopens onnxruntime at
# runtime, so no tarball download is required to compile or run unit specs.
test:
CGO_ENABLED=1 $(GOCMD) test -v -timeout 120s ./...
clean:
rm -rf supertonic sources/ backend-assets/ package/
.PHONY: build package clean test

View File

@@ -0,0 +1,307 @@
package main
import (
"bytes"
"encoding/binary"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"sync"
laudio "github.com/mudler/LocalAI/pkg/audio"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// onnxProvider is set via -ldflags "-X main.onnxProvider=cuda" by the
// CUDA build (later phase). Defaults to CPU.
var onnxProvider = "cpu"
// Per-model generation defaults, overridable via ModelOptions.Options:
//
// supertonic.steps=<int> denoising steps (quality), default 8
// supertonic.speed=<float> speech rate, default 1.05
// supertonic.silence=<float> inter-chunk silence seconds, default 0.3
// supertonic.default_voice=<name> voice-style used when request omits voice
// supertonic.default_lang=<lang> language tag used when request omits it
const (
optionSteps = "supertonic.steps="
optionSpeed = "supertonic.speed="
optionSilence = "supertonic.silence="
optionDefaultVoice = "supertonic.default_voice="
optionDefaultLang = "supertonic.default_lang="
)
type SupertonicBackend struct {
base.SingleThread
tts *TextToSpeech
cfg Config
modelDir string
voicesDir string
defaultVoice string
defaultLang string
steps int
speed float32
silence float32
styleMu sync.Mutex
styles map[string]*Style // voice name -> loaded style cache
}
func (s *SupertonicBackend) Load(opts *pb.ModelOptions) error {
modelDir, err := resolveModelDir(opts.ModelFile)
if err != nil {
return err
}
s.modelDir = modelDir
s.voicesDir = resolveVoicesDir(modelDir)
cfg, err := LoadCfgs(modelDir)
if err != nil {
return fmt.Errorf("loading tts.json from %s: %w", modelDir, err)
}
s.cfg = cfg
// onnxProvider is "cpu" for the CPU build; the CUDA build sets it to
// "cuda" via -ldflags. Upstream LoadTextToSpeech still errors on GPU
// until the CUDA phase wires the execution provider.
tts, err := LoadTextToSpeech(modelDir, onnxProvider == "cuda", cfg)
if err != nil {
return fmt.Errorf("loading supertonic models from %s: %w", modelDir, err)
}
s.tts = tts
s.steps = int(findOptionInt(opts, optionSteps, 8))
s.speed = findOptionFloat(opts, optionSpeed, 1.05)
s.silence = findOptionFloat(opts, optionSilence, 0.3)
s.defaultVoice = findOptionValue(opts, optionDefaultVoice, "")
s.defaultLang = findOptionValue(opts, optionDefaultLang, "na")
s.styles = map[string]*Style{}
return nil
}
func (s *SupertonicBackend) TTS(req *pb.TTSRequest) error {
wav, sr, err := s.synthesize(req)
if err != nil {
return err
}
out := make([]float64, len(wav))
for i, v := range wav {
out[i] = float64(v)
}
if err := writeWavFile(req.Dst, out, sr); err != nil {
return fmt.Errorf("writing wav to %s: %w", req.Dst, err)
}
return nil
}
func (s *SupertonicBackend) TTSStream(req *pb.TTSRequest, results chan []byte) error {
defer close(results)
wav, sr, err := s.synthesize(req)
if err != nil {
return err
}
results <- streamingWAVHeader(uint32(sr))
const chunkSamples = 4096
for off := 0; off < len(wav); off += chunkSamples {
end := off + chunkSamples
if end > len(wav) {
end = len(wav)
}
results <- pcmFloatToInt16LE(wav[off:end])
}
return nil
}
// synthesize runs the full pipeline and returns the trimmed mono float32
// PCM and its sample rate.
func (s *SupertonicBackend) synthesize(req *pb.TTSRequest) ([]float32, int, error) {
if s.tts == nil {
return nil, 0, fmt.Errorf("supertonic model not loaded")
}
if strings.TrimSpace(req.Text) == "" {
return nil, 0, fmt.Errorf("empty text")
}
style, err := s.loadStyle(s.voiceName(req.Voice))
if err != nil {
return nil, 0, err
}
lang := s.resolveLang("")
if req.Language != nil {
lang = s.resolveLang(*req.Language)
}
wav, dur, err := s.tts.Call(req.Text, lang, style, s.steps, s.speed, s.silence)
if err != nil {
return nil, 0, err
}
sr := s.tts.SampleRate
// Call returns concatenated audio; trim to the reported duration.
wavLen := int(float32(sr) * dur)
if wavLen < 0 {
wavLen = 0
}
if wavLen > len(wav) {
wavLen = len(wav)
}
return wav[:wavLen], sr, nil
}
// voiceName picks the request voice, falling back to the model default.
func (s *SupertonicBackend) voiceName(reqVoice string) string {
v := strings.TrimSpace(reqVoice)
if v == "" {
return s.defaultVoice
}
return v
}
// resolveLang validates against AvailableLangs, falling back to the model
// default (then "na").
func (s *SupertonicBackend) resolveLang(reqLang string) string {
l := strings.TrimSpace(reqLang)
if l != "" && isValidLang(l) {
return l
}
if s.defaultLang != "" && isValidLang(s.defaultLang) {
return s.defaultLang
}
return "na"
}
// loadStyle resolves and caches a voice-style. An empty name with no model
// default is an error (supertonic requires a style embedding).
func (s *SupertonicBackend) loadStyle(name string) (*Style, error) {
if name == "" {
return nil, fmt.Errorf("no voice specified and no supertonic.default_voice set")
}
s.styleMu.Lock()
defer s.styleMu.Unlock()
if st, ok := s.styles[name]; ok {
return st, nil
}
path := s.voiceStylePath(name)
st, err := LoadVoiceStyle([]string{path}, false)
if err != nil {
return nil, fmt.Errorf("loading voice style %q (%s): %w", name, path, err)
}
s.styles[name] = st
return st, nil
}
// voiceStylePath maps a voice name to a JSON path. Absolute paths are honored;
// names containing a separator resolve under modelDir; bare names resolve under
// the resolved voicesDir (see resolveVoicesDir).
func (s *SupertonicBackend) voiceStylePath(name string) string {
if !strings.HasSuffix(name, ".json") {
name += ".json"
}
if filepath.IsAbs(name) {
return name
}
if strings.ContainsRune(name, filepath.Separator) {
return filepath.Join(s.modelDir, name)
}
return filepath.Join(s.voicesDir, name)
}
// resolveVoicesDir locates the voice_styles directory. The HF model layout
// puts the ONNX files in an onnx/ subdir with voice_styles/ as its sibling,
// so check modelDir/voice_styles first, then the parent's voice_styles.
func resolveVoicesDir(modelDir string) string {
candidates := []string{
filepath.Join(modelDir, "voice_styles"),
filepath.Join(filepath.Dir(modelDir), "voice_styles"),
}
for _, c := range candidates {
if info, err := os.Stat(c); err == nil && info.IsDir() {
return c
}
}
return candidates[0]
}
// resolveModelDir accepts either a directory (used as-is) or a file (its
// parent dir is used).
func resolveModelDir(modelFile string) (string, error) {
if modelFile == "" {
return "", fmt.Errorf("empty model path")
}
info, err := os.Stat(modelFile)
if err != nil {
return "", fmt.Errorf("stat model path %s: %w", modelFile, err)
}
if info.IsDir() {
return modelFile, nil
}
return filepath.Dir(modelFile), nil
}
// ---- option helpers (mirrors backend/go/sherpa-onnx/backend.go) ----
func findOptionValue(opts *pb.ModelOptions, prefix, def string) string {
for _, o := range opts.Options {
if strings.HasPrefix(o, prefix) {
return strings.TrimPrefix(o, prefix)
}
}
return def
}
func findOptionFloat(opts *pb.ModelOptions, prefix string, def float32) float32 {
raw := findOptionValue(opts, prefix, "")
if raw == "" {
return def
}
v, err := strconv.ParseFloat(raw, 32)
if err != nil {
return def
}
return float32(v)
}
func findOptionInt(opts *pb.ModelOptions, prefix string, def int32) int32 {
raw := findOptionValue(opts, prefix, "")
if raw == "" {
return def
}
v, err := strconv.ParseInt(raw, 10, 32)
if err != nil {
return def
}
return int32(v)
}
// ---- PCM helpers ----
func pcmFloatToInt16LE(samples []float32) []byte {
buf := make([]byte, len(samples)*2)
for i, f := range samples {
v := int32(f * 32767)
if v > 32767 {
v = 32767
} else if v < -32768 {
v = -32768
}
binary.LittleEndian.PutUint16(buf[2*i:], uint16(int16(v)))
}
return buf
}
func streamingWAVHeader(sampleRate uint32) []byte {
const streamingSize = 0xFFFFFFFF
h := laudio.NewWAVHeaderWithRate(streamingSize, sampleRate)
h.ChunkSize = streamingSize
var buf bytes.Buffer
_ = h.Write(&buf)
return buf.Bytes()
}

View File

@@ -0,0 +1,86 @@
package main
import (
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
var _ = Describe("voiceStylePath", func() {
s := &SupertonicBackend{modelDir: "/models/st/onnx", voicesDir: "/models/st/voice_styles"}
It("resolves a bare name under the resolved voicesDir", func() {
Expect(s.voiceStylePath("M1")).To(Equal(filepath.Join("/models/st/voice_styles", "M1.json")))
})
It("keeps an explicit .json suffix", func() {
Expect(s.voiceStylePath("M1.json")).To(Equal(filepath.Join("/models/st/voice_styles", "M1.json")))
})
It("honors absolute paths", func() {
Expect(s.voiceStylePath("/abs/v.json")).To(Equal("/abs/v.json"))
})
})
var _ = Describe("resolveVoicesDir", func() {
It("prefers voice_styles under modelDir", func() {
dir := GinkgoT().TempDir()
Expect(os.MkdirAll(filepath.Join(dir, "voice_styles"), 0o755)).To(Succeed())
Expect(resolveVoicesDir(dir)).To(Equal(filepath.Join(dir, "voice_styles")))
})
It("falls back to the sibling voice_styles next to an onnx subdir", func() {
root := GinkgoT().TempDir()
Expect(os.MkdirAll(filepath.Join(root, "voice_styles"), 0o755)).To(Succeed())
Expect(os.MkdirAll(filepath.Join(root, "onnx"), 0o755)).To(Succeed())
Expect(resolveVoicesDir(filepath.Join(root, "onnx"))).To(Equal(filepath.Join(root, "voice_styles")))
})
})
var _ = Describe("resolveLang", func() {
It("accepts a valid request language", func() {
s := &SupertonicBackend{defaultLang: "na"}
Expect(s.resolveLang("ko")).To(Equal("ko"))
})
It("falls back to the model default for an invalid language", func() {
s := &SupertonicBackend{defaultLang: "en"}
Expect(s.resolveLang("zz")).To(Equal("en"))
})
It("falls back to na when nothing is valid", func() {
s := &SupertonicBackend{defaultLang: ""}
Expect(s.resolveLang("")).To(Equal("na"))
})
})
var _ = Describe("pcmFloatToInt16LE", func() {
It("clamps and encodes little-endian", func() {
out := pcmFloatToInt16LE([]float32{0, 1.0, -1.0, 2.0})
Expect(out).To(HaveLen(8))
Expect(out[0:2]).To(Equal([]byte{0x00, 0x00})) // 0
Expect(out[2:4]).To(Equal([]byte{0xff, 0x7f})) // 32767
Expect(out[6:8]).To(Equal([]byte{0xff, 0x7f})) // clamp 2.0 -> 32767
})
})
var _ = Describe("end-to-end synthesis", Ordered, func() {
var modelDir string
BeforeAll(func() {
modelDir = os.Getenv("SUPERTONIC_MODEL_PATH")
if modelDir == "" {
Skip("set SUPERTONIC_MODEL_PATH to a supertonic model dir to run")
}
Expect(InitializeONNXRuntime()).To(Succeed())
})
It("synthesizes a wav file", func() {
b := &SupertonicBackend{}
Expect(b.Load(&pb.ModelOptions{ModelFile: modelDir, Options: []string{"supertonic.default_voice=F1"}})).To(Succeed())
dst := filepath.Join(GinkgoT().TempDir(), "out.wav")
lang := "en"
Expect(b.TTS(&pb.TTSRequest{Text: "Hello from LocalAI.", Dst: dst, Language: &lang})).To(Succeed())
info, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(info.Size()).To(BeNumerically(">", 44)) // header + PCM
})
})

View File

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,27 @@
package main
// Started internally by LocalAI; a server is allocated per model.
import (
"flag"
grpc "github.com/mudler/LocalAI/pkg/grpc"
ort "github.com/yalue/onnxruntime_go"
)
var addr = flag.String("addr", "localhost:50051", "the address to connect to")
func main() {
flag.Parse()
// InitializeONNXRuntime reads ONNXRUNTIME_LIB_PATH (set by run.sh) and
// dlopens libonnxruntime before any session is created in Load().
if err := InitializeONNXRuntime(); err != nil {
panic(err)
}
defer func() { _ = ort.DestroyEnvironment() }()
if err := grpc.StartServer(*addr, &SupertonicBackend{}); err != nil {
panic(err)
}
}

View File

@@ -1,4 +1,4 @@
package ssewire
package main
import (
"testing"
@@ -7,7 +7,7 @@ import (
. "github.com/onsi/gomega"
)
func TestSsewire(t *testing.T) {
func TestSupertonic(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "ssewire test suite")
RunSpecs(t, "Supertonic backend test suite")
}

View File

@@ -0,0 +1,49 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/supertonic $CURDIR/package/
cp -avf $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/backend-assets/lib/* $CURDIR/package/lib/
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
else
echo "Error: Could not detect architecture"
exit 1
fi
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

14
backend/go/supertonic/run.sh Executable file
View File

@@ -0,0 +1,14 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
fi
exec $CURDIR/supertonic "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=df7638d8229a243af8a4b5a8ae557e0d74e0a0ae
WHISPER_CPP_VERSION?=86c40c3bd6fc86f1187fb751d111b49e0fc18e84
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -366,6 +366,218 @@
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp"
intel: "intel-sycl-f32-locate-anything-cpp"
vulkan: "vulkan-locate-anything-cpp"
- !!merge <<: *locateanything
name: "locate-anything-development"
capabilities:
default: "cpu-locate-anything-cpp-development"
nvidia: "cuda12-locate-anything-cpp-development"
nvidia-cuda-12: "cuda12-locate-anything-cpp-development"
nvidia-cuda-13: "cuda13-locate-anything-cpp-development"
nvidia-l4t: "nvidia-l4t-arm64-locate-anything-cpp-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-locate-anything-cpp-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp-development"
intel: "intel-sycl-f32-locate-anything-cpp-development"
vulkan: "vulkan-locate-anything-cpp-development"
- !!merge <<: *locateanything
name: "cpu-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-cpu-locate-anything-cpp
- !!merge <<: *locateanything
name: "cpu-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-cpu-locate-anything-cpp
- !!merge <<: *locateanything
name: "cuda12-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-locate-anything-cpp
- !!merge <<: *locateanything
name: "cuda12-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-locate-anything-cpp
- !!merge <<: *locateanything
name: "cuda13-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-locate-anything-cpp
- !!merge <<: *locateanything
name: "cuda13-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-locate-anything-cpp
- !!merge <<: *locateanything
name: "nvidia-l4t-arm64-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-arm64-locate-anything-cpp
- !!merge <<: *locateanything
name: "nvidia-l4t-arm64-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-nvidia-l4t-arm64-locate-anything-cpp
- !!merge <<: *locateanything
name: "cuda13-nvidia-l4t-arm64-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-locate-anything-cpp
- !!merge <<: *locateanything
name: "cuda13-nvidia-l4t-arm64-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-locate-anything-cpp
- !!merge <<: *locateanything
name: "intel-sycl-f32-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f32-locate-anything-cpp
- !!merge <<: *locateanything
name: "intel-sycl-f32-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f32-locate-anything-cpp
- !!merge <<: *locateanything
name: "intel-sycl-f16-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f16-locate-anything-cpp
- !!merge <<: *locateanything
name: "intel-sycl-f16-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f16-locate-anything-cpp
- !!merge <<: *locateanything
name: "vulkan-locate-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-locate-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-vulkan-locate-anything-cpp
- !!merge <<: *locateanything
name: "vulkan-locate-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-locate-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-vulkan-locate-anything-cpp
- &depthanything
name: "depth-anything"
alias: "depth-anything"
license: apache-2.0
description: |
Depth Anything 3 monocular metric depth + camera pose estimation in C/C++
using GGML. Loads pre-built GGUF weights and, given an image, returns a
dense depth map plus the recovered camera extrinsics (3x4) and intrinsics
(3x3). No Python at inference (purego, cgo-less).
urls:
- https://github.com/mudler/depth-anything.cpp
- https://huggingface.co/depth-anything/Depth-Anything-V3
tags:
- depth-estimation
- camera-pose
- depth-anything
- gpu
- cpu
capabilities:
default: "cpu-depth-anything-cpp"
nvidia: "cuda12-depth-anything-cpp"
nvidia-cuda-12: "cuda12-depth-anything-cpp"
nvidia-cuda-13: "cuda13-depth-anything-cpp"
nvidia-l4t: "nvidia-l4t-arm64-depth-anything-cpp"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-depth-anything-cpp"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp"
intel: "intel-sycl-f32-depth-anything-cpp"
vulkan: "vulkan-depth-anything-cpp"
- !!merge <<: *depthanything
name: "depth-anything-development"
capabilities:
default: "cpu-depth-anything-cpp-development"
nvidia: "cuda12-depth-anything-cpp-development"
nvidia-cuda-12: "cuda12-depth-anything-cpp-development"
nvidia-cuda-13: "cuda13-depth-anything-cpp-development"
nvidia-l4t: "nvidia-l4t-arm64-depth-anything-cpp-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-depth-anything-cpp-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp-development"
intel: "intel-sycl-f32-depth-anything-cpp-development"
vulkan: "vulkan-depth-anything-cpp-development"
- !!merge <<: *depthanything
name: "cpu-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-cpu-depth-anything-cpp
- !!merge <<: *depthanything
name: "cpu-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-cpu-depth-anything-cpp
- !!merge <<: *depthanything
name: "cuda12-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-depth-anything-cpp
- !!merge <<: *depthanything
name: "cuda12-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-depth-anything-cpp
- !!merge <<: *depthanything
name: "cuda13-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-depth-anything-cpp
- !!merge <<: *depthanything
name: "cuda13-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-depth-anything-cpp
- !!merge <<: *depthanything
name: "nvidia-l4t-arm64-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-arm64-depth-anything-cpp
- !!merge <<: *depthanything
name: "nvidia-l4t-arm64-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-nvidia-l4t-arm64-depth-anything-cpp
- !!merge <<: *depthanything
name: "cuda13-nvidia-l4t-arm64-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-depth-anything-cpp
- !!merge <<: *depthanything
name: "cuda13-nvidia-l4t-arm64-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-depth-anything-cpp
- !!merge <<: *depthanything
name: "intel-sycl-f32-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f32-depth-anything-cpp
- !!merge <<: *depthanything
name: "intel-sycl-f32-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f32-depth-anything-cpp
- !!merge <<: *depthanything
name: "intel-sycl-f16-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f16-depth-anything-cpp
- !!merge <<: *depthanything
name: "intel-sycl-f16-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f16-depth-anything-cpp
- !!merge <<: *depthanything
name: "vulkan-depth-anything-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-depth-anything-cpp"
mirrors:
- localai/localai-backends:latest-gpu-vulkan-depth-anything-cpp
- !!merge <<: *depthanything
name: "vulkan-depth-anything-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-depth-anything-cpp"
mirrors:
- localai/localai-backends:master-gpu-vulkan-depth-anything-cpp
- &vllm
name: "vllm"
license: apache-2.0
@@ -455,12 +667,9 @@
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni"
- &mlx
name: "mlx"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx"
icon: https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls:
- https://github.com/ml-explore/mlx-lm
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx
license: MIT
description: |
Run LLMs with MLX
@@ -479,12 +688,9 @@
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx"
- &mlx-vlm
name: "mlx-vlm"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-vlm"
icon: https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls:
- https://github.com/Blaizzy/mlx-vlm
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx-vlm
license: MIT
description: |
Run Vision-Language Models with MLX
@@ -505,12 +711,9 @@
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-vlm"
- &mlx-audio
name: "mlx-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-audio"
icon: https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls:
- https://github.com/Blaizzy/mlx-audio
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx-audio
license: MIT
description: |
Run Audio Models with MLX
@@ -531,12 +734,9 @@
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-audio"
- &mlx-distributed
name: "mlx-distributed"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-distributed"
icon: https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls:
- https://github.com/ml-explore/mlx-lm
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx-distributed
license: MIT
description: |
Run distributed LLM inference with MLX across multiple Apple Silicon Macs
@@ -632,7 +832,7 @@
default: "cpu-diffusers"
nvidia-cuda-13: "cuda13-diffusers"
nvidia-cuda-12: "cuda12-diffusers"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-diffusers"
nvidia-l4t-cuda-12: "nvidia-l4t-diffusers"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-diffusers"
- &ace-step
name: "ace-step"
@@ -688,14 +888,17 @@
- &qwen3ttscpp
name: "qwen3-tts-cpp"
description: |
Qwen3-TTS C++ backend using GGML. Native C++ text-to-speech with voice cloning support.
Generates 24kHz mono audio from text with optional reference audio for voice cloning via ECAPA-TDNN speaker embeddings.
Qwen3-TTS C++ backend using GGML (qwentts.cpp). Native C++ text-to-speech
with streaming output, named speakers, voice design, and zero-shot voice
cloning. 24kHz mono, 11 languages with Mandarin dialects. 0.6B and 1.7B
models in Q8_0 / Q4_K_M.
urls:
- https://github.com/predict-woo/qwen3-tts.cpp
- https://github.com/ServeurpersoCom/qwentts.cpp
tags:
- text-to-speech
- tts
- voice-cloning
- streaming
alias: "qwen3-tts-cpp"
capabilities:
default: "cpu-qwen3-tts-cpp"
@@ -709,6 +912,33 @@
nvidia-l4t: "nvidia-l4t-arm64-qwen3-tts-cpp"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-qwen3-tts-cpp"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-qwen3-tts-cpp"
- &omnivoicecpp
name: "omnivoice-cpp"
description: |
OmniVoice C++ backend using GGML. Native text-to-speech with voice cloning
(reference audio + transcript) and voice design (attribute keywords: gender,
age, pitch, style, volume, emotion). 24kHz mono output, 646 languages.
Supports streaming synthesis.
urls:
- https://github.com/ServeurpersoCom/omnivoice.cpp
tags:
- text-to-speech
- tts
- voice-cloning
- voice-design
alias: "omnivoice-cpp"
capabilities:
default: "cpu-omnivoice-cpp"
nvidia: "cuda12-omnivoice-cpp"
nvidia-cuda-13: "cuda13-omnivoice-cpp"
nvidia-cuda-12: "cuda12-omnivoice-cpp"
intel: "intel-sycl-f16-omnivoice-cpp"
metal: "metal-omnivoice-cpp"
amd: "rocm-omnivoice-cpp"
vulkan: "vulkan-omnivoice-cpp"
nvidia-l4t: "nvidia-l4t-arm64-omnivoice-cpp"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-omnivoice-cpp"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-omnivoice-cpp"
- &vibevoicecpp
name: "vibevoice-cpp"
description: |
@@ -769,6 +999,42 @@
nvidia-l4t: "vulkan-localvqe"
nvidia-l4t-cuda-12: "vulkan-localvqe"
nvidia-l4t-cuda-13: "vulkan-localvqe"
- &privacyfilter
name: "privacy-filter"
alias: "privacy-filter"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
description: |
Standalone GGML engine (privacy-filter.cpp) for the OpenMed privacy-filter
PII/NER token-classification model family. It runs the openai-privacy-filter
architecture (a gpt-oss-style sparse-MoE bidirectional token classifier) on
stock upstream GGML — no llama.cpp coupling and no Python — and serves the
TokenClassify RPC (constrained BIOES Viterbi decode into UTF-8 byte-offset
entity spans) used by LocalAI's NER PII redaction tier.
urls:
- https://github.com/localai-org/privacy-filter.cpp
tags:
- token-classification
- ner
- pii
- privacy
- CPU
- GPU
license: apache-2.0
# Builds: CPU (amd64+arm64 manifest), Vulkan (amd64) and CUDA 13 (amd64).
# Only a host that actually reports CUDA 13 gets the CUDA image (it bundles
# the CUDA 13 runtime and needs a recent driver); every other GPU — including
# NVIDIA without a CUDA-13 toolkit, AMD and Intel — routes to the Vulkan
# image, which only needs a Vulkan ICD. Everything else (incl. arm64/Jetson,
# where Vulkan/CUDA images are a future add) falls back to the CPU build,
# already fast for this ~50M-active-param model.
capabilities:
default: "cpu-privacy-filter"
nvidia: "vulkan-privacy-filter"
nvidia-cuda-12: "vulkan-privacy-filter"
nvidia-cuda-13: "cuda13-privacy-filter"
amd: "vulkan-privacy-filter"
intel: "vulkan-privacy-filter"
vulkan: "vulkan-privacy-filter"
- &faster-whisper
icon: https://avatars.githubusercontent.com/u/1520500?s=200&v=4
description: |
@@ -854,7 +1120,7 @@
metal: "metal-kokoro"
nvidia-cuda-13: "cuda13-kokoro"
nvidia-cuda-12: "cuda12-kokoro"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-kokoro"
nvidia-l4t-cuda-12: "nvidia-l4t-kokoro"
- &kokoros
icon: https://avatars.githubusercontent.com/u/166769057?v=4
description: |
@@ -897,7 +1163,6 @@
intel: "intel-coqui"
amd: "rocm-coqui"
metal: "metal-coqui"
nvidia-cuda-13: "cuda13-coqui"
nvidia-cuda-12: "cuda12-coqui"
icon: https://avatars.githubusercontent.com/u/1338804?s=200&v=4
- &outetts
@@ -1147,19 +1412,19 @@
icon: https://avatars.githubusercontent.com/u/151010778?s=200&v=4
- &piper
name: "piper"
uri: "quay.io/go-skynet/local-ai-backends:latest-piper"
icon: https://github.com/OHF-Voice/piper1-gpl/raw/main/etc/logo.png
urls:
- https://github.com/rhasspy/piper
- https://github.com/mudler/go-piper
mirrors:
- localai/localai-backends:latest-piper
license: MIT
description: |
A fast, local neural text to speech system
tags:
- text-to-speech
- TTS
capabilities:
default: "cpu-piper"
metal: "metal-piper"
- &opus
name: "opus"
alias: "opus"
@@ -1184,12 +1449,12 @@
metal: "metal-opus-development"
- &silero-vad
name: "silero-vad"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-silero-vad"
icon: https://user-images.githubusercontent.com/12515440/89997349-b3523080-dc94-11ea-9906-ca2e8bc50535.png
urls:
- https://github.com/snakers4/silero-vad
mirrors:
- localai/localai-backends:latest-cpu-silero-vad
capabilities:
default: "cpu-silero-vad"
metal: "metal-silero-vad"
description: |
Silero VAD: pre-trained enterprise-grade Voice Activity Detector.
Silero VAD is a voice activity detection model that can be used to detect whether a given audio contains speech or not.
@@ -1200,9 +1465,6 @@
- CPU
- &local-store
name: "local-store"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-local-store"
mirrors:
- localai/localai-backends:latest-cpu-local-store
urls:
- https://github.com/mudler/LocalAI
description: |
@@ -1213,11 +1475,11 @@
- open-source
- CPU
license: MIT
capabilities:
default: "cpu-local-store"
metal: "metal-local-store"
- &kitten-tts
name: "kitten-tts"
uri: "quay.io/go-skynet/local-ai-backends:latest-kitten-tts"
mirrors:
- localai/localai-backends:latest-kitten-tts
urls:
- https://github.com/KittenML/KittenTTS
description: |
@@ -1226,6 +1488,9 @@
- text-to-speech
- TTS
license: apache-2.0
capabilities:
default: "cpu-kitten-tts"
metal: "metal-kitten-tts"
- &neutts
name: "neutts"
urls:
@@ -1259,6 +1524,20 @@
nvidia: "cuda12-sherpa-onnx"
nvidia-cuda-12: "cuda12-sherpa-onnx"
metal: "metal-sherpa-onnx"
- &supertonic
name: "supertonic"
alias: "supertonic"
urls:
- https://github.com/supertone-inc/supertonic
description: |
Supertonic backend: lightning-fast, on-device multilingual text-to-speech via ONNX Runtime.
Runs Supertone's flow-matching TTS model (Supertone/supertonic-3), 44.1kHz output, 31 languages,
multiple preset voice styles. No espeak-ng dependency.
tags:
- text-to-speech
- TTS
capabilities:
default: "cpu-supertonic"
- !!merge <<: *neutts
name: "neutts-development"
capabilities:
@@ -1351,25 +1630,89 @@
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-neutts
- !!merge <<: *mlx
name: "mlx-development"
name: "metal-mlx"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx
- !!merge <<: *mlx
name: "metal-mlx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-mlx
- !!merge <<: *mlx
name: "mlx-development"
capabilities:
default: "cpu-mlx-development"
nvidia: "cuda12-mlx-development"
metal: "metal-mlx-development"
nvidia-cuda-12: "cuda12-mlx-development"
nvidia-cuda-13: "cuda13-mlx-development"
nvidia-l4t: "nvidia-l4t-mlx-development"
nvidia-l4t-cuda-12: "nvidia-l4t-mlx-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-development"
- !!merge <<: *mlx-vlm
name: "mlx-vlm-development"
name: "metal-mlx-vlm"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-vlm"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx-vlm
- !!merge <<: *mlx-vlm
name: "metal-mlx-vlm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-vlm"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-mlx-vlm
- !!merge <<: *mlx-vlm
name: "mlx-vlm-development"
capabilities:
default: "cpu-mlx-vlm-development"
nvidia: "cuda12-mlx-vlm-development"
metal: "metal-mlx-vlm-development"
nvidia-cuda-12: "cuda12-mlx-vlm-development"
nvidia-cuda-13: "cuda13-mlx-vlm-development"
nvidia-l4t: "nvidia-l4t-mlx-vlm-development"
nvidia-l4t-cuda-12: "nvidia-l4t-mlx-vlm-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-vlm-development"
- !!merge <<: *mlx-audio
name: "mlx-audio-development"
name: "metal-mlx-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-audio"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx-audio
- !!merge <<: *mlx-audio
name: "metal-mlx-audio-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-audio"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-mlx-audio
- !!merge <<: *mlx-audio
name: "mlx-audio-development"
capabilities:
default: "cpu-mlx-audio-development"
nvidia: "cuda12-mlx-audio-development"
metal: "metal-mlx-audio-development"
nvidia-cuda-12: "cuda12-mlx-audio-development"
nvidia-cuda-13: "cuda13-mlx-audio-development"
nvidia-l4t: "nvidia-l4t-mlx-audio-development"
nvidia-l4t-cuda-12: "nvidia-l4t-mlx-audio-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-audio-development"
- !!merge <<: *mlx-distributed
name: "mlx-distributed-development"
name: "metal-mlx-distributed"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-distributed"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-mlx-distributed
- !!merge <<: *mlx-distributed
name: "metal-mlx-distributed-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-distributed"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-mlx-distributed
- !!merge <<: *mlx-distributed
name: "mlx-distributed-development"
capabilities:
default: "cpu-mlx-distributed-development"
nvidia: "cuda12-mlx-distributed-development"
metal: "metal-mlx-distributed-development"
nvidia-cuda-12: "cuda12-mlx-distributed-development"
nvidia-cuda-13: "cuda13-mlx-distributed-development"
nvidia-l4t: "nvidia-l4t-mlx-distributed-development"
nvidia-l4t-cuda-12: "nvidia-l4t-mlx-distributed-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-distributed-development"
## mlx
- !!merge <<: *mlx
name: "cpu-mlx"
@@ -1575,10 +1918,20 @@
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-distributed
- !!merge <<: *kitten-tts
name: "kitten-tts-development"
name: "cpu-kitten-tts"
uri: "quay.io/go-skynet/local-ai-backends:latest-kitten-tts"
mirrors:
- localai/localai-backends:latest-kitten-tts
- !!merge <<: *kitten-tts
name: "cpu-kitten-tts-development"
uri: "quay.io/go-skynet/local-ai-backends:master-kitten-tts"
mirrors:
- localai/localai-backends:master-kitten-tts
- !!merge <<: *kitten-tts
name: "kitten-tts-development"
capabilities:
default: "cpu-kitten-tts-development"
metal: "metal-kitten-tts-development"
- !!merge <<: *kitten-tts
name: "metal-kitten-tts"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-kitten-tts"
@@ -1590,11 +1943,23 @@
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-kitten-tts
- !!merge <<: *local-store
name: "local-store-development"
name: "cpu-local-store"
alias: "local-store"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-local-store"
mirrors:
- localai/localai-backends:latest-cpu-local-store
- !!merge <<: *local-store
name: "cpu-local-store-development"
alias: "local-store"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-local-store"
mirrors:
- localai/localai-backends:master-cpu-local-store
- !!merge <<: *local-store
name: "local-store-development"
alias: "local-store"
capabilities:
default: "cpu-local-store-development"
metal: "metal-local-store-development"
- !!merge <<: *local-store
name: "metal-local-store"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-local-store"
@@ -1627,10 +1992,20 @@
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-opus
- !!merge <<: *silero-vad
name: "silero-vad-development"
name: "cpu-silero-vad"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-silero-vad"
mirrors:
- localai/localai-backends:latest-cpu-silero-vad
- !!merge <<: *silero-vad
name: "cpu-silero-vad-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-silero-vad"
mirrors:
- localai/localai-backends:master-cpu-silero-vad
- !!merge <<: *silero-vad
name: "silero-vad-development"
capabilities:
default: "cpu-silero-vad-development"
metal: "metal-silero-vad-development"
- !!merge <<: *silero-vad
name: "metal-silero-vad"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-silero-vad"
@@ -1642,10 +2017,20 @@
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-silero-vad
- !!merge <<: *piper
name: "piper-development"
name: "cpu-piper"
uri: "quay.io/go-skynet/local-ai-backends:latest-piper"
mirrors:
- localai/localai-backends:latest-piper
- !!merge <<: *piper
name: "cpu-piper-development"
uri: "quay.io/go-skynet/local-ai-backends:master-piper"
mirrors:
- localai/localai-backends:master-piper
- !!merge <<: *piper
name: "piper-development"
capabilities:
default: "cpu-piper-development"
metal: "metal-piper-development"
- !!merge <<: *piper
name: "metal-piper"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-piper"
@@ -2354,6 +2739,37 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-stablediffusion-ggml"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-stablediffusion-ggml
## privacy-filter
- !!merge <<: *privacyfilter
name: "cpu-privacy-filter"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-privacy-filter"
mirrors:
- localai/localai-backends:latest-cpu-privacy-filter
- !!merge <<: *privacyfilter
name: "cpu-privacy-filter-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-privacy-filter"
mirrors:
- localai/localai-backends:master-cpu-privacy-filter
- !!merge <<: *privacyfilter
name: "vulkan-privacy-filter"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-privacy-filter"
mirrors:
- localai/localai-backends:latest-gpu-vulkan-privacy-filter
- !!merge <<: *privacyfilter
name: "vulkan-privacy-filter-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-privacy-filter"
mirrors:
- localai/localai-backends:master-gpu-vulkan-privacy-filter
- !!merge <<: *privacyfilter
name: "cuda13-privacy-filter"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-privacy-filter"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-privacy-filter
- !!merge <<: *privacyfilter
name: "cuda13-privacy-filter-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-privacy-filter"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-privacy-filter
# vllm
- !!merge <<: *vllm
name: "vllm-development"
@@ -3288,6 +3704,121 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-qwen3-tts-cpp"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-qwen3-tts-cpp
## omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "omnivoice-cpp-development"
capabilities:
default: "cpu-omnivoice-cpp-development"
nvidia: "cuda12-omnivoice-cpp-development"
nvidia-cuda-13: "cuda13-omnivoice-cpp-development"
nvidia-cuda-12: "cuda12-omnivoice-cpp-development"
intel: "intel-sycl-f16-omnivoice-cpp-development"
metal: "metal-omnivoice-cpp-development"
amd: "rocm-omnivoice-cpp-development"
vulkan: "vulkan-omnivoice-cpp-development"
nvidia-l4t: "nvidia-l4t-arm64-omnivoice-cpp-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-omnivoice-cpp-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-omnivoice-cpp-development"
- !!merge <<: *omnivoicecpp
name: "nvidia-l4t-arm64-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-arm64-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "nvidia-l4t-arm64-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-nvidia-l4t-arm64-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cuda13-nvidia-l4t-arm64-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cuda13-nvidia-l4t-arm64-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cpu-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-cpu-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "metal-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "metal-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cpu-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-cpu-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cuda12-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "rocm-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "intel-sycl-f32-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f32-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "intel-sycl-f16-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f16-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "vulkan-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-gpu-vulkan-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "vulkan-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-gpu-vulkan-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cuda12-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "rocm-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "intel-sycl-f32-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f32-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "intel-sycl-f16-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f16-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cuda13-omnivoice-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-omnivoice-cpp"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-omnivoice-cpp
- !!merge <<: *omnivoicecpp
name: "cuda13-omnivoice-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-omnivoice-cpp"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-omnivoice-cpp
## vibevoice-cpp
- !!merge <<: *vibevoicecpp
name: "nvidia-l4t-arm64-vibevoice-cpp"
@@ -4618,24 +5149,24 @@
- localai/localai-backends:master-cpu-trl
- !!merge <<: *trl
name: "cuda12-trl"
uri: "quay.io/go-skynet/local-ai-backends:latest-cublas-cuda12-trl"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-trl"
mirrors:
- localai/localai-backends:latest-cublas-cuda12-trl
- localai/localai-backends:latest-gpu-nvidia-cuda-12-trl
- !!merge <<: *trl
name: "cuda12-trl-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cublas-cuda12-trl"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-trl"
mirrors:
- localai/localai-backends:master-cublas-cuda12-trl
- localai/localai-backends:master-gpu-nvidia-cuda-12-trl
- !!merge <<: *trl
name: "cuda13-trl"
uri: "quay.io/go-skynet/local-ai-backends:latest-cublas-cuda13-trl"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-trl"
mirrors:
- localai/localai-backends:latest-cublas-cuda13-trl
- localai/localai-backends:latest-gpu-nvidia-cuda-13-trl
- !!merge <<: *trl
name: "cuda13-trl-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cublas-cuda13-trl"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-trl"
mirrors:
- localai/localai-backends:master-cublas-cuda13-trl
- localai/localai-backends:master-gpu-nvidia-cuda-13-trl
## llama.cpp quantization backend
- &llama-cpp-quantization
name: "llama-cpp-quantization"
@@ -4802,3 +5333,18 @@
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-sherpa-onnx"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-sherpa-onnx
## supertonic
- !!merge <<: *supertonic
name: "supertonic-development"
capabilities:
default: "cpu-supertonic-development"
- !!merge <<: *supertonic
name: "cpu-supertonic"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-supertonic"
mirrors:
- localai/localai-backends:latest-cpu-supertonic
- !!merge <<: *supertonic
name: "cpu-supertonic-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-supertonic"
mirrors:
- localai/localai-backends:master-cpu-supertonic

View File

@@ -270,10 +270,17 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def TokenClassify(self, request, context):
# Runs HuggingFace's token-classification pipeline and returns
# the aggregated entity spans. The pipeline gives us byte
# offsets via aggregation_strategy="simple" (set at load
# time), so the caller can slice the original text without
# re-tokenising on the Go side.
# the aggregated entity spans.
#
# OFFSET UNITS: the proto contract (TokenClassifyEntity.start/end)
# is UTF-8 BYTE offsets into request.text. HuggingFace's pipeline,
# however, reports start/end as CODEPOINT offsets into the Python
# str (derived from the fast tokenizer's offset_mapping). Those
# coincide only for ASCII; for any multi-byte character they
# diverge — and this entry point exists to serve the explicitly
# multilingual privacy-filter model, so the conversion is
# mandatory, not a nicety. We build one prefix table mapping each
# codepoint index to its byte offset and translate every span.
if not getattr(self, "TokenClassification", False):
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
context.set_details("model was not loaded as Type=TokenClassification")
@@ -286,18 +293,50 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
context.set_details(f"token-classification failed: {err}")
return backend_pb2.TokenClassifyResponse()
text = request.text
# byte_at[i] = byte length of text[:i]; len == len(text)+1 so an
# exclusive end offset that points one past the last codepoint
# maps to len(text.encode("utf-8")). Built in a single O(n) pass.
byte_at = [0] * (len(text) + 1)
acc = 0
for i, ch in enumerate(text):
byte_at[i] = acc
acc += len(ch.encode("utf-8"))
byte_at[len(text)] = acc
def to_byte(cp_index, default):
# Clamp out-of-range codepoint indices into the table rather
# than throwing: a span we can't place is better dropped Go-side
# than crashing the RPC.
if cp_index is None:
cp_index = default
if cp_index < 0:
cp_index = 0
elif cp_index > len(text):
cp_index = len(text)
return byte_at[cp_index]
threshold = request.threshold if request.threshold > 0 else 0.0
entities = []
for r in results:
score = float(r.get("score", 0.0))
if score < threshold:
continue
cp_start = r.get("start")
cp_end = r.get("end")
start = to_byte(cp_start, 0)
end = to_byte(cp_end, 0)
entities.append(backend_pb2.TokenClassifyEntity(
entity_group=str(r.get("entity_group") or r.get("entity") or ""),
start=int(r.get("start", 0)),
end=int(r.get("end", 0)),
start=start,
end=end,
score=score,
text=str(r.get("word", "")),
# Slice the original text by the (codepoint) span so the
# echoed text matches start..end exactly, instead of the
# pipeline's reconstructed "word" which can carry wordpiece
# artifacts. Falls back to "word" when offsets are absent.
text=(text[cp_start:cp_end] if cp_start is not None and cp_end is not None
else str(r.get("word", ""))),
))
return backend_pb2.TokenClassifyResponse(entities=entities)

View File

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/cpu
accelerate
torch==2.12.0+cpu
torch==2.9.1+cpu
torchvision
torchaudio
transformers

View File

@@ -3,5 +3,5 @@
# on a cu130 host. Pull the cu130-flavoured wheel from vLLM's per-tag index
# instead — the cublas13 case in install.sh adds --index-strategy=unsafe-best-match
# so uv consults this index alongside PyPI.
--extra-index-url https://wheels.vllm.ai/0.22.1/cu130
vllm==0.22.1
--extra-index-url https://wheels.vllm.ai/0.23.0/cu130
vllm==0.23.0

View File

@@ -1,4 +1,4 @@
grpcio==1.81.0
grpcio==1.81.1
protobuf
certifi
setuptools

View File

@@ -79,6 +79,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def AudioTranscription(self, request, context):
import whisperx
from whisperx.diarize import DiarizationPipeline
resultSegments = []
text = ""
@@ -106,8 +107,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
# Diarize if requested and HF token is available
if request.diarize and self.hf_token:
if self.diarize_pipeline is None:
self.diarize_pipeline = whisperx.DiarizationPipeline(
use_auth_token=self.hf_token,
self.diarize_pipeline = DiarizationPipeline(
token=self.hf_token,
device=self.device,
)
diarize_segments = self.diarize_pipeline(audio)

View File

@@ -635,8 +635,11 @@ func (l *Launcher) showDownloadProgress(version, title string) {
progressBar := widget.NewProgressBar()
progressBar.SetValue(0)
// Status label
// Status label. Truncate with an ellipsis so a long "Download failed:
// <url>" message can't stretch the window (and progress bar) to fit the
// whole error on one line; the full error is shown in the dialog below.
statusLabel := widget.NewLabel("Preparing download...")
statusLabel.Truncation = fyne.TextTruncateEllipsis
// Release notes button
releaseNotesButton := widget.NewButton("View Release Notes", func() {

View File

@@ -454,8 +454,11 @@ func (sm *SystrayManager) showDownloadProgress(version string) {
progressBar := widget.NewProgressBar()
progressBar.SetValue(0)
// Status label
// Status label. Truncate with an ellipsis so a long "Download failed:
// <url>" message can't stretch the window (and progress bar) to fit the
// whole error on one line; the full error is shown in the dialog below.
statusLabel := widget.NewLabel("Preparing download...")
statusLabel.Truncation = fyne.TextTruncateEllipsis
// Release notes button
releaseNotesButton := widget.NewButton("View Release Notes", func() {

View File

@@ -57,8 +57,16 @@ type LauncherUI struct {
// NewLauncherUI creates a new UI instance
func NewLauncherUI() *LauncherUI {
// Truncate the status text with an ellipsis. Status messages can carry a
// download error containing a long, unbreakable URL/path; without this the
// label demands the full single-line width and stretches the window (and
// the progress bar) arbitrarily wide. The full error is still shown in the
// error dialog.
statusLabel := widget.NewLabel("Initializing...")
statusLabel.Truncation = fyne.TextTruncateEllipsis
return &LauncherUI{
statusLabel: widget.NewLabel("Initializing..."),
statusLabel: statusLabel,
versionLabel: widget.NewLabel("Version: Unknown"),
startStopButton: widget.NewButton("Start LocalAI", nil),
webUIButton: widget.NewButton("Open WebUI", nil),
@@ -602,8 +610,11 @@ func (ui *LauncherUI) showDownloadProgress(version, title string) {
progressBar := widget.NewProgressBar()
progressBar.SetValue(0)
// Status label
// Status label. Truncate with an ellipsis so a long "Download failed:
// <url>" message can't stretch the window (and progress bar) to fit the
// whole error on one line; the full error is shown in the dialog below.
statusLabel := widget.NewLabel("Preparing download...")
statusLabel.Truncation = fyne.TextTruncateEllipsis
// Release notes button
releaseNotesButton := widget.NewButton("View Release Notes", func() {

View File

@@ -12,14 +12,15 @@ import (
"github.com/mudler/LocalAI/core/http/auth"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/services/agentpool"
"github.com/mudler/LocalAI/core/services/cloudproxy/mitm"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/monitoring"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/routing/admission"
"github.com/mudler/LocalAI/core/services/routing/billing"
"github.com/mudler/LocalAI/core/services/cloudproxy/mitm"
"github.com/mudler/LocalAI/core/services/routing/pii"
"github.com/mudler/LocalAI/core/services/routing/piidetector"
"github.com/mudler/LocalAI/core/services/routing/router"
"github.com/mudler/LocalAI/core/services/voicerecognition"
"github.com/mudler/LocalAI/core/templates"
@@ -71,15 +72,15 @@ type Application struct {
// 1-to-1 host↔model invariant the dispatcher relies on. Read by
// /api/middleware/status so the admin UI can surface the cause.
mitmHostConflicts atomic.Pointer[map[string][]string]
routerDecisions router.DecisionStore
routerRegistry *router.Registry
admissionLimiter *admission.Limiter
watchdogMutex sync.Mutex
watchdogStop chan bool
p2pMutex sync.Mutex
p2pCtx context.Context
p2pCancel context.CancelFunc
agentJobMutex sync.Mutex
routerDecisions router.DecisionStore
routerRegistry *router.Registry
admissionLimiter *admission.Limiter
watchdogMutex sync.Mutex
watchdogStop chan bool
p2pMutex sync.Mutex
p2pCtx context.Context
p2pCancel context.CancelFunc
agentJobMutex sync.Mutex
// Distributed mode services (nil when not in distributed mode)
distributed *DistributedServices
@@ -254,6 +255,122 @@ func (a *Application) PIIEvents() pii.EventStore {
return a.piiEvents
}
// PIINERResolver returns the resolver the chat PII middleware uses to
// turn a configured detector model name into a ready-to-use NERConfig:
// a token-classifier bound over the shared model loader (lazy — the
// model loads on first Detect) plus the detection policy read from that
// model's own pii_detection block. Unknown names resolve to (zero,
// false) so the middleware fails closed. Pass it via pii.WithNERResolver.
func (a *Application) PIINERResolver() pii.NERDetectorResolver {
return func(modelName string) (pii.NERConfig, bool) {
if modelName == "" {
return pii.NERConfig{}, false
}
cfg, ok := a.ModelConfigLoader().GetModelConfig(modelName)
if !ok {
return pii.NERConfig{}, false
}
// Pattern detectors match secrets with the restricted-regex tier
// in-process (no backend load). Build a pattern matcher instead of the
// gRPC token-classifier; on a compile error fail closed with an error
// detector so the request is blocked, not silently unscanned.
if cfg.IsPatternDetector() {
det, err := piidetector.NewPattern(cfg, a.ApplicationConfig())
if err != nil {
det = pii.NewErrNERDetector(err.Error())
}
return pii.NERConfigFromRaw(
det,
0, // patterns are deterministic — no confidence floor
cfg.PIIDetectionDefaultAction(),
patternEntityActions(cfg),
pii.SourcePattern,
), true
}
det := piidetector.New(a.ModelLoader(), cfg, a.ApplicationConfig())
return pii.NERConfigFromRaw(
det,
cfg.PIIDetectionMinScore(),
cfg.PIIDetectionDefaultAction(),
cfg.PIIDetectionEntityActions(),
pii.SourceNER,
), true
}
}
// patternEntityActions merges a pattern detector's per-pattern Action overrides
// into its entity_actions map. A pattern reports matches under its Name, so a
// per-pattern action is just an entity_actions[Name] entry; explicit
// entity_actions still win if both are set.
func patternEntityActions(cfg config.ModelConfig) map[string]string {
out := cfg.PIIDetectionEntityActions()
for _, p := range cfg.PIIDetection.Patterns {
if p.Action == "" || p.Name == "" {
continue
}
if out == nil {
out = map[string]string{}
}
if _, exists := out[p.Name]; !exists {
out[p.Name] = p.Action
}
}
return out
}
// ResolvePIIPolicy resolves the effective request-side PII policy for a
// consuming model, layering the instance-wide default detector
// (PIIDefaultDetectors, set via POST /api/settings) on top of the per-model
// config. It is the single decision point shared by the chat middleware (via
// WithPolicyResolver) and the MITM listener so both agree.
//
// - enabled: an explicit pii.enabled on the model always wins (true OR
// false). Otherwise PII is on when the backend defaults it on — today
// that means cloud-proxy models, which cross the network to a third party.
// - detectors: the model's own pii.detectors, or — when it lists none — the
// global PIIDefaultDetectors fallback. This is what makes cloud-proxy/MITM
// redaction work out of the box.
//
// appConfig is read live, so changes via the settings API take effect on the
// next request without a restart.
func (a *Application) ResolvePIIPolicy(cfg *config.ModelConfig) (enabled bool, detectors []string) {
if cfg == nil {
return false, nil
}
appCfg := a.ApplicationConfig()
if cfg.PII.Enabled != nil {
enabled = *cfg.PII.Enabled
} else {
enabled = cfg.PIIIsEnabled() // backend default (cloud-proxy)
}
if !enabled {
return false, nil
}
detectors = cfg.PIIDetectors()
if len(detectors) == 0 {
detectors = append([]string(nil), appCfg.PIIDefaultDetectors...)
}
return enabled, detectors
}
// PIIPolicyResolver adapts ResolvePIIPolicy to pii.PolicyResolver for
// pii.WithPolicyResolver. The middleware carries the resolved model config as
// `any` (the MODEL_CONFIG context value, a *config.ModelConfig); this asserts
// it back and applies the instance-wide defaults.
func (a *Application) PIIPolicyResolver() pii.PolicyResolver {
return func(modelCfg any) (bool, []string) {
cfg, ok := modelCfg.(*config.ModelConfig)
if !ok {
return false, nil
}
return a.ResolvePIIPolicy(cfg)
}
}
// MITMCA returns the cloudproxy MITM proxy's CA, or nil when the
// MITM listener is disabled.
func (a *Application) MITMCA() *mitm.CA { return a.mitmCA.Load() }

View File

@@ -161,6 +161,21 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB, configLoade
}
xlog.Info("Node registry initialized")
// Seed declarative per-model scheduling config (LOCALAI_MODEL_SCHEDULING /
// LOCALAI_MODEL_SCHEDULING_CONFIG). Authoritative: overwrites matching models
// on every boot. Runs before the reconciler starts so the first tick already
// sees the desired state. Models not listed are left untouched.
if cfg.Distributed.ModelSchedulingJSON != "" || cfg.Distributed.ModelSchedulingConfigPath != "" {
schedConfigs, err := nodes.ParseSchedulingSeed(cfg.Distributed.ModelSchedulingJSON, cfg.Distributed.ModelSchedulingConfigPath)
if err != nil {
return nil, fmt.Errorf("parsing declarative model scheduling config: %w", err)
}
if err := registry.SeedModelScheduling(context.Background(), schedConfigs); err != nil {
return nil, fmt.Errorf("seeding declarative model scheduling config: %w", err)
}
xlog.Info("Applied declarative model scheduling config", "models", len(schedConfigs))
}
// Collect SmartRouter option values; the router itself is created after all
// dependencies (including FileStager and Unloader) are ready.
var routerAuthToken string

View File

@@ -8,6 +8,7 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/services/cloudproxy/mitm"
"github.com/mudler/LocalAI/core/services/routing/pii"
"github.com/mudler/xlog"
)
@@ -91,25 +92,41 @@ func startMITMLocked(app *Application, options *config.ApplicationConfig) error
}
sort.Strings(effectiveHosts)
// Per-host PII gate inherits from the owning model's pii.enabled.
// A non-cloud-proxy backend with no explicit pii.enabled resolves
// to false → host is intercepted but the regex pass is skipped
// (audit events still record).
var piiDisabled []string
// Per-host NER detectors come from the owning model's pii.detectors
// (resolved against each detector model's pii_detection policy). A
// host whose model has pii.enabled=false, lists no detectors, or
// whose detectors can't be resolved gets no entry → it is intercepted
// and forwarded unredacted (audit events still record traffic). An
// unresolvable detector is recorded as an error-detector so the
// request fails closed at request time rather than leaking.
resolver := app.PIINERResolver()
detectorsByHost := map[string][]pii.NERConfig{}
for host, modelName := range ownership.Owners {
cfg, exists := app.backendLoader.GetModelConfig(modelName)
if !exists {
continue
}
if !cfg.PIIIsEnabled() {
piiDisabled = append(piiDisabled, host)
// Resolve through the shared policy so cloud-proxy hosts inherit the
// instance-wide default detector when they name none of their own.
enabled, detectors := app.ResolvePIIPolicy(&cfg)
if !enabled || len(detectors) == 0 {
continue
}
cfgs := make([]pii.NERConfig, 0, len(detectors))
for _, name := range detectors {
nc, ok := resolver(name)
if !ok {
xlog.Error("mitm: detector model not resolvable; requests to host will fail closed", "host", host, "detector", name)
nc = pii.NERConfig{Detector: pii.NewErrNERDetector("detector model '" + name + "' not resolvable")}
}
cfgs = append(cfgs, nc)
}
detectorsByHost[host] = cfgs
}
handler := mitm.NewPIIHandler(mitm.PIIHandlerOptions{
Redactor: app.piiRedactor,
EventStore: app.piiEvents,
HostsWithPIIDisabled: piiDisabled,
EventStore: app.piiEvents,
DetectorsByHost: detectorsByHost,
})
srv, err := mitm.NewServer(mitm.Config{
@@ -132,7 +149,7 @@ func startMITMLocked(app *Application, options *config.ApplicationConfig) error
"ca_dir", caDir,
"intercept_hosts", effectiveHosts,
"model_owned_hosts", len(ownership.Owners),
"pii_disabled_hosts", len(piiDisabled),
"pii_detector_hosts", len(detectorsByHost),
)
return nil
}

View File

@@ -0,0 +1,51 @@
package application
import (
"github.com/mudler/LocalAI/core/config"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("ResolvePIIPolicy", func() {
chat := config.FLAG_CHAT
bp := func(b bool) *bool { return &b }
mk := func(c *config.ApplicationConfig) *Application {
return &Application{applicationConfig: c}
}
It("lets an explicit pii.enabled=false win over the global default detector", func() {
app := mk(&config.ApplicationConfig{PIIDefaultDetectors: []string{"pf"}})
cfg := &config.ModelConfig{Backend: "cloud-proxy", KnownUsecases: &chat}
cfg.PII.Enabled = bp(false)
enabled, dets := app.ResolvePIIPolicy(cfg)
Expect(enabled).To(BeFalse())
Expect(dets).To(BeNil())
})
It("enables a cloud-proxy model with the global default detector (closes the no-op gap)", func() {
// cloud-proxy defaults PIIIsEnabled()==true but lists no detectors, so
// without a global default it scans with nothing.
app := mk(&config.ApplicationConfig{PIIDefaultDetectors: []string{"pf"}})
cfg := &config.ModelConfig{Backend: "cloud-proxy"}
enabled, dets := app.ResolvePIIPolicy(cfg)
Expect(enabled).To(BeTrue())
Expect(dets).To(Equal([]string{"pf"}))
})
It("leaves a non-cloud model off by default (no instance usecase default-on)", func() {
app := mk(&config.ApplicationConfig{PIIDefaultDetectors: []string{"pf"}})
cfg := &config.ModelConfig{Backend: "llama-cpp", KnownUsecases: &chat}
enabled, _ := app.ResolvePIIPolicy(cfg)
Expect(enabled).To(BeFalse())
})
It("prefers the model's own detectors over the global default", func() {
app := mk(&config.ApplicationConfig{PIIDefaultDetectors: []string{"global-pf"}})
cfg := &config.ModelConfig{Backend: "cloud-proxy"}
cfg.PII.Detectors = []string{"own-pf"}
enabled, dets := app.ResolvePIIPolicy(cfg)
Expect(enabled).To(BeTrue())
Expect(dets).To(Equal([]string{"own-pf"}))
})
})

View File

@@ -53,7 +53,6 @@ func New(opts ...config.AppOption) (*Application, error) {
caps, err := xsysinfo.CPUCapabilities()
if err == nil {
xlog.Debug("CPU capabilities", "capabilities", caps)
}
gpus, err := xsysinfo.GPUs()
if err == nil {
@@ -68,18 +67,18 @@ func New(opts ...config.AppOption) (*Application, error) {
return nil, fmt.Errorf("models path cannot be empty")
}
err = os.MkdirAll(options.SystemState.Model.ModelsPath, 0750)
err = os.MkdirAll(options.SystemState.Model.ModelsPath, 0o750)
if err != nil {
return nil, fmt.Errorf("unable to create ModelPath: %q", err)
}
if options.GeneratedContentDir != "" {
err := os.MkdirAll(options.GeneratedContentDir, 0750)
err := os.MkdirAll(options.GeneratedContentDir, 0o750)
if err != nil {
return nil, fmt.Errorf("unable to create ImageDir: %q", err)
}
}
if options.UploadDir != "" {
err := os.MkdirAll(options.UploadDir, 0750)
err := os.MkdirAll(options.UploadDir, 0o750)
if err != nil {
return nil, fmt.Errorf("unable to create UploadDir: %q", err)
}
@@ -87,7 +86,7 @@ func New(opts ...config.AppOption) (*Application, error) {
// Create and migrate data directory
if options.DataPath != "" {
if err := os.MkdirAll(options.DataPath, 0750); err != nil {
if err := os.MkdirAll(options.DataPath, 0o750); err != nil {
return nil, fmt.Errorf("unable to create DataPath: %q", err)
}
// Migrate data from DynamicConfigsDir to DataPath if needed
@@ -192,44 +191,14 @@ func New(opts ...config.AppOption) (*Application, error) {
xlog.Info("stats: disabled by --disable-stats")
}
// Wire the regex PII filter. Default-on: a single-user box gets
// the built-in pattern set the first time it starts, with email/
// phone/SSN/credit-card on mask and api_key_prefix on block. If
// the operator wants different actions, --pii-config points at a
// YAML file that overrides per-id; --disable-pii turns it off
// entirely.
if !options.DisablePII {
patterns, err := pii.LoadConfig(options.PIIConfigPath)
if err != nil {
return nil, fmt.Errorf("pii config: %w", err)
}
application.piiRedactor = pii.NewRedactor(patterns)
application.piiEvents = pii.NewMemoryEventStore(0)
// Apply persisted per-pattern overrides — admins toggling
// action/disabled via the UI and clicking "Save to disk" land
// here on the next start. Bad ids are warned and ignored so a
// stale entry doesn't block startup.
for id, ov := range options.PIIPatternOverrides {
if ov.Action != nil {
if err := application.piiRedactor.SetAction(id, pii.Action(*ov.Action)); err != nil {
xlog.Warn("pii: persisted override skipped", "pattern", id, "error", err)
continue
}
}
if ov.Disabled != nil {
if err := application.piiRedactor.SetDisabled(id, *ov.Disabled); err != nil {
xlog.Warn("pii: persisted disable skipped", "pattern", id, "error", err)
}
}
}
xlog.Info("pii: filter enabled",
"patterns", len(patterns),
"config_path", options.PIIConfigPath,
"persisted_overrides", len(options.PIIPatternOverrides),
)
} else {
xlog.Info("pii: disabled by --disable-pii")
}
// Wire the PII filter subsystem. The redactor is now a stateless
// handle — detection is driven by per-model NER detectors
// (pii.detectors → the detector model's pii_detection policy), run
// request-side by the chat middleware and the MITM input path. The
// regex tier was removed; redaction is opt-in per model via
// PIIIsEnabled(). The event store backs the /api/pii/events audit log.
application.piiRedactor = &pii.Redactor{}
application.piiEvents = pii.NewMemoryEventStore(0)
// Wire the routing decision log. Always-on when stats are enabled —
// the per-router admin page reads this as the live activity feed
@@ -517,7 +486,7 @@ func startWatcher(options *config.ApplicationConfig) {
if _, err := os.Stat(options.DynamicConfigsDir); err != nil {
if os.IsNotExist(err) {
// We try to create the directory if it does not exist and was specified
if err := os.MkdirAll(options.DynamicConfigsDir, 0700); err != nil {
if err := os.MkdirAll(options.DynamicConfigsDir, 0o700); err != nil {
xlog.Error("failed creating DynamicConfigsDir", "error", err)
}
} else {
@@ -764,16 +733,6 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) {
options.MITMListen = *settings.MITMListen
}
// PII pattern overrides — file is the only source; CLI flags don't
// reach into this map. Apply unconditionally when present; the
// redactor wiring below sees the result on first construction.
if settings.PIIPatternOverrides != nil {
options.PIIPatternOverrides = make(map[string]config.PIIPatternRuntimeOverride, len(*settings.PIIPatternOverrides))
for id, ov := range *settings.PIIPatternOverrides {
options.PIIPatternOverrides[id] = ov
}
}
// Backend upgrade flags
if settings.AutoUpgradeBackends != nil {
if !options.AutoUpgradeBackends {
@@ -924,7 +883,7 @@ func loadOrGenerateHMACSecret(path string) (string, error) {
}
secret := hex.EncodeToString(b)
if err := os.WriteFile(path, []byte(secret), 0600); err != nil {
if err := os.WriteFile(path, []byte(secret), 0o600); err != nil {
return "", fmt.Errorf("failed to persist HMAC secret: %w", err)
}

66
core/backend/depth.go Normal file
View File

@@ -0,0 +1,66 @@
package backend
import (
"context"
"fmt"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/trace"
"github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model"
)
// Depth runs depth estimation (Depth Anything 3) on the supplied image and
// returns the full DepthResponse: per-pixel metric depth + confidence + sky,
// camera pose (extrinsics/intrinsics), an optional 3D point cloud and any
// requested exports (glb/colmap). The include_* flags and exports mirror the
// DepthRequest proto so callers can ask for less work.
func Depth(
ctx context.Context,
in *proto.DepthRequest,
loader *model.ModelLoader,
appConfig *config.ApplicationConfig,
modelConfig config.ModelConfig,
) (*proto.DepthResponse, error) {
opts := ModelOptions(modelConfig, appConfig)
depthModel, err := loader.Load(opts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
if depthModel == nil {
return nil, fmt.Errorf("could not load depth model")
}
var startTime time.Time
if appConfig.EnableTracing {
trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems, appConfig.TracingMaxBodyBytes)
startTime = time.Now()
}
res, err := depthModel.Depth(ctx, in)
if appConfig.EnableTracing {
errStr := ""
if err != nil {
errStr = err.Error()
}
trace.RecordBackendTrace(trace.BackendTrace{
Timestamp: startTime,
Duration: time.Since(startTime),
Type: trace.BackendTraceDepth,
ModelName: modelConfig.Name,
Backend: modelConfig.Backend,
Summary: trace.TruncateString(in.GetSrc(), 200),
Error: errStr,
Data: map[string]any{
"exports": in.GetExports(),
},
})
}
return res, err
}

View File

@@ -368,6 +368,25 @@ func gRPCPredictOpts(c config.ModelConfig, modelPath string) *pb.PredictOptions
if c.ReasoningEffort != "" {
metadata["reasoning_effort"] = c.ReasoningEffort
}
// Client request metadata overrides the server-derived reasoning levers and
// reaches every backend through these standalone string keys (Python backends
// read them directly). The reserved blob key is server-owned and skipped.
for k, v := range c.RequestMetadata {
if k == "chat_template_kwargs" {
continue
}
metadata[k] = v
}
// Build the generic chat_template_kwargs blob (model config map + coerced
// metadata) for llama.cpp and write it LAST so a client cannot clobber it.
if blob := c.ResolveChatTemplateKwargs(metadata); len(blob) > 0 {
b, err := json.Marshal(blob)
if err != nil {
xlog.Warn("failed to marshal chat_template_kwargs", "error", err)
} else {
metadata["chat_template_kwargs"] = string(b)
}
}
pbOpts.Metadata = metadata
// Logprobs and TopLogprobs are set by the caller if provided

View File

@@ -161,3 +161,67 @@ var _ = Describe("grpcModelOpts NBatch", func() {
Expect(opts.ContextSize).To(BeEquivalentTo(4096), "n_batch must match the effective n_ctx the backend receives")
})
})
// Guards the generic chat_template_kwargs forwarding: the model config map plus any
// per-request metadata overrides are merged, coerced, and serialised into the
// backend metadata blob that llama.cpp reads. Client metadata also overrides the
// server-derived standalone enable_thinking key (cross-backend consistency).
var _ = Describe("gRPCPredictOpts chat_template_kwargs metadata", func() {
baseCfg := func() config.ModelConfig {
cfg := config.ModelConfig{}
cfg.SetDefaults()
return cfg
}
It("serialises the config map into the chat_template_kwargs blob", func() {
cfg := baseCfg()
cfg.ChatTemplateKwargs = map[string]any{"preserve_thinking": true}
opts := gRPCPredictOpts(cfg, "/tmp/models")
Expect(opts.Metadata).To(HaveKey("chat_template_kwargs"))
var blob map[string]any
Expect(json.Unmarshal([]byte(opts.Metadata["chat_template_kwargs"]), &blob)).To(Succeed())
Expect(blob).To(HaveKeyWithValue("preserve_thinking", true))
})
It("serialises reasoning_effort into the blob as a JSON string", func() {
cfg := baseCfg()
cfg.ReasoningEffort = "high"
opts := gRPCPredictOpts(cfg, "/tmp/models")
Expect(opts.Metadata).To(HaveKey("chat_template_kwargs"))
var blob map[string]any
Expect(json.Unmarshal([]byte(opts.Metadata["chat_template_kwargs"]), &blob)).To(Succeed())
// reasoning_effort must remain a string in the blob (jinja templates that
// key on the level read a string), unlike enable_thinking which is a bool.
Expect(blob["reasoning_effort"]).To(BeAssignableToTypeOf(""))
Expect(blob).To(HaveKeyWithValue("reasoning_effort", "high"))
})
It("lets client request metadata override the server-derived enable_thinking key", func() {
cfg := baseCfg()
disable := true
cfg.ReasoningConfig = reasoning.Config{DisableReasoning: &disable} // server: enable_thinking=false
cfg.RequestMetadata = map[string]string{"enable_thinking": "true"} // client overrides
opts := gRPCPredictOpts(cfg, "/tmp/models")
// standalone key (Python backends) reflects the client override
Expect(opts.Metadata).To(HaveKeyWithValue("enable_thinking", "true"))
// blob (llama.cpp) reflects it too, as a real bool
var blob map[string]any
Expect(json.Unmarshal([]byte(opts.Metadata["chat_template_kwargs"]), &blob)).To(Succeed())
Expect(blob).To(HaveKeyWithValue("enable_thinking", true))
})
It("does not let a client clobber the blob via a chat_template_kwargs metadata key", func() {
cfg := baseCfg()
cfg.ChatTemplateKwargs = map[string]any{"preserve_thinking": true}
cfg.RequestMetadata = map[string]string{"chat_template_kwargs": "{\"preserve_thinking\": false}"}
opts := gRPCPredictOpts(cfg, "/tmp/models")
var blob map[string]any
Expect(json.Unmarshal([]byte(opts.Metadata["chat_template_kwargs"]), &blob)).To(Succeed())
Expect(blob).To(HaveKeyWithValue("preserve_thinking", true))
})
It("omits the blob when there is nothing to forward", func() {
opts := gRPCPredictOpts(baseCfg(), "/tmp/models")
Expect(opts.Metadata).ToNot(HaveKey("chat_template_kwargs"))
})
})

View File

@@ -0,0 +1,150 @@
package backend
import (
"context"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/trace"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
model "github.com/mudler/LocalAI/pkg/model"
)
// TokenEntity is one detected span from a token-classification (NER)
// model. Mirrors pb.TokenClassifyEntity but keeps the proto type out of
// consumers. Start/End are BYTE offsets into the classified text,
// half-open (addressing text[Start:End]) — the proto contract. Group is
// the model's entity label (e.g. "private_person", "EMAIL").
type TokenEntity struct {
Group string `json:"group"`
Start int `json:"start"`
End int `json:"end"`
Score float32 `json:"score"`
Text string `json:"text"`
}
// TokenClassifyOptions controls a single TokenClassify request.
type TokenClassifyOptions struct {
// Threshold drops entities the backend scores below this value at
// the source. 0 returns everything the model emits; downstream
// callers (e.g. the PII redactor's MinScore) can still filter
// further once they know the per-request policy.
Threshold float32
}
// TokenClassifier runs a token-classification model over text and
// returns the detected entity spans. Implemented by NewTokenClassifier
// over a model-loaded backend; the PII redactor's encoder/NER tier
// consumes this via a pii.NERDetector adapter (see
// core/services/routing/piidetector).
type TokenClassifier interface {
TokenClassify(ctx context.Context, text string) ([]TokenEntity, error)
}
// NewTokenClassifier binds (loader, modelConfig, appConfig) into a
// TokenClassifier. The underlying backend is resolved lazily on the
// first call, mirroring NewScorer.
func NewTokenClassifier(loader *model.ModelLoader, modelConfig config.ModelConfig, appConfig *config.ApplicationConfig, opts TokenClassifyOptions) TokenClassifier {
return &modelTokenClassifier{loader: loader, modelConfig: modelConfig, appConfig: appConfig, opts: opts}
}
type modelTokenClassifier struct {
loader *model.ModelLoader
modelConfig config.ModelConfig
appConfig *config.ApplicationConfig
opts TokenClassifyOptions
}
func (m *modelTokenClassifier) TokenClassify(ctx context.Context, text string) ([]TokenEntity, error) {
fn, err := ModelTokenClassify(text, m.opts, m.loader, m.modelConfig, m.appConfig)
if err != nil {
return nil, err
}
return fn(ctx)
}
// ModelTokenClassify loads the backend for modelConfig and returns a
// closure that classifies `text`. Mirrors ModelScore: the closure is
// bound to the loaded model so a caller can reuse it within a request
// without re-resolving the backend.
//
// When tracing is enabled it records a BackendTraceTokenClassify row so the
// detector's output — every entity's group, byte range, confidence and the
// matched substring — shows in the Traces UI alongside the request it gated.
// This is the technical view for debugging false positives (e.g. a phone
// number scored as SSN); the persisted PIIEvent keeps only a hash.
func ModelTokenClassify(text string, opts TokenClassifyOptions, loader *model.ModelLoader, modelConfig config.ModelConfig, appConfig *config.ApplicationConfig) (func(ctx context.Context) ([]TokenEntity, error), error) {
modelOpts := ModelOptions(modelConfig, appConfig)
inferenceModel, err := loader.Load(modelOpts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
return func(ctx context.Context) ([]TokenEntity, error) {
var startTime time.Time
if appConfig.EnableTracing {
trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems, appConfig.TracingMaxBodyBytes)
startTime = time.Now()
}
resp, err := inferenceModel.TokenClassify(ctx, &pb.TokenClassifyRequest{
Text: text,
Threshold: opts.Threshold,
})
entities := tokenClassifyResponseToEntities(resp)
if appConfig.EnableTracing {
trace.RecordBackendTrace(tokenClassifyTrace(modelConfig, text, opts.Threshold, entities, startTime, err))
}
if err != nil {
return nil, err
}
return entities, nil
}, nil
}
// tokenClassifyTrace assembles the Traces-UI row for one NER call: the input
// preview, the threshold, and every detected entity (group, byte range,
// confidence, matched text). Split out from the closure so the Data assembly
// is unit-testable without a live backend.
func tokenClassifyTrace(modelConfig config.ModelConfig, text string, threshold float32, entities []TokenEntity, start time.Time, callErr error) trace.BackendTrace {
errStr := ""
if callErr != nil {
errStr = callErr.Error()
}
return trace.BackendTrace{
Timestamp: start,
Duration: time.Since(start),
Type: trace.BackendTraceTokenClassify,
ModelName: modelConfig.Name,
Backend: modelConfig.Backend,
Summary: trace.TruncateString(text, 200),
Error: errStr,
Data: map[string]any{
"input_chars": len(text),
"threshold": threshold,
"entities": entities,
},
}
}
// tokenClassifyResponseToEntities converts the wire-format response into
// the value type consumed by callers. Extracted so the conversion can be
// unit-tested without a real backend (see token_classify_test.go).
func tokenClassifyResponseToEntities(resp *pb.TokenClassifyResponse) []TokenEntity {
if resp == nil {
return nil
}
out := make([]TokenEntity, 0, len(resp.Entities))
for _, e := range resp.Entities {
if e == nil {
continue
}
out = append(out, TokenEntity{
Group: e.EntityGroup,
Start: int(e.Start),
End: int(e.End),
Score: e.Score,
Text: e.Text,
})
}
return out
}

View File

@@ -0,0 +1,61 @@
package backend
import (
"errors"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/trace"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("tokenClassifyResponseToEntities", func() {
It("returns nil for a nil response", func() {
Expect(tokenClassifyResponseToEntities(nil)).To(BeNil())
})
It("maps proto entities to TokenEntity, skipping nil rows", func() {
resp := &pb.TokenClassifyResponse{
Entities: []*pb.TokenClassifyEntity{
{EntityGroup: "private_person", Start: 3, End: 8, Score: 0.97, Text: "Alice"},
nil,
{EntityGroup: "EMAIL", Start: 20, End: 40, Score: 0.5, Text: "a@b.com"},
},
}
Expect(tokenClassifyResponseToEntities(resp)).To(Equal([]TokenEntity{
{Group: "private_person", Start: 3, End: 8, Score: 0.97, Text: "Alice"},
{Group: "EMAIL", Start: 20, End: 40, Score: 0.5, Text: "a@b.com"},
}))
})
It("returns an empty (non-nil) slice for a response with no entities", func() {
out := tokenClassifyResponseToEntities(&pb.TokenClassifyResponse{})
Expect(out).NotTo(BeNil())
Expect(out).To(BeEmpty())
})
})
var _ = Describe("tokenClassifyTrace", func() {
cfg := config.ModelConfig{Name: "privacy-filter", Backend: "privacy-filter"}
ents := []TokenEntity{{Group: "SSN", Start: 5, End: 16, Score: 0.62, Text: "123-45-6789"}}
It("captures model, input preview, threshold and per-entity detail", func() {
tr := tokenClassifyTrace(cfg, "ssn is 123-45-6789", 0.5, ents, time.Now(), nil)
Expect(tr.Type).To(Equal(trace.BackendTraceTokenClassify))
Expect(tr.ModelName).To(Equal("privacy-filter"))
Expect(tr.Backend).To(Equal("privacy-filter"))
Expect(tr.Summary).To(ContainSubstring("ssn is"))
Expect(tr.Error).To(BeEmpty())
Expect(tr.Data["input_chars"]).To(Equal(len("ssn is 123-45-6789")))
Expect(tr.Data["threshold"]).To(BeEquivalentTo(float32(0.5)))
Expect(tr.Data["entities"]).To(Equal(ents))
})
It("records the backend error string when the call failed", func() {
tr := tokenClassifyTrace(cfg, "x", 0, nil, time.Now(), errors.New("boom"))
Expect(tr.Error).To(Equal("boom"))
})
})

View File

@@ -172,6 +172,8 @@ type RunCMD struct {
NatsTLSCert string `env:"LOCALAI_NATS_TLS_CERT" type:"existingfile" help:"Client certificate for NATS mTLS" group:"distributed"`
NatsTLSKey string `env:"LOCALAI_NATS_TLS_KEY" type:"existingfile" help:"Client private key for NATS mTLS" group:"distributed"`
ExposeNodeHeader bool `env:"LOCALAI_EXPOSE_NODE_HEADER" default:"false" help:"Set the X-LocalAI-Node response header on inference responses (OpenAI chat/completions/embeddings, Anthropic /v1/messages, Ollama /api/chat,/api/generate,/api/embed) with the ID of the worker that served the request. Disabled by default: the node ID reveals internal topology and should not be exposed on a public endpoint. Best-effort: under heavy concurrency the header may reflect a recent routing decision rather than this exact request's." group:"distributed"`
ModelScheduling string `env:"LOCALAI_MODEL_SCHEDULING" help:"Declarative per-model scheduling config applied at startup (inline JSON list of {model_name,node_selector,min_replicas,max_replicas,replicas:\"all\"}). Authoritative: overwrites matching models on every boot. Distributed mode only." group:"distributed"`
ModelSchedulingConfig string `env:"LOCALAI_MODEL_SCHEDULING_CONFIG" help:"Path to a YAML file with the same per-model scheduling list as LOCALAI_MODEL_SCHEDULING. Distributed mode only." group:"distributed"`
Version bool
@@ -347,6 +349,15 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
if r.ExposeNodeHeader {
opts = append(opts, config.WithExposeNodeHeader(true))
}
if r.ModelScheduling != "" {
opts = append(opts, config.WithModelSchedulingJSON(r.ModelScheduling))
}
if r.ModelSchedulingConfig != "" {
opts = append(opts, config.WithModelSchedulingConfigPath(r.ModelSchedulingConfig))
}
if !r.Distributed && (r.ModelScheduling != "" || r.ModelSchedulingConfig != "") {
xlog.Warn("LOCALAI_MODEL_SCHEDULING / LOCALAI_MODEL_SCHEDULING_CONFIG is set but distributed mode is disabled (LOCALAI_DISTRIBUTED=false) - ignoring")
}
if r.DisableMetricsEndpoint {
opts = append(opts, config.DisableMetricsEndpoint)

View File

@@ -57,25 +57,6 @@ type ApplicationConfig struct {
// touch disk or memory.
DisableStats bool
// PIIConfigPath points to an optional YAML file describing the PII
// pattern set. When empty, the routing/pii module's DefaultPatterns()
// (email, phone, SSN, credit card, IPv4, API key prefixes) are
// loaded with their default actions. Each entry overrides the
// matching default by ID:
//
// patterns:
// - id: email
// action: allow # downgrade default mask -> allow (log only)
// - id: ssn
// action: block # upgrade default mask -> block
//
// Unknown ids are rejected with a clear error at startup.
PIIConfigPath string
// DisablePII turns the regex PII filter off entirely. Default
// (false) enables it on the OpenAI chat completions route.
DisablePII bool
// MITMListen is the address (host:port) the cloudproxy MITM
// listener binds on. Empty disables the MITM proxy entirely.
// Use case: redacting PII from Claude Code / Codex CLI traffic
@@ -84,18 +65,20 @@ type ApplicationConfig struct {
// LocalAI exposes at /api/middleware/proxy-ca.crt.
MITMListen string
// PIIDefaultDetectors lists token-classification (NER) detector model
// names applied to any PII-enabled model that does not name its own
// pii.detectors. This makes cloud-proxy / MITM redaction work out of the
// box (those default to PII-enabled but carry no detector list) and lets
// an operator set one detector for the whole instance. Set at runtime via
// POST /api/settings; read live by Application.ResolvePIIPolicy.
PIIDefaultDetectors []string
// MITMCADir holds the persisted MITM proxy CA cert and private
// key. The CA is generated on first start; subsequent starts
// reload it so clients keep trusting the same root. The key
// file is mode 0600.
MITMCADir string
// PIIPatternOverrides applies persisted per-id deltas (action,
// disabled) to the live redactor at startup. Loaded from
// runtime_settings.json and applied right after pii.NewRedactor.
// nil/empty leaves the YAML defaults in place.
PIIPatternOverrides map[string]PIIPatternRuntimeOverride
DisableWebUI bool
OllamaAPIRootEndpoint bool
EnforcePredownloadScans bool
@@ -488,6 +471,16 @@ func (o *ApplicationConfig) GetEffectiveMaxActiveBackends() int {
return 0
}
// WatchdogShouldRun reports whether the live watchdog process should be
// running for the current config. It mirrors the gating in
// (*Application).startWatchdog so the /api/settings start/stop decision and
// the startup path agree on a single source of truth: the watchdog runs when
// idle/busy checks are enabled (WatchDog), when LRU eviction is active
// (effective max active backends > 0), or when the memory reclaimer is on.
func (o *ApplicationConfig) WatchdogShouldRun() bool {
return o.WatchDog || o.GetEffectiveMaxActiveBackends() > 0 || o.MemoryReclaimerEnabled
}
// WithForceEvictionWhenBusy sets whether to force eviction even when models have active API calls
func WithForceEvictionWhenBusy(enabled bool) AppOption {
return func(o *ApplicationConfig) {
@@ -603,6 +596,7 @@ func WithJSONStringPreload(configFile string) AppOption {
o.PreloadJSONModels = configFile
}
}
func WithConfigFile(configFile string) AppOption {
return func(o *ApplicationConfig) {
o.ConfigFile = configFile
@@ -691,21 +685,6 @@ func WithDisableStats(disable bool) AppOption {
}
}
// WithPIIConfigPath points the routing PII filter at a YAML config
// file. CLI: --pii-config.
func WithPIIConfigPath(path string) AppOption {
return func(o *ApplicationConfig) {
o.PIIConfigPath = path
}
}
// WithDisablePII turns the regex PII filter off. CLI: --disable-pii.
func WithDisablePII(disable bool) AppOption {
return func(o *ApplicationConfig) {
o.DisablePII = disable
}
}
// WithMITMListen sets the address the cloudproxy MITM listener
// binds on. Empty = disabled. CLI: --mitm-listen.
func WithMITMListen(addr string) AppOption {
@@ -1127,6 +1106,8 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
mitmListen := o.MITMListen
piiDefaultDetectors := append([]string(nil), o.PIIDefaultDetectors...)
return RuntimeSettings{
WatchdogEnabled: &watchdogEnabled,
WatchdogIdleEnabled: &watchdogIdle,
@@ -1181,6 +1162,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
LogoHorizontalFile: &logoHorizontalFile,
FaviconFile: &faviconFile,
MITMListen: &mitmListen,
PIIDefaultDetectors: &piiDefaultDetectors,
}
}
@@ -1198,18 +1180,22 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req
}
if settings.WatchdogIdleEnabled != nil {
o.WatchDogIdle = *settings.WatchdogIdleEnabled
if o.WatchDogIdle {
o.WatchDog = true
}
requireRestart = true
}
if settings.WatchdogBusyEnabled != nil {
o.WatchDogBusy = *settings.WatchdogBusyEnabled
if o.WatchDogBusy {
o.WatchDog = true
}
requireRestart = true
}
// The React Settings "Enable Watchdog" master toggle manages only the
// idle/busy checks — watchdog_enabled is vestigial in that UI. Whenever
// either idle/busy field is present in the body, derive the run-state from
// idle||busy so a cold enable starts the watchdog and a full disable stops
// it, instead of trusting the stale watchdog_enabled the UI never updates.
// This mirrors the startup invariant in startup.go. An API client posting
// only watchdog_enabled (idle/busy absent) keeps its explicit value.
if settings.WatchdogIdleEnabled != nil || settings.WatchdogBusyEnabled != nil {
o.WatchDog = o.WatchDogIdle || o.WatchDogBusy
}
if settings.WatchdogIdleTimeout != nil {
if dur, err := time.ParseDuration(*settings.WatchdogIdleTimeout); err == nil {
o.WatchDogIdleTimeout = dur
@@ -1410,6 +1396,10 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req
o.MITMListen = *settings.MITMListen
}
if settings.PIIDefaultDetectors != nil {
o.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...)
}
// Note: ApiKeys requires special handling (merging with startup keys) - handled in caller
return requireRestart

View File

@@ -223,6 +223,69 @@ var _ = Describe("ApplicationConfig RuntimeSettings Conversion", func() {
Expect(appConfig.WatchDogBusy).To(BeTrue())
})
// Residual #9125: the React Settings "Enable Watchdog" master toggle
// manages only watchdog_idle_enabled / watchdog_busy_enabled — it never
// touches the vestigial watchdog_enabled field. On a cold enable the
// body therefore carries watchdog_enabled=false alongside idle/busy=true.
// The derived run-state (WatchDog) must follow idle||busy so the live
// watchdog actually starts, not the stale watchdog_enabled=false.
It("should derive WatchDog from idle||busy on a cold enable even when watchdog_enabled=false", func() {
appConfig := &ApplicationConfig{WatchDog: false}
watchdogEnabled := false
watchdogIdle := true
watchdogBusy := true
rs := &RuntimeSettings{
WatchdogEnabled: &watchdogEnabled,
WatchdogIdleEnabled: &watchdogIdle,
WatchdogBusyEnabled: &watchdogBusy,
}
appConfig.ApplyRuntimeSettings(rs)
Expect(appConfig.WatchDog).To(BeTrue())
Expect(appConfig.WatchdogShouldRun()).To(BeTrue())
})
// The disable direction: the master toggle off sends idle=false,
// busy=false, but watchdog_enabled may still be the stale true loaded
// before the change. WatchDog must follow idle||busy down to false so
// the live watchdog is stopped (it stays stopped unless LRU / memory
// reclaimer keep it alive, which is gated by WatchdogShouldRun).
It("should disable WatchDog when both idle and busy are turned off", func() {
appConfig := &ApplicationConfig{WatchDog: true, WatchDogIdle: true, WatchDogBusy: true}
watchdogEnabled := true
watchdogIdle := false
watchdogBusy := false
rs := &RuntimeSettings{
WatchdogEnabled: &watchdogEnabled,
WatchdogIdleEnabled: &watchdogIdle,
WatchdogBusyEnabled: &watchdogBusy,
}
appConfig.ApplyRuntimeSettings(rs)
Expect(appConfig.WatchDog).To(BeFalse())
Expect(appConfig.WatchdogShouldRun()).To(BeFalse())
})
// Backward compatibility: an API client that posts only watchdog_enabled
// (idle/busy nil) keeps the explicit value — the idle/busy derivation
// only kicks in when those fields are actually present in the body.
It("should preserve explicit watchdog_enabled when idle/busy are absent", func() {
appConfig := &ApplicationConfig{WatchDog: false}
watchdogEnabled := true
rs := &RuntimeSettings{
WatchdogEnabled: &watchdogEnabled,
}
appConfig.ApplyRuntimeSettings(rs)
Expect(appConfig.WatchDog).To(BeTrue())
})
It("should handle MaxActiveBackends and update SingleBackend accordingly", func() {
appConfig := &ApplicationConfig{}

Some files were not shown because too many files have changed in this diff Show More