* fix(http): close 0.0.0.0/[::] SSRF bypass in /api/cors-proxy
The CORS proxy carried its own private-network blocklist (RFC 1918 + a
handful of IPv6 ranges) instead of using the same classification as
pkg/utils/urlfetch.go. The hand-rolled list missed 0.0.0.0/8 and ::/128,
both of which Linux routes to localhost — so any user with FeatureMCP
(default-on for new users) could reach LocalAI's own listener and any
other service bound to 0.0.0.0:port via:
GET /api/cors-proxy?url=http://0.0.0.0:8080/...
GET /api/cors-proxy?url=http://[::]:8080/...
Replace the custom check with utils.IsPublicIP (Go stdlib IsLoopback /
IsLinkLocalUnicast / IsPrivate / IsUnspecified, plus IPv4-mapped IPv6
unmasking) and add an upfront hostname rejection for localhost, *.local,
and the cloud metadata aliases so split-horizon DNS can't paper over the
IP check.
The IP-pinning DialContext is unchanged: the validated IP from the
single resolution is reused for the connection, so DNS rebinding still
cannot swap a public answer for a private one between validate and dial.
Regression tests cover 0.0.0.0, 0.0.0.0:PORT, [::], ::ffff:127.0.0.1,
::ffff:10.0.0.1, file://, gopher://, ftp://, localhost, 127.0.0.1,
10.0.0.1, 169.254.169.254, metadata.google.internal.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(downloader): verify SHA before promoting temp file to final path
DownloadFileWithContext renamed the .partial file to its final name
*before* checking the streamed SHA, so a hash mismatch returned an
error but left the tampered file at filePath. Subsequent code that
operated on filePath (a backend launcher, a YAML loader, a re-download
that finds the file already present and skips) would consume the
attacker-supplied bytes.
Reorder: verify the streamed hash first, remove the .partial on
mismatch, then rename. The streamed hash is computed during io.Copy
so no second read is needed.
While here, raise the empty-SHA case from a Debug log to a Warn so
"this download had no integrity check" is visible at the default log
level. Backend installs currently pass through with no digest; the
warning makes that footprint observable without changing behaviour.
Regression test asserts os.IsNotExist on the destination after a
deliberate SHA mismatch.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(auth): require email_verified for OIDC admin promotion
extractOIDCUserInfo read the ID token's "email" claim but never
inspected "email_verified". With LOCALAI_ADMIN_EMAIL set, an attacker
who could register on the configured OIDC IdP under that email (some
IdPs accept self-supplied unverified emails) inherited admin role:
- first login: AssignRole(tx, email, adminEmail) → RoleAdmin
- re-login: MaybePromote(db, user, adminEmail) → flip to RoleAdmin
Add EmailVerified to oauthUserInfo, parse email_verified from the OIDC
claims (default false on absence so an IdP that omits the claim cannot
short-circuit the gate), and substitute "" for the role-decision email
when verified=false via emailForRoleDecision. The user record still
stores the unverified email for display.
GitHub's path defaults EmailVerified=true: GitHub only returns a public
profile email after verification, and fetchGitHubPrimaryEmail explicitly
filters to Verified=true.
Regression tests cover both the helper contract and integration with
AssignRole, including the bootstrap "first user" branch that would
otherwise mask the gate.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(cli): refuse public bind when no auth backend is configured
When neither an auth DB nor a static API key is set, the auth
middleware passes every request through. That is fine for a developer
laptop, a home LAN, or a Tailnet — the network itself is the trust
boundary. It is not fine on a public IP, where every model install,
settings change, and admin endpoint becomes reachable from the
internet.
Refuse to start in that exact configuration. Loopback, RFC 1918,
RFC 4193 ULA, link-local, and RFC 6598 CGNAT (Tailscale's default
range) all count as trusted; wildcard binds (`:port`, `0.0.0.0`,
`[::]`) are accepted only when every host interface is in one of those
ranges. Hostnames are resolved and treated as trusted only when every
answer is.
A new --allow-insecure-public-bind / LOCALAI_ALLOW_INSECURE_PUBLIC_BIND
flag opts out for deployments that gate access externally (a reverse
proxy enforcing auth, a mesh ACL, etc.). The error message lists this
plus the three constructive alternatives (bind a private interface,
enable --auth, set --api-keys).
The interface enumeration goes through a package-level interfaceAddrsFn
var so tests can simulate cloud-VM, home-LAN, Tailscale-only, and
enumeration-failure topologies without poking at the real network
stack.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* test(http): regression-test the localai_assistant admin gate
ChatEndpoint already rejects metadata.localai_assistant=true from a
non-admin caller, but the gate was open-coded inline with no direct
test coverage. The chat route is FeatureChat-gated (default-on), and
the assistant's in-process MCP server can install/delete models and
edit configs — the wrong handler change would silently turn the LLM
into a confused deputy.
Extract the gate into requireAssistantAccess(c, authEnabled) and pin
its behaviour: auth disabled is a no-op, unauthenticated is 403,
RoleUser is 403, RoleAdmin and the synthetic legacy-key admin are
admitted.
No behaviour change in the production path.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* test(http): assert every API route is auth-classified
The auth middleware classifies path prefixes (/api/, /v1/, /models/,
etc.) as protected and treats anything else as a static-asset
passthrough. A new endpoint shipped under a brand-new prefix — or a
new path that simply isn't on the prefix allowlist — would be
reachable anonymously.
Walk every route registered by API() with auth enabled and a fresh
in-memory database (no users, no keys), and assert each API-prefixed
route returns 401 / 404 / 405 to an anonymous request. Public surfaces
(/api/auth/*, /api/branding, /api/node/* token-authenticated routes,
/healthz, branding asset server, generated-content server, static
assets) are explicit allowlist entries with comments justifying them.
Build-tagged 'auth' so it runs against the SQLite-backed auth DB
(matches the existing auth suite).
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* test(http): pin agent endpoint per-user isolation contract
agents.go's getUserID / effectiveUserID / canImpersonateUser /
wantsAllUsers helpers are the single trust boundary for cross-user
access on agent, agent-jobs, collections, and skills routes. A
regression there is the difference between "regular user reads their
own data" and "regular user reads anyone's data via ?user_id=victim".
Lock in the contract:
- effectiveUserID ignores ?user_id= for unauthenticated and RoleUser
- effectiveUserID honours it for RoleAdmin and ProviderAgentWorker
- wantsAllUsers requires admin AND the literal "true" string
- canImpersonateUser is admin OR agent-worker, never plain RoleUser
No production change — this commit only adds tests.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(downloader): drop redundant stat in removePartialFile
The stat-then-remove pattern is a TOCTOU window and a wasted syscall —
os.Remove already returns ErrNotExist for the missing-file case, so trust
that and treat it as a no-op.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(http): redact secrets from trace buffer and distribution-token logs
The /api/traces buffer captured Authorization, Cookie, Set-Cookie, and
API-key headers verbatim from every request when tracing was enabled. The
endpoint is admin-only but the buffer is reachable via any heap-style
introspection and the captured tokens otherwise outlive the request.
Strip those header values at capture time. Body redaction is left to a
follow-up — the prompts are usually the operator's own and JSON-walking
is invasive.
Distribution tokens were also logged in plaintext from
core/explorer/discovery.go; logs forward to syslog/journald and outlive
the token. Redact those to a short prefix/suffix instead.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(auth): rate-limit OAuth callbacks separately from password endpoints
The shared 5/min/IP limit on auth endpoints is right for password-style
flows but too tight for OAuth callbacks: corporate SSO funnels many real
users through one outbound IP and would trip the limit. Add a separate
60/min/IP limiter for /api/auth/{github,oidc}/callback so callbacks are
bounded against floods without breaking shared-IP deployments.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(gallery): verify backend tarball sha256 when set in gallery entry
GalleryBackend gained an optional sha256 field; the install path now
threads it through to the existing downloader hash-verify (which already
streams, verifies, and rolls back on mismatch). Galleries without sha256
keep working; the empty-SHA path still emits the existing
"downloading without integrity check" warning.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* test(http): pin CSRF coverage on multipart endpoints
The CSRF middleware in app.go is global (e.Use) so it covers every
multipart upload route — branding assets, fine-tune datasets, audio
transforms, agent collections. Pin that contract: cross-site multipart
POSTs are rejected; same-origin / same-site / API-key clients are not.
Also pins the SameSite=Lax fallback path the skipper relies on when
Sec-Fetch-Site is absent.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(http): XSS hardening — CSP headers, safe href, base-href escape, SVG sandbox
Several closely related XSS-prevention changes spanning the SPA shell, the
React UI, and the branding asset server:
- New SecurityHeaders middleware sets CSP, X-Content-Type-Options,
X-Frame-Options, and Referrer-Policy on every response. The CSP keeps
script-src permissive because the Vite bundle relies on inline + eval'd
scripts; tightening that requires moving to a nonce-based policy.
- The <base href> injection in the SPA shell escaped attacker-controllable
Host / X-Forwarded-Host headers — a single quote in the host header
broke out of the attribute. Pass through SecureBaseHref (html.EscapeString).
- Three React sinks rendering untrusted content via dangerouslySetInnerHTML
switch to text-node rendering with whiteSpace: pre-wrap: user message
bodies in Chat.jsx and AgentChat.jsx, and the agent activity log in
AgentChat.jsx. The hand-rolled escape on the agent user-message variant
is replaced by the same plain-text path.
- New safeHref util collapses non-allowlisted URI schemes (most
importantly javascript:) to '#'. Applied to gallery `<a href={url}>`
links in Models / Backends / Manage and to canvas artifact links —
these come from gallery JSON or assistant tool calls and must be treated
as untrusted.
- The branding asset server attaches a sandbox CSP plus same-origin CORP
to .svg responses. The React UI loads logos via <img>, but the same URL
is also reachable via direct navigation; this prevents script
execution if a hostile SVG slipped past upload validation.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(http): bound HTTP server with read-header and idle timeouts
A net/http server with no timeouts is trivially Slowloris-able and leaks
idle keep-alive connections. Set ReadHeaderTimeout (30s) to plug the
slow-headers attack and IdleTimeout (120s) to cap keep-alive sockets.
ReadTimeout and WriteTimeout stay at 0 because request bodies can be
multi-GB model uploads and SSE / chat completions stream for many
minutes; operators who need tighter per-request bounds should terminate
slow clients at a reverse proxy.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* test(auth): pin PUT /api/auth/profile field-tampering contract
The handler uses an explicit local body struct (only name and avatar_url)
plus a gorm Updates(map) with a column allowlist, so an attacker posting
{"role":"admin","email":"...","password_hash":"..."} can't mass-assign
those fields. Lock that down with a regression test so a future
"let's just c.Bind(&user)" refactor breaks loudly.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(services): strip directory components from multipart upload filenames
UploadDataset and UploadToCollectionForUser took the raw multipart
file.Filename and joined it into a destination path. The fine-tune
upload was incidentally safe because of a UUID prefix that fused any
leading '..' to a literal segment, but the protection is fragile.
UploadToCollectionForUser handed the filename to a vendored backend
without sanitising at all.
Strip to filepath.Base at both boundaries and reject the trivial
unsafe values ("", ".", "..", "/").
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(react-ui): validate persisted MCP server entries on load
localStorage is shared across same-origin pages; an XSS that lands once
can poison persisted MCP server config to attempt header injection or
to feed a non-http URL into the fetch path on subsequent loads.
Validate every entry: types must match, URL must parse with http(s)
scheme, header keys/values must be control-char-free. Drop anything
that doesn't fit.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(http): close X-Forwarded-Prefix open redirect
The reverse-proxy support concatenated X-Forwarded-Prefix into the
redirect target without validation, so a forged header value of
"//evil.com" turned the SPA-shell redirect helper at /, /browse, and
/browse/* into a 301 to //evil.com/app. The path-strip middleware had
the same shape on its prefix-trailing-slash redirect.
Add SafeForwardedPrefix at the middleware boundary: must start with
a single '/', no protocol-relative '//' opener, no scheme, no
backslash, no control characters. Apply at both consumers; misconfig
trips the validator and the header is dropped.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(http): refuse wildcard CORS when LOCALAI_CORS=true with empty allowlist
When LOCALAI_CORS=true but LOCALAI_CORS_ALLOW_ORIGINS was empty, Echo's
CORSWithConfig saw an empty allow-list and fell back to its default
AllowOrigins=["*"]. An operator who flipped the strict-CORS feature
flag without populating the list got the opposite of what they asked
for. Echo never sets Allow-Credentials: true so this isn't directly
exploitable (cookies aren't sent under wildcard CORS), but the
misconfiguration trap is worth closing. Skip the registration and warn.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(auth): zxcvbn password strength check with user-acknowledged override
The previous policy was len < 8, which let through "Password1" and the
rest of the credential-stuffing corpus. LocalAI has no second factor
yet, so the bar needs to sit higher.
Add ValidatePasswordStrength using github.com/timbutler/zxcvbn (an
actively-maintained fork of the trustelem port; v1.0.4, April 2024):
- min 12 chars, max 72 (bcrypt's truncation point)
- reject NUL bytes (some bcrypt callers truncate at the first NUL)
- require zxcvbn score >= 3 ("safely unguessable, ~10^8 guesses to
break"); the hint list ["localai", "local-ai", "admin"] penalises
passwords built from the app's own branding
zxcvbn produces false positives sometimes (a strong-looking password
that happens to match a dictionary word) and operators occasionally
need to set a known-weak password (kiosk demos, CI rigs). Add an
acknowledgement path: PasswordPolicy{AllowWeak: true} skips the
entropy check while still enforcing the hard rules. The structured
PasswordErrorResponse marks weak-password rejections as Overridable
so the UI can surface a "use this anyway" checkbox.
Wired through register, self-service password change, and admin
password reset on both the server and the React UI.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(react-ui): drop HTML5 minLength on new-password inputs
minLength={12} on the new-password input let the browser block the
form submit silently before any JS or network call ran. The browser
focused the field, showed a brief native tooltip, and that was that —
no toast, no fetch, no clue. Reproducible by typing fewer than 12
chars on the second password change of a session.
The JS-level length check in handleSubmit already shows a toast and
the server rejects with a structured error, so the HTML5 attribute
was redundant defence anyway. Drop it.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(react-ui): bundle Geist fonts locally instead of fetching from Google
The new CSP correctly refused to apply styles from
fonts.googleapis.com because style-src is locked to 'self' and
'unsafe-inline'. Loosening the CSP would defeat its purpose; the
right fix is to stop reaching out to a third-party CDN for fonts on
every page load.
Add @fontsource-variable/geist and @fontsource-variable/geist-mono as
npm deps and import them once at boot. Drop the <link rel="preconnect">
and external stylesheet from index.html.
Side benefit: no third-party tracking via Referer / IP on every UI
load, no failure mode when offline / behind a captive portal.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(react-ui): refresh i18n strings to reflect 12-char password minimum
The translations still said "at least 8 characters" everywhere — the
client-side toast on a too-short password change told the user the
wrong floor. Update tooShort and newPasswordPlaceholder /
newPasswordDescription across all five locales (en, es, it, de,
zh-CN) to match the real ValidatePasswordStrength rule.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(auth): make password length-floor overridable like the entropy check
The 12-char minimum was a policy choice, not a technical invariant —
only "non-empty", "<= 72 bytes", and "no NUL bytes" are real bcrypt
constraints. Treating length-12 as a hard rule was inconsistent with
the entropy check (already overridable) and friction for use cases
where the account is just a name on a session, not a security
boundary (single-user kiosk, CI rig, lab demo).
Restructure ValidatePasswordStrength:
- Hard rules (always enforced): non-empty, <= MaxPasswordLength, no NUL byte
- Policy rules (skipped when AllowWeak=true): length >= 12, zxcvbn score >= 3
PasswordError now marks password_too_short as Overridable too. The
React forms generalised from `error_code === 'password_too_weak'` to
`overridable === true`, and the JS-side preflight length checks were
removed (server is source of truth, returns the same checkbox flow).
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(http): log accurate status code when handler returns error
The custom xlog access-log middleware in API() reads res.Status
*before* Echo's central HTTPErrorHandler runs, so when a handler
returns an error without writing a response (e.g.
TranscriptEndpoint's `return err` on backend failure) the status
field stays at its default 200. The logged line then claims
status=200 while the client receives 500 — silently hiding every
500/503/etc. that bubbles up through Echo's error handler.
Mirror echo.DefaultHTTPErrorHandler's status derivation when
err != nil and the response hasn't been committed: default to 500,
upgrade to *echo.HTTPError.Code if applicable. The logged status now
matches what the client actually sees, so failed transcription
requests stop appearing as 200 in the access log.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
* fix(transcription): log underlying error before returning 500 to client
ModelTranscriptionWithOptions surfaces real failures — gRPC errors
from a remote node, model load problems, ffmpeg conversion crashes —
but TranscriptEndpoint just did `return err`, so Echo turned it into
a 500 with a generic body and the original error was lost. Operators
chasing transcription failures across distributed mode were left
with "upstream returned 500" on the client and zero context anywhere
in the frontend's logs.
Add an xlog.Error before returning, recording model name, the staged
audio path, and the underlying error. Combined with the access-log
status fix, a failing transcription now leaves an audit trail (real
status code in the access line, real cause in an Error line) instead
of vanishing.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Adds end-to-end internationalization to the React UI with five seed
languages (English, Italian, Spanish, German, Simplified Chinese) and
a sidebar-footer language switcher next to the existing theme toggle.
Library: react-i18next + i18next + i18next-http-backend +
i18next-browser-languagedetector. The detector caches the user's
choice in localStorage (key `localai-language`, mirroring the existing
`localai-theme` convention) and updates the `<html lang>` attribute on
change. fallbackLng is `en`, so any missing translation in another
locale falls back transparently.
Translation files live under `public/locales/<lng>/<ns>.json`. They
ride along with the existing `//go:embed react-ui/dist/*` directive,
but the previous SPA route in core/http/app.go only exposed
`/assets/*` from the embedded React build. This commit generalizes
the asset handler into a `serveReactSubdir(subdir)` helper and adds a
matching `/locales/*` route so i18next-http-backend can fetch the
JSONs at runtime. The http-backend `loadPath` is built via the
existing `apiUrl()` helper so instances served under a sub-path (e.g.
`<base href="/ui/">`) resolve correctly.
Namespaces (13): common, nav, errors, auth, home, models, importModel,
chat, agents, skills, collections, media, admin. Translated UI surfaces
include the sidebar/header/footer chrome, login + account flows, the
Home dashboard (incl. the manage-by-chat assistant CTA), the model
gallery + import flow, the chat experience (Chat.jsx + ChatsMenu),
agents/skills/collections list pages, the studio media tabs (Image,
Video, TTS), and the admin page-headers (Settings incl. its section
nav, Manage, Backends, Traces, Nodes, P2P, Users, Usage). Shared
components (ConfirmDialog, Toast) take their default labels from the
common namespace so callers don't need to pass strings explicitly.
Tooling for incremental adoption is included:
- `i18next-parser.config.js` + `npm run i18n:extract` to sweep `t()`
keys into the JSON skeletons.
- `scripts/translate-locales.mjs` (one-off helper) to bootstrap
non-English locales from English source via OpenAI or Anthropic
APIs, with --copy mode as a placeholder fallback. Idempotent;
preserves existing translations unless --overwrite is passed.
Larger config-driven pages (ModelEditor, Settings deep field forms,
AgentChat/AgentCreate, SkillEdit, CollectionDetails, Talk, Sound,
biometrics, FineTune/Quantize, Users modals, Nodes/P2P install
pickers, BackendLogs, Traces deep filters, Explorer) intentionally
keep their inner content untranslated for now — they fall back to
English via fallbackLng so functionality is unaffected, and the
extracted-strings pattern + the bootstrap script make follow-up
extraction straightforward.
The initial Suspense fallback at the root in main.jsx covers the
first JSON fetch on cold load. A simple `.app-boot-spinner` styled
in App.css provides a non-empty paint while the first namespace
loads.
Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add distributed mode (experimental)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute correctly time boundaries avoiding re-triggering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood of healthy checks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add fine-tuning endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(experimental): add fine-tuning endpoint and TRL support
This changeset defines new GRPC signatues for Fine tuning backends, and
add TRL backend as initial fine-tuning engine. This implementation also
supports exporting to GGUF and automatically importing it to LocalAI
after fine-tuning.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* commit TRL backend, stop by killing process
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* move fine-tune to generic features
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add evals, reorder menu
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): add users and authentication support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: allow the admin user to impersonificate users
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: ui improvements, disable 'Users' button in navbar when no auth is configured
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add OIDC support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: gate models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: cache requests to optimize speed
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* small UI enhancements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ui): style improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: cover other paths by auth
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: separate local auth, refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* security hardening, approval mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: fix tests and expectations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: update localagi/localrecall
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Also test for regressions in HTTP GET API key exempted endpoints because
this list can get out of sync with the UI routes.
Also fix support for proxying on a different prefix both server and
client side.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: add standalone and agentic functionalities
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose agents via responses api
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: reduce log verbosity for /api/operations polling
Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes#7989.
* fix: reduce log verbosity for /api/operations polling
Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes#7989.
This removes any ambiguity from how paths are handled, and at the same
time it uniforms the ui paths with the other paths that don't have a
trailing slash
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(agent): agent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Multiple webhooks, simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not use cron with seconds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Create separate pages for details
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect if no models have MCP configuration, show wizard
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make services test to run
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): add watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not re-read env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Some refactor, move other settings to runtime (p2p)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add API Keys handling
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to disable runtime settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show MCP toggle in index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop context default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(mcp): add LocalAI endpoint to stream live results of the agent
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Refactoring
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* MCP UX integration
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Enhance UX
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Support also non-SSE
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: initial hook to install elements directly
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP: ui changes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move HF api client to pkg
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add simple importer for gguf files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add opcache
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wire importers to CLI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add omitempty to config fields
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add MLX importer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small refactors to star to use HF for discovery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Common preferences
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add support to bare HF repos
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(importer/llama.cpp): add support for mmproj files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add mmproj quants to common preferences
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix vlm usage in tokenizer mode with llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): use Alpine.js and drop HTMX
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Display pending ops
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Show in progress ops
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* more stable sorting
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* minor fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix clipboard copy
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(p2p): sync models between federated nodes
This change makes sure that between federated nodes all the models are
synced with each other.
Note: this works exclusively with models belonging to a gallery. It does
not sync files between the nodes, but rather it synces the node setup.
E.g. All the nodes needs to have configured the same galleries and
install models without any local editing.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make nodes stable
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups on syncing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ui: improve p2p view
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
- Add a system backend path
- Refactor and consolidate system information in system state
- Use system state in all the components to figure out the system paths
to used whenever needed
- Refactor BackendConfig -> ModelConfig. This was otherway misleading as
now we do have a backend configuration which is not the model config.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend gallery): add meta packages
So we can have meta packages such as "vllm" that automatically installs
the corresponding package depending on the GPU that is being currently
detected in the system.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: use a metadata file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: Add backend gallery
This PR add support to manage backends as similar to models. There is
now available a backend gallery which can be used to install and remove
extra backends.
The backend gallery can be configured similarly as a model gallery, and
API calls allows to install and remove new backends in runtime, and as
well during the startup phase of LocalAI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backends docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip: Backend Dockerfile for python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: drop extras images, build python backends separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup on all backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tweaks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop old backends leftovers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move dockerfile upper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix proto
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Feature dropped for consistency - we prefer model galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing packages in the build image
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* exllama is ponly available on cublas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pin torch on chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups to index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
* Install accellerators deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add target arch
* Add cuda minor version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted runners
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: use quay for test images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups for vllm and chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups on CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chatterbox is only available for nvidia
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify CI builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt test, use qwen3
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(model gallery): add jina-reranker-v1-tiny-en-gguf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use reranker from llama.cpp in AIO images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Limit concurrent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(realtime): Initial Realtime API implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: go mod tidy
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Implement transcription only mode for realtime API
Reduce the scope of the real time API for the initial realease and make
transcription only mode functional.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(build): Build backends on a separate layer to speed up core only changes
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
This PR changes entirely the UI look and feeling. It updates all
sections and makes it also mobile-ready.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add machine tag option, add extraUsage option, grpc-server -> proto -> endpoint extraUsage data is broken for now
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
* remove redurant timing fields, fix not working timings output
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
* use middleware for Machine-Tag only if tag is specified
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
---------
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
Makes the web app honour the `X-Forwarded-Prefix` HTTP request header that may be sent by a reverse-proxy in order to inform the app that its public routes contain a path prefix.
For instance this allows to serve the webapp via a reverse-proxy/ingress controller under a path prefix/sub path such as e.g. `/localai/` while still being able to use the regular LocalAI routes/paths without prefix when directly connecting to the LocalAI server.
Changes:
* Add new `StripPathPrefix` middleware to strip the path prefix (provided with the `X-Forwarded-Prefix` HTTP request header) from the request path prior to matching the HTTP route.
* Add a `BaseURL` utility function to build the base URL, honouring the `X-Forwarded-Prefix` HTTP request header.
* Generate the derived base URL into the HTML (`head.html` template) as `<base/>` tag.
* Make all webapp-internal URLs (within HTML+JS) relative in order to make the browser resolve them against the `<base/>` URL specified within each HTML page's header.
* Make font URLs within the CSS files relative to the CSS file.
* Generate redirect location URLs using the new `BaseURL` function.
* Use the new `BaseURL` function to generate absolute URLs within gallery JSON responses.
Closes#3095
TL;DR:
The header-based approach allows to move the path prefix configuration concern completely to the reverse-proxy/ingress as opposed to having to align the path prefix configuration between LocalAI, the reverse-proxy and potentially other internal LocalAI clients.
The gofiber swagger handler already supports path prefixes this way, see e2d9e9916d/swagger.go (L79)
Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
* Read jinja templates as fallback
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move templating out of model loader
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Test TemplateMessages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Set role and content from transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tests: be more flexible
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* More jinja
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small refactoring and adaptations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Seem the "/metrics" endpoint that is source of confusion as people tends
to believe we collect telemetry data just because we import
"opentelemetry", however it is still a good idea to allow to disable
even local metrics if not really required.
See also: https://github.com/mudler/LocalAI/issues/3942
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(health): do not require auth for /healthz and /readyz
Fixes: #3655
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Comment so I don’t forget
Adding a reminder here...
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>