Mirror of https://github.com/mudler/LocalAI.git
feat: localai assistant chat modality (#9602)
* fix(tests): inline model_test fixtures after tests/models_fixtures removal

  The previous reorg removed tests/models_fixtures/ but core/config/model_test.go still read CONFIG_FILE/MODELS_PATH env vars pointing into that directory, so `make test` failed with "open : no such file or directory" on the readConfigFile spec (the suite ran with --fail-fast and bailed before openresponses_test). Inline the YAMLs (config/embeddings/grpc/rwkv/whisper) directly into the test file, materialise them into a per-test tmpdir via BeforeEach, and drop the env-var lookups. The test no longer depends on Makefile plumbing.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Assisted-by: claude-code:claude-opus-4-7 [Edit] [Write] [Bash]

* refactor(modeladmin): extract model-admin helpers into a service package

  Lift the bodies of EditModelEndpoint, PatchConfigEndpoint, ToggleStateModelEndpoint, TogglePinnedModelEndpoint and VRAMEstimateEndpoint into core/services/modeladmin so the same logic can be called by non-HTTP clients (notably the in-process MCP server that backs the LocalAI Assistant chat modality, landing in a follow-up commit).

  The HTTP handlers shrink to thin shells that parse echo inputs, call the matching helper, map typed errors (ErrNotFound, ErrConflict, ErrPathNotTrusted, ErrBadAction, ...) to the existing HTTP status codes, and render the existing response shapes. No REST-surface behaviour change; the existing localai endpoint tests cover the regression net. Adds focused unit tests for each helper against tmp-dir-backed ModelConfigLoader fixtures (deep-merge patch, rename + conflict, path separator guard, toggle/pin enable/disable, sync callback).

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
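The error-to-status mapping is referenced as `httpStatusForModelAdminError` further down in this diff, but its body is not shown. A minimal sketch of what such a helper plausibly looks like, assuming the sentinel errors live in `core/services/modeladmin`:

```go
// Hedged sketch only; the real mapping lives next to the handlers.
// Assumes: errors, net/http and the modeladmin sentinels are imported.
func httpStatusForModelAdminError(err error) int {
	switch {
	case errors.Is(err, modeladmin.ErrNotFound):
		return http.StatusNotFound
	case errors.Is(err, modeladmin.ErrConflict):
		return http.StatusConflict
	case errors.Is(err, modeladmin.ErrPathNotTrusted):
		return http.StatusForbidden
	case errors.Is(err, modeladmin.ErrBadAction):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}
```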
* feat(assistant): LocalAI Assistant chat modality with in-memory MCP server

  Adds a chat modality, admin-only, that wires the chat session to an in-memory MCP server exposing LocalAI's own admin/management surface as tools. An admin can install models, manage backends, edit configs and check status by chatting; the LLM calls tools like gallery_search, install_model, import_model_uri, list_installed_models, edit_model_config and surfaces the results.

  Same Go package powers two modes: pkg/mcp/localaitools/ NewServer(client, opts) builds an MCP server that registers the 19-tool admin catalog. The LocalAIClient interface has two impls:

  - inproc.Client — calls services directly (no HTTP loopback, no synthetic admin API key). Used in-process by the chat handler.
  - httpapi.Client — calls the LocalAI REST API. Used by the new `local-ai mcp-server --target=…` subcommand to control a remote LocalAI from a stdio MCP host.

  Tools and their embedded skill prompts are agnostic to which client backs them. Skill prompts are markdown files under prompts/, embedded via go:embed and assembled into the system prompt at server init.

  Wiring:

  - core/http/endpoints/mcp/localai_assistant.go — process-wide holder that spins up the in-memory MCP server once at Application start using paired net.Pipe transports, then reuses LocalToolExecutor (no fork) for every chat request that opts in.
  - core/http/endpoints/openai/chat.go — small branch ahead of the existing MCP block: when metadata.localai_assistant=true, defense-in-depth admin check + executor swap + system-prompt injection. All downstream tool dispatch is unchanged.
  - core/http/auth/{permissions,features}.go — adds FeatureLocalAIAssistant; gating happens at the chat handler entry plus admin-only `/api/settings`.
  - core/cli/{run.go,cli.go,mcp_server.go} — LOCALAI_DISABLE_ASSISTANT flag (runtime-toggleable via Settings, no restart), plus `local-ai mcp-server` stdio subcommand.
  - core/config/runtime_settings.go — `localai_assistant_enabled` runtime setting; the chat handler reads `DisableLocalAIAssistant` live at request entry.

  UI:

  - Home.jsx — prominent self-explanatory CTA card on first run ("Manage LocalAI by chatting"); collapses to a compact "Manage by chat" button in the quick-links row once used, persisted via localStorage.
  - Chat.jsx — admin-only "Manage" toggle in the chat header, "Manage mode" badge, dedicated empty-state copy, starter chips.
  - Settings.jsx — "LocalAI Assistant" section with the runtime enable toggle.
  - useChat.js — `localaiAssistant` flag on the chat schema; injects `metadata.localai_assistant=true` on requests when active.

  Distributed mode: the in-memory MCP server lives only on the head node; inproc.Client wraps already-distributed-aware services so installs propagate to workers via the existing GalleryService machinery.

  Documentation: `.agents/localai-assistant-mcp.md` is the contributor contract — when adding an admin REST endpoint, also add a LocalAIClient method, an inproc + httpapi impl, a tool registration, and a skill prompt update; the AGENTS.md index links to it.

  Out of scope (follow-ups): per-tool RBAC granularity for non-admin read-only access; streaming mcp_tool_progress for long installs; React Vitest rig for the UI changes.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(assistant): extract tool/capability/MiB/server-name constants

  The MCP tool surface, capability tag set, server-name default, and the chat-handler metadata key were repeated as bare string literals across seven files. Renaming any one required hand-editing every call site and risked code/test/prompt drift. This pulls them into typed constants:

  - pkg/mcp/localaitools/tools.go — Tool* constants for the 19 MCP tools, plus DefaultServerName.
  - pkg/mcp/localaitools/capability.go — typed Capability + constants for the capability tag set the LLM passes to list_installed_models. The type rides through LocalAIClient.ListInstalledModels and replaces the triplet of "embed"/"embedding"/"embeddings" with the single CapabilityEmbeddings.
  - pkg/mcp/localaitools/inproc/client.go — bytesPerMiB constant for the VRAMEstimate byte→MiB conversion.
  - core/http/endpoints/mcp/tools.go — MetadataKeyLocalAIAssistant for the "localai_assistant" request-metadata key consumed by the chat handler.

  Tool registrations, the test catalog, the dispatch table, the validation fixtures, and the fake/stub clients all reference the constants. The embedded skill prompts under prompts/ keep their bare strings (go:embed markdown can't import Go constants); the existing TestPromptsContainSafetyAnchors guards the alignment. No behaviour change. All tests pass with -race.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(modeladmin): typed Action for ToggleState/TogglePinned

  The toggle/pin verbs were bare strings everywhere — handler signatures, service implementations, MCP tool args, the fake/stub clients, the inproc and httpapi LocalAIClient impls, plus 4 test files. A typo in any caller silently fell through to the runtime "must be 'enable' or 'disable'" check.

  Introduce core/services/modeladmin.Action (string alias) with ActionEnable, ActionDisable, ActionPin, ActionUnpin and a small Valid helper. The compiler now catches mismatches at every boundary; renames ripple through one source of truth. LocalAIClient.ToggleModelState/Pinned signatures change to take modeladmin.Action. The package is brand-new and unreleased so this is a free public-API tightening.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
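A minimal sketch of that typed action. Only the type name, the four constants and the `Valid` helper are fixed by the message above; the string values are assumptions inferred from the runtime check it quotes:

```go
// Hedged sketch of core/services/modeladmin.Action; string values assumed.
package modeladmin

type Action string

const (
	ActionEnable  Action = "enable"
	ActionDisable Action = "disable"
	ActionPin     Action = "pin"
	ActionUnpin   Action = "unpin"
)

// Valid reports whether a is one of the four canonical verbs, so callers can
// reject typos before they reach the YAML-mutation layer.
func (a Action) Valid() bool {
	switch a {
	case ActionEnable, ActionDisable, ActionPin, ActionUnpin:
		return true
	}
	return false
}
```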
* fix(assistant): respect ctx cancellation on gallery channel sends

  InstallModel, DeleteModel, ImportModelURI, InstallBackend and UpgradeBackend all pushed onto galleryop channels with bare sends. If the worker was paused or the buffer full, the chat-handler goroutine blocked forever — the LLM kept polling and the request leaked.

  Wrap the five sends in a sendModelOp/sendBackendOp helper that selects on ctx.Done() so a cancelled chat completion surfaces context.Canceled back to the LLM instead of hanging. Adds inproc/client_test.go with a pre-cancelled-ctx regression test on InstallModel; the helpers are shared so the same guarantee covers the other four call sites.

  Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(assistant): graceful shutdown for in-memory holder and stdio CLI

  Two related leaks:

  - Application.start() built the LocalAIAssistantHolder but never wired Close() into the graceful-termination chain — the in-memory MCP transport pair stayed alive until process exit, and the goroutines behind net.Pipe() didn't drain. Hook into the existing signals.RegisterGracefulTerminationHandler chain (same pattern as core/http/endpoints/mcp/tools.go:770).
  - core/cli/mcp_server.go ran srv.Run with context.Background(); a Ctrl-C from the host (Claude Desktop, mcphost, npx inspector) or a SIGTERM from process supervision left the stdio loop reading from a closed pipe. Switch to signal.NotifyContext to surface the signal through ctx and let srv.Run drain.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(assistant): typed HTTPError + propagate prompt walk error

  The httpapi client detected "no such job" by substring-matching on the error string ("404", "could not find") — brittle to status-code formatting changes and to LocalAI fixing /models/jobs/:uuid to return a proper 404. Replace with a typed *HTTPError whose Is() method honours errors.Is(err, ErrHTTPNotFound). The 500-with-"could not find" branch stays as a transitional fallback documented in Is(). Same change covers ListNodes' 404 fallback for the /api/nodes endpoint.

  Adds httptest tests for both 404 and the legacy 500 path, plus a direct errors.Is exposure test so external callers (the standalone stdio CLI host) can match without re-string-parsing.

  Also tightens prompts.SystemPrompt: panic when fs.WalkDir on the embedded FS fails. The only realistic cause is a build-time //go:embed misconfiguration; serving an empty system prompt to the LLM is much worse than crashing init. TestSystemPromptIncludesAllEmbeddedFiles catches regressions in CI.

  Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
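A hedged sketch of how such a typed error satisfies `errors.Is`. `StatusCode` and `Body` are named later in this PR's conventions doc; everything else here is an assumption:

```go
// Hedged sketch, not the shipped implementation.
package httpapi

import (
	"fmt"
	"net/http"
)

type HTTPError struct {
	StatusCode int    // HTTP status returned by the remote LocalAI
	Body       string // response body, kept for error messages
}

func (e *HTTPError) Error() string {
	return fmt.Sprintf("localai: unexpected HTTP %d: %s", e.StatusCode, e.Body)
}

// Is makes errors.Is(err, ErrHTTPNotFound) match on the status code alone.
// The real method also documents a transitional fallback for the legacy
// 500-with-"could not find" response of /models/jobs/:uuid.
func (e *HTTPError) Is(target error) bool {
	t, ok := target.(*HTTPError)
	return ok && e.StatusCode == t.StatusCode
}

var ErrHTTPNotFound = &HTTPError{StatusCode: http.StatusNotFound}
```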
* fix(modeladmin): atomic writes for model config files

  The five sites that wrote model YAML used os.WriteFile, which opens with O_TRUNC|O_WRONLY|O_CREATE. A crash mid-write left the destination truncated and the model unloadable until manual repair. Pre-existing behaviour inherited from the original endpoint handlers — fix once now that there's a single helper.

  Adds writeFileAtomic: writes to a sibling temp file, chmods, syncs via Close(), then os.Rename. Same-directory temp keeps the rename atomic on the same filesystem; cleanup runs on every error path so stray temps don't accumulate. No new dependency.

  Applied to:

  - ConfigService.PatchConfig
  - ConfigService.EditYAML (both rename and in-place branches)
  - mutateYAMLBoolFlag (drives ToggleState + TogglePinned)

  atomic_test.go covers the happy path plus a read-only-dir failure case that asserts the original file is preserved (skipped on Windows where the chmod trick is POSIX-specific).

  Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(assistant): prune dead code, mark stub, document conventions

  Three small cleanups landing together:

  - Drop the unused errNotImplemented sentinel from inproc/client.go. All five methods that used to return it are wired to modeladmin helpers since the Phase B commit; the package var is dead.
  - Annotate httpapi.Client.GetModelConfig as a known stub. LocalAI's /models/edit/:name returns rendered HTML, not JSON, so the standalone CLI's get_model_config tool surfaces a clear error to the LLM. A future JSON-only /api/models/config-yaml/:name endpoint is tracked in the agent contract; FIXME points at it.
  - Extend `.agents/localai-assistant-mcp.md` with a "Code conventions" section that documents the audit-driven rules: tool/Capability/Action constants, errors.Is over substring matching, ctx-aware channel sends, atomic writes, and graceful shutdown. Refresh the file map so it lists tools.go and capability.go and drops the removed tools_bootstrap.go.

  The tools_models.go diff is a comment-only change explaining why the ModelName empty-string check stays at the tool layer (consistency across LocalAIClient implementations, since the SDK schema validator only enforces presence, not non-empty).

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
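A hedged sketch of the temp-file-plus-rename pattern the message describes; the shipped helper's exact signature and error wrapping are assumptions:

```go
// Hedged sketch of writeFileAtomic. Write to a sibling temp file in the same
// directory (rename is only atomic within one filesystem), then swap it in.
package modeladmin

import (
	"os"
	"path/filepath"
)

func writeFileAtomic(path string, data []byte, perm os.FileMode) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			os.Remove(tmp.Name()) // don't leave stray temps on error paths
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err = tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err = tmp.Sync(); err != nil { // flush before the rename
		tmp.Close()
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```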
* test(assistant): convert test files to ginkgo + gomega

  The repo convention (per core/http/endpoints/localai/*_test.go, core/gallery/**, etc.) is Ginkgo v2 with Gomega assertions. The tests I introduced for the assistant feature used vanilla testing.T, which made them stand out and stripped the BDD structure the rest of the suite relies on. Convert every test file in the assistant scope to Ginkgo:

  pkg/mcp/localaitools/
  - dto_test.go — Describe("DTOs round-trip through JSON")
  - prompts_test.go — Describe("SystemPrompt assembler")
  - server_test.go — Describe("Server tool catalog"), Describe("Tool dispatch"), Describe("Tool error surfacing"), Describe("Argument validation"), Describe("Concurrent tool calls")
  - parity_test.go — Describe("LocalAIClient parity"), hosts the suite's single RunSpecs (the file is package localaitools_test so it can import httpapi without an import cycle; Ginkgo aggregates Describes from both the internal and external test packages into one run).
  - httpapi/client_test.go — Describe("httpapi.Client against the LocalAI admin REST surface"), Describe("ErrHTTPNotFound"), Describe("Bearer token")
  - inproc/client_test.go — Describe("inproc.Client cancellation")

  core/services/modeladmin/
  - config_test.go — Describe("ConfigService") with sub-Describes for GetConfig, PatchConfig, EditYAML
  - state_test.go — Describe("ConfigService.ToggleState")
  - pinned_test.go — Describe("ConfigService.TogglePinned")
  - atomic_test.go — Describe("writeFileAtomic")

  core/http/endpoints/mcp/
  - localai_assistant_test.go — Describe("LocalAIAssistantHolder")

  Each package gets a `*_suite_test.go` with the standard `RegisterFailHandler(Fail) + RunSpecs(t, "...")` boilerplate. Helpers that previously took *testing.T (newTestService, writeModelYAML, readMap, sortedStrings, sortGalleries, etc.) drop the *T receiver and use Gomega Expectations directly. tmp dirs come from GinkgoT().TempDir(). No semantic change to test coverage — every original assertion has a direct Gomega counterpart. All suites pass with -race.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test+docs(assistant): drift detector for Tool ↔ REST route mapping

  Honest gap from the audit: the parity_test.go suite only checks four methods, and uses the same httpapi.Client for both sides — it asserts stability of the DTO shapes, not equivalence between in-process and HTTP. If a contributor adds an admin REST endpoint without an MCP tool, or a tool without a matching httpapi route, both surfaces silently diverge. Add a coverage test plus stronger docs:

  - pkg/mcp/localaitools/coverage_test.go introduces a hand-maintained toolToHTTPRoute map: every Tool* constant must list the REST endpoint the httpapi.Client hits (or "(none)" with a documented reason). Two Ginkgo specs assert the map and the published catalog stay in sync — one fails when a Tool is added without a route entry, the other fails when a route entry references a tool that no longer exists. Verified by removing the ToolDeleteModel entry locally; the test fired with a clear message pointing the contributor at the file.

    Deliberate non-test: we don't enumerate live admin REST routes from here. Walking the route registry requires booting Application; parsing core/http/routes/localai.go is brittle. The "new admin REST endpoint → MCP tool" direction stays a PR checklist item — see below.

  - AGENTS.md gets a new Quick Reference bullet that calls out the rule and points at the test by name.
  - .agents/api-endpoints-and-auth.md tightens the existing "Companion: MCP admin tool surface" subsection from "if useful, consider..." to "MUST be considered, with three concrete outcomes (tool added, deliberately skipped with documented reason, or forgot — which breaks the contract)". Adds a checklist item at the bottom of the file's authoritative checklist.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
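A hedged sketch of the coverage test's shape. The route string and the `publishedToolCatalog` helper are illustrative assumptions; only `toolToHTTPRoute`, the `Tool*` keys and the `"(none)"` convention come from the message above:

```go
// Hedged sketch of coverage_test.go's idea, in Ginkgo style.
var toolToHTTPRoute = map[string]string{
	localaitools.ToolDeleteModel: "DELETE /models/...", // illustrative route only
	// ... one entry per Tool* constant, or "(none)" with a documented reason.
}

var _ = Describe("Tool ↔ REST route mapping", func() {
	It("maps every published tool to a route", func() {
		for _, tool := range publishedToolCatalog() { // hypothetical helper
			Expect(toolToHTTPRoute).To(HaveKey(tool),
				"new tool %q needs a toolToHTTPRoute entry (or \"(none)\" with a reason)", tool)
		}
	})
	It("references only tools that still exist", func() {
		for tool := range toolToHTTPRoute {
			Expect(publishedToolCatalog()).To(ContainElement(tool),
				"stale toolToHTTPRoute entry %q", tool)
		}
	})
})
```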
* refactor(assistant): drop duplicate DTOs, surface canonical types

  Audit feedback: localaitools/dto.go reinvented several types that already existed in the codebase. Replace the duplicates with the canonical types so the LLM-visible wire format stays aligned with the rest of LocalAI by construction (no parallel structs to keep in sync).

  Removed (and the canonical type now used by the LocalAIClient interface):

  - localaitools.Gallery → config.Gallery
  - localaitools.GalleryModelHit → gallery.Metadata
  - localaitools.VRAMEstimate → vram.EstimateResult

  Tightened scope: localaitools.Backend — kept, but reduced to {Name, Installed}. ListKnownBackends now returns []schema.KnownBackend (the canonical type already used by REST /backends/known).

  Kept with documented rationale:

  - localaitools.JobStatus — galleryop.OpStatus has Error error which marshals to "{}". JobStatus is the JSON-friendly mirror.
  - localaitools.Node — nodes.BackendNode carries gorm internals + token hash; we expose only the LLM-relevant fields.
  - ImportModelURIRequest/Response — schema.ImportModelRequest and GalleryResponse are wire-shaped, mine are LLM-shaped (BackendPreference flat, AmbiguousBackend exposed).

  Side wins:

  - Drop bytesPerMiB; vram.EstimateResult already carries human-readable display strings (size_display, vram_display) the LLM uses directly.
  - Drop the handler-private vramEstimateRequest in core/http/endpoints/localai/vram.go and bind directly into modeladmin.VRAMRequest (now JSON-tagged).

  Both clients pass through these types now where possible (e.g. ListGalleries in inproc.Client is a one-liner returning AppConfig.Galleries; httpapi.Client.GallerySearch decodes straight into []gallery.Metadata). All tests green with -race.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(assistant): extract REST route paths into named constants

  httpapi.Client had 18 bare-string path sites scattered across methods. Pull them into pkg/mcp/localaitools/httpapi/routes.go: static paths as package-private constants, dynamic paths as small builders that handle url.PathEscape on segment values. No behaviour change. Drops the now-unused net/url import from client.go since path escaping moved into routes.go alongside the path it applies to.

  Local-only by design: the server-side registrations in core/http/routes/localai.go remain bare strings. Sharing constants across the pkg/ ↔ core/ boundary would invert the layering today; the existing Tool↔REST drift-detector in coverage_test.go is the safety net for that direction.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Assisted-by: Claude:claude-opus-4-7 [Claude Code]

* docs(assistant): align with shipped UI and dropped bootstrap env vars

  The LocalAI Assistant doc still described the older iteration:

  - The in-chat toggle was renamed from "Admin" to "Manage" (the badge is now "Manage mode" and the home page exposes a "Manage by chat" CTA).
  - LOCALAI_ASSISTANT_BOOTSTRAP_MODEL / --localai-assistant-bootstrap-model and the bootstrap_default_model tool were removed — admins pick a model from the existing selector instead, no env-var configuration required.
  - The shipped tool catalog includes import_model_uri, which didn't appear in the doc; bootstrap_default_model appeared but no longer exists.
  - The Settings → LocalAI Assistant runtime toggle wasn't mentioned as the preferred way to disable without restart.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Assisted-by: Claude:claude-opus-4-7 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
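A hedged sketch of the routes.go pattern: static paths as package-private constants, dynamic ones as small builders that own the escaping. The names here are assumptions; the config-json path is the one visible in the swagger annotation further down in this diff:

```go
// Hedged sketch of the routes.go idea; identifier names are illustrative.
package httpapi

import "net/url"

// Static path: a package-private constant.
const routeKnownBackends = "/backends/known"

// Dynamic path: the builder owns url.PathEscape on the segment so client.go
// never concatenates raw model names into URLs.
func routeModelConfigJSON(name string) string {
	return "/api/models/config-json/" + url.PathEscape(name)
}
```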
Committed by GitHub · parent 142919fc79 · commit bcef72b9c1
@@ -330,3 +330,16 @@ When adding a new endpoint:
- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
- [ ] Tests cover both authenticated and unauthenticated access
- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation

## Companion: MCP admin tool surface

**Required for admin endpoints.** Every new admin endpoint MUST be considered for the MCP admin tool surface — the REST API and the MCP tool catalog can drift silently otherwise, and both the LocalAI Assistant chat modality and the standalone `local-ai mcp-server` rely on `pkg/mcp/localaitools/` to mirror REST.

Two outcomes are acceptable; one is not:

- **Tool added.** The new endpoint is something an admin would manage conversationally (install, list, edit, toggle, upgrade). Follow the full checklist in [.agents/localai-assistant-mcp.md](localai-assistant-mcp.md): add a `LocalAIClient` interface method, implement it in both `inproc` and `httpapi`, register the tool with a `Tool*` constant, update the skill prompts, **and add the route to `toolToHTTPRoute` in `pkg/mcp/localaitools/coverage_test.go`**.
- **Tool deliberately skipped.** The endpoint is internal/diagnostic and adding a chat path would be misleading. Document the decision in the PR description; no code action.
- **Forgot.** This breaks the contract. The `TestToolHTTPRouteMappingComplete` test in `pkg/mcp/localaitools` is a partial guard (it checks every `Tool*` has a route mapping), but it does NOT detect new REST endpoints without a tool — that's still a process check on the PR author.

**Add the item below to the bottom of the checklist above**:

- [ ] If admin: decided whether MCP coverage is needed; if yes, tool registered + map updated; if no, skip-reason in PR description.
.agents/localai-assistant-mcp.md — new file (97 lines)
@@ -0,0 +1,97 @@
# LocalAI Assistant — admin MCP server

This document is the contract for **anyone** (human or AI agent) touching LocalAI's admin REST surface, the in-process MCP server that wraps it, or the embedded skill prompts that teach the assistant how to use it. Read this before adding/removing/renaming admin endpoints, MCP tools, or skill recipes.

## What this feature is

`pkg/mcp/localaitools/` is a public Go package that exposes LocalAI's admin/management surface as an MCP server. It is used in two ways:

1. **In-process**: when an admin opens a chat with `metadata.localai_assistant=true`, the chat handler injects the in-memory MCP server (paired `net.Pipe()` transport, no HTTP loopback) so the LLM can install models, manage backends and edit configs by chatting.
2. **Standalone**: the `local-ai mcp-server --target=…` subcommand serves the same MCP server over stdio, talking HTTP to a remote LocalAI instance.

The two modes share **all** tool definitions and skill prompts. They differ only in their `LocalAIClient` implementation (`inproc/` calls services directly; `httpapi/` calls REST).
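The constructor shapes are visible in the application wiring and `core/cli/mcp_server.go` diffs further down; as a hedged side-by-side sketch (surrounding variables elided):

```go
// Both modes build the same server; only the LocalAIClient differs.
// In-process (head node): services-direct client, full catalog.
inprocSrv := localaitools.NewServer(
	inproc.New(appConfig, appConfig.SystemState, backendLoader, modelLoader, galleryService),
	localaitools.Options{},
)

// Standalone stdio CLI: REST client against a remote LocalAI, optionally read-only.
stdioSrv := localaitools.NewServer(
	httpapi.New(target, apiKey),
	localaitools.Options{DisableMutating: readOnly},
)
```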
## The three things you must keep in sync

When you change LocalAI's admin surface, three layers must stay aligned:

1. **REST endpoint** in `core/http/endpoints/localai/*.go`.
2. **MCP tool registration** in `pkg/mcp/localaitools/tools_*.go`, plus a method on `LocalAIClient` (in `client.go`) and implementations in both `inproc/client.go` **and** `httpapi/client.go`.
3. **Skill prompt** under `pkg/mcp/localaitools/prompts/skills/*.md` — the markdown that teaches the LLM how to use the new tool. If the new tool fits an existing recipe, update that recipe; otherwise add a new file.

If you ship a REST endpoint without (2) and (3), conversational admins won't see the feature.

## Checklist for adding a new admin endpoint

- [ ] REST endpoint exists in `core/http/endpoints/localai/*.go` and is gated by `auth.RequireAdmin()` in `core/http/routes/localai.go`.
- [ ] `LocalAIClient` interface in `pkg/mcp/localaitools/client.go` has a method covering the new operation.
- [ ] DTOs added/updated in `pkg/mcp/localaitools/dto.go` (JSON-tagged; never expose raw service types).
- [ ] `inproc/client.go` implements the new method by calling the service directly (not via HTTP loopback).
- [ ] `httpapi/client.go` implements the new method by calling the REST endpoint.
- [ ] Tool registration added in the appropriate `pkg/mcp/localaitools/tools_*.go`. Mutating tools must reference safety rule 1 in the description.
- [ ] If the tool is mutating, ensure `Options{DisableMutating: true}` skips it (mirror the pattern in `tools_models.go`).
- [ ] Skill prompt added or updated under `pkg/mcp/localaitools/prompts/skills/`. The prompt must instruct the LLM when to call the tool, what to ask the user first, and what to do on error.
- [ ] Tests:
  - `pkg/mcp/localaitools/server_test.go` adds the tool name to `expectedFullCatalog` and `expectedReadOnlyCatalog` (if read-only).
  - Tool dispatch is added to `TestEachToolDispatchesToClient`.
  - `pkg/mcp/localaitools/httpapi/client_test.go` covers the new HTTP path.
## Adding a new skill recipe (no new tool)

Sometimes you want to teach the LLM a new pattern that uses existing tools. Drop a markdown file under `pkg/mcp/localaitools/prompts/skills/<verb>_<noun>.md`. The file is automatically embedded by `//go:embed` and assembled into the system prompt in lexicographic order. No Go changes needed.

Conventions (a sketch of a skill file following them appears below):

- Filename: `<verb>_<noun>.md` (e.g. `install_chat_model.md`, `upgrade_backend.md`).
- First line: `# Skill: <Title Case description>`.
- Number the steps. Reference exact tool names in backticks.
- If the skill mutates state, remind the LLM to confirm with the user.
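A hypothetical `skills/install_chat_model.md` honouring those conventions; the step wording is illustrative, not a shipped prompt:

```
# Skill: Install Chat Model

1. Call `gallery_search` with the user's description of the model they want.
2. Present the top hits and ask the user to pick one.
3. Summarise the install and wait for explicit confirmation (safety rule 1).
4. Call `install_model`, then report progress or the verbatim error back to the user.
```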
## Code conventions

These rules guard against the magic-literal drift that surfaced in the first audit. Do not re-introduce bare strings.

- **Tool names** always come from the `Tool*` constants in `pkg/mcp/localaitools/tools.go`. Tool registrations, the test catalog (`server_test.go`'s `expectedFullCatalog` / `expectedReadOnlyCatalog`), and dispatch tables reference the constants. The embedded skill prompts under `prompts/` keep bare strings — that's the one allowed exception, and `TestPromptsContainSafetyAnchors` enforces alignment.
- **Toggle/pin actions** use the `modeladmin.Action` type (`pkg/mcp/localaitools` and `core/services/modeladmin`). Use `ActionEnable`/`ActionDisable`/`ActionPin`/`ActionUnpin`; never bare `"enable"`/`"pin"` strings.
- **Capability tags** for `list_installed_models` use the `localaitools.Capability` type (`capability.go`). The `LocalAIClient.ListInstalledModels` interface takes a typed `Capability`, and the `inproc` switch only accepts canonical values (`"embed"`/`"embedding"` are not aliases — only `CapabilityEmbeddings`).
- **HTTP error checks** in `httpapi.Client` use `errors.Is(err, ErrHTTPNotFound)`, not substring matches on `err.Error()`. The typed `*HTTPError` carries `StatusCode` and `Body`; add new sentinel errors as needed rather than re-introducing string matching.
- **Channel sends** to `GalleryService.ModelGalleryChannel` / `BackendGalleryChannel` from inproc clients MUST select on `ctx.Done()` so a cancelled chat completion releases the goroutine. See `inproc.sendModelOp` / `sendBackendOp`, sketched after this list.
- **Disk writes** of model config YAML go through `modeladmin.writeFileAtomic` (temp file + `os.Rename`). `os.WriteFile` truncates on crash and corrupts the model.
- **MCP server lifecycle**: every initialised holder MUST register `Close()` with `signals.RegisterGracefulTerminationHandler`. The standalone `mcp-server` CLI uses `signal.NotifyContext` to honour SIGINT/SIGTERM.
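A hedged sketch of that select pattern; the op type name is assumed from the `galleryop` package this PR references:

```go
// Hedged sketch of inproc.sendModelOp: never block a chat request on a
// paused worker or a full channel buffer. Assumes context is imported and
// galleryop.GalleryOp is the channel's element type.
func sendModelOp(ctx context.Context, ch chan<- *galleryop.GalleryOp, op *galleryop.GalleryOp) error {
	select {
	case ch <- op:
		return nil
	case <-ctx.Done():
		// The chat completion was cancelled; surface context.Canceled to the
		// LLM instead of leaking this goroutine.
		return ctx.Err()
	}
}
```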
## File map (where to look)

```
pkg/mcp/localaitools/
  client.go            # LocalAIClient interface + DTO registry
  dto.go               # JSON-tagged DTOs shared by both client impls
  server.go            # NewServer(client, opts) — registers tools
  tools.go             # Tool* name constants (single source of truth)
  capability.go        # Capability type + constants
  tools_models.go      # gallery_search, install_model, import_model_uri, ...
  tools_backends.go
  tools_config.go
  tools_system.go
  tools_state.go
  prompts.go           # //go:embed loader + SystemPrompt(opts)
  prompts/00_role.md
  prompts/10_safety.md # SAFETY RULES — change with care
  prompts/20_tools.md  # curated tool catalog with one-liners
  prompts/skills/*.md
  inproc/client.go     # in-process LocalAIClient (services-direct)
  httpapi/client.go    # REST LocalAIClient (for standalone CLI / remote)
core/http/endpoints/mcp/
  localai_assistant.go # process-wide holder + LocalToolExecutor
core/cli/mcp_server.go # local-ai mcp-server subcommand
```
## Why two clients

The in-process MCP server runs inside the same LocalAI binary that serves chat. Going over HTTP loopback would (a) require minting a synthetic admin API key for the server to authenticate against itself, (b) double-marshal every tool dispatch, and (c) lose access to in-process channels (e.g. `GalleryService.ModelGalleryChannel` for streaming install progress). So in-process uses `inproc.Client`. The standalone stdio CLI talks to a *remote* LocalAI; HTTP is the only option, so it uses `httpapi.Client`. Both implement the same `LocalAIClient` interface, and the parity test in `pkg/mcp/localaitools/parity_test.go` (when present) keeps their output equivalent.

## Why prompt-enforced confirmation, not code gates

The user chose KISS. Every mutating tool has a safety rule (`prompts/10_safety.md` rule 1) that requires the LLM to summarise the action and wait for explicit user confirmation before calling it. There is no `plan_*`/`apply_*` two-step in code. If you add a mutating tool, do **not** add per-tool confirmation logic in Go — instead, list the new tool name in `prompts/10_safety.md` so the LLM knows it falls under the confirmation rule.

## Distributed mode

The in-memory MCP server runs only on the head node (where the chat handler runs). `inproc.Client` wraps services that are already distributed-aware (`GalleryService` coordinates with workers; `ListNodes` reads the NATS-populated registry). No NATS routing of MCP tools — the admin surface lives on the head, period.
@@ -28,6 +28,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
| [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
| [.agents/adding-gallery-models.md](.agents/adding-gallery-models.md) | Adding GGUF models from HuggingFace to the model gallery |
| [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) | LocalAI Assistant chat modality — adding admin tools to the in-process MCP server, editing skill prompts, keeping REST + MCP + skills in sync |

## Quick Reference

@@ -36,5 +37,6 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]

- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
- **Admin endpoints → MCP tool**: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map.
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI
@@ -17,7 +17,10 @@ import (
	"github.com/mudler/LocalAI/core/services/voicerecognition"
	"github.com/mudler/LocalAI/core/templates"
	pkggrpc "github.com/mudler/LocalAI/pkg/grpc"
	localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
	localaiInproc "github.com/mudler/LocalAI/pkg/mcp/localaitools/inproc"
	"github.com/mudler/LocalAI/pkg/model"
	"github.com/mudler/LocalAI/pkg/signals"
	"github.com/mudler/xlog"
	"gorm.io/gorm"
)

@@ -60,6 +63,10 @@ type Application struct {

	// Upgrade checker (background service for detecting backend upgrades)
	upgradeChecker *UpgradeChecker

	// LocalAI Assistant in-process MCP server. nil when DisableLocalAIAssistant
	// is set; otherwise initialised in start() after galleryService.
	localAIAssistant *mcpTools.LocalAIAssistantHolder
}

func newApplication(appConfig *config.ApplicationConfig) *Application {

@@ -137,6 +144,13 @@ func (a *Application) UpgradeChecker() *UpgradeChecker {
	return a.upgradeChecker
}

// LocalAIAssistant returns the in-process MCP holder used by the chat handler
// when an admin opts into the assistant modality. Returns nil when the feature
// is disabled at startup.
func (a *Application) LocalAIAssistant() *mcpTools.LocalAIAssistantHolder {
	return a.localAIAssistant
}

// distributedDB returns the PostgreSQL database for distributed coordination,
// or nil in standalone mode.
func (a *Application) distributedDB() *gorm.DB {

@@ -230,6 +244,32 @@ func (a *Application) start() error {

	a.galleryService = galleryService

	// LocalAI Assistant: in-process MCP server exposing admin tools. Initialised
	// once at startup and reused across chat sessions that opt in via metadata.
	if !a.applicationConfig.DisableLocalAIAssistant {
		holder := mcpTools.NewLocalAIAssistantHolder()
		assistantClient := localaiInproc.New(
			a.applicationConfig,
			a.applicationConfig.SystemState,
			a.backendLoader,
			a.modelLoader,
			a.galleryService,
		)
		if err := holder.Initialize(a.applicationConfig.Context, assistantClient, localaitools.Options{}); err != nil {
			// Why log+continue instead of fail: the assistant is an optional
			// feature; a failure here must not take down the whole server.
			xlog.Warn("LocalAI Assistant initialisation failed; feature unavailable", "error", err)
		} else {
			a.localAIAssistant = holder
			// Tear the in-memory transport pair down on SIGINT/SIGTERM so the
			// goroutine ends cleanly. Mirrors how core/http/endpoints/mcp/tools.go
			// closes its per-model MCP sessions on graceful termination.
			signals.RegisterGracefulTerminationHandler(func() {
				_ = holder.Close()
			})
		}
	}

	// Initialize agent job service (Start() is deferred to after distributed wiring)
	agentJobService := agentpool.NewAgentJobService(
		a.ApplicationConfig(),
@@ -20,6 +20,7 @@ var CLI struct {
	AgentWorker AgentWorkerCMD `cmd:"" name:"agent-worker" help:"Start an agent worker for distributed mode (executes agent chats via NATS)"`
	Util UtilCMD `cmd:"" help:"Utility commands"`
	Agent AgentCMD `cmd:"" help:"Run agents standalone without the full LocalAI server"`
	MCPServer MCPServerCMD `cmd:"" name:"mcp-server" help:"Run the LocalAI admin tool surface as a stdio MCP server (controls a remote LocalAI instance over HTTP)"`
	Explorer ExplorerCMD `cmd:"" help:"Run p2p explorer"`
	Completion CompletionCMD `cmd:"" help:"Generate shell completion scripts for bash, zsh, or fish"`
}

core/cli/mcp_server.go — new file (47 lines)
@@ -0,0 +1,47 @@
package cli

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"

	"github.com/modelcontextprotocol/go-sdk/mcp"

	cliContext "github.com/mudler/LocalAI/core/cli/context"
	localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
	"github.com/mudler/LocalAI/pkg/mcp/localaitools/httpapi"
)

// MCPServerCMD runs the LocalAI admin tool surface as a stdio MCP server,
// targeting a remote LocalAI instance over its HTTP API. The same Go package
// that powers the in-process LocalAI Assistant chat modality is used here —
// only the LocalAIClient implementation differs (httpapi instead of inproc).
type MCPServerCMD struct {
	Target string `env:"LOCALAI_MCP_TARGET" default:"http://localhost:8080" help:"LocalAI base URL"`
	APIKey string `env:"LOCALAI_API_KEY" help:"Bearer API key for the target LocalAI"`
	ReadOnly bool `help:"Skip registration of mutating tools (install/delete/edit/upgrade/etc.) so the assistant can browse without changing remote state"`
}

func (m *MCPServerCMD) Run(_ *cliContext.Context) error {
	if m.Target == "" {
		return fmt.Errorf("--target / LOCALAI_MCP_TARGET is required")
	}

	client := httpapi.New(m.Target, m.APIKey)
	srv := localaitools.NewServer(client, localaitools.Options{
		DisableMutating: m.ReadOnly,
	})

	// Stdio: the host (e.g. Claude Desktop, Cursor, mcphost) talks JSON-RPC
	// over our stdin/stdout. There's nothing else this process should print —
	// every other goroutine logging to stderr is fine, but stdout is sacred.
	//
	// Honour SIGINT/SIGTERM so a Ctrl-C from the host or `kill -TERM` from
	// process supervision gives srv.Run a chance to drain in-flight calls
	// before exiting.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	return srv.Run(ctx, &mcp.StdioTransport{})
}
@@ -101,6 +101,9 @@ type RunCMD struct {
	AgentJobRetentionDays int `env:"LOCALAI_AGENT_JOB_RETENTION_DAYS,AGENT_JOB_RETENTION_DAYS" default:"30" help:"Number of days to keep agent job history (default: 30)" group:"api"`
	OpenResponsesStoreTTL string `env:"LOCALAI_OPEN_RESPONSES_STORE_TTL,OPEN_RESPONSES_STORE_TTL" default:"0" help:"TTL for Open Responses store (e.g., 1h, 30m, 0 = no expiration)" group:"api"`

	// LocalAI Assistant chat modality (in-process admin MCP server)
	DisableLocalAIAssistant bool `env:"LOCALAI_DISABLE_ASSISTANT" default:"false" help:"Disable the LocalAI Assistant chat modality (in-process admin MCP server)" group:"assistant"`

	// Agent Pool (LocalAGI)
	DisableAgents bool `env:"LOCALAI_DISABLE_AGENTS" default:"false" help:"Disable the agent pool feature" group:"agents"`
	AgentPoolAPIURL string `env:"LOCALAI_AGENT_POOL_API_URL" help:"Default API URL for agents (defaults to self-referencing LocalAI)" group:"agents"`

@@ -323,6 +326,9 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
	if r.AgentPoolDefaultModel != "" {
		opts = append(opts, config.WithAgentPoolDefaultModel(r.AgentPoolDefaultModel))
	}
	if r.DisableLocalAIAssistant {
		opts = append(opts, config.WithDisableLocalAIAssistant(true))
	}
	if r.AgentPoolMultimodalModel != "" {
		opts = append(opts, config.WithAgentPoolMultimodalModel(r.AgentPoolMultimodalModel))
	}
@@ -103,6 +103,10 @@ type ApplicationConfig struct {
	// Distributed / Horizontal Scaling
	Distributed DistributedConfig

	// LocalAI Assistant chat modality. Hard-disable the in-process admin MCP
	// server with this flag; runtime-toggleable via /api/settings.
	DisableLocalAIAssistant bool
}

// AuthConfig holds configuration for user authentication and authorization.

@@ -825,6 +829,15 @@ func WithAuthDefaultAPIKeyExpiry(expiry string) AppOption {
	}
}

// WithDisableLocalAIAssistant hard-disables the in-process admin MCP server.
// When set, the chat-handler branch for metadata.localai_assistant=true
// returns a "feature unavailable" error.
func WithDisableLocalAIAssistant(disabled bool) AppOption {
	return func(o *ApplicationConfig) {
		o.DisableLocalAIAssistant = disabled
	}
}

// ToConfigLoaderOptions returns a slice of ConfigLoader Option.
// Some options defined at the application level are going to be passed as defaults for
// all the configuration for the models.

@@ -916,6 +929,9 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
	agentPoolEnableLogs := o.AgentPool.EnableLogs
	agentPoolCollectionDBPath := o.AgentPool.CollectionDBPath

	// LocalAI Assistant settings
	localAIAssistantEnabled := !o.DisableLocalAIAssistant

	return RuntimeSettings{
		WatchdogEnabled: &watchdogEnabled,
		WatchdogIdleEnabled: &watchdogIdle,

@@ -959,6 +975,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
		AgentPoolChunkOverlap: &agentPoolChunkOverlap,
		AgentPoolEnableLogs: &agentPoolEnableLogs,
		AgentPoolCollectionDBPath: &agentPoolCollectionDBPath,
		LocalAIAssistantEnabled: &localAIAssistantEnabled,
	}
}

@@ -1144,6 +1161,13 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req
		requireRestart = true
	}

	// LocalAI Assistant: read live at request entry by the chat handler, so
	// flipping the disable flag takes effect on the next request without a
	// restart.
	if settings.LocalAIAssistantEnabled != nil {
		o.DisableLocalAIAssistant = !*settings.LocalAIAssistantEnabled
	}

	// Note: ApiKeys requires special handling (merging with startup keys) - handled in caller

	return requireRestart
@@ -73,4 +73,8 @@ type RuntimeSettings struct {
	AgentPoolChunkOverlap *int `json:"agent_pool_chunk_overlap,omitempty"`
	AgentPoolEnableLogs *bool `json:"agent_pool_enable_logs,omitempty"`
	AgentPoolCollectionDBPath *string `json:"agent_pool_collection_db_path,omitempty"`

	// LocalAI Assistant settings — read live by the chat handler at request
	// entry, so flipping the toggle takes effect on the next request.
	LocalAIAssistantEnabled *bool `json:"localai_assistant_enabled,omitempty"` // negation of DisableLocalAIAssistant for UI clarity
}
@@ -139,6 +139,7 @@ func AgentFeatureMetas() []FeatureMeta {
		{FeatureSkills, "Skills", false},
		{FeatureCollections, "Collections", false},
		{FeatureMCPJobs, "MCP CI Jobs", false},
		{FeatureLocalAIAssistant, "LocalAI Assistant", false},
	}
}
@@ -27,10 +27,11 @@ func GetCachedUserPermissions(c echo.Context, db *gorm.DB, userID string) (*User
// Feature name constants — all code must use these, never bare strings.
const (
	// Agent features (default OFF for new users)
	FeatureAgents = "agents"
	FeatureSkills = "skills"
	FeatureCollections = "collections"
	FeatureMCPJobs = "mcp_jobs"
	FeatureAgents = "agents"
	FeatureSkills = "skills"
	FeatureCollections = "collections"
	FeatureMCPJobs = "mcp_jobs"
	FeatureLocalAIAssistant = "localai_assistant"

	// General features (default OFF for new users)
	FeatureFineTuning = "fine_tuning"

@@ -56,7 +57,7 @@ const (
)

// AgentFeatures lists agent-related features (default OFF).
var AgentFeatures = []string{FeatureAgents, FeatureSkills, FeatureCollections, FeatureMCPJobs}
var AgentFeatures = []string{FeatureAgents, FeatureSkills, FeatureCollections, FeatureMCPJobs, FeatureLocalAIAssistant}

// GeneralFeatures lists general features (default OFF).
var GeneralFeatures = []string{FeatureFineTuning, FeatureQuantization}
@@ -6,21 +6,17 @@ import (
	"io"
	"net/http"
	"net/url"
	"os"
	"reflect"
	"sort"
	"strings"

	"dario.cat/mergo"
	"github.com/labstack/echo/v4"
	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/config/meta"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/services/galleryop"
	"github.com/mudler/LocalAI/core/services/modeladmin"
	"github.com/mudler/LocalAI/pkg/model"
	"github.com/mudler/LocalAI/pkg/utils"
	"github.com/mudler/xlog"
	"gopkg.in/yaml.v3"
)

// ConfigMetadataEndpoint returns field metadata for config fields.
@@ -156,86 +152,23 @@ func AutocompleteEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, a
// @Success 200 {object} map[string]any "success message"
// @Router /api/models/config-json/{name} [patch]
func PatchConfigEndpoint(cl *config.ModelConfigLoader, _ *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
	svc := modeladmin.NewConfigService(cl, appConfig)
	return func(c echo.Context) error {
		modelName := c.Param("name")
		if decoded, err := url.PathUnescape(modelName); err == nil {
			modelName = decoded
		}
		if modelName == "" {
			return c.JSON(http.StatusBadRequest, map[string]any{"error": "model name is required"})
		}

		modelConfig, exists := cl.GetModelConfig(modelName)
		if !exists {
			return c.JSON(http.StatusNotFound, map[string]any{"error": "model configuration not found"})
		}

		patchBody, err := io.ReadAll(c.Request().Body)
		if err != nil || len(patchBody) == 0 {
			return c.JSON(http.StatusBadRequest, map[string]any{"error": "request body is empty or unreadable"})
		}

		var patchMap map[string]any
		if err := json.Unmarshal(patchBody, &patchMap); err != nil {
			return c.JSON(http.StatusBadRequest, map[string]any{"error": "invalid JSON: " + err.Error()})
		}

		// Read the raw YAML from disk rather than serializing the in-memory config.
		// The in-memory config has SetDefaults() applied, which would persist
		// runtime-only defaults (top_p, temperature, mirostat, etc.) to the file.
		configPath := modelConfig.GetModelConfigFile()
		if err := utils.VerifyPath(configPath, appConfig.SystemState.Model.ModelsPath); err != nil {
			return c.JSON(http.StatusForbidden, map[string]any{"error": "config path not trusted: " + err.Error()})
		if _, err := svc.PatchConfig(c.Request().Context(), modelName, patchMap); err != nil {
			return c.JSON(httpStatusForModelAdminError(err), map[string]any{"error": err.Error()})
		}

		diskYAML, err := os.ReadFile(configPath)
		if err != nil {
			return c.JSON(http.StatusInternalServerError, map[string]any{"error": "failed to read config file: " + err.Error()})
		}

		var existingMap map[string]any
		if err := yaml.Unmarshal(diskYAML, &existingMap); err != nil {
			return c.JSON(http.StatusInternalServerError, map[string]any{"error": "failed to parse existing config: " + err.Error()})
		}
		if existingMap == nil {
			existingMap = map[string]any{}
		}

		if err := mergo.Merge(&existingMap, patchMap, mergo.WithOverride); err != nil {
			return c.JSON(http.StatusInternalServerError, map[string]any{"error": "failed to merge configs: " + err.Error()})
		}

		// Marshal once and reuse for both validation and writing
		yamlData, err := yaml.Marshal(existingMap)
		if err != nil {
			return c.JSON(http.StatusInternalServerError, map[string]any{"error": "failed to marshal YAML"})
		}

		var updatedConfig config.ModelConfig
		if err := yaml.Unmarshal(yamlData, &updatedConfig); err != nil {
			return c.JSON(http.StatusBadRequest, map[string]any{"error": "merged config is invalid: " + err.Error()})
		}

		if valid, err := updatedConfig.Validate(); !valid {
			errMsg := "validation failed"
			if err != nil {
				errMsg = err.Error()
			}
			return c.JSON(http.StatusBadRequest, map[string]any{"error": errMsg})
		}

		if err := os.WriteFile(configPath, yamlData, 0644); err != nil {
			return c.JSON(http.StatusInternalServerError, map[string]any{"error": "failed to write config file"})
		}

		if err := cl.LoadModelConfigsFromPath(appConfig.SystemState.Model.ModelsPath, appConfig.ToConfigLoaderOptions()...); err != nil {
			return c.JSON(http.StatusInternalServerError, map[string]any{"error": "failed to reload configs: " + err.Error()})
		}

		if err := cl.Preload(appConfig.SystemState.Model.ModelsPath); err != nil {
			xlog.Warn("Failed to preload after PATCH", "error", err)
		}

		return c.JSON(http.StatusOK, map[string]any{
			"success": true,
			"message": fmt.Sprintf("Model '%s' updated successfully", modelName),
@@ -6,63 +6,31 @@ import (
	"io"
	"net/http"
	"net/url"
	"os"
	"path/filepath"
	"strings"

	"github.com/labstack/echo/v4"
	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	httpUtils "github.com/mudler/LocalAI/core/http/middleware"
	"github.com/mudler/LocalAI/core/services/modeladmin"
	"github.com/mudler/LocalAI/internal"
	"github.com/mudler/LocalAI/pkg/model"
	"github.com/mudler/LocalAI/pkg/utils"

	"gopkg.in/yaml.v3"
)

// GetEditModelPage renders the edit model page with current configuration
func GetEditModelPage(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
	svc := modeladmin.NewConfigService(cl, appConfig)
	return func(c echo.Context) error {
		modelName := c.Param("name")
		if decoded, err := url.PathUnescape(modelName); err == nil {
			modelName = decoded
		}
		if modelName == "" {
			response := ModelResponse{
				Success: false,
				Error: "Model name is required",
			}
			return c.JSON(http.StatusBadRequest, response)
		}

		modelConfig, exists := cl.GetModelConfig(modelName)
		if !exists {
			response := ModelResponse{
				Success: false,
				Error: "Model configuration not found",
			}
			return c.JSON(http.StatusNotFound, response)
		}

		modelConfigFile := modelConfig.GetModelConfigFile()
		if modelConfigFile == "" {
			response := ModelResponse{
				Success: false,
				Error: "Model configuration file not found",
			}
			return c.JSON(http.StatusNotFound, response)
		}
		configData, err := os.ReadFile(modelConfigFile)
		view, err := svc.GetConfig(c.Request().Context(), modelName)
		if err != nil {
			response := ModelResponse{
				Success: false,
				Error: "Failed to read configuration file: " + err.Error(),
			}
			return c.JSON(http.StatusInternalServerError, response)
			return c.JSON(httpStatusForModelAdminError(err), ModelResponse{Success: false, Error: err.Error()})
		}

		// Render the edit page with the current configuration
		// Render the edit page with the current configuration. Re-fetch the
		// in-memory config from the loader for the template — the on-disk YAML
		// view from svc doesn't carry the loader's parsed struct fields.
		modelConfig, _ := cl.GetModelConfig(modelName)
		templateData := struct {
			Title string
			ModelName string

@@ -76,7 +44,7 @@ func GetEditModelPage(cl *config.ModelConfigLoader, appConfig *config.Applicatio
			Title: "LocalAI - Edit Model " + modelName,
			ModelName: modelName,
			Config: &modelConfig,
			ConfigYAML: string(configData),
			ConfigYAML: view.YAML,
			BaseURL: httpUtils.BaseURL(c),
			Version: internal.PrintableVersion(),
			DisableRuntimeSettings: appConfig.DisableRuntimeSettings,

@@ -88,208 +56,53 @@ func GetEditModelPage(cl *config.ModelConfigLoader, appConfig *config.Applicatio

// EditModelEndpoint handles updating existing model configurations
func EditModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
	svc := modeladmin.NewConfigService(cl, appConfig)
	return func(c echo.Context) error {
		modelName := c.Param("name")
		if decoded, err := url.PathUnescape(modelName); err == nil {
			modelName = decoded
		}
		if modelName == "" {
			response := ModelResponse{
				Success: false,
				Error: "Model name is required",
			}
			return c.JSON(http.StatusBadRequest, response)
		}

		modelConfig, exists := cl.GetModelConfig(modelName)
		if !exists {
			response := ModelResponse{
				Success: false,
				Error: "Existing model configuration not found",
			}
			return c.JSON(http.StatusNotFound, response)
		}

		// Get the raw body
		body, err := io.ReadAll(c.Request().Body)
		if err != nil {
			response := ModelResponse{
				Success: false,
				Error: "Failed to read request body: " + err.Error(),
			}
			return c.JSON(http.StatusBadRequest, response)
			return c.JSON(http.StatusBadRequest, ModelResponse{Success: false, Error: "Failed to read request body: " + err.Error()})
		}
		if len(body) == 0 {
			response := ModelResponse{
				Success: false,
				Error: "Request body is empty",
			}
			return c.JSON(http.StatusBadRequest, response)
		result, err := svc.EditYAML(c.Request().Context(), modelName, body, ml)
		if err != nil {
			return c.JSON(httpStatusForModelAdminError(err), ModelResponse{Success: false, Error: err.Error()})
		}

		// Check content to see if it's a valid model config
		var req config.ModelConfig

		// Parse YAML
		if err := yaml.Unmarshal(body, &req); err != nil {
			response := ModelResponse{
				Success: false,
				Error: "Failed to parse YAML: " + err.Error(),
			}
			return c.JSON(http.StatusBadRequest, response)
		msg := fmt.Sprintf("Model '%s' updated successfully. Model has been reloaded with new configuration.", result.NewName)
		if result.Renamed {
			msg = fmt.Sprintf("Model '%s' renamed to '%s' and updated successfully.", result.OldName, result.NewName)
		}

		// Validate required fields
		if req.Name == "" {
			response := ModelResponse{
				Success: false,
				Error: "Name is required",
			}
			return c.JSON(http.StatusBadRequest, response)
		}

		// Validate the configuration
		if valid, _ := req.Validate(); !valid {
			response := ModelResponse{
				Success: false,
				Error: "Validation failed",
				Details: []string{"Configuration validation failed. Please check your YAML syntax and required fields."},
			}
			return c.JSON(http.StatusBadRequest, response)
		}

		// Load the existing configuration
		configPath := modelConfig.GetModelConfigFile()
		modelsPath := appConfig.SystemState.Model.ModelsPath
		if err := utils.VerifyPath(configPath, modelsPath); err != nil {
			response := ModelResponse{
				Success: false,
				Error: "Model configuration not trusted: " + err.Error(),
			}
			return c.JSON(http.StatusNotFound, response)
		}

		// Detect a rename: the URL param (old name) differs from the name field
		// in the posted YAML. When that happens we must rename the on-disk file
		// so that <name>.yaml stays in sync with the internal name field —
		// otherwise a subsequent config reload indexes the file under the new
		// name while the old key lingers in memory, producing duplicates in the UI.
		renamed := req.Name != modelName
		if renamed {
			if strings.ContainsRune(req.Name, os.PathSeparator) || strings.Contains(req.Name, "/") || strings.Contains(req.Name, "\\") {
				response := ModelResponse{
					Success: false,
					Error: "Model name must not contain path separators",
				}
				return c.JSON(http.StatusBadRequest, response)
			}
			if _, exists := cl.GetModelConfig(req.Name); exists {
				response := ModelResponse{
					Success: false,
					Error: fmt.Sprintf("A model named %q already exists", req.Name),
				}
				return c.JSON(http.StatusConflict, response)
			}
			newConfigPath := filepath.Join(modelsPath, req.Name+".yaml")
			if err := utils.VerifyPath(newConfigPath, modelsPath); err != nil {
				response := ModelResponse{
					Success: false,
					Error: "New model configuration path not trusted: " + err.Error(),
				}
				return c.JSON(http.StatusBadRequest, response)
			}
			if _, err := os.Stat(newConfigPath); err == nil {
				response := ModelResponse{
					Success: false,
					Error: fmt.Sprintf("A configuration file for %q already exists on disk", req.Name),
				}
				return c.JSON(http.StatusConflict, response)
			} else if !errors.Is(err, os.ErrNotExist) {
				response := ModelResponse{
					Success: false,
					Error: "Failed to check for existing configuration: " + err.Error(),
				}
				return c.JSON(http.StatusInternalServerError, response)
			}

			if err := os.WriteFile(newConfigPath, body, 0644); err != nil {
				response := ModelResponse{
					Success: false,
					Error: "Failed to write configuration file: " + err.Error(),
				}
				return c.JSON(http.StatusInternalServerError, response)
			}
			if configPath != newConfigPath {
				if err := os.Remove(configPath); err != nil && !errors.Is(err, os.ErrNotExist) {
					fmt.Printf("Warning: Failed to remove old configuration file %q: %v\n", configPath, err)
				}
			}

			// Rename the gallery metadata file if one exists, so the delete
			// flow (which looks up ._gallery_<name>.yaml) can still find it.
			oldGalleryPath := filepath.Join(modelsPath, gallery.GalleryFileName(modelName))
			newGalleryPath := filepath.Join(modelsPath, gallery.GalleryFileName(req.Name))
			if _, err := os.Stat(oldGalleryPath); err == nil {
				if err := os.Rename(oldGalleryPath, newGalleryPath); err != nil {
|
||||
fmt.Printf("Warning: Failed to rename gallery metadata from %q to %q: %v\n", oldGalleryPath, newGalleryPath, err)
|
||||
}
|
||||
}
|
||||
|
||||
// Drop the stale in-memory entry before the reload so we don't
|
||||
// surface both names to callers between the reload scan steps.
|
||||
cl.RemoveModelConfig(modelName)
|
||||
configPath = newConfigPath
|
||||
} else {
|
||||
// Write new content to file
|
||||
if err := os.WriteFile(configPath, body, 0644); err != nil {
|
||||
response := ModelResponse{
|
||||
Success: false,
|
||||
Error: "Failed to write configuration file: " + err.Error(),
|
||||
}
|
||||
return c.JSON(http.StatusInternalServerError, response)
|
||||
}
|
||||
}
|
||||
|
||||
// Reload configurations
|
||||
if err := cl.LoadModelConfigsFromPath(modelsPath, appConfig.ToConfigLoaderOptions()...); err != nil {
|
||||
response := ModelResponse{
|
||||
Success: false,
|
||||
Error: "Failed to reload configurations: " + err.Error(),
|
||||
}
|
||||
return c.JSON(http.StatusInternalServerError, response)
|
||||
}
|
||||
|
||||
// Shutdown the running model to apply new configuration (e.g., context_size)
|
||||
// The model will be reloaded on the next inference request.
|
||||
// Shutdown uses the old name because that's what the running instance
|
||||
// (if any) was started with.
|
||||
if err := ml.ShutdownModel(modelName); err != nil {
|
||||
// Log the error but don't fail the request - the config was saved successfully
|
||||
// The model can still be manually reloaded or restarted
|
||||
fmt.Printf("Warning: Failed to shutdown model '%s': %v\n", modelName, err)
|
||||
}
|
||||
|
||||
// Preload the model
|
||||
if err := cl.Preload(modelsPath); err != nil {
|
||||
response := ModelResponse{
|
||||
Success: false,
|
||||
Error: "Failed to preload model: " + err.Error(),
|
||||
}
|
||||
return c.JSON(http.StatusInternalServerError, response)
|
||||
}
|
||||
|
||||
// Return success response
|
||||
msg := fmt.Sprintf("Model '%s' updated successfully. Model has been reloaded with new configuration.", req.Name)
|
||||
if renamed {
|
||||
msg = fmt.Sprintf("Model '%s' renamed to '%s' and updated successfully.", modelName, req.Name)
|
||||
}
|
||||
response := ModelResponse{
|
||||
return c.JSON(http.StatusOK, ModelResponse{
|
||||
Success: true,
|
||||
Message: msg,
|
||||
Filename: configPath,
|
||||
Config: req,
|
||||
}
|
||||
return c.JSON(200, response)
|
||||
Filename: result.Filename,
|
||||
Config: result.Config,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// httpStatusForModelAdminError maps the typed errors from modeladmin to
|
||||
// the HTTP status codes the existing endpoints used to return — keeps the
|
||||
// REST contract identical after the refactor.
|
||||
func httpStatusForModelAdminError(err error) int {
|
||||
switch {
|
||||
case errors.Is(err, modeladmin.ErrNameRequired),
|
||||
errors.Is(err, modeladmin.ErrEmptyBody),
|
||||
errors.Is(err, modeladmin.ErrPathSeparator),
|
||||
errors.Is(err, modeladmin.ErrBadAction),
|
||||
errors.Is(err, modeladmin.ErrInvalidConfig):
|
||||
return http.StatusBadRequest
|
||||
case errors.Is(err, modeladmin.ErrNotFound),
|
||||
errors.Is(err, modeladmin.ErrConfigFileMissing):
|
||||
return http.StatusNotFound
|
||||
case errors.Is(err, modeladmin.ErrPathNotTrusted):
|
||||
return http.StatusForbidden
|
||||
case errors.Is(err, modeladmin.ErrConflict):
|
||||
return http.StatusConflict
|
||||
default:
|
||||
return http.StatusInternalServerError
|
||||
}
|
||||
}
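
A minimal sketch of the convention the mapper above depends on (an assumption: only the sentinel names appear in this diff). modeladmin presumably declares package-level sentinels with errors.New and annotates them with %w, so errors.Is still matches after wrapping:

    package modeladmin // hypothetical fragment, for illustration only

    import (
        "errors"
        "fmt"
    )

    var ErrNotFound = errors.New("model configuration not found")

    // wrapNotFound shows why the mapper uses errors.Is rather than ==:
    // the sentinel stays matchable through fmt.Errorf %w wrapping.
    func wrapNotFound(name string) error {
        return fmt.Errorf("model %q: %w", name, ErrNotFound)
    }

errors.Is(wrapNotFound("x"), ErrNotFound) holds, so httpStatusForModelAdminError still returns http.StatusNotFound for the annotated error.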

@@ -58,7 +58,10 @@ type MCPErrorEvent struct {
// @Success 200 {object} schema.OpenAIResponse "Response"
// @Router /v1/mcp/chat/completions [post]
func MCPEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, appConfig *config.ApplicationConfig, natsClient mcpTools.MCPNATSClient) echo.HandlerFunc {
    chatHandler := openai.ChatEndpoint(cl, ml, evaluator, appConfig, natsClient)
    // The legacy /v1/mcp/chat/completions endpoint never opts into the
    // in-process LocalAI Assistant tool surface — pass nil holder so the
    // assistant branch in chat.go is unreachable from this code path.
    chatHandler := openai.ChatEndpoint(cl, ml, evaluator, appConfig, natsClient, nil)

    return func(c echo.Context) error {
        input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.OpenAIRequest)

@@ -4,13 +4,10 @@ import (
    "fmt"
    "net/http"
    "net/url"
    "os"

    "github.com/labstack/echo/v4"
    "github.com/mudler/LocalAI/core/config"
    "github.com/mudler/LocalAI/pkg/utils"

    "gopkg.in/yaml.v3"
    "github.com/mudler/LocalAI/core/services/modeladmin"
)

// TogglePinnedModelEndpoint handles pinning or unpinning a model.
@@ -27,118 +24,21 @@ import (
// @Failure 500 {object} ModelResponse
// @Router /api/models/toggle-pinned/{name}/{action} [put]
func TogglePinnedModelEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, syncPinnedFn func()) echo.HandlerFunc {
    svc := modeladmin.NewConfigService(cl, appConfig)
    return func(c echo.Context) error {
        modelName := c.Param("name")
        if decoded, err := url.PathUnescape(modelName); err == nil {
            modelName = decoded
        }
        if modelName == "" {
            return c.JSON(http.StatusBadRequest, ModelResponse{
                Success: false,
                Error:   "Model name is required",
            })
        }

        action := c.Param("action")
        if action != "pin" && action != "unpin" {
            return c.JSON(http.StatusBadRequest, ModelResponse{
                Success: false,
                Error:   "Action must be 'pin' or 'unpin'",
            })
        }

        // Get existing model config
        modelConfig, exists := cl.GetModelConfig(modelName)
        if !exists {
            return c.JSON(http.StatusNotFound, ModelResponse{
                Success: false,
                Error:   "Model configuration not found",
            })
        }

        // Get the config file path
        configPath := modelConfig.GetModelConfigFile()
        if configPath == "" {
            return c.JSON(http.StatusNotFound, ModelResponse{
                Success: false,
                Error:   "Model configuration file not found",
            })
        }

        // Verify the path is trusted
        if err := utils.VerifyPath(configPath, appConfig.SystemState.Model.ModelsPath); err != nil {
            return c.JSON(http.StatusForbidden, ModelResponse{
                Success: false,
                Error:   "Model configuration not trusted: " + err.Error(),
            })
        }

        // Read the existing config file
        configData, err := os.ReadFile(configPath)
        action := modeladmin.Action(c.Param("action"))
        result, err := svc.TogglePinned(c.Request().Context(), modelName, action, syncPinnedFn)
        if err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to read configuration file: " + err.Error(),
            })
            return c.JSON(httpStatusForModelAdminError(err), ModelResponse{Success: false, Error: err.Error()})
        }

        // Parse the YAML config as a generic map to preserve all fields
        var configMap map[string]interface{}
        if err := yaml.Unmarshal(configData, &configMap); err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to parse configuration file: " + err.Error(),
            })
        }

        // Update the pinned field
        pinned := action == "pin"
        if pinned {
            configMap["pinned"] = true
        } else {
            // Remove the pinned key entirely when unpinning (clean YAML)
            delete(configMap, "pinned")
        }

        // Marshal back to YAML
        updatedData, err := yaml.Marshal(configMap)
        if err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to serialize configuration: " + err.Error(),
            })
        }

        // Write updated config back to file
        if err := os.WriteFile(configPath, updatedData, 0644); err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to write configuration file: " + err.Error(),
            })
        }

        // Reload model configurations from disk
        if err := cl.LoadModelConfigsFromPath(appConfig.SystemState.Model.ModelsPath, appConfig.ToConfigLoaderOptions()...); err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to reload configurations: " + err.Error(),
            })
        }

        // Sync pinned models to the watchdog
        if syncPinnedFn != nil {
            syncPinnedFn()
        }

        msg := fmt.Sprintf("Model '%s' has been %sned successfully.", modelName, action)
        if pinned {
        if action == modeladmin.ActionPin {
            msg += " The model will be excluded from automatic eviction."
        }

        return c.JSON(http.StatusOK, ModelResponse{
            Success:  true,
            Message:  msg,
            Filename: configPath,
        })
        return c.JSON(http.StatusOK, ModelResponse{Success: true, Message: msg, Filename: result.Filename})
    }
}

@@ -4,14 +4,11 @@ import (
    "fmt"
    "net/http"
    "net/url"
    "os"

    "github.com/labstack/echo/v4"
    "github.com/mudler/LocalAI/core/config"
    "github.com/mudler/LocalAI/core/services/modeladmin"
    "github.com/mudler/LocalAI/pkg/model"
    "github.com/mudler/LocalAI/pkg/utils"

    "gopkg.in/yaml.v3"
)

// ToggleModelEndpoint handles enabling or disabling a model from being loaded on demand.
@@ -28,121 +25,21 @@ import (
// @Failure 500 {object} ModelResponse
// @Router /api/models/{name}/{action} [put]
func ToggleStateModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
    svc := modeladmin.NewConfigService(cl, appConfig)
    return func(c echo.Context) error {
        modelName := c.Param("name")
        if decoded, err := url.PathUnescape(modelName); err == nil {
            modelName = decoded
        }
        if modelName == "" {
            return c.JSON(http.StatusBadRequest, ModelResponse{
                Success: false,
                Error:   "Model name is required",
            })
        }

        action := c.Param("action")
        if action != "enable" && action != "disable" {
            return c.JSON(http.StatusBadRequest, ModelResponse{
                Success: false,
                Error:   "Action must be 'enable' or 'disable'",
            })
        }

        // Get existing model config
        modelConfig, exists := cl.GetModelConfig(modelName)
        if !exists {
            return c.JSON(http.StatusNotFound, ModelResponse{
                Success: false,
                Error:   "Model configuration not found",
            })
        }

        // Get the config file path
        configPath := modelConfig.GetModelConfigFile()
        if configPath == "" {
            return c.JSON(http.StatusNotFound, ModelResponse{
                Success: false,
                Error:   "Model configuration file not found",
            })
        }

        // Verify the path is trusted
        if err := utils.VerifyPath(configPath, appConfig.SystemState.Model.ModelsPath); err != nil {
            return c.JSON(http.StatusForbidden, ModelResponse{
                Success: false,
                Error:   "Model configuration not trusted: " + err.Error(),
            })
        }

        // Read the existing config file
        configData, err := os.ReadFile(configPath)
        action := modeladmin.Action(c.Param("action"))
        result, err := svc.ToggleState(c.Request().Context(), modelName, action, ml)
        if err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to read configuration file: " + err.Error(),
            })
            return c.JSON(httpStatusForModelAdminError(err), ModelResponse{Success: false, Error: err.Error()})
        }

        // Parse the YAML config as a generic map to preserve all fields
        var configMap map[string]interface{}
        if err := yaml.Unmarshal(configData, &configMap); err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to parse configuration file: " + err.Error(),
            })
        }

        // Update the disabled field
        disabled := action == "disable"
        if disabled {
            configMap["disabled"] = true
        } else {
            // Remove the disabled key entirely when enabling (clean YAML)
            delete(configMap, "disabled")
        }

        // Marshal back to YAML
        updatedData, err := yaml.Marshal(configMap)
        if err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to serialize configuration: " + err.Error(),
            })
        }

        // Write updated config back to file
        if err := os.WriteFile(configPath, updatedData, 0644); err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to write configuration file: " + err.Error(),
            })
        }

        // Reload model configurations from disk
        if err := cl.LoadModelConfigsFromPath(appConfig.SystemState.Model.ModelsPath, appConfig.ToConfigLoaderOptions()...); err != nil {
            return c.JSON(http.StatusInternalServerError, ModelResponse{
                Success: false,
                Error:   "Failed to reload configurations: " + err.Error(),
            })
        }

        // If disabling, also shutdown the model if it's currently running
        if disabled {
            if err := ml.ShutdownModel(modelName); err != nil {
                // Log but don't fail - the config was saved successfully
                fmt.Printf("Warning: Failed to shutdown model '%s' during disable: %v\n", modelName, err)
            }
        }

        msg := fmt.Sprintf("Model '%s' has been %sd successfully.", modelName, action)
        if disabled {
        if action == modeladmin.ActionDisable {
            msg += " The model will not be loaded on demand until re-enabled."
        }

        return c.JSON(http.StatusOK, ModelResponse{
            Success:  true,
            Message:  msg,
            Filename: configPath,
        })
        return c.JSON(http.StatusOK, ModelResponse{Success: true, Message: msg, Filename: result.Filename})
    }
}

@@ -1,57 +1,13 @@
package localai

import (
    "context"
    "fmt"
    "net/http"
    "path/filepath"
    "strings"
    "time"

    "github.com/labstack/echo/v4"
    "github.com/mudler/LocalAI/core/config"
    "github.com/mudler/LocalAI/pkg/vram"
    "github.com/mudler/LocalAI/core/services/modeladmin"
)

type vramEstimateRequest struct {
    Model       string `json:"model"`                   // model name (must be installed)
    ContextSize uint32 `json:"context_size,omitempty"`  // context length to estimate for (default 8192)
    GPULayers   int    `json:"gpu_layers,omitempty"`    // number of layers to offload to GPU (0 = all)
    KVQuantBits int    `json:"kv_quant_bits,omitempty"` // KV cache quantization bits (0 = fp16)
}

type vramEstimateResponse struct {
    vram.EstimateResult
    ContextNote     string `json:"context_note,omitempty"`      // note when context_size was defaulted
    ModelMaxContext uint64 `json:"model_max_context,omitempty"` // model's trained maximum context length
}

// resolveModelURI converts a relative model path to a file:// URI so the
// size resolver can stat it on disk. URIs that already have a scheme are
// returned unchanged.
func resolveModelURI(uri, modelsPath string) string {
    if strings.Contains(uri, "://") {
        return uri
    }
    return "file://" + filepath.Join(modelsPath, uri)
}

// addWeightFile appends a resolved weight file to files and tracks the first GGUF.
func addWeightFile(uri, modelsPath string, files *[]vram.FileInput, firstGGUF *string, seen map[string]bool) {
    if !vram.IsWeightFile(uri) {
        return
    }
    resolved := resolveModelURI(uri, modelsPath)
    if seen[resolved] {
        return
    }
    seen[resolved] = true
    *files = append(*files, vram.FileInput{URI: resolved, Size: 0})
    if *firstGGUF == "" && vram.IsGGUF(uri) {
        *firstGGUF = resolved
    }
}

// VRAMEstimateEndpoint returns a handler that estimates VRAM usage for an
// installed model configuration. For uninstalled models (gallery URLs), use
// the gallery-level estimates in /api/models instead.
@@ -60,86 +16,24 @@ func addWeightFile(uri, modelsPath string, files *[]vram.FileInput, firstGGUF *s
// @Tags config
// @Accept json
// @Produce json
// @Param request body vramEstimateRequest true "VRAM estimation parameters"
// @Success 200 {object} vramEstimateResponse "VRAM estimate"
// @Param request body modeladmin.VRAMRequest true "VRAM estimation parameters"
// @Success 200 {object} modeladmin.VRAMResponse "VRAM estimate"
// @Router /api/models/vram-estimate [post]
func VRAMEstimateEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
    return func(c echo.Context) error {
        var req vramEstimateRequest
        var req modeladmin.VRAMRequest
        if err := c.Bind(&req); err != nil {
            return c.JSON(http.StatusBadRequest, map[string]any{"error": "invalid request body"})
        }

        if req.Model == "" {
            return c.JSON(http.StatusBadRequest, map[string]any{"error": "model name is required"})
        }

        modelConfig, exists := cl.GetModelConfig(req.Model)
        if !exists {
            return c.JSON(http.StatusNotFound, map[string]any{"error": "model configuration not found"})
        }

        modelsPath := appConfig.SystemState.Model.ModelsPath

        var files []vram.FileInput
        var firstGGUF string
        seen := make(map[string]bool)

        for _, f := range modelConfig.DownloadFiles {
            addWeightFile(string(f.URI), modelsPath, &files, &firstGGUF, seen)
        }
        if modelConfig.Model != "" {
            addWeightFile(modelConfig.Model, modelsPath, &files, &firstGGUF, seen)
        }
        if modelConfig.MMProj != "" {
            addWeightFile(modelConfig.MMProj, modelsPath, &files, &firstGGUF, seen)
        }

        if len(files) == 0 {
            return c.JSON(http.StatusOK, map[string]any{
                "message": "no weight files found for estimation",
            })
        }

        contextDefaulted := false
        opts := vram.EstimateOptions{
            ContextLength: req.ContextSize,
            GPULayers:     req.GPULayers,
            KVQuantBits:   req.KVQuantBits,
        }
        if opts.ContextLength == 0 {
            if modelConfig.ContextSize != nil {
                opts.ContextLength = uint32(*modelConfig.ContextSize)
            } else {
                opts.ContextLength = 8192
                contextDefaulted = true
            }
        }

        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()

        result, err := vram.Estimate(ctx, files, opts, vram.DefaultCachedSizeResolver(), vram.DefaultCachedGGUFReader())
        resp, err := modeladmin.EstimateVRAM(c.Request().Context(), req, cl, appConfig.SystemState)
        if err != nil {
            return c.JSON(http.StatusInternalServerError, map[string]any{"error": err.Error()})
            return c.JSON(httpStatusForModelAdminError(err), map[string]any{"error": err.Error()})
        }

        resp := vramEstimateResponse{EstimateResult: result}

        // When context was defaulted to 8192, read the GGUF metadata to report
        // the model's trained maximum context length so callers know the estimate
        // may be conservative.
        if contextDefaulted && firstGGUF != "" {
            ggufMeta, err := vram.DefaultCachedGGUFReader().ReadMetadata(ctx, firstGGUF)
            if err == nil && ggufMeta != nil && ggufMeta.MaximumContextLength > 0 {
                resp.ModelMaxContext = ggufMeta.MaximumContextLength
                resp.ContextNote = fmt.Sprintf(
                    "Estimate used default context_size=8192. The model's trained maximum context is %d; VRAM usage will be higher at larger context sizes.",
                    ggufMeta.MaximumContextLength,
                )
            }
        // Backwards compat: when there are no weight files, the previous
        // handler returned {"message": "..."} rather than a typed response.
        if resp.ContextNote == "no weight files found for estimation" && resp.EstimateResult.SizeBytes == 0 {
            return c.JSON(http.StatusOK, map[string]any{"message": resp.ContextNote})
        }

        return c.JSON(http.StatusOK, resp)
    }
}

core/http/endpoints/mcp/localai_assistant.go (new file, 124 lines)
@@ -0,0 +1,124 @@
package mcp

import (
    "context"
    "fmt"
    "sync"

    "github.com/modelcontextprotocol/go-sdk/mcp"
    "github.com/mudler/xlog"

    localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
)

// LocalAIAssistantHolder owns the process-wide in-memory MCP server that
// exposes LocalAI's admin surface to the chat session when an admin opts in
// via metadata.localai_assistant=true.
//
// Why a holder rather than per-request wiring:
//   - The MCP server is stateless across requests; building a new
//     net.Pipe()-backed pair per request and re-listing tools would burn cycles
//     for no benefit.
//   - The same in-process LocalToolExecutor can serve every assistant chat —
//     no NATS, no subprocesses, no synthetic admin credential.
//
// The holder is initialised once during Application bootstrap and is safe for
// concurrent use thereafter. If Initialize fails (or DisableLocalAIAssistant is
// true), Executor() returns an empty LocalToolExecutor and HasTools() is false,
// which the chat handler treats as "feature unavailable".
type LocalAIAssistantHolder struct {
    once    sync.Once
    initErr error
    tools   []MCPToolInfo
    opts    localaitools.Options

    serverSession *mcp.ServerSession
    clientSession *mcp.ClientSession
}

// NewLocalAIAssistantHolder returns an uninitialised holder. Call Initialize
// once during application start.
func NewLocalAIAssistantHolder() *LocalAIAssistantHolder {
    return &LocalAIAssistantHolder{}
}

// Initialize wires the in-memory server+client pair and discovers the tool
// list. Subsequent calls are no-ops; the first error is sticky.
func (h *LocalAIAssistantHolder) Initialize(ctx context.Context, client localaitools.LocalAIClient, opts localaitools.Options) error {
    h.once.Do(func() {
        t1, t2 := mcp.NewInMemoryTransports()
        srv := localaitools.NewServer(client, opts)

        serverSession, err := srv.Connect(ctx, t1, nil)
        if err != nil {
            h.initErr = fmt.Errorf("connect localai-assistant server: %w", err)
            return
        }
        h.serverSession = serverSession

        c := mcp.NewClient(&mcp.Implementation{Name: "LocalAI-assistant", Version: "v1"}, nil)
        clientSession, err := c.Connect(ctx, t2, nil)
        if err != nil {
            h.initErr = fmt.Errorf("connect localai-assistant client: %w", err)
            return
        }
        h.clientSession = clientSession

        // Pre-discover tools so the first chat request doesn't pay for a
        // list_tools round-trip.
        named := []NamedSession{{Name: "localai", Type: "inmemory", Session: clientSession}}
        tools, err := DiscoverMCPTools(ctx, named)
        if err != nil {
            h.initErr = fmt.Errorf("discover localai-assistant tools: %w", err)
            return
        }
        h.tools = tools
        h.opts = opts

        xlog.Info("LocalAI Assistant in-memory MCP server initialised",
            "tools", len(tools),
            "read_only", opts.DisableMutating,
        )
    })
    return h.initErr
}

// Executor returns a tool executor backed by the holder's cached tools.
// When the holder failed to initialise (or was never initialised), the
// returned executor is empty — HasTools() is false.
func (h *LocalAIAssistantHolder) Executor() ToolExecutor {
    if h == nil || h.initErr != nil {
        return &LocalToolExecutor{}
    }
    return &LocalToolExecutor{tools: h.tools}
}

// SystemPrompt returns the assembled embedded system prompt, freshly
// assembled on every call so a runtime change to the bootstrap model is
// reflected immediately. Returns "" if the holder failed to initialise.
func (h *LocalAIAssistantHolder) SystemPrompt() string {
    if h == nil || h.initErr != nil {
        return ""
    }
    return localaitools.SystemPrompt(h.opts)
}

// HasTools reports whether the holder is initialised and has tools available.
func (h *LocalAIAssistantHolder) HasTools() bool {
    return h != nil && h.initErr == nil && len(h.tools) > 0
}

// Close tears down the in-memory transport pair. Safe to call multiple times.
// Intended for graceful shutdown.
func (h *LocalAIAssistantHolder) Close() error {
    if h == nil {
        return nil
    }
    if h.clientSession != nil {
        _ = h.clientSession.Close()
    }
    if h.serverSession != nil {
        _ = h.serverSession.Wait()
    }
    return nil
}
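
A sketch of how the holder is meant to be consumed; the actual Application bootstrap call sites sit outside this diff, and `client` stands in for whatever inproc implementation the app constructs:

    // Once, at Application start:
    holder := NewLocalAIAssistantHolder()
    if err := holder.Initialize(ctx, client, localaitools.Options{}); err != nil {
        // Non-fatal by design: HasTools() stays false and the chat
        // handler reports the assistant as unavailable.
        xlog.Error("LocalAI Assistant init failed", "error", err)
    }

    // Per opted-in chat request:
    if holder.HasTools() {
        exec := holder.Executor()       // reuses the cached tool list
        prompt := holder.SystemPrompt() // re-assembled on each call
        _, _ = exec, prompt
    }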
core/http/endpoints/mcp/localai_assistant_test.go (new file, 119 lines)
@@ -0,0 +1,119 @@
package mcp

import (
    "context"
    "sync"

    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"

    "github.com/mudler/LocalAI/core/config"
    "github.com/mudler/LocalAI/core/gallery"
    "github.com/mudler/LocalAI/core/schema"
    "github.com/mudler/LocalAI/core/services/modeladmin"
    localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
    "github.com/mudler/LocalAI/pkg/vram"
)

// stubClient is the minimum LocalAIClient impl needed to exercise the holder.
// It returns deterministic, non-zero values so we can assert tool dispatch.
type stubClient struct{}

func (stubClient) GallerySearch(_ context.Context, _ localaitools.GallerySearchQuery) ([]gallery.Metadata, error) {
    return []gallery.Metadata{{Name: "stub", Gallery: config.Gallery{Name: "stub-gallery"}}}, nil
}
func (stubClient) ListInstalledModels(_ context.Context, _ localaitools.Capability) ([]localaitools.InstalledModel, error) {
    return []localaitools.InstalledModel{{Name: "stub"}}, nil
}
func (stubClient) ListGalleries(_ context.Context) ([]config.Gallery, error) {
    return []config.Gallery{{Name: "stub-gallery", URL: "http://example"}}, nil
}
func (stubClient) GetJobStatus(_ context.Context, _ string) (*localaitools.JobStatus, error) {
    return &localaitools.JobStatus{ID: "stub", Processed: true}, nil
}
func (stubClient) GetModelConfig(_ context.Context, _ string) (*localaitools.ModelConfigView, error) {
    return &localaitools.ModelConfigView{Name: "stub"}, nil
}
func (stubClient) InstallModel(_ context.Context, _ localaitools.InstallModelRequest) (string, error) {
    return "stub-job", nil
}
func (stubClient) ImportModelURI(_ context.Context, _ localaitools.ImportModelURIRequest) (*localaitools.ImportModelURIResponse, error) {
    return &localaitools.ImportModelURIResponse{JobID: "stub-import"}, nil
}
func (stubClient) DeleteModel(_ context.Context, _ string) error { return nil }
func (stubClient) EditModelConfig(_ context.Context, _ string, _ map[string]any) error {
    return nil
}
func (stubClient) ReloadModels(_ context.Context) error { return nil }
func (stubClient) ListBackends(_ context.Context) ([]localaitools.Backend, error) {
    return []localaitools.Backend{{Name: "stub-backend", Installed: true}}, nil
}
func (stubClient) ListKnownBackends(_ context.Context) ([]schema.KnownBackend, error) {
    return []schema.KnownBackend{}, nil
}
func (stubClient) InstallBackend(_ context.Context, _ localaitools.InstallBackendRequest) (string, error) {
    return "stub-backend-job", nil
}
func (stubClient) UpgradeBackend(_ context.Context, _ string) (string, error) {
    return "stub-upgrade-job", nil
}
func (stubClient) SystemInfo(_ context.Context) (*localaitools.SystemInfo, error) {
    return &localaitools.SystemInfo{Version: "stub"}, nil
}
func (stubClient) ListNodes(_ context.Context) ([]localaitools.Node, error) {
    return []localaitools.Node{}, nil
}
func (stubClient) VRAMEstimate(_ context.Context, _ localaitools.VRAMEstimateRequest) (*vram.EstimateResult, error) {
    return &vram.EstimateResult{SizeDisplay: "stub"}, nil
}
func (stubClient) ToggleModelState(_ context.Context, _ string, _ modeladmin.Action) error  { return nil }
func (stubClient) ToggleModelPinned(_ context.Context, _ string, _ modeladmin.Action) error { return nil }

var _ = Describe("LocalAIAssistantHolder", func() {
    var ctx context.Context

    BeforeEach(func() {
        ctx = context.Background()
    })

    It("Initialize wires the in-memory server, exposes tools, and dispatches", func() {
        h := NewLocalAIAssistantHolder()
        Expect(h.Initialize(ctx, stubClient{}, localaitools.Options{})).To(Succeed())
        Expect(h.HasTools()).To(BeTrue())
        Expect(h.SystemPrompt()).ToNot(BeEmpty())

        exec := h.Executor()
        Expect(exec.HasTools()).To(BeTrue())

        out, err := exec.ExecuteTool(ctx, "list_installed_models", `{"capability":"chat"}`)
        Expect(err).ToNot(HaveOccurred())
        Expect(out).ToNot(BeEmpty())
    })

    It("Initialize is exactly-once even under concurrent callers", func() {
        h := NewLocalAIAssistantHolder()

        // Concurrent Initialize calls — only one should actually wire the server.
        var wg sync.WaitGroup
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                _ = h.Initialize(ctx, stubClient{}, localaitools.Options{})
            }()
        }
        wg.Wait()

        Expect(h.HasTools()).To(BeTrue())
    })

    It("methods are nil-safe on a nil holder", func() {
        var h *LocalAIAssistantHolder
        Expect(h.HasTools()).To(BeFalse())
        Expect(h.SystemPrompt()).To(BeEmpty())
        exec := h.Executor()
        // Nil-receiver Executor returns an empty LocalToolExecutor.
        Expect(exec).ToNot(BeNil())
        Expect(exec.HasTools()).To(BeFalse())
    })
})
core/http/endpoints/mcp/mcp_suite_test.go (new file, 13 lines)
@@ -0,0 +1,13 @@
package mcp

import (
    "testing"

    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"
)

func TestMCP(t *testing.T) {
    RegisterFailHandler(Fail)
    RunSpecs(t, "core/http/endpoints/mcp test suite")
}
@@ -98,6 +98,28 @@ type MCPNATSClient interface {
    Request(subject string, data []byte, timeout time.Duration) ([]byte, error)
}

// MetadataKeyLocalAIAssistant is the request-metadata key the chat handler
// inspects to decide whether to wire the in-process admin MCP server. UI
// callers MUST use this constant rather than the raw string.
const MetadataKeyLocalAIAssistant = "localai_assistant"

// LocalAIAssistantFromMetadata reports whether the request opted into the
// "LocalAI Assistant" chat modality (admin in-process MCP tool surface).
// The MetadataKeyLocalAIAssistant key is consumed so it doesn't leak to
// the backend. Truthy values: "1", "true", "yes" (case-insensitive).
func LocalAIAssistantFromMetadata(metadata map[string]string) bool {
    raw, ok := metadata[MetadataKeyLocalAIAssistant]
    if !ok {
        return false
    }
    delete(metadata, MetadataKeyLocalAIAssistant)
    switch strings.ToLower(strings.TrimSpace(raw)) {
    case "1", "true", "yes":
        return true
    }
    return false
}
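
A caller-side sketch of the consume-on-read semantics, grounded in the switch above:

    metadata := map[string]string{MetadataKeyLocalAIAssistant: " TRUE "}
    on := LocalAIAssistantFromMetadata(metadata) // true: trimmed and lowercased first
    _, leaked := metadata[MetadataKeyLocalAIAssistant]
    // on == true, leaked == false: the key is deleted so it never
    // reaches the backend.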

// MCPServersFromMetadata extracts the MCP server list from the metadata map
// and returns the list. The "mcp_servers" key is consumed (deleted from the map)
// so it doesn't leak to the backend.

@@ -3,12 +3,14 @@ package openai
import (
    "encoding/json"
    "fmt"
    "net/http"
    "time"

    "github.com/google/uuid"
    "github.com/labstack/echo/v4"
    "github.com/mudler/LocalAI/core/backend"
    "github.com/mudler/LocalAI/core/config"
    "github.com/mudler/LocalAI/core/http/auth"
    mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
    "github.com/mudler/LocalAI/core/http/middleware"
    "github.com/mudler/LocalAI/core/schema"
@@ -22,6 +24,18 @@ import (
    "github.com/mudler/xlog"
)

// hasSystemMessage reports whether the message slice already contains a
// system-role message — used to avoid clobbering a caller-supplied system
// prompt when the LocalAI Assistant modality is on.
func hasSystemMessage(messages []schema.Message) bool {
    for _, m := range messages {
        if m.Role == "system" {
            return true
        }
    }
    return false
}

// mergeToolCallDeltas merges streaming tool call deltas into complete tool calls.
// In SSE streaming, a single tool call arrives as multiple chunks sharing the same Index:
// the first chunk carries the ID, Type, and Name; subsequent chunks append to Arguments.
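
The function body falls outside this hunk. A self-contained sketch of the merge-by-Index shape the comment describes; the field names here are stand-ins, not the real schema.ToolCall:

    type toolCallSketch struct {
        Index                     int
        ID, Type, Name, Arguments string
    }

    func mergeByIndex(existing, deltas []toolCallSketch) []toolCallSketch {
        byIndex := map[int]int{}
        for i, tc := range existing {
            byIndex[tc.Index] = i
        }
        for _, d := range deltas {
            if i, ok := byIndex[d.Index]; ok {
                existing[i].Arguments += d.Arguments // later chunks extend args
                continue
            }
            byIndex[d.Index] = len(existing)
            existing = append(existing, d) // first chunk carries ID/Type/Name
        }
        return existing
    }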
@@ -59,7 +73,7 @@ func mergeToolCallDeltas(existing []schema.ToolCall, deltas []schema.ToolCall) [
// @Param request body schema.OpenAIRequest true "query params"
// @Success 200 {object} schema.OpenAIResponse "Response"
// @Router /v1/chat/completions [post]
func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, startupOptions *config.ApplicationConfig, natsClient mcpTools.MCPNATSClient) echo.HandlerFunc {
func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, startupOptions *config.ApplicationConfig, natsClient mcpTools.MCPNATSClient, assistantHolder *mcpTools.LocalAIAssistantHolder) echo.HandlerFunc {
    process := func(s string, req *schema.OpenAIRequest, config *config.ModelConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool, id string, created int) error {
        initialMessage := schema.OpenAIResponse{
            ID: id,
@@ -443,6 +457,54 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
        var mcpExecutor mcpTools.ToolExecutor
        mcpServers := mcpTools.MCPServersFromMetadata(input.Metadata)

        // LocalAI Assistant modality: an admin opted into the in-process MCP
        // admin tool surface. Runs *before* the regular MCP block — when both
        // are set, the assistant tools win (the admin cannot mix them with
        // per-model MCP servers in the same chat session by design).
        assistantMode := mcpTools.LocalAIAssistantFromMetadata(input.Metadata)
        if assistantMode {
            // Defense-in-depth admin gate: the chat route is feature-gated
            // (FeatureChat), but the assistant tools mutate server state, so
            // re-check role here even when the deployment chose to skip
            // FeatureLocalAIAssistant on the route.
            if startupOptions.Auth.Enabled {
                user := auth.GetUser(c)
                if user == nil || user.Role != auth.RoleAdmin {
                    return echo.NewHTTPError(http.StatusForbidden, "localai_assistant requires admin")
                }
            }
            // Read the disable flag live: an admin can flip it via /api/settings
            // and the next request must see the change without a restart.
            if startupOptions.DisableLocalAIAssistant {
                return echo.NewHTTPError(http.StatusServiceUnavailable, "LocalAI Assistant is disabled on this server")
            }
            if assistantHolder == nil || !assistantHolder.HasTools() {
                return echo.NewHTTPError(http.StatusServiceUnavailable, "LocalAI Assistant is not available on this server")
            }
            mcpExecutor = assistantHolder.Executor()
            mcpFuncs, discErr := mcpExecutor.DiscoverTools(c.Request().Context())
            if discErr != nil {
                xlog.Error("Failed to discover LocalAI Assistant tools", "error", discErr)
                return echo.NewHTTPError(http.StatusInternalServerError, "discover assistant tools: "+discErr.Error())
            }
            for _, fn := range mcpFuncs {
                funcs = append(funcs, fn)
                input.Tools = append(input.Tools, functions.Tool{Type: "function", Function: fn})
            }
            shouldUseFn = len(funcs) > 0 && config.ShouldUseFunctions()

            // Prepend the embedded system prompt unless the caller supplied
            // their own system message. Why: the prompt is what teaches the
            // model the safety rules and recipes. If a caller already has a
            // system message they're responsible for keeping the assistant
            // safe, so we leave it alone.
            if !hasSystemMessage(input.Messages) {
                input.Messages = append([]schema.Message{{Role: "system", StringContent: assistantHolder.SystemPrompt()}}, input.Messages...)
            }

            xlog.Debug("LocalAI Assistant tools injected", "count", len(mcpFuncs))
        }

        // MCP prompt and resource injection (extracted before tool injection)
        mcpPromptName, mcpPromptArgs := mcpTools.MCPPromptFromMetadata(input.Metadata)
        mcpResourceURIs := mcpTools.MCPResourcesFromMetadata(input.Metadata)

@@ -4869,6 +4869,81 @@ select.input {
  display: inline-block;
}

/* Home assistant CTA — a self-explanatory entry point for the in-process
   admin tool surface. Distinct from the chat composer below it; uses the
   accent token + a subtle gradient so it reads as a primary action without
   looking AI-slop generative. */
.home-assistant-card {
  display: flex;
  align-items: center;
  gap: var(--spacing-md);
  width: 100%;
  margin-bottom: var(--spacing-md);
  padding: var(--spacing-md) var(--spacing-lg);
  background: var(--color-surface-raised);
  border: 1px solid var(--color-accent);
  border-radius: var(--radius-xl);
  cursor: pointer;
  text-align: left;
  font: inherit;
  color: var(--color-text);
  transition: background-color 120ms ease, transform 120ms ease, box-shadow 120ms ease;
}
.home-assistant-card:hover {
  background: var(--color-accent-light, var(--color-surface-hover));
  transform: translateY(-1px);
  box-shadow: var(--shadow-md);
}
.home-assistant-card:active {
  transform: translateY(0);
}
.home-assistant-icon {
  flex: 0 0 auto;
  width: 40px;
  height: 40px;
  display: flex;
  align-items: center;
  justify-content: center;
  border-radius: 50%;
  background: var(--color-accent);
  color: var(--color-on-accent, #ffffff);
  font-size: 1.1rem;
}
.home-assistant-text {
  flex: 1 1 auto;
  display: flex;
  flex-direction: column;
  gap: 2px;
  min-width: 0;
}
.home-assistant-title {
  font-weight: 600;
  font-size: 1rem;
}
.home-assistant-desc {
  font-size: 0.8125rem;
  color: var(--color-text-muted);
}
.home-assistant-cta {
  flex: 0 0 auto;
  display: inline-flex;
  align-items: center;
  gap: 6px;
  font-size: 0.8125rem;
  font-weight: 500;
  color: var(--color-accent);
  white-space: nowrap;
}
@media (max-width: 600px) {
  .home-assistant-card {
    flex-wrap: wrap;
  }
  .home-assistant-cta {
    flex-basis: 100%;
    justify-content: flex-end;
  }
}

/* Home chat card */
.home-chat-card {
  width: 100%;
core/http/react-ui/src/hooks/useChat.js (vendored, 11 changed lines)
@@ -92,6 +92,9 @@ function createNewChat(model = '', systemPrompt = '', mcpMode = false) {
  mcpServers: [],
  mcpResources: [],
  clientMCPServers: [],
  // localaiAssistant wires the chat to the in-process admin MCP server
  // exposed by /v1/chat/completions when an admin opts in.
  localaiAssistant: false,
  temperature: null,
  topP: null,
  topK: null,
@@ -272,6 +275,14 @@ export function useChat(initialModel = '') {
    requestBody.metadata.mcp_resources = activeChat.mcpResources.join(',')
  }

  // LocalAI Assistant: opt this chat session into the in-process admin
  // MCP server. The backend gates on admin role; the toggle is hidden
  // for non-admins, but defense-in-depth still applies on the server.
  if (activeChat.localaiAssistant) {
    if (!requestBody.metadata) requestBody.metadata = {}
    requestBody.metadata.localai_assistant = 'true'
  }

  // Client-side MCP: inject tools into request body
  if (options.clientMCPTools && options.clientMCPTools.length > 0) {
    requestBody.tools = [...(requestBody.tools || []), ...options.clientMCPTools]
@@ -532,8 +532,16 @@ export default function Chat() {
  try {
    const data = JSON.parse(stored)
    localStorage.removeItem('localai_index_chat_data')
    if (data.message) {
      // Create a new chat when coming from home

    // Two entry shapes from Home:
    //  - "compose-and-send": data.message present → open new chat,
    //    prefill the composer, click submit.
    //  - "open-assistant": no message, just data.localaiAssistant → open
    //    a fresh chat already in admin mode so the wizard can fire.
    const hasMessage = !!data.message
    const wantsAssistant = !!data.localaiAssistant

    if (hasMessage || wantsAssistant) {
      let targetChat = activeChat
      if (data.newChat) {
        targetChat = addChat(data.model || '', '', data.mcpMode || false)
@@ -551,12 +559,17 @@ export default function Chat() {
      if (data.clientMCPServers?.length > 0 && targetChat) {
        updateChatSettings(targetChat.id, { clientMCPServers: data.clientMCPServers })
      }
      setInput(data.message)
      if (data.files) setFiles(data.files)
      setTimeout(() => {
        const submitBtn = document.getElementById('chat-submit-btn')
        submitBtn?.click()
      }, 100)
      if (wantsAssistant && targetChat) {
        updateChatSettings(targetChat.id, { localaiAssistant: true })
      }
      if (hasMessage) {
        setInput(data.message)
        if (data.files) setFiles(data.files)
        setTimeout(() => {
          const submitBtn = document.getElementById('chat-submit-btn')
          submitBtn?.click()
        }, 100)
      }
    }
  } catch (_e) { /* ignore */ }
}
@@ -887,6 +900,11 @@ export default function Chat() {
  <i className={`fas fa-${sidebarOpen ? 'angles-left' : 'angles-right'}`} />
</button>
<span className="chat-header-title">{activeChat.name}</span>
{activeChat.localaiAssistant && (
  <span className="badge badge-accent" title="This chat can install models, edit configs and manage backends by talking to LocalAI.">
    <i className="fas fa-user-shield" /> Manage mode
  </span>
)}
<UnifiedMCPDropdown
  serverMCPAvailable={mcpAvailable}
  mcpServerList={mcpServerList}
@@ -962,6 +980,23 @@ export default function Chat() {
    <span className="toggle-slider" />
  </span>
</label>
{isAdmin && (
  <label
    className="canvas-mode-toggle"
    title="Manage LocalAI by chatting — install models, switch backends, and edit configs through the chat. Admin only."
  >
    <i className="fas fa-user-shield" />
    <span className="canvas-mode-label">Manage</span>
    <span className="toggle">
      <input
        type="checkbox"
        checked={!!activeChat.localaiAssistant}
        onChange={(e) => updateChatSettings(activeChat.id, { localaiAssistant: e.target.checked })}
      />
      <span className="toggle-slider" />
    </span>
  </label>
)}
{canvasMode && artifacts.length > 0 && !canvasOpen && (
  <button
    className="btn btn-secondary btn-sm"
@@ -1108,10 +1143,17 @@ export default function Chat() {
<div className="chat-empty-icon">
  <i className="fas fa-comments" />
</div>
<h2 className="chat-empty-title">Start a conversation</h2>
<p className="chat-empty-text">{activeChat.model ? `Ready to chat with ${activeChat.model}` : 'Select a model above to get started'}</p>
<h2 className="chat-empty-title">{activeChat.localaiAssistant ? 'Manage LocalAI by chatting' : 'Start a conversation'}</h2>
<p className="chat-empty-text">
  {activeChat.localaiAssistant
    ? 'Ask to install models, switch backends, edit configs, or check status. The assistant will summarise actions and wait for your confirmation before changing anything.'
    : (activeChat.model ? `Ready to chat with ${activeChat.model}` : 'Select a model above to get started')}
</p>
<div className="chat-empty-suggestions">
  {['Explain how this works', 'Help me write code', 'Summarize a document', 'Brainstorm ideas'].map((prompt) => (
  {(activeChat.localaiAssistant
    ? ['What is installed?', 'Install a chat model', 'Show system status', 'Update a backend']
    : ['Explain how this works', 'Help me write code', 'Summarize a document', 'Brainstorm ideas']
  ).map((prompt) => (
    <button
      key={prompt}
      className="chat-empty-suggestion"
@@ -37,6 +37,14 @@ export default function Home() {
  const [mcpServerCache, setMcpServerCache] = useState({})
  const [mcpSelectedServers, setMcpSelectedServers] = useState([])
  const [clientMCPSelectedIds, setClientMCPSelectedIds] = useState([])
  const [assistantAvailable, setAssistantAvailable] = useState(false)
  // Progressive disclosure: the big "Manage by chatting" CTA card is a
  // first-run affordance. Once the admin has clicked it, we collapse to
  // a small entry in the quick-links row so the home page doesn't keep
  // shouting at them about a feature they already know.
  const [assistantUsed, setAssistantUsed] = useState(() => {
    try { return localStorage.getItem('localai_assistant_used') === '1' } catch { return false }
  })
  const [confirmDialog, setConfirmDialog] = useState(null)
  const [distributedMode, setDistributedMode] = useState(false)
  const [clusterData, setClusterData] = useState(null)
@@ -44,11 +52,14 @@ export default function Home() {
  const audioInputRef = useRef(null)
  const fileInputRef = useRef(null)

  // Detect distributed mode
  // Detect distributed mode + assistant feature availability in one fetch.
  useEffect(() => {
    fetch(apiUrl('/api/features'))
      .then(r => r.json())
      .then(data => setDistributedMode(!!data.distributed))
      .then(data => {
        setDistributedMode(!!data.distributed)
        setAssistantAvailable(!!data.localai_assistant)
      })
      .catch(() => {})
  }, [])

@@ -204,6 +215,22 @@ export default function Home() {
    navigate(`/app/chat/${encodeURIComponent(selectedModel)}`)
  }, [message, allFiles, selectedModel, mcpMode, mcpSelectedServers, clientMCPSelectedIds, addToast, navigate])

  // Quick-launch: open a fresh chat already in assistant mode without
  // requiring an initial message or model selection. Useful when an admin
  // wants to start the assistant from a cold home page.
  const openAssistantChat = useCallback(() => {
    const chatData = {
      model: selectedModel || '',
      mcpMode: false,
      localaiAssistant: true,
      newChat: true,
    }
    localStorage.setItem('localai_index_chat_data', JSON.stringify(chatData))
    try { localStorage.setItem('localai_assistant_used', '1') } catch { /* ignore */ }
    setAssistantUsed(true)
    navigate('/app/chat')
  }, [navigate, selectedModel])

  const handleSubmit = (e) => {
    if (e) e.preventDefault()
    doSubmit()
@@ -308,6 +335,28 @@ export default function Home() {
  </div>
) : null}

{/* LocalAI Assistant — prominent CTA on first run. Once the
    admin has used it, the big card collapses to a small entry in
    the quick-links row below. */}
{isAdmin && assistantAvailable && !assistantUsed && (
  <button
    type="button"
    onClick={openAssistantChat}
    className="home-assistant-card"
  >
    <span className="home-assistant-icon"><i className="fas fa-user-shield" /></span>
    <span className="home-assistant-text">
      <span className="home-assistant-title">Manage LocalAI by chatting</span>
      <span className="home-assistant-desc">
        Install models, switch backends, edit configs and check status by talking to LocalAI.
      </span>
    </span>
    <span className="home-assistant-cta">
      Open assistant <i className="fas fa-arrow-right" />
    </span>
  </button>
)}

{/* Chat input form */}
<div className="home-chat-card">
  <form onSubmit={handleSubmit}>
@@ -398,6 +447,15 @@ export default function Home() {
<div className="home-quick-links">
  {isAdmin && (
    <>
      {assistantAvailable && assistantUsed && (
        <button
          className="home-link-btn"
          onClick={openAssistantChat}
          title="Manage LocalAI by chatting"
        >
          <i className="fas fa-user-shield" /> Manage by chat
        </button>
      )}
      <button className="home-link-btn" onClick={() => navigate('/app/manage')}>
        <i className="fas fa-desktop" /> Installed Models
      </button>
@@ -20,6 +20,7 @@ const SECTIONS = [
  { id: 'apikeys', icon: 'fa-key', color: 'var(--color-error)', label: 'API Keys' },
  { id: 'agents', icon: 'fa-tasks', color: 'var(--color-primary)', label: 'Agent Jobs' },
  { id: 'agentpool', icon: 'fa-robot', color: 'var(--color-primary)', label: 'Agent Pool' },
  { id: 'assistant', icon: 'fa-user-shield', color: 'var(--color-accent)', label: 'LocalAI Assistant' },
  { id: 'responses', icon: 'fa-database', color: 'var(--color-accent)', label: 'Responses' },
]

@@ -460,6 +461,18 @@ export default function Settings() {
  </div>
</div>

{/* LocalAI Assistant */}
<div ref={el => sectionRefs.current.assistant = el} style={{ marginBottom: 'var(--spacing-xl)' }}>
  <h3 style={{ fontSize: '1rem', fontWeight: 700, display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)', marginBottom: 'var(--spacing-md)' }}>
    <i className="fas fa-user-shield" style={{ color: 'var(--color-accent)' }} /> LocalAI Assistant
  </h3>
  <div className="card">
    <SettingRow label="Enabled" description="Allow admins to opt chat sessions into the in-process admin tool surface. Disabling refuses new requests with the localai_assistant flag; takes effect without restart.">
      <Toggle checked={settings.localai_assistant_enabled ?? true} onChange={(v) => update('localai_assistant_enabled', v)} />
    </SettingRow>
  </div>
</div>

{/* Open Responses */}
<div ref={el => sectionRefs.current.responses = el} style={{ marginBottom: 'var(--spacing-xl)' }}>
  <h3 style={{ fontSize: '1rem', fontWeight: 700, display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)', marginBottom: 'var(--spacing-md)' }}>
core/http/react-ui/src/utils/capabilities.js (vendored, 4 changed lines)
@@ -1,4 +1,8 @@
// Capability flags — must match the FLAG_* strings from core/config/model_config.go
// Server feature keys (returned by GET /api/features), kept as constants so
// callers don't sprinkle string literals across the UI.
export const FEATURE_LOCALAI_ASSISTANT = 'localai_assistant'

export const CAP_CHAT = 'FLAG_CHAT'
export const CAP_COMPLETION = 'FLAG_COMPLETION'
export const CAP_EDIT = 'FLAG_EDIT'
@@ -329,11 +329,12 @@ func RegisterLocalAIRoutes(router *echo.Echo,

	router.GET("/api/features", func(c echo.Context) error {
		return c.JSON(200, map[string]bool{
			"agents":            appConfig.AgentPool.Enabled,
			"mcp":               !appConfig.DisableMCP,
			"fine_tuning":       true,
			"quantization":      true,
			"distributed":       appConfig.Distributed.Enabled,
			"localai_assistant": !appConfig.DisableLocalAIAssistant && app.LocalAIAssistant() != nil,
		})
	})

@@ -32,7 +32,7 @@ func RegisterOpenAIRoutes(app *echo.Echo,
	}

	// chat
	chatHandler := openai.ChatEndpoint(application.ModelConfigLoader(), application.ModelLoader(), application.TemplatesEvaluator(), application.ApplicationConfig(), natsClient, application.LocalAIAssistant())
	chatMiddleware := []echo.MiddlewareFunc{
		usageMiddleware,
		traceMiddleware,

26  core/services/modeladmin/action.go  Normal file
@@ -0,0 +1,26 @@
package modeladmin

// Action is the verb passed to ToggleState / TogglePinned. The typed alias
// catches typos at compile time (a stray "enabled" or "Pin" never reaches
// the runtime check) and lets callers reference the canonical strings via
// the constants below rather than re-typing them.
type Action string

const (
	ActionEnable  Action = "enable"
	ActionDisable Action = "disable"
	ActionPin     Action = "pin"
	ActionUnpin   Action = "unpin"
)

// Valid reports whether a is one of the allowed actions for a given
// operation. ToggleState passes ActionEnable/ActionDisable; TogglePinned
// passes ActionPin/ActionUnpin.
func (a Action) Valid(allowed ...Action) bool {
	for _, x := range allowed {
		if a == x {
			return true
		}
	}
	return false
}

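Since action.go is consumed by both the HTTP shells and the MCP tool layer, here is a tiny runnable sketch (hypothetical values, real package path) of what the typed verb buys at a call site:

```go
package main

import (
	"fmt"

	"github.com/mudler/LocalAI/core/services/modeladmin"
)

func main() {
	// A stray verb like "enabled" is still representable (Action is a string
	// alias), but it never survives the Valid gate: ToggleState rejects it
	// with ErrBadAction before any YAML is touched.
	for _, a := range []modeladmin.Action{modeladmin.ActionEnable, modeladmin.Action("enabled")} {
		fmt.Printf("%q valid=%v\n", a, a.Valid(modeladmin.ActionEnable, modeladmin.ActionDisable))
	}
	// "enable" valid=true
	// "enabled" valid=false
}
```
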
46  core/services/modeladmin/atomic.go  Normal file
@@ -0,0 +1,46 @@
package modeladmin

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to path via a sibling temp file followed by
// an os.Rename. If the process is killed mid-write, the original file is
// preserved intact instead of being truncated/partial — which os.WriteFile
// + O_TRUNC|O_WRONLY would leave behind.
//
// The temp file lives in the same directory so the rename is atomic on the
// same filesystem. The leading "." keeps it out of `ls` output. Cleanup
// runs on every error path so stray temps don't accumulate when the
// destination directory is read-only or out of inodes.
func writeFileAtomic(path string, data []byte, mode os.FileMode) error {
	dir := filepath.Dir(path)
	f, err := os.CreateTemp(dir, "."+filepath.Base(path)+".tmp-*")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	tmp := f.Name()
	cleanup := func() { _ = os.Remove(tmp) }

	if _, err := f.Write(data); err != nil {
		_ = f.Close()
		cleanup()
		return fmt.Errorf("write temp file: %w", err)
	}
	if err := f.Chmod(mode); err != nil {
		_ = f.Close()
		cleanup()
		return fmt.Errorf("chmod temp file: %w", err)
	}
	if err := f.Close(); err != nil {
		cleanup()
		return fmt.Errorf("close temp file: %w", err)
	}
	if err := os.Rename(tmp, path); err != nil {
		cleanup()
		return fmt.Errorf("rename temp file: %w", err)
	}
	return nil
}

48  core/services/modeladmin/atomic_test.go  Normal file
@@ -0,0 +1,48 @@
package modeladmin

import (
	"os"
	"path/filepath"
	"runtime"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("writeFileAtomic", func() {
	It("writes the file with the requested content and leaves no temp leftovers", func() {
		dir := GinkgoT().TempDir()
		path := filepath.Join(dir, "model.yaml")
		Expect(writeFileAtomic(path, []byte("name: x\n"), 0644)).To(Succeed())

		got, err := os.ReadFile(path)
		Expect(err).ToNot(HaveOccurred())
		Expect(string(got)).To(Equal("name: x\n"))

		entries, err := os.ReadDir(dir)
		Expect(err).ToNot(HaveOccurred())
		Expect(entries).To(HaveLen(1), "directory should contain only the destination file")
	})

	It("preserves the original file when the atomic write fails", func() {
		if runtime.GOOS == "windows" {
			Skip("chmod-based read-only directory trick is POSIX-specific")
		}
		dir := GinkgoT().TempDir()
		path := filepath.Join(dir, "model.yaml")
		Expect(os.WriteFile(path, []byte("original\n"), 0644)).To(Succeed())

		// Make the directory read-only so os.CreateTemp fails — easiest way to
		// force a write error mid-helper without invasive mocking.
		Expect(os.Chmod(dir, 0o500)).To(Succeed())
		DeferCleanup(func() { _ = os.Chmod(dir, 0o700) })

		Expect(writeFileAtomic(path, []byte("new\n"), 0644)).ToNot(Succeed())

		// Restore for the read-back below.
		Expect(os.Chmod(dir, 0o700)).To(Succeed())
		got, err := os.ReadFile(path)
		Expect(err).ToNot(HaveOccurred())
		Expect(string(got)).To(Equal("original\n"), "original file must not be clobbered")
	})
})

236  core/services/modeladmin/config.go  Normal file
@@ -0,0 +1,236 @@
package modeladmin

import (
	"context"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"dario.cat/mergo"
	"gopkg.in/yaml.v3"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/pkg/model"
	"github.com/mudler/LocalAI/pkg/utils"
)

// ConfigService groups operations that read or mutate an installed model's
// configuration on disk. It keeps the side-effect surface (loader reload,
// model shutdown) explicit so callers know what gets touched.
type ConfigService struct {
	Loader    *config.ModelConfigLoader
	AppConfig *config.ApplicationConfig
}

// NewConfigService returns a ConfigService bound to the supplied loader and
// app config. The loader and the system state in AppConfig are mandatory; the
// model loader is required only by EditYAML and ToggleState (for Shutdown).
func NewConfigService(loader *config.ModelConfigLoader, appConfig *config.ApplicationConfig) *ConfigService {
	return &ConfigService{Loader: loader, AppConfig: appConfig}
}

// ConfigView is the on-disk YAML plus the parsed JSON view, returned by GetConfig.
// The YAML is read from disk (not serialised from the in-memory loader) so
// callers see exactly what the user wrote — no SetDefaults() noise.
type ConfigView struct {
	Name string
	YAML string
	JSON map[string]any
}

// EditResult is what EditYAML returns to its caller.
type EditResult struct {
	Filename string
	Renamed  bool
	OldName  string
	NewName  string
	Config   config.ModelConfig
}

// modelsPath is shorthand for the configured models directory.
func (s *ConfigService) modelsPath() string {
	return s.AppConfig.SystemState.Model.ModelsPath
}

// GetConfig reads the YAML for an installed model from disk and returns it
// alongside the parsed JSON view.
func (s *ConfigService) GetConfig(_ context.Context, name string) (*ConfigView, error) {
	if name == "" {
		return nil, ErrNameRequired
	}
	cfg, exists := s.Loader.GetModelConfig(name)
	if !exists {
		return nil, ErrNotFound
	}
	configPath := cfg.GetModelConfigFile()
	if configPath == "" {
		return nil, ErrConfigFileMissing
	}
	if err := utils.VerifyPath(configPath, s.modelsPath()); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrPathNotTrusted, err)
	}
	data, err := os.ReadFile(configPath)
	if err != nil {
		return nil, fmt.Errorf("read config file: %w", err)
	}
	var jsonView map[string]any
	_ = yaml.Unmarshal(data, &jsonView)
	return &ConfigView{Name: name, YAML: string(data), JSON: jsonView}, nil
}

// PatchConfig applies a JSON deep-merge to an installed model's YAML and
// reloads. Returns the merged config that's now in the loader.
//
// Mirrors PatchConfigEndpoint: read raw YAML from disk (not the in-memory
// config — which has SetDefaults applied and would persist runtime defaults
// like top_p/temperature/mirostat), deep-merge the patch, validate, write,
// reload, preload (preload errors are non-fatal — log only).
func (s *ConfigService) PatchConfig(_ context.Context, name string, patch map[string]any) (*config.ModelConfig, error) {
	if name == "" {
		return nil, ErrNameRequired
	}
	if len(patch) == 0 {
		return nil, ErrEmptyBody
	}
	cfg, exists := s.Loader.GetModelConfig(name)
	if !exists {
		return nil, ErrNotFound
	}
	configPath := cfg.GetModelConfigFile()
	if err := utils.VerifyPath(configPath, s.modelsPath()); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrPathNotTrusted, err)
	}
	diskYAML, err := os.ReadFile(configPath)
	if err != nil {
		return nil, fmt.Errorf("read config file: %w", err)
	}
	var existingMap map[string]any
	if err := yaml.Unmarshal(diskYAML, &existingMap); err != nil {
		return nil, fmt.Errorf("parse existing config: %w", err)
	}
	if existingMap == nil {
		existingMap = map[string]any{}
	}
	if err := mergo.Merge(&existingMap, patch, mergo.WithOverride); err != nil {
		return nil, fmt.Errorf("merge configs: %w", err)
	}
	yamlData, err := yaml.Marshal(existingMap)
	if err != nil {
		return nil, fmt.Errorf("marshal merged YAML: %w", err)
	}
	var updated config.ModelConfig
	if err := yaml.Unmarshal(yamlData, &updated); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrInvalidConfig, err)
	}
	if valid, vErr := updated.Validate(); !valid {
		if vErr != nil {
			return nil, fmt.Errorf("%w: %v", ErrInvalidConfig, vErr)
		}
		return nil, ErrInvalidConfig
	}
	if err := writeFileAtomic(configPath, yamlData, 0644); err != nil {
		return nil, fmt.Errorf("write config file: %w", err)
	}
	if err := s.Loader.LoadModelConfigsFromPath(s.modelsPath(), s.AppConfig.ToConfigLoaderOptions()...); err != nil {
		return nil, fmt.Errorf("reload configs: %w", err)
	}
	// Preload is best-effort — a failure here doesn't undo the patch.
	_ = s.Loader.Preload(s.modelsPath())
	return &updated, nil
}

// EditYAML replaces the YAML for an installed model, with optional rename
// support. ml may be nil; when set, EditYAML calls ml.ShutdownModel(oldName)
// after a successful write so the next inference picks up the new config.
func (s *ConfigService) EditYAML(_ context.Context, name string, body []byte, ml *model.ModelLoader) (*EditResult, error) {
	if name == "" {
		return nil, ErrNameRequired
	}
	if len(body) == 0 {
		return nil, ErrEmptyBody
	}
	existing, exists := s.Loader.GetModelConfig(name)
	if !exists {
		return nil, ErrNotFound
	}

	var req config.ModelConfig
	if err := yaml.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrInvalidConfig, err)
	}
	if req.Name == "" {
		return nil, fmt.Errorf("%w: name field is required", ErrInvalidConfig)
	}
	if valid, _ := req.Validate(); !valid {
		return nil, ErrInvalidConfig
	}

	configPath := existing.GetModelConfigFile()
	modelsPath := s.modelsPath()
	if err := utils.VerifyPath(configPath, modelsPath); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrPathNotTrusted, err)
	}

	renamed := req.Name != name
	if renamed {
		if strings.ContainsRune(req.Name, os.PathSeparator) || strings.Contains(req.Name, "/") || strings.Contains(req.Name, "\\") {
			return nil, ErrPathSeparator
		}
		if _, exists := s.Loader.GetModelConfig(req.Name); exists {
			return nil, fmt.Errorf("%w: %q", ErrConflict, req.Name)
		}
		newConfigPath := filepath.Join(modelsPath, req.Name+".yaml")
		if err := utils.VerifyPath(newConfigPath, modelsPath); err != nil {
			return nil, fmt.Errorf("%w: %v", ErrPathNotTrusted, err)
		}
		if _, err := os.Stat(newConfigPath); err == nil {
			return nil, fmt.Errorf("%w: a config file for %q already exists on disk", ErrConflict, req.Name)
		} else if !errors.Is(err, os.ErrNotExist) {
			return nil, fmt.Errorf("stat new config: %w", err)
		}
		if err := writeFileAtomic(newConfigPath, body, 0644); err != nil {
			return nil, fmt.Errorf("write new config: %w", err)
		}
		if configPath != newConfigPath {
			// Best-effort: a stale old file is cosmetic, not load-bearing.
			_ = os.Remove(configPath)
		}
		// Move the gallery metadata file so the delete flow can still find it.
		oldGalleryPath := filepath.Join(modelsPath, gallery.GalleryFileName(name))
		newGalleryPath := filepath.Join(modelsPath, gallery.GalleryFileName(req.Name))
		if _, err := os.Stat(oldGalleryPath); err == nil {
			_ = os.Rename(oldGalleryPath, newGalleryPath)
		}
		// Drop the stale in-memory entry before reload so we don't surface
		// both names between scan steps.
		s.Loader.RemoveModelConfig(name)
		configPath = newConfigPath
	} else {
		if err := writeFileAtomic(configPath, body, 0644); err != nil {
			return nil, fmt.Errorf("write config: %w", err)
		}
	}

	if err := s.Loader.LoadModelConfigsFromPath(modelsPath, s.AppConfig.ToConfigLoaderOptions()...); err != nil {
		return nil, fmt.Errorf("reload configs: %w", err)
	}
	// Best-effort shutdown: the config is already written; if shutdown fails
	// the caller can manually reload. The shutdown uses the OLD name because
	// that's what the running instance was started with.
	if ml != nil {
		_ = ml.ShutdownModel(name)
	}
	if err := s.Loader.Preload(modelsPath); err != nil {
		return nil, fmt.Errorf("preload after edit: %w", err)
	}
	return &EditResult{
		Filename: configPath,
		Renamed:  renamed,
		OldName:  name,
		NewName:  req.Name,
		Config:   req,
	}, nil
}

157  core/services/modeladmin/config_test.go  Normal file
@@ -0,0 +1,157 @@
package modeladmin

import (
	"context"
	"os"
	"path/filepath"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"gopkg.in/yaml.v3"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/pkg/system"
)

// newTestService stands up a ConfigService backed by a tmp dir so the file IO
// is real but isolated. The model loader is loaded against the same tmp path
// so GetModelConfig works.
func newTestService() (*ConfigService, string) {
	dir := GinkgoT().TempDir()
	loader := config.NewModelConfigLoader(dir)
	appConfig := &config.ApplicationConfig{
		SystemState: &system.SystemState{Model: system.Model{ModelsPath: dir}},
	}
	return NewConfigService(loader, appConfig), dir
}

// writeModelYAML creates a model YAML on disk and reloads the loader so the
// new entry is visible.
func writeModelYAML(svc *ConfigService, dir, name string, body map[string]any) {
	body["name"] = name
	data, err := yaml.Marshal(body)
	Expect(err).ToNot(HaveOccurred())
	path := filepath.Join(dir, name+".yaml")
	Expect(os.WriteFile(path, data, 0644)).To(Succeed())
	Expect(svc.Loader.LoadModelConfigsFromPath(dir, svc.AppConfig.ToConfigLoaderOptions()...)).To(Succeed())
}

var _ = Describe("ConfigService", func() {
	var (
		svc *ConfigService
		dir string
		ctx context.Context
	)

	BeforeEach(func() {
		svc, dir = newTestService()
		ctx = context.Background()
	})

	Describe("GetConfig", func() {
		It("round-trips YAML from disk and exposes the parsed JSON", func() {
			writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp", "context_size": 4096})

			view, err := svc.GetConfig(ctx, "qwen")
			Expect(err).ToNot(HaveOccurred())
			Expect(view.Name).To(Equal("qwen"))
			Expect(view.JSON).To(HaveKeyWithValue("backend", "llama-cpp"))
		})

		It("returns ErrNotFound for an unknown model", func() {
			_, err := svc.GetConfig(ctx, "missing")
			Expect(err).To(MatchError(ErrNotFound))
		})

		It("returns ErrNameRequired when name is empty", func() {
			_, err := svc.GetConfig(ctx, "")
			Expect(err).To(MatchError(ErrNameRequired))
		})
	})

	Describe("PatchConfig", func() {
		It("deep-merges the patch and preserves untouched siblings", func() {
			writeModelYAML(svc, dir, "qwen", map[string]any{
				"backend":      "llama-cpp",
				"context_size": 4096,
				"parameters":   map[string]any{"temperature": 0.7, "top_p": 0.9},
			})

			updated, err := svc.PatchConfig(ctx, "qwen", map[string]any{
				"context_size": 8192,
				"parameters":   map[string]any{"temperature": 0.5},
			})
			Expect(err).ToNot(HaveOccurred())
			Expect(updated.Name).To(Equal("qwen"))

			raw, err := os.ReadFile(filepath.Join(dir, "qwen.yaml"))
			Expect(err).ToNot(HaveOccurred())
			var got map[string]any
			Expect(yaml.Unmarshal(raw, &got)).To(Succeed())
			Expect(got).To(HaveKeyWithValue("context_size", 8192))

			params, ok := got["parameters"].(map[string]any)
			Expect(ok).To(BeTrue())
			Expect(params).To(HaveKeyWithValue("temperature", 0.5))
			// top_p must still be there: deep-merge should NOT clobber siblings.
			Expect(params).To(HaveKeyWithValue("top_p", 0.9))
		})

		It("returns ErrNotFound for an unknown model", func() {
			_, err := svc.PatchConfig(ctx, "ghost", map[string]any{"x": 1})
			Expect(err).To(MatchError(ErrNotFound))
		})

		It("rejects an empty patch with ErrEmptyBody", func() {
			writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp"})
			_, err := svc.PatchConfig(ctx, "qwen", map[string]any{})
			Expect(err).To(MatchError(ErrEmptyBody))
		})
	})

	Describe("EditYAML", func() {
		It("renames the on-disk file and reindexes the loader", func() {
			writeModelYAML(svc, dir, "old-name", map[string]any{"backend": "llama-cpp"})

			body := []byte("name: new-name\nbackend: llama-cpp\n")
			result, err := svc.EditYAML(ctx, "old-name", body, nil)
			Expect(err).ToNot(HaveOccurred())
			Expect(result.Renamed).To(BeTrue())
			Expect(result.OldName).To(Equal("old-name"))
			Expect(result.NewName).To(Equal("new-name"))

			_, err = os.Stat(filepath.Join(dir, "old-name.yaml"))
			Expect(os.IsNotExist(err)).To(BeTrue(), "old YAML should be removed")
			_, err = os.Stat(filepath.Join(dir, "new-name.yaml"))
			Expect(err).ToNot(HaveOccurred(), "new YAML should exist")

			_, ok := svc.Loader.GetModelConfig("new-name")
			Expect(ok).To(BeTrue(), "loader should have the renamed model")
			_, ok = svc.Loader.GetModelConfig("old-name")
			Expect(ok).To(BeFalse(), "loader should not retain the old name")
		})

		It("refuses a rename that would clobber an existing model", func() {
			writeModelYAML(svc, dir, "alpha", map[string]any{"backend": "llama-cpp"})
			writeModelYAML(svc, dir, "beta", map[string]any{"backend": "llama-cpp"})

			body := []byte("name: beta\nbackend: llama-cpp\n")
			_, err := svc.EditYAML(ctx, "alpha", body, nil)
			Expect(err).To(MatchError(ErrConflict))
		})

		It("rejects path-separator characters in the new name", func() {
			writeModelYAML(svc, dir, "alpha", map[string]any{"backend": "llama-cpp"})

			body := []byte("name: ../escape\nbackend: llama-cpp\n")
			_, err := svc.EditYAML(ctx, "alpha", body, nil)
			Expect(err).To(MatchError(ErrPathSeparator))
		})

		It("returns ErrEmptyBody when the body is nil", func() {
			writeModelYAML(svc, dir, "alpha", map[string]any{"backend": "llama-cpp"})
			_, err := svc.EditYAML(ctx, "alpha", nil, nil)
			Expect(err).To(MatchError(ErrEmptyBody))
		})
	})
})

17  core/services/modeladmin/doc.go  Normal file
@@ -0,0 +1,17 @@
// Package modeladmin owns the operations that mutate or read the
// configuration of an *already-installed* model on disk: full YAML edits
// (with rename), JSON deep-merge patches, enable/disable, pin/unpin, VRAM
// estimation, and read-back of the on-disk YAML.
//
// It exists so the same logic can be called from two places:
//
//   - HTTP handlers in core/http/endpoints/localai/* — the existing REST
//     surface (PUT/PATCH/POST under /models/...).
//   - In-process MCP clients (pkg/mcp/localaitools/inproc) — the LocalAI
//     Assistant chat modality calls these helpers directly so the
//     in-process tool surface and the REST surface stay in sync.
//
// Distinct from core/services/galleryop, which owns *sourcing* models
// (install from a gallery, delete, upgrade). modeladmin only manages
// configs and runtime flags of models that already exist locally.
package modeladmin

27  core/services/modeladmin/errors.go  Normal file
@@ -0,0 +1,27 @@
package modeladmin

import "errors"

// Sentinel errors callers can switch on. HTTP handlers map them to specific
// status codes; the inproc MCP client surfaces them verbatim to the LLM.
var (
	// ErrNameRequired is returned when an operation needs a model name and got nothing.
	ErrNameRequired = errors.New("model name is required")
	// ErrNotFound is returned when the model name doesn't exist in the loader.
	ErrNotFound = errors.New("model configuration not found")
	// ErrConfigFileMissing is returned when the loader knows the model but its
	// config file is unset (in-memory-only model — shouldn't happen on disk).
	ErrConfigFileMissing = errors.New("model configuration file not found")
	// ErrPathNotTrusted is returned when utils.VerifyPath rejects a config path.
	ErrPathNotTrusted = errors.New("model configuration path not trusted")
	// ErrConflict is returned when a rename would clobber an existing model.
	ErrConflict = errors.New("a model with that name already exists")
	// ErrBadAction is returned when toggle/state actions are not in the allowed set.
	ErrBadAction = errors.New("invalid action")
	// ErrInvalidConfig is returned when the new YAML/JSON fails validation.
	ErrInvalidConfig = errors.New("invalid model configuration")
	// ErrEmptyBody is returned when the request body is empty.
	ErrEmptyBody = errors.New("request body is empty")
	// ErrPathSeparator is returned when a renamed model name contains path separators.
	ErrPathSeparator = errors.New("model name must not contain path separators")
)

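Because the service wraps every sentinel with %w, errors.Is survives the wrapping. A minimal sketch of the typed-error → status-code mapping a thin HTTP shell might perform (the pairings shown here are illustrative assumptions; the canonical table lives in the endpoint handlers):

```go
package main

import (
	"errors"
	"fmt"
	"net/http"

	"github.com/mudler/LocalAI/core/services/modeladmin"
)

// httpStatusFor sketches a sentinel → status mapping. Not the canonical
// table: the real handlers own the exact pairings.
func httpStatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, modeladmin.ErrNotFound):
		return http.StatusNotFound
	case errors.Is(err, modeladmin.ErrConflict):
		return http.StatusConflict
	case errors.Is(err, modeladmin.ErrPathNotTrusted):
		return http.StatusForbidden
	default:
		// ErrBadAction, ErrNameRequired, ErrEmptyBody, ErrInvalidConfig,
		// ErrPathSeparator: all caller mistakes.
		return http.StatusBadRequest
	}
}

func main() {
	wrapped := fmt.Errorf("%w: %q", modeladmin.ErrConflict, "qwen")
	fmt.Println(httpStatusFor(wrapped)) // 409 — errors.Is unwraps the %w chain
}
```
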
13  core/services/modeladmin/modeladmin_suite_test.go  Normal file
@@ -0,0 +1,13 @@
package modeladmin

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestModelAdmin(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "modeladmin test suite")
}

45  core/services/modeladmin/pinned.go  Normal file
@@ -0,0 +1,45 @@
package modeladmin

import (
	"context"
	"fmt"

	"github.com/mudler/LocalAI/pkg/utils"
)

// SyncPinnedFn lets the caller (HTTP handler or inproc client) propagate a
// pin/unpin to the watchdog without coupling this package to it.
type SyncPinnedFn func()

// TogglePinned pins or unpins a model. action must be ActionPin or
// ActionUnpin. syncPinned, if non-nil, is invoked after a successful
// reload so the watchdog can refresh its eviction-exempt set.
func (s *ConfigService) TogglePinned(_ context.Context, name string, action Action, syncPinned SyncPinnedFn) (*ToggleResult, error) {
	if name == "" {
		return nil, ErrNameRequired
	}
	if !action.Valid(ActionPin, ActionUnpin) {
		return nil, fmt.Errorf("%w: must be %q or %q, got %q", ErrBadAction, ActionPin, ActionUnpin, action)
	}
	cfg, exists := s.Loader.GetModelConfig(name)
	if !exists {
		return nil, ErrNotFound
	}
	configPath := cfg.GetModelConfigFile()
	if configPath == "" {
		return nil, ErrConfigFileMissing
	}
	if err := utils.VerifyPath(configPath, s.modelsPath()); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrPathNotTrusted, err)
	}
	if err := mutateYAMLBoolFlag(configPath, "pinned", action == ActionPin); err != nil {
		return nil, err
	}
	if err := s.Loader.LoadModelConfigsFromPath(s.modelsPath(), s.AppConfig.ToConfigLoaderOptions()...); err != nil {
		return nil, fmt.Errorf("reload configs: %w", err)
	}
	if syncPinned != nil {
		syncPinned()
	}
	return &ToggleResult{Filename: configPath, Action: action}, nil
}

57  core/services/modeladmin/pinned_test.go  Normal file
@@ -0,0 +1,57 @@
package modeladmin

import (
	"context"
	"path/filepath"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("ConfigService.TogglePinned", func() {
	var (
		svc *ConfigService
		dir string
		ctx context.Context
	)

	BeforeEach(func() {
		svc, dir = newTestService()
		ctx = context.Background()
	})

	It("pins a model by writing pinned: true", func() {
		writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp"})

		_, err := svc.TogglePinned(ctx, "qwen", ActionPin, nil)
		Expect(err).ToNot(HaveOccurred())

		got := readMap(filepath.Join(dir, "qwen.yaml"))
		Expect(got).To(HaveKeyWithValue("pinned", true))
	})

	It("unpins by removing the pinned key entirely", func() {
		writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp", "pinned": true})

		_, err := svc.TogglePinned(ctx, "qwen", ActionUnpin, nil)
		Expect(err).ToNot(HaveOccurred())

		got := readMap(filepath.Join(dir, "qwen.yaml"))
		Expect(got).ToNot(HaveKey("pinned"))
	})

	It("rejects unknown actions with ErrBadAction", func() {
		writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp"})
		_, err := svc.TogglePinned(ctx, "qwen", Action("stick"), nil)
		Expect(err).To(MatchError(ErrBadAction))
	})

	It("invokes the syncPinned callback after a successful toggle", func() {
		writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp"})

		called := false
		_, err := svc.TogglePinned(ctx, "qwen", ActionPin, func() { called = true })
		Expect(err).ToNot(HaveOccurred())
		Expect(called).To(BeTrue(), "syncPinned callback should be invoked")
	})
})

85  core/services/modeladmin/state.go  Normal file
@@ -0,0 +1,85 @@
package modeladmin

import (
	"context"
	"fmt"
	"os"

	"gopkg.in/yaml.v3"

	"github.com/mudler/LocalAI/pkg/model"
	"github.com/mudler/LocalAI/pkg/utils"
)

// ToggleResult is shared by ToggleState and TogglePinned.
type ToggleResult struct {
	Filename string
	Action   Action
}

// ToggleState enables or disables an installed model. action must be
// ActionEnable or ActionDisable. When ml is non-nil and the action is
// ActionDisable, ToggleState calls ml.ShutdownModel — best-effort.
//
// The on-disk YAML is mutated as a generic map so unrelated fields are
// preserved verbatim; we only set or remove the `disabled` key.
func (s *ConfigService) ToggleState(_ context.Context, name string, action Action, ml *model.ModelLoader) (*ToggleResult, error) {
	if name == "" {
		return nil, ErrNameRequired
	}
	if !action.Valid(ActionEnable, ActionDisable) {
		return nil, fmt.Errorf("%w: must be %q or %q, got %q", ErrBadAction, ActionEnable, ActionDisable, action)
	}
	cfg, exists := s.Loader.GetModelConfig(name)
	if !exists {
		return nil, ErrNotFound
	}
	configPath := cfg.GetModelConfigFile()
	if configPath == "" {
		return nil, ErrConfigFileMissing
	}
	if err := utils.VerifyPath(configPath, s.modelsPath()); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrPathNotTrusted, err)
	}
	if err := mutateYAMLBoolFlag(configPath, "disabled", action == ActionDisable); err != nil {
		return nil, err
	}
	if err := s.Loader.LoadModelConfigsFromPath(s.modelsPath(), s.AppConfig.ToConfigLoaderOptions()...); err != nil {
		return nil, fmt.Errorf("reload configs: %w", err)
	}
	if action == ActionDisable && ml != nil {
		// Best-effort: the YAML is saved; shutdown is a courtesy.
		_ = ml.ShutdownModel(name)
	}
	return &ToggleResult{Filename: configPath, Action: action}, nil
}

// mutateYAMLBoolFlag is a small helper shared by ToggleState and
// TogglePinned: read the file as a generic map, set or remove a bool key,
// write back. Setting `set=false` removes the key for a clean YAML.
func mutateYAMLBoolFlag(path, key string, set bool) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read config: %w", err)
	}
	var m map[string]any
	if err := yaml.Unmarshal(data, &m); err != nil {
		return fmt.Errorf("parse config: %w", err)
	}
	if m == nil {
		m = map[string]any{}
	}
	if set {
		m[key] = true
	} else {
		delete(m, key)
	}
	out, err := yaml.Marshal(m)
	if err != nil {
		return fmt.Errorf("marshal config: %w", err)
	}
	if err := writeFileAtomic(path, out, 0644); err != nil {
		return fmt.Errorf("write config: %w", err)
	}
	return nil
}

65  core/services/modeladmin/state_test.go  Normal file
@@ -0,0 +1,65 @@
package modeladmin

import (
	"context"
	"os"
	"path/filepath"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"gopkg.in/yaml.v3"
)

// readMap reads the YAML file at path as a map[string]any. Used by both
// state and pinned specs to assert on the on-disk shape.
func readMap(path string) map[string]any {
	raw, err := os.ReadFile(path)
	Expect(err).ToNot(HaveOccurred())
	var m map[string]any
	Expect(yaml.Unmarshal(raw, &m)).To(Succeed())
	return m
}

var _ = Describe("ConfigService.ToggleState", func() {
	var (
		svc *ConfigService
		dir string
		ctx context.Context
	)

	BeforeEach(func() {
		svc, dir = newTestService()
		ctx = context.Background()
	})

	It("disables a model by writing disabled: true", func() {
		writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp"})

		_, err := svc.ToggleState(ctx, "qwen", ActionDisable, nil)
		Expect(err).ToNot(HaveOccurred())

		got := readMap(filepath.Join(dir, "qwen.yaml"))
		Expect(got).To(HaveKeyWithValue("disabled", true))
	})

	It("enables a model by removing the disabled key entirely", func() {
		writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp", "disabled": true})

		_, err := svc.ToggleState(ctx, "qwen", ActionEnable, nil)
		Expect(err).ToNot(HaveOccurred())

		got := readMap(filepath.Join(dir, "qwen.yaml"))
		Expect(got).ToNot(HaveKey("disabled"))
	})

	It("rejects unknown actions with ErrBadAction", func() {
		writeModelYAML(svc, dir, "qwen", map[string]any{"backend": "llama-cpp"})
		_, err := svc.ToggleState(ctx, "qwen", Action("noop"), nil)
		Expect(err).To(MatchError(ErrBadAction))
	})

	It("returns ErrNotFound for an unknown model", func() {
		_, err := svc.ToggleState(ctx, "ghost", ActionDisable, nil)
		Expect(err).To(MatchError(ErrNotFound))
	})
})

128  core/services/modeladmin/vram.go  Normal file
@@ -0,0 +1,128 @@
package modeladmin

import (
	"context"
	"fmt"
	"path/filepath"
	"strings"
	"time"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/pkg/system"
	"github.com/mudler/LocalAI/pkg/vram"
)

// VRAMRequest is the input for EstimateVRAM. JSON tags let the HTTP
// handler bind directly into this type instead of carrying a parallel
// private struct.
type VRAMRequest struct {
	Model       string `json:"model"`
	ContextSize uint32 `json:"context_size,omitempty"`
	GPULayers   int    `json:"gpu_layers,omitempty"`
	KVQuantBits int    `json:"kv_quant_bits,omitempty"`
}

// VRAMResponse embeds vram.EstimateResult and adds the context-defaulted
// note fields the HTTP endpoint surfaces.
type VRAMResponse struct {
	vram.EstimateResult
	ContextNote     string `json:"context_note,omitempty"`
	ModelMaxContext uint64 `json:"model_max_context,omitempty"`
}

// EstimateVRAM computes a VRAM estimate for an installed model. It mirrors
// VRAMEstimateEndpoint without any HTTP coupling.
func EstimateVRAM(ctx context.Context, req VRAMRequest, cl *config.ModelConfigLoader, sysState *system.SystemState) (*VRAMResponse, error) {
	if req.Model == "" {
		return nil, ErrNameRequired
	}
	cfg, exists := cl.GetModelConfig(req.Model)
	if !exists {
		return nil, ErrNotFound
	}
	modelsPath := sysState.Model.ModelsPath

	var files []vram.FileInput
	var firstGGUF string
	seen := make(map[string]bool)

	for _, f := range cfg.DownloadFiles {
		addWeightFile(string(f.URI), modelsPath, &files, &firstGGUF, seen)
	}
	if cfg.Model != "" {
		addWeightFile(cfg.Model, modelsPath, &files, &firstGGUF, seen)
	}
	if cfg.MMProj != "" {
		addWeightFile(cfg.MMProj, modelsPath, &files, &firstGGUF, seen)
	}

	if len(files) == 0 {
		// No weight files: the caller (HTTP or MCP) reports this as a
		// non-error empty estimate. Returning a note-only response here
		// lets both layers format the message consistently.
		return &VRAMResponse{ContextNote: "no weight files found for estimation"}, nil
	}

	contextDefaulted := false
	opts := vram.EstimateOptions{
		ContextLength: req.ContextSize,
		GPULayers:     req.GPULayers,
		KVQuantBits:   req.KVQuantBits,
	}
	if opts.ContextLength == 0 {
		if cfg.ContextSize != nil {
			opts.ContextLength = uint32(*cfg.ContextSize)
		} else {
			opts.ContextLength = 8192
			contextDefaulted = true
		}
	}

	subCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	result, err := vram.Estimate(subCtx, files, opts, vram.DefaultCachedSizeResolver(), vram.DefaultCachedGGUFReader())
	if err != nil {
		return nil, fmt.Errorf("vram estimate: %w", err)
	}

	resp := &VRAMResponse{EstimateResult: result}

	if contextDefaulted && firstGGUF != "" {
		ggufMeta, err := vram.DefaultCachedGGUFReader().ReadMetadata(subCtx, firstGGUF)
		if err == nil && ggufMeta != nil && ggufMeta.MaximumContextLength > 0 {
			resp.ModelMaxContext = ggufMeta.MaximumContextLength
			resp.ContextNote = fmt.Sprintf(
				"Estimate used default context_size=8192. The model's trained maximum context is %d; VRAM usage will be higher at larger context sizes.",
				ggufMeta.MaximumContextLength,
			)
		}
	}
	return resp, nil
}

// resolveModelURI converts a relative model path to a file:// URI so the
// size resolver can stat it on disk. URIs that already have a scheme are
// returned unchanged.
func resolveModelURI(uri, modelsPath string) string {
	if strings.Contains(uri, "://") {
		return uri
	}
	return "file://" + filepath.Join(modelsPath, uri)
}

// addWeightFile appends a resolved weight file to files and tracks the first GGUF.
func addWeightFile(uri, modelsPath string, files *[]vram.FileInput, firstGGUF *string, seen map[string]bool) {
	if !vram.IsWeightFile(uri) {
		return
	}
	resolved := resolveModelURI(uri, modelsPath)
	if seen[resolved] {
		return
	}
	seen[resolved] = true
	*files = append(*files, vram.FileInput{URI: resolved, Size: 0})
	if *firstGGUF == "" && vram.IsGGUF(uri) {
		*firstGGUF = resolved
	}
}

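For illustration, the path-resolution rule is easy to see in isolation. The helper below is a verbatim copy of the unexported resolveModelURI above, duplicated only so the snippet runs standalone:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Copy of modeladmin's unexported resolveModelURI, for illustration only:
// relative paths get a file:// scheme rooted at modelsPath; anything that
// already carries a scheme passes through untouched.
func resolveModelURI(uri, modelsPath string) string {
	if strings.Contains(uri, "://") {
		return uri
	}
	return "file://" + filepath.Join(modelsPath, uri)
}

func main() {
	fmt.Println(resolveModelURI("qwen3-4b.gguf", "/models"))
	// file:///models/qwen3-4b.gguf
	fmt.Println(resolveModelURI("https://host/weights.gguf", "/models"))
	// https://host/weights.gguf (scheme present, returned unchanged)
}
```
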
61  docs/content/features/localai-assistant.md  Normal file
@@ -0,0 +1,61 @@
+++
disableToc = false
title = "LocalAI Assistant"
weight = 27
url = '/features/localai-assistant'
+++

LocalAI Assistant is an admin-only chat modality. When enabled on a chat session, the conversation is wired to an in-process MCP server that exposes LocalAI's own admin/management surface as tools. You can install models, manage backends, edit model configs and check system status by chatting — no REST calls or YAML edits.

The same MCP server is published as a Go package and can also be served over **stdio** to control a remote LocalAI instance from outside (e.g. from a desktop MCP host, Cursor, or `mcphost`).

## Enabling the assistant in chat

Open the chat UI as an **admin** user and pick a chat-capable model in the model selector. The header shows a **Manage** toggle — flip it on, and a `Manage mode` badge appears next to the chat title. Starter chips ("What is installed?", "Install a chat model", "Show system status", "Update a backend") help you get going.

The home page also exposes a **Manage by chat** CTA that opens a fresh chat already in Manage mode.

Once on, try:

> Install Qwen 3 chat

The assistant searches the gallery, lists candidates, asks you to pick, summarises the install, and waits for your confirmation before calling `install_model`. While the install runs, it polls progress and reports the outcome.

## Disabling the feature

Either toggle it off in **Settings → LocalAI Assistant** (takes effect without restart), or hard-disable at startup:

```bash
LOCALAI_DISABLE_ASSISTANT=true local-ai run
```

When disabled, the chat handler refuses requests with `metadata.localai_assistant=true` and returns 503. The Manage toggle is hidden in the UI.

## Security model

- The chat toggle is hidden for non-admin users.
- The chat handler re-checks admin role at request time even when auth is configured to skip the assistant feature gate (defense in depth).
- The MCP server itself is in-process — there is no localhost loopback, no synthetic API key, and no extra TCP socket.
- Mutating tools (`install_model`, `delete_model`, `edit_model_config`, `upgrade_backend`, …) are guarded by a system-prompt rule that requires the LLM to confirm the action with the user before calling them. There is no separate code-side preview/apply step.

## Standalone stdio MCP server

You can run the same admin tool surface as a stdio MCP server pointed at any LocalAI HTTP API:

```bash
local-ai mcp-server --target http://remote.localai:8080 --api-key <admin-key>
# read-only mode — skips registration of every mutating tool
local-ai mcp-server --target http://remote.localai:8080 --read-only
```

Useful for hooking LocalAI admin into Claude Desktop, Cursor, or any MCP host. The tool catalog is identical to the in-process variant.

## Tool catalog

**Read-only**: `gallery_search`, `list_installed_models`, `list_galleries`, `list_backends`, `list_known_backends`, `get_job_status`, `get_model_config`, `vram_estimate`, `system_info`, `list_nodes`.

**Mutating** (require user confirmation per the assistant's safety prompt): `install_model`, `import_model_uri`, `delete_model`, `install_backend`, `upgrade_backend`, `edit_model_config`, `reload_models`, `toggle_model_state`, `toggle_model_pinned`.

## Adding new tools or skills

The MCP server lives at `pkg/mcp/localaitools/`. Tools are registered in `tools_*.go`; skill prompts (the markdown the LLM sees) are embedded from `prompts/`. To add a new admin tool, add a method to the `LocalAIClient` interface plus implementations in `inproc/` and `httpapi/`, then register the tool and add or update a skill prompt. See `.agents/localai-assistant-mcp.md` for the full contributor checklist.

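The UI sets the opt-in flag for you, but nothing about it is UI-specific. A sketch of opting a single request into Manage mode from plain HTTP, assuming the standard OpenAI-compatible chat-completions route, a placeholder model name, and an admin API key (the `metadata.localai_assistant` key is the real one; everything else here is illustrative):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Placeholder host, model and key. The metadata flag is what opts the
	// request into the assistant; the handler re-checks the admin role.
	body := []byte(`{
	  "model": "qwen3-4b",
	  "metadata": {"localai_assistant": true},
	  "messages": [{"role": "user", "content": "What is installed?"}]
	}`)
	req, err := http.NewRequest(http.MethodPost, "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer <admin-key>")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out)) // 503 when the feature is disabled
}
```
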
25  pkg/mcp/localaitools/capability.go  Normal file
@@ -0,0 +1,25 @@
package localaitools

// Capability is the human-readable tag the LLM uses to filter installed
// models by purpose. The chat handler maps it to the loader's bitflag
// (config.FLAG_*) — see inproc.Client.capabilityToFlag. The empty value
// means "no filter".
//
// Renaming or adding values is a public-API change: tool DTOs reference
// these constants in their jsonschema enum, so the LLM sees the canonical
// list at tools/list time.
type Capability string

const (
	// CapabilityAny is the explicit zero value — equivalent to no filter.
	CapabilityAny Capability = ""

	CapabilityChat       Capability = "chat"
	CapabilityCompletion Capability = "completion"
	CapabilityEmbeddings Capability = "embeddings"
	CapabilityImage      Capability = "image"
	CapabilityTTS        Capability = "tts"
	CapabilityTranscript Capability = "transcript"
	CapabilityRerank     Capability = "rerank"
	CapabilityVAD        Capability = "vad"
)

62  pkg/mcp/localaitools/client.go  Normal file
@@ -0,0 +1,62 @@
package localaitools

import (
	"context"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/core/services/modeladmin"
	"github.com/mudler/LocalAI/pkg/vram"
)

// LocalAIClient is the surface tools depend on. It has two implementations:
//
//   - inproc.Client (in-process; calls LocalAI services directly)
//   - httpapi.Client (out-of-process; calls the LocalAI REST API)
//
// Tool handlers and the embedded skill prompts are agnostic to which
// implementation backs the client.
//
// Where the same shape already exists elsewhere in the codebase
// (config.Gallery, gallery.Metadata, schema.KnownBackend, vram.EstimateResult,
// modeladmin.Action/Capability) we surface it directly rather than maintain
// a parallel DTO — keeping the LLM-visible wire format aligned with the
// rest of LocalAI by construction.
type LocalAIClient interface {
	// ---- Models / gallery (read) ----
	GallerySearch(ctx context.Context, q GallerySearchQuery) ([]gallery.Metadata, error)
	ListInstalledModels(ctx context.Context, capability Capability) ([]InstalledModel, error)
	ListGalleries(ctx context.Context) ([]config.Gallery, error)
	GetJobStatus(ctx context.Context, jobID string) (*JobStatus, error)
	GetModelConfig(ctx context.Context, name string) (*ModelConfigView, error)

	// ---- Models / gallery (write) ----
	InstallModel(ctx context.Context, req InstallModelRequest) (jobID string, err error)
	DeleteModel(ctx context.Context, name string) error
	EditModelConfig(ctx context.Context, name string, patch map[string]any) error
	ReloadModels(ctx context.Context) error
	ImportModelURI(ctx context.Context, req ImportModelURIRequest) (*ImportModelURIResponse, error)

	// ---- Backends ----
	// ListBackends returns installed backends. The shape stays a thin
	// localaitools.Backend rather than gallery.SystemBackend because the
	// latter carries filesystem paths (RunFile, Metadata) the LLM
	// shouldn't see.
	ListBackends(ctx context.Context) ([]Backend, error)
	// ListKnownBackends returns the same shape as REST /backends/known.
	ListKnownBackends(ctx context.Context) ([]schema.KnownBackend, error)
	InstallBackend(ctx context.Context, req InstallBackendRequest) (jobID string, err error)
	UpgradeBackend(ctx context.Context, name string) (jobID string, err error)

	// ---- System ----
	SystemInfo(ctx context.Context) (*SystemInfo, error)
	ListNodes(ctx context.Context) ([]Node, error)
	VRAMEstimate(ctx context.Context, req VRAMEstimateRequest) (*vram.EstimateResult, error)

	// ---- State ----
	// ToggleModelState accepts modeladmin.ActionEnable / ActionDisable.
	ToggleModelState(ctx context.Context, name string, action modeladmin.Action) error
	// ToggleModelPinned accepts modeladmin.ActionPin / ActionUnpin.
	ToggleModelPinned(ctx context.Context, name string, action modeladmin.Action) error
}

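LocalAIClient is wide, which makes full fakes tedious. One way to stub it in tests (a sketch of a common Go pattern, not the project's actual fixture) is to embed the interface and override only what the test exercises; unoverridden methods fail loudly with a nil-pointer panic if reached:

```go
package localaitools_test

import (
	"context"

	"github.com/mudler/LocalAI/pkg/mcp/localaitools"
)

// stubClient embeds the interface so only the overridden methods have real
// behaviour. Calling anything else dereferences the nil embedded interface
// and panics, which is exactly the loud failure a test wants.
type stubClient struct {
	localaitools.LocalAIClient
	installed []localaitools.InstalledModel
}

func (s *stubClient) ListInstalledModels(_ context.Context, _ localaitools.Capability) ([]localaitools.InstalledModel, error) {
	return s.installed, nil
}
```
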
89  pkg/mcp/localaitools/coverage_test.go  Normal file
@@ -0,0 +1,89 @@
package localaitools

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// toolToHTTPRoute is the canonical mapping between MCP tools and the
// LocalAI admin REST endpoints they wrap. The httpapi.Client MUST hit the
// listed route for the tool; the inproc.Client may bypass HTTP and call
// services directly, but the on-the-wire shape is documented here so the
// two sides stay aligned.
//
// Updating the map is REQUIRED when:
//   - You add a Tool* constant (tools.go).
//   - You change which REST endpoint the httpapi.Client calls.
//
// The TestToolHTTPRouteMappingComplete spec below FAILS until every Tool*
// is in the map. That is the drift detector — see
// .agents/localai-assistant-mcp.md for the contributor contract.
//
// "(none)" is a deliberate sentinel for tools whose data is not exposed
// over a single REST endpoint (e.g. system_info aggregates data the
// inproc client picks up directly from services). The httpapi.Client may
// approximate via the welcome JSON; the test still requires an entry so
// the contributor explicitly acknowledges the asymmetry.
var toolToHTTPRoute = map[string]string{
	// Read-only tools.
	ToolGallerySearch:       "GET /models/available",
	ToolListInstalledModels: "GET / (welcome JSON, ModelsConfig field)",
	ToolListGalleries:       "GET /models/galleries",
	ToolGetJobStatus:        "GET /models/jobs/:uuid",
	ToolGetModelConfig:      "(none) — no JSON-only REST yet; httpapi.Client returns a documented stub",
	ToolListBackends:        "GET /backends",
	ToolListKnownBackends:   "GET /backends/known",
	ToolSystemInfo:          "GET / (welcome JSON)",
	ToolListNodes:           "GET /api/nodes",
	ToolVRAMEstimate:        "POST /api/models/vram-estimate",

	// Mutating tools.
	ToolInstallModel:      "POST /models/apply",
	ToolImportModelURI:    "POST /models/import-uri",
	ToolDeleteModel:       "POST /models/delete/:name",
	ToolEditModelConfig:   "PATCH /api/models/config-json/:name",
	ToolReloadModels:      "POST /models/reload",
	ToolInstallBackend:    "POST /backends/apply",
	ToolUpgradeBackend:    "POST /backends/upgrade/:name",
	ToolToggleModelState:  "PUT /models/toggle-state/:name/:action",
	ToolToggleModelPinned: "PUT /models/toggle-pinned/:name/:action",
}

// The specs below treat expectedFullCatalog (defined in server_test.go) as
// the single source of truth. Asserting the route map covers every entry
// catches "you added a Tool* but forgot to register it as MCP" indirectly
// (it'd be missing from expectedFullCatalog, which has its own assertion
// in TestServerRegistersExpectedToolCatalog).
var _ = Describe("Tool ↔ HTTP route coverage map", func() {
	It("has an entry for every Tool* in the published catalog", func() {
		for _, name := range expectedFullCatalog {
			_, ok := toolToHTTPRoute[name]
			Expect(ok).To(BeTrue(),
				"Tool %q is in expectedFullCatalog but not in toolToHTTPRoute. "+
					"When adding an MCP tool, update toolToHTTPRoute in coverage_test.go "+
					"with the REST endpoint the httpapi.Client calls (or '(none)' with a reason).",
				name)
		}
	})

	It("does not document tools that no longer exist in the catalog", func() {
		catalog := map[string]struct{}{}
		for _, name := range expectedFullCatalog {
			catalog[name] = struct{}{}
		}
		for name := range toolToHTTPRoute {
			_, ok := catalog[name]
			Expect(ok).To(BeTrue(),
				"toolToHTTPRoute documents %q but the tool is not registered. "+
					"Remove the stale entry.",
				name)
		}
	})

	// Deliberate non-test: we don't enumerate admin REST routes here. That
	// would require booting Application or parsing core/http/routes/localai.go,
	// both of which are brittle. The contract for "new admin REST endpoint
	// → MCP tool" is enforced by the PR checklist in
	// .agents/api-endpoints-and-auth.md, not by this test.
})

16  pkg/mcp/localaitools/doc.go  Normal file
@@ -0,0 +1,16 @@
// Package localaitools exposes LocalAI's admin/management surface as
// a Model Context Protocol server. The same package is used in two ways:
//
//   - In-process, by the chat handler, when an admin opts the chat session
//     into the "LocalAI Assistant" modality. The MCP server is wired to the
//     chat session over a paired in-memory transport (net.Pipe()), and the
//     LocalAIClient is implemented by the inproc subpackage, which calls
//     LocalAI services directly.
//
//   - Out of process, by the standalone "local-ai mcp-server" subcommand,
//     which speaks MCP over stdio and uses the httpapi subpackage to talk
//     to a remote LocalAI instance over HTTP.
//
// Tool handlers and the embedded skill prompts only see the LocalAIClient
// interface and are agnostic to the underlying transport or implementation.
package localaitools

130 pkg/mcp/localaitools/dto.go Normal file
@@ -0,0 +1,130 @@
package localaitools

// DTOs for the LocalAIClient interface. Where the same shape already exists
// elsewhere (config.Gallery, gallery.Metadata, schema.KnownBackend,
// vram.EstimateResult) we surface that type directly via the interface
// instead of maintaining a parallel DTO. The remaining types in this file
// are LLM-shaped views of internal state where the source struct carries
// fields the LLM shouldn't see (auth tokens, filesystem paths) or
// non-JSON-friendly fields (e.g. galleryop.OpStatus.Error which marshals
// to "{}" because it's an interface).

// GallerySearchQuery is the input for gallery_search.
type GallerySearchQuery struct {
	Query   string `json:"query" jsonschema:"Free-text query matched against model name, gallery and tags. Empty returns the first Limit models."`
	Limit   int    `json:"limit,omitempty" jsonschema:"Maximum number of results to return. Defaults to 20 when zero or negative."`
	Tag     string `json:"tag,omitempty" jsonschema:"Optional tag filter (e.g. chat, embed, image)."`
	Gallery string `json:"gallery,omitempty" jsonschema:"Restrict results to a specific gallery name."`
}

// InstalledModel is one entry in list_installed_models. Distinct from
// config.ModelConfig (which is the full on-disk YAML — far too large to
// serialise per request); this is a summary the LLM can scan cheaply.
type InstalledModel struct {
	Name         string   `json:"name"`
	Backend      string   `json:"backend,omitempty"`
	Capabilities []string `json:"capabilities,omitempty"`
	Pinned       bool     `json:"pinned,omitempty"`
	Disabled     bool     `json:"disabled,omitempty"`
}

// JobStatus is a JSON-friendly mirror of galleryop.OpStatus. We don't surface
// OpStatus directly because its `Error error` field marshals to `{}` (the
// json.Marshal default for an error interface), and the underlying status
// map keys jobs by UUID rather than carrying the ID on the value, so we
// add the ID here too. Keep field names aligned with OpStatus where they
// overlap so callers comparing the two don't have to translate.
type JobStatus struct {
	ID                 string  `json:"id"`
	Processed          bool    `json:"processed"`
	Cancelled          bool    `json:"cancelled,omitempty"`
	Progress           float64 `json:"progress"`
	TotalFileSize      string  `json:"total_file_size,omitempty"`
	DownloadedFileSize string  `json:"downloaded_file_size,omitempty"`
	Message            string  `json:"message,omitempty"`
	ErrorMessage       string  `json:"error,omitempty"`
}

// ModelConfigView is a JSON view of a model config file.
type ModelConfigView struct {
	Name string         `json:"name"`
	YAML string         `json:"yaml,omitempty" jsonschema:"Full YAML serialization of the model config."`
	JSON map[string]any `json:"json,omitempty" jsonschema:"Parsed JSON view of the same config (convenience for diffing)."`
}

// InstallModelRequest is the input for install_model.
type InstallModelRequest struct {
	GalleryName string         `json:"gallery_name,omitempty" jsonschema:"The gallery the model lives in (from gallery_search). Optional when ModelName is unique across galleries."`
	ModelName   string         `json:"model_name" jsonschema:"The canonical model name as returned by gallery_search."`
	Overrides   map[string]any `json:"overrides,omitempty" jsonschema:"Optional config overrides to merge into the installed model's YAML."`
}

// InstallBackendRequest is the input for install_backend.
type InstallBackendRequest struct {
	GalleryName string `json:"gallery_name,omitempty" jsonschema:"Source backend gallery."`
	BackendName string `json:"backend_name" jsonschema:"Backend identifier (e.g. llama-cpp)."`
}

// Backend is the LLM-facing summary returned by list_backends. We don't
// expose gallery.SystemBackend directly because it carries filesystem
// paths (RunFile, IsSystem, IsMeta, the full Metadata) the LLM doesn't
// need and the tokens add up. ListKnownBackends returns schema.KnownBackend
// directly — that one is already the canonical wire shape.
type Backend struct {
	Name      string `json:"name"`
	Installed bool   `json:"installed"`
}

// SystemInfo summarises the LocalAI deployment.
type SystemInfo struct {
	Version           string   `json:"version"`
	Distributed       bool     `json:"distributed"`
	BackendsPath      string   `json:"backends_path,omitempty"`
	ModelsPath        string   `json:"models_path,omitempty"`
	LoadedModels      []string `json:"loaded_models,omitempty"`
	InstalledBackends []string `json:"installed_backends,omitempty"`
}

// Node is one entry in list_nodes.
type Node struct {
	ID          string `json:"id"`
	Address     string `json:"address,omitempty"`
	HTTPAddress string `json:"http_address,omitempty"`
	TotalVRAM   uint64 `json:"total_vram,omitempty"`
	Healthy     bool   `json:"healthy"`
	LastSeen    string `json:"last_seen,omitempty"`
}

// ImportModelURIRequest is the input for import_model_uri. It mirrors the
// REST surface (`/models/import-uri`) closely so both clients can produce
// identical responses; the BackendPreference is a flat field rather than the
// REST `preferences` JSON blob since the LLM only needs to specify a backend
// name when it disambiguates a multi-backend match.
type ImportModelURIRequest struct {
	URI               string         `json:"uri" jsonschema:"The model source. Accepts HuggingFace URLs (https://huggingface.co/...), OCI image references, http(s) URLs to a manifest, file:// paths, or a bare HF repo (e.g. Qwen/Qwen3-4B-GGUF)."`
	BackendPreference string         `json:"backend_preference,omitempty" jsonschema:"Optional backend name (e.g. llama-cpp). Required as the second-step retry when a previous import_model_uri call returned ambiguous_backend=true."`
	Overrides         map[string]any `json:"overrides,omitempty" jsonschema:"Optional config overrides applied to the discovered model (e.g. context_size)."`
}

// ImportModelURIResponse is what import_model_uri returns. When
// AmbiguousBackend is true the LLM must surface the candidates to the user
// and call again with BackendPreference set; the JobID is empty in that case.
type ImportModelURIResponse struct {
	JobID               string   `json:"job_id,omitempty"`
	DiscoveredModelName string   `json:"discovered_model_name,omitempty"`
	AmbiguousBackend    bool     `json:"ambiguous_backend,omitempty"`
	Modality            string   `json:"modality,omitempty"`
	BackendCandidates   []string `json:"backend_candidates,omitempty"`
	Hint                string   `json:"hint,omitempty"`
}

// VRAMEstimateRequest is the input for vram_estimate. The output type is
// pkg/vram.EstimateResult — used directly via the LocalAIClient interface
// so the LLM sees the same shape (size_bytes/size_display/vram_bytes/
// vram_display) that the REST endpoint returns.
type VRAMEstimateRequest struct {
	ModelName   string `json:"model_name" jsonschema:"Installed model name."`
	ContextSize int    `json:"context_size,omitempty" jsonschema:"Context size in tokens."`
	GPULayers   int    `json:"gpu_layers,omitempty" jsonschema:"Number of layers to offload to GPU. -1 for all."`
	KVQuantBits int    `json:"kv_quant_bits,omitempty" jsonschema:"KV cache quantization bits (e.g. 4, 8, 16)."`
}
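
Why JobStatus copies the error into a plain string is easy to demonstrate. A self-contained check (not part of the diff; opStatus is a stand-in for galleryop.OpStatus):

package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// opStatus stands in for galleryop.OpStatus: an error held in an
// interface-typed field.
type opStatus struct {
	Err error `json:"error"`
}

func main() {
	b, _ := json.Marshal(opStatus{Err: errors.New("download failed")})
	fmt.Println(string(b)) // {"error":{}} because *errors.errorString has no exported fields
	// JobStatus avoids this by copying Err.Error() into its string ErrorMessage field.
}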
39 pkg/mcp/localaitools/dto_test.go Normal file
@@ -0,0 +1,39 @@
package localaitools

import (
	"encoding/json"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// roundTripDTO marshals v to JSON, decodes back into the same type, and
// asserts equality. Catches struct-tag drift on every public DTO.
func roundTripDTO[T any](v T) {
	data, err := json.Marshal(v)
	Expect(err).ToNot(HaveOccurred())
	var got T
	Expect(json.Unmarshal(data, &got)).To(Succeed())
	Expect(got).To(Equal(v))
}

var _ = Describe("DTOs round-trip through JSON", func() {
	It("preserves every field across the public DTO set", func() {
		// Only the localaitools-owned DTOs need this guard. Types we
		// surface from elsewhere (config.Gallery, gallery.Metadata,
		// schema.KnownBackend, vram.EstimateResult) are tested by their
		// owning packages and don't need a copy here.
		roundTripDTO(GallerySearchQuery{Query: "qwen", Limit: 5, Tag: "chat", Gallery: "official"})
		roundTripDTO(InstalledModel{Name: "n", Backend: "b", Capabilities: []string{"chat"}, Pinned: true, Disabled: false})
		roundTripDTO(JobStatus{ID: "i", Processed: true, Progress: 0.5, Message: "m", ErrorMessage: ""})
		roundTripDTO(ModelConfigView{Name: "n", YAML: "k: v\n", JSON: map[string]any{"k": "v"}})
		roundTripDTO(InstallModelRequest{GalleryName: "g", ModelName: "m", Overrides: map[string]any{"k": "v"}})
		roundTripDTO(InstallBackendRequest{GalleryName: "g", BackendName: "b"})
		roundTripDTO(Backend{Name: "n", Installed: true})
		roundTripDTO(SystemInfo{Version: "v1", Distributed: false, ModelsPath: "/tmp", LoadedModels: []string{"a"}, InstalledBackends: []string{"x"}})
		roundTripDTO(Node{ID: "n", Address: "a", HTTPAddress: "h", TotalVRAM: 100, Healthy: true, LastSeen: "now"})
		roundTripDTO(VRAMEstimateRequest{ModelName: "m", ContextSize: 4096, GPULayers: -1, KVQuantBits: 8})
		roundTripDTO(ImportModelURIRequest{URI: "u", BackendPreference: "llama-cpp", Overrides: map[string]any{"k": "v"}})
		roundTripDTO(ImportModelURIResponse{JobID: "j", DiscoveredModelName: "m", AmbiguousBackend: true, Modality: "tts", BackendCandidates: []string{"a", "b"}, Hint: "h"})
	})
})
34 pkg/mcp/localaitools/errors.go Normal file
@@ -0,0 +1,34 @@
package localaitools

import (
	"encoding/json"
	"fmt"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

// errorResult builds an MCP CallToolResult that surfaces err to the LLM via
// the standard IsError + TextContent convention. The LLM is instructed (in
// 10_safety.md) to surface tool errors verbatim.
func errorResult(err error) *mcp.CallToolResult {
	r := &mcp.CallToolResult{}
	r.SetError(err)
	return r
}

// errorResultf is the printf cousin of errorResult.
func errorResultf(format string, args ...any) *mcp.CallToolResult {
	return errorResult(fmt.Errorf(format, args...))
}

// jsonResult marshals v as pretty JSON and returns it as the tool's
// TextContent payload. Errors during marshalling become tool errors.
func jsonResult(v any) *mcp.CallToolResult {
	data, err := json.MarshalIndent(v, "", " ")
	if err != nil {
		return errorResult(fmt.Errorf("marshal tool result: %w", err))
	}
	return &mcp.CallToolResult{
		Content: []mcp.Content{&mcp.TextContent{Text: string(data)}},
	}
}
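
As a sketch of the convention these helpers encode, a hypothetical handler body (not part of the diff; the real registrations and the go-sdk handler signature live in tools.go):

// listGalleriesTool is a hypothetical handler: errors flow to the LLM via
// errorResultf (IsError + verbatim text), successes via jsonResult.
func listGalleriesTool(ctx context.Context, client LocalAIClient) *mcp.CallToolResult {
	gals, err := client.ListGalleries(ctx)
	if err != nil {
		return errorResultf("list galleries: %v", err)
	}
	return jsonResult(gals)
}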
222 pkg/mcp/localaitools/fakes_test.go Normal file
@@ -0,0 +1,222 @@
package localaitools

import (
	"context"
	"errors"
	"fmt"
	"sync"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/core/services/modeladmin"
	"github.com/mudler/LocalAI/pkg/vram"
)

// fakeClient is a recording, configurable LocalAIClient for unit tests.
// Each method records the args it was called with and returns whatever the
// matching field on the struct is configured to return. Methods are guarded
// by a mutex so tests can run with -race.
type fakeClient struct {
	mu sync.Mutex

	// Recorded calls (in order).
	calls []fakeCall

	// Per-method overrides. Tests set these.
	gallerySearch       func(GallerySearchQuery) ([]gallery.Metadata, error)
	listInstalledModels func(Capability) ([]InstalledModel, error)
	listGalleries       func() ([]config.Gallery, error)
	getJobStatus        func(string) (*JobStatus, error)
	getModelConfig      func(string) (*ModelConfigView, error)
	installModel        func(InstallModelRequest) (string, error)
	importModelURI      func(ImportModelURIRequest) (*ImportModelURIResponse, error)
	deleteModel         func(string) error
	editModelConfig     func(string, map[string]any) error
	reloadModels        func() error
	listBackends        func() ([]Backend, error)
	listKnownBackends   func() ([]schema.KnownBackend, error)
	installBackend      func(InstallBackendRequest) (string, error)
	upgradeBackend      func(string) (string, error)
	systemInfo          func() (*SystemInfo, error)
	listNodes           func() ([]Node, error)
	vramEstimate        func(VRAMEstimateRequest) (*vram.EstimateResult, error)
	toggleModelState    func(string, modeladmin.Action) error
	toggleModelPinned   func(string, modeladmin.Action) error
}

type fakeCall struct {
	method string
	args   any
}

func (f *fakeClient) record(method string, args any) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.calls = append(f.calls, fakeCall{method: method, args: args})
}

func (f *fakeClient) recorded() []fakeCall {
	f.mu.Lock()
	defer f.mu.Unlock()
	out := make([]fakeCall, len(f.calls))
	copy(out, f.calls)
	return out
}

var errNotConfigured = errors.New("fakeClient method not configured")

func (f *fakeClient) GallerySearch(_ context.Context, q GallerySearchQuery) ([]gallery.Metadata, error) {
	f.record("GallerySearch", q)
	if f.gallerySearch != nil {
		return f.gallerySearch(q)
	}
	return nil, nil
}

func (f *fakeClient) ListInstalledModels(_ context.Context, capability Capability) ([]InstalledModel, error) {
	f.record("ListInstalledModels", capability)
	if f.listInstalledModels != nil {
		return f.listInstalledModels(capability)
	}
	return nil, nil
}

func (f *fakeClient) ListGalleries(_ context.Context) ([]config.Gallery, error) {
	f.record("ListGalleries", nil)
	if f.listGalleries != nil {
		return f.listGalleries()
	}
	return nil, nil
}

func (f *fakeClient) GetJobStatus(_ context.Context, jobID string) (*JobStatus, error) {
	f.record("GetJobStatus", jobID)
	if f.getJobStatus != nil {
		return f.getJobStatus(jobID)
	}
	return nil, errNotConfigured
}

func (f *fakeClient) GetModelConfig(_ context.Context, name string) (*ModelConfigView, error) {
	f.record("GetModelConfig", name)
	if f.getModelConfig != nil {
		return f.getModelConfig(name)
	}
	return nil, errNotConfigured
}

func (f *fakeClient) InstallModel(_ context.Context, req InstallModelRequest) (string, error) {
	f.record("InstallModel", req)
	if f.installModel != nil {
		return f.installModel(req)
	}
	return "", errNotConfigured
}

func (f *fakeClient) DeleteModel(_ context.Context, name string) error {
	f.record("DeleteModel", name)
	if f.deleteModel != nil {
		return f.deleteModel(name)
	}
	return nil
}

func (f *fakeClient) ImportModelURI(_ context.Context, req ImportModelURIRequest) (*ImportModelURIResponse, error) {
	f.record("ImportModelURI", req)
	if f.importModelURI != nil {
		return f.importModelURI(req)
	}
	return &ImportModelURIResponse{JobID: "fake-import-job"}, nil
}

func (f *fakeClient) EditModelConfig(_ context.Context, name string, patch map[string]any) error {
	f.record("EditModelConfig", []any{name, patch})
	if f.editModelConfig != nil {
		return f.editModelConfig(name, patch)
	}
	return nil
}

func (f *fakeClient) ReloadModels(_ context.Context) error {
	f.record("ReloadModels", nil)
	if f.reloadModels != nil {
		return f.reloadModels()
	}
	return nil
}

func (f *fakeClient) ListBackends(_ context.Context) ([]Backend, error) {
	f.record("ListBackends", nil)
	if f.listBackends != nil {
		return f.listBackends()
	}
	return nil, nil
}

func (f *fakeClient) ListKnownBackends(_ context.Context) ([]schema.KnownBackend, error) {
	f.record("ListKnownBackends", nil)
	if f.listKnownBackends != nil {
		return f.listKnownBackends()
	}
	return nil, nil
}

func (f *fakeClient) InstallBackend(_ context.Context, req InstallBackendRequest) (string, error) {
	f.record("InstallBackend", req)
	if f.installBackend != nil {
		return f.installBackend(req)
	}
	return "", errNotConfigured
}

func (f *fakeClient) UpgradeBackend(_ context.Context, name string) (string, error) {
	f.record("UpgradeBackend", name)
	if f.upgradeBackend != nil {
		return f.upgradeBackend(name)
	}
	return "", errNotConfigured
}

func (f *fakeClient) SystemInfo(_ context.Context) (*SystemInfo, error) {
	f.record("SystemInfo", nil)
	if f.systemInfo != nil {
		return f.systemInfo()
	}
	return &SystemInfo{Version: "test"}, nil
}

func (f *fakeClient) ListNodes(_ context.Context) ([]Node, error) {
	f.record("ListNodes", nil)
	if f.listNodes != nil {
		return f.listNodes()
	}
	return nil, nil
}

func (f *fakeClient) VRAMEstimate(_ context.Context, req VRAMEstimateRequest) (*vram.EstimateResult, error) {
	f.record("VRAMEstimate", req)
	if f.vramEstimate != nil {
		return f.vramEstimate(req)
	}
	return nil, errNotConfigured
}

func (f *fakeClient) ToggleModelState(_ context.Context, name string, action modeladmin.Action) error {
	f.record("ToggleModelState", []any{name, action})
	if f.toggleModelState != nil {
		return f.toggleModelState(name, action)
	}
	return nil
}

func (f *fakeClient) ToggleModelPinned(_ context.Context, name string, action modeladmin.Action) error {
	f.record("ToggleModelPinned", []any{name, action})
	if f.toggleModelPinned != nil {
		return f.toggleModelPinned(name, action)
	}
	return nil
}

// boom is a sentinel error used by tests that want a deterministic error string.
var boom = fmt.Errorf("boom")
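
Typical use in a spec might look like this (a sketch, not part of the diff; the job id and model name are arbitrary):

f := &fakeClient{
	installModel: func(req InstallModelRequest) (string, error) {
		return "job-42", nil // arbitrary job id for the test
	},
}
id, err := f.InstallModel(context.Background(), InstallModelRequest{ModelName: "qwen"})
Expect(err).ToNot(HaveOccurred())
Expect(id).To(Equal("job-42"))
Expect(f.recorded()[0].method).To(Equal("InstallModel"))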
491 pkg/mcp/localaitools/httpapi/client.go Normal file
@@ -0,0 +1,491 @@
// Package httpapi provides a LocalAIClient that talks to a remote LocalAI
// instance over its REST API. Used by the standalone "local-ai mcp-server"
// subcommand to control a remote deployment over stdio.
package httpapi

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/core/services/modeladmin"
	localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
	"github.com/mudler/LocalAI/pkg/vram"
)

// Client is a thin REST wrapper. It maps each LocalAIClient method to the
// matching admin endpoint. Errors from non-2xx responses include the body for
// the MCP layer to surface verbatim to the LLM.
type Client struct {
	BaseURL string
	APIKey  string

	HTTPClient *http.Client
}

// New returns a Client targeting baseURL with an optional bearer token.
func New(baseURL, apiKey string) *Client {
	return &Client{
		BaseURL: strings.TrimRight(baseURL, "/"),
		APIKey:  apiKey,
		HTTPClient: &http.Client{
			Timeout: 60 * time.Second,
		},
	}
}

// Compile-time assertion.
var _ localaitools.LocalAIClient = (*Client)(nil)

// HTTPError is returned by do() for non-2xx responses. Callers should use
// errors.Is(err, ErrHTTPNotFound) instead of substring-matching on
// err.Error() — the latter is brittle to status-code formatting changes.
type HTTPError struct {
	Method     string
	Path       string
	StatusCode int
	Body       string
}

func (e *HTTPError) Error() string {
	return fmt.Sprintf("%s %s: %d %s: %s", e.Method, e.Path, e.StatusCode, http.StatusText(e.StatusCode), strings.TrimSpace(e.Body))
}

// ErrHTTPNotFound is the sentinel for "the resource you asked for doesn't
// exist". Match it via errors.Is on an *HTTPError.
var ErrHTTPNotFound = errors.New("httpapi: not found")

// Is supports errors.Is(*HTTPError, ErrHTTPNotFound). The 500-with-text
// branch is a transitional fallback for /models/jobs/:uuid which today
// returns a 500 carrying "could not find any status for ID" instead of a
// proper 404. Drop the branch when the server is fixed.
func (e *HTTPError) Is(target error) bool {
	if target != ErrHTTPNotFound {
		return false
	}
	if e.StatusCode == http.StatusNotFound {
		return true
	}
	return e.StatusCode == http.StatusInternalServerError && strings.Contains(e.Body, "could not find")
}

// ---- HTTP helpers ----

func (c *Client) do(ctx context.Context, method, path string, body any, out any) error {
	var rdr io.Reader
	if body != nil {
		raw, err := json.Marshal(body)
		if err != nil {
			return fmt.Errorf("marshal body: %w", err)
		}
		rdr = bytes.NewReader(raw)
	}

	req, err := http.NewRequestWithContext(ctx, method, c.BaseURL+path, rdr)
	if err != nil {
		return err
	}
	if body != nil {
		req.Header.Set("Content-Type", "application/json")
	}
	req.Header.Set("Accept", "application/json")
	if c.APIKey != "" {
		req.Header.Set("Authorization", "Bearer "+c.APIKey)
	}

	resp, err := c.HTTPClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	respBody, _ := io.ReadAll(resp.Body)
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return &HTTPError{Method: method, Path: path, StatusCode: resp.StatusCode, Body: string(respBody)}
	}
	if out == nil {
		return nil
	}
	if err := json.Unmarshal(respBody, out); err != nil {
		return fmt.Errorf("decode %s %s response: %w (body=%q)", method, path, err, truncate(string(respBody), 200))
	}
	return nil
}

func truncate(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n] + "..."
}

// ---- Models / gallery (read) ----

func (c *Client) GallerySearch(ctx context.Context, q localaitools.GallerySearchQuery) ([]gallery.Metadata, error) {
	// /models/available already returns []gallery.Metadata — pass it
	// through after applying the LLM-supplied filters client-side.
	var metas []gallery.Metadata
	if err := c.do(ctx, http.MethodGet, routeModelsAvail, nil, &metas); err != nil {
		return nil, err
	}
	limit := q.Limit
	if limit <= 0 {
		limit = 20
	}
	out := make([]gallery.Metadata, 0, limit)
	needle := strings.ToLower(q.Query)
	tag := strings.ToLower(q.Tag)
	for _, m := range metas {
		if q.Gallery != "" && m.Gallery.Name != q.Gallery {
			continue
		}
		if needle != "" && !contains(m.Name, needle) && !contains(m.Description, needle) && !containsTagsAny(m.Tags, needle) {
			continue
		}
		if tag != "" && !containsTagExact(m.Tags, tag) {
			continue
		}
		out = append(out, m)
		if len(out) >= limit {
			break
		}
	}
	return out, nil
}

func (c *Client) ListInstalledModels(ctx context.Context, capability localaitools.Capability) ([]localaitools.InstalledModel, error) {
	_ = capability // Capability filtering is unavailable over the welcome HTTP shape today; see the TODO below.
	// /v1/models is the OpenAI-compat shape; we use the LocalAI welcome JSON
	// for richer info.
	var welcome struct {
		ModelsConfig []struct {
			Name    string `json:"name"`
			Backend string `json:"backend"`
		} `json:"ModelsConfig"`
	}
	if err := c.do(ctx, http.MethodGet, routeWelcome, nil, &welcome); err != nil {
		return nil, err
	}
	// TODO: capability filtering is unavailable over HTTP without a dedicated
	// endpoint — for now we return everything and let the LLM filter from the
	// names. A follow-up should add a /api/models?capability=chat endpoint.
	out := make([]localaitools.InstalledModel, 0, len(welcome.ModelsConfig))
	for _, m := range welcome.ModelsConfig {
		out = append(out, localaitools.InstalledModel{Name: m.Name, Backend: m.Backend})
	}
	return out, nil
}

func (c *Client) ListGalleries(ctx context.Context) ([]config.Gallery, error) {
	// /models/galleries returns []config.Gallery directly.
	var out []config.Gallery
	if err := c.do(ctx, http.MethodGet, routeModelsGall, nil, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func (c *Client) GetJobStatus(ctx context.Context, jobID string) (*localaitools.JobStatus, error) {
	if jobID == "" {
		return nil, errors.New("job id is required")
	}
	var raw struct {
		Processed          bool    `json:"processed"`
		Cancelled          bool    `json:"cancelled"`
		Progress           float64 `json:"progress"`
		Message            string  `json:"message"`
		FileSize           string  `json:"file_size"`
		DownloadedSize     string  `json:"downloaded_size"`
		Error              string  `json:"error,omitempty"`
		GalleryElementName string  `json:"gallery_element_name"`
	}
	if err := c.do(ctx, http.MethodGet, routeJobStatus(jobID), nil, &raw); err != nil {
		// "no such job" is not a real failure — surface (nil, nil) so the
		// LLM can stop polling without treating the response as an error.
		if errors.Is(err, ErrHTTPNotFound) {
			return nil, nil
		}
		return nil, err
	}
	return &localaitools.JobStatus{
		ID:                 jobID,
		Processed:          raw.Processed,
		Cancelled:          raw.Cancelled,
		Progress:           raw.Progress,
		TotalFileSize:      raw.FileSize,
		DownloadedFileSize: raw.DownloadedSize,
		Message:            raw.Message,
		ErrorMessage:       raw.Error,
	}, nil
}

// GetModelConfig is intentionally a stub for the HTTP client: LocalAI's
// /models/edit/:name endpoint returns rendered HTML, not JSON, so the
// standalone CLI's `get_model_config` tool surfaces a clear error to the
// LLM. Tracked under the localai-assistant follow-ups (see
// .agents/localai-assistant-mcp.md) — once a JSON-only
// GET /api/models/config-yaml/:name endpoint lands on the server, this
// method calls it and the stub goes away.
//
// FIXME(localai-assistant): wire to a JSON read-back endpoint.
func (c *Client) GetModelConfig(_ context.Context, _ string) (*localaitools.ModelConfigView, error) {
	return nil, errors.New("get_model_config over HTTP not yet supported by this client; use the in-process inproc client or REST /models/edit/{name}")
}

// ---- Models / gallery (write) ----

func (c *Client) InstallModel(ctx context.Context, req localaitools.InstallModelRequest) (string, error) {
	body := map[string]any{"id": req.ModelName}
	if req.GalleryName != "" {
		body["id"] = req.GalleryName + "@" + req.ModelName
	}
	body["name"] = req.ModelName
	if len(req.Overrides) > 0 {
		body["overrides"] = req.Overrides
	}
	var resp struct {
		ID        string `json:"uuid"`
		StatusURL string `json:"status"`
	}
	if err := c.do(ctx, http.MethodPost, routeModelsApply, body, &resp); err != nil {
		return "", err
	}
	return resp.ID, nil
}

func (c *Client) ImportModelURI(ctx context.Context, req localaitools.ImportModelURIRequest) (*localaitools.ImportModelURIResponse, error) {
	if req.URI == "" {
		return nil, errors.New("uri is required")
	}
	body := map[string]any{"uri": req.URI}
	if req.BackendPreference != "" {
		// Server expects preferences as a JSON object; wrap the backend
		// preference accordingly.
		body["preferences"] = map[string]string{"backend": req.BackendPreference}
	}

	rawReq, err := json.Marshal(body)
	if err != nil {
		return nil, fmt.Errorf("marshal body: %w", err)
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, c.BaseURL+routeModelsImport, bytes.NewReader(rawReq))
	if err != nil {
		return nil, err
	}
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Accept", "application/json")
	if c.APIKey != "" {
		httpReq.Header.Set("Authorization", "Bearer "+c.APIKey)
	}
	resp, err := c.HTTPClient.Do(httpReq)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	respBody, _ := io.ReadAll(resp.Body)

	// 400 with `error: "ambiguous import"` is not a transport error — it's the
	// disambiguation signal. Translate it back into AmbiguousBackend so the
	// MCP layer surface stays identical regardless of in-process vs HTTP.
	if resp.StatusCode == http.StatusBadRequest {
		var amb struct {
			Error      string   `json:"error"`
			Detail     string   `json:"detail"`
			Modality   string   `json:"modality"`
			Candidates []string `json:"candidates"`
			Hint       string   `json:"hint"`
		}
		if json.Unmarshal(respBody, &amb) == nil && amb.Error == "ambiguous import" {
			return &localaitools.ImportModelURIResponse{
				AmbiguousBackend:  true,
				Modality:          amb.Modality,
				BackendCandidates: amb.Candidates,
				Hint:              amb.Hint,
			}, nil
		}
	}
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, fmt.Errorf("POST %s: %d %s: %s", routeModelsImport, resp.StatusCode, http.StatusText(resp.StatusCode), strings.TrimSpace(string(respBody)))
	}

	var raw struct {
		ID string `json:"uuid"`
	}
	if err := json.Unmarshal(respBody, &raw); err != nil {
		return nil, fmt.Errorf("decode import response: %w", err)
	}
	return &localaitools.ImportModelURIResponse{JobID: raw.ID}, nil
}

func (c *Client) DeleteModel(ctx context.Context, name string) error {
	return c.do(ctx, http.MethodPost, routeModelDelete(name), nil, nil)
}

func (c *Client) EditModelConfig(ctx context.Context, name string, patch map[string]any) error {
	return c.do(ctx, http.MethodPatch, routeModelConfigJSON(name), patch, nil)
}

func (c *Client) ReloadModels(ctx context.Context) error {
	return c.do(ctx, http.MethodPost, routeModelsReload, nil, nil)
}

// ---- Backends ----

func (c *Client) ListBackends(ctx context.Context) ([]localaitools.Backend, error) {
	var raw []struct {
		Name      string `json:"name"`
		Installed bool   `json:"installed"`
	}
	if err := c.do(ctx, http.MethodGet, routeBackends, nil, &raw); err != nil {
		return nil, err
	}
	out := make([]localaitools.Backend, 0, len(raw))
	for _, b := range raw {
		out = append(out, localaitools.Backend{Name: b.Name, Installed: b.Installed})
	}
	return out, nil
}

func (c *Client) ListKnownBackends(ctx context.Context) ([]schema.KnownBackend, error) {
	// /backends/known emits []schema.KnownBackend directly — pass through.
	var out []schema.KnownBackend
	if err := c.do(ctx, http.MethodGet, routeBackendsKnown, nil, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func (c *Client) InstallBackend(ctx context.Context, req localaitools.InstallBackendRequest) (string, error) {
	body := map[string]any{"id": req.BackendName}
	if req.GalleryName != "" {
		body["id"] = req.GalleryName + "@" + req.BackendName
	}
	body["name"] = req.BackendName
	var resp struct {
		ID string `json:"uuid"`
	}
	if err := c.do(ctx, http.MethodPost, routeBackendsApply, body, &resp); err != nil {
		return "", err
	}
	return resp.ID, nil
}

func (c *Client) UpgradeBackend(ctx context.Context, name string) (string, error) {
	var resp struct {
		ID string `json:"uuid"`
	}
	if err := c.do(ctx, http.MethodPost, routeBackendUpgrade(name), nil, &resp); err != nil {
		return "", err
	}
	return resp.ID, nil
}

// ---- System ----

func (c *Client) SystemInfo(ctx context.Context) (*localaitools.SystemInfo, error) {
	var welcome struct {
		Version           string          `json:"Version"`
		LoadedModels      []any           `json:"LoadedModels"`
		InstalledBackends map[string]bool `json:"InstalledBackends"`
	}
	if err := c.do(ctx, http.MethodGet, routeWelcome, nil, &welcome); err != nil {
		return nil, err
	}
	info := &localaitools.SystemInfo{Version: welcome.Version}
	for name := range welcome.InstalledBackends {
		info.InstalledBackends = append(info.InstalledBackends, name)
	}
	// LoadedModels shape varies; we don't attempt to decode it client-side.
	return info, nil
}

func (c *Client) ListNodes(ctx context.Context) ([]localaitools.Node, error) {
	var raw []struct {
		ID          string `json:"id"`
		Address     string `json:"address"`
		HTTPAddress string `json:"http_address"`
		Status      string `json:"status"`
	}
	if err := c.do(ctx, http.MethodGet, routeNodes, nil, &raw); err != nil {
		// Treat 404/disabled as "no nodes" to keep parity with single-process.
		if errors.Is(err, ErrHTTPNotFound) {
			return []localaitools.Node{}, nil
		}
		return nil, err
	}
	out := make([]localaitools.Node, 0, len(raw))
	for _, n := range raw {
		out = append(out, localaitools.Node{
			ID:          n.ID,
			Address:     n.Address,
			HTTPAddress: n.HTTPAddress,
			Healthy:     n.Status == "healthy",
		})
	}
	return out, nil
}

func (c *Client) VRAMEstimate(ctx context.Context, req localaitools.VRAMEstimateRequest) (*vram.EstimateResult, error) {
	body := map[string]any{"model": req.ModelName}
	if req.ContextSize > 0 {
		body["context_size"] = req.ContextSize
	}
	if req.GPULayers != 0 {
		body["gpu_layers"] = req.GPULayers
	}
	if req.KVQuantBits > 0 {
		body["kv_quant_bits"] = req.KVQuantBits
	}
	// /api/models/vram-estimate returns a wrapper carrying vram.EstimateResult
	// (size_bytes/size_display/vram_bytes/vram_display) plus context-note
	// fields. Decode directly into EstimateResult — the LLM gets the
	// pre-formatted display strings, identical to REST.
	var out vram.EstimateResult
	if err := c.do(ctx, http.MethodPost, routeVRAMEstimate, body, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

// ---- State ----

func (c *Client) ToggleModelState(ctx context.Context, name string, action modeladmin.Action) error {
	return c.do(ctx, http.MethodPut, routeToggleModelState(name, string(action)), nil, nil)
}

func (c *Client) ToggleModelPinned(ctx context.Context, name string, action modeladmin.Action) error {
	return c.do(ctx, http.MethodPut, routeToggleModelPinned(name, string(action)), nil, nil)
}

// ---- helpers ----

func contains(haystack, lowerNeedle string) bool {
	return strings.Contains(strings.ToLower(haystack), lowerNeedle)
}

func containsTagsAny(tags []string, lowerNeedle string) bool {
	for _, t := range tags {
		if strings.Contains(strings.ToLower(t), lowerNeedle) {
			return true
		}
	}
	return false
}

func containsTagExact(tags []string, lowerNeedle string) bool {
	for _, t := range tags {
		if strings.EqualFold(t, lowerNeedle) {
			return true
		}
	}
	return false
}
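
Callers outside the package would consume the sentinel roughly like this (a fragment, not part of the diff; the base URL, key, model name, patch values and ctx are placeholders):

c := httpapi.New("http://localhost:8080", "admin-key") // placeholder target
err := c.EditModelConfig(ctx, "missing-model", map[string]any{"context_size": 8192})
if errors.Is(err, httpapi.ErrHTTPNotFound) {
	// the model doesn't exist on the remote; don't retry
}
var httpErr *httpapi.HTTPError
if errors.As(err, &httpErr) {
	fmt.Printf("server said %d: %s\n", httpErr.StatusCode, httpErr.Body)
}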
223 pkg/mcp/localaitools/httpapi/client_test.go Normal file
@@ -0,0 +1,223 @@
package httpapi

import (
	"context"
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"

	localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
)

// fakeLocalAI is a minimal HTTP server that mimics the relevant LocalAI
// admin endpoints. Only the routes the httpapi.Client touches need to exist.
func fakeLocalAI() *httptest.Server {
	mux := http.NewServeMux()

	mux.HandleFunc("/models/available", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode([]map[string]any{
			{
				"name":        "qwen2.5-7b-instruct",
				"description": "Qwen 2.5 chat",
				"license":     "apache-2.0",
				"tags":        []string{"chat", "tools"},
				"gallery":     map[string]any{"name": "official", "url": "http://gallery"},
				"installed":   false,
			},
			{
				"name":    "stable-diffusion-3.5",
				"tags":    []string{"image"},
				"gallery": map[string]any{"name": "official", "url": "http://gallery"},
			},
		})
	})

	mux.HandleFunc("/models/galleries", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode([]map[string]any{
			{"name": "official", "url": "http://gallery"},
		})
	})

	mux.HandleFunc("/models/apply", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method", http.StatusMethodNotAllowed)
			return
		}
		_ = json.NewEncoder(w).Encode(map[string]any{
			"uuid":   "job-123",
			"status": r.Host + "/models/jobs/job-123",
		})
	})

	mux.HandleFunc("/models/jobs/job-123", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode(map[string]any{
			"processed": true, "progress": 100.0, "message": "done",
		})
	})

	mux.HandleFunc("/models/jobs/missing", func(w http.ResponseWriter, _ *http.Request) {
		http.Error(w, "could not find any status for ID", http.StatusInternalServerError)
	})

	mux.HandleFunc("/", func(w http.ResponseWriter, _ *http.Request) {
		// LocalAI's welcome JSON.
		_ = json.NewEncoder(w).Encode(map[string]any{
			"Version":      "v0.0.0-test",
			"LoadedModels": []any{},
			"InstalledBackends": map[string]bool{
				"llama-cpp": true,
				"whisper":   true,
			},
			"ModelsConfig": []map[string]any{
				{"name": "qwen2.5-7b-instruct", "backend": "llama-cpp"},
			},
		})
	})

	mux.HandleFunc("/backends", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode([]map[string]any{
			{"name": "llama-cpp", "installed": true, "tags": []string{"chat"}},
		})
	})

	return httptest.NewServer(mux)
}

var _ = Describe("httpapi.Client against the LocalAI admin REST surface", func() {
	var (
		srv *httptest.Server
		c   *Client
		ctx context.Context
	)

	BeforeEach(func() {
		srv = fakeLocalAI()
		c = New(srv.URL, "")
		ctx = context.Background()
	})

	AfterEach(func() {
		srv.Close()
	})

	Describe("GallerySearch", func() {
		It("filters by tag, applies limit, and preserves tags on the result", func() {
			hits, err := c.GallerySearch(ctx, localaitools.GallerySearchQuery{Query: "qwen", Tag: "chat", Limit: 5})
			Expect(err).ToNot(HaveOccurred())
			Expect(hits).To(HaveLen(1))
			Expect(hits[0].Name).To(Equal("qwen2.5-7b-instruct"))
			Expect(containsTagExact(hits[0].Tags, "chat")).To(BeTrue())
		})
	})

	Describe("ListGalleries", func() {
		It("returns the configured galleries", func() {
			out, err := c.ListGalleries(ctx)
			Expect(err).ToNot(HaveOccurred())
			Expect(out).To(HaveLen(1))
			Expect(out[0].Name).To(Equal("official"))
		})
	})

	Describe("InstallModel", func() {
		It("returns the job id parsed from the apply response", func() {
			id, err := c.InstallModel(ctx, localaitools.InstallModelRequest{ModelName: "qwen2.5-7b-instruct"})
			Expect(err).ToNot(HaveOccurred())
			Expect(id).To(Equal("job-123"))
		})
	})

	Describe("GetJobStatus", func() {
		It("decodes the happy-path response", func() {
			st, err := c.GetJobStatus(ctx, "job-123")
			Expect(err).ToNot(HaveOccurred())
			Expect(st.Processed).To(BeTrue())
			Expect(st.Progress).To(Equal(100.0))
		})

		It("translates the legacy 500-with-could-not-find as nil status, nil err", func() {
			st, err := c.GetJobStatus(ctx, "missing")
			Expect(err).ToNot(HaveOccurred(), "legacy 500 should not surface as a real error")
			Expect(st).To(BeNil())
		})
	})

	Describe("SystemInfo", func() {
		It("extracts version and installed-backend names from the welcome JSON", func() {
			info, err := c.SystemInfo(ctx)
			Expect(err).ToNot(HaveOccurred())
			Expect(info.Version).To(Equal("v0.0.0-test"))
			Expect(info.InstalledBackends).To(HaveLen(2))
		})
	})

	Describe("ListBackends", func() {
		It("returns each installed backend with its installed flag", func() {
			bs, err := c.ListBackends(ctx)
			Expect(err).ToNot(HaveOccurred())
			Expect(bs).To(HaveLen(1))
			Expect(bs[0].Name).To(Equal("llama-cpp"))
			Expect(bs[0].Installed).To(BeTrue())
		})
	})
})

var _ = Describe("ErrHTTPNotFound", func() {
	Context("on a clean 404 status", func() {
		var (
			srv *httptest.Server
			c   *Client
		)
		BeforeEach(func() {
			srv = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
				http.Error(w, "nope", http.StatusNotFound)
			}))
			c = New(srv.URL, "")
		})
		AfterEach(func() { srv.Close() })

		It("translates a 404 on /models/jobs/:id into nil status, nil err", func() {
			st, err := c.GetJobStatus(context.Background(), "missing")
			Expect(err).ToNot(HaveOccurred())
			Expect(st).To(BeNil())
		})

		It("is detectable via errors.Is when callers don't translate", func() {
			_, err := c.ListGalleries(context.Background())
			Expect(errors.Is(err, ErrHTTPNotFound)).To(BeTrue(), "got: %v", err)
		})
	})

	Context("on the legacy 500-with-could-not-find body", func() {
		It("treats it as not-found until LocalAI returns a proper 404", func() {
			srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
				http.Error(w, "could not find any status for ID", http.StatusInternalServerError)
			}))
			DeferCleanup(srv.Close)
			c := New(srv.URL, "")
			st, err := c.GetJobStatus(context.Background(), "missing")
			Expect(err).ToNot(HaveOccurred())
			Expect(st).To(BeNil())
		})
	})
})

var _ = Describe("Bearer token", func() {
	It("forwards the configured API key on every request", func() {
		var sawAuth string
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			sawAuth = r.Header.Get("Authorization")
			_ = json.NewEncoder(w).Encode([]map[string]any{})
		}))
		DeferCleanup(srv.Close)

		c := New(srv.URL, "secret-key")
		_, err := c.ListGalleries(context.Background())
		Expect(err).ToNot(HaveOccurred())
		Expect(sawAuth).To(Equal("Bearer secret-key"))
	})
})
13 pkg/mcp/localaitools/httpapi/httpapi_suite_test.go Normal file
@@ -0,0 +1,13 @@
package httpapi

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestHTTPAPI(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "localaitools/httpapi test suite")
}
49 pkg/mcp/localaitools/httpapi/routes.go Normal file
@@ -0,0 +1,49 @@
package httpapi

import (
	"fmt"
	"net/url"
)

// Route paths for the LocalAI admin REST surface that this client targets.
// Static paths are constants; dynamic paths are builders that handle
// url.PathEscape on segment values. Keep these aligned with the server-side
// registrations in core/http/routes/localai.go — the Tool↔REST drift detector
// in coverage_test.go documents the mapping.
const (
	routeWelcome       = "/"
	routeModelsApply   = "/models/apply"
	routeModelsAvail   = "/models/available"
	routeModelsGall    = "/models/galleries"
	routeModelsImport  = "/models/import-uri"
	routeModelsReload  = "/models/reload"
	routeBackends      = "/backends"
	routeBackendsKnown = "/backends/known"
	routeBackendsApply = "/backends/apply"
	routeNodes         = "/api/nodes"
	routeVRAMEstimate  = "/api/models/vram-estimate"
)

func routeJobStatus(jobID string) string {
	return "/models/jobs/" + url.PathEscape(jobID)
}

func routeModelDelete(name string) string {
	return "/models/delete/" + url.PathEscape(name)
}

func routeModelConfigJSON(name string) string {
	return "/api/models/config-json/" + url.PathEscape(name)
}

func routeBackendUpgrade(name string) string {
	return "/backends/upgrade/" + url.PathEscape(name)
}

func routeToggleModelState(name, action string) string {
	return fmt.Sprintf("/models/toggle-state/%s/%s", url.PathEscape(name), url.PathEscape(action))
}

func routeToggleModelPinned(name, action string) string {
	return fmt.Sprintf("/models/toggle-pinned/%s/%s", url.PathEscape(name), url.PathEscape(action))
}
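
The builders exist because model names can carry path-hostile characters. From inside the package (not part of the diff; the model name is arbitrary):

fmt.Println(routeToggleModelState("my model/v1", "enable"))
// Output: /models/toggle-state/my%20model%2Fv1/enable
// url.PathEscape keeps the whole name inside a single path segment.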
490 pkg/mcp/localaitools/inproc/client.go Normal file
@@ -0,0 +1,490 @@
// Package inproc provides an in-process LocalAIClient that calls LocalAI
// services directly. Used by the chat handler when a chat session opts into
// the LocalAI Assistant modality, avoiding an HTTP loopback to the same
// process and the synthetic admin-credential dance that would entail.
package inproc

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"

	"github.com/google/uuid"
	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/gallery/importers"
	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/core/services/galleryop"
	"github.com/mudler/LocalAI/core/services/modeladmin"
	"github.com/mudler/LocalAI/internal"
	localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
	"github.com/mudler/LocalAI/pkg/model"
	"github.com/mudler/LocalAI/pkg/system"
	"github.com/mudler/LocalAI/pkg/vram"
)

// Client implements localaitools.LocalAIClient by calling LocalAI services
// directly. It is intentionally a thin shim — distribution and persistence
// concerns are handled by the underlying services (GalleryService is already
// distributed-aware, ModelConfigLoader manages on-disk YAML, etc.), so this
// layer just translates between MCP DTOs and service signatures.
type Client struct {
	AppConfig    *config.ApplicationConfig
	SystemState  *system.SystemState
	ConfigLoader *config.ModelConfigLoader
	ModelLoader  *model.ModelLoader
	Gallery      *galleryop.GalleryService

	modelAdmin *modeladmin.ConfigService
}

// New builds a Client wired to the given services. All fields are required
// except ModelLoader (used only for SystemInfo's loaded-models report and
// best-effort ShutdownModel calls during config edits).
func New(appConfig *config.ApplicationConfig, systemState *system.SystemState, cl *config.ModelConfigLoader, ml *model.ModelLoader, gs *galleryop.GalleryService) *Client {
	return &Client{
		AppConfig:    appConfig,
		SystemState:  systemState,
		ConfigLoader: cl,
		ModelLoader:  ml,
		Gallery:      gs,
		modelAdmin:   modeladmin.NewConfigService(cl, appConfig),
	}
}

// Compile-time assertion that *Client satisfies localaitools.LocalAIClient.
var _ localaitools.LocalAIClient = (*Client)(nil)

// ---- Models / gallery (read) ----

func (c *Client) GallerySearch(_ context.Context, q localaitools.GallerySearchQuery) ([]gallery.Metadata, error) {
	galleries := c.AppConfig.Galleries
	if q.Gallery != "" {
		galleries = filterGalleries(galleries, q.Gallery)
	}
	models, err := gallery.AvailableGalleryModels(galleries, c.SystemState)
	if err != nil {
		return nil, fmt.Errorf("list gallery models: %w", err)
	}

	if q.Query != "" {
		models = models.Search(q.Query)
	}
	if q.Tag != "" {
		models = models.FilterByTag(q.Tag)
	}

	limit := q.Limit
	if limit <= 0 {
		limit = 20
	}

	// Surface gallery.Metadata directly — same wire shape as gallery.AvailableGalleryModels
	// returns and the same shape REST /models/available emits, so REST and MCP stay aligned.
	out := make([]gallery.Metadata, 0, min(len(models), limit))
	for i, m := range models {
		if i >= limit {
			break
		}
		out = append(out, m.Metadata)
	}
	return out, nil
}

func (c *Client) ListInstalledModels(_ context.Context, capability localaitools.Capability) ([]localaitools.InstalledModel, error) {
	wantFlag, hasFlag := capabilityToFlag(capability)
	configs := c.ConfigLoader.GetModelConfigsByFilter(func(_ string, m *config.ModelConfig) bool {
		if !hasFlag {
			return true
		}
		return m.HasUsecases(wantFlag)
	})

	out := make([]localaitools.InstalledModel, 0, len(configs))
	for _, m := range configs {
		out = append(out, localaitools.InstalledModel{
			Name:         m.Name,
			Backend:      m.Backend,
			Capabilities: capabilityFlagsOf(&m),
		})
	}
	return out, nil
}

func (c *Client) ListGalleries(_ context.Context) ([]config.Gallery, error) {
	// AppConfig.Galleries is already []config.Gallery; the JSON shape
	// matches what REST /models/galleries emits.
	return c.AppConfig.Galleries, nil
}

func (c *Client) GetJobStatus(_ context.Context, jobID string) (*localaitools.JobStatus, error) {
	if jobID == "" {
		return nil, errors.New("job id is required")
	}
	st := c.Gallery.GetStatus(jobID)
	if st == nil {
		return nil, nil
	}
	out := &localaitools.JobStatus{
		ID:                 jobID,
		Processed:          st.Processed,
		Cancelled:          st.Cancelled,
		Progress:           st.Progress,
		TotalFileSize:      st.TotalFileSize,
		DownloadedFileSize: st.DownloadedFileSize,
		Message:            st.Message,
	}
	if st.Error != nil {
		out.ErrorMessage = st.Error.Error()
	}
	return out, nil
}

func (c *Client) GetModelConfig(ctx context.Context, name string) (*localaitools.ModelConfigView, error) {
	view, err := c.modelAdmin.GetConfig(ctx, name)
	if err != nil {
		return nil, err
	}
	return &localaitools.ModelConfigView{Name: view.Name, YAML: view.YAML, JSON: view.JSON}, nil
}

// ---- Models / gallery (write) ----

func (c *Client) InstallModel(ctx context.Context, req localaitools.InstallModelRequest) (string, error) {
	if req.ModelName == "" {
		return "", errors.New("model_name is required")
	}
	id, err := uuid.NewUUID()
	if err != nil {
		return "", fmt.Errorf("generate job id: %w", err)
	}
	galleries := c.AppConfig.Galleries
	if req.GalleryName != "" {
		galleries = filterGalleries(galleries, req.GalleryName)
	}
	op := galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig]{
		ID:                 id.String(),
		GalleryElementName: req.ModelName,
		Req: gallery.GalleryModel{
			Metadata: gallery.Metadata{Name: req.ModelName},
		},
		Galleries:        galleries,
		BackendGalleries: c.AppConfig.BackendGalleries,
	}
	if err := sendModelOp(ctx, c.Gallery.ModelGalleryChannel, op); err != nil {
		return "", err
	}
	return id.String(), nil
}

func (c *Client) ImportModelURI(ctx context.Context, req localaitools.ImportModelURIRequest) (*localaitools.ImportModelURIResponse, error) {
	if req.URI == "" {
		return nil, errors.New("uri is required")
	}
	// Build the preferences JSON expected by importers.DiscoverModelConfig.
	// Today only `backend` is meaningful; future fields can be added without
	// changing the MCP DTO.
	var prefs json.RawMessage
	if req.BackendPreference != "" {
		raw, err := json.Marshal(map[string]string{"backend": req.BackendPreference})
		if err != nil {
			return nil, fmt.Errorf("marshal preferences: %w", err)
		}
		prefs = raw
	}

	modelConfig, err := importers.DiscoverModelConfig(req.URI, prefs)
	if err != nil {
		var amb *importers.AmbiguousImportError
		if errors.As(err, &amb) {
			candidates := amb.Candidates
			if candidates == nil {
				candidates = []string{}
			}
			return &localaitools.ImportModelURIResponse{
				AmbiguousBackend:  true,
				Modality:          amb.Modality,
				BackendCandidates: candidates,
				Hint:              "call import_model_uri again with backend_preference set to one of backend_candidates",
			}, nil
		}
		if errors.Is(err, importers.ErrAmbiguousImport) {
			return &localaitools.ImportModelURIResponse{
				AmbiguousBackend:  true,
				BackendCandidates: []string{},
				Hint:              "call import_model_uri again with backend_preference set",
			}, nil
		}
		return nil, fmt.Errorf("discover model config: %w", err)
	}

	id, err := uuid.NewUUID()
	if err != nil {
		return nil, fmt.Errorf("generate job id: %w", err)
	}
	galleryID := req.URI
	if modelConfig.Name != "" {
		galleryID = modelConfig.Name
	}
	overrides := req.Overrides
	if overrides == nil {
		overrides = map[string]any{}
	}
	op := galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig]{
		Req:                gallery.GalleryModel{Overrides: overrides},
		ID:                 id.String(),
		GalleryElementName: galleryID,
		GalleryElement:     &modelConfig,
		BackendGalleries:   c.AppConfig.BackendGalleries,
	}
	if err := sendModelOp(ctx, c.Gallery.ModelGalleryChannel, op); err != nil {
		return nil, err
	}
	return &localaitools.ImportModelURIResponse{
		JobID:               id.String(),
		DiscoveredModelName: modelConfig.Name,
	}, nil
}

func (c *Client) DeleteModel(ctx context.Context, name string) error {
	if name == "" {
		return errors.New("name is required")
	}
	op := galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig]{
		Delete:             true,
		GalleryElementName: name,
	}
	if err := sendModelOp(ctx, c.Gallery.ModelGalleryChannel, op); err != nil {
		return err
	}
	c.ConfigLoader.RemoveModelConfig(name)
	return nil
}

func (c *Client) EditModelConfig(ctx context.Context, name string, patch map[string]any) error {
	_, err := c.modelAdmin.PatchConfig(ctx, name, patch)
	return err
}

func (c *Client) ReloadModels(_ context.Context) error {
	if c.SystemState == nil {
		return errors.New("system state not available")
	}
	return c.ConfigLoader.LoadModelConfigsFromPath(c.SystemState.Model.ModelsPath)
}

// ---- Backends ----

func (c *Client) ListBackends(_ context.Context) ([]localaitools.Backend, error) {
	systemBackends, err := c.Gallery.ListBackends()
	if err != nil {
		return nil, fmt.Errorf("list backends: %w", err)
	}
	out := make([]localaitools.Backend, 0, len(systemBackends))
	for name := range systemBackends {
		out = append(out, localaitools.Backend{Name: name, Installed: true})
	}
	return out, nil
}

func (c *Client) ListKnownBackends(_ context.Context) ([]schema.KnownBackend, error) {
	available, err := gallery.AvailableBackends(c.AppConfig.BackendGalleries, c.SystemState)
	if err != nil {
		return nil, fmt.Errorf("list known backends: %w", err)
	}
	// Match the wire shape of REST /backends/known so the tool output is
	// identical regardless of which client served it.
	out := make([]schema.KnownBackend, 0, len(available))
	for _, b := range available {
		out = append(out, schema.KnownBackend{
			Name:        b.GetName(),
			Description: b.GetDescription(),
			Installed:   b.GetInstalled(),
		})
	}
	return out, nil
}

func (c *Client) InstallBackend(ctx context.Context, req localaitools.InstallBackendRequest) (string, error) {
	if req.BackendName == "" {
		return "", errors.New("backend_name is required")
	}
	id, err := uuid.NewUUID()
	if err != nil {
		return "", fmt.Errorf("generate job id: %w", err)
	}
	galleries := c.AppConfig.BackendGalleries
	if req.GalleryName != "" {
		galleries = filterGalleries(galleries, req.GalleryName)
	}
	op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
		ID:                 id.String(),
		GalleryElementName: req.BackendName,
		Galleries:          galleries,
	}
	if err := sendBackendOp(ctx, c.Gallery.BackendGalleryChannel, op); err != nil {
		return "", err
	}
	return id.String(), nil
}

func (c *Client) UpgradeBackend(ctx context.Context, name string) (string, error) {
	if name == "" {
		return "", errors.New("name is required")
	}
	id, err := uuid.NewUUID()
	if err != nil {
		return "", fmt.Errorf("generate job id: %w", err)
	}
	op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
		ID:                 id.String(),
		GalleryElementName: name,
		Galleries:          c.AppConfig.BackendGalleries,
		Upgrade:            true,
	}
	if err := sendBackendOp(ctx, c.Gallery.BackendGalleryChannel, op); err != nil {
		return "", err
	}
	return id.String(), nil
}

// ---- System ----

func (c *Client) SystemInfo(_ context.Context) (*localaitools.SystemInfo, error) {
	info := &localaitools.SystemInfo{
		Version:     internal.PrintableVersion(),
		Distributed: c.AppConfig != nil && c.AppConfig.Distributed.Enabled,
	}
	if c.SystemState != nil {
		info.BackendsPath = c.SystemState.Backend.BackendsPath
		info.ModelsPath = c.SystemState.Model.ModelsPath
	}
	if c.ModelLoader != nil {
		for _, m := range c.ModelLoader.ListLoadedModels() {
			info.LoadedModels = append(info.LoadedModels, m.ID)
		}
	}
	if c.Gallery != nil {
		if backends, err := c.Gallery.ListBackends(); err == nil {
			for name := range backends {
				info.InstalledBackends = append(info.InstalledBackends, name)
			}
		}
	}
	return info, nil
}

func (c *Client) ListNodes(_ context.Context) ([]localaitools.Node, error) {
	// Node-registry wiring is the responsibility of the Application layer; an
	// empty list is the right answer in single-process mode and a sensible
	// stub until the Application plumbs the registry into this client.
	return []localaitools.Node{}, nil
}

func (c *Client) VRAMEstimate(ctx context.Context, req localaitools.VRAMEstimateRequest) (*vram.EstimateResult, error) {
|
||||
resp, err := modeladmin.EstimateVRAM(ctx, modeladmin.VRAMRequest{
|
||||
Model: req.ModelName,
|
||||
ContextSize: uint32(req.ContextSize),
|
||||
GPULayers: req.GPULayers,
|
||||
KVQuantBits: req.KVQuantBits,
|
||||
}, c.ConfigLoader, c.SystemState)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
// Forward vram.EstimateResult unchanged so the LLM sees the same
|
||||
// shape (size_bytes / size_display / vram_bytes / vram_display) that
|
||||
// REST /api/models/vram-estimate returns.
|
||||
return &resp.EstimateResult, nil
|
||||
}
|
||||
|
||||
// ---- State ----
|
||||
|
||||
func (c *Client) ToggleModelState(ctx context.Context, name string, action modeladmin.Action) error {
|
||||
_, err := c.modelAdmin.ToggleState(ctx, name, action, c.ModelLoader)
|
||||
return err
|
||||
}
|
||||
|
||||
func (c *Client) ToggleModelPinned(ctx context.Context, name string, action modeladmin.Action) error {
|
||||
// No syncPinned callback wired here; the watchdog refresh callback is
|
||||
// owned by the HTTP handler today. The MCP-driven path skips it; the
|
||||
// next idle tick or manual reload picks the new pinned set up.
|
||||
_, err := c.modelAdmin.TogglePinned(ctx, name, action, nil)
|
||||
return err
|
||||
}
|
||||
|
||||
// ---- helpers ----
|
||||
|
||||
// sendModelOp pushes op onto ch but bails if ctx is cancelled before the
|
||||
// gallery worker is ready to receive. Without the select the chat handler
|
||||
// goroutine would block forever when the worker is paused or the buffer is
|
||||
// full — the LLM keeps polling and the request goroutine leaks. When the
|
||||
// caller cancels we surface ctx.Err() so the LLM stops polling.
|
||||
func sendModelOp(ctx context.Context, ch chan galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig], op galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig]) error {
|
||||
select {
|
||||
case ch <- op:
|
||||
return nil
|
||||
case <-ctx.Done():
|
||||
return ctx.Err()
|
||||
}
|
||||
}
|
||||
|
||||
// sendBackendOp is the BackendGalleryChannel sibling of sendModelOp. Same
|
||||
// rationale — see that comment.
|
||||
func sendBackendOp(ctx context.Context, ch chan galleryop.ManagementOp[gallery.GalleryBackend, any], op galleryop.ManagementOp[gallery.GalleryBackend, any]) error {
|
||||
select {
|
||||
case ch <- op:
|
||||
return nil
|
||||
case <-ctx.Done():
|
||||
return ctx.Err()
|
||||
}
|
||||
}
|
||||
|
||||
func filterGalleries(galleries []config.Gallery, name string) []config.Gallery {
|
||||
for _, g := range galleries {
|
||||
if g.Name == name {
|
||||
return []config.Gallery{g}
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// capabilityToFlag maps the public Capability constants to the loader's
|
||||
// usecase bitflag. CapabilityAny (the empty value) selects all models.
|
||||
func capabilityToFlag(capability localaitools.Capability) (config.ModelConfigUsecase, bool) {
|
||||
switch capability {
|
||||
case localaitools.CapabilityAny:
|
||||
return 0, false
|
||||
case localaitools.CapabilityChat:
|
||||
return config.FLAG_CHAT, true
|
||||
case localaitools.CapabilityCompletion:
|
||||
return config.FLAG_COMPLETION, true
|
||||
case localaitools.CapabilityEmbeddings:
|
||||
return config.FLAG_EMBEDDINGS, true
|
||||
case localaitools.CapabilityImage:
|
||||
return config.FLAG_IMAGE, true
|
||||
case localaitools.CapabilityTTS:
|
||||
return config.FLAG_TTS, true
|
||||
case localaitools.CapabilityTranscript:
|
||||
return config.FLAG_TRANSCRIPT, true
|
||||
case localaitools.CapabilityRerank:
|
||||
return config.FLAG_RERANK, true
|
||||
case localaitools.CapabilityVAD:
|
||||
return config.FLAG_VAD, true
|
||||
}
|
||||
return 0, false
|
||||
}
|
||||
|
||||
func capabilityFlagsOf(m *config.ModelConfig) []string {
|
||||
var out []string
|
||||
for label, flag := range config.GetAllModelConfigUsecases() {
|
||||
if flag == 0 {
|
||||
continue
|
||||
}
|
||||
if m.HasUsecases(flag) {
|
||||
// Trim "FLAG_" prefix for prettier output.
|
||||
out = append(out, label[len("FLAG_"):])
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
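The two ImportModelURIResponse branches above give callers a simple two-phase protocol: probe once with no preference, then retry with one of the returned candidates. A minimal caller-side sketch against the LocalAIClient interface follows; the importWithRetry helper is hypothetical, while the request and response fields are the ones defined in this package:

// importWithRetry is a hypothetical illustration of the two-phase import
// protocol; it works against any LocalAIClient implementation.
func importWithRetry(ctx context.Context, c localaitools.LocalAIClient, uri, preferred string) (string, error) {
	resp, err := c.ImportModelURI(ctx, localaitools.ImportModelURIRequest{URI: uri})
	if err != nil {
		return "", err
	}
	if resp.AmbiguousBackend {
		// In the assistant flow the LLM asks the user to pick; here we
		// accept a caller-supplied preference when it is among the candidates.
		chosen := ""
		for _, cand := range resp.BackendCandidates {
			if cand == preferred {
				chosen = cand
				break
			}
		}
		if chosen == "" {
			return "", fmt.Errorf("ambiguous import, candidates: %v", resp.BackendCandidates)
		}
		resp, err = c.ImportModelURI(ctx, localaitools.ImportModelURIRequest{URI: uri, BackendPreference: chosen})
		if err != nil {
			return "", err
		}
	}
	return resp.JobID, nil // poll GetJobStatus with this id until processed
}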
pkg/mcp/localaitools/inproc/client_test.go (new file, 49 lines)
package inproc

import (
	"context"
	"errors"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/services/galleryop"
	localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
	"github.com/mudler/LocalAI/pkg/system"
)

// Regression spec for the bug we fixed when channel sends were
// unconditional: with a never-read gallery channel and a pre-cancelled
// ctx, InstallModel must surface ctx.Err() instead of blocking forever.
// The same guarantee covers ImportModelURI, DeleteModel, InstallBackend,
// UpgradeBackend — they all share sendModelOp / sendBackendOp.
var _ = Describe("inproc.Client cancellation", func() {
	It("InstallModel returns context.Canceled when the gallery channel is never drained", func() {
		gs := &galleryop.GalleryService{
			// Unbuffered. Nothing reads from it in this spec, so a naive
			// send would block the goroutine indefinitely.
			ModelGalleryChannel: make(chan galleryop.ManagementOp[gallery.GalleryModel, gallery.ModelConfig]),
		}
		c := &Client{
			AppConfig:   &config.ApplicationConfig{SystemState: &system.SystemState{Model: system.Model{ModelsPath: GinkgoT().TempDir()}}},
			SystemState: &system.SystemState{Model: system.Model{ModelsPath: GinkgoT().TempDir()}},
			Gallery:     gs,
		}

		ctx, cancel := context.WithCancel(context.Background())
		cancel() // pre-cancel: the select must take the ctx.Done branch immediately.

		done := make(chan error, 1)
		go func() {
			_, err := c.InstallModel(ctx, localaitools.InstallModelRequest{ModelName: "x"})
			done <- err
		}()

		var err error
		Eventually(done, time.Second).Should(Receive(&err))
		Expect(errors.Is(err, context.Canceled)).To(BeTrue(), "got: %v", err)
	})
})
pkg/mcp/localaitools/inproc/inproc_suite_test.go (new file, 13 lines)
package inproc

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestInproc(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "localaitools/inproc test suite")
}
pkg/mcp/localaitools/parity_test.go (new file, 155 lines)
package localaitools_test

// Parity test: both LocalAIClient implementations (inproc.Client and
// httpapi.Client) must produce equivalent payloads for the same input. The
// test uses an httptest server that mimics the LocalAI admin REST surface
// for the methods httpapi.Client touches; the inproc client is exercised
// through the same fake by way of a stand-in that wraps the same data.
//
// This file also hosts the single RunSpecs entrypoint for the localaitools
// suite — Ginkgo aggregates Describes from both the internal `localaitools`
// package and the external `localaitools_test` package into one run.

import (
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"sort"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"

	"github.com/mudler/LocalAI/core/config"
	localaitools "github.com/mudler/LocalAI/pkg/mcp/localaitools"
	"github.com/mudler/LocalAI/pkg/mcp/localaitools/httpapi"
)

func TestLocalAITools(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "localaitools test suite")
}

// fakeBackend is the canned JSON the httptest server returns. Keep
// responses deterministic so we can compare byte-by-byte.
func fakeBackend() *httptest.Server {
	mux := http.NewServeMux()

	mux.HandleFunc("/models/galleries", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode([]map[string]any{
			{"name": "official", "url": "http://gallery"},
		})
	})
	mux.HandleFunc("/models/available", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode([]map[string]any{
			{
				"name":    "qwen2.5-7b-instruct",
				"tags":    []string{"chat"},
				"gallery": map[string]any{"name": "official", "url": "http://gallery"},
			},
		})
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode(map[string]any{
			"Version":           "v0.0.0-parity",
			"InstalledBackends": map[string]bool{"llama-cpp": true},
			"ModelsConfig":      []map[string]any{{"name": "qwen2.5-7b-instruct", "backend": "llama-cpp"}},
		})
	})
	mux.HandleFunc("/backends", func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode([]map[string]any{
			{"name": "llama-cpp", "installed": true},
		})
	})
	mux.HandleFunc("/models/import-uri", func(w http.ResponseWriter, _ *http.Request) {
		// Simulate an ambiguous-backend response so we can verify the
		// httpapi.Client translates 400 + "ambiguous import" into the same
		// AmbiguousBackend shape the inproc client uses.
		w.WriteHeader(http.StatusBadRequest)
		_ = json.NewEncoder(w).Encode(map[string]any{
			"error":      "ambiguous import",
			"detail":     "multiple backends",
			"modality":   "tts",
			"candidates": []string{"piper", "kokoro"},
			"hint":       "Pass preferences.backend",
		})
	})

	return httptest.NewServer(mux)
}

// inprocLikeFromHTTP narrows the parity check to "the JSON the LLM
// observes is the same regardless of which client produced it". A real
// inproc-vs-http parity rig would need to wire the full service layer;
// that lives in the modeladmin tests.
func inprocLikeFromHTTP(target string) localaitools.LocalAIClient {
	return httpapi.New(target, "")
}

func sortGalleries(in []config.Gallery) []config.Gallery {
	out := append([]config.Gallery(nil), in...)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

var _ = Describe("LocalAIClient parity", func() {
	var (
		srv *httptest.Server
		ctx context.Context
		a   localaitools.LocalAIClient
		b   localaitools.LocalAIClient
	)

	BeforeEach(func() {
		srv = fakeBackend()
		ctx = context.Background()
		a = httpapi.New(srv.URL, "")
		b = inprocLikeFromHTTP(srv.URL)
	})

	AfterEach(func() {
		srv.Close()
	})

	It("ListGalleries produces identical output", func() {
		left, err := a.ListGalleries(ctx)
		Expect(err).ToNot(HaveOccurred())
		right, err := b.ListGalleries(ctx)
		Expect(err).ToNot(HaveOccurred())
		Expect(sortGalleries(left)).To(Equal(sortGalleries(right)))
	})

	It("GallerySearch produces identical output", func() {
		q := localaitools.GallerySearchQuery{Query: "qwen", Tag: "chat", Limit: 5}
		left, err := a.GallerySearch(ctx, q)
		Expect(err).ToNot(HaveOccurred())
		right, err := b.GallerySearch(ctx, q)
		Expect(err).ToNot(HaveOccurred())
		Expect(left).To(Equal(right))
	})

	It("ImportModelURI surfaces AmbiguousBackend equivalently", func() {
		req := localaitools.ImportModelURIRequest{URI: "Qwen/Qwen3-4B-GGUF"}
		left, err := a.ImportModelURI(ctx, req)
		Expect(err).ToNot(HaveOccurred())
		right, err := b.ImportModelURI(ctx, req)
		Expect(err).ToNot(HaveOccurred())

		Expect(left.AmbiguousBackend).To(BeTrue(), "left side ambiguity")
		Expect(right.AmbiguousBackend).To(BeTrue(), "right side ambiguity")
		Expect(left.BackendCandidates).To(Equal(right.BackendCandidates))
	})

	It("SystemInfo produces identical output (sorted)", func() {
		left, err := a.SystemInfo(ctx)
		Expect(err).ToNot(HaveOccurred())
		right, err := b.SystemInfo(ctx)
		Expect(err).ToNot(HaveOccurred())

		// Backends slice ordering is map-iteration-sensitive; sort first.
		sort.Strings(left.InstalledBackends)
		sort.Strings(right.InstalledBackends)
		Expect(left).To(Equal(right))
	})
})
pkg/mcp/localaitools/prompts.go (new file, 78 lines)
package localaitools

import (
	"embed"
	"fmt"
	"io/fs"
	"path"
	"sort"
	"strings"
)

//go:embed prompts/*.md prompts/skills/*.md
var promptsFS embed.FS

// SystemPrompt assembles the assistant system prompt from the embedded
// markdown files. The walk is deterministic (lexicographic).
//
// Panics if the embedded FS walk fails: the only realistic cause is a
// build-time misconfiguration of the //go:embed directive, and serving
// a silently-empty prompt to the LLM is far worse than crashing the
// init path. The TestSystemPromptIncludesAllEmbeddedFiles test catches
// regressions in CI before they ship.
func SystemPrompt(_ Options) string {
	var paths []string
	if err := fs.WalkDir(promptsFS, "prompts", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(p, ".md") {
			return nil
		}
		paths = append(paths, p)
		return nil
	}); err != nil {
		panic(fmt.Errorf("localaitools: walk embedded prompts: %w", err))
	}
	sort.Strings(paths)

	var b strings.Builder
	for _, p := range paths {
		data, err := promptsFS.ReadFile(p)
		if err != nil {
			continue
		}
		// File-level header for traceability ("which skill is the model citing?").
		// We use the file basename without extension.
		section := strings.TrimSuffix(path.Base(p), ".md")
		if b.Len() > 0 {
			b.WriteString("\n\n")
		}
		b.WriteString("<!-- file: ")
		b.WriteString(p)
		b.WriteString(" -->\n")
		b.WriteString("# section: ")
		b.WriteString(section)
		b.WriteString("\n\n")
		b.WriteString(string(data))
	}
	return b.String()
}

// embeddedPromptPaths returns the lexicographically-sorted list of embedded
// prompt files. Exposed for tests.
func embeddedPromptPaths() []string {
	var paths []string
	_ = fs.WalkDir(promptsFS, "prompts", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(p, ".md") {
			return nil
		}
		paths = append(paths, p)
		return nil
	})
	sort.Strings(paths)
	return paths
}
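For reference, the assembled output interleaves one traceability comment and one section header per file; with the prompt files added in this PR it begins roughly like this (illustrative excerpt):

<!-- file: prompts/00_role.md -->
# section: 00_role

# Role

You are **LocalAI Assistant**, the in-chat operator for this LocalAI deployment. ...

<!-- file: prompts/10_safety.md -->
# section: 10_safety
...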
pkg/mcp/localaitools/prompts/00_role.md (new file, 7 lines)
# Role

You are **LocalAI Assistant**, the in-chat operator for this LocalAI deployment. You help an administrator manage the server conversationally — installing and removing models, managing backends, editing model configs, and reporting system status — by calling the MCP tools available to you.

You are running **inside** the very LocalAI instance you administer. Tool calls take effect on this server immediately. Treat that gravity accordingly: be precise, confirm before mutating, and surface tool errors verbatim.

Be concise. Prefer short answers and bullet lists over prose. When you list options, number them so the user can pick by number.
pkg/mcp/localaitools/prompts/10_safety.md (new file, 13 lines)
# Safety rules

These rules are non-negotiable. The user trusts you to operate their server without unintended changes.

1. **Confirm before mutating.** Before calling any of these tools — `install_model`, `import_model_uri`, `delete_model`, `install_backend`, `upgrade_backend`, `edit_model_config`, `reload_models`, `toggle_model_state`, `toggle_model_pinned` — first state in plain language what you are about to do (which tool, which target, which arguments) and wait for the user's explicit confirmation in the next turn. "Yes", "do it", "go ahead", "proceed" all count as confirmation. Anything else does not.

2. **Disambiguate before mutating.** If the user's request is ambiguous (several gallery candidates match, the model name has multiple installed versions, the backend has variants), present the candidates as a numbered list and ask the user to pick before calling any mutating tool.

3. **Surface tool errors verbatim.** If a tool returns an error, quote the error message back to the user inside a fenced code block. Do not retry or paraphrase. Wait for the user's instruction before acting again.

4. **Never invent identifiers.** Only use model names, gallery names, backend names, and job IDs that came from a tool result earlier in this conversation. If you don't have one, call the appropriate `gallery_search` / `list_*` tool first.

5. **Polling.** When polling `get_job_status`, stop after the status reports `processed: true`, `cancelled: true`, or you have polled 30 times — whichever comes first. Always summarise the final outcome to the user.
pkg/mcp/localaitools/prompts/20_tools.md (new file, 28 lines)
# Tool catalog

The MCP `tools/list` endpoint also exposes the full input schema for each of these. The list below is the canonical curated description.

## Read-only

- `gallery_search` — Search configured galleries for installable models.
- `list_installed_models` — List models currently installed on this LocalAI. Optional `capability` filter (e.g. `chat`, `embed`, `image`).
- `list_galleries` — List configured model galleries.
- `list_backends` — List installed backends.
- `list_known_backends` — List backends available to install from configured backend galleries.
- `get_job_status` — Poll the status of an install/delete/upgrade job by id.
- `get_model_config` — Read the YAML/JSON config of an installed model.
- `vram_estimate` — Estimate VRAM use for a model under a given config.
- `system_info` — LocalAI version, paths, distributed flag, loaded models, installed backends.
- `list_nodes` — List federated worker nodes (only useful in distributed mode).

## Mutating (require user confirmation per safety rule 1)

- `install_model` — Install a model from a gallery. Returns a job id; poll with `get_job_status`.
- `import_model_uri` — Install a model from an arbitrary URI (HuggingFace, OCI, http(s), file://). May return `ambiguous_backend` when several backends apply; call again with `backend_preference` to disambiguate.
- `delete_model` — Delete an installed model.
- `install_backend` — Install a backend.
- `upgrade_backend` — Upgrade an installed backend by name.
- `edit_model_config` — Patch (deep-merge) JSON into an installed model's config.
- `reload_models` — Reload all model configs from disk.
- `toggle_model_state` — Enable or disable a model (`action`: `enable` or `disable`).
- `toggle_model_pinned` — Pin or unpin a model (`action`: `pin` or `unpin`).
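For illustration, a single catalog entry invoked over the wire is a standard MCP `tools/call` request; assuming the lower-case JSON field names for GallerySearchQuery, it looks roughly like:

{"jsonrpc": "2.0", "id": 7, "method": "tools/call",
 "params": {"name": "gallery_search", "arguments": {"query": "qwen", "tag": "chat", "limit": 5}}}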
pkg/mcp/localaitools/prompts/skills/edit_model_config.md (new file, 13 lines)
# Skill: Safely edit a model config

Use this when the user wants to change a setting on an installed model (context size, parameters, prompt template, etc.).

1. Call `get_model_config` with the model name. Display the relevant section of the YAML/JSON.
2. Identify the field(s) the user wants to change. If their intent is ambiguous, ask before proceeding.
3. Show a diff: **before** → **after** for each touched field.
4. Summarise and ask for confirmation: "I'll patch **`<name>`** with `{...}` — confirm?".
5. On confirmation, call `edit_model_config` with the smallest possible deep-merge patch (only the changed keys).
6. Call `reload_models` so the change takes effect.
7. Verify by calling `get_model_config` again and reporting the new values.

Never call `edit_model_config` without showing the diff first.
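For example, raising only the context size means step 5 sends a single-key patch (values illustrative; `context_size` is the same key the dispatch test exercises):

edit_model_config {"name": "<model>", "patch": {"context_size": 8192}}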
pkg/mcp/localaitools/prompts/skills/import_model_from_uri.md (new file, 17 lines)
# Skill: Import a model from a URI

Use this when the user wants to install a model from a source LocalAI doesn't already curate — a HuggingFace link, an OCI image reference, a `file://` path, or a generic HTTP URL. Prefer `gallery_search` + `install_model` first if the model is in a configured gallery; only fall back to import when it isn't.

1. If the user has not already supplied a URI, ask for one. Acceptable forms include `Qwen/Qwen3-4B-GGUF`, `https://huggingface.co/...`, `oci://...`, or `file:///path/to/local.yaml`.
2. Summarise: "I'll import `<URI>` — confirm?" and wait for confirmation.
3. On confirmation, call `import_model_uri` with the URI and **no** `backend_preference`.
4. **If the response has `ambiguous_backend == true`:**
   - Show the user the `backend_candidates` list as a numbered choice, mention `modality` if present, and quote any returned `hint` verbatim.
   - Wait for the user to pick.
   - Call `import_model_uri` again with the URI and `backend_preference` set to the chosen candidate.
5. **If the response has a `job_id`:**
   - Note the `discovered_model_name` if present (the assistant should use that name for follow-ups, since the importer may rewrite it).
   - Poll `get_job_status` until the job reports `processed: true`. Report meaningful progress changes.
6. After success, call `reload_models`, then `list_installed_models` to confirm the new model is visible. Tell the user the canonical name to use in chat completions.

If `import_model_uri` returns a non-ambiguity error (network, gated repo, unsupported source), surface it verbatim and ask whether to retry, try a different URI, or abort. Never re-call with a guessed backend — only use `backend_candidates` from a real ambiguity response.
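For illustration, an ambiguity response from step 4 carries roughly this shape (candidate names illustrative, mirroring the parity-test fixture):

{"ambiguous_backend": true, "modality": "tts",
 "backend_candidates": ["piper", "kokoro"],
 "hint": "call import_model_uri again with backend_preference set to one of backend_candidates"}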
pkg/mcp/localaitools/prompts/skills/install_chat_model.md (new file, 14 lines)
# Skill: Install a chat model

Use this when the user wants to install a chat-capable model — including the case where there are no chat models installed at all and they ask you to bootstrap one.

1. Call `gallery_search` with their query (and `tag: "chat"` when the user asked specifically for chat).
2. Show the top results as a numbered list with name, gallery, short description, and license. If none match, say so and ask whether to broaden the search.
3. Wait for the user to pick.
4. Summarise the chosen install ("I'll install **`<gallery>/<name>`** — confirm?") and wait for confirmation.
5. On confirmation, call `install_model` with `gallery_name` and `model_name` from the chosen hit.
6. Poll `get_job_status` with the returned job id. Report meaningful progress changes (every ~10–20%, plus completion).
7. When the job reports `processed: true` and no error, call `reload_models`, then `list_installed_models` with `capability: "chat"` to confirm the model is now visible.
8. Tell the user the model is ready and how to use it (its name as the `model` field in chat completions).

If `install_model` fails, surface the error and ask whether to retry, pick a different model, or abort.
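For illustration, a mid-install poll from step 6 might return something like the following (field values and types are illustrative; the `get_job_status` tool description documents processed, progress, message, and error):

{"processed": false, "progress": 42, "message": "downloading model file", "error": null}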
pkg/mcp/localaitools/prompts/skills/system_status.md (new file, 14 lines)
# Skill: System status

Use this when the user asks "what's installed?", "what's running?", "show status", or anything similar.

1. Call `system_info` for version, paths, distributed flag, loaded models, installed backends.
2. Call `list_installed_models` (no capability filter) for the full installed-model inventory.
3. If `system_info.distributed` is true, also call `list_nodes` and report worker health.
4. Present a concise summary:
   - **Version & mode** (`distributed: true|false`)
   - **Installed models** (count + list, each with name and capabilities)
   - **Installed backends** (count + list)
   - **Loaded right now** (from `loaded_models`)
   - **Workers** (only when distributed)
5. Do not call mutating tools in this skill.
pkg/mcp/localaitools/prompts/skills/upgrade_backend.md (new file, 10 lines)
# Skill: Upgrade a backend

Use this when the user asks to upgrade, refresh, or update a backend.

1. Call `list_backends` to confirm the backend is installed and to capture its canonical name.
2. If the user asked generically ("upgrade the backends"), call `list_known_backends` and compare versions/tags to identify upgrade candidates. Present them as a numbered list and ask which to upgrade.
3. Summarise: "I'll upgrade **`<name>`** — confirm?" and wait.
4. On confirmation, call `upgrade_backend` with the canonical name.
5. Poll `get_job_status` until done. Report progress and the final outcome.
6. After success, recommend the user reload any model that was using the upgraded backend (or call `reload_models` if the user agrees).
pkg/mcp/localaitools/prompts_test.go (new file, 57 lines)
package localaitools

import (
	"sort"
	"strings"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("SystemPrompt assembler", func() {
	It("includes every embedded markdown file in lexicographic order", func() {
		got := SystemPrompt(Options{})

		paths := embeddedPromptPaths()
		Expect(paths).ToNot(BeEmpty(), "embed directive must surface at least one file")

		// Each file's basename should appear in the output, and in order.
		var lastIdx int
		for _, p := range paths {
			section := strings.TrimSuffix(p[strings.LastIndex(p, "/")+1:], ".md")
			needle := "# section: " + section
			idx := strings.Index(got, needle)
			Expect(idx).To(BeNumerically(">=", 0), "section %q (%s) missing from output", section, p)
			Expect(idx).To(BeNumerically(">=", lastIdx), "section %q out of lexicographic order", section)
			lastIdx = idx
		}
	})

	It("exposes the embedded paths sorted", func() {
		paths := embeddedPromptPaths()
		Expect(sort.StringsAreSorted(paths)).To(BeTrue(), "paths %v are not sorted", paths)
	})

	It("contains the safety anchors that the LLM relies on", func() {
		out := SystemPrompt(Options{})

		// Guards against accidental safety-rule deletion. These strings
		// also live in tools_*.go (the Tool* constants); the prompt MUST
		// stay aligned because the LLM uses these names verbatim.
		mustContain := []string{
			"LocalAI Assistant",
			"Confirm before mutating",
			"Surface tool errors verbatim",
			"install_model",
			"delete_model",
			// Skill files we ship.
			"Skill: Install a chat model",
			"Skill: Upgrade a backend",
			"Skill: System status",
			"Skill: Safely edit a model config",
		}
		for _, s := range mustContain {
			Expect(out).To(ContainSubstring(s), "system prompt missing required anchor %q", s)
		}
	})
})
pkg/mcp/localaitools/server.go (new file, 52 lines)
package localaitools

import (
	"github.com/modelcontextprotocol/go-sdk/mcp"

	"github.com/mudler/LocalAI/internal"
)

// Options control which tools the server registers and how the embedded
// skill prompts are surfaced.
type Options struct {
	// DisableMutating omits all tools that change server state. Used by the
	// "--read-only" flavour of the standalone stdio CLI.
	DisableMutating bool

	// ServerName overrides the MCP server's advertised Implementation.Name.
	// Defaults to "localai-admin".
	ServerName string

	// ServerVersion overrides the advertised version. Defaults to the linked
	// internal.PrintableVersion().
	ServerVersion string
}

// NewServer builds an MCP server that exposes LocalAI's admin surface as
// tools, backed by the supplied LocalAIClient. The same server type is used
// in-process (paired in-memory transport) and out-of-process (stdio).
func NewServer(client LocalAIClient, opts Options) *mcp.Server {
	name := opts.ServerName
	if name == "" {
		name = DefaultServerName
	}
	version := opts.ServerVersion
	if version == "" {
		version = internal.PrintableVersion()
	}

	srv := mcp.NewServer(&mcp.Implementation{
		Name:    name,
		Version: version,
	}, &mcp.ServerOptions{
		Instructions: SystemPrompt(opts),
	})

	registerModelTools(srv, client, opts)
	registerBackendTools(srv, client, opts)
	registerConfigTools(srv, client, opts)
	registerSystemTools(srv, client, opts)
	registerStateTools(srv, client, opts)

	return srv
}
pkg/mcp/localaitools/server_test.go (new file, 244 lines)
package localaitools

import (
	"context"
	"errors"
	"sort"
	"strings"
	"sync"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"

	"github.com/modelcontextprotocol/go-sdk/mcp"

	"github.com/mudler/LocalAI/core/gallery"
)

// connectInMemory wires an MCP server (built via NewServer) to a client over
// a paired in-memory transport (net.Pipe). Returns the client session along
// with a teardown closure suitable for DeferCleanup.
func connectInMemory(client LocalAIClient, opts Options) (context.Context, *mcp.ClientSession, func()) {
	ctx, cancel := context.WithCancel(context.Background())
	srv := NewServer(client, opts)
	t1, t2 := mcp.NewInMemoryTransports()

	serverSession, err := srv.Connect(ctx, t1, nil)
	Expect(err).ToNot(HaveOccurred(), "server connect")

	c := mcp.NewClient(&mcp.Implementation{Name: "test-client", Version: "v0"}, nil)
	clientSession, err := c.Connect(ctx, t2, nil)
	Expect(err).ToNot(HaveOccurred(), "client connect")

	return ctx, clientSession, func() {
		_ = clientSession.Close()
		_ = serverSession.Wait()
		cancel()
	}
}

// listToolNames returns the sorted list of tool names exposed by the server.
func listToolNames(ctx context.Context, sess *mcp.ClientSession) []string {
	res, err := sess.ListTools(ctx, nil)
	Expect(err).ToNot(HaveOccurred(), "list tools")
	names := make([]string, 0, len(res.Tools))
	for _, tl := range res.Tools {
		names = append(names, tl.Name)
	}
	sort.Strings(names)
	return names
}

// callTool is a small wrapper to reduce boilerplate. CallToolParams.Arguments
// is declared as `any` and the SDK marshals it for the wire — passing a
// pre-marshalled []byte (or json.RawMessage) here would be double-encoded as
// a base64 string.
func callTool(ctx context.Context, sess *mcp.ClientSession, name string, args any) *mcp.CallToolResult {
	res, err := sess.CallTool(ctx, &mcp.CallToolParams{Name: name, Arguments: args})
	Expect(err).ToNot(HaveOccurred(), "call tool %s", name)
	return res
}

// resultText concatenates all TextContent items of a result.
func resultText(res *mcp.CallToolResult) string {
	var b strings.Builder
	for _, c := range res.Content {
		if tc, ok := c.(*mcp.TextContent); ok {
			b.WriteString(tc.Text)
		}
	}
	return b.String()
}

// expectedFullCatalog is the tool set when DisableMutating=false. Sorted.
// References the Tool* constants so a rename can't drift code from tests.
var expectedFullCatalog = sortedStrings(
	ToolDeleteModel,
	ToolEditModelConfig,
	ToolGallerySearch,
	ToolGetJobStatus,
	ToolGetModelConfig,
	ToolImportModelURI,
	ToolInstallBackend,
	ToolInstallModel,
	ToolListBackends,
	ToolListGalleries,
	ToolListInstalledModels,
	ToolListKnownBackends,
	ToolListNodes,
	ToolReloadModels,
	ToolSystemInfo,
	ToolToggleModelPinned,
	ToolToggleModelState,
	ToolUpgradeBackend,
	ToolVRAMEstimate,
)

// expectedReadOnlyCatalog is the tool set when DisableMutating=true. Sorted.
var expectedReadOnlyCatalog = sortedStrings(
	ToolGallerySearch,
	ToolGetJobStatus,
	ToolGetModelConfig,
	ToolListBackends,
	ToolListGalleries,
	ToolListInstalledModels,
	ToolListKnownBackends,
	ToolListNodes,
	ToolSystemInfo,
	ToolVRAMEstimate,
)

func sortedStrings(in ...string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

var _ = Describe("Server tool catalog", func() {
	It("registers the full catalog when mutating tools are enabled", func() {
		ctx, sess, done := connectInMemory(&fakeClient{}, Options{})
		DeferCleanup(done)

		Expect(listToolNames(ctx, sess)).To(Equal(expectedFullCatalog))
	})

	It("skips mutating tools when DisableMutating is set", func() {
		ctx, sess, done := connectInMemory(&fakeClient{}, Options{DisableMutating: true})
		DeferCleanup(done)

		Expect(listToolNames(ctx, sess)).To(Equal(expectedReadOnlyCatalog))
	})
})

var _ = Describe("Tool dispatch", func() {
	type dispatchCase struct {
		tool       string
		args       any
		wantMethod string
	}

	cases := []dispatchCase{
		{ToolGallerySearch, GallerySearchQuery{Query: "qwen"}, "GallerySearch"},
		{ToolListInstalledModels, map[string]any{"capability": "chat"}, "ListInstalledModels"},
		{ToolListGalleries, struct{}{}, "ListGalleries"},
		{ToolListBackends, struct{}{}, "ListBackends"},
		{ToolListKnownBackends, struct{}{}, "ListKnownBackends"},
		{ToolSystemInfo, struct{}{}, "SystemInfo"},
		{ToolListNodes, struct{}{}, "ListNodes"},
		{ToolInstallModel, InstallModelRequest{ModelName: "test/foo"}, "InstallModel"},
		{ToolImportModelURI, ImportModelURIRequest{URI: "Qwen/Qwen3-4B-GGUF"}, "ImportModelURI"},
		{ToolDeleteModel, map[string]any{"name": "foo"}, "DeleteModel"},
		{ToolInstallBackend, InstallBackendRequest{BackendName: "llama-cpp"}, "InstallBackend"},
		{ToolUpgradeBackend, map[string]any{"name": "llama-cpp"}, "UpgradeBackend"},
		{ToolEditModelConfig, map[string]any{"name": "foo", "patch": map[string]any{"context_size": 4096}}, "EditModelConfig"},
		{ToolReloadModels, struct{}{}, "ReloadModels"},
		{ToolToggleModelState, map[string]any{"name": "foo", "action": "enable"}, "ToggleModelState"},
		{ToolToggleModelPinned, map[string]any{"name": "foo", "action": "pin"}, "ToggleModelPinned"},
	}

	for _, c := range cases {
		c := c
		It("routes "+c.tool+" to "+c.wantMethod, func() {
			fc := &fakeClient{
				installModel:   func(InstallModelRequest) (string, error) { return "job-1", nil },
				installBackend: func(InstallBackendRequest) (string, error) { return "job-2", nil },
				upgradeBackend: func(string) (string, error) { return "job-3", nil },
			}
			ctx, sess, done := connectInMemory(fc, Options{})
			DeferCleanup(done)

			res := callTool(ctx, sess, c.tool, c.args)
			Expect(res.IsError).To(BeFalse(), "tool %s returned error: %s", c.tool, resultText(res))

			calls := fc.recorded()
			Expect(calls).ToNot(BeEmpty(), "tool %s did not call the client", c.tool)
			Expect(calls[len(calls)-1].method).To(Equal(c.wantMethod))
		})
	}
})

var _ = Describe("Tool error surfacing", func() {
	It("propagates client errors verbatim via IsError + TextContent", func() {
		fc := &fakeClient{
			gallerySearch: func(GallerySearchQuery) ([]gallery.Metadata, error) {
				return nil, errors.New("backend on fire")
			},
		}
		ctx, sess, done := connectInMemory(fc, Options{})
		DeferCleanup(done)

		res := callTool(ctx, sess, ToolGallerySearch, GallerySearchQuery{Query: "x"})
		Expect(res.IsError).To(BeTrue(), "expected IsError, got: %s", resultText(res))
		Expect(resultText(res)).To(ContainSubstring("backend on fire"))
	})
})

var _ = Describe("Argument validation", func() {
	type validationCase struct {
		desc string
		tool string
		args any
		want string
	}

	// Required-field misses go through the SDK schema validator (the
	// generated input schema marks name as required), not our handler.
	cases := []validationCase{
		{"install_model rejects empty model_name", ToolInstallModel, InstallModelRequest{}, "model_name is required"},
		{"delete_model rejects missing name (schema)", ToolDeleteModel, map[string]any{}, "missing properties"},
		{"toggle_model_state rejects unknown action", ToolToggleModelState, map[string]any{"name": "foo", "action": "noop"}, "action must be one of"},
		{"edit_model_config rejects empty patch", ToolEditModelConfig, map[string]any{"name": "foo", "patch": map[string]any{}}, "patch is required"},
	}

	for _, c := range cases {
		c := c
		It(c.desc, func() {
			ctx, sess, done := connectInMemory(&fakeClient{}, Options{})
			DeferCleanup(done)

			res := callTool(ctx, sess, c.tool, c.args)
			Expect(res.IsError).To(BeTrue(), "expected validation error; got %s", resultText(res))
			Expect(resultText(res)).To(ContainSubstring(c.want))
		})
	}
})

var _ = Describe("Concurrent tool calls", func() {
	It("handles 20 parallel CallTool requests against one session without a race", func() {
		fc := &fakeClient{}
		ctx, sess, done := connectInMemory(fc, Options{})
		DeferCleanup(done)

		var wg sync.WaitGroup
		for i := 0; i < 20; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				callTool(ctx, sess, ToolListGalleries, struct{}{})
			}()
		}
		wg.Wait()

		Expect(fc.recorded()).To(HaveLen(20))
	})
})
pkg/mcp/localaitools/tools.go (new file, 38 lines)
package localaitools

// Tool names exposed by the LocalAI Assistant MCP server. Use these
// constants — never bare strings — when registering tools, asserting the
// catalog in tests, or referencing tool names from other packages. The
// embedded skill prompts under prompts/ keep the bare strings because
// go:embed-ed markdown can't reference Go constants; TestPromptsContain
// SafetyAnchors guards that those strings stay aligned.
const (
	// Read-only tools.
	ToolGallerySearch       = "gallery_search"
	ToolListInstalledModels = "list_installed_models"
	ToolListGalleries       = "list_galleries"
	ToolGetJobStatus        = "get_job_status"
	ToolGetModelConfig      = "get_model_config"
	ToolListBackends        = "list_backends"
	ToolListKnownBackends   = "list_known_backends"
	ToolSystemInfo          = "system_info"
	ToolListNodes           = "list_nodes"
	ToolVRAMEstimate        = "vram_estimate"

	// Mutating tools — guarded by Options.DisableMutating and the
	// LLM-side safety prompt (see prompts/10_safety.md).
	ToolInstallModel      = "install_model"
	ToolImportModelURI    = "import_model_uri"
	ToolDeleteModel       = "delete_model"
	ToolEditModelConfig   = "edit_model_config"
	ToolReloadModels      = "reload_models"
	ToolInstallBackend    = "install_backend"
	ToolUpgradeBackend    = "upgrade_backend"
	ToolToggleModelState  = "toggle_model_state"
	ToolToggleModelPinned = "toggle_model_pinned"
)

// DefaultServerName is the MCP Implementation.Name surfaced when
// Options.ServerName is empty. Use the constant when you want a stable
// reference across packages (e.g. test fixtures, CLI defaults).
const DefaultServerName = "localai-admin"
pkg/mcp/localaitools/tools_backends.go (new file, 65 lines)
package localaitools

import (
	"context"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

func registerBackendTools(s *mcp.Server, client LocalAIClient, opts Options) {
	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolListBackends,
		Description: "List installed backends.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, _ struct{}) (*mcp.CallToolResult, any, error) {
		backends, err := client.ListBackends(ctx)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(backends), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolListKnownBackends,
		Description: "List backends available to install from configured backend galleries.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, _ struct{}) (*mcp.CallToolResult, any, error) {
		backends, err := client.ListKnownBackends(ctx)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(backends), nil, nil
	})

	if opts.DisableMutating {
		return
	}

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolInstallBackend,
		Description: "Install a backend from a backend gallery. Requires user confirmation per safety rule 1.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args InstallBackendRequest) (*mcp.CallToolResult, any, error) {
		if args.BackendName == "" {
			return errorResultf("backend_name is required"), nil, nil
		}
		jobID, err := client.InstallBackend(ctx, args)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(map[string]any{"job_id": jobID}), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolUpgradeBackend,
		Description: "Upgrade an installed backend by name. Requires user confirmation per safety rule 1.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
		Name string `json:"name" jsonschema:"The installed backend name."`
	}) (*mcp.CallToolResult, any, error) {
		if args.Name == "" {
			return errorResultf("name is required"), nil, nil
		}
		jobID, err := client.UpgradeBackend(ctx, args.Name)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(map[string]any{"job_id": jobID}), nil, nil
	})
}
pkg/mcp/localaitools/tools_config.go (new file, 72 lines)
package localaitools

import (
	"context"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

func registerConfigTools(s *mcp.Server, client LocalAIClient, opts Options) {
	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolGetModelConfig,
		Description: "Read the YAML/JSON config of an installed model. Use this before edit_model_config to show the user a diff.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
		Name string `json:"name" jsonschema:"The installed model name."`
	}) (*mcp.CallToolResult, any, error) {
		if args.Name == "" {
			return errorResultf("name is required"), nil, nil
		}
		cfg, err := client.GetModelConfig(ctx, args.Name)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(cfg), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolVRAMEstimate,
		Description: "Estimate VRAM usage for an installed model under a given config (context size, GPU layers, KV cache quantization).",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args VRAMEstimateRequest) (*mcp.CallToolResult, any, error) {
		if args.ModelName == "" {
			return errorResultf("model_name is required"), nil, nil
		}
		est, err := client.VRAMEstimate(ctx, args)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(est), nil, nil
	})

	if opts.DisableMutating {
		return
	}

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolEditModelConfig,
		Description: "Patch (deep-merge) JSON into an installed model's config. Requires user confirmation per safety rule 1; show a diff first.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
		Name  string         `json:"name" jsonschema:"The installed model name."`
		Patch map[string]any `json:"patch" jsonschema:"Deep-merge JSON patch — only the changed keys."`
	}) (*mcp.CallToolResult, any, error) {
		if args.Name == "" {
			return errorResultf("name is required"), nil, nil
		}
		if len(args.Patch) == 0 {
			return errorResultf("patch is required"), nil, nil
		}
		if err := client.EditModelConfig(ctx, args.Name, args.Patch); err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(map[string]any{"patched": args.Name}), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolReloadModels,
		Description: "Reload all model configs from disk so changes (e.g. from edit_model_config) take effect. Requires user confirmation per safety rule 1.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, _ struct{}) (*mcp.CallToolResult, any, error) {
		if err := client.ReloadModels(ctx); err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(map[string]any{"reloaded": true}), nil, nil
	})
}
pkg/mcp/localaitools/tools_models.go (new file, 113 lines)
|
||||
package localaitools
|
||||
|
||||
import (
|
||||
"context"
|
||||
|
||||
"github.com/modelcontextprotocol/go-sdk/mcp"
|
||||
)
|
||||
|
||||
func registerModelTools(s *mcp.Server, client LocalAIClient, opts Options) {
|
||||
mcp.AddTool(s, &mcp.Tool{
|
||||
Name: ToolGallerySearch,
|
||||
Description: "Search configured galleries for installable models. Returns name, gallery, description, license and tags. Always run this before install_model.",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, args GallerySearchQuery) (*mcp.CallToolResult, any, error) {
|
||||
hits, err := client.GallerySearch(ctx, args)
|
||||
if err != nil {
|
||||
return errorResult(err), nil, nil
|
||||
}
|
||||
return jsonResult(hits), nil, nil
|
||||
})
|
||||
|
||||
mcp.AddTool(s, &mcp.Tool{
|
||||
Name: ToolListInstalledModels,
|
||||
Description: "List models currently installed on this LocalAI. Optional capability filter (chat, completion, embeddings, image, tts, transcript, rerank, vad).",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
|
||||
Capability Capability `json:"capability,omitempty" jsonschema:"Filter to models advertising this capability. One of: chat, completion, embeddings, image, tts, transcript, rerank, vad. Empty value = no filter."`
|
||||
}) (*mcp.CallToolResult, any, error) {
|
||||
		models, err := client.ListInstalledModels(ctx, args.Capability)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(models), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolListGalleries,
		Description: "List configured model galleries.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, _ struct{}) (*mcp.CallToolResult, any, error) {
		galleries, err := client.ListGalleries(ctx)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(galleries), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolGetJobStatus,
		Description: "Poll the status of an install/delete/upgrade job by id. Returns processed, progress, message, and error fields.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
		JobID string `json:"job_id" jsonschema:"The job id returned by install_model / install_backend / upgrade_backend / delete_model."`
	}) (*mcp.CallToolResult, any, error) {
		if args.JobID == "" {
			return errorResultf("job_id is required"), nil, nil
		}
		status, err := client.GetJobStatus(ctx, args.JobID)
		if err != nil {
			return errorResult(err), nil, nil
		}
		if status == nil {
			return errorResultf("no job with id %q", args.JobID), nil, nil
		}
		return jsonResult(status), nil, nil
	})

	if opts.DisableMutating {
		return
	}

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolInstallModel,
		Description: "Install a model from a gallery. Requires explicit user confirmation per safety rule 1. Returns a job id; poll with get_job_status.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args InstallModelRequest) (*mcp.CallToolResult, any, error) {
		// Empty-string check at the tool layer: the SDK schema validator
		// only enforces presence, not non-empty, and we want a consistent
		// error regardless of which LocalAIClient backs the tool.
		if args.ModelName == "" {
			return errorResultf("model_name is required"), nil, nil
		}
		jobID, err := client.InstallModel(ctx, args)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(map[string]any{"job_id": jobID}), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolImportModelURI,
		Description: "Import a model from a URI (HuggingFace link, OCI image, file path, or HTTP URL). The importer auto-detects the backend; when multiple backends could handle the source, the response sets ambiguous_backend=true and lists candidates. Surface them to the user, then call again with backend_preference set. Requires user confirmation per safety rule 1.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args ImportModelURIRequest) (*mcp.CallToolResult, any, error) {
		if args.URI == "" {
			return errorResultf("uri is required"), nil, nil
		}
		resp, err := client.ImportModelURI(ctx, args)
		if err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(resp), nil, nil
	})

	mcp.AddTool(s, &mcp.Tool{
		Name:        ToolDeleteModel,
		Description: "Delete an installed model by name. Requires explicit user confirmation per safety rule 1.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
		Name string `json:"name" jsonschema:"The installed model name."`
	}) (*mcp.CallToolResult, any, error) {
		if args.Name == "" {
			return errorResultf("name is required"), nil, nil
		}
		if err := client.DeleteModel(ctx, args.Name); err != nil {
			return errorResult(err), nil, nil
		}
		return jsonResult(map[string]any{"deleted": args.Name}), nil, nil
	})
}
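The install path is deliberately asynchronous: install_model hands back only a job id, and progress is observed by polling get_job_status. A minimal caller-side sketch of that contract, assuming a hypothetical `callTool` helper that invokes a tool and decodes the jsonResult payload into a map (the model name and the bool/string field types below are illustrative, not confirmed by this diff):

```go
// Hypothetical polling loop against the install_model / get_job_status
// contract above. callTool is a stand-in for the host's MCP client plumbing;
// the status keys (processed, progress, message, error) follow the
// get_job_status description, but their exact types are assumed here.
out, err := callTool(ctx, ToolInstallModel, map[string]any{
	"model_name": "example-model", // illustrative model name
})
if err != nil {
	return err
}
jobID, _ := out["job_id"].(string)
for {
	status, err := callTool(ctx, ToolGetJobStatus, map[string]any{"job_id": jobID})
	if err != nil {
		return err
	}
	if msg, ok := status["error"].(string); ok && msg != "" {
		return fmt.Errorf("job %s failed: %s", jobID, msg)
	}
	if done, _ := status["processed"].(bool); done {
		return nil
	}
	time.Sleep(time.Second) // back off between polls
}
```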
pkg/mcp/localaitools/tools_state.go (new file, 58 lines)
@@ -0,0 +1,58 @@
|
||||
package localaitools
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
|
||||
"github.com/modelcontextprotocol/go-sdk/mcp"
|
||||
|
||||
"github.com/mudler/LocalAI/core/services/modeladmin"
|
||||
)
|
||||
|
||||
func registerStateTools(s *mcp.Server, client LocalAIClient, opts Options) {
|
||||
if opts.DisableMutating {
|
||||
return
|
||||
}
|
||||
|
||||
mcp.AddTool(s, &mcp.Tool{
|
||||
Name: ToolToggleModelState,
|
||||
Description: "Enable or disable an installed model. action must be 'enable' or 'disable'. Requires user confirmation per safety rule 1.",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
|
||||
Name string `json:"name" jsonschema:"The installed model name."`
|
||||
Action modeladmin.Action `json:"action" jsonschema:"Either 'enable' or 'disable'."`
|
||||
}) (*mcp.CallToolResult, any, error) {
|
||||
if err := requireToggleArgs(args.Name, args.Action, modeladmin.ActionEnable, modeladmin.ActionDisable); err != nil {
|
||||
return errorResult(err), nil, nil
|
||||
}
|
||||
if err := client.ToggleModelState(ctx, args.Name, args.Action); err != nil {
|
||||
return errorResult(err), nil, nil
|
||||
}
|
||||
return jsonResult(map[string]any{"name": args.Name, "action": args.Action}), nil, nil
|
||||
})
|
||||
|
||||
mcp.AddTool(s, &mcp.Tool{
|
||||
Name: ToolToggleModelPinned,
|
||||
Description: "Pin or unpin an installed model. action must be 'pin' or 'unpin'. Requires user confirmation per safety rule 1.",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, args struct {
|
||||
Name string `json:"name" jsonschema:"The installed model name."`
|
||||
Action modeladmin.Action `json:"action" jsonschema:"Either 'pin' or 'unpin'."`
|
||||
}) (*mcp.CallToolResult, any, error) {
|
||||
if err := requireToggleArgs(args.Name, args.Action, modeladmin.ActionPin, modeladmin.ActionUnpin); err != nil {
|
||||
return errorResult(err), nil, nil
|
||||
}
|
||||
if err := client.ToggleModelPinned(ctx, args.Name, args.Action); err != nil {
|
||||
return errorResult(err), nil, nil
|
||||
}
|
||||
return jsonResult(map[string]any{"name": args.Name, "action": args.Action}), nil, nil
|
||||
})
|
||||
}
|
||||
|
||||
func requireToggleArgs(name string, action modeladmin.Action, allowed ...modeladmin.Action) error {
|
||||
if name == "" {
|
||||
return fmt.Errorf("name is required")
|
||||
}
|
||||
if !action.Valid(allowed...) {
|
||||
return fmt.Errorf("action must be one of %v, got %q", allowed, action)
|
||||
}
|
||||
return nil
|
||||
}
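This file leans on the typed verb set from core/services/modeladmin without showing its definition. For orientation, here is a plausible minimal shape consistent with the Valid(allowed...) call above; a sketch only, since the real package may carry more verbs or different validation:

```go
// Sketch of the assumed modeladmin.Action shape; not the actual source.
// The string values are taken from the tool descriptions above.
package modeladmin

// Action is a typed toggle/pin verb, replacing the bare strings the
// handlers and tools previously passed around.
type Action string

const (
	ActionEnable  Action = "enable"
	ActionDisable Action = "disable"
	ActionPin     Action = "pin"
	ActionUnpin   Action = "unpin"
)

// Valid reports whether a is one of the allowed verbs for a given tool.
func (a Action) Valid(allowed ...Action) bool {
	for _, v := range allowed {
		if a == v {
			return true
		}
	}
	return false
}
```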
pkg/mcp/localaitools/tools_system.go (new file, 31 lines)
@@ -0,0 +1,31 @@
|
||||
package localaitools
|
||||
|
||||
import (
|
||||
"context"
|
||||
|
||||
"github.com/modelcontextprotocol/go-sdk/mcp"
|
||||
)
|
||||
|
||||
func registerSystemTools(s *mcp.Server, client LocalAIClient, _ Options) {
|
||||
mcp.AddTool(s, &mcp.Tool{
|
||||
Name: ToolSystemInfo,
|
||||
Description: "Report LocalAI version, paths, distributed flag, currently loaded models, and installed backends.",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, _ struct{}) (*mcp.CallToolResult, any, error) {
|
||||
info, err := client.SystemInfo(ctx)
|
||||
if err != nil {
|
||||
return errorResult(err), nil, nil
|
||||
}
|
||||
return jsonResult(info), nil, nil
|
||||
})
|
||||
|
||||
mcp.AddTool(s, &mcp.Tool{
|
||||
Name: ToolListNodes,
|
||||
Description: "List federated worker nodes (only meaningful in distributed mode; returns an empty list otherwise).",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, _ struct{}) (*mcp.CallToolResult, any, error) {
|
||||
nodes, err := client.ListNodes(ctx)
|
||||
if err != nil {
|
||||
return errorResult(err), nil, nil
|
||||
}
|
||||
return jsonResult(nodes), nil, nil
|
||||
})
|
||||
}
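Note how the mutating registrars (registerStateTools above, and the gallery install/import/delete block earlier) all gate on the same Options.DisableMutating flag at the top, while the system tools ignore Options entirely. One switch therefore yields a read-only catalog, e.g. for a host that should observe but not administer; a sketch, assuming NewServer forwards its Options unchanged to each register* function:

```go
// Read-only server: query and system tools register as usual, while
// install/import/delete/toggle tools are skipped by their early returns.
// Assumes NewServer(client, opts) fans opts out to the register* funcs.
srv := NewServer(client, Options{DisableMutating: true})
```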