feat(ui): MCP Apps, mcp streaming and client-side support (#8947)

* Revert "fix: Add timeout-based wait for model deletion completion (#8756)"

This reverts commit 9e1b0d0c82.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: add mcp prompts and resources

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add client-side MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): allow to authenticate MCP servers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add MCP Apps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: update AGENTS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: allow to collapse navbar, save state in storage

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add MCP button also to home page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(chat): populate string content

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in:
Ettore Di Giacinto
2026-03-11 07:30:49 +01:00
committed by GitHub
parent 79f90de935
commit 8818452d85
44 changed files with 6526 additions and 2479 deletions

.agents/adding-backends.md (new file)

@@ -0,0 +1,143 @@
# Adding a New Backend
When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:
## 1. Create Backend Directory Structure
Create the backend directory under the appropriate location:
- **Python backends**: `backend/python/<backend-name>/`
- **Go backends**: `backend/go/<backend-name>/`
- **C++ backends**: `backend/cpp/<backend-name>/`
For Python backends, you'll typically need:
- `backend.py` - Main gRPC server implementation
- `Makefile` - Build configuration
- `install.sh` - Installation script for dependencies
- `protogen.sh` - Protocol buffer generation script
- `requirements.txt` - Python dependencies
- `run.sh` - Runtime script
- `test.py` / `test.sh` - Test files
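As a minimal sketch, the file set above can be scaffolded like this. The backend name `mybackend` is illustrative, not a real backend:

```shell
# Sketch: scaffold the expected file set for a hypothetical Python backend
# named "mybackend" (illustrative name only).
backend=mybackend
dir="backend/python/$backend"
mkdir -p "$dir"
for f in backend.py Makefile install.sh protogen.sh requirements.txt run.sh test.py test.sh; do
  touch "$dir/$f"
done
ls "$dir"
```

The files are created empty here; in practice, copy a similar backend (e.g. `moonshine`) and adapt each file.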
## 2. Add Build Configurations to `.github/workflows/backend.yml`
Add build matrix entries for each platform/GPU type you want to support. Look at similar backends (e.g., `chatterbox`, `faster-whisper`) for reference.
**Placement in file:**
- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)
**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:6.4.4"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
## 3. Add Backend Metadata to `backend/index.yaml`
**Step 3a: Add Meta Definition**
Add a YAML anchor definition in the `## metas` section (around lines 2-300). Look for similar backends to use as a template, such as `diffusers` or `chatterbox`.
**Step 3b: Add Image Entries**
Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
## 4. Update the Makefile
The Makefile needs to be updated in several places to support building and testing the new backend:
**Step 4a: Add to `.NOTPARALLEL`**
Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:
```makefile
.NOTPARALLEL: ... backends/<backend-name>
```
**Step 4b: Add to `prepare-test-extra`**
Add the backend to the `prepare-test-extra` target (around line 312) to prepare it for testing:
```makefile
prepare-test-extra: protogen-python
...
$(MAKE) -C backend/python/<backend-name>
```
**Step 4c: Add to `test-extra`**
Add the backend to the `test-extra` target (around line 319) to run its tests:
```makefile
test-extra: prepare-test-extra
...
$(MAKE) -C backend/python/<backend-name> test
```
**Step 4d: Add Backend Definition**
Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:
**For Python backends with root context** (like `faster-whisper`, `coqui`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
```
**For Python backends with `./backend` context** (like `chatterbox`, `moonshine`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
```
**For Go backends**:
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
```
**Step 4e: Generate Docker Build Target**
Add an eval call to generate the docker-build target (around line 480-501):
```makefile
$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
```
**Step 4f: Add to `docker-build-backends`**
Add `docker-build-<backend-name>` to the `docker-build-backends` target (around line 507):
```makefile
docker-build-backends: ... docker-build-<backend-name>
```
**Determining the Context:**
- If the backend is in `backend/python/<backend-name>/` and uses `./backend` as context in the workflow file, use `./backend` context
- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
- Check similar backends to determine the correct context
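One quick way to check a similar backend's context is to grep the workflow file. The YAML below is a hypothetical excerpt standing in for `.github/workflows/backend.yml` (the real key names may differ); in practice run the same grep against the real file:

```shell
# Sketch: look up the build context a similar backend uses in the workflow.
# The excerpt is hypothetical sample data, not the real workflow file.
cat > /tmp/backend-excerpt.yml <<'EOF'
      - backend: "chatterbox"
        context: "./backend"
      - backend: "coqui"
        context: "."
EOF
grep -A1 'backend: "chatterbox"' /tmp/backend-excerpt.yml | grep context
```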
## 5. Verification Checklist
After adding a new backend, verify:
- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/workflows/backend.yml` for all desired platforms
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between workflow file and index.yaml
- [ ] Makefile updated with all 6 required changes (`.NOTPARALLEL`, `prepare-test-extra`, `test-extra`, backend definition, docker-build target eval, `docker-build-backends`)
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
## 6. Example: Adding a Python Backend
For reference, when `moonshine` was added:
- **Files created**: `backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}`
- **Workflow entries**: 3 build configurations (CPU, CUDA 12, CUDA 13)
- **Index entries**: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 × latest/development)
- **Makefile updates**:
- Added to `.NOTPARALLEL` line
- Added to `prepare-test-extra` and `test-extra` targets
- Added `BACKEND_MOONSHINE = moonshine|python|./backend|false|true`
- Added eval for docker-build target generation
- Added `docker-build-moonshine` to `docker-build-backends`

.agents/building-and-testing.md (new file)

@@ -0,0 +1,16 @@
# Build and Testing
Building and testing the project depends on the components involved and the platform where development is taking place. Due to the amount of context required, it's usually best not to try building or testing the project unless the user requests it. If you must build the project, inspect the Makefile in the project root and the Makefiles of any backends that are affected by the changes you are making. In addition, the workflows in .github/workflows can be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker; if the user has not previously specified a preference, ask which they would like to use.
## Building a specified backend
Let's say the user wants to build a particular backend for a given platform. For example, let's say they want to build `coqui` for ROCm/hipblas:
- The Makefile has targets like `docker-build-coqui` created with `generate-docker-build-target` at the time of writing. Recently added backends may require a new target.
- At a minimum we need to set the `BUILD_TYPE` and `BASE_IMAGE` build args
- Use .github/workflows/backend.yml as a reference; it lists the needed args in the `include` job strategy matrix
- l4t and cublas also require the CUDA major and minor version
- You can print a ready-to-run command like `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:6.4.4 make docker-build-coqui`
- Unless the user specifies that they want you to run the command, just print it: not all agent frontends handle long-running jobs well, and the output may overflow your context
- The user may say they want to build AMD or ROCm instead of hipblas, Intel instead of SYCL, or NVIDIA instead of l4t or cublas. Ask for confirmation if there is ambiguity.
- Sometimes the user may need extra parameters added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.
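The advice above (compose the command, print it, don't run it) can be sketched as a small shell snippet. The values mirror the hipblas/coqui example:

```shell
# Sketch: compose (but do not run) the docker-build command for a backend.
# Values mirror the ROCm/hipblas example above.
backend=coqui
build_type=hipblas
base_image="rocm/dev-ubuntu-24.04:6.4.4"
# \$ keeps $(nproc ...) literal so it expands when the user runs the command
cmd="DOCKER_MAKEFLAGS=-j\$(nproc --ignore=1) BUILD_TYPE=$build_type BASE_IMAGE=$base_image make docker-build-$backend"
echo "$cmd"
```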

.agents/coding-style.md (new file)

@@ -0,0 +1,51 @@
# Coding Style
The project has the following .editorconfig:
```
root = true
[*]
indent_style = space
indent_size = 2
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[*.go]
indent_style = tab
[Makefile]
indent_style = tab
[*.proto]
indent_size = 2
[*.py]
indent_size = 4
[*.js]
indent_size = 2
[*.yaml]
indent_size = 2
[*.md]
trim_trailing_whitespace = false
```
- Use comments sparingly to explain why code does something, not what it does. Comments are there to add context that would be difficult to deduce from reading the code.
- Prefer modern Go e.g. use `any` not `interface{}`
## Logging
Use `github.com/mudler/xlog` for logging which has the same API as slog.
## Documentation
The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.

.agents/llama-cpp-backend.md (new file)

@@ -0,0 +1,77 @@
# llama.cpp Backend
The llama.cpp backend (`backend/cpp/llama-cpp/grpc-server.cpp`) is a gRPC adaptation of the upstream HTTP server (`llama.cpp/tools/server/server.cpp`). It uses the same underlying server infrastructure from `llama.cpp/tools/server/server-context.cpp`.
## Building and Testing
- Test llama.cpp backend compilation: `make backends/llama-cpp`
- The backend is built as part of the main build process
- Check `backend/cpp/llama-cpp/Makefile` for build configuration
## Architecture
- **grpc-server.cpp**: gRPC server implementation, adapts HTTP server patterns to gRPC
- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
- The gRPC server mirrors the HTTP server's functionality but uses gRPC instead of HTTP
## Common Issues When Updating llama.cpp
When fixing compilation errors after upstream changes:
1. Check how `server.cpp` (HTTP server) handles the same change
2. Look for new public APIs or getter methods
3. Store copies of needed data instead of accessing private members
4. Update function calls to match new signatures
5. Test with `make backends/llama-cpp`
## Key Differences from HTTP Server
- gRPC uses `BackendServiceImpl` class with gRPC service methods
- HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
## Tool Call Parsing Maintenance
When working on JSON/XML tool call parsing functionality, always check llama.cpp for reference implementation and updates:
### Checking for XML Parsing Changes
1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
- `COMMON_CHAT_FORMAT_GLM_4_5`
- `COMMON_CHAT_FORMAT_MINIMAX_M2`
- `COMMON_CHAT_FORMAT_KIMI_K2`
- `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
- `COMMON_CHAT_FORMAT_APRIEL_1_5`
- `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
- Any new formats added
### Model Configuration Options
Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:
1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
4. **Examples of options to check**:
- `ctx_shift` - Context shifting support
- `parallel_tool_calls` - Parallel tool calling
- `reasoning_format` - Reasoning format options
- Any new flags or parameters
### Implementation Guidelines
1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
3. **Documentation**: Update relevant documentation when adding new formats or options
4. **Backward Compatibility**: Ensure changes don't break existing functionality
### Files to Monitor
- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options

.agents/testing-mcp-apps.md (new file)

@@ -0,0 +1,120 @@
# Testing MCP Apps (Interactive Tool UIs)
MCP Apps is an extension to MCP where tools declare interactive HTML UIs via `_meta.ui.resourceUri`. When the LLM calls such a tool, the host renders the app in a sandboxed iframe inline in the chat. The app communicates bidirectionally with the host via `postMessage` (JSON-RPC) and can call server tools, send messages, and update model context.
Spec: https://modelcontextprotocol.io/extensions/apps/overview
## Quick Start: Run a Test MCP App Server
The `@modelcontextprotocol/server-basic-react` npm package is a ready-to-use test server that exposes a `get-time` tool with an interactive React clock UI. It requires Node >= 20, so run it in Docker:
```bash
docker run -d --name mcp-app-test -p 3001:3001 node:22-slim \
sh -c 'npx -y @modelcontextprotocol/server-basic-react'
```
Wait ~10 seconds for it to start, then verify:
```bash
# Check it's running
docker logs mcp-app-test
# Expected: "MCP server listening on http://localhost:3001/mcp"
# Verify MCP protocol works
curl -s -X POST http://localhost:3001/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'
# List tools — should show get-time with _meta.ui.resourceUri
curl -s -X POST http://localhost:3001/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
```
The `tools/list` response should contain:
```json
{
"name": "get-time",
"_meta": {
"ui": { "resourceUri": "ui://get-time/mcp-app.html" }
}
}
```
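To spot App-capable tools in a saved `tools/list` response, a `jq` one-liner works; this assumes `jq` is installed, and the sample JSON below just mirrors the shape shown above:

```shell
# Sketch: list tools that declare an App UI (non-null _meta.ui.resourceUri).
# Sample data mirrors the response shape above; jq is assumed available.
cat > /tmp/tools.json <<'EOF'
{"result":{"tools":[
  {"name":"get-time","_meta":{"ui":{"resourceUri":"ui://get-time/mcp-app.html"}}},
  {"name":"plain-tool"}
]}}
EOF
jq -r '.result.tools[] | select(._meta.ui.resourceUri != null) | .name' /tmp/tools.json
```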
## Testing in LocalAI's UI
1. Make sure LocalAI is running (e.g. `http://localhost:8080`)
2. Build the React UI: `cd core/http/react-ui && npm install && npm run build`
3. Open the Chat page in your browser
4. Click **"Client MCP"** in the chat header
5. Add a new client MCP server:
- **URL**: `http://localhost:3001/mcp`
- **Use CORS proxy**: enabled (default) — required because the browser can't hit `localhost:3001` directly due to CORS; LocalAI's proxy at `/api/cors-proxy` handles it
6. The server should connect and discover the `get-time` tool
7. Select a model and send: **"What time is it?"**
8. The LLM should call the `get-time` tool
9. The tool result should render the interactive React clock app in an iframe as a standalone chat message (not inside the collapsed activity group)
## What to Verify
- [ ] Tool appears in the connected tools list (not filtered — `get-time` is callable by the LLM)
- [ ] The iframe renders as a standalone chat message with a puzzle-piece icon
- [ ] The app loads and is interactive (clock UI, buttons work)
- [ ] No "Reconnect to MCP server" overlay (connection is live)
- [ ] Console logs show bidirectional communication:
- `tools/call` messages from app to host (app calling server tools)
- `ui/message` notifications (app sending messages)
- [ ] After the app renders, the LLM continues and produces a text response with the time
- [ ] Non-UI tools continue to work normally (text-only results)
- [ ] Page reload shows the HTML statically with a reconnect overlay until you reconnect
## Console Log Patterns
Healthy bidirectional communication looks like:
```
Parsed message { jsonrpc: "2.0", id: N, result: {...} } // Bridge init
get-time result: { content: [...] } // Tool result received
Calling get-time tool... // App calls tool
Sending message { method: "tools/call", ... } // App -> host -> server
Parsed message { jsonrpc: "2.0", id: N, result: {...} } // Server response
Sending message text to Host: ... // App sends message
Sending message { method: "ui/message", ... } // Message notification
Message accepted // Host acknowledged
```
Benign warnings to ignore:
- `Source map error: ... about:srcdoc` — browser devtools can't find source maps for srcdoc iframes
- `Ignoring message from unknown source` — duplicate postMessage from iframe navigation
- `notifications/cancelled` — app cleaning up previous requests
## Architecture Notes
- **No server-side changes needed** — the MCP App protocol runs entirely in the browser
- `PostMessageTransport` wraps `window.postMessage` between host and `srcdoc` iframe
- `AppBridge` (from `@modelcontextprotocol/ext-apps`) auto-forwards `tools/call`, `resources/read`, `resources/list` from the app to the MCP server via the host's `Client`
- The iframe uses `sandbox="allow-scripts allow-forms"` (no `allow-same-origin`) — opaque origin, no access to host cookies/DOM/localStorage
- App-only tools (`_meta.ui.visibility: "app-only"`) are filtered from the LLM's tool list but remain callable by the app iframe
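The app-only filtering rule above can be sketched with `jq` (assumed installed); the tool list below is hypothetical sample data, not real server output:

```shell
# Sketch: drop app-only tools from the list advertised to the LLM,
# per the _meta.ui.visibility rule above. Sample data is hypothetical.
cat > /tmp/mcp-tools.json <<'EOF'
{"tools":[
  {"name":"get-time","_meta":{"ui":{"resourceUri":"ui://get-time/mcp-app.html"}}},
  {"name":"set-theme","_meta":{"ui":{"visibility":"app-only"}}}
]}
EOF
jq -r '.tools[] | select(._meta.ui.visibility != "app-only") | .name' /tmp/mcp-tools.json
```

Tools without any `_meta.ui.visibility` pass the filter (the field reads as `null`), which matches the default of exposing normal tools to the LLM.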
## Key Files
- `core/http/react-ui/src/components/MCPAppFrame.jsx` — iframe + AppBridge component
- `core/http/react-ui/src/hooks/useMCPClient.js` — MCP client hook with app UI helpers (`hasAppUI`, `getAppResource`, `getClientForTool`, `getToolDefinition`)
- `core/http/react-ui/src/hooks/useChat.js` — agentic loop, attaches `appUI` to tool_result messages
- `core/http/react-ui/src/pages/Chat.jsx` — renders MCPAppFrame as standalone chat messages
## Other Test Servers
The `@modelcontextprotocol/ext-apps` repo has many example servers:
- `@modelcontextprotocol/server-basic-react` — simple clock (React)
- More examples at https://github.com/modelcontextprotocol/ext-apps/tree/main/examples
All examples support both stdio and HTTP transport. Run without `--stdio` for HTTP mode on port 3001.
## Cleanup
```bash
docker rm -f mcp-app-test
```

AGENTS.md (modified)

@@ -1,290 +1,22 @@
# LocalAI Agent Instructions
This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
## Topics
| File | When to read |
|------|-------------|
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
## Quick Reference
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: Providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.
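For instance, a feature page might open with a minimal model configuration block like the following. The field names follow the usual LocalAI model YAML shape; treat the exact values as placeholders and verify keys against the config structs before documenting:

```yaml
# Minimal example for a docs page: registers a GGUF model with a backend.
# Values are placeholders — adapt them to the feature being documented.
name: my-model
backend: llama-cpp
parameters:
  model: my-model-file.gguf
context_size: 4096
```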
- **Logging**: Use `github.com/mudler/xlog` (same API as slog)
- **Go style**: Prefer `any` over `interface{}`
- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI


@@ -235,7 +235,7 @@ local-ai run oci://localai/phi-2:latest
For more information, see [💻 Getting started](https://localai.io/basics/getting_started/index.html), if you are interested in our roadmap items and future enhancements, you can see the [Issues labeled as Roadmap here](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
## 📰 Latest project news
- March 2026: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801)
- March 2026: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801), [MCP Apps, MCP Client-side](https://github.com/mudler/LocalAI/pull/8947)
- February 2026: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
- January 2026: **LocalAI 3.10.0** - Major release with Anthropic API support, Open Responses API for stateful agents, video & image generation suite (LTX-2), unified GPU backends, tool streaming & XML parsing, system-aware backend gallery, crash fixes for AVX-only CPUs and AMD VRAM reporting, request tracing, and new backends: **Moonshine** (ultra-fast transcription), **Pocket-TTS** (lightweight TTS). Vulkan arm64 builds now available. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0).
- December 2025: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic fitting of models to multiple GPUS(llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Added Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)


@@ -5,6 +5,7 @@ import (
"sync"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/services"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
@@ -29,9 +30,16 @@ type Application struct {
}
func newApplication(appConfig *config.ApplicationConfig) *Application {
ml := model.NewModelLoader(appConfig.SystemState)
// Close MCP sessions when a model is unloaded (watchdog eviction, manual shutdown, etc.)
ml.OnModelUnload(func(modelName string) {
mcpTools.CloseMCPSessions(modelName)
})
return &Application{
backendLoader: config.NewModelConfigLoader(appConfig.SystemState.Model.ModelsPath),
modelLoader: model.NewModelLoader(appConfig.SystemState),
modelLoader: ml,
applicationConfig: appConfig,
templatesEvaluator: templates.NewEvaluator(appConfig.SystemState.Model.ModelsPath),
}


@@ -3,11 +3,13 @@ package anthropic
import (
"encoding/json"
"fmt"
"strings"
"github.com/google/uuid"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/templates"
@@ -48,6 +50,92 @@ func MessagesEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evalu
// Convert Anthropic tools to internal Functions format
funcs, shouldUseFn := convertAnthropicTools(input, cfg)
// MCP injection: prompts, resources, and tools
var mcpToolInfos []mcpTools.MCPToolInfo
mcpServers := mcpTools.MCPServersFromMetadata(input.Metadata)
mcpPromptName, mcpPromptArgs := mcpTools.MCPPromptFromMetadata(input.Metadata)
mcpResourceURIs := mcpTools.MCPResourcesFromMetadata(input.Metadata)
if (len(mcpServers) > 0 || mcpPromptName != "" || len(mcpResourceURIs) > 0) && (cfg.MCP.Servers != "" || cfg.MCP.Stdio != "") {
remote, stdio, mcpErr := cfg.MCP.MCPConfigFromYAML()
if mcpErr == nil {
namedSessions, sessErr := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, mcpServers)
if sessErr == nil && len(namedSessions) > 0 {
// Prompt injection
if mcpPromptName != "" {
prompts, discErr := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if discErr == nil {
promptMsgs, getErr := mcpTools.GetMCPPrompt(c.Request().Context(), prompts, mcpPromptName, mcpPromptArgs)
if getErr == nil {
var injected []schema.Message
for _, pm := range promptMsgs {
injected = append(injected, schema.Message{
Role: string(pm.Role),
Content: mcpTools.PromptMessageToText(pm),
})
}
openAIMessages = append(injected, openAIMessages...)
xlog.Debug("Anthropic MCP prompt injected", "prompt", mcpPromptName, "messages", len(injected))
} else {
xlog.Error("Failed to get MCP prompt", "error", getErr)
}
}
}
// Resource injection
if len(mcpResourceURIs) > 0 {
resources, discErr := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if discErr == nil {
var resourceTexts []string
for _, uri := range mcpResourceURIs {
content, readErr := mcpTools.ReadMCPResource(c.Request().Context(), resources, uri)
if readErr != nil {
xlog.Error("Failed to read MCP resource", "error", readErr, "uri", uri)
continue
}
name := uri
for _, r := range resources {
if r.URI == uri {
name = r.Name
break
}
}
resourceTexts = append(resourceTexts, fmt.Sprintf("--- MCP Resource: %s ---\n%s", name, content))
}
if len(resourceTexts) > 0 && len(openAIMessages) > 0 {
lastIdx := len(openAIMessages) - 1
suffix := "\n\n" + strings.Join(resourceTexts, "\n\n")
switch ct := openAIMessages[lastIdx].Content.(type) {
case string:
openAIMessages[lastIdx].Content = ct + suffix
default:
openAIMessages[lastIdx].Content = fmt.Sprintf("%v%s", ct, suffix)
}
xlog.Debug("Anthropic MCP resources injected", "count", len(resourceTexts))
}
}
}
// Tool injection
if len(mcpServers) > 0 {
discovered, discErr := mcpTools.DiscoverMCPTools(c.Request().Context(), namedSessions)
if discErr == nil {
mcpToolInfos = discovered
for _, ti := range mcpToolInfos {
funcs = append(funcs, ti.Function)
}
shouldUseFn = len(funcs) > 0 && cfg.ShouldUseFunctions()
xlog.Debug("Anthropic MCP tools injected", "count", len(mcpToolInfos), "total_funcs", len(funcs))
} else {
xlog.Error("Failed to discover MCP tools", "error", discErr)
}
}
}
} else {
xlog.Error("Failed to parse MCP config", "error", mcpErr)
}
}
// Create an OpenAI-compatible request for internal processing
openAIReq := &schema.OpenAIRequest{
PredictionOptions: schema.PredictionOptions{
@@ -88,138 +176,200 @@ func MessagesEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evalu
xlog.Debug("Anthropic Messages - Prompt (after templating)", "prompt", predInput)
if input.Stream {
return handleAnthropicStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn)
return handleAnthropicStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn, mcpToolInfos, evaluator)
}
return handleAnthropicNonStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn)
return handleAnthropicNonStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn, mcpToolInfos, evaluator)
}
}
func handleAnthropicNonStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool) error {
images := []string{}
for _, m := range openAIReq.Messages {
images = append(images, m.StringImages...)
func handleAnthropicNonStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool, mcpToolInfos []mcpTools.MCPToolInfo, evaluator *templates.Evaluator) error {
mcpMaxIterations := 10
if cfg.Agent.MaxIterations > 0 {
mcpMaxIterations = cfg.Agent.MaxIterations
}
hasMCPTools := len(mcpToolInfos) > 0
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
for mcpIteration := 0; mcpIteration <= mcpMaxIterations; mcpIteration++ {
// Re-template on each MCP iteration since messages may have changed
if mcpIteration > 0 {
predInput = evaluator.TemplateMessages(*openAIReq, openAIReq.Messages, cfg, funcs, shouldUseFn)
xlog.Debug("Anthropic MCP re-templating", "iteration", mcpIteration, "prompt_len", len(predInput))
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
}
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIReq.Messages, images, nil, nil, ml, cfg, cl, appConfig, nil, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
images := []string{}
for _, m := range openAIReq.Messages {
images = append(images, m.StringImages...)
}
const maxEmptyRetries = 5
var prediction backend.LLMResponse
var result string
for attempt := 0; attempt <= maxEmptyRetries; attempt++ {
prediction, err = predFunc()
if err != nil {
xlog.Error("Anthropic prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
result = backend.Finetune(*cfg, predInput, prediction.Response)
if result != "" || !shouldUseFn {
break
}
xlog.Warn("Anthropic: retrying prediction due to empty backend response", "attempt", attempt+1, "maxRetries", maxEmptyRetries)
}
// Try pre-parsed tool calls from C++ autoparser first, fall back to text parsing
var toolCalls []functions.FuncCallResults
if deltaToolCalls := functions.ToolCallsFromChatDeltas(prediction.ChatDeltas); len(deltaToolCalls) > 0 {
xlog.Debug("[ChatDeltas] Anthropic: using pre-parsed tool calls", "count", len(deltaToolCalls))
toolCalls = deltaToolCalls
} else {
xlog.Debug("[ChatDeltas] Anthropic: no pre-parsed tool calls, falling back to Go-side text parsing")
toolCalls = functions.ParseFunctionCall(result, cfg.FunctionsConfig)
}
var contentBlocks []schema.AnthropicContentBlock
var stopReason string
if shouldUseFn && len(toolCalls) > 0 {
// Model wants to use tools
stopReason = "tool_use"
for _, tc := range toolCalls {
// Parse arguments as JSON
var inputArgs map[string]interface{}
if err := json.Unmarshal([]byte(tc.Arguments), &inputArgs); err != nil {
xlog.Warn("Failed to parse tool call arguments as JSON", "error", err, "args", tc.Arguments)
inputArgs = map[string]interface{}{"raw": tc.Arguments}
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
contentBlocks = append(contentBlocks, schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, len(contentBlocks)),
Name: tc.Name,
Input: inputArgs,
})
}
// Add any text content before the tool calls
textContent := functions.ParseTextContent(result, cfg.FunctionsConfig)
if textContent != "" {
// Prepend text block
contentBlocks = append([]schema.AnthropicContentBlock{{Type: "text", Text: textContent}}, contentBlocks...)
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
} else {
// Normal text response
stopReason = "end_turn"
contentBlocks = []schema.AnthropicContentBlock{
{Type: "text", Text: result},
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIReq.Messages, images, nil, nil, ml, cfg, cl, appConfig, nil, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
}
resp := &schema.AnthropicResponse{
ID: fmt.Sprintf("msg_%s", id),
Type: "message",
Role: "assistant",
Model: input.Model,
StopReason: &stopReason,
Content: contentBlocks,
Usage: schema.AnthropicUsage{
InputTokens: prediction.Usage.Prompt,
OutputTokens: prediction.Usage.Completion,
},
}
const maxEmptyRetries = 5
var prediction backend.LLMResponse
var result string
for attempt := 0; attempt <= maxEmptyRetries; attempt++ {
prediction, err = predFunc()
if err != nil {
xlog.Error("Anthropic prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
result = backend.Finetune(*cfg, predInput, prediction.Response)
if result != "" || !shouldUseFn {
break
}
xlog.Warn("Anthropic: retrying prediction due to empty backend response", "attempt", attempt+1, "maxRetries", maxEmptyRetries)
}
if respData, err := json.Marshal(resp); err == nil {
xlog.Debug("Anthropic Response", "response", string(respData))
}
// Try pre-parsed tool calls from C++ autoparser first, fall back to text parsing
var toolCalls []functions.FuncCallResults
if deltaToolCalls := functions.ToolCallsFromChatDeltas(prediction.ChatDeltas); len(deltaToolCalls) > 0 {
xlog.Debug("[ChatDeltas] Anthropic: using pre-parsed tool calls", "count", len(deltaToolCalls))
toolCalls = deltaToolCalls
} else {
xlog.Debug("[ChatDeltas] Anthropic: no pre-parsed tool calls, falling back to Go-side text parsing")
toolCalls = functions.ParseFunctionCall(result, cfg.FunctionsConfig)
}
return c.JSON(200, resp)
// MCP server-side tool execution: if any tool calls are MCP tools, execute and loop
if hasMCPTools && shouldUseFn && len(toolCalls) > 0 {
var hasMCPCalls bool
for _, tc := range toolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.Name) {
hasMCPCalls = true
break
}
}
if hasMCPCalls {
// Append assistant message with tool_calls to conversation
assistantMsg := schema.Message{
Role: "assistant",
Content: result,
}
for i, tc := range toolCalls {
toolCallID := tc.ID
if toolCallID == "" {
toolCallID = fmt.Sprintf("toolu_%s_%d", id, i)
}
assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, schema.ToolCall{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: tc.Name,
Arguments: tc.Arguments,
},
})
}
openAIReq.Messages = append(openAIReq.Messages, assistantMsg)
// Execute each MCP tool call and append results
for _, tc := range assistantMsg.ToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool (Anthropic)", "tool", tc.FunctionCall.Name, "iteration", mcpIteration)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
openAIReq.Messages = append(openAIReq.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
}
xlog.Debug("Anthropic MCP tools executed, re-running inference", "iteration", mcpIteration)
continue // next MCP iteration
}
}
// No MCP tools to execute, build and return response
var contentBlocks []schema.AnthropicContentBlock
var stopReason string
if shouldUseFn && len(toolCalls) > 0 {
stopReason = "tool_use"
for _, tc := range toolCalls {
var inputArgs map[string]interface{}
if err := json.Unmarshal([]byte(tc.Arguments), &inputArgs); err != nil {
xlog.Warn("Failed to parse tool call arguments as JSON", "error", err, "args", tc.Arguments)
inputArgs = map[string]interface{}{"raw": tc.Arguments}
}
contentBlocks = append(contentBlocks, schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, len(contentBlocks)),
Name: tc.Name,
Input: inputArgs,
})
}
textContent := functions.ParseTextContent(result, cfg.FunctionsConfig)
if textContent != "" {
contentBlocks = append([]schema.AnthropicContentBlock{{Type: "text", Text: textContent}}, contentBlocks...)
}
} else {
stopReason = "end_turn"
contentBlocks = []schema.AnthropicContentBlock{
{Type: "text", Text: result},
}
}
resp := &schema.AnthropicResponse{
ID: fmt.Sprintf("msg_%s", id),
Type: "message",
Role: "assistant",
Model: input.Model,
StopReason: &stopReason,
Content: contentBlocks,
Usage: schema.AnthropicUsage{
InputTokens: prediction.Usage.Prompt,
OutputTokens: prediction.Usage.Completion,
},
}
if respData, err := json.Marshal(resp); err == nil {
xlog.Debug("Anthropic Response", "response", string(respData))
}
return c.JSON(200, resp)
} // end MCP iteration loop
return sendAnthropicError(c, 500, "api_error", "MCP iteration limit reached")
}
func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool) error {
func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool, mcpToolInfos []mcpTools.MCPToolInfo, evaluator *templates.Evaluator) error {
c.Response().Header().Set("Content-Type", "text/event-stream")
c.Response().Header().Set("Cache-Control", "no-cache")
c.Response().Header().Set("Connection", "keep-alive")
// Create OpenAI messages for inference
openAIMessages := openAIReq.Messages
images := []string{}
for _, m := range openAIMessages {
images = append(images, m.StringImages...)
}
// Send message_start event
messageStart := schema.AnthropicStreamEvent{
Type: "message_start",
@@ -234,159 +384,232 @@ func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicReq
}
sendAnthropicSSE(c, messageStart)
// Track accumulated content for tool call detection
accumulatedContent := ""
currentBlockIndex := 0
inToolCall := false
toolCallsEmitted := 0
// Send initial content_block_start event
contentBlockStart := schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{Type: "text", Text: ""},
mcpMaxIterations := 10
if cfg.Agent.MaxIterations > 0 {
mcpMaxIterations = cfg.Agent.MaxIterations
}
sendAnthropicSSE(c, contentBlockStart)
hasMCPTools := len(mcpToolInfos) > 0
// Stream content deltas
tokenCallback := func(token string, usage backend.TokenUsage) bool {
accumulatedContent += token
// If we're using functions, try to detect tool calls incrementally
if shouldUseFn {
cleanedResult := functions.CleanupLLMResult(accumulatedContent, cfg.FunctionsConfig)
// Try parsing for tool calls
toolCalls := functions.ParseFunctionCall(cleanedResult, cfg.FunctionsConfig)
// If we detected new tool calls and haven't emitted them yet
if len(toolCalls) > toolCallsEmitted {
// Stop the current text block if we were in one
if !inToolCall && currentBlockIndex == 0 {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
inToolCall = true
for mcpIteration := 0; mcpIteration <= mcpMaxIterations; mcpIteration++ {
// Re-template on MCP iterations
if mcpIteration > 0 {
predInput = evaluator.TemplateMessages(*openAIReq, openAIReq.Messages, cfg, funcs, shouldUseFn)
xlog.Debug("Anthropic MCP stream re-templating", "iteration", mcpIteration)
}
openAIMessages := openAIReq.Messages
images := []string{}
for _, m := range openAIMessages {
images = append(images, m.StringImages...)
}
// Track accumulated content for tool call detection
accumulatedContent := ""
currentBlockIndex := 0
inToolCall := false
toolCallsEmitted := 0
// Send initial content_block_start event
contentBlockStart := schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{Type: "text", Text: ""},
}
sendAnthropicSSE(c, contentBlockStart)
// Collect tool calls for MCP execution
var collectedToolCalls []functions.FuncCallResults
tokenCallback := func(token string, usage backend.TokenUsage) bool {
accumulatedContent += token
if shouldUseFn {
cleanedResult := functions.CleanupLLMResult(accumulatedContent, cfg.FunctionsConfig)
toolCalls := functions.ParseFunctionCall(cleanedResult, cfg.FunctionsConfig)
if len(toolCalls) > toolCallsEmitted {
if !inToolCall && currentBlockIndex == 0 {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
inToolCall = true
}
for i := toolCallsEmitted; i < len(toolCalls); i++ {
tc := toolCalls[i]
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, i),
Name: tc.Name,
},
})
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: currentBlockIndex,
Delta: &schema.AnthropicStreamDelta{
Type: "input_json_delta",
PartialJSON: tc.Arguments,
},
})
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
}
collectedToolCalls = toolCalls
toolCallsEmitted = len(toolCalls)
return true
}
// Emit new tool calls
for i := toolCallsEmitted; i < len(toolCalls); i++ {
tc := toolCalls[i]
// Send content_block_start for tool_use
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, i),
Name: tc.Name,
},
})
// Send input_json_delta with the arguments
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: currentBlockIndex,
Delta: &schema.AnthropicStreamDelta{
Type: "input_json_delta",
PartialJSON: tc.Arguments,
},
})
// Send content_block_stop
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
}
toolCallsEmitted = len(toolCalls)
return true
}
if !inToolCall {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: 0,
Delta: &schema.AnthropicStreamDelta{
Type: "text_delta",
Text: token,
},
})
}
return true
}
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
}
// Send regular text delta if not in tool call mode
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIMessages, images, nil, nil, ml, cfg, cl, appConfig, tokenCallback, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic stream model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
prediction, err := predFunc()
if err != nil {
xlog.Error("Anthropic stream prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
// Also check chat deltas for tool calls
if deltaToolCalls := functions.ToolCallsFromChatDeltas(prediction.ChatDeltas); len(deltaToolCalls) > 0 && len(collectedToolCalls) == 0 {
collectedToolCalls = deltaToolCalls
}
// MCP streaming tool execution: if we collected MCP tool calls, execute and loop
if hasMCPTools && len(collectedToolCalls) > 0 {
var hasMCPCalls bool
for _, tc := range collectedToolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.Name) {
hasMCPCalls = true
break
}
}
if hasMCPCalls {
// Append assistant message with tool_calls
assistantMsg := schema.Message{
Role: "assistant",
Content: accumulatedContent,
}
for i, tc := range collectedToolCalls {
toolCallID := tc.ID
if toolCallID == "" {
toolCallID = fmt.Sprintf("toolu_%s_%d", id, i)
}
assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, schema.ToolCall{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: tc.Name,
Arguments: tc.Arguments,
},
})
}
openAIReq.Messages = append(openAIReq.Messages, assistantMsg)
// Execute MCP tool calls
for _, tc := range assistantMsg.ToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool (Anthropic stream)", "tool", tc.FunctionCall.Name, "iteration", mcpIteration)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
openAIReq.Messages = append(openAIReq.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
}
xlog.Debug("Anthropic MCP streaming tools executed, re-running inference", "iteration", mcpIteration)
continue // next MCP iteration
}
}
// No MCP tools to execute, close stream
if !inToolCall {
delta := schema.AnthropicStreamEvent{
Type: "content_block_delta",
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: 0,
Delta: &schema.AnthropicStreamDelta{
Type: "text_delta",
Text: token,
},
}
sendAnthropicSSE(c, delta)
})
}
return true
}
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
stopReason := "end_turn"
if toolCallsEmitted > 0 {
stopReason = "tool_use"
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
}
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIMessages, images, nil, nil, ml, cfg, cl, appConfig, tokenCallback, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic stream model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "message_delta",
Delta: &schema.AnthropicStreamDelta{
StopReason: &stopReason,
},
Usage: &schema.AnthropicUsage{
OutputTokens: prediction.Usage.Completion,
},
})
prediction, err := predFunc()
if err != nil {
xlog.Error("Anthropic stream prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "message_stop",
})
// Send content_block_stop event for last block if we didn't close it yet
if !inToolCall {
contentBlockStop := schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: 0,
}
sendAnthropicSSE(c, contentBlockStop)
}
return nil
} // end MCP iteration loop
// Determine stop reason
stopReason := "end_turn"
if toolCallsEmitted > 0 {
stopReason = "tool_use"
}
// Send message_delta event with stop_reason
messageDelta := schema.AnthropicStreamEvent{
Type: "message_delta",
Delta: &schema.AnthropicStreamDelta{
StopReason: &stopReason,
},
Usage: &schema.AnthropicUsage{
OutputTokens: prediction.Usage.Completion,
},
}
sendAnthropicSSE(c, messageDelta)
// Send message_stop event
messageStop := schema.AnthropicStreamEvent{
// Safety fallback
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "message_stop",
}
sendAnthropicSSE(c, messageStop)
})
return nil
}


@@ -0,0 +1,108 @@
package localai
import (
"fmt"
"io"
"net/http"
"net/url"
"strings"
"time"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/xlog"
)
var corsProxyClient = &http.Client{
Timeout: 10 * time.Minute,
}
// CORSProxyEndpoint proxies HTTP requests to external MCP servers,
// solving CORS issues for browser-based MCP connections.
// The target URL is passed as a query parameter: /api/cors-proxy?url=https://...
func CORSProxyEndpoint(appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
targetURL := c.QueryParam("url")
if targetURL == "" {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "missing 'url' query parameter"})
}
parsed, err := url.Parse(targetURL)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "invalid target URL"})
}
if parsed.Scheme != "http" && parsed.Scheme != "https" {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "only http and https schemes are supported"})
}
xlog.Debug("CORS proxy request", "method", c.Request().Method, "target", targetURL)
proxyReq, err := http.NewRequestWithContext(
c.Request().Context(),
c.Request().Method,
targetURL,
c.Request().Body,
)
if err != nil {
return fmt.Errorf("failed to create proxy request: %w", err)
}
// Copy headers from the original request, excluding hop-by-hop headers
skipHeaders := map[string]bool{
"Host": true, "Connection": true, "Keep-Alive": true,
"Transfer-Encoding": true, "Upgrade": true, "Origin": true,
"Referer": true,
}
for key, values := range c.Request().Header {
if skipHeaders[key] {
continue
}
for _, v := range values {
proxyReq.Header.Add(key, v)
}
}
resp, err := corsProxyClient.Do(proxyReq)
if err != nil {
xlog.Error("CORS proxy request failed", "error", err, "target", targetURL)
return c.JSON(http.StatusBadGateway, map[string]string{"error": "proxy request failed: " + err.Error()})
}
defer resp.Body.Close()
// Copy response headers
for key, values := range resp.Header {
lower := strings.ToLower(key)
// Skip CORS headers — we'll set our own
if strings.HasPrefix(lower, "access-control-") {
continue
}
for _, v := range values {
c.Response().Header().Add(key, v)
}
}
// Set CORS headers to allow browser access
c.Response().Header().Set("Access-Control-Allow-Origin", "*")
c.Response().Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
c.Response().Header().Set("Access-Control-Allow-Headers", "*")
c.Response().Header().Set("Access-Control-Expose-Headers", "*")
c.Response().WriteHeader(resp.StatusCode)
// Stream the response body
_, err = io.Copy(c.Response().Writer, resp.Body)
return err
}
}
// CORSProxyOptionsEndpoint handles CORS preflight requests for the proxy.
func CORSProxyOptionsEndpoint() echo.HandlerFunc {
return func(c echo.Context) error {
c.Response().Header().Set("Access-Control-Allow-Origin", "*")
c.Response().Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
c.Response().Header().Set("Access-Control-Allow-Headers", "*")
c.Response().Header().Set("Access-Control-Max-Age", "86400")
return c.NoContent(http.StatusNoContent)
}
}
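A browser-side MCP client reaches a remote server through this proxy by query-escaping the target endpoint into the `url` parameter, so the target's own query string survives the round trip. A minimal sketch (base address and target URL are illustrative):

```go
package main

import (
	"fmt"
	"net/url"
)

// proxyURL builds the CORS-proxy URL for a target MCP endpoint.
// QueryEscape is required: without it the target's own query string
// would be parsed as part of the proxy's query string.
func proxyURL(base, target string) string {
	return base + "/api/cors-proxy?url=" + url.QueryEscape(target)
}

func main() {
	fmt.Println(proxyURL("http://localhost:8080", "https://example.com/mcp?session=1"))
	// → http://localhost:8080/api/cors-proxy?url=https%3A%2F%2Fexample.com%2Fmcp%3Fsession%3D1
}
```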


@@ -1,26 +1,19 @@
package localai
import (
"context"
"encoding/json"
"errors"
"fmt"
"net"
"time"
"strings"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/endpoints/openai"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/cogito"
"github.com/mudler/cogito/clients"
"github.com/mudler/xlog"
)
// MCP SSE Event Types
// MCP SSE Event Types (kept for backward compatibility with MCP endpoint consumers)
type MCPReasoningEvent struct {
Type string `json:"type"`
Content string `json:"content"`
@@ -54,262 +47,53 @@ type MCPErrorEvent struct {
Message string `json:"message"`
}
// MCPEndpoint is the endpoint for MCP chat completions. Supports SSE mode, but it is not compatible with the OpenAI apis.
// @Summary Stream MCP chat completions with reasoning, tool calls, and results
// MCPEndpoint is the endpoint for MCP chat completions.
// It enables all MCP servers for the model and delegates to the standard chat endpoint,
// which handles MCP tool injection and server-side execution.
// Both streaming and non-streaming modes use standard OpenAI response format.
// @Summary MCP chat completions with automatic tool execution
// @Param request body schema.OpenAIRequest true "query params"
// @Success 200 {object} schema.OpenAIResponse "Response"
// @Router /v1/mcp/chat/completions [post]
func MCPEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, appConfig *config.ApplicationConfig) echo.HandlerFunc {
chatHandler := openai.ChatEndpoint(cl, ml, evaluator, appConfig)
return func(c echo.Context) error {
ctx := c.Request().Context()
created := int(time.Now().Unix())
// Handle Correlation
id := c.Request().Header.Get("X-Correlation-ID")
if id == "" {
id = fmt.Sprintf("mcp-%d", time.Now().UnixNano())
}
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.OpenAIRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
config, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || config == nil {
modelConfig, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || modelConfig == nil {
return echo.ErrBadRequest
}
if config.MCP.Servers == "" && config.MCP.Stdio == "" {
if modelConfig.MCP.Servers == "" && modelConfig.MCP.Stdio == "" {
return fmt.Errorf("no MCP servers configured")
}
// Get MCP config from model config
remote, stdio, err := config.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to get MCP config: %w", err)
// Enable all MCP servers if none explicitly specified (preserve original behavior)
if input.Metadata == nil {
input.Metadata = map[string]string{}
}
// Check if we have tools in cache, or we have to have an initial connection
sessions, err := mcpTools.SessionsFromMCPConfig(config.Name, remote, stdio)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
if len(sessions) == 0 {
return fmt.Errorf("no working MCP servers found")
}
// Build fragment from messages
fragment := cogito.NewEmptyFragment()
for _, message := range input.Messages {
fragment = fragment.AddMessage(cogito.MessageRole(message.Role), message.StringContent)
}
_, port, err := net.SplitHostPort(appConfig.APIAddress)
if err != nil {
return err
}
apiKey := ""
if len(appConfig.ApiKeys) > 0 {
apiKey = appConfig.ApiKeys[0]
}
ctxWithCancellation, cancel := context.WithCancel(ctx)
defer cancel()
// TODO: instead of connecting to the API, we should just wire this internally
// and act like completion.go.
// We can do this as cogito expects an interface and we can create one that
// we satisfy to just call internally ComputeChoices
defaultLLM := clients.NewLocalAILLM(config.Name, apiKey, "http://127.0.0.1:"+port)
// Build cogito options using the consolidated method
cogitoOpts := config.BuildCogitoOptions()
cogitoOpts = append(
cogitoOpts,
cogito.WithContext(ctxWithCancellation),
cogito.WithMCPs(sessions...),
)
// Check if streaming is requested
toStream := input.Stream
if !toStream {
// Non-streaming mode: execute synchronously and return JSON response
cogitoOpts = append(
cogitoOpts,
cogito.WithStatusCallback(func(s string) {
xlog.Debug("[model agent] Status", "model", config.Name, "status", s)
}),
cogito.WithReasoningCallback(func(s string) {
xlog.Debug("[model agent] Reasoning", "model", config.Name, "reasoning", s)
}),
cogito.WithToolCallBack(func(t *cogito.ToolChoice, state *cogito.SessionState) cogito.ToolCallDecision {
xlog.Debug("[model agent] Tool call", "model", config.Name, "tool", t.Name, "reasoning", t.Reasoning, "arguments", t.Arguments)
return cogito.ToolCallDecision{
Approved: true,
}
}),
cogito.WithToolCallResultCallback(func(t cogito.ToolStatus) {
xlog.Debug("[model agent] Tool call result", "model", config.Name, "tool", t.Name, "result", t.Result, "tool_arguments", t.ToolArguments)
}),
)
f, err := cogito.ExecuteTools(
defaultLLM, fragment,
cogitoOpts...,
)
if err != nil && !errors.Is(err, cogito.ErrNoToolSelected) {
return err
if _, hasMCP := input.Metadata["mcp_servers"]; !hasMCP {
remote, stdio, err := modelConfig.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to get MCP config: %w", err)
}
resp := &schema.OpenAIResponse{
ID: id,
Created: created,
Model: input.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Message: &schema.Message{Role: "assistant", Content: &f.LastMessage().Content}}},
Object: "chat.completion",
var allServers []string
for name := range remote.Servers {
allServers = append(allServers, name)
}
jsonResult, _ := json.Marshal(resp)
xlog.Debug("Response", "response", string(jsonResult))
// Return the prediction in the response body
return c.JSON(200, resp)
for name := range stdio.Servers {
allServers = append(allServers, name)
}
input.Metadata["mcp_servers"] = strings.Join(allServers, ",")
}
// Streaming mode: use SSE
// Set up SSE headers
c.Response().Header().Set("Content-Type", "text/event-stream")
c.Response().Header().Set("Cache-Control", "no-cache")
c.Response().Header().Set("Connection", "keep-alive")
c.Response().Header().Set("X-Correlation-ID", id)
// Create channel for streaming events
events := make(chan interface{})
ended := make(chan error, 1)
// Set up callbacks for streaming
statusCallback := func(s string) {
events <- MCPStatusEvent{
Type: "status",
Message: s,
}
}
reasoningCallback := func(s string) {
events <- MCPReasoningEvent{
Type: "reasoning",
Content: s,
}
}
toolCallCallback := func(t *cogito.ToolChoice, state *cogito.SessionState) cogito.ToolCallDecision {
events <- MCPToolCallEvent{
Type: "tool_call",
Name: t.Name,
Arguments: t.Arguments,
Reasoning: t.Reasoning,
}
return cogito.ToolCallDecision{
Approved: true,
}
}
toolCallResultCallback := func(t cogito.ToolStatus) {
events <- MCPToolResultEvent{
Type: "tool_result",
Name: t.Name,
Result: t.Result,
}
}
cogitoOpts = append(cogitoOpts,
cogito.WithStatusCallback(statusCallback),
cogito.WithReasoningCallback(reasoningCallback),
cogito.WithToolCallBack(toolCallCallback),
cogito.WithToolCallResultCallback(toolCallResultCallback),
)
// Execute tools in a goroutine
go func() {
defer close(events)
f, err := cogito.ExecuteTools(
defaultLLM, fragment,
cogitoOpts...,
)
if err != nil && !errors.Is(err, cogito.ErrNoToolSelected) {
events <- MCPErrorEvent{
Type: "error",
Message: fmt.Sprintf("Failed to execute tools: %v", err),
}
ended <- err
return
}
// Stream final assistant response
content := f.LastMessage().Content
events <- MCPAssistantEvent{
Type: "assistant",
Content: content,
}
ended <- nil
}()
// Stream events to client
LOOP:
for {
select {
case <-ctx.Done():
// Context was cancelled (client disconnected or request cancelled)
xlog.Debug("Request context cancelled, stopping stream")
cancel()
break LOOP
case event := <-events:
if event == nil {
// Channel closed
break LOOP
}
eventData, err := json.Marshal(event)
if err != nil {
xlog.Debug("Failed to marshal event", "error", err)
continue
}
xlog.Debug("Sending event", "event", string(eventData))
_, err = fmt.Fprintf(c.Response().Writer, "data: %s\n\n", string(eventData))
if err != nil {
xlog.Debug("Sending event failed", "error", err)
cancel()
return err
}
c.Response().Flush()
case err := <-ended:
if err == nil {
// Send done signal
fmt.Fprintf(c.Response().Writer, "data: [DONE]\n\n")
c.Response().Flush()
break LOOP
}
xlog.Error("Stream ended with error", "error", err)
errorEvent := MCPErrorEvent{
Type: "error",
Message: err.Error(),
}
errorData, marshalErr := json.Marshal(errorEvent)
if marshalErr != nil {
fmt.Fprintf(c.Response().Writer, "data: {\"type\":\"error\",\"message\":\"Internal error\"}\n\n")
} else {
fmt.Fprintf(c.Response().Writer, "data: %s\n\n", string(errorData))
}
fmt.Fprintf(c.Response().Writer, "data: [DONE]\n\n")
c.Response().Flush()
return nil
}
}
xlog.Debug("Stream ended")
return nil
// Delegate to the standard chat endpoint which handles MCP tool
// injection and server-side execution for both streaming and non-streaming.
return chatHandler(c)
}
}


@@ -0,0 +1,141 @@
package localai
import (
"fmt"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
)
// MCPPromptsEndpoint returns the list of MCP prompts for a given model.
// GET /v1/mcp/prompts/:model
func MCPPromptsEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, []any{})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
prompts, err := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP prompts: %w", err)
}
type promptArgJSON struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Required bool `json:"required,omitempty"`
}
type promptJSON struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Title string `json:"title,omitempty"`
Arguments []promptArgJSON `json:"arguments,omitempty"`
Server string `json:"server"`
}
var result []promptJSON
for _, p := range prompts {
pj := promptJSON{
Name: p.PromptName,
Description: p.Description,
Title: p.Title,
Server: p.ServerName,
}
for _, arg := range p.Arguments {
pj.Arguments = append(pj.Arguments, promptArgJSON{
Name: arg.Name,
Description: arg.Description,
Required: arg.Required,
})
}
result = append(result, pj)
}
return c.JSON(200, result)
}
}
// MCPGetPromptEndpoint expands a prompt by name with the given arguments.
// POST /v1/mcp/prompts/:model/:prompt
func MCPGetPromptEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
promptName := c.Param("prompt")
if modelName == "" || promptName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return fmt.Errorf("no MCP servers configured for model %q", modelName)
}
var req struct {
Arguments map[string]string `json:"arguments"`
}
if err := c.Bind(&req); err != nil {
return echo.ErrBadRequest
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
prompts, err := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP prompts: %w", err)
}
messages, err := mcpTools.GetMCPPrompt(c.Request().Context(), prompts, promptName, req.Arguments)
if err != nil {
return fmt.Errorf("failed to get prompt: %w", err)
}
type messageJSON struct {
Role string `json:"role"`
Content string `json:"content"`
}
var result []messageJSON
for _, m := range messages {
result = append(result, messageJSON{
Role: string(m.Role),
Content: mcpTools.PromptMessageToText(m),
})
}
return c.JSON(200, map[string]any{
"messages": result,
})
}
}


@@ -0,0 +1,127 @@
package localai
import (
"fmt"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
)
// MCPResourcesEndpoint returns the list of MCP resources for a given model.
// GET /v1/mcp/resources/:model
func MCPResourcesEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, []any{})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
resources, err := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP resources: %w", err)
}
type resourceJSON struct {
Name string `json:"name"`
URI string `json:"uri"`
Description string `json:"description,omitempty"`
MIMEType string `json:"mimeType,omitempty"`
Server string `json:"server"`
}
var result []resourceJSON
for _, r := range resources {
result = append(result, resourceJSON{
Name: r.Name,
URI: r.URI,
Description: r.Description,
MIMEType: r.MIMEType,
Server: r.ServerName,
})
}
return c.JSON(200, result)
}
}
// MCPReadResourceEndpoint reads a specific MCP resource by URI.
// POST /v1/mcp/resources/:model/read
func MCPReadResourceEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return fmt.Errorf("no MCP servers configured for model %q", modelName)
}
var req struct {
URI string `json:"uri"`
}
if err := c.Bind(&req); err != nil || req.URI == "" {
return echo.ErrBadRequest
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
resources, err := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP resources: %w", err)
}
content, err := mcpTools.ReadMCPResource(c.Request().Context(), resources, req.URI)
if err != nil {
return fmt.Errorf("failed to read resource: %w", err)
}
// Find the resource info for mimeType
mimeType := ""
for _, r := range resources {
if r.URI == req.URI {
mimeType = r.MIMEType
break
}
}
return c.JSON(200, map[string]any{
"uri": req.URI,
"content": content,
"mimeType": mimeType,
})
}
}


@@ -0,0 +1,91 @@
package localai
import (
"fmt"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/middleware"
)
// MCPServersEndpoint returns the list of MCP servers and their tools for a given model.
// GET /v1/mcp/servers/:model
func MCPServersEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, map[string]any{
"model": modelName,
"servers": []any{},
})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
servers, err := mcpTools.ListMCPServers(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to list MCP servers: %w", err)
}
return c.JSON(200, map[string]any{
"model": modelName,
"servers": servers,
})
}
}
// MCPServersEndpointFromMiddleware is a version that uses the middleware-resolved model config.
// This allows it to use the same middleware chain as other endpoints.
func MCPServersEndpointFromMiddleware() echo.HandlerFunc {
return func(c echo.Context) error {
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, map[string]any{
"model": cfg.Name,
"servers": []any{},
})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
servers, err := mcpTools.ListMCPServers(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to list MCP servers: %w", err)
}
return c.JSON(200, map[string]any{
"model": cfg.Name,
"servers": servers,
})
}
}


@@ -2,32 +2,109 @@ package mcp
import (
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"os/exec"
"strings"
"sync"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/pkg/signals"
"github.com/modelcontextprotocol/go-sdk/mcp"
"github.com/mudler/xlog"
)
// NamedSession pairs an MCP session with its server name and type.
type NamedSession struct {
Name string
Type string // "remote" or "stdio"
Session *mcp.ClientSession
}
// MCPToolInfo holds a discovered MCP tool along with its origin session.
type MCPToolInfo struct {
ServerName string
ToolName string
Function functions.Function
Session *mcp.ClientSession
}
// MCPServerInfo describes an MCP server and its available tools, prompts, and resources.
type MCPServerInfo struct {
Name string `json:"name"`
Type string `json:"type"`
Tools []string `json:"tools"`
Prompts []string `json:"prompts,omitempty"`
Resources []string `json:"resources,omitempty"`
}
// MCPPromptInfo holds a discovered MCP prompt along with its origin session.
type MCPPromptInfo struct {
ServerName string
PromptName string
Description string
Title string
Arguments []*mcp.PromptArgument
Session *mcp.ClientSession
}
// MCPResourceInfo holds a discovered MCP resource along with its origin session.
type MCPResourceInfo struct {
ServerName string
Name string
URI string
Description string
MIMEType string
Session *mcp.ClientSession
}
type sessionCache struct {
mu      sync.Mutex
cache   map[string][]*mcp.ClientSession
cancels map[string]context.CancelFunc
}
type namedSessionCache struct {
mu sync.Mutex
cache map[string][]NamedSession
cancels map[string]context.CancelFunc
}
var (
cache = sessionCache{
cache:   make(map[string][]*mcp.ClientSession),
cancels: make(map[string]context.CancelFunc),
}
namedCache = namedSessionCache{
cache: make(map[string][]NamedSession),
cancels: make(map[string]context.CancelFunc),
}
client = mcp.NewClient(&mcp.Implementation{Name: "LocalAI", Version: "v1.0.0"}, nil)
)
// MCPServersFromMetadata extracts the MCP server list from the metadata map
// and returns the list. The "mcp_servers" key is consumed (deleted from the map)
// so it doesn't leak to the backend.
func MCPServersFromMetadata(metadata map[string]string) []string {
raw, ok := metadata["mcp_servers"]
if !ok || raw == "" {
return nil
}
delete(metadata, "mcp_servers")
servers := strings.Split(raw, ",")
for i := range servers {
servers[i] = strings.TrimSpace(servers[i])
}
return servers
}
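The consume-and-trim behavior matters: after extraction the key must be gone so it never leaks to the backend, and padded server names must come back clean. A standalone sketch of the same logic:

```go
package main

import (
	"fmt"
	"strings"
)

// serversFromMetadata mirrors MCPServersFromMetadata: it splits the
// comma-separated list, trims whitespace, and deletes the key so the
// value does not travel further down the request pipeline.
func serversFromMetadata(metadata map[string]string) []string {
	raw, ok := metadata["mcp_servers"]
	if !ok || raw == "" {
		return nil
	}
	delete(metadata, "mcp_servers")
	servers := strings.Split(raw, ",")
	for i := range servers {
		servers[i] = strings.TrimSpace(servers[i])
	}
	return servers
}

func main() {
	md := map[string]string{"mcp_servers": " weather, search "}
	fmt.Println(serversFromMetadata(md), len(md)) // → [weather search] 0
}
```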
func SessionsFromMCPConfig(
name string,
remote config.MCPGenericConfig[config.MCPRemoteServers],
@@ -83,16 +160,461 @@ func SessionsFromMCPConfig(
allSessions = append(allSessions, mcpSession)
}
signals.RegisterGracefulTerminationHandler(func() {
for _, session := range allSessions {
session.Close()
}
cancel()
})
cache.cancels[name] = cancel
return allSessions, nil
}
// NamedSessionsFromMCPConfig returns sessions with their server names preserved.
// If enabledServers is non-empty, only servers with matching names are returned.
func NamedSessionsFromMCPConfig(
name string,
remote config.MCPGenericConfig[config.MCPRemoteServers],
stdio config.MCPGenericConfig[config.MCPSTDIOServers],
enabledServers []string,
) ([]NamedSession, error) {
namedCache.mu.Lock()
defer namedCache.mu.Unlock()
allSessions, exists := namedCache.cache[name]
if !exists {
ctx, cancel := context.WithCancel(context.Background())
for serverName, server := range remote.Servers {
xlog.Debug("[MCP remote server] Configuration", "name", serverName, "server", server)
httpClient := &http.Client{
Timeout: 360 * time.Second,
Transport: newBearerTokenRoundTripper(server.Token, http.DefaultTransport),
}
transport := &mcp.StreamableClientTransport{Endpoint: server.URL, HTTPClient: httpClient}
mcpSession, err := client.Connect(ctx, transport, nil)
if err != nil {
xlog.Error("Failed to connect to MCP server", "error", err, "name", serverName, "url", server.URL)
continue
}
xlog.Debug("[MCP remote server] Connected", "name", serverName, "url", server.URL)
allSessions = append(allSessions, NamedSession{
Name: serverName,
Type: "remote",
Session: mcpSession,
})
}
for serverName, server := range stdio.Servers {
xlog.Debug("[MCP stdio server] Configuration", "name", serverName, "server", server)
command := exec.Command(server.Command, server.Args...)
command.Env = os.Environ()
for key, value := range server.Env {
command.Env = append(command.Env, key+"="+value)
}
transport := &mcp.CommandTransport{Command: command}
mcpSession, err := client.Connect(ctx, transport, nil)
if err != nil {
xlog.Error("Failed to start MCP server", "error", err, "name", serverName, "command", command)
continue
}
xlog.Debug("[MCP stdio server] Connected", "name", serverName, "command", command)
allSessions = append(allSessions, NamedSession{
Name: serverName,
Type: "stdio",
Session: mcpSession,
})
}
namedCache.cache[name] = allSessions
namedCache.cancels[name] = cancel
}
if len(enabledServers) == 0 {
return allSessions, nil
}
enabled := make(map[string]bool, len(enabledServers))
for _, s := range enabledServers {
enabled[s] = true
}
var filtered []NamedSession
for _, ns := range allSessions {
if enabled[ns.Name] {
filtered = append(filtered, ns)
}
}
return filtered, nil
}
// DiscoverMCPTools queries each session for its tools and converts them to functions.Function.
// Deduplicates by tool name (first server wins).
func DiscoverMCPTools(ctx context.Context, sessions []NamedSession) ([]MCPToolInfo, error) {
seen := make(map[string]bool)
var result []MCPToolInfo
for _, ns := range sessions {
toolsResult, err := ns.Session.ListTools(ctx, nil)
if err != nil {
xlog.Error("Failed to list tools from MCP server", "error", err, "server", ns.Name)
continue
}
for _, tool := range toolsResult.Tools {
if seen[tool.Name] {
continue
}
seen[tool.Name] = true
f := functions.Function{
Name: tool.Name,
Description: tool.Description,
}
// Convert InputSchema to map[string]interface{} for functions.Function
if tool.InputSchema != nil {
schemaBytes, err := json.Marshal(tool.InputSchema)
if err == nil {
var params map[string]interface{}
if json.Unmarshal(schemaBytes, &params) == nil {
f.Parameters = params
}
}
}
if f.Parameters == nil {
f.Parameters = map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{},
}
}
result = append(result, MCPToolInfo{
ServerName: ns.Name,
ToolName: tool.Name,
Function: f,
Session: ns.Session,
})
}
}
return result, nil
}
// ExecuteMCPToolCall finds the matching tool and executes it.
func ExecuteMCPToolCall(ctx context.Context, tools []MCPToolInfo, toolName string, arguments string) (string, error) {
var toolInfo *MCPToolInfo
for i := range tools {
if tools[i].ToolName == toolName {
toolInfo = &tools[i]
break
}
}
if toolInfo == nil {
return "", fmt.Errorf("MCP tool %q not found", toolName)
}
var args map[string]any
if arguments != "" {
if err := json.Unmarshal([]byte(arguments), &args); err != nil {
return "", fmt.Errorf("failed to parse arguments for tool %q: %w", toolName, err)
}
}
result, err := toolInfo.Session.CallTool(ctx, &mcp.CallToolParams{
Name: toolName,
Arguments: args,
})
if err != nil {
return "", fmt.Errorf("MCP tool %q call failed: %w", toolName, err)
}
// Extract text content from result
var texts []string
for _, content := range result.Content {
if tc, ok := content.(*mcp.TextContent); ok {
texts = append(texts, tc.Text)
}
}
if len(texts) == 0 {
// Fallback: marshal the whole result
data, _ := json.Marshal(result.Content)
return string(data), nil
}
if len(texts) == 1 {
return texts[0], nil
}
combined, _ := json.Marshal(texts)
return string(combined), nil
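The result flattening above has three cases worth keeping straight: no text parts (marshal the raw content), exactly one (return it verbatim), several (JSON-encode the string slice). A standalone sketch of that fallback, using plain strings in place of `mcp.TextContent`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flattenTexts mirrors the tool-result handling in ExecuteMCPToolCall:
// one text part is returned as-is, several are JSON-encoded as an array,
// and none falls back to marshaling the raw (non-text) content.
func flattenTexts(texts []string, rawContent any) string {
	if len(texts) == 0 {
		data, _ := json.Marshal(rawContent)
		return string(data)
	}
	if len(texts) == 1 {
		return texts[0]
	}
	combined, _ := json.Marshal(texts)
	return string(combined)
}

func main() {
	fmt.Println(flattenTexts([]string{"42"}, nil))                         // → 42
	fmt.Println(flattenTexts([]string{"a", "b"}, nil))                     // → ["a","b"]
	fmt.Println(flattenTexts(nil, map[string]string{"type": "image"}))     // → {"type":"image"}
}
```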
}
// ListMCPServers returns server info with tool, prompt, and resource names for each session.
func ListMCPServers(ctx context.Context, sessions []NamedSession) ([]MCPServerInfo, error) {
var result []MCPServerInfo
for _, ns := range sessions {
info := MCPServerInfo{
Name: ns.Name,
Type: ns.Type,
}
toolsResult, err := ns.Session.ListTools(ctx, nil)
if err != nil {
xlog.Error("Failed to list tools from MCP server", "error", err, "server", ns.Name)
} else {
for _, tool := range toolsResult.Tools {
info.Tools = append(info.Tools, tool.Name)
}
}
promptsResult, err := ns.Session.ListPrompts(ctx, nil)
if err != nil {
xlog.Debug("Failed to list prompts from MCP server", "error", err, "server", ns.Name)
} else {
for _, p := range promptsResult.Prompts {
info.Prompts = append(info.Prompts, p.Name)
}
}
resourcesResult, err := ns.Session.ListResources(ctx, nil)
if err != nil {
xlog.Debug("Failed to list resources from MCP server", "error", err, "server", ns.Name)
} else {
for _, r := range resourcesResult.Resources {
info.Resources = append(info.Resources, r.URI)
}
}
result = append(result, info)
}
return result, nil
}
// IsMCPTool checks if a tool name is in the MCP tool list.
func IsMCPTool(tools []MCPToolInfo, name string) bool {
for _, t := range tools {
if t.ToolName == name {
return true
}
}
return false
}
// DiscoverMCPPrompts queries each session for its prompts.
// Deduplicates by prompt name (first server wins).
func DiscoverMCPPrompts(ctx context.Context, sessions []NamedSession) ([]MCPPromptInfo, error) {
seen := make(map[string]bool)
var result []MCPPromptInfo
for _, ns := range sessions {
promptsResult, err := ns.Session.ListPrompts(ctx, nil)
if err != nil {
xlog.Error("Failed to list prompts from MCP server", "error", err, "server", ns.Name)
continue
}
for _, p := range promptsResult.Prompts {
if seen[p.Name] {
continue
}
seen[p.Name] = true
result = append(result, MCPPromptInfo{
ServerName: ns.Name,
PromptName: p.Name,
Description: p.Description,
Title: p.Title,
Arguments: p.Arguments,
Session: ns.Session,
})
}
}
return result, nil
}
// GetMCPPrompt finds and expands a prompt by name using the discovered prompts list.
func GetMCPPrompt(ctx context.Context, prompts []MCPPromptInfo, name string, args map[string]string) ([]*mcp.PromptMessage, error) {
var info *MCPPromptInfo
for i := range prompts {
if prompts[i].PromptName == name {
info = &prompts[i]
break
}
}
if info == nil {
return nil, fmt.Errorf("MCP prompt %q not found", name)
}
result, err := info.Session.GetPrompt(ctx, &mcp.GetPromptParams{
Name: name,
Arguments: args,
})
if err != nil {
return nil, fmt.Errorf("MCP prompt %q get failed: %w", name, err)
}
return result.Messages, nil
}
// DiscoverMCPResources queries each session for its resources.
// Deduplicates by URI (first server wins).
func DiscoverMCPResources(ctx context.Context, sessions []NamedSession) ([]MCPResourceInfo, error) {
seen := make(map[string]bool)
var result []MCPResourceInfo
for _, ns := range sessions {
resourcesResult, err := ns.Session.ListResources(ctx, nil)
if err != nil {
xlog.Error("Failed to list resources from MCP server", "error", err, "server", ns.Name)
continue
}
for _, r := range resourcesResult.Resources {
if seen[r.URI] {
continue
}
seen[r.URI] = true
result = append(result, MCPResourceInfo{
ServerName: ns.Name,
Name: r.Name,
URI: r.URI,
Description: r.Description,
MIMEType: r.MIMEType,
Session: ns.Session,
})
}
}
return result, nil
}
// ReadMCPResource reads a resource by URI from the matching session.
func ReadMCPResource(ctx context.Context, resources []MCPResourceInfo, uri string) (string, error) {
var info *MCPResourceInfo
for i := range resources {
if resources[i].URI == uri {
info = &resources[i]
break
}
}
if info == nil {
return "", fmt.Errorf("MCP resource %q not found", uri)
}
result, err := info.Session.ReadResource(ctx, &mcp.ReadResourceParams{URI: uri})
if err != nil {
return "", fmt.Errorf("MCP resource %q read failed: %w", uri, err)
}
var texts []string
for _, c := range result.Contents {
if c.Text != "" {
texts = append(texts, c.Text)
}
}
return strings.Join(texts, "\n"), nil
}
// MCPPromptFromMetadata extracts the prompt name and arguments from metadata.
// The "mcp_prompt" and "mcp_prompt_args" keys are consumed (deleted from the map).
func MCPPromptFromMetadata(metadata map[string]string) (string, map[string]string) {
name, ok := metadata["mcp_prompt"]
if !ok || name == "" {
return "", nil
}
delete(metadata, "mcp_prompt")
var args map[string]string
if raw, ok := metadata["mcp_prompt_args"]; ok && raw != "" {
if err := json.Unmarshal([]byte(raw), &args); err != nil {
xlog.Error("Failed to parse mcp_prompt_args", "error", err)
}
delete(metadata, "mcp_prompt_args")
}
return name, args
}
// MCPResourcesFromMetadata extracts resource URIs from metadata.
// The "mcp_resources" key is consumed (deleted from the map).
func MCPResourcesFromMetadata(metadata map[string]string) []string {
raw, ok := metadata["mcp_resources"]
if !ok || raw == "" {
return nil
}
delete(metadata, "mcp_resources")
uris := strings.Split(raw, ",")
for i := range uris {
uris[i] = strings.TrimSpace(uris[i])
}
return uris
}
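For illustration, the metadata contract consumed by the two helpers above can be exercised in isolation. The snippet below is a self-contained sketch that re-implements the same parsing logic outside the package; the key names (`mcp_prompt`, `mcp_prompt_args`, `mcp_resources`) come from the code, everything else is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// promptFromMetadata mirrors MCPPromptFromMetadata: it consumes the
// "mcp_prompt" and "mcp_prompt_args" keys, deleting them from the map.
func promptFromMetadata(metadata map[string]string) (string, map[string]string) {
	name, ok := metadata["mcp_prompt"]
	if !ok || name == "" {
		return "", nil
	}
	delete(metadata, "mcp_prompt")
	var args map[string]string
	if raw, ok := metadata["mcp_prompt_args"]; ok && raw != "" {
		_ = json.Unmarshal([]byte(raw), &args) // a parse error just leaves args nil
		delete(metadata, "mcp_prompt_args")
	}
	return name, args
}

// resourcesFromMetadata mirrors MCPResourcesFromMetadata: a comma-separated
// list of URIs under "mcp_resources", trimmed and consumed.
func resourcesFromMetadata(metadata map[string]string) []string {
	raw, ok := metadata["mcp_resources"]
	if !ok || raw == "" {
		return nil
	}
	delete(metadata, "mcp_resources")
	uris := strings.Split(raw, ",")
	for i := range uris {
		uris[i] = strings.TrimSpace(uris[i])
	}
	return uris
}

func main() {
	md := map[string]string{
		"mcp_prompt":      "summarize",
		"mcp_prompt_args": `{"style":"short"}`,
		"mcp_resources":   "file:///a.txt, file:///b.txt",
	}
	name, args := promptFromMetadata(md)
	uris := resourcesFromMetadata(md)
	// All three keys are consumed, so the map is empty afterwards.
	fmt.Println(name, args["style"], uris, len(md))
}
```

Note that both helpers mutate the caller's map: the keys are removed so they cannot be re-processed further down the request pipeline.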
// PromptMessageToText extracts text from a PromptMessage's Content.
func PromptMessageToText(msg *mcp.PromptMessage) string {
if tc, ok := msg.Content.(*mcp.TextContent); ok {
return tc.Text
}
// Fallback: marshal content
data, _ := json.Marshal(msg.Content)
return string(data)
}
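The text-extraction fallback above can be sketched independently of the SDK; `TextContent` below is a local stand-in for the `mcp.TextContent` type, not the real one:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TextContent is a local stand-in for *mcp.TextContent.
type TextContent struct {
	Text string `json:"text"`
}

// messageToText mirrors PromptMessageToText: return the plain text when the
// content is textual, otherwise fall back to a JSON dump of whatever
// content type the server returned.
func messageToText(content any) string {
	if tc, ok := content.(*TextContent); ok {
		return tc.Text
	}
	data, _ := json.Marshal(content)
	return string(data)
}

func main() {
	fmt.Println(messageToText(&TextContent{Text: "hello"}))
	fmt.Println(messageToText(map[string]string{"type": "image"}))
}
```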
// CloseMCPSessions closes all MCP sessions for a given model and removes them from the cache.
// This should be called when a model is unloaded or shut down.
func CloseMCPSessions(modelName string) {
// Close sessions in the unnamed cache
cache.mu.Lock()
if sessions, ok := cache.cache[modelName]; ok {
for _, s := range sessions {
s.Close()
}
delete(cache.cache, modelName)
}
if cancel, ok := cache.cancels[modelName]; ok {
cancel()
delete(cache.cancels, modelName)
}
cache.mu.Unlock()
// Close sessions in the named cache
namedCache.mu.Lock()
if sessions, ok := namedCache.cache[modelName]; ok {
for _, ns := range sessions {
ns.Session.Close()
}
delete(namedCache.cache, modelName)
}
if cancel, ok := namedCache.cancels[modelName]; ok {
cancel()
delete(namedCache.cancels, modelName)
}
namedCache.mu.Unlock()
xlog.Debug("Closed MCP sessions for model", "model", modelName)
}
// CloseAllMCPSessions closes all cached MCP sessions across all models.
// This should be called during graceful shutdown.
func CloseAllMCPSessions() {
cache.mu.Lock()
for name, sessions := range cache.cache {
for _, s := range sessions {
s.Close()
}
if cancel, ok := cache.cancels[name]; ok {
cancel()
}
}
cache.cache = make(map[string][]*mcp.ClientSession)
cache.cancels = make(map[string]context.CancelFunc)
cache.mu.Unlock()
namedCache.mu.Lock()
for name, sessions := range namedCache.cache {
for _, ns := range sessions {
ns.Session.Close()
}
if cancel, ok := namedCache.cancels[name]; ok {
cancel()
}
}
namedCache.cache = make(map[string][]NamedSession)
namedCache.cancels = make(map[string]context.CancelFunc)
namedCache.mu.Unlock()
xlog.Debug("Closed all MCP sessions")
}
func init() {
signals.RegisterGracefulTerminationHandler(func() {
CloseAllMCPSessions()
})
}
// bearerTokenRoundTripper is a custom roundtripper that injects a bearer token
// into HTTP requests
type bearerTokenRoundTripper struct {


@@ -10,6 +10,7 @@ import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
@@ -22,6 +23,37 @@ import (
"github.com/mudler/xlog"
)
// mergeToolCallDeltas merges streaming tool call deltas into complete tool calls.
// In SSE streaming, a single tool call arrives as multiple chunks sharing the same Index:
// the first chunk carries the ID, Type, and Name; subsequent chunks append to Arguments.
func mergeToolCallDeltas(existing []schema.ToolCall, deltas []schema.ToolCall) []schema.ToolCall {
byIndex := make(map[int]int, len(existing)) // tool call Index -> position in slice
for i, tc := range existing {
byIndex[tc.Index] = i
}
for _, d := range deltas {
pos, found := byIndex[d.Index]
if !found {
byIndex[d.Index] = len(existing)
existing = append(existing, d)
continue
}
// Merge into existing entry
tc := &existing[pos]
if d.ID != "" {
tc.ID = d.ID
}
if d.Type != "" {
tc.Type = d.Type
}
if d.FunctionCall.Name != "" {
tc.FunctionCall.Name = d.FunctionCall.Name
}
tc.FunctionCall.Arguments += d.FunctionCall.Arguments
}
return existing
}
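The delta-merge semantics documented above can be exercised standalone. The structs below are minimal stand-ins for `schema.ToolCall`/`schema.FunctionCall` (illustrative only), and the merge function is a copy of the handler logic:

```go
package main

import "fmt"

// Minimal stand-ins for schema.FunctionCall / schema.ToolCall.
type FunctionCall struct {
	Name      string
	Arguments string
}

type ToolCall struct {
	Index        int
	ID           string
	Type         string
	FunctionCall FunctionCall
}

// mergeToolCallDeltas folds chunks sharing an Index into one call:
// ID/Type/Name are set once, Arguments accumulate across chunks.
func mergeToolCallDeltas(existing, deltas []ToolCall) []ToolCall {
	byIndex := make(map[int]int, len(existing))
	for i, tc := range existing {
		byIndex[tc.Index] = i
	}
	for _, d := range deltas {
		pos, found := byIndex[d.Index]
		if !found {
			byIndex[d.Index] = len(existing)
			existing = append(existing, d)
			continue
		}
		tc := &existing[pos]
		if d.ID != "" {
			tc.ID = d.ID
		}
		if d.Type != "" {
			tc.Type = d.Type
		}
		if d.FunctionCall.Name != "" {
			tc.FunctionCall.Name = d.FunctionCall.Name
		}
		tc.FunctionCall.Arguments += d.FunctionCall.Arguments
	}
	return existing
}

func main() {
	var calls []ToolCall
	// First chunk carries identity; later chunks stream argument fragments.
	calls = mergeToolCallDeltas(calls, []ToolCall{{Index: 0, ID: "c1", Type: "function",
		FunctionCall: FunctionCall{Name: "get_weather", Arguments: `{"cit`}}})
	calls = mergeToolCallDeltas(calls, []ToolCall{{Index: 0,
		FunctionCall: FunctionCall{Arguments: `y":"Rome"}`}}})
	fmt.Println(calls[0].ID, calls[0].FunctionCall.Name, calls[0].FunctionCall.Arguments)
}
```

Keying on `Index` rather than `ID` matters because, in SSE streaming, only the first chunk of a tool call carries its ID.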
// ChatEndpoint is the OpenAI Completion API endpoint https://platform.openai.com/docs/api-reference/chat/create
// @Summary Generate a chat completions for a given prompt and model.
// @Param request body schema.OpenAIRequest true "query params"
@@ -405,6 +437,100 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
shouldUseFn := len(input.Functions) > 0 && config.ShouldUseFunctions()
strictMode := false
// MCP tool injection: when mcp_servers is set in metadata and model has MCP config
var mcpToolInfos []mcpTools.MCPToolInfo
mcpServers := mcpTools.MCPServersFromMetadata(input.Metadata)
// MCP prompt and resource injection (extracted before tool injection)
mcpPromptName, mcpPromptArgs := mcpTools.MCPPromptFromMetadata(input.Metadata)
mcpResourceURIs := mcpTools.MCPResourcesFromMetadata(input.Metadata)
if (len(mcpServers) > 0 || mcpPromptName != "" || len(mcpResourceURIs) > 0) && (config.MCP.Servers != "" || config.MCP.Stdio != "") {
remote, stdio, mcpErr := config.MCP.MCPConfigFromYAML()
if mcpErr == nil {
namedSessions, sessErr := mcpTools.NamedSessionsFromMCPConfig(config.Name, remote, stdio, mcpServers)
if sessErr == nil && len(namedSessions) > 0 {
// Prompt injection: prepend prompt messages to the conversation
if mcpPromptName != "" {
prompts, discErr := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if discErr == nil {
promptMsgs, getErr := mcpTools.GetMCPPrompt(c.Request().Context(), prompts, mcpPromptName, mcpPromptArgs)
if getErr == nil {
var injected []schema.Message
for _, pm := range promptMsgs {
injected = append(injected, schema.Message{
Role: string(pm.Role),
Content: mcpTools.PromptMessageToText(pm),
})
}
input.Messages = append(injected, input.Messages...)
xlog.Debug("MCP prompt injected", "prompt", mcpPromptName, "messages", len(injected))
} else {
xlog.Error("Failed to get MCP prompt", "error", getErr)
}
} else {
xlog.Error("Failed to discover MCP prompts", "error", discErr)
}
}
// Resource injection: append resource content to the last user message
if len(mcpResourceURIs) > 0 {
resources, discErr := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if discErr == nil {
var resourceTexts []string
for _, uri := range mcpResourceURIs {
content, readErr := mcpTools.ReadMCPResource(c.Request().Context(), resources, uri)
if readErr != nil {
xlog.Error("Failed to read MCP resource", "error", readErr, "uri", uri)
continue
}
// Find resource name
name := uri
for _, r := range resources {
if r.URI == uri {
name = r.Name
break
}
}
resourceTexts = append(resourceTexts, fmt.Sprintf("--- MCP Resource: %s ---\n%s", name, content))
}
if len(resourceTexts) > 0 && len(input.Messages) > 0 {
lastIdx := len(input.Messages) - 1
suffix := "\n\n" + strings.Join(resourceTexts, "\n\n")
switch ct := input.Messages[lastIdx].Content.(type) {
case string:
input.Messages[lastIdx].Content = ct + suffix
default:
input.Messages[lastIdx].Content = fmt.Sprintf("%v%s", ct, suffix)
}
xlog.Debug("MCP resources injected", "count", len(resourceTexts))
}
} else {
xlog.Error("Failed to discover MCP resources", "error", discErr)
}
}
// Tool injection
if len(mcpServers) > 0 {
discovered, discErr := mcpTools.DiscoverMCPTools(c.Request().Context(), namedSessions)
if discErr == nil {
mcpToolInfos = discovered
for _, ti := range mcpToolInfos {
funcs = append(funcs, ti.Function)
input.Tools = append(input.Tools, functions.Tool{Type: "function", Function: ti.Function})
}
shouldUseFn = len(funcs) > 0 && config.ShouldUseFunctions()
xlog.Debug("MCP tools injected", "count", len(mcpToolInfos), "total_funcs", len(funcs))
} else {
xlog.Error("Failed to discover MCP tools", "error", discErr)
}
}
}
} else {
xlog.Error("Failed to parse MCP config", "error", mcpErr)
}
}
xlog.Debug("Tool call routing decision",
"shouldUseFn", shouldUseFn,
"len(input.Functions)", len(input.Functions),
@@ -552,6 +678,19 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
c.Response().Header().Set("Connection", "keep-alive")
c.Response().Header().Set("X-Correlation-ID", id)
mcpStreamMaxIterations := 10
if config.Agent.MaxIterations > 0 {
mcpStreamMaxIterations = config.Agent.MaxIterations
}
hasMCPToolsStream := len(mcpToolInfos) > 0
for mcpStreamIter := 0; mcpStreamIter <= mcpStreamMaxIterations; mcpStreamIter++ {
// Re-template on MCP iterations
if mcpStreamIter > 0 && !config.TemplateConfig.UseTokenizerTemplate {
predInput = evaluator.TemplateMessages(*input, input.Messages, config, funcs, shouldUseFn)
xlog.Debug("MCP stream re-templating", "iteration", mcpStreamIter)
}
responses := make(chan schema.OpenAIResponse)
ended := make(chan error, 1)
@@ -565,6 +704,8 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
usage := &schema.OpenAIUsage{}
toolsCalled := false
var collectedToolCalls []schema.ToolCall
var collectedContent string
LOOP:
for {
@@ -582,6 +723,18 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
usage = &ev.Usage // Copy a pointer to the latest usage chunk so that the stop message can reference it
if len(ev.Choices[0].Delta.ToolCalls) > 0 {
toolsCalled = true
// Collect and merge tool call deltas for MCP execution
if hasMCPToolsStream {
collectedToolCalls = mergeToolCallDeltas(collectedToolCalls, ev.Choices[0].Delta.ToolCalls)
}
}
// Collect content for MCP conversation history
if hasMCPToolsStream && ev.Choices[0].Delta != nil && ev.Choices[0].Delta.Content != nil {
if s, ok := ev.Choices[0].Delta.Content.(string); ok {
collectedContent += s
} else if sp, ok := ev.Choices[0].Delta.Content.(*string); ok && sp != nil {
collectedContent += *sp
}
}
respData, err := json.Marshal(ev)
if err != nil {
@@ -632,6 +785,64 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
}
}
// MCP streaming tool execution: if we collected MCP tool calls, execute and loop
if hasMCPToolsStream && toolsCalled && len(collectedToolCalls) > 0 {
var hasMCPCalls bool
for _, tc := range collectedToolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
hasMCPCalls = true
break
}
}
if hasMCPCalls {
// Append assistant message with tool_calls
assistantMsg := schema.Message{
Role: "assistant",
Content: collectedContent,
ToolCalls: collectedToolCalls,
}
input.Messages = append(input.Messages, assistantMsg)
// Execute MCP tool calls and stream results as tool_result events
for _, tc := range collectedToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool (stream)", "tool", tc.FunctionCall.Name, "iteration", mcpStreamIter)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
input.Messages = append(input.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
// Stream tool result event to client
mcpEvent := map[string]any{
"type": "mcp_tool_result",
"name": tc.FunctionCall.Name,
"result": toolResult,
}
if mcpEventData, err := json.Marshal(mcpEvent); err == nil {
fmt.Fprintf(c.Response().Writer, "data: %s\n\n", mcpEventData)
c.Response().Flush()
}
}
xlog.Debug("MCP streaming tools executed, re-running inference", "iteration", mcpStreamIter)
continue // next MCP stream iteration
}
}
// No MCP tools to execute, send final stop message
finishReason := FinishReasonStop
if toolsCalled && len(input.Tools) > 0 {
finishReason = FinishReasonToolCalls
@@ -659,9 +870,28 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
c.Response().Flush()
xlog.Debug("Stream ended")
return nil
} // end MCP stream iteration loop
// Safety fallback
fmt.Fprintf(c.Response().Writer, "data: [DONE]\n\n")
c.Response().Flush()
return nil
// no streaming mode
default:
mcpMaxIterations := 10
if config.Agent.MaxIterations > 0 {
mcpMaxIterations = config.Agent.MaxIterations
}
hasMCPTools := len(mcpToolInfos) > 0
for mcpIteration := 0; mcpIteration <= mcpMaxIterations; mcpIteration++ {
// Re-template on each MCP iteration since messages may have changed
if mcpIteration > 0 && !config.TemplateConfig.UseTokenizerTemplate {
predInput = evaluator.TemplateMessages(*input, input.Messages, config, funcs, shouldUseFn)
xlog.Debug("MCP re-templating", "iteration", mcpIteration, "prompt_len", len(predInput))
}
// Detect if thinking token is already in prompt or template
var template string
if config.TemplateConfig.UseTokenizerTemplate {
@@ -839,6 +1069,75 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
})
}
// MCP server-side tool execution loop:
// If we have MCP tools and the model returned tool_calls, execute MCP tools
// and re-run inference with the results appended to the conversation.
if hasMCPTools && len(result) > 0 {
var mcpCallsExecuted bool
for _, choice := range result {
if choice.Message == nil || len(choice.Message.ToolCalls) == 0 {
continue
}
// Check if any tool calls are MCP tools
var hasMCPCalls bool
for _, tc := range choice.Message.ToolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
hasMCPCalls = true
break
}
}
if !hasMCPCalls {
continue
}
// Append assistant message with tool_calls to conversation
assistantContent := ""
if choice.Message.Content != nil {
if s, ok := choice.Message.Content.(string); ok {
assistantContent = s
} else if sp, ok := choice.Message.Content.(*string); ok && sp != nil {
assistantContent = *sp
}
}
assistantMsg := schema.Message{
Role: "assistant",
Content: assistantContent,
ToolCalls: choice.Message.ToolCalls,
}
input.Messages = append(input.Messages, assistantMsg)
// Execute each MCP tool call and append results
for _, tc := range choice.Message.ToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool", "tool", tc.FunctionCall.Name, "arguments", tc.FunctionCall.Arguments, "iteration", mcpIteration)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
input.Messages = append(input.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
mcpCallsExecuted = true
}
}
if mcpCallsExecuted {
xlog.Debug("MCP tools executed, re-running inference", "iteration", mcpIteration, "messages", len(input.Messages))
continue // next MCP iteration
}
}
// No MCP tools to execute (or no MCP tools configured), return response
usage := schema.OpenAIUsage{
PromptTokens: tokenUsage.Prompt,
CompletionTokens: tokenUsage.Completion,
@@ -862,6 +1161,10 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
// Return the prediction in the response body
return c.JSON(200, resp)
} // end MCP iteration loop
// Should not reach here, but safety fallback
return fmt.Errorf("MCP iteration limit reached")
}
}
}

File diff suppressed because it is too large

@@ -308,7 +308,7 @@ func handleWSResponseCreate(connCtx context.Context, conn *lockedConn, input *sc
defer close(processDone)
store.UpdateStatus(responseID, schema.ORStatusInProgress, nil)
finalResponse, bgErr := handleBackgroundStream(reqCtx, store, responseID, createdAt, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn)
finalResponse, bgErr := handleBackgroundStream(reqCtx, store, responseID, createdAt, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn, nil, nil)
if bgErr != nil {
xlog.Error("WebSocket Responses: processing failed", "response_id", responseID, "error", bgErr)
now := time.Now().Unix()

File diff suppressed because it is too large

@@ -16,7 +16,9 @@
"highlight.js": "^11.11.1",
"marked": "^15.0.7",
"dompurify": "^3.2.5",
"@fortawesome/fontawesome-free": "^6.7.2"
"@fortawesome/fontawesome-free": "^6.7.2",
"@modelcontextprotocol/sdk": "^1.25.1",
"@modelcontextprotocol/ext-apps": "^1.2.2"
},
"devDependencies": {
"@vitejs/plugin-react": "^4.5.2",


@@ -16,6 +16,10 @@
transition: margin-left var(--duration-normal) var(--ease-default);
}
.sidebar-is-collapsed .main-content {
margin-left: var(--sidebar-width-collapsed);
}
.main-content-inner {
flex: 1;
display: flex;
@@ -136,7 +140,8 @@
z-index: 50;
overflow-y: auto;
box-shadow: var(--shadow-sidebar);
transition: transform var(--duration-normal) var(--ease-default);
transition: width var(--duration-normal) var(--ease-default),
transform var(--duration-normal) var(--ease-default);
}
.sidebar-overlay {
@@ -147,8 +152,9 @@
display: flex;
align-items: center;
justify-content: space-between;
padding: var(--spacing-md);
padding: var(--spacing-sm) var(--spacing-sm);
border-bottom: 1px solid var(--color-border-subtle);
min-height: 44px;
}
.sidebar-logo-link {
@@ -157,11 +163,20 @@
.sidebar-logo-img {
width: 100%;
max-width: 140px;
max-width: 120px;
height: auto;
padding: 0 var(--spacing-xs);
}
.sidebar-logo-icon {
display: none;
}
.sidebar-logo-icon-img {
width: 28px;
height: 28px;
}
.sidebar-close-btn {
display: none;
background: none;
@@ -173,33 +188,37 @@
.sidebar-nav {
flex: 1;
padding: var(--spacing-xs) 0;
padding: 2px 0;
overflow-y: auto;
}
.sidebar-section {
padding: var(--spacing-xs) 0;
padding: 2px 0;
}
.sidebar-section-title {
padding: var(--spacing-sm) var(--spacing-md) var(--spacing-xs);
font-size: 0.6875rem;
padding: var(--spacing-xs) var(--spacing-sm) 2px;
font-size: 0.625rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--color-text-muted);
white-space: nowrap;
overflow: hidden;
}
.nav-item {
display: flex;
align-items: center;
gap: var(--spacing-sm);
padding: var(--spacing-sm) var(--spacing-md);
padding: 6px var(--spacing-sm);
color: var(--color-text-secondary);
text-decoration: none;
font-size: 0.875rem;
font-size: 0.8125rem;
transition: all var(--duration-fast) var(--ease-default);
border-left: 3px solid transparent;
white-space: nowrap;
overflow: hidden;
}
.nav-item:hover {
@@ -215,17 +234,100 @@
}
.nav-icon {
width: 20px;
width: 18px;
text-align: center;
flex-shrink: 0;
font-size: 0.85rem;
}
.nav-label {
flex: 1;
overflow: hidden;
text-overflow: ellipsis;
}
.nav-external {
font-size: 0.55rem;
margin-left: auto;
opacity: 0.5;
flex-shrink: 0;
}
.sidebar-footer {
padding: var(--spacing-sm) var(--spacing-md);
padding: var(--spacing-xs) var(--spacing-sm);
border-top: 1px solid var(--color-border-subtle);
display: flex;
align-items: center;
justify-content: space-between;
gap: var(--spacing-xs);
}
.sidebar-collapse-btn {
background: none;
border: none;
color: var(--color-text-muted);
cursor: pointer;
padding: 4px;
border-radius: var(--radius-sm);
font-size: 0.75rem;
transition: color var(--duration-fast);
flex-shrink: 0;
}
.sidebar-collapse-btn:hover {
color: var(--color-text-primary);
}
/* Collapsed sidebar (desktop only) */
.sidebar.collapsed {
width: var(--sidebar-width-collapsed);
}
.sidebar.collapsed .sidebar-logo-link {
display: none;
}
.sidebar.collapsed .sidebar-logo-icon {
display: flex;
align-items: center;
justify-content: center;
width: 100%;
}
.sidebar.collapsed .sidebar-header {
justify-content: center;
}
.sidebar.collapsed .nav-label,
.sidebar.collapsed .nav-external,
.sidebar.collapsed .sidebar-section-title {
display: none;
}
.sidebar.collapsed .nav-item {
justify-content: center;
padding: 8px 0;
border-left-width: 2px;
}
.sidebar.collapsed .nav-icon {
width: auto;
font-size: 1rem;
}
.sidebar.collapsed .sidebar-footer {
justify-content: center;
flex-direction: column;
gap: var(--spacing-xs);
}
.sidebar.collapsed .theme-toggle {
padding: 4px;
font-size: 0.75rem;
}
.sidebar.collapsed .theme-toggle .nav-label {
display: none;
}
/* Theme toggle */
@@ -1696,19 +1798,129 @@
border-color: var(--color-primary-border);
color: var(--color-primary);
}
/* Chat MCP toggle switch */
.chat-mcp-switch {
/* Chat MCP dropdown */
.chat-mcp-dropdown {
position: relative;
display: inline-block;
}
.chat-mcp-dropdown .btn {
display: flex;
align-items: center;
gap: 6px;
cursor: pointer;
user-select: none;
gap: 5px;
}
.chat-mcp-switch-label {
font-size: 0.75rem;
font-weight: 500;
.chat-mcp-badge {
display: inline-flex;
align-items: center;
justify-content: center;
min-width: 18px;
height: 18px;
padding: 0 5px;
border-radius: 9px;
background: rgba(255,255,255,0.25);
font-size: 0.7rem;
font-weight: 600;
line-height: 1;
}
.chat-mcp-dropdown-menu {
position: absolute;
top: calc(100% + 4px);
right: 0;
z-index: 100;
min-width: 240px;
max-height: 320px;
overflow-y: auto;
background: var(--color-bg-primary);
border: 1px solid var(--color-border-subtle);
border-radius: var(--radius-md);
box-shadow: var(--shadow-lg);
}
.chat-mcp-dropdown-loading,
.chat-mcp-dropdown-empty {
padding: var(--spacing-sm) var(--spacing-md);
font-size: 0.8125rem;
color: var(--color-text-secondary);
}
.chat-mcp-dropdown-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: var(--spacing-xs) var(--spacing-md);
border-bottom: 1px solid var(--color-border-divider);
font-size: 0.75rem;
font-weight: 600;
color: var(--color-text-secondary);
text-transform: uppercase;
letter-spacing: 0.03em;
}
.chat-mcp-select-all {
background: none;
border: none;
padding: 0;
font-size: 0.75rem;
color: var(--color-accent);
cursor: pointer;
text-transform: none;
letter-spacing: 0;
}
.chat-mcp-select-all:hover {
text-decoration: underline;
}
.chat-mcp-server-item {
display: flex;
align-items: center;
gap: 8px;
padding: var(--spacing-xs) var(--spacing-md);
cursor: pointer;
transition: background 120ms;
}
.chat-mcp-server-item:hover {
background: var(--color-bg-hover);
}
.chat-mcp-server-item input[type="checkbox"] {
flex-shrink: 0;
}
.chat-mcp-server-info {
display: flex;
flex-direction: column;
gap: 1px;
min-width: 0;
}
.chat-mcp-server-name {
font-size: 0.8125rem;
font-weight: 500;
color: var(--color-text-primary);
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.chat-mcp-server-tools {
font-size: 0.7rem;
color: var(--color-text-tertiary);
}
/* Client MCP status indicators */
.chat-client-mcp-status {
display: inline-block;
width: 8px;
height: 8px;
border-radius: 50%;
flex-shrink: 0;
background: var(--color-text-tertiary);
}
.chat-client-mcp-status-connected {
background: #22c55e;
box-shadow: 0 0 4px rgba(34, 197, 94, 0.5);
}
.chat-client-mcp-status-connecting {
background: #f59e0b;
animation: pulse 1s infinite;
}
.chat-client-mcp-status-error {
background: #ef4444;
}
.chat-client-mcp-status-disconnected {
background: var(--color-text-tertiary);
}
/* Chat model info panel */
.chat-model-info-panel {
@@ -2035,7 +2247,8 @@
/* Responsive */
@media (max-width: 1023px) {
.main-content {
.main-content,
.sidebar-is-collapsed .main-content {
margin-left: 0;
}
@@ -2045,6 +2258,11 @@
.sidebar {
transform: translateX(-100%);
width: var(--sidebar-width);
}
.sidebar.collapsed {
width: var(--sidebar-width);
}
.sidebar.open {
@@ -2055,6 +2273,39 @@
display: block;
}
.sidebar-collapse-btn {
display: none;
}
.sidebar.collapsed .nav-label,
.sidebar.collapsed .nav-external,
.sidebar.collapsed .sidebar-section-title {
display: unset;
}
.sidebar.collapsed .sidebar-logo-link {
display: block;
}
.sidebar.collapsed .sidebar-logo-icon {
display: none;
}
.sidebar.collapsed .nav-item {
justify-content: flex-start;
padding: 6px var(--spacing-sm);
border-left-width: 3px;
}
.sidebar.collapsed .nav-icon {
width: 18px;
font-size: 0.85rem;
}
.sidebar.collapsed .sidebar-header {
justify-content: space-between;
}
.sidebar-overlay {
display: block;
position: fixed;
@@ -2388,3 +2639,37 @@
gap: var(--spacing-xs);
}
}
/* MCP App Frame */
.mcp-app-frame-container {
width: 100%;
margin: var(--spacing-sm) 0;
border-radius: var(--border-radius-md);
overflow: hidden;
border: 1px solid var(--color-border-subtle);
}
.mcp-app-iframe {
width: 100%;
border: none;
display: block;
min-height: 100px;
max-height: 600px;
transition: height 0.2s ease;
background: var(--color-bg-primary);
}
.mcp-app-error {
padding: var(--spacing-sm) var(--spacing-md);
color: var(--color-text-danger, #e53e3e);
font-size: 0.85rem;
}
.mcp-app-reconnect-overlay {
padding: var(--spacing-sm);
text-align: center;
font-size: 0.8rem;
color: var(--color-text-secondary);
background: var(--color-bg-secondary);
border-top: 1px solid var(--color-border-subtle);
}


@@ -5,8 +5,13 @@ import OperationsBar from './components/OperationsBar'
import { ToastContainer, useToast } from './components/Toast'
import { systemApi } from './utils/api'
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
export default function App() {
const [sidebarOpen, setSidebarOpen] = useState(false)
const [sidebarCollapsed, setSidebarCollapsed] = useState(() => {
try { return localStorage.getItem(COLLAPSED_KEY) === 'true' } catch (_) { return false }
})
const { toasts, addToast, removeToast } = useToast()
const [version, setVersion] = useState('')
const location = useLocation()
@@ -18,8 +23,20 @@ export default function App() {
.catch(() => {})
}, [])
useEffect(() => {
const handler = (e) => setSidebarCollapsed(e.detail.collapsed)
window.addEventListener('sidebar-collapse', handler)
return () => window.removeEventListener('sidebar-collapse', handler)
}, [])
const layoutClasses = [
'app-layout',
isChatRoute ? 'app-layout-chat' : '',
sidebarCollapsed ? 'sidebar-is-collapsed' : '',
].filter(Boolean).join(' ')
return (
<div className={`app-layout${isChatRoute ? ' app-layout-chat' : ''}`}>
<div className={layoutClasses}>
<Sidebar isOpen={sidebarOpen} onClose={() => setSidebarOpen(false)} />
<main className="main-content">
<OperationsBar />


@@ -0,0 +1,154 @@
import { useState, useEffect, useRef, useCallback } from 'react'
import { loadClientMCPServers, addClientMCPServer, removeClientMCPServer } from '../utils/mcpClientStorage'
export default function ClientMCPDropdown({
activeServerIds = [],
onToggleServer,
onServerAdded,
onServerRemoved,
connectionStatuses = {},
getConnectedTools,
}) {
const [open, setOpen] = useState(false)
const [addDialog, setAddDialog] = useState(false)
const [servers, setServers] = useState(() => loadClientMCPServers())
const [url, setUrl] = useState('')
const [name, setName] = useState('')
const [authToken, setAuthToken] = useState('')
const [useProxy, setUseProxy] = useState(true)
const ref = useRef(null)
useEffect(() => {
if (!open) return
const handleClick = (e) => {
if (ref.current && !ref.current.contains(e.target)) setOpen(false)
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [open])
const handleAdd = useCallback(() => {
if (!url.trim()) return
const headers = {}
if (authToken.trim()) {
headers.Authorization = `Bearer ${authToken.trim()}`
}
const server = addClientMCPServer({ name: name.trim() || undefined, url: url.trim(), headers, useProxy })
setServers(loadClientMCPServers())
setUrl('')
setName('')
setAuthToken('')
setUseProxy(true)
setAddDialog(false)
if (onServerAdded) onServerAdded(server)
}, [url, name, authToken, useProxy, onServerAdded])
const handleRemove = useCallback((id) => {
removeClientMCPServer(id)
setServers(loadClientMCPServers())
if (onServerRemoved) onServerRemoved(id)
}, [onServerRemoved])
const activeCount = activeServerIds.length
return (
<div className="chat-mcp-dropdown" ref={ref}>
<button
type="button"
className={`btn btn-sm ${activeCount > 0 ? 'btn-primary' : 'btn-secondary'}`}
title="Client-side MCP servers (browser connects directly)"
onClick={() => setOpen(!open)}
>
<i className="fas fa-globe" /> Client MCP
{activeCount > 0 && (
<span className="chat-mcp-badge">{activeCount}</span>
)}
</button>
{open && (
<div className="chat-mcp-dropdown-menu" style={{ minWidth: '280px' }}>
<div className="chat-mcp-dropdown-header">
<span>Client MCP Servers</span>
<button className="chat-mcp-select-all" onClick={() => setAddDialog(!addDialog)}>
<i className="fas fa-plus" /> Add
</button>
</div>
{addDialog && (
<div style={{ padding: '8px 10px', borderBottom: '1px solid var(--color-border)' }}>
<input
type="text"
className="input input-sm"
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
value={url}
onChange={e => setUrl(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
/>
<input
type="text"
className="input input-sm"
placeholder="Name (optional)"
value={name}
onChange={e => setName(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
/>
<input
type="password"
className="input input-sm"
placeholder="Auth token (optional)"
value={authToken}
onChange={e => setAuthToken(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
Use CORS proxy
</label>
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
<button className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
<button className="btn btn-sm btn-primary" onClick={handleAdd} disabled={!url.trim()}>Add</button>
</div>
</div>
)}
{servers.length === 0 && !addDialog ? (
<div className="chat-mcp-dropdown-empty">No client MCP servers configured</div>
) : (
servers.map(server => {
const status = connectionStatuses[server.id]?.status || 'disconnected'
const isActive = activeServerIds.includes(server.id)
const connTools = getConnectedTools?.().find(c => c.serverId === server.id)
return (
<label key={server.id} className="chat-mcp-server-item">
<input
type="checkbox"
checked={isActive}
onChange={() => onToggleServer(server.id)}
/>
<div className="chat-mcp-server-info" style={{ flex: 1 }}>
<div style={{ display: 'flex', alignItems: 'center', gap: '6px' }}>
<span className={`chat-client-mcp-status chat-client-mcp-status-${status}`} />
<span className="chat-mcp-server-name">{server.name}</span>
{server.headers?.Authorization && <i className="fas fa-lock" style={{ fontSize: '0.65rem', opacity: 0.5 }} title="Authenticated" />}
</div>
<span className="chat-mcp-server-tools">
{status === 'connecting' ? 'Connecting...' :
status === 'error' ? (connectionStatuses[server.id]?.error || 'Error') :
status === 'connected' && connTools ? `${connTools.tools.length} tools` :
server.url}
</span>
</div>
<button
className="btn btn-sm"
style={{ padding: '2px 6px', fontSize: '0.7rem', color: 'var(--color-error)' }}
onClick={(e) => { e.preventDefault(); e.stopPropagation(); handleRemove(server.id) }}
title="Remove server"
>
<i className="fas fa-trash" />
</button>
</label>
)
})
)}
</div>
)}
</div>
)
}


@@ -0,0 +1,104 @@
import { useRef, useEffect, useState, useCallback } from 'react'
import { AppBridge, PostMessageTransport, buildAllowAttribute } from '@modelcontextprotocol/ext-apps/app-bridge'
export default function MCPAppFrame({ toolName, toolInput, toolResult, mcpClient, toolDefinition: _toolDefinition, appHtml, resourceMeta }) {
const iframeRef = useRef(null)
const bridgeRef = useRef(null)
const [iframeHeight, setIframeHeight] = useState(200)
const [error, setError] = useState(null)
const initializedRef = useRef(false)
const setupBridge = useCallback(async () => {
if (!mcpClient || !iframeRef.current || initializedRef.current) return
const iframe = iframeRef.current
initializedRef.current = true
try {
const transport = new PostMessageTransport(iframe.contentWindow, iframe.contentWindow)
const bridge = new AppBridge(
mcpClient,
{ name: 'LocalAI', version: '1.0.0' },
{ openLinks: {}, serverTools: {}, serverResources: {}, logging: {} },
{ hostContext: { displayMode: 'inline' } }
)
bridge.oninitialized = () => {
if (toolInput) bridge.sendToolInput({ arguments: toolInput })
if (toolResult) bridge.sendToolResult(toolResult)
}
bridge.onsizechange = ({ height }) => {
if (height && height > 0) setIframeHeight(Math.min(height, 600))
}
bridge.onopenlink = async ({ url }) => {
window.open(url, '_blank', 'noopener,noreferrer')
return {}
}
bridge.onmessage = async () => {
return {}
}
bridge.onrequestdisplaymode = async () => {
return { mode: 'inline' }
}
await bridge.connect(transport)
bridgeRef.current = bridge
} catch (err) {
setError(`Bridge error: ${err.message}`)
}
}, [mcpClient, toolInput, toolResult])
const handleIframeLoad = useCallback(() => {
setupBridge()
}, [setupBridge])
// Send toolResult when it arrives after initialization
useEffect(() => {
if (bridgeRef.current && toolResult && initializedRef.current) {
bridgeRef.current.sendToolResult(toolResult)
}
}, [toolResult])
// Cleanup on unmount — only close the local transport, don't send
// teardownResource which would kill server-side state and cause
// "Connection closed" errors if the component remounts (e.g. when
// streaming ends and ActivityGroup takes over from StreamingActivity).
useEffect(() => {
return () => {
const bridge = bridgeRef.current
if (bridge) {
try { bridge.close() } catch (_) { /* ignore */ }
}
}
}, [])
if (!appHtml) return null
const permissions = resourceMeta?.permissions
const allowAttr = permissions ? buildAllowAttribute(permissions) : undefined
return (
<div className="mcp-app-frame-container">
<iframe
ref={iframeRef}
srcDoc={appHtml}
sandbox="allow-scripts allow-forms"
allow={allowAttr}
className="mcp-app-iframe"
style={{ height: `${iframeHeight}px` }}
onLoad={handleIframeLoad}
title={`MCP App: ${toolName || 'unknown'}`}
/>
{error && <div className="mcp-app-error">{error}</div>}
{!mcpClient && (
<div className="mcp-app-reconnect-overlay">
Reconnect to MCP server to interact with this app
</div>
)}
</div>
)
}


@@ -2,6 +2,8 @@ import { useState, useEffect } from 'react'
import { NavLink } from 'react-router-dom'
import ThemeToggle from './ThemeToggle'
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
const mainItems = [
{ path: '/', icon: 'fas fa-home', label: 'Home' },
{ path: '/browse', icon: 'fas fa-download', label: 'Install Models' },
@@ -28,7 +30,7 @@ const systemItems = [
{ path: '/settings', icon: 'fas fa-cog', label: 'Settings' },
]
function NavItem({ item, onClose }) {
function NavItem({ item, onClose, collapsed }) {
return (
<NavLink
to={item.path}
@@ -37,6 +39,7 @@ function NavItem({ item, onClose }) {
`nav-item ${isActive ? 'active' : ''}`
}
onClick={onClose}
title={collapsed ? item.label : undefined}
>
<i className={`${item.icon} nav-icon`} />
<span className="nav-label">{item.label}</span>
@@ -46,20 +49,36 @@ function NavItem({ item, onClose }) {
export default function Sidebar({ isOpen, onClose }) {
const [features, setFeatures] = useState({})
const [collapsed, setCollapsed] = useState(() => {
try { return localStorage.getItem(COLLAPSED_KEY) === 'true' } catch (_) { return false }
})
useEffect(() => {
fetch('/api/features').then(r => r.json()).then(setFeatures).catch(() => {})
}, [])
const toggleCollapse = () => {
setCollapsed(prev => {
const next = !prev
try { localStorage.setItem(COLLAPSED_KEY, String(next)) } catch (_) { /* ignore */ }
window.dispatchEvent(new CustomEvent('sidebar-collapse', { detail: { collapsed: next } }))
return next
})
}
return (
<>
{isOpen && <div className="sidebar-overlay" onClick={onClose} />}
<aside className={`sidebar ${isOpen ? 'open' : ''}`}>
<aside className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}>
{/* Logo */}
<div className="sidebar-header">
<a href="./" className="sidebar-logo-link">
<img src="/static/logo_horizontal.png" alt="LocalAI" className="sidebar-logo-img" />
</a>
<a href="./" className="sidebar-logo-icon" title="LocalAI">
<img src="/static/logo.png" alt="LocalAI" className="sidebar-logo-icon-img" />
</a>
<button className="sidebar-close-btn" onClick={onClose} aria-label="Close menu">
<i className="fas fa-times" />
</button>
@@ -70,7 +89,7 @@ export default function Sidebar({ isOpen, onClose }) {
{/* Main section */}
<div className="sidebar-section">
{mainItems.map(item => (
<NavItem key={item.path} item={item} onClose={onClose} />
<NavItem key={item.path} item={item} onClose={onClose} collapsed={collapsed} />
))}
</div>
@@ -79,7 +98,7 @@ export default function Sidebar({ isOpen, onClose }) {
<div className="sidebar-section">
<div className="sidebar-section-title">Agents</div>
{agentItems.filter(item => !item.feature || features[item.feature] !== false).map(item => (
<NavItem key={item.path} item={item} onClose={onClose} />
<NavItem key={item.path} item={item} onClose={onClose} collapsed={collapsed} />
))}
</div>
)}
@@ -92,13 +111,14 @@ export default function Sidebar({ isOpen, onClose }) {
target="_blank"
rel="noopener noreferrer"
className="nav-item"
title={collapsed ? 'API' : undefined}
>
<i className="fas fa-code nav-icon" />
<span className="nav-label">API</span>
<i className="fas fa-external-link-alt" style={{ fontSize: '0.6rem', marginLeft: 'auto', opacity: 0.5 }} />
<i className="fas fa-external-link-alt nav-external" />
</a>
{systemItems.map(item => (
<NavItem key={item.path} item={item} onClose={onClose} />
<NavItem key={item.path} item={item} onClose={onClose} collapsed={collapsed} />
))}
</div>
</nav>
@@ -106,6 +126,13 @@ export default function Sidebar({ isOpen, onClose }) {
{/* Footer */}
<div className="sidebar-footer">
<ThemeToggle />
<button
className="sidebar-collapse-btn"
onClick={toggleCollapse}
title={collapsed ? 'Expand sidebar' : 'Collapse sidebar'}
>
<i className={`fas fa-chevron-${collapsed ? 'right' : 'left'}`} />
</button>
</div>
</aside>
</>


@@ -52,6 +52,8 @@ function saveChats(chats, activeChatId) {
history: chat.history,
systemPrompt: chat.systemPrompt,
mcpMode: chat.mcpMode,
mcpServers: chat.mcpServers,
clientMCPServers: chat.clientMCPServers,
temperature: chat.temperature,
topP: chat.topP,
topK: chat.topK,
@@ -79,6 +81,9 @@ function createNewChat(model = '', systemPrompt = '', mcpMode = false) {
history: [],
systemPrompt,
mcpMode,
mcpServers: [],
mcpResources: [],
clientMCPServers: [],
temperature: null,
topP: null,
topK: null,
@@ -256,8 +261,28 @@ export function useChat(initialModel = '') {
if (topK !== null && topK !== undefined) requestBody.top_k = topK
if (contextSize) requestBody.max_tokens = contextSize
// Choose endpoint
const endpoint = activeChat.mcpMode
// MCP: send selected servers via metadata so the backend activates them
const hasMcpServers = activeChat.mcpServers && activeChat.mcpServers.length > 0
if (hasMcpServers) {
if (!requestBody.metadata) requestBody.metadata = {}
requestBody.metadata.mcp_servers = activeChat.mcpServers.join(',')
}
// MCP: send selected resource URIs via metadata
const hasMcpResources = activeChat.mcpResources && activeChat.mcpResources.length > 0
if (hasMcpResources) {
if (!requestBody.metadata) requestBody.metadata = {}
requestBody.metadata.mcp_resources = activeChat.mcpResources.join(',')
}
// Client-side MCP: inject tools into request body
if (options.clientMCPTools && options.clientMCPTools.length > 0) {
requestBody.tools = [...(requestBody.tools || []), ...options.clientMCPTools]
}
// Use MCP endpoint only for legacy mcpMode without specific servers selected
// (the MCP endpoint auto-enables all servers)
const endpoint = (activeChat.mcpMode && !hasMcpServers)
? API_CONFIG.endpoints.mcpChatCompletions
: API_CONFIG.endpoints.chatCompletions
@@ -277,8 +302,8 @@ export function useChat(initialModel = '') {
let usage = {}
const newMessages = [] // Accumulate messages to add to history
if (activeChat.mcpMode) {
// MCP SSE streaming
if (activeChat.mcpMode && !hasMcpServers) {
// Legacy MCP SSE streaming (custom event types from /v1/mcp/chat/completions)
try {
const timeoutId = setTimeout(() => controller.abort(), 300000) // 5 min timeout
const response = await fetch(endpoint, {
@@ -407,122 +432,250 @@ export function useChat(initialModel = '') {
}
}
} else {
// Regular SSE streaming
let rawContent = ''
let reasoningContent = ''
let hasReasoningFromAPI = false
let insideThinkTag = false
// Regular SSE streaming with client-side agentic loop support
const maxToolTurns = options.maxToolTurns || 10
let turnCount = 0
let loopMessages = [...messages]
let loopBody = { ...requestBody }
try {
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(requestBody),
signal: controller.signal,
})
// Outer loop: re-sends when client-side tool calls are detected
let continueLoop = true
while (continueLoop) {
continueLoop = false
if (!response.ok) {
throw new Error(`HTTP ${response.status}`)
}
let rawContent = ''
let reasoningContent = ''
let hasReasoningFromAPI = false
let insideThinkTag = false
let currentToolCalls = []
let finishReason = null
let fullToolCalls = [] // Tool calls with id for agentic loop
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
try {
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(loopBody),
signal: controller.signal,
})
while (true) {
const { done, value } = await reader.read()
if (done) break
if (!response.ok) {
throw new Error(`HTTP ${response.status}`)
}
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() || ''
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
for (const line of lines) {
const trimmed = line.trim()
if (!trimmed || !trimmed.startsWith('data: ')) continue
const data = trimmed.slice(6)
if (data === '[DONE]') continue
while (true) {
const { done, value } = await reader.read()
if (done) break
try {
const parsed = JSON.parse(data)
const delta = parsed?.choices?.[0]?.delta
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() || ''
// Handle reasoning field from API
if (delta?.reasoning) {
hasReasoningFromAPI = true
reasoningContent += delta.reasoning
tokenCountRef.current++
setStreamingReasoning(reasoningContent)
updateTps()
}
for (const line of lines) {
const trimmed = line.trim()
if (!trimmed || !trimmed.startsWith('data: ')) continue
const data = trimmed.slice(6)
if (data === '[DONE]') continue
if (delta?.content) {
rawContent += delta.content
tokenCountRef.current++
try {
const parsed = JSON.parse(data)
if (!hasReasoningFromAPI) {
// Check thinking tags
if (openThinkTagRegex.test(rawContent) && !closeThinkTagRegex.test(rawContent)) {
insideThinkTag = true
}
if (insideThinkTag && closeThinkTagRegex.test(rawContent)) {
insideThinkTag = false
}
// Handle MCP tool result events
if (parsed?.type === 'mcp_tool_result') {
currentToolCalls.push({
type: 'tool_result',
name: parsed.name || 'tool',
result: parsed.result || '',
})
setStreamingToolCalls([...currentToolCalls.filter(Boolean)])
continue
}
const { regularContent, thinkingContent } = extractThinking(rawContent)
if (thinkingContent) {
reasoningContent = thinkingContent
}
const choice = parsed?.choices?.[0]
const delta = choice?.delta
if (insideThinkTag) {
const lastOpen = Math.max(rawContent.lastIndexOf('<thinking>'), rawContent.lastIndexOf('<think>'))
if (lastOpen >= 0) {
const partial = rawContent.slice(lastOpen).replace(/<thinking>|<think>/, '')
setStreamingReasoning(partial)
// Only show content before the unclosed think tag (with prior complete pairs removed)
const beforeThink = rawContent.slice(0, lastOpen)
const { regularContent: contentBeforeThink } = extractThinking(beforeThink)
setStreamingContent(contentBeforeThink)
// Track finish_reason
if (choice?.finish_reason) {
finishReason = choice.finish_reason
}
// Handle reasoning field from API
if (delta?.reasoning) {
hasReasoningFromAPI = true
reasoningContent += delta.reasoning
tokenCountRef.current++
setStreamingReasoning(reasoningContent)
updateTps()
}
// Handle tool call deltas
if (delta?.tool_calls) {
for (const tc of delta.tool_calls) {
const idx = tc.index ?? 0
if (!currentToolCalls[idx]) {
currentToolCalls[idx] = {
type: 'tool_call',
name: tc.function?.name || '',
arguments: tc.function?.arguments || '',
}
fullToolCalls[idx] = {
id: tc.id || `call_${idx}`,
type: 'function',
function: { name: tc.function?.name || '', arguments: tc.function?.arguments || '' },
}
} else {
if (tc.function?.name) {
currentToolCalls[idx].name = tc.function.name
fullToolCalls[idx].function.name = tc.function.name
}
if (tc.function?.arguments) {
currentToolCalls[idx].arguments += tc.function.arguments
fullToolCalls[idx].function.arguments += tc.function.arguments
}
if (tc.id) fullToolCalls[idx].id = tc.id
}
}
setStreamingToolCalls([...currentToolCalls.filter(Boolean)])
}
if (delta?.content) {
rawContent += delta.content
tokenCountRef.current++
if (!hasReasoningFromAPI) {
if (openThinkTagRegex.test(rawContent) && !closeThinkTagRegex.test(rawContent)) {
insideThinkTag = true
}
if (insideThinkTag && closeThinkTagRegex.test(rawContent)) {
insideThinkTag = false
}
const { regularContent, thinkingContent } = extractThinking(rawContent)
if (thinkingContent) {
reasoningContent = thinkingContent
}
if (insideThinkTag) {
const lastOpen = Math.max(rawContent.lastIndexOf('<thinking>'), rawContent.lastIndexOf('<think>'))
if (lastOpen >= 0) {
const partial = rawContent.slice(lastOpen).replace(/<thinking>|<think>/, '')
setStreamingReasoning(partial)
const beforeThink = rawContent.slice(0, lastOpen)
const { regularContent: contentBeforeThink } = extractThinking(beforeThink)
setStreamingContent(contentBeforeThink)
} else {
setStreamingContent(regularContent)
}
} else {
setStreamingReasoning(reasoningContent)
setStreamingContent(regularContent)
}
} else {
setStreamingReasoning(reasoningContent)
setStreamingContent(regularContent)
setStreamingContent(rawContent)
}
} else {
setStreamingContent(rawContent)
}
updateTps()
updateTps()
}
if (parsed?.usage) {
usage = parsed.usage
}
} catch (_e) {
// skip malformed JSON
}
if (parsed?.usage) {
usage = parsed.usage
}
} catch (_e) {
// skip malformed JSON
}
}
} catch (err) {
if (err.name !== 'AbortError') {
rawContent += `\n\nError: ${err.message}`
}
}
} catch (err) {
if (err.name !== 'AbortError') {
rawContent += `\n\nError: ${err.message}`
// Client-side agentic loop: check for client tool calls
const validToolCalls = fullToolCalls.filter(Boolean)
const hasClientToolCalls = (
                  (finishReason === 'tool_calls' || (finishReason === 'stop' && validToolCalls.length > 0)) &&
validToolCalls.length > 0 &&
options.isClientTool &&
options.executeTool &&
turnCount < maxToolTurns
)
const clientCalls = hasClientToolCalls
? validToolCalls.filter(tc => options.isClientTool(tc.function?.name))
: []
if (clientCalls.length > 0) {
                  // Record the tool calls in the message history
for (const tc of clientCalls) {
newMessages.push({
role: 'tool_call',
content: JSON.stringify({ type: 'tool_call', name: tc.function.name, arguments: tc.function.arguments }, null, 2),
expanded: false,
})
}
// Build assistant message with tool_calls for conversation
const assistantMsg = {
role: 'assistant',
content: rawContent || null,
tool_calls: validToolCalls,
}
loopMessages.push(assistantMsg)
// Execute each client-side tool
for (const tc of clientCalls) {
const result = await options.executeTool(tc.function.name, tc.function.arguments)
const toolResultMsg = { role: 'tool', tool_call_id: tc.id, content: result }
loopMessages.push(toolResultMsg)
// Check for MCP App UI
let appUI = null
if (options.getToolAppUI) {
let parsedArgs
try {
parsedArgs = typeof tc.function.arguments === 'string'
? JSON.parse(tc.function.arguments) : tc.function.arguments
} catch (_) { parsedArgs = {} }
appUI = await options.getToolAppUI(tc.function.name, parsedArgs, result)
}
// Show result in UI
newMessages.push({
role: 'tool_result',
content: JSON.stringify({ type: 'tool_result', name: tc.function.name, result }, null, 2),
expanded: false,
appUI,
})
currentToolCalls.push({ type: 'tool_result', name: tc.function.name, result, appUI })
setStreamingToolCalls([...currentToolCalls.filter(Boolean)])
}
// Re-send with updated messages
loopBody = { ...requestBody, messages: loopMessages, stream: true }
setStreamingContent('')
turnCount++
continueLoop = true
continue
}
}
// Determine final content
let finalContent = rawContent
if (!hasReasoningFromAPI) {
const { regularContent, thinkingContent } = extractThinking(rawContent)
finalContent = regularContent
if (thinkingContent && !reasoningContent) reasoningContent = thinkingContent
}
// No more client tool calls — finalize
let finalContent = rawContent
if (!hasReasoningFromAPI) {
const { regularContent, thinkingContent } = extractThinking(rawContent)
finalContent = regularContent
if (thinkingContent && !reasoningContent) reasoningContent = thinkingContent
}
if (reasoningContent) {
newMessages.push({ role: 'thinking', content: reasoningContent, expanded: true })
}
if (finalContent) {
newMessages.push({ role: 'assistant', content: finalContent })
if (reasoningContent) {
newMessages.push({ role: 'thinking', content: reasoningContent, expanded: true })
}
if (finalContent) {
newMessages.push({ role: 'assistant', content: finalContent })
}
}
}
@@ -582,6 +735,17 @@ export function useChat(initialModel = '') {
const isActiveStreaming = isStreaming && streamingChatId === activeChatId
const addMessage = useCallback((chatId, message) => {
setChats(prev => prev.map(c => {
if (c.id !== chatId) return c
return {
...c,
history: [...c.history, { ...message, timestamp: Date.now() }],
updatedAt: Date.now(),
}
}))
}, [])
return {
chats,
activeChat,
@@ -603,5 +767,6 @@ export function useChat(initialModel = '') {
stopGeneration,
clearHistory,
getContextUsagePercent,
addMessage,
}
}
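The streaming handler above accumulates `delta.tool_calls` fragments by index, creating an entry on the first fragment and concatenating `arguments` chunks on subsequent ones. A minimal standalone sketch of that merge, outside the React hook (function and variable names here are illustrative, not part of the hook's API):

```javascript
// Merge OpenAI-style streamed tool_call deltas into complete calls.
// Each delta carries an index plus partial name/id/arguments fragments.
function mergeToolCallDeltas(deltas) {
  const calls = []
  for (const tc of deltas) {
    const idx = tc.index ?? 0
    if (!calls[idx]) {
      // First fragment for this index: initialize the call.
      calls[idx] = {
        id: tc.id || `call_${idx}`,
        type: 'function',
        function: { name: tc.function?.name || '', arguments: tc.function?.arguments || '' },
      }
    } else {
      // Later fragments: names/ids replace, argument chunks append.
      if (tc.function?.name) calls[idx].function.name = tc.function.name
      if (tc.function?.arguments) calls[idx].function.arguments += tc.function.arguments
      if (tc.id) calls[idx].id = tc.id
    }
  }
  return calls.filter(Boolean)
}

// Example: one call whose JSON arguments arrive split across three chunks.
const merged = mergeToolCallDeltas([
  { index: 0, id: 'call_abc', function: { name: 'get_weather', arguments: '{"cit' } },
  { index: 0, function: { arguments: 'y":"Ro' } },
  { index: 0, function: { arguments: 'me"}' } },
])
// merged[0].function.arguments === '{"city":"Rome"}'
```

The `filter(Boolean)` mirrors the hook's `fullToolCalls.filter(Boolean)`: sparse indices can occur if the model emits a higher index before a lower one.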


@@ -0,0 +1,244 @@
import { useState, useRef, useCallback } from 'react'
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js'
import { getToolUiResourceUri, isToolVisibilityAppOnly } from '@modelcontextprotocol/ext-apps/app-bridge'
import { API_CONFIG } from '../utils/config'
function buildProxyUrl(targetUrl, useProxy = true) {
if (!useProxy) return new URL(targetUrl)
const base = window.location.origin
return new URL(`${base}${API_CONFIG.endpoints.corsProxy}?url=${encodeURIComponent(targetUrl)}`)
}
export function useMCPClient() {
const connectionsRef = useRef(new Map())
const toolIndexRef = useRef(new Map())
const [connectionStatuses, setConnectionStatuses] = useState({})
const updateStatus = useCallback((serverId, status, error = null) => {
setConnectionStatuses(prev => ({ ...prev, [serverId]: { status, error } }))
}, [])
const connect = useCallback(async (serverConfig) => {
const { id, url, headers = {}, useProxy = true } = serverConfig
if (connectionsRef.current.has(id)) return
updateStatus(id, 'connecting')
const proxyUrl = buildProxyUrl(url, useProxy)
const transportHeaders = { ...headers }
let client = null
let transport = null
// Try StreamableHTTP first, then SSE fallback
for (const TransportClass of [StreamableHTTPClientTransport, SSEClientTransport]) {
try {
transport = new TransportClass(proxyUrl, { requestInit: { headers: transportHeaders } })
client = new Client({ name: 'LocalAI-WebUI', version: '1.0.0' })
await client.connect(transport)
break
} catch (err) {
client = null
transport = null
if (TransportClass === SSEClientTransport) {
updateStatus(id, 'error', err.message)
return
}
}
}
if (!client) {
updateStatus(id, 'error', 'Failed to connect with any transport')
return
}
try {
const { tools = [] } = await client.listTools()
// Remove old tool index entries for this server
for (const [toolName, sId] of toolIndexRef.current) {
if (sId === id) toolIndexRef.current.delete(toolName)
}
for (const tool of tools) {
toolIndexRef.current.set(tool.name, id)
}
connectionsRef.current.set(id, { client, transport, tools, serverConfig })
updateStatus(id, 'connected')
} catch (err) {
try { await client.close() } catch (_) { /* ignore */ }
updateStatus(id, 'error', err.message)
}
}, [updateStatus])
const disconnect = useCallback(async (serverId) => {
const conn = connectionsRef.current.get(serverId)
if (!conn) return
// Remove tool index entries
for (const [toolName, sId] of toolIndexRef.current) {
if (sId === serverId) toolIndexRef.current.delete(toolName)
}
try { await conn.client.close() } catch (_) { /* ignore */ }
connectionsRef.current.delete(serverId)
updateStatus(serverId, 'disconnected')
}, [updateStatus])
const disconnectAll = useCallback(async () => {
const ids = [...connectionsRef.current.keys()]
for (const id of ids) {
await disconnect(id)
}
}, [disconnect])
const getToolsForLLM = useCallback(() => {
const tools = []
for (const [, conn] of connectionsRef.current) {
for (const tool of conn.tools) {
if (isToolVisibilityAppOnly(tool)) continue
tools.push({
type: 'function',
function: {
name: tool.name,
description: tool.description || '',
parameters: tool.inputSchema || { type: 'object', properties: {} },
},
})
}
}
return tools
}, [])
const isClientTool = useCallback((toolName) => {
return toolIndexRef.current.has(toolName)
}, [])
const executeTool = useCallback(async (toolName, argumentsJson) => {
const serverId = toolIndexRef.current.get(toolName)
if (!serverId) return `Error: no MCP server found for tool "${toolName}"`
const conn = connectionsRef.current.get(serverId)
if (!conn) return `Error: server not connected for tool "${toolName}"`
let args
try {
args = typeof argumentsJson === 'string' ? JSON.parse(argumentsJson) : argumentsJson
} catch (_) {
args = {}
}
try {
const result = await conn.client.callTool({ name: toolName, arguments: args })
return formatToolResult(result)
} catch (err) {
// Session might have expired — try reconnecting once
if (err.message?.includes('404') || err.message?.includes('session')) {
try {
await disconnect(serverId)
await connect(conn.serverConfig)
const newConn = connectionsRef.current.get(serverId)
if (newConn) {
const result = await newConn.client.callTool({ name: toolName, arguments: args })
return formatToolResult(result)
}
} catch (retryErr) {
return `Error executing tool "${toolName}": ${retryErr.message}`
}
}
return `Error executing tool "${toolName}": ${err.message}`
}
}, [connect, disconnect])
const getConnectedTools = useCallback(() => {
const result = []
for (const [serverId, conn] of connectionsRef.current) {
result.push({
serverId,
serverName: conn.serverConfig.name,
tools: conn.tools.map(t => t.name),
})
}
return result
}, [])
const findToolAndConnection = useCallback((toolName) => {
const serverId = toolIndexRef.current.get(toolName)
if (!serverId) return null
const conn = connectionsRef.current.get(serverId)
if (!conn) return null
const tool = conn.tools.find(t => t.name === toolName)
if (!tool) return null
return { tool, conn }
}, [])
const hasAppUI = useCallback((toolName) => {
const found = findToolAndConnection(toolName)
if (!found) return false
return !!getToolUiResourceUri(found.tool)
}, [findToolAndConnection])
const getAppResource = useCallback(async (toolName) => {
const found = findToolAndConnection(toolName)
if (!found) return null
const uri = getToolUiResourceUri(found.tool)
if (!uri) return null
try {
const res = await found.conn.client.readResource({ uri })
const htmlContent = res.contents?.[0]
if (!htmlContent) return null
return {
html: htmlContent.text || '',
meta: found.tool._meta?.ui || {},
}
} catch (err) {
console.warn('Failed to fetch MCP app resource:', err)
return null
}
}, [findToolAndConnection])
const getClientForTool = useCallback((toolName) => {
const found = findToolAndConnection(toolName)
return found ? found.conn.client : null
}, [findToolAndConnection])
const getToolDefinition = useCallback((toolName) => {
const found = findToolAndConnection(toolName)
return found ? found.tool : null
}, [findToolAndConnection])
return {
connect,
disconnect,
disconnectAll,
getToolsForLLM,
isClientTool,
executeTool,
connectionStatuses,
getConnectedTools,
hasAppUI,
getAppResource,
getClientForTool,
getToolDefinition,
}
}
function formatToolResult(result) {
if (!result?.content) return ''
const parts = []
for (const item of result.content) {
if (item.type === 'text') {
parts.push(item.text)
} else if (item.type === 'image') {
parts.push(`[Image: ${item.mimeType || 'image'}]`)
} else if (item.type === 'resource') {
parts.push(item.resource?.text || JSON.stringify(item.resource))
} else {
parts.push(JSON.stringify(item))
}
}
return parts.join('\n')
}
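`formatToolResult` flattens the MCP `callTool` content array into the plain text that the agentic loop feeds back to the model as a `tool` message. A self-contained usage sketch of the same function with a typical mixed-content result (the sample payload is illustrative):

```javascript
// Flatten an MCP tool result's content array into a newline-joined string.
function formatToolResult(result) {
  if (!result?.content) return ''
  const parts = []
  for (const item of result.content) {
    if (item.type === 'text') {
      parts.push(item.text)
    } else if (item.type === 'image') {
      parts.push(`[Image: ${item.mimeType || 'image'}]`)
    } else if (item.type === 'resource') {
      parts.push(item.resource?.text || JSON.stringify(item.resource))
    } else {
      parts.push(JSON.stringify(item))
    }
  }
  return parts.join('\n')
}

// Text survives verbatim; binary items collapse to short placeholders.
const text = formatToolResult({
  content: [
    { type: 'text', text: 'Rome: 21C, sunny' },
    { type: 'image', mimeType: 'image/png', data: '...' },
  ],
})
// text === 'Rome: 21C, sunny\n[Image: image/png]'
```

Collapsing images to a `[Image: …]` marker keeps token usage bounded while still telling the model that non-text output existed.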


@@ -5,7 +5,11 @@ import ModelSelector from '../components/ModelSelector'
import { renderMarkdown, highlightAll } from '../utils/markdown'
import { extractCodeArtifacts, renderMarkdownWithArtifacts } from '../utils/artifacts'
import CanvasPanel from '../components/CanvasPanel'
import { fileToBase64, modelsApi } from '../utils/api'
import { fileToBase64, modelsApi, mcpApi } from '../utils/api'
import { useMCPClient } from '../hooks/useMCPClient'
import MCPAppFrame from '../components/MCPAppFrame'
import ClientMCPDropdown from '../components/ClientMCPDropdown'
import { loadClientMCPServers } from '../utils/mcpClientStorage'
function relativeTime(ts) {
if (!ts) return ''
@@ -60,7 +64,10 @@ function formatToolContent(raw) {
try {
const data = JSON.parse(raw)
const name = data.name || 'unknown'
const params = data.arguments || data.input || data.result || data.parameters || {}
let params = data.arguments || data.input || data.result || data.parameters || {}
if (typeof params === 'string') {
try { params = JSON.parse(params) } catch (_) { /* keep as string */ }
}
const entries = typeof params === 'object' && params !== null ? Object.entries(params) : []
return { name, entries, fallback: null }
} catch (_e) {
@@ -89,7 +96,7 @@ function ToolParams({ entries, fallback }) {
)
}
function ActivityGroup({ items, updateChatSettings, activeChat }) {
function ActivityGroup({ items, updateChatSettings, activeChat, getClientForTool }) {
const [expanded, setExpanded] = useState(false)
const contentRef = useRef(null)
@@ -99,7 +106,11 @@ function ActivityGroup({ items, updateChatSettings, activeChat }) {
if (!items || items.length === 0) return null
const labels = items.map(item => {
// Separate out tool_result items that have appUI — they render outside the collapsed group
const appUIItems = items.filter(item => item.role === 'tool_result' && item.appUI)
const regularItems = items.filter(item => !(item.role === 'tool_result' && item.appUI))
const labels = regularItems.map(item => {
if (item.role === 'thinking' || item.role === 'reasoning') return 'Thought'
if (item.role === 'tool_call') {
try { return JSON.parse(item.content)?.name || 'Tool' } catch (_e) { return 'Tool' }
@@ -112,40 +123,63 @@ function ActivityGroup({ items, updateChatSettings, activeChat }) {
const summary = labels.join(' → ')
return (
<div className="chat-message chat-message-assistant">
<div className="chat-message-avatar">
<i className="fas fa-cogs" />
</div>
<div className="chat-activity-group">
<button className="chat-activity-toggle" onClick={() => setExpanded(!expanded)}>
<span className="chat-activity-summary">{summary}</span>
<i className={`fas fa-chevron-${expanded ? 'up' : 'down'}`} />
</button>
{expanded && (
<div className="chat-activity-details" ref={contentRef}>
{items.map((item, idx) => {
if (item.role === 'thinking' || item.role === 'reasoning') {
return (
<div key={idx} className="chat-activity-item chat-activity-thinking">
<span className="chat-activity-item-label">Thought</span>
<div className="chat-activity-item-content"
dangerouslySetInnerHTML={{ __html: renderMarkdown(item.content || '') }} />
</div>
)
}
const isCall = item.role === 'tool_call'
const parsed = formatToolContent(item.content)
return (
<div key={idx} className={`chat-activity-item ${isCall ? 'chat-activity-tool-call' : 'chat-activity-tool-result'}`}>
<span className="chat-activity-item-label">{labels[idx]}</span>
<ToolParams entries={parsed.entries} fallback={parsed.fallback} />
</div>
)
})}
<>
{regularItems.length > 0 && (
<div className="chat-message chat-message-assistant">
<div className="chat-message-avatar">
<i className="fas fa-cogs" />
</div>
)}
</div>
</div>
<div className="chat-activity-group">
<button className="chat-activity-toggle" onClick={() => setExpanded(!expanded)}>
<span className="chat-activity-summary">{summary}</span>
<i className={`fas fa-chevron-${expanded ? 'up' : 'down'}`} />
</button>
{expanded && (
<div className="chat-activity-details" ref={contentRef}>
{regularItems.map((item, idx) => {
if (item.role === 'thinking' || item.role === 'reasoning') {
return (
<div key={idx} className="chat-activity-item chat-activity-thinking">
<span className="chat-activity-item-label">Thought</span>
<div className="chat-activity-item-content"
dangerouslySetInnerHTML={{ __html: renderMarkdown(item.content || '') }} />
</div>
)
}
const isCall = item.role === 'tool_call'
const parsed = formatToolContent(item.content)
return (
<div key={idx} className={`chat-activity-item ${isCall ? 'chat-activity-tool-call' : 'chat-activity-tool-result'}`}>
<span className="chat-activity-item-label">{labels[idx]}</span>
<ToolParams entries={parsed.entries} fallback={parsed.fallback} />
</div>
)
})}
</div>
)}
</div>
</div>
)}
{appUIItems.map((item, idx) => (
<div key={`appui-${idx}`} className="chat-message chat-message-assistant">
<div className="chat-message-avatar">
<i className="fas fa-puzzle-piece" />
</div>
<div className="chat-message-bubble">
<span className="chat-message-model">{item.appUI.toolName}</span>
<MCPAppFrame
toolName={item.appUI.toolName}
toolInput={item.appUI.toolInput}
toolResult={item.appUI.toolResult}
mcpClient={getClientForTool?.(item.appUI.toolName) || null}
toolDefinition={item.appUI.toolDefinition}
appHtml={item.appUI.html}
resourceMeta={item.appUI.meta}
/>
</div>
</div>
))}
</>
)
}
@@ -156,8 +190,8 @@ function StreamingActivity({ reasoning, toolCalls, hasResponse }) {
const contentRef = useRef(null)
const [manualCollapse, setManualCollapse] = useState(null)
// Auto-expand while thinking, auto-collapse when response starts
const autoExpanded = reasoning && !hasResponse
// Auto-expand while thinking or tool-calling, auto-collapse when response starts
const autoExpanded = (reasoning || (toolCalls && toolCalls.length > 0)) && !hasResponse
const expanded = manualCollapse !== null ? !manualCollapse : autoExpanded
// Scroll to bottom of thinking content as it streams
@@ -202,9 +236,18 @@ function StreamingActivity({ reasoning, toolCalls, hasResponse }) {
{expanded && toolCalls && toolCalls.length > 0 && (
<div className="chat-activity-details">
{toolCalls.map((tc, idx) => {
if (tc.type === 'tool_result') {
return (
<div key={idx} className="chat-activity-item chat-activity-tool-result">
<span className="chat-activity-item-label">{tc.name} result</span>
<div className="chat-activity-item-content"
dangerouslySetInnerHTML={{ __html: renderMarkdown(tc.result || '') }} />
</div>
)
}
const parsed = formatToolContent(JSON.stringify(tc, null, 2))
return (
<div key={idx} className={`chat-activity-item ${tc.type === 'tool_call' ? 'chat-activity-tool-call' : 'chat-activity-tool-result'}`}>
<div key={idx} className="chat-activity-item chat-activity-tool-call">
<span className="chat-activity-item-label">{tc.name || tc.type}</span>
<ToolParams entries={parsed.entries} fallback={parsed.fallback} />
</div>
@@ -247,7 +290,7 @@ export default function Chat() {
chats, activeChat, activeChatId, isStreaming, streamingChatId, streamingContent,
streamingReasoning, streamingToolCalls, tokensPerSecond, maxTokensPerSecond,
addChat, switchChat, deleteChat, deleteAllChats, renameChat, updateChatSettings,
sendMessage, stopGeneration, clearHistory, getContextUsagePercent,
sendMessage, stopGeneration, clearHistory, getContextUsagePercent, addMessage,
} = useChat(urlModel || '')
const [input, setInput] = useState('')
@@ -256,6 +299,18 @@ export default function Chat() {
const [editingName, setEditingName] = useState(null)
const [editName, setEditName] = useState('')
const [mcpAvailable, setMcpAvailable] = useState(false)
const [mcpServersOpen, setMcpServersOpen] = useState(false)
const [mcpServerList, setMcpServerList] = useState([])
const [mcpServersLoading, setMcpServersLoading] = useState(false)
const [mcpServerCache, setMcpServerCache] = useState({})
const [mcpPromptsOpen, setMcpPromptsOpen] = useState(false)
const [mcpPromptList, setMcpPromptList] = useState([])
const [mcpPromptsLoading, setMcpPromptsLoading] = useState(false)
const [mcpPromptArgsDialog, setMcpPromptArgsDialog] = useState(null)
const [mcpPromptArgsValues, setMcpPromptArgsValues] = useState({})
const [mcpResourcesOpen, setMcpResourcesOpen] = useState(false)
const [mcpResourceList, setMcpResourceList] = useState([])
const [mcpResourcesLoading, setMcpResourcesLoading] = useState(false)
const [chatSearch, setChatSearch] = useState('')
const [modelInfo, setModelInfo] = useState(null)
const [showModelInfo, setShowModelInfo] = useState(false)
@@ -263,6 +318,12 @@ export default function Chat() {
const [canvasMode, setCanvasMode] = useState(false)
const [canvasOpen, setCanvasOpen] = useState(false)
const [selectedArtifactId, setSelectedArtifactId] = useState(null)
const [clientMCPServers, setClientMCPServers] = useState(() => loadClientMCPServers())
const {
connect: mcpConnect, disconnect: mcpDisconnect, disconnectAll: mcpDisconnectAll,
getToolsForLLM, isClientTool, executeTool, connectionStatuses, getConnectedTools,
hasAppUI, getAppResource, getClientForTool, getToolDefinition,
} = useMCPClient()
const messagesEndRef = useRef(null)
const fileInputRef = useRef(null)
const messagesRef = useRef(null)
@@ -296,12 +357,191 @@ export default function Chat() {
const hasMcp = !!(cfg?.mcp?.remote || cfg?.mcp?.stdio)
setMcpAvailable(hasMcp)
if (!hasMcp && activeChat?.mcpMode) {
updateChatSettings(activeChat.id, { mcpMode: false })
updateChatSettings(activeChat.id, { mcpMode: false, mcpServers: [] })
}
}).catch(() => { if (!cancelled) { setMcpAvailable(false); setModelInfo(null) } })
return () => { cancelled = true }
}, [activeChat?.model])
const mcpDropdownRef = useRef(null)
useEffect(() => {
if (!mcpServersOpen) return
const handleClick = (e) => {
if (mcpDropdownRef.current && !mcpDropdownRef.current.contains(e.target)) {
setMcpServersOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpServersOpen])
const fetchMcpServers = useCallback(async () => {
const model = activeChat?.model
if (!model) return
if (mcpServerCache[model]) {
setMcpServerList(mcpServerCache[model])
return
}
setMcpServersLoading(true)
try {
const data = await mcpApi.listServers(model)
const servers = data?.servers || []
setMcpServerList(servers)
setMcpServerCache(prev => ({ ...prev, [model]: servers }))
} catch (_e) {
setMcpServerList([])
} finally {
setMcpServersLoading(false)
}
}, [activeChat?.model, mcpServerCache])
const toggleMcpServer = useCallback((serverName) => {
if (!activeChat) return
const current = activeChat.mcpServers || []
const next = current.includes(serverName)
? current.filter(s => s !== serverName)
: [...current, serverName]
updateChatSettings(activeChat.id, { mcpServers: next })
}, [activeChat, updateChatSettings])
const mcpPromptsRef = useRef(null)
useEffect(() => {
if (!mcpPromptsOpen) return
const handleClick = (e) => {
if (mcpPromptsRef.current && !mcpPromptsRef.current.contains(e.target)) {
setMcpPromptsOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpPromptsOpen])
const mcpResourcesRef = useRef(null)
useEffect(() => {
if (!mcpResourcesOpen) return
const handleClick = (e) => {
if (mcpResourcesRef.current && !mcpResourcesRef.current.contains(e.target)) {
setMcpResourcesOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpResourcesOpen])
const fetchMcpPrompts = useCallback(async () => {
const model = activeChat?.model
if (!model) return
setMcpPromptsLoading(true)
try {
const data = await mcpApi.listPrompts(model)
setMcpPromptList(Array.isArray(data) ? data : [])
} catch (_e) {
setMcpPromptList([])
} finally {
setMcpPromptsLoading(false)
}
}, [activeChat?.model])
const fetchMcpResources = useCallback(async () => {
const model = activeChat?.model
if (!model) return
setMcpResourcesLoading(true)
try {
const data = await mcpApi.listResources(model)
setMcpResourceList(Array.isArray(data) ? data : [])
} catch (_e) {
setMcpResourceList([])
} finally {
setMcpResourcesLoading(false)
}
}, [activeChat?.model])
const handleSelectPrompt = useCallback(async (prompt) => {
if (prompt.arguments && prompt.arguments.length > 0) {
setMcpPromptArgsDialog(prompt)
setMcpPromptArgsValues({})
return
}
// No arguments, expand immediately
const model = activeChat?.model
if (!model) return
try {
const result = await mcpApi.getPrompt(model, prompt.name, {})
if (result?.messages) {
for (const msg of result.messages) {
addMessage(activeChat.id, { role: msg.role || 'user', content: msg.content })
}
}
} catch (e) {
addMessage(activeChat.id, { role: 'system', content: `Failed to expand prompt: ${e.message}` })
}
setMcpPromptsOpen(false)
}, [activeChat?.model, activeChat?.id, addMessage])
const handleExpandPromptWithArgs = useCallback(async () => {
if (!mcpPromptArgsDialog) return
const model = activeChat?.model
if (!model) return
try {
const result = await mcpApi.getPrompt(model, mcpPromptArgsDialog.name, mcpPromptArgsValues)
if (result?.messages) {
for (const msg of result.messages) {
addMessage(activeChat.id, { role: msg.role || 'user', content: msg.content })
}
}
} catch (e) {
addMessage(activeChat.id, { role: 'system', content: `Failed to expand prompt: ${e.message}` })
}
setMcpPromptArgsDialog(null)
setMcpPromptArgsValues({})
setMcpPromptsOpen(false)
}, [activeChat?.model, activeChat?.id, mcpPromptArgsDialog, mcpPromptArgsValues, addMessage])
const toggleMcpResource = useCallback((uri) => {
if (!activeChat) return
const current = activeChat.mcpResources || []
const next = current.includes(uri)
? current.filter(u => u !== uri)
: [...current, uri]
updateChatSettings(activeChat.id, { mcpResources: next })
}, [activeChat, updateChatSettings])
// Auto-connect/disconnect client MCP servers based on chat's active list
const activeMCPIds = activeChat?.clientMCPServers || []
useEffect(() => {
const activeSet = new Set(activeMCPIds)
for (const server of clientMCPServers) {
const status = connectionStatuses[server.id]?.status
if (activeSet.has(server.id) && status !== 'connected' && status !== 'connecting') {
mcpConnect(server)
} else if (!activeSet.has(server.id) && (status === 'connected' || status === 'connecting')) {
mcpDisconnect(server.id)
}
}
}, [activeMCPIds.join(','), clientMCPServers])
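The effect above reconciles the chat's desired server list against the live connection statuses. Stripped of React, the reconcile step is just a set comparison — a sketch, where `connecting` counts as already in progress, matching the condition above:

```javascript
// Desired vs. actual reconciliation, as in the auto-connect effect above.
const desired = new Set(['a', 'c'])
const statuses = { a: 'connected', b: 'connected', c: 'disconnected' }

const actions = []
for (const id of ['a', 'b', 'c']) {
  const live = statuses[id] === 'connected' || statuses[id] === 'connecting'
  if (desired.has(id) && !live) actions.push(`connect:${id}`)         // missing connection
  else if (!desired.has(id) && live) actions.push(`disconnect:${id}`) // stale connection
}

console.log(actions) // [ 'disconnect:b', 'connect:c' ]
```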
const handleClientMCPServerAdded = useCallback((server) => {
setClientMCPServers(loadClientMCPServers())
const current = activeChat?.clientMCPServers || []
if (activeChat) updateChatSettings(activeChat.id, { clientMCPServers: [...current, server.id] })
}, [activeChat, updateChatSettings])
const handleClientMCPServerRemoved = useCallback(async (id) => {
await mcpDisconnect(id)
setClientMCPServers(loadClientMCPServers())
if (activeChat) {
const current = activeChat.clientMCPServers || []
updateChatSettings(activeChat.id, { clientMCPServers: current.filter(s => s !== id) })
}
}, [activeChat, mcpDisconnect, updateChatSettings])
const handleClientMCPToggle = useCallback((serverId) => {
if (!activeChat) return
const current = activeChat.clientMCPServers || []
const next = current.includes(serverId) ? current.filter(s => s !== serverId) : [...current, serverId]
updateChatSettings(activeChat.id, { clientMCPServers: next })
}, [activeChat, updateChatSettings])
// Load initial message from home page
const homeDataProcessed = useRef(false)
useEffect(() => {
@@ -325,6 +565,12 @@ export default function Chat() {
updateChatSettings(activeChat.id, { mcpMode: true })
}
}
if (data.mcpServers?.length > 0 && targetChat) {
updateChatSettings(targetChat.id, { mcpServers: data.mcpServers })
}
if (data.clientMCPServers?.length > 0 && targetChat) {
updateChatSettings(targetChat.id, { clientMCPServers: data.clientMCPServers })
}
setInput(data.message)
if (data.files) setFiles(data.files)
setTimeout(() => {
@@ -418,8 +664,28 @@ export default function Chat() {
}
setInput('')
setFiles([])
await sendMessage(msg, files)
}, [input, files, activeChat, sendMessage, addToast])
const tools = getToolsForLLM()
const mcpOptions = tools.length > 0 ? {
clientMCPTools: tools,
isClientTool: (name) => isClientTool(name),
executeTool: (name, args) => executeTool(name, args),
maxToolTurns: 10,
getToolAppUI: async (toolName, toolInput, toolResultText) => {
if (!hasAppUI(toolName)) return null
const resource = await getAppResource(toolName)
if (!resource) return null
return {
html: resource.html,
meta: resource.meta,
toolName,
toolInput,
toolDefinition: getToolDefinition(toolName),
toolResult: { content: [{ type: 'text', text: toolResultText }] },
}
},
} : {}
await sendMessage(msg, files, mcpOptions)
}, [input, files, activeChat, sendMessage, addToast, getToolsForLLM, isClientTool, executeTool, hasAppUI, getAppResource, getToolDefinition])
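The `mcpOptions` object above lets the send loop route each tool call either to a browser-side MCP client or back to the server. The dispatch it implies can be sketched as follows — the `clientTools` table and the `server:` fallback are illustrative stand-ins, not the hook's real implementations:

```javascript
// Illustrative client/server tool dispatch; mirrors the isClientTool/executeTool
// callbacks passed through mcpOptions above.
const clientTools = { echo: (args) => `echo: ${args.text}` }
const isClientTool = (name) => name in clientTools
const executeTool = (name, args) => clientTools[name](args)

function dispatchToolCall(name, args) {
  if (isClientTool(name)) return executeTool(name, args) // handled in the browser
  return `server:${name}` // stand-in for server-side MCP execution
}

console.log(dispatchToolCall('echo', { text: 'hi' })) // 'echo: hi'
console.log(dispatchToolCall('web_search', {}))       // 'server:web_search'
```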
const handleRegenerate = useCallback(async () => {
if (!activeChat || isStreaming) return
@@ -631,18 +897,170 @@ export default function Chat() {
</>
)}
{mcpAvailable && (
<label className="chat-mcp-switch" title="Toggle MCP mode">
<span className="chat-mcp-switch-label">MCP</span>
<span className="toggle">
<input
type="checkbox"
checked={activeChat.mcpMode || false}
onChange={(e) => updateChatSettings(activeChat.id, { mcpMode: e.target.checked })}
/>
<span className="toggle-slider" />
</span>
</label>
<div className="chat-mcp-dropdown" ref={mcpDropdownRef}>
<button
className={`btn btn-sm ${(activeChat.mcpServers?.length > 0) ? 'btn-primary' : 'btn-secondary'}`}
title="Select MCP servers"
onClick={() => { setMcpServersOpen(!mcpServersOpen); if (!mcpServersOpen) fetchMcpServers() }}
>
<i className="fas fa-plug" /> MCP
{activeChat.mcpServers?.length > 0 && (
<span className="chat-mcp-badge">{activeChat.mcpServers.length}</span>
)}
</button>
{mcpServersOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpServersLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading servers...</div>
) : mcpServerList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP servers configured</div>
) : (
<>
<div className="chat-mcp-dropdown-header">
<span>MCP Servers</span>
<button
className="chat-mcp-select-all"
onClick={() => {
const allNames = mcpServerList.map(s => s.name)
const allSelected = allNames.every(n => (activeChat.mcpServers || []).includes(n))
updateChatSettings(activeChat.id, { mcpServers: allSelected ? [] : allNames })
}}
>
{mcpServerList.every(s => (activeChat.mcpServers || []).includes(s.name)) ? 'Deselect all' : 'Select all'}
</button>
</div>
{mcpServerList.map(server => (
<label key={server.name} className="chat-mcp-server-item">
<input
type="checkbox"
checked={(activeChat.mcpServers || []).includes(server.name)}
onChange={() => toggleMcpServer(server.name)}
/>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{server.name}</span>
<span className="chat-mcp-server-tools">{server.tools?.length || 0} tools</span>
</div>
</label>
))}
</>
)}
</div>
)}
</div>
)}
{mcpAvailable && (
<div className="chat-mcp-dropdown" ref={mcpPromptsRef}>
<button
className="btn btn-sm btn-secondary"
title="MCP Prompts"
onClick={() => { setMcpPromptsOpen(!mcpPromptsOpen); if (!mcpPromptsOpen) fetchMcpPrompts() }}
>
<i className="fas fa-comment-dots" /> Prompts
</button>
{mcpPromptsOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpPromptsLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading prompts...</div>
) : mcpPromptList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP prompts available</div>
) : (
<>
<div className="chat-mcp-dropdown-header"><span>MCP Prompts</span></div>
{mcpPromptList.map(prompt => (
<div
key={prompt.name}
className="chat-mcp-server-item"
style={{ cursor: 'pointer', padding: '6px 10px' }}
onClick={() => handleSelectPrompt(prompt)}
>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{prompt.title || prompt.name}</span>
{prompt.description && (
<span className="chat-mcp-server-tools">{prompt.description}</span>
)}
</div>
</div>
))}
</>
)}
</div>
)}
{mcpPromptArgsDialog && (
<div className="chat-mcp-dropdown-menu" style={{ minWidth: '250px' }}>
<div className="chat-mcp-dropdown-header">
<span>{mcpPromptArgsDialog.title || mcpPromptArgsDialog.name}</span>
</div>
{mcpPromptArgsDialog.arguments.map(arg => (
<div key={arg.name} style={{ padding: '4px 10px' }}>
<label style={{ fontSize: '0.8rem', display: 'block', marginBottom: '2px' }}>
{arg.name}{arg.required ? ' *' : ''}
</label>
<input
type="text"
className="input input-sm"
style={{ width: '100%' }}
placeholder={arg.description || arg.name}
value={mcpPromptArgsValues[arg.name] || ''}
onChange={e => setMcpPromptArgsValues(prev => ({ ...prev, [arg.name]: e.target.value }))}
/>
</div>
))}
<div style={{ padding: '6px 10px', display: 'flex', gap: '6px', justifyContent: 'flex-end' }}>
<button className="btn btn-sm btn-secondary" onClick={() => setMcpPromptArgsDialog(null)}>Cancel</button>
<button className="btn btn-sm btn-primary" onClick={handleExpandPromptWithArgs}>Apply</button>
</div>
</div>
)}
</div>
)}
{mcpAvailable && (
<div className="chat-mcp-dropdown" ref={mcpResourcesRef}>
<button
className={`btn btn-sm ${(activeChat.mcpResources?.length > 0) ? 'btn-primary' : 'btn-secondary'}`}
title="MCP Resources"
onClick={() => { setMcpResourcesOpen(!mcpResourcesOpen); if (!mcpResourcesOpen) fetchMcpResources() }}
>
<i className="fas fa-paperclip" /> Resources
{activeChat.mcpResources?.length > 0 && (
<span className="chat-mcp-badge">{activeChat.mcpResources.length}</span>
)}
</button>
{mcpResourcesOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpResourcesLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading resources...</div>
) : mcpResourceList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP resources available</div>
) : (
<>
<div className="chat-mcp-dropdown-header"><span>MCP Resources</span></div>
{mcpResourceList.map(resource => (
<label key={resource.uri} className="chat-mcp-server-item">
<input
type="checkbox"
checked={(activeChat.mcpResources || []).includes(resource.uri)}
onChange={() => toggleMcpResource(resource.uri)}
/>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{resource.name}</span>
<span className="chat-mcp-server-tools">{resource.uri}</span>
</div>
</label>
))}
</>
)}
</div>
)}
</div>
)}
<ClientMCPDropdown
activeServerIds={activeChat.clientMCPServers || []}
onToggleServer={handleClientMCPToggle}
onServerAdded={handleClientMCPServerAdded}
onServerRemoved={handleClientMCPServerRemoved}
connectionStatuses={connectionStatuses}
getConnectedTools={getConnectedTools}
/>
<div className="chat-header-actions">
<label className="canvas-mode-toggle" title="Extract code blocks and media into a side panel for preview, copy, and download">
<i className="fas fa-columns" />
@@ -821,7 +1239,8 @@ export default function Chat() {
if (activityBuf.length > 0) {
elements.push(
<ActivityGroup key={`ag-${key}`} items={[...activityBuf]}
updateChatSettings={updateChatSettings} activeChat={activeChat} />
updateChatSettings={updateChatSettings} activeChat={activeChat}
getClientForTool={getClientForTool} />
)
activityBuf = []
}


@@ -1,8 +1,9 @@
import { useState, useEffect, useRef, useCallback } from 'react'
import { useNavigate, useOutletContext } from 'react-router-dom'
import ModelSelector from '../components/ModelSelector'
import ClientMCPDropdown from '../components/ClientMCPDropdown'
import { useResources } from '../hooks/useResources'
import { fileToBase64, backendControlApi, systemApi, modelsApi } from '../utils/api'
import { fileToBase64, backendControlApi, systemApi, modelsApi, mcpApi } from '../utils/api'
import { API_CONFIG } from '../utils/config'
const placeholderMessages = [
@@ -35,6 +36,13 @@ export default function Home() {
const [textFiles, setTextFiles] = useState([])
const [mcpMode, setMcpMode] = useState(false)
const [mcpAvailable, setMcpAvailable] = useState(false)
const [mcpServersOpen, setMcpServersOpen] = useState(false)
const [mcpServerList, setMcpServerList] = useState([])
const [mcpServersLoading, setMcpServersLoading] = useState(false)
const [mcpServerCache, setMcpServerCache] = useState({})
const [mcpSelectedServers, setMcpSelectedServers] = useState([])
const [clientMCPSelectedIds, setClientMCPSelectedIds] = useState([])
const mcpDropdownRef = useRef(null)
const [placeholderIdx, setPlaceholderIdx] = useState(0)
const [placeholderText, setPlaceholderText] = useState('')
const imageInputRef = useRef(null)
@@ -72,6 +80,7 @@ export default function Home() {
if (!selectedModel) {
setMcpAvailable(false)
setMcpMode(false)
setMcpSelectedServers([])
return
}
let cancelled = false
@@ -79,11 +88,15 @@ export default function Home() {
if (cancelled) return
const hasMcp = !!(cfg?.mcp?.remote || cfg?.mcp?.stdio)
setMcpAvailable(hasMcp)
if (!hasMcp) setMcpMode(false)
if (!hasMcp) {
setMcpMode(false)
setMcpSelectedServers([])
}
}).catch(() => {
if (!cancelled) {
setMcpAvailable(false)
setMcpMode(false)
setMcpSelectedServers([])
}
})
return () => { cancelled = true }
@@ -126,6 +139,42 @@ export default function Home() {
else setTextFiles(removeFn)
}, [])
useEffect(() => {
if (!mcpServersOpen) return
const handleClick = (e) => {
if (mcpDropdownRef.current && !mcpDropdownRef.current.contains(e.target)) {
setMcpServersOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpServersOpen])
const fetchMcpServers = useCallback(async () => {
if (!selectedModel) return
if (mcpServerCache[selectedModel]) {
setMcpServerList(mcpServerCache[selectedModel])
return
}
setMcpServersLoading(true)
try {
const data = await mcpApi.listServers(selectedModel)
const servers = data?.servers || []
setMcpServerList(servers)
setMcpServerCache(prev => ({ ...prev, [selectedModel]: servers }))
} catch (_e) {
setMcpServerList([])
} finally {
setMcpServersLoading(false)
}
}, [selectedModel, mcpServerCache])
const toggleMcpServer = useCallback((serverName) => {
setMcpSelectedServers(prev =>
prev.includes(serverName) ? prev.filter(s => s !== serverName) : [...prev, serverName]
)
}, [])
const doSubmit = useCallback(() => {
const text = message.trim() || placeholderText
if (!text && allFiles.length === 0) return
@@ -139,11 +188,13 @@ export default function Home() {
model: selectedModel,
files: allFiles,
mcpMode,
mcpServers: mcpSelectedServers,
clientMCPServers: clientMCPSelectedIds,
newChat: true,
}
localStorage.setItem('localai_index_chat_data', JSON.stringify(chatData))
navigate(`/chat/${encodeURIComponent(selectedModel)}`)
}, [message, placeholderText, allFiles, selectedModel, mcpMode, addToast, navigate])
}, [message, placeholderText, allFiles, selectedModel, mcpMode, mcpSelectedServers, clientMCPSelectedIds, addToast, navigate])
const handleSubmit = (e) => {
if (e) e.preventDefault()
@@ -200,26 +251,69 @@ export default function Home() {
<div className="home-model-row">
<ModelSelector value={selectedModel} onChange={setSelectedModel} capability="FLAG_CHAT" />
{mcpAvailable && (
<label className="home-mcp-toggle">
<span className="home-mcp-label">MCP</span>
<span className="toggle">
<input
type="checkbox"
checked={mcpMode}
onChange={(e) => setMcpMode(e.target.checked)}
/>
<span className="toggle-slider" />
</span>
</label>
<div className="chat-mcp-dropdown" ref={mcpDropdownRef}>
<button
type="button"
className={`btn btn-sm ${mcpSelectedServers.length > 0 ? 'btn-primary' : 'btn-secondary'}`}
title="Select MCP servers"
onClick={() => { setMcpServersOpen(!mcpServersOpen); if (!mcpServersOpen) fetchMcpServers() }}
>
<i className="fas fa-plug" /> MCP
{mcpSelectedServers.length > 0 && (
<span className="chat-mcp-badge">{mcpSelectedServers.length}</span>
)}
</button>
{mcpServersOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpServersLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading servers...</div>
) : mcpServerList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP servers configured</div>
) : (
<>
<div className="chat-mcp-dropdown-header">
<span>MCP Servers</span>
<button
type="button"
className="chat-mcp-select-all"
onClick={() => {
const allNames = mcpServerList.map(s => s.name)
const allSelected = allNames.every(n => mcpSelectedServers.includes(n))
setMcpSelectedServers(allSelected ? [] : allNames)
}}
>
{mcpServerList.every(s => mcpSelectedServers.includes(s.name)) ? 'Deselect all' : 'Select all'}
</button>
</div>
{mcpServerList.map(server => (
<label key={server.name} className="chat-mcp-server-item">
<input
type="checkbox"
checked={mcpSelectedServers.includes(server.name)}
onChange={() => toggleMcpServer(server.name)}
/>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{server.name}</span>
<span className="chat-mcp-server-tools">{server.tools?.length || 0} tools</span>
</div>
</label>
))}
</>
)}
</div>
)}
</div>
)}
<ClientMCPDropdown
activeServerIds={clientMCPSelectedIds}
onToggleServer={(id) => setClientMCPSelectedIds(prev =>
prev.includes(id) ? prev.filter(s => s !== id) : [...prev, id]
)}
onServerAdded={(server) => setClientMCPSelectedIds(prev => [...prev, server.id])}
onServerRemoved={(id) => setClientMCPSelectedIds(prev => prev.filter(s => s !== id))}
/>
</div>
{mcpMode && (
<div className="home-mcp-info">
<i className="fas fa-info-circle" /> Non-streaming mode active.
</div>
)}
{/* File attachment tags */}
{allFiles.length > 0 && (
<div className="home-file-tags">
@@ -453,21 +547,6 @@ export default function Home() {
gap: var(--spacing-sm);
margin-bottom: var(--spacing-sm);
}
.home-mcp-toggle {
display: flex;
align-items: center;
gap: 6px;
cursor: pointer;
user-select: none;
}
.home-mcp-info {
font-size: 0.75rem;
color: var(--color-accent);
padding: var(--spacing-xs) var(--spacing-sm);
background: var(--color-accent-light);
border-radius: var(--radius-md);
margin-bottom: var(--spacing-sm);
}
.home-file-tags {
display: flex;
flex-wrap: wrap;


@@ -74,7 +74,8 @@
--radius-xl: 12px;
--radius-full: 9999px;
--sidebar-width: 220px;
--sidebar-width: 200px;
--sidebar-width-collapsed: 52px;
--color-toggle-off: #475569;
}


@@ -112,6 +112,15 @@ export const chatApi = {
mcpComplete: (body) => postJSON(API_CONFIG.endpoints.mcpChatCompletions, body),
}
// MCP API
export const mcpApi = {
listServers: (model) => fetchJSON(API_CONFIG.endpoints.mcpServers(model)),
listPrompts: (model) => fetchJSON(API_CONFIG.endpoints.mcpPrompts(model)),
getPrompt: (model, name, args) => postJSON(API_CONFIG.endpoints.mcpGetPrompt(model, name), { arguments: args }),
listResources: (model) => fetchJSON(API_CONFIG.endpoints.mcpResources(model)),
readResource: (model, uri) => postJSON(API_CONFIG.endpoints.mcpReadResource(model), { uri }),
}
// Resources API
export const resourcesApi = {
get: () => fetchJSON(API_CONFIG.endpoints.resources),


@@ -50,6 +50,11 @@ export const API_CONFIG = {
// OpenAI-compatible endpoints
chatCompletions: '/v1/chat/completions',
mcpChatCompletions: '/v1/mcp/chat/completions',
mcpServers: (model) => `/v1/mcp/servers/${model}`,
mcpPrompts: (model) => `/v1/mcp/prompts/${model}`,
mcpGetPrompt: (model, prompt) => `/v1/mcp/prompts/${model}/${encodeURIComponent(prompt)}`,
mcpResources: (model) => `/v1/mcp/resources/${model}`,
mcpReadResource: (model) => `/v1/mcp/resources/${model}/read`,
completions: '/v1/completions',
imageGenerations: '/v1/images/generations',
audioSpeech: '/v1/audio/speech',
@@ -78,5 +83,6 @@ export const API_CONFIG = {
backendsInstalled: '/backends',
version: '/version',
system: '/system',
corsProxy: '/api/cors-proxy',
},
}
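The per-model endpoints above are plain template functions; note that only the prompt name is passed through `encodeURIComponent`, while the model name is interpolated as-is. A quick sketch (the model and prompt names are made up):

```javascript
// Same shape as the endpoint builders in API_CONFIG above.
const endpoints = {
  mcpServers: (model) => `/v1/mcp/servers/${model}`,
  mcpGetPrompt: (model, prompt) => `/v1/mcp/prompts/${model}/${encodeURIComponent(prompt)}`,
}

console.log(endpoints.mcpServers('qwen3-4b'))
// '/v1/mcp/servers/qwen3-4b'
console.log(endpoints.mcpGetPrompt('qwen3-4b', 'summarize notes'))
// '/v1/mcp/prompts/qwen3-4b/summarize%20notes'
```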


@@ -0,0 +1,54 @@
const STORAGE_KEY = 'localai_client_mcp_servers'
function generateId() {
return Date.now().toString(36) + Math.random().toString(36).slice(2)
}
export function loadClientMCPServers() {
try {
const stored = localStorage.getItem(STORAGE_KEY)
if (stored) {
const data = JSON.parse(stored)
if (Array.isArray(data)) return data
}
} catch (_e) {
// ignore
}
return []
}
export function saveClientMCPServers(servers) {
try {
localStorage.setItem(STORAGE_KEY, JSON.stringify(servers))
} catch (_e) {
// ignore
}
}
export function addClientMCPServer({ name, url, headers, useProxy }) {
const servers = loadClientMCPServers()
const server = {
id: generateId(),
name: name || new URL(url).hostname,
url,
headers: headers || {},
useProxy: useProxy !== false,
}
servers.push(server)
saveClientMCPServers(servers)
return server
}
export function removeClientMCPServer(id) {
const servers = loadClientMCPServers().filter(s => s.id !== id)
saveClientMCPServers(servers)
return servers
}
export function updateClientMCPServer(id, updates) {
const servers = loadClientMCPServers().map(s =>
s.id === id ? { ...s, ...updates } : s
)
saveClientMCPServers(servers)
return servers
}
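The helpers above form a small CRUD layer over `localStorage`. A runnable sketch of the same load/add round-trip, with an in-memory stand-in for `localStorage` so it works outside the browser:

```javascript
// In-memory stand-in for the browser's localStorage.
const store = new Map()
const localStorage = {
  getItem: (k) => (store.has(k) ? store.get(k) : null),
  setItem: (k, v) => { store.set(k, String(v)) },
}

const STORAGE_KEY = 'localai_client_mcp_servers'

function loadClientMCPServers() {
  try {
    const data = JSON.parse(localStorage.getItem(STORAGE_KEY) || '[]')
    return Array.isArray(data) ? data : []
  } catch (_e) { return [] }
}

function addClientMCPServer({ name, url }) {
  const servers = loadClientMCPServers()
  // As above: fall back to the URL's hostname when no name is given.
  const server = { id: Date.now().toString(36), name: name || new URL(url).hostname, url }
  servers.push(server)
  localStorage.setItem(STORAGE_KEY, JSON.stringify(servers))
  return server
}

const added = addClientMCPServer({ url: 'https://mcp.example.com/sse' })
console.log(added.name)                    // 'mcp.example.com'
console.log(loadClientMCPServers().length) // 1
```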


@@ -164,6 +164,22 @@ func RegisterLocalAIRoutes(router *echo.Echo,
router.POST("/v1/mcp/chat/completions", mcpStreamHandler, mcpStreamMiddleware...)
router.POST("/mcp/v1/chat/completions", mcpStreamHandler, mcpStreamMiddleware...)
router.POST("/mcp/chat/completions", mcpStreamHandler, mcpStreamMiddleware...)
// MCP server listing endpoint
router.GET("/v1/mcp/servers/:model", localai.MCPServersEndpoint(cl, appConfig))
// MCP prompts endpoints
router.GET("/v1/mcp/prompts/:model", localai.MCPPromptsEndpoint(cl, appConfig))
router.POST("/v1/mcp/prompts/:model/:prompt", localai.MCPGetPromptEndpoint(cl, appConfig))
// MCP resources endpoints
router.GET("/v1/mcp/resources/:model", localai.MCPResourcesEndpoint(cl, appConfig))
router.POST("/v1/mcp/resources/:model/read", localai.MCPReadResourceEndpoint(cl, appConfig))
// CORS proxy for client-side MCP connections
router.GET("/api/cors-proxy", localai.CORSProxyEndpoint(appConfig))
router.POST("/api/cors-proxy", localai.CORSProxyEndpoint(appConfig))
router.OPTIONS("/api/cors-proxy", localai.CORSProxyOptionsEndpoint())
}
// Agent job routes (MCP CI Jobs — requires MCP to be enabled)


@@ -557,21 +557,6 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
galleryService.StoreCancellation(uid, cancelFunc)
go func() {
galleryService.ModelGalleryChannel <- op
// Wait for the deletion operation to complete with a timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
for {
select {
case <-ctx.Done():
xlog.Warn("Timeout waiting for deletion to complete", "uid", uid)
break
default:
if status := galleryService.GetStatus(uid); status != nil && status.Processed {
break
}
time.Sleep(100 * time.Millisecond)
}
}
cl.RemoveModelConfig(galleryName)
}()


@@ -1209,7 +1209,7 @@ async function promptGPT(systemPrompt, input) {
model: model,
messages: messages,
};
// Add stream parameter for both regular chat and MCP (MCP now supports SSE streaming)
requestBody.stream = true;


@@ -895,7 +895,7 @@ SOFTWARE.
<div x-data="{ showPromptForm: false, showParamsForm: false }" class="space-y-2">
<!-- MCP Toggle - Compact (shown dynamically based on model support) -->
<div x-data="{
<div x-data="{
mcpAvailable: false,
checkMCP() {
const modelSelector = document.getElementById('modelSelector');
@@ -910,7 +910,7 @@ SOFTWARE.
}
const hasMCP = selectedOption.getAttribute('data-has-mcp') === 'true';
this.mcpAvailable = hasMCP;
// If model doesn't support MCP, disable MCP mode
const activeChat = $store.chat.activeChat();
if (activeChat && !hasMCP) {


@@ -28,6 +28,8 @@ type Message struct {
ToolCalls []ToolCall `json:"tool_calls,omitempty" yaml:"tool_call,omitempty"`
ToolCallID string `json:"tool_call_id,omitempty" yaml:"tool_call_id,omitempty"`
// Reasoning content extracted from <thinking>...</thinking> tags
Reasoning *string `json:"reasoning,omitempty" yaml:"reasoning,omitempty"`
}


@@ -21,23 +21,23 @@ The Model Context Protocol is a standard for connecting AI models to external to
## Key Features
- **🔄 Real-time Tool Access**: Connect to external MCP servers for live data
- **🛠️ Multiple Server Support**: Configure both remote HTTP and local stdio servers
- **Cached Connections**: Efficient tool caching for better performance
- **🔒 Secure Authentication**: Support for bearer token authentication
- **🎯 OpenAI Compatible**: Uses the familiar `/mcp/v1/chat/completions` endpoint
- **🧠 Advanced Reasoning**: Configurable reasoning and re-evaluation capabilities
- **📋 Auto-Planning**: Break down complex tasks into manageable steps
- **🎯 MCP Prompts**: Specialized prompts for better MCP server interaction
- **🔄 Plan Re-evaluation**: Dynamic plan adjustment based on results
- **⚙️ Flexible Agent Control**: Customizable execution limits and retry behavior
- **Real-time Tool Access**: Connect to external MCP servers for live data
- **Multiple Server Support**: Configure both remote HTTP and local stdio servers
- **Cached Connections**: Efficient tool caching for better performance
- **Secure Authentication**: Support for bearer token authentication
- **Multi-endpoint Support**: Works with OpenAI Chat, Anthropic Messages, and Open Responses APIs
- **Selective Server Activation**: Use `metadata.mcp_servers` to enable only specific servers per request
- **Server-side Tool Execution**: Tools are executed on the server and results fed back to the model automatically
- **Agent Configuration**: Customizable execution limits and retry behavior
- **MCP Prompts**: Discover and expand reusable prompt templates from MCP servers
- **MCP Resources**: Browse and inject resource content (files, data) from MCP servers into conversations
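
For example, selective server activation only needs an extra `metadata` field on an otherwise ordinary chat completion request — a sketch, where the server names are placeholders for whatever your model's YAML defines:

```javascript
// Request body for /v1/mcp/chat/completions with only two servers enabled.
// "search" and "files" are placeholder server names.
const body = {
  model: 'my-mcp-model',
  messages: [{ role: 'user', content: 'Summarize the latest report' }],
  metadata: { mcp_servers: ['search', 'files'] },
  stream: true,
}

console.log(JSON.stringify(body.metadata))
// '{"mcp_servers":["search","files"]}'
```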
## Configuration
MCP support is configured in your model's YAML configuration file using the `mcp` section:
```yaml
name: my-agentic-model
name: my-mcp-model
backend: llama-cpp
parameters:
model: qwen3-4b.gguf
@@ -56,7 +56,7 @@ mcp:
}
}
}
stdio: |
{
"mcpServers": {
@@ -78,16 +78,7 @@ mcp:
}
agent:
max_iterations: 10 # Maximum MCP tool execution loop iterations
```
### Configuration Options
@@ -106,39 +97,226 @@ Configure local command-based MCP servers:
- **`env`**: Environment variables (optional)
#### Agent Configuration (`agent`)
Configure agent behavior and tool execution:
- **`max_iterations`**: Maximum number of MCP tool execution loop iterations (default: 10). Each iteration allows the model to call tools and receive results before generating the next response.
## Usage
### Selecting MCP Servers via `metadata`
All API endpoints support MCP server selection through the standard `metadata` field. Pass a comma-separated list of server names in `metadata.mcp_servers`:
- **When present**: Only the named MCP servers are activated for this request. Server names must match the keys in the model's MCP config YAML (e.g., `"weather-api"`, `"search-engine"`).
- **When absent**: Behavior depends on the endpoint:
- **OpenAI Chat Completions** and **Anthropic Messages**: No MCP tools are injected (standard behavior).
- **Open Responses**: If the model has MCP config and no user-provided tools, all MCP servers are auto-activated (backward compatible).
The `mcp_servers` metadata key is consumed by the MCP engine and stripped before reaching the backend. Clients that support the standard `metadata` field can use this without custom schema extensions.
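The selection step can be sketched as follows. This is an illustrative Python sketch of the behaviour described above, not LocalAI's actual (Go) implementation; the function and variable names are hypothetical:

```python
def select_mcp_servers(metadata, configured):
    """Pick active MCP servers from request metadata.

    `metadata` is the request's metadata dict (may be None); `configured`
    is the set of server names from the model's MCP config YAML. The
    `mcp_servers` key is consumed here and removed so it never reaches
    the backend.
    """
    if not metadata or "mcp_servers" not in metadata:
        return [], metadata or {}
    raw = metadata.pop("mcp_servers")
    wanted = [name.strip() for name in raw.split(",") if name.strip()]
    # Unknown names are ignored rather than raising an error.
    active = [name for name in wanted if name in configured]
    return active, metadata
```

Ignoring unknown names (rather than failing) matches the behaviour exercised by the e2e tests, where a request naming a non-existent server still completes without MCP tools.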
### API Endpoints
MCP tools work across all three API endpoints:
#### OpenAI Chat Completions (`/v1/chat/completions`)
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [{"role": "user", "content": "What is the weather in New York?"}],
"metadata": {"mcp_servers": "weather-api"},
"stream": true
}'
```
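The streamed body is standard SSE: `data:` lines carrying OpenAI-style chunk JSON, terminated by `data: [DONE]`. A minimal, hypothetical helper for assembling the assistant text from such a body:

```python
import json

def collect_stream_text(sse_body: str) -> str:
    """Accumulate assistant text deltas from an OpenAI-style SSE stream."""
    parts = []
    for line in sse_body.splitlines():
        if not line.startswith("data:"):
            continue  # skip event/comment/blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)
```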
#### Anthropic Messages (`/v1/messages`)
```bash
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "What is the weather in New York?"}],
"metadata": {"mcp_servers": "weather-api"}
}'
```
#### Open Responses (`/v1/responses`)
```bash
curl http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"input": "What is the weather in New York?",
"metadata": {"mcp_servers": "weather-api"}
}'
```
### Server Listing Endpoint
You can list available MCP servers and their tools for a given model:
```bash
curl http://localhost:8080/v1/mcp/servers/my-mcp-model
```
Returns:
```json
{
  "model": "my-mcp-model",
  "servers": [
    {
      "name": "weather-api",
      "type": "remote",
      "tools": ["get_weather", "get_forecast"]
    },
    {
      "name": "search-engine",
      "type": "remote",
      "tools": ["web_search", "image_search"]
    }
  ]
}
```
### MCP Prompts
MCP servers can provide reusable prompt templates. LocalAI supports discovering and expanding prompts from MCP servers.
#### List Prompts
```bash
curl http://localhost:8080/v1/mcp/prompts/my-mcp-model
```
Returns:
```json
[
{
"name": "code-review",
"description": "Review code for best practices",
"title": "Code Review",
"arguments": [
{"name": "language", "description": "Programming language", "required": true}
],
"server": "dev-tools"
}
]
```
#### Expand a Prompt
```bash
curl -X POST http://localhost:8080/v1/mcp/prompts/my-mcp-model/code-review \
-H "Content-Type: application/json" \
-d '{"arguments": {"language": "go"}}'
```
Returns:
```json
{
"messages": [
{"role": "user", "content": "Please review the following Go code for best practices..."}
]
}
```
#### Inject Prompts via Metadata
You can inject MCP prompts into any chat request using `metadata.mcp_prompt` and `metadata.mcp_prompt_args`:
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [{"role": "user", "content": "Review this function: func add(a, b int) int { return a + b }"}],
"metadata": {
"mcp_servers": "dev-tools",
"mcp_prompt": "code-review",
"mcp_prompt_args": "{\"language\": \"go\"}"
}
}'
```
The prompt messages are prepended to the conversation before inference.
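As a hypothetical sketch of this injection step (names are illustrative, not LocalAI's internals): the expanded prompt's messages go in front of the conversation, and `mcp_prompt_args` must be JSON-decoded first since it arrives as a string:

```python
import json

def parse_prompt_args(raw: str) -> dict:
    """metadata.mcp_prompt_args arrives as a JSON string, not an object."""
    return json.loads(raw) if raw else {}

def build_conversation(messages, prompt_messages=None):
    """Prepend expanded MCP prompt messages to the user's conversation."""
    return list(prompt_messages or []) + list(messages)
```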
### MCP Resources
MCP servers can expose data/content (files, database records, etc.) as resources identified by URI.
#### List Resources
```bash
curl http://localhost:8080/v1/mcp/resources/my-mcp-model
```
Returns:
```json
[
{
"name": "project-readme",
"uri": "file:///README.md",
"description": "Project documentation",
"mimeType": "text/markdown",
"server": "file-manager"
}
]
```
#### Read a Resource
```bash
curl -X POST http://localhost:8080/v1/mcp/resources/my-mcp-model/read \
-H "Content-Type: application/json" \
-d '{"uri": "file:///README.md"}'
```
Returns:
```json
{
"uri": "file:///README.md",
"content": "# My Project\n...",
"mimeType": "text/markdown"
}
```
#### Inject Resources via Metadata
You can inject MCP resources into chat requests using `metadata.mcp_resources` (comma-separated URIs):
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [{"role": "user", "content": "Summarize this project"}],
"metadata": {
"mcp_servers": "file-manager",
"mcp_resources": "file:///README.md,file:///CHANGELOG.md"
}
}'
```
Resource contents are appended to the last user message as text blocks (following the same approach as llama.cpp's WebUI).
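The injection described above can be sketched as follows (illustrative Python, not the actual implementation):

```python
def inject_resources(messages, resources):
    """Append resource contents as text blocks to the last user message."""
    msgs = [dict(m) for m in messages]
    for i in range(len(msgs) - 1, -1, -1):
        if msgs[i]["role"] != "user":
            continue
        content = msgs[i]["content"]
        # Normalize a plain string into a content-block list first.
        if isinstance(content, str):
            blocks = [{"type": "text", "text": content}]
        else:
            blocks = list(content)
        for res in resources:
            blocks.append({
                "type": "text",
                "text": f"Resource {res['uri']}:\n{res['content']}",
            })
        msgs[i]["content"] = blocks
        break
    return msgs
```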
### Legacy Endpoint
The `/mcp/v1/chat/completions` endpoint is still supported for backward compatibility. When `metadata.mcp_servers` is not provided, it automatically enables all configured MCP servers; an explicit `metadata.mcp_servers` selection is still honoured.
```bash
curl http://localhost:8080/mcp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [
{"role": "user", "content": "What is the current weather in New York?"}
],
"temperature": 0.7
}'
```
@@ -148,10 +326,10 @@ curl http://localhost:8080/mcp/v1/chat/completions \
{
"id": "chatcmpl-123",
"created": 1699123456,
"model": "my-mcp-model",
"choices": [
{
"text": "The current weather in New York is 72°F (22°C) with partly cloudy skies."
}
],
"object": "text_completion"
@@ -160,7 +338,6 @@ curl http://localhost:8080/mcp/v1/chat/completions \
## Example Configurations
### Docker-based Tools
```yaml
@@ -184,47 +361,28 @@ mcp:
}
agent:
max_iterations: 10
```
## How It Works
1. **Tool Discovery**: LocalAI connects to configured MCP servers and discovers available tools
2. **Tool Injection**: Discovered tools are injected into the model's tool/function list alongside any user-provided tools
3. **Inference Loop**: The model generates a response. If it calls MCP tools, LocalAI executes them server-side, appends results to the conversation, and re-runs inference
4. **Response Generation**: When the model produces a final response (no more MCP tool calls), it is returned to the client
The execution loop is bounded by `agent.max_iterations` (default 10) to prevent infinite loops.
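A hypothetical sketch of this loop (the `infer` and `execute_tool` callables stand in for the real backend call and MCP client):

```python
def run_mcp_loop(infer, execute_tool, messages, max_iterations=10):
    """Run inference, execute any MCP tool calls, feed results back,
    and stop at a final answer or the iteration bound."""
    for _ in range(max_iterations):
        reply = infer(messages)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply  # final response, returned to the client
        messages = messages + [reply]
        for call in calls:
            result = execute_tool(call["name"], call["arguments"])
            messages.append({
                "role": "tool",
                "name": call["name"],
                "content": result,
            })
    return {"role": "assistant",
            "content": "stopped: iteration limit reached"}
```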
## Session Lifecycle
MCP sessions are automatically managed by LocalAI:
- **Lazy initialization**: Sessions are created the first time a model's MCP tools are used
- **Cached per model**: Sessions are reused across requests for the same model
- **Cleanup on model unload**: When a model is unloaded (idle watchdog eviction, manual stop, or shutdown), all associated MCP sessions are closed and resources freed
- **Graceful shutdown**: All MCP sessions are closed when LocalAI shuts down
This means you don't need to manually manage MCP connections — they follow the model's lifecycle automatically.
## Supported MCP Servers
@@ -255,15 +413,16 @@ Use MCP-enabled models in your applications:
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-api-key"
)
response = client.chat.completions.create(
model="my-mcp-model",
messages=[
{"role": "user", "content": "Analyze the latest research papers on AI"}
],
extra_body={"metadata": {"mcp_servers": "search-engine"}}
)
```
@@ -366,6 +525,60 @@ mcp:
- [Awesome MCPs](https://github.com/punkpeye/awesome-mcp-servers)
- [A list of MCPs by mudler](https://github.com/mudler/MCPs)
## Client-Side MCP (Browser)
In addition to server-side MCP (where the backend connects to MCP servers), LocalAI supports **client-side MCP** where the browser connects directly to MCP servers. This is inspired by llama.cpp's WebUI and works alongside server-side MCP.
### How It Works
1. **Add servers in the UI**: Click the "Client MCP" button in the chat header and add MCP server URLs
2. **Browser connects directly**: The browser uses the MCP TypeScript SDK (`StreamableHTTPClientTransport` or `SSEClientTransport`) to connect to MCP servers
3. **Tool discovery**: Connected servers' tools are sent as `tools` in the chat request body
4. **Browser-side execution**: When the LLM calls a client-side tool, the browser executes it against the MCP server and sends the result back in a follow-up request
5. **Agentic loop**: This continues (up to 10 turns) until the LLM produces a final response
### CORS Proxy
Since browsers enforce CORS restrictions, LocalAI provides a built-in proxy at `/api/cors-proxy`. When "Use CORS proxy" is enabled (default), requests to external MCP servers are routed through:
```
/api/cors-proxy?url=https://remote-mcp-server.example.com/sse
```
The proxy forwards the request method, headers, and body to the target URL and streams the response back with appropriate CORS headers.
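When building the proxied URL yourself, the target must be percent-encoded so query characters in the MCP server URL survive; a small illustrative helper:

```python
from urllib.parse import urlencode

def cors_proxy_url(target_url: str, proxy_path: str = "/api/cors-proxy") -> str:
    """Build the proxied URL; `?` and `&` in the target are encoded so
    the whole MCP server URL arrives as a single `url` query parameter."""
    return f"{proxy_path}?{urlencode({'url': target_url})}"
```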
### MCP Apps (Interactive Tool UIs)
LocalAI supports the [MCP Apps extension](https://modelcontextprotocol.io/extensions/apps/overview), which allows MCP tools to declare interactive HTML UIs. When a tool has `_meta.ui.resourceUri` in its definition, calling that tool renders the app's HTML inline in the chat as a sandboxed iframe.
**How it works:**
- When the LLM calls a tool with `_meta.ui.resourceUri`, the browser fetches the HTML resource from the MCP server and renders it in an iframe
- The iframe is sandboxed (`allow-scripts allow-forms`, no `allow-same-origin`) for security
- The app can call server tools, send messages, and update context via the `AppBridge` protocol (JSON-RPC over `postMessage`)
- Tools marked as app-only (`_meta.ui.visibility: "app-only"`) are hidden from the LLM and only callable by the app iframe
- On page reload, apps render statically until the MCP connection is re-established
**Requirements:**
- Only works with **client-side MCP** connections (the browser must be connected to the MCP server)
- The MCP server must implement the Apps extension (`_meta.ui.resourceUri` on tools, resource serving)
### Coexistence with Server-Side MCP
Both modes work simultaneously in the same chat:
- **Server-side MCP tools** are configured in model YAML files and executed by the backend. The backend handles these in its own agentic loop.
- **Client-side MCP tools** are configured per-user in the browser and sent as `tools` in the request. When the LLM calls them, the browser executes them.
If both sides have a tool with the same name, the server-side tool takes priority.
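The merge rule can be sketched as follows (hypothetical helper, not LocalAI's internals):

```python
def merge_tools(server_tools, client_tools):
    """Merge tool lists by name; server-side tools win on name clashes."""
    merged = {t["name"]: t for t in client_tools}
    merged.update({t["name"]: t for t in server_tools})  # server overrides
    return list(merged.values())
```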
### Security Considerations
- The CORS proxy can forward requests to any HTTP/HTTPS URL. It is only available when MCP is enabled (`LOCALAI_DISABLE_MCP` is not set).
- Client-side MCP server configurations are stored in the browser's localStorage and are not shared with the server.
- Custom headers (e.g., API keys) for MCP servers are stored in localStorage. Use with caution on shared machines.
## Disabling MCP Support
You can completely disable MCP functionality in LocalAI by setting the `LOCALAI_DISABLE_MCP` environment variable to `true`, `1`, or `yes`.
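A minimal sketch of that check (illustrative; the accepted values match the list above):

```python
import os

def mcp_disabled(env=os.environ) -> bool:
    """True when LOCALAI_DISABLE_MCP is set to a truthy value."""
    return env.get("LOCALAI_DISABLE_MCP", "").strip().lower() in {"true", "1", "yes"}
```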
@@ -19,6 +19,10 @@ import (
// new idea: what if we declare a struct of these here, and use a loop to check?
// TODO: Split ModelLoader and TemplateLoader? Just to keep things more organized. Left together to share a mutex until I look into that. Would split if we separate directories for .bin/.yaml and .tmpl
// ModelUnloadHook is called when a model is about to be unloaded.
// The model name is passed as the argument.
type ModelUnloadHook func(modelName string)
type ModelLoader struct {
ModelPath string
mu sync.Mutex
@@ -28,6 +32,7 @@ type ModelLoader struct {
externalBackends map[string]string
lruEvictionMaxRetries int // Maximum number of retries when waiting for busy models
lruEvictionRetryInterval time.Duration // Interval between retries when waiting for busy models
onUnloadHooks []ModelUnloadHook
}
// NewModelLoader creates a new ModelLoader instance.
@@ -52,6 +57,13 @@ func (ml *ModelLoader) GetLoadingCount() int {
return len(ml.loading)
}
// OnModelUnload registers a hook that is called when a model is unloaded.
func (ml *ModelLoader) OnModelUnload(hook ModelUnloadHook) {
ml.mu.Lock()
defer ml.mu.Unlock()
ml.onUnloadHooks = append(ml.onUnloadHooks, hook)
}
func (ml *ModelLoader) SetWatchDog(wd *WatchDog) {
ml.wd = wd
}
@@ -46,6 +46,11 @@ func (ml *ModelLoader) deleteProcess(s string) error {
xlog.Debug("Deleting process", "model", s)
// Run unload hooks (e.g. close MCP sessions)
for _, hook := range ml.onUnloadHooks {
hook(s)
}
// Free GPU resources before stopping the process to ensure VRAM is released
if freeFunc, ok := model.GRPC(false, ml.wd).(interface{ Free() error }); ok {
xlog.Debug("Calling Free() to release GPU resources", "model", s)
tests/e2e/e2e_mcp_test.go Normal file
@@ -0,0 +1,439 @@
package e2e_test
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net"
"net/http"
"time"
"github.com/modelcontextprotocol/go-sdk/mcp"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/openai/openai-go/v3"
)
// startMockMCPServer creates an in-process MCP HTTP server with a "get_weather" tool
// and returns its URL and a shutdown function.
func startMockMCPServer() (string, func()) {
server := mcp.NewServer(
&mcp.Implementation{Name: "mock-mcp", Version: "v1.0.0"},
nil,
)
server.AddTool(
&mcp.Tool{
Name: "get_weather",
Description: "Get the current weather for a location",
InputSchema: json.RawMessage(`{"type":"object","properties":{"location":{"type":"string","description":"City name"}},"required":["location"]}`),
},
func(_ context.Context, req *mcp.CallToolRequest) (*mcp.CallToolResult, error) {
var args struct {
Location string `json:"location"`
}
if req.Params.Arguments != nil {
data, _ := json.Marshal(req.Params.Arguments)
json.Unmarshal(data, &args)
}
return &mcp.CallToolResult{
Content: []mcp.Content{
&mcp.TextContent{
Text: fmt.Sprintf("Weather in %s: sunny, 72°F", args.Location),
},
},
}, nil
},
)
handler := mcp.NewStreamableHTTPHandler(
func(r *http.Request) *mcp.Server { return server },
&mcp.StreamableHTTPOptions{
Stateless: true,
},
)
listener, err := net.Listen("tcp", "127.0.0.1:0")
Expect(err).ToNot(HaveOccurred())
httpServer := &http.Server{Handler: handler}
go httpServer.Serve(listener)
url := fmt.Sprintf("http://%s/mcp", listener.Addr().String())
shutdown := func() {
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
httpServer.Shutdown(ctx)
}
return url, shutdown
}
// mcpModelConfig generates a model config YAML that includes MCP remote server config.
func mcpModelConfig(mcpServerURL string) map[string]any {
mcpRemote := fmt.Sprintf(`{"mcpServers":{"weather-api":{"url":"%s"}}}`, mcpServerURL)
return map[string]any{
"name": "mock-model-mcp",
"backend": "mock-backend",
"parameters": map[string]any{
"model": "mock-model-mcp.bin",
},
"mcp": map[string]any{
"remote": mcpRemote,
},
"agent": map[string]any{
// The mock backend returns a tool call on the first inference, then
// a plain text response once tool results appear in the prompt.
// max_iterations=1 is enough for one tool-call round-trip.
"max_iterations": 1,
},
}
}
// httpPost sends a JSON POST request and returns the response.
func httpPost(url string, body any) (*http.Response, error) {
data, err := json.Marshal(body)
if err != nil {
return nil, err
}
req, err := http.NewRequest("POST", url, bytes.NewReader(data))
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json")
return (&http.Client{Timeout: 60 * time.Second}).Do(req)
}
// readBody reads and returns the response body as a string.
func readBody(resp *http.Response) string {
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
return string(data)
}
var _ = Describe("MCP Tool Integration E2E Tests", Label("MCP"), func() {
Describe("MCP Server Listing", func() {
It("should list MCP servers and tools for a configured model", func() {
resp, err := http.Get(fmt.Sprintf("http://127.0.0.1:%d/v1/mcp/servers/mock-model-mcp", apiPort))
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result struct {
Model string `json:"model"`
Servers []struct {
Name string `json:"name"`
Type string `json:"type"`
Tools []string `json:"tools"`
} `json:"servers"`
}
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
Expect(result.Model).To(Equal("mock-model-mcp"))
Expect(result.Servers).To(HaveLen(1))
Expect(result.Servers[0].Name).To(Equal("weather-api"))
Expect(result.Servers[0].Tools).To(ContainElement("get_weather"))
})
It("should return empty servers for a model without MCP config", func() {
resp, err := http.Get(fmt.Sprintf("http://127.0.0.1:%d/v1/mcp/servers/mock-model", apiPort))
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result struct {
Servers []any `json:"servers"`
}
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
Expect(result.Servers).To(BeEmpty())
})
})
Describe("OpenAI Chat Completions with MCP", func() {
Context("Non-streaming", func() {
It("should inject and execute MCP tools when mcp_servers is set", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather in San Francisco?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
}
resp, err := httpPost(apiURL+"/chat/completions", body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("weather"))
})
It("should not inject MCP tools when mcp_servers is not set", func() {
resp, err := client.Chat.Completions.New(
context.TODO(),
openai.ChatCompletionNewParams{
Model: "mock-model-mcp",
Messages: []openai.ChatCompletionMessageParamUnion{
openai.UserMessage("Hello"),
},
},
)
Expect(err).ToNot(HaveOccurred())
Expect(len(resp.Choices)).To(Equal(1))
Expect(resp.Choices[0].Message.Content).To(ContainSubstring("mocked response"))
})
})
Context("Streaming", func() {
It("should work with MCP tools in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
"stream": true,
}
resp, err := httpPost(apiURL+"/chat/completions", body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
Expect(resp.Header.Get("Content-Type")).To(ContainSubstring("text/event-stream"))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("data:"))
})
})
})
Describe("Anthropic Messages with MCP", func() {
Context("Non-streaming", func() {
It("should inject and execute MCP tools when mcp_servers is set", func() {
body := map[string]any{
"model": "mock-model-mcp",
"max_tokens": 1024,
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/messages", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result map[string]any
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
content, ok := result["content"].([]any)
Expect(ok).To(BeTrue())
Expect(content).ToNot(BeEmpty())
first, ok := content[0].(map[string]any)
Expect(ok).To(BeTrue())
Expect(first["text"]).To(ContainSubstring("weather"))
})
It("should return standard response without mcp_servers", func() {
body := map[string]any{
"model": "mock-model-mcp",
"max_tokens": 1024,
"messages": []map[string]string{{"role": "user", "content": "Hello"}},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/messages", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result map[string]any
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
content, ok := result["content"].([]any)
Expect(ok).To(BeTrue())
Expect(content).ToNot(BeEmpty())
first, ok := content[0].(map[string]any)
Expect(ok).To(BeTrue())
Expect(first["text"]).To(ContainSubstring("mocked response"))
})
})
Context("Streaming", func() {
It("should work with MCP tools in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"max_tokens": 1024,
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
"stream": true,
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/messages", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("event:"))
})
})
})
Describe("Open Responses with MCP", func() {
Context("Non-streaming", func() {
It("should inject and execute MCP tools when mcp_servers is set", func() {
body := map[string]any{
"model": "mock-model-mcp",
"input": "What is the weather in San Francisco?",
"metadata": map[string]string{"mcp_servers": "weather-api"},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/responses", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result map[string]any
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
// Open Responses wraps output in an "output" array
output, ok := result["output"].([]any)
Expect(ok).To(BeTrue(), "expected output array in response: %s", respBody)
Expect(output).ToNot(BeEmpty())
})
It("should auto-activate MCP tools without mcp_servers (backward compat)", func() {
// Open Responses auto-activates all MCP servers when no metadata
// mcp_servers key is provided and no user tools are set.
body := map[string]any{
"model": "mock-model-mcp",
"input": "Hello",
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/responses", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result map[string]any
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
output, ok := result["output"].([]any)
Expect(ok).To(BeTrue(), "expected output array in response: %s", respBody)
Expect(output).ToNot(BeEmpty())
})
})
Context("Streaming", func() {
It("should work with MCP tools in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"input": "What is the weather?",
"metadata": map[string]string{"mcp_servers": "weather-api"},
"stream": true,
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/responses", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("event:"))
})
})
})
Describe("Legacy /mcp endpoint", func() {
It("should auto-enable all MCP servers and complete the tool loop", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather in San Francisco?"}},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/mcp/v1/chat/completions", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("weather"))
})
It("should respect metadata mcp_servers when provided", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "Hello"}},
"metadata": map[string]string{"mcp_servers": "non-existent-server"},
}
// Even through the /mcp endpoint, an explicit metadata selection
// should be honoured — a non-existent server means no MCP tools.
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/mcp/v1/chat/completions", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("mocked response"))
})
It("should work in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"stream": true,
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/mcp/v1/chat/completions", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
Expect(resp.Header.Get("Content-Type")).To(ContainSubstring("text/event-stream"))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("data:"))
})
})
Describe("MCP with invalid server name", func() {
It("should work without MCP tools when specifying non-existent server", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "Hello"}},
"metadata": map[string]string{"mcp_servers": "non-existent-server"},
}
resp, err := httpPost(apiURL+"/chat/completions", body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("mocked response"))
})
})
})

View File

@@ -25,18 +25,20 @@ import (
)
var (
anthropicBaseURL string
tmpDir string
backendPath string
modelsPath string
configPath string
app *echo.Echo
appCtx context.Context
appCancel context.CancelFunc
client openai.Client
apiPort int
apiURL string
mockBackendPath string
anthropicBaseURL string
tmpDir string
backendPath string
modelsPath string
configPath string
app *echo.Echo
appCtx context.Context
appCancel context.CancelFunc
client openai.Client
apiPort int
apiURL string
mockBackendPath string
mcpServerURL string
mcpServerShutdown func()
)
var _ = BeforeSuite(func() {
@@ -99,6 +101,14 @@ var _ = BeforeSuite(func() {
Expect(err).ToNot(HaveOccurred())
Expect(os.WriteFile(configPath, configYAML, 0644)).To(Succeed())
// Start mock MCP server and create MCP-enabled model config
mcpServerURL, mcpServerShutdown = startMockMCPServer()
mcpConfig := mcpModelConfig(mcpServerURL)
mcpConfigPath := filepath.Join(modelsPath, "mock-model-mcp.yaml")
mcpConfigYAML, err := yaml.Marshal(mcpConfig)
Expect(err).ToNot(HaveOccurred())
Expect(os.WriteFile(mcpConfigPath, mcpConfigYAML, 0644)).To(Succeed())
// Set up system state
systemState, err := system.GetSystemState(
system.WithBackendPath(backendPath),
@@ -160,6 +170,9 @@ var _ = AfterSuite(func() {
defer cancel()
Expect(app.Shutdown(ctx)).To(Succeed())
}
if mcpServerShutdown != nil {
mcpServerShutdown()
}
if tmpDir != "" {
os.RemoveAll(tmpDir)
}

View File

@@ -10,6 +10,7 @@ import (
"net"
"os"
"path/filepath"
"strings"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/xlog"
@@ -20,11 +21,20 @@ var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
// MockBackend implements the Backend gRPC service with mocked responses
// MockBackend implements the Backend gRPC service with mocked responses.
// When tools are present but the prompt already contains MCP tool results
// (indicated by the marker from the mock MCP server), it returns a plain
// text response instead of another tool call, letting the MCP loop complete.
type MockBackend struct {
pb.UnimplementedBackendServer
}
// promptHasToolResults checks if the prompt contains evidence of prior tool
// execution — specifically the output from the mock MCP server's get_weather tool.
func promptHasToolResults(prompt string) bool {
return strings.Contains(prompt, "Weather in")
}
func (m *MockBackend) Health(ctx context.Context, in *pb.HealthMessage) (*pb.Reply, error) {
xlog.Debug("Health check called")
return &pb.Reply{Message: []byte("OK")}, nil
@@ -42,8 +52,12 @@ func (m *MockBackend) Predict(ctx context.Context, in *pb.PredictOptions) (*pb.R
xlog.Debug("Predict called", "prompt", in.Prompt)
var response string
toolName := mockToolNameFromRequest(in)
if toolName != "" {
if toolName != "" && !promptHasToolResults(in.Prompt) {
// First call with tools: return a tool call so the MCP loop executes it.
response = fmt.Sprintf(`{"name": "%s", "arguments": {"location": "San Francisco"}}`, toolName)
} else if toolName != "" {
// Subsequent call: tool results already in prompt, return final text.
response = "Based on the tool results, the weather in San Francisco is sunny, 72°F."
} else {
response = "This is a mocked response."
}
@@ -60,8 +74,10 @@ func (m *MockBackend) PredictStream(in *pb.PredictOptions, stream pb.Backend_Pre
xlog.Debug("PredictStream called", "prompt", in.Prompt)
var toStream string
toolName := mockToolNameFromRequest(in)
if toolName != "" {
if toolName != "" && !promptHasToolResults(in.Prompt) {
toStream = fmt.Sprintf(`{"name": "%s", "arguments": {"location": "San Francisco"}}`, toolName)
} else if toolName != "" {
toStream = "Based on the tool results, the weather in San Francisco is sunny, 72°F."
} else {
toStream = "This is a mocked streaming response."
}
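The two branches above exist because an MCP-enabled completion calls the backend at least twice: once to obtain a tool call, then again after the tool output has been folded into the prompt, which is what `promptHasToolResults` detects. A hedged sketch of that driver loop (function and type names here are illustrative, not LocalAI's actual implementation):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type toolCall struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
}

// runToolLoop keeps asking the model until it stops emitting tool calls.
// predict stands in for the backend; execTool stands in for the MCP client.
func runToolLoop(prompt string, predict func(string) string, execTool func(toolCall) string) string {
	for i := 0; i < 5; i++ { // bounded, so a misbehaving model cannot loop forever
		out := predict(prompt)
		var tc toolCall
		if json.Unmarshal([]byte(out), &tc) != nil || tc.Name == "" {
			return out // plain text: the loop is done
		}
		// Fold the tool result into the prompt and ask again.
		prompt += "\n" + execTool(tc)
	}
	return ""
}

func main() {
	calls := 0
	predict := func(p string) string {
		calls++
		if calls == 1 {
			return `{"name":"get_weather","arguments":{"location":"SF"}}`
		}
		return "Sunny, 72°F."
	}
	execTool := func(tc toolCall) string { return "Weather in SF: sunny" }
	fmt.Println(runToolLoop("What is the weather?", predict, execTool)) // prints: Sunny, 72°F.
}
```

Against a driver like this, the mock backend's first response exercises tool execution and its second response ("Based on the tool results, ...") terminates the loop, so the test sees exactly one completed round trip.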