feat(ui): MCP Apps, mcp streaming and client-side support (#8947)

* Revert "fix: Add timeout-based wait for model deletion completion (#8756)"

This reverts commit 9e1b0d0c82.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: add mcp prompts and resources

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add client-side MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): allow to authenticate MCP servers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add MCP Apps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: update AGENTS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: allow to collapse navbar, save state in storage

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add MCP button also to home page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(chat): populate string content

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in:
Ettore Di Giacinto
2026-03-11 07:30:49 +01:00
committed by GitHub
parent 79f90de935
commit 8818452d85
44 changed files with 6526 additions and 2479 deletions

.agents/adding-backends.md (new file)

@@ -0,0 +1,143 @@
# Adding a New Backend
When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:
## 1. Create Backend Directory Structure
Create the backend directory under the appropriate location:
- **Python backends**: `backend/python/<backend-name>/`
- **Go backends**: `backend/go/<backend-name>/`
- **C++ backends**: `backend/cpp/<backend-name>/`
For Python backends, you'll typically need:
- `backend.py` - Main gRPC server implementation
- `Makefile` - Build configuration
- `install.sh` - Installation script for dependencies
- `protogen.sh` - Protocol buffer generation script
- `requirements.txt` - Python dependencies
- `run.sh` - Runtime script
- `test.py` / `test.sh` - Test files
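As a minimal sketch, the file set above can be scaffolded like this. The backend name `mybackend` is illustrative, not a real backend:

```shell
# Sketch: scaffold the expected file set for a hypothetical Python backend
# named "mybackend" (illustrative name only).
backend=mybackend
dir="backend/python/$backend"
mkdir -p "$dir"
for f in backend.py Makefile install.sh protogen.sh requirements.txt run.sh test.py test.sh; do
  touch "$dir/$f"
done
ls "$dir"
```

The files are created empty here; in practice, copy a similar backend (e.g. `moonshine`) and adapt each file.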
## 2. Add Build Configurations to `.github/workflows/backend.yml`
Add build matrix entries for each platform/GPU type you want to support. Look at similar backends (e.g., `chatterbox`, `faster-whisper`) for reference.
**Placement in file:**
- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)
**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:6.4.4"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
## 3. Add Backend Metadata to `backend/index.yaml`
**Step 3a: Add Meta Definition**
Add a YAML anchor definition in the `## metas` section (around lines 2-300). Look for similar backends to use as a template, such as `diffusers` or `chatterbox`.
**Step 3b: Add Image Entries**
Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
## 4. Update the Makefile
The Makefile needs to be updated in several places to support building and testing the new backend:
**Step 4a: Add to `.NOTPARALLEL`**
Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:
```makefile
.NOTPARALLEL: ... backends/<backend-name>
```
**Step 4b: Add to `prepare-test-extra`**
Add the backend to the `prepare-test-extra` target (around line 312) to prepare it for testing:
```makefile
prepare-test-extra: protogen-python
...
$(MAKE) -C backend/python/<backend-name>
```
**Step 4c: Add to `test-extra`**
Add the backend to the `test-extra` target (around line 319) to run its tests:
```makefile
test-extra: prepare-test-extra
...
$(MAKE) -C backend/python/<backend-name> test
```
**Step 4d: Add Backend Definition**
Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:
**For Python backends with root context** (like `faster-whisper`, `coqui`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
```
**For Python backends with `./backend` context** (like `chatterbox`, `moonshine`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
```
**For Go backends**:
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
```
**Step 4e: Generate Docker Build Target**
Add an eval call to generate the docker-build target (around line 480-501):
```makefile
$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
```
**Step 4f: Add to `docker-build-backends`**
Add `docker-build-<backend-name>` to the `docker-build-backends` target (around line 507):
```makefile
docker-build-backends: ... docker-build-<backend-name>
```
**Determining the Context:**
- If the backend is in `backend/python/<backend-name>/` and uses `./backend` as context in the workflow file, use `./backend` context
- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
- Check similar backends to determine the correct context
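One quick way to check a similar backend's context is to grep the workflow file. The YAML below is a hypothetical excerpt standing in for `.github/workflows/backend.yml` (the real key names may differ); in practice run the same grep against the real file:

```shell
# Sketch: look up the build context a similar backend uses in the workflow.
# The excerpt is hypothetical sample data, not the real workflow file.
cat > /tmp/backend-excerpt.yml <<'EOF'
      - backend: "chatterbox"
        context: "./backend"
      - backend: "coqui"
        context: "."
EOF
grep -A1 'backend: "chatterbox"' /tmp/backend-excerpt.yml | grep context
```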
## 5. Verification Checklist
After adding a new backend, verify:
- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/workflows/backend.yml` for all desired platforms
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between workflow file and index.yaml
- [ ] Makefile updated with all 6 required changes (`.NOTPARALLEL`, `prepare-test-extra`, `test-extra`, backend definition, docker-build target eval, `docker-build-backends`)
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
## 6. Example: Adding a Python Backend
For reference, when `moonshine` was added:
- **Files created**: `backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}`
- **Workflow entries**: 3 build configurations (CPU, CUDA 12, CUDA 13)
- **Index entries**: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 × latest/development)
- **Makefile updates**:
- Added to `.NOTPARALLEL` line
- Added to `prepare-test-extra` and `test-extra` targets
- Added `BACKEND_MOONSHINE = moonshine|python|./backend|false|true`
- Added eval for docker-build target generation
- Added `docker-build-moonshine` to `docker-build-backends`

.agents/building-and-testing.md (new file)

@@ -0,0 +1,16 @@
# Build and Testing
Building and testing the project depends on the components involved and the platform where development is taking place. Due to the amount of context required, it's usually best not to try building or testing the project unless the user requests it. If you must build the project, inspect the Makefile in the project root and the Makefiles of any backends that are affected by the changes you are making. In addition, the workflows in .github/workflows can be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker; if the user has not previously specified a preference, ask which they would like to use.
## Building a specified backend
Let's say the user wants to build a particular backend for a given platform. For example, let's say they want to build `coqui` for ROCm/hipblas:
- The Makefile has targets like `docker-build-coqui` created with `generate-docker-build-target` at the time of writing. Recently added backends may require a new target.
- At a minimum we need to set the `BUILD_TYPE` and `BASE_IMAGE` build args
- Use .github/workflows/backend.yml as a reference; it lists the needed args in the `include` job strategy matrix
- l4t and cublas also require the CUDA major and minor version
- You can print a ready-to-run command like `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:6.4.4 make docker-build-coqui`
- Unless the user specifies that they want you to run the command, just print it: not all agent frontends handle long-running jobs well, and the output may overflow your context
- The user may say they want to build AMD or ROCm instead of hipblas, Intel instead of SYCL, or NVIDIA instead of l4t or cublas. Ask for confirmation if there is ambiguity.
- Sometimes the user may need extra parameters added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.
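The advice above (compose the command, print it, don't run it) can be sketched as a small shell snippet. The values mirror the hipblas/coqui example:

```shell
# Sketch: compose (but do not run) the docker-build command for a backend.
# Values mirror the ROCm/hipblas example above.
backend=coqui
build_type=hipblas
base_image="rocm/dev-ubuntu-24.04:6.4.4"
# \$ keeps $(nproc ...) literal so it expands when the user runs the command
cmd="DOCKER_MAKEFLAGS=-j\$(nproc --ignore=1) BUILD_TYPE=$build_type BASE_IMAGE=$base_image make docker-build-$backend"
echo "$cmd"
```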

.agents/coding-style.md (new file)

@@ -0,0 +1,51 @@
# Coding Style
The project has the following .editorconfig:
```
root = true
[*]
indent_style = space
indent_size = 2
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[*.go]
indent_style = tab
[Makefile]
indent_style = tab
[*.proto]
indent_size = 2
[*.py]
indent_size = 4
[*.js]
indent_size = 2
[*.yaml]
indent_size = 2
[*.md]
trim_trailing_whitespace = false
```
- Use comments sparingly to explain why code does something, not what it does. Comments are there to add context that would be difficult to deduce from reading the code.
- Prefer modern Go e.g. use `any` not `interface{}`
## Logging
Use `github.com/mudler/xlog` for logging which has the same API as slog.
## Documentation
The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.

.agents/llama-cpp-backend.md (new file)

@@ -0,0 +1,77 @@
# llama.cpp Backend
The llama.cpp backend (`backend/cpp/llama-cpp/grpc-server.cpp`) is a gRPC adaptation of the upstream HTTP server (`llama.cpp/tools/server/server.cpp`). It uses the same underlying server infrastructure from `llama.cpp/tools/server/server-context.cpp`.
## Building and Testing
- Test llama.cpp backend compilation: `make backends/llama-cpp`
- The backend is built as part of the main build process
- Check `backend/cpp/llama-cpp/Makefile` for build configuration
## Architecture
- **grpc-server.cpp**: gRPC server implementation, adapts HTTP server patterns to gRPC
- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
- The gRPC server mirrors the HTTP server's functionality but uses gRPC instead of HTTP
## Common Issues When Updating llama.cpp
When fixing compilation errors after upstream changes:
1. Check how `server.cpp` (HTTP server) handles the same change
2. Look for new public APIs or getter methods
3. Store copies of needed data instead of accessing private members
4. Update function calls to match new signatures
5. Test with `make backends/llama-cpp`
## Key Differences from HTTP Server
- gRPC uses `BackendServiceImpl` class with gRPC service methods
- HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
## Tool Call Parsing Maintenance
When working on JSON/XML tool call parsing functionality, always check llama.cpp for reference implementation and updates:
### Checking for XML Parsing Changes
1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
- `COMMON_CHAT_FORMAT_GLM_4_5`
- `COMMON_CHAT_FORMAT_MINIMAX_M2`
- `COMMON_CHAT_FORMAT_KIMI_K2`
- `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
- `COMMON_CHAT_FORMAT_APRIEL_1_5`
- `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
- Any new formats added
### Model Configuration Options
Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:
1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
4. **Examples of options to check**:
- `ctx_shift` - Context shifting support
- `parallel_tool_calls` - Parallel tool calling
- `reasoning_format` - Reasoning format options
- Any new flags or parameters
### Implementation Guidelines
1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
3. **Documentation**: Update relevant documentation when adding new formats or options
4. **Backward Compatibility**: Ensure changes don't break existing functionality
### Files to Monitor
- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options

.agents/testing-mcp-apps.md (new file)

@@ -0,0 +1,120 @@
# Testing MCP Apps (Interactive Tool UIs)
MCP Apps is an extension to MCP where tools declare interactive HTML UIs via `_meta.ui.resourceUri`. When the LLM calls such a tool, the host renders the app in a sandboxed iframe inline in the chat. The app communicates bidirectionally with the host via `postMessage` (JSON-RPC) and can call server tools, send messages, and update model context.
Spec: https://modelcontextprotocol.io/extensions/apps/overview
## Quick Start: Run a Test MCP App Server
The `@modelcontextprotocol/server-basic-react` npm package is a ready-to-use test server that exposes a `get-time` tool with an interactive React clock UI. It requires Node >= 20, so run it in Docker:
```bash
docker run -d --name mcp-app-test -p 3001:3001 node:22-slim \
sh -c 'npx -y @modelcontextprotocol/server-basic-react'
```
Wait ~10 seconds for it to start, then verify:
```bash
# Check it's running
docker logs mcp-app-test
# Expected: "MCP server listening on http://localhost:3001/mcp"
# Verify MCP protocol works
curl -s -X POST http://localhost:3001/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'
# List tools — should show get-time with _meta.ui.resourceUri
curl -s -X POST http://localhost:3001/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
```
The `tools/list` response should contain:
```json
{
"name": "get-time",
"_meta": {
"ui": { "resourceUri": "ui://get-time/mcp-app.html" }
}
}
```
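To spot App-capable tools in a saved `tools/list` response, a `jq` one-liner works; this assumes `jq` is installed, and the sample JSON below just mirrors the shape shown above:

```shell
# Sketch: list tools that declare an App UI (non-null _meta.ui.resourceUri).
# Sample data mirrors the response shape above; jq is assumed available.
cat > /tmp/tools.json <<'EOF'
{"result":{"tools":[
  {"name":"get-time","_meta":{"ui":{"resourceUri":"ui://get-time/mcp-app.html"}}},
  {"name":"plain-tool"}
]}}
EOF
jq -r '.result.tools[] | select(._meta.ui.resourceUri != null) | .name' /tmp/tools.json
```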
## Testing in LocalAI's UI
1. Make sure LocalAI is running (e.g. `http://localhost:8080`)
2. Build the React UI: `cd core/http/react-ui && npm install && npm run build`
3. Open the Chat page in your browser
4. Click **"Client MCP"** in the chat header
5. Add a new client MCP server:
- **URL**: `http://localhost:3001/mcp`
- **Use CORS proxy**: enabled (default) — required because the browser can't hit `localhost:3001` directly due to CORS; LocalAI's proxy at `/api/cors-proxy` handles it
6. The server should connect and discover the `get-time` tool
7. Select a model and send: **"What time is it?"**
8. The LLM should call the `get-time` tool
9. The tool result should render the interactive React clock app in an iframe as a standalone chat message (not inside the collapsed activity group)
## What to Verify
- [ ] Tool appears in the connected tools list (not filtered — `get-time` is callable by the LLM)
- [ ] The iframe renders as a standalone chat message with a puzzle-piece icon
- [ ] The app loads and is interactive (clock UI, buttons work)
- [ ] No "Reconnect to MCP server" overlay (connection is live)
- [ ] Console logs show bidirectional communication:
- `tools/call` messages from app to host (app calling server tools)
- `ui/message` notifications (app sending messages)
- [ ] After the app renders, the LLM continues and produces a text response with the time
- [ ] Non-UI tools continue to work normally (text-only results)
- [ ] Page reload shows the HTML statically with a reconnect overlay until you reconnect
## Console Log Patterns
Healthy bidirectional communication looks like:
```
Parsed message { jsonrpc: "2.0", id: N, result: {...} } // Bridge init
get-time result: { content: [...] } // Tool result received
Calling get-time tool... // App calls tool
Sending message { method: "tools/call", ... } // App -> host -> server
Parsed message { jsonrpc: "2.0", id: N, result: {...} } // Server response
Sending message text to Host: ... // App sends message
Sending message { method: "ui/message", ... } // Message notification
Message accepted // Host acknowledged
```
Benign warnings to ignore:
- `Source map error: ... about:srcdoc` — browser devtools can't find source maps for srcdoc iframes
- `Ignoring message from unknown source` — duplicate postMessage from iframe navigation
- `notifications/cancelled` — app cleaning up previous requests
## Architecture Notes
- **No server-side changes needed** — the MCP App protocol runs entirely in the browser
- `PostMessageTransport` wraps `window.postMessage` between host and `srcdoc` iframe
- `AppBridge` (from `@modelcontextprotocol/ext-apps`) auto-forwards `tools/call`, `resources/read`, `resources/list` from the app to the MCP server via the host's `Client`
- The iframe uses `sandbox="allow-scripts allow-forms"` (no `allow-same-origin`) — opaque origin, no access to host cookies/DOM/localStorage
- App-only tools (`_meta.ui.visibility: "app-only"`) are filtered from the LLM's tool list but remain callable by the app iframe
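The app-only filtering rule above can be sketched with `jq` (assumed installed); the tool list below is hypothetical sample data, not real server output:

```shell
# Sketch: drop app-only tools from the list advertised to the LLM,
# per the _meta.ui.visibility rule above. Sample data is hypothetical.
cat > /tmp/mcp-tools.json <<'EOF'
{"tools":[
  {"name":"get-time","_meta":{"ui":{"resourceUri":"ui://get-time/mcp-app.html"}}},
  {"name":"set-theme","_meta":{"ui":{"visibility":"app-only"}}}
]}
EOF
jq -r '.tools[] | select(._meta.ui.visibility != "app-only") | .name' /tmp/mcp-tools.json
```

Tools without any `_meta.ui.visibility` pass the filter (the field reads as `null`), which matches the default of exposing normal tools to the LLM.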
## Key Files
- `core/http/react-ui/src/components/MCPAppFrame.jsx` — iframe + AppBridge component
- `core/http/react-ui/src/hooks/useMCPClient.js` — MCP client hook with app UI helpers (`hasAppUI`, `getAppResource`, `getClientForTool`, `getToolDefinition`)
- `core/http/react-ui/src/hooks/useChat.js` — agentic loop, attaches `appUI` to tool_result messages
- `core/http/react-ui/src/pages/Chat.jsx` — renders MCPAppFrame as standalone chat messages
## Other Test Servers
The `@modelcontextprotocol/ext-apps` repo has many example servers:
- `@modelcontextprotocol/server-basic-react` — simple clock (React)
- More examples at https://github.com/modelcontextprotocol/ext-apps/tree/main/examples
All examples support both stdio and HTTP transport. Run without `--stdio` for HTTP mode on port 3001.
## Cleanup
```bash
docker rm -f mcp-app-test
```

AGENTS.md (modified)

@@ -1,290 +1,22 @@
# LocalAI Agent Instructions
This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
## Topics
| File | When to read |
|------|-------------|
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
## Quick Reference
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: Providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.
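For instance, a feature page might open with a minimal model configuration block like the following. The field names follow the usual LocalAI model YAML shape; treat the exact values as placeholders and verify keys against the config structs before documenting:

```yaml
# Minimal example for a docs page: registers a GGUF model with a backend.
# Values are placeholders — adapt them to the feature being documented.
name: my-model
backend: llama-cpp
parameters:
  model: my-model-file.gguf
context_size: 4096
```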
- **Logging**: Use `github.com/mudler/xlog` (same API as slog)
- **Go style**: Prefer `any` over `interface{}`
- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI


@@ -235,7 +235,7 @@ local-ai run oci://localai/phi-2:latest
For more information, see [💻 Getting started](https://localai.io/basics/getting_started/index.html), if you are interested in our roadmap items and future enhancements, you can see the [Issues labeled as Roadmap here](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
## 📰 Latest project news
- March 2026: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801)
- March 2026: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801), [MCP Apps, MCP Client-side](https://github.com/mudler/LocalAI/pull/8947)
- February 2026: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
- January 2026: **LocalAI 3.10.0** - Major release with Anthropic API support, Open Responses API for stateful agents, video & image generation suite (LTX-2), unified GPU backends, tool streaming & XML parsing, system-aware backend gallery, crash fixes for AVX-only CPUs and AMD VRAM reporting, request tracing, and new backends: **Moonshine** (ultra-fast transcription), **Pocket-TTS** (lightweight TTS). Vulkan arm64 builds now available. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0).
- December 2025: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic fitting of models to multiple GPUS(llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Added Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)


@@ -5,6 +5,7 @@ import (
"sync"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/services"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
@@ -29,9 +30,16 @@ type Application struct {
}
func newApplication(appConfig *config.ApplicationConfig) *Application {
ml := model.NewModelLoader(appConfig.SystemState)
// Close MCP sessions when a model is unloaded (watchdog eviction, manual shutdown, etc.)
ml.OnModelUnload(func(modelName string) {
mcpTools.CloseMCPSessions(modelName)
})
return &Application{
backendLoader: config.NewModelConfigLoader(appConfig.SystemState.Model.ModelsPath),
modelLoader: model.NewModelLoader(appConfig.SystemState),
modelLoader: ml,
applicationConfig: appConfig,
templatesEvaluator: templates.NewEvaluator(appConfig.SystemState.Model.ModelsPath),
}


@@ -3,11 +3,13 @@ package anthropic
import (
"encoding/json"
"fmt"
"strings"
"github.com/google/uuid"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/templates"
@@ -48,6 +50,92 @@ func MessagesEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evalu
// Convert Anthropic tools to internal Functions format
funcs, shouldUseFn := convertAnthropicTools(input, cfg)
// MCP injection: prompts, resources, and tools
var mcpToolInfos []mcpTools.MCPToolInfo
mcpServers := mcpTools.MCPServersFromMetadata(input.Metadata)
mcpPromptName, mcpPromptArgs := mcpTools.MCPPromptFromMetadata(input.Metadata)
mcpResourceURIs := mcpTools.MCPResourcesFromMetadata(input.Metadata)
if (len(mcpServers) > 0 || mcpPromptName != "" || len(mcpResourceURIs) > 0) && (cfg.MCP.Servers != "" || cfg.MCP.Stdio != "") {
remote, stdio, mcpErr := cfg.MCP.MCPConfigFromYAML()
if mcpErr == nil {
namedSessions, sessErr := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, mcpServers)
if sessErr == nil && len(namedSessions) > 0 {
// Prompt injection
if mcpPromptName != "" {
prompts, discErr := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if discErr == nil {
promptMsgs, getErr := mcpTools.GetMCPPrompt(c.Request().Context(), prompts, mcpPromptName, mcpPromptArgs)
if getErr == nil {
var injected []schema.Message
for _, pm := range promptMsgs {
injected = append(injected, schema.Message{
Role: string(pm.Role),
Content: mcpTools.PromptMessageToText(pm),
})
}
openAIMessages = append(injected, openAIMessages...)
xlog.Debug("Anthropic MCP prompt injected", "prompt", mcpPromptName, "messages", len(injected))
} else {
xlog.Error("Failed to get MCP prompt", "error", getErr)
}
}
}
// Resource injection
if len(mcpResourceURIs) > 0 {
resources, discErr := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if discErr == nil {
var resourceTexts []string
for _, uri := range mcpResourceURIs {
content, readErr := mcpTools.ReadMCPResource(c.Request().Context(), resources, uri)
if readErr != nil {
xlog.Error("Failed to read MCP resource", "error", readErr, "uri", uri)
continue
}
name := uri
for _, r := range resources {
if r.URI == uri {
name = r.Name
break
}
}
resourceTexts = append(resourceTexts, fmt.Sprintf("--- MCP Resource: %s ---\n%s", name, content))
}
if len(resourceTexts) > 0 && len(openAIMessages) > 0 {
lastIdx := len(openAIMessages) - 1
suffix := "\n\n" + strings.Join(resourceTexts, "\n\n")
switch ct := openAIMessages[lastIdx].Content.(type) {
case string:
openAIMessages[lastIdx].Content = ct + suffix
default:
openAIMessages[lastIdx].Content = fmt.Sprintf("%v%s", ct, suffix)
}
xlog.Debug("Anthropic MCP resources injected", "count", len(resourceTexts))
}
}
}
// Tool injection
if len(mcpServers) > 0 {
discovered, discErr := mcpTools.DiscoverMCPTools(c.Request().Context(), namedSessions)
if discErr == nil {
mcpToolInfos = discovered
for _, ti := range mcpToolInfos {
funcs = append(funcs, ti.Function)
}
shouldUseFn = len(funcs) > 0 && cfg.ShouldUseFunctions()
xlog.Debug("Anthropic MCP tools injected", "count", len(mcpToolInfos), "total_funcs", len(funcs))
} else {
xlog.Error("Failed to discover MCP tools", "error", discErr)
}
}
}
} else {
xlog.Error("Failed to parse MCP config", "error", mcpErr)
}
}
// Create an OpenAI-compatible request for internal processing
openAIReq := &schema.OpenAIRequest{
PredictionOptions: schema.PredictionOptions{
@@ -88,138 +176,200 @@ func MessagesEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evalu
xlog.Debug("Anthropic Messages - Prompt (after templating)", "prompt", predInput)
if input.Stream {
return handleAnthropicStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn)
return handleAnthropicStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn, mcpToolInfos, evaluator)
}
return handleAnthropicNonStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn)
return handleAnthropicNonStream(c, id, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn, mcpToolInfos, evaluator)
}
}
func handleAnthropicNonStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool) error {
images := []string{}
for _, m := range openAIReq.Messages {
images = append(images, m.StringImages...)
func handleAnthropicNonStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool, mcpToolInfos []mcpTools.MCPToolInfo, evaluator *templates.Evaluator) error {
mcpMaxIterations := 10
if cfg.Agent.MaxIterations > 0 {
mcpMaxIterations = cfg.Agent.MaxIterations
}
hasMCPTools := len(mcpToolInfos) > 0
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
for mcpIteration := 0; mcpIteration <= mcpMaxIterations; mcpIteration++ {
// Re-template on each MCP iteration since messages may have changed
if mcpIteration > 0 {
predInput = evaluator.TemplateMessages(*openAIReq, openAIReq.Messages, cfg, funcs, shouldUseFn)
xlog.Debug("Anthropic MCP re-templating", "iteration", mcpIteration, "prompt_len", len(predInput))
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
}
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIReq.Messages, images, nil, nil, ml, cfg, cl, appConfig, nil, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
images := []string{}
for _, m := range openAIReq.Messages {
images = append(images, m.StringImages...)
}
const maxEmptyRetries = 5
var prediction backend.LLMResponse
var result string
for attempt := 0; attempt <= maxEmptyRetries; attempt++ {
prediction, err = predFunc()
if err != nil {
xlog.Error("Anthropic prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
result = backend.Finetune(*cfg, predInput, prediction.Response)
if result != "" || !shouldUseFn {
break
}
xlog.Warn("Anthropic: retrying prediction due to empty backend response", "attempt", attempt+1, "maxRetries", maxEmptyRetries)
}
// Try pre-parsed tool calls from C++ autoparser first, fall back to text parsing
var toolCalls []functions.FuncCallResults
if deltaToolCalls := functions.ToolCallsFromChatDeltas(prediction.ChatDeltas); len(deltaToolCalls) > 0 {
xlog.Debug("[ChatDeltas] Anthropic: using pre-parsed tool calls", "count", len(deltaToolCalls))
toolCalls = deltaToolCalls
} else {
xlog.Debug("[ChatDeltas] Anthropic: no pre-parsed tool calls, falling back to Go-side text parsing")
toolCalls = functions.ParseFunctionCall(result, cfg.FunctionsConfig)
}
var contentBlocks []schema.AnthropicContentBlock
var stopReason string
if shouldUseFn && len(toolCalls) > 0 {
// Model wants to use tools
stopReason = "tool_use"
for _, tc := range toolCalls {
// Parse arguments as JSON
var inputArgs map[string]interface{}
if err := json.Unmarshal([]byte(tc.Arguments), &inputArgs); err != nil {
xlog.Warn("Failed to parse tool call arguments as JSON", "error", err, "args", tc.Arguments)
inputArgs = map[string]interface{}{"raw": tc.Arguments}
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
contentBlocks = append(contentBlocks, schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, len(contentBlocks)),
Name: tc.Name,
Input: inputArgs,
})
}
// Add any text content before the tool calls
textContent := functions.ParseTextContent(result, cfg.FunctionsConfig)
if textContent != "" {
// Prepend text block
contentBlocks = append([]schema.AnthropicContentBlock{{Type: "text", Text: textContent}}, contentBlocks...)
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
} else {
// Normal text response
stopReason = "end_turn"
contentBlocks = []schema.AnthropicContentBlock{
{Type: "text", Text: result},
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIReq.Messages, images, nil, nil, ml, cfg, cl, appConfig, nil, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
}
resp := &schema.AnthropicResponse{
ID: fmt.Sprintf("msg_%s", id),
Type: "message",
Role: "assistant",
Model: input.Model,
StopReason: &stopReason,
Content: contentBlocks,
Usage: schema.AnthropicUsage{
InputTokens: prediction.Usage.Prompt,
OutputTokens: prediction.Usage.Completion,
},
}
const maxEmptyRetries = 5
var prediction backend.LLMResponse
var result string
for attempt := 0; attempt <= maxEmptyRetries; attempt++ {
prediction, err = predFunc()
if err != nil {
xlog.Error("Anthropic prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
result = backend.Finetune(*cfg, predInput, prediction.Response)
if result != "" || !shouldUseFn {
break
}
xlog.Warn("Anthropic: retrying prediction due to empty backend response", "attempt", attempt+1, "maxRetries", maxEmptyRetries)
}
if respData, err := json.Marshal(resp); err == nil {
xlog.Debug("Anthropic Response", "response", string(respData))
}
// Try pre-parsed tool calls from C++ autoparser first, fall back to text parsing
var toolCalls []functions.FuncCallResults
if deltaToolCalls := functions.ToolCallsFromChatDeltas(prediction.ChatDeltas); len(deltaToolCalls) > 0 {
xlog.Debug("[ChatDeltas] Anthropic: using pre-parsed tool calls", "count", len(deltaToolCalls))
toolCalls = deltaToolCalls
} else {
xlog.Debug("[ChatDeltas] Anthropic: no pre-parsed tool calls, falling back to Go-side text parsing")
toolCalls = functions.ParseFunctionCall(result, cfg.FunctionsConfig)
}
return c.JSON(200, resp)
// MCP server-side tool execution: if any tool calls are MCP tools, execute and loop
if hasMCPTools && shouldUseFn && len(toolCalls) > 0 {
var hasMCPCalls bool
for _, tc := range toolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.Name) {
hasMCPCalls = true
break
}
}
if hasMCPCalls {
// Append assistant message with tool_calls to conversation
assistantMsg := schema.Message{
Role: "assistant",
Content: result,
}
for i, tc := range toolCalls {
toolCallID := tc.ID
if toolCallID == "" {
toolCallID = fmt.Sprintf("toolu_%s_%d", id, i)
}
assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, schema.ToolCall{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: tc.Name,
Arguments: tc.Arguments,
},
})
}
openAIReq.Messages = append(openAIReq.Messages, assistantMsg)
// Execute each MCP tool call and append results
for _, tc := range assistantMsg.ToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool (Anthropic)", "tool", tc.FunctionCall.Name, "iteration", mcpIteration)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
openAIReq.Messages = append(openAIReq.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
}
xlog.Debug("Anthropic MCP tools executed, re-running inference", "iteration", mcpIteration)
continue // next MCP iteration
}
}
// No MCP tools to execute, build and return response
var contentBlocks []schema.AnthropicContentBlock
var stopReason string
if shouldUseFn && len(toolCalls) > 0 {
stopReason = "tool_use"
for _, tc := range toolCalls {
var inputArgs map[string]interface{}
if err := json.Unmarshal([]byte(tc.Arguments), &inputArgs); err != nil {
xlog.Warn("Failed to parse tool call arguments as JSON", "error", err, "args", tc.Arguments)
inputArgs = map[string]interface{}{"raw": tc.Arguments}
}
contentBlocks = append(contentBlocks, schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, len(contentBlocks)),
Name: tc.Name,
Input: inputArgs,
})
}
textContent := functions.ParseTextContent(result, cfg.FunctionsConfig)
if textContent != "" {
contentBlocks = append([]schema.AnthropicContentBlock{{Type: "text", Text: textContent}}, contentBlocks...)
}
} else {
stopReason = "end_turn"
contentBlocks = []schema.AnthropicContentBlock{
{Type: "text", Text: result},
}
}
resp := &schema.AnthropicResponse{
ID: fmt.Sprintf("msg_%s", id),
Type: "message",
Role: "assistant",
Model: input.Model,
StopReason: &stopReason,
Content: contentBlocks,
Usage: schema.AnthropicUsage{
InputTokens: prediction.Usage.Prompt,
OutputTokens: prediction.Usage.Completion,
},
}
if respData, err := json.Marshal(resp); err == nil {
xlog.Debug("Anthropic Response", "response", string(respData))
}
return c.JSON(200, resp)
} // end MCP iteration loop
return sendAnthropicError(c, 500, "api_error", "MCP iteration limit reached")
}
func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool) error {
func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicRequest, cfg *config.ModelConfig, ml *model.ModelLoader, cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig, predInput string, openAIReq *schema.OpenAIRequest, funcs functions.Functions, shouldUseFn bool, mcpToolInfos []mcpTools.MCPToolInfo, evaluator *templates.Evaluator) error {
c.Response().Header().Set("Content-Type", "text/event-stream")
c.Response().Header().Set("Cache-Control", "no-cache")
c.Response().Header().Set("Connection", "keep-alive")
// Create OpenAI messages for inference
openAIMessages := openAIReq.Messages
images := []string{}
for _, m := range openAIMessages {
images = append(images, m.StringImages...)
}
// Send message_start event
messageStart := schema.AnthropicStreamEvent{
Type: "message_start",
@@ -234,159 +384,232 @@ func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicReq
}
sendAnthropicSSE(c, messageStart)
// Track accumulated content for tool call detection
accumulatedContent := ""
currentBlockIndex := 0
inToolCall := false
toolCallsEmitted := 0
// Send initial content_block_start event
contentBlockStart := schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{Type: "text", Text: ""},
mcpMaxIterations := 10
if cfg.Agent.MaxIterations > 0 {
mcpMaxIterations = cfg.Agent.MaxIterations
}
sendAnthropicSSE(c, contentBlockStart)
hasMCPTools := len(mcpToolInfos) > 0
// Stream content deltas
tokenCallback := func(token string, usage backend.TokenUsage) bool {
accumulatedContent += token
// If we're using functions, try to detect tool calls incrementally
if shouldUseFn {
cleanedResult := functions.CleanupLLMResult(accumulatedContent, cfg.FunctionsConfig)
// Try parsing for tool calls
toolCalls := functions.ParseFunctionCall(cleanedResult, cfg.FunctionsConfig)
// If we detected new tool calls and haven't emitted them yet
if len(toolCalls) > toolCallsEmitted {
// Stop the current text block if we were in one
if !inToolCall && currentBlockIndex == 0 {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
inToolCall = true
for mcpIteration := 0; mcpIteration <= mcpMaxIterations; mcpIteration++ {
// Re-template on MCP iterations
if mcpIteration > 0 {
predInput = evaluator.TemplateMessages(*openAIReq, openAIReq.Messages, cfg, funcs, shouldUseFn)
xlog.Debug("Anthropic MCP stream re-templating", "iteration", mcpIteration)
}
openAIMessages := openAIReq.Messages
images := []string{}
for _, m := range openAIMessages {
images = append(images, m.StringImages...)
}
// Track accumulated content for tool call detection
accumulatedContent := ""
currentBlockIndex := 0
inToolCall := false
toolCallsEmitted := 0
// Send initial content_block_start event
contentBlockStart := schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{Type: "text", Text: ""},
}
sendAnthropicSSE(c, contentBlockStart)
// Collect tool calls for MCP execution
var collectedToolCalls []functions.FuncCallResults
tokenCallback := func(token string, usage backend.TokenUsage) bool {
accumulatedContent += token
if shouldUseFn {
cleanedResult := functions.CleanupLLMResult(accumulatedContent, cfg.FunctionsConfig)
toolCalls := functions.ParseFunctionCall(cleanedResult, cfg.FunctionsConfig)
if len(toolCalls) > toolCallsEmitted {
if !inToolCall && currentBlockIndex == 0 {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
inToolCall = true
}
for i := toolCallsEmitted; i < len(toolCalls); i++ {
tc := toolCalls[i]
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, i),
Name: tc.Name,
},
})
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: currentBlockIndex,
Delta: &schema.AnthropicStreamDelta{
Type: "input_json_delta",
PartialJSON: tc.Arguments,
},
})
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
}
collectedToolCalls = toolCalls
toolCallsEmitted = len(toolCalls)
return true
}
// Emit new tool calls
for i := toolCallsEmitted; i < len(toolCalls); i++ {
tc := toolCalls[i]
// Send content_block_start for tool_use
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: currentBlockIndex,
ContentBlock: &schema.AnthropicContentBlock{
Type: "tool_use",
ID: fmt.Sprintf("toolu_%s_%d", id, i),
Name: tc.Name,
},
})
// Send input_json_delta with the arguments
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: currentBlockIndex,
Delta: &schema.AnthropicStreamDelta{
Type: "input_json_delta",
PartialJSON: tc.Arguments,
},
})
// Send content_block_stop
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: currentBlockIndex,
})
currentBlockIndex++
}
toolCallsEmitted = len(toolCalls)
return true
}
if !inToolCall {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: 0,
Delta: &schema.AnthropicStreamDelta{
Type: "text_delta",
Text: token,
},
})
}
return true
}
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
}
// Send regular text delta if not in tool call mode
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIMessages, images, nil, nil, ml, cfg, cl, appConfig, tokenCallback, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic stream model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
prediction, err := predFunc()
if err != nil {
xlog.Error("Anthropic stream prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
// Also check chat deltas for tool calls
if deltaToolCalls := functions.ToolCallsFromChatDeltas(prediction.ChatDeltas); len(deltaToolCalls) > 0 && len(collectedToolCalls) == 0 {
collectedToolCalls = deltaToolCalls
}
// MCP streaming tool execution: if we collected MCP tool calls, execute and loop
if hasMCPTools && len(collectedToolCalls) > 0 {
var hasMCPCalls bool
for _, tc := range collectedToolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.Name) {
hasMCPCalls = true
break
}
}
if hasMCPCalls {
// Append assistant message with tool_calls
assistantMsg := schema.Message{
Role: "assistant",
Content: accumulatedContent,
}
for i, tc := range collectedToolCalls {
toolCallID := tc.ID
if toolCallID == "" {
toolCallID = fmt.Sprintf("toolu_%s_%d", id, i)
}
assistantMsg.ToolCalls = append(assistantMsg.ToolCalls, schema.ToolCall{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: tc.Name,
Arguments: tc.Arguments,
},
})
}
openAIReq.Messages = append(openAIReq.Messages, assistantMsg)
// Execute MCP tool calls
for _, tc := range assistantMsg.ToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool (Anthropic stream)", "tool", tc.FunctionCall.Name, "iteration", mcpIteration)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
openAIReq.Messages = append(openAIReq.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
}
xlog.Debug("Anthropic MCP streaming tools executed, re-running inference", "iteration", mcpIteration)
continue // next MCP iteration
}
}
// No MCP tools to execute, close stream
if !inToolCall {
delta := schema.AnthropicStreamEvent{
Type: "content_block_delta",
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: 0,
Delta: &schema.AnthropicStreamDelta{
Type: "text_delta",
Text: token,
},
}
sendAnthropicSSE(c, delta)
})
}
return true
}
toolsJSON := ""
if len(funcs) > 0 {
openAITools := make([]functions.Tool, len(funcs))
for i, f := range funcs {
openAITools[i] = functions.Tool{Type: "function", Function: f}
stopReason := "end_turn"
if toolCallsEmitted > 0 {
stopReason = "tool_use"
}
if toolsBytes, err := json.Marshal(openAITools); err == nil {
toolsJSON = string(toolsBytes)
}
}
toolChoiceJSON := ""
if input.ToolChoice != nil {
if toolChoiceBytes, err := json.Marshal(input.ToolChoice); err == nil {
toolChoiceJSON = string(toolChoiceBytes)
}
}
predFunc, err := backend.ModelInference(
input.Context, predInput, openAIMessages, images, nil, nil, ml, cfg, cl, appConfig, tokenCallback, toolsJSON, toolChoiceJSON, nil, nil, nil, input.Metadata)
if err != nil {
xlog.Error("Anthropic stream model inference failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("model inference failed: %v", err))
}
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "message_delta",
Delta: &schema.AnthropicStreamDelta{
StopReason: &stopReason,
},
Usage: &schema.AnthropicUsage{
OutputTokens: prediction.Usage.Completion,
},
})
prediction, err := predFunc()
if err != nil {
xlog.Error("Anthropic stream prediction failed", "error", err)
return sendAnthropicError(c, 500, "api_error", fmt.Sprintf("prediction failed: %v", err))
}
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "message_stop",
})
// Send content_block_stop event for last block if we didn't close it yet
if !inToolCall {
contentBlockStop := schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: 0,
}
sendAnthropicSSE(c, contentBlockStop)
}
return nil
} // end MCP iteration loop
// Determine stop reason
stopReason := "end_turn"
if toolCallsEmitted > 0 {
stopReason = "tool_use"
}
// Send message_delta event with stop_reason
messageDelta := schema.AnthropicStreamEvent{
Type: "message_delta",
Delta: &schema.AnthropicStreamDelta{
StopReason: &stopReason,
},
Usage: &schema.AnthropicUsage{
OutputTokens: prediction.Usage.Completion,
},
}
sendAnthropicSSE(c, messageDelta)
// Send message_stop event
messageStop := schema.AnthropicStreamEvent{
// Safety fallback
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "message_stop",
}
sendAnthropicSSE(c, messageStop)
})
return nil
}


@@ -0,0 +1,108 @@
package localai
import (
"fmt"
"io"
"net/http"
"net/url"
"strings"
"time"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/xlog"
)
var corsProxyClient = &http.Client{
Timeout: 10 * time.Minute,
}
// CORSProxyEndpoint proxies HTTP requests to external MCP servers,
// solving CORS issues for browser-based MCP connections.
// The target URL is passed as a query parameter: /api/cors-proxy?url=https://...
func CORSProxyEndpoint(appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
targetURL := c.QueryParam("url")
if targetURL == "" {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "missing 'url' query parameter"})
}
parsed, err := url.Parse(targetURL)
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "invalid target URL"})
}
if parsed.Scheme != "http" && parsed.Scheme != "https" {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "only http and https schemes are supported"})
}
xlog.Debug("CORS proxy request", "method", c.Request().Method, "target", targetURL)
proxyReq, err := http.NewRequestWithContext(
c.Request().Context(),
c.Request().Method,
targetURL,
c.Request().Body,
)
if err != nil {
return fmt.Errorf("failed to create proxy request: %w", err)
}
// Copy headers from the original request, excluding hop-by-hop headers
skipHeaders := map[string]bool{
"Host": true, "Connection": true, "Keep-Alive": true,
"Transfer-Encoding": true, "Upgrade": true, "Origin": true,
"Referer": true,
}
for key, values := range c.Request().Header {
if skipHeaders[key] {
continue
}
for _, v := range values {
proxyReq.Header.Add(key, v)
}
}
resp, err := corsProxyClient.Do(proxyReq)
if err != nil {
xlog.Error("CORS proxy request failed", "error", err, "target", targetURL)
return c.JSON(http.StatusBadGateway, map[string]string{"error": "proxy request failed: " + err.Error()})
}
defer resp.Body.Close()
// Copy response headers
for key, values := range resp.Header {
lower := strings.ToLower(key)
// Skip CORS headers — we'll set our own
if strings.HasPrefix(lower, "access-control-") {
continue
}
for _, v := range values {
c.Response().Header().Add(key, v)
}
}
// Set CORS headers to allow browser access
c.Response().Header().Set("Access-Control-Allow-Origin", "*")
c.Response().Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
c.Response().Header().Set("Access-Control-Allow-Headers", "*")
c.Response().Header().Set("Access-Control-Expose-Headers", "*")
c.Response().WriteHeader(resp.StatusCode)
// Stream the response body
_, err = io.Copy(c.Response().Writer, resp.Body)
return err
}
}
// CORSProxyOptionsEndpoint handles CORS preflight requests for the proxy.
func CORSProxyOptionsEndpoint() echo.HandlerFunc {
return func(c echo.Context) error {
c.Response().Header().Set("Access-Control-Allow-Origin", "*")
c.Response().Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
c.Response().Header().Set("Access-Control-Allow-Headers", "*")
c.Response().Header().Set("Access-Control-Max-Age", "86400")
return c.NoContent(http.StatusNoContent)
}
}
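A browser-side MCP client reaches a remote server through this proxy by query-escaping the target endpoint into the `url` parameter, so the target's own query string survives the round trip. A minimal sketch (base address and target URL are illustrative):

```go
package main

import (
	"fmt"
	"net/url"
)

// proxyURL builds the CORS-proxy URL for a target MCP endpoint.
// QueryEscape is required: without it the target's own query string
// would be parsed as part of the proxy's query string.
func proxyURL(base, target string) string {
	return base + "/api/cors-proxy?url=" + url.QueryEscape(target)
}

func main() {
	fmt.Println(proxyURL("http://localhost:8080", "https://example.com/mcp?session=1"))
	// → http://localhost:8080/api/cors-proxy?url=https%3A%2F%2Fexample.com%2Fmcp%3Fsession%3D1
}
```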


@@ -1,26 +1,19 @@
package localai
import (
"context"
"encoding/json"
"errors"
"fmt"
"net"
"time"
"strings"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/endpoints/openai"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/cogito"
"github.com/mudler/cogito/clients"
"github.com/mudler/xlog"
)
// MCP SSE Event Types
// MCP SSE Event Types (kept for backward compatibility with MCP endpoint consumers)
type MCPReasoningEvent struct {
Type string `json:"type"`
Content string `json:"content"`
@@ -54,262 +47,53 @@ type MCPErrorEvent struct {
Message string `json:"message"`
}
// MCPEndpoint is the endpoint for MCP chat completions. Supports SSE mode, but it is not compatible with the OpenAI apis.
// @Summary Stream MCP chat completions with reasoning, tool calls, and results
// MCPEndpoint is the endpoint for MCP chat completions.
// It enables all MCP servers for the model and delegates to the standard chat endpoint,
// which handles MCP tool injection and server-side execution.
// Both streaming and non-streaming modes use standard OpenAI response format.
// @Summary MCP chat completions with automatic tool execution
// @Param request body schema.OpenAIRequest true "query params"
// @Success 200 {object} schema.OpenAIResponse "Response"
// @Router /v1/mcp/chat/completions [post]
func MCPEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, appConfig *config.ApplicationConfig) echo.HandlerFunc {
chatHandler := openai.ChatEndpoint(cl, ml, evaluator, appConfig)
return func(c echo.Context) error {
ctx := c.Request().Context()
created := int(time.Now().Unix())
// Handle Correlation
id := c.Request().Header.Get("X-Correlation-ID")
if id == "" {
id = fmt.Sprintf("mcp-%d", time.Now().UnixNano())
}
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.OpenAIRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
config, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || config == nil {
modelConfig, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || modelConfig == nil {
return echo.ErrBadRequest
}
if config.MCP.Servers == "" && config.MCP.Stdio == "" {
if modelConfig.MCP.Servers == "" && modelConfig.MCP.Stdio == "" {
return fmt.Errorf("no MCP servers configured")
}
// Get MCP config from model config
remote, stdio, err := config.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to get MCP config: %w", err)
// Enable all MCP servers if none explicitly specified (preserve original behavior)
if input.Metadata == nil {
input.Metadata = map[string]string{}
}
// Check if we have tools in cache, or we have to have an initial connection
sessions, err := mcpTools.SessionsFromMCPConfig(config.Name, remote, stdio)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
if len(sessions) == 0 {
return fmt.Errorf("no working MCP servers found")
}
// Build fragment from messages
fragment := cogito.NewEmptyFragment()
for _, message := range input.Messages {
fragment = fragment.AddMessage(cogito.MessageRole(message.Role), message.StringContent)
}
_, port, err := net.SplitHostPort(appConfig.APIAddress)
if err != nil {
return err
}
apiKey := ""
if len(appConfig.ApiKeys) > 0 {
apiKey = appConfig.ApiKeys[0]
}
ctxWithCancellation, cancel := context.WithCancel(ctx)
defer cancel()
// TODO: instead of connecting to the API, we should just wire this internally
// and act like completion.go.
// We can do this as cogito expects an interface and we can create one that
// we satisfy to just call internally ComputeChoices
defaultLLM := clients.NewLocalAILLM(config.Name, apiKey, "http://127.0.0.1:"+port)
// Build cogito options using the consolidated method
cogitoOpts := config.BuildCogitoOptions()
cogitoOpts = append(
cogitoOpts,
cogito.WithContext(ctxWithCancellation),
cogito.WithMCPs(sessions...),
)
// Check if streaming is requested
toStream := input.Stream
if !toStream {
// Non-streaming mode: execute synchronously and return JSON response
cogitoOpts = append(
cogitoOpts,
cogito.WithStatusCallback(func(s string) {
xlog.Debug("[model agent] Status", "model", config.Name, "status", s)
}),
cogito.WithReasoningCallback(func(s string) {
xlog.Debug("[model agent] Reasoning", "model", config.Name, "reasoning", s)
}),
cogito.WithToolCallBack(func(t *cogito.ToolChoice, state *cogito.SessionState) cogito.ToolCallDecision {
xlog.Debug("[model agent] Tool call", "model", config.Name, "tool", t.Name, "reasoning", t.Reasoning, "arguments", t.Arguments)
return cogito.ToolCallDecision{
Approved: true,
}
}),
cogito.WithToolCallResultCallback(func(t cogito.ToolStatus) {
xlog.Debug("[model agent] Tool call result", "model", config.Name, "tool", t.Name, "result", t.Result, "tool_arguments", t.ToolArguments)
}),
)
f, err := cogito.ExecuteTools(
defaultLLM, fragment,
cogitoOpts...,
)
if err != nil && !errors.Is(err, cogito.ErrNoToolSelected) {
return err
if _, hasMCP := input.Metadata["mcp_servers"]; !hasMCP {
remote, stdio, err := modelConfig.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to get MCP config: %w", err)
}
resp := &schema.OpenAIResponse{
ID: id,
Created: created,
Model: input.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Message: &schema.Message{Role: "assistant", Content: &f.LastMessage().Content}}},
Object: "chat.completion",
var allServers []string
for name := range remote.Servers {
allServers = append(allServers, name)
}
jsonResult, _ := json.Marshal(resp)
xlog.Debug("Response", "response", string(jsonResult))
// Return the prediction in the response body
return c.JSON(200, resp)
for name := range stdio.Servers {
allServers = append(allServers, name)
}
input.Metadata["mcp_servers"] = strings.Join(allServers, ",")
}
// Streaming mode: use SSE
// Set up SSE headers
c.Response().Header().Set("Content-Type", "text/event-stream")
c.Response().Header().Set("Cache-Control", "no-cache")
c.Response().Header().Set("Connection", "keep-alive")
c.Response().Header().Set("X-Correlation-ID", id)
// Create channel for streaming events
events := make(chan interface{})
ended := make(chan error, 1)
// Set up callbacks for streaming
statusCallback := func(s string) {
events <- MCPStatusEvent{
Type: "status",
Message: s,
}
}
reasoningCallback := func(s string) {
events <- MCPReasoningEvent{
Type: "reasoning",
Content: s,
}
}
toolCallCallback := func(t *cogito.ToolChoice, state *cogito.SessionState) cogito.ToolCallDecision {
events <- MCPToolCallEvent{
Type: "tool_call",
Name: t.Name,
Arguments: t.Arguments,
Reasoning: t.Reasoning,
}
return cogito.ToolCallDecision{
Approved: true,
}
}
toolCallResultCallback := func(t cogito.ToolStatus) {
events <- MCPToolResultEvent{
Type: "tool_result",
Name: t.Name,
Result: t.Result,
}
}
cogitoOpts = append(cogitoOpts,
cogito.WithStatusCallback(statusCallback),
cogito.WithReasoningCallback(reasoningCallback),
cogito.WithToolCallBack(toolCallCallback),
cogito.WithToolCallResultCallback(toolCallResultCallback),
)
// Execute tools in a goroutine
go func() {
defer close(events)
f, err := cogito.ExecuteTools(
defaultLLM, fragment,
cogitoOpts...,
)
if err != nil && !errors.Is(err, cogito.ErrNoToolSelected) {
events <- MCPErrorEvent{
Type: "error",
Message: fmt.Sprintf("Failed to execute tools: %v", err),
}
ended <- err
return
}
// Stream final assistant response
content := f.LastMessage().Content
events <- MCPAssistantEvent{
Type: "assistant",
Content: content,
}
ended <- nil
}()
// Stream events to client
LOOP:
for {
select {
case <-ctx.Done():
// Context was cancelled (client disconnected or request cancelled)
xlog.Debug("Request context cancelled, stopping stream")
cancel()
break LOOP
case event := <-events:
if event == nil {
// Channel closed
break LOOP
}
eventData, err := json.Marshal(event)
if err != nil {
xlog.Debug("Failed to marshal event", "error", err)
continue
}
xlog.Debug("Sending event", "event", string(eventData))
_, err = fmt.Fprintf(c.Response().Writer, "data: %s\n\n", string(eventData))
if err != nil {
xlog.Debug("Sending event failed", "error", err)
cancel()
return err
}
c.Response().Flush()
case err := <-ended:
if err == nil {
// Send done signal
fmt.Fprintf(c.Response().Writer, "data: [DONE]\n\n")
c.Response().Flush()
break LOOP
}
xlog.Error("Stream ended with error", "error", err)
errorEvent := MCPErrorEvent{
Type: "error",
Message: err.Error(),
}
errorData, marshalErr := json.Marshal(errorEvent)
if marshalErr != nil {
fmt.Fprintf(c.Response().Writer, "data: {\"type\":\"error\",\"message\":\"Internal error\"}\n\n")
} else {
fmt.Fprintf(c.Response().Writer, "data: %s\n\n", string(errorData))
}
fmt.Fprintf(c.Response().Writer, "data: [DONE]\n\n")
c.Response().Flush()
return nil
}
}
xlog.Debug("Stream ended")
return nil
// Delegate to the standard chat endpoint which handles MCP tool
// injection and server-side execution for both streaming and non-streaming.
return chatHandler(c)
}
}


@@ -0,0 +1,141 @@
package localai
import (
"fmt"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
)
// MCPPromptsEndpoint returns the list of MCP prompts for a given model.
// GET /v1/mcp/prompts/:model
func MCPPromptsEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, []any{})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
prompts, err := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP prompts: %w", err)
}
type promptArgJSON struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Required bool `json:"required,omitempty"`
}
type promptJSON struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Title string `json:"title,omitempty"`
Arguments []promptArgJSON `json:"arguments,omitempty"`
Server string `json:"server"`
}
var result []promptJSON
for _, p := range prompts {
pj := promptJSON{
Name: p.PromptName,
Description: p.Description,
Title: p.Title,
Server: p.ServerName,
}
for _, arg := range p.Arguments {
pj.Arguments = append(pj.Arguments, promptArgJSON{
Name: arg.Name,
Description: arg.Description,
Required: arg.Required,
})
}
result = append(result, pj)
}
return c.JSON(200, result)
}
}
// MCPGetPromptEndpoint expands a prompt by name with the given arguments.
// POST /v1/mcp/prompts/:model/:prompt
func MCPGetPromptEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
promptName := c.Param("prompt")
if modelName == "" || promptName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return fmt.Errorf("no MCP servers configured for model %q", modelName)
}
var req struct {
Arguments map[string]string `json:"arguments"`
}
if err := c.Bind(&req); err != nil {
return echo.ErrBadRequest
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
prompts, err := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP prompts: %w", err)
}
messages, err := mcpTools.GetMCPPrompt(c.Request().Context(), prompts, promptName, req.Arguments)
if err != nil {
return fmt.Errorf("failed to get prompt: %w", err)
}
type messageJSON struct {
Role string `json:"role"`
Content string `json:"content"`
}
var result []messageJSON
for _, m := range messages {
result = append(result, messageJSON{
Role: string(m.Role),
Content: mcpTools.PromptMessageToText(m),
})
}
return c.JSON(200, map[string]any{
"messages": result,
})
}
}


@@ -0,0 +1,127 @@
package localai
import (
"fmt"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
)
// MCPResourcesEndpoint returns the list of MCP resources for a given model.
// GET /v1/mcp/resources/:model
func MCPResourcesEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, []any{})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
resources, err := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP resources: %w", err)
}
type resourceJSON struct {
Name string `json:"name"`
URI string `json:"uri"`
Description string `json:"description,omitempty"`
MIMEType string `json:"mimeType,omitempty"`
Server string `json:"server"`
}
var result []resourceJSON
for _, r := range resources {
result = append(result, resourceJSON{
Name: r.Name,
URI: r.URI,
Description: r.Description,
MIMEType: r.MIMEType,
Server: r.ServerName,
})
}
return c.JSON(200, result)
}
}
// MCPReadResourceEndpoint reads a specific MCP resource by URI.
// POST /v1/mcp/resources/:model/read
func MCPReadResourceEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return fmt.Errorf("no MCP servers configured for model %q", modelName)
}
var req struct {
URI string `json:"uri"`
}
if err := c.Bind(&req); err != nil || req.URI == "" {
return echo.ErrBadRequest
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
resources, err := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to discover MCP resources: %w", err)
}
content, err := mcpTools.ReadMCPResource(c.Request().Context(), resources, req.URI)
if err != nil {
return fmt.Errorf("failed to read resource: %w", err)
}
// Find the resource info for mimeType
mimeType := ""
for _, r := range resources {
if r.URI == req.URI {
mimeType = r.MIMEType
break
}
}
return c.JSON(200, map[string]any{
"uri": req.URI,
"content": content,
"mimeType": mimeType,
})
}
}


@@ -0,0 +1,91 @@
package localai
import (
"fmt"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/middleware"
)
// MCPServersEndpoint returns the list of MCP servers and their tools for a given model.
// GET /v1/mcp/servers/:model
func MCPServersEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
modelName := c.Param("model")
if modelName == "" {
return echo.ErrBadRequest
}
cfg, exists := cl.GetModelConfig(modelName)
if !exists {
return fmt.Errorf("model %q not found", modelName)
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, map[string]any{
"model": modelName,
"servers": []any{},
})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
servers, err := mcpTools.ListMCPServers(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to list MCP servers: %w", err)
}
return c.JSON(200, map[string]any{
"model": modelName,
"servers": servers,
})
}
}
// MCPServersEndpointFromMiddleware is a version that uses the middleware-resolved model config.
// This allows it to use the same middleware chain as other endpoints.
func MCPServersEndpointFromMiddleware() echo.HandlerFunc {
return func(c echo.Context) error {
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
if cfg.MCP.Servers == "" && cfg.MCP.Stdio == "" {
return c.JSON(200, map[string]any{
"model": cfg.Name,
"servers": []any{},
})
}
remote, stdio, err := cfg.MCP.MCPConfigFromYAML()
if err != nil {
return fmt.Errorf("failed to parse MCP config: %w", err)
}
namedSessions, err := mcpTools.NamedSessionsFromMCPConfig(cfg.Name, remote, stdio, nil)
if err != nil {
return fmt.Errorf("failed to get MCP sessions: %w", err)
}
servers, err := mcpTools.ListMCPServers(c.Request().Context(), namedSessions)
if err != nil {
return fmt.Errorf("failed to list MCP servers: %w", err)
}
return c.JSON(200, map[string]any{
"model": cfg.Name,
"servers": servers,
})
}
}


@@ -2,32 +2,109 @@ package mcp
import (
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"os/exec"
"strings"
"sync"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/pkg/signals"
"github.com/modelcontextprotocol/go-sdk/mcp"
"github.com/mudler/xlog"
)
// NamedSession pairs an MCP session with its server name and type.
type NamedSession struct {
Name string
Type string // "remote" or "stdio"
Session *mcp.ClientSession
}
// MCPToolInfo holds a discovered MCP tool along with its origin session.
type MCPToolInfo struct {
ServerName string
ToolName string
Function functions.Function
Session *mcp.ClientSession
}
// MCPServerInfo describes an MCP server and its available tools, prompts, and resources.
type MCPServerInfo struct {
Name string `json:"name"`
Type string `json:"type"`
Tools []string `json:"tools"`
Prompts []string `json:"prompts,omitempty"`
Resources []string `json:"resources,omitempty"`
}
// MCPPromptInfo holds a discovered MCP prompt along with its origin session.
type MCPPromptInfo struct {
ServerName string
PromptName string
Description string
Title string
Arguments []*mcp.PromptArgument
Session *mcp.ClientSession
}
// MCPResourceInfo holds a discovered MCP resource along with its origin session.
type MCPResourceInfo struct {
ServerName string
Name string
URI string
Description string
MIMEType string
Session *mcp.ClientSession
}
type sessionCache struct {
mu      sync.Mutex
cache   map[string][]*mcp.ClientSession
cancels map[string]context.CancelFunc
}
type namedSessionCache struct {
mu sync.Mutex
cache map[string][]NamedSession
cancels map[string]context.CancelFunc
}
var (
cache = sessionCache{
cache:   make(map[string][]*mcp.ClientSession),
cancels: make(map[string]context.CancelFunc),
}
namedCache = namedSessionCache{
cache: make(map[string][]NamedSession),
cancels: make(map[string]context.CancelFunc),
}
client = mcp.NewClient(&mcp.Implementation{Name: "LocalAI", Version: "v1.0.0"}, nil)
)
// MCPServersFromMetadata extracts the MCP server list from the metadata map
// and returns the list. The "mcp_servers" key is consumed (deleted from the map)
// so it doesn't leak to the backend.
func MCPServersFromMetadata(metadata map[string]string) []string {
raw, ok := metadata["mcp_servers"]
if !ok || raw == "" {
return nil
}
delete(metadata, "mcp_servers")
servers := strings.Split(raw, ",")
for i := range servers {
servers[i] = strings.TrimSpace(servers[i])
}
return servers
}
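The consume-and-trim behavior matters: after extraction the key must be gone so it never leaks to the backend, and padded server names must come back clean. A standalone sketch of the same logic:

```go
package main

import (
	"fmt"
	"strings"
)

// serversFromMetadata mirrors MCPServersFromMetadata: it splits the
// comma-separated list, trims whitespace, and deletes the key so the
// value does not travel further down the request pipeline.
func serversFromMetadata(metadata map[string]string) []string {
	raw, ok := metadata["mcp_servers"]
	if !ok || raw == "" {
		return nil
	}
	delete(metadata, "mcp_servers")
	servers := strings.Split(raw, ",")
	for i := range servers {
		servers[i] = strings.TrimSpace(servers[i])
	}
	return servers
}

func main() {
	md := map[string]string{"mcp_servers": " weather, search "}
	fmt.Println(serversFromMetadata(md), len(md)) // → [weather search] 0
}
```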
func SessionsFromMCPConfig(
name string,
remote config.MCPGenericConfig[config.MCPRemoteServers],
@@ -83,16 +160,461 @@ func SessionsFromMCPConfig(
allSessions = append(allSessions, mcpSession)
}
signals.RegisterGracefulTerminationHandler(func() {
for _, session := range allSessions {
session.Close()
}
cancel()
})
cache.cancels[name] = cancel
return allSessions, nil
}
// NamedSessionsFromMCPConfig returns sessions with their server names preserved.
// If enabledServers is non-empty, only servers with matching names are returned.
func NamedSessionsFromMCPConfig(
name string,
remote config.MCPGenericConfig[config.MCPRemoteServers],
stdio config.MCPGenericConfig[config.MCPSTDIOServers],
enabledServers []string,
) ([]NamedSession, error) {
namedCache.mu.Lock()
defer namedCache.mu.Unlock()
allSessions, exists := namedCache.cache[name]
if !exists {
ctx, cancel := context.WithCancel(context.Background())
for serverName, server := range remote.Servers {
xlog.Debug("[MCP remote server] Configuration", "name", serverName, "server", server)
httpClient := &http.Client{
Timeout: 360 * time.Second,
Transport: newBearerTokenRoundTripper(server.Token, http.DefaultTransport),
}
transport := &mcp.StreamableClientTransport{Endpoint: server.URL, HTTPClient: httpClient}
mcpSession, err := client.Connect(ctx, transport, nil)
if err != nil {
xlog.Error("Failed to connect to MCP server", "error", err, "name", serverName, "url", server.URL)
continue
}
xlog.Debug("[MCP remote server] Connected", "name", serverName, "url", server.URL)
allSessions = append(allSessions, NamedSession{
Name: serverName,
Type: "remote",
Session: mcpSession,
})
}
for serverName, server := range stdio.Servers {
xlog.Debug("[MCP stdio server] Configuration", "name", serverName, "server", server)
command := exec.Command(server.Command, server.Args...)
command.Env = os.Environ()
for key, value := range server.Env {
command.Env = append(command.Env, key+"="+value)
}
transport := &mcp.CommandTransport{Command: command}
mcpSession, err := client.Connect(ctx, transport, nil)
if err != nil {
xlog.Error("Failed to start MCP server", "error", err, "name", serverName, "command", command)
continue
}
xlog.Debug("[MCP stdio server] Connected", "name", serverName, "command", command)
allSessions = append(allSessions, NamedSession{
Name: serverName,
Type: "stdio",
Session: mcpSession,
})
}
namedCache.cache[name] = allSessions
namedCache.cancels[name] = cancel
}
if len(enabledServers) == 0 {
return allSessions, nil
}
enabled := make(map[string]bool, len(enabledServers))
for _, s := range enabledServers {
enabled[s] = true
}
var filtered []NamedSession
for _, ns := range allSessions {
if enabled[ns.Name] {
filtered = append(filtered, ns)
}
}
return filtered, nil
}
// DiscoverMCPTools queries each session for its tools and converts them to functions.Function.
// Deduplicates by tool name (first server wins).
func DiscoverMCPTools(ctx context.Context, sessions []NamedSession) ([]MCPToolInfo, error) {
seen := make(map[string]bool)
var result []MCPToolInfo
for _, ns := range sessions {
toolsResult, err := ns.Session.ListTools(ctx, nil)
if err != nil {
xlog.Error("Failed to list tools from MCP server", "error", err, "server", ns.Name)
continue
}
for _, tool := range toolsResult.Tools {
if seen[tool.Name] {
continue
}
seen[tool.Name] = true
f := functions.Function{
Name: tool.Name,
Description: tool.Description,
}
// Convert InputSchema to map[string]interface{} for functions.Function
if tool.InputSchema != nil {
schemaBytes, err := json.Marshal(tool.InputSchema)
if err == nil {
var params map[string]interface{}
if json.Unmarshal(schemaBytes, &params) == nil {
f.Parameters = params
}
}
}
if f.Parameters == nil {
f.Parameters = map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{},
}
}
result = append(result, MCPToolInfo{
ServerName: ns.Name,
ToolName: tool.Name,
Function: f,
Session: ns.Session,
})
}
}
return result, nil
}
// ExecuteMCPToolCall finds the matching tool and executes it.
func ExecuteMCPToolCall(ctx context.Context, tools []MCPToolInfo, toolName string, arguments string) (string, error) {
var toolInfo *MCPToolInfo
for i := range tools {
if tools[i].ToolName == toolName {
toolInfo = &tools[i]
break
}
}
if toolInfo == nil {
return "", fmt.Errorf("MCP tool %q not found", toolName)
}
var args map[string]any
if arguments != "" {
if err := json.Unmarshal([]byte(arguments), &args); err != nil {
return "", fmt.Errorf("failed to parse arguments for tool %q: %w", toolName, err)
}
}
result, err := toolInfo.Session.CallTool(ctx, &mcp.CallToolParams{
Name: toolName,
Arguments: args,
})
if err != nil {
return "", fmt.Errorf("MCP tool %q call failed: %w", toolName, err)
}
// Extract text content from result
var texts []string
for _, content := range result.Content {
if tc, ok := content.(*mcp.TextContent); ok {
texts = append(texts, tc.Text)
}
}
if len(texts) == 0 {
// Fallback: marshal the whole result
data, _ := json.Marshal(result.Content)
return string(data), nil
}
if len(texts) == 1 {
return texts[0], nil
}
combined, _ := json.Marshal(texts)
return string(combined), nil
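The result flattening above has three cases worth keeping straight: no text parts (marshal the raw content), exactly one (return it verbatim), several (JSON-encode the string slice). A standalone sketch of that fallback, using plain strings in place of `mcp.TextContent`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flattenTexts mirrors the tool-result handling in ExecuteMCPToolCall:
// one text part is returned as-is, several are JSON-encoded as an array,
// and none falls back to marshaling the raw (non-text) content.
func flattenTexts(texts []string, rawContent any) string {
	if len(texts) == 0 {
		data, _ := json.Marshal(rawContent)
		return string(data)
	}
	if len(texts) == 1 {
		return texts[0]
	}
	combined, _ := json.Marshal(texts)
	return string(combined)
}

func main() {
	fmt.Println(flattenTexts([]string{"42"}, nil))                         // → 42
	fmt.Println(flattenTexts([]string{"a", "b"}, nil))                     // → ["a","b"]
	fmt.Println(flattenTexts(nil, map[string]string{"type": "image"}))     // → {"type":"image"}
}
```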
}
// ListMCPServers returns server info with tool, prompt, and resource names for each session.
func ListMCPServers(ctx context.Context, sessions []NamedSession) ([]MCPServerInfo, error) {
var result []MCPServerInfo
for _, ns := range sessions {
info := MCPServerInfo{
Name: ns.Name,
Type: ns.Type,
}
toolsResult, err := ns.Session.ListTools(ctx, nil)
if err != nil {
xlog.Error("Failed to list tools from MCP server", "error", err, "server", ns.Name)
} else {
for _, tool := range toolsResult.Tools {
info.Tools = append(info.Tools, tool.Name)
}
}
promptsResult, err := ns.Session.ListPrompts(ctx, nil)
if err != nil {
xlog.Debug("Failed to list prompts from MCP server", "error", err, "server", ns.Name)
} else {
for _, p := range promptsResult.Prompts {
info.Prompts = append(info.Prompts, p.Name)
}
}
resourcesResult, err := ns.Session.ListResources(ctx, nil)
if err != nil {
xlog.Debug("Failed to list resources from MCP server", "error", err, "server", ns.Name)
} else {
for _, r := range resourcesResult.Resources {
info.Resources = append(info.Resources, r.URI)
}
}
result = append(result, info)
}
return result, nil
}
// IsMCPTool checks if a tool name is in the MCP tool list.
func IsMCPTool(tools []MCPToolInfo, name string) bool {
for _, t := range tools {
if t.ToolName == name {
return true
}
}
return false
}
// DiscoverMCPPrompts queries each session for its prompts.
// Deduplicates by prompt name (first server wins).
func DiscoverMCPPrompts(ctx context.Context, sessions []NamedSession) ([]MCPPromptInfo, error) {
seen := make(map[string]bool)
var result []MCPPromptInfo
for _, ns := range sessions {
promptsResult, err := ns.Session.ListPrompts(ctx, nil)
if err != nil {
xlog.Error("Failed to list prompts from MCP server", "error", err, "server", ns.Name)
continue
}
for _, p := range promptsResult.Prompts {
if seen[p.Name] {
continue
}
seen[p.Name] = true
result = append(result, MCPPromptInfo{
ServerName: ns.Name,
PromptName: p.Name,
Description: p.Description,
Title: p.Title,
Arguments: p.Arguments,
Session: ns.Session,
})
}
}
return result, nil
}
// GetMCPPrompt finds and expands a prompt by name using the discovered prompts list.
func GetMCPPrompt(ctx context.Context, prompts []MCPPromptInfo, name string, args map[string]string) ([]*mcp.PromptMessage, error) {
var info *MCPPromptInfo
for i := range prompts {
if prompts[i].PromptName == name {
info = &prompts[i]
break
}
}
if info == nil {
return nil, fmt.Errorf("MCP prompt %q not found", name)
}
result, err := info.Session.GetPrompt(ctx, &mcp.GetPromptParams{
Name: name,
Arguments: args,
})
if err != nil {
return nil, fmt.Errorf("MCP prompt %q get failed: %w", name, err)
}
return result.Messages, nil
}
// DiscoverMCPResources queries each session for its resources.
// Deduplicates by URI (first server wins).
func DiscoverMCPResources(ctx context.Context, sessions []NamedSession) ([]MCPResourceInfo, error) {
seen := make(map[string]bool)
var result []MCPResourceInfo
for _, ns := range sessions {
resourcesResult, err := ns.Session.ListResources(ctx, nil)
if err != nil {
xlog.Error("Failed to list resources from MCP server", "error", err, "server", ns.Name)
continue
}
for _, r := range resourcesResult.Resources {
if seen[r.URI] {
continue
}
seen[r.URI] = true
result = append(result, MCPResourceInfo{
ServerName: ns.Name,
Name: r.Name,
URI: r.URI,
Description: r.Description,
MIMEType: r.MIMEType,
Session: ns.Session,
})
}
}
return result, nil
}
// ReadMCPResource reads a resource by URI from the matching session.
func ReadMCPResource(ctx context.Context, resources []MCPResourceInfo, uri string) (string, error) {
var info *MCPResourceInfo
for i := range resources {
if resources[i].URI == uri {
info = &resources[i]
break
}
}
if info == nil {
return "", fmt.Errorf("MCP resource %q not found", uri)
}
result, err := info.Session.ReadResource(ctx, &mcp.ReadResourceParams{URI: uri})
if err != nil {
return "", fmt.Errorf("MCP resource %q read failed: %w", uri, err)
}
var texts []string
for _, c := range result.Contents {
if c.Text != "" {
texts = append(texts, c.Text)
}
}
return strings.Join(texts, "\n"), nil
}
// MCPPromptFromMetadata extracts the prompt name and arguments from metadata.
// The "mcp_prompt" and "mcp_prompt_args" keys are consumed (deleted from the map).
func MCPPromptFromMetadata(metadata map[string]string) (string, map[string]string) {
name, ok := metadata["mcp_prompt"]
if !ok || name == "" {
return "", nil
}
delete(metadata, "mcp_prompt")
var args map[string]string
if raw, ok := metadata["mcp_prompt_args"]; ok && raw != "" {
if err := json.Unmarshal([]byte(raw), &args); err != nil {
xlog.Error("Failed to parse mcp_prompt_args", "error", err)
}
delete(metadata, "mcp_prompt_args")
}
return name, args
}
// MCPResourcesFromMetadata extracts resource URIs from metadata.
// The "mcp_resources" key is consumed (deleted from the map).
func MCPResourcesFromMetadata(metadata map[string]string) []string {
raw, ok := metadata["mcp_resources"]
if !ok || raw == "" {
return nil
}
delete(metadata, "mcp_resources")
uris := strings.Split(raw, ",")
for i := range uris {
uris[i] = strings.TrimSpace(uris[i])
}
return uris
}
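For illustration, the metadata contract consumed by the two helpers above can be exercised in isolation. The snippet below is a self-contained sketch that re-implements the same parsing logic outside the package; the key names (`mcp_prompt`, `mcp_prompt_args`, `mcp_resources`) come from the code, everything else is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// promptFromMetadata mirrors MCPPromptFromMetadata: it consumes the
// "mcp_prompt" and "mcp_prompt_args" keys, deleting them from the map.
func promptFromMetadata(metadata map[string]string) (string, map[string]string) {
	name, ok := metadata["mcp_prompt"]
	if !ok || name == "" {
		return "", nil
	}
	delete(metadata, "mcp_prompt")
	var args map[string]string
	if raw, ok := metadata["mcp_prompt_args"]; ok && raw != "" {
		_ = json.Unmarshal([]byte(raw), &args) // a parse error just leaves args nil
		delete(metadata, "mcp_prompt_args")
	}
	return name, args
}

// resourcesFromMetadata mirrors MCPResourcesFromMetadata: a comma-separated
// list of URIs under "mcp_resources", trimmed and consumed.
func resourcesFromMetadata(metadata map[string]string) []string {
	raw, ok := metadata["mcp_resources"]
	if !ok || raw == "" {
		return nil
	}
	delete(metadata, "mcp_resources")
	uris := strings.Split(raw, ",")
	for i := range uris {
		uris[i] = strings.TrimSpace(uris[i])
	}
	return uris
}

func main() {
	md := map[string]string{
		"mcp_prompt":      "summarize",
		"mcp_prompt_args": `{"style":"short"}`,
		"mcp_resources":   "file:///a.txt, file:///b.txt",
	}
	name, args := promptFromMetadata(md)
	uris := resourcesFromMetadata(md)
	// All three keys are consumed, so the map is empty afterwards.
	fmt.Println(name, args["style"], uris, len(md))
}
```

Note that both helpers mutate the caller's map: the keys are removed so they cannot be re-processed further down the request pipeline.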
// PromptMessageToText extracts text from a PromptMessage's Content.
func PromptMessageToText(msg *mcp.PromptMessage) string {
if tc, ok := msg.Content.(*mcp.TextContent); ok {
return tc.Text
}
// Fallback: marshal content
data, _ := json.Marshal(msg.Content)
return string(data)
}
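The text-extraction fallback above can be sketched independently of the SDK; `TextContent` below is a local stand-in for the `mcp.TextContent` type, not the real one:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TextContent is a local stand-in for *mcp.TextContent.
type TextContent struct {
	Text string `json:"text"`
}

// messageToText mirrors PromptMessageToText: return the plain text when the
// content is textual, otherwise fall back to a JSON dump of whatever
// content type the server returned.
func messageToText(content any) string {
	if tc, ok := content.(*TextContent); ok {
		return tc.Text
	}
	data, _ := json.Marshal(content)
	return string(data)
}

func main() {
	fmt.Println(messageToText(&TextContent{Text: "hello"}))
	fmt.Println(messageToText(map[string]string{"type": "image"}))
}
```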
// CloseMCPSessions closes all MCP sessions for a given model and removes them from the cache.
// This should be called when a model is unloaded or shut down.
func CloseMCPSessions(modelName string) {
// Close sessions in the unnamed cache
cache.mu.Lock()
if sessions, ok := cache.cache[modelName]; ok {
for _, s := range sessions {
s.Close()
}
delete(cache.cache, modelName)
}
if cancel, ok := cache.cancels[modelName]; ok {
cancel()
delete(cache.cancels, modelName)
}
cache.mu.Unlock()
// Close sessions in the named cache
namedCache.mu.Lock()
if sessions, ok := namedCache.cache[modelName]; ok {
for _, ns := range sessions {
ns.Session.Close()
}
delete(namedCache.cache, modelName)
}
if cancel, ok := namedCache.cancels[modelName]; ok {
cancel()
delete(namedCache.cancels, modelName)
}
namedCache.mu.Unlock()
xlog.Debug("Closed MCP sessions for model", "model", modelName)
}
// CloseAllMCPSessions closes all cached MCP sessions across all models.
// This should be called during graceful shutdown.
func CloseAllMCPSessions() {
cache.mu.Lock()
for name, sessions := range cache.cache {
for _, s := range sessions {
s.Close()
}
if cancel, ok := cache.cancels[name]; ok {
cancel()
}
}
cache.cache = make(map[string][]*mcp.ClientSession)
cache.cancels = make(map[string]context.CancelFunc)
cache.mu.Unlock()
namedCache.mu.Lock()
for name, sessions := range namedCache.cache {
for _, ns := range sessions {
ns.Session.Close()
}
if cancel, ok := namedCache.cancels[name]; ok {
cancel()
}
}
namedCache.cache = make(map[string][]NamedSession)
namedCache.cancels = make(map[string]context.CancelFunc)
namedCache.mu.Unlock()
xlog.Debug("Closed all MCP sessions")
}
func init() {
signals.RegisterGracefulTerminationHandler(func() {
CloseAllMCPSessions()
})
}
// bearerTokenRoundTripper is a custom roundtripper that injects a bearer token
// into HTTP requests
type bearerTokenRoundTripper struct {


@@ -10,6 +10,7 @@ import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
@@ -22,6 +23,37 @@ import (
"github.com/mudler/xlog"
)
// mergeToolCallDeltas merges streaming tool call deltas into complete tool calls.
// In SSE streaming, a single tool call arrives as multiple chunks sharing the same Index:
// the first chunk carries the ID, Type, and Name; subsequent chunks append to Arguments.
func mergeToolCallDeltas(existing []schema.ToolCall, deltas []schema.ToolCall) []schema.ToolCall {
byIndex := make(map[int]int, len(existing)) // tool call Index -> position in slice
for i, tc := range existing {
byIndex[tc.Index] = i
}
for _, d := range deltas {
pos, found := byIndex[d.Index]
if !found {
byIndex[d.Index] = len(existing)
existing = append(existing, d)
continue
}
// Merge into existing entry
tc := &existing[pos]
if d.ID != "" {
tc.ID = d.ID
}
if d.Type != "" {
tc.Type = d.Type
}
if d.FunctionCall.Name != "" {
tc.FunctionCall.Name = d.FunctionCall.Name
}
tc.FunctionCall.Arguments += d.FunctionCall.Arguments
}
return existing
}
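The delta-merge semantics documented above can be exercised standalone. The structs below are minimal stand-ins for `schema.ToolCall`/`schema.FunctionCall` (illustrative only), and the merge function is a copy of the handler logic:

```go
package main

import "fmt"

// Minimal stand-ins for schema.FunctionCall / schema.ToolCall.
type FunctionCall struct {
	Name      string
	Arguments string
}

type ToolCall struct {
	Index        int
	ID           string
	Type         string
	FunctionCall FunctionCall
}

// mergeToolCallDeltas folds chunks sharing an Index into one call:
// ID/Type/Name are set once, Arguments accumulate across chunks.
func mergeToolCallDeltas(existing, deltas []ToolCall) []ToolCall {
	byIndex := make(map[int]int, len(existing))
	for i, tc := range existing {
		byIndex[tc.Index] = i
	}
	for _, d := range deltas {
		pos, found := byIndex[d.Index]
		if !found {
			byIndex[d.Index] = len(existing)
			existing = append(existing, d)
			continue
		}
		tc := &existing[pos]
		if d.ID != "" {
			tc.ID = d.ID
		}
		if d.Type != "" {
			tc.Type = d.Type
		}
		if d.FunctionCall.Name != "" {
			tc.FunctionCall.Name = d.FunctionCall.Name
		}
		tc.FunctionCall.Arguments += d.FunctionCall.Arguments
	}
	return existing
}

func main() {
	var calls []ToolCall
	// First chunk carries identity; later chunks stream argument fragments.
	calls = mergeToolCallDeltas(calls, []ToolCall{{Index: 0, ID: "c1", Type: "function",
		FunctionCall: FunctionCall{Name: "get_weather", Arguments: `{"cit`}}})
	calls = mergeToolCallDeltas(calls, []ToolCall{{Index: 0,
		FunctionCall: FunctionCall{Arguments: `y":"Rome"}`}}})
	fmt.Println(calls[0].ID, calls[0].FunctionCall.Name, calls[0].FunctionCall.Arguments)
}
```

Keying on `Index` rather than `ID` matters because, in SSE streaming, only the first chunk of a tool call carries its ID.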
// ChatEndpoint is the OpenAI Completion API endpoint https://platform.openai.com/docs/api-reference/chat/create
// @Summary Generate a chat completions for a given prompt and model.
// @Param request body schema.OpenAIRequest true "query params"
@@ -405,6 +437,100 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
shouldUseFn := len(input.Functions) > 0 && config.ShouldUseFunctions()
strictMode := false
// MCP tool injection: when mcp_servers is set in metadata and model has MCP config
var mcpToolInfos []mcpTools.MCPToolInfo
mcpServers := mcpTools.MCPServersFromMetadata(input.Metadata)
// MCP prompt and resource injection (extracted before tool injection)
mcpPromptName, mcpPromptArgs := mcpTools.MCPPromptFromMetadata(input.Metadata)
mcpResourceURIs := mcpTools.MCPResourcesFromMetadata(input.Metadata)
if (len(mcpServers) > 0 || mcpPromptName != "" || len(mcpResourceURIs) > 0) && (config.MCP.Servers != "" || config.MCP.Stdio != "") {
remote, stdio, mcpErr := config.MCP.MCPConfigFromYAML()
if mcpErr == nil {
namedSessions, sessErr := mcpTools.NamedSessionsFromMCPConfig(config.Name, remote, stdio, mcpServers)
if sessErr == nil && len(namedSessions) > 0 {
// Prompt injection: prepend prompt messages to the conversation
if mcpPromptName != "" {
prompts, discErr := mcpTools.DiscoverMCPPrompts(c.Request().Context(), namedSessions)
if discErr == nil {
promptMsgs, getErr := mcpTools.GetMCPPrompt(c.Request().Context(), prompts, mcpPromptName, mcpPromptArgs)
if getErr == nil {
var injected []schema.Message
for _, pm := range promptMsgs {
injected = append(injected, schema.Message{
Role: string(pm.Role),
Content: mcpTools.PromptMessageToText(pm),
})
}
input.Messages = append(injected, input.Messages...)
xlog.Debug("MCP prompt injected", "prompt", mcpPromptName, "messages", len(injected))
} else {
xlog.Error("Failed to get MCP prompt", "error", getErr)
}
} else {
xlog.Error("Failed to discover MCP prompts", "error", discErr)
}
}
// Resource injection: append resource content to the last user message
if len(mcpResourceURIs) > 0 {
resources, discErr := mcpTools.DiscoverMCPResources(c.Request().Context(), namedSessions)
if discErr == nil {
var resourceTexts []string
for _, uri := range mcpResourceURIs {
content, readErr := mcpTools.ReadMCPResource(c.Request().Context(), resources, uri)
if readErr != nil {
xlog.Error("Failed to read MCP resource", "error", readErr, "uri", uri)
continue
}
// Find resource name
name := uri
for _, r := range resources {
if r.URI == uri {
name = r.Name
break
}
}
resourceTexts = append(resourceTexts, fmt.Sprintf("--- MCP Resource: %s ---\n%s", name, content))
}
if len(resourceTexts) > 0 && len(input.Messages) > 0 {
lastIdx := len(input.Messages) - 1
suffix := "\n\n" + strings.Join(resourceTexts, "\n\n")
switch ct := input.Messages[lastIdx].Content.(type) {
case string:
input.Messages[lastIdx].Content = ct + suffix
default:
input.Messages[lastIdx].Content = fmt.Sprintf("%v%s", ct, suffix)
}
xlog.Debug("MCP resources injected", "count", len(resourceTexts))
}
} else {
xlog.Error("Failed to discover MCP resources", "error", discErr)
}
}
// Tool injection
if len(mcpServers) > 0 {
discovered, discErr := mcpTools.DiscoverMCPTools(c.Request().Context(), namedSessions)
if discErr == nil {
mcpToolInfos = discovered
for _, ti := range mcpToolInfos {
funcs = append(funcs, ti.Function)
input.Tools = append(input.Tools, functions.Tool{Type: "function", Function: ti.Function})
}
shouldUseFn = len(funcs) > 0 && config.ShouldUseFunctions()
xlog.Debug("MCP tools injected", "count", len(mcpToolInfos), "total_funcs", len(funcs))
} else {
xlog.Error("Failed to discover MCP tools", "error", discErr)
}
}
}
} else {
xlog.Error("Failed to parse MCP config", "error", mcpErr)
}
}
xlog.Debug("Tool call routing decision",
"shouldUseFn", shouldUseFn,
"len(input.Functions)", len(input.Functions),
@@ -552,6 +678,19 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
c.Response().Header().Set("Connection", "keep-alive")
c.Response().Header().Set("X-Correlation-ID", id)
mcpStreamMaxIterations := 10
if config.Agent.MaxIterations > 0 {
mcpStreamMaxIterations = config.Agent.MaxIterations
}
hasMCPToolsStream := len(mcpToolInfos) > 0
for mcpStreamIter := 0; mcpStreamIter <= mcpStreamMaxIterations; mcpStreamIter++ {
// Re-template on MCP iterations
if mcpStreamIter > 0 && !config.TemplateConfig.UseTokenizerTemplate {
predInput = evaluator.TemplateMessages(*input, input.Messages, config, funcs, shouldUseFn)
xlog.Debug("MCP stream re-templating", "iteration", mcpStreamIter)
}
responses := make(chan schema.OpenAIResponse)
ended := make(chan error, 1)
@@ -565,6 +704,8 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
usage := &schema.OpenAIUsage{}
toolsCalled := false
var collectedToolCalls []schema.ToolCall
var collectedContent string
LOOP:
for {
@@ -582,6 +723,18 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
usage = &ev.Usage // Copy a pointer to the latest usage chunk so that the stop message can reference it
if len(ev.Choices[0].Delta.ToolCalls) > 0 {
toolsCalled = true
// Collect and merge tool call deltas for MCP execution
if hasMCPToolsStream {
collectedToolCalls = mergeToolCallDeltas(collectedToolCalls, ev.Choices[0].Delta.ToolCalls)
}
}
// Collect content for MCP conversation history
if hasMCPToolsStream && ev.Choices[0].Delta != nil && ev.Choices[0].Delta.Content != nil {
if s, ok := ev.Choices[0].Delta.Content.(string); ok {
collectedContent += s
} else if sp, ok := ev.Choices[0].Delta.Content.(*string); ok && sp != nil {
collectedContent += *sp
}
}
respData, err := json.Marshal(ev)
if err != nil {
@@ -632,6 +785,64 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
}
}
// MCP streaming tool execution: if we collected MCP tool calls, execute and loop
if hasMCPToolsStream && toolsCalled && len(collectedToolCalls) > 0 {
var hasMCPCalls bool
for _, tc := range collectedToolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
hasMCPCalls = true
break
}
}
if hasMCPCalls {
// Append assistant message with tool_calls
assistantMsg := schema.Message{
Role: "assistant",
Content: collectedContent,
ToolCalls: collectedToolCalls,
}
input.Messages = append(input.Messages, assistantMsg)
// Execute MCP tool calls and stream results as tool_result events
for _, tc := range collectedToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool (stream)", "tool", tc.FunctionCall.Name, "iteration", mcpStreamIter)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
input.Messages = append(input.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
// Stream tool result event to client
mcpEvent := map[string]any{
"type": "mcp_tool_result",
"name": tc.FunctionCall.Name,
"result": toolResult,
}
if mcpEventData, err := json.Marshal(mcpEvent); err == nil {
fmt.Fprintf(c.Response().Writer, "data: %s\n\n", mcpEventData)
c.Response().Flush()
}
}
xlog.Debug("MCP streaming tools executed, re-running inference", "iteration", mcpStreamIter)
continue // next MCP stream iteration
}
}
// No MCP tools to execute, send final stop message
finishReason := FinishReasonStop
if toolsCalled && len(input.Tools) > 0 {
finishReason = FinishReasonToolCalls
@@ -659,9 +870,28 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
c.Response().Flush()
xlog.Debug("Stream ended")
return nil
} // end MCP stream iteration loop
// Safety fallback
fmt.Fprintf(c.Response().Writer, "data: [DONE]\n\n")
c.Response().Flush()
return nil
// no streaming mode
default:
mcpMaxIterations := 10
if config.Agent.MaxIterations > 0 {
mcpMaxIterations = config.Agent.MaxIterations
}
hasMCPTools := len(mcpToolInfos) > 0
for mcpIteration := 0; mcpIteration <= mcpMaxIterations; mcpIteration++ {
// Re-template on each MCP iteration since messages may have changed
if mcpIteration > 0 && !config.TemplateConfig.UseTokenizerTemplate {
predInput = evaluator.TemplateMessages(*input, input.Messages, config, funcs, shouldUseFn)
xlog.Debug("MCP re-templating", "iteration", mcpIteration, "prompt_len", len(predInput))
}
// Detect if thinking token is already in prompt or template
var template string
if config.TemplateConfig.UseTokenizerTemplate {
@@ -839,6 +1069,75 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
})
}
// MCP server-side tool execution loop:
// If we have MCP tools and the model returned tool_calls, execute MCP tools
// and re-run inference with the results appended to the conversation.
if hasMCPTools && len(result) > 0 {
var mcpCallsExecuted bool
for _, choice := range result {
if choice.Message == nil || len(choice.Message.ToolCalls) == 0 {
continue
}
// Check if any tool calls are MCP tools
var hasMCPCalls bool
for _, tc := range choice.Message.ToolCalls {
if mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
hasMCPCalls = true
break
}
}
if !hasMCPCalls {
continue
}
// Append assistant message with tool_calls to conversation
assistantContent := ""
if choice.Message.Content != nil {
if s, ok := choice.Message.Content.(string); ok {
assistantContent = s
} else if sp, ok := choice.Message.Content.(*string); ok && sp != nil {
assistantContent = *sp
}
}
assistantMsg := schema.Message{
Role: "assistant",
Content: assistantContent,
ToolCalls: choice.Message.ToolCalls,
}
input.Messages = append(input.Messages, assistantMsg)
// Execute each MCP tool call and append results
for _, tc := range choice.Message.ToolCalls {
if !mcpTools.IsMCPTool(mcpToolInfos, tc.FunctionCall.Name) {
continue
}
xlog.Debug("Executing MCP tool", "tool", tc.FunctionCall.Name, "arguments", tc.FunctionCall.Arguments, "iteration", mcpIteration)
toolResult, toolErr := mcpTools.ExecuteMCPToolCall(
c.Request().Context(), mcpToolInfos,
tc.FunctionCall.Name, tc.FunctionCall.Arguments,
)
if toolErr != nil {
xlog.Error("MCP tool execution failed", "tool", tc.FunctionCall.Name, "error", toolErr)
toolResult = fmt.Sprintf("Error: %v", toolErr)
}
input.Messages = append(input.Messages, schema.Message{
Role: "tool",
Content: toolResult,
StringContent: toolResult,
ToolCallID: tc.ID,
Name: tc.FunctionCall.Name,
})
mcpCallsExecuted = true
}
}
if mcpCallsExecuted {
xlog.Debug("MCP tools executed, re-running inference", "iteration", mcpIteration, "messages", len(input.Messages))
continue // next MCP iteration
}
}
// No MCP tools to execute (or no MCP tools configured), return response
usage := schema.OpenAIUsage{
PromptTokens: tokenUsage.Prompt,
CompletionTokens: tokenUsage.Completion,
@@ -862,6 +1161,10 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
// Return the prediction in the response body
return c.JSON(200, resp)
} // end MCP iteration loop
// Should not reach here, but safety fallback
return fmt.Errorf("MCP iteration limit reached")
}
}
}

File diff suppressed because it is too large

@@ -308,7 +308,7 @@ func handleWSResponseCreate(connCtx context.Context, conn *lockedConn, input *sc
defer close(processDone)
store.UpdateStatus(responseID, schema.ORStatusInProgress, nil)
finalResponse, bgErr := handleBackgroundStream(reqCtx, store, responseID, createdAt, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn)
finalResponse, bgErr := handleBackgroundStream(reqCtx, store, responseID, createdAt, input, cfg, ml, cl, appConfig, predInput, openAIReq, funcs, shouldUseFn, nil, nil)
if bgErr != nil {
xlog.Error("WebSocket Responses: processing failed", "response_id", responseID, "error", bgErr)
now := time.Now().Unix()

File diff suppressed because it is too large

@@ -16,7 +16,9 @@
"highlight.js": "^11.11.1",
"marked": "^15.0.7",
"dompurify": "^3.2.5",
"@fortawesome/fontawesome-free": "^6.7.2"
"@fortawesome/fontawesome-free": "^6.7.2",
"@modelcontextprotocol/sdk": "^1.25.1",
"@modelcontextprotocol/ext-apps": "^1.2.2"
},
"devDependencies": {
"@vitejs/plugin-react": "^4.5.2",


@@ -16,6 +16,10 @@
transition: margin-left var(--duration-normal) var(--ease-default);
}
.sidebar-is-collapsed .main-content {
margin-left: var(--sidebar-width-collapsed);
}
.main-content-inner {
flex: 1;
display: flex;
@@ -136,7 +140,8 @@
z-index: 50;
overflow-y: auto;
box-shadow: var(--shadow-sidebar);
transition: transform var(--duration-normal) var(--ease-default);
transition: width var(--duration-normal) var(--ease-default),
transform var(--duration-normal) var(--ease-default);
}
.sidebar-overlay {
@@ -147,8 +152,9 @@
display: flex;
align-items: center;
justify-content: space-between;
padding: var(--spacing-md);
padding: var(--spacing-sm) var(--spacing-sm);
border-bottom: 1px solid var(--color-border-subtle);
min-height: 44px;
}
.sidebar-logo-link {
@@ -157,11 +163,20 @@
.sidebar-logo-img {
width: 100%;
max-width: 140px;
max-width: 120px;
height: auto;
padding: 0 var(--spacing-xs);
}
.sidebar-logo-icon {
display: none;
}
.sidebar-logo-icon-img {
width: 28px;
height: 28px;
}
.sidebar-close-btn {
display: none;
background: none;
@@ -173,33 +188,37 @@
.sidebar-nav {
flex: 1;
padding: var(--spacing-xs) 0;
padding: 2px 0;
overflow-y: auto;
}
.sidebar-section {
padding: var(--spacing-xs) 0;
padding: 2px 0;
}
.sidebar-section-title {
padding: var(--spacing-sm) var(--spacing-md) var(--spacing-xs);
font-size: 0.6875rem;
padding: var(--spacing-xs) var(--spacing-sm) 2px;
font-size: 0.625rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--color-text-muted);
white-space: nowrap;
overflow: hidden;
}
.nav-item {
display: flex;
align-items: center;
gap: var(--spacing-sm);
padding: var(--spacing-sm) var(--spacing-md);
padding: 6px var(--spacing-sm);
color: var(--color-text-secondary);
text-decoration: none;
font-size: 0.875rem;
font-size: 0.8125rem;
transition: all var(--duration-fast) var(--ease-default);
border-left: 3px solid transparent;
white-space: nowrap;
overflow: hidden;
}
.nav-item:hover {
@@ -215,17 +234,100 @@
}
.nav-icon {
width: 20px;
width: 18px;
text-align: center;
flex-shrink: 0;
font-size: 0.85rem;
}
.nav-label {
flex: 1;
overflow: hidden;
text-overflow: ellipsis;
}
.nav-external {
font-size: 0.55rem;
margin-left: auto;
opacity: 0.5;
flex-shrink: 0;
}
.sidebar-footer {
padding: var(--spacing-sm) var(--spacing-md);
padding: var(--spacing-xs) var(--spacing-sm);
border-top: 1px solid var(--color-border-subtle);
display: flex;
align-items: center;
justify-content: space-between;
gap: var(--spacing-xs);
}
.sidebar-collapse-btn {
background: none;
border: none;
color: var(--color-text-muted);
cursor: pointer;
padding: 4px;
border-radius: var(--radius-sm);
font-size: 0.75rem;
transition: color var(--duration-fast);
flex-shrink: 0;
}
.sidebar-collapse-btn:hover {
color: var(--color-text-primary);
}
/* Collapsed sidebar (desktop only) */
.sidebar.collapsed {
width: var(--sidebar-width-collapsed);
}
.sidebar.collapsed .sidebar-logo-link {
display: none;
}
.sidebar.collapsed .sidebar-logo-icon {
display: flex;
align-items: center;
justify-content: center;
width: 100%;
}
.sidebar.collapsed .sidebar-header {
justify-content: center;
}
.sidebar.collapsed .nav-label,
.sidebar.collapsed .nav-external,
.sidebar.collapsed .sidebar-section-title {
display: none;
}
.sidebar.collapsed .nav-item {
justify-content: center;
padding: 8px 0;
border-left-width: 2px;
}
.sidebar.collapsed .nav-icon {
width: auto;
font-size: 1rem;
}
.sidebar.collapsed .sidebar-footer {
justify-content: center;
flex-direction: column;
gap: var(--spacing-xs);
}
.sidebar.collapsed .theme-toggle {
padding: 4px;
font-size: 0.75rem;
}
.sidebar.collapsed .theme-toggle .nav-label {
display: none;
}
/* Theme toggle */
@@ -1696,19 +1798,129 @@
border-color: var(--color-primary-border);
color: var(--color-primary);
}
/* Chat MCP toggle switch */
.chat-mcp-switch {
/* Chat MCP dropdown */
.chat-mcp-dropdown {
position: relative;
display: inline-block;
}
.chat-mcp-dropdown .btn {
display: flex;
align-items: center;
gap: 6px;
cursor: pointer;
user-select: none;
gap: 5px;
}
.chat-mcp-switch-label {
font-size: 0.75rem;
font-weight: 500;
.chat-mcp-badge {
display: inline-flex;
align-items: center;
justify-content: center;
min-width: 18px;
height: 18px;
padding: 0 5px;
border-radius: 9px;
background: rgba(255,255,255,0.25);
font-size: 0.7rem;
font-weight: 600;
line-height: 1;
}
.chat-mcp-dropdown-menu {
position: absolute;
top: calc(100% + 4px);
right: 0;
z-index: 100;
min-width: 240px;
max-height: 320px;
overflow-y: auto;
background: var(--color-bg-primary);
border: 1px solid var(--color-border-subtle);
border-radius: var(--radius-md);
box-shadow: var(--shadow-lg);
}
.chat-mcp-dropdown-loading,
.chat-mcp-dropdown-empty {
padding: var(--spacing-sm) var(--spacing-md);
font-size: 0.8125rem;
color: var(--color-text-secondary);
}
.chat-mcp-dropdown-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: var(--spacing-xs) var(--spacing-md);
border-bottom: 1px solid var(--color-border-divider);
font-size: 0.75rem;
font-weight: 600;
color: var(--color-text-secondary);
text-transform: uppercase;
letter-spacing: 0.03em;
}
.chat-mcp-select-all {
background: none;
border: none;
padding: 0;
font-size: 0.75rem;
color: var(--color-accent);
cursor: pointer;
text-transform: none;
letter-spacing: 0;
}
.chat-mcp-select-all:hover {
text-decoration: underline;
}
.chat-mcp-server-item {
display: flex;
align-items: center;
gap: 8px;
padding: var(--spacing-xs) var(--spacing-md);
cursor: pointer;
transition: background 120ms;
}
.chat-mcp-server-item:hover {
background: var(--color-bg-hover);
}
.chat-mcp-server-item input[type="checkbox"] {
flex-shrink: 0;
}
.chat-mcp-server-info {
display: flex;
flex-direction: column;
gap: 1px;
min-width: 0;
}
.chat-mcp-server-name {
font-size: 0.8125rem;
font-weight: 500;
color: var(--color-text-primary);
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.chat-mcp-server-tools {
font-size: 0.7rem;
color: var(--color-text-tertiary);
}
/* Client MCP status indicators */
.chat-client-mcp-status {
display: inline-block;
width: 8px;
height: 8px;
border-radius: 50%;
flex-shrink: 0;
background: var(--color-text-tertiary);
}
.chat-client-mcp-status-connected {
background: #22c55e;
box-shadow: 0 0 4px rgba(34, 197, 94, 0.5);
}
.chat-client-mcp-status-connecting {
background: #f59e0b;
animation: pulse 1s infinite;
}
.chat-client-mcp-status-error {
background: #ef4444;
}
.chat-client-mcp-status-disconnected {
background: var(--color-text-tertiary);
}
/* Chat model info panel */
.chat-model-info-panel {
@@ -2035,7 +2247,8 @@
/* Responsive */
@media (max-width: 1023px) {
.main-content {
.main-content,
.sidebar-is-collapsed .main-content {
margin-left: 0;
}
@@ -2045,6 +2258,11 @@
.sidebar {
transform: translateX(-100%);
width: var(--sidebar-width);
}
.sidebar.collapsed {
width: var(--sidebar-width);
}
.sidebar.open {
@@ -2055,6 +2273,39 @@
display: block;
}
.sidebar-collapse-btn {
display: none;
}
.sidebar.collapsed .nav-label,
.sidebar.collapsed .nav-external,
.sidebar.collapsed .sidebar-section-title {
display: unset;
}
.sidebar.collapsed .sidebar-logo-link {
display: block;
}
.sidebar.collapsed .sidebar-logo-icon {
display: none;
}
.sidebar.collapsed .nav-item {
justify-content: flex-start;
padding: 6px var(--spacing-sm);
border-left-width: 3px;
}
.sidebar.collapsed .nav-icon {
width: 18px;
font-size: 0.85rem;
}
.sidebar.collapsed .sidebar-header {
justify-content: space-between;
}
.sidebar-overlay {
display: block;
position: fixed;
@@ -2388,3 +2639,37 @@
gap: var(--spacing-xs);
}
}
/* MCP App Frame */
.mcp-app-frame-container {
width: 100%;
margin: var(--spacing-sm) 0;
border-radius: var(--border-radius-md);
overflow: hidden;
border: 1px solid var(--color-border-subtle);
}
.mcp-app-iframe {
width: 100%;
border: none;
display: block;
min-height: 100px;
max-height: 600px;
transition: height 0.2s ease;
background: var(--color-bg-primary);
}
.mcp-app-error {
padding: var(--spacing-sm) var(--spacing-md);
color: var(--color-text-danger, #e53e3e);
font-size: 0.85rem;
}
.mcp-app-reconnect-overlay {
padding: var(--spacing-sm);
text-align: center;
font-size: 0.8rem;
color: var(--color-text-secondary);
background: var(--color-bg-secondary);
border-top: 1px solid var(--color-border-subtle);
}


@@ -5,8 +5,13 @@ import OperationsBar from './components/OperationsBar'
import { ToastContainer, useToast } from './components/Toast'
import { systemApi } from './utils/api'
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
export default function App() {
const [sidebarOpen, setSidebarOpen] = useState(false)
const [sidebarCollapsed, setSidebarCollapsed] = useState(() => {
try { return localStorage.getItem(COLLAPSED_KEY) === 'true' } catch (_) { return false }
})
const { toasts, addToast, removeToast } = useToast()
const [version, setVersion] = useState('')
const location = useLocation()
@@ -18,8 +23,20 @@ export default function App() {
.catch(() => {})
}, [])
useEffect(() => {
const handler = (e) => setSidebarCollapsed(e.detail.collapsed)
window.addEventListener('sidebar-collapse', handler)
return () => window.removeEventListener('sidebar-collapse', handler)
}, [])
const layoutClasses = [
'app-layout',
isChatRoute ? 'app-layout-chat' : '',
sidebarCollapsed ? 'sidebar-is-collapsed' : '',
].filter(Boolean).join(' ')
return (
<div className={`app-layout${isChatRoute ? ' app-layout-chat' : ''}`}>
<div className={layoutClasses}>
<Sidebar isOpen={sidebarOpen} onClose={() => setSidebarOpen(false)} />
<main className="main-content">
<OperationsBar />


@@ -0,0 +1,154 @@
import { useState, useEffect, useRef, useCallback } from 'react'
import { loadClientMCPServers, addClientMCPServer, removeClientMCPServer } from '../utils/mcpClientStorage'
export default function ClientMCPDropdown({
activeServerIds = [],
onToggleServer,
onServerAdded,
onServerRemoved,
connectionStatuses = {},
getConnectedTools,
}) {
const [open, setOpen] = useState(false)
const [addDialog, setAddDialog] = useState(false)
const [servers, setServers] = useState(() => loadClientMCPServers())
const [url, setUrl] = useState('')
const [name, setName] = useState('')
const [authToken, setAuthToken] = useState('')
const [useProxy, setUseProxy] = useState(true)
const ref = useRef(null)
useEffect(() => {
if (!open) return
const handleClick = (e) => {
if (ref.current && !ref.current.contains(e.target)) setOpen(false)
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [open])
const handleAdd = useCallback(() => {
if (!url.trim()) return
const headers = {}
if (authToken.trim()) {
headers.Authorization = `Bearer ${authToken.trim()}`
}
const server = addClientMCPServer({ name: name.trim() || undefined, url: url.trim(), headers, useProxy })
setServers(loadClientMCPServers())
setUrl('')
setName('')
setAuthToken('')
setUseProxy(true)
setAddDialog(false)
if (onServerAdded) onServerAdded(server)
}, [url, name, authToken, useProxy, onServerAdded])
const handleRemove = useCallback((id) => {
removeClientMCPServer(id)
setServers(loadClientMCPServers())
if (onServerRemoved) onServerRemoved(id)
}, [onServerRemoved])
const activeCount = activeServerIds.length
return (
<div className="chat-mcp-dropdown" ref={ref}>
<button
type="button"
className={`btn btn-sm ${activeCount > 0 ? 'btn-primary' : 'btn-secondary'}`}
title="Client-side MCP servers (browser connects directly)"
onClick={() => setOpen(!open)}
>
<i className="fas fa-globe" /> Client MCP
{activeCount > 0 && (
<span className="chat-mcp-badge">{activeCount}</span>
)}
</button>
{open && (
<div className="chat-mcp-dropdown-menu" style={{ minWidth: '280px' }}>
<div className="chat-mcp-dropdown-header">
<span>Client MCP Servers</span>
<button className="chat-mcp-select-all" onClick={() => setAddDialog(!addDialog)}>
<i className="fas fa-plus" /> Add
</button>
</div>
{addDialog && (
<div style={{ padding: '8px 10px', borderBottom: '1px solid var(--color-border)' }}>
<input
type="text"
className="input input-sm"
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
value={url}
onChange={e => setUrl(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
/>
<input
type="text"
className="input input-sm"
placeholder="Name (optional)"
value={name}
onChange={e => setName(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
/>
<input
type="password"
className="input input-sm"
placeholder="Auth token (optional)"
value={authToken}
onChange={e => setAuthToken(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
Use CORS proxy
</label>
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
<button className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
<button className="btn btn-sm btn-primary" onClick={handleAdd} disabled={!url.trim()}>Add</button>
</div>
</div>
)}
{servers.length === 0 && !addDialog ? (
<div className="chat-mcp-dropdown-empty">No client MCP servers configured</div>
) : (
servers.map(server => {
const status = connectionStatuses[server.id]?.status || 'disconnected'
const isActive = activeServerIds.includes(server.id)
const connTools = getConnectedTools?.().find(c => c.serverId === server.id)
return (
<label key={server.id} className="chat-mcp-server-item">
<input
type="checkbox"
checked={isActive}
onChange={() => onToggleServer(server.id)}
/>
<div className="chat-mcp-server-info" style={{ flex: 1 }}>
<div style={{ display: 'flex', alignItems: 'center', gap: '6px' }}>
<span className={`chat-client-mcp-status chat-client-mcp-status-${status}`} />
<span className="chat-mcp-server-name">{server.name}</span>
{server.headers?.Authorization && <i className="fas fa-lock" style={{ fontSize: '0.65rem', opacity: 0.5 }} title="Authenticated" />}
</div>
<span className="chat-mcp-server-tools">
{status === 'connecting' ? 'Connecting...' :
status === 'error' ? (connectionStatuses[server.id]?.error || 'Error') :
status === 'connected' && connTools ? `${connTools.tools.length} tools` :
server.url}
</span>
</div>
<button
className="btn btn-sm"
style={{ padding: '2px 6px', fontSize: '0.7rem', color: 'var(--color-error)' }}
onClick={(e) => { e.preventDefault(); e.stopPropagation(); handleRemove(server.id) }}
title="Remove server"
>
<i className="fas fa-trash" />
</button>
</label>
)
})
)}
</div>
)}
</div>
)
}


@@ -0,0 +1,104 @@
import { useRef, useEffect, useState, useCallback } from 'react'
import { AppBridge, PostMessageTransport, buildAllowAttribute } from '@modelcontextprotocol/ext-apps/app-bridge'
export default function MCPAppFrame({ toolName, toolInput, toolResult, mcpClient, toolDefinition: _toolDefinition, appHtml, resourceMeta }) {
const iframeRef = useRef(null)
const bridgeRef = useRef(null)
const [iframeHeight, setIframeHeight] = useState(200)
const [error, setError] = useState(null)
const initializedRef = useRef(false)
const setupBridge = useCallback(async () => {
if (!mcpClient || !iframeRef.current || initializedRef.current) return
const iframe = iframeRef.current
initializedRef.current = true
try {
const transport = new PostMessageTransport(iframe.contentWindow, iframe.contentWindow)
const bridge = new AppBridge(
mcpClient,
{ name: 'LocalAI', version: '1.0.0' },
{ openLinks: {}, serverTools: {}, serverResources: {}, logging: {} },
{ hostContext: { displayMode: 'inline' } }
)
bridge.oninitialized = () => {
if (toolInput) bridge.sendToolInput({ arguments: toolInput })
if (toolResult) bridge.sendToolResult(toolResult)
}
bridge.onsizechange = ({ height }) => {
if (height && height > 0) setIframeHeight(Math.min(height, 600))
}
bridge.onopenlink = async ({ url }) => {
window.open(url, '_blank', 'noopener,noreferrer')
return {}
}
bridge.onmessage = async () => {
return {}
}
bridge.onrequestdisplaymode = async () => {
return { mode: 'inline' }
}
await bridge.connect(transport)
bridgeRef.current = bridge
} catch (err) {
setError(`Bridge error: ${err.message}`)
}
}, [mcpClient, toolInput, toolResult])
const handleIframeLoad = useCallback(() => {
setupBridge()
}, [setupBridge])
// Send toolResult when it arrives after initialization
useEffect(() => {
if (bridgeRef.current && toolResult && initializedRef.current) {
bridgeRef.current.sendToolResult(toolResult)
}
}, [toolResult])
// Cleanup on unmount — only close the local transport, don't send
// teardownResource which would kill server-side state and cause
// "Connection closed" errors if the component remounts (e.g. when
// streaming ends and ActivityGroup takes over from StreamingActivity).
useEffect(() => {
return () => {
const bridge = bridgeRef.current
if (bridge) {
try { bridge.close() } catch (_) { /* ignore */ }
}
}
}, [])
if (!appHtml) return null
const permissions = resourceMeta?.permissions
const allowAttr = permissions ? buildAllowAttribute(permissions) : undefined
return (
<div className="mcp-app-frame-container">
<iframe
ref={iframeRef}
srcDoc={appHtml}
sandbox="allow-scripts allow-forms"
allow={allowAttr}
className="mcp-app-iframe"
style={{ height: `${iframeHeight}px` }}
onLoad={handleIframeLoad}
title={`MCP App: ${toolName || 'unknown'}`}
/>
{error && <div className="mcp-app-error">{error}</div>}
{!mcpClient && (
<div className="mcp-app-reconnect-overlay">
Reconnect to MCP server to interact with this app
</div>
)}
</div>
)
}


@@ -2,6 +2,8 @@ import { useState, useEffect } from 'react'
import { NavLink } from 'react-router-dom'
import ThemeToggle from './ThemeToggle'
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
const mainItems = [
{ path: '/', icon: 'fas fa-home', label: 'Home' },
{ path: '/browse', icon: 'fas fa-download', label: 'Install Models' },
@@ -28,7 +30,7 @@ const systemItems = [
{ path: '/settings', icon: 'fas fa-cog', label: 'Settings' },
]
function NavItem({ item, onClose }) {
function NavItem({ item, onClose, collapsed }) {
return (
<NavLink
to={item.path}
@@ -37,6 +39,7 @@ function NavItem({ item, onClose }) {
`nav-item ${isActive ? 'active' : ''}`
}
onClick={onClose}
title={collapsed ? item.label : undefined}
>
<i className={`${item.icon} nav-icon`} />
<span className="nav-label">{item.label}</span>
@@ -46,20 +49,36 @@ function NavItem({ item, onClose }) {
export default function Sidebar({ isOpen, onClose }) {
const [features, setFeatures] = useState({})
const [collapsed, setCollapsed] = useState(() => {
try { return localStorage.getItem(COLLAPSED_KEY) === 'true' } catch (_) { return false }
})
useEffect(() => {
fetch('/api/features').then(r => r.json()).then(setFeatures).catch(() => {})
}, [])
const toggleCollapse = () => {
setCollapsed(prev => {
const next = !prev
try { localStorage.setItem(COLLAPSED_KEY, String(next)) } catch (_) { /* ignore */ }
window.dispatchEvent(new CustomEvent('sidebar-collapse', { detail: { collapsed: next } }))
return next
})
}
return (
<>
{isOpen && <div className="sidebar-overlay" onClick={onClose} />}
<aside className={`sidebar ${isOpen ? 'open' : ''}`}>
<aside className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}>
{/* Logo */}
<div className="sidebar-header">
<a href="./" className="sidebar-logo-link">
<img src="/static/logo_horizontal.png" alt="LocalAI" className="sidebar-logo-img" />
</a>
<a href="./" className="sidebar-logo-icon" title="LocalAI">
<img src="/static/logo.png" alt="LocalAI" className="sidebar-logo-icon-img" />
</a>
<button className="sidebar-close-btn" onClick={onClose} aria-label="Close menu">
<i className="fas fa-times" />
</button>
@@ -70,7 +89,7 @@ export default function Sidebar({ isOpen, onClose }) {
{/* Main section */}
<div className="sidebar-section">
{mainItems.map(item => (
<NavItem key={item.path} item={item} onClose={onClose} />
<NavItem key={item.path} item={item} onClose={onClose} collapsed={collapsed} />
))}
</div>
@@ -79,7 +98,7 @@ export default function Sidebar({ isOpen, onClose }) {
<div className="sidebar-section">
<div className="sidebar-section-title">Agents</div>
{agentItems.filter(item => !item.feature || features[item.feature] !== false).map(item => (
<NavItem key={item.path} item={item} onClose={onClose} />
<NavItem key={item.path} item={item} onClose={onClose} collapsed={collapsed} />
))}
</div>
)}
@@ -92,13 +111,14 @@ export default function Sidebar({ isOpen, onClose }) {
target="_blank"
rel="noopener noreferrer"
className="nav-item"
title={collapsed ? 'API' : undefined}
>
<i className="fas fa-code nav-icon" />
<span className="nav-label">API</span>
<i className="fas fa-external-link-alt" style={{ fontSize: '0.6rem', marginLeft: 'auto', opacity: 0.5 }} />
<i className="fas fa-external-link-alt nav-external" />
</a>
{systemItems.map(item => (
<NavItem key={item.path} item={item} onClose={onClose} />
<NavItem key={item.path} item={item} onClose={onClose} collapsed={collapsed} />
))}
</div>
</nav>
@@ -106,6 +126,13 @@ export default function Sidebar({ isOpen, onClose }) {
{/* Footer */}
<div className="sidebar-footer">
<ThemeToggle />
<button
className="sidebar-collapse-btn"
onClick={toggleCollapse}
title={collapsed ? 'Expand sidebar' : 'Collapse sidebar'}
>
<i className={`fas fa-chevron-${collapsed ? 'right' : 'left'}`} />
</button>
</div>
</aside>
</>


@@ -52,6 +52,8 @@ function saveChats(chats, activeChatId) {
history: chat.history,
systemPrompt: chat.systemPrompt,
mcpMode: chat.mcpMode,
mcpServers: chat.mcpServers,
clientMCPServers: chat.clientMCPServers,
temperature: chat.temperature,
topP: chat.topP,
topK: chat.topK,
@@ -79,6 +81,9 @@ function createNewChat(model = '', systemPrompt = '', mcpMode = false) {
history: [],
systemPrompt,
mcpMode,
mcpServers: [],
mcpResources: [],
clientMCPServers: [],
temperature: null,
topP: null,
topK: null,
@@ -256,8 +261,28 @@ export function useChat(initialModel = '') {
if (topK !== null && topK !== undefined) requestBody.top_k = topK
if (contextSize) requestBody.max_tokens = contextSize
// Choose endpoint
const endpoint = activeChat.mcpMode
// MCP: send selected servers via metadata so the backend activates them
const hasMcpServers = activeChat.mcpServers && activeChat.mcpServers.length > 0
if (hasMcpServers) {
if (!requestBody.metadata) requestBody.metadata = {}
requestBody.metadata.mcp_servers = activeChat.mcpServers.join(',')
}
// MCP: send selected resource URIs via metadata
const hasMcpResources = activeChat.mcpResources && activeChat.mcpResources.length > 0
if (hasMcpResources) {
if (!requestBody.metadata) requestBody.metadata = {}
requestBody.metadata.mcp_resources = activeChat.mcpResources.join(',')
}
// Client-side MCP: inject tools into request body
if (options.clientMCPTools && options.clientMCPTools.length > 0) {
requestBody.tools = [...(requestBody.tools || []), ...options.clientMCPTools]
}
// Use MCP endpoint only for legacy mcpMode without specific servers selected
// (the MCP endpoint auto-enables all servers)
const endpoint = (activeChat.mcpMode && !hasMcpServers)
? API_CONFIG.endpoints.mcpChatCompletions
: API_CONFIG.endpoints.chatCompletions
@@ -277,8 +302,8 @@ export function useChat(initialModel = '') {
let usage = {}
const newMessages = [] // Accumulate messages to add to history
if (activeChat.mcpMode) {
// MCP SSE streaming
if (activeChat.mcpMode && !hasMcpServers) {
// Legacy MCP SSE streaming (custom event types from /v1/mcp/chat/completions)
try {
const timeoutId = setTimeout(() => controller.abort(), 300000) // 5 min timeout
const response = await fetch(endpoint, {
@@ -407,122 +432,250 @@ export function useChat(initialModel = '') {
}
}
} else {
// Regular SSE streaming
let rawContent = ''
let reasoningContent = ''
let hasReasoningFromAPI = false
let insideThinkTag = false
// Regular SSE streaming with client-side agentic loop support
const maxToolTurns = options.maxToolTurns || 10
let turnCount = 0
let loopMessages = [...messages]
let loopBody = { ...requestBody }
try {
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(requestBody),
signal: controller.signal,
})
// Outer loop: re-sends when client-side tool calls are detected
let continueLoop = true
while (continueLoop) {
continueLoop = false
if (!response.ok) {
throw new Error(`HTTP ${response.status}`)
}
let rawContent = ''
let reasoningContent = ''
let hasReasoningFromAPI = false
let insideThinkTag = false
let currentToolCalls = []
let finishReason = null
let fullToolCalls = [] // Tool calls with id for agentic loop
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
try {
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(loopBody),
signal: controller.signal,
})
while (true) {
const { done, value } = await reader.read()
if (done) break
if (!response.ok) {
throw new Error(`HTTP ${response.status}`)
}
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() || ''
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
for (const line of lines) {
const trimmed = line.trim()
if (!trimmed || !trimmed.startsWith('data: ')) continue
const data = trimmed.slice(6)
if (data === '[DONE]') continue
while (true) {
const { done, value } = await reader.read()
if (done) break
try {
const parsed = JSON.parse(data)
const delta = parsed?.choices?.[0]?.delta
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() || ''
// Handle reasoning field from API
if (delta?.reasoning) {
hasReasoningFromAPI = true
reasoningContent += delta.reasoning
tokenCountRef.current++
setStreamingReasoning(reasoningContent)
updateTps()
}
for (const line of lines) {
const trimmed = line.trim()
if (!trimmed || !trimmed.startsWith('data: ')) continue
const data = trimmed.slice(6)
if (data === '[DONE]') continue
if (delta?.content) {
rawContent += delta.content
tokenCountRef.current++
try {
const parsed = JSON.parse(data)
if (!hasReasoningFromAPI) {
// Check thinking tags
if (openThinkTagRegex.test(rawContent) && !closeThinkTagRegex.test(rawContent)) {
insideThinkTag = true
}
if (insideThinkTag && closeThinkTagRegex.test(rawContent)) {
insideThinkTag = false
}
// Handle MCP tool result events
if (parsed?.type === 'mcp_tool_result') {
currentToolCalls.push({
type: 'tool_result',
name: parsed.name || 'tool',
result: parsed.result || '',
})
setStreamingToolCalls([...currentToolCalls.filter(Boolean)])
continue
}
const { regularContent, thinkingContent } = extractThinking(rawContent)
if (thinkingContent) {
reasoningContent = thinkingContent
}
const choice = parsed?.choices?.[0]
const delta = choice?.delta
if (insideThinkTag) {
const lastOpen = Math.max(rawContent.lastIndexOf('<thinking>'), rawContent.lastIndexOf('<think>'))
if (lastOpen >= 0) {
const partial = rawContent.slice(lastOpen).replace(/<thinking>|<think>/, '')
setStreamingReasoning(partial)
// Only show content before the unclosed think tag (with prior complete pairs removed)
const beforeThink = rawContent.slice(0, lastOpen)
const { regularContent: contentBeforeThink } = extractThinking(beforeThink)
setStreamingContent(contentBeforeThink)
// Track finish_reason
if (choice?.finish_reason) {
finishReason = choice.finish_reason
}
// Handle reasoning field from API
if (delta?.reasoning) {
hasReasoningFromAPI = true
reasoningContent += delta.reasoning
tokenCountRef.current++
setStreamingReasoning(reasoningContent)
updateTps()
}
// Handle tool call deltas
if (delta?.tool_calls) {
for (const tc of delta.tool_calls) {
const idx = tc.index ?? 0
if (!currentToolCalls[idx]) {
currentToolCalls[idx] = {
type: 'tool_call',
name: tc.function?.name || '',
arguments: tc.function?.arguments || '',
}
fullToolCalls[idx] = {
id: tc.id || `call_${idx}`,
type: 'function',
function: { name: tc.function?.name || '', arguments: tc.function?.arguments || '' },
}
} else {
if (tc.function?.name) {
currentToolCalls[idx].name = tc.function.name
fullToolCalls[idx].function.name = tc.function.name
}
if (tc.function?.arguments) {
currentToolCalls[idx].arguments += tc.function.arguments
fullToolCalls[idx].function.arguments += tc.function.arguments
}
if (tc.id) fullToolCalls[idx].id = tc.id
}
}
setStreamingToolCalls([...currentToolCalls.filter(Boolean)])
}
if (delta?.content) {
rawContent += delta.content
tokenCountRef.current++
if (!hasReasoningFromAPI) {
if (openThinkTagRegex.test(rawContent) && !closeThinkTagRegex.test(rawContent)) {
insideThinkTag = true
}
if (insideThinkTag && closeThinkTagRegex.test(rawContent)) {
insideThinkTag = false
}
const { regularContent, thinkingContent } = extractThinking(rawContent)
if (thinkingContent) {
reasoningContent = thinkingContent
}
if (insideThinkTag) {
const lastOpen = Math.max(rawContent.lastIndexOf('<thinking>'), rawContent.lastIndexOf('<think>'))
if (lastOpen >= 0) {
const partial = rawContent.slice(lastOpen).replace(/<thinking>|<think>/, '')
setStreamingReasoning(partial)
const beforeThink = rawContent.slice(0, lastOpen)
const { regularContent: contentBeforeThink } = extractThinking(beforeThink)
setStreamingContent(contentBeforeThink)
} else {
setStreamingContent(regularContent)
}
} else {
setStreamingReasoning(reasoningContent)
setStreamingContent(regularContent)
}
} else {
setStreamingReasoning(reasoningContent)
setStreamingContent(regularContent)
setStreamingContent(rawContent)
}
} else {
setStreamingContent(rawContent)
}
updateTps()
updateTps()
}
if (parsed?.usage) {
usage = parsed.usage
}
} catch (_e) {
// skip malformed JSON
}
if (parsed?.usage) {
usage = parsed.usage
}
} catch (_e) {
// skip malformed JSON
}
}
} catch (err) {
if (err.name !== 'AbortError') {
rawContent += `\n\nError: ${err.message}`
}
}
} catch (err) {
if (err.name !== 'AbortError') {
rawContent += `\n\nError: ${err.message}`
// Client-side agentic loop: check for client tool calls
const validToolCalls = fullToolCalls.filter(Boolean)
const hasClientToolCalls = (
                  (finishReason === 'tool_calls' || (finishReason === 'stop' && validToolCalls.length > 0)) &&
validToolCalls.length > 0 &&
options.isClientTool &&
options.executeTool &&
turnCount < maxToolTurns
)
const clientCalls = hasClientToolCalls
? validToolCalls.filter(tc => options.isClientTool(tc.function?.name))
: []
if (clientCalls.length > 0) {
                  // Record the tool calls in the message history
for (const tc of clientCalls) {
newMessages.push({
role: 'tool_call',
content: JSON.stringify({ type: 'tool_call', name: tc.function.name, arguments: tc.function.arguments }, null, 2),
expanded: false,
})
}
// Build assistant message with tool_calls for conversation
const assistantMsg = {
role: 'assistant',
content: rawContent || null,
tool_calls: validToolCalls,
}
loopMessages.push(assistantMsg)
// Execute each client-side tool
for (const tc of clientCalls) {
const result = await options.executeTool(tc.function.name, tc.function.arguments)
const toolResultMsg = { role: 'tool', tool_call_id: tc.id, content: result }
loopMessages.push(toolResultMsg)
// Check for MCP App UI
let appUI = null
if (options.getToolAppUI) {
let parsedArgs
try {
parsedArgs = typeof tc.function.arguments === 'string'
? JSON.parse(tc.function.arguments) : tc.function.arguments
} catch (_) { parsedArgs = {} }
appUI = await options.getToolAppUI(tc.function.name, parsedArgs, result)
}
// Show result in UI
newMessages.push({
role: 'tool_result',
content: JSON.stringify({ type: 'tool_result', name: tc.function.name, result }, null, 2),
expanded: false,
appUI,
})
currentToolCalls.push({ type: 'tool_result', name: tc.function.name, result, appUI })
setStreamingToolCalls([...currentToolCalls.filter(Boolean)])
}
// Re-send with updated messages
loopBody = { ...requestBody, messages: loopMessages, stream: true }
setStreamingContent('')
turnCount++
continueLoop = true
continue
}
}
// Determine final content
let finalContent = rawContent
if (!hasReasoningFromAPI) {
const { regularContent, thinkingContent } = extractThinking(rawContent)
finalContent = regularContent
if (thinkingContent && !reasoningContent) reasoningContent = thinkingContent
}
// No more client tool calls — finalize
let finalContent = rawContent
if (!hasReasoningFromAPI) {
const { regularContent, thinkingContent } = extractThinking(rawContent)
finalContent = regularContent
if (thinkingContent && !reasoningContent) reasoningContent = thinkingContent
}
if (reasoningContent) {
newMessages.push({ role: 'thinking', content: reasoningContent, expanded: true })
}
if (finalContent) {
newMessages.push({ role: 'assistant', content: finalContent })
if (reasoningContent) {
newMessages.push({ role: 'thinking', content: reasoningContent, expanded: true })
}
if (finalContent) {
newMessages.push({ role: 'assistant', content: finalContent })
}
}
}
@@ -582,6 +735,17 @@ export function useChat(initialModel = '') {
const isActiveStreaming = isStreaming && streamingChatId === activeChatId
const addMessage = useCallback((chatId, message) => {
setChats(prev => prev.map(c => {
if (c.id !== chatId) return c
return {
...c,
history: [...c.history, { ...message, timestamp: Date.now() }],
updatedAt: Date.now(),
}
}))
}, [])
return {
chats,
activeChat,
@@ -603,5 +767,6 @@ export function useChat(initialModel = '') {
stopGeneration,
clearHistory,
getContextUsagePercent,
addMessage,
}
}
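The streaming handler above accumulates `delta.tool_calls` fragments by index, creating an entry on the first fragment and concatenating `arguments` chunks on subsequent ones. A minimal standalone sketch of that merge, outside the React hook (function and variable names here are illustrative, not part of the hook's API):

```javascript
// Merge OpenAI-style streamed tool_call deltas into complete calls.
// Each delta carries an index plus partial name/id/arguments fragments.
function mergeToolCallDeltas(deltas) {
  const calls = []
  for (const tc of deltas) {
    const idx = tc.index ?? 0
    if (!calls[idx]) {
      // First fragment for this index: initialize the call.
      calls[idx] = {
        id: tc.id || `call_${idx}`,
        type: 'function',
        function: { name: tc.function?.name || '', arguments: tc.function?.arguments || '' },
      }
    } else {
      // Later fragments: names/ids replace, argument chunks append.
      if (tc.function?.name) calls[idx].function.name = tc.function.name
      if (tc.function?.arguments) calls[idx].function.arguments += tc.function.arguments
      if (tc.id) calls[idx].id = tc.id
    }
  }
  return calls.filter(Boolean)
}

// Example: one call whose JSON arguments arrive split across three chunks.
const merged = mergeToolCallDeltas([
  { index: 0, id: 'call_abc', function: { name: 'get_weather', arguments: '{"cit' } },
  { index: 0, function: { arguments: 'y":"Ro' } },
  { index: 0, function: { arguments: 'me"}' } },
])
// merged[0].function.arguments === '{"city":"Rome"}'
```

The `filter(Boolean)` mirrors the hook's `fullToolCalls.filter(Boolean)`: sparse indices can occur if the model emits a higher index before a lower one.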


@@ -0,0 +1,244 @@
import { useState, useRef, useCallback } from 'react'
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js'
import { getToolUiResourceUri, isToolVisibilityAppOnly } from '@modelcontextprotocol/ext-apps/app-bridge'
import { API_CONFIG } from '../utils/config'
function buildProxyUrl(targetUrl, useProxy = true) {
if (!useProxy) return new URL(targetUrl)
const base = window.location.origin
return new URL(`${base}${API_CONFIG.endpoints.corsProxy}?url=${encodeURIComponent(targetUrl)}`)
}
export function useMCPClient() {
const connectionsRef = useRef(new Map())
const toolIndexRef = useRef(new Map())
const [connectionStatuses, setConnectionStatuses] = useState({})
const updateStatus = useCallback((serverId, status, error = null) => {
setConnectionStatuses(prev => ({ ...prev, [serverId]: { status, error } }))
}, [])
const connect = useCallback(async (serverConfig) => {
const { id, url, headers = {}, useProxy = true } = serverConfig
if (connectionsRef.current.has(id)) return
updateStatus(id, 'connecting')
const proxyUrl = buildProxyUrl(url, useProxy)
const transportHeaders = { ...headers }
let client = null
let transport = null
// Try StreamableHTTP first, then SSE fallback
for (const TransportClass of [StreamableHTTPClientTransport, SSEClientTransport]) {
try {
transport = new TransportClass(proxyUrl, { requestInit: { headers: transportHeaders } })
client = new Client({ name: 'LocalAI-WebUI', version: '1.0.0' })
await client.connect(transport)
break
} catch (err) {
client = null
transport = null
if (TransportClass === SSEClientTransport) {
updateStatus(id, 'error', err.message)
return
}
}
}
if (!client) {
updateStatus(id, 'error', 'Failed to connect with any transport')
return
}
try {
const { tools = [] } = await client.listTools()
// Remove old tool index entries for this server
for (const [toolName, sId] of toolIndexRef.current) {
if (sId === id) toolIndexRef.current.delete(toolName)
}
for (const tool of tools) {
toolIndexRef.current.set(tool.name, id)
}
connectionsRef.current.set(id, { client, transport, tools, serverConfig })
updateStatus(id, 'connected')
} catch (err) {
try { await client.close() } catch (_) { /* ignore */ }
updateStatus(id, 'error', err.message)
}
}, [updateStatus])
const disconnect = useCallback(async (serverId) => {
const conn = connectionsRef.current.get(serverId)
if (!conn) return
// Remove tool index entries
for (const [toolName, sId] of toolIndexRef.current) {
if (sId === serverId) toolIndexRef.current.delete(toolName)
}
try { await conn.client.close() } catch (_) { /* ignore */ }
connectionsRef.current.delete(serverId)
updateStatus(serverId, 'disconnected')
}, [updateStatus])
const disconnectAll = useCallback(async () => {
const ids = [...connectionsRef.current.keys()]
for (const id of ids) {
await disconnect(id)
}
}, [disconnect])
const getToolsForLLM = useCallback(() => {
const tools = []
for (const [, conn] of connectionsRef.current) {
for (const tool of conn.tools) {
if (isToolVisibilityAppOnly(tool)) continue
tools.push({
type: 'function',
function: {
name: tool.name,
description: tool.description || '',
parameters: tool.inputSchema || { type: 'object', properties: {} },
},
})
}
}
return tools
}, [])
const isClientTool = useCallback((toolName) => {
return toolIndexRef.current.has(toolName)
}, [])
const executeTool = useCallback(async (toolName, argumentsJson) => {
const serverId = toolIndexRef.current.get(toolName)
if (!serverId) return `Error: no MCP server found for tool "${toolName}"`
const conn = connectionsRef.current.get(serverId)
if (!conn) return `Error: server not connected for tool "${toolName}"`
let args
try {
args = typeof argumentsJson === 'string' ? JSON.parse(argumentsJson) : argumentsJson
} catch (_) {
args = {}
}
try {
const result = await conn.client.callTool({ name: toolName, arguments: args })
return formatToolResult(result)
} catch (err) {
// Session might have expired — try reconnecting once
if (err.message?.includes('404') || err.message?.includes('session')) {
try {
await disconnect(serverId)
await connect(conn.serverConfig)
const newConn = connectionsRef.current.get(serverId)
if (newConn) {
const result = await newConn.client.callTool({ name: toolName, arguments: args })
return formatToolResult(result)
}
} catch (retryErr) {
return `Error executing tool "${toolName}": ${retryErr.message}`
}
}
return `Error executing tool "${toolName}": ${err.message}`
}
}, [connect, disconnect])
const getConnectedTools = useCallback(() => {
const result = []
for (const [serverId, conn] of connectionsRef.current) {
result.push({
serverId,
serverName: conn.serverConfig.name,
tools: conn.tools.map(t => t.name),
})
}
return result
}, [])
const findToolAndConnection = useCallback((toolName) => {
const serverId = toolIndexRef.current.get(toolName)
if (!serverId) return null
const conn = connectionsRef.current.get(serverId)
if (!conn) return null
const tool = conn.tools.find(t => t.name === toolName)
if (!tool) return null
return { tool, conn }
}, [])
const hasAppUI = useCallback((toolName) => {
const found = findToolAndConnection(toolName)
if (!found) return false
return !!getToolUiResourceUri(found.tool)
}, [findToolAndConnection])
const getAppResource = useCallback(async (toolName) => {
const found = findToolAndConnection(toolName)
if (!found) return null
const uri = getToolUiResourceUri(found.tool)
if (!uri) return null
try {
const res = await found.conn.client.readResource({ uri })
const htmlContent = res.contents?.[0]
if (!htmlContent) return null
return {
html: htmlContent.text || '',
meta: found.tool._meta?.ui || {},
}
} catch (err) {
console.warn('Failed to fetch MCP app resource:', err)
return null
}
}, [findToolAndConnection])
const getClientForTool = useCallback((toolName) => {
const found = findToolAndConnection(toolName)
return found ? found.conn.client : null
}, [findToolAndConnection])
const getToolDefinition = useCallback((toolName) => {
const found = findToolAndConnection(toolName)
return found ? found.tool : null
}, [findToolAndConnection])
return {
connect,
disconnect,
disconnectAll,
getToolsForLLM,
isClientTool,
executeTool,
connectionStatuses,
getConnectedTools,
hasAppUI,
getAppResource,
getClientForTool,
getToolDefinition,
}
}
function formatToolResult(result) {
if (!result?.content) return ''
const parts = []
for (const item of result.content) {
if (item.type === 'text') {
parts.push(item.text)
} else if (item.type === 'image') {
parts.push(`[Image: ${item.mimeType || 'image'}]`)
} else if (item.type === 'resource') {
parts.push(item.resource?.text || JSON.stringify(item.resource))
} else {
parts.push(JSON.stringify(item))
}
}
return parts.join('\n')
}
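`formatToolResult` flattens the MCP `callTool` content array into the plain text that the agentic loop feeds back to the model as a `tool` message. A self-contained usage sketch of the same function with a typical mixed-content result (the sample payload is illustrative):

```javascript
// Flatten an MCP tool result's content array into a newline-joined string.
function formatToolResult(result) {
  if (!result?.content) return ''
  const parts = []
  for (const item of result.content) {
    if (item.type === 'text') {
      parts.push(item.text)
    } else if (item.type === 'image') {
      parts.push(`[Image: ${item.mimeType || 'image'}]`)
    } else if (item.type === 'resource') {
      parts.push(item.resource?.text || JSON.stringify(item.resource))
    } else {
      parts.push(JSON.stringify(item))
    }
  }
  return parts.join('\n')
}

// Text survives verbatim; binary items collapse to short placeholders.
const text = formatToolResult({
  content: [
    { type: 'text', text: 'Rome: 21C, sunny' },
    { type: 'image', mimeType: 'image/png', data: '...' },
  ],
})
// text === 'Rome: 21C, sunny\n[Image: image/png]'
```

Collapsing images to a `[Image: …]` marker keeps token usage bounded while still telling the model that non-text output existed.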


@@ -5,7 +5,11 @@ import ModelSelector from '../components/ModelSelector'
import { renderMarkdown, highlightAll } from '../utils/markdown'
import { extractCodeArtifacts, renderMarkdownWithArtifacts } from '../utils/artifacts'
import CanvasPanel from '../components/CanvasPanel'
import { fileToBase64, modelsApi } from '../utils/api'
import { fileToBase64, modelsApi, mcpApi } from '../utils/api'
import { useMCPClient } from '../hooks/useMCPClient'
import MCPAppFrame from '../components/MCPAppFrame'
import ClientMCPDropdown from '../components/ClientMCPDropdown'
import { loadClientMCPServers } from '../utils/mcpClientStorage'
function relativeTime(ts) {
if (!ts) return ''
@@ -60,7 +64,10 @@ function formatToolContent(raw) {
try {
const data = JSON.parse(raw)
const name = data.name || 'unknown'
const params = data.arguments || data.input || data.result || data.parameters || {}
let params = data.arguments || data.input || data.result || data.parameters || {}
if (typeof params === 'string') {
try { params = JSON.parse(params) } catch (_) { /* keep as string */ }
}
const entries = typeof params === 'object' && params !== null ? Object.entries(params) : []
return { name, entries, fallback: null }
} catch (_e) {
@@ -89,7 +96,7 @@ function ToolParams({ entries, fallback }) {
)
}
function ActivityGroup({ items, updateChatSettings, activeChat }) {
function ActivityGroup({ items, updateChatSettings, activeChat, getClientForTool }) {
const [expanded, setExpanded] = useState(false)
const contentRef = useRef(null)
@@ -99,7 +106,11 @@ function ActivityGroup({ items, updateChatSettings, activeChat }) {
if (!items || items.length === 0) return null
const labels = items.map(item => {
// Separate out tool_result items that have appUI — they render outside the collapsed group
const appUIItems = items.filter(item => item.role === 'tool_result' && item.appUI)
const regularItems = items.filter(item => !(item.role === 'tool_result' && item.appUI))
const labels = regularItems.map(item => {
if (item.role === 'thinking' || item.role === 'reasoning') return 'Thought'
if (item.role === 'tool_call') {
try { return JSON.parse(item.content)?.name || 'Tool' } catch (_e) { return 'Tool' }
@@ -112,40 +123,63 @@ function ActivityGroup({ items, updateChatSettings, activeChat }) {
const summary = labels.join(' → ')
return (
<div className="chat-message chat-message-assistant">
<div className="chat-message-avatar">
<i className="fas fa-cogs" />
</div>
<div className="chat-activity-group">
<button className="chat-activity-toggle" onClick={() => setExpanded(!expanded)}>
<span className="chat-activity-summary">{summary}</span>
<i className={`fas fa-chevron-${expanded ? 'up' : 'down'}`} />
</button>
{expanded && (
<div className="chat-activity-details" ref={contentRef}>
{items.map((item, idx) => {
if (item.role === 'thinking' || item.role === 'reasoning') {
return (
<div key={idx} className="chat-activity-item chat-activity-thinking">
<span className="chat-activity-item-label">Thought</span>
<div className="chat-activity-item-content"
dangerouslySetInnerHTML={{ __html: renderMarkdown(item.content || '') }} />
</div>
)
}
const isCall = item.role === 'tool_call'
const parsed = formatToolContent(item.content)
return (
<div key={idx} className={`chat-activity-item ${isCall ? 'chat-activity-tool-call' : 'chat-activity-tool-result'}`}>
<span className="chat-activity-item-label">{labels[idx]}</span>
<ToolParams entries={parsed.entries} fallback={parsed.fallback} />
</div>
)
})}
<>
{regularItems.length > 0 && (
<div className="chat-message chat-message-assistant">
<div className="chat-message-avatar">
<i className="fas fa-cogs" />
</div>
)}
</div>
</div>
<div className="chat-activity-group">
<button className="chat-activity-toggle" onClick={() => setExpanded(!expanded)}>
<span className="chat-activity-summary">{summary}</span>
<i className={`fas fa-chevron-${expanded ? 'up' : 'down'}`} />
</button>
{expanded && (
<div className="chat-activity-details" ref={contentRef}>
{regularItems.map((item, idx) => {
if (item.role === 'thinking' || item.role === 'reasoning') {
return (
<div key={idx} className="chat-activity-item chat-activity-thinking">
<span className="chat-activity-item-label">Thought</span>
<div className="chat-activity-item-content"
dangerouslySetInnerHTML={{ __html: renderMarkdown(item.content || '') }} />
</div>
)
}
const isCall = item.role === 'tool_call'
const parsed = formatToolContent(item.content)
return (
<div key={idx} className={`chat-activity-item ${isCall ? 'chat-activity-tool-call' : 'chat-activity-tool-result'}`}>
<span className="chat-activity-item-label">{labels[idx]}</span>
<ToolParams entries={parsed.entries} fallback={parsed.fallback} />
</div>
)
})}
</div>
)}
</div>
</div>
)}
{appUIItems.map((item, idx) => (
<div key={`appui-${idx}`} className="chat-message chat-message-assistant">
<div className="chat-message-avatar">
<i className="fas fa-puzzle-piece" />
</div>
<div className="chat-message-bubble">
<span className="chat-message-model">{item.appUI.toolName}</span>
<MCPAppFrame
toolName={item.appUI.toolName}
toolInput={item.appUI.toolInput}
toolResult={item.appUI.toolResult}
mcpClient={getClientForTool?.(item.appUI.toolName) || null}
toolDefinition={item.appUI.toolDefinition}
appHtml={item.appUI.html}
resourceMeta={item.appUI.meta}
/>
</div>
</div>
))}
</>
)
}
@@ -156,8 +190,8 @@ function StreamingActivity({ reasoning, toolCalls, hasResponse }) {
const contentRef = useRef(null)
const [manualCollapse, setManualCollapse] = useState(null)
// Auto-expand while thinking, auto-collapse when response starts
const autoExpanded = reasoning && !hasResponse
// Auto-expand while thinking or tool-calling, auto-collapse when response starts
const autoExpanded = (reasoning || (toolCalls && toolCalls.length > 0)) && !hasResponse
const expanded = manualCollapse !== null ? !manualCollapse : autoExpanded
// Scroll to bottom of thinking content as it streams
@@ -202,9 +236,18 @@ function StreamingActivity({ reasoning, toolCalls, hasResponse }) {
{expanded && toolCalls && toolCalls.length > 0 && (
<div className="chat-activity-details">
{toolCalls.map((tc, idx) => {
if (tc.type === 'tool_result') {
return (
<div key={idx} className="chat-activity-item chat-activity-tool-result">
<span className="chat-activity-item-label">{tc.name} result</span>
<div className="chat-activity-item-content"
dangerouslySetInnerHTML={{ __html: renderMarkdown(tc.result || '') }} />
</div>
)
}
const parsed = formatToolContent(JSON.stringify(tc, null, 2))
return (
<div key={idx} className={`chat-activity-item ${tc.type === 'tool_call' ? 'chat-activity-tool-call' : 'chat-activity-tool-result'}`}>
<div key={idx} className="chat-activity-item chat-activity-tool-call">
<span className="chat-activity-item-label">{tc.name || tc.type}</span>
<ToolParams entries={parsed.entries} fallback={parsed.fallback} />
</div>
@@ -247,7 +290,7 @@ export default function Chat() {
chats, activeChat, activeChatId, isStreaming, streamingChatId, streamingContent,
streamingReasoning, streamingToolCalls, tokensPerSecond, maxTokensPerSecond,
addChat, switchChat, deleteChat, deleteAllChats, renameChat, updateChatSettings,
sendMessage, stopGeneration, clearHistory, getContextUsagePercent,
sendMessage, stopGeneration, clearHistory, getContextUsagePercent, addMessage,
} = useChat(urlModel || '')
const [input, setInput] = useState('')
@@ -256,6 +299,18 @@ export default function Chat() {
const [editingName, setEditingName] = useState(null)
const [editName, setEditName] = useState('')
const [mcpAvailable, setMcpAvailable] = useState(false)
const [mcpServersOpen, setMcpServersOpen] = useState(false)
const [mcpServerList, setMcpServerList] = useState([])
const [mcpServersLoading, setMcpServersLoading] = useState(false)
const [mcpServerCache, setMcpServerCache] = useState({})
const [mcpPromptsOpen, setMcpPromptsOpen] = useState(false)
const [mcpPromptList, setMcpPromptList] = useState([])
const [mcpPromptsLoading, setMcpPromptsLoading] = useState(false)
const [mcpPromptArgsDialog, setMcpPromptArgsDialog] = useState(null)
const [mcpPromptArgsValues, setMcpPromptArgsValues] = useState({})
const [mcpResourcesOpen, setMcpResourcesOpen] = useState(false)
const [mcpResourceList, setMcpResourceList] = useState([])
const [mcpResourcesLoading, setMcpResourcesLoading] = useState(false)
const [chatSearch, setChatSearch] = useState('')
const [modelInfo, setModelInfo] = useState(null)
const [showModelInfo, setShowModelInfo] = useState(false)
@@ -263,6 +318,12 @@ export default function Chat() {
const [canvasMode, setCanvasMode] = useState(false)
const [canvasOpen, setCanvasOpen] = useState(false)
const [selectedArtifactId, setSelectedArtifactId] = useState(null)
const [clientMCPServers, setClientMCPServers] = useState(() => loadClientMCPServers())
const {
connect: mcpConnect, disconnect: mcpDisconnect, disconnectAll: mcpDisconnectAll,
getToolsForLLM, isClientTool, executeTool, connectionStatuses, getConnectedTools,
hasAppUI, getAppResource, getClientForTool, getToolDefinition,
} = useMCPClient()
const messagesEndRef = useRef(null)
const fileInputRef = useRef(null)
const messagesRef = useRef(null)
@@ -296,12 +357,191 @@ export default function Chat() {
const hasMcp = !!(cfg?.mcp?.remote || cfg?.mcp?.stdio)
setMcpAvailable(hasMcp)
if (!hasMcp && activeChat?.mcpMode) {
updateChatSettings(activeChat.id, { mcpMode: false })
updateChatSettings(activeChat.id, { mcpMode: false, mcpServers: [] })
}
}).catch(() => { if (!cancelled) { setMcpAvailable(false); setModelInfo(null) } })
return () => { cancelled = true }
}, [activeChat?.model])
const mcpDropdownRef = useRef(null)
useEffect(() => {
if (!mcpServersOpen) return
const handleClick = (e) => {
if (mcpDropdownRef.current && !mcpDropdownRef.current.contains(e.target)) {
setMcpServersOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpServersOpen])
const fetchMcpServers = useCallback(async () => {
const model = activeChat?.model
if (!model) return
if (mcpServerCache[model]) {
setMcpServerList(mcpServerCache[model])
return
}
setMcpServersLoading(true)
try {
const data = await mcpApi.listServers(model)
const servers = data?.servers || []
setMcpServerList(servers)
setMcpServerCache(prev => ({ ...prev, [model]: servers }))
} catch (_e) {
setMcpServerList([])
} finally {
setMcpServersLoading(false)
}
}, [activeChat?.model, mcpServerCache])
const toggleMcpServer = useCallback((serverName) => {
if (!activeChat) return
const current = activeChat.mcpServers || []
const next = current.includes(serverName)
? current.filter(s => s !== serverName)
: [...current, serverName]
updateChatSettings(activeChat.id, { mcpServers: next })
}, [activeChat, updateChatSettings])
const mcpPromptsRef = useRef(null)
useEffect(() => {
if (!mcpPromptsOpen) return
const handleClick = (e) => {
if (mcpPromptsRef.current && !mcpPromptsRef.current.contains(e.target)) {
setMcpPromptsOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpPromptsOpen])
const mcpResourcesRef = useRef(null)
useEffect(() => {
if (!mcpResourcesOpen) return
const handleClick = (e) => {
if (mcpResourcesRef.current && !mcpResourcesRef.current.contains(e.target)) {
setMcpResourcesOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpResourcesOpen])
const fetchMcpPrompts = useCallback(async () => {
const model = activeChat?.model
if (!model) return
setMcpPromptsLoading(true)
try {
const data = await mcpApi.listPrompts(model)
setMcpPromptList(Array.isArray(data) ? data : [])
} catch (_e) {
setMcpPromptList([])
} finally {
setMcpPromptsLoading(false)
}
}, [activeChat?.model])
const fetchMcpResources = useCallback(async () => {
const model = activeChat?.model
if (!model) return
setMcpResourcesLoading(true)
try {
const data = await mcpApi.listResources(model)
setMcpResourceList(Array.isArray(data) ? data : [])
} catch (_e) {
setMcpResourceList([])
} finally {
setMcpResourcesLoading(false)
}
}, [activeChat?.model])
const handleSelectPrompt = useCallback(async (prompt) => {
if (prompt.arguments && prompt.arguments.length > 0) {
setMcpPromptArgsDialog(prompt)
setMcpPromptArgsValues({})
return
}
// No arguments, expand immediately
const model = activeChat?.model
if (!model) return
try {
const result = await mcpApi.getPrompt(model, prompt.name, {})
if (result?.messages) {
for (const msg of result.messages) {
addMessage(activeChat.id, { role: msg.role || 'user', content: msg.content })
}
}
} catch (e) {
addMessage(activeChat.id, { role: 'system', content: `Failed to expand prompt: ${e.message}` })
}
setMcpPromptsOpen(false)
}, [activeChat?.model, activeChat?.id, addMessage])
const handleExpandPromptWithArgs = useCallback(async () => {
if (!mcpPromptArgsDialog) return
const model = activeChat?.model
if (!model) return
try {
const result = await mcpApi.getPrompt(model, mcpPromptArgsDialog.name, mcpPromptArgsValues)
if (result?.messages) {
for (const msg of result.messages) {
addMessage(activeChat.id, { role: msg.role || 'user', content: msg.content })
}
}
} catch (e) {
addMessage(activeChat.id, { role: 'system', content: `Failed to expand prompt: ${e.message}` })
}
setMcpPromptArgsDialog(null)
setMcpPromptArgsValues({})
setMcpPromptsOpen(false)
}, [activeChat?.model, activeChat?.id, mcpPromptArgsDialog, mcpPromptArgsValues, addMessage])
const toggleMcpResource = useCallback((uri) => {
if (!activeChat) return
const current = activeChat.mcpResources || []
const next = current.includes(uri)
? current.filter(u => u !== uri)
: [...current, uri]
updateChatSettings(activeChat.id, { mcpResources: next })
}, [activeChat, updateChatSettings])
// Auto-connect/disconnect client MCP servers based on chat's active list
const activeMCPIds = activeChat?.clientMCPServers || []
useEffect(() => {
const activeSet = new Set(activeMCPIds)
for (const server of clientMCPServers) {
const status = connectionStatuses[server.id]?.status
if (activeSet.has(server.id) && status !== 'connected' && status !== 'connecting') {
mcpConnect(server)
} else if (!activeSet.has(server.id) && (status === 'connected' || status === 'connecting')) {
mcpDisconnect(server.id)
}
}
}, [activeMCPIds.join(','), clientMCPServers])
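The effect above reconciles the chat's desired server list against the live connection statuses. Stripped of React, the reconcile step is just a set comparison — a sketch, where `connecting` counts as already in progress, matching the condition above:

```javascript
// Desired vs. actual reconciliation, as in the auto-connect effect above.
const desired = new Set(['a', 'c'])
const statuses = { a: 'connected', b: 'connected', c: 'disconnected' }

const actions = []
for (const id of ['a', 'b', 'c']) {
  const live = statuses[id] === 'connected' || statuses[id] === 'connecting'
  if (desired.has(id) && !live) actions.push(`connect:${id}`)         // missing connection
  else if (!desired.has(id) && live) actions.push(`disconnect:${id}`) // stale connection
}

console.log(actions) // [ 'disconnect:b', 'connect:c' ]
```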
const handleClientMCPServerAdded = useCallback((server) => {
setClientMCPServers(loadClientMCPServers())
const current = activeChat?.clientMCPServers || []
if (activeChat) updateChatSettings(activeChat.id, { clientMCPServers: [...current, server.id] })
}, [activeChat, updateChatSettings])
const handleClientMCPServerRemoved = useCallback(async (id) => {
await mcpDisconnect(id)
setClientMCPServers(loadClientMCPServers())
if (activeChat) {
const current = activeChat.clientMCPServers || []
updateChatSettings(activeChat.id, { clientMCPServers: current.filter(s => s !== id) })
}
}, [activeChat, mcpDisconnect, updateChatSettings])
const handleClientMCPToggle = useCallback((serverId) => {
if (!activeChat) return
const current = activeChat.clientMCPServers || []
const next = current.includes(serverId) ? current.filter(s => s !== serverId) : [...current, serverId]
updateChatSettings(activeChat.id, { clientMCPServers: next })
}, [activeChat, updateChatSettings])
// Load initial message from home page
const homeDataProcessed = useRef(false)
useEffect(() => {
@@ -325,6 +565,12 @@ export default function Chat() {
updateChatSettings(activeChat.id, { mcpMode: true })
}
}
if (data.mcpServers?.length > 0 && targetChat) {
updateChatSettings(targetChat.id, { mcpServers: data.mcpServers })
}
if (data.clientMCPServers?.length > 0 && targetChat) {
updateChatSettings(targetChat.id, { clientMCPServers: data.clientMCPServers })
}
setInput(data.message)
if (data.files) setFiles(data.files)
setTimeout(() => {
@@ -418,8 +664,28 @@ export default function Chat() {
}
setInput('')
setFiles([])
await sendMessage(msg, files)
}, [input, files, activeChat, sendMessage, addToast])
const tools = getToolsForLLM()
const mcpOptions = tools.length > 0 ? {
clientMCPTools: tools,
isClientTool: (name) => isClientTool(name),
executeTool: (name, args) => executeTool(name, args),
maxToolTurns: 10,
getToolAppUI: async (toolName, toolInput, toolResultText) => {
if (!hasAppUI(toolName)) return null
const resource = await getAppResource(toolName)
if (!resource) return null
return {
html: resource.html,
meta: resource.meta,
toolName,
toolInput,
toolDefinition: getToolDefinition(toolName),
toolResult: { content: [{ type: 'text', text: toolResultText }] },
}
},
} : {}
await sendMessage(msg, files, mcpOptions)
}, [input, files, activeChat, sendMessage, addToast, getToolsForLLM, isClientTool, executeTool, hasAppUI, getAppResource, getToolDefinition])
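The `mcpOptions` object above lets the send loop route each tool call either to a browser-side MCP client or back to the server. The dispatch it implies can be sketched as follows — the `clientTools` table and the `server:` fallback are illustrative stand-ins, not the hook's real implementations:

```javascript
// Illustrative client/server tool dispatch; mirrors the isClientTool/executeTool
// callbacks passed through mcpOptions above.
const clientTools = { echo: (args) => `echo: ${args.text}` }
const isClientTool = (name) => name in clientTools
const executeTool = (name, args) => clientTools[name](args)

function dispatchToolCall(name, args) {
  if (isClientTool(name)) return executeTool(name, args) // handled in the browser
  return `server:${name}` // stand-in for server-side MCP execution
}

console.log(dispatchToolCall('echo', { text: 'hi' })) // 'echo: hi'
console.log(dispatchToolCall('web_search', {}))       // 'server:web_search'
```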
const handleRegenerate = useCallback(async () => {
if (!activeChat || isStreaming) return
@@ -631,18 +897,170 @@ export default function Chat() {
</>
)}
{mcpAvailable && (
<label className="chat-mcp-switch" title="Toggle MCP mode">
<span className="chat-mcp-switch-label">MCP</span>
<span className="toggle">
<input
type="checkbox"
checked={activeChat.mcpMode || false}
onChange={(e) => updateChatSettings(activeChat.id, { mcpMode: e.target.checked })}
/>
<span className="toggle-slider" />
</span>
</label>
<div className="chat-mcp-dropdown" ref={mcpDropdownRef}>
<button
className={`btn btn-sm ${(activeChat.mcpServers?.length > 0) ? 'btn-primary' : 'btn-secondary'}`}
title="Select MCP servers"
onClick={() => { setMcpServersOpen(!mcpServersOpen); if (!mcpServersOpen) fetchMcpServers() }}
>
<i className="fas fa-plug" /> MCP
{activeChat.mcpServers?.length > 0 && (
<span className="chat-mcp-badge">{activeChat.mcpServers.length}</span>
)}
</button>
{mcpServersOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpServersLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading servers...</div>
) : mcpServerList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP servers configured</div>
) : (
<>
<div className="chat-mcp-dropdown-header">
<span>MCP Servers</span>
<button
className="chat-mcp-select-all"
onClick={() => {
const allNames = mcpServerList.map(s => s.name)
const allSelected = allNames.every(n => (activeChat.mcpServers || []).includes(n))
updateChatSettings(activeChat.id, { mcpServers: allSelected ? [] : allNames })
}}
>
{mcpServerList.every(s => (activeChat.mcpServers || []).includes(s.name)) ? 'Deselect all' : 'Select all'}
</button>
</div>
{mcpServerList.map(server => (
<label key={server.name} className="chat-mcp-server-item">
<input
type="checkbox"
checked={(activeChat.mcpServers || []).includes(server.name)}
onChange={() => toggleMcpServer(server.name)}
/>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{server.name}</span>
<span className="chat-mcp-server-tools">{server.tools?.length || 0} tools</span>
</div>
</label>
))}
</>
)}
</div>
)}
</div>
)}
{mcpAvailable && (
<div className="chat-mcp-dropdown" ref={mcpPromptsRef}>
<button
className="btn btn-sm btn-secondary"
title="MCP Prompts"
onClick={() => { setMcpPromptsOpen(!mcpPromptsOpen); if (!mcpPromptsOpen) fetchMcpPrompts() }}
>
<i className="fas fa-comment-dots" /> Prompts
</button>
{mcpPromptsOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpPromptsLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading prompts...</div>
) : mcpPromptList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP prompts available</div>
) : (
<>
<div className="chat-mcp-dropdown-header"><span>MCP Prompts</span></div>
{mcpPromptList.map(prompt => (
<div
key={prompt.name}
className="chat-mcp-server-item"
style={{ cursor: 'pointer', padding: '6px 10px' }}
onClick={() => handleSelectPrompt(prompt)}
>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{prompt.title || prompt.name}</span>
{prompt.description && (
<span className="chat-mcp-server-tools">{prompt.description}</span>
)}
</div>
</div>
))}
</>
)}
</div>
)}
{mcpPromptArgsDialog && (
<div className="chat-mcp-dropdown-menu" style={{ minWidth: '250px' }}>
<div className="chat-mcp-dropdown-header">
<span>{mcpPromptArgsDialog.title || mcpPromptArgsDialog.name}</span>
</div>
{mcpPromptArgsDialog.arguments.map(arg => (
<div key={arg.name} style={{ padding: '4px 10px' }}>
<label style={{ fontSize: '0.8rem', display: 'block', marginBottom: '2px' }}>
{arg.name}{arg.required ? ' *' : ''}
</label>
<input
type="text"
className="input input-sm"
style={{ width: '100%' }}
placeholder={arg.description || arg.name}
value={mcpPromptArgsValues[arg.name] || ''}
onChange={e => setMcpPromptArgsValues(prev => ({ ...prev, [arg.name]: e.target.value }))}
/>
</div>
))}
<div style={{ padding: '6px 10px', display: 'flex', gap: '6px', justifyContent: 'flex-end' }}>
<button className="btn btn-sm btn-secondary" onClick={() => setMcpPromptArgsDialog(null)}>Cancel</button>
<button className="btn btn-sm btn-primary" onClick={handleExpandPromptWithArgs}>Apply</button>
</div>
</div>
)}
</div>
)}
{mcpAvailable && (
<div className="chat-mcp-dropdown" ref={mcpResourcesRef}>
<button
className={`btn btn-sm ${(activeChat.mcpResources?.length > 0) ? 'btn-primary' : 'btn-secondary'}`}
title="MCP Resources"
onClick={() => { setMcpResourcesOpen(!mcpResourcesOpen); if (!mcpResourcesOpen) fetchMcpResources() }}
>
<i className="fas fa-paperclip" /> Resources
{activeChat.mcpResources?.length > 0 && (
<span className="chat-mcp-badge">{activeChat.mcpResources.length}</span>
)}
</button>
{mcpResourcesOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpResourcesLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading resources...</div>
) : mcpResourceList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP resources available</div>
) : (
<>
<div className="chat-mcp-dropdown-header"><span>MCP Resources</span></div>
{mcpResourceList.map(resource => (
<label key={resource.uri} className="chat-mcp-server-item">
<input
type="checkbox"
checked={(activeChat.mcpResources || []).includes(resource.uri)}
onChange={() => toggleMcpResource(resource.uri)}
/>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{resource.name}</span>
<span className="chat-mcp-server-tools">{resource.uri}</span>
</div>
</label>
))}
</>
)}
</div>
)}
</div>
)}
<ClientMCPDropdown
activeServerIds={activeChat.clientMCPServers || []}
onToggleServer={handleClientMCPToggle}
onServerAdded={handleClientMCPServerAdded}
onServerRemoved={handleClientMCPServerRemoved}
connectionStatuses={connectionStatuses}
getConnectedTools={getConnectedTools}
/>
<div className="chat-header-actions">
<label className="canvas-mode-toggle" title="Extract code blocks and media into a side panel for preview, copy, and download">
<i className="fas fa-columns" />
@@ -821,7 +1239,8 @@ export default function Chat() {
if (activityBuf.length > 0) {
elements.push(
<ActivityGroup key={`ag-${key}`} items={[...activityBuf]}
updateChatSettings={updateChatSettings} activeChat={activeChat} />
updateChatSettings={updateChatSettings} activeChat={activeChat}
getClientForTool={getClientForTool} />
)
activityBuf = []
}


@@ -1,8 +1,9 @@
import { useState, useEffect, useRef, useCallback } from 'react'
import { useNavigate, useOutletContext } from 'react-router-dom'
import ModelSelector from '../components/ModelSelector'
import ClientMCPDropdown from '../components/ClientMCPDropdown'
import { useResources } from '../hooks/useResources'
import { fileToBase64, backendControlApi, systemApi, modelsApi } from '../utils/api'
import { fileToBase64, backendControlApi, systemApi, modelsApi, mcpApi } from '../utils/api'
import { API_CONFIG } from '../utils/config'
const placeholderMessages = [
@@ -35,6 +36,13 @@ export default function Home() {
const [textFiles, setTextFiles] = useState([])
const [mcpMode, setMcpMode] = useState(false)
const [mcpAvailable, setMcpAvailable] = useState(false)
const [mcpServersOpen, setMcpServersOpen] = useState(false)
const [mcpServerList, setMcpServerList] = useState([])
const [mcpServersLoading, setMcpServersLoading] = useState(false)
const [mcpServerCache, setMcpServerCache] = useState({})
const [mcpSelectedServers, setMcpSelectedServers] = useState([])
const [clientMCPSelectedIds, setClientMCPSelectedIds] = useState([])
const mcpDropdownRef = useRef(null)
const [placeholderIdx, setPlaceholderIdx] = useState(0)
const [placeholderText, setPlaceholderText] = useState('')
const imageInputRef = useRef(null)
@@ -72,6 +80,7 @@ export default function Home() {
if (!selectedModel) {
setMcpAvailable(false)
setMcpMode(false)
setMcpSelectedServers([])
return
}
let cancelled = false
@@ -79,11 +88,15 @@ export default function Home() {
if (cancelled) return
const hasMcp = !!(cfg?.mcp?.remote || cfg?.mcp?.stdio)
setMcpAvailable(hasMcp)
if (!hasMcp) setMcpMode(false)
if (!hasMcp) {
setMcpMode(false)
setMcpSelectedServers([])
}
}).catch(() => {
if (!cancelled) {
setMcpAvailable(false)
setMcpMode(false)
setMcpSelectedServers([])
}
})
return () => { cancelled = true }
@@ -126,6 +139,42 @@ export default function Home() {
else setTextFiles(removeFn)
}, [])
useEffect(() => {
if (!mcpServersOpen) return
const handleClick = (e) => {
if (mcpDropdownRef.current && !mcpDropdownRef.current.contains(e.target)) {
setMcpServersOpen(false)
}
}
document.addEventListener('mousedown', handleClick)
return () => document.removeEventListener('mousedown', handleClick)
}, [mcpServersOpen])
const fetchMcpServers = useCallback(async () => {
if (!selectedModel) return
if (mcpServerCache[selectedModel]) {
setMcpServerList(mcpServerCache[selectedModel])
return
}
setMcpServersLoading(true)
try {
const data = await mcpApi.listServers(selectedModel)
const servers = data?.servers || []
setMcpServerList(servers)
setMcpServerCache(prev => ({ ...prev, [selectedModel]: servers }))
} catch (_e) {
setMcpServerList([])
} finally {
setMcpServersLoading(false)
}
}, [selectedModel, mcpServerCache])
const toggleMcpServer = useCallback((serverName) => {
setMcpSelectedServers(prev =>
prev.includes(serverName) ? prev.filter(s => s !== serverName) : [...prev, serverName]
)
}, [])
const doSubmit = useCallback(() => {
const text = message.trim() || placeholderText
if (!text && allFiles.length === 0) return
@@ -139,11 +188,13 @@ export default function Home() {
model: selectedModel,
files: allFiles,
mcpMode,
mcpServers: mcpSelectedServers,
clientMCPServers: clientMCPSelectedIds,
newChat: true,
}
localStorage.setItem('localai_index_chat_data', JSON.stringify(chatData))
navigate(`/chat/${encodeURIComponent(selectedModel)}`)
}, [message, placeholderText, allFiles, selectedModel, mcpMode, addToast, navigate])
}, [message, placeholderText, allFiles, selectedModel, mcpMode, mcpSelectedServers, clientMCPSelectedIds, addToast, navigate])
const handleSubmit = (e) => {
if (e) e.preventDefault()
@@ -200,26 +251,69 @@ export default function Home() {
<div className="home-model-row">
<ModelSelector value={selectedModel} onChange={setSelectedModel} capability="FLAG_CHAT" />
{mcpAvailable && (
<label className="home-mcp-toggle">
<span className="home-mcp-label">MCP</span>
<span className="toggle">
<input
type="checkbox"
checked={mcpMode}
onChange={(e) => setMcpMode(e.target.checked)}
/>
<span className="toggle-slider" />
</span>
</label>
<div className="chat-mcp-dropdown" ref={mcpDropdownRef}>
<button
type="button"
className={`btn btn-sm ${mcpSelectedServers.length > 0 ? 'btn-primary' : 'btn-secondary'}`}
title="Select MCP servers"
onClick={() => { setMcpServersOpen(!mcpServersOpen); if (!mcpServersOpen) fetchMcpServers() }}
>
<i className="fas fa-plug" /> MCP
{mcpSelectedServers.length > 0 && (
<span className="chat-mcp-badge">{mcpSelectedServers.length}</span>
)}
</button>
{mcpServersOpen && (
<div className="chat-mcp-dropdown-menu">
{mcpServersLoading ? (
<div className="chat-mcp-dropdown-loading"><i className="fas fa-spinner fa-spin" /> Loading servers...</div>
) : mcpServerList.length === 0 ? (
<div className="chat-mcp-dropdown-empty">No MCP servers configured</div>
) : (
<>
<div className="chat-mcp-dropdown-header">
<span>MCP Servers</span>
<button
type="button"
className="chat-mcp-select-all"
onClick={() => {
const allNames = mcpServerList.map(s => s.name)
const allSelected = allNames.every(n => mcpSelectedServers.includes(n))
setMcpSelectedServers(allSelected ? [] : allNames)
}}
>
{mcpServerList.every(s => mcpSelectedServers.includes(s.name)) ? 'Deselect all' : 'Select all'}
</button>
</div>
{mcpServerList.map(server => (
<label key={server.name} className="chat-mcp-server-item">
<input
type="checkbox"
checked={mcpSelectedServers.includes(server.name)}
onChange={() => toggleMcpServer(server.name)}
/>
<div className="chat-mcp-server-info">
<span className="chat-mcp-server-name">{server.name}</span>
<span className="chat-mcp-server-tools">{server.tools?.length || 0} tools</span>
</div>
</label>
))}
</>
)}
</div>
)}
</div>
)}
<ClientMCPDropdown
activeServerIds={clientMCPSelectedIds}
onToggleServer={(id) => setClientMCPSelectedIds(prev =>
prev.includes(id) ? prev.filter(s => s !== id) : [...prev, id]
)}
onServerAdded={(server) => setClientMCPSelectedIds(prev => [...prev, server.id])}
onServerRemoved={(id) => setClientMCPSelectedIds(prev => prev.filter(s => s !== id))}
/>
</div>
{mcpMode && (
<div className="home-mcp-info">
<i className="fas fa-info-circle" /> Non-streaming mode active.
</div>
)}
{/* File attachment tags */}
{allFiles.length > 0 && (
<div className="home-file-tags">
@@ -453,21 +547,6 @@ export default function Home() {
gap: var(--spacing-sm);
margin-bottom: var(--spacing-sm);
}
.home-mcp-toggle {
display: flex;
align-items: center;
gap: 6px;
cursor: pointer;
user-select: none;
}
.home-mcp-info {
font-size: 0.75rem;
color: var(--color-accent);
padding: var(--spacing-xs) var(--spacing-sm);
background: var(--color-accent-light);
border-radius: var(--radius-md);
margin-bottom: var(--spacing-sm);
}
.home-file-tags {
display: flex;
flex-wrap: wrap;


@@ -74,7 +74,8 @@
--radius-xl: 12px;
--radius-full: 9999px;
--sidebar-width: 220px;
--sidebar-width: 200px;
--sidebar-width-collapsed: 52px;
--color-toggle-off: #475569;
}


@@ -112,6 +112,15 @@ export const chatApi = {
mcpComplete: (body) => postJSON(API_CONFIG.endpoints.mcpChatCompletions, body),
}
// MCP API
export const mcpApi = {
listServers: (model) => fetchJSON(API_CONFIG.endpoints.mcpServers(model)),
listPrompts: (model) => fetchJSON(API_CONFIG.endpoints.mcpPrompts(model)),
getPrompt: (model, name, args) => postJSON(API_CONFIG.endpoints.mcpGetPrompt(model, name), { arguments: args }),
listResources: (model) => fetchJSON(API_CONFIG.endpoints.mcpResources(model)),
readResource: (model, uri) => postJSON(API_CONFIG.endpoints.mcpReadResource(model), { uri }),
}
// Resources API
export const resourcesApi = {
get: () => fetchJSON(API_CONFIG.endpoints.resources),


@@ -50,6 +50,11 @@ export const API_CONFIG = {
// OpenAI-compatible endpoints
chatCompletions: '/v1/chat/completions',
mcpChatCompletions: '/v1/mcp/chat/completions',
mcpServers: (model) => `/v1/mcp/servers/${model}`,
mcpPrompts: (model) => `/v1/mcp/prompts/${model}`,
mcpGetPrompt: (model, prompt) => `/v1/mcp/prompts/${model}/${encodeURIComponent(prompt)}`,
mcpResources: (model) => `/v1/mcp/resources/${model}`,
mcpReadResource: (model) => `/v1/mcp/resources/${model}/read`,
completions: '/v1/completions',
imageGenerations: '/v1/images/generations',
audioSpeech: '/v1/audio/speech',
@@ -78,5 +83,6 @@ export const API_CONFIG = {
backendsInstalled: '/backends',
version: '/version',
system: '/system',
corsProxy: '/api/cors-proxy',
},
}
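The per-model endpoints above are plain template functions; note that only the prompt name is passed through `encodeURIComponent`, while the model name is interpolated as-is. A quick sketch (the model and prompt names are made up):

```javascript
// Same shape as the endpoint builders in API_CONFIG above.
const endpoints = {
  mcpServers: (model) => `/v1/mcp/servers/${model}`,
  mcpGetPrompt: (model, prompt) => `/v1/mcp/prompts/${model}/${encodeURIComponent(prompt)}`,
}

console.log(endpoints.mcpServers('qwen3-4b'))
// '/v1/mcp/servers/qwen3-4b'
console.log(endpoints.mcpGetPrompt('qwen3-4b', 'summarize notes'))
// '/v1/mcp/prompts/qwen3-4b/summarize%20notes'
```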


@@ -0,0 +1,54 @@
const STORAGE_KEY = 'localai_client_mcp_servers'
function generateId() {
return Date.now().toString(36) + Math.random().toString(36).slice(2)
}
export function loadClientMCPServers() {
try {
const stored = localStorage.getItem(STORAGE_KEY)
if (stored) {
const data = JSON.parse(stored)
if (Array.isArray(data)) return data
}
} catch (_e) {
// ignore
}
return []
}
export function saveClientMCPServers(servers) {
try {
localStorage.setItem(STORAGE_KEY, JSON.stringify(servers))
} catch (_e) {
// ignore
}
}
export function addClientMCPServer({ name, url, headers, useProxy }) {
const servers = loadClientMCPServers()
const server = {
id: generateId(),
name: name || new URL(url).hostname,
url,
headers: headers || {},
useProxy: useProxy !== false,
}
servers.push(server)
saveClientMCPServers(servers)
return server
}
export function removeClientMCPServer(id) {
const servers = loadClientMCPServers().filter(s => s.id !== id)
saveClientMCPServers(servers)
return servers
}
export function updateClientMCPServer(id, updates) {
const servers = loadClientMCPServers().map(s =>
s.id === id ? { ...s, ...updates } : s
)
saveClientMCPServers(servers)
return servers
}
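The helpers above form a small CRUD layer over `localStorage`. A runnable sketch of the same load/add round-trip, with an in-memory stand-in for `localStorage` so it works outside the browser:

```javascript
// In-memory stand-in for the browser's localStorage.
const store = new Map()
const localStorage = {
  getItem: (k) => (store.has(k) ? store.get(k) : null),
  setItem: (k, v) => { store.set(k, String(v)) },
}

const STORAGE_KEY = 'localai_client_mcp_servers'

function loadClientMCPServers() {
  try {
    const data = JSON.parse(localStorage.getItem(STORAGE_KEY) || '[]')
    return Array.isArray(data) ? data : []
  } catch (_e) { return [] }
}

function addClientMCPServer({ name, url }) {
  const servers = loadClientMCPServers()
  // As above: fall back to the URL's hostname when no name is given.
  const server = { id: Date.now().toString(36), name: name || new URL(url).hostname, url }
  servers.push(server)
  localStorage.setItem(STORAGE_KEY, JSON.stringify(servers))
  return server
}

const added = addClientMCPServer({ url: 'https://mcp.example.com/sse' })
console.log(added.name)                    // 'mcp.example.com'
console.log(loadClientMCPServers().length) // 1
```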


@@ -164,6 +164,22 @@ func RegisterLocalAIRoutes(router *echo.Echo,
router.POST("/v1/mcp/chat/completions", mcpStreamHandler, mcpStreamMiddleware...)
router.POST("/mcp/v1/chat/completions", mcpStreamHandler, mcpStreamMiddleware...)
router.POST("/mcp/chat/completions", mcpStreamHandler, mcpStreamMiddleware...)
// MCP server listing endpoint
router.GET("/v1/mcp/servers/:model", localai.MCPServersEndpoint(cl, appConfig))
// MCP prompts endpoints
router.GET("/v1/mcp/prompts/:model", localai.MCPPromptsEndpoint(cl, appConfig))
router.POST("/v1/mcp/prompts/:model/:prompt", localai.MCPGetPromptEndpoint(cl, appConfig))
// MCP resources endpoints
router.GET("/v1/mcp/resources/:model", localai.MCPResourcesEndpoint(cl, appConfig))
router.POST("/v1/mcp/resources/:model/read", localai.MCPReadResourceEndpoint(cl, appConfig))
// CORS proxy for client-side MCP connections
router.GET("/api/cors-proxy", localai.CORSProxyEndpoint(appConfig))
router.POST("/api/cors-proxy", localai.CORSProxyEndpoint(appConfig))
router.OPTIONS("/api/cors-proxy", localai.CORSProxyOptionsEndpoint())
}
// Agent job routes (MCP CI Jobs — requires MCP to be enabled)


@@ -557,21 +557,6 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
galleryService.StoreCancellation(uid, cancelFunc)
go func() {
galleryService.ModelGalleryChannel <- op
// Wait for the deletion operation to complete with a timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
for {
select {
case <-ctx.Done():
xlog.Warn("Timeout waiting for deletion to complete", "uid", uid)
break
default:
if status := galleryService.GetStatus(uid); status != nil && status.Processed {
break
}
time.Sleep(100 * time.Millisecond)
}
}
cl.RemoveModelConfig(galleryName)
}()


@@ -1209,7 +1209,7 @@ async function promptGPT(systemPrompt, input) {
model: model,
messages: messages,
};
// Add stream parameter for both regular chat and MCP (MCP now supports SSE streaming)
requestBody.stream = true;


@@ -895,7 +895,7 @@ SOFTWARE.
<div x-data="{ showPromptForm: false, showParamsForm: false }" class="space-y-2">
<!-- MCP Toggle - Compact (shown dynamically based on model support) -->
<div x-data="{
<div x-data="{
mcpAvailable: false,
checkMCP() {
const modelSelector = document.getElementById('modelSelector');
@@ -910,7 +910,7 @@ SOFTWARE.
}
const hasMCP = selectedOption.getAttribute('data-has-mcp') === 'true';
this.mcpAvailable = hasMCP;
// If model doesn't support MCP, disable MCP mode
const activeChat = $store.chat.activeChat();
if (activeChat && !hasMCP) {


@@ -28,6 +28,8 @@ type Message struct {
ToolCalls []ToolCall `json:"tool_calls,omitempty" yaml:"tool_call,omitempty"`
ToolCallID string `json:"tool_call_id,omitempty" yaml:"tool_call_id,omitempty"`
// Reasoning content extracted from <thinking>...</thinking> tags
Reasoning *string `json:"reasoning,omitempty" yaml:"reasoning,omitempty"`
}


@@ -21,23 +21,23 @@ The Model Context Protocol is a standard for connecting AI models to external to
## Key Features
- **🔄 Real-time Tool Access**: Connect to external MCP servers for live data
- **🛠️ Multiple Server Support**: Configure both remote HTTP and local stdio servers
- **Cached Connections**: Efficient tool caching for better performance
- **🔒 Secure Authentication**: Support for bearer token authentication
- **🎯 OpenAI Compatible**: Uses the familiar `/mcp/v1/chat/completions` endpoint
- **🧠 Advanced Reasoning**: Configurable reasoning and re-evaluation capabilities
- **📋 Auto-Planning**: Break down complex tasks into manageable steps
- **🎯 MCP Prompts**: Specialized prompts for better MCP server interaction
- **🔄 Plan Re-evaluation**: Dynamic plan adjustment based on results
- **⚙️ Flexible Agent Control**: Customizable execution limits and retry behavior
- **Real-time Tool Access**: Connect to external MCP servers for live data
- **Multiple Server Support**: Configure both remote HTTP and local stdio servers
- **Cached Connections**: Efficient tool caching for better performance
- **Secure Authentication**: Support for bearer token authentication
- **Multi-endpoint Support**: Works with OpenAI Chat, Anthropic Messages, and Open Responses APIs
- **Selective Server Activation**: Use `metadata.mcp_servers` to enable only specific servers per request
- **Server-side Tool Execution**: Tools are executed on the server and results fed back to the model automatically
- **Agent Configuration**: Customizable execution limits and retry behavior
- **MCP Prompts**: Discover and expand reusable prompt templates from MCP servers
- **MCP Resources**: Browse and inject resource content (files, data) from MCP servers into conversations
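
For example, selective server activation only needs an extra `metadata` field on an otherwise ordinary chat completion request — a sketch, where the server names are placeholders for whatever your model's YAML defines:

```javascript
// Request body for /v1/mcp/chat/completions with only two servers enabled.
// "search" and "files" are placeholder server names.
const body = {
  model: 'my-mcp-model',
  messages: [{ role: 'user', content: 'Summarize the latest report' }],
  metadata: { mcp_servers: ['search', 'files'] },
  stream: true,
}

console.log(JSON.stringify(body.metadata))
// '{"mcp_servers":["search","files"]}'
```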
## Configuration
MCP support is configured in your model's YAML configuration file using the `mcp` section:
```yaml
name: my-agentic-model
name: my-mcp-model
backend: llama-cpp
parameters:
model: qwen3-4b.gguf
@@ -56,7 +56,7 @@ mcp:
}
}
}
stdio: |
{
"mcpServers": {
@@ -78,16 +78,7 @@ mcp:
}
agent:
max_iterations: 10 # Maximum MCP tool execution loop iterations
```
### Configuration Options
@@ -106,39 +97,226 @@ Configure local command-based MCP servers:
- **`env`**: Environment variables (optional)
#### Agent Configuration (`agent`)
Configure agent behavior and tool execution:
- **`max_iterations`**: Maximum number of MCP tool execution loop iterations (default: 10). Each iteration allows the model to call tools and receive results before generating the next response.
## Usage
### Selecting MCP Servers via `metadata`
All API endpoints support MCP server selection through the standard `metadata` field. Pass a comma-separated list of server names in `metadata.mcp_servers`:
- **When present**: Only the named MCP servers are activated for this request. Server names must match the keys in the model's MCP config YAML (e.g., `"weather-api"`, `"search-engine"`).
- **When absent**: Behavior depends on the endpoint:
- **OpenAI Chat Completions** and **Anthropic Messages**: No MCP tools are injected (standard behavior).
- **Open Responses**: If the model has MCP config and no user-provided tools, all MCP servers are auto-activated (backward compatible).
The `mcp_servers` metadata key is consumed by the MCP engine and stripped before reaching the backend. Clients that support the standard `metadata` field can use this without custom schema extensions.
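The selection step can be sketched as follows. This is an illustrative Python sketch of the behaviour described above, not LocalAI's actual (Go) implementation; the function and variable names are hypothetical:

```python
def select_mcp_servers(metadata, configured):
    """Pick active MCP servers from request metadata.

    `metadata` is the request's metadata dict (may be None); `configured`
    is the set of server names from the model's MCP config YAML. The
    `mcp_servers` key is consumed here and removed so it never reaches
    the backend.
    """
    if not metadata or "mcp_servers" not in metadata:
        return [], metadata or {}
    raw = metadata.pop("mcp_servers")
    wanted = [name.strip() for name in raw.split(",") if name.strip()]
    # Unknown names are ignored rather than raising an error.
    active = [name for name in wanted if name in configured]
    return active, metadata
```

Ignoring unknown names (rather than failing) matches the behaviour exercised by the e2e tests, where a request naming a non-existent server still completes without MCP tools.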
### API Endpoints
MCP tools work across all three API endpoints:
#### OpenAI Chat Completions (`/v1/chat/completions`)
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [{"role": "user", "content": "What is the weather in New York?"}],
"metadata": {"mcp_servers": "weather-api"},
"stream": true
}'
```
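The streamed body is standard SSE: `data:` lines carrying OpenAI-style chunk JSON, terminated by `data: [DONE]`. A minimal, hypothetical helper for assembling the assistant text from such a body:

```python
import json

def collect_stream_text(sse_body: str) -> str:
    """Accumulate assistant text deltas from an OpenAI-style SSE stream."""
    parts = []
    for line in sse_body.splitlines():
        if not line.startswith("data:"):
            continue  # skip event/comment/blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)
```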
#### Anthropic Messages (`/v1/messages`)
```bash
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "What is the weather in New York?"}],
"metadata": {"mcp_servers": "weather-api"}
}'
```
#### Open Responses (`/v1/responses`)
```bash
curl http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"input": "What is the weather in New York?",
"metadata": {"mcp_servers": "weather-api"}
}'
```
### Server Listing Endpoint
You can list available MCP servers and their tools for a given model:
```bash
curl http://localhost:8080/v1/mcp/servers/my-mcp-model
```
Returns:
```json
{
  "model": "my-mcp-model",
  "servers": [
    {
      "name": "weather-api",
      "type": "remote",
      "tools": ["get_weather", "get_forecast"]
    },
    {
      "name": "search-engine",
      "type": "remote",
      "tools": ["web_search", "image_search"]
    }
  ]
}
```
### MCP Prompts
MCP servers can provide reusable prompt templates. LocalAI supports discovering and expanding prompts from MCP servers.
#### List Prompts
```bash
curl http://localhost:8080/v1/mcp/prompts/my-mcp-model
```
Returns:
```json
[
{
"name": "code-review",
"description": "Review code for best practices",
"title": "Code Review",
"arguments": [
{"name": "language", "description": "Programming language", "required": true}
],
"server": "dev-tools"
}
]
```
#### Expand a Prompt
```bash
curl -X POST http://localhost:8080/v1/mcp/prompts/my-mcp-model/code-review \
-H "Content-Type: application/json" \
-d '{"arguments": {"language": "go"}}'
```
Returns:
```json
{
"messages": [
{"role": "user", "content": "Please review the following Go code for best practices..."}
]
}
```
#### Inject Prompts via Metadata
You can inject MCP prompts into any chat request using `metadata.mcp_prompt` and `metadata.mcp_prompt_args`:
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [{"role": "user", "content": "Review this function: func add(a, b int) int { return a + b }"}],
"metadata": {
"mcp_servers": "dev-tools",
"mcp_prompt": "code-review",
"mcp_prompt_args": "{\"language\": \"go\"}"
}
}'
```
The prompt messages are prepended to the conversation before inference.
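As a hypothetical sketch of this injection step (names are illustrative, not LocalAI's internals): the expanded prompt's messages go in front of the conversation, and `mcp_prompt_args` must be JSON-decoded first since it arrives as a string:

```python
import json

def parse_prompt_args(raw: str) -> dict:
    """metadata.mcp_prompt_args arrives as a JSON string, not an object."""
    return json.loads(raw) if raw else {}

def build_conversation(messages, prompt_messages=None):
    """Prepend expanded MCP prompt messages to the user's conversation."""
    return list(prompt_messages or []) + list(messages)
```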
### MCP Resources
MCP servers can expose data/content (files, database records, etc.) as resources identified by URI.
#### List Resources
```bash
curl http://localhost:8080/v1/mcp/resources/my-mcp-model
```
Returns:
```json
[
{
"name": "project-readme",
"uri": "file:///README.md",
"description": "Project documentation",
"mimeType": "text/markdown",
"server": "file-manager"
}
]
```
#### Read a Resource
```bash
curl -X POST http://localhost:8080/v1/mcp/resources/my-mcp-model/read \
-H "Content-Type: application/json" \
-d '{"uri": "file:///README.md"}'
```
Returns:
```json
{
"uri": "file:///README.md",
"content": "# My Project\n...",
"mimeType": "text/markdown"
}
```
#### Inject Resources via Metadata
You can inject MCP resources into chat requests using `metadata.mcp_resources` (comma-separated URIs):
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [{"role": "user", "content": "Summarize this project"}],
"metadata": {
"mcp_servers": "file-manager",
"mcp_resources": "file:///README.md,file:///CHANGELOG.md"
}
}'
```
Resource contents are appended to the last user message as text blocks (following the same approach as llama.cpp's WebUI).
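The injection described above can be sketched as follows (illustrative Python, not the actual implementation):

```python
def inject_resources(messages, resources):
    """Append resource contents as text blocks to the last user message."""
    msgs = [dict(m) for m in messages]
    for i in range(len(msgs) - 1, -1, -1):
        if msgs[i]["role"] != "user":
            continue
        content = msgs[i]["content"]
        # Normalize a plain string into a content-block list first.
        if isinstance(content, str):
            blocks = [{"type": "text", "text": content}]
        else:
            blocks = list(content)
        for res in resources:
            blocks.append({
                "type": "text",
                "text": f"Resource {res['uri']}:\n{res['content']}",
            })
        msgs[i]["content"] = blocks
        break
    return msgs
```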
### Legacy Endpoint
The `/mcp/v1/chat/completions` endpoint is still supported for backward compatibility. When `metadata.mcp_servers` is not provided, it automatically enables all configured MCP servers; an explicit `metadata.mcp_servers` selection is still honoured.
```bash
curl http://localhost:8080/mcp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-mcp-model",
"messages": [
{"role": "user", "content": "What is the current weather in New York?"}
],
"temperature": 0.7
}'
```
@@ -148,10 +326,10 @@ curl http://localhost:8080/mcp/v1/chat/completions \
{
"id": "chatcmpl-123",
"created": 1699123456,
"model": "my-mcp-model",
"choices": [
{
"text": "The current weather in New York is 72°F (22°C) with partly cloudy skies."
}
],
"object": "text_completion"
@@ -160,7 +338,6 @@ curl http://localhost:8080/mcp/v1/chat/completions \
## Example Configurations
### Docker-based Tools
```yaml
@@ -184,47 +361,28 @@ mcp:
}
agent:
max_iterations: 10
```
## How It Works
1. **Tool Discovery**: LocalAI connects to configured MCP servers and discovers available tools
2. **Tool Injection**: Discovered tools are injected into the model's tool/function list alongside any user-provided tools
3. **Inference Loop**: The model generates a response. If it calls MCP tools, LocalAI executes them server-side, appends results to the conversation, and re-runs inference
4. **Response Generation**: When the model produces a final response (no more MCP tool calls), it is returned to the client
The execution loop is bounded by `agent.max_iterations` (default 10) to prevent infinite loops.
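A hypothetical sketch of this loop (the `infer` and `execute_tool` callables stand in for the real backend call and MCP client):

```python
def run_mcp_loop(infer, execute_tool, messages, max_iterations=10):
    """Run inference, execute any MCP tool calls, feed results back,
    and stop at a final answer or the iteration bound."""
    for _ in range(max_iterations):
        reply = infer(messages)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply  # final response, returned to the client
        messages = messages + [reply]
        for call in calls:
            result = execute_tool(call["name"], call["arguments"])
            messages.append({
                "role": "tool",
                "name": call["name"],
                "content": result,
            })
    return {"role": "assistant",
            "content": "stopped: iteration limit reached"}
```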
## Session Lifecycle
MCP sessions are automatically managed by LocalAI:
- **Lazy initialization**: Sessions are created the first time a model's MCP tools are used
- **Cached per model**: Sessions are reused across requests for the same model
- **Cleanup on model unload**: When a model is unloaded (idle watchdog eviction, manual stop, or shutdown), all associated MCP sessions are closed and resources freed
- **Graceful shutdown**: All MCP sessions are closed when LocalAI shuts down
This means you don't need to manually manage MCP connections — they follow the model's lifecycle automatically.
## Supported MCP Servers
@@ -255,15 +413,16 @@ Use MCP-enabled models in your applications:
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-api-key"
)
response = client.chat.completions.create(
model="my-mcp-model",
messages=[
{"role": "user", "content": "Analyze the latest research papers on AI"}
],
extra_body={"metadata": {"mcp_servers": "search-engine"}}
)
```
@@ -366,6 +525,60 @@ mcp:
- [Awesome MCPs](https://github.com/punkpeye/awesome-mcp-servers)
- [A list of MCPs by mudler](https://github.com/mudler/MCPs)
## Client-Side MCP (Browser)
In addition to server-side MCP (where the backend connects to MCP servers), LocalAI supports **client-side MCP** where the browser connects directly to MCP servers. This is inspired by llama.cpp's WebUI and works alongside server-side MCP.
### How It Works
1. **Add servers in the UI**: Click the "Client MCP" button in the chat header and add MCP server URLs
2. **Browser connects directly**: The browser uses the MCP TypeScript SDK (`StreamableHTTPClientTransport` or `SSEClientTransport`) to connect to MCP servers
3. **Tool discovery**: Connected servers' tools are sent as `tools` in the chat request body
4. **Browser-side execution**: When the LLM calls a client-side tool, the browser executes it against the MCP server and sends the result back in a follow-up request
5. **Agentic loop**: This continues (up to 10 turns) until the LLM produces a final response
### CORS Proxy
Since browsers enforce CORS restrictions, LocalAI provides a built-in proxy at `/api/cors-proxy`. When "Use CORS proxy" is enabled (default), requests to external MCP servers are routed through:
```
/api/cors-proxy?url=https://remote-mcp-server.example.com/sse
```
The proxy forwards the request method, headers, and body to the target URL and streams the response back with appropriate CORS headers.
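When building the proxied URL yourself, the target must be percent-encoded so query characters in the MCP server URL survive; a small illustrative helper:

```python
from urllib.parse import urlencode

def cors_proxy_url(target_url: str, proxy_path: str = "/api/cors-proxy") -> str:
    """Build the proxied URL; `?` and `&` in the target are encoded so
    the whole MCP server URL arrives as a single `url` query parameter."""
    return f"{proxy_path}?{urlencode({'url': target_url})}"
```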
### MCP Apps (Interactive Tool UIs)
LocalAI supports the [MCP Apps extension](https://modelcontextprotocol.io/extensions/apps/overview), which allows MCP tools to declare interactive HTML UIs. When a tool has `_meta.ui.resourceUri` in its definition, calling that tool renders the app's HTML inline in the chat as a sandboxed iframe.
**How it works:**
- When the LLM calls a tool with `_meta.ui.resourceUri`, the browser fetches the HTML resource from the MCP server and renders it in an iframe
- The iframe is sandboxed (`allow-scripts allow-forms`, no `allow-same-origin`) for security
- The app can call server tools, send messages, and update context via the `AppBridge` protocol (JSON-RPC over `postMessage`)
- Tools marked as app-only (`_meta.ui.visibility: "app-only"`) are hidden from the LLM and only callable by the app iframe
- On page reload, apps render statically until the MCP connection is re-established
**Requirements:**
- Only works with **client-side MCP** connections (the browser must be connected to the MCP server)
- The MCP server must implement the Apps extension (`_meta.ui.resourceUri` on tools, resource serving)
### Coexistence with Server-Side MCP
Both modes work simultaneously in the same chat:
- **Server-side MCP tools** are configured in model YAML files and executed by the backend. The backend handles these in its own agentic loop.
- **Client-side MCP tools** are configured per-user in the browser and sent as `tools` in the request. When the LLM calls them, the browser executes them.
If both sides have a tool with the same name, the server-side tool takes priority.
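The merge rule can be sketched as follows (hypothetical helper, not LocalAI's internals):

```python
def merge_tools(server_tools, client_tools):
    """Merge tool lists by name; server-side tools win on name clashes."""
    merged = {t["name"]: t for t in client_tools}
    merged.update({t["name"]: t for t in server_tools})  # server overrides
    return list(merged.values())
```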
### Security Considerations
- The CORS proxy can forward requests to any HTTP/HTTPS URL. It is only available when MCP is enabled (`LOCALAI_DISABLE_MCP` is not set).
- Client-side MCP server configurations are stored in the browser's localStorage and are not shared with the server.
- Custom headers (e.g., API keys) for MCP servers are stored in localStorage. Use with caution on shared machines.
## Disabling MCP Support
You can completely disable MCP functionality in LocalAI by setting the `LOCALAI_DISABLE_MCP` environment variable to `true`, `1`, or `yes`.
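A minimal sketch of that check (illustrative; the accepted values match the list above):

```python
import os

def mcp_disabled(env=os.environ) -> bool:
    """True when LOCALAI_DISABLE_MCP is set to a truthy value."""
    return env.get("LOCALAI_DISABLE_MCP", "").strip().lower() in {"true", "1", "yes"}
```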
@@ -19,6 +19,10 @@ import (
// new idea: what if we declare a struct of these here, and use a loop to check?
// TODO: Split ModelLoader and TemplateLoader? Just to keep things more organized. Left together to share a mutex until I look into that. Would split if we separate directories for .bin/.yaml and .tmpl
// ModelUnloadHook is called when a model is about to be unloaded.
// The model name is passed as the argument.
type ModelUnloadHook func(modelName string)
type ModelLoader struct {
ModelPath string
mu sync.Mutex
@@ -28,6 +32,7 @@ type ModelLoader struct {
externalBackends map[string]string
lruEvictionMaxRetries int // Maximum number of retries when waiting for busy models
lruEvictionRetryInterval time.Duration // Interval between retries when waiting for busy models
onUnloadHooks []ModelUnloadHook
}
// NewModelLoader creates a new ModelLoader instance.
@@ -52,6 +57,13 @@ func (ml *ModelLoader) GetLoadingCount() int {
return len(ml.loading)
}
// OnModelUnload registers a hook that is called when a model is unloaded.
func (ml *ModelLoader) OnModelUnload(hook ModelUnloadHook) {
ml.mu.Lock()
defer ml.mu.Unlock()
ml.onUnloadHooks = append(ml.onUnloadHooks, hook)
}
func (ml *ModelLoader) SetWatchDog(wd *WatchDog) {
ml.wd = wd
}
@@ -46,6 +46,11 @@ func (ml *ModelLoader) deleteProcess(s string) error {
xlog.Debug("Deleting process", "model", s)
// Run unload hooks (e.g. close MCP sessions)
for _, hook := range ml.onUnloadHooks {
hook(s)
}
// Free GPU resources before stopping the process to ensure VRAM is released
if freeFunc, ok := model.GRPC(false, ml.wd).(interface{ Free() error }); ok {
xlog.Debug("Calling Free() to release GPU resources", "model", s)
tests/e2e/e2e_mcp_test.go Normal file
@@ -0,0 +1,439 @@
package e2e_test
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net"
"net/http"
"time"
"github.com/modelcontextprotocol/go-sdk/mcp"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/openai/openai-go/v3"
)
// startMockMCPServer creates an in-process MCP HTTP server with a "get_weather" tool
// and returns its URL and a shutdown function.
func startMockMCPServer() (string, func()) {
server := mcp.NewServer(
&mcp.Implementation{Name: "mock-mcp", Version: "v1.0.0"},
nil,
)
server.AddTool(
&mcp.Tool{
Name: "get_weather",
Description: "Get the current weather for a location",
InputSchema: json.RawMessage(`{"type":"object","properties":{"location":{"type":"string","description":"City name"}},"required":["location"]}`),
},
func(_ context.Context, req *mcp.CallToolRequest) (*mcp.CallToolResult, error) {
var args struct {
Location string `json:"location"`
}
if req.Params.Arguments != nil {
data, _ := json.Marshal(req.Params.Arguments)
json.Unmarshal(data, &args)
}
return &mcp.CallToolResult{
Content: []mcp.Content{
&mcp.TextContent{
Text: fmt.Sprintf("Weather in %s: sunny, 72°F", args.Location),
},
},
}, nil
},
)
handler := mcp.NewStreamableHTTPHandler(
func(r *http.Request) *mcp.Server { return server },
&mcp.StreamableHTTPOptions{
Stateless: true,
},
)
listener, err := net.Listen("tcp", "127.0.0.1:0")
Expect(err).ToNot(HaveOccurred())
httpServer := &http.Server{Handler: handler}
go httpServer.Serve(listener)
url := fmt.Sprintf("http://%s/mcp", listener.Addr().String())
shutdown := func() {
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
httpServer.Shutdown(ctx)
}
return url, shutdown
}
// mcpModelConfig generates a model config YAML that includes MCP remote server config.
func mcpModelConfig(mcpServerURL string) map[string]any {
mcpRemote := fmt.Sprintf(`{"mcpServers":{"weather-api":{"url":"%s"}}}`, mcpServerURL)
return map[string]any{
"name": "mock-model-mcp",
"backend": "mock-backend",
"parameters": map[string]any{
"model": "mock-model-mcp.bin",
},
"mcp": map[string]any{
"remote": mcpRemote,
},
"agent": map[string]any{
// The mock backend returns a tool call on the first inference, then
// a plain text response once tool results appear in the prompt.
// max_iterations=1 is enough for one tool-call round-trip.
"max_iterations": 1,
},
}
}
// httpPost sends a JSON POST request and returns the response.
func httpPost(url string, body any) (*http.Response, error) {
data, err := json.Marshal(body)
if err != nil {
return nil, err
}
req, err := http.NewRequest("POST", url, bytes.NewReader(data))
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json")
return (&http.Client{Timeout: 60 * time.Second}).Do(req)
}
// readBody reads and returns the response body as a string.
func readBody(resp *http.Response) string {
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
return string(data)
}
var _ = Describe("MCP Tool Integration E2E Tests", Label("MCP"), func() {
Describe("MCP Server Listing", func() {
It("should list MCP servers and tools for a configured model", func() {
resp, err := http.Get(fmt.Sprintf("http://127.0.0.1:%d/v1/mcp/servers/mock-model-mcp", apiPort))
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result struct {
Model string `json:"model"`
Servers []struct {
Name string `json:"name"`
Type string `json:"type"`
Tools []string `json:"tools"`
} `json:"servers"`
}
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
Expect(result.Model).To(Equal("mock-model-mcp"))
Expect(result.Servers).To(HaveLen(1))
Expect(result.Servers[0].Name).To(Equal("weather-api"))
Expect(result.Servers[0].Tools).To(ContainElement("get_weather"))
})
It("should return empty servers for a model without MCP config", func() {
resp, err := http.Get(fmt.Sprintf("http://127.0.0.1:%d/v1/mcp/servers/mock-model", apiPort))
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result struct {
Servers []any `json:"servers"`
}
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
Expect(result.Servers).To(BeEmpty())
})
})
Describe("OpenAI Chat Completions with MCP", func() {
Context("Non-streaming", func() {
It("should inject and execute MCP tools when mcp_servers is set", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather in San Francisco?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
}
resp, err := httpPost(apiURL+"/chat/completions", body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("weather"))
})
It("should not inject MCP tools when mcp_servers is not set", func() {
resp, err := client.Chat.Completions.New(
context.TODO(),
openai.ChatCompletionNewParams{
Model: "mock-model-mcp",
Messages: []openai.ChatCompletionMessageParamUnion{
openai.UserMessage("Hello"),
},
},
)
Expect(err).ToNot(HaveOccurred())
Expect(len(resp.Choices)).To(Equal(1))
Expect(resp.Choices[0].Message.Content).To(ContainSubstring("mocked response"))
})
})
Context("Streaming", func() {
It("should work with MCP tools in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
"stream": true,
}
resp, err := httpPost(apiURL+"/chat/completions", body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
Expect(resp.Header.Get("Content-Type")).To(ContainSubstring("text/event-stream"))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("data:"))
})
})
})
Describe("Anthropic Messages with MCP", func() {
Context("Non-streaming", func() {
It("should inject and execute MCP tools when mcp_servers is set", func() {
body := map[string]any{
"model": "mock-model-mcp",
"max_tokens": 1024,
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/messages", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result map[string]any
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
content, ok := result["content"].([]any)
Expect(ok).To(BeTrue())
Expect(content).ToNot(BeEmpty())
first, ok := content[0].(map[string]any)
Expect(ok).To(BeTrue())
Expect(first["text"]).To(ContainSubstring("weather"))
})
It("should return standard response without mcp_servers", func() {
body := map[string]any{
"model": "mock-model-mcp",
"max_tokens": 1024,
"messages": []map[string]string{{"role": "user", "content": "Hello"}},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/messages", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result map[string]any
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
content, ok := result["content"].([]any)
Expect(ok).To(BeTrue())
Expect(content).ToNot(BeEmpty())
first, ok := content[0].(map[string]any)
Expect(ok).To(BeTrue())
Expect(first["text"]).To(ContainSubstring("mocked response"))
})
})
Context("Streaming", func() {
It("should work with MCP tools in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"max_tokens": 1024,
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"metadata": map[string]string{"mcp_servers": "weather-api"},
"stream": true,
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/messages", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("event:"))
})
})
})
Describe("Open Responses with MCP", func() {
Context("Non-streaming", func() {
It("should inject and execute MCP tools when mcp_servers is set", func() {
body := map[string]any{
"model": "mock-model-mcp",
"input": "What is the weather in San Francisco?",
"metadata": map[string]string{"mcp_servers": "weather-api"},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/responses", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result map[string]any
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
// Open Responses wraps output in an "output" array
output, ok := result["output"].([]any)
Expect(ok).To(BeTrue(), "expected output array in response: %s", respBody)
Expect(output).ToNot(BeEmpty())
})
It("should auto-activate MCP tools without mcp_servers (backward compat)", func() {
// Open Responses auto-activates all MCP servers when no metadata
// mcp_servers key is provided and no user tools are set.
body := map[string]any{
"model": "mock-model-mcp",
"input": "Hello",
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/responses", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result map[string]any
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
output, ok := result["output"].([]any)
Expect(ok).To(BeTrue(), "expected output array in response: %s", respBody)
Expect(output).ToNot(BeEmpty())
})
})
Context("Streaming", func() {
It("should work with MCP tools in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"input": "What is the weather?",
"metadata": map[string]string{"mcp_servers": "weather-api"},
"stream": true,
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/v1/responses", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("event:"))
})
})
})
Describe("Legacy /mcp endpoint", func() {
It("should auto-enable all MCP servers and complete the tool loop", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather in San Francisco?"}},
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/mcp/v1/chat/completions", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("weather"))
})
It("should respect metadata mcp_servers when provided", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "Hello"}},
"metadata": map[string]string{"mcp_servers": "non-existent-server"},
}
// Even through the /mcp endpoint, an explicit metadata selection
// should be honoured — a non-existent server means no MCP tools.
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/mcp/v1/chat/completions", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
respBody := readBody(resp)
Expect(resp.StatusCode).To(Equal(200), "unexpected status, body: %s", respBody)
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.Unmarshal([]byte(respBody), &result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("mocked response"))
})
It("should work in streaming mode", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "What is the weather?"}},
"stream": true,
}
resp, err := httpPost(fmt.Sprintf("http://127.0.0.1:%d/mcp/v1/chat/completions", apiPort), body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
Expect(resp.Header.Get("Content-Type")).To(ContainSubstring("text/event-stream"))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
Expect(string(data)).To(ContainSubstring("data:"))
})
})
Describe("MCP with invalid server name", func() {
It("should work without MCP tools when specifying non-existent server", func() {
body := map[string]any{
"model": "mock-model-mcp",
"messages": []map[string]string{{"role": "user", "content": "Hello"}},
"metadata": map[string]string{"mcp_servers": "non-existent-server"},
}
resp, err := httpPost(apiURL+"/chat/completions", body)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
var result struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
Expect(json.NewDecoder(resp.Body).Decode(&result)).To(Succeed())
Expect(result.Choices).To(HaveLen(1))
Expect(result.Choices[0].Message.Content).To(ContainSubstring("mocked response"))
})
})
})

View File

@@ -25,18 +25,20 @@ import (
)
var (
anthropicBaseURL string
tmpDir string
backendPath string
modelsPath string
configPath string
app *echo.Echo
appCtx context.Context
appCancel context.CancelFunc
client openai.Client
apiPort int
apiURL string
mockBackendPath string
anthropicBaseURL string
tmpDir string
backendPath string
modelsPath string
configPath string
app *echo.Echo
appCtx context.Context
appCancel context.CancelFunc
client openai.Client
apiPort int
apiURL string
mockBackendPath string
mcpServerURL string
mcpServerShutdown func()
)
var _ = BeforeSuite(func() {
@@ -99,6 +101,14 @@ var _ = BeforeSuite(func() {
Expect(err).ToNot(HaveOccurred())
Expect(os.WriteFile(configPath, configYAML, 0644)).To(Succeed())
// Start mock MCP server and create MCP-enabled model config
mcpServerURL, mcpServerShutdown = startMockMCPServer()
mcpConfig := mcpModelConfig(mcpServerURL)
mcpConfigPath := filepath.Join(modelsPath, "mock-model-mcp.yaml")
mcpConfigYAML, err := yaml.Marshal(mcpConfig)
Expect(err).ToNot(HaveOccurred())
Expect(os.WriteFile(mcpConfigPath, mcpConfigYAML, 0644)).To(Succeed())
// Set up system state
systemState, err := system.GetSystemState(
system.WithBackendPath(backendPath),
@@ -160,6 +170,9 @@ var _ = AfterSuite(func() {
defer cancel()
Expect(app.Shutdown(ctx)).To(Succeed())
}
if mcpServerShutdown != nil {
mcpServerShutdown()
}
if tmpDir != "" {
os.RemoveAll(tmpDir)
}

View File

@@ -10,6 +10,7 @@ import (
"net"
"os"
"path/filepath"
"strings"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/xlog"
@@ -20,11 +21,20 @@ var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
// MockBackend implements the Backend gRPC service with mocked responses
// MockBackend implements the Backend gRPC service with mocked responses.
// When tools are present but the prompt already contains MCP tool results
// (indicated by the marker from the mock MCP server), it returns a plain
// text response instead of another tool call, letting the MCP loop complete.
type MockBackend struct {
pb.UnimplementedBackendServer
}
// promptHasToolResults checks if the prompt contains evidence of prior tool
// execution — specifically the output from the mock MCP server's get_weather tool.
func promptHasToolResults(prompt string) bool {
return strings.Contains(prompt, "Weather in")
}
func (m *MockBackend) Health(ctx context.Context, in *pb.HealthMessage) (*pb.Reply, error) {
xlog.Debug("Health check called")
return &pb.Reply{Message: []byte("OK")}, nil
@@ -42,8 +52,12 @@ func (m *MockBackend) Predict(ctx context.Context, in *pb.PredictOptions) (*pb.R
xlog.Debug("Predict called", "prompt", in.Prompt)
var response string
toolName := mockToolNameFromRequest(in)
if toolName != "" {
if toolName != "" && !promptHasToolResults(in.Prompt) {
// First call with tools: return a tool call so the MCP loop executes it.
response = fmt.Sprintf(`{"name": "%s", "arguments": {"location": "San Francisco"}}`, toolName)
} else if toolName != "" {
// Subsequent call: tool results already in prompt, return final text.
response = "Based on the tool results, the weather in San Francisco is sunny, 72°F."
} else {
response = "This is a mocked response."
}
@@ -60,8 +74,10 @@ func (m *MockBackend) PredictStream(in *pb.PredictOptions, stream pb.Backend_Pre
xlog.Debug("PredictStream called", "prompt", in.Prompt)
var toStream string
toolName := mockToolNameFromRequest(in)
if toolName != "" {
if toolName != "" && !promptHasToolResults(in.Prompt) {
toStream = fmt.Sprintf(`{"name": "%s", "arguments": {"location": "San Francisco"}}`, toolName)
} else if toolName != "" {
toStream = "Based on the tool results, the weather in San Francisco is sunny, 72°F."
} else {
toStream = "This is a mocked streaming response."
}
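The two branches above exist because an MCP-enabled completion calls the backend at least twice: once to obtain a tool call, then again after the tool output has been folded into the prompt, which is what `promptHasToolResults` detects. A hedged sketch of that driver loop (function and type names here are illustrative, not LocalAI's actual implementation):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type toolCall struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
}

// runToolLoop keeps asking the model until it stops emitting tool calls.
// predict stands in for the backend; execTool stands in for the MCP client.
func runToolLoop(prompt string, predict func(string) string, execTool func(toolCall) string) string {
	for i := 0; i < 5; i++ { // bounded, so a misbehaving model cannot loop forever
		out := predict(prompt)
		var tc toolCall
		if json.Unmarshal([]byte(out), &tc) != nil || tc.Name == "" {
			return out // plain text: the loop is done
		}
		// Fold the tool result into the prompt and ask again.
		prompt += "\n" + execTool(tc)
	}
	return ""
}

func main() {
	calls := 0
	predict := func(p string) string {
		calls++
		if calls == 1 {
			return `{"name":"get_weather","arguments":{"location":"SF"}}`
		}
		return "Sunny, 72°F."
	}
	execTool := func(tc toolCall) string { return "Weather in SF: sunny" }
	fmt.Println(runToolLoop("What is the weather?", predict, execTool)) // prints: Sunny, 72°F.
}
```

Against a driver like this, the mock backend's first response exercises tool execution and its second response ("Based on the tool results, ...") terminates the loop, so the test sees exactly one completed round trip.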