mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-01 13:42:20 -04:00
+++
title = "🔗 Model Context Protocol (MCP)"
weight = 20
toc = true
description = "Agentic capabilities with Model Context Protocol integration"
tags = ["MCP", "Agents", "Tools", "Advanced"]
categories = ["Features"]
+++

LocalAI now supports the **Model Context Protocol (MCP)**, enabling powerful agentic capabilities by connecting AI models to external tools and services. This feature allows your LocalAI models to interact with various MCP servers, providing access to real-time data, APIs, and specialized tools.

## What is MCP?

The Model Context Protocol is a standard for connecting AI models to external tools and data sources. It enables AI agents to:

- Access real-time information from external APIs
- Execute commands and interact with external systems
- Use specialized tools for specific tasks
- Maintain context across multiple tool interactions

## Key Features

- **Real-time Tool Access**: Connect to external MCP servers for live data
- **Multiple Server Support**: Configure both remote HTTP and local stdio servers
- **Cached Connections**: Efficient tool caching for better performance
- **Secure Authentication**: Support for bearer token authentication
- **Multi-endpoint Support**: Works with OpenAI Chat, Anthropic Messages, and Open Responses APIs
- **Selective Server Activation**: Use `metadata.mcp_servers` to enable only specific servers per request
- **Server-side Tool Execution**: Tools are executed on the server and results fed back to the model automatically
- **Agent Configuration**: Customizable execution limits and retry behavior
- **MCP Prompts**: Discover and expand reusable prompt templates from MCP servers
- **MCP Resources**: Browse and inject resource content (files, data) from MCP servers into conversations

## Configuration

MCP support is configured in your model's YAML configuration file using the `mcp` section:

```yaml
name: my-mcp-model
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  remote: |
    {
      "mcpServers": {
        "weather-api": {
          "url": "https://api.weather.com/v1",
          "token": "your-api-token"
        },
        "search-engine": {
          "url": "https://search.example.com/mcp",
          "token": "your-search-token"
        }
      }
    }

  stdio: |
    {
      "mcpServers": {
        "file-manager": {
          "command": "python",
          "args": ["-m", "mcp_file_manager"],
          "env": {
            "API_KEY": "your-key"
          }
        },
        "database-tools": {
          "command": "node",
          "args": ["database-mcp-server.js"],
          "env": {
            "DB_URL": "postgresql://localhost/mydb"
          }
        }
      }
    }

agent:
  max_iterations: 10  # Maximum MCP tool execution loop iterations
```

### Configuration Options

#### Remote Servers (`remote`)
Configure HTTP-based MCP servers:

- **`url`**: The MCP server endpoint URL
- **`token`**: Bearer token for authentication (optional)

#### STDIO Servers (`stdio`)
Configure local command-based MCP servers:

- **`command`**: The executable command to run
- **`args`**: Array of command-line arguments
- **`env`**: Environment variables (optional)

#### Agent Configuration (`agent`)

- **`max_iterations`**: Maximum number of MCP tool execution loop iterations (default: 10). Each iteration allows the model to call tools and receive results before generating the next response.

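Because the `remote` and `stdio` values are JSON strings embedded in YAML, a typo in them is easy to miss. The following is a minimal sketch of a pre-flight check, not part of LocalAI: the function name `validate_mcp_servers` is hypothetical, and it assumes that `url` (remote) and `command` (stdio) are required because the options above do not mark them as optional.

```python
import json


def validate_mcp_servers(raw_json: str, kind: str) -> list[str]:
    """Return a list of problems found in an `mcp.remote` or `mcp.stdio` JSON string.

    `kind` is "remote" or "stdio". Treating `url` and `command` as required
    is an assumption based on the option list, not on LocalAI's source.
    """
    required = {"remote": "url", "stdio": "command"}[kind]
    problems = []
    servers = json.loads(raw_json).get("mcpServers", {})
    if not servers:
        problems.append("no servers defined under 'mcpServers'")
    for name, entry in servers.items():
        if not entry.get(required):
            problems.append(f"{name}: missing required '{required}'")
        # `args` is documented as an array; reject anything else if present.
        if kind == "stdio" and not isinstance(entry.get("args", []), list):
            problems.append(f"{name}: 'args' must be an array")
    return problems
```

Running it against each block before deploying a model file surfaces malformed JSON (as a `json.JSONDecodeError`) and missing keys early.
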
## Usage

### Selecting MCP Servers via `metadata`

All API endpoints support MCP server selection through the standard `metadata` field. Pass a comma-separated list of server names in `metadata.mcp_servers`:

- **When present**: Only the named MCP servers are activated for this request. Server names must match the keys in the model's MCP config YAML (e.g., `"weather-api"`, `"search-engine"`).
- **When absent**: Behavior depends on the endpoint:
  - **OpenAI Chat Completions** and **Anthropic Messages**: No MCP tools are injected (standard behavior).
  - **Open Responses**: If the model has MCP config and no user-provided tools, all MCP servers are auto-activated (backward compatible).

The `mcp_servers` metadata key is consumed by the MCP engine and stripped before reaching the backend. Clients that support the standard `metadata` field can use this without custom schema extensions.

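Since `metadata.mcp_servers` is a single comma-separated string rather than a JSON array, client code usually needs a small conversion step. A minimal sketch follows; the helper name `with_mcp_servers` is hypothetical, and joining without spaces is an assumption, since whitespace handling is not described here.

```python
def with_mcp_servers(body: dict, servers: list[str]) -> dict:
    """Return a copy of a request body with `metadata.mcp_servers` set."""
    for name in servers:
        # A comma inside a name would be indistinguishable from a separator.
        if "," in name:
            raise ValueError(f"server name may not contain a comma: {name!r}")
    metadata = dict(body.get("metadata", {}))
    metadata["mcp_servers"] = ",".join(servers)
    return {**body, "metadata": metadata}
```

The returned dictionary can be serialized as the JSON body of any of the endpoints described in this section.
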
### API Endpoints

MCP tools work across all three API endpoints:

#### OpenAI Chat Completions (`/v1/chat/completions`)

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"},
    "stream": true
  }'
```

#### Anthropic Messages (`/v1/messages`)

```bash
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"}
  }'
```

#### Open Responses (`/v1/responses`)

```bash
curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "input": "What is the weather in New York?",
    "metadata": {"mcp_servers": "weather-api"}
  }'
```

### Server Listing Endpoint

You can list available MCP servers and their tools for a given model:

```bash
curl http://localhost:8080/v1/mcp/servers/my-mcp-model
```

Returns:

```json
[
  {
    "name": "weather-api",
    "type": "remote",
    "tools": ["get_weather", "get_forecast"]
  },
  {
    "name": "search-engine",
    "type": "remote",
    "tools": ["web_search", "image_search"]
  }
]
```

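The listing can be used to work out which server name to pass in `metadata.mcp_servers` when only a tool name is known. The sketch below assumes the response has exactly the shape shown above; `server_for_tool` is a hypothetical helper, not a LocalAI API.

```python
import json
from typing import Optional


def server_for_tool(listing_json: str, tool: str) -> Optional[str]:
    """Return the name of the first server that exposes `tool`, or None."""
    for server in json.loads(listing_json):
        if tool in server.get("tools", []):
            return server["name"]
    return None
```
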
### MCP Prompts

MCP servers can provide reusable prompt templates. LocalAI supports discovering and expanding prompts from MCP servers.

#### List Prompts

```bash
curl http://localhost:8080/v1/mcp/prompts/my-mcp-model
```

Returns:

```json
[
  {
    "name": "code-review",
    "description": "Review code for best practices",
    "title": "Code Review",
    "arguments": [
      {"name": "language", "description": "Programming language", "required": true}
    ],
    "server": "dev-tools"
  }
]
```

#### Expand a Prompt

```bash
curl -X POST http://localhost:8080/v1/mcp/prompts/my-mcp-model/code-review \
  -H "Content-Type: application/json" \
  -d '{"arguments": {"language": "go"}}'
```

Returns:

```json
{
  "messages": [
    {"role": "user", "content": "Please review the following Go code for best practices..."}
  ]
}
```

#### Inject Prompts via Metadata

You can inject MCP prompts into any chat request using `metadata.mcp_prompt` and `metadata.mcp_prompt_args`:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Review this function: func add(a, b int) int { return a + b }"}],
    "metadata": {
      "mcp_servers": "dev-tools",
      "mcp_prompt": "code-review",
      "mcp_prompt_args": "{\"language\": \"go\"}"
    }
  }'
```

The prompt messages are prepended to the conversation before inference.

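Note that in the request above `mcp_prompt_args` is a JSON document encoded as a string, not a nested object. Escaping it by hand is error-prone, so a client can build it programmatically. A minimal sketch, with `prompt_metadata` as a hypothetical helper name:

```python
import json


def prompt_metadata(server: str, prompt: str, arguments: dict) -> dict:
    """Build the `metadata` object for injecting an MCP prompt.

    `mcp_prompt_args` is serialized to a JSON string, matching the
    escaped form used in the curl example.
    """
    return {
        "mcp_servers": server,
        "mcp_prompt": prompt,
        "mcp_prompt_args": json.dumps(arguments),
    }
```

For example, `prompt_metadata("dev-tools", "code-review", {"language": "go"})` yields a `metadata` value equivalent to the one in the curl request.
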
### MCP Resources

MCP servers can expose data/content (files, database records, etc.) as resources identified by URI.

#### List Resources

```bash
curl http://localhost:8080/v1/mcp/resources/my-mcp-model
```

Returns:

```json
[
  {
    "name": "project-readme",
    "uri": "file:///README.md",
    "description": "Project documentation",
    "mimeType": "text/markdown",
    "server": "file-manager"
  }
]
```

#### Read a Resource

```bash
curl -X POST http://localhost:8080/v1/mcp/resources/my-mcp-model/read \
  -H "Content-Type: application/json" \
  -d '{"uri": "file:///README.md"}'
```

Returns:

```json
{
  "uri": "file:///README.md",
  "content": "# My Project\n...",
  "mimeType": "text/markdown"
}
```

#### Inject Resources via Metadata

You can inject MCP resources into chat requests using `metadata.mcp_resources` (comma-separated URIs):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Summarize this project"}],
    "metadata": {
      "mcp_servers": "file-manager",
      "mcp_resources": "file:///README.md,file:///CHANGELOG.md"
    }
  }'
```

Resource contents are appended to the last user message as text blocks (following the same approach as llama.cpp's WebUI).

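To make the effect of resource injection concrete, here is an illustrative sketch of appending resource contents to the last user message. It is one plausible reading of the behavior described above, not LocalAI's implementation: the function name `append_resources` and the exact block layout (`{"type": "text", ...}` with the URI on the first line) are assumptions.

```python
def append_resources(messages: list[dict], resources: list[dict]) -> list[dict]:
    """Return a copy of `messages` with resource text appended to the last user message.

    Each resource is a dict with `uri` and `content` keys, as returned by
    the read endpoint. The text-block layout is an assumed format.
    """
    out = [dict(m) for m in messages]
    for msg in reversed(out):
        if msg["role"] != "user":
            continue
        content = msg["content"]
        # Normalize plain-string content into a list of text blocks.
        blocks = [{"type": "text", "text": content}] if isinstance(content, str) else list(content)
        for res in resources:
            blocks.append({"type": "text", "text": f"{res['uri']}\n{res['content']}"})
        msg["content"] = blocks
        break
    return out
```
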
### Legacy Endpoint

The `/mcp/v1/chat/completions` endpoint is still supported for backward compatibility. It automatically enables all configured MCP servers, so `metadata.mcp_servers` does not need to be specified.

```bash
curl http://localhost:8080/mcp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [
      {"role": "user", "content": "What is the current weather in New York?"}
    ]
  }'
```

### Example Response

```json
{
  "id": "chatcmpl-123",
  "created": 1699123456,
  "model": "my-mcp-model",
  "choices": [
    {
      "text": "The current weather in New York is 72°F (22°C) with partly cloudy skies."
    }
  ],
  "object": "text_completion"
}
```

## Example Configurations

### Docker-based Tools

```yaml
name: docker-agent
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  stdio: |
    {
      "mcpServers": {
        "searxng": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "quay.io/mudler/tests:duckduckgo-localai"
          ]
        }
      }
    }

agent:
  max_iterations: 10
```

## How It Works

1. **Tool Discovery**: LocalAI connects to configured MCP servers and discovers available tools
2. **Tool Injection**: Discovered tools are injected into the model's tool/function list alongside any user-provided tools
3. **Inference Loop**: The model generates a response. If it calls MCP tools, LocalAI executes them server-side, appends results to the conversation, and re-runs inference
4. **Response Generation**: When the model produces a final response (no more MCP tool calls), it is returned to the client

The execution loop is bounded by `agent.max_iterations` (default 10) to prevent infinite loops.

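The loop described above can be sketched in a few lines. This is a simplified illustration, not LocalAI's code: `generate` stands in for model inference, `tools` for the discovered MCP tools, the message shapes are invented for the example, and raising an error when the bound is hit is an assumption (the actual behavior at the limit is not documented here).

```python
from typing import Callable


def run_agent_loop(
    generate: Callable[[list], dict],
    tools: dict[str, Callable[..., str]],
    messages: list[dict],
    max_iterations: int = 10,
) -> str:
    """Run inference, executing tool calls until a final response or the bound."""
    history = list(messages)
    for _ in range(max_iterations):
        reply = generate(history)
        history.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            # No tool calls: this is the final response.
            return reply["content"]
        for call in calls:
            result = tools[call["name"]](**call["arguments"])
            history.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("max_iterations reached without a final response")
```

A model that keeps requesting tools can never loop more than `max_iterations` times, which is the purpose of the `agent.max_iterations` setting.
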
## Session Lifecycle

MCP sessions are automatically managed by LocalAI:

- **Lazy initialization**: Sessions are created the first time a model's MCP tools are used
- **Cached per model**: Sessions are reused across requests for the same model
- **Cleanup on model unload**: When a model is unloaded (idle watchdog eviction, manual stop, or shutdown), all associated MCP sessions are closed and resources freed
- **Graceful shutdown**: All MCP sessions are closed when LocalAI shuts down

This means you don't need to manually manage MCP connections; they follow the model's lifecycle automatically.

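The lifecycle rules above amount to a per-model cache with explicit cleanup hooks. The following sketch shows the pattern in isolation; `SessionCache` and its `connect` callback are hypothetical names, and LocalAI's internal implementation may differ.

```python
class SessionCache:
    """Lazily created, per-model sessions that are closed on unload or shutdown."""

    def __init__(self, connect):
        self._connect = connect  # callable: model name -> session with .close()
        self._sessions = {}

    def get(self, model: str):
        # Lazy initialization: connect only on first use, then reuse.
        if model not in self._sessions:
            self._sessions[model] = self._connect(model)
        return self._sessions[model]

    def unload(self, model: str) -> None:
        # Cleanup on model unload: close and forget the session, if any.
        session = self._sessions.pop(model, None)
        if session is not None:
            session.close()

    def shutdown(self) -> None:
        # Graceful shutdown: close every remaining session.
        for model in list(self._sessions):
            self.unload(model)
```
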
## Supported MCP Servers

LocalAI is compatible with any MCP-compliant server.

## Best Practices

### Security
- Use environment variables for sensitive tokens
- Validate MCP server endpoints before deployment
- Implement proper authentication for remote servers

### Performance
- Cache frequently used tools
- Use appropriate timeout values for external APIs
- Monitor resource usage for stdio servers

### Error Handling
- Implement fallback mechanisms for tool failures
- Log tool execution for debugging
- Handle network timeouts gracefully

### With External Applications

Use MCP-enabled models in your applications:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="my-mcp-model",
    messages=[
        {"role": "user", "content": "Analyze the latest research papers on AI"}
    ],
    extra_body={"metadata": {"mcp_servers": "search-engine"}}
)
```

### MCP and adding packages

It can be handy to install packages before LocalAI starts, to set up the environment inside the container. The following docker-compose example shows how to do that by installing the Docker CLI in the LocalAI container and pointing it at a Docker-in-Docker service:

```yaml
services:
  local-ai:
    image: localai/localai:latest
    #image: localai/localai:latest-gpu-nvidia-cuda-13
    #image: localai/localai:latest-gpu-nvidia-cuda-12
    container_name: local-ai
    restart: always
    entrypoint: [ "/bin/bash" ]
    command: >
      -c "apt-get update &&
          apt-get install -y docker.io &&
          /entrypoint.sh"
    environment:
      - DEBUG=true
      - LOCALAI_WATCHDOG_IDLE=true
      - LOCALAI_WATCHDOG_BUSY=true
      - LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m
      - LOCALAI_WATCHDOG_BUSY_TIMEOUT=15m
      - LOCALAI_API_KEY=my-beautiful-api-key
      - DOCKER_HOST=tcp://docker:2376
      - DOCKER_TLS_VERIFY=1
      - DOCKER_CERT_PATH=/certs/client
    ports:
      - "8080:8080"
    volumes:
      - /data/models:/models
      - /data/backends:/backends
      - certs:/certs:ro
    # uncomment for nvidia
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
    #           device_ids: ['7']
    # runtime: nvidia

  docker:
    image: docker:dind
    privileged: true
    container_name: docker
    volumes:
      - certs:/certs
    healthcheck:
      test: ["CMD", "docker", "info"]
      interval: 10s
      timeout: 5s
volumes:
  certs:
```

An example model config (to append to any existing model you have) can be:

```yaml
mcp:
  stdio: |
    {
      "mcpServers": {
        "weather": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "ghcr.io/mudler/mcps/weather:master"
          ]
        },
        "memory": {
          "command": "docker",
          "env": {
            "MEMORY_INDEX_PATH": "/data/memory.bleve"
          },
          "args": [
            "run", "-i", "--rm", "-v", "/host/data:/data",
            "ghcr.io/mudler/mcps/memory:master"
          ]
        },
        "ddg": {
          "command": "docker",
          "env": {
            "MAX_RESULTS": "10"
          },
          "args": [
            "run", "-i", "--rm", "-e", "MAX_RESULTS",
            "ghcr.io/mudler/mcps/duckduckgo:master"
          ]
        }
      }
    }
```

### Links

- [Awesome MCPs](https://github.com/punkpeye/awesome-mcp-servers)
- [A list of MCPs by mudler](https://github.com/mudler/MCPs)

## Client-Side MCP (Browser)

In addition to server-side MCP (where the backend connects to MCP servers), LocalAI supports **client-side MCP** where the browser connects directly to MCP servers. This is inspired by llama.cpp's WebUI and works alongside server-side MCP.

### How It Works

1. **Add servers in the UI**: Click the "Client MCP" button in the chat header and add MCP server URLs
2. **Browser connects directly**: The browser uses the MCP TypeScript SDK (`StreamableHTTPClientTransport` or `SSEClientTransport`) to connect to MCP servers
3. **Tool discovery**: Connected servers' tools are sent as `tools` in the chat request body
4. **Browser-side execution**: When the LLM calls a client-side tool, the browser executes it against the MCP server and sends the result back in a follow-up request
5. **Agentic loop**: This continues (up to 10 turns) until the LLM produces a final response

### CORS Proxy

Since browsers enforce CORS restrictions, LocalAI provides a built-in proxy at `/api/cors-proxy`. When "Use CORS proxy" is enabled (default), requests to external MCP servers are routed through:

```
/api/cors-proxy?url=https://remote-mcp-server.example.com/sse
```

The proxy forwards the request method, headers, and body to the target URL and streams the response back with appropriate CORS headers.

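When calling the proxy from a script rather than the browser UI, the target URL has to be placed in the `url` query parameter. The sketch below percent-encodes the target so that targets carrying their own query strings survive intact; that the proxy accepts standard query encoding is an assumption, and `cors_proxy_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def cors_proxy_url(base_url: str, target: str) -> str:
    """Build a LocalAI CORS-proxy URL for an external MCP server."""
    # The proxy is described as forwarding to HTTP/HTTPS URLs only.
    if not target.startswith(("http://", "https://")):
        raise ValueError("target must be an HTTP or HTTPS URL")
    return f"{base_url.rstrip('/')}/api/cors-proxy?{urlencode({'url': target})}"
```
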
### MCP Apps (Interactive Tool UIs)

LocalAI supports the [MCP Apps extension](https://modelcontextprotocol.io/extensions/apps/overview), which allows MCP tools to declare interactive HTML UIs. When a tool has `_meta.ui.resourceUri` in its definition, calling that tool renders the app's HTML inline in the chat as a sandboxed iframe.

**How it works:**

- When the LLM calls a tool with `_meta.ui.resourceUri`, the browser fetches the HTML resource from the MCP server and renders it in an iframe
- The iframe is sandboxed (`allow-scripts allow-forms`, no `allow-same-origin`) for security
- The app can call server tools, send messages, and update context via the `AppBridge` protocol (JSON-RPC over `postMessage`)
- Tools marked as app-only (`_meta.ui.visibility: "app-only"`) are hidden from the LLM and only callable by the app iframe
- On page reload, apps render statically until the MCP connection is re-established

**Requirements:**

- Only works with **client-side MCP** connections (the browser must be connected to the MCP server)
- The MCP server must implement the Apps extension (`_meta.ui.resourceUri` on tools, resource serving)

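The app-only rule above is essentially a filter applied to the tool list before it is sent to the model. A minimal sketch of that filter follows, assuming tool definitions are plain dictionaries with the `_meta.ui.visibility` field exactly as written in this section; the function name `tools_for_llm` is hypothetical and the real UI code is not shown here.

```python
def tools_for_llm(tools: list[dict]) -> list[dict]:
    """Drop tools marked app-only so they are never offered to the LLM."""

    def is_app_only(tool: dict) -> bool:
        ui = (tool.get("_meta") or {}).get("ui") or {}
        return ui.get("visibility") == "app-only"

    return [tool for tool in tools if not is_app_only(tool)]
```
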
### Coexistence with Server-Side MCP

Both modes work simultaneously in the same chat:

- **Server-side MCP tools** are configured in model YAML files and executed by the backend. The backend handles these in its own agentic loop.
- **Client-side MCP tools** are configured per-user in the browser and sent as `tools` in the request. When the LLM calls them, the browser executes them.

If both sides have a tool with the same name, the server-side tool takes priority.

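The name-collision rule can be illustrated with a small merge function. This is a sketch of the stated priority only, with hypothetical names (`merge_tools`, the `"server"`/`"client"` labels); it does not reflect how LocalAI stores tools internally.

```python
def merge_tools(server_tools: list[dict], client_tools: list[dict]) -> dict[str, tuple[str, dict]]:
    """Map each tool name to (side, definition), preferring server-side tools."""
    merged = {tool["name"]: ("server", tool) for tool in server_tools}
    for tool in client_tools:
        # setdefault keeps an existing server-side entry on a name clash.
        merged.setdefault(tool["name"], ("client", tool))
    return merged
```
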
### Security Considerations

- The CORS proxy can forward requests to any HTTP/HTTPS URL. It is only available when MCP is enabled (`LOCALAI_DISABLE_MCP` is not set).
- Client-side MCP server configurations are stored in the browser's localStorage and are not shared with the server.
- Custom headers (e.g., API keys) for MCP servers are stored in localStorage. Use with caution on shared machines.

## Disabling MCP Support

You can completely disable MCP functionality in LocalAI by setting the `LOCALAI_DISABLE_MCP` environment variable to `true`, `1`, or `yes`:

```bash
export LOCALAI_DISABLE_MCP=true
```

When this environment variable is set, all MCP-related features will be disabled, including:
- MCP server connections (both remote and stdio)
- Agent tool execution
- The `/mcp/v1/chat/completions` endpoint

This is useful when you want to:
- Run LocalAI without MCP capabilities for security reasons
- Reduce the attack surface by disabling unnecessary features
- Troubleshoot MCP-related issues

### Example

```bash
# Disable MCP completely
LOCALAI_DISABLE_MCP=true localai run

# Or in Docker
docker run -e LOCALAI_DISABLE_MCP=true localai/localai:latest
```

When MCP is disabled, any model configuration with `mcp` sections will be ignored, and attempts to use the MCP endpoint will return an error indicating that MCP support is disabled.

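Deployment scripts sometimes need to know whether MCP will be active before they register MCP-dependent models. The sketch below mirrors the accepted values listed above (`true`, `1`, `yes`) with an exact match; whether LocalAI also accepts other spellings or casings is not stated here, so none are assumed. `mcp_disabled` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def mcp_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if LOCALAI_DISABLE_MCP is set to a documented disabling value."""
    env = os.environ if env is None else env
    return env.get("LOCALAI_DISABLE_MCP", "") in {"true", "1", "yes"}
```
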