LocalAI/docs/content/features/mcp.md

+++
title = "🔗 Model Context Protocol (MCP)"
weight = 20
toc = true
description = "Agentic capabilities with Model Context Protocol integration"
tags = ["MCP", "Agents", "Tools", "Advanced"]
categories = ["Features"]
+++


LocalAI now supports the **Model Context Protocol (MCP)**, enabling powerful agentic capabilities by connecting AI models to external tools and services. This feature allows your LocalAI models to interact with various MCP servers, providing access to real-time data, APIs, and specialized tools.

## What is MCP?

The Model Context Protocol is a standard for connecting AI models to external tools and data sources. It enables AI agents to:

- Access real-time information from external APIs
- Execute commands and interact with external systems
- Use specialized tools for specific tasks
- Maintain context across multiple tool interactions

## Key Features

- **Real-time Tool Access**: Connect to external MCP servers for live data
- **Multiple Server Support**: Configure both remote HTTP and local stdio servers
- **Cached Connections**: Efficient tool caching for better performance
- **Secure Authentication**: Support for bearer token authentication
- **Multi-endpoint Support**: Works with OpenAI Chat, Anthropic Messages, and Open Responses APIs
- **Selective Server Activation**: Use `metadata.mcp_servers` to enable only specific servers per request
- **Server-side Tool Execution**: Tools are executed on the server and results fed back to the model automatically
- **Agent Configuration**: Customizable execution limits and retry behavior
- **MCP Prompts**: Discover and expand reusable prompt templates from MCP servers
- **MCP Resources**: Browse and inject resource content (files, data) from MCP servers into conversations

## Configuration

MCP support is configured in your model's YAML configuration file using the `mcp` section:

```yaml
name: my-mcp-model
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  remote: |
    {
      "mcpServers": {
        "weather-api": {
          "url": "https://api.weather.com/v1",
          "token": "your-api-token"
        },
        "search-engine": {
          "url": "https://search.example.com/mcp",
          "token": "your-search-token"
        }
      }
    }

  stdio: |
    {
      "mcpServers": {
        "file-manager": {
          "command": "python",
          "args": ["-m", "mcp_file_manager"],
          "env": {
            "API_KEY": "your-key"
          }
        },
        "database-tools": {
          "command": "node",
          "args": ["database-mcp-server.js"],
          "env": {
            "DB_URL": "postgresql://localhost/mydb"
          }
        }
      }
    }

agent:
  max_iterations: 10             # Maximum MCP tool execution loop iterations
```

### Configuration Options

#### Remote Servers (`remote`)
Configure HTTP-based MCP servers:

- **`url`**: The MCP server endpoint URL
- **`token`**: Bearer token for authentication (optional)

#### STDIO Servers (`stdio`)
Configure local command-based MCP servers:

- **`command`**: The executable command to run
- **`args`**: Array of command-line arguments
- **`env`**: Environment variables (optional)

#### Agent Configuration (`agent`)

- **`max_iterations`**: Maximum number of MCP tool execution loop iterations (default: 10). Each iteration allows the model to call tools and receive results before generating the next response.

## Usage

### Selecting MCP Servers via `metadata`

All API endpoints support MCP server selection through the standard `metadata` field. Pass a comma-separated list of server names in `metadata.mcp_servers`:

- **When present**: Only the named MCP servers are activated for this request. Server names must match the keys in the model's MCP config YAML (e.g., `"weather-api"`, `"search-engine"`).
- **When absent**: Behavior depends on the endpoint:
  - **OpenAI Chat Completions** and **Anthropic Messages**: No MCP tools are injected (standard behavior).
  - **Open Responses**: If the model has MCP config and no user-provided tools, all MCP servers are auto-activated (backward compatible).

The `mcp_servers` metadata key is consumed by the MCP engine and stripped before reaching the backend. Clients that support the standard `metadata` field can use this without custom schema extensions.

### API Endpoints

MCP tools work across all three API endpoints:

#### OpenAI Chat Completions (`/v1/chat/completions`)

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"},
    "stream": true
  }'
```

#### Anthropic Messages (`/v1/messages`)

```bash
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"}
  }'
```

#### Open Responses (`/v1/responses`)

```bash
curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "input": "What is the weather in New York?",
    "metadata": {"mcp_servers": "weather-api"}
  }'
```

### Server Listing Endpoint

You can list available MCP servers and their tools for a given model:

```bash
curl http://localhost:8080/v1/mcp/servers/my-mcp-model
```

Returns:

```json
[
  {
    "name": "weather-api",
    "type": "remote",
    "tools": ["get_weather", "get_forecast"]
  },
  {
    "name": "search-engine",
    "type": "remote",
    "tools": ["web_search", "image_search"]
  }
]
```

### MCP Prompts

MCP servers can provide reusable prompt templates. LocalAI supports discovering and expanding prompts from MCP servers.

#### List Prompts

```bash
curl http://localhost:8080/v1/mcp/prompts/my-mcp-model
```

Returns:

```json
[
  {
    "name": "code-review",
    "description": "Review code for best practices",
    "title": "Code Review",
    "arguments": [
      {"name": "language", "description": "Programming language", "required": true}
    ],
    "server": "dev-tools"
  }
]
```

#### Expand a Prompt

```bash
curl -X POST http://localhost:8080/v1/mcp/prompts/my-mcp-model/code-review \
  -H "Content-Type: application/json" \
  -d '{"arguments": {"language": "go"}}'
```

Returns:

```json
{
  "messages": [
    {"role": "user", "content": "Please review the following Go code for best practices..."}
  ]
}
```

#### Inject Prompts via Metadata

You can inject MCP prompts into any chat request using `metadata.mcp_prompt` and `metadata.mcp_prompt_args`:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Review this function: func add(a, b int) int { return a + b }"}],
    "metadata": {
      "mcp_servers": "dev-tools",
      "mcp_prompt": "code-review",
      "mcp_prompt_args": "{\"language\": \"go\"}"
    }
  }'
```

The prompt messages are prepended to the conversation before inference.

### MCP Resources

MCP servers can expose data/content (files, database records, etc.) as resources identified by URI.

#### List Resources

```bash
curl http://localhost:8080/v1/mcp/resources/my-mcp-model
```

Returns:

```json
[
  {
    "name": "project-readme",
    "uri": "file:///README.md",
    "description": "Project documentation",
    "mimeType": "text/markdown",
    "server": "file-manager"
  }
]
```

#### Read a Resource

```bash
curl -X POST http://localhost:8080/v1/mcp/resources/my-mcp-model/read \
  -H "Content-Type: application/json" \
  -d '{"uri": "file:///README.md"}'
```

Returns:

```json
{
  "uri": "file:///README.md",
  "content": "# My Project\n...",
  "mimeType": "text/markdown"
}
```

#### Inject Resources via Metadata

You can inject MCP resources into chat requests using `metadata.mcp_resources` (comma-separated URIs):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Summarize this project"}],
    "metadata": {
      "mcp_servers": "file-manager",
      "mcp_resources": "file:///README.md,file:///CHANGELOG.md"
    }
  }'
```

Resource contents are appended to the last user message as text blocks (following the same approach as llama.cpp's WebUI).

### Legacy Endpoint

The `/mcp/v1/chat/completions` endpoint is still supported for backward compatibility. It automatically enables all configured MCP servers (equivalent to not specifying `mcp_servers`).

```bash
curl http://localhost:8080/mcp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [
      {"role": "user", "content": "What is the current weather in New York?"}
    ]
  }'
```

### Example Response

```json
{
  "id": "chatcmpl-123",
  "created": 1699123456,
  "model": "my-mcp-model",
  "choices": [
    {
      "text": "The current weather in New York is 72°F (22°C) with partly cloudy skies."
    }
  ],
  "object": "text_completion"
}
```

## Example Configurations

### Docker-based Tools

```yaml
name: docker-agent
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  stdio: |
    {
      "mcpServers": {
        "searxng": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "quay.io/mudler/tests:duckduckgo-localai"
          ]
        }
      }
    }

agent:
  max_iterations: 10
```

## How It Works

1. **Tool Discovery**: LocalAI connects to configured MCP servers and discovers available tools
2. **Tool Injection**: Discovered tools are injected into the model's tool/function list alongside any user-provided tools
3. **Inference Loop**: The model generates a response. If it calls MCP tools, LocalAI executes them server-side, appends results to the conversation, and re-runs inference
4. **Response Generation**: When the model produces a final response (no more MCP tool calls), it is returned to the client

The execution loop is bounded by `agent.max_iterations` (default 10) to prevent infinite loops.

## Session Lifecycle

MCP sessions are automatically managed by LocalAI:

- **Lazy initialization**: Sessions are created the first time a model's MCP tools are used
- **Cached per model**: Sessions are reused across requests for the same model
- **Cleanup on model unload**: When a model is unloaded (idle watchdog eviction, manual stop, or shutdown), all associated MCP sessions are closed and resources freed
- **Graceful shutdown**: All MCP sessions are closed when LocalAI shuts down

This means you don't need to manually manage MCP connections — they follow the model's lifecycle automatically.

## Supported MCP Servers

LocalAI is compatible with any MCP-compliant server.

## Best Practices

### Security
- Use environment variables for sensitive tokens
- Validate MCP server endpoints before deployment
- Implement proper authentication for remote servers

### Performance
- Cache frequently used tools
- Use appropriate timeout values for external APIs
- Monitor resource usage for stdio servers

### Error Handling
- Implement fallback mechanisms for tool failures
- Log tool execution for debugging
- Handle network timeouts gracefully

### With External Applications

Use MCP-enabled models in your applications:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="my-mcp-model",
    messages=[
        {"role": "user", "content": "Analyze the latest research papers on AI"}
    ],
    extra_body={"metadata": {"mcp_servers": "search-engine"}}
)
```

### MCP and adding packages

It might be handy to install packages before starting the container to setup the environment. This is an example on how you can do that with docker-compose (installing and configuring docker)

```yaml
services:
  local-ai:
    image: localai/localai:latest
    #image: localai/localai:latest-gpu-nvidia-cuda-13
    #image: localai/localai:latest-gpu-nvidia-cuda-12
    container_name: local-ai
    restart: always
    entrypoint: [ "/bin/bash" ]
    command: >
     -c "apt-get update &&
         apt-get install -y docker.io &&
         /entrypoint.sh"
    environment:
      - DEBUG=true
      - LOCALAI_WATCHDOG_IDLE=true
      - LOCALAI_WATCHDOG_BUSY=true
      - LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m
      - LOCALAI_WATCHDOG_BUSY_TIMEOUT=15m
      - LOCALAI_API_KEY=my-beautiful-api-key
      - DOCKER_HOST=tcp://docker:2376
      - DOCKER_TLS_VERIFY=1
      - DOCKER_CERT_PATH=/certs/client
    ports:
      - "8080:8080"
    volumes:
      - /data/models:/models
      - /data/backends:/backends
      - certs:/certs:ro
    # uncomment for nvidia
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
    #           device_ids: ['7']
    # runtime: nvidia

  docker:
    image: docker:dind
    privileged: true
    container_name: docker
    volumes:
      - certs:/certs
    healthcheck:
      test: ["CMD", "docker", "info"]
      interval: 10s
      timeout: 5s
volumes:
  certs:
```

An example model config (to append to any existing model you have) can be:

```yaml
mcp:
  stdio: |
     {
      "mcpServers": {
        "weather": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "ghcr.io/mudler/mcps/weather:master"
          ]
        },
        "memory": {
          "command": "docker",
          "env": {
            "MEMORY_INDEX_PATH": "/data/memory.bleve"
          },
          "args": [
            "run", "-i", "--rm", "-v", "/host/data:/data",
            "ghcr.io/mudler/mcps/memory:master"
          ]
        },
        "ddg": {
          "command": "docker",
          "env": {
            "MAX_RESULTS": "10"
          },
          "args": [
            "run", "-i", "--rm", "-e", "MAX_RESULTS",
            "ghcr.io/mudler/mcps/duckduckgo:master"
          ]
        }
      }
     }
```

### Links

- [Awesome MCPs](https://github.com/punkpeye/awesome-mcp-servers)
- [A list of MCPs by mudler](https://github.com/mudler/MCPs)

## Client-Side MCP (Browser)

In addition to server-side MCP (where the backend connects to MCP servers), LocalAI supports **client-side MCP** where the browser connects directly to MCP servers. This is inspired by llama.cpp's WebUI and works alongside server-side MCP.

### How It Works

1. **Add servers in the UI**: Click the "Client MCP" button in the chat header and add MCP server URLs
2. **Browser connects directly**: The browser uses the MCP TypeScript SDK (`StreamableHTTPClientTransport` or `SSEClientTransport`) to connect to MCP servers
3. **Tool discovery**: Connected servers' tools are sent as `tools` in the chat request body
4. **Browser-side execution**: When the LLM calls a client-side tool, the browser executes it against the MCP server and sends the result back in a follow-up request
5. **Agentic loop**: This continues (up to 10 turns) until the LLM produces a final response

### CORS Proxy

Since browsers enforce CORS restrictions, LocalAI provides a built-in proxy at `/api/cors-proxy`. When "Use CORS proxy" is enabled (default), requests to external MCP servers are routed through:

```
/api/cors-proxy?url=https://remote-mcp-server.example.com/sse
```

The proxy forwards the request method, headers, and body to the target URL and streams the response back with appropriate CORS headers.

### MCP Apps (Interactive Tool UIs)

LocalAI supports the [MCP Apps extension](https://modelcontextprotocol.io/extensions/apps/overview), which allows MCP tools to declare interactive HTML UIs. When a tool has `_meta.ui.resourceUri` in its definition, calling that tool renders the app's HTML inline in the chat as a sandboxed iframe.

**How it works:**

- When the LLM calls a tool with `_meta.ui.resourceUri`, the browser fetches the HTML resource from the MCP server and renders it in an iframe
- The iframe is sandboxed (`allow-scripts allow-forms`, no `allow-same-origin`) for security
- The app can call server tools, send messages, and update context via the `AppBridge` protocol (JSON-RPC over `postMessage`)
- Tools marked as app-only (`_meta.ui.visibility: "app-only"`) are hidden from the LLM and only callable by the app iframe
- On page reload, apps render statically until the MCP connection is re-established

**Requirements:**

- Only works with **client-side MCP** connections (the browser must be connected to the MCP server)
- The MCP server must implement the Apps extension (`_meta.ui.resourceUri` on tools, resource serving)

### Coexistence with Server-Side MCP

Both modes work simultaneously in the same chat:

- **Server-side MCP tools** are configured in model YAML files and executed by the backend. The backend handles these in its own agentic loop.
- **Client-side MCP tools** are configured per-user in the browser and sent as `tools` in the request. When the LLM calls them, the browser executes them.

If both sides have a tool with the same name, the server-side tool takes priority.

### Security Considerations

- The CORS proxy can forward requests to any HTTP/HTTPS URL. It is only available when MCP is enabled (`LOCALAI_DISABLE_MCP` is not set).
- Client-side MCP server configurations are stored in the browser's localStorage and are not shared with the server.
- Custom headers (e.g., API keys) for MCP servers are stored in localStorage. Use with caution on shared machines.

## Disabling MCP Support

You can completely disable MCP functionality in LocalAI by setting the `LOCALAI_DISABLE_MCP` environment variable to `true`, `1`, or `yes`:

```bash
export LOCALAI_DISABLE_MCP=true
```

When this environment variable is set, all MCP-related features will be disabled, including:
- MCP server connections (both remote and stdio)
- Agent tool execution
- The `/mcp/v1/chat/completions` endpoint

This is useful when you want to:
- Run LocalAI without MCP capabilities for security reasons
- Reduce the attack surface by disabling unnecessary features
- Troubleshoot MCP-related issues

### Example

```bash
# Disable MCP completely
LOCALAI_DISABLE_MCP=true localai run

# Or in Docker
docker run -e LOCALAI_DISABLE_MCP=true localai/localai:latest
```

When MCP is disabled, any model configuration with `mcp` sections will be ignored, and attempts to use the MCP endpoint will return an error indicating that MCP support is disabled.