LocalAI/docs/content/features/mitm-proxy.md at e4c70fca7a2afa197656d28eb0e26954ffcad96c

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-29 11:07:18 -04:00

Richard Palethorpe 6a80e23733 feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

Add a routing middleware stack and a cloud-proxy backend.

* cloud-proxy: a Go gRPC backend that forwards OpenAI- and
  Anthropic-shaped chat requests to upstream providers, with an
  optional translate mode (OpenAI request -> Anthropic /v1/messages
  -> OpenAI response) and full tool-calling support.

* routing: admission control, content-aware model routing
  (embedding cache + classifier + rerank + Arch-Router score),
  PII detection/redaction (regex + NER) with streaming filter and
  OpenAI/Anthropic adapters, and a per-user/per-key billing recorder
  backed by GORM or in-memory storage.

* middleware: UsageMiddleware records usage via the billing recorder,
  plus admission, route-model, usage-stamp and trace middlewares.

* observability: BackendTrace ring buffer stores full request bodies
  (capped), MITM proxy emits structured trace events, and router
  classifier decisions surface at /api/router/decide.

* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).

* UI: cloud-proxy model-editor fields, classifier system-prompt and
  score-normalization config, and a Traces page rendering request
  bodies.

Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]

Signed-off-by: Richard Palethorpe <io@richiejp.com>

2026-05-25 09:28:27 +02:00

+++
title = "MITM proxy for Claude Code / Codex CLI"
weight = 29
toc = true
description = "Redact PII from cloud-AI traffic without LocalAI holding API keys"
tags = ["Proxy", "MITM", "Privacy", "Routing", "Advanced"]
categories = ["Features"]
+++

LocalAI can act as a local HTTPS proxy that redacts PII from traffic sent by Claude Code, OpenAI Codex CLI, or any other HTTPS client, without holding the client's API keys. The proxy intercepts only the LLM API hosts you allowlist (default: api.anthropic.com, api.openai.com); everything else (OAuth, telemetry, package fetches) passes through as a plain TCP tunnel.

Use this when:

- You want to use Claude Code with a Claude Pro/Max subscription but still apply the same PII redaction LocalAI applies to API-key traffic.
- You run Codex CLI on a corporate laptop and need an audit trail of prompts.
- You want LocalAI to enforce egress policies for AI traffic without becoming the API endpoint clients talk to.

The proxy is off by default. Operators opt in by setting --mitm-listen and distributing the generated CA cert.

## How it works

1. The proxy generates a private CA on first start (persisted to disk).
2. Clients set HTTPS_PROXY=http://localai:port and add the CA to their trust store (e.g. NODE_EXTRA_CA_CERTS for Node-based CLIs like Claude Code and Codex).
3. The CLI sends CONNECT api.anthropic.com:443 to the proxy.
4. For allowlisted hosts, the proxy mints a per-host leaf cert signed by the CA, terminates TLS, parses the HTTP request, applies the global PII redactor on /v1/messages or /v1/chat/completions, and forwards to the real upstream over its own TLS connection.
5. The streaming SSE response runs through the same pii.StreamFilter the cloud-proxy backend uses.
6. For non-allowlisted hosts, the proxy is a plain CONNECT tunnel: no TLS termination, no inspection, no CA trust required.
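The intercept-or-tunnel decision in the steps above can be sketched as follows. This is a minimal illustration, not LocalAI's implementation: the names `Decision`, `parse_connect` and `should_redact` are hypothetical, and ignoring the query string when matching paths is an assumption.

```python
from typing import NamedTuple

# Defaults taken from the text; every identifier below is hypothetical.
DEFAULT_INTERCEPT_HOSTS = frozenset({"api.anthropic.com", "api.openai.com"})
REDACTED_PATHS = frozenset({"/v1/messages", "/v1/chat/completions"})


class Decision(NamedTuple):
    host: str
    port: int
    intercept: bool  # True: terminate TLS and inspect; False: plain TCP tunnel


def parse_connect(request_line: str, intercept_hosts=DEFAULT_INTERCEPT_HOSTS) -> Decision:
    """Parse a 'CONNECT host:port HTTP/1.1' line and decide how to handle it."""
    method, target, _version = request_line.strip().split(" ", 2)
    if method.upper() != "CONNECT":
        raise ValueError(f"expected CONNECT, got {method!r}")
    host, _, port = target.rpartition(":")
    host = host.lower()  # hostnames are matched case-insensitively
    return Decision(host, int(port), host in intercept_hosts)


def should_redact(decision: Decision, path: str) -> bool:
    """Only intercepted hosts are inspected, and only on the two chat endpoints."""
    # Assumption: the query string is ignored when matching the path.
    return decision.intercept and path.split("?", 1)[0] in REDACTED_PATHS
```

With the default allowlist, `parse_connect("CONNECT api.anthropic.com:443 HTTP/1.1")` yields an intercepted connection, while a CONNECT to any other host yields a tunnel that is never inspected.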

The CLI authenticates with its own subscription / API key as it normally would. LocalAI never stores the credential. On intercepted hosts the credential still transits the proxy in the request headers, but LocalAI only observes and rewrites the request body.

## Quick start

Start LocalAI with the MITM listener:

local-ai run --mitm-listen :8443

The first start generates a CA at <data-path>/mitm-ca/{ca.crt,ca.key}. Restarting reloads the same CA so clients keep trusting it.

Download the public CA cert:

curl -O http://localhost:8080/api/middleware/proxy-ca.crt

Configure Claude Code to use the proxy and trust the cert:

export HTTPS_PROXY=http://localhost:8443
export NODE_EXTRA_CA_CERTS="$(pwd)/proxy-ca.crt"
claude

Now any claude chat session that touches api.anthropic.com/v1/messages gets its prompts and tool inputs scanned by LocalAI's PII filter, and any PII the model emits in its streaming response is masked before reaching your terminal. Events appear in the LocalAI middleware admin page under Filtering → Recent events.
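Masking a streaming response is harder than masking a complete body, because a single email address or number can be split across two chunks. The sketch below shows one simple way to handle that: hold back the trailing partial token until whitespace arrives. It is an illustration only. The class `EmailStreamMasker`, the `[EMAIL]` mask text and the email regex are assumptions, and the real `pii.StreamFilter` may work quite differently.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class EmailStreamMasker:
    """Mask emails in a chunked text stream, even when a chunk boundary splits one."""

    def __init__(self, mask: str = "[EMAIL]"):
        self._mask = mask
        self._pending = ""  # trailing text that may be an incomplete token

    def feed(self, chunk: str) -> str:
        """Accept a chunk and return the text that is now safe to emit."""
        self._pending += chunk
        # Emails contain no whitespace, so everything up to the last
        # whitespace character is made of complete tokens.
        cut = max(self._pending.rfind(c) for c in " \n\t") + 1
        ready, self._pending = self._pending[:cut], self._pending[cut:]
        return EMAIL.sub(self._mask, ready)

    def flush(self) -> str:
        """Emit whatever is still buffered once the stream ends."""
        ready, self._pending = self._pending, ""
        return EMAIL.sub(self._mask, ready)
```

The trade-off is latency: output is delayed by at most one token, and a production filter would also cap the buffer so that a long run without whitespace cannot grow it indefinitely.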

The same works for Codex CLI — set HTTPS_PROXY and NODE_EXTRA_CA_CERTS and run codex.
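If you would rather not export the variables for a whole shell session, a small launcher can scope them to a single process. The helper `proxied_env` below is hypothetical; it only builds the environment described above and leaves starting the CLI to you.

```python
import os


def proxied_env(proxy_url: str, ca_cert_path: str, base_env=None) -> dict:
    """Return a copy of the environment with the proxy and CA settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["HTTPS_PROXY"] = proxy_url
    # Node reads extra CA certificates from this file at startup.
    env["NODE_EXTRA_CA_CERTS"] = os.path.abspath(ca_cert_path)
    return env


# Example (not executed here):
#   import subprocess
#   subprocess.run(["codex"], env=proxied_env("http://localhost:8443", "proxy-ca.crt"))
```

Only the launched CLI sees the proxy settings, so other tools in the same terminal keep their normal network path.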

## Configuration

| Flag / env | Default | Purpose |
|---|---|---|
| `--mitm-listen` / `LOCALAI_MITM_LISTEN` | empty (disabled) | Address to bind the proxy listener on |
| `--mitm-ca-dir` / `LOCALAI_MITM_CA_DIR` | `<data-path>/mitm-ca` | Where to persist the CA cert + key |
| `--mitm-intercept-hosts` / `LOCALAI_MITM_INTERCEPT_HOSTS` | `api.anthropic.com,api.openai.com` | Hosts to terminate TLS for; everything else tunnels |

Hostnames are case-insensitive. Add custom upstreams (e.g. an OpenAI-compatible third-party provider) by extending the allowlist and ensuring their endpoint paths match /v1/chat/completions or /v1/messages.
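Setting the flag presumably replaces the default list rather than appending to it, so the defaults need to be repeated when adding a host. A small helper can build the value. This is a sketch: `extend_intercept_hosts` is a hypothetical name, `llm.example.com` is a placeholder host, and whether LocalAI itself tolerates whitespace or duplicates in the list is not stated in this document.

```python
DEFAULT_INTERCEPT_HOSTS = "api.anthropic.com,api.openai.com"  # default from the table


def extend_intercept_hosts(current: str, *extra: str) -> str:
    """Return a comma-separated allowlist with extra hosts appended once each."""
    seen = []
    for host in [*current.split(","), *extra]:
        host = host.strip().lower()  # hostnames are case-insensitive
        if host and host not in seen:
            seen.append(host)
    return ",".join(seen)


# Value to pass to --mitm-intercept-hosts or LOCALAI_MITM_INTERCEPT_HOSTS.
value = extend_intercept_hosts(DEFAULT_INTERCEPT_HOSTS, "LLM.Example.com")
```

Here `value` is `api.anthropic.com,api.openai.com,llm.example.com`.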

## What gets redacted

The proxy applies the same patterns as the regular request middleware:

- Email addresses → masked
- Phone numbers → masked
- US Social Security Numbers → masked
- Credit card numbers (Luhn-verified) → masked
- IPv4 addresses → masked
- API key prefixes (sk-, pk-, ghp_, github_pat_, xoxb-) → blocked
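"Luhn-verified" means a digit run is treated as a card number only if it passes the Luhn checksum, which filters out most order IDs and other long numbers. The sketch below shows the standard checksum with a candidate regex. The regex, the `[CREDIT_CARD]` mask text and the function names are assumptions for illustration, not LocalAI's actual patterns.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens (assumed shape).
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def mask_cards(text: str, mask: str = "[CREDIT_CARD]") -> str:
    """Mask only those candidates that pass the Luhn check."""
    return CARD_CANDIDATE.sub(
        lambda m: mask if luhn_valid(m.group()) else m.group(), text
    )
```

For example, the well-known test number `4111 1111 1111 1111` is masked, while `1234 5678 9012 3456` fails the checksum and is left untouched.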

A block action returns HTTP 400 with error.type=pii_blocked to the client. The CLI sees the rejection and shows it as a request error.
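A wrapper script or test harness can tell a PII block apart from other 400 responses by checking the error type. The sketch below assumes the body is JSON of the form `{"error": {"type": "pii_blocked", ...}}`; only the status code and the `error.type` value come from the text above, and the function name is hypothetical.

```python
import json


def is_pii_block(status: int, body: bytes) -> bool:
    """Return True if an HTTP response looks like a PII block from the proxy."""
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON, or not valid UTF-8
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("type") == "pii_blocked"
```

A caller could use this to prompt the user to remove the offending key instead of retrying the request unchanged.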

Events are persisted via the same pii.EventStore the rest of LocalAI uses, so the /api/pii/events endpoint and the middleware admin page include MITM events alongside direct-API events.

## Security notes

- The CA private key is the master credential. Anyone with read access to <data-path>/mitm-ca/ca.key can forge TLS for any host the proxy could intercept. The file is mode 0600; keep it that way.
- The proxy listener accepts plaintext HTTP CONNECT requests, so bind it to localhost (--mitm-listen 127.0.0.1:8443) unless you've added auth in front of the listener. There is no built-in API-key check on this port.
- The MITM CA is separate from any TLS cert LocalAI's main HTTP API uses. Installing the MITM CA grants trust only for traffic that flows through this proxy.
- The proxy does not pin upstream certificates; it trusts the system certificate store. If your machine's trust store is compromised, the proxy is too.
- TLS termination negotiates HTTP/2 by default (ALPN h2) and falls back to HTTP/1.1 for clients that don't speak h2. Modern CLIs (Claude Code, Codex) and the Anthropic / OpenAI APIs all use h2.
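The first two notes lend themselves to an automated pre-flight check. The sketch below is a hypothetical helper, not part of LocalAI: it verifies that a key file grants no permissions to group or others, and that a listen address resolves to loopback only. It assumes a POSIX file system.

```python
import ipaddress
import os
import stat


def key_mode_is_private(path: str) -> bool:
    """True if the file grants no permissions to group or others (e.g. 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def binds_loopback_only(listen_addr: str) -> bool:
    """True for addresses like 127.0.0.1:8443; False for :8443 or 0.0.0.0:8443."""
    host, _, _port = listen_addr.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::1]:8443
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Empty host (all interfaces) or a name we cannot classify.
        return False
```

Note that an address with an empty host, such as `:8443`, fails the check, since it typically binds every interface.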

## Limitations

- Only /v1/messages and /v1/chat/completions get redacted. Other paths on the same host (OAuth, model listing) are forwarded verbatim.
- No request-shape translation. The proxy assumes the request body matches the host's wire format; cross-shape forwarding is the cloud-proxy backend's job, not the MITM's.
- No CA rotation in the MVP. To rotate, delete ca.key and ca.crt, restart LocalAI so that it generates a new CA, and redistribute the new cert to every client.
- Cert pinning kills MITM. Neither Claude Code nor Codex CLI pins certificates today, but a future SDK update could. If a CLI starts refusing the proxied handshake, that's the signal.
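After a manual rotation it helps to confirm that each client holds the new certificate rather than the old one. One way is to compare SHA-256 fingerprints of the PEM files. The helper below is a hypothetical sketch using only the Python standard library; it prints the fingerprint in the colon-separated hex form that `openssl x509 -fingerprint -sha256` also uses.

```python
import hashlib
import ssl


def cert_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate as AA:BB:... hex."""
    # The fingerprint is computed over the DER encoding, not the PEM text.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Example (not executed here):
#   with open("proxy-ca.crt") as f:
#       print(cert_fingerprint(f.read()))
```

If the fingerprint on a client differs from the one computed on the LocalAI host, that client is still trusting the previous CA and will fail the proxied handshake.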

## Comparison with the cloud-proxy backend

LocalAI ships two cloud-related proxy modes; pick by who holds the credential:

| | Cloud-proxy backend (`backend: proxy-*`) | MITM proxy (`--mitm-listen`) |
|---|---|---|
| Client config | `localai:8080` as API endpoint | `localai:8443` as HTTPS_PROXY |
| Holds API key | LocalAI | Client (CLI's own auth) |
| Works with subscription auth | No | Yes (CLI uses its own login) |
| Request rewriting | Yes (handler controls it) | Yes (selective per host+path) |
| CA cert distribution | Not needed | Required on every client |
| Routes through LocalAI's auth/usage tracking | Yes | Yes (per-correlation-id events) |

For shared deployments where LocalAI owns the API key and clients are unsophisticated (curl, simple webapps), use the cloud-proxy backend. For "give my Claude Code a privacy filter" use cases, use the MITM proxy.
