LocalAI/core/services/routing/billing/backend.go
Richard Palethorpe 6a80e23733 feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)
Add a routing middleware stack and a cloud-proxy backend.

* cloud-proxy: a Go gRPC backend that forwards OpenAI- and
  Anthropic-shaped chat requests to upstream providers, with an
  optional translate mode (OpenAI request -> Anthropic /v1/messages
  -> OpenAI response) and full tool-calling support.

* routing: admission control, content-aware model routing
  (embedding cache + classifier + rerank + Arch-Router score),
  PII detection/redaction (regex + NER) with streaming filter and
  OpenAI/Anthropic adapters, and a per-user/per-key billing recorder
  backed by GORM or in-memory storage.

* middleware: UsageMiddleware records usage via the billing recorder,
  plus admission, route-model, usage-stamp and trace middlewares.

* observability: BackendTrace ring buffer stores full request bodies
  (capped), MITM proxy emits structured trace events, and router
  classifier decisions surface at /api/router/decide.

* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).

* UI: cloud-proxy model-editor fields, classifier system-prompt and
  score-normalization config, and a Traces page rendering request
  bodies.

Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-25 09:28:27 +02:00

// Package billing provides the StatsBackend abstraction that decouples
// per-request token tracking from the auth database. This lets a
// single-user no-auth deployment still see usage and costs, which the
// pre-routing-module middleware did not allow.
package billing

import (
	"context"

	"github.com/mudler/LocalAI/core/http/auth"
)

// StatsBackend is the persistence target for usage records. Three
// implementations exist:
//
//   - GORM (auth-DB-backed): used when --auth is on; records share the
//     auth database, and existing aggregation queries continue to work.
//   - Memory (ring buffer): used when --auth is off and no other DB is
//     configured. Records are lost on restart by design, but the same
//     process can still answer aggregation queries for live dashboards.
//   - Disabled: an explicit no-op when --disable-stats is set, useful in
//     ephemeral CI runs.
//
// All implementations are safe for concurrent use. Record must not
// block the caller for longer than it takes to enqueue the record;
// durable flushing happens on a background goroutine inside the
// implementation.
type StatsBackend interface {
	// Record enqueues a single usage record. The record is persisted
	// asynchronously, so callers should not assume durability on return.
	// The ctx is currently unused but reserved for future cancellation.
	Record(ctx context.Context, r *auth.UsageRecord) error

	// Aggregate returns time-bucketed totals for the dashboard. The
	// query's UserID scopes the totals to one user; pass the empty
	// string (cluster-wide) only from admin-scoped paths.
	// Implementations that cannot aggregate (e.g., a saturated ring
	// buffer) may return an empty result with no error.
	Aggregate(ctx context.Context, q AggregateQuery) ([]auth.UsageBucket, error)

	// Close releases resources (flushes pending records, stops
	// goroutines). It is safe to call multiple times.
	Close() error
}

// AggregateQuery describes a usage aggregation request. Period is one of
// "day", "week", "month", or "all" (matching the existing auth.UsageRecord
// vocabulary). An empty UserID means cluster-wide; callers must enforce
// admin permission before passing the empty string.
type AggregateQuery struct {
	UserID string
	Period string
}