LocalAI/docs/content/overview.md at 415b56194752d2d80576f050d462beaaea993d7f

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-03 13:56:46 -04:00

Files

LocalAI [bot] 7e59a5c7c5 docs: architecture & feature diagrams (blueprint style) (#10137 )

* docs: add 'how LocalAI works' architecture diagram

Add a blueprint-style architecture diagram: clients -> small core (API,
router, WebUI, agents) -> gRPC -> backend processes pulled on demand as
OCI images. Place it on the overview page and replace the stale external
architecture image on the reference page.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: add blueprint diagrams across feature, distributed & getting-started docs

Add 24 architecture/flow/comparison diagrams (PNG + HTML source) under
docs/static/images/diagrams/, wired into their docs pages, from an
impact-vs-effort audit of the docs. Broaden the API surface on the
overview architecture diagram (OpenAI, Anthropic, ElevenLabs, Ollama,
and LocalAI's own API) and move the gRPC boundary label clear of the arrows.

Pages: distributed mode (architecture, scheduling, ds4 layer-split),
distributed inferencing, MLX, realtime, quantization, MCP, agents,
mitm & cloud proxy, middleware, reverse-proxy TLS, VRAM, voice & face
recognition, reranker, function calling, fine-tuning (recipe + jobs),
diarization, audio transform, quickstart, model resolution.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: add composable-core diagram to README hero

Commit the composable-core card (small core + on-demand backend tiles)
alongside the other diagrams and reference it from the README hero via a
repo-relative path, so it renders on GitHub.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: fix composable-core connectors/badge and federated-vs-worker layout

- composable-core: thicken the plug-in connectors so they read clearly, and
  widen the SEPARATE IMAGE badge so its text no longer overflows the box.
- federated-vs-worker: shorten the WHOLE/SPLIT REQUEST pills to fit, and
  replace the tangled node-to-node activation arrows with a clean fan-out
  (request split across all sharded nodes), mirroring the federated panel.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-02 18:43:22 +02:00

5.7 KiB

Raw Blame History

+++ title = "Overview" weight = 1 toc = true description = "What is LocalAI?" tags = ["Beginners"] categories = [""] url = "/docs/overview" author = "Ettore Di Giacinto" icon = "info" +++

LocalAI is a composable AI stack for running models locally: a small core that speaks the OpenAI and Anthropic APIs, with each model backend added only when you need it. It's simple, efficient, and private by default, and a drop-in replacement that keeps your data on your own hardware.

Why LocalAI?

In today's AI landscape, privacy, control, and flexibility are paramount. LocalAI addresses these needs by:

Privacy First: Your data never leaves your machine
Complete Control: Run models on your terms, with your hardware
Open Source: MIT licensed and community-driven
Flexible Deployment: From laptops to servers, with or without GPUs
Composable by design: A small core, not a bundle. Backends are separate and installed on demand, so you only run what you use

What's Included

The LocalAI core is a single small binary (or container). It gives you everything you need to serve models, and pulls each model backend on demand, so you install only what you use:

OpenAI-compatible API — Drop-in replacement for OpenAI, Anthropic, and Open Responses APIs
Built-in Web Interface — Chat, model management, agent creation, image generation, and system monitoring
AI Agents — Create autonomous agents with MCP (Model Context Protocol) tool support, directly from the UI
Any Model, Any Modality: LLMs, image and video, text-to-speech, speech-to-text, vision, and embeddings, each on its own backend, pulled automatically when you load a model
GPU Acceleration — Automatic detection and support for NVIDIA, AMD, Intel, and Vulkan GPUs
Distributed Mode — Scale horizontally with worker nodes, P2P federation, and model sharding
No GPU Required — Runs on CPU with consumer-grade hardware

LocalAI integrates LocalAGI (agent platform) and LocalRecall (semantic memory) as built-in libraries — no separate installation needed.

Each backend is a dedicated gRPC service that LocalAI builds around a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX, and more), exposing it through the unified API. Backends ship as standard OCI images and run as isolated processes, so each one can be installed, upgraded, or removed without touching the core, can even run on a separate machine, and a fault in one never brings down the rest.

Because the backend contract is a simple gRPC interface, the system is open: bring your own model, or write a custom backend in any language and plug it in, exactly how the built-in backends work. This is what keeps the core small and gives you the flexibility to run precisely the stack you want, instead of compiling every engine into one binary.

Getting Started

LocalAI can be installed in several ways. Docker is the recommended installation method for most users as it provides the easiest setup and works across all platforms.

Recommended: Docker Installation

The quickest way to get started with LocalAI is using Docker:

docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-cpu

Then open http://localhost:8080 to access the web interface, install models, and start chatting.

For GPU support, see the [Container images reference]({{% relref "getting-started/container-images" %}}) or the [Quickstart guide]({{% relref "getting-started/quickstart" %}}).

For complete installation instructions including Docker, macOS, Linux, Kubernetes, and building from source, see the Installation guide.

Key Features

Text Generation: Run various LLMs locally (llama.cpp, transformers, vLLM, and more)
Image Generation: Create images with Stable Diffusion, Flux, and other models
Audio Processing: Text-to-speech and speech-to-text
Vision API: Image understanding and analysis
Embeddings: Vector representations for search and retrieval
Function Calling: OpenAI-compatible tool use
AI Agents: Autonomous agents with MCP tool support
MCP Apps: Interactive tool UIs in the web interface
P2P & Distributed: Federated inference and model sharding across machines

Community and Support

LocalAI is a community-driven project. You can:

Join our Discord community
Check out our GitHub repository
Contribute to the project
Share your use cases and examples

Next Steps

Ready to dive in? Here are some recommended next steps:

Install LocalAI - Start with Docker installation (recommended) or choose another method
[Quickstart guide]({{% relref "getting-started/quickstart" %}}) - Get up and running in minutes
Explore available models
Model compatibility
[Try out examples]({{% relref "getting-started/try-it-out" %}})
Join the community

Team

LocalAI is created by Ettore Di Giacinto and maintained by the LocalAI team:

Ettore Di Giacinto — original author and project lead
Richard Palethorpe — maintainer

LocalAI is helped by the wider community of contributors. See the full contributors list.

License

LocalAI is MIT licensed.

5.7 KiB Raw Blame History