docs: position LocalAI as a composable engine, not a bundle (#10136)

Reframe the README hero and docs (homepage, overview, FAQ) around the composable architecture: a small core, with backends built as dedicated gRPC services around best-in-class engines, shipped as separate OCI images and pulled on demand.

Lead from strength: drop the "36+ backends" kitchen-sink framing and the "All-in-One Complete AI Stack" / "single binary that gives you everything" lines that read as a monolith.

- README: small-core differentiator; composable + open/extensible bullets
- _index.md: composable tagline; install only what you use
- overview.md: core vs on-demand backends; gRPC/OCI mechanics as benefits; bring-your-own model and backend
- faq.md: "Do I need to install all the backends?" and "Can I bring my own model or backend?"

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-22 06:04:40 -04:00 · 2026-06-02 17:34:43 +02:00
parent 595e448714
commit aea954a482
4 changed files with 36 additions and 12 deletions
--- a/README.md
+++ b/README.md
@@ -31,12 +31,16 @@

 **LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

- **Drop-in API compatibility** — OpenAI, Anthropic, ElevenLabs APIs
- **36+ backends** — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
- **Any hardware** — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
- **Multi-user ready** — API key auth, user quotas, role-based access
- **Built-in AI agents** — autonomous agents with tool use, RAG, MCP, and skills
- **Privacy-first** — your data never leaves your infrastructure
+**A small core, not a bundle.** Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.
+
+- **Composable by design**: backends are separate and pulled on demand, so you install only what your model needs
+- **Open and extensible**: load any model, or build your own backend in any language against an open interface
+- **Drop-in API compatibility**: OpenAI, Anthropic, and ElevenLabs APIs across every backend
+- **Any model, any modality**: LLMs, vision, voice, image, and video behind one API
+- **Any hardware**: NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
+- **Multi-user ready**: API key auth, user quotas, role-based access
+- **Built-in AI agents**: autonomous agents with tool use, RAG, MCP, and skills
+- **Privacy-first**: your data never leaves your infrastructure

 Created by [Ettore Di Giacinto](https://github.com/mudler) and maintained by the [LocalAI team](#team).

--- a/docs/content/_index.md
+++ b/docs/content/_index.md
@@ -1,10 +1,10 @@
 +++
 title = "LocalAI"
-description = "The free, OpenAI, Anthropic alternative. Your All-in-One Complete AI Stack"
+description = "The free, OpenAI and Anthropic alternative. A small, composable AI stack: run any model locally and install only what you use."
 type = "home"
 +++

-**The free, OpenAI, Anthropic alternative. Your All-in-One Complete AI Stack** - Run powerful language models, autonomous agents, and document intelligence **locally** on your hardware. 
+**The free, OpenAI and Anthropic alternative. A small, composable AI stack.** - Run powerful language models, autonomous agents, and document intelligence **locally** on your hardware. A lean core that pulls model backends on demand, so you install only what you use. 

 **No cloud, no limits, no compromise.**

--- a/docs/content/faq.md
+++ b/docs/content/faq.md
@@ -12,6 +12,22 @@ url = "/faq/"
 Here are answers to some of the most common questions.


+### Do I need to install all the backends?
+
+No. You install only the backends your models use. LocalAI's core is a single binary (or container) that provides the OpenAI-compatible API, request routing, the web UI, and agents. Each inference backend (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX, and others) is a separate artifact, installed only when a model needs it.
+
+In practice:
+
+- **You install one backend, not all of them.** Run a model with `local-ai run <model>` and the matching backend is pulled automatically; nothing else is downloaded.
+- **Each backend is purpose-built for its engine.** LocalAI builds a dedicated gRPC backend around each engine, so every one stays independently optimized without a single binary trying to support every model architecture at once.
+- **You manage backends individually** with `local-ai backends list/install/uninstall` or from the web UI.
+
+The catalog's breadth is optionality: you only ever run what your models use.
+
+### Can I bring my own model or backend?
+
+Yes. You can load any compatible model, not just the ones in the gallery. And because every backend talks to the core over a simple gRPC interface, you can write your own backend in any language and plug it in, exactly how the built-in backends work. Nothing about the core is closed off, which gives you the flexibility to run precisely the stack you want.
+
 ### How do I get models? 

 Most gguf-based models should work, but newer models may require additions to the API. If a model doesn't work, please feel free to open up issues. However, be cautious about downloading models from the internet and directly onto your machine, as there may be security vulnerabilities in llama.cpp or ggml that could be maliciously exploited. Some models can be found on Hugging Face: https://huggingface.co/models?search=gguf, or models from gpt4all are compatible too: https://github.com/nomic-ai/gpt4all.
--- a/docs/content/overview.md
+++ b/docs/content/overview.md
@@ -11,7 +11,7 @@ icon = "info"
 +++


-LocalAI is your complete AI stack for running AI models locally. It's designed to be simple, efficient, and accessible, providing a drop-in replacement for OpenAI's API while keeping your data private and secure.
+LocalAI is a composable AI stack for running models locally: a small core that speaks the OpenAI and Anthropic APIs, with each model backend added only when you need it. It's simple, efficient, and private by default, and a drop-in replacement that keeps your data on your own hardware.

 ## Why LocalAI?

@@ -21,22 +21,26 @@ In today's AI landscape, privacy, control, and flexibility are paramount. LocalA
 - **Complete Control**: Run models on your terms, with your hardware
 - **Open Source**: MIT licensed and community-driven
 - **Flexible Deployment**: From laptops to servers, with or without GPUs
- **Extensible**: Add new models and features as needed
+- **Composable by design**: A small core, not a bundle. Backends are separate and installed on demand, so you only run what you use

 ## What's Included

-LocalAI is a single binary (or container) that gives you everything you need:
+The LocalAI core is a single small binary (or container). It gives you everything you need to serve models, and pulls each model backend on demand, so you install only what you use:

 - **OpenAI-compatible API** — Drop-in replacement for OpenAI, Anthropic, and Open Responses APIs
 - **Built-in Web Interface** — Chat, model management, agent creation, image generation, and system monitoring
 - **AI Agents** — Create autonomous agents with MCP (Model Context Protocol) tool support, directly from the UI
- **Multiple Model Support** — LLMs, image generation, text-to-speech, speech-to-text, vision, embeddings, and more
+- **Any Model, Any Modality**: LLMs, image and video, text-to-speech, speech-to-text, vision, and embeddings, each on its own backend, pulled automatically when you load a model
 - **GPU Acceleration** — Automatic detection and support for NVIDIA, AMD, Intel, and Vulkan GPUs
 - **Distributed Mode** — Scale horizontally with worker nodes, P2P federation, and model sharding
 - **No GPU Required** — Runs on CPU with consumer-grade hardware

 LocalAI integrates [LocalAGI](https://github.com/mudler/LocalAGI) (agent platform) and [LocalRecall](https://github.com/mudler/LocalRecall) (semantic memory) as built-in libraries — no separate installation needed.

+Each backend is a dedicated gRPC service that LocalAI builds around a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX, and more), exposing it through the unified API. Backends ship as standard OCI images and run as isolated processes, so each one can be installed, upgraded, or removed without touching the core, can even run on a separate machine, and a fault in one never brings down the rest.
+
+Because the backend contract is a simple gRPC interface, the system is open: bring your own model, or write a custom backend in any language and plug it in, exactly how the built-in backends work. This is what keeps the core small and gives you the flexibility to run precisely the stack you want, instead of compiling every engine into one binary.
+
 ## Getting Started

 LocalAI can be installed in several ways. **Docker is the recommended installation method** for most users as it provides the easiest setup and works across all platforms.
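
The on-demand behaviour described in the overview.md and faq.md changes (a small core that pulls a backend only when a model needs it, and never pulls the rest of the catalog) can be sketched roughly as follows. This is a minimal illustration in Python, not LocalAI's implementation: the `Core` class, the `CATALOG` mapping, the backend keys, and the `registry.example` image references are all hypothetical names invented for this sketch, and the OCI pull is simulated by recording the image reference.

```python
from dataclasses import dataclass, field

# Hypothetical catalog mapping a backend name to an OCI image reference.
# These names and URLs are placeholders, not real LocalAI artifacts.
CATALOG = {
    "llama-cpp": "registry.example/backends/llama-cpp:latest",
    "whisper": "registry.example/backends/whisper:latest",
    "diffusers": "registry.example/backends/diffusers:latest",
}


@dataclass
class Core:
    """Toy stand-in for a small core that installs backends lazily."""

    installed: dict = field(default_factory=dict)  # backend name -> image ref
    pulls: list = field(default_factory=list)      # record of simulated pulls

    def ensure_backend(self, name: str) -> str:
        """Return the backend's image, 'pulling' it only on first use."""
        if name not in self.installed:
            image = CATALOG[name]        # KeyError if the backend is unknown
            self.pulls.append(image)     # stand-in for an actual OCI image pull
            self.installed[name] = image
        return self.installed[name]

    def load_model(self, model: str, backend: str) -> str:
        """Load a model, installing its backend on demand."""
        image = self.ensure_backend(backend)
        return f"{model} served by {image}"


core = Core()
core.load_model("example-model", "llama-cpp")
core.load_model("another-model", "llama-cpp")  # backend already present, no second pull
```

After the two `load_model` calls, only the `llama-cpp` entry is installed and it was pulled once; `whisper` and `diffusers` stay in the catalog untouched, which is the "install only what you use" property the docs describe.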