chore: drop AIO images (#9004)

AIO images have fallen behind, and they take effort to maintain. The
wizard and model installation have been simplified massively, so AIO
images have lost their purpose.

This allows us to be more laser-focused on the main images and relieves
stress on CI.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Ettore Di Giacinto
2026-03-14 17:49:36 +01:00
committed by GitHub
parent 0ac4ac5bdd
commit 5affb747a9
44 changed files with 68 additions and 988 deletions
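Existing `docker run` commands and compose files that reference AIO tags need a new image reference. Below is a minimal sketch assuming the renames in the hunks of this commit generalize: GPU tags drop the `-aio` segment (`master-aio-gpu-hipblas` becomes `master-gpu-hipblas`, `latest-aio-gpu-nvidia-cuda-12` becomes `latest-gpu-nvidia-cuda-12`), and the CPU tag drops `-aio-cpu` entirely (`latest-aio-cpu` becomes `latest`). The function name `aio_to_standard` is hypothetical. The diff only confirms the CPU, CUDA 12/13 and hipblas mappings, so check any other result (for example Intel or versioned tags) against the registry tag list before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a former AIO image reference to a standard one.
# Assumption: the renames shown in this commit apply to every AIO tag.
aio_to_standard() {
  local ref="$1"
  case "$ref" in
    *-aio-cpu) echo "${ref%-aio-cpu}" ;;  # latest-aio-cpu -> latest
    *-aio-*)   echo "${ref/-aio-/-}" ;;   # latest-aio-gpu-... -> latest-gpu-...
    *)         echo "$ref" ;;             # not an AIO tag: leave unchanged
  esac
}
```

For example, `aio_to_standard localai/localai:latest-aio-gpu-nvidia-cuda-13` prints `localai/localai:latest-gpu-nvidia-cuda-13`. Keep in mind that standard images come without any model pre-configured, so the models previously bundled with the AIO image have to be installed separately after switching.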

@@ -206,7 +206,7 @@ The following are examples of the ROCm specific configuration elements required.
```yaml
# For full functionality select a non-'core' image, version locking the image is recommended for debug purposes.
image: quay.io/go-skynet/local-ai:master-aio-gpu-hipblas
image: quay.io/go-skynet/local-ai:master-gpu-hipblas
environment:
- DEBUG=true
# If your gpu is not already included in the current list of default targets the following build details are required.
@@ -229,13 +229,11 @@ docker run \
-e GPU_TARGETS=gfx906 \
--device /dev/dri \
--device /dev/kfd \
quay.io/go-skynet/local-ai:master-aio-gpu-hipblas
quay.io/go-skynet/local-ai:master-gpu-hipblas
```
Please ensure to add all other required environment variables, port forwardings, etc to your `compose` file or `run` command.
The rebuild process will take some time to complete when deploying these containers and it is recommended that you `pull` the image prior to deployment as depending on the version these images may be ~20GB in size.
#### Example (k8s) (Advanced Deployment/WIP)
For k8s deployments there is an additional step required before deployment, this is the deployment of the [ROCm/k8s-device-plugin](https://artifacthub.io/packages/helm/amd-gpu-helm/amd-gpu).
@@ -434,7 +432,7 @@ If your AMD GPU is not in the default target list, set `REBUILD=true` and `GPU_T
```bash
docker run -e REBUILD=true -e BUILD_TYPE=hipblas -e GPU_TARGETS=gfx1030 \
--device /dev/dri --device /dev/kfd \
quay.io/go-skynet/local-ai:master-aio-gpu-hipblas
quay.io/go-skynet/local-ai:master-gpu-hipblas
```
### Intel SYCL: model hangs

@@ -32,6 +32,4 @@ Grammars and function tools can be used as well in conjunction with vision APIs:
### Setup
All-in-One images have already shipped the llava model as `gpt-4-vision-preview`, so no setup is needed in this case.
To setup the LLaVa models, follow the full example in the [configuration examples](https://github.com/mudler/LocalAI-examples/blob/main/configurations/llava/llava.yaml).

@@ -8,8 +8,6 @@ ico = "rocket_launch"
LocalAI provides a variety of images to support different environments. These images are available on [quay.io](https://quay.io/repository/go-skynet/local-ai?tab=tags) and [Docker Hub](https://hub.docker.com/r/localai/localai).
All-in-One images comes with a pre-configured set of models and backends, standard images instead do not have any model pre-configured and installed.
For GPU Acceleration support for Nvidia video graphic cards, use the Nvidia/CUDA images, if you don't have a GPU, use the CPU images. If you have AMD or Mac Silicon, see the [build section]({{%relref "installation/build" %}}).
{{% notice tip %}}
@@ -17,7 +15,6 @@ For GPU Acceleration support for Nvidia video graphic cards, use the Nvidia/CUDA
**Available Images Types**:
- Images ending with `-core` are smaller images without predownload python dependencies. Use these images if you plan to use `llama.cpp`, `stablediffusion-ncn` or `rwkv` backends - if you are not sure which one to use, do **not** use these images.
- Images containing the `aio` tag are all-in-one images with all the features enabled, and come with an opinionated set of configuration.
{{% /notice %}}
@@ -124,109 +121,6 @@ These images are compatible with Nvidia ARM64 devices with CUDA 13, such as the
{{< /tabs >}}
## All-in-one images
All-In-One images are images that come pre-configured with a set of models and backends to fully leverage almost all the LocalAI featureset. These images are available for both CPU and GPU environments. The AIO images are designed to be easy to use and require no configuration. Models configuration can be found [here](https://github.com/mudler/LocalAI/tree/master/aio) separated by size.
In the AIO images there are models configured with the names of OpenAI models, however, they are really backed by Open Source models. You can find the table below
| Category | Model name | Real model (CPU) | Real model (GPU) |
| ---- | ---- | ---- | ---- |
| Text Generation | `gpt-4` | `phi-2` | `hermes-2-pro-mistral` |
| Multimodal Vision | `gpt-4-vision-preview` | `bakllava` | `llava-1.6-mistral` |
| Image Generation | `stablediffusion` | `stablediffusion` | `dreamshaper-8` |
| Speech to Text | `whisper-1` | `whisper` with `whisper-base` model | <= same |
| Text to Speech | `tts-1` | `en-us-amy-low.onnx` from `rhasspy/piper` | <= same |
| Embeddings | `text-embedding-ada-002` | `all-MiniLM-L6-v2` in Q4 | `all-MiniLM-L6-v2` |
### Usage
Select the image (CPU or GPU) and start the container with Docker:
```bash
docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
```
LocalAI will automatically download all the required models, and the API will be available at [localhost:8080](http://localhost:8080/v1/models).
Or with a docker-compose file:
```yaml
version: "3.9"
services:
api:
image: localai/localai:latest-aio-cpu
# For a specific version:
# image: localai/localai:{{< version >}}-aio-cpu
# For Nvidia GPUs decomment one of the following (cuda12 or cuda13):
# image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-12
# image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-13
# image: localai/localai:latest-aio-gpu-nvidia-cuda-12
# image: localai/localai:latest-aio-gpu-nvidia-cuda-13
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
interval: 1m
timeout: 20m
retries: 5
ports:
- 8080:8080
environment:
- DEBUG=true
# ...
volumes:
- ./models:/models:cached
# decomment the following piece if running with Nvidia GPUs
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: 1
# capabilities: [gpu]
```
{{% notice tip %}}
**Models caching**: The **AIO** image will download the needed models on the first run if not already present and store those in `/models` inside the container. The AIO models will be automatically updated with new versions of AIO images.
You can change the directory inside the container by specifying a `MODELS_PATH` environment variable (or `--models-path`).
If you want to use a named model or a local directory, you can mount it as a volume to `/models`:
```bash
docker run -p 8080:8080 --name local-ai -ti -v $PWD/models:/models localai/localai:latest-aio-cpu
```
or associate a volume:
```bash
docker volume create localai-models
docker run -p 8080:8080 --name local-ai -ti -v localai-models:/models localai/localai:latest-aio-cpu
```
{{% /notice %}}
### Available AIO images
| Description | Quay | Docker Hub |
| --- | --- |-----------------------------------------------|
| Latest images for CPU | `quay.io/go-skynet/local-ai:latest-aio-cpu` | `localai/localai:latest-aio-cpu` |
| Versioned image (e.g. for CPU) | `quay.io/go-skynet/local-ai:{{< version >}}-aio-cpu` | `localai/localai:{{< version >}}-aio-cpu` |
| Latest images for Nvidia GPU (CUDA12) | `quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12` | `localai/localai:latest-aio-gpu-nvidia-cuda-12` |
| Latest images for Nvidia GPU (CUDA13) | `quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-13` | `localai/localai:latest-aio-gpu-nvidia-cuda-13` |
| Latest images for AMD GPU | `quay.io/go-skynet/local-ai:latest-aio-gpu-hipblas` | `localai/localai:latest-aio-gpu-hipblas` |
| Latest images for Intel GPU | `quay.io/go-skynet/local-ai:latest-aio-gpu-intel` | `localai/localai:latest-aio-gpu-intel` |
### Available environment variables
The AIO Images are inheriting the same environment variables as the base images and the environment of LocalAI (that you can inspect by calling `--help`). However, it supports additional environment variables available only from the container image
| Variable | Default | Description |
| ---------------------| ------- | ----------- |
| `PROFILE` | Auto-detected | The size of the model to use. Available: `cpu`, `gpu-8g` |
| `MODELS` | Auto-detected | A list of models YAML Configuration file URI/URL (see also [running models]({{%relref "getting-started/models" %}})) |
## See Also
- [GPU acceleration]({{%relref "features/gpu-acceleration" %}})

@@ -20,7 +20,7 @@ With the CLI you can list the models with `local-ai models list` and install the
You can also [run models manually]({{%relref "getting-started/models" %}}) by copying files into the `models` directory.
{{% /notice %}}
You can test out the API endpoints using `curl`, few examples are listed below. The models we are referring here (`gpt-4`, `gpt-4-vision-preview`, `tts-1`, `whisper-1`) are the default models that come with the AIO images - you can also use any other model you have installed.
You can test out the API endpoints using `curl`, few examples are listed below. The models we are referring here (`gpt-4`, `gpt-4-vision-preview`, `tts-1`, `whisper-1`) are examples - replace them with the model names you have installed.
### Text Generation

@@ -30,7 +30,7 @@ docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
podman run -p 8080:8080 --name local-ai -ti localai/localai:latest
```
This will start LocalAI. The API will be available at `http://localhost:8080`. For images with pre-configured models, see [All-in-One images](/getting-started/container-images/#all-in-one-images).
This will start LocalAI. The API will be available at `http://localhost:8080`.
For other platforms:
- **macOS**: Download the [DMG](macos/)

@@ -93,48 +93,6 @@ CUDA 13 (for Nvidia DGX Spark):
docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
```
### All-in-One (AIO) Images
**Recommended for beginners** - These images come pre-configured with models and backends, ready to use immediately.
#### CPU Image
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
```
#### GPU Images
**NVIDIA CUDA 13:**
```bash
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-13
```
**NVIDIA CUDA 12:**
```bash
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-12
```
**AMD GPU (ROCm):**
```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-aio-gpu-hipblas
```
**Intel GPU:**
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-aio-gpu-intel
```
## Using Compose
For a more manageable setup, especially with persistent volumes, use Docker Compose or Podman Compose:
@@ -147,8 +105,8 @@ The CDI approach is recommended for newer versions of the NVIDIA Container Toolk
version: "3.9"
services:
api:
image: localai/localai:latest-aio-gpu-nvidia-cuda-12
# For CUDA 13, use: localai/localai:latest-aio-gpu-nvidia-cuda-13
image: localai/localai:latest-gpu-nvidia-cuda-12
# For CUDA 13, use: localai/localai:latest-gpu-nvidia-cuda-13
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
interval: 1m
@@ -187,8 +145,8 @@ If you are using an older version of the NVIDIA Container Toolkit (before 1.14),
version: "3.9"
services:
api:
image: localai/localai:latest-aio-gpu-nvidia-cuda-12
# For CUDA 13, use: localai/localai:latest-aio-gpu-nvidia-cuda-13
image: localai/localai:latest-gpu-nvidia-cuda-12
# For CUDA 13, use: localai/localai:latest-gpu-nvidia-cuda-13
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
interval: 1m
@@ -227,12 +185,12 @@ To persist models and data, mount volumes:
docker run -ti --name local-ai -p 8080:8080 \
-v $PWD/models:/models \
-v $PWD/data:/data \
localai/localai:latest-aio-cpu
localai/localai:latest
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 \
-v $PWD/models:/models \
-v $PWD/data:/data \
localai/localai:latest-aio-cpu
localai/localai:latest
```
Or use named volumes:
@@ -243,29 +201,16 @@ docker volume create localai-data
docker run -ti --name local-ai -p 8080:8080 \
-v localai-models:/models \
-v localai-data:/data \
localai/localai:latest-aio-cpu
localai/localai:latest
# Or with Podman:
podman volume create localai-models
podman volume create localai-data
podman run -ti --name local-ai -p 8080:8080 \
-v localai-models:/models \
-v localai-data:/data \
localai/localai:latest-aio-cpu
localai/localai:latest
```
## What's Included in AIO Images
All-in-One images come pre-configured with:
- **Text Generation**: LLM models for chat and completion
- **Image Generation**: Stable Diffusion models
- **Text to Speech**: TTS models
- **Speech to Text**: Whisper models
- **Embeddings**: Vector embedding models
- **Function Calling**: Support for OpenAI-compatible function calling
The AIO images use OpenAI-compatible model names (like `gpt-4`, `gpt-4-vision-preview`) but are backed by open-source models. See the [container images documentation](/getting-started/container-images/#all-in-one-images) for the complete mapping.
## Next Steps
After installation: