Mirror of https://github.com/mudler/LocalAI.git (synced 2026-03-31 21:25:59 -04:00)
chore: drop AIO images (#9004)
AIO images are behind, and maintaining them takes effort. The wizard and model installation have been simplified massively, so AIO images have lost their purpose. Dropping them lets us stay focused on the main images and relieves stress on CI.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Committed by GitHub
Parent: 0ac4ac5bdd
Commit: 5affb747a9
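The image tag replacements in this commit follow a regular pattern, which can be sketched as a small shell helper. This is only an illustration: `migrate_tag` is a hypothetical name, not part of LocalAI, and the mapping is inferred solely from the replacements visible in this diff (`latest-aio-cpu` becomes `latest`, and `-aio-gpu-...` becomes `-gpu-...`). Its behaviour on versioned CPU tags is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of LocalAI): map an AIO image reference
# to the standard image that replaces it in this commit's documentation.
migrate_tag() {
  local tag="$1"
  case "$tag" in
    # CPU AIO images map to the plain tag: latest-aio-cpu -> latest
    *-aio-cpu) printf '%s\n' "${tag%-aio-cpu}" ;;
    # GPU AIO images just lose the "aio" segment:
    # latest-aio-gpu-hipblas -> latest-gpu-hipblas
    *-aio-*)   printf '%s\n' "${tag/-aio-/-}" ;;
    # Anything else is assumed to be a standard image already.
    *)         printf '%s\n' "$tag" ;;
  esac
}

migrate_tag "localai/localai:latest-aio-gpu-nvidia-cuda-12"
```

Note that, per the documentation removed in this commit, standard images do not come with any model pre-configured, so switching tags alone does not reproduce the AIO model set; models have to be installed separately.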
@@ -206,7 +206,7 @@ The following are examples of the ROCm specific configuration elements required.

 ```yaml
 # For full functionality select a non-'core' image, version locking the image is recommended for debug purposes.
-image: quay.io/go-skynet/local-ai:master-aio-gpu-hipblas
+image: quay.io/go-skynet/local-ai:master-gpu-hipblas
 environment:
 - DEBUG=true
 # If your gpu is not already included in the current list of default targets the following build details are required.
@@ -229,13 +229,11 @@ docker run \
 -e GPU_TARGETS=gfx906 \
 --device /dev/dri \
 --device /dev/kfd \
-quay.io/go-skynet/local-ai:master-aio-gpu-hipblas
+quay.io/go-skynet/local-ai:master-gpu-hipblas
 ```

 Please ensure to add all other required environment variables, port forwardings, etc to your `compose` file or `run` command.

-The rebuild process will take some time to complete when deploying these containers and it is recommended that you `pull` the image prior to deployment as depending on the version these images may be ~20GB in size.
-
 #### Example (k8s) (Advanced Deployment/WIP)

 For k8s deployments there is an additional step required before deployment, this is the deployment of the [ROCm/k8s-device-plugin](https://artifacthub.io/packages/helm/amd-gpu-helm/amd-gpu).
@@ -434,7 +432,7 @@ If your AMD GPU is not in the default target list, set `REBUILD=true` and `GPU_T
 ```bash
 docker run -e REBUILD=true -e BUILD_TYPE=hipblas -e GPU_TARGETS=gfx1030 \
 --device /dev/dri --device /dev/kfd \
-quay.io/go-skynet/local-ai:master-aio-gpu-hipblas
+quay.io/go-skynet/local-ai:master-gpu-hipblas
 ```

 ### Intel SYCL: model hangs
@@ -32,6 +32,4 @@ Grammars and function tools can be used as well in conjunction with vision APIs:

 ### Setup

-All-in-One images have already shipped the llava model as `gpt-4-vision-preview`, so no setup is needed in this case.
-
 To setup the LLaVa models, follow the full example in the [configuration examples](https://github.com/mudler/LocalAI-examples/blob/main/configurations/llava/llava.yaml).
@@ -8,8 +8,6 @@ ico = "rocket_launch"

 LocalAI provides a variety of images to support different environments. These images are available on [quay.io](https://quay.io/repository/go-skynet/local-ai?tab=tags) and [Docker Hub](https://hub.docker.com/r/localai/localai).

-All-in-One images comes with a pre-configured set of models and backends, standard images instead do not have any model pre-configured and installed.
-
 For GPU Acceleration support for Nvidia video graphic cards, use the Nvidia/CUDA images, if you don't have a GPU, use the CPU images. If you have AMD or Mac Silicon, see the [build section]({{%relref "installation/build" %}}).

 {{% notice tip %}}
@@ -17,7 +15,6 @@ For GPU Acceleration support for Nvidia video graphic cards, use the Nvidia/CUDA
 **Available Images Types**:

 - Images ending with `-core` are smaller images without predownload python dependencies. Use these images if you plan to use `llama.cpp`, `stablediffusion-ncn` or `rwkv` backends - if you are not sure which one to use, do **not** use these images.
-- Images containing the `aio` tag are all-in-one images with all the features enabled, and come with an opinionated set of configuration.

 {{% /notice %}}

@@ -124,109 +121,6 @@ These images are compatible with Nvidia ARM64 devices with CUDA 13, such as the

 {{< /tabs >}}

-## All-in-one images
-
-All-In-One images are images that come pre-configured with a set of models and backends to fully leverage almost all the LocalAI featureset. These images are available for both CPU and GPU environments. The AIO images are designed to be easy to use and require no configuration. Models configuration can be found [here](https://github.com/mudler/LocalAI/tree/master/aio) separated by size.
-
-In the AIO images there are models configured with the names of OpenAI models, however, they are really backed by Open Source models. You can find the table below
-
-| Category | Model name | Real model (CPU) | Real model (GPU) |
-| ---- | ---- | ---- | ---- |
-| Text Generation | `gpt-4` | `phi-2` | `hermes-2-pro-mistral` |
-| Multimodal Vision | `gpt-4-vision-preview` | `bakllava` | `llava-1.6-mistral` |
-| Image Generation | `stablediffusion` | `stablediffusion` | `dreamshaper-8` |
-| Speech to Text | `whisper-1` | `whisper` with `whisper-base` model | <= same |
-| Text to Speech | `tts-1` | `en-us-amy-low.onnx` from `rhasspy/piper` | <= same |
-| Embeddings | `text-embedding-ada-002` | `all-MiniLM-L6-v2` in Q4 | `all-MiniLM-L6-v2` |
-
-### Usage
-
-Select the image (CPU or GPU) and start the container with Docker:
-
-```bash
-docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
-```
-
-LocalAI will automatically download all the required models, and the API will be available at [localhost:8080](http://localhost:8080/v1/models).
-
-
-Or with a docker-compose file:
-
-```yaml
-version: "3.9"
-services:
-api:
-image: localai/localai:latest-aio-cpu
-# For a specific version:
-# image: localai/localai:{{< version >}}-aio-cpu
-# For Nvidia GPUs decomment one of the following (cuda12 or cuda13):
-# image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-12
-# image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-13
-# image: localai/localai:latest-aio-gpu-nvidia-cuda-12
-# image: localai/localai:latest-aio-gpu-nvidia-cuda-13
-healthcheck:
-test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
-interval: 1m
-timeout: 20m
-retries: 5
-ports:
-- 8080:8080
-environment:
-- DEBUG=true
-# ...
-volumes:
-- ./models:/models:cached
-# decomment the following piece if running with Nvidia GPUs
-# deploy:
-# resources:
-# reservations:
-# devices:
-# - driver: nvidia
-# count: 1
-# capabilities: [gpu]
-```
-
-{{% notice tip %}}
-
-**Models caching**: The **AIO** image will download the needed models on the first run if not already present and store those in `/models` inside the container. The AIO models will be automatically updated with new versions of AIO images.
-
-You can change the directory inside the container by specifying a `MODELS_PATH` environment variable (or `--models-path`).
-
-If you want to use a named model or a local directory, you can mount it as a volume to `/models`:
-
-```bash
-docker run -p 8080:8080 --name local-ai -ti -v $PWD/models:/models localai/localai:latest-aio-cpu
-```
-
-or associate a volume:
-
-```bash
-docker volume create localai-models
-docker run -p 8080:8080 --name local-ai -ti -v localai-models:/models localai/localai:latest-aio-cpu
-```
-
-{{% /notice %}}
-
-### Available AIO images
-
-| Description | Quay | Docker Hub |
-| --- | --- |-----------------------------------------------|
-| Latest images for CPU | `quay.io/go-skynet/local-ai:latest-aio-cpu` | `localai/localai:latest-aio-cpu` |
-| Versioned image (e.g. for CPU) | `quay.io/go-skynet/local-ai:{{< version >}}-aio-cpu` | `localai/localai:{{< version >}}-aio-cpu` |
-| Latest images for Nvidia GPU (CUDA12) | `quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12` | `localai/localai:latest-aio-gpu-nvidia-cuda-12` |
-| Latest images for Nvidia GPU (CUDA13) | `quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-13` | `localai/localai:latest-aio-gpu-nvidia-cuda-13` |
-| Latest images for AMD GPU | `quay.io/go-skynet/local-ai:latest-aio-gpu-hipblas` | `localai/localai:latest-aio-gpu-hipblas` |
-| Latest images for Intel GPU | `quay.io/go-skynet/local-ai:latest-aio-gpu-intel` | `localai/localai:latest-aio-gpu-intel` |
-
-### Available environment variables
-
-The AIO Images are inheriting the same environment variables as the base images and the environment of LocalAI (that you can inspect by calling `--help`). However, it supports additional environment variables available only from the container image
-
-| Variable | Default | Description |
-| ---------------------| ------- | ----------- |
-| `PROFILE` | Auto-detected | The size of the model to use. Available: `cpu`, `gpu-8g` |
-| `MODELS` | Auto-detected | A list of models YAML Configuration file URI/URL (see also [running models]({{%relref "getting-started/models" %}})) |
-
 ## See Also

 - [GPU acceleration]({{%relref "features/gpu-acceleration" %}})
@@ -20,7 +20,7 @@ With the CLI you can list the models with `local-ai models list` and install the
 You can also [run models manually]({{%relref "getting-started/models" %}}) by copying files into the `models` directory.
 {{% /notice %}}

-You can test out the API endpoints using `curl`, few examples are listed below. The models we are referring here (`gpt-4`, `gpt-4-vision-preview`, `tts-1`, `whisper-1`) are the default models that come with the AIO images - you can also use any other model you have installed.
+You can test out the API endpoints using `curl`, few examples are listed below. The models we are referring here (`gpt-4`, `gpt-4-vision-preview`, `tts-1`, `whisper-1`) are examples - replace them with the model names you have installed.

 ### Text Generation

@@ -30,7 +30,7 @@ docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
 podman run -p 8080:8080 --name local-ai -ti localai/localai:latest
 ```

-This will start LocalAI. The API will be available at `http://localhost:8080`. For images with pre-configured models, see [All-in-One images](/getting-started/container-images/#all-in-one-images).
+This will start LocalAI. The API will be available at `http://localhost:8080`.

 For other platforms:
 - **macOS**: Download the [DMG](macos/)
@@ -93,48 +93,6 @@ CUDA 13 (for Nvidia DGX Spark):
 docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
 ```

-### All-in-One (AIO) Images
-
-**Recommended for beginners** - These images come pre-configured with models and backends, ready to use immediately.
-
-#### CPU Image
-
-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
-# Or with Podman:
-podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
-```
-
-#### GPU Images
-
-**NVIDIA CUDA 13:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
-# Or with Podman:
-podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-13
-```
-
-**NVIDIA CUDA 12:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
-# Or with Podman:
-podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-12
-```
-
-**AMD GPU (ROCm):**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
-# Or with Podman:
-podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-aio-gpu-hipblas
-```
-
-**Intel GPU:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
-# Or with Podman:
-podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-aio-gpu-intel
-```
-
 ## Using Compose

 For a more manageable setup, especially with persistent volumes, use Docker Compose or Podman Compose:
@@ -147,8 +105,8 @@ The CDI approach is recommended for newer versions of the NVIDIA Container Toolk
 version: "3.9"
 services:
 api:
-image: localai/localai:latest-aio-gpu-nvidia-cuda-12
-# For CUDA 13, use: localai/localai:latest-aio-gpu-nvidia-cuda-13
+image: localai/localai:latest-gpu-nvidia-cuda-12
+# For CUDA 13, use: localai/localai:latest-gpu-nvidia-cuda-13
 healthcheck:
 test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
 interval: 1m
@@ -187,8 +145,8 @@ If you are using an older version of the NVIDIA Container Toolkit (before 1.14),
 version: "3.9"
 services:
 api:
-image: localai/localai:latest-aio-gpu-nvidia-cuda-12
-# For CUDA 13, use: localai/localai:latest-aio-gpu-nvidia-cuda-13
+image: localai/localai:latest-gpu-nvidia-cuda-12
+# For CUDA 13, use: localai/localai:latest-gpu-nvidia-cuda-13
 healthcheck:
 test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
 interval: 1m
@@ -227,12 +185,12 @@ To persist models and data, mount volumes:
 docker run -ti --name local-ai -p 8080:8080 \
 -v $PWD/models:/models \
 -v $PWD/data:/data \
-localai/localai:latest-aio-cpu
+localai/localai:latest
 # Or with Podman:
 podman run -ti --name local-ai -p 8080:8080 \
 -v $PWD/models:/models \
 -v $PWD/data:/data \
-localai/localai:latest-aio-cpu
+localai/localai:latest
 ```

 Or use named volumes:
@@ -243,29 +201,16 @@ docker volume create localai-data
 docker run -ti --name local-ai -p 8080:8080 \
 -v localai-models:/models \
 -v localai-data:/data \
-localai/localai:latest-aio-cpu
+localai/localai:latest
 # Or with Podman:
 podman volume create localai-models
 podman volume create localai-data
 podman run -ti --name local-ai -p 8080:8080 \
 -v localai-models:/models \
 -v localai-data:/data \
-localai/localai:latest-aio-cpu
+localai/localai:latest
 ```

-## What's Included in AIO Images
-
-All-in-One images come pre-configured with:
-
-- **Text Generation**: LLM models for chat and completion
-- **Image Generation**: Stable Diffusion models
-- **Text to Speech**: TTS models
-- **Speech to Text**: Whisper models
-- **Embeddings**: Vector embedding models
-- **Function Calling**: Support for OpenAI-compatible function calling
-
-The AIO images use OpenAI-compatible model names (like `gpt-4`, `gpt-4-vision-preview`) but are backed by open-source models. See the [container images documentation](/getting-started/container-images/#all-in-one-images) for the complete mapping.
-
 ## Next Steps

 After installation: