LocalAI/docs/content/features/image-generation.md
LocalAI [bot] f44034021e chore: ⬆️ Update leejet/stable-diffusion.cpp to 5a34bc7f6e0621dd2f899daa64476eac667d7ed3 (#10335)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(stablediffusion-ggml): adapt gosd.cpp to upstream sd_ctx_params_t API

The bump to 5a34bc7 restructured sd_ctx_params_t: the boolean CPU-offload
knobs (offload_params_to_cpu, keep_clip_on_cpu, keep_vae_on_cpu,
keep_control_net_on_cpu) were replaced by backend assignment specs
(backend/params_backend), and vae_decode_only / free_params_immediately
were dropped entirely. The build broke with "no member named ..." on
every arch.

Translate the legacy options we still accept from gallery configs into
the new backend assignment specs, mirroring prepare_backend_assignments()
in the upstream CLI, so offload_params_to_cpu / keep_*_on_cpu keep
working. vae_decode_only is parsed and ignored for config compatibility.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(stablediffusion-ggml): expose backend/params placement options

The upstream bump introduced new sd_ctx_params_t fields for device and
memory placement (backend, params_backend, rpc_servers, max_vram,
stream_layers) plus PuLID-Flux weights (pulid_weights_path). Wire them up
as backend options so models can be split across CPU/GPU/disk/RPC:

- backend: per-component compute placement (e.g. clip=cpu,vae=cuda0)
- params_backend: per-component weight storage incl. disk mmap
- max_vram / stream_layers: graph-cut segmented parameter offload budget
- rpc_servers: offload compute to remote RPC servers
- pulid_weights_path: PuLID-Flux identity injection

The legacy keep_*_on_cpu / offload_params_to_cpu booleans now seed and
compose with the explicit backend/params_backend specs, matching upstream
prepare_backend_assignments(). Option values are taken as everything after
the first ':' so colon-bearing values (rpc_servers host:port) survive
parsing. Documented the new options in the image-generation guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(stablediffusion-ggml): distributed RPC across ggml workers

Enable the ggml RPC backend (-DSD_RPC=ON) so image generation can be
sharded across remote rpc-server workers. The ggml rpc-server is
backend-agnostic, so this reuses the exact same worker pool as the
llama.cpp backend - one set of `local-ai worker llama-cpp-rpc` /
`p2p-llama-cpp-rpc` workers accelerates both text and image generation.

RPC servers are selected by precedence:
- the explicit `rpc_servers` option, else
- the LLAMACPP_GRPC_SERVERS env var, which LocalAI's p2p worker mode
  populates automatically with discovered workers (the backend inherits
  it from the parent process env), so distributed image generation needs
  no per-model configuration.

Documented manual and p2p setup in the image-generation guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-16 12:15:45 +02:00

+++
disableToc = false
title = "Image Generation"
weight = 12
url = "/features/image-generation/"
+++

anime_girl (Generated with AnimagineXL)

LocalAI supports generating images with Stable Diffusion, running on CPU using C++ and Python implementations.

Usage

OpenAI docs: https://platform.openai.com/docs/api-reference/images/create

To generate an image, send a POST request to the /v1/images/generations endpoint with the prompt in the JSON request body:

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter",
  "size": "256x256"
}'

Available additional parameters: mode, step.

Note: To set a negative prompt, you can split the prompt with |, for instance: a cute baby sea otter|malformed.

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "floating hair, portrait, ((loli)), ((one girl)), cute face, hidden hands, asymmetrical bangs, beautiful detailed eyes, eye shadow, hair ornament, ribbons, bowties, buttons, pleated skirt, (((masterpiece))), ((best quality)), colorful|((part of the head)), ((((mutated hands and fingers)))), deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, Octane renderer, lowres, bad anatomy, bad hands, text",
  "size": "256x256"
}'
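
The same request can be assembled programmatically. The following is a minimal Python sketch that uses only the standard library; the helper names `build_image_request` and `generate_image` are hypothetical, and the URL is the one used in the curl examples above. It joins the positive and negative prompts with `|`, as described in the note.

```python
import json
import urllib.request

# Endpoint taken from the curl examples; adjust host and port for your setup.
LOCALAI_URL = "http://localhost:8080/v1/images/generations"


def build_image_request(prompt, negative_prompt=None, size="256x256"):
    """Return the JSON payload as a dict (hypothetical helper)."""
    # A negative prompt is appended after a "|" separator.
    full_prompt = f"{prompt}|{negative_prompt}" if negative_prompt else prompt
    return {"prompt": full_prompt, "size": size}


def generate_image(payload, url=LOCALAI_URL):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_image_request("a cute baby sea otter", "malformed")
print(json.dumps(payload))
```

Calling `generate_image(payload)` against a running LocalAI instance sends the request; the sketch only prints the payload so it can be inspected without a server.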

Backends

stablediffusion-ggml

This backend is based on stable-diffusion.cpp. Every model supported by stable-diffusion.cpp is also supported by LocalAI.

Setup

Several models in the gallery are ready to install and run with this backend. For example, you can run Flux by searching for it in the model gallery (flux.1-dev-ggml), or start LocalAI directly with run:

local-ai run flux.1-dev-ggml

To use a custom model, you can follow these steps:

  1. Create a model file stablediffusion.yaml in the models folder:
name: stablediffusion
backend: stablediffusion-ggml
parameters:
  model: gguf_model.gguf
step: 25
cfg_scale: 4.5
options:
- "clip_l_path:clip_l.safetensors"
- "clip_g_path:clip_g.safetensors"
- "t5xxl_path:t5xxl-Q5_0.gguf"
- "sampler:euler"
  2. Download the required assets to the models folder
  3. Start LocalAI

Memory and device placement options

When a model does not fit entirely in VRAM, the following options control where weights and computation are placed. They map directly to the upstream stable-diffusion.cpp options.

| Option | Example | Description |
|--------|---------|-------------|
| `backend` | `backend:clip=cpu,vae=cuda0,diffusion=vulkan0` | Runtime (compute) backend assignment per component. Use `cpu` to place a component's compute on the CPU. Component keys include `te` (text encoder / CLIP), `vae`, `diffusion`, `controlnet`. |
| `params_backend` | `params_backend:diffusion=disk,clip=cpu` | Where parameters (weights) are stored. Supports `cpu`, `disk` (mmap weights from disk to save RAM/VRAM), or per-component specs. |
| `max_vram` | `max_vram:8` or `max_vram:-1` | VRAM budget (in GiB) for graph-cut segmented parameter offload. `0` disables it, `-1` auto-selects (free VRAM minus ~1 GiB). Also accepts per-backend budgets. |
| `stream_layers` | `stream_layers:true` | Enable residency + prefetch streaming on top of `max_vram` (no effect unless `max_vram` is set). |
| `rpc_servers` | `rpc_servers:localhost:50052,192.168.1.3:50052` | Comma-separated list of `host:port` RPC servers to offload compute to. |
| `pulid_weights_path` | `pulid_weights_path:pulid.safetensors` | Path to PuLID-Flux weights for identity injection. |
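
Values such as the `rpc_servers` list contain colons of their own, so an option string is split at the first colon only and everything after it is the value. The sketch below illustrates that rule in Python; `parse_option` is a hypothetical helper written for this page, not the backend's actual parser.

```python
def parse_option(option):
    """Split "key:value" at the first colon only (illustrative helper)."""
    # str.partition stops at the first separator, so later colons
    # (for example in host:port pairs) stay inside the value.
    key, _, value = option.partition(":")
    return key, value


for opt in [
    "sampler:euler",
    "rpc_servers:localhost:50052,192.168.1.3:50052",
    "diffusion_model",  # flag-style option without a value
]:
    print(parse_option(opt))
```

A flag-style option without a colon yields an empty value here; how the backend interprets such flags is not covered by this sketch.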

The following convenience booleans are still accepted and are translated into the backend / params_backend specs above:

| Option | Equivalent spec |
|--------|-----------------|
| `offload_params_to_cpu:true` | `params_backend += *=cpu` |
| `keep_clip_on_cpu:true` | `backend += te=cpu` |
| `keep_vae_on_cpu:true` | `backend += vae=cpu` |
| `keep_control_net_on_cpu:true` | `backend += controlnet=cpu` |
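
To make the translation concrete, here is a rough Python sketch of how the convenience booleans could be folded into the two specs. It only mirrors the table above and is not the backend's code: the function name `translate_legacy` is hypothetical, and the ordering (legacy entries first, explicit specs appended after them) is an assumption.

```python
# Mapping copied from the table above: legacy flag -> (target spec, entry).
LEGACY_FLAGS = {
    "offload_params_to_cpu": ("params_backend", "*=cpu"),
    "keep_clip_on_cpu": ("backend", "te=cpu"),
    "keep_vae_on_cpu": ("backend", "vae=cpu"),
    "keep_control_net_on_cpu": ("backend", "controlnet=cpu"),
}


def translate_legacy(options):
    """Return the backend / params_backend specs as comma-joined strings."""
    entries = {"backend": [], "params_backend": []}
    # Legacy booleans seed the specs first (assumed ordering) ...
    for flag, (target, entry) in LEGACY_FLAGS.items():
        if options.get(flag, "").lower() == "true":
            entries[target].append(entry)
    # ... and any explicit spec is appended after them.
    for target in entries:
        if options.get(target):
            entries[target].append(options[target])
    return {target: ",".join(parts) for target, parts in entries.items()}


specs = translate_legacy({
    "params_backend": "diffusion=disk",
    "keep_clip_on_cpu": "true",
})
print(specs)
```

With the two options shown, the sketch yields `te=cpu` for `backend` and `diffusion=disk` for `params_backend`.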

For example, to mmap the diffusion weights from disk while keeping the text encoder on the CPU:

options:
- "diffusion_model"
- "sampler:euler"
- "params_backend:diffusion=disk"
- "keep_clip_on_cpu:true"

{{% alert note %}} vae_decode_only is still accepted for backwards compatibility but is now a no-op: upstream removed the flag and the model decides automatically. {{% /alert %}}

Distributed inference (RPC workers)

The stablediffusion-ggml backend can offload computation to remote ggml RPC workers, sharding a model that does not fit on a single machine. It reuses the same backend-agnostic rpc-server workers as the llama.cpp backend, so one worker pool can serve both.

Manual: point the model at running workers with the rpc_servers option:

options:
- "rpc_servers:192.168.1.10:50052,192.168.1.11:50052"

Start a worker on each remote machine the same way you would for llama.cpp:

local-ai worker llama-cpp-rpc --llama-cpp-args="--host 0.0.0.0 --port 50052"

Automatic (peer-to-peer): when LocalAI runs in [p2p worker mode]({{%relref "features/distributed_inferencing" %}}), discovered workers are published in the LLAMACPP_GRPC_SERVERS environment variable. The image-generation backend reads that variable automatically (when rpc_servers is not set), so the same local-ai worker p2p-llama-cpp-rpc workers used for text generation also accelerate image generation - no per-model configuration needed.

By default the RPC devices join the pool and participate in placement; combine with the backend / params_backend options above to pin specific components to them (e.g. backend:diffusion=rpc0).
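
The selection order described above (explicit `rpc_servers` first, `LLAMACPP_GRPC_SERVERS` as the fallback) can be sketched in a few lines of Python. This is an illustration only: `resolve_rpc_servers` is a hypothetical helper, and the environment variable is assumed to use the same comma-separated `host:port` format as the option.

```python
import os


def resolve_rpc_servers(options, env=None):
    """Return the list of host:port workers to use (illustrative helper)."""
    env = os.environ if env is None else env
    # The explicit option wins; the environment variable is the fallback.
    raw = options.get("rpc_servers") or env.get("LLAMACPP_GRPC_SERVERS", "")
    return [server.strip() for server in raw.split(",") if server.strip()]


# Explicit option set: the environment variable is ignored.
print(resolve_rpc_servers(
    {"rpc_servers": "192.168.1.10:50052,192.168.1.11:50052"},
    env={"LLAMACPP_GRPC_SERVERS": "10.0.0.5:50052"},
))

# No explicit option: fall back to the discovered workers.
print(resolve_rpc_servers({}, env={"LLAMACPP_GRPC_SERVERS": "10.0.0.5:50052"}))
```

When neither source is set, the helper returns an empty list, which corresponds to running without RPC workers.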

Diffusers

Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. LocalAI has a diffusers backend which allows image generation using the diffusers library.


Model setup

Models are downloaded automatically from Hugging Face the first time you use the backend.

Create a model configuration file in the models directory, for instance to use Linaqruf/animagine-xl with CPU:

name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers

f16: false
diffusers:
  cuda: false # Enable for GPU usage (CUDA)
  scheduler_type: euler_a

Dependencies

This is an extra backend. It is already available in the container images, so no additional setup is required. Do not use core images (ending with -core). If you are building manually, see the [build instructions]({{%relref "installation/build" %}}).

Model setup

Create a model configuration file in the models directory, for instance to use Linaqruf/animagine-xl with a GPU (CUDA):

name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers
cuda: true
f16: true
diffusers:
  scheduler_type: euler_a

Local models

You can also use local models or adjust parameters such as clip_skip and scheduler_type, for instance:

name: stablediffusion
parameters:
  model: toonyou_beta6.safetensors
backend: diffusers
step: 30
f16: true
cuda: true
diffusers:
  pipeline_type: StableDiffusionPipeline
  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
  scheduler_type: "k_dpmpp_sde"
  clip_skip: 11

cfg_scale: 8

Configuration parameters

The following parameters are available in the configuration file:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `f16` | Force the usage of float16 instead of float32 | `false` |
| `step` | Number of steps to run the model for | `30` |
| `cuda` | Enable CUDA acceleration | `false` |
| `enable_parameters` | Parameters to enable for the model | `negative_prompt,num_inference_steps,clip_skip` |
| `scheduler_type` | Scheduler type | `k_dpmpp_sde` |
| `cfg_scale` | Classifier-free guidance (CFG) scale | `8` |
| `clip_skip` | Clip skip | None |
| `pipeline_type` | Pipeline type | `AutoPipelineForText2Image` |
| `lora_adapters` | A list of LoRA adapters (file names relative to the model directory) to apply | None |
| `lora_scales` | A list of LoRA scales (floats) to apply | None |

Several scheduler types are available:

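| Scheduler | Description |
|-----------|-------------|
| `ddim` | DDIM |
| `pndm` | PNDM |
| `heun` | Heun |
| `unipc` | UniPC |
| `euler` | Euler |
| `euler_a` | Euler a |
| `lms` | LMS |
| `k_lms` | LMS Karras |
| `dpm_2` | DPM2 |
| `k_dpm_2` | DPM2 Karras |
| `dpm_2_a` | DPM2 a |
| `k_dpm_2_a` | DPM2 a Karras |
| `dpmpp_2m` | DPM++ 2M |
| `k_dpmpp_2m` | DPM++ 2M Karras |
| `dpmpp_sde` | DPM++ SDE |
| `k_dpmpp_sde` | DPM++ SDE Karras |
| `dpmpp_2m_sde` | DPM++ 2M SDE |
| `k_dpmpp_2m_sde` | DPM++ 2M SDE Karras |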

Available pipeline types:

| Pipeline type | Description |
|---------------|-------------|
| `StableDiffusionPipeline` | Stable diffusion pipeline |
| `StableDiffusionImg2ImgPipeline` | Stable diffusion image to image pipeline |
| `StableDiffusionDepth2ImgPipeline` | Stable diffusion depth to image pipeline |
| `DiffusionPipeline` | Diffusion pipeline |
| `StableDiffusionXLPipeline` | Stable diffusion XL pipeline |
| `StableVideoDiffusionPipeline` | Stable video diffusion pipeline |
| `AutoPipelineForText2Image` | Automatic detection pipeline for text to image |
| `VideoDiffusionPipeline` | Video diffusion pipeline |
| `StableDiffusion3Pipeline` | Stable diffusion 3 pipeline |
| `FluxPipeline` | Flux pipeline |
| `FluxTransformer2DModel` | Flux transformer 2D model |
| `SanaPipeline` | Sana pipeline |

Advanced: Additional parameters

Additional arbitrary parameters can be specified in the options field as key:value pairs:

name: animagine-xl
options:
- "cfg_scale:6"

Note: There is no complete parameter list. Any parameter can be specified and is passed directly to the pipeline as an argument. Different pipelines and implementations support different parameters.

The example above results in the following Python code when generating images:

pipe(
    prompt="A cute baby sea otter", # Options passed via API
    size="256x256", # Options passed via API
    cfg_scale=6 # Additional parameter passed via configuration file
)

Usage

Text to Image

Use the image generation endpoint with the model name from the configuration file:

curl http://localhost:8080/v1/images/generations \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "<positive prompt>|<negative prompt>", 
      "model": "animagine-xl", 
      "step": 51,
      "size": "1024x1024" 
    }'

Image to Image

https://huggingface.co/docs/diffusers/using-diffusers/img2img

An example model (GPU):

name: stablediffusion-edit
parameters:
  model: nitrosocke/Ghibli-Diffusion
backend: diffusers
step: 25
cuda: true
f16: true
diffusers:
  pipeline_type: StableDiffusionImg2ImgPipeline
  enable_parameters: "negative_prompt,num_inference_steps,image"

Send the base64-encoded source image together with the prompt:

IMAGE_PATH=/path/to/your/image
(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations
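
The request body for image-to-image can also be built in Python. The sketch below is illustrative and uses only the standard library; `build_img2img_request` is a hypothetical helper, and the field names (`file`, `prompt`, `size`, `model`) are the ones from the shell example above.

```python
import base64


def build_img2img_request(image_path, prompt, model, size="512x512"):
    """Return the request body with the image embedded as base64 text."""
    with open(image_path, "rb") as image_file:
        # b64encode emits no line breaks, so the string is safe inside JSON.
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return {"file": encoded, "prompt": prompt, "size": size, "model": model}
```

For example, `build_img2img_request("/path/to/your/image", "a sky background", "stablediffusion-edit")` produces the same fields as the shell pipeline; serialize the result with `json.dumps` and POST it to /v1/images/generations.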

🖼️ Flux Kontext with stable-diffusion.cpp

LocalAI supports Flux Kontext, which can be used to edit images via the API.

Install with:

local-ai run flux.1-kontext-dev

To test:

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "model": "flux.1-kontext-dev",
  "prompt": "change 'flux.cpp' to 'LocalAI'",
  "size": "256x256",
  "ref_images": [
  	"https://raw.githubusercontent.com/leejet/stable-diffusion.cpp/master/assets/flux/flux1-dev-q8_0.png"
  ]
}'

Depth to Image

https://huggingface.co/docs/diffusers/using-diffusers/depth2img

name: stablediffusion-depth
parameters:
  model: stabilityai/stable-diffusion-2-depth
backend: diffusers
step: 50
f16: true
cuda: true
diffusers:
  pipeline_type: StableDiffusionDepth2ImgPipeline
  enable_parameters: "negative_prompt,num_inference_steps,image"

cfg_scale: 6

Send the base64-encoded source image together with the prompt:

(echo -n '{"file": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations

img2vid

name: img2vid
parameters:
  model: stabilityai/stable-video-diffusion-img2vid
backend: diffusers
step: 25
f16: true
cuda: true
diffusers:
  pipeline_type: StableVideoDiffusionPipeline

Pass the source image as a URL in the file field:

(echo -n '{"file": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true","size": "512x512","model":"img2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations

txt2vid

name: txt2vid
parameters:
  model: damo-vilab/text-to-video-ms-1.7b
backend: diffusers
step: 25
f16: true
cuda: true
diffusers:
  pipeline_type: VideoDiffusionPipeline
  cuda: true

Send the text prompt to the same endpoint:

(echo -n '{"prompt": "spiderman surfing","size": "512x512","model":"txt2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations