mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-19 06:09:07 -04:00
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(stablediffusion-ggml): adapt gosd.cpp to upstream sd_ctx_params_t API

The bump to 5a34bc7 restructured sd_ctx_params_t: the boolean CPU-offload knobs (offload_params_to_cpu, keep_clip_on_cpu, keep_vae_on_cpu, keep_control_net_on_cpu) were replaced by backend assignment specs (backend/params_backend), and vae_decode_only / free_params_immediately were dropped entirely. The build broke with "no member named ..." on every arch.

Translate the legacy options we still accept from gallery configs into the new backend assignment specs, mirroring prepare_backend_assignments() in the upstream CLI, so offload_params_to_cpu / keep_*_on_cpu keep working. vae_decode_only is parsed and ignored for config compatibility.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(stablediffusion-ggml): expose backend/params placement options

The upstream bump introduced new sd_ctx_params_t fields for device and memory placement (backend, params_backend, rpc_servers, max_vram, stream_layers) plus PuLID-Flux weights (pulid_weights_path). Wire them up as backend options so models can be split across CPU/GPU/disk/RPC:

- backend: per-component compute placement (e.g. clip=cpu,vae=cuda0)
- params_backend: per-component weight storage incl. disk mmap
- max_vram / stream_layers: graph-cut segmented parameter offload budget
- rpc_servers: offload compute to remote RPC servers
- pulid_weights_path: PuLID-Flux identity injection

The legacy keep_*_on_cpu / offload_params_to_cpu booleans now seed and compose with the explicit backend/params_backend specs, matching upstream prepare_backend_assignments(). Option values are taken as everything after the first ':' so colon-bearing values (rpc_servers host:port) survive parsing.

Documented the new options in the image-generation guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(stablediffusion-ggml): distributed RPC across ggml workers

Enable the ggml RPC backend (-DSD_RPC=ON) so image generation can be sharded across remote rpc-server workers. The ggml rpc-server is backend-agnostic, so this reuses the exact same worker pool as the llama.cpp backend - one set of `local-ai worker llama-cpp-rpc` / `p2p-llama-cpp-rpc` workers accelerates both text and image generation.

RPC servers are selected by precedence:

- the explicit `rpc_servers` option, else
- the LLAMACPP_GRPC_SERVERS env var, which LocalAI's p2p worker mode populates automatically with discovered workers (the backend inherits it from the parent process env), so distributed image generation needs no per-model configuration.

Documented manual and p2p setup in the image-generation guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
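
The commit message describes two small rules: an option value is everything after the first ':', and RPC servers come from the explicit `rpc_servers` option before falling back to `LLAMACPP_GRPC_SERVERS`. The sketch below illustrates both in Python. It is not the code in gosd.cpp. The function names `split_option` and `resolve_rpc_servers` are hypothetical, the addresses are placeholders, and treating an empty `rpc_servers` value as "not set" is an assumption the commit message does not confirm.

```python
import os
from typing import Iterable, Mapping, Optional, Tuple


def split_option(raw: str) -> Tuple[str, str]:
    """Split 'name:value' at the FIRST ':' only (hypothetical helper).

    Later colons stay in the value, so 'rpc_servers:host:port'
    keeps 'host:port' intact.
    """
    name, _, value = raw.partition(":")
    return name.strip(), value


def resolve_rpc_servers(
    options: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick RPC servers by the precedence described in the commit message.

    1. the explicit rpc_servers option, else
    2. the LLAMACPP_GRPC_SERVERS environment variable.
    Assumption: an empty explicit value counts as "not set".
    """
    env = os.environ if env is None else env
    explicit = ""
    for raw in options:
        name, value = split_option(raw)
        if name == "rpc_servers":
            explicit = value
    if explicit:
        return explicit
    return env.get("LLAMACPP_GRPC_SERVERS", "")


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    print(split_option("rpc_servers:192.0.2.10:50052"))
    print(resolve_rpc_servers(
        ["backend:clip=cpu,vae=cuda0"],
        {"LLAMACPP_GRPC_SERVERS": "192.0.2.20:50052"},
    ))
```

Splitting only at the first colon is what lets `host:port` values pass through unchanged. A split on every colon would cut the port off the address.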
57 KiB