LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-07 00:06:51 -04:00

Author	SHA1	Message	Date
Ettore Di Giacinto	cd6079b2f3	feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache) spiritbuun/buun-llama-cpp is a fork of TheTom/llama-cpp-turboquant that adds two independent features on top: DFlash block-diffusion speculative decoding (via a dedicated DFlashDraftModel GGUF arch) and two extra TCQ KV-cache variants (turbo2_tcq, turbo3_tcq) on top of TurboQuant's turbo2/turbo3/turbo4. Follows the turboquant thin-wrapper pattern — reuses backend/cpp/llama-cpp grpc-server sources verbatim, patches only the build copy to extend the KV allow-list and wire up buun-exclusive tree_budget / draft_topk options. DraftModel is already wired end-to-end (proto field 39 → params.speculative), so DFlash activation only needs the existing options passthrough (spec_type:dflash) plus the drafter path in draft_model. CacheTypeOptions now surfaces the five turbo* values so the React UI dropdown shows them — benefits turboquant too (previously users had to type them in YAML manually). Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-24 12:52:53 +00:00
Ettore Di Giacinto	39573ecd2a	chore(whisperx): drop ROCm/hipblas build target (#9474) whisperx has no upstream AMD GPU support and its core transcription path (faster-whisper -> ctranslate2) falls back to CPU on AMD since the PyPI ctranslate2 is CUDA-only. The torch rocm wheels would accelerate only the alignment/diarization stages, producing a misleadingly half-working image. Drop the hipblas variant rather than shipping a partially accelerated build users can't distinguish from the real thing. AMD hosts now fall through the capability map to cpu-whisperx / cpu-whisperx-development. Also removes the now-dangling rocm-whisperx assertion from pkg/system/capabilities_test.go and the ROCm mention from the whisperx row in docs/content/reference/compatibility-table.md. Assisted-by: Claude Code:claude-opus-4-7	2026-04-21 21:50:18 +02:00
Ettore Di Giacinto	0e7c0adee4	docs: document tool calling on vLLM and MLX backends openai-functions.md used to claim LocalAI tool calling worked only on llama.cpp-compatible models. That was true when it was written; it's not true now — vLLM (since PR #9328) and MLX/MLX-VLM both extract structured tool calls from model output. - openai-functions.md: new 'Supported backends' matrix covering llama.cpp, vllm, vllm-omni, mlx, mlx-vlm, with the key distinction that vllm needs an explicit tool_parser: option while mlx auto-detects from the chat template. Reasoning content (think tags) is extracted on the same set of backends. Added setup snippets for both the vllm and mlx paths, and noted the gallery importer pre-fills tool_parser:/reasoning_parser: for known families. - compatibility-table.md: fix the stale 'Streaming: no' for vllm, vllm-omni, mlx, mlx-vlm (all four support streaming now). Add 'Functions' to their capabilities. Also widen the MLX Acceleration column to reflect the CPU/CUDA/Jetson L4T backends that already exist in backend/index.yaml — 'Metal' on its own was misleading.	2026-04-13 16:58:55 +00:00
Ettore Di Giacinto	9ca03cf9cc	feat(backends): add ik-llama-cpp (#9326) * feat(backends): add ik-llama-cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add grpc e2e suite, hook to CI, update README Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-12 13:51:28 +02:00
Ettore Di Giacinto	cecd8d6aa5	chore(docs): simplify Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-22 20:24:44 +00:00
LocalAI [bot]	9da24cdf85	docs: Update model compatibility documentation with missing backends (#8889) * docs: Update model compatibility documentation with missing backends Added the following backends to README.md and compatibility-table.md: - vllm-omni: Multimodal vLLM with vision and audio support - nemo: NVIDIA NeMo framework for speech models - outetts: OuteTTS with voice cloning capabilities - faster-qwen3-tts: Faster Qwen3 TTS implementation - qwen-asr: Qwen automatic speech recognition - voxcpm: VoxCPM speech understanding model - whisperx: Enhanced Whisper with word-level transcription These backends exist in the codebase (backend/index.yaml) but were missing from the documentation. This update ensures accurate reflection of currently supported backends in LocalAI. * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: localai-bot <localai-bot@example.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-03-09 09:22:34 +01:00
Ettore Di Giacinto	26a374b717	chore: drop bark which is unmaintained (#8207) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-25 09:26:40 +01:00
Ettore Di Giacinto	05904c77f5	chore(exllama): drop backend now almost deprecated (#8186) exllama2 development has stalled and only old architectures are supported. exllamav3 is still in development, meanwhile cleaning up exllama2 from the gallery. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-24 08:57:37 +01:00
Ettore Di Giacinto	a6ff354c86	feat(tts): add pocket-tts backend (#8018) * feat(pocket-tts): add new backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add to the gallery Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-13 23:35:19 +01:00
Richard Palethorpe	e6ba26c3e7	chore: Update to Ubuntu24.04 (cont #7423) (#7769) * ci(workflows): bump GitHub Actions images to Ubuntu 24.04 Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * ci(workflows): remove CUDA 11.x support from GitHub Actions (incompatible with ubuntu:24.04) Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * ci(workflows): bump GitHub Actions CUDA support to 12.9 Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * build(docker): bump base image to ubuntu:24.04 and adjust Vulkan SDK/packages Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * fix(backend): correct context paths for Python backends in workflows, Makefile and Dockerfile Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore(make): disable parallel backend builds to avoid race conditions Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore(make): export CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION for override Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * build(backend): update backend Dockerfiles to Ubuntu 24.04 Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore(backend): add ROCm env vars and default AMDGPU_TARGETS for hipBLAS builds Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore(chatterbox): bump ROCm PyTorch to 2.9.1+rocm6.4 and update index URL; align hipblas requirements Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore: add local-ai-launcher to .gitignore Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * ci(workflows): fix backends GitHub Actions workflows after rebase Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * build(docker): use build-time UBUNTU_VERSION variable Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore(docker): remove libquadmath0 from requirements-stage base image Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore(make): add backends/vllm to .NOTPARALLEL to prevent parallel builds Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * fix(docker): correct CUDA installation steps in backend Dockerfiles Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * chore(backend): update ROCm to 6.4 and align Python hipblas requirements Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for CUDA on arm64 builds Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * build(docker): update base image and backend Dockerfiles for Ubuntu 24.04 compatibility on arm64 Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * build(backend): increase timeout for uv installs behind slow networks on backend/Dockerfile.python Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for vibevoice backend Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * ci(workflows): fix failing GitHub Actions runners Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> * fix: Allow FROM_SOURCE to be unset, use upstream Intel images etc. Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(build): rm all traces of CUDA 11 Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(build): Add Ubuntu codename as an argument Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com> Signed-off-by: Richard Palethorpe <io@richiejp.com> Co-authored-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>	2026-01-06 15:26:42 +01:00
Ettore Di Giacinto	bf2f95c684	chore(docs): update docs with cuda 13 instructions and the new vibevoice backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-25 10:00:07 +01:00
Ettore Di Giacinto	2cc4809b0d	feat: docs revamp (#7313) * docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small enhancements Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Enhancements * Default to zen-dark Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-11-19 22:21:20 +01:00

12 Commits