LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-13 03:09:03 -04:00


LocalAI [bot] a7a7bd646b fix(mlx): route vision-language models to the mlx-vlm backend (#10274)

Vision-language checkpoints such as mlx-community/gemma-4-E4B-it-qat-4bit
declare the "image-text-to-text" pipeline tag on HuggingFace. The mlx
importer hardcoded backend "mlx" for every mlx-community model, so these
VLMs were served by the text-only mlx-lm backend whose tokenizer does not
carry the processor chat template. The template was never applied and the
model produced degenerate, looping output that echoed the prompt.

Detect the "image-text-to-text" pipeline tag in the importer and route those
models to mlx-vlm, which applies the processor-aware chat template. An
explicit backend preference still wins.
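
The routing rule described above can be sketched roughly as follows. This is
an illustrative Python sketch only, not the importer's actual code: the
function name `choose_mlx_backend` and its parameters are hypothetical, and
only the "image-text-to-text" tag and the "mlx" / "mlx-vlm" backend names come
from the commit message.

```python
from typing import Optional

# Pipeline tag that vision-language checkpoints declare on HuggingFace.
VLM_PIPELINE_TAG = "image-text-to-text"


def choose_mlx_backend(pipeline_tag: Optional[str],
                       preferred_backend: Optional[str] = None) -> str:
    """Pick a backend name for an mlx-community model (hypothetical helper)."""
    # An explicit backend preference still wins over tag-based detection.
    if preferred_backend:
        return preferred_backend
    # Vision-language models go to mlx-vlm, which applies the
    # processor-aware chat template.
    if pipeline_tag == VLM_PIPELINE_TAG:
        return "mlx-vlm"
    # Everything else keeps the previous default, the text-only backend.
    return "mlx"


if __name__ == "__main__":
    print(choose_mlx_backend("image-text-to-text"))
    print(choose_mlx_backend("text-generation"))
    print(choose_mlx_backend("image-text-to-text", preferred_backend="mlx"))
```

Checking the explicit preference first keeps the override unconditional, so
tag detection only changes the default.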

As a defensive backstop, the mlx backend now warns loudly when the loaded
model has no chat template, so a misrouted VLM surfaces the problem instead
of silently looping.
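
A backstop of this kind might look like the sketch below. It assumes the
loaded tokenizer exposes a `chat_template` attribute, as HuggingFace
tokenizers do; the helper name `warn_if_no_chat_template`, the logger name,
and the message wording are hypothetical and not taken from backend.py.

```python
import logging
from types import SimpleNamespace

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("mlx-backend")


def warn_if_no_chat_template(tokenizer, model_name: str) -> bool:
    """Log a warning and return True when no chat template is available."""
    # Treat a missing attribute, None, and an empty string the same way.
    template = getattr(tokenizer, "chat_template", None)
    if template:
        return False
    logger.warning(
        "model %s has no chat template; if this is a vision-language "
        "model it should be served by the mlx-vlm backend instead",
        model_name,
    )
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Stand-in tokenizer objects, used here in place of a real loaded model.
    misrouted = SimpleNamespace(chat_template=None)
    text_only = SimpleNamespace(chat_template="{{ messages }}")
    warn_if_no_chat_template(misrouted, "example/vlm-model")
    warn_if_no_chat_template(text_only, "example/text-model")
```

The check only reports the problem; it does not change how the request is
served.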

Fixes #10269


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-12 23:12:42 +02:00

backend.py

fix(mlx): route vision-language models to the mlx-vlm backend (#10274)

2026-06-12 23:12:42 +02:00

install.sh

feat(mlx): add mlx backend (#6049)

2025-08-22 08:42:29 +02:00

Makefile

feat(mlx): add mlx backend (#6049)

2025-08-22 08:42:29 +02:00

mlx_cache.py

feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling (#7556)