mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-25 09:09:07 -04:00
feat(vllm): macOS/Metal support via vllm-metal (MLX)
Add an additive Apple-Silicon path to the existing vllm Python backend so vLLM runs on macOS via vllm-metal (github.com/vllm-project/vllm-metal). Spike outcome (proven on a real M4 / macOS 26.5, Qwen3-0.6B): - vllm-metal registers through vLLM's platform-plugin entry point (metal -> vllm_metal:register); MetalPlatform activates and runs on the GPU through MLX. - LocalAI's backend.py is UNCHANGED: AsyncEngineArgs(...) -> AsyncLLMEngine.from_engine_args transparently resolves to vLLM 0.23's v1 AsyncLLM MLX engine, and async generate produced correct output. - backend.py is NOT touched: its only empty_cache() call is CUDA-only (guarded by torch.cuda.is_available()), so the benign shutdown-only "Allocator for mps is not a DeviceAllocator" noise comes from vLLM's internal EngineCore teardown, not from our code. Changes (all gated behind a darwin condition; Linux/CUDA/ROCm/Intel paths are byte-for-byte unchanged): - install.sh: darwin branch forces PYTHON_VERSION=3.12 (vllm-metal requirement), creates/activates LocalAI's managed venv via ensureVenv, then reproduces vllm-metal's installer INTO that venv (build vLLM 0.23.0 from the release source tarball against requirements/cpu.txt, then install the prebuilt vllm-metal wheel from its latest GitHub release), and runs runProtogen. installRequirements is skipped on darwin. - backend-matrix.yml: add a vllm includeDarwin entry (mps, python). - index.yaml: add metal capability + concrete metal-vllm / metal-vllm-development child entries mirroring the metal-kitten-tts template. Version coupling: vllm-metal pins vLLM 0.23.0, equal to LocalAI's current vllm pin. Bumping vllm must be coordinated with a supporting vllm-metal release; documented in install.sh and requirements-cublas13-after.txt. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code]
This commit is contained in:
7
.github/backend-matrix.yml
vendored
7
.github/backend-matrix.yml
vendored
@@ -4974,6 +4974,13 @@ includeDarwin:
|
||||
- backend: "kitten-tts"
|
||||
tag-suffix: "-metal-darwin-arm64-kitten-tts"
|
||||
build-type: "mps"
|
||||
# vLLM on Apple Silicon via vllm-metal (MLX). The install is custom
|
||||
# (backend/python/vllm/install.sh has a darwin branch); lang stays python so
|
||||
# backend_build_darwin.yml drives it through build-darwin-python-backend ->
|
||||
# scripts/build/python-darwin.sh, which runs the backend's install.sh.
|
||||
- backend: "vllm"
|
||||
tag-suffix: "-metal-darwin-arm64-vllm"
|
||||
build-type: "mps"
|
||||
- backend: "piper"
|
||||
tag-suffix: "-metal-darwin-arm64-piper"
|
||||
build-type: "metal"
|
||||
|
||||
Reference in New Issue
Block a user