mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-11 10:19:33 -04:00

pos-ei-don 228a6dfe79 fix(vllm): restore compatibility with vLLM >= 0.22 (get_tokenizer moved to vllm.tokenizers) (#10252)

fix(vllm): restore compatibility with vLLM >= 0.22 (get_tokenizer moved)

vLLM 0.22 moved get_tokenizer from vllm.transformers_utils.tokenizer
to vllm.tokenizers. Since the backend requirements install vllm
unpinned, freshly built/installed vllm backends currently fail to
start with ModuleNotFoundError: No module named
'vllm.transformers_utils.tokenizer' (surfacing as 'grpc service not
ready' when loading a model).

Use the same try/except version-compat import pattern already used
elsewhere in this file: try the new vllm.tokenizers location first and
fall back to the pre-0.22 path.
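
The version-compat import described above might look roughly like the following sketch. The two module paths are taken from the commit message; the helper name `import_first` and the standard-library demo at the end are illustrative assumptions, not the code in backend.py, which may simply use an inline try/except.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it.

    Hypothetical helper for illustration only.
    """
    errors = []
    for path in module_paths:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError) as exc:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # module and a missing attribute both fall through to the next path.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"cannot import {name!r}; tried " + "; ".join(errors))


# Inline form, as the commit describes it (requires vllm to be installed):
#
#     try:
#         from vllm.tokenizers import get_tokenizer  # vLLM >= 0.22
#     except ImportError:
#         from vllm.transformers_utils.tokenizer import get_tokenizer  # pre-0.22
#
# Equivalent call with the helper:
#
#     get_tokenizer = import_first(
#         "get_tokenizer",
#         ["vllm.tokenizers", "vllm.transformers_utils.tokenizer"],
#     )

# Stand-in demo using only the standard library, so it runs without vllm:
loads = import_first("loads", ["no_such_module_for_demo", "json"])
print(loads('{"ok": true}'))
```

Trying the new location first keeps the common case on current vLLM releases free of a failed import, while older installs still resolve through the fallback path.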

Tested on a DGX Spark (GB10, ARM64) with the
cuda13-nvidia-l4t-arm64-vllm backend and vllm 0.22.0: model load, chat
completions and tool calls all work with this patch applied.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-11 09:05:23 +02:00

| File | Last commit | Date |
| --- | --- | --- |
| backend.py | fix(vllm): restore compatibility with vLLM >= 0.22 (get_tokenizer moved to vllm.tokenizers) (#10252) | 2026-06-11 09:05:23 +02:00 |
| install.sh | fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950) | 2026-05-22 23:01:22 +02:00 |
| Makefile | feat(mlx): add mlx backend (#6049) | 2025-08-22 08:42:29 +02:00 |
| package.sh | feat(vllm, distributed): tensor parallel distributed workers (#9612) | 2026-05-06 00:22:50 +02:00 |
| README.md | refactor: move backends into the backends directory (#1279) | 2023-11-13 22:40:16 +01:00 |
| requirements-after.txt | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| requirements-cpu-after.txt | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| requirements-cpu.txt | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| requirements-cublas12-after.txt | fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557) | 2026-04-25 15:38:13 +00:00 |
| requirements-cublas12.txt | fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557) | 2026-04-25 15:38:13 +00:00 |
| requirements-cublas13-after.txt | chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.22.1 (#10188) | 2026-06-05 23:42:50 +02:00 |
| requirements-cublas13.txt | feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553) | 2026-04-25 12:26:29 +02:00 |
| requirements-hipblas-after.txt | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| requirements-hipblas.txt | feat(rocm): bump to 7.x (#9323) | 2026-04-12 08:51:30 +02:00 |
| requirements-install.txt | fix(vllm): seed pybind11 for fastsafetensors build under --no-build-isolation | 2026-04-28 20:08:26 +00:00 |
| requirements-intel-after.txt | feat(vllm, distributed): tensor parallel distributed workers (#9612) | 2026-05-06 00:22:50 +02:00 |
| requirements-intel.txt | feat(vllm, distributed): tensor parallel distributed workers (#9612) | 2026-05-06 00:22:50 +02:00 |
| requirements-l4t13-after.txt | fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950) | 2026-05-22 23:01:22 +02:00 |
| requirements-l4t13.txt | fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950) | 2026-05-22 23:01:22 +02:00 |
| requirements.txt | chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/vllm (#10157) | 2026-06-03 10:37:16 +02:00 |
| run.sh | fix(python-backend): make JIT subprocesses work on hosts of any size (#9679) | 2026-05-06 00:28:01 +02:00 |
| test.py | feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563) | 2026-04-29 00:49:28 +02:00 |
| test.sh | feat: Add backend gallery (#5607) | 2025-06-15 14:56:52 +02:00 |

README.md

Creating a separate environment for the vllm project:

    make vllm