LocalAI/backend/python/common/vllm_utils.py at ea2bbabffd4a037cb1851a2be56dae577f058069

mirror of https://github.com/mudler/LocalAI.git synced 2026-04-17 05:18:53 -04:00

Ettore Di Giacinto b215843807 feat(vllm): CPU support + shared utils + vllm-omni feature parity

- Split the vllm install per acceleration profile: move the generic `vllm`
  requirement out of requirements-after.txt into per-profile after files
  (cublas12, hipblas, intel) and add a CPU wheel URL to cpu-after.txt
- requirements-cpu.txt now pulls torch==2.7.0+cpu from the PyTorch CPU index
- backend/index.yaml: register cpu-vllm / cpu-vllm-development variants
- New backend/python/common/vllm_utils.py: shared parse_options,
  messages_to_dicts, setup_parsers helpers (used by both vllm backends)
- vllm-omni: replace hardcoded chat template with tokenizer.apply_chat_template,
  wire native parsers via shared utils, emit ChatDelta with token counts,
  add TokenizeString and Free RPCs, detect CPU and set VLLM_TARGET_DEVICE
- Add test_cpu_inference.py: standalone script to validate CPU build with
  a small model (Qwen2.5-0.5B-Instruct)
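The commit names three shared helpers, but this page does not show their code. As a minimal sketch, assuming backend options arrive as a list of `"key:value"` strings and chat messages expose `role` and `content` fields, `parse_options` and `messages_to_dicts` might look like the following. The `Message` dataclass, the `_coerce` helper, and the option keys in the demo are hypothetical stand-ins, not the actual contents of vllm_utils.py; `setup_parsers` is not sketched because its interface is not described here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


def _coerce(value: str) -> Any:
    """Hypothetical helper: best-effort conversion of an option string to bool, int, or float."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # fall back to the raw string


def parse_options(options: Iterable[str]) -> Dict[str, Any]:
    """Sketch: turn ["key:value", ...] into {"key": value}, skipping entries without ':'."""
    parsed: Dict[str, Any] = {}
    for opt in options:
        if ":" not in opt:
            continue
        # Split only on the first ':' so values may themselves contain ':'.
        key, value = opt.split(":", 1)
        parsed[key.strip()] = _coerce(value.strip())
    return parsed


@dataclass
class Message:
    """Hypothetical stand-in for the backend's chat message (role and content only)."""
    role: str
    content: str


def messages_to_dicts(messages: Iterable[Message]) -> List[Dict[str, str]]:
    """Sketch: convert messages to the list-of-dicts shape used by chat templates."""
    return [{"role": m.role, "content": m.content} for m in messages]


if __name__ == "__main__":
    # Illustrative option keys only.
    print(parse_options(["tool_parser:hermes", "gpu_memory_utilization:0.9", "enforce_eager:true", "bogus"]))
    # → {'tool_parser': 'hermes', 'gpu_memory_utilization': 0.9, 'enforce_eager': True}
    print(messages_to_dicts([Message("system", "Be brief."), Message("user", "Hi")]))
    # → [{'role': 'system', 'content': 'Be brief.'}, {'role': 'user', 'content': 'Hi'}]
```

A list of role/content dicts is the input format Hugging Face tokenizers accept in `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`. Producing that shape in one shared place is what lets a backend drop a hardcoded template and rely on the model's own chat template, as the vllm-omni change above describes.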

2026-04-12 14:48:28 +00:00

2.9 KiB