LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-19 14:17:21 -04:00

Author	SHA1	Message	Date
Ettore Di Giacinto	c2f73a987e	fix(vllm): CPU build compatibility with vllm 0.14.1 Validated end-to-end on CPU with Qwen2.5-0.5B-Instruct (LoadModel, Predict, TokenizeString, Free all working). - requirements-cpu-after.txt: pin vllm to 0.14.1+cpu (pre-built wheel from GitHub releases) for x86_64 and aarch64. vllm 0.14.1 is the newest CPU wheel whose torch dependency resolves against published PyTorch builds (torch==2.9.1+cpu). Later vllm CPU wheels currently require torch==2.10.0+cpu which is only available on the PyTorch test channel with incompatible torchvision. - requirements-cpu.txt: bump torch to 2.9.1+cpu, add torchvision/torchaudio so uv resolves them consistently from the PyTorch CPU index. - install.sh: add --index-strategy=unsafe-best-match for CPU builds so uv can mix the PyTorch index and PyPI for transitive deps (matches the existing intel profile behaviour). - backend.py LoadModel: vllm >= 0.14 removed AsyncLLMEngine.get_model_config so the old code path errored out with AttributeError on model load. Switch to the new get_tokenizer()/tokenizer accessor with a fallback to building the tokenizer directly from request.Model.	2026-04-12 14:48:28 +00:00
Ettore Di Giacinto	b215843807	feat(vllm): CPU support + shared utils + vllm-omni feature parity - Split vllm install per acceleration: move generic `vllm` out of requirements-after.txt into per-profile after files (cublas12, hipblas, intel) and add CPU wheel URL for cpu-after.txt - requirements-cpu.txt now pulls torch==2.7.0+cpu from PyTorch CPU index - backend/index.yaml: register cpu-vllm / cpu-vllm-development variants - New backend/python/common/vllm_utils.py: shared parse_options, messages_to_dicts, setup_parsers helpers (used by both vllm backends) - vllm-omni: replace hardcoded chat template with tokenizer.apply_chat_template, wire native parsers via shared utils, emit ChatDelta with token counts, add TokenizeString and Free RPCs, detect CPU and set VLLM_TARGET_DEVICE - Add test_cpu_inference.py: standalone script to validate CPU build with a small model (Qwen2.5-0.5B-Instruct)	2026-04-12 14:48:28 +00:00
Ettore Di Giacinto	d6409bd2eb	Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory" (#8367) This reverts commit `4c0e70086d`.	2026-02-03 08:34:54 +01:00
dependabot[bot]	4c0e70086d	chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory (#8360) Bumps the pip group with 1 update in the /backend/python/vllm directory: torch. Updates `torch` from 2.7.0 to 2.7.1+xpu --- updated-dependencies: - dependency-name: torch dependency-version: 2.7.1+xpu dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-03 03:07:02 +00:00
Ettore Di Giacinto	8b889955b4	chore(deps): bump pytorch to 2.7 in vllm (#5576) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-06-04 08:56:45 +02:00
Ettore Di Giacinto	5ffad3b004	chore(deps): remove pin on transformers (#5501) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-27 09:24:27 +02:00
Ettore Di Giacinto	6a382a1afe	fix(transformers): try to pin to working release (#5426) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-22 12:50:51 +02:00
Ettore Di Giacinto	3e77a17b26	fix(dependencies): pin pytorch version (#3872) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-10-18 09:11:59 +02:00
Ettore Di Giacinto	2c8623dbb4	fix(python): move vllm to after deps, drop diffusers main deps Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-08-07 23:34:37 +02:00
Ettore Di Giacinto	61b5602111	fix(python): move accelerate and GPU-specific libs to build-type (#3194) Some of the dependencies in `requirements.txt`, even if generic, pull in CUDA libraries down the line. This change moves nearly all GPU-specific libs to the build-type requirements and takes a safer approach. `requirements.txt` now lists only "first-level" dependencies (for instance, grpc), while their library dependencies are moved down to the respective build-type `requirements.txt` to avoid any mixing. This should fix #2737 and #1592. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-08-07 17:02:32 +02:00

10 Commits
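
Note on c2f73a987e: the commit message describes replacing the removed AsyncLLMEngine.get_model_config path with a get_tokenizer()/tokenizer accessor, falling back to building the tokenizer from request.Model. The following is only a rough sketch of that lookup order, not the actual backend.py code. The function name `resolve_tokenizer`, the `build_from_model` callback, and the stub engine classes are hypothetical and exist purely for illustration; the sketch also assumes a synchronous `get_tokenizer()`, which may not match the real engine.

```python
from typing import Any, Callable


def resolve_tokenizer(engine: Any, build_from_model: Callable[[], Any]) -> Any:
    """Return a tokenizer for `engine`, trying accessors before a fallback.

    Hypothetical helper: it only illustrates the lookup order named in the
    commit message (get_tokenizer(), then .tokenizer, then build directly).
    """
    # 1) Prefer a get_tokenizer() method when the engine exposes one.
    getter = getattr(engine, "get_tokenizer", None)
    if callable(getter):
        tokenizer = getter()
        if tokenizer is not None:
            return tokenizer

    # 2) Otherwise look for a plain `tokenizer` attribute.
    tokenizer = getattr(engine, "tokenizer", None)
    if tokenizer is not None:
        return tokenizer

    # 3) Last resort: build the tokenizer from the model name or path.
    return build_from_model()


# Stub engines standing in for different engine versions (assumptions only).
class EngineWithGetter:
    def get_tokenizer(self) -> str:
        return "tok-from-getter"


class EngineWithAttribute:
    tokenizer = "tok-from-attribute"


class BareEngine:
    pass


if __name__ == "__main__":
    for engine in (EngineWithGetter(), EngineWithAttribute(), BareEngine()):
        print(type(engine).__name__, "->", resolve_tokenizer(engine, lambda: "tok-built"))
```

Passing the fallback as a callback keeps the expensive tokenizer construction lazy, so it runs only when neither accessor is available.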