LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-01 11:56:57 -04:00


Richard Palethorpe 16b2d4c807 fix(python-backend): make JIT subprocesses work on hosts of any size (#9679)

Two related runtime fixes for Python backends that JIT-compile CUDA
kernels at first model load (FlashInfer, PyTorch inductor, triton):

1. libbackend.sh: replace `source ${EDIR}/venv/bin/activate` with a
   minimal manual setup (_activateVenv: export VIRTUAL_ENV, prepend
   PATH, unset PYTHONHOME) computed from $EDIR at runtime. `uv venv`
   and `python -m venv` both bake the create-time absolute path into
   bin/activate (e.g. VIRTUAL_ENV='/vllm/venv' from the Docker build
   stage), so sourcing activate on a relocated venv — copied out of
   the build container and unpacked at an arbitrary backend dir —
   prepends a stale, non-existent path to $PATH. Pip-installed CLI
   tools (e.g. ninja, used by FlashInfer's NVFP4 GEMM JIT) are then
   never found and the load aborts with FileNotFoundError. Doing the
   env setup ourselves matches what `uv run` does internally and
   sidesteps the relocation problem entirely. Generic — every Python
   backend benefits.

2. vllm/run.sh: replace ninja's default -j$(nproc)+2 with an adaptive
   MAX_JOBS = min(nproc, (MemAvailable-4)/4). Each concurrent
   nvcc/cudafe++ peaks at multiple GiB; the default OOM-kills on
   memory-tight hosts (e.g. a 16 GiB desktop loading a 27B NVFP4
   model) but underutilises 100-core / 1 TB boxes. User-set MAX_JOBS
   still wins. Also pin NVCC_THREADS=2 unless overridden.
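
The two fixes above can be sketched in POSIX shell as follows. This is an
illustrative sketch only, not the code from libbackend.sh or vllm/run.sh: the
function names (activate_venv_sketch, compute_max_jobs_sketch,
configure_jit_env_sketch) are hypothetical, the formula is assumed to work in
whole GiB, and the floor of one job is an added assumption that the commit
message does not state.

```shell
# Hypothetical stand-in for _activateVenv; the real function may differ.
# Sets up the venv from the runtime backend directory instead of sourcing
# bin/activate, which carries the create-time absolute path.
activate_venv_sketch() {
    # $1: backend directory as resolved at runtime (EDIR in the commit message)
    VIRTUAL_ENV="$1/venv"
    PATH="$VIRTUAL_ENV/bin:$PATH"
    export VIRTUAL_ENV PATH
    unset PYTHONHOME
}

# Prints min(cores, (mem_gib - 4) / 4), assuming both memory terms are GiB.
compute_max_jobs_sketch() {
    # $1: core count, $2: available memory in whole GiB
    _cmj_by_mem=$(( ($2 - 4) / 4 ))
    # Assumed floor: never go below one job on very small hosts.
    if [ "$_cmj_by_mem" -lt 1 ]; then
        _cmj_by_mem=1
    fi
    if [ "$_cmj_by_mem" -lt "$1" ]; then
        echo "$_cmj_by_mem"
    else
        echo "$1"
    fi
}

# Applies the limits without overriding values the user already exported.
# Linux-only: reads MemAvailable (reported in kB) from /proc/meminfo.
configure_jit_env_sketch() {
    if [ -z "${MAX_JOBS:-}" ]; then
        _cje_mem_gib=$(awk '/^MemAvailable:/ {print int($2 / 1048576)}' /proc/meminfo)
        MAX_JOBS=$(compute_max_jobs_sketch "$(nproc)" "$_cje_mem_gib")
        export MAX_JOBS
    fi
    NVCC_THREADS="${NVCC_THREADS:-2}"
    export NVCC_THREADS
}
```

Under these assumptions, a host with 8 cores and 12 GiB available gets
MAX_JOBS=2, while a 100-core host with 1024 GiB available is capped by its
core count at MAX_JOBS=100.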

Refs: https://github.com/vllm-project/vllm/issues/20079

Assisted-by: Claude:claude-opus-4-7 [Edit] [Bash]

2026-05-06 00:28:01 +02:00

template

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

grpc_auth.py

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

libbackend.sh

fix(python-backend): make JIT subprocesses work on hosts of any size (#9679)

2026-05-06 00:28:01 +02:00

mlx_utils.py

feat: refactor shared helpers and enhance MLX backend functionality (#9335)