Mirror of https://github.com/mudler/LocalAI.git, synced 2026-06-27 09:57:14 -04:00
* fix(kokoros): implement new Backend RPCs to fix the build
The backend.proto grew five RPCs (SoundDetection, Depth, TokenClassify,
Score and the bidi-streaming Forward) that the kokoros gRPC service never
implemented, so the trait impl no longer satisfies `Backend`:
error[E0046]: not all trait items implemented, missing:
`sound_detection`, `depth`, `token_classify`, `score`,
`ForwardStream`, `forward`
kokoros is a TTS backend with no use for these, so add `unimplemented`
stubs (plus the `ForwardStream` associated type) matching the existing
pattern for every other unsupported RPC in this file.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(fish-speech): add setuptools-rust for the editable source install
install.sh installs the fish-speech source tree editable with
`--no-build-isolation`, which means the build backends of its transitive
dependencies must already be present in the venv. One of them builds a
Rust extension and its metadata step fails with:
ModuleNotFoundError: No module named 'setuptools_rust'
Add setuptools-rust to requirements.txt so installRequirements provisions
it before the editable install runs.
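With `--no-build-isolation`, build dependencies are not provisioned per package. Each dependency's build backend therefore has to be importable from the venv already.

The sketch below is a hypothetical preflight helper, `check_build_backends`, which is not part of install.sh or libbackend.sh. It fails fast with the list of missing modules instead of erroring midway through metadata generation. It assumes the venv's interpreter is passed as the first argument, and it takes import names, so the `setuptools-rust` distribution is checked as `setuptools_rust`.

```bash
#!/bin/bash
# Hypothetical preflight: verify that build-backend modules are importable by
# the given interpreter before an install that uses --no-build-isolation.
# Usage: check_build_backends <python> <module>...
# Takes *import* names (setuptools_rust), not distribution names (setuptools-rust).
check_build_backends() {
    local py=$1
    shift
    local missing=()
    local mod
    for mod in "$@"; do
        # `import` exits non-zero (ModuleNotFoundError) when the module is absent.
        if ! "$py" -c "import ${mod}" >/dev/null 2>&1; then
            missing+=("$mod")
        fi
    done
    if [ "${#missing[@]}" -gt 0 ]; then
        echo "missing build backends: ${missing[*]}" >&2
        return 1
    fi
}

# Example (assumes python3 is on PATH): stdlib modules always pass.
check_build_backends python3 json os && echo "build backends present"
```

For this backend it could run right after installRequirements, e.g. `check_build_backends python setuptools_rust`. That assumes `python` resolves to the activated venv at that point.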
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(llama-cpp-quantization): vendor convert_hf_to_gguf.py with conversion/
Upstream llama.cpp split the model-specific logic out of the single
convert_hf_to_gguf.py file into a sibling `conversion/` package, so the
script now starts with `from conversion import ...`. Downloading just the
one file therefore fails at runtime with:
ModuleNotFoundError: No module named 'conversion'
Clone the repo (reusing the clone already needed to build llama-quantize)
and copy both the script and the `conversion/` package into the backend
dir. Python puts the script's own directory on sys.path[0], so the package
resolves when it sits beside the script.
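The sys.path[0] behaviour the fix relies on can be checked in isolation. The sketch below assumes `python3` is on PATH and uses made-up stand-ins for the real llama.cpp files: `convert.py` and a one-line `conversion/__init__.py`.

It runs the same script twice from an unrelated working directory:
- once with the package beside it;
- once alone, as when only the single file is downloaded.

Only the first run resolves the import. The sketch clears `PYTHONSAFEPATH`, which on Python 3.11+ would stop the script directory from being prepended.

```bash
#!/bin/bash
# Sketch: `python3 path/to/script.py` puts path/to/ on sys.path[0], so a
# package sitting beside the script is importable from any working directory.
# convert.py and conversion/ are stand-ins, not the real llama.cpp files.
sibling_import_demo() {
    local root
    root=$(mktemp -d)

    # Layout 1: script plus its sibling package.
    mkdir -p "$root/with_pkg/conversion" "$root/script_only"
    printf 'VALUE = "resolved"\n' > "$root/with_pkg/conversion/__init__.py"
    printf 'from conversion import VALUE\nprint(VALUE)\n' > "$root/with_pkg/convert.py"

    # Layout 2: the script alone, like downloading just the one file.
    cp "$root/with_pkg/convert.py" "$root/script_only/convert.py"

    # Run from / so the cwd cannot supply the package. PYTHONSAFEPATH
    # (Python 3.11+) would disable the sys.path[0] insertion, so clear it.
    local with_pkg script_only
    with_pkg=$(cd / && unset PYTHONSAFEPATH && python3 "$root/with_pkg/convert.py" 2>&1)
    script_only=$(cd / && unset PYTHONSAFEPATH && python3 "$root/script_only/convert.py" 2>&1 | tail -n 1)

    rm -rf "$root"
    echo "with package: ${with_pkg}"
    echo "script only:  ${script_only}"
}

sibling_import_demo
```

The first line reports `resolved`. The second ends with the same `ModuleNotFoundError: No module named 'conversion'` quoted above.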
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(sglang): pin the CPU source build to sglang v0.5.11
The CPU profile builds sgl-kernel from a `git clone` of sglang with no
ref, so it always tracks master. Recent master added CPU kernels (e.g.
mamba/fla.cpp) that fail to compile in our builder:
constexpr variable 'scale' must be initialized by a constant
static library kineto_LIBRARY-NOTFOUND not found
Pin the clone to v0.5.11, the same release the GPU path already floors on
(requirements-cublas12-after.txt). Overridable via SGLANG_VERSION so the
pin can be bumped deliberately.
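The `${SGLANG_VERSION:-v0.5.11}` form means the pin applies unless the caller provides a non-empty SGLANG_VERSION. A minimal sketch of that resolution follows. `resolve_sglang_ref` is a hypothetical helper, and `v9.9.9` is a placeholder, not a real sglang tag.

The resolved value is what `git clone --depth 1 --branch` receives, and `--branch` accepts a tag as well as a branch name.

```bash
#!/bin/bash
# Hypothetical helper mirroring install.sh's pin: use SGLANG_VERSION when it is
# set and non-empty, otherwise fall back to v0.5.11.
resolve_sglang_ref() {
    # ${VAR:-default} substitutes the default when VAR is unset *or* empty.
    echo "${SGLANG_VERSION:-v0.5.11}"
}

unset SGLANG_VERSION
resolve_sglang_ref                         # v0.5.11 (the pin)
SGLANG_VERSION=v9.9.9 resolve_sglang_ref   # v9.9.9 (placeholder override)
SGLANG_VERSION= resolve_sglang_ref         # v0.5.11 (empty counts as unset)
```

Because the fallback also covers the empty string, an empty SGLANG_VERSION still builds v0.5.11. A deliberate bump means exporting a real tag or editing the default.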
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
118 lines · 5.0 KiB · Bash · Executable File
#!/bin/bash
set -e

EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"

# Avoid overcommitting the CPU during builds that compile native code.
export NVCC_THREADS=2
export MAX_JOBS=1

backend_dir=$(dirname "$0")

if [ -d "$backend_dir/common" ]; then
    source "$backend_dir/common/libbackend.sh"
else
    source "$backend_dir/../common/libbackend.sh"
fi

if [ "x${BUILD_PROFILE}" == "xintel" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi

if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

# cublas12 needs a cu128 torch index (see requirements-cublas12.txt). Without
# unsafe-best-match, uv falls through to default PyPI's cu130 torch wheel and
# the resulting sgl-kernel can't load on our cu12 host libs.
if [ "x${BUILD_PROFILE}" == "xcublas12" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

# sglang 0.5.11 (Gemma 4 support) declares flash-attn-4 as a hard dep, but
# upstream only publishes pre-release wheels (4.0.0b*). uv rejects
# pre-releases by default, so opt in for sglang specifically. Drop this once
# flash-attn-4 4.0 stable lands.
EXTRA_PIP_INSTALL_FLAGS+=" --prerelease=allow"

# JetPack 7 / L4T arm64 sglang + torch wheels come straight from PyPI now
# (torch 2.11+ ships aarch64 + cu130 manylinux wheels and sglang 0.5.11+
# ships a cp312 aarch64 wheel pinned to that torch). They're cp312-only,
# so bump the venv Python accordingly.
# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
    PYTHON_VERSION="3.12"
    PYTHON_PATCH="12"
    PY_STANDALONE_TAG="20251120"
fi

# sglang's CPU path has no prebuilt wheel on PyPI; upstream publishes
# a separate pyproject_cpu.toml that must be swapped in before `pip install`.
# Reference: docker/xeon.Dockerfile in the sglang upstream repo.
#
# When BUILD_TYPE is empty (CPU profile) or FROM_SOURCE=true is forced,
# install torch/transformers/etc from requirements-cpu.txt, then clone
# sglang and install its python/ and sgl-kernel/ packages from source
# using the CPU pyproject.
if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
    # sgl-kernel's CPU build links against libnuma and libtbb. Install
    # them here (Docker builder stage) before running the source build.
    # Skipped when apt-get is missing or we are not root (e.g. outside the
    # Docker build); the libraries must then already be installed.
    if command -v apt-get >/dev/null 2>&1 && [ "$(id -u)" = "0" ]; then
        apt-get update
        DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
            libnuma-dev numactl libtbb-dev libgomp1 libomp-dev google-perftools \
            build-essential cmake ninja-build
    fi

    installRequirements

    # sgl-kernel's pyproject_cpu.toml uses scikit-build-core as its build
    # backend. With --no-build-isolation, that (and ninja/cmake) must be
    # present in the venv before we build from source.
    uv pip install --no-build-isolation "scikit-build-core>=0.10" ninja cmake

    # sgl-kernel's CPU shm.cpp uses __m512 AVX-512 intrinsics unconditionally.
    # csrc/cpu/CMakeLists.txt hard-codes add_compile_options(-march=native),
    # which on runners without AVX-512 in /proc/cpuinfo fails with
    # "__m512 return without 'avx512f' enabled changes the ABI".
    # CXXFLAGS alone is insufficient because CMake's add_compile_options()
    # appends -march=native *after* CXXFLAGS, overriding it.
    # We therefore patch the CMakeLists.txt to replace -march=native with
    # -march=sapphirerapids so the flag is consistent throughout the build.
    # The resulting binary still requires an AVX-512 capable CPU at runtime,
    # same constraint sglang upstream documents in docker/xeon.Dockerfile.

    # Pin the source build to the same release the GPU path floors on
    # (0.5.11, see requirements-cublas12-after.txt). An unpinned master clone
    # pulls in newer CPU kernels (e.g. mamba/fla.cpp) that fail to compile
    # (constexpr non-constant + kineto_LIBRARY-NOTFOUND). Bump deliberately.
    SGLANG_VERSION="${SGLANG_VERSION:-v0.5.11}"
    _sgl_src=$(mktemp -d)
    trap 'rm -rf "${_sgl_src}"' EXIT
    git clone --depth 1 --branch "${SGLANG_VERSION}" \
        https://github.com/sgl-project/sglang "${_sgl_src}/sglang"

    # Patch -march=native → -march=sapphirerapids in the CPU kernel CMakeLists
    sed -i 's/-march=native/-march=sapphirerapids/g' \
        "${_sgl_src}/sglang/sgl-kernel/csrc/cpu/CMakeLists.txt"

    pushd "${_sgl_src}/sglang/sgl-kernel"
    if [ -f pyproject_cpu.toml ]; then
        cp pyproject_cpu.toml pyproject.toml
    fi
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} .
    popd

    pushd "${_sgl_src}/sglang/python"
    if [ -f pyproject_cpu.toml ]; then
        cp pyproject_cpu.toml pyproject.toml
    fi
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} .
    popd
else
    installRequirements
fi
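The `-march=native` rewrite in the script is a plain textual substitution, and `sed` exits 0 even when nothing matches. If upstream ever respells or moves the flag, the patch silently does nothing and the AVX-512 ABI error comes back.

A minimal guarded variant is sketched below. `patch_march` is a hypothetical helper, and the CMake line it patches is invented for the demo. It assumes GNU sed (`-i` with no suffix argument; BSD sed would need `-i ''`).

```bash
#!/bin/bash
# Hypothetical guarded version of install.sh's sed patch: rewrite
# -march=native to -march=sapphirerapids in place, then verify the result.
# Assumes GNU sed (-i without a suffix argument).
patch_march() {
    local cmake_file=$1
    sed -i 's/-march=native/-march=sapphirerapids/g' "$cmake_file"
    # sed exits 0 even when nothing matched, so check the result explicitly.
    if ! grep -q -- '-march=sapphirerapids' "$cmake_file"; then
        echo "patch_march: no -march=sapphirerapids in ${cmake_file} after patching" >&2
        return 1
    fi
}

# Demo on a throwaway file with an invented CMake line.
demo=$(mktemp)
printf 'add_compile_options(-march=native)\n' > "$demo"
patch_march "$demo" && cat "$demo"   # add_compile_options(-march=sapphirerapids)
rm -f "$demo"
```

Failing closed when nothing was patched is a judgment call. A warning would let builds continue on an upstream layout that no longer needs the patch.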