mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 09:57:14 -04:00
* fix(kokoros): implement new Backend RPCs to fix the build
The backend.proto grew five RPCs (SoundDetection, Depth, TokenClassify,
Score and the bidi-streaming Forward) that the kokoros gRPC service never
implemented, so the trait impl no longer satisfies `Backend`:
    error[E0046]: not all trait items implemented, missing:
    `sound_detection`, `depth`, `token_classify`, `score`,
    `ForwardStream`, `forward`
kokoros is a TTS backend with no use for these, so add `unimplemented`
stubs (plus the `ForwardStream` associated type) matching the existing
pattern for every other unsupported RPC in this file.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(fish-speech): add setuptools-rust for the editable source install
install.sh installs the fish-speech source tree editable with
`--no-build-isolation`, which means the build backends of its transitive
dependencies must already be present in the venv. One of them builds a
Rust extension and its metadata step fails with:
    ModuleNotFoundError: No module named 'setuptools_rust'
Add setuptools-rust to requirements.txt so installRequirements provisions
it before the editable install runs.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(llama-cpp-quantization): vendor convert_hf_to_gguf.py with conversion/
Upstream llama.cpp split the model-specific logic out of the single
convert_hf_to_gguf.py file into a sibling `conversion/` package, so the
script now starts with `from conversion import ...`. Downloading just the
one file therefore fails at runtime with:
    ModuleNotFoundError: No module named 'conversion'
Clone the repo (reusing the clone already needed to build llama-quantize)
and copy both the script and the `conversion/` package into the backend
dir. Python puts the script's own directory on sys.path[0], so the package
resolves when it sits beside the script.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
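The vendoring step described in this fix can be sketched as a small function. This is a minimal illustration, not the code in install.sh: `vendor_converter`, `demo_root` and the fake checkout it builds are hypothetical names used only for the demo. It shows why the stale `conversion/` copy is removed before copying: `cp -r` into an existing directory would nest the package instead of replacing it.

```shell
# vendor_converter SRC DST (hypothetical helper): copy the converter script and
# its sibling conversion/ package from a llama.cpp checkout into a backend dir.
vendor_converter() {
    local src="$1" dst="$2"
    [ -f "${src}/convert_hf_to_gguf.py" ] || { echo "missing script in ${src}" >&2; return 1; }
    [ -d "${src}/conversion" ] || { echo "missing conversion/ in ${src}" >&2; return 1; }
    mkdir -p "${dst}"
    cp "${src}/convert_hf_to_gguf.py" "${dst}/convert_hf_to_gguf.py"
    # Remove any stale copy first: `cp -r` into an existing directory would
    # create conversion/conversion instead of replacing the package.
    rm -rf "${dst}/conversion"
    cp -r "${src}/conversion" "${dst}/conversion"
}

# Demo against a throwaway fake checkout (empty stand-in files, no network).
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/src/conversion"
: > "${demo_root}/src/convert_hf_to_gguf.py"
: > "${demo_root}/src/conversion/__init__.py"

vendor_converter "${demo_root}/src" "${demo_root}/backend"
vendor_converter "${demo_root}/src" "${demo_root}/backend"   # second run must not nest
ls "${demo_root}/backend"
```

Running the function twice is deliberate: the second call exercises the replace-rather-than-nest behaviour that a re-run of an install script depends on.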
* fix(sglang): pin the CPU source build to sglang v0.5.11
The CPU profile builds sgl-kernel from a `git clone` of sglang with no
ref, so it always tracks master. Recent master added CPU kernels (e.g.
mamba/fla.cpp) that fail to compile in our builder:
    constexpr variable 'scale' must be initialized by a constant
    static library kineto_LIBRARY-NOTFOUND not found
Pin the clone to v0.5.11, the same release the GPU path already floors on
(requirements-cublas12-after.txt). Overridable via SGLANG_VERSION so the
pin can be bumped deliberately.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
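An overridable pin of this kind is usually written with a default parameter expansion. The sketch below is an assumption about the shape of the change, not the actual build script: `sglang_ref` and `sglang_clone_cmd` are hypothetical helpers, and the repository URL and `--depth 1` are assumed rather than taken from the commit. The command is printed instead of executed so the example has no network side effects.

```shell
# Hypothetical helper: resolve the ref to build from. An unset or empty
# SGLANG_VERSION falls back to the pinned release.
sglang_ref() {
    echo "${SGLANG_VERSION:-v0.5.11}"
}

# Hypothetical helper: print the clone command for a destination directory.
# The URL and --depth 1 are assumptions made for this sketch.
sglang_clone_cmd() {
    echo "git clone --depth 1 --branch $(sglang_ref) https://github.com/sgl-project/sglang $1"
}

sglang_clone_cmd sglang
```

Setting `SGLANG_VERSION` in the environment of the build then changes the ref without editing the script, which is what makes a later bump a deliberate act.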
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
71 lines
3.1 KiB
Bash
Executable File
#!/bin/bash
set -e

backend_dir=$(dirname "$0")
if [ -d "$backend_dir/common" ]; then
    source "$backend_dir/common/libbackend.sh"
else
    source "$backend_dir/../common/libbackend.sh"
fi

EXTRA_PIP_INSTALL_FLAGS+=" --upgrade "
installRequirements

# Fetch convert_hf_to_gguf.py from llama.cpp.
# Upstream split the model-specific logic out of the single file into a
# sibling `conversion/` package (convert_hf_to_gguf.py now does
# `from conversion import ...`), so a single-file download no longer runs:
# it fails with `ModuleNotFoundError: No module named 'conversion'`. We clone
# the repo and copy both the script and the package; Python puts the script's
# own directory on sys.path[0], so the package resolves when placed beside it.
LLAMA_CPP_CONVERT_VERSION="${LLAMA_CPP_CONVERT_VERSION:-master}"
LLAMA_CPP_SRC="${EDIR}/llama.cpp"
CONVERT_SCRIPT="${EDIR}/convert_hf_to_gguf.py"

cloneLlamaCpp() {
    if [ ! -d "${LLAMA_CPP_SRC}/.git" ]; then
        # Try the requested ref first (--branch accepts a branch or tag name);
        # if that clone fails, fall back to the default branch.
        git clone --depth 1 --branch "${LLAMA_CPP_CONVERT_VERSION}" \
            https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}" 2>/dev/null || \
        git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}"
    fi
}

if [ ! -f "${CONVERT_SCRIPT}" ] || [ ! -d "${EDIR}/conversion" ]; then
    echo "Fetching convert_hf_to_gguf.py + conversion/ from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
    cloneLlamaCpp
    cp "${LLAMA_CPP_SRC}/convert_hf_to_gguf.py" "${CONVERT_SCRIPT}"
    # Drop any stale copy first so cp -r replaces the package instead of nesting it.
    rm -rf "${EDIR}/conversion"
    cp -r "${LLAMA_CPP_SRC}/conversion" "${EDIR}/conversion"
fi

# Install the gguf package from the same llama.cpp ref to keep them in sync
GGUF_PIP_SPEC="gguf @ git+https://github.com/ggml-org/llama.cpp@${LLAMA_CPP_CONVERT_VERSION}#subdirectory=gguf-py"
echo "Installing gguf package from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
if [ "x${USE_PIP:-}" == "xtrue" ]; then
    pip install "${GGUF_PIP_SPEC}" || {
        echo "Warning: Failed to install gguf from llama.cpp commit, falling back to PyPI..."
        pip install "gguf>=0.16.0"
    }
else
    uv pip install "${GGUF_PIP_SPEC}" || {
        echo "Warning: Failed to install gguf from llama.cpp commit, falling back to PyPI..."
        uv pip install "gguf>=0.16.0"
    }
fi

# Build llama-quantize from llama.cpp if not already present
QUANTIZE_BIN="${EDIR}/llama-quantize"
if [ ! -x "${QUANTIZE_BIN}" ] && ! command -v llama-quantize &>/dev/null; then
    if command -v cmake &>/dev/null; then
        echo "Building llama-quantize from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
        cloneLlamaCpp  # reuses the clone fetched for convert_hf_to_gguf.py
        cmake -B "${LLAMA_CPP_SRC}/build" -S "${LLAMA_CPP_SRC}" -DGGML_NATIVE=OFF -DBUILD_SHARED_LIBS=OFF
        cmake --build "${LLAMA_CPP_SRC}/build" --target llama-quantize -j"$(nproc 2>/dev/null || echo 2)"
        cp "${LLAMA_CPP_SRC}/build/bin/llama-quantize" "${QUANTIZE_BIN}"
        chmod +x "${QUANTIZE_BIN}"
        echo "Built llama-quantize at ${QUANTIZE_BIN}"
    else
        echo "Warning: cmake not found; llama-quantize will not be available. Install cmake or provide llama-quantize on PATH."
    fi
fi
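The script repeats the same install-then-fall-back logic once for `pip` and once for `uv pip`. One possible way to factor that out is sketched below. It is an illustration only, under the assumption that both installers accept the same `install <spec>` arguments; `select_installer`, `install_with_fallback` and `fake_installer` are hypothetical names, and the fake installer exists purely so the fallback path can be exercised without touching the network.

```shell
# Hypothetical helper: pick the installer the same way the script does.
select_installer() {
    if [ "${USE_PIP:-}" = "true" ]; then
        echo "pip"
    else
        echo "uv pip"
    fi
}

# install_with_fallback INSTALLER PRIMARY FALLBACK (hypothetical helper).
install_with_fallback() {
    local installer="$1" primary="$2" fallback="$3"
    # ${installer} is intentionally unquoted so "uv pip" splits into two words.
    if ! ${installer} install "${primary}"; then
        echo "Warning: failed to install ${primary}, falling back to ${fallback}" >&2
        ${installer} install "${fallback}"
    fi
}

# Dry-run stand-in for a real installer: rejects git+ specs, accepts the rest.
fake_installer() {
    # $1 is the literal word "install"; $2 is the requirement spec.
    case "$2" in
        *git+*) return 1 ;;
        *) echo "installed $2" ;;
    esac
}

fallback_result="$(install_with_fallback fake_installer \
    "gguf @ git+https://github.com/ggml-org/llama.cpp@master#subdirectory=gguf-py" \
    "gguf>=0.16.0" 2>/dev/null)"
echo "${fallback_result}"
```

In real use the first argument would be `"$(select_installer)"`. Wrapping the primary install in `if !` keeps the fallback working under `set -e`, the same effect the script gets from `|| { ... }`.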