* refactor(backends): extract python_utils + add mlx_utils shared helpers
Move parse_options() and messages_to_dicts() out of vllm_utils.py into a
new framework-agnostic python_utils.py, and re-export them from vllm_utils
so existing vllm / vllm-omni imports keep working.
Add mlx_utils.py with split_reasoning() and parse_tool_calls() — ported
from mlx_vlm/server.py's process_tool_calls. These work with any
mlx-lm / mlx-vlm tool module (anything exposing tool_call_start,
tool_call_end, parse_tool_call). Used by the mlx and mlx-vlm backends in
later commits to emit structured ChatDelta.tool_calls without
reimplementing per-model parsing.
Shared smoke tests confirm:
- parse_options round-trips bool/int/float/string
- vllm_utils re-exports are identity-equal to python_utils originals
- mlx_utils parse_tool_calls handles <tool_call>...</tool_call> with a
shim module and produces a correctly-indexed list with JSON arguments
- mlx_utils split_reasoning extracts <think> blocks and leaves clean
content
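For illustration, a minimal usage sketch against a shim tool module (the
import path and exact helper signatures below are assumptions inferred
from this description, not the published API):

    import json
    from types import SimpleNamespace

    from mlx_utils import split_reasoning, parse_tool_calls  # assumed import path

    # Anything exposing tool_call_start / tool_call_end / parse_tool_call
    # is accepted as a "tool module".
    shim = SimpleNamespace(
        tool_call_start="<tool_call>",
        tool_call_end="</tool_call>",
        parse_tool_call=lambda body: json.loads(body),
    )

    raw = ('<think>look up the weather first</think>'
           '<tool_call>{"name": "get_weather", "arguments": {"city": "Rome"}}</tool_call>')

    reasoning, text = split_reasoning(raw, "<think>", "</think>")  # assumed signature
    content, tool_calls = parse_tool_calls(text, shim)             # assumed signature
    # tool_calls: a correctly-indexed list with JSON arguments, ready for
    # ChatDelta.tool_calls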
* feat(mlx): wire native tool parsers + ChatDelta + token usage + logprobs
Bring the MLX backend up to the same structured-output contract as vLLM
and llama.cpp: emit Reply.chat_deltas so the OpenAI HTTP layer sees
tool_calls and reasoning_content, not just raw text.
Key insight: mlx_lm.load() returns a TokenizerWrapper that already auto-
detects the right tool parser from the model's chat template
(_infer_tool_parser in mlx_lm/tokenizer_utils.py). The wrapper exposes
has_tool_calling, has_thinking, tool_parser, tool_call_start,
tool_call_end, think_start, think_end — no user configuration needed,
unlike vLLM.
Changes in backend/python/mlx/backend.py:
- Imports: replace inline parse_options / messages_to_dicts with the
shared helpers from python_utils. Pull split_reasoning / parse_tool_calls
from the new mlx_utils shared module.
- LoadModel: log the auto-detected has_tool_calling / has_thinking /
tool_parser_type for observability. Drop the local is_float / is_int
duplicates.
- _prepare_prompt: run request.Messages through messages_to_dicts so
tool_call_id / tool_calls / reasoning_content survive the conversion,
and pass tools=json.loads(request.Tools) + enable_thinking=True (when
request.Metadata says so) to apply_chat_template. Falls back on
TypeError for tokenizers whose template doesn't accept those kwargs.
- _build_generation_params: return an additional (logits_params,
stop_words) pair. Maps RepetitionPenalty / PresencePenalty /
FrequencyPenalty to mlx_lm.sample_utils.make_logits_processors and
threads StopPrompts through to post-decode truncation.
- New _tool_module_from_tokenizer / _finalize_output / _truncate_at_stop
helpers. _finalize_output runs split_reasoning when has_thinking is
true and parse_tool_calls (using a SimpleNamespace shim around the
wrapper's tool_parser callable) when has_tool_calling is true, then
extracts prompt_tokens, generation_tokens and (best-effort) logprobs
from the last GenerationResponse chunk.
- Predict: use make_logits_processors, accumulate text + last_response,
finalize into a structured Reply carrying chat_deltas,
prompt_tokens, tokens, logprobs. Early-stops on user stop sequences.
- PredictStream: per-chunk Reply still carries raw message bytes for
back-compat but now also emits chat_deltas=[ChatDelta(content=delta)].
On loop exit, emit a terminal Reply with structured
reasoning_content / tool_calls / token counts / logprobs — so the Go
side sees tool calls without needing the regex fallback.
- TokenizeString RPC: uses the TokenizerWrapper's encode(); returns
length + tokens or FAILED_PRECONDITION if the model isn't loaded.
- Free RPC: drops model / tokenizer / lru_cache, runs gc.collect(),
calls mx.metal.clear_cache() when available, and clears the torch.cuda
cache on a best-effort basis as a belt-and-suspenders measure.
* feat(mlx-vlm): mirror MLX parity (tool parsers + ChatDelta + samplers)
Same treatment as the MLX backend: emit structured Reply.chat_deltas,
tool_calls, reasoning_content, token counts and logprobs, and extend
sampling parameter coverage beyond the temp/top_p pair the backend
used to handle.
- Imports: drop the inline is_float/is_int helpers, pull parse_options /
messages_to_dicts from python_utils and split_reasoning /
parse_tool_calls from mlx_utils. Also import make_sampler and
make_logits_processors from mlx_lm.sample_utils — mlx-vlm re-uses them.
- LoadModel: use parse_options; call mlx_vlm.tool_parsers._infer_tool_parser
/ load_tool_module to auto-detect a tool module from the processor's
chat_template. Stash think_start / think_end / has_thinking so later
finalisation can split reasoning blocks without duck-typing on each
call. Logs the detected parser type.
- _prepare_prompt: convert proto Messages via messages_to_dicts (so
tool_call_id / tool_calls survive), pass tools=json.loads(request.Tools)
and enable_thinking=True to apply_chat_template when present, fall
back on TypeError for older mlx-vlm versions. Also handle the
prompt-only + media and empty-prompt + media paths consistently.
- _build_generation_params: return (max_tokens, sampler_params,
logits_params, stop_words). Maps repetition_penalty / presence_penalty /
frequency_penalty and passes them through make_logits_processors.
- _finalize_output / _truncate_at_stop: common helper used by Predict
and PredictStream to split reasoning, run parse_tool_calls against the
auto-detected tool module, build ToolCallDelta list, and extract token
counts + logprobs from the last GenerationResult.
- Predict / PredictStream: switch from mlx_vlm.generate to mlx_vlm.stream_generate
in both paths, accumulate text + last_response, pass sampler and
logits_processors through, emit content-only ChatDelta per streaming
chunk followed by a terminal Reply carrying reasoning_content,
tool_calls, prompt_tokens, tokens and logprobs. Non-streaming Predict
returns the same structured Reply shape.
- New helper _collect_media extracted from the duplicated base64 image /
audio decode loop.
- New TokenizeString RPC using the processor's tokenizer.encode and
Free RPC that drops model/processor/config, runs gc + Metal cache
clear + best-effort torch.cuda cache clear.
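For reference, a minimal sketch of what the _truncate_at_stop helper
described above could look like (illustrative only):

    def _truncate_at_stop(text, stop_words):
        # Cut the decoded text at the earliest user-supplied stop sequence, if any.
        cut = len(text)
        for stop in stop_words or []:
            idx = text.find(stop)
            if idx != -1:
                cut = min(cut, idx)
        return text[:cut]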
* feat(importer/mlx): auto-set tool_parser/reasoning_parser on import
Mirror what core/gallery/importers/vllm.go does: after applying the
shared inference defaults, look up the model URI in parser_defaults.json
and append matching tool_parser:/reasoning_parser: entries to Options.
The MLX backends auto-detect tool parsers from the chat template at
runtime so they don't actually consume these options — but surfacing
them in the generated YAML:
- keeps the import experience consistent with vllm
- gives users a single visible place to override
- documents the intended parser for a given model family
* test(mlx): add helper unit tests + TokenizeString/Free + e2e make targets
- backend/python/mlx/test.py: add TestSharedHelpers with server-less
unit tests for parse_options, messages_to_dicts, split_reasoning and
parse_tool_calls (using a SimpleNamespace shim to fake a tool module
without requiring a model). Plus test_tokenize_string and test_free
RPC tests that load a tiny MLX-quantized Llama and exercise the new
RPCs end-to-end.
- backend/python/mlx-vlm/test.py: same helper unit tests + cleanup of
the duplicated import block at the top of the file.
- Makefile: register BACKEND_MLX and BACKEND_MLX_VLM (they were missing
from the docker-build-target eval list — only mlx-distributed had a
generated target before). Add test-extra-backend-mlx and
test-extra-backend-mlx-vlm convenience targets that build the
respective image and run tests/e2e-backends with the tools capability
against mlx-community/Qwen2.5-0.5B-Instruct-4bit. The MLX backend
auto-detects the tool parser from the chat template so no
BACKEND_TEST_OPTIONS is needed (unlike vllm).
* fix(libbackend): don't pass --copies to venv unless PORTABLE_PYTHON=true
backend/python/common/libbackend.sh:ensureVenv() always invoked
'python -m venv --copies', but macOS system python (and some other
builds) refuses with:
Error: This build of python cannot create venvs without using symlinks
--copies only matters when _makeVenvPortable later relocates the venv,
which only happens when PORTABLE_PYTHON=true. Make --copies conditional
on that flag and fall back to default (symlinked) venv otherwise.
Caught while bringing up the mlx backend on Apple Silicon — the same
build path is used by every Python backend with USE_PIP=true.
* fix(mlx): support mlx-lm 0.29.x tool calling + drop deprecated clear_cache
The released mlx-lm 0.29.x ships a much simpler tool-calling API than
HEAD: TokenizerWrapper detects the <tool_call>...</tool_call> markers
from the tokenizer vocab and exposes has_tool_calling /
tool_call_start / tool_call_end, but does NOT expose a tool_parser
callable on the wrapper and does NOT ship a mlx_lm.tool_parsers
subpackage at all (those only exist on main).
Caught while running the smoke test on Apple Silicon with the
released mlx-lm 0.29.1: tokenizer.tool_parser raised AttributeError
(falling through to the underlying HF tokenizer), so
_tool_module_from_tokenizer always returned None and tool calls slipped
through as raw <tool_call>...</tool_call> text in Reply.message instead
of being parsed into ChatDelta.tool_calls.
Fix: when has_tool_calling is True but tokenizer.tool_parser is missing,
default the parse_tool_call callable to json.loads(body.strip()) — that's
exactly what mlx_lm.tool_parsers.json_tools.parse_tool_call does on HEAD
and covers the only format 0.29 detects (<tool_call>JSON</tool_call>).
Future mlx-lm releases that ship more parsers will be picked up
automatically via the tokenizer.tool_parser attribute when present.
Also tighten the LoadModel logging — the old log line read
init_kwargs.get('tool_parser_type') which doesn't exist on 0.29 and
showed None even when has_tool_calling was True. Log the actual
tool_call_start / tool_call_end markers instead.
While here, switch Free()'s Metal cache clear from the deprecated
mx.metal.clear_cache to mx.clear_cache (mlx >= 0.30), with a
fallback for older releases. Mirrored to the mlx-vlm backend.
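A hedged sketch of that fallback (names follow this description; the
actual backend code may differ in detail):

    import json
    from types import SimpleNamespace

    def _tool_module_from_tokenizer(tokenizer):
        if not getattr(tokenizer, "has_tool_calling", False):
            return None
        # mlx-lm main exposes tool_parser on the wrapper; 0.29.x does not.
        parse = getattr(tokenizer, "tool_parser", None)
        if parse is None:
            def parse(body):
                # what mlx_lm.tool_parsers.json_tools.parse_tool_call does on HEAD
                return json.loads(body.strip())
        return SimpleNamespace(
            tool_call_start=tokenizer.tool_call_start,
            tool_call_end=tokenizer.tool_call_end,
            parse_tool_call=parse,
        )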
* feat(mlx-distributed): mirror MLX parity (tool calls + ChatDelta + sampler)
Same treatment as the mlx and mlx-vlm backends: emit Reply.chat_deltas
with structured tool_calls / reasoning_content / token counts /
logprobs, expand sampling parameter coverage beyond temp+top_p, and
add the missing TokenizeString and Free RPCs.
Notes specific to mlx-distributed:
- Rank 0 is the only rank that owns a sampler — workers participate in
the pipeline-parallel forward pass via mx.distributed and don't
re-implement sampling. So the new logits_params (repetition_penalty,
presence_penalty, frequency_penalty) and stop_words apply on rank 0
only; we don't need to extend coordinator.broadcast_generation_params,
which still ships only max_tokens / temperature / top_p to workers
(everything else is a rank-0 concern).
- Free() now broadcasts CMD_SHUTDOWN to workers when a coordinator is
active, so they release the model on their end too. The constant is
already defined and handled by the existing worker loop in
backend.py:633 (CMD_SHUTDOWN = -1).
- Drop the locally-defined is_float / is_int / parse_options trio in
favor of python_utils.parse_options, re-exported under the module
name for back-compat with anything that imported it directly.
- _prepare_prompt: route through messages_to_dicts so tool_call_id /
tool_calls / reasoning_content survive, pass tools=json.loads(
request.Tools) and enable_thinking=True to apply_chat_template, fall
back on TypeError for templates that don't accept those kwargs.
- New _tool_module_from_tokenizer (with the json.loads fallback for
mlx-lm 0.29.x), _finalize_output, _truncate_at_stop helpers — same
contract as the mlx backend.
- LoadModel logs the auto-detected has_tool_calling / has_thinking /
tool_call_start / tool_call_end so users can see what the wrapper
picked up for the loaded model.
- backend/python/mlx-distributed/test.py: add the same TestSharedHelpers
unit tests (parse_options, messages_to_dicts, split_reasoning,
parse_tool_calls) that exist for mlx and mlx-vlm.
backend/python/common/libbackend.sh (545 lines, 18 KiB, Bash)
#!/usr/bin/env bash
set -euo pipefail

#
# use the library by adding the following line to a script:
# source $(dirname $0)/../common/libbackend.sh
#
# If you want to limit what targets a backend can be used on, set the variable LIMIT_TARGETS to a
# space separated list of valid targets BEFORE sourcing the library, for example to only allow a backend
# to be used on CUDA and CPU backends:
#
# LIMIT_TARGETS="cublas cpu"
# source $(dirname $0)/../common/libbackend.sh
#
# You can use any valid BUILD_TYPE or BUILD_PROFILE, if you need to limit a backend to CUDA 12 only:
#
# LIMIT_TARGETS="cublas12"
# source $(dirname $0)/../common/libbackend.sh
#
# You can switch between uv (conda-like) and pip installation methods by setting USE_PIP:
# USE_PIP=true source $(dirname $0)/../common/libbackend.sh
#
# ===================== user-configurable defaults =====================
PYTHON_VERSION="${PYTHON_VERSION:-3.10}" # e.g. 3.10 / 3.11 / 3.12 / 3.13
PYTHON_PATCH="${PYTHON_PATCH:-18}" # e.g. 18 -> 3.10.18 ; 13 -> 3.11.13
PY_STANDALONE_TAG="${PY_STANDALONE_TAG:-20250818}" # release tag date
# Enable/disable bundling of a portable Python build
PORTABLE_PYTHON="${PORTABLE_PYTHON:-false}"

# If you want to fully pin the filename (including tuned CPU targets), set:
# PORTABLE_PY_FILENAME="cpython-3.10.18+20250818-x86_64_v3-unknown-linux-gnu-install_only.tar.gz"
: "${PORTABLE_PY_FILENAME:=}"
: "${PORTABLE_PY_SHA256:=}" # optional; if set we verify the download
# =====================================================================

# Default to uv if USE_PIP is not set
if [ "x${USE_PIP:-}" == "x" ]; then
    USE_PIP=false
fi

# ----------------------- helpers -----------------------
function _is_musl() {
    # detect musl (Alpine, etc)
    if command -v ldd >/dev/null 2>&1; then
        ldd --version 2>&1 | grep -qi musl && return 0
    fi
    # busybox-ish fallback
    if command -v getconf >/dev/null 2>&1; then
        getconf GNU_LIBC_VERSION >/dev/null 2>&1 || return 0
    fi
    return 1
}

function _triple() {
    local os="" arch="" libc="gnu"
    case "$(uname -s)" in
        Linux*) os="unknown-linux" ;;
        Darwin*) os="apple-darwin" ;;
        MINGW*|MSYS*|CYGWIN*) os="pc-windows-msvc" ;; # best-effort for Git Bash
        *) echo "Unsupported OS $(uname -s)"; exit 1;;
    esac

    case "$(uname -m)" in
        x86_64) arch="x86_64" ;;
        aarch64|arm64) arch="aarch64" ;;
        armv7l) arch="armv7" ;;
        i686|i386) arch="i686" ;;
        ppc64le) arch="ppc64le" ;;
        s390x) arch="s390x" ;;
        riscv64) arch="riscv64" ;;
        *) echo "Unsupported arch $(uname -m)"; exit 1;;
    esac

    if [[ "$os" == "unknown-linux" ]]; then
        if _is_musl; then
            libc="musl"
        else
            libc="gnu"
        fi
        echo "${arch}-${os}-${libc}"
    else
        echo "${arch}-${os}"
    fi
}

function _portable_dir() {
    echo "${EDIR}/python"
}

function _portable_bin() {
    # python-build-standalone puts python in ./bin
    echo "$(_portable_dir)/bin"
}

function _portable_python() {
    if [ -x "$(_portable_bin)/python3" ]; then
        echo "$(_portable_bin)/python3"
    else
        echo "$(_portable_bin)/python"
    fi
}

# macOS loader env for the portable CPython
_macosPortableEnv() {
    if [ "$(uname -s)" = "Darwin" ]; then
        export DYLD_LIBRARY_PATH="$(_portable_dir)/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}"
        export DYLD_FALLBACK_LIBRARY_PATH="$(_portable_dir)/lib${DYLD_FALLBACK_LIBRARY_PATH:+:${DYLD_FALLBACK_LIBRARY_PATH}}"
    fi
}

# Good hygiene on macOS for downloaded/extracted trees
_unquarantinePortablePython() {
    if [ "$(uname -s)" = "Darwin" ]; then
        command -v xattr >/dev/null 2>&1 && xattr -dr com.apple.quarantine "$(_portable_dir)" || true
    fi
}
# ------------------ ### PORTABLE PYTHON ------------------
function ensurePortablePython() {
    local pdir="$(_portable_dir)"
    local pbin="$(_portable_bin)"
    local pyexe

    if [ -x "${pbin}/python3" ] || [ -x "${pbin}/python" ]; then
        _macosPortableEnv
        return 0
    fi

    mkdir -p "${pdir}"
    local triple="$(_triple)"

    local full_ver="${PYTHON_VERSION}.${PYTHON_PATCH}"
    local fn=""
    if [ -n "${PORTABLE_PY_FILENAME}" ]; then
        fn="${PORTABLE_PY_FILENAME}"
    else
        # generic asset name: cpython-<full_ver>+<tag>-<triple>-install_only.tar.gz
        fn="cpython-${full_ver}+${PY_STANDALONE_TAG}-${triple}-install_only.tar.gz"
    fi

    local url="https://github.com/astral-sh/python-build-standalone/releases/download/${PY_STANDALONE_TAG}/${fn}"
    local tmp="${pdir}/${fn}"
    echo "Downloading portable Python: ${fn}"
    # curl with retries; fall back to wget if needed
    if command -v curl >/dev/null 2>&1; then
        curl -L --fail --retry 3 --retry-delay 1 -o "${tmp}" "${url}"
    else
        wget -O "${tmp}" "${url}"
    fi

    if [ -n "${PORTABLE_PY_SHA256}" ]; then
        echo "${PORTABLE_PY_SHA256} ${tmp}" | sha256sum -c -
    fi

    echo "Extracting ${fn} -> ${pdir}"
    # always a .tar.gz (we purposely choose install_only)
    tar -xzf "${tmp}" -C "${pdir}"
    rm -f "${tmp}"

    # Some archives nest a directory; if so, flatten to ${pdir}
    # Find the first dir with a 'bin/python*'
    local inner
    inner="$(find "${pdir}" -type f -path "*/bin/python*" -maxdepth 3 2>/dev/null | head -n1 || true)"
    if [ -n "${inner}" ]; then
        local inner_root
        inner_root="$(dirname "$(dirname "${inner}")")" # .../bin -> root
        if [ "${inner_root}" != "${pdir}" ]; then
            # move contents up one level
            shopt -s dotglob
            mv "${inner_root}/"* "${pdir}/"
            rm -rf "${inner_root}"
            shopt -u dotglob
        fi
    fi

    _unquarantinePortablePython
    _macosPortableEnv
    # Make sure it's runnable
    pyexe="$(_portable_python)"
    "${pyexe}" -V
}

# init handles the setup of the library
function init() {
    BACKEND_NAME=${PWD##*/}
    MY_DIR=$(realpath "$(dirname "$0")")
    BUILD_PROFILE=$(getBuildProfile)

    EDIR=${MY_DIR}
    if [ "x${ENV_DIR:-}" != "x" ]; then
        EDIR=${ENV_DIR}
    fi

    if [ ! -z "${LIMIT_TARGETS:-}" ]; then
        isValidTarget=$(checkTargets ${LIMIT_TARGETS})
        if [ ${isValidTarget} != true ]; then
            echo "${BACKEND_NAME} can only be used on the following targets: ${LIMIT_TARGETS}"
            exit 0
        fi
    fi

    echo "Initializing libbackend for ${BACKEND_NAME}"
}

# getBuildProfile will inspect the system to determine which build profile is appropriate:
# returns one of the following:
# - cublas12
# - cublas13
# - hipblas
# - intel
function getBuildProfile() {
    if [ x"${BUILD_TYPE:-}" == "xcublas" ] || [ x"${BUILD_TYPE:-}" == "xl4t" ]; then
        if [ ! -z "${CUDA_MAJOR_VERSION:-}" ]; then
            echo ${BUILD_TYPE}${CUDA_MAJOR_VERSION}
        else
            echo ${BUILD_TYPE}
        fi
        return 0
    fi

    if [ -d "/opt/intel" ]; then
        echo "intel"
        return 0
    fi

    if [ -n "${BUILD_TYPE:-}" ]; then
        echo ${BUILD_TYPE}
        return 0
    fi

    echo "cpu"
}
# Make the venv relocatable:
# - rewrite venv/bin/python{,3} to relative symlinks into $(_portable_dir)
# - normalize entrypoint shebangs to /usr/bin/env python3
# - optionally update pyvenv.cfg to point to the portable Python directory (only at runtime)
# Usage: _makeVenvPortable [--update-pyvenv-cfg]
_makeVenvPortable() {
    local update_pyvenv_cfg=false
    if [ "${1:-}" = "--update-pyvenv-cfg" ]; then
        update_pyvenv_cfg=true
    fi

    local venv_dir="${EDIR}/venv"
    local vbin="${venv_dir}/bin"

    [ -d "${vbin}" ] || return 0

    # 1) Replace python symlinks with relative ones to ../../python/bin/python3
    # (venv/bin -> venv -> EDIR -> python/bin)
    local rel_py='../../python/bin/python3'

    for name in python3 python; do
        if [ -e "${vbin}/${name}" ] || [ -L "${vbin}/${name}" ]; then
            rm -f "${vbin}/${name}"
        fi
    done
    ln -s "${rel_py}" "${vbin}/python3"
    ln -s "python3" "${vbin}/python"

    # 2) Update pyvenv.cfg to point to the portable Python directory (only at runtime)
    # Use absolute path resolved at runtime so it works when the venv is copied
    if [ "$update_pyvenv_cfg" = "true" ]; then
        local pyvenv_cfg="${venv_dir}/pyvenv.cfg"
        if [ -f "${pyvenv_cfg}" ]; then
            local portable_dir="$(_portable_dir)"
            # Resolve to absolute path - this ensures it works when the backend is copied
            # Only resolve if the directory exists (it should if ensurePortablePython was called)
            if [ -d "${portable_dir}" ]; then
                portable_dir="$(cd "${portable_dir}" && pwd)"
            else
                # Fallback to relative path if directory doesn't exist yet
                portable_dir="../python"
            fi
            local sed_i=(sed -i)
            # macOS/BSD sed needs a backup suffix; GNU sed doesn't. Make it portable:
            if sed --version >/dev/null 2>&1; then
                sed_i=(sed -i)
            else
                sed_i=(sed -i '')
            fi
            # Update the home field in pyvenv.cfg
            # Handle both absolute paths (starting with /) and relative paths
            if grep -q "^home = " "${pyvenv_cfg}"; then
                "${sed_i[@]}" "s|^home = .*|home = ${portable_dir}|" "${pyvenv_cfg}"
            else
                # If home field doesn't exist, add it
                echo "home = ${portable_dir}" >> "${pyvenv_cfg}"
            fi
        fi
    fi

    # 3) Rewrite shebangs of entry points to use env, so the venv is relocatable
    # Only touch text files that start with #! and reference the current venv.
    local ve_abs="${vbin}/python"
    local sed_i=(sed -i)
    # macOS/BSD sed needs a backup suffix; GNU sed doesn't. Make it portable:
    if sed --version >/dev/null 2>&1; then
        sed_i=(sed -i)
    else
        sed_i=(sed -i '')
    fi

    for f in "${vbin}"/*; do
        [ -f "$f" ] || continue
        # Fast path: check first two bytes (#!)
        head -c2 "$f" 2>/dev/null | grep -q '^#!' || continue
        # Only rewrite if the shebang mentions the (absolute) venv python
        if head -n1 "$f" | grep -Fq "${ve_abs}"; then
            "${sed_i[@]}" '1s|^#!.*$|#!/usr/bin/env python3|' "$f"
            chmod +x "$f" 2>/dev/null || true
        fi
    done
}

# ensureVenv makes sure that the venv for the backend both exists, and is activated.
#
# This function is idempotent, so you can call it as many times as you want and it will
# always result in an activated virtual environment
function ensureVenv() {
    local interpreter=""

    if [ "x${PORTABLE_PYTHON}" == "xtrue" ] || [ -e "$(_portable_python)" ]; then
        echo "Using portable Python"
        ensurePortablePython
        interpreter="$(_portable_python)"
    else
        # Prefer system python${PYTHON_VERSION}, else python3, else fall back to bundled
        if command -v python${PYTHON_VERSION} >/dev/null 2>&1; then
            interpreter="python${PYTHON_VERSION}"
        elif command -v python3 >/dev/null 2>&1; then
            interpreter="python3"
        else
            echo "No suitable system Python found, bootstrapping portable build..."
            ensurePortablePython
            interpreter="$(_portable_python)"
        fi
    fi

    if [ ! -d "${EDIR}/venv" ]; then
        if [ "x${USE_PIP}" == "xtrue" ]; then
            # --copies is only needed when we will later relocate the venv via
            # _makeVenvPortable (PORTABLE_PYTHON=true). Some Python builds —
            # notably macOS system Python — refuse to create a venv with
            # --copies because the build doesn't support it. Fall back to
            # symlinks in that case.
            local venv_args=""
            if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
                venv_args="--copies"
            fi
            "${interpreter}" -m venv ${venv_args} "${EDIR}/venv"
            source "${EDIR}/venv/bin/activate"
            "${interpreter}" -m pip install --upgrade pip
        else
            if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
                uv venv --python "${interpreter}" "${EDIR}/venv"
            else
                uv venv --python "${PYTHON_VERSION}" "${EDIR}/venv"
            fi
        fi
        if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
            # During install, only update symlinks and shebangs, not pyvenv.cfg
            _makeVenvPortable
        fi
    fi

    # We call it here to make sure that when we source a venv we can still use python as expected
    if [ -x "$(_portable_python)" ]; then
        _macosPortableEnv
    fi

    if [ "x${VIRTUAL_ENV:-}" != "x${EDIR}/venv" ]; then
        source "${EDIR}/venv/bin/activate"
    fi
}

function runProtogen() {
    ensureVenv
    if [ "x${USE_PIP}" == "xtrue" ]; then
        pip install grpcio-tools
    else
        uv pip install grpcio-tools
    fi
    pushd "${EDIR}" >/dev/null
    # use the venv python (ensures correct interpreter & sys.path)
    python -m grpc_tools.protoc -I../../ -I./ --python_out=. --grpc_python_out=. backend.proto
    popd >/dev/null
}

# installRequirements looks for several requirements files and if they exist runs the install for them in order
#
# - requirements-install.txt
# - requirements.txt
# - requirements-${BUILD_TYPE}.txt
# - requirements-${BUILD_PROFILE}.txt
#
# BUILD_PROFILE is a more specific version of BUILD_TYPE, ex: cuda-12 or cuda-13
# it can also include some options that we do not have BUILD_TYPES for, ex: intel
#
# NOTE: for BUILD_PROFILE==intel, this function does NOT automatically use the Intel python package index.
# you may want to add the following line to a requirements-intel.txt if you use one:
#
# --index-url https://download.pytorch.org/whl/xpu
#
# If you need to add extra flags into the pip install command you can do so by setting the variable EXTRA_PIP_INSTALL_FLAGS
# before calling installRequirements. For example:
#
# source $(dirname $0)/../common/libbackend.sh
# EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
# installRequirements
function installRequirements() {
    ensureVenv
    declare -a requirementFiles=(
        "${EDIR}/requirements-install.txt"
        "${EDIR}/requirements.txt"
        "${EDIR}/requirements-${BUILD_TYPE:-}.txt"
    )

    if [ "x${BUILD_TYPE:-}" != "x${BUILD_PROFILE}" ]; then
        requirementFiles+=("${EDIR}/requirements-${BUILD_PROFILE}.txt")
    fi
    if [ "x${BUILD_TYPE:-}" == "x" ]; then
        requirementFiles+=("${EDIR}/requirements-cpu.txt")
    fi
    requirementFiles+=("${EDIR}/requirements-after.txt")
    if [ "x${BUILD_TYPE:-}" != "x${BUILD_PROFILE}" ]; then
        requirementFiles+=("${EDIR}/requirements-${BUILD_PROFILE}-after.txt")
    fi

    # This is needed to build wheels that e.g. depends on Python.h
    if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
        export C_INCLUDE_PATH="${C_INCLUDE_PATH:-}:$(_portable_dir)/include/python${PYTHON_VERSION}"
    fi

    for reqFile in ${requirementFiles[@]}; do
        if [ -f "${reqFile}" ]; then
            echo "starting requirements install for ${reqFile}"
            if [ "x${USE_PIP}" == "xtrue" ]; then
                pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement "${reqFile}"
            else
                uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement "${reqFile}"
            fi
            echo "finished requirements install for ${reqFile}"
        fi
    done

    runProtogen
}
# startBackend discovers and runs the backend GRPC server
#
# You can specify a specific backend file to execute by setting BACKEND_FILE before calling startBackend.
# example:
#
# source ../common/libbackend.sh
# BACKEND_FILE="${MY_DIR}/source/backend.py"
# startBackend $@
#
# valid filenames for autodiscovered backend servers are:
# - server.py
# - backend.py
# - ${BACKEND_NAME}.py
function startBackend() {
    ensureVenv
    # Update pyvenv.cfg before running to ensure paths are correct for current location
    # This is critical when the backend position is dynamic (e.g., copied from container)
    if [ "x${PORTABLE_PYTHON}" == "xtrue" ] || [ -x "$(_portable_python)" ]; then
        _makeVenvPortable --update-pyvenv-cfg
    fi

    # Set up GPU library paths if a lib directory exists
    # This allows backends to include their own GPU libraries (CUDA, ROCm, etc.)
    if [ -d "${EDIR}/lib" ]; then
        export LD_LIBRARY_PATH="${EDIR}/lib:${LD_LIBRARY_PATH:-}"
        echo "Added ${EDIR}/lib to LD_LIBRARY_PATH for GPU libraries"
    fi

    if [ ! -z "${BACKEND_FILE:-}" ]; then
        exec "${EDIR}/venv/bin/python" "${BACKEND_FILE}" "$@"
    elif [ -e "${MY_DIR}/server.py" ]; then
        exec "${EDIR}/venv/bin/python" "${MY_DIR}/server.py" "$@"
    elif [ -e "${MY_DIR}/backend.py" ]; then
        exec "${EDIR}/venv/bin/python" "${MY_DIR}/backend.py" "$@"
    elif [ -e "${MY_DIR}/${BACKEND_NAME}.py" ]; then
        exec "${EDIR}/venv/bin/python" "${MY_DIR}/${BACKEND_NAME}.py" "$@"
    fi
}

# runUnittests discovers and runs python unittests
#
# You can specify a specific test file to use by setting TEST_FILE before calling runUnittests.
# example:
#
# source ../common/libbackend.sh
# TEST_FILE="${MY_DIR}/source/test.py"
# runUnittests $@
#
# be default a file named test.py in the backends directory will be used
function runUnittests() {
    ensureVenv
    if [ ! -z "${TEST_FILE:-}" ]; then
        testDir=$(dirname "$(realpath "${TEST_FILE}")")
        testFile=$(basename "${TEST_FILE}")
        pushd "${testDir}" >/dev/null
        python -m unittest "${testFile}"
        popd >/dev/null
    elif [ -f "${MY_DIR}/test.py" ]; then
        pushd "${MY_DIR}" >/dev/null
        python -m unittest test.py
        popd >/dev/null
    else
        echo "no tests defined for ${BACKEND_NAME}"
    fi
}

##################################################################################
# Below here are helper functions not intended to be used outside of the library #
##################################################################################

# checkTargets determines if the current BUILD_TYPE or BUILD_PROFILE is in a list of valid targets
function checkTargets() {
    targets=$@
    declare -a targets=($targets)
    for target in ${targets[@]}; do
        if [ "x${BUILD_TYPE:-}" == "x${target}" ]; then
            echo true; return 0
        fi
        if [ "x${BUILD_PROFILE}" == "x${target}" ]; then
            echo true; return 0
        fi
    done
    echo false
}

init