mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-17 05:18:53 -04:00
* feat(backend): add tinygrad multimodal backend
Wire tinygrad as a new Python backend covering LLM text generation with
native tool-call extraction, embeddings, Stable Diffusion 1.x image
generation, and Whisper speech-to-text from a single self-contained
container.
Backend (`backend/python/tinygrad/`):
- `backend.py` gRPC servicer with LLM Predict/PredictStream (auto-detects
Llama / Qwen2 / Mistral architecture from `config.json`, supports
safetensors and GGUF), Embedding via mean-pooled last hidden state,
GenerateImage via the vendored SD1.x pipeline, AudioTranscription +
AudioTranscriptionStream via the vendored Whisper inference loop, plus
Tokenize / ModelMetadata / Status / Free.
- Vendored upstream model code under `vendor/` (MIT, headers preserved):
llama.py with an added `qkv_bias` flag for Qwen2-family bias support
and an `embed()` method that returns the last hidden state, plus
clip.py, unet.py, stable_diffusion.py (trimmed to drop the MLPerf
training branch that pulls `mlperf.initializers`), audio_helpers.py
and whisper.py (trimmed to drop the pyaudio listener).
- Pluggable tool-call parsers under `tool_parsers/`: hermes (Qwen2.5 /
Hermes), llama3_json (Llama 3.1+), qwen3_xml (Qwen 3), mistral
(Mistral / Mixtral). Auto-selected from model architecture or `Options`.
- `install.sh` pins Python 3.11.14 (tinygrad >=0.12 needs >=3.11; the
default portable python is 3.10).
- `package.sh` bundles libLLVM.so.1 + libedit/libtinfo/libgomp/libsndfile
into the scratch image. `run.sh` sets `CPU_LLVM=1` and `LLVM_PATH` so
tinygrad's CPU device uses the in-process libLLVM JIT instead of
shelling out to the missing `clang` binary.
- Local unit tests for Health and the four parsers in `test.py`.
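To make the parser family concrete, here is a rough sketch of what a hermes-style extractor does: Qwen2.5 / Hermes models emit tool calls as JSON wrapped in `<tool_call>` tags, and the parser splits them out of the plain-text reply. This is only an illustration under that assumption; the function name and interface here are hypothetical, not the actual `tool_parsers/` API.

```python
import json
import re

# Hermes-family models wrap each tool call in <tool_call> ... </tool_call> tags
# containing a JSON object with "name" and "arguments" keys.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str):
    """Split model output into remaining plain text and a list of parsed calls."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than failing the whole reply
    remainder = TOOL_CALL_RE.sub("", text).strip()
    return remainder, calls

out = 'Sure.\n<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
text, calls = extract_tool_calls(out)
```

The llama3_json, qwen3_xml, and mistral parsers differ only in the delimiters and payload shape they match against.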
Build wiring:
- Root `Makefile`: `.NOTPARALLEL`, `prepare-test-extra`, `test-extra`,
`BACKEND_TINYGRAD = tinygrad|python|.|false|true`,
docker-build-target eval, and `docker-build-backends` aggregator.
- `.github/workflows/backend.yml`: cpu / cuda12 / cuda13 build matrix
entries (mirrors the transformers backend placement).
- `backend/index.yaml`: `&tinygrad` meta + cpu/cuda12/cuda13 image
entries (latest + development).
E2E test wiring:
- `tests/e2e-backends/backend_test.go` gains an `image` capability that
exercises GenerateImage and asserts a non-empty PNG is written to
`dst`. New `BACKEND_TEST_IMAGE_PROMPT` / `BACKEND_TEST_IMAGE_STEPS`
knobs.
- Five new make targets next to `test-extra-backend-vllm`:
- `test-extra-backend-tinygrad` — Qwen2.5-0.5B-Instruct + hermes,
mirrors the vllm target 1:1 (5/9 specs in ~57s).
- `test-extra-backend-tinygrad-embeddings` — same model, embeddings
via LLM hidden state (3/9 in ~10s).
- `test-extra-backend-tinygrad-sd` — stable-diffusion-v1-5 mirror,
health/load/image (3/9 in ~10min, 4 diffusion steps on CPU).
- `test-extra-backend-tinygrad-whisper` — openai/whisper-tiny.en
against jfk.wav from whisper.cpp samples (4/9 in ~49s).
- `test-extra-backend-tinygrad-all` aggregate.
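The "non-empty PNG written to `dst`" assertion in the `image` capability boils down to a magic-byte check. The real test lives in Go in `backend_test.go`; a minimal Python equivalent of the same check (illustrative only) looks like this:

```python
import os
import tempfile

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature

def is_nonempty_png(path: str) -> bool:
    """True iff the file exists and starts with the PNG signature."""
    try:
        with open(path, "rb") as f:
            return f.read(8) == PNG_MAGIC
    except OSError:
        return False

# Tiny demonstration: a file holding just the signature passes, an empty one fails.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.png")
    bad = os.path.join(tmp, "empty.png")
    with open(good, "wb") as f:
        f.write(PNG_MAGIC)
    open(bad, "wb").close()
    good_ok, bad_ok = is_nonempty_png(good), is_nonempty_png(bad)
```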
All four functional targets land green on the first MVP pass: 15 specs
total, 0 failures across LLM+tools, embeddings, image generation, and
speech transcription.
* refactor(tinygrad): collapse to a single backend image
tinygrad generates its own GPU kernels (PTX renderer for CUDA, the
autogen ctypes wrappers for HIP / Metal / WebGPU) and never links
against cuDNN, cuBLAS, or any toolkit-version-tied library. The only
runtime dependency that varies across hosts is the driver's libcuda.so.1
/ libamdhip64.so, which are injected into the container at run time by
the nvidia-container / rocm runtimes. So unlike torch- or vLLM-based
backends, there is no reason to ship per-CUDA-version images.
- Drop the cuda12-tinygrad and cuda13-tinygrad build-matrix entries
from .github/workflows/backend.yml. The sole remaining entry is
renamed to -tinygrad (from -cpu-tinygrad) since it is no longer
CPU-only.
- Collapse backend/index.yaml to a single meta + development pair.
The meta anchor carries the latest uri directly; the development
entry points at the master tag.
- run.sh picks the tinygrad device at launch time by probing
/usr/lib/... for libcuda.so.1 / libamdhip64.so. When libcuda is
visible we set CUDA=1 + CUDA_PTX=1 so tinygrad uses its own PTX
renderer (avoids any nvrtc/toolkit dependency); otherwise we fall
back to HIP or CLANG. CPU_LLVM=1 + LLVM_PATH keep the in-process
libLLVM JIT for the CLANG path.
- backend.py's _select_tinygrad_device() is trimmed to a CLANG-only
fallback since production device selection happens in run.sh.
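The probe order described above can be sketched as follows. Python is used here for readability only (the real logic is shell in run.sh); the environment-variable names come from this commit description, while the `LLVM_PATH` value and the helper's interface are placeholders.

```python
import os
import tempfile

def select_device(lib_dirs):
    """Probe order mirroring run.sh: CUDA driver first, then HIP, else CPU/LLVM."""
    env = {}

    def found(name):
        return any(os.path.exists(os.path.join(d, name)) for d in lib_dirs)

    if found("libcuda.so.1"):
        # NVIDIA driver injected by the container runtime: use tinygrad's own
        # PTX renderer so no nvrtc / toolkit library is ever needed.
        env.update(CUDA="1", CUDA_PTX="1")
    elif found("libamdhip64.so"):
        env.update(HIP="1")
    else:
        # CPU fallback: in-process libLLVM JIT (the path here is a placeholder).
        env.update(CPU_LLVM="1", LLVM_PATH="/path/to/libLLVM.so")
    return env

# Demonstration against throwaway directories instead of the real /usr/lib.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "libcuda.so.1"), "wb").close()
    cuda_env = select_device([tmp])
    cpu_env = select_device(["/nonexistent"])
```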
Re-ran test-extra-backend-tinygrad after the change:
Ran 5 of 9 Specs in 56.541 seconds — 5 Passed, 0 Failed
104 lines
3.4 KiB
Bash
Executable File
#!/bin/bash
# Script to package runtime shared libraries for the tinygrad backend.
#
# The final Dockerfile.python stage is FROM scratch, so system libraries
# must be explicitly copied into ${BACKEND}/lib so the backend can run on
# any host without installing them. libbackend.sh automatically prepends
# that directory to LD_LIBRARY_PATH at run time.
#
# tinygrad's CPU device (CLANG / LLVM renderer) JIT-compiles kernels at
# runtime. The default `CLANG` path invokes the external `clang` binary via
# subprocess, which does not exist in the scratch image. We force the
# in-process LLVM path (`CPU_LLVM=1` in run.sh) which loads libLLVM.so.*
# through ctypes and bundle the library + its runtime dependencies here.
#
# Also bundle libgomp (pulled by librosa / numpy via numba) and libsndfile
# (required by soundfile -> librosa audio I/O for Whisper).

set -e

CURDIR=$(dirname "$(realpath "$0")")
LIB_DIR="${CURDIR}/lib"
mkdir -p "${LIB_DIR}"

SEARCH_DIRS=(
    /usr/lib/x86_64-linux-gnu
    /usr/lib/aarch64-linux-gnu
    /lib/x86_64-linux-gnu
    /lib/aarch64-linux-gnu
    /usr/lib
    /lib
)

copy_with_symlinks() {
    local soname="$1"
    local hit=""
    for dir in "${SEARCH_DIRS[@]}"; do
        if [ -e "${dir}/${soname}" ]; then
            hit="${dir}/${soname}"
            break
        fi
    done
    if [ -z "${hit}" ]; then
        echo "warning: ${soname} not found in standard lib paths" >&2
        return 0
    fi
    local real
    real=$(readlink -f "${hit}")
    cp -v "${real}" "${LIB_DIR}/"
    local real_base
    real_base=$(basename "${real}")
    if [ "${real_base}" != "${soname}" ]; then
        ln -sf "${real_base}" "${LIB_DIR}/${soname}"
    fi
}

# tinygrad searches for libLLVM under these sonames (see
# tinygrad/runtime/autogen/llvm.py). Ubuntu 24.04's `llvm` metapackage
# installs `libLLVM-18.so.1` into `/usr/lib/llvm-18/lib/`. Also scan the
# standard lib directories in case a different distro layout puts it in
# /usr/lib/x86_64-linux-gnu.
llvm_so=""
shopt -s nullglob
LLVM_EXTRA_DIRS=(/usr/lib/llvm-*/lib /usr/lib/llvm-*)
# First try the versioned symlink (libLLVM-18.so) since that's what
# tinygrad's DLL loader matches against (see llvm.py DLL name list).
for dir in "${SEARCH_DIRS[@]}" "${LLVM_EXTRA_DIRS[@]}"; do
    for candidate in "${dir}"/libLLVM-[0-9]*.so "${dir}"/libLLVM-[0-9]*.so.[0-9]*; do
        if [ -e "${candidate}" ]; then
            llvm_so="${candidate}"
            break 2
        fi
    done
done
# Fallback: any libLLVM.so file under /usr.
if [ -z "${llvm_so}" ]; then
    llvm_so=$(find /usr -maxdepth 5 -name 'libLLVM*.so*' 2>/dev/null | head -1)
fi
shopt -u nullglob
if [ -z "${llvm_so}" ]; then
    echo "ERROR: libLLVM not found — tinygrad CPU device needs it." >&2
    echo "Install the Ubuntu \`llvm\` package in the builder stage." >&2
    exit 1
fi
echo "Found libLLVM at: ${llvm_so}"
llvm_base=$(basename "${llvm_so}")
real_llvm=$(readlink -f "${llvm_so}")
cp -v "${real_llvm}" "${LIB_DIR}/"
real_base=$(basename "${real_llvm}")
if [ "${real_base}" != "${llvm_base}" ]; then
    ln -sf "${real_base}" "${LIB_DIR}/${llvm_base}"
fi

# libLLVM has soft runtime deps on libedit / libtinfo; pick them up if
# present. They're optional but loading without them can fail.
copy_with_symlinks libedit.so.2
copy_with_symlinks libtinfo.so.6

# Audio I/O for the Whisper path.
copy_with_symlinks libsndfile.so.1
copy_with_symlinks libgomp.so.1

echo "tinygrad packaging completed successfully"
ls -liah "${LIB_DIR}/"