mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-17 05:18:53 -04:00
* refactor(backends): extract python_utils + add mlx_utils shared helpers
Move parse_options() and messages_to_dicts() out of vllm_utils.py into a
new framework-agnostic python_utils.py, and re-export them from vllm_utils
so existing vllm / vllm-omni imports keep working.
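
A minimal sketch of what a parse_options() helper of this kind might look
like, assuming options arrive as a list of "key:value" strings (as in
"tool_parser:hermes"). The body below is illustrative only and is not the
code in python_utils.py:

```python
def parse_options(options):
    """Parse "key:value" strings into a dict with best-effort typing.

    Hypothetical sketch: the real helper may differ in details.
    """
    parsed = {}
    for opt in options:
        if ":" not in opt:
            continue  # skip entries that are not key:value pairs
        key, value = opt.split(":", 1)
        key, value = key.strip(), value.strip()
        lowered = value.lower()
        if lowered in ("true", "false"):
            parsed[key] = lowered == "true"
            continue
        try:
            parsed[key] = int(value)
            continue
        except ValueError:
            pass
        try:
            parsed[key] = float(value)
        except ValueError:
            parsed[key] = value  # fall back to the raw string
    return parsed


opts = parse_options(["trust_remote_code:true", "max_kv_size:4096",
                      "temp:0.5", "tool_parser:hermes"])
```

Splitting on the first colon only keeps values that themselves contain a
colon intact.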
Add mlx_utils.py with split_reasoning() and parse_tool_calls() — ported
from mlx_vlm/server.py's process_tool_calls. These work with any
mlx-lm / mlx-vlm tool module (anything exposing tool_call_start,
tool_call_end, parse_tool_call). Used by the mlx and mlx-vlm backends in
later commits to emit structured ChatDelta.tool_calls without
reimplementing per-model parsing.
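
The tool-module contract described above can be sketched as follows. This
is an illustration, not the code in mlx_utils.py: the return shape (a
content string plus a list of dicts) and the single-argument
parse_tool_call signature are assumptions made for the example.

```python
import json
from types import SimpleNamespace


def parse_tool_calls(text, tool_module):
    """Return (content_without_tool_calls, tool_calls) for `text`.

    `tool_module` is anything exposing tool_call_start, tool_call_end and
    parse_tool_call. Hypothetical sketch of the approach.
    """
    start, end = tool_module.tool_call_start, tool_module.tool_call_end
    calls, kept, pos = [], [], 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            kept.append(text[pos:])
            break
        j = text.find(end, i + len(start))
        if j == -1:
            # Unterminated marker: leave the remainder as plain content.
            kept.append(text[pos:])
            break
        kept.append(text[pos:i])
        body = text[i + len(start):j]
        try:
            parsed = tool_module.parse_tool_call(body)
        except Exception:
            # Unparseable body: keep the raw span in the content.
            kept.append(text[i:j + len(end)])
        else:
            calls.append({
                "index": len(calls),
                "name": parsed.get("name", ""),
                "arguments": json.dumps(parsed.get("arguments", {})),
            })
        pos = j + len(end)
    return "".join(kept).strip(), calls


# A SimpleNamespace shim stands in for a real mlx-lm / mlx-vlm tool module.
shim = SimpleNamespace(
    tool_call_start="<tool_call>",
    tool_call_end="</tool_call>",
    parse_tool_call=lambda body: json.loads(body.strip()),
)
content, calls = parse_tool_calls(
    'Sure.<tool_call>{"name": "get_weather", '
    '"arguments": {"city": "Rome"}}</tool_call>',
    shim,
)
```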
Shared smoke tests confirm:
- parse_options round-trips bool/int/float/string
- vllm_utils re-exports are identity-equal to python_utils originals
- mlx_utils parse_tool_calls handles <tool_call>...</tool_call> with a
shim module and produces a correctly-indexed list with JSON arguments
- mlx_utils split_reasoning extracts <think> blocks and leaves clean
content
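
For illustration, the <think> extraction checked by the last smoke test
could look roughly like this. The signature and the (reasoning, content)
return order are assumptions for the sketch, not the mlx_utils API:

```python
def split_reasoning(text, think_start="<think>", think_end="</think>"):
    """Split `text` into (reasoning, content).

    Hypothetical sketch: collects every think block and returns the rest
    of the text as clean content.
    """
    reasoning, content, pos = [], [], 0
    while True:
        i = text.find(think_start, pos)
        if i == -1:
            content.append(text[pos:])
            break
        content.append(text[pos:i])
        j = text.find(think_end, i + len(think_start))
        if j == -1:
            # Unterminated block: treat the remainder as reasoning.
            reasoning.append(text[i + len(think_start):])
            break
        reasoning.append(text[i + len(think_start):j])
        pos = j + len(think_end)
    return "".join(reasoning).strip(), "".join(content).strip()


reasoning, content = split_reasoning(
    "<think>check the units</think>The answer is 4.")
```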
* feat(mlx): wire native tool parsers + ChatDelta + token usage + logprobs
Bring the MLX backend up to the same structured-output contract as vLLM
and llama.cpp: emit Reply.chat_deltas so the OpenAI HTTP layer sees
tool_calls and reasoning_content, not just raw text.
Key insight: mlx_lm.load() returns a TokenizerWrapper that already auto-
detects the right tool parser from the model's chat template
(_infer_tool_parser in mlx_lm/tokenizer_utils.py). The wrapper exposes
has_tool_calling, has_thinking, tool_parser, tool_call_start,
tool_call_end, think_start, think_end — no user configuration needed,
unlike vLLM.
Changes in backend/python/mlx/backend.py:
- Imports: replace inline parse_options / messages_to_dicts with the
shared helpers from python_utils. Pull split_reasoning / parse_tool_calls
from the new mlx_utils shared module.
- LoadModel: log the auto-detected has_tool_calling / has_thinking /
tool_parser_type for observability. Drop the local is_float / is_int
duplicates.
- _prepare_prompt: run request.Messages through messages_to_dicts so
tool_call_id / tool_calls / reasoning_content survive the conversion,
and pass tools=json.loads(request.Tools) + enable_thinking=True (when
request.Metadata says so) to apply_chat_template. Falls back on
TypeError for tokenizers whose template doesn't accept those kwargs.
- _build_generation_params: return an additional (logits_params,
stop_words) pair. Maps RepetitionPenalty / PresencePenalty /
FrequencyPenalty to mlx_lm.sample_utils.make_logits_processors and
threads StopPrompts through to post-decode truncation.
- New _tool_module_from_tokenizer / _finalize_output / _truncate_at_stop
helpers. _finalize_output runs split_reasoning when has_thinking is
true and parse_tool_calls (using a SimpleNamespace shim around the
wrapper's tool_parser callable) when has_tool_calling is true, then
extracts prompt_tokens, generation_tokens and (best-effort) logprobs
from the last GenerationResponse chunk.
- Predict: use make_logits_processors, accumulate text + last_response,
finalize into a structured Reply carrying chat_deltas,
prompt_tokens, tokens, logprobs. Early-stops on user stop sequences.
- PredictStream: per-chunk Reply still carries raw message bytes for
back-compat but now also emits chat_deltas=[ChatDelta(content=delta)].
On loop exit, emit a terminal Reply with structured
reasoning_content / tool_calls / token counts / logprobs — so the Go
side sees tool calls without needing the regex fallback.
- TokenizeString RPC: uses the TokenizerWrapper's encode(); returns
length + tokens or FAILED_PRECONDITION if the model isn't loaded.
- Free RPC: drops model / tokenizer / lru_cache, runs gc.collect(),
calls mx.metal.clear_cache() when available, and, as a belt-and-suspenders
measure, makes a best-effort attempt to clear the torch.cuda cache.
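
The post-decode truncation that StopPrompts is threaded into can be
sketched as below. The function name mirrors _truncate_at_stop, but the
body and the (text, stopped) return shape are assumptions made for
illustration:

```python
def truncate_at_stop(text, stop_words):
    """Cut `text` at the earliest occurrence of any stop sequence.

    Returns (possibly_truncated_text, stopped). Hypothetical sketch.
    """
    cut = len(text)
    for stop in stop_words:
        if not stop:
            continue  # ignore empty stop sequences
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut], cut < len(text)


out, stopped = truncate_at_stop("Hello\nUser: hi", ["User:", "\n\n"])
```

Returning the flag alongside the text lets a generation loop stop early
once a user stop sequence has been seen.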
* feat(mlx-vlm): mirror MLX parity (tool parsers + ChatDelta + samplers)
Same treatment as the MLX backend: emit structured Reply.chat_deltas,
tool_calls, reasoning_content, token counts and logprobs, and extend
sampling parameter coverage beyond the temp/top_p pair the backend
used to handle.
- Imports: drop the inline is_float/is_int helpers, pull parse_options /
messages_to_dicts from python_utils and split_reasoning /
parse_tool_calls from mlx_utils. Also import make_sampler and
make_logits_processors from mlx_lm.sample_utils — mlx-vlm re-uses them.
- LoadModel: use parse_options; call mlx_vlm.tool_parsers._infer_tool_parser
/ load_tool_module to auto-detect a tool module from the processor's
chat_template. Stash think_start / think_end / has_thinking so later
finalisation can split reasoning blocks without duck-typing on each
call. Logs the detected parser type.
- _prepare_prompt: convert proto Messages via messages_to_dicts (so
tool_call_id / tool_calls survive), pass tools=json.loads(request.Tools)
and enable_thinking=True to apply_chat_template when present, fall
back on TypeError for older mlx-vlm versions. Also handle the
prompt-only + media and empty-prompt + media paths consistently.
- _build_generation_params: return (max_tokens, sampler_params,
logits_params, stop_words). Maps repetition_penalty / presence_penalty /
frequency_penalty and passes them through make_logits_processors.
- _finalize_output / _truncate_at_stop: common helper used by Predict
and PredictStream to split reasoning, run parse_tool_calls against the
auto-detected tool module, build ToolCallDelta list, and extract token
counts + logprobs from the last GenerationResult.
- Predict / PredictStream: switch from mlx_vlm.generate to mlx_vlm.stream_generate
in both paths, accumulate text + last_response, pass sampler and
logits_processors through, emit content-only ChatDelta per streaming
chunk followed by a terminal Reply carrying reasoning_content,
tool_calls, prompt_tokens, tokens and logprobs. Non-streaming Predict
returns the same structured Reply shape.
- New helper _collect_media extracted from the duplicated base64 image /
audio decode loop.
- New TokenizeString RPC using the processor's tokenizer.encode and
Free RPC that drops model/processor/config, runs gc + Metal cache
clear + best-effort torch.cuda cache clear.
* feat(importer/mlx): auto-set tool_parser/reasoning_parser on import
Mirror what core/gallery/importers/vllm.go does: after applying the
shared inference defaults, look up the model URI in parser_defaults.json
and append matching tool_parser:/reasoning_parser: entries to Options.
The MLX backends auto-detect tool parsers from the chat template at
runtime so they don't actually consume these options — but surfacing
them in the generated YAML:
- keeps the import experience consistent with vllm
- gives users a single visible place to override
- documents the intended parser for a given model family
* test(mlx): add helper unit tests + TokenizeString/Free + e2e make targets
- backend/python/mlx/test.py: add TestSharedHelpers with server-less
unit tests for parse_options, messages_to_dicts, split_reasoning and
parse_tool_calls (using a SimpleNamespace shim to fake a tool module
without requiring a model). Plus test_tokenize_string and test_free
RPC tests that load a tiny MLX-quantized Llama and exercise the new
RPCs end-to-end.
- backend/python/mlx-vlm/test.py: same helper unit tests + cleanup of
the duplicated import block at the top of the file.
- Makefile: register BACKEND_MLX and BACKEND_MLX_VLM (they were missing
from the docker-build-target eval list — only mlx-distributed had a
generated target before). Add test-extra-backend-mlx and
test-extra-backend-mlx-vlm convenience targets that build the
respective image and run tests/e2e-backends with the tools capability
against mlx-community/Qwen2.5-0.5B-Instruct-4bit. The MLX backend
auto-detects the tool parser from the chat template so no
BACKEND_TEST_OPTIONS is needed (unlike vllm).
* fix(libbackend): don't pass --copies to venv unless PORTABLE_PYTHON=true
backend/python/common/libbackend.sh:ensureVenv() always invoked
'python -m venv --copies', but macOS system python (and some other
builds) refuses with:
Error: This build of python cannot create venvs without using symlinks
--copies only matters when _makeVenvPortable later relocates the venv,
which only happens when PORTABLE_PYTHON=true. Make --copies conditional
on that flag and fall back to default (symlinked) venv otherwise.
Caught while bringing up the mlx backend on Apple Silicon — the same
build path is used by every Python backend with USE_PIP=true.
* fix(mlx): support mlx-lm 0.29.x tool calling + drop deprecated clear_cache
The released mlx-lm 0.29.x ships a much simpler tool-calling API than
HEAD: TokenizerWrapper detects the <tool_call>...</tool_call> markers
from the tokenizer vocab and exposes has_tool_calling /
tool_call_start / tool_call_end, but does NOT expose a tool_parser
callable on the wrapper and does NOT ship a mlx_lm.tool_parsers
subpackage at all (those only exist on main).
Caught while running the smoke test on Apple Silicon with the
released mlx-lm 0.29.1: tokenizer.tool_parser raised AttributeError
(falling through to the underlying HF tokenizer), so
_tool_module_from_tokenizer always returned None and tool calls slipped
through as raw <tool_call>...</tool_call> text in Reply.message instead
of being parsed into ChatDelta.tool_calls.
Fix: when has_tool_calling is True but tokenizer.tool_parser is missing,
default the parse_tool_call callable to json.loads(body.strip()) — that's
exactly what mlx_lm.tool_parsers.json_tools.parse_tool_call does on HEAD
and covers the only format 0.29 detects (<tool_call>JSON</tool_call>).
Future mlx-lm releases that ship more parsers will be picked up
automatically via the tokenizer.tool_parser attribute when present.
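
A sketch of the fallback described above, using SimpleNamespace objects
in place of a real TokenizerWrapper. The function body is illustrative,
not the backend's actual _tool_module_from_tokenizer:

```python
import json
from types import SimpleNamespace


def tool_module_from_tokenizer(tokenizer):
    """Build a tool-module shim from a tokenizer wrapper, or return None.

    Hypothetical sketch: when the wrapper has no tool_parser attribute
    (as on mlx-lm 0.29.x), fall back to parsing the body as JSON.
    """
    if not getattr(tokenizer, "has_tool_calling", False):
        return None
    # getattr with a default swallows the AttributeError seen on 0.29.x.
    parser = getattr(tokenizer, "tool_parser", None)
    if parser is None:
        def parser(body):
            return json.loads(body.strip())
    return SimpleNamespace(
        tool_call_start=tokenizer.tool_call_start,
        tool_call_end=tokenizer.tool_call_end,
        parse_tool_call=parser,
    )


# Stand-in for a 0.29.x wrapper: markers present, no tool_parser.
fake_029 = SimpleNamespace(
    has_tool_calling=True,
    tool_call_start="<tool_call>",
    tool_call_end="</tool_call>",
)
module = tool_module_from_tokenizer(fake_029)
call = module.parse_tool_call(' {"name": "f", "arguments": {}} ')
```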
Also tighten the LoadModel logging — the old log line read
init_kwargs.get('tool_parser_type') which doesn't exist on 0.29 and
showed None even when has_tool_calling was True. Log the actual
tool_call_start / tool_call_end markers instead.
While here, switch Free()'s Metal cache clear from the deprecated
mx.metal.clear_cache to mx.clear_cache (mlx >= 0.30), with a
fallback for older releases. Mirrored to the mlx-vlm backend.
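
The version fallback can be sketched without importing mlx by passing the
module in as a parameter. In the backend this would be mlx.core imported
as mx; the helper name and the fake modules below are hypothetical:

```python
from types import SimpleNamespace


def clear_mlx_cache(mx):
    """Call mx.clear_cache if present, else mx.metal.clear_cache.

    Returns True when a cache-clearing function was found and called.
    Hypothetical sketch; `mx` is whatever module object the caller has.
    """
    clear = getattr(mx, "clear_cache", None)
    if clear is None:
        metal = getattr(mx, "metal", None)
        clear = getattr(metal, "clear_cache", None)
    if clear is None:
        return False
    clear()
    return True


# Fake modules standing in for newer and older mlx releases.
seen = []
new_mx = SimpleNamespace(clear_cache=lambda: seen.append("mx.clear_cache"))
old_mx = SimpleNamespace(
    metal=SimpleNamespace(
        clear_cache=lambda: seen.append("mx.metal.clear_cache")))
clear_mlx_cache(new_mx)
clear_mlx_cache(old_mx)
```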
* feat(mlx-distributed): mirror MLX parity (tool calls + ChatDelta + sampler)
Same treatment as the mlx and mlx-vlm backends: emit Reply.chat_deltas
with structured tool_calls / reasoning_content / token counts /
logprobs, expand sampling parameter coverage beyond temp+top_p, and
add the missing TokenizeString and Free RPCs.
Notes specific to mlx-distributed:
- Rank 0 is the only rank that owns a sampler — workers participate in
the pipeline-parallel forward pass via mx.distributed and don't
re-implement sampling. So the new logits_params (repetition_penalty,
presence_penalty, frequency_penalty) and stop_words apply on rank 0
only; we don't need to extend coordinator.broadcast_generation_params,
which still ships only max_tokens / temperature / top_p to workers
(everything else is a rank-0 concern).
- Free() now broadcasts CMD_SHUTDOWN to workers when a coordinator is
active, so they release the model on their end too. The constant is
already defined and handled by the existing worker loop in
backend.py:633 (CMD_SHUTDOWN = -1).
- Drop the locally-defined is_float / is_int / parse_options trio in
favor of python_utils.parse_options, re-exported under the module
name for back-compat with anything that imported it directly.
- _prepare_prompt: route through messages_to_dicts so tool_call_id /
tool_calls / reasoning_content survive, pass tools=json.loads(
request.Tools) and enable_thinking=True to apply_chat_template, fall
back on TypeError for templates that don't accept those kwargs.
- New _tool_module_from_tokenizer (with the json.loads fallback for
mlx-lm 0.29.x), _finalize_output, _truncate_at_stop helpers — same
contract as the mlx backend.
- LoadModel logs the auto-detected has_tool_calling / has_thinking /
tool_call_start / tool_call_end so users can see what the wrapper
picked up for the loaded model.
- backend/python/mlx-distributed/test.py: add the same TestSharedHelpers
unit tests (parse_options, messages_to_dicts, split_reasoning,
parse_tool_calls) that exist for mlx and mlx-vlm.
828 lines
34 KiB
Makefile
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp

GOCMD=go
GOTEST=$(GOCMD) test
GOVET=$(GOCMD) vet
BINARY_NAME=local-ai
LAUNCHER_BINARY_NAME=local-ai-launcher

UBUNTU_VERSION?=2404
UBUNTU_CODENAME?=noble

GORELEASER?=

export BUILD_TYPE?=
export CUDA_MAJOR_VERSION?=13
export CUDA_MINOR_VERSION?=0

GO_TAGS?=
BUILD_ID?=
NATIVE?=false

TEST_DIR=/tmp/test

TEST_FLAKES?=5

RANDOM := $(shell bash -c 'echo $$RANDOM')

VERSION?=$(shell git describe --always --tags || echo "dev" )
# go tool nm ./local-ai | grep Commit
LD_FLAGS?=-s -w
override LD_FLAGS += -X "github.com/mudler/LocalAI/internal.Version=$(VERSION)"
override LD_FLAGS += -X "github.com/mudler/LocalAI/internal.Commit=$(shell git rev-parse HEAD)"

OPTIONAL_TARGETS?=

export OS := $(shell uname -s)
ARCH := $(shell uname -m)
GREEN := $(shell tput -Txterm setaf 2)
YELLOW := $(shell tput -Txterm setaf 3)
WHITE := $(shell tput -Txterm setaf 7)
CYAN := $(shell tput -Txterm setaf 6)
RESET := $(shell tput -Txterm sgr0)

# Default Docker bridge IP
E2E_BRIDGE_IP?=172.17.0.1

ifndef UNAME_S
UNAME_S := $(shell uname -s)
endif

ifeq ($(OS),Darwin)
  ifeq ($(OSX_SIGNING_IDENTITY),)
    OSX_SIGNING_IDENTITY := $(shell security find-identity -v -p codesigning | grep '"' | head -n 1 | sed -E 's/.*"(.*)"/\1/')
  endif
endif

# check if goreleaser exists
ifeq (, $(shell which goreleaser))
  GORELEASER=curl -sfL https://goreleaser.com/static/run | bash -s --
else
  GORELEASER=$(shell which goreleaser)
endif

TEST_PATHS?=./api/... ./pkg/... ./core/...

.PHONY: all test build vendor

all: help

## GENERIC
rebuild: ## Rebuilds the project
	$(GOCMD) clean -cache
	$(MAKE) build

clean: ## Remove build related file
	$(GOCMD) clean -cache
	rm -f prepare
	rm -rf $(BINARY_NAME)
	rm -rf release/
	$(MAKE) protogen-clean
	rmdir pkg/grpc/proto || true

clean-tests:
	rm -rf test-models
	rm -rf test-dir

## Install Go tools
install-go-tools:
	go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
	go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2

## React UI:
react-ui:
ifneq ($(wildcard core/http/react-ui/dist),)
	@echo "react-ui dist already exists, skipping build"
else
	cd core/http/react-ui && npm install && npm run build
endif

react-ui-docker:
	docker run --entrypoint /bin/bash -v $(CURDIR):/app:z oven/bun:1 \
	  -c "cd /app/core/http/react-ui && bun install && bun run build"

core/http/react-ui/dist: react-ui

## Build:

build: protogen-go generate install-go-tools core/http/react-ui/dist ## Build the project
	$(info ${GREEN}I local-ai build info:${RESET})
	$(info ${GREEN}I BUILD_TYPE: ${YELLOW}$(BUILD_TYPE)${RESET})
	$(info ${GREEN}I GO_TAGS: ${YELLOW}$(GO_TAGS)${RESET})
	$(info ${GREEN}I LD_FLAGS: ${YELLOW}$(LD_FLAGS)${RESET})
	$(info ${GREEN}I UPX: ${YELLOW}$(UPX)${RESET})
	rm -rf $(BINARY_NAME) || true
	CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o $(BINARY_NAME) ./cmd/local-ai

build-launcher: ## Build the launcher application
	$(info ${GREEN}I local-ai launcher build info:${RESET})
	$(info ${GREEN}I BUILD_TYPE: ${YELLOW}$(BUILD_TYPE)${RESET})
	$(info ${GREEN}I GO_TAGS: ${YELLOW}$(GO_TAGS)${RESET})
	$(info ${GREEN}I LD_FLAGS: ${YELLOW}$(LD_FLAGS)${RESET})
	rm -rf $(LAUNCHER_BINARY_NAME) || true
	CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o $(LAUNCHER_BINARY_NAME) ./cmd/launcher

build-all: build build-launcher ## Build both server and launcher

build-dev: ## Run LocalAI in dev mode with live reload
	@command -v air >/dev/null 2>&1 || go install github.com/air-verse/air@latest
	air -c .air.toml

dev-dist:
	$(GORELEASER) build --snapshot --clean

dist:
	$(GORELEASER) build --clean

osx-signed: build
	codesign --deep --force --sign "$(OSX_SIGNING_IDENTITY)" --entitlements "./Entitlements.plist" "./$(BINARY_NAME)"

## Run
run: ## run local-ai
	CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) run ./

test-models/testmodel.ggml:
	mkdir -p test-models
	mkdir -p test-dir
	wget -q https://huggingface.co/mradermacher/gpt2-alpaca-gpt4-GGUF/resolve/main/gpt2-alpaca-gpt4.Q4_K_M.gguf -O test-models/testmodel.ggml
	wget -q https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin -O test-models/whisper-en
	wget -q https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav -O test-dir/audio.wav
	cp tests/models_fixtures/* test-models

prepare-test: protogen-go
	cp tests/models_fixtures/* test-models

########################################################
## Tests
########################################################

## Test targets
test: test-models/testmodel.ggml protogen-go
	@echo 'Running tests'
	export GO_TAGS="debug"
	$(MAKE) prepare-test
	OPUS_SHIM_LIBRARY=$(abspath ./pkg/opus/shim/libopusshim.so) \
	HUGGINGFACE_GRPC=$(abspath ./)/backend/python/transformers/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models BACKENDS_PATH=$(abspath ./)/backends \
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!llama-gguf" --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
	$(MAKE) test-llama-gguf
	$(MAKE) test-tts
	$(MAKE) test-stablediffusion

########################################################
## E2E AIO tests (uses standard image with pre-configured models)
########################################################

docker-build-e2e:
	docker build \
		--build-arg MAKEFLAGS="--jobs=5 --output-sync=target" \
		--build-arg BASE_IMAGE=$(BASE_IMAGE) \
		--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
		--build-arg BUILD_TYPE=$(BUILD_TYPE) \
		--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
		--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
		--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
		--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
		--build-arg GO_TAGS="$(GO_TAGS)" \
		-t local-ai:tests -f Dockerfile .

e2e-aio:
	LOCALAI_BACKEND_DIR=$(abspath ./backends) \
	LOCALAI_MODELS_DIR=$(abspath ./tests/e2e-aio/models) \
	LOCALAI_IMAGE_TAG=tests \
	LOCALAI_IMAGE=local-ai \
	$(MAKE) run-e2e-aio

run-e2e-aio: protogen-go
	@echo 'Running e2e AIO tests'
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) -v -r ./tests/e2e-aio

########################################################
## E2E tests
########################################################

prepare-e2e:
	docker build \
		--build-arg IMAGE_TYPE=core \
		--build-arg BUILD_TYPE=$(BUILD_TYPE) \
		--build-arg BASE_IMAGE=$(BASE_IMAGE) \
		--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
		--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
		--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
		--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
		--build-arg GO_TAGS="$(GO_TAGS)" \
		--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
		-t localai-tests .

run-e2e-image:
	docker run -p 5390:8080 -e MODELS_PATH=/models -e THREADS=1 -e DEBUG=true -d --rm -v $(TEST_DIR):/models --name e2e-tests-$(RANDOM) localai-tests

test-e2e: build-mock-backend prepare-e2e run-e2e-image
	@echo 'Running e2e tests'
	BUILD_TYPE=$(BUILD_TYPE) \
	LOCALAI_API=http://$(E2E_BRIDGE_IP):5390 \
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) -v -r ./tests/e2e
	$(MAKE) clean-mock-backend
	$(MAKE) teardown-e2e
	docker rmi localai-tests

teardown-e2e:
	rm -rf $(TEST_DIR) || true
	docker stop $$(docker ps -q --filter ancestor=localai-tests)

########################################################
## Integration and unit tests
########################################################

test-llama-gguf: prepare-test
	TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models BACKENDS_PATH=$(abspath ./)/backends \
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="llama-gguf" --flake-attempts $(TEST_FLAKES) -v -r $(TEST_PATHS)

test-tts: prepare-test
	TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models BACKENDS_PATH=$(abspath ./)/backends \
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="tts" --flake-attempts $(TEST_FLAKES) -v -r $(TEST_PATHS)

test-stablediffusion: prepare-test
	TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models BACKENDS_PATH=$(abspath ./)/backends \
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="stablediffusion" --flake-attempts $(TEST_FLAKES) -v -r $(TEST_PATHS)

test-stores:
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="stores" --flake-attempts $(TEST_FLAKES) -v -r tests/integration

test-opus:
	@echo 'Running opus backend tests'
	$(MAKE) -C backend/go/opus libopusshim.so
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) -v -r ./backend/go/opus/...

test-opus-docker:
	@echo 'Running opus backend tests in Docker'
	docker build --target builder \
		--build-arg BUILD_TYPE=$(or $(BUILD_TYPE),) \
		--build-arg BASE_IMAGE=$(or $(BASE_IMAGE),ubuntu:24.04) \
		--build-arg BACKEND=opus \
		-t localai-opus-test -f backend/Dockerfile.golang .
	docker run --rm localai-opus-test \
		bash -c 'cd /LocalAI && go run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) -v -r ./backend/go/opus/...'

test-realtime: build-mock-backend
	@echo 'Running realtime e2e tests (mock backend)'
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="Realtime && !real-models" --flake-attempts $(TEST_FLAKES) -v -r ./tests/e2e

# Real-model realtime tests. Set REALTIME_TEST_MODEL to use your own pipeline,
# or leave unset to auto-build one from the component env vars below.
REALTIME_VAD?=silero-vad-ggml
REALTIME_STT?=whisper-1
REALTIME_LLM?=qwen3-0.6b
REALTIME_TTS?=tts-1
REALTIME_BACKENDS_PATH?=$(abspath ./)/backends

test-realtime-models: build-mock-backend
	@echo 'Running realtime e2e tests (real models)'
	REALTIME_TEST_MODEL=$${REALTIME_TEST_MODEL:-realtime-test-pipeline} \
	REALTIME_VAD=$(REALTIME_VAD) \
	REALTIME_STT=$(REALTIME_STT) \
	REALTIME_LLM=$(REALTIME_LLM) \
	REALTIME_TTS=$(REALTIME_TTS) \
	REALTIME_BACKENDS_PATH=$(REALTIME_BACKENDS_PATH) \
	$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="Realtime" --flake-attempts $(TEST_FLAKES) -v -r ./tests/e2e

# --- Container-based real-model testing ---

REALTIME_BACKEND_NAMES ?= silero-vad whisper llama-cpp kokoro
REALTIME_MODELS_DIR ?= $(abspath ./models)
REALTIME_BACKENDS_DIR ?= $(abspath ./local-backends)
REALTIME_DOCKER_FLAGS ?= --gpus all

local-backends:
	mkdir -p local-backends

extract-backend-%: docker-build-% local-backends
	@echo "Extracting backend $*..."
	@CID=$$(docker create local-ai-backend:$*) && \
		rm -rf local-backends/$* && mkdir -p local-backends/$* && \
		docker cp $$CID:/ - | tar -xf - -C local-backends/$* && \
		docker rm $$CID > /dev/null

extract-realtime-backends: $(addprefix extract-backend-,$(REALTIME_BACKEND_NAMES))

test-realtime-models-docker: build-mock-backend
	docker build --target build-requirements \
		--build-arg BUILD_TYPE=$(or $(BUILD_TYPE),cublas) \
		--build-arg CUDA_MAJOR_VERSION=$(or $(CUDA_MAJOR_VERSION),13) \
		--build-arg CUDA_MINOR_VERSION=$(or $(CUDA_MINOR_VERSION),0) \
		-t localai-test-runner .
	docker run --rm \
		$(REALTIME_DOCKER_FLAGS) \
		-v $(abspath ./):/build \
		-v $(REALTIME_MODELS_DIR):/models:ro \
		-v $(REALTIME_BACKENDS_DIR):/backends \
		-v localai-go-cache:/root/go/pkg/mod \
		-v localai-go-build-cache:/root/.cache/go-build \
		-e REALTIME_TEST_MODEL=$${REALTIME_TEST_MODEL:-realtime-test-pipeline} \
		-e REALTIME_VAD=$(REALTIME_VAD) \
		-e REALTIME_STT=$(REALTIME_STT) \
		-e REALTIME_LLM=$(REALTIME_LLM) \
		-e REALTIME_TTS=$(REALTIME_TTS) \
		-e REALTIME_BACKENDS_PATH=/backends \
		-e REALTIME_MODELS_PATH=/models \
		-w /build \
		localai-test-runner \
		bash -c 'git config --global --add safe.directory /build && \
		make protogen-go && make build-mock-backend && \
		go run github.com/onsi/ginkgo/v2/ginkgo --label-filter="Realtime" --flake-attempts $(TEST_FLAKES) -v -r ./tests/e2e'

test-container:
	docker build --target requirements -t local-ai-test-container .
	docker run -ti --rm --entrypoint /bin/bash -ti -v $(abspath ./):/build local-ai-test-container

########################################################
## Help
########################################################

## Help:
help: ## Show this help.
	@echo ''
	@echo 'Usage:'
	@echo ' ${YELLOW}make${RESET} ${GREEN}<target>${RESET}'
	@echo ''
	@echo 'Targets:'
	@awk 'BEGIN {FS = ":.*?## "} { \
		if (/^[a-zA-Z_-]+:.*?##.*$$/) {printf " ${YELLOW}%-20s${GREEN}%s${RESET}\n", $$1, $$2} \
		else if (/^## .*$$/) {printf " ${CYAN}%s${RESET}\n", substr($$1,4)} \
		}' $(MAKEFILE_LIST)

########################################################
## Backends
########################################################

.PHONY: protogen
protogen: protogen-go

protoc:
	@OS_NAME=$$(uname -s | tr '[:upper:]' '[:lower:]'); \
	ARCH_NAME=$$(uname -m); \
	if [ "$$OS_NAME" = "darwin" ]; then \
		if [ "$$ARCH_NAME" = "arm64" ]; then \
			FILE=protoc-31.1-osx-aarch_64.zip; \
		elif [ "$$ARCH_NAME" = "x86_64" ]; then \
			FILE=protoc-31.1-osx-x86_64.zip; \
		else \
			echo "Unsupported macOS architecture: $$ARCH_NAME"; exit 1; \
		fi; \
	elif [ "$$OS_NAME" = "linux" ]; then \
		if [ "$$ARCH_NAME" = "x86_64" ]; then \
			FILE=protoc-31.1-linux-x86_64.zip; \
		elif [ "$$ARCH_NAME" = "aarch64" ] || [ "$$ARCH_NAME" = "arm64" ]; then \
			FILE=protoc-31.1-linux-aarch_64.zip; \
		elif [ "$$ARCH_NAME" = "ppc64le" ]; then \
			FILE=protoc-31.1-linux-ppcle_64.zip; \
		elif [ "$$ARCH_NAME" = "s390x" ]; then \
			FILE=protoc-31.1-linux-s390_64.zip; \
		elif [ "$$ARCH_NAME" = "i386" ] || [ "$$ARCH_NAME" = "x86" ]; then \
			FILE=protoc-31.1-linux-x86_32.zip; \
		else \
			echo "Unsupported Linux architecture: $$ARCH_NAME"; exit 1; \
		fi; \
	else \
		echo "Unsupported OS: $$OS_NAME"; exit 1; \
	fi; \
	URL=https://github.com/protocolbuffers/protobuf/releases/download/v31.1/$$FILE; \
	curl -L $$URL -o protoc.zip && \
	unzip -j -d $(CURDIR) protoc.zip bin/protoc && rm protoc.zip

.PHONY: protogen-go
protogen-go: protoc install-go-tools
	mkdir -p pkg/grpc/proto
	./protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
		backend/backend.proto

core/config/inference_defaults.json: ## Fetch inference defaults from unsloth (only if missing)
	$(GOCMD) generate ./core/config/...

.PHONY: generate
generate: core/config/inference_defaults.json ## Ensure inference defaults exist

.PHONY: generate-force
generate-force: ## Re-fetch inference defaults from unsloth (always)
	$(GOCMD) generate ./core/config/...

.PHONY: protogen-go-clean
protogen-go-clean:
	$(RM) pkg/grpc/proto/backend.pb.go pkg/grpc/proto/backend_grpc.pb.go
	$(RM) bin/*

prepare-test-extra: protogen-python
	$(MAKE) -C backend/python/transformers
	$(MAKE) -C backend/python/outetts
	$(MAKE) -C backend/python/diffusers
	$(MAKE) -C backend/python/chatterbox
	$(MAKE) -C backend/python/vllm
	$(MAKE) -C backend/python/vllm-omni
	$(MAKE) -C backend/python/vibevoice
	$(MAKE) -C backend/python/moonshine
	$(MAKE) -C backend/python/pocket-tts
	$(MAKE) -C backend/python/qwen-tts
	$(MAKE) -C backend/python/fish-speech
	$(MAKE) -C backend/python/faster-qwen3-tts
	$(MAKE) -C backend/python/qwen-asr
	$(MAKE) -C backend/python/nemo
	$(MAKE) -C backend/python/voxcpm
	$(MAKE) -C backend/python/faster-whisper
	$(MAKE) -C backend/python/whisperx
	$(MAKE) -C backend/python/ace-step
	$(MAKE) -C backend/python/trl
	$(MAKE) -C backend/rust/kokoros kokoros-grpc

test-extra: prepare-test-extra
	$(MAKE) -C backend/python/transformers test
	$(MAKE) -C backend/python/outetts test
	$(MAKE) -C backend/python/diffusers test
	$(MAKE) -C backend/python/chatterbox test
	$(MAKE) -C backend/python/vllm test
	$(MAKE) -C backend/python/vllm-omni test
	$(MAKE) -C backend/python/vibevoice test
	$(MAKE) -C backend/python/moonshine test
	$(MAKE) -C backend/python/pocket-tts test
	$(MAKE) -C backend/python/qwen-tts test
	$(MAKE) -C backend/python/fish-speech test
	$(MAKE) -C backend/python/faster-qwen3-tts test
	$(MAKE) -C backend/python/qwen-asr test
	$(MAKE) -C backend/python/nemo test
	$(MAKE) -C backend/python/voxcpm test
	$(MAKE) -C backend/python/faster-whisper test
	$(MAKE) -C backend/python/whisperx test
	$(MAKE) -C backend/python/ace-step test
	$(MAKE) -C backend/python/trl test
	$(MAKE) -C backend/rust/kokoros test

##
## End-to-end gRPC tests that exercise a built backend container image.
##
## The test suite in tests/e2e-backends is backend-agnostic. You drive it via env
## vars (see tests/e2e-backends/backend_test.go for the full list) and the
## capability-driven harness picks which gRPC RPCs to exercise:
##
##   BACKEND_IMAGE            Required. Docker image to test, e.g. local-ai-backend:llama-cpp.
##   BACKEND_TEST_MODEL_URL   URL of a model file to download and load.
##   BACKEND_TEST_MODEL_FILE  Path to an already-downloaded model (skips download).
##   BACKEND_TEST_MODEL_NAME  HuggingFace repo id (e.g. Qwen/Qwen2.5-0.5B-Instruct).
##                            Use this instead of MODEL_URL for backends that
##                            resolve HF model ids natively (vllm, vllm-omni).
##   BACKEND_TEST_CAPS        Comma-separated capabilities, default "health,load,predict,stream".
##                            Add "tools" to exercise ChatDelta tool call extraction.
##   BACKEND_TEST_PROMPT      Override the prompt used in predict/stream specs.
##   BACKEND_TEST_OPTIONS     Comma-separated Options[] entries forwarded to LoadModel,
##                            e.g. "tool_parser:hermes,reasoning_parser:qwen3".
##
## Direct usage (image already built, no docker-build-* dependency):
##
##   make test-extra-backend BACKEND_IMAGE=local-ai-backend:llama-cpp \
##     BACKEND_TEST_MODEL_URL=https://.../model.gguf
##
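## A possible variant of the direct usage, sketched under assumptions: the model
## path and prompt below are placeholders, and the exact effect of each variable
## is defined by the harness in tests/e2e-backends, not by this Makefile.
##
##   make test-extra-backend BACKEND_IMAGE=local-ai-backend:llama-cpp \
##     BACKEND_TEST_MODEL_FILE=/path/to/model.gguf \
##     BACKEND_TEST_PROMPT="Say hello"
##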
## Convenience wrappers below build a specific backend image first, then run the
## suite against it.
##
BACKEND_TEST_MODEL_URL?=https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf

## Generic target: runs the suite against whatever BACKEND_IMAGE points at.
## Depends on protogen-go so pkg/grpc/proto is generated before `go test`.
test-extra-backend: protogen-go
	@test -n "$$BACKEND_IMAGE" || { echo "BACKEND_IMAGE must be set" >&2; exit 1; }
	BACKEND_IMAGE="$$BACKEND_IMAGE" \
		BACKEND_TEST_MODEL_URL="$${BACKEND_TEST_MODEL_URL:-$(BACKEND_TEST_MODEL_URL)}" \
		BACKEND_TEST_MODEL_FILE="$$BACKEND_TEST_MODEL_FILE" \
		BACKEND_TEST_MODEL_NAME="$$BACKEND_TEST_MODEL_NAME" \
		BACKEND_TEST_CAPS="$$BACKEND_TEST_CAPS" \
		BACKEND_TEST_PROMPT="$$BACKEND_TEST_PROMPT" \
		BACKEND_TEST_OPTIONS="$$BACKEND_TEST_OPTIONS" \
		BACKEND_TEST_TOOL_PROMPT="$$BACKEND_TEST_TOOL_PROMPT" \
		BACKEND_TEST_TOOL_NAME="$$BACKEND_TEST_TOOL_NAME" \
		go test -v -timeout 30m ./tests/e2e-backends/...

## Convenience wrappers: build the image, then exercise it.
test-extra-backend-llama-cpp: docker-build-llama-cpp
	BACKEND_IMAGE=local-ai-backend:llama-cpp $(MAKE) test-extra-backend

test-extra-backend-ik-llama-cpp: docker-build-ik-llama-cpp
	BACKEND_IMAGE=local-ai-backend:ik-llama-cpp $(MAKE) test-extra-backend

## vllm is resolved from a HuggingFace model id (no file download) and
## exercises Predict + streaming + tool-call extraction via the hermes parser.
## Requires a host CPU with the SIMD instructions the prebuilt vllm CPU
## wheel was compiled against (AVX-512 VNNI/BF16); older CPUs will SIGILL
## on import; on CI this means using the bigger-runner label.
test-extra-backend-vllm: docker-build-vllm
	BACKEND_IMAGE=local-ai-backend:vllm \
	BACKEND_TEST_MODEL_NAME=Qwen/Qwen2.5-0.5B-Instruct \
	BACKEND_TEST_CAPS=health,load,predict,stream,tools \
	BACKEND_TEST_OPTIONS=tool_parser:hermes \
	$(MAKE) test-extra-backend

## mlx is Apple-Silicon-first. The MLX backend auto-detects the right tool
## parser from the chat template, so no tool_parser: option is needed (it
## would be ignored at runtime). Run this on macOS / arm64 with Metal; the
## Linux/CPU mlx variant is untested in CI.
test-extra-backend-mlx: docker-build-mlx
	BACKEND_IMAGE=local-ai-backend:mlx \
	BACKEND_TEST_MODEL_NAME=mlx-community/Qwen2.5-0.5B-Instruct-4bit \
	BACKEND_TEST_CAPS=health,load,predict,stream,tools \
	$(MAKE) test-extra-backend

test-extra-backend-mlx-vlm: docker-build-mlx-vlm
	BACKEND_IMAGE=local-ai-backend:mlx-vlm \
	BACKEND_TEST_MODEL_NAME=mlx-community/Qwen2.5-0.5B-Instruct-4bit \
	BACKEND_TEST_CAPS=health,load,predict,stream,tools \
	$(MAKE) test-extra-backend

DOCKER_IMAGE?=local-ai
IMAGE_TYPE?=core
BASE_IMAGE?=ubuntu:24.04

docker:
	docker build \
		--build-arg BASE_IMAGE=$(BASE_IMAGE) \
		--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
		--build-arg GO_TAGS="$(GO_TAGS)" \
		--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
		--build-arg BUILD_TYPE=$(BUILD_TYPE) \
		--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
		--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
		--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
		--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
		-t $(DOCKER_IMAGE) .

docker-cuda12:
	docker build \
		--build-arg CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} \
		--build-arg CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} \
		--build-arg BASE_IMAGE=$(BASE_IMAGE) \
		--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
		--build-arg GO_TAGS="$(GO_TAGS)" \
		--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
		--build-arg BUILD_TYPE=$(BUILD_TYPE) \
		--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
		--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
		-t $(DOCKER_IMAGE)-cuda-12 .

docker-image-intel:
	docker build \
		--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 \
		--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
		--build-arg GO_TAGS="$(GO_TAGS)" \
		--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
		--build-arg BUILD_TYPE=intel \
		--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
		--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
		--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
		--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
		-t $(DOCKER_IMAGE) .

########################################################
## Backends
########################################################

# Pattern rule for standard backends (docker-based)
# This matches all backends that use docker-build-* and docker-save-*
backends/%: docker-build-% docker-save-% build
	./local-ai backends install "ocifile://$(abspath ./backend-images/$*.tar)"
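
# Illustration of the pattern rule above, using vllm as the stem: running
#   make backends/vllm
# should first satisfy docker-build-vllm, docker-save-vllm and build, and then
# run the install step with $* expanded to "vllm", i.e. against
# backend-images/vllm.tar resolved to an absolute path by $(abspath ...).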

# Darwin-specific backends (keep as explicit targets since they have special build logic)
backends/llama-cpp-darwin: build
	bash ./scripts/build/llama-cpp-darwin.sh
	./local-ai backends install "ocifile://$(abspath ./backend-images/llama-cpp.tar)"

build-darwin-python-backend: build
	bash ./scripts/build/python-darwin.sh

build-darwin-go-backend: build
	bash ./scripts/build/golang-darwin.sh

backends/mlx:
	BACKEND=mlx $(MAKE) build-darwin-python-backend
	./local-ai backends install "ocifile://$(abspath ./backend-images/mlx.tar)"

backends/diffuser-darwin:
	BACKEND=diffusers $(MAKE) build-darwin-python-backend
	./local-ai backends install "ocifile://$(abspath ./backend-images/diffusers.tar)"

backends/mlx-vlm:
	BACKEND=mlx-vlm $(MAKE) build-darwin-python-backend
	./local-ai backends install "ocifile://$(abspath ./backend-images/mlx-vlm.tar)"

backends/mlx-audio:
	BACKEND=mlx-audio $(MAKE) build-darwin-python-backend
	./local-ai backends install "ocifile://$(abspath ./backend-images/mlx-audio.tar)"

backends/mlx-distributed:
	BACKEND=mlx-distributed $(MAKE) build-darwin-python-backend
	./local-ai backends install "ocifile://$(abspath ./backend-images/mlx-distributed.tar)"

backends/stablediffusion-ggml-darwin:
	BACKEND=stablediffusion-ggml BUILD_TYPE=metal $(MAKE) build-darwin-go-backend
	./local-ai backends install "ocifile://$(abspath ./backend-images/stablediffusion-ggml.tar)"

backend-images:
	mkdir -p backend-images

# Backend metadata: BACKEND_NAME | DOCKERFILE_TYPE | BUILD_CONTEXT | PROGRESS_FLAG | NEEDS_BACKEND_ARG
# llama-cpp is special - uses llama-cpp Dockerfile and doesn't need BACKEND arg
BACKEND_LLAMA_CPP = llama-cpp|llama-cpp|.|false|false
# ik-llama-cpp is a fork of llama.cpp with superior CPU performance
BACKEND_IK_LLAMA_CPP = ik-llama-cpp|ik-llama-cpp|.|false|false

# Golang backends
BACKEND_PIPER = piper|golang|.|false|true
BACKEND_LOCAL_STORE = local-store|golang|.|false|true
BACKEND_HUGGINGFACE = huggingface|golang|.|false|true
BACKEND_SILERO_VAD = silero-vad|golang|.|false|true
BACKEND_STABLEDIFFUSION_GGML = stablediffusion-ggml|golang|.|--progress=plain|true
BACKEND_WHISPER = whisper|golang|.|false|true
BACKEND_VOXTRAL = voxtral|golang|.|false|true
BACKEND_ACESTEP_CPP = acestep-cpp|golang|.|false|true
BACKEND_QWEN3_TTS_CPP = qwen3-tts-cpp|golang|.|false|true
BACKEND_OPUS = opus|golang|.|false|true

# Python backends with root context
BACKEND_RERANKERS = rerankers|python|.|false|true
BACKEND_TRANSFORMERS = transformers|python|.|false|true
BACKEND_OUTETTS = outetts|python|.|false|true
BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
BACKEND_COQUI = coqui|python|.|false|true
BACKEND_RFDETR = rfdetr|python|.|false|true
BACKEND_KITTEN_TTS = kitten-tts|python|.|false|true
BACKEND_NEUTTS = neutts|python|.|false|true
BACKEND_KOKORO = kokoro|python|.|false|true
BACKEND_VLLM = vllm|python|.|false|true
BACKEND_VLLM_OMNI = vllm-omni|python|.|false|true
BACKEND_DIFFUSERS = diffusers|python|.|--progress=plain|true
BACKEND_CHATTERBOX = chatterbox|python|.|false|true
BACKEND_VIBEVOICE = vibevoice|python|.|--progress=plain|true
BACKEND_MOONSHINE = moonshine|python|.|false|true
BACKEND_POCKET_TTS = pocket-tts|python|.|false|true
BACKEND_QWEN_TTS = qwen-tts|python|.|false|true
BACKEND_FISH_SPEECH = fish-speech|python|.|false|true
BACKEND_FASTER_QWEN3_TTS = faster-qwen3-tts|python|.|false|true
BACKEND_QWEN_ASR = qwen-asr|python|.|false|true
BACKEND_NEMO = nemo|python|.|false|true
BACKEND_VOXCPM = voxcpm|python|.|false|true
BACKEND_WHISPERX = whisperx|python|.|false|true
BACKEND_ACE_STEP = ace-step|python|.|false|true
BACKEND_MLX = mlx|python|.|false|true
BACKEND_MLX_VLM = mlx-vlm|python|.|false|true
BACKEND_MLX_DISTRIBUTED = mlx-distributed|python|./|false|true
BACKEND_TRL = trl|python|.|false|true
BACKEND_LLAMA_CPP_QUANTIZATION = llama-cpp-quantization|python|.|false|true

# Rust backends
BACKEND_KOKOROS = kokoros|rust|.|false|true

# C++ backends (Go wrapper with purego)
BACKEND_SAM3_CPP = sam3-cpp|golang|.|false|true

# Helper function to build docker image for a backend
# Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG)
define docker-build-backend
	docker build $(if $(filter-out false,$(4)),$(4)) \
		--build-arg BUILD_TYPE=$(BUILD_TYPE) \
		--build-arg BASE_IMAGE=$(BASE_IMAGE) \
		--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
		--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
		--build-arg UBUNTU_VERSION=$(UBUNTU_VERSION) \
		--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
		$(if $(FROM_SOURCE),--build-arg FROM_SOURCE=$(FROM_SOURCE)) \
		$(if $(filter true,$(5)),--build-arg BACKEND=$(1)) \
		-t local-ai-backend:$(1) -f backend/Dockerfile.$(2) $(3)
endef

# Generate docker-build targets from backend definitions
define generate-docker-build-target
docker-build-$(word 1,$(subst |, ,$(1))):
	$$(call docker-build-backend,$(word 1,$(subst |, ,$(1))),$(word 2,$(subst |, ,$(1))),$(word 3,$(subst |, ,$(1))),$(word 4,$(subst |, ,$(1))),$(word 5,$(subst |, ,$(1))))
endef
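
# Illustration only (not an extra target): for a definition such as
#   BACKEND_VLLM = vllm|python|.|false|true
# the two helpers above should expand to a rule roughly like the following.
# The shared --build-arg lines are abbreviated as "...", PROGRESS_FLAG "false"
# expands to nothing, and NEEDS_BACKEND_ARG "true" adds the BACKEND build arg.
#
#   docker-build-vllm:
#       docker build \
#           --build-arg BUILD_TYPE=$(BUILD_TYPE) ... \
#           --build-arg BACKEND=vllm \
#           -t local-ai-backend:vllm -f backend/Dockerfile.python .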

# Generate all docker-build targets
$(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_IK_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_PIPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_LOCAL_STORE)))
$(eval $(call generate-docker-build-target,$(BACKEND_HUGGINGFACE)))
$(eval $(call generate-docker-build-target,$(BACKEND_SILERO_VAD)))
$(eval $(call generate-docker-build-target,$(BACKEND_STABLEDIFFUSION_GGML)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXTRAL)))
$(eval $(call generate-docker-build-target,$(BACKEND_OPUS)))
$(eval $(call generate-docker-build-target,$(BACKEND_RERANKERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_TRANSFORMERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
$(eval $(call generate-docker-build-target,$(BACKEND_KITTEN_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_NEUTTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_KOKORO)))
$(eval $(call generate-docker-build-target,$(BACKEND_VLLM)))
$(eval $(call generate-docker-build-target,$(BACKEND_VLLM_OMNI)))
$(eval $(call generate-docker-build-target,$(BACKEND_DIFFUSERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_CHATTERBOX)))
$(eval $(call generate-docker-build-target,$(BACKEND_VIBEVOICE)))
$(eval $(call generate-docker-build-target,$(BACKEND_MOONSHINE)))
$(eval $(call generate-docker-build-target,$(BACKEND_POCKET_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_FISH_SPEECH)))
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_QWEN3_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN_ASR)))
$(eval $(call generate-docker-build-target,$(BACKEND_NEMO)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXCPM)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACESTEP_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN3_TTS_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_MLX)))
$(eval $(call generate-docker-build-target,$(BACKEND_MLX_VLM)))
$(eval $(call generate-docker-build-target,$(BACKEND_MLX_DISTRIBUTED)))
$(eval $(call generate-docker-build-target,$(BACKEND_TRL)))
$(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP_QUANTIZATION)))
$(eval $(call generate-docker-build-target,$(BACKEND_KOKOROS)))
$(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))

# Pattern rule for docker-save targets
docker-save-%: backend-images
	docker save local-ai-backend:$* -o backend-images/$*.tar

docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp

########################################################
### Mock Backend for E2E Tests
########################################################

build-mock-backend: protogen-go
	$(GOCMD) build -o tests/e2e/mock-backend/mock-backend ./tests/e2e/mock-backend

clean-mock-backend:
	rm -f tests/e2e/mock-backend/mock-backend

########################################################
### UI E2E Test Server
########################################################

build-ui-test-server: build-mock-backend react-ui protogen-go
	$(GOCMD) build -o tests/e2e-ui/ui-test-server ./tests/e2e-ui

test-ui-e2e: build-ui-test-server
	cd core/http/react-ui && npm install && npx playwright install --with-deps chromium && npx playwright test

test-ui-e2e-docker:
	docker build -t localai-ui-e2e -f tests/e2e-ui/Dockerfile .
	docker run --rm localai-ui-e2e

clean-ui-test-server:
	rm -f tests/e2e-ui/ui-test-server

########################################################
### END Backends
########################################################

.PHONY: swagger
swagger:
	swag init -g core/http/app.go --output swagger

# DEPRECATED: gen-assets is for the legacy Alpine.js UI. Remove when legacy UI is removed.
.PHONY: gen-assets
gen-assets:
	$(GOCMD) run core/dependencies_manager/manager.go webui_static.yaml core/http/static/assets

## Documentation
docs/layouts/_default:
	mkdir -p docs/layouts/_default

docs/static/gallery.html: docs/layouts/_default
	$(GOCMD) run ./.github/ci/modelslist.go ./gallery/index.yaml > docs/static/gallery.html

docs/public: docs/layouts/_default docs/static/gallery.html
	cd docs && hugo --minify

docs-clean:
	rm -rf docs/public
	rm -rf docs/static/gallery.html

.PHONY: docs
docs: docs/static/gallery.html
	cd docs && hugo serve

########################################################
## Platform-specific builds
########################################################

## fyne cross-platform build
build-launcher-darwin: build-launcher
	go run github.com/tiagomelo/macos-dmg-creator/cmd/createdmg@latest \
		--appName "LocalAI" \
		--appBinaryPath "$(LAUNCHER_BINARY_NAME)" \
		--bundleIdentifier "com.localai.launcher" \
		--iconPath "core/http/static/logo.png" \
		--outputDir "dist/"

build-launcher-linux:
	cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os linux -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)-linux && mv launcher.tar.xz ../../$(LAUNCHER_BINARY_NAME)-linux.tar.xz