LocalAI/.agents/adding-backends.md
LocalAI [bot] 600dafd20b feat(ced): sound-event classification backend (CED audio tagger) (#10425)
2026-06-22 01:00:28 +02:00


Adding a New Backend

When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like moonshine:

1. Create Backend Directory Structure

Create the backend directory under the appropriate location:

  • Python backends: backend/python/<backend-name>/
  • Go backends: backend/go/<backend-name>/
  • C++ backends: backend/cpp/<backend-name>/
  • Rust backends: backend/rust/<backend-name>/

For Python backends, you'll typically need:

  • backend.py - Main gRPC server implementation
  • Makefile - Build configuration
  • install.sh - Installation script for dependencies
  • protogen.sh - Protocol buffer generation script
  • requirements.txt - Python dependencies
  • run.sh - Runtime script
  • test.py / test.sh - Test files

For Rust backends, you'll typically need (see backend/rust/kokoros/ as a reference):

  • Cargo.toml - Crate manifest; depend on the upstream project as a submodule under sources/
  • build.rs - Invokes tonic_build to generate gRPC stubs from backend/backend.proto (use the BACKEND_PROTO_PATH env var so the Makefile can inject the canonical copy)
  • src/ - The gRPC server implementation (implement Backend via tonic)
  • Makefile - Copies backend.proto into the crate, runs cargo build --release, then package.sh
  • package.sh - Uses ldd to bundle the binary's dynamic deps and ld.so into package/lib/
  • run.sh - Sets LD_LIBRARY_PATH/SSL_CERT_DIR and execs the binary via the bundled lib/ld.so
  • sources/<UpstreamProject>/ - Git submodule with the upstream Rust crate

2. Add Build Configurations to .github/backend-matrix.yml

The build matrix is data-only YAML at .github/backend-matrix.yml (not inside backend.yml itself). backend.yml (master push) and backend_pr.yml (PR) load it via scripts/changed-backends.js, which also handles per-file path filtering so only touched backends rebuild on PRs and master pushes alike. Add build matrix entries to .github/backend-matrix.yml for each platform/GPU type you want to support. Look at similar backends for reference — chatterbox/faster-whisper for Python, piper/silero-vad for Go, kokoros for Rust.

Without an entry here no image is ever built or pushed, and the gallery entry in backend/index.yaml will point at a tag that does not exist. The dockerfile: field must point at ./backend/Dockerfile.<lang> matching the language bucket from step 1 (e.g. Dockerfile.python, Dockerfile.golang, Dockerfile.rust). The tag-suffix must match the uri: in the corresponding backend/index.yaml image entry exactly.
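The per-file path filtering described above can be pictured with a small sketch. This is not the code in scripts/changed-backends.js: `selectChangedEntries`, `pathOf`, and the sample data are hypothetical names used only to show why a backend whose dockerfile suffix has no path mapping ends up with no CI jobs.

```javascript
// Hypothetical sketch of per-file path filtering. The real logic lives in
// scripts/changed-backends.js and may differ; pathOf stands in for
// inferBackendPath and returns a directory prefix or null.
function selectChangedEntries(entries, changedFiles, pathOf) {
  return entries.filter((entry) => {
    const dir = pathOf(entry);
    // An entry with no inferred path can never match a changed file,
    // so it silently produces no CI job.
    return dir !== null && changedFiles.some((file) => file.startsWith(dir));
  });
}

// Illustrative data only: "registered" has a path mapping, "forgotten" does not.
const knownPaths = { registered: "backend/go/registered/" };
const pathOf = (entry) => knownPaths[entry.backend] ?? null;

const entries = [
  { backend: "registered", "tag-suffix": "-cpu-registered" },
  { backend: "forgotten", "tag-suffix": "-cpu-forgotten" },
];
const changed = ["backend/go/registered/main.go", "backend/go/forgotten/main.go"];

console.log(selectChangedEntries(entries, changed, pathOf).map((e) => e.backend));
// prints [ 'registered' ]
```

Both directories changed, but only the backend with a path mapping is selected for a rebuild.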

scripts/changed-backends.js registration — REQUIRED for any new dockerfile suffix. This is the single most common omission, because it has no effect on the PR that adds the backend (when no prior path filter could catch it anyway) — it only breaks the next PR that touches your backend's directory, which then gets zero CI jobs and looks broken for unrelated reasons. Edit scripts/changed-backends.js:inferBackendPath and add a branch BEFORE the more-generic suffixes:

if (item.dockerfile.endsWith("<your-dockerfile-suffix>")) {
    return `backend/cpp/<your-backend>/`;   // or backend/python|go|rust/...
}

The endsWith() test runs against the matrix entry's dockerfile: value (e.g. ./backend/Dockerfile.ds4 matches endsWith("ds4")). Specificity order matters here just like it does for importers: more-specific suffixes go BEFORE more-generic ones (e.g. ds4 before llama-cpp even though both end with letters, because some upstream might one day call itself super-ds4-llama-cpp). Verify locally before pushing:

# Confirm your dockerfile suffix is unique enough
node -e "
const yaml = require('js-yaml'); const fs = require('fs');
const m = yaml.load(fs.readFileSync('.github/backend-matrix.yml','utf8'));
for (const e of m.include.filter(e => e.backend === '<your-backend>')) {
  console.log(e.dockerfile, '->', e.dockerfile.endsWith('<suffix>'));
}"

A quick way to find the right insertion point: grep -n 'item.dockerfile.endsWith' scripts/changed-backends.js.
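To see why branch order matters, here is a simplified model of first-match-wins suffix testing. It is a sketch under assumptions: `makeInferBackendPath` is a hypothetical helper, and the suffix/path pairs are illustrative rather than copied from the real inferBackendPath. It uses the ik-llama-cpp and llama-cpp dockerfile names because one is a suffix of the other.

```javascript
// Simplified, hypothetical model of the branch ordering in inferBackendPath.
// Rules are tried top to bottom; the first endsWith() hit wins.
function makeInferBackendPath(rules) {
  return (item) => {
    for (const [suffix, path] of rules) {
      if (item.dockerfile.endsWith(suffix)) return path;
    }
    return null; // no branch matched
  };
}

// Illustrative suffix/path pairs (assumed, not taken from the real script).
const specificFirst = makeInferBackendPath([
  ["ik-llama-cpp", "backend/cpp/ik-llama-cpp/"],
  ["llama-cpp", "backend/cpp/llama-cpp/"],
]);
const genericFirst = makeInferBackendPath([
  ["llama-cpp", "backend/cpp/llama-cpp/"],
  ["ik-llama-cpp", "backend/cpp/ik-llama-cpp/"],
]);

const item = { dockerfile: "./backend/Dockerfile.ik-llama-cpp" };
console.log(specificFirst(item)); // backend/cpp/ik-llama-cpp/
console.log(genericFirst(item)); // backend/cpp/llama-cpp/ (shadowed: wrong directory)
```

With the generic branch first, the more specific backend is mapped to the wrong directory, so changes under its own directory would not trigger its build.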

bump_deps.yaml registration — REQUIRED for any backend pinning an upstream commit. If your backend's Makefile has a *_VERSION?=<sha> pin to a third-party repo, the daily auto-bump bot at .github/workflows/bump_deps.yaml won't notice it unless you register the backend in its matrix. The bot runs .github/bump_deps.sh which greps for ^$VAR?= in the Makefile you list — so the pin MUST live in the Makefile (not in a separate shell script). The bump for ds4 (#9761) had to walk this back because the original landed the pin in prepare.sh, which the bot can't see. Pattern (for antirez/ds4):

# .github/workflows/bump_deps.yaml
matrix:
  include:
    - repository: "antirez/ds4"
      variable: "DS4_VERSION"
      branch: "main"
      file: "backend/cpp/ds4/Makefile"

And the corresponding Makefile shape (mirror backend/cpp/llama-cpp/Makefile):

DS4_VERSION?=ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0
DS4_REPO?=https://github.com/antirez/ds4
...
ds4:
	mkdir -p ds4
	cd ds4 && git init -q && \
	git remote add origin $(DS4_REPO) && \
	git fetch --depth 1 origin $(DS4_VERSION) && \
	git checkout FETCH_HEAD

If you have a prepare.sh doing the clone, delete it — the recipe belongs in the Makefile target so make purge && make works as a clean-and-rebuild and so the bump bot finds the pin.
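The pin lookup can be approximated as follows. The real check is a shell grep in .github/bump_deps.sh; `findPin` is a hypothetical JavaScript stand-in that only illustrates why the `VAR?=<sha>` shape at the start of a Makefile line is what gets found, while a plain shell assignment is not.

```javascript
// Hypothetical approximation of the "^$VAR?=" lookup done by the bump bot.
function findPin(makefileText, variable) {
  // Escape regex metacharacters in the variable name.
  const escaped = variable.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match "<VAR>?=<value>" anchored at the start of any line.
  const pattern = new RegExp(`^${escaped}\\?=(\\S+)`, "m");
  const match = makefileText.match(pattern);
  return match ? match[1] : null;
}

const makefile = [
  "DS4_VERSION?=ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0",
  "DS4_REPO?=https://github.com/antirez/ds4",
].join("\n");

// A shell-style assignment, as it might appear in a prepare.sh.
const prepareSh = "DS4_VERSION=ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0";

console.log(findPin(makefile, "DS4_VERSION")); // ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0
console.log(findPin(prepareSh, "DS4_VERSION")); // null
```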

Placement in file:

  • CPU builds: Add after other CPU builds (e.g., after cpu-chatterbox)
  • CUDA 12 builds: Add after other CUDA 12 builds (e.g., after gpu-nvidia-cuda-12-chatterbox)
  • CUDA 13 builds: Add after other CUDA 13 builds (e.g., after gpu-nvidia-cuda-13-chatterbox)

Additional build types you may need:

  • ROCm/HIP: Use build-type: 'hipblas' with base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  • Intel/SYCL: Use build-type: 'intel' or build-type: 'sycl_f16'/'sycl_f32' with base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
  • L4T (ARM): Use build-type: 'l4t' with platforms: 'linux/arm64' and runs-on: 'ubuntu-24.04-arm'

Per-arch native builds (linux/amd64 + linux/arm64):

Multi-arch backends are NOT a single matrix entry with platforms: 'linux/amd64,linux/arm64'. Instead, add two entries — one with platforms: 'linux/amd64' + platform-tag: 'amd64' + runs-on: 'ubuntu-latest', one with platforms: 'linux/arm64' + platform-tag: 'arm64' + runs-on: 'ubuntu-24.04-arm' — both sharing the same tag-suffix. The script detects the shared tag-suffix and emits a merge-matrix entry, so backend-merge-jobs (in backend.yml/backend_pr.yml) automatically assembles the manifest list from per-arch digest artifacts. See -cpu-faster-whisper in .github/backend-matrix.yml for a reference shape.
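The shared-tag-suffix detection can be sketched like this. `buildMergeMatrix` and the shape of the emitted merge entries are assumptions made for illustration; the actual output of scripts/changed-backends.js may look different.

```javascript
// Hypothetical sketch: group matrix entries by tag-suffix and emit one merge
// entry for every suffix shared by several per-arch entries.
function buildMergeMatrix(entries) {
  const bySuffix = new Map();
  for (const entry of entries) {
    const suffix = entry["tag-suffix"];
    if (!bySuffix.has(suffix)) bySuffix.set(suffix, []);
    bySuffix.get(suffix).push(entry);
  }

  const merges = [];
  for (const [suffix, group] of bySuffix) {
    const tags = group.map((e) => e["platform-tag"]).filter(Boolean);
    // Only multi-entry groups where every entry carries a platform-tag
    // need a manifest-list merge.
    if (group.length > 1 && tags.length === group.length) {
      merges.push({ "tag-suffix": suffix, "platform-tags": tags.sort() });
    }
  }
  return merges;
}

// Illustrative entries following the two-entry shape described above.
const matrix = [
  { "tag-suffix": "-cpu-faster-whisper", platforms: "linux/amd64", "platform-tag": "amd64", "runs-on": "ubuntu-latest" },
  { "tag-suffix": "-cpu-faster-whisper", platforms: "linux/arm64", "platform-tag": "arm64", "runs-on": "ubuntu-24.04-arm" },
  { "tag-suffix": "-cpu-moonshine", platforms: "linux/amd64" },
];

console.log(JSON.stringify(buildMergeMatrix(matrix)));
// [{"tag-suffix":"-cpu-faster-whisper","platform-tags":["amd64","arm64"]}]
```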

llama-cpp / ik-llama-cpp / turboquant variants only — builder-base-image:

Entries whose dockerfile is ./backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant} must also set a builder-base-image field pointing at a prebuilt base from quay.io/go-skynet/ci-cache:base-grpc-* (CI builds these via .github/workflows/base-images.yml). The mapping is by (build-type, platforms); see existing entries for the pattern. CI uses these prebuilt bases to skip the gRPC compile (~25-35 min cold). Local make backends/<name> ignores builder-base-image and uses the from-source path inside the Dockerfile, so you don't need quay access for local builds.

3. Add Backend Metadata to backend/index.yaml

Step 3a: Add Meta Definition

Add a YAML anchor definition in the ## metas section (around line 2-300). Use a similar backend, such as diffusers or chatterbox, as a template.

Step 3b: Add Image Entries

Add image entries at the end of the file, following the pattern of similar backends such as diffusers or chatterbox. Include both latest (production) and master (development) tags.
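The tag-suffix rule from step 2 can be checked mechanically before pushing. The sketch below assumes that an image tag is formed as `latest<tag-suffix>` or `master<tag-suffix>`; that tag shape, the helper names, and the placeholder registry `example.invalid` are assumptions, so adjust them to what backend/index.yaml actually contains.

```javascript
// Assumption: the image tag is "<prefix><tag-suffix>" with prefix
// "latest" (production) or "master" (development).
function uriMatchesTagSuffix(uri, tagSuffix) {
  const tag = uri.slice(uri.lastIndexOf(":") + 1);
  return tag === `latest${tagSuffix}` || tag === `master${tagSuffix}`;
}

// Return every matrix tag-suffix that no index uri points at.
function findUnmatchedSuffixes(matrixEntries, indexUris) {
  return matrixEntries
    .map((entry) => entry["tag-suffix"])
    .filter((suffix) => !indexUris.some((uri) => uriMatchesTagSuffix(uri, suffix)));
}

// Illustrative data; the registry host is a placeholder, not the real one.
const matrixEntries = [
  { backend: "moonshine", "tag-suffix": "-cpu-moonshine" },
  { backend: "moonshine", "tag-suffix": "-gpu-nvidia-cuda-12-moonshine" },
];
const indexUris = [
  "example.invalid/backends:latest-cpu-moonshine",
  "example.invalid/backends:master-cpu-moonshine",
  "example.invalid/backends:latest-gpu-nvidia-cuda12-moonshine", // typo: missing hyphen
];

console.log(findUnmatchedSuffixes(matrixEntries, indexUris));
// prints [ '-gpu-nvidia-cuda-12-moonshine' ]
```

A non-empty result means a gallery entry points at a tag that CI will never push.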

Note on integrity: OCI backends installed from a gallery whose verification: block is set are verified against a keyless-cosign policy before extraction; tarball/HTTP backends use the optional sha256: field. New backends do not need any extra YAML — the gallery-level verification: block covers every entry. See .agents/backend-signing.md for the producer-side CI step.

4. Update the Makefile

The Makefile needs to be updated in several places to support building and testing the new backend:

Step 4a: Add to .NOTPARALLEL

Add backends/<backend-name> to the .NOTPARALLEL line (around line 2) to prevent parallel execution conflicts:

.NOTPARALLEL: ... backends/<backend-name>

Step 4b: Add to prepare-test-extra

Add the backend to the prepare-test-extra target to prepare it for testing. Use the path matching your language bucket (backend/python/, backend/go/, backend/rust/, …):

prepare-test-extra: protogen-python
	...
	$(MAKE) -C backend/<lang>/<backend-name>

For Rust backends the target is usually the crate build target itself (e.g. $(MAKE) -C backend/rust/<backend-name> <backend-name>-grpc) so the binary is in place before test runs.

Step 4c: Add to test-extra

Add the backend to the test-extra target to run its tests — applies to Go and Rust backends too, not only Python:

test-extra: prepare-test-extra
	...
	$(MAKE) -C backend/<lang>/<backend-name> test

Each backend's own Makefile should define a test target so this line works regardless of language. Integration tests that need large model downloads should be gated behind an env var (see backend/rust/kokoros/'s KOKOROS_MODEL_PATH pattern) so CI only runs unit tests.

Step 4d: Add Backend Definition

Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:

For Python backends with root context (like faster-whisper, coqui):

BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true

For Python backends with ./backend context (like chatterbox, moonshine):

BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true

For Go backends:

BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true

For Rust backends:

BACKEND_<BACKEND_NAME> = <backend-name>|rust|.|false|true

The language field (python/golang/rust/…) must match a backend/Dockerfile.<lang> file.
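As a reading aid, the pipe-separated definition can be split into its fields. This sketch only relies on what the examples above show (name, language, build context, then two trailing values); `parseBackendDefinition` is a hypothetical helper, and the meaning of the last two fields is not described here, so they are returned untouched.

```javascript
// Hypothetical parser for a "BACKEND_<NAME> = name|lang|context|x|y" line.
function parseBackendDefinition(line) {
  const eq = line.indexOf("=");
  if (eq === -1) throw new Error("expected VARIABLE = value");
  const variable = line.slice(0, eq).trim();
  const fields = line.slice(eq + 1).trim().split("|");
  if (fields.length !== 5) {
    throw new Error(`expected 5 fields, got ${fields.length}`);
  }
  const [name, lang, context, fourth, fifth] = fields;
  return {
    variable,
    name,
    lang,
    context,
    // The language field must correspond to an existing Dockerfile.
    dockerfile: `backend/Dockerfile.${lang}`,
    // Trailing fields are kept as opaque strings.
    rest: [fourth, fifth],
  };
}

const def = parseBackendDefinition("BACKEND_MOONSHINE = moonshine|python|./backend|false|true");
console.log(def.name, def.lang, def.context, def.dockerfile);
// prints: moonshine python ./backend backend/Dockerfile.python
```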

Step 4e: Generate Docker Build Target

Add an eval call to generate the docker-build target (around line 480-501):

$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))

Step 4f: Add to docker-build-backends

Add docker-build-<backend-name> to the docker-build-backends target (around line 507):

docker-build-backends: ... docker-build-<backend-name>

Determining the Context:

  • If the backend is in backend/python/<backend-name>/ and its .github/backend-matrix.yml entry uses ./backend as context, use ./backend context
  • If the backend is in backend/python/<backend-name>/ but its .github/backend-matrix.yml entry uses . as context, use . context
  • Check similar backends to determine the correct context

Documenting the backend (README + docs)

A backend is not "added" until it is discoverable. Update the user-facing docs:

  • docs/content/features/backends.md - add the backend to the right category in the "LocalAI supports various types of backends" list (and add a new category if it introduces a new modality, e.g. sound classification).
  • If the backend introduces a new API surface (a new endpoint or a realtime capability), document it under docs/content/ where its area lives (audio, vision, etc.) and follow the api-endpoints checklist in api-endpoints-and-auth.md.

If the backend is a native C/C++/GGML engine created and maintained by the LocalAI team (a from-scratch port like parakeet.cpp, ced.cpp, vibevoice.cpp, rf-detr.cpp, not a wrapper around a third-party runtime), it ALSO belongs in the top-level README.md table under "native C/C++/GGML engines ... developed and maintained by the LocalAI project itself". Add a row linking the upstream engine repo with a one-line description. This is the project's showcase of its own engines; a new in-house backend that is missing from it is a documentation bug.

5. Verification Checklist

After adding a new backend, verify:

  • Backend directory structure is complete with all necessary files
  • Build configurations added to .github/backend-matrix.yml for all desired platforms (per-arch entries with platform-tag for multi-arch; builder-base-image for llama-cpp / ik-llama-cpp / turboquant)
  • Meta definition added to backend/index.yaml in the ## metas section
  • Image entries added to backend/index.yaml for all build variants (latest + development)
  • Tag suffixes match between .github/backend-matrix.yml and backend/index.yaml
  • Makefile updated with all 6 required changes (.NOTPARALLEL, prepare-test-extra, test-extra, backend definition, docker-build target eval, docker-build-backends)
  • No YAML syntax errors (check with linter)
  • No Makefile syntax errors (check with linter)
  • Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow faster-whisper pattern)
  • Documented: added to the category list in docs/content/features/backends.md (and any new endpoint/realtime capability documented under docs/content/)
  • If it is an in-house native C/C++/GGML engine, added to the maintained-engines table in the top-level README.md

Bundling runtime shared libraries (package.sh)

The final Dockerfile.python stage is FROM scratch: there is no system libc, no apt, no fallback library path. Only files explicitly copied from the builder stage end up in the backend image. That means any library your backend (or its Python deps) loads with dlopen at runtime must be packaged into ${BACKEND}/lib/.

Pattern:

  1. Make sure the library is installed in the builder stage of backend/Dockerfile.python (add it to the top-level apt-get install).
  2. Drop a package.sh in your backend directory that copies the library — and its soname symlinks — into $(dirname $0)/lib. See backend/python/vllm/package.sh for a reference implementation that walks /usr/lib/x86_64-linux-gnu, /usr/lib/aarch64-linux-gnu, etc.
  3. Dockerfile.python already runs package.sh automatically if it exists, after package-gpu-libs.sh.
  4. libbackend.sh automatically prepends ${EDIR}/lib to LD_LIBRARY_PATH at run time, so anything packaged this way is found by dlopen.

How to find missing libs: when a Python module silently fails to register torch ops or you see AttributeError: '_OpNamespace' '...' object has no attribute '...', run the backend image's Python with LD_DEBUG=libs to see which dlopen failed. The filename in the error message (e.g. libnuma.so.1) is what you need to package.

To verify packaging works without trusting the host:

make docker-build-<backend>
CID=$(docker create --entrypoint=/run.sh local-ai-backend:<backend>)
docker cp $CID:/lib /tmp/check && docker rm $CID
ls /tmp/check    # expect the bundled .so files + symlinks

Then boot it inside a fresh ubuntu:24.04 (which intentionally does not have the lib installed) to confirm it actually loads from the backend dir.

Importer integration

When you add a new backend, you MUST also make it importable via the model import form (/import-model). The import form dropdown is sourced dynamically from GET /backends/known — it reads the importer registry at core/gallery/importers/importers.go, so the steps below are the ONLY way to make your backend show up.

Required steps:

  1. If your backend has unambiguous detection signals (unique file extension, HF pipeline_tag, unique repo name pattern, unique artefact like modules.json):
    • Create an importer file at core/gallery/importers/<backend>.go following the Match/Import pattern in llama-cpp.go.
    • Register it in importers.go:defaultImporters in specificity order — more specific detectors must appear BEFORE more generic ones (e.g. sentencetransformers before transformers, stablediffusion-ggml before llama-cpp, vllm-omni before vllm). First match wins.
  2. If your backend is a drop-in replacement (same artefacts as another backend, e.g. ik-llama-cpp and turboquant both consume GGUF the same way llama-cpp does):
    • Do NOT create a new importer. Extend the existing importer's Import() to swap the emitted backend: field when preferences.backend matches. See llama-cpp.go for the pattern.
  3. If your backend has no reliable auto-detect signal (preference-only — e.g. sglang, tinygrad, whisperx):
    • Do NOT create an importer. Instead add the backend name to the curated pref-only slice in core/http/endpoints/localai/backend.go that feeds /backends/known. This is a single-line addition.
  4. Always add a table-driven test in core/gallery/importers/importers_test.go (Ginkgo/Gomega):
    • Use a real public HuggingFace repo URI as the test fixture (existing tests already hit the live HF API — follow that pattern).
    • Cover detection (auto-match without preferences), preference-override (explicit backend: in preferences wins), and — if the backend's modality has a common pipeline_tag but ambiguous artefacts — an ambiguity test asserting errors.Is(err, importers.ErrAmbiguousImport).

Rules of thumb:

  • When in doubt, lean pref-only. A wrong auto-detect is worse than a forced preference.
  • Never silently emit a modality mismatch (e.g. emit llama-cpp for a TTS repo because .gguf is present). Return ErrAmbiguousImport instead.
  • Registration order is the single most common source of bugs. Check by running go test ./core/gallery/importers/... — the existing suite will fail if you've shadowed a pre-existing detector.
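The interplay of registration order, preference override, and ambiguity can be modelled in a few lines. The real importers are Go code under core/gallery/importers/; the JavaScript below is a language-neutral sketch in which `IMPORTERS`, `pickBackend`, `AmbiguousImportError`, and the detection rules (in particular config.json for transformers) are illustrative assumptions.

```javascript
// Hypothetical stand-in for importers.ErrAmbiguousImport.
class AmbiguousImportError extends Error {}

// Ordered most-specific first; the first detector that matches wins.
const IMPORTERS = [
  { backend: "sentencetransformers", match: (repo) => repo.files.includes("modules.json") },
  { backend: "transformers", match: (repo) => repo.files.includes("config.json") },
  { backend: "llama-cpp", match: (repo) => repo.files.some((f) => f.endsWith(".gguf")) },
];

function pickBackend(repo, preferences = {}) {
  // An explicit preference always overrides auto-detection.
  if (preferences.backend) return preferences.backend;

  // Never guess across modalities: a GGUF file in a TTS repo is ambiguous.
  const hasGGUF = repo.files.some((f) => f.endsWith(".gguf"));
  if (hasGGUF && repo.pipelineTag === "text-to-speech") {
    throw new AmbiguousImportError("GGUF present in a TTS repo; set preferences.backend");
  }

  for (const importer of IMPORTERS) {
    if (importer.match(repo)) return importer.backend;
  }
  return null; // nothing matched: the backend is preference-only
}

const stRepo = { files: ["modules.json", "config.json"], pipelineTag: "sentence-similarity" };
console.log(pickBackend(stRepo)); // sentencetransformers
console.log(pickBackend(stRepo, { backend: "transformers" })); // transformers
```

If the transformers entry were listed first, the same repo would resolve to transformers and the more specific detector would never run, which is the shadowing the existing test suite is meant to catch.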

6. Example: Adding a Python Backend

For reference, when moonshine was added:

  • Files created: backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}
  • Workflow entries: 3 build configurations (CPU, CUDA 12, CUDA 13)
  • Index entries: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 x latest/development)
  • Makefile updates:
    • Added to .NOTPARALLEL line
    • Added to prepare-test-extra and test-extra targets
    • Added BACKEND_MOONSHINE = moonshine|python|./backend|false|true
    • Added eval for docker-build target generation
    • Added docker-build-moonshine to docker-build-backends