feat(backend): rfdetr-cpp native object detection + segmentation backend (#10028)

Adds a native Go gRPC backend that dlopens librfdetrcpp.so (built from mudler/rf-detr.cpp at the pinned RFDETR_VERSION) via purego and exposes the rfdetr.cpp inference pipeline through LocalAI's existing Detect RPC.

Supports all 5 RF-DETR detection variants (Nano/Small/Base/Medium/Large) and 6 segmentation variants (SegNano/SegSmall/SegMedium/SegLarge/SegXLarge/Seg2XLarge) with F32/F16/Q8_0/Q4_K quantizations. Pre-built GGUFs ship at mudler/rfdetr-cpp-* on HuggingFace.

Detection returns Bbox + class_name + confidence; segmentation also returns PNG-encoded per-detection masks via the rfdetr_capi accessor functions (rfdetr_capi_get_detection_{class_id,box,score,class_name,mask_png}).

End-to-end verified through POST /v1/detection: HTTP -> gRPC -> purego dlopen -> rfdetr.cpp -> ggml -> response (9 detections on the detection model, 21 detections + valid PNG masks on the seg-nano model against the kitchen fixture).

Wiring:
- backend/go/rfdetr-cpp/{main.go,gorfdetrcpp.go,CMakeLists.txt,Makefile,run.sh,package.sh,test.sh,.gitignore}
- Top-level Makefile: BACKEND_RFDETR_CPP, docker-build target, .NOTPARALLEL, prepare-test-extra, test-extra
- backend/go/rfdetr-cpp/Makefile: `test` target invoked by test-extra
- .github/backend-matrix.yml: CPU + CUDA-12/13 + L4T CUDA-12/13 (arm64) + HIP + Vulkan (amd64 + arm64) + SYCL f32/f16
- backend/index.yaml: rfdetr-cpp meta anchor + latest/development image entries for every matrix tag-suffix
- .github/workflows/bump_deps.yaml: RFDETR_VERSION pin tracking (mudler/rf-detr.cpp branch main)
- gallery/index.yaml: 11 rfdetr-cpp-* entries (nano + 4 other detection variants + 6 seg variants), all backed by mudler/rfdetr-cpp-* on HuggingFace with sha256 pinning on the F16 default
- core/gallery/importers/rfdetr.go: GGUF auto-routing for HF imports (mudler/rfdetr-cpp-* repos route to rfdetr-cpp, Transformer-format repos stay on the Python rfdetr backend; an explicit preferences.backend overrides both heuristics)
- core/gallery/importers/rfdetr_test.go: table-driven coverage of the auto-routing + a live mudler/rfdetr-cpp-nano cross-check

scripts/changed-backends.js needs no change: the existing Dockerfile.golang -> backend/go/${item.backend}/ branch already routes the 9 rfdetr-cpp matrix entries to the correct backend path.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
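The importer routing described above for core/gallery/importers/rfdetr.go (GGUF repos go to rfdetr-cpp, Transformer-format repos stay on the Python rfdetr backend, an explicit preferences.backend wins) can be illustrated with a minimal Go sketch. It is not the code in this commit: `chooseBackend` and its parameters are hypothetical, and the sketch assumes the decision can be made from the repo ID, the repo's file list, and the optional preference alone.

```go
package main

import (
	"fmt"
	"strings"
)

// chooseBackend is a hypothetical helper illustrating the routing rules;
// it is NOT the implementation in core/gallery/importers/rfdetr.go.
//   - repo:      Hugging Face repo ID, e.g. "mudler/rfdetr-cpp-nano"
//   - files:     file names listed in that repo
//   - preferred: value of preferences.backend ("" when unset)
func chooseBackend(repo string, files []string, preferred string) string {
	// An explicit preference always overrides the heuristics.
	if preferred != "" {
		return preferred
	}
	// Repos following the mudler/rfdetr-cpp-* naming go to the native backend.
	if strings.HasPrefix(repo, "mudler/rfdetr-cpp-") {
		return "rfdetr-cpp"
	}
	// Any GGUF weights in the repo also indicate the native backend.
	for _, f := range files {
		if strings.HasSuffix(strings.ToLower(f), ".gguf") {
			return "rfdetr-cpp"
		}
	}
	// Everything else (e.g. Transformer-format checkpoints) stays on Python.
	return "rfdetr"
}

func main() {
	// File and repo names below are hypothetical examples.
	fmt.Println(chooseBackend("mudler/rfdetr-cpp-nano", []string{"rfdetr-nano-f16.gguf"}, ""))   // rfdetr-cpp
	fmt.Println(chooseBackend("example/rf-detr-checkpoint", []string{"model.safetensors"}, "")) // rfdetr
	fmt.Println(chooseBackend("mudler/rfdetr-cpp-nano", nil, "rfdetr"))                         // rfdetr
}
```

Checking the explicit preference first is what lets it override both heuristics; the name-prefix and file-extension checks are independent signals, so either one alone is enough to route to rfdetr-cpp in this sketch.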
2026-07-14 02:04:13 -04:00 · 2026-05-27 18:43:57 +02:00
parent 893e69cbf8
commit 7a4ca8f60d
18 changed files with 1697 additions and 6 deletions
--- a/docs/content/features/object-detection.md
+++ b/docs/content/features/object-detection.md
@@ -5,7 +5,7 @@ weight = 13
 url = "/features/object-detection/"
 +++

-LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) for object detection and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).
+LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) (Python) and [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) (native C++/ggml) for object detection and segmentation, and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).

 For detecting **faces** specifically, see the dedicated
 [Face Recognition](/features/face-recognition/) feature — its
@@ -135,6 +135,74 @@ Currently, the following model is available in the [Model Gallery]({{%relref "fe

 You can browse and install this model through the LocalAI web interface or using the command line.

+### RF-DETR Native Backend (rfdetr-cpp)
+
+The `rfdetr-cpp` backend is a native C++/ggml implementation of RF-DETR
+inference based on [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp). It
+runs as a Go gRPC service that dlopens a per-CPU-variant shared library, so
+there is no Python runtime on the inference path. Startup is fast, and the
+backend package is self-contained.
+
+Compared to the Python `rfdetr` backend, the native backend:
+
+- Has no Python or PyTorch dependency at inference time
+- Loads GGUF models at F32, F16, Q8_0, or Q4_K precision; the reduced-precision files have a smaller footprint
+- Supports both detection and segmentation variants of RF-DETR
+- Returns segmentation masks as PNG bytes in `Detection.mask`
+
+#### Setup
+
+1. **Install the backend**
+
+   ```bash
+   local-ai backends install rfdetr-cpp
+   ```
+
+2. **Using the Model Gallery (Recommended)**
+
+   The gallery ships ready-to-run entries for every published variant:
+
+   ```bash
+   # Detection variants
+   local-ai run rfdetr-cpp-nano
+   local-ai run rfdetr-cpp-small
+   local-ai run rfdetr-cpp-base
+   local-ai run rfdetr-cpp-medium
+   local-ai run rfdetr-cpp-large
+
+   # Segmentation variants (return per-instance PNG masks)
+   local-ai run rfdetr-cpp-seg-nano
+   local-ai run rfdetr-cpp-seg-small
+   local-ai run rfdetr-cpp-seg-medium
+   local-ai run rfdetr-cpp-seg-large
+   local-ai run rfdetr-cpp-seg-xlarge
+   local-ai run rfdetr-cpp-seg-2xlarge
+   ```
+
+3. **Manual Configuration**
+
+   ```yaml
+   name: rfdetr-cpp-seg-nano
+   backend: rfdetr-cpp
+   parameters:
+     model: rfdetr-seg-nano-f16.gguf
+   threads: 4
+   known_usecases:
+     - detection
+   ```
+
+   Pre-quantized GGUFs are published under
+   [`mudler/rfdetr-cpp-*`](https://huggingface.co/mudler?search_models=rfdetr-cpp)
+   on Hugging Face. Each repo carries F32, F16, Q8_0, and Q4_K files; F16 is
+   the recommended default (matches F32 accuracy at ~1.86x smaller size).
+
+#### Segmentation Output
+
+When running a segmentation model (any `rfdetr-cpp-seg-*` variant), each
+`Detection` in the response carries a `mask` field with a base64-encoded
+PNG of the per-instance binary mask. The mask is sized to the original
+image resolution and aligns with the corresponding bounding box.
+
 ### SAM3 Backend (sam3-cpp)

 The sam3-cpp backend provides image segmentation using [sam3.cpp](https://github.com/PABannier/sam3.cpp), a portable C++ implementation of Meta's Segment Anything Model. It supports multiple model architectures:
@@ -261,7 +329,8 @@ local-ai run --debug rfdetr-base

 LocalAI includes a dedicated **object-detection** category for models and backends that specialize in identifying and locating objects within images. This category currently includes:

- **RF-DETR**: Real-time transformer-based object detection
+- **RF-DETR**: Real-time transformer-based object detection (Python backend)
+- **rfdetr-cpp**: Native C++/ggml RF-DETR for detection and segmentation
 - **sam3-cpp**: SAM 3/2/EdgeTAM image segmentation

 Additional object detection models and backends will be added to this category in the future. You can filter models by the `object-detection` tag in the model gallery to find all available object detection models.