Adds a Go native gRPC backend that dlopens librfdetrcpp.so (built from
mudler/rf-detr.cpp at the pinned RFDETR_VERSION) via purego and exposes
the rfdetr.cpp inference pipeline through LocalAI's existing Detect RPC.
Supports all 5 RF-DETR detection variants (Nano/Small/Base/Medium/Large)
and 6 segmentation variants (SegNano/SegSmall/SegMedium/SegLarge/
SegXLarge/Seg2XLarge) with F32/F16/Q8_0/Q4_K quantizations. Pre-built
GGUFs ship at mudler/rfdetr-cpp-* on HuggingFace.
Detection returns Bbox + class_name + confidence; segmentation also
returns PNG-encoded per-detection masks via the rfdetr_capi accessor
functions (rfdetr_capi_get_detection_{class_id,box,score,class_name,
mask_png}).
End-to-end verified through POST /v1/detection: HTTP -> gRPC -> purego
dlopen -> rfdetr.cpp -> ggml -> response (9 detections on the detection
model, 21 detections + valid PNG masks on the seg-nano model against
the kitchen fixture).
Wiring:
- backend/go/rfdetr-cpp/{main.go,gorfdetrcpp.go,CMakeLists.txt,
Makefile,run.sh,package.sh,test.sh,.gitignore}
- Top-level Makefile: BACKEND_RFDETR_CPP, docker-build target,
.NOTPARALLEL, prepare-test-extra, test-extra
- backend/go/rfdetr-cpp/Makefile: `test` target invoked by test-extra
- .github/backend-matrix.yml: CPU + CUDA-12/13 + L4T CUDA-12/13
(arm64) + HIP + Vulkan (amd64 + arm64) + SYCL f32/f16
- backend/index.yaml: rfdetr-cpp meta anchor + latest/development
image entries for every matrix tag-suffix
- .github/workflows/bump_deps.yaml: RFDETR_VERSION pin tracking
(mudler/rf-detr.cpp branch main)
- gallery/index.yaml: 11 rfdetr-cpp-* entries (nano + 4 detection
variants + 6 seg variants), all backed by mudler/rfdetr-cpp-*
on HuggingFace with sha256 pinning on the F16 default
- core/gallery/importers/rfdetr.go: GGUF auto-routing for HF imports
(mudler/rfdetr-cpp-* repos route to rfdetr-cpp, Transformer-format
repos stay on the Python rfdetr backend; explicit preferences.backend
overrides both heuristics)
- core/gallery/importers/rfdetr_test.go: table-driven coverage of the
auto-routing + a live mudler/rfdetr-cpp-nano cross-check
scripts/changed-backends.js needs no change: the existing
Dockerfile.golang -> backend/go/${item.backend}/ branch already routes
the 9 rfdetr-cpp matrix entries to the correct backend path.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
+++
disableToc = false
title = "Object Detection"
weight = 13
url = "/features/object-detection/"
+++
LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include RF-DETR (Python) and rf-detr.cpp (native C++/ggml) for object detection and segmentation, and sam3.cpp for image segmentation (SAM 3/2/EdgeTAM).
For detecting faces specifically, see the dedicated
Face Recognition feature — its
/v1/detection support is tuned for face bounding boxes and ships
with commercially-safe model options.
Overview
Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.
Key Features:
- Real-time object detection
- High accuracy detection with bounding boxes
- Image segmentation with binary masks (sam3-cpp and the rfdetr-cpp segmentation variants)
- Text-prompted, point-prompted, and box-prompted segmentation
- Support for multiple hardware accelerators (CPU, NVIDIA GPU, Intel GPU, AMD GPU)
- Structured detection results with confidence scores
- Easy integration through the /v1/detection endpoint
Usage
Detection Endpoint
LocalAI provides a dedicated /v1/detection endpoint for object detection tasks. It returns structured detection results with bounding boxes and confidence scores.
API Reference
To perform object detection, send a POST request to the /v1/detection endpoint:
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "rfdetr-base",
"image": "https://media.roboflow.com/dog.jpeg"
}'
Request Format
The request body should contain:
- model: The name of the object detection model (e.g., "rfdetr-base")
- image: The image to analyze, which can be:
  - A URL to an image
  - A base64-encoded image
- prompt (optional): Text prompt for text-prompted segmentation (SAM 3 only)
- points (optional): Point coordinates as [x, y, label, ...] triples (label: 1=positive, 0=negative)
- boxes (optional): Box coordinates as [x1, y1, x2, y2, ...] quads
- threshold (optional): Detection confidence threshold (default: 0.5)
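The request fields above can be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper names `build_detection_payload` and `build_detection_request` are hypothetical and not part of LocalAI. It assumes optional fields can simply be left out of the body when unset.

```python
import json
import urllib.request


def build_detection_payload(model, image, threshold=None, prompt=None,
                            points=None, boxes=None):
    """Assemble the JSON body; optional fields are omitted when unset."""
    payload = {"model": model, "image": image}
    if threshold is not None:
        payload["threshold"] = threshold
    if prompt is not None:
        payload["prompt"] = prompt
    if points is not None:
        payload["points"] = list(points)
    if boxes is not None:
        payload["boxes"] = list(boxes)
    return payload


def build_detection_request(base_url, payload):
    """Wrap the payload in a POST request object (nothing is sent yet)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/detection",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a LocalAI instance running, the request object can be passed to `urllib.request.urlopen` and the response body read as JSON.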
Response Format
The API returns a JSON response with detected objects:
{
"detections": [
{
"x": 100.5,
"y": 150.2,
"width": 200.0,
"height": 300.0,
"confidence": 0.95,
"class_name": "dog"
},
{
"x": 400.0,
"y": 200.0,
"width": 150.0,
"height": 250.0,
"confidence": 0.87,
"class_name": "person"
}
]
}
Each detection includes:
- x, y: Coordinates of the bounding box top-left corner
- width, height: Dimensions of the bounding box
- confidence: Detection confidence score (0.0 to 1.0)
- class_name: The detected object class
- mask (optional): Base64-encoded PNG binary segmentation mask (segmentation models only: sam3-cpp and the rfdetr-cpp-seg-* variants)
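Many drawing and cropping tools expect corner coordinates rather than a top-left corner plus size. As a rough sketch of consuming the response, the hypothetical helper below (not part of LocalAI) converts each detection to `(x1, y1, x2, y2)` and optionally drops low-confidence results on the client side.

```python
import json


def to_corner_boxes(response_text, min_confidence=0.0):
    """Return (class_name, confidence, (x1, y1, x2, y2)) per detection."""
    results = []
    for det in json.loads(response_text).get("detections", []):
        if det["confidence"] < min_confidence:
            continue  # client-side filter, independent of the request threshold
        x1, y1 = det["x"], det["y"]
        box = (x1, y1, x1 + det["width"], y1 + det["height"])
        results.append((det["class_name"], det["confidence"], box))
    return results
```

Calling `to_corner_boxes(body, min_confidence=0.9)` on the sample response keeps only the "dog" detection.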
Backends
RF-DETR Backend
The RF-DETR backend is implemented as a Python-based gRPC service that integrates seamlessly with LocalAI. It provides object detection capabilities using the RF-DETR model architecture and supports multiple hardware configurations:
- CPU: Optimized for CPU inference
- NVIDIA GPU: CUDA acceleration for NVIDIA GPUs
- Intel GPU: Intel oneAPI optimization
- AMD GPU: ROCm acceleration for AMD GPUs
- NVIDIA Jetson: Optimized for ARM64 NVIDIA Jetson devices
Setup
- Using the Model Gallery (Recommended)
  The easiest way to get started is using the model gallery. The rfdetr-base model is available in the official LocalAI gallery:
    # Install and run the rfdetr-base model
    local-ai run rfdetr-base
  You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base".
- Manual Configuration
  Create a model configuration file in your models directory:
    name: rfdetr
    backend: rfdetr
    parameters:
      model: rfdetr-base
Available Models
Currently, the following model is available in the [Model Gallery]({{%relref "features/model-gallery" %}}):
- rfdetr-base: Base model with balanced performance and accuracy
You can browse and install this model through the LocalAI web interface or using the command line.
RF-DETR Native Backend (rfdetr-cpp)
The rfdetr-cpp backend is a native C++/ggml implementation of RF-DETR
inference based on rf-detr.cpp. It
runs as a Go gRPC service that dlopens a per-CPU-variant shared library, so
there is no Python runtime on the inference path — startup is fast and the
binary is self-contained.
Compared to the Python rfdetr backend, the native backend:
- Has no Python or PyTorch dependency at inference time
- Loads quantized GGUF models (F32, F16, Q8_0, Q4_K) for smaller footprint
- Supports both detection and segmentation variants of RF-DETR
- Returns segmentation masks as PNG bytes in
Detection.mask
Setup
- Install the backend
    local-ai backends install rfdetr-cpp
- Using the Model Gallery (Recommended)
  The gallery ships ready-to-run entries for every published variant:
    # Detection variants
    local-ai run rfdetr-cpp-nano
    local-ai run rfdetr-cpp-small
    local-ai run rfdetr-cpp-base
    local-ai run rfdetr-cpp-medium
    local-ai run rfdetr-cpp-large
    # Segmentation variants (return per-instance PNG masks)
    local-ai run rfdetr-cpp-seg-nano
    local-ai run rfdetr-cpp-seg-small
    local-ai run rfdetr-cpp-seg-medium
    local-ai run rfdetr-cpp-seg-large
    local-ai run rfdetr-cpp-seg-xlarge
    local-ai run rfdetr-cpp-seg-2xlarge
- Manual Configuration
    name: rfdetr-cpp-seg-nano
    backend: rfdetr-cpp
    parameters:
      model: rfdetr-seg-nano-f16.gguf
    threads: 4
    known_usecases:
      - detection
  Pre-quantized GGUFs are published under mudler/rfdetr-cpp-* on Hugging Face. Each repo carries the F32/F16/Q8_0/Q4_K quants; F16 is the recommended default (matches F32 accuracy, ~1.86x smaller).
Segmentation Output
When running a segmentation model (any rfdetr-cpp-seg-* variant), each
Detection in the response carries a mask field with a base64-encoded
PNG of the per-instance binary mask. The mask is sized to the original
image resolution and aligns with the corresponding bounding box.
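To illustrate working with the mask field, here is a small Python sketch that decodes the base64 payload, checks that it really is a PNG, and reads the mask dimensions from the PNG header without any imaging library. The helper name `decode_mask` is hypothetical; the byte offsets follow the PNG file layout (8-byte signature, then the IHDR chunk).

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_mask(mask_b64):
    """Decode a base64 mask and return (png_bytes, width, height)."""
    png = base64.b64decode(mask_b64)
    # Bytes 0-7: signature; 8-11: IHDR length; 12-15: chunk type "IHDR".
    if png[:8] != PNG_SIGNATURE or png[12:16] != b"IHDR":
        raise ValueError("mask is not a PNG image")
    # IHDR data starts at byte 16: width then height, 4 bytes each, big-endian.
    width, height = struct.unpack(">II", png[16:24])
    return png, width, height
```

The returned bytes can be written straight to a `.png` file, and the width and height can be compared against the source image as a sanity check.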
SAM3 Backend (sam3-cpp)
The sam3-cpp backend provides image segmentation using sam3.cpp, a portable C++ implementation of Meta's Segment Anything Model. It supports multiple model architectures:
- SAM 3: Full model with text encoder for text-prompted detection and segmentation
- SAM 2 / SAM 2.1: Hiera backbone models in multiple sizes
- SAM 3 Visual-Only: Point/box segmentation without text encoder
- EdgeTAM: Ultra-efficient mobile variant (~15MB quantized)
Setup
- Manual Configuration
  Create a model configuration file in your models directory:
    name: sam3
    backend: sam3-cpp
    parameters:
      model: edgetam_q4_0.ggml
    threads: 4
    known_usecases:
      - detection
  Download the model from Hugging Face.
Segmentation Modes
Point-prompted segmentation (all models):
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"points": [256.0, 256.0, 1.0],
"threshold": 0.5
}'
Box-prompted segmentation (all models):
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"boxes": [100.0, 100.0, 400.0, 400.0],
"threshold": 0.5
}'
Text-prompted segmentation (SAM 3 full model only):
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"prompt": "cat",
"threshold": 0.5
}'
The response includes segmentation masks as base64-encoded PNGs in the mask field of each detection.
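The points and boxes fields are flat arrays, which is easy to get wrong when prompts are kept as tuples in client code. The sketch below shows one way to flatten them; `flatten_points` and `flatten_boxes` are hypothetical helper names, and the label check assumes only the two label values documented above.

```python
def flatten_points(points):
    """[(x, y, label), ...] -> [x, y, label, ...] with label 1 or 0."""
    flat = []
    for x, y, label in points:
        if label not in (0, 1):
            raise ValueError("label must be 1 (positive) or 0 (negative)")
        flat.extend([float(x), float(y), float(label)])
    return flat


def flatten_boxes(boxes):
    """[(x1, y1, x2, y2), ...] -> [x1, y1, x2, y2, ...]."""
    flat = []
    for x1, y1, x2, y2 in boxes:
        flat.extend([float(x1), float(y1), float(x2), float(y2)])
    return flat
```

For example, `flatten_points([(256, 256, 1)])` yields the same `points` array used in the point-prompted request above.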
Examples
Basic Object Detection
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "rfdetr-base",
"image": "https://example.com/image.jpg"
}'
Base64 Image Detection
base64_image=$(base64 -w 0 image.jpg)
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d "{
\"model\": \"rfdetr-base\",
\"image\": \"data:image/jpeg;base64,$base64_image\"
}"
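The same data URI can be built without a shell, which avoids depending on the `-w 0` option of `base64`. This is a minimal Python sketch; `bytes_to_data_uri` and `file_to_data_uri` are hypothetical helper names, and the fallback to `image/jpeg` when the type cannot be guessed is an assumption made for illustration.

```python
import base64
import mimetypes


def bytes_to_data_uri(data, mime="image/jpeg"):
    """Encode raw image bytes as a data URI for the image field."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:" + mime + ";base64," + encoded


def file_to_data_uri(path):
    """Read an image file and guess its MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as handle:
        return bytes_to_data_uri(handle.read(), mime or "image/jpeg")
```

The resulting string goes directly into the `image` field of the request body.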
Troubleshooting
Common Issues
- Model Loading Errors
  - Ensure the model file is properly downloaded
  - Check available disk space
  - Verify model compatibility with your backend version
- Low Detection Accuracy
  - Ensure good image quality and lighting
  - Check if objects are clearly visible
  - Consider using a larger model for better accuracy
- Slow Performance
  - Enable GPU acceleration if available
  - Use a smaller model for faster inference
  - Optimize image resolution
Debug Mode
Enable debug logging for troubleshooting:
local-ai run --debug rfdetr-base
Object Detection Category
LocalAI includes a dedicated object-detection category for models and backends that specialize in identifying and locating objects within images. This category currently includes:
- RF-DETR: Real-time transformer-based object detection (Python backend)
- rfdetr-cpp: Native C++/ggml RF-DETR for detection + segmentation
- sam3-cpp: SAM 3/2/EdgeTAM image segmentation
Additional object detection models and backends will be added to this category in the future. You can filter models by the object-detection tag in the model gallery to find all available object detection models.
Related Features
- [🎨 Image generation]({{%relref "features/image-generation" %}}): Generate images with AI
- [📖 Text generation]({{%relref "features/text-generation" %}}): Generate text with language models
- [🔍 GPT Vision]({{%relref "features/gpt-vision" %}}): Analyze images with language models
- [🚀 GPU acceleration]({{%relref "features/GPU-acceleration" %}}): Optimize performance with GPU acceleration