LocalAI/docs/content/features/object-detection.md
LocalAI [bot] 7a4ca8f60d feat(backend): rfdetr-cpp native object detection + segmentation backend (#10028)
Adds a Go native gRPC backend that dlopens librfdetrcpp.so (built from
mudler/rf-detr.cpp at the pinned RFDETR_VERSION) via purego and exposes
the rfdetr.cpp inference pipeline through LocalAI's existing Detect RPC.

Supports all 5 RF-DETR detection variants (Nano/Small/Base/Medium/Large)
and 6 segmentation variants (SegNano/SegSmall/SegMedium/SegLarge/
SegXLarge/Seg2XLarge) with F32/F16/Q8_0/Q4_K quantizations. Pre-built
GGUFs ship at mudler/rfdetr-cpp-* on HuggingFace.

Detection returns Bbox + class_name + confidence; segmentation also
returns PNG-encoded per-detection masks via the rfdetr_capi accessor
functions (rfdetr_capi_get_detection_{class_id,box,score,class_name,
mask_png}).

End-to-end verified through POST /v1/detection: HTTP -> gRPC -> purego
dlopen -> rfdetr.cpp -> ggml -> response (9 detections on the detection
model, 21 detections + valid PNG masks on the seg-nano model against
the kitchen fixture).

Wiring:
  - backend/go/rfdetr-cpp/{main.go,gorfdetrcpp.go,CMakeLists.txt,
    Makefile,run.sh,package.sh,test.sh,.gitignore}
  - Top-level Makefile: BACKEND_RFDETR_CPP, docker-build target,
    .NOTPARALLEL, prepare-test-extra, test-extra
  - backend/go/rfdetr-cpp/Makefile: `test` target invoked by test-extra
  - .github/backend-matrix.yml: CPU + CUDA-12/13 + L4T CUDA-12/13
    (arm64) + HIP + Vulkan (amd64 + arm64) + SYCL f32/f16
  - backend/index.yaml: rfdetr-cpp meta anchor + latest/development
    image entries for every matrix tag-suffix
  - .github/workflows/bump_deps.yaml: RFDETR_VERSION pin tracking
    (mudler/rf-detr.cpp branch main)
  - gallery/index.yaml: 11 rfdetr-cpp-* entries (nano + 4 detection
    variants + 6 seg variants), all backed by mudler/rfdetr-cpp-*
    on HuggingFace with sha256 pinning on the F16 default
  - core/gallery/importers/rfdetr.go: GGUF auto-routing for HF imports
    (mudler/rfdetr-cpp-* repos route to rfdetr-cpp, Transformer-format
    repos stay on the Python rfdetr backend; explicit preferences.backend
    overrides both heuristics)
  - core/gallery/importers/rfdetr_test.go: table-driven coverage of the
    auto-routing + a live mudler/rfdetr-cpp-nano cross-check

scripts/changed-backends.js needs no change: the existing
Dockerfile.golang -> backend/go/${item.backend}/ branch already routes
the 9 rfdetr-cpp matrix entries to the correct backend path.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-27 18:43:57 +02:00


+++
disableToc = false
title = "Object Detection"
weight = 13
url = "/features/object-detection/"
+++
LocalAI supports object detection and image segmentation through several backends. These backends identify and locate objects in an image and return bounding boxes, class names, and confidence scores. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) (Python) and [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) (native C++/ggml) for object detection and segmentation, and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).

For detecting **faces** specifically, see the dedicated
[Face Recognition](/features/face-recognition/) feature. Its
`/v1/detection` support is tuned for face bounding boxes and ships
with commercially-safe model options.
## Overview
Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.
**Key Features:**
- Real-time object detection
- High accuracy detection with bounding boxes
- Image segmentation with binary masks (sam3-cpp and the `rfdetr-cpp` segmentation models)
- Text-prompted, point-prompted, and box-prompted segmentation (sam3-cpp)
- Support for multiple hardware accelerators (CPU, NVIDIA GPU, Intel GPU, AMD GPU)
- Structured detection results with confidence scores
- Easy integration through the `/v1/detection` endpoint
## Usage
### Detection Endpoint
LocalAI provides a dedicated `/v1/detection` endpoint for object detection and segmentation. It returns structured results with bounding boxes, class names, and confidence scores.
### API Reference
To perform object detection, send a POST request to the `/v1/detection` endpoint:
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "rfdetr-base",
"image": "https://media.roboflow.com/dog.jpeg"
}'
```
### Request Format
The request body should contain:
- `model`: The name of the object detection model (e.g., "rfdetr-base")
- `image`: The image to analyze, which can be:
  - A URL to an image
  - A base64-encoded image
- `prompt` (optional): Text prompt for text-prompted segmentation (SAM 3 only)
- `points` (optional): Point prompts as a flat array of `[x, y, label, ...]` triples (label: 1=positive, 0=negative); used by the sam3-cpp backend
- `boxes` (optional): Box prompts as a flat array of `[x1, y1, x2, y2, ...]` quads; used by the sam3-cpp backend
- `threshold` (optional): Detection confidence threshold (default: 0.5)
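The fields above can also be assembled programmatically. The following is a minimal Python sketch, not part of LocalAI: `build_detection_request` is a hypothetical helper name, and the length checks only enforce the triple and quad layout described above on the client side. Whether the server performs the same validation is not stated here.

```python
import json


def build_detection_request(model, image, prompt=None, points=None,
                            boxes=None, threshold=None):
    """Build the JSON body for POST /v1/detection (hypothetical helper)."""
    body = {"model": model, "image": image}
    if prompt is not None:
        body["prompt"] = prompt
    if points is not None:
        # Points are a flat list of x, y, label triples.
        if len(points) % 3 != 0:
            raise ValueError("points must contain x, y, label triples")
        body["points"] = [float(v) for v in points]
    if boxes is not None:
        # Boxes are a flat list of x1, y1, x2, y2 quads.
        if len(boxes) % 4 != 0:
            raise ValueError("boxes must contain x1, y1, x2, y2 quads")
        body["boxes"] = [float(v) for v in boxes]
    if threshold is not None:
        body["threshold"] = float(threshold)
    return json.dumps(body)


# Example: a single positive point prompt for a SAM model.
payload = build_detection_request(
    "sam3", "data:image/jpeg;base64,...", points=[256, 256, 1], threshold=0.5
)
print(payload)
```

Optional fields are omitted from the body when they are not set, so the server-side defaults (such as the 0.5 threshold) still apply.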
### Response Format
The API returns a JSON response with detected objects:
```json
{
  "detections": [
    {
      "x": 100.5,
      "y": 150.2,
      "width": 200.0,
      "height": 300.0,
      "confidence": 0.95,
      "class_name": "dog"
    },
    {
      "x": 400.0,
      "y": 200.0,
      "width": 150.0,
      "height": 250.0,
      "confidence": 0.87,
      "class_name": "person"
    }
  ]
}
```
Each detection includes:
- `x`, `y`: Coordinates of the bounding box top-left corner
- `width`, `height`: Dimensions of the bounding box
- `confidence`: Detection confidence score (0.0 to 1.0)
- `class_name`: The detected object class
- `mask` (optional): Base64-encoded PNG binary segmentation mask (segmentation models only: sam3-cpp and the `rfdetr-cpp-seg-*` variants)
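As an illustration of consuming these fields, here is a minimal Python sketch that parses a response, applies a stricter client-side confidence cut-off, and converts each box from top-left plus size to corner coordinates. The helper names `to_corners` and `filter_detections` are hypothetical, and the sample data is the example response shown above.

```python
import json

# Sample response body, copied from the example response above.
SAMPLE = """
{"detections": [
  {"x": 100.5, "y": 150.2, "width": 200.0, "height": 300.0,
   "confidence": 0.95, "class_name": "dog"},
  {"x": 400.0, "y": 200.0, "width": 150.0, "height": 250.0,
   "confidence": 0.87, "class_name": "person"}
]}
"""


def to_corners(det):
    """Convert x/y/width/height into (x1, y1, x2, y2) corner coordinates."""
    return (det["x"], det["y"],
            det["x"] + det["width"], det["y"] + det["height"])


def filter_detections(response_text, min_confidence=0.5):
    """Return detections at or above min_confidence, highest score first."""
    detections = json.loads(response_text).get("detections") or []
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)


for det in filter_detections(SAMPLE, min_confidence=0.9):
    print(det["class_name"], det["confidence"], to_corners(det))
```

Filtering on the client is useful when one request should feed several consumers with different cut-offs. When only one cut-off is needed, passing `threshold` in the request avoids transferring detections that would be discarded anyway.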
## Backends
### RF-DETR Backend
The RF-DETR backend is a Python gRPC service. It provides object detection using the RF-DETR model architecture and supports multiple hardware configurations:
- **CPU**: Optimized for CPU inference
- **NVIDIA GPU**: CUDA acceleration for NVIDIA GPUs
- **Intel GPU**: Intel oneAPI optimization
- **AMD GPU**: ROCm acceleration for AMD GPUs
- **NVIDIA Jetson**: Optimized for ARM64 NVIDIA Jetson devices
#### Setup
1. **Using the Model Gallery (Recommended)**
The easiest way to get started is using the model gallery. The `rfdetr-base` model is available in the official LocalAI gallery:
```bash
# Install and run the rfdetr-base model
local-ai run rfdetr-base
```
You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base".
2. **Manual Configuration**
Create a model configuration file in your `models` directory:
```yaml
name: rfdetr
backend: rfdetr
parameters:
  model: rfdetr-base
```
#### Available Models
Currently, the following model is available in the [Model Gallery]({{%relref "features/model-gallery" %}}):
- **rfdetr-base**: Base model with balanced performance and accuracy
You can browse and install this model through the LocalAI web interface or using the command line.
### RF-DETR Native Backend (rfdetr-cpp)
The `rfdetr-cpp` backend is a native C++/ggml implementation of RF-DETR
inference based on [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp). It
runs as a Go gRPC service that dlopens a per-CPU-variant shared library, so
there is no Python runtime on the inference path — startup is fast and the
binary is self-contained.
Compared to the Python `rfdetr` backend, the native backend:
- Has no Python or PyTorch dependency at inference time
- Loads quantized GGUF models (F32, F16, Q8_0, Q4_K) for smaller footprint
- Supports both detection and segmentation variants of RF-DETR
- Returns segmentation masks as PNG data in `Detection.mask` (base64-encoded in the JSON response)
#### Setup
1. **Install the backend**
```bash
local-ai backends install rfdetr-cpp
```
2. **Using the Model Gallery (Recommended)**
The gallery ships ready-to-run entries for every published variant:
```bash
# Detection variants
local-ai run rfdetr-cpp-nano
local-ai run rfdetr-cpp-small
local-ai run rfdetr-cpp-base
local-ai run rfdetr-cpp-medium
local-ai run rfdetr-cpp-large
# Segmentation variants (return per-instance PNG masks)
local-ai run rfdetr-cpp-seg-nano
local-ai run rfdetr-cpp-seg-small
local-ai run rfdetr-cpp-seg-medium
local-ai run rfdetr-cpp-seg-large
local-ai run rfdetr-cpp-seg-xlarge
local-ai run rfdetr-cpp-seg-2xlarge
```
3. **Manual Configuration**
```yaml
name: rfdetr-cpp-seg-nano
backend: rfdetr-cpp
parameters:
  model: rfdetr-seg-nano-f16.gguf
threads: 4
known_usecases:
- detection
```
Pre-quantized GGUFs are published under
[`mudler/rfdetr-cpp-*`](https://huggingface.co/mudler?search_models=rfdetr-cpp)
on Hugging Face. Each repo carries the F32/F16/Q8_0/Q4_K quants — F16 is
the recommended default (matches F32 accuracy, ~1.86x smaller).
#### Segmentation Output
When running a segmentation model (any `rfdetr-cpp-seg-*` variant), each
`Detection` in the response carries a `mask` field with a base64-encoded
PNG of the per-instance binary mask. The mask is sized to the original
image resolution and aligns with the corresponding bounding box.
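A client can decode the `mask` field and check it against the original image size. The following minimal Python sketch uses only the standard library; `decode_mask` and `save_masks` are hypothetical helper names, and the dimensions are read directly from the PNG `IHDR` header rather than through an imaging library.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_mask(mask_b64):
    """Decode a base64 mask and return (png_bytes, width, height)."""
    png = base64.b64decode(mask_b64)
    # A PNG starts with an 8-byte signature followed by the IHDR chunk.
    if len(png) < 24 or png[:8] != PNG_SIGNATURE or png[12:16] != b"IHDR":
        raise ValueError("mask is not a PNG image")
    # IHDR stores width and height as big-endian 32-bit integers.
    width, height = struct.unpack(">II", png[16:24])
    return png, width, height


def save_masks(detections, prefix="mask"):
    """Write each mask to <prefix>-<index>.png and return the paths."""
    paths = []
    for i, det in enumerate(detections):
        if not det.get("mask"):
            continue  # detection-only models return no mask
        png, _, _ = decode_mask(det["mask"])
        path = f"{prefix}-{i}.png"
        with open(path, "wb") as fh:
            fh.write(png)
        paths.append(path)
    return paths
```

Calling `save_masks(response["detections"])` on a parsed response writes one file per masked detection. Comparing the width and height returned by `decode_mask` with the input image size is a quick sanity check that the mask matches the original resolution.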
### SAM3 Backend (sam3-cpp)
The sam3-cpp backend provides image segmentation using [sam3.cpp](https://github.com/PABannier/sam3.cpp), a portable C++ implementation of Meta's Segment Anything Model. It supports multiple model architectures:
- **SAM 3**: Full model with text encoder for text-prompted detection and segmentation
- **SAM 2 / SAM 2.1**: Hiera backbone models in multiple sizes
- **SAM 3 Visual-Only**: Point/box segmentation without text encoder
- **EdgeTAM**: Ultra-efficient mobile variant (~15MB quantized)
#### Setup
1. **Manual Configuration**
Create a model configuration file in your `models` directory:
```yaml
name: sam3
backend: sam3-cpp
parameters:
  model: edgetam_q4_0.ggml
threads: 4
known_usecases:
- detection
```
Download the model from [Hugging Face](https://huggingface.co/PABannier/sam3.cpp).
#### Segmentation Modes
**Point-prompted segmentation** (all models):
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"points": [256.0, 256.0, 1.0],
"threshold": 0.5
}'
```
**Box-prompted segmentation** (all models):
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"boxes": [100.0, 100.0, 400.0, 400.0],
"threshold": 0.5
}'
```
**Text-prompted segmentation** (SAM 3 full model only):
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"prompt": "cat",
"threshold": 0.5
}'
```
The response includes segmentation masks as base64-encoded PNGs in the `mask` field of each detection.
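Because `points` and `boxes` are flat arrays, prompts that are kept as structured tuples in client code need to be flattened before sending. The sketch below shows one way to do that in Python; `flatten_points` and `flatten_boxes` are hypothetical helpers, and how the backend combines several points or boxes in a single request is not specified here.

```python
import json


def flatten_points(points):
    """[(x, y, is_positive), ...] -> [x, y, label, ...] with label 1.0 or 0.0."""
    flat = []
    for x, y, is_positive in points:
        flat.extend([float(x), float(y), 1.0 if is_positive else 0.0])
    return flat


def flatten_boxes(boxes):
    """[(x1, y1, x2, y2), ...] -> [x1, y1, x2, y2, ...]."""
    flat = []
    for x1, y1, x2, y2 in boxes:
        flat.extend([float(x1), float(y1), float(x2), float(y2)])
    return flat


# One positive point on the object and one negative point on the background.
body = {
    "model": "sam3",
    "image": "data:image/jpeg;base64,...",
    "points": flatten_points([(256, 256, True), (40, 40, False)]),
    "threshold": 0.5,
}
print(json.dumps(body))
```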
## Examples
### Basic Object Detection
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "rfdetr-base",
"image": "https://example.com/image.jpg"
}'
```
### Base64 Image Detection
```bash
# -w 0 disables line wrapping (GNU coreutils); on macOS use: base64 -i image.jpg
base64_image=$(base64 -w 0 image.jpg)
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d "{
\"model\": \"rfdetr-base\",
\"image\": \"data:image/jpeg;base64,$base64_image\"
}"
```
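The same base64 upload can be done from Python without shell quoting. This is a minimal sketch using only the standard library; `to_data_uri` and `detect` are hypothetical helper names, and the URL and model name are the ones used in the curl examples above.

```python
import base64
import json
import urllib.request


def to_data_uri(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def detect(image_path, model="rfdetr-base",
           url="http://localhost:8080/v1/detection"):
    """POST a local image to the detection endpoint and return the parsed JSON."""
    with open(image_path, "rb") as fh:
        image = to_data_uri(fh.read())
    body = json.dumps({"model": model, "image": image}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a LocalAI instance running locally, `detect("image.jpg")` returns the parsed response as a dictionary. Sending the body from a program also avoids the shell's argument-length limit, which large images can hit when the base64 string is passed inline to `curl -d`.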
## Troubleshooting
### Common Issues
1. **Model Loading Errors**
   - Ensure the model file is properly downloaded
   - Check available disk space
   - Verify model compatibility with your backend version
2. **Low Detection Accuracy**
   - Ensure good image quality and lighting
   - Check if objects are clearly visible
   - Consider using a larger model for better accuracy
3. **Slow Performance**
   - Enable GPU acceleration if available
   - Use a smaller model for faster inference
   - Optimize image resolution
### Debug Mode
Enable debug logging for troubleshooting:
```bash
local-ai run --debug rfdetr-base
```
## Object Detection Category
LocalAI includes a dedicated **object-detection** category for models and backends that specialize in identifying and locating objects within images. This category currently includes:
- **RF-DETR**: Real-time transformer-based object detection (Python backend)
- **rfdetr-cpp**: Native C++/ggml RF-DETR for detection + segmentation
- **sam3-cpp**: SAM 3/2/EdgeTAM image segmentation
Additional object detection models and backends will be added to this category in the future. You can filter models by the `object-detection` tag in the model gallery to find all available object detection models.
## Related Features
- [🎨 Image generation]({{%relref "features/image-generation" %}}): Generate images with AI
- [📖 Text generation]({{%relref "features/text-generation" %}}): Generate text with language models
- [🔍 GPT Vision]({{%relref "features/gpt-vision" %}}): Analyze images with language models
- [🚀 GPU acceleration]({{%relref "features/GPU-acceleration" %}}): Optimize performance with GPU acceleration