Mirror of https://github.com/mudler/LocalAI.git (synced 2026-02-03 11:13:31 -05:00)

Compare commits (9 commits)
Commit SHAs:

- 36179ffbed
- d25145e641
- 949e5b9be8
- 73ecb7f90b
- 053bed6e5f
- 932360bf7e
- 6d0b52843f
- 078c22f485
- 6ef3852de5
253 .github/workflows/backend.yml (vendored)
@@ -381,24 +381,12 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./backend"
# sycl builds
- build-type: 'sycl_f32'
- build-type: 'intel'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-rerankers'
runs-on: 'ubuntu-latest'
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
skip-drivers: 'false'
backend: "rerankers"
dockerfile: "./backend/Dockerfile.python"
context: "./backend"
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f16-rerankers'
tag-suffix: '-gpu-intel-rerankers'
runs-on: 'ubuntu-latest'
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
skip-drivers: 'false'
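Taken together, the removed sycl_f32/sycl_f16 pairs above collapse into a single Intel entry per backend. As a sketch (indentation and field order are assumed to follow the surrounding build matrix), the consolidated rerankers entry now reads:

```yaml
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-rerankers'
  runs-on: 'ubuntu-latest'
  base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
  skip-drivers: 'false'
  backend: "rerankers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./backend"
```

The same consolidation is applied to the vllm, transformers, diffusers, kokoro, faster-whisper, coqui, and bark entries in the hunks below.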
|
||||
@@ -429,60 +417,36 @@ jobs:
|
||||
backend: "llama-cpp"
|
||||
dockerfile: "./backend/Dockerfile.llama-cpp"
|
||||
context: "./"
|
||||
- build-type: 'sycl_f32'
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-vllm'
|
||||
tag-suffix: '-gpu-intel-vllm'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "vllm"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f16'
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f16-vllm'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "vllm"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f32'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-transformers'
|
||||
tag-suffix: '-gpu-intel-transformers'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "transformers"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f16'
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f16-transformers'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "transformers"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f32'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-diffusers'
|
||||
tag-suffix: '-gpu-intel-diffusers'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
@@ -490,96 +454,48 @@ jobs:
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
# SYCL additional backends
|
||||
- build-type: 'sycl_f32'
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-kokoro'
|
||||
tag-suffix: '-gpu-intel-kokoro'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "kokoro"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f16'
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f16-kokoro'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "kokoro"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f32'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-faster-whisper'
|
||||
tag-suffix: '-gpu-intel-faster-whisper'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "faster-whisper"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f16'
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f16-faster-whisper'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "faster-whisper"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f32'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-coqui'
|
||||
tag-suffix: '-gpu-intel-coqui'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "coqui"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f16'
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f16-coqui'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "coqui"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f32'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-bark'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "bark"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'sycl_f16'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f16-bark'
|
||||
tag-suffix: '-gpu-intel-bark'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
@@ -868,7 +784,142 @@ jobs:
|
||||
skip-drivers: 'false'
|
||||
backend: "huggingface"
|
||||
dockerfile: "./backend/Dockerfile.golang"
|
||||
context: "./"
|
||||
context: "./"
|
||||
# rfdetr
|
||||
- build-type: ''
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64,linux/arm64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-cpu-rfdetr'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:22.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "rfdetr"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "12"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-12-rfdetr'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:22.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "rfdetr"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "11"
|
||||
cuda-minor-version: "7"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-11-rfdetr'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:22.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "rfdetr"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-rfdetr'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "rfdetr"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "12"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/arm64'
|
||||
skip-drivers: 'true'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-nvidia-l4t-arm64-rfdetr'
|
||||
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
|
||||
runs-on: 'ubuntu-24.04-arm'
|
||||
backend: "rfdetr"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
# exllama2
|
||||
- build-type: ''
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-cpu-exllama2'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:22.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "exllama2"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "12"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-12-exllama2'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:22.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "exllama2"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "11"
|
||||
cuda-minor-version: "7"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-11-exllama2'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:22.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "exllama2"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'intel'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-exllama2'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
|
||||
skip-drivers: 'false'
|
||||
backend: "exllama2"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
- build-type: 'hipblas'
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
skip-drivers: 'true'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-hipblas-exllama2'
|
||||
base-image: "rocm/dev-ubuntu-22.04:6.1"
|
||||
runs-on: 'ubuntu-latest'
|
||||
backend: "exllama2"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./backend"
|
||||
# runs out of space on the runner
|
||||
# - build-type: 'hipblas'
|
||||
# cuda-major-version: ""
|
||||
# cuda-minor-version: ""
|
||||
# platforms: 'linux/amd64'
|
||||
# tag-latest: 'auto'
|
||||
# tag-suffix: '-gpu-hipblas-rfdetr'
|
||||
# base-image: "rocm/dev-ubuntu-22.04:6.1"
|
||||
# runs-on: 'ubuntu-latest'
|
||||
# skip-drivers: 'false'
|
||||
# backend: "rfdetr"
|
||||
# dockerfile: "./backend/Dockerfile.python"
|
||||
# context: "./backend"
|
||||
llama-cpp-darwin:
|
||||
runs-on: macOS-14
|
||||
strategy:
|
||||
|
||||
9 Makefile
@@ -155,6 +155,9 @@ backends/local-store: docker-build-local-store docker-save-local-store build
backends/huggingface: docker-build-huggingface docker-save-huggingface build
	./local-ai backends install "ocifile://$(abspath ./backend-images/huggingface.tar)"

backends/rfdetr: docker-build-rfdetr docker-save-rfdetr build
	./local-ai backends install "ocifile://$(abspath ./backend-images/rfdetr.tar)"

########################################################
## AIO tests
########################################################

@@ -373,6 +376,12 @@ docker-build-local-store:
docker-build-huggingface:
	docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:huggingface -f backend/Dockerfile.golang --build-arg BACKEND=huggingface .

docker-build-rfdetr:
	docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:rfdetr -f backend/Dockerfile.python --build-arg BACKEND=rfdetr ./backend

docker-save-rfdetr: backend-images
	docker save local-ai-backend:rfdetr -o backend-images/rfdetr.tar

docker-save-huggingface: backend-images
	docker save local-ai-backend:huggingface -o backend-images/huggingface.tar
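For local development, the new targets chain together. A quick way to build and install the rfdetr backend into a local LocalAI binary is sketched below (it assumes Docker is available and a `./local-ai` binary has already been built):

```bash
# Builds the backend image, saves it to backend-images/rfdetr.tar,
# and installs it into the local instance via the ocifile:// scheme.
make backends/rfdetr

# Alternatively, once the published images exist, the backend can be pulled
# by its gallery name (assuming the standard `backends install` CLI verb):
./local-ai backends install rfdetr
```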
@@ -195,6 +195,7 @@ For more information, see [💻 Getting started](https://localai.io/basics/getti

## 📰 Latest project news

- July/August 2025: 🔍 [Object Detection](https://localai.io/features/object-detection/) added to the API featuring [rf-detr](https://github.com/roboflow/rf-detr)
- July 2025: All backends migrated outside of the main binary. LocalAI is now more lightweight, small, and automatically downloads the required backend to run the model. [Read the release notes](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
- June 2025: [Backend management](https://github.com/mudler/LocalAI/pull/5607) has been added. Attention: extras images are going to be deprecated from the next release! Read [the backend management PR](https://github.com/mudler/LocalAI/pull/5607).
- May 2025: [Audio input](https://github.com/mudler/LocalAI/pull/5466) and [Reranking](https://github.com/mudler/LocalAI/pull/5396) in llama.cpp backend, [Realtime API](https://github.com/mudler/LocalAI/pull/5392), support for Gemma, SmolVLM, and more multimodal models (available in the gallery).

@@ -228,6 +229,7 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A

- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
- 🖼️ [Download Models directly from Huggingface](https://localai.io/models/)
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 🔍 [Object Detection](https://localai.io/features/object-detection/)
- 📈 [Reranker API](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
- [Agentic capabilities](https://github.com/mudler/LocalAGI)
@@ -20,6 +20,7 @@ service Backend {
  rpc SoundGeneration(SoundGenerationRequest) returns (Result) {}
  rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
  rpc Status(HealthMessage) returns (StatusResponse) {}
  rpc Detect(DetectOptions) returns (DetectResponse) {}

  rpc StoresSet(StoresSetOptions) returns (Result) {}
  rpc StoresDelete(StoresDeleteOptions) returns (Result) {}

@@ -376,3 +377,20 @@ message Message {
  string role = 1;
  string content = 2;
}

message DetectOptions {
  string src = 1;
}

message Detection {
  float x = 1;
  float y = 2;
  float width = 3;
  float height = 4;
  float confidence = 5;
  string class_name = 6;
}

message DetectResponse {
  repeated Detection Detections = 1;
}
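The new Detect RPC can be exercised directly against a running rfdetr backend using the stubs generated by protogen.sh. The following is a minimal sketch, not part of the change: the address, the image path, and the "rfdetr-base" model name are placeholder assumptions.

```python
import base64
import grpc

import backend_pb2        # generated from backend.proto by protogen.sh
import backend_pb2_grpc

# Assumes an rfdetr backend process is already listening on this address.
channel = grpc.insecure_channel("localhost:50051")
stub = backend_pb2_grpc.BackendStub(channel)

# Load a model first; the name is passed straight to inference.get_model().
stub.LoadModel(backend_pb2.ModelOptions(Model="rfdetr-base"))

# DetectOptions.src carries the image as a base64-encoded string.
with open("image.jpg", "rb") as f:
    src = base64.b64encode(f.read()).decode("utf-8")

response = stub.Detect(backend_pb2.DetectOptions(src=src))
for d in response.Detections:
    print(d.class_name, d.confidence, d.x, d.y, d.width, d.height)
```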
@@ -1,5 +1,5 @@

LLAMA_VERSION?=c7f3169cd523140a288095f2d79befb20a0b73f4
LLAMA_VERSION?=bf78f5439ee8e82e367674043303ebf8e92b4805
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

CMAKE_ARGS?=
@@ -6,7 +6,7 @@ CMAKE_ARGS?=

# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=7de8dd783f7b2eab56bff6bbc5d3369e34f0e77f
WHISPER_CPP_VERSION?=e7bf0294ec9099b5fc21f5ba969805dfb2108cea

export WHISPER_CMAKE_ARGS?=-DBUILD_SHARED_LIBS=OFF
export WHISPER_DIR=$(abspath ./sources/whisper.cpp)
@@ -73,6 +73,28 @@
nvidia-l4t: "nvidia-l4t-arm64-stablediffusion-ggml"
# metal: "metal-stablediffusion-ggml"
# darwin-x86: "darwin-x86-stablediffusion-ggml"
- &rfdetr
name: "rfdetr"
alias: "rfdetr"
license: apache-2.0
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
description: |
RF-DETR is a real-time, transformer-based object detection model architecture developed by Roboflow and released under the Apache 2.0 license.
RF-DETR is the first real-time model to exceed 60 AP on the Microsoft COCO benchmark alongside competitive performance at base sizes. It also achieves state-of-the-art performance on RF100-VL, an object detection benchmark that measures model domain adaptability to real-world problems. RF-DETR is the fastest and most accurate model of its size when compared to current real-time object detection models.
RF-DETR is small enough to run on the edge using Inference, making it an ideal model for deployments that need both strong accuracy and real-time performance.
urls:
- https://github.com/roboflow/rf-detr
tags:
- object-detection
- rfdetr
- gpu
- cpu
capabilities:
nvidia: "cuda12-rfdetr"
intel: "intel-rfdetr"
#amd: "rocm-rfdetr"
nvidia-l4t: "nvidia-l4t-arm64-rfdetr"
default: "cpu-rfdetr"
- &vllm
|
||||
name: "vllm"
|
||||
license: apache-2.0
|
||||
@@ -104,13 +126,13 @@
|
||||
capabilities:
|
||||
nvidia: "cuda12-vllm"
|
||||
amd: "rocm-vllm"
|
||||
intel: "intel-sycl-f16-vllm"
|
||||
intel: "intel-vllm"
|
||||
- &rerankers
|
||||
name: "rerankers"
|
||||
alias: "rerankers"
|
||||
capabilities:
|
||||
nvidia: "cuda12-rerankers"
|
||||
intel: "intel-sycl-f16-rerankers"
|
||||
intel: "intel-rerankers"
|
||||
amd: "rocm-rerankers"
|
||||
- &transformers
|
||||
name: "transformers"
|
||||
@@ -127,7 +149,7 @@
|
||||
- multimodal
|
||||
capabilities:
|
||||
nvidia: "cuda12-transformers"
|
||||
intel: "intel-sycl-f16-transformers"
|
||||
intel: "intel-transformers"
|
||||
amd: "rocm-transformers"
|
||||
- &diffusers
|
||||
name: "diffusers"
|
||||
@@ -144,7 +166,7 @@
|
||||
alias: "diffusers"
|
||||
capabilities:
|
||||
nvidia: "cuda12-diffusers"
|
||||
intel: "intel-sycl-f32-diffusers"
|
||||
intel: "intel-diffusers"
|
||||
amd: "rocm-diffusers"
|
||||
- &exllama2
|
||||
name: "exllama2"
|
||||
@@ -160,8 +182,7 @@
|
||||
alias: "exllama2"
|
||||
capabilities:
|
||||
nvidia: "cuda12-exllama2"
|
||||
intel: "intel-sycl-f32-exllama2"
|
||||
amd: "rocm-exllama2"
|
||||
intel: "intel-exllama2"
|
||||
- &faster-whisper
|
||||
icon: https://avatars.githubusercontent.com/u/1520500?s=200&v=4
|
||||
description: |
|
||||
@@ -176,7 +197,7 @@
|
||||
name: "faster-whisper"
|
||||
capabilities:
|
||||
nvidia: "cuda12-faster-whisper"
|
||||
intel: "intel-sycl-f32-faster-whisper"
|
||||
intel: "intel-faster-whisper"
|
||||
amd: "rocm-faster-whisper"
|
||||
- &kokoro
|
||||
icon: https://avatars.githubusercontent.com/u/166769057?v=4
|
||||
@@ -194,7 +215,7 @@
|
||||
name: "kokoro"
|
||||
capabilities:
|
||||
nvidia: "cuda12-kokoro"
|
||||
intel: "intel-sycl-f32-kokoro"
|
||||
intel: "intel-kokoro"
|
||||
amd: "rocm-kokoro"
|
||||
- &coqui
|
||||
urls:
|
||||
@@ -215,7 +236,7 @@
|
||||
alias: "coqui"
|
||||
capabilities:
|
||||
nvidia: "cuda12-coqui"
|
||||
intel: "intel-sycl-f32-coqui"
|
||||
intel: "intel-coqui"
|
||||
amd: "rocm-coqui"
|
||||
icon: https://avatars.githubusercontent.com/u/1338804?s=200&v=4
|
||||
- &bark
|
||||
@@ -231,7 +252,7 @@
|
||||
alias: "bark"
|
||||
capabilities:
|
||||
cuda: "cuda12-bark"
|
||||
intel: "intel-sycl-f32-bark"
|
||||
intel: "intel-bark"
|
||||
rocm: "rocm-bark"
|
||||
icon: https://avatars.githubusercontent.com/u/99442120?s=200&v=4
|
||||
- &barkcpp
|
||||
@@ -622,7 +643,7 @@
|
||||
capabilities:
|
||||
nvidia: "cuda12-vllm-development"
|
||||
amd: "rocm-vllm-development"
|
||||
intel: "intel-sycl-f16-vllm-development"
|
||||
intel: "intel-vllm-development"
|
||||
- !!merge <<: *vllm
|
||||
name: "cuda12-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
|
||||
@@ -634,15 +655,10 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-rocm-hipblas-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "intel-sycl-f32-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-vllm"
|
||||
name: "intel-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "intel-sycl-f16-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f16-vllm
|
||||
- localai/localai-backends:latest-gpu-intel-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "cuda12-vllm-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm"
|
||||
@@ -654,21 +670,75 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "intel-sycl-f32-vllm-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-vllm"
|
||||
name: "intel-vllm-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "intel-sycl-f16-vllm-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-vllm"
|
||||
- localai/localai-backends:master-gpu-intel-vllm
|
||||
# rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "rfdetr-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-rfdetr-development"
|
||||
intel: "intel-rfdetr-development"
|
||||
#amd: "rocm-rfdetr-development"
|
||||
nvidia-l4t: "nvidia-l4t-arm64-rfdetr-development"
|
||||
default: "cpu-rfdetr-development"
|
||||
- !!merge <<: *rfdetr
|
||||
name: "cuda12-rfdetr"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f16-vllm
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-12-rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "intel-rfdetr"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-rfdetr
|
||||
# - !!merge <<: *rfdetr
|
||||
# name: "rocm-rfdetr"
|
||||
# uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-hipblas-rfdetr"
|
||||
# mirrors:
|
||||
# - localai/localai-backends:latest-gpu-hipblas-rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "nvidia-l4t-arm64-rfdetr"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-nvidia-l4t-arm64-rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "cpu-rfdetr"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-cpu-rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "cuda12-rfdetr-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "intel-rfdetr-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-rfdetr
|
||||
# - !!merge <<: *rfdetr
|
||||
# name: "rocm-rfdetr-development"
|
||||
# uri: "quay.io/go-skynet/local-ai-backends:master-gpu-hipblas-rfdetr"
|
||||
# mirrors:
|
||||
# - localai/localai-backends:master-gpu-hipblas-rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "cpu-rfdetr-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-cpu-rfdetr
|
||||
- !!merge <<: *rfdetr
|
||||
name: "intel-rfdetr"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-rfdetr"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-rfdetr
|
||||
## Rerankers
|
||||
- !!merge <<: *rerankers
|
||||
name: "rerankers-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-rerankers-development"
|
||||
intel: "intel-sycl-f16-rerankers-development"
|
||||
intel: "intel-rerankers-development"
|
||||
amd: "rocm-rerankers-development"
|
||||
- !!merge <<: *rerankers
|
||||
name: "cuda11-rerankers"
|
||||
@@ -681,15 +751,10 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-12-rerankers
|
||||
- !!merge <<: *rerankers
|
||||
name: "intel-sycl-f32-rerankers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-rerankers"
|
||||
name: "intel-rerankers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-rerankers"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-rerankers
|
||||
- !!merge <<: *rerankers
|
||||
name: "intel-sycl-f16-rerankers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-rerankers"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f16-rerankers
|
||||
- localai/localai-backends:latest-gpu-intel-rerankers
|
||||
- !!merge <<: *rerankers
|
||||
name: "rocm-rerankers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-rerankers"
|
||||
@@ -711,21 +776,16 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-rerankers
|
||||
- !!merge <<: *rerankers
|
||||
name: "intel-sycl-f32-rerankers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-rerankers"
|
||||
name: "intel-rerankers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-rerankers"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-rerankers
|
||||
- !!merge <<: *rerankers
|
||||
name: "intel-sycl-f16-rerankers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-rerankers"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f16-rerankers
|
||||
- localai/localai-backends:master-gpu-intel-rerankers
|
||||
## Transformers
|
||||
- !!merge <<: *transformers
|
||||
name: "transformers-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-transformers-development"
|
||||
intel: "intel-sycl-f16-transformers-development"
|
||||
intel: "intel-transformers-development"
|
||||
amd: "rocm-transformers-development"
|
||||
- !!merge <<: *transformers
|
||||
name: "cuda12-transformers"
|
||||
@@ -738,15 +798,10 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-rocm-hipblas-transformers
|
||||
- !!merge <<: *transformers
|
||||
name: "intel-sycl-f32-transformers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-transformers"
|
||||
name: "intel-transformers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-transformers"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-transformers
|
||||
- !!merge <<: *transformers
|
||||
name: "intel-sycl-f16-transformers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-transformers"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f16-transformers
|
||||
- localai/localai-backends:latest-gpu-intel-transformers
|
||||
- !!merge <<: *transformers
|
||||
name: "cuda11-transformers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-transformers"
|
||||
@@ -768,21 +823,16 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-transformers
|
||||
- !!merge <<: *transformers
|
||||
name: "intel-sycl-f32-transformers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-transformers"
|
||||
name: "intel-transformers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-transformers"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-transformers
|
||||
- !!merge <<: *transformers
|
||||
name: "intel-sycl-f16-transformers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-transformers"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f16-transformers
|
||||
- localai/localai-backends:master-gpu-intel-transformers
|
||||
## Diffusers
|
||||
- !!merge <<: *diffusers
|
||||
name: "diffusers-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-diffusers-development"
|
||||
intel: "intel-sycl-f32-diffusers-development"
|
||||
intel: "intel-diffusers-development"
|
||||
amd: "rocm-diffusers-development"
|
||||
- !!merge <<: *diffusers
|
||||
name: "cuda12-diffusers"
|
||||
@@ -800,10 +850,10 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-11-diffusers
|
||||
- !!merge <<: *diffusers
|
||||
name: "intel-sycl-f32-diffusers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-diffusers"
|
||||
name: "intel-diffusers"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-diffusers"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-diffusers
|
||||
- localai/localai-backends:latest-gpu-intel-diffusers
|
||||
- !!merge <<: *diffusers
|
||||
name: "cuda11-diffusers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-diffusers"
|
||||
@@ -820,17 +870,16 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-diffusers
|
||||
- !!merge <<: *diffusers
|
||||
name: "intel-sycl-f32-diffusers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-diffusers"
|
||||
name: "intel-diffusers-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-diffusers"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-diffusers
|
||||
- localai/localai-backends:master-gpu-intel-diffusers
|
||||
## exllama2
|
||||
- !!merge <<: *exllama2
|
||||
name: "exllama2-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-exllama2-development"
|
||||
intel: "intel-sycl-f32-exllama2-development"
|
||||
amd: "rocm-exllama2-development"
|
||||
intel: "intel-exllama2-development"
|
||||
- !!merge <<: *exllama2
|
||||
name: "cuda11-exllama2"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-exllama2"
|
||||
@@ -856,7 +905,7 @@
|
||||
name: "kokoro-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-kokoro-development"
|
||||
intel: "intel-sycl-f32-kokoro-development"
|
||||
intel: "intel-kokoro-development"
|
||||
amd: "rocm-kokoro-development"
|
||||
- !!merge <<: *kokoro
|
||||
name: "cuda11-kokoro-development"
|
||||
@@ -874,25 +923,15 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-kokoro
|
||||
- !!merge <<: *kokoro
|
||||
name: "sycl-f32-kokoro"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-kokoro"
|
||||
name: "intel-kokoro"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-kokoro"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-kokoro
|
||||
- localai/localai-backends:latest-gpu-intel-kokoro
|
||||
- !!merge <<: *kokoro
|
||||
name: "sycl-f16-kokoro"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-kokoro"
|
||||
name: "intel-kokoro-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-kokoro"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f16-kokoro
|
||||
- !!merge <<: *kokoro
|
||||
name: "sycl-f16-kokoro-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-kokoro"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f16-kokoro
|
||||
- !!merge <<: *kokoro
|
||||
name: "sycl-f32-kokoro-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-kokoro"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-kokoro
|
||||
- localai/localai-backends:master-gpu-intel-kokoro
|
||||
- !!merge <<: *kokoro
|
||||
name: "cuda11-kokoro"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-kokoro"
|
||||
@@ -913,7 +952,7 @@
|
||||
name: "faster-whisper-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-faster-whisper-development"
|
||||
intel: "intel-sycl-f32-faster-whisper-development"
|
||||
intel: "intel-faster-whisper-development"
|
||||
amd: "rocm-faster-whisper-development"
|
||||
- !!merge <<: *faster-whisper
|
||||
name: "cuda11-faster-whisper"
|
||||
@@ -931,32 +970,22 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-faster-whisper
|
||||
- !!merge <<: *faster-whisper
|
||||
name: "sycl-f32-faster-whisper"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-faster-whisper"
|
||||
name: "intel-faster-whisper"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-faster-whisper"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-faster-whisper
|
||||
- localai/localai-backends:latest-gpu-intel-faster-whisper
|
||||
- !!merge <<: *faster-whisper
|
||||
name: "sycl-f16-faster-whisper"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-faster-whisper"
|
||||
name: "intel-faster-whisper-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-faster-whisper"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f16-faster-whisper
|
||||
- !!merge <<: *faster-whisper
|
||||
name: "sycl-f32-faster-whisper-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-faster-whisper"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-faster-whisper
|
||||
- !!merge <<: *faster-whisper
|
||||
name: "sycl-f16-faster-whisper-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-faster-whisper"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f16-faster-whisper
|
||||
- localai/localai-backends:master-gpu-intel-faster-whisper
|
||||
## coqui
|
||||
|
||||
- !!merge <<: *coqui
|
||||
name: "coqui-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-coqui-development"
|
||||
intel: "intel-sycl-f32-coqui-development"
|
||||
intel: "intel-coqui-development"
|
||||
amd: "rocm-coqui-development"
|
||||
- !!merge <<: *coqui
|
||||
name: "cuda11-coqui"
|
||||
@@ -984,25 +1013,15 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-coqui
|
||||
- !!merge <<: *coqui
|
||||
name: "sycl-f32-coqui"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-coqui"
|
||||
name: "intel-coqui"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-coqui"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-coqui
|
||||
- localai/localai-backends:latest-gpu-intel-coqui
|
||||
- !!merge <<: *coqui
|
||||
name: "sycl-f16-coqui"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-coqui"
|
||||
name: "intel-coqui-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-coqui"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f16-coqui
|
||||
- !!merge <<: *coqui
|
||||
name: "sycl-f32-coqui-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-coqui"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-coqui
|
||||
- !!merge <<: *coqui
|
||||
name: "sycl-f16-coqui-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-coqui"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f16-coqui
|
||||
- localai/localai-backends:master-gpu-intel-coqui
|
||||
- !!merge <<: *coqui
|
||||
name: "rocm-coqui"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-coqui"
|
||||
@@ -1013,7 +1032,7 @@
|
||||
name: "bark-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-bark-development"
|
||||
intel: "intel-sycl-f32-bark-development"
|
||||
intel: "intel-bark-development"
|
||||
amd: "rocm-bark-development"
|
||||
- !!merge <<: *bark
|
||||
name: "cuda11-bark-development"
|
||||
@@ -1031,25 +1050,15 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-bark
|
||||
- !!merge <<: *bark
|
||||
name: "sycl-f32-bark"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-bark"
|
||||
name: "intel-bark"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-bark"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f32-bark
|
||||
- localai/localai-backends:latest-gpu-intel-bark
|
||||
- !!merge <<: *bark
|
||||
name: "sycl-f16-bark"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-bark"
|
||||
name: "intel-bark-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-bark"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-intel-sycl-f16-bark
|
||||
- !!merge <<: *bark
|
||||
name: "sycl-f32-bark-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-bark"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f32-bark
|
||||
- !!merge <<: *bark
|
||||
name: "sycl-f16-bark-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-bark"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-intel-sycl-f16-bark
|
||||
- localai/localai-backends:master-gpu-intel-bark
|
||||
- !!merge <<: *bark
|
||||
name: "cuda12-bark"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-bark"
|
||||
|
||||
@@ -8,4 +8,6 @@ else
|
||||
source $backend_dir/../common/libbackend.sh
|
||||
fi
|
||||
|
||||
ensureVenv
|
||||
|
||||
python3 -m grpc_tools.protoc -I../.. -I./ --python_out=. --grpc_python_out=. backend.proto
|
||||
20 backend/python/rfdetr/Makefile (new file)
@@ -0,0 +1,20 @@
|
||||
.DEFAULT_GOAL := install
|
||||
|
||||
.PHONY: install
|
||||
install:
|
||||
bash install.sh
|
||||
$(MAKE) protogen
|
||||
|
||||
.PHONY: protogen
|
||||
protogen: backend_pb2_grpc.py backend_pb2.py
|
||||
|
||||
.PHONY: protogen-clean
|
||||
protogen-clean:
|
||||
$(RM) backend_pb2_grpc.py backend_pb2.py
|
||||
|
||||
backend_pb2_grpc.py backend_pb2.py:
|
||||
bash protogen.sh
|
||||
|
||||
.PHONY: clean
|
||||
clean: protogen-clean
|
||||
rm -rf venv __pycache__
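The Makefile above mirrors the other Python backends. Run locally, it amounts to the following (a sketch; it assumes the prerequisites set up by the shared libbackend.sh, such as uv/pip and a working Python toolchain):

```bash
# Install requirements for the detected build profile,
# then regenerate the gRPC stubs from backend.proto.
make -C backend/python/rfdetr install

# Regenerate the stubs only.
make -C backend/python/rfdetr protogen

# Remove the generated stubs and the virtualenv.
make -C backend/python/rfdetr clean
```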
|
||||
174 backend/python/rfdetr/backend.py (new executable file)
@@ -0,0 +1,174 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
gRPC server for RFDETR object detection models.
|
||||
"""
|
||||
from concurrent import futures
|
||||
|
||||
import argparse
|
||||
import signal
|
||||
import sys
|
||||
import os
|
||||
import time
|
||||
import base64
|
||||
import backend_pb2
|
||||
import backend_pb2_grpc
|
||||
import grpc
|
||||
|
||||
import requests
|
||||
|
||||
import supervision as sv
|
||||
from inference import get_model
|
||||
from PIL import Image
|
||||
from io import BytesIO
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
|
||||
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
|
||||
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
|
||||
|
||||
# Implement the BackendServicer class with the service methods
|
||||
class BackendServicer(backend_pb2_grpc.BackendServicer):
|
||||
"""
|
||||
A gRPC servicer for the RFDETR backend service.
|
||||
|
||||
This class implements the gRPC methods for object detection using RFDETR models.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
self.model = None
|
||||
self.model_name = None
|
||||
|
||||
def Health(self, request, context):
|
||||
"""
|
||||
A gRPC method that returns the health status of the backend service.
|
||||
|
||||
Args:
|
||||
request: A HealthMessage object that contains the request parameters.
|
||||
context: A grpc.ServicerContext object that provides information about the RPC.
|
||||
|
||||
Returns:
|
||||
A Reply object that contains the health status of the backend service.
|
||||
"""
|
||||
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
|
||||
|
||||
def LoadModel(self, request, context):
|
||||
"""
|
||||
A gRPC method that loads a RFDETR model into memory.
|
||||
|
||||
Args:
|
||||
request: A ModelOptions object that contains the model parameters.
|
||||
context: A grpc.ServicerContext object that provides information about the RPC.
|
||||
|
||||
Returns:
|
||||
A Result object that contains the result of the LoadModel operation.
|
||||
"""
|
||||
model_name = request.Model
|
||||
try:
|
||||
# Load the RFDETR model
|
||||
self.model = get_model(model_name)
|
||||
self.model_name = model_name
|
||||
print(f'Loaded RFDETR model: {model_name}')
|
||||
except Exception as err:
|
||||
return backend_pb2.Result(success=False, message=f"Failed to load model: {err}")
|
||||
|
||||
return backend_pb2.Result(message="Model loaded successfully", success=True)
|
||||
|
||||
def Detect(self, request, context):
|
||||
"""
|
||||
A gRPC method that performs object detection on an image.
|
||||
|
||||
Args:
|
||||
request: A DetectOptions object that contains the image source.
|
||||
context: A grpc.ServicerContext object that provides information about the RPC.
|
||||
|
||||
Returns:
|
||||
A DetectResponse object that contains the detection results.
|
||||
"""
|
||||
if self.model is None:
|
||||
print(f"Model is None")
|
||||
return backend_pb2.DetectResponse()
|
||||
print(f"Model is not None")
|
||||
try:
|
||||
print(f"Decoding image")
|
||||
# Decode the base64 image
|
||||
print(f"Image data: {request.src}")
|
||||
|
||||
image_data = base64.b64decode(request.src)
|
||||
image = Image.open(BytesIO(image_data))
|
||||
|
||||
# Perform inference
|
||||
predictions = self.model.infer(image, confidence=0.5)[0]
|
||||
|
||||
# Convert to proto format
|
||||
proto_detections = []
|
||||
for i in range(len(predictions.predictions)):
|
||||
pred = predictions.predictions[i]
|
||||
print(f"Prediction: {pred}")
|
||||
proto_detection = backend_pb2.Detection(
|
||||
x=float(pred.x),
|
||||
y=float(pred.y),
|
||||
width=float(pred.width),
|
||||
height=float(pred.height),
|
||||
confidence=float(pred.confidence),
|
||||
class_name=pred.class_name
|
||||
)
|
||||
proto_detections.append(proto_detection)
|
||||
|
||||
return backend_pb2.DetectResponse(Detections=proto_detections)
|
||||
except Exception as err:
|
||||
print(f"Detection error: {err}")
|
||||
return backend_pb2.DetectResponse()
|
||||
|
||||
def Status(self, request, context):
|
||||
"""
|
||||
A gRPC method that returns the status of the backend service.
|
||||
|
||||
Args:
|
||||
request: A HealthMessage object that contains the request parameters.
|
||||
context: A grpc.ServicerContext object that provides information about the RPC.
|
||||
|
||||
Returns:
|
||||
A StatusResponse object that contains the status information.
|
||||
"""
|
||||
state = backend_pb2.StatusResponse.READY if self.model is not None else backend_pb2.StatusResponse.UNINITIALIZED
|
||||
return backend_pb2.StatusResponse(state=state)
|
||||
|
||||
def serve(address):
|
||||
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
|
||||
options=[
|
||||
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
|
||||
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
|
||||
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
|
||||
])
|
||||
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
|
||||
server.add_insecure_port(address)
|
||||
server.start()
|
||||
print("[RFDETR] Server started. Listening on: " + address, file=sys.stderr)
|
||||
|
||||
# Define the signal handler function
|
||||
def signal_handler(sig, frame):
|
||||
print("[RFDETR] Received termination signal. Shutting down...")
|
||||
server.stop(0)
|
||||
sys.exit(0)
|
||||
|
||||
# Set the signal handlers for SIGINT and SIGTERM
|
||||
signal.signal(signal.SIGINT, signal_handler)
|
||||
signal.signal(signal.SIGTERM, signal_handler)
|
||||
|
||||
try:
|
||||
while True:
|
||||
time.sleep(_ONE_DAY_IN_SECONDS)
|
||||
except KeyboardInterrupt:
|
||||
server.stop(0)
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Run the RFDETR gRPC server.")
|
||||
parser.add_argument(
|
||||
"--addr", default="localhost:50051", help="The address to bind the server to."
|
||||
)
|
||||
args = parser.parse_args()
|
||||
print(f"[RFDETR] startup: {args}", file=sys.stderr)
|
||||
serve(args.addr)
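The server is normally launched through run.sh (startBackend), but for debugging it can be started directly; a minimal sketch, with the address being just an example value:

```bash
# From backend/python/rfdetr, inside its virtualenv:
python3 backend.py --addr localhost:50051
```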
|
||||
|
||||
|
||||
|
||||
19 backend/python/rfdetr/install.sh (new executable file)
@@ -0,0 +1,19 @@
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
backend_dir=$(dirname $0)
|
||||
if [ -d $backend_dir/common ]; then
|
||||
source $backend_dir/common/libbackend.sh
|
||||
else
|
||||
source $backend_dir/../common/libbackend.sh
|
||||
fi
|
||||
|
||||
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
|
||||
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
|
||||
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
|
||||
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
|
||||
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
|
||||
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
|
||||
fi
|
||||
|
||||
installRequirements
|
||||
13 backend/python/rfdetr/protogen.sh (new file)
@@ -0,0 +1,13 @@
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
backend_dir=$(dirname $0)
|
||||
if [ -d $backend_dir/common ]; then
|
||||
source $backend_dir/common/libbackend.sh
|
||||
else
|
||||
source $backend_dir/../common/libbackend.sh
|
||||
fi
|
||||
|
||||
ensureVenv
|
||||
|
||||
python3 -m grpc_tools.protoc -I../.. -I./ --python_out=. --grpc_python_out=. backend.proto
|
||||
7 backend/python/rfdetr/requirements-cpu.txt (new file)
@@ -0,0 +1,7 @@
|
||||
rfdetr
|
||||
opencv-python
|
||||
accelerate
|
||||
peft
|
||||
inference
|
||||
torch==2.7.1
|
||||
optimum-quanto
|
||||
8 backend/python/rfdetr/requirements-cublas11.txt (new file)
@@ -0,0 +1,8 @@
|
||||
--extra-index-url https://download.pytorch.org/whl/cu118
|
||||
torch==2.7.1+cu118
|
||||
rfdetr
|
||||
opencv-python
|
||||
accelerate
|
||||
inference
|
||||
peft
|
||||
optimum-quanto
|
||||
7 backend/python/rfdetr/requirements-cublas12.txt (new file)
@@ -0,0 +1,7 @@
|
||||
torch==2.7.1
|
||||
rfdetr
|
||||
opencv-python
|
||||
accelerate
|
||||
inference
|
||||
peft
|
||||
optimum-quanto
|
||||
9 backend/python/rfdetr/requirements-hipblas.txt (new file)
@@ -0,0 +1,9 @@
|
||||
--extra-index-url https://download.pytorch.org/whl/rocm6.3
|
||||
torch==2.7.1+rocm6.3
|
||||
torchvision==0.22.1+rocm6.3
|
||||
rfdetr
|
||||
opencv-python
|
||||
accelerate
|
||||
inference
|
||||
peft
|
||||
optimum-quanto
|
||||
13 backend/python/rfdetr/requirements-intel.txt (new file)
@@ -0,0 +1,13 @@
|
||||
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
|
||||
intel-extension-for-pytorch==2.3.110+xpu
|
||||
torch==2.3.1+cxx11.abi
|
||||
torchvision==0.18.1+cxx11.abi
|
||||
oneccl_bind_pt==2.3.100+xpu
|
||||
optimum[openvino]
|
||||
setuptools
|
||||
rfdetr
|
||||
inference
|
||||
opencv-python
|
||||
accelerate
|
||||
peft
|
||||
optimum-quanto
|
||||
3 backend/python/rfdetr/requirements.txt (new file)
@@ -0,0 +1,3 @@
|
||||
grpcio==1.71.0
|
||||
protobuf
|
||||
grpcio-tools
|
||||
9 backend/python/rfdetr/run.sh (new executable file)
@@ -0,0 +1,9 @@
|
||||
#!/bin/bash
|
||||
backend_dir=$(dirname $0)
|
||||
if [ -d $backend_dir/common ]; then
|
||||
source $backend_dir/common/libbackend.sh
|
||||
else
|
||||
source $backend_dir/../common/libbackend.sh
|
||||
fi
|
||||
|
||||
startBackend $@
|
||||
11 backend/python/rfdetr/test.sh (new executable file)
@@ -0,0 +1,11 @@
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
backend_dir=$(dirname $0)
|
||||
if [ -d $backend_dir/common ]; then
|
||||
source $backend_dir/common/libbackend.sh
|
||||
else
|
||||
source $backend_dir/../common/libbackend.sh
|
||||
fi
|
||||
|
||||
runUnittests
|
||||
34 core/backend/detection.go (new file)
@@ -0,0 +1,34 @@
|
||||
package backend
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/pkg/grpc/proto"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
)
|
||||
|
||||
func Detection(
|
||||
sourceFile string,
|
||||
loader *model.ModelLoader,
|
||||
appConfig *config.ApplicationConfig,
|
||||
backendConfig config.BackendConfig,
|
||||
) (*proto.DetectResponse, error) {
|
||||
opts := ModelOptions(backendConfig, appConfig)
|
||||
detectionModel, err := loader.Load(opts...)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
defer loader.Close()
|
||||
|
||||
if detectionModel == nil {
|
||||
return nil, fmt.Errorf("could not load detection model")
|
||||
}
|
||||
|
||||
res, err := detectionModel.Detect(context.Background(), &proto.DetectOptions{
|
||||
Src: sourceFile,
|
||||
})
|
||||
|
||||
return res, err
|
||||
}
|
||||
@@ -25,7 +25,6 @@ type RunCMD struct {
|
||||
ModelsPath string `env:"LOCALAI_MODELS_PATH,MODELS_PATH" type:"path" default:"${basepath}/models" help:"Path containing models used for inferencing" group:"storage"`
|
||||
GeneratedContentPath string `env:"LOCALAI_GENERATED_CONTENT_PATH,GENERATED_CONTENT_PATH" type:"path" default:"/tmp/generated/content" help:"Location for generated content (e.g. images, audio, videos)" group:"storage"`
|
||||
UploadPath string `env:"LOCALAI_UPLOAD_PATH,UPLOAD_PATH" type:"path" default:"/tmp/localai/upload" help:"Path to store uploads from files api" group:"storage"`
|
||||
ConfigPath string `env:"LOCALAI_CONFIG_PATH,CONFIG_PATH" default:"/tmp/localai/config" group:"storage"`
|
||||
LocalaiConfigDir string `env:"LOCALAI_CONFIG_DIR" type:"path" default:"${basepath}/configuration" help:"Directory for dynamic loading of certain configuration files (currently api_keys.json and external_backends.json)" group:"storage"`
|
||||
LocalaiConfigDirPollInterval time.Duration `env:"LOCALAI_CONFIG_DIR_POLL_INTERVAL" help:"Typically the config path picks up changes automatically, but if your system has broken fsnotify events, set this to an interval to poll the LocalAI Config Dir (example: 1m)" group:"storage"`
|
||||
// The alias on this option is there to preserve functionality with the old `--config-file` parameter
|
||||
@@ -88,7 +87,6 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
|
||||
config.WithDebug(zerolog.GlobalLevel() <= zerolog.DebugLevel),
|
||||
config.WithGeneratedContentDir(r.GeneratedContentPath),
|
||||
config.WithUploadDir(r.UploadPath),
|
||||
config.WithConfigsDir(r.ConfigPath),
|
||||
config.WithDynamicConfigDir(r.LocalaiConfigDir),
|
||||
config.WithDynamicConfigDirPollInterval(r.LocalaiConfigDirPollInterval),
|
||||
config.WithF16(r.F16),
|
||||
|
||||
@@ -21,8 +21,7 @@ type ApplicationConfig struct {
|
||||
Debug bool
|
||||
GeneratedContentDir string
|
||||
|
||||
ConfigsDir string
|
||||
UploadDir string
|
||||
UploadDir string
|
||||
|
||||
DynamicConfigsDir string
|
||||
DynamicConfigsDirPollInterval time.Duration
|
||||
@@ -302,12 +301,6 @@ func WithUploadDir(uploadDir string) AppOption {
|
||||
}
|
||||
}
|
||||
|
||||
func WithConfigsDir(configsDir string) AppOption {
|
||||
return func(o *ApplicationConfig) {
|
||||
o.ConfigsDir = configsDir
|
||||
}
|
||||
}
|
||||
|
||||
func WithDynamicConfigDir(dynamicConfigsDir string) AppOption {
|
||||
return func(o *ApplicationConfig) {
|
||||
o.DynamicConfigsDir = dynamicConfigsDir
|
||||
|
||||
@@ -458,6 +458,7 @@ const (
|
||||
FLAG_TOKENIZE BackendConfigUsecases = 0b001000000000
|
||||
FLAG_VAD BackendConfigUsecases = 0b010000000000
|
||||
FLAG_VIDEO BackendConfigUsecases = 0b100000000000
|
||||
FLAG_DETECTION BackendConfigUsecases = 0b1000000000000
|
||||
|
||||
// Common Subsets
|
||||
FLAG_LLM BackendConfigUsecases = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
|
||||
@@ -479,6 +480,7 @@ func GetAllBackendConfigUsecases() map[string]BackendConfigUsecases {
|
||||
"FLAG_VAD": FLAG_VAD,
|
||||
"FLAG_LLM": FLAG_LLM,
|
||||
"FLAG_VIDEO": FLAG_VIDEO,
|
||||
"FLAG_DETECTION": FLAG_DETECTION,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -572,6 +574,12 @@ func (c *BackendConfig) GuessUsecases(u BackendConfigUsecases) bool {
|
||||
}
|
||||
}
|
||||
|
||||
if (u & FLAG_DETECTION) == FLAG_DETECTION {
|
||||
if c.Backend != "rfdetr" {
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
if (u & FLAG_SOUND_GENERATION) == FLAG_SOUND_GENERATION {
|
||||
if c.Backend != "transformers-musicgen" {
|
||||
return false
|
||||
|
||||
@@ -95,7 +95,7 @@ func FindGalleryElement[T GalleryElement](models []T, name string, basePath stri
|
||||
|
||||
if !strings.Contains(name, "@") {
|
||||
for _, m := range models {
|
||||
if strings.EqualFold(m.GetName(), name) {
|
||||
if strings.EqualFold(strings.ToLower(m.GetName()), strings.ToLower(name)) {
|
||||
model = m
|
||||
break
|
||||
}
|
||||
@@ -103,7 +103,7 @@ func FindGalleryElement[T GalleryElement](models []T, name string, basePath stri
|
||||
|
||||
} else {
|
||||
for _, m := range models {
|
||||
if strings.EqualFold(name, fmt.Sprintf("%s@%s", m.GetGallery().Name, m.GetName())) {
|
||||
if strings.EqualFold(strings.ToLower(name), strings.ToLower(fmt.Sprintf("%s@%s", m.GetGallery().Name, m.GetName()))) {
|
||||
model = m
|
||||
break
|
||||
}
|
||||
|
||||
@@ -10,10 +10,8 @@ import (
|
||||
|
||||
"github.com/dave-gray101/v2keyauth"
|
||||
"github.com/gofiber/websocket/v2"
|
||||
"github.com/mudler/LocalAI/pkg/utils"
|
||||
|
||||
"github.com/mudler/LocalAI/core/http/endpoints/localai"
|
||||
"github.com/mudler/LocalAI/core/http/endpoints/openai"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/http/routes"
|
||||
|
||||
@@ -199,11 +197,6 @@ func API(application *application.Application) (*fiber.App, error) {
|
||||
router.Use(csrf.New())
|
||||
}
|
||||
|
||||
// Load config jsons
|
||||
utils.LoadConfig(application.ApplicationConfig().UploadDir, openai.UploadedFilesFile, &openai.UploadedFiles)
|
||||
utils.LoadConfig(application.ApplicationConfig().ConfigsDir, openai.AssistantsConfigFile, &openai.Assistants)
|
||||
utils.LoadConfig(application.ApplicationConfig().ConfigsDir, openai.AssistantsFileConfigFile, &openai.AssistantFiles)
|
||||
|
||||
galleryService := services.NewGalleryService(application.ApplicationConfig(), application.ModelLoader())
|
||||
err = galleryService.Start(application.ApplicationConfig().Context, application.BackendLoader())
|
||||
if err != nil {
|
||||
|
||||
59 core/http/endpoints/localai/detection.go (new file)
@@ -0,0 +1,59 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"github.com/gofiber/fiber/v2"
|
||||
"github.com/mudler/LocalAI/core/backend"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/LocalAI/pkg/utils"
|
||||
"github.com/rs/zerolog/log"
|
||||
)
|
||||
|
||||
// DetectionEndpoint is the LocalAI Detection endpoint https://localai.io/docs/api-reference/detection
|
||||
// @Summary Detects objects in the input image.
|
||||
// @Param request body schema.DetectionRequest true "query params"
|
||||
// @Success 200 {object} schema.DetectionResponse "Response"
|
||||
// @Router /v1/detection [post]
|
||||
func DetectionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
|
||||
input, ok := c.Locals(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.DetectionRequest)
|
||||
if !ok || input.Model == "" {
|
||||
return fiber.ErrBadRequest
|
||||
}
|
||||
|
||||
cfg, ok := c.Locals(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.BackendConfig)
|
||||
if !ok || cfg == nil {
|
||||
return fiber.ErrBadRequest
|
||||
}
|
||||
|
||||
log.Debug().Str("image", input.Image).Str("modelFile", "modelFile").Str("backend", cfg.Backend).Msg("Detection")
|
||||
|
||||
image, err := utils.GetContentURIAsBase64(input.Image)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
res, err := backend.Detection(image, ml, appConfig, *cfg)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
response := schema.DetectionResponse{
|
||||
Detections: make([]schema.Detection, len(res.Detections)),
|
||||
}
|
||||
for i, detection := range res.Detections {
|
||||
response.Detections[i] = schema.Detection{
|
||||
X: detection.X,
|
||||
Y: detection.Y,
|
||||
Width: detection.Width,
|
||||
Height: detection.Height,
|
||||
ClassName: detection.ClassName,
|
||||
}
|
||||
}
|
||||
|
||||
return c.JSON(response)
|
||||
}
|
||||
}
|
||||
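For readers who want to exercise the new endpoint, a minimal client sketch follows. The JSON field names ("model", "image", "detections", "class_name") and the example model name are assumptions inferred from the request and response structs above; they are not spelled out in this diff, so check the schema.DetectionRequest and schema.DetectionResponse tags for the authoritative names.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Assumed request shape: a model name plus an image URL or base64 data URI.
	payload := map[string]string{
		"model": "rfdetr-base",                 // hypothetical model name
		"image": "https://example.com/cat.jpg", // handled server-side by GetContentURIAsBase64
	}
	body, _ := json.Marshal(payload)

	resp, err := http.Post("http://localhost:8080/v1/detection", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Assumed response shape, mirroring schema.DetectionResponse above.
	var out struct {
		Detections []struct {
			X, Y, Width, Height float64
			ClassName           string `json:"class_name"`
		} `json:"detections"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	for _, d := range out.Detections {
		fmt.Printf("%s at (%.0f, %.0f) size %.0fx%.0f\n", d.ClassName, d.X, d.Y, d.Width, d.Height)
	}
}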
@@ -1,522 +0,0 @@
|
||||
package openai
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"net/http"
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
|
||||
"github.com/gofiber/fiber/v2"
|
||||
"github.com/microcosm-cc/bluemonday"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/core/services"
|
||||
model "github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/LocalAI/pkg/utils"
|
||||
"github.com/rs/zerolog/log"
|
||||
)
|
||||
|
||||
// ToolType defines a type for tool options
|
||||
type ToolType string
|
||||
|
||||
const (
|
||||
CodeInterpreter ToolType = "code_interpreter"
|
||||
Retrieval ToolType = "retrieval"
|
||||
Function ToolType = "function"
|
||||
|
||||
MaxCharacterInstructions = 32768
|
||||
MaxCharacterDescription = 512
|
||||
MaxCharacterName = 256
|
||||
MaxToolsSize = 128
|
||||
MaxFileIdSize = 20
|
||||
MaxCharacterMetadataKey = 64
|
||||
MaxCharacterMetadataValue = 512
|
||||
)
|
||||
|
||||
type Tool struct {
|
||||
Type ToolType `json:"type"`
|
||||
}
|
||||
|
||||
// Assistant represents the structure of an assistant object from the OpenAI API.
|
||||
type Assistant struct {
|
||||
ID string `json:"id"` // The unique identifier of the assistant.
|
||||
Object string `json:"object"` // Object type, which is "assistant".
|
||||
Created int64 `json:"created"` // The time at which the assistant was created.
|
||||
Model string `json:"model"` // The model ID used by the assistant.
|
||||
Name string `json:"name,omitempty"` // The name of the assistant.
|
||||
Description string `json:"description,omitempty"` // The description of the assistant.
|
||||
Instructions string `json:"instructions,omitempty"` // The system instructions that the assistant uses.
|
||||
Tools []Tool `json:"tools,omitempty"` // A list of tools enabled on the assistant.
|
||||
FileIDs []string `json:"file_ids,omitempty"` // A list of file IDs attached to this assistant.
|
||||
Metadata map[string]string `json:"metadata,omitempty"` // Set of key-value pairs attached to the assistant.
|
||||
}
|
||||
|
||||
var (
|
||||
Assistants = []Assistant{} // better to return empty array instead of "null"
|
||||
AssistantsConfigFile = "assistants.json"
|
||||
)
|
||||
|
||||
type AssistantRequest struct {
|
||||
Model string `json:"model"`
|
||||
Name string `json:"name,omitempty"`
|
||||
Description string `json:"description,omitempty"`
|
||||
Instructions string `json:"instructions,omitempty"`
|
||||
Tools []Tool `json:"tools,omitempty"`
|
||||
FileIDs []string `json:"file_ids,omitempty"`
|
||||
Metadata map[string]string `json:"metadata,omitempty"`
|
||||
}
|
||||
|
||||
// CreateAssistantEndpoint is the OpenAI Assistant API endpoint https://platform.openai.com/docs/api-reference/assistants/createAssistant
|
||||
// @Summary Create an assistant with a model and instructions.
|
||||
// @Param request body AssistantRequest true "query params"
|
||||
// @Success 200 {object} Assistant "Response"
|
||||
// @Router /v1/assistants [post]
|
||||
func CreateAssistantEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
request := new(AssistantRequest)
|
||||
if err := c.BodyParser(request); err != nil {
|
||||
log.Warn().AnErr("Unable to parse AssistantRequest", err)
|
||||
return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{"error": "Cannot parse JSON"})
|
||||
}
|
||||
|
||||
if !modelExists(cl, ml, request.Model) {
|
||||
log.Warn().Msgf("Model: %s was not found in list of models.", request.Model)
|
||||
return c.Status(fiber.StatusBadRequest).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Model %q not found", request.Model)))
|
||||
}
|
||||
|
||||
if request.Tools == nil {
|
||||
request.Tools = []Tool{}
|
||||
}
|
||||
|
||||
if request.FileIDs == nil {
|
||||
request.FileIDs = []string{}
|
||||
}
|
||||
|
||||
if request.Metadata == nil {
|
||||
request.Metadata = make(map[string]string)
|
||||
}
|
||||
|
||||
id := "asst_" + strconv.FormatInt(generateRandomID(), 10)
|
||||
|
||||
assistant := Assistant{
|
||||
ID: id,
|
||||
Object: "assistant",
|
||||
Created: time.Now().Unix(),
|
||||
Model: request.Model,
|
||||
Name: request.Name,
|
||||
Description: request.Description,
|
||||
Instructions: request.Instructions,
|
||||
Tools: request.Tools,
|
||||
FileIDs: request.FileIDs,
|
||||
Metadata: request.Metadata,
|
||||
}
|
||||
|
||||
Assistants = append(Assistants, assistant)
|
||||
utils.SaveConfig(appConfig.ConfigsDir, AssistantsConfigFile, Assistants)
|
||||
return c.Status(fiber.StatusOK).JSON(assistant)
|
||||
}
|
||||
}
|
||||
|
||||
var currentId int64 = 0
|
||||
|
||||
func generateRandomID() int64 {
|
||||
atomic.AddInt64(¤tId, 1)
|
||||
return currentId
|
||||
}
|
||||
|
||||
// ListAssistantsEndpoint is the OpenAI Assistant API endpoint to list assistants https://platform.openai.com/docs/api-reference/assistants/listAssistants
|
||||
// @Summary List available assistants
|
||||
// @Param limit query int false "Limit the number of assistants returned"
|
||||
// @Param order query string false "Order of assistants returned"
|
||||
// @Param after query string false "Return assistants created after the given ID"
|
||||
// @Param before query string false "Return assistants created before the given ID"
|
||||
// @Success 200 {object} []Assistant "Response"
|
||||
// @Router /v1/assistants [get]
|
||||
func ListAssistantsEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
// Because we're altering the existing assistants list we should just duplicate it for now.
|
||||
returnAssistants := Assistants
|
||||
// Parse query parameters
|
||||
limitQuery := c.Query("limit", "20")
|
||||
orderQuery := c.Query("order", "desc")
|
||||
afterQuery := c.Query("after")
|
||||
beforeQuery := c.Query("before")
|
||||
|
||||
// Convert string limit to integer
|
||||
limit, err := strconv.Atoi(limitQuery)
|
||||
if err != nil {
|
||||
return c.Status(http.StatusBadRequest).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Invalid limit query value: %s", limitQuery)))
|
||||
}
|
||||
|
||||
// Sort assistants
|
||||
sort.SliceStable(returnAssistants, func(i, j int) bool {
|
||||
if orderQuery == "asc" {
|
||||
return returnAssistants[i].Created < returnAssistants[j].Created
|
||||
}
|
||||
return returnAssistants[i].Created > returnAssistants[j].Created
|
||||
})
|
||||
|
||||
// After and before cursors
|
||||
if afterQuery != "" {
|
||||
returnAssistants = filterAssistantsAfterID(returnAssistants, afterQuery)
|
||||
}
|
||||
if beforeQuery != "" {
|
||||
returnAssistants = filterAssistantsBeforeID(returnAssistants, beforeQuery)
|
||||
}
|
||||
|
||||
// Apply limit
|
||||
if limit < len(returnAssistants) {
|
||||
returnAssistants = returnAssistants[:limit]
|
||||
}
|
||||
|
||||
return c.JSON(returnAssistants)
|
||||
}
|
||||
}
|
||||
|
||||
// FilterAssistantsBeforeID filters out those assistants whose ID comes before the given ID
|
||||
// We assume that the assistants are already sorted
|
||||
func filterAssistantsBeforeID(assistants []Assistant, id string) []Assistant {
|
||||
idInt, err := strconv.Atoi(id)
|
||||
if err != nil {
|
||||
return assistants // Return original slice if invalid id format is provided
|
||||
}
|
||||
|
||||
var filteredAssistants []Assistant
|
||||
|
||||
for _, assistant := range assistants {
|
||||
aid, err := strconv.Atoi(strings.TrimPrefix(assistant.ID, "asst_"))
|
||||
if err != nil {
|
||||
continue // Skip if invalid id in assistant
|
||||
}
|
||||
|
||||
if aid < idInt {
|
||||
filteredAssistants = append(filteredAssistants, assistant)
|
||||
}
|
||||
}
|
||||
|
||||
return filteredAssistants
|
||||
}
|
||||
|
||||
// FilterAssistantsAfterID filters out those assistants whose ID comes after the given ID
|
||||
// We assume that the assistants are already sorted
|
||||
func filterAssistantsAfterID(assistants []Assistant, id string) []Assistant {
|
||||
idInt, err := strconv.Atoi(id)
|
||||
if err != nil {
|
||||
return assistants // Return original slice if invalid id format is provided
|
||||
}
|
||||
|
||||
var filteredAssistants []Assistant
|
||||
|
||||
for _, assistant := range assistants {
|
||||
aid, err := strconv.Atoi(strings.TrimPrefix(assistant.ID, "asst_"))
|
||||
if err != nil {
|
||||
continue // Skip if invalid id in assistant
|
||||
}
|
||||
|
||||
if aid > idInt {
|
||||
filteredAssistants = append(filteredAssistants, assistant)
|
||||
}
|
||||
}
|
||||
|
||||
return filteredAssistants
|
||||
}
|
||||
|
||||
func modelExists(cl *config.BackendConfigLoader, ml *model.ModelLoader, modelName string) (found bool) {
|
||||
found = false
|
||||
models, err := services.ListModels(cl, ml, config.NoFilterFn, services.SKIP_IF_CONFIGURED)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
|
||||
for _, model := range models {
|
||||
if model == modelName {
|
||||
found = true
|
||||
return
|
||||
}
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
// DeleteAssistantEndpoint is the OpenAI Assistant API endpoint to delete assistants https://platform.openai.com/docs/api-reference/assistants/deleteAssistant
|
||||
// @Summary Delete assistants
|
||||
// @Success 200 {object} schema.DeleteAssistantResponse "Response"
|
||||
// @Router /v1/assistants/{assistant_id} [delete]
|
||||
func DeleteAssistantEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
assistantID := c.Params("assistant_id")
|
||||
if assistantID == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("parameter assistant_id is required")
|
||||
}
|
||||
|
||||
for i, assistant := range Assistants {
|
||||
if assistant.ID == assistantID {
|
||||
Assistants = append(Assistants[:i], Assistants[i+1:]...)
|
||||
utils.SaveConfig(appConfig.ConfigsDir, AssistantsConfigFile, Assistants)
|
||||
return c.Status(fiber.StatusOK).JSON(schema.DeleteAssistantResponse{
|
||||
ID: assistantID,
|
||||
Object: "assistant.deleted",
|
||||
Deleted: true,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
log.Warn().Msgf("Unable to find assistant %s for deletion", assistantID)
|
||||
return c.Status(fiber.StatusNotFound).JSON(schema.DeleteAssistantResponse{
|
||||
ID: assistantID,
|
||||
Object: "assistant.deleted",
|
||||
Deleted: false,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// GetAssistantEndpoint is the OpenAI Assistant API endpoint to get assistants https://platform.openai.com/docs/api-reference/assistants/getAssistant
|
||||
// @Summary Get assistant data
|
||||
// @Success 200 {object} Assistant "Response"
|
||||
// @Router /v1/assistants/{assistant_id} [get]
|
||||
func GetAssistantEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
assistantID := c.Params("assistant_id")
|
||||
if assistantID == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("parameter assistant_id is required")
|
||||
}
|
||||
|
||||
for _, assistant := range Assistants {
|
||||
if assistant.ID == assistantID {
|
||||
return c.Status(fiber.StatusOK).JSON(assistant)
|
||||
}
|
||||
}
|
||||
|
||||
return c.Status(fiber.StatusNotFound).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Unable to find assistant with id: %s", assistantID)))
|
||||
}
|
||||
}
|
||||
|
||||
type AssistantFile struct {
|
||||
ID string `json:"id"`
|
||||
Object string `json:"object"`
|
||||
CreatedAt int64 `json:"created_at"`
|
||||
AssistantID string `json:"assistant_id"`
|
||||
}
|
||||
|
||||
var (
|
||||
AssistantFiles []AssistantFile
|
||||
AssistantsFileConfigFile = "assistantsFile.json"
|
||||
)
|
||||
|
||||
func CreateAssistantFileEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
request := new(schema.AssistantFileRequest)
|
||||
if err := c.BodyParser(request); err != nil {
|
||||
return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{"error": "Cannot parse JSON"})
|
||||
}
|
||||
|
||||
assistantID := c.Params("assistant_id")
|
||||
if assistantID == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("parameter assistant_id is required")
|
||||
}
|
||||
|
||||
for _, assistant := range Assistants {
|
||||
if assistant.ID == assistantID {
|
||||
if len(assistant.FileIDs) > MaxFileIdSize {
|
||||
return c.Status(fiber.StatusBadRequest).SendString(fmt.Sprintf("Max files %d for assistant %s reached.", MaxFileIdSize, assistant.Name))
|
||||
}
|
||||
|
||||
for _, file := range UploadedFiles {
|
||||
if file.ID == request.FileID {
|
||||
assistant.FileIDs = append(assistant.FileIDs, request.FileID)
|
||||
assistantFile := AssistantFile{
|
||||
ID: file.ID,
|
||||
Object: "assistant.file",
|
||||
CreatedAt: time.Now().Unix(),
|
||||
AssistantID: assistant.ID,
|
||||
}
|
||||
AssistantFiles = append(AssistantFiles, assistantFile)
|
||||
utils.SaveConfig(appConfig.ConfigsDir, AssistantsFileConfigFile, AssistantFiles)
|
||||
return c.Status(fiber.StatusOK).JSON(assistantFile)
|
||||
}
|
||||
}
|
||||
|
||||
return c.Status(fiber.StatusNotFound).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Unable to find file_id: %s", request.FileID)))
|
||||
}
|
||||
}
|
||||
|
||||
return c.Status(fiber.StatusNotFound).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Unable to find %q", assistantID)))
|
||||
}
|
||||
}
|
||||
|
||||
func ListAssistantFilesEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
type ListAssistantFiles struct {
|
||||
Data []schema.File
|
||||
Object string
|
||||
}
|
||||
|
||||
return func(c *fiber.Ctx) error {
|
||||
assistantID := c.Params("assistant_id")
|
||||
if assistantID == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("parameter assistant_id is required")
|
||||
}
|
||||
|
||||
limitQuery := c.Query("limit", "20")
|
||||
order := c.Query("order", "desc")
|
||||
limit, err := strconv.Atoi(limitQuery)
|
||||
if err != nil || limit < 1 || limit > 100 {
|
||||
limit = 20 // Default to 20 if there's an error or the limit is out of bounds
|
||||
}
|
||||
|
||||
// Sort files by CreatedAt depending on the order query parameter
|
||||
if order == "asc" {
|
||||
sort.Slice(AssistantFiles, func(i, j int) bool {
|
||||
return AssistantFiles[i].CreatedAt < AssistantFiles[j].CreatedAt
|
||||
})
|
||||
} else { // default to "desc"
|
||||
sort.Slice(AssistantFiles, func(i, j int) bool {
|
||||
return AssistantFiles[i].CreatedAt > AssistantFiles[j].CreatedAt
|
||||
})
|
||||
}
|
||||
|
||||
// Limit the number of files returned
|
||||
var limitedFiles []AssistantFile
|
||||
hasMore := false
|
||||
if len(AssistantFiles) > limit {
|
||||
hasMore = true
|
||||
limitedFiles = AssistantFiles[:limit]
|
||||
} else {
|
||||
limitedFiles = AssistantFiles
|
||||
}
|
||||
|
||||
response := map[string]interface{}{
|
||||
"object": "list",
|
||||
"data": limitedFiles,
|
||||
"first_id": func() string {
|
||||
if len(limitedFiles) > 0 {
|
||||
return limitedFiles[0].ID
|
||||
}
|
||||
return ""
|
||||
}(),
|
||||
"last_id": func() string {
|
||||
if len(limitedFiles) > 0 {
|
||||
return limitedFiles[len(limitedFiles)-1].ID
|
||||
}
|
||||
return ""
|
||||
}(),
|
||||
"has_more": hasMore,
|
||||
}
|
||||
|
||||
return c.Status(fiber.StatusOK).JSON(response)
|
||||
}
|
||||
}
|
||||
|
||||
func ModifyAssistantEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
request := new(AssistantRequest)
|
||||
if err := c.BodyParser(request); err != nil {
|
||||
log.Warn().AnErr("Unable to parse AssistantRequest", err)
|
||||
return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{"error": "Cannot parse JSON"})
|
||||
}
|
||||
|
||||
assistantID := c.Params("assistant_id")
|
||||
if assistantID == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("parameter assistant_id is required")
|
||||
}
|
||||
|
||||
for i, assistant := range Assistants {
|
||||
if assistant.ID == assistantID {
|
||||
newAssistant := Assistant{
|
||||
ID: assistantID,
|
||||
Object: assistant.Object,
|
||||
Created: assistant.Created,
|
||||
Model: request.Model,
|
||||
Name: request.Name,
|
||||
Description: request.Description,
|
||||
Instructions: request.Instructions,
|
||||
Tools: request.Tools,
|
||||
FileIDs: request.FileIDs, // todo: should probably verify fileids exist
|
||||
Metadata: request.Metadata,
|
||||
}
|
||||
|
||||
// Remove old one and replace with new one
|
||||
Assistants = append(Assistants[:i], Assistants[i+1:]...)
|
||||
Assistants = append(Assistants, newAssistant)
|
||||
utils.SaveConfig(appConfig.ConfigsDir, AssistantsConfigFile, Assistants)
|
||||
return c.Status(fiber.StatusOK).JSON(newAssistant)
|
||||
}
|
||||
}
|
||||
return c.Status(fiber.StatusNotFound).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Unable to find assistant with id: %s", assistantID)))
|
||||
}
|
||||
}
|
||||
|
||||
func DeleteAssistantFileEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
assistantID := c.Params("assistant_id")
|
||||
fileId := c.Params("file_id")
|
||||
if assistantID == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("parameter assistant_id and file_id are required")
|
||||
}
|
||||
// First remove file from assistant
|
||||
for i, assistant := range Assistants {
|
||||
if assistant.ID == assistantID {
|
||||
for j, fileId := range assistant.FileIDs {
|
||||
Assistants[i].FileIDs = append(Assistants[i].FileIDs[:j], Assistants[i].FileIDs[j+1:]...)
|
||||
|
||||
// Check if the file exists in the assistantFiles slice
|
||||
for i, assistantFile := range AssistantFiles {
|
||||
if assistantFile.ID == fileId {
|
||||
// Remove the file from the assistantFiles slice
|
||||
AssistantFiles = append(AssistantFiles[:i], AssistantFiles[i+1:]...)
|
||||
utils.SaveConfig(appConfig.ConfigsDir, AssistantsFileConfigFile, AssistantFiles)
|
||||
return c.Status(fiber.StatusOK).JSON(schema.DeleteAssistantFileResponse{
|
||||
ID: fileId,
|
||||
Object: "assistant.file.deleted",
|
||||
Deleted: true,
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
log.Warn().Msgf("Unable to locate file_id: %s in assistants: %s. Continuing to delete assistant file.", fileId, assistantID)
|
||||
for i, assistantFile := range AssistantFiles {
|
||||
if assistantFile.AssistantID == assistantID {
|
||||
|
||||
AssistantFiles = append(AssistantFiles[:i], AssistantFiles[i+1:]...)
|
||||
utils.SaveConfig(appConfig.ConfigsDir, AssistantsFileConfigFile, AssistantFiles)
|
||||
|
||||
return c.Status(fiber.StatusNotFound).JSON(schema.DeleteAssistantFileResponse{
|
||||
ID: fileId,
|
||||
Object: "assistant.file.deleted",
|
||||
Deleted: true,
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
log.Warn().Msgf("Unable to find assistant: %s", assistantID)
|
||||
|
||||
return c.Status(fiber.StatusNotFound).JSON(schema.DeleteAssistantFileResponse{
|
||||
ID: fileId,
|
||||
Object: "assistant.file.deleted",
|
||||
Deleted: false,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func GetAssistantFileEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
assistantID := c.Params("assistant_id")
|
||||
fileId := c.Params("file_id")
|
||||
if assistantID == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("parameter assistant_id and file_id are required")
|
||||
}
|
||||
|
||||
for _, assistantFile := range AssistantFiles {
|
||||
if assistantFile.AssistantID == assistantID {
|
||||
if assistantFile.ID == fileId {
|
||||
return c.Status(fiber.StatusOK).JSON(assistantFile)
|
||||
}
|
||||
return c.Status(fiber.StatusNotFound).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Unable to find assistant file with file_id: %s", fileId)))
|
||||
}
|
||||
}
|
||||
return c.Status(fiber.StatusNotFound).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Unable to find assistant file with assistant_id: %s", assistantID)))
|
||||
}
|
||||
}
|
||||
@@ -1,460 +0,0 @@
|
||||
package openai
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/gofiber/fiber/v2"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/stretchr/testify/assert"
|
||||
)
|
||||
|
||||
var configsDir string = "/tmp/localai/configs"
|
||||
|
||||
type MockLoader struct {
|
||||
models []string
|
||||
}
|
||||
|
||||
func tearDown() func() {
|
||||
return func() {
|
||||
UploadedFiles = []schema.File{}
|
||||
Assistants = []Assistant{}
|
||||
AssistantFiles = []AssistantFile{}
|
||||
_ = os.Remove(filepath.Join(configsDir, AssistantsConfigFile))
|
||||
_ = os.Remove(filepath.Join(configsDir, AssistantsFileConfigFile))
|
||||
}
|
||||
}
|
||||
|
||||
func TestAssistantEndpoints(t *testing.T) {
|
||||
// Preparing the mocked objects
|
||||
cl := &config.BackendConfigLoader{}
|
||||
//configsDir := "/tmp/localai/configs"
|
||||
modelPath := "/tmp/localai/model"
|
||||
var ml = model.NewModelLoader(modelPath, false)
|
||||
|
||||
appConfig := &config.ApplicationConfig{
|
||||
ConfigsDir: configsDir,
|
||||
UploadLimitMB: 10,
|
||||
UploadDir: "test_dir",
|
||||
ModelPath: modelPath,
|
||||
}
|
||||
|
||||
_ = os.RemoveAll(appConfig.ConfigsDir)
|
||||
_ = os.MkdirAll(appConfig.ConfigsDir, 0750)
|
||||
_ = os.MkdirAll(modelPath, 0750)
|
||||
os.Create(filepath.Join(modelPath, "ggml-gpt4all-j"))
|
||||
|
||||
app := fiber.New(fiber.Config{
|
||||
BodyLimit: 20 * 1024 * 1024, // sets the limit to 20MB.
|
||||
})
|
||||
|
||||
// Create a Test Server
|
||||
app.Get("/assistants", ListAssistantsEndpoint(cl, ml, appConfig))
|
||||
app.Post("/assistants", CreateAssistantEndpoint(cl, ml, appConfig))
|
||||
app.Delete("/assistants/:assistant_id", DeleteAssistantEndpoint(cl, ml, appConfig))
|
||||
app.Get("/assistants/:assistant_id", GetAssistantEndpoint(cl, ml, appConfig))
|
||||
app.Post("/assistants/:assistant_id", ModifyAssistantEndpoint(cl, ml, appConfig))
|
||||
|
||||
app.Post("/files", UploadFilesEndpoint(cl, appConfig))
|
||||
app.Get("/assistants/:assistant_id/files", ListAssistantFilesEndpoint(cl, ml, appConfig))
|
||||
app.Post("/assistants/:assistant_id/files", CreateAssistantFileEndpoint(cl, ml, appConfig))
|
||||
app.Delete("/assistants/:assistant_id/files/:file_id", DeleteAssistantFileEndpoint(cl, ml, appConfig))
|
||||
app.Get("/assistants/:assistant_id/files/:file_id", GetAssistantFileEndpoint(cl, ml, appConfig))
|
||||
|
||||
t.Run("CreateAssistantEndpoint", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
ar := &AssistantRequest{
|
||||
Model: "ggml-gpt4all-j",
|
||||
Name: "3.5-turbo",
|
||||
Description: "Test Assistant",
|
||||
Instructions: "You are computer science teacher answering student questions",
|
||||
Tools: []Tool{{Type: Function}},
|
||||
FileIDs: nil,
|
||||
Metadata: nil,
|
||||
}
|
||||
|
||||
resultAssistant, resp, err := createAssistant(app, *ar)
|
||||
assert.NoError(t, err)
|
||||
assert.Equal(t, fiber.StatusOK, resp.StatusCode)
|
||||
|
||||
assert.Equal(t, 1, len(Assistants))
|
||||
//t.Cleanup(cleanupAllAssistants(t, app, []string{resultAssistant.ID}))
|
||||
|
||||
assert.Equal(t, ar.Name, resultAssistant.Name)
|
||||
assert.Equal(t, ar.Model, resultAssistant.Model)
|
||||
assert.Equal(t, ar.Tools, resultAssistant.Tools)
|
||||
assert.Equal(t, ar.Description, resultAssistant.Description)
|
||||
assert.Equal(t, ar.Instructions, resultAssistant.Instructions)
|
||||
assert.Equal(t, ar.FileIDs, resultAssistant.FileIDs)
|
||||
assert.Equal(t, ar.Metadata, resultAssistant.Metadata)
|
||||
})
|
||||
|
||||
t.Run("ListAssistantsEndpoint", func(t *testing.T) {
|
||||
var ids []string
|
||||
var resultAssistant []Assistant
|
||||
for i := 0; i < 4; i++ {
|
||||
ar := &AssistantRequest{
|
||||
Model: "ggml-gpt4all-j",
|
||||
Name: fmt.Sprintf("3.5-turbo-%d", i),
|
||||
Description: fmt.Sprintf("Test Assistant - %d", i),
|
||||
Instructions: fmt.Sprintf("You are computer science teacher answering student questions - %d", i),
|
||||
Tools: []Tool{{Type: Function}},
|
||||
FileIDs: []string{"fid-1234"},
|
||||
Metadata: map[string]string{"meta": "data"},
|
||||
}
|
||||
|
||||
//var err error
|
||||
ra, _, err := createAssistant(app, *ar)
|
||||
// Because we create the assistants so fast all end up with the same created time.
|
||||
time.Sleep(time.Second)
|
||||
resultAssistant = append(resultAssistant, ra)
|
||||
assert.NoError(t, err)
|
||||
ids = append(ids, resultAssistant[i].ID)
|
||||
}
|
||||
|
||||
t.Cleanup(cleanupAllAssistants(t, app, ids))
|
||||
|
||||
tests := []struct {
|
||||
name string
|
||||
reqURL string
|
||||
expectedStatus int
|
||||
expectedResult []Assistant
|
||||
expectedStringResult string
|
||||
}{
|
||||
{
|
||||
name: "Valid Usage - limit only",
|
||||
reqURL: "/assistants?limit=2",
|
||||
expectedStatus: http.StatusOK,
|
||||
expectedResult: Assistants[:2], // Expecting the first two assistants
|
||||
},
|
||||
{
|
||||
name: "Valid Usage - order asc",
|
||||
reqURL: "/assistants?order=asc",
|
||||
expectedStatus: http.StatusOK,
|
||||
expectedResult: Assistants, // Expecting all assistants in ascending order
|
||||
},
|
||||
{
|
||||
name: "Valid Usage - order desc",
|
||||
reqURL: "/assistants?order=desc",
|
||||
expectedStatus: http.StatusOK,
|
||||
expectedResult: []Assistant{Assistants[3], Assistants[2], Assistants[1], Assistants[0]}, // Expecting all assistants in descending order
|
||||
},
|
||||
{
|
||||
name: "Valid Usage - after specific ID",
|
||||
reqURL: "/assistants?after=2",
|
||||
expectedStatus: http.StatusOK,
|
||||
// Note this is correct because it's put in descending order already
|
||||
expectedResult: Assistants[:3], // Expecting assistants after (excluding) ID 2
|
||||
},
|
||||
{
|
||||
name: "Valid Usage - before specific ID",
|
||||
reqURL: "/assistants?before=4",
|
||||
expectedStatus: http.StatusOK,
|
||||
expectedResult: Assistants[2:], // Expecting assistants before (excluding) ID 3.
|
||||
},
|
||||
{
|
||||
name: "Invalid Usage - non-integer limit",
|
||||
reqURL: "/assistants?limit=two",
|
||||
expectedStatus: http.StatusBadRequest,
|
||||
expectedStringResult: "Invalid limit query value: two",
|
||||
},
|
||||
{
|
||||
name: "Invalid Usage - non-existing id in after",
|
||||
reqURL: "/assistants?after=100",
|
||||
expectedStatus: http.StatusOK,
|
||||
expectedResult: []Assistant(nil), // Expecting empty list as there are no IDs above 100
|
||||
},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
request := httptest.NewRequest(http.MethodGet, tt.reqURL, nil)
|
||||
response, err := app.Test(request)
|
||||
assert.NoError(t, err)
|
||||
assert.Equal(t, tt.expectedStatus, response.StatusCode)
|
||||
if tt.expectedStatus != fiber.StatusOK {
|
||||
all, _ := io.ReadAll(response.Body)
|
||||
assert.Equal(t, tt.expectedStringResult, string(all))
|
||||
} else {
|
||||
var result []Assistant
|
||||
err = json.NewDecoder(response.Body).Decode(&result)
|
||||
assert.NoError(t, err)
|
||||
|
||||
assert.Equal(t, tt.expectedResult, result)
|
||||
}
|
||||
})
|
||||
}
|
||||
})
|
||||
|
||||
t.Run("DeleteAssistantEndpoint", func(t *testing.T) {
|
||||
ar := &AssistantRequest{
|
||||
Model: "ggml-gpt4all-j",
|
||||
Name: "3.5-turbo",
|
||||
Description: "Test Assistant",
|
||||
Instructions: "You are computer science teacher answering student questions",
|
||||
Tools: []Tool{{Type: Function}},
|
||||
FileIDs: nil,
|
||||
Metadata: nil,
|
||||
}
|
||||
|
||||
resultAssistant, _, err := createAssistant(app, *ar)
|
||||
assert.NoError(t, err)
|
||||
|
||||
target := fmt.Sprintf("/assistants/%s", resultAssistant.ID)
|
||||
deleteReq := httptest.NewRequest(http.MethodDelete, target, nil)
|
||||
_, err = app.Test(deleteReq)
|
||||
assert.NoError(t, err)
|
||||
assert.Equal(t, 0, len(Assistants))
|
||||
})
|
||||
|
||||
t.Run("GetAssistantEndpoint", func(t *testing.T) {
|
||||
ar := &AssistantRequest{
|
||||
Model: "ggml-gpt4all-j",
|
||||
Name: "3.5-turbo",
|
||||
Description: "Test Assistant",
|
||||
Instructions: "You are computer science teacher answering student questions",
|
||||
Tools: []Tool{{Type: Function}},
|
||||
FileIDs: nil,
|
||||
Metadata: nil,
|
||||
}
|
||||
|
||||
resultAssistant, _, err := createAssistant(app, *ar)
|
||||
assert.NoError(t, err)
|
||||
t.Cleanup(cleanupAllAssistants(t, app, []string{resultAssistant.ID}))
|
||||
|
||||
target := fmt.Sprintf("/assistants/%s", resultAssistant.ID)
|
||||
request := httptest.NewRequest(http.MethodGet, target, nil)
|
||||
response, err := app.Test(request)
|
||||
assert.NoError(t, err)
|
||||
|
||||
var getAssistant Assistant
|
||||
err = json.NewDecoder(response.Body).Decode(&getAssistant)
|
||||
assert.NoError(t, err)
|
||||
|
||||
assert.Equal(t, resultAssistant.ID, getAssistant.ID)
|
||||
})
|
||||
|
||||
t.Run("ModifyAssistantEndpoint", func(t *testing.T) {
|
||||
ar := &AssistantRequest{
|
||||
Model: "ggml-gpt4all-j",
|
||||
Name: "3.5-turbo",
|
||||
Description: "Test Assistant",
|
||||
Instructions: "You are computer science teacher answering student questions",
|
||||
Tools: []Tool{{Type: Function}},
|
||||
FileIDs: nil,
|
||||
Metadata: nil,
|
||||
}
|
||||
|
||||
resultAssistant, _, err := createAssistant(app, *ar)
|
||||
assert.NoError(t, err)
|
||||
|
||||
modifiedAr := &AssistantRequest{
|
||||
Model: "ggml-gpt4all-j",
|
||||
Name: "4.0-turbo",
|
||||
Description: "Modified Test Assistant",
|
||||
Instructions: "You are math teacher answering student questions",
|
||||
Tools: []Tool{{Type: CodeInterpreter}},
|
||||
FileIDs: nil,
|
||||
Metadata: nil,
|
||||
}
|
||||
|
||||
modifiedArJson, err := json.Marshal(modifiedAr)
|
||||
assert.NoError(t, err)
|
||||
|
||||
target := fmt.Sprintf("/assistants/%s", resultAssistant.ID)
|
||||
request := httptest.NewRequest(http.MethodPost, target, strings.NewReader(string(modifiedArJson)))
|
||||
request.Header.Set(fiber.HeaderContentType, "application/json")
|
||||
|
||||
modifyResponse, err := app.Test(request)
|
||||
assert.NoError(t, err)
|
||||
var getAssistant Assistant
|
||||
err = json.NewDecoder(modifyResponse.Body).Decode(&getAssistant)
|
||||
assert.NoError(t, err)
|
||||
|
||||
t.Cleanup(cleanupAllAssistants(t, app, []string{getAssistant.ID}))
|
||||
|
||||
assert.Equal(t, resultAssistant.ID, getAssistant.ID) // IDs should match even if contents change
|
||||
assert.Equal(t, modifiedAr.Tools, getAssistant.Tools)
|
||||
assert.Equal(t, modifiedAr.Name, getAssistant.Name)
|
||||
assert.Equal(t, modifiedAr.Instructions, getAssistant.Instructions)
|
||||
assert.Equal(t, modifiedAr.Description, getAssistant.Description)
|
||||
})
|
||||
|
||||
t.Run("CreateAssistantFileEndpoint", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
file, assistant, err := createFileAndAssistant(t, app, appConfig)
|
||||
assert.NoError(t, err)
|
||||
|
||||
afr := schema.AssistantFileRequest{FileID: file.ID}
|
||||
af, _, err := createAssistantFile(app, afr, assistant.ID)
|
||||
|
||||
assert.NoError(t, err)
|
||||
assert.Equal(t, assistant.ID, af.AssistantID)
|
||||
})
|
||||
t.Run("ListAssistantFilesEndpoint", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
file, assistant, err := createFileAndAssistant(t, app, appConfig)
|
||||
assert.NoError(t, err)
|
||||
|
||||
afr := schema.AssistantFileRequest{FileID: file.ID}
|
||||
af, _, err := createAssistantFile(app, afr, assistant.ID)
|
||||
assert.NoError(t, err)
|
||||
|
||||
assert.Equal(t, assistant.ID, af.AssistantID)
|
||||
})
|
||||
t.Run("GetAssistantFileEndpoint", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
file, assistant, err := createFileAndAssistant(t, app, appConfig)
|
||||
assert.NoError(t, err)
|
||||
|
||||
afr := schema.AssistantFileRequest{FileID: file.ID}
|
||||
af, _, err := createAssistantFile(app, afr, assistant.ID)
|
||||
assert.NoError(t, err)
|
||||
t.Cleanup(cleanupAssistantFile(t, app, af.ID, af.AssistantID))
|
||||
|
||||
target := fmt.Sprintf("/assistants/%s/files/%s", assistant.ID, file.ID)
|
||||
request := httptest.NewRequest(http.MethodGet, target, nil)
|
||||
response, err := app.Test(request)
|
||||
assert.NoError(t, err)
|
||||
|
||||
var assistantFile AssistantFile
|
||||
err = json.NewDecoder(response.Body).Decode(&assistantFile)
|
||||
assert.NoError(t, err)
|
||||
|
||||
assert.Equal(t, af.ID, assistantFile.ID)
|
||||
assert.Equal(t, af.AssistantID, assistantFile.AssistantID)
|
||||
})
|
||||
t.Run("DeleteAssistantFileEndpoint", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
file, assistant, err := createFileAndAssistant(t, app, appConfig)
|
||||
assert.NoError(t, err)
|
||||
|
||||
afr := schema.AssistantFileRequest{FileID: file.ID}
|
||||
af, _, err := createAssistantFile(app, afr, assistant.ID)
|
||||
assert.NoError(t, err)
|
||||
|
||||
cleanupAssistantFile(t, app, af.ID, af.AssistantID)()
|
||||
|
||||
assert.Empty(t, AssistantFiles)
|
||||
})
|
||||
|
||||
}
|
||||
|
||||
func createFileAndAssistant(t *testing.T, app *fiber.App, o *config.ApplicationConfig) (schema.File, Assistant, error) {
|
||||
ar := &AssistantRequest{
|
||||
Model: "ggml-gpt4all-j",
|
||||
Name: "3.5-turbo",
|
||||
Description: "Test Assistant",
|
||||
Instructions: "You are computer science teacher answering student questions",
|
||||
Tools: []Tool{{Type: Function}},
|
||||
FileIDs: nil,
|
||||
Metadata: nil,
|
||||
}
|
||||
|
||||
assistant, _, err := createAssistant(app, *ar)
|
||||
if err != nil {
|
||||
return schema.File{}, Assistant{}, err
|
||||
}
|
||||
t.Cleanup(cleanupAllAssistants(t, app, []string{assistant.ID}))
|
||||
|
||||
file := CallFilesUploadEndpointWithCleanup(t, app, "test.txt", "file", "fine-tune", 5, o)
|
||||
t.Cleanup(func() {
|
||||
_, err := CallFilesDeleteEndpoint(t, app, file.ID)
|
||||
assert.NoError(t, err)
|
||||
})
|
||||
return file, assistant, nil
|
||||
}
|
||||
|
||||
func createAssistantFile(app *fiber.App, afr schema.AssistantFileRequest, assistantId string) (AssistantFile, *http.Response, error) {
|
||||
afrJson, err := json.Marshal(afr)
|
||||
if err != nil {
|
||||
return AssistantFile{}, nil, err
|
||||
}
|
||||
|
||||
target := fmt.Sprintf("/assistants/%s/files", assistantId)
|
||||
request := httptest.NewRequest(http.MethodPost, target, strings.NewReader(string(afrJson)))
|
||||
request.Header.Set(fiber.HeaderContentType, "application/json")
|
||||
request.Header.Set("OpenAi-Beta", "assistants=v1")
|
||||
|
||||
resp, err := app.Test(request)
|
||||
if err != nil {
|
||||
return AssistantFile{}, resp, err
|
||||
}
|
||||
|
||||
var assistantFile AssistantFile
|
||||
all, err := io.ReadAll(resp.Body)
|
||||
if err != nil {
|
||||
return AssistantFile{}, resp, err
|
||||
}
|
||||
err = json.NewDecoder(strings.NewReader(string(all))).Decode(&assistantFile)
|
||||
if err != nil {
|
||||
return AssistantFile{}, resp, err
|
||||
}
|
||||
|
||||
return assistantFile, resp, nil
|
||||
}
|
||||
|
||||
func createAssistant(app *fiber.App, ar AssistantRequest) (Assistant, *http.Response, error) {
|
||||
assistant, err := json.Marshal(ar)
|
||||
if err != nil {
|
||||
return Assistant{}, nil, err
|
||||
}
|
||||
|
||||
request := httptest.NewRequest(http.MethodPost, "/assistants", strings.NewReader(string(assistant)))
|
||||
request.Header.Set(fiber.HeaderContentType, "application/json")
|
||||
request.Header.Set("OpenAi-Beta", "assistants=v1")
|
||||
|
||||
resp, err := app.Test(request)
|
||||
if err != nil {
|
||||
return Assistant{}, resp, err
|
||||
}
|
||||
|
||||
bodyString, err := io.ReadAll(resp.Body)
|
||||
if err != nil {
|
||||
return Assistant{}, resp, err
|
||||
}
|
||||
|
||||
var resultAssistant Assistant
|
||||
err = json.NewDecoder(strings.NewReader(string(bodyString))).Decode(&resultAssistant)
|
||||
return resultAssistant, resp, err
|
||||
}
|
||||
|
||||
func cleanupAllAssistants(t *testing.T, app *fiber.App, ids []string) func() {
|
||||
return func() {
|
||||
for _, assistant := range ids {
|
||||
target := fmt.Sprintf("/assistants/%s", assistant)
|
||||
deleteReq := httptest.NewRequest(http.MethodDelete, target, nil)
|
||||
_, err := app.Test(deleteReq)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to delete assistant %s: %v", assistant, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func cleanupAssistantFile(t *testing.T, app *fiber.App, fileId, assistantId string) func() {
|
||||
return func() {
|
||||
target := fmt.Sprintf("/assistants/%s/files/%s", assistantId, fileId)
|
||||
request := httptest.NewRequest(http.MethodDelete, target, nil)
|
||||
request.Header.Set(fiber.HeaderContentType, "application/json")
|
||||
request.Header.Set("OpenAi-Beta", "assistants=v1")
|
||||
|
||||
resp, err := app.Test(request)
|
||||
assert.NoError(t, err)
|
||||
|
||||
var dafr schema.DeleteAssistantFileResponse
|
||||
err = json.NewDecoder(resp.Body).Decode(&dafr)
|
||||
assert.NoError(t, err)
|
||||
assert.True(t, dafr.Deleted)
|
||||
}
|
||||
}
|
||||
@@ -1,194 +0,0 @@
|
||||
package openai
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
|
||||
"github.com/microcosm-cc/bluemonday"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
|
||||
"github.com/gofiber/fiber/v2"
|
||||
"github.com/mudler/LocalAI/pkg/utils"
|
||||
)
|
||||
|
||||
var UploadedFiles []schema.File
|
||||
|
||||
const UploadedFilesFile = "uploadedFiles.json"
|
||||
|
||||
// UploadFilesEndpoint https://platform.openai.com/docs/api-reference/files/create
|
||||
func UploadFilesEndpoint(cm *config.BackendConfigLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
file, err := c.FormFile("file")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Check the file size
|
||||
if file.Size > int64(appConfig.UploadLimitMB*1024*1024) {
|
||||
return c.Status(fiber.StatusBadRequest).SendString(fmt.Sprintf("File size %d exceeds upload limit %d", file.Size, appConfig.UploadLimitMB))
|
||||
}
|
||||
|
||||
purpose := c.FormValue("purpose", "") //TODO put in purpose dirs
|
||||
if purpose == "" {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("Purpose is not defined")
|
||||
}
|
||||
|
||||
// Sanitize the filename to prevent directory traversal
|
||||
filename := utils.SanitizeFileName(file.Filename)
|
||||
|
||||
savePath := filepath.Join(appConfig.UploadDir, filename)
|
||||
|
||||
// Check if file already exists
|
||||
if _, err := os.Stat(savePath); !os.IsNotExist(err) {
|
||||
return c.Status(fiber.StatusBadRequest).SendString("File already exists")
|
||||
}
|
||||
|
||||
err = c.SaveFile(file, savePath)
|
||||
if err != nil {
|
||||
return c.Status(fiber.StatusInternalServerError).SendString("Failed to save file: " + bluemonday.StrictPolicy().Sanitize(err.Error()))
|
||||
}
|
||||
|
||||
f := schema.File{
|
||||
ID: fmt.Sprintf("file-%d", getNextFileId()),
|
||||
Object: "file",
|
||||
Bytes: int(file.Size),
|
||||
CreatedAt: time.Now(),
|
||||
Filename: file.Filename,
|
||||
Purpose: purpose,
|
||||
}
|
||||
|
||||
UploadedFiles = append(UploadedFiles, f)
|
||||
utils.SaveConfig(appConfig.UploadDir, UploadedFilesFile, UploadedFiles)
|
||||
return c.Status(fiber.StatusOK).JSON(f)
|
||||
}
|
||||
}
|
||||
|
||||
var currentFileId int64 = 0
|
||||
|
||||
func getNextFileId() int64 {
|
||||
atomic.AddInt64(¤tId, 1)
|
||||
return currentId
|
||||
}
|
||||
|
||||
// ListFilesEndpoint https://platform.openai.com/docs/api-reference/files/list
|
||||
// @Summary List files.
|
||||
// @Success 200 {object} schema.ListFiles "Response"
|
||||
// @Router /v1/files [get]
|
||||
func ListFilesEndpoint(cm *config.BackendConfigLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
|
||||
return func(c *fiber.Ctx) error {
|
||||
var listFiles schema.ListFiles
|
||||
|
||||
purpose := c.Query("purpose")
|
||||
if purpose == "" {
|
||||
listFiles.Data = UploadedFiles
|
||||
} else {
|
||||
for _, f := range UploadedFiles {
|
||||
if purpose == f.Purpose {
|
||||
listFiles.Data = append(listFiles.Data, f)
|
||||
}
|
||||
}
|
||||
}
|
||||
listFiles.Object = "list"
|
||||
return c.Status(fiber.StatusOK).JSON(listFiles)
|
||||
}
|
||||
}
|
||||
|
||||
func getFileFromRequest(c *fiber.Ctx) (*schema.File, error) {
|
||||
id := c.Params("file_id")
|
||||
if id == "" {
|
||||
return nil, fmt.Errorf("file_id parameter is required")
|
||||
}
|
||||
|
||||
for _, f := range UploadedFiles {
|
||||
if id == f.ID {
|
||||
return &f, nil
|
||||
}
|
||||
}
|
||||
|
||||
return nil, fmt.Errorf("unable to find file id %s", id)
|
||||
}
|
||||
|
||||
// GetFilesEndpoint is the OpenAI API endpoint to get files https://platform.openai.com/docs/api-reference/files/retrieve
|
||||
// @Summary Returns information about a specific file.
|
||||
// @Success 200 {object} schema.File "Response"
|
||||
// @Router /v1/files/{file_id} [get]
|
||||
func GetFilesEndpoint(cm *config.BackendConfigLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
file, err := getFileFromRequest(c)
|
||||
if err != nil {
|
||||
return c.Status(fiber.StatusInternalServerError).SendString(bluemonday.StrictPolicy().Sanitize(err.Error()))
|
||||
}
|
||||
|
||||
return c.JSON(file)
|
||||
}
|
||||
}
|
||||
|
||||
type DeleteStatus struct {
|
||||
Id string
|
||||
Object string
|
||||
Deleted bool
|
||||
}
|
||||
|
||||
// DeleteFilesEndpoint is the OpenAI API endpoint to delete files https://platform.openai.com/docs/api-reference/files/delete
|
||||
// @Summary Delete a file.
|
||||
// @Success 200 {object} DeleteStatus "Response"
|
||||
// @Router /v1/files/{file_id} [delete]
|
||||
func DeleteFilesEndpoint(cm *config.BackendConfigLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
|
||||
return func(c *fiber.Ctx) error {
|
||||
file, err := getFileFromRequest(c)
|
||||
if err != nil {
|
||||
return c.Status(fiber.StatusInternalServerError).SendString(bluemonday.StrictPolicy().Sanitize(err.Error()))
|
||||
}
|
||||
|
||||
err = os.Remove(filepath.Join(appConfig.UploadDir, file.Filename))
|
||||
if err != nil {
|
||||
// If the file doesn't exist then we should just continue to remove it
|
||||
if !errors.Is(err, os.ErrNotExist) {
|
||||
return c.Status(fiber.StatusInternalServerError).SendString(bluemonday.StrictPolicy().Sanitize(fmt.Sprintf("Unable to delete file: %s, %v", file.Filename, err)))
|
||||
}
|
||||
}
|
||||
|
||||
// Remove upload from list
|
||||
for i, f := range UploadedFiles {
|
||||
if f.ID == file.ID {
|
||||
UploadedFiles = append(UploadedFiles[:i], UploadedFiles[i+1:]...)
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
utils.SaveConfig(appConfig.UploadDir, UploadedFilesFile, UploadedFiles)
|
||||
return c.JSON(DeleteStatus{
|
||||
Id: file.ID,
|
||||
Object: "file",
|
||||
Deleted: true,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// GetFilesContentsEndpoint is the OpenAI API endpoint to get files content https://platform.openai.com/docs/api-reference/files/retrieve-contents
|
||||
// @Summary Returns information about a specific file.
|
||||
// @Success 200 {string} binary "file"
|
||||
// @Router /v1/files/{file_id}/content [get]
|
||||
// GetFilesContentsEndpoint
|
||||
func GetFilesContentsEndpoint(cm *config.BackendConfigLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
|
||||
return func(c *fiber.Ctx) error {
|
||||
file, err := getFileFromRequest(c)
|
||||
if err != nil {
|
||||
return c.Status(fiber.StatusInternalServerError).SendString(bluemonday.StrictPolicy().Sanitize(err.Error()))
|
||||
}
|
||||
|
||||
fileContents, err := os.ReadFile(filepath.Join(appConfig.UploadDir, file.Filename))
|
||||
if err != nil {
|
||||
return c.Status(fiber.StatusInternalServerError).SendString(bluemonday.StrictPolicy().Sanitize(err.Error()))
|
||||
}
|
||||
|
||||
return c.Send(fileContents)
|
||||
}
|
||||
}
|
||||
@@ -1,301 +0,0 @@
|
||||
package openai
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"mime/multipart"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
|
||||
"github.com/rs/zerolog/log"
|
||||
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
|
||||
"github.com/gofiber/fiber/v2"
|
||||
utils2 "github.com/mudler/LocalAI/pkg/utils"
|
||||
"github.com/stretchr/testify/assert"
|
||||
|
||||
"testing"
|
||||
)
|
||||
|
||||
func startUpApp() (app *fiber.App, option *config.ApplicationConfig, loader *config.BackendConfigLoader) {
|
||||
// Preparing the mocked objects
|
||||
loader = &config.BackendConfigLoader{}
|
||||
|
||||
option = &config.ApplicationConfig{
|
||||
UploadLimitMB: 10,
|
||||
UploadDir: "test_dir",
|
||||
}
|
||||
|
||||
_ = os.RemoveAll(option.UploadDir)
|
||||
|
||||
app = fiber.New(fiber.Config{
|
||||
BodyLimit: 20 * 1024 * 1024, // sets the limit to 20MB.
|
||||
})
|
||||
|
||||
// Create a Test Server
|
||||
app.Post("/files", UploadFilesEndpoint(loader, option))
|
||||
app.Get("/files", ListFilesEndpoint(loader, option))
|
||||
app.Get("/files/:file_id", GetFilesEndpoint(loader, option))
|
||||
app.Delete("/files/:file_id", DeleteFilesEndpoint(loader, option))
|
||||
app.Get("/files/:file_id/content", GetFilesContentsEndpoint(loader, option))
|
||||
|
||||
return
|
||||
}
|
||||
|
||||
func TestUploadFileExceedSizeLimit(t *testing.T) {
|
||||
// Preparing the mocked objects
|
||||
loader := &config.BackendConfigLoader{}
|
||||
|
||||
option := &config.ApplicationConfig{
|
||||
UploadLimitMB: 10,
|
||||
UploadDir: "test_dir",
|
||||
}
|
||||
|
||||
_ = os.RemoveAll(option.UploadDir)
|
||||
|
||||
app := fiber.New(fiber.Config{
|
||||
BodyLimit: 20 * 1024 * 1024, // sets the limit to 20MB.
|
||||
})
|
||||
|
||||
// Create a Test Server
|
||||
app.Post("/files", UploadFilesEndpoint(loader, option))
|
||||
app.Get("/files", ListFilesEndpoint(loader, option))
|
||||
app.Get("/files/:file_id", GetFilesEndpoint(loader, option))
|
||||
app.Delete("/files/:file_id", DeleteFilesEndpoint(loader, option))
|
||||
app.Get("/files/:file_id/content", GetFilesContentsEndpoint(loader, option))
|
||||
|
||||
t.Run("UploadFilesEndpoint file size exceeds limit", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
resp, err := CallFilesUploadEndpoint(t, app, "foo.txt", "file", "fine-tune", 11, option)
|
||||
assert.NoError(t, err)
|
||||
|
||||
assert.Equal(t, fiber.StatusBadRequest, resp.StatusCode)
|
||||
assert.Contains(t, bodyToString(resp, t), "exceeds upload limit")
|
||||
})
|
||||
t.Run("UploadFilesEndpoint purpose not defined", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
resp, _ := CallFilesUploadEndpoint(t, app, "foo.txt", "file", "", 5, option)
|
||||
|
||||
assert.Equal(t, fiber.StatusBadRequest, resp.StatusCode)
|
||||
assert.Contains(t, bodyToString(resp, t), "Purpose is not defined")
|
||||
})
|
||||
t.Run("UploadFilesEndpoint file already exists", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
f1 := CallFilesUploadEndpointWithCleanup(t, app, "foo.txt", "file", "fine-tune", 5, option)
|
||||
|
||||
resp, err := CallFilesUploadEndpoint(t, app, "foo.txt", "file", "fine-tune", 5, option)
|
||||
fmt.Println(f1)
|
||||
fmt.Printf("ERror: %v\n", err)
|
||||
fmt.Printf("resp: %+v\n", resp)
|
||||
|
||||
assert.Equal(t, fiber.StatusBadRequest, resp.StatusCode)
|
||||
assert.Contains(t, bodyToString(resp, t), "File already exists")
|
||||
})
|
||||
t.Run("UploadFilesEndpoint file uploaded successfully", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
file := CallFilesUploadEndpointWithCleanup(t, app, "test.txt", "file", "fine-tune", 5, option)
|
||||
|
||||
// Check if file exists in the disk
|
||||
testName := strings.Split(t.Name(), "/")[1]
|
||||
fileName := testName + "-test.txt"
|
||||
filePath := filepath.Join(option.UploadDir, utils2.SanitizeFileName(fileName))
|
||||
_, err := os.Stat(filePath)
|
||||
|
||||
assert.False(t, os.IsNotExist(err))
|
||||
assert.Equal(t, file.Bytes, 5242880)
|
||||
assert.NotEmpty(t, file.CreatedAt)
|
||||
assert.Equal(t, file.Filename, fileName)
|
||||
assert.Equal(t, file.Purpose, "fine-tune")
|
||||
})
|
||||
t.Run("ListFilesEndpoint without purpose parameter", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
resp, err := CallListFilesEndpoint(t, app, "")
|
||||
assert.NoError(t, err)
|
||||
|
||||
assert.Equal(t, 200, resp.StatusCode)
|
||||
|
||||
listFiles := responseToListFile(t, resp)
|
||||
if len(listFiles.Data) != len(UploadedFiles) {
|
||||
t.Errorf("Expected %v files, got %v files", len(UploadedFiles), len(listFiles.Data))
|
||||
}
|
||||
})
|
||||
t.Run("ListFilesEndpoint with valid purpose parameter", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
_ = CallFilesUploadEndpointWithCleanup(t, app, "test.txt", "file", "fine-tune", 5, option)
|
||||
|
||||
resp, err := CallListFilesEndpoint(t, app, "fine-tune")
|
||||
assert.NoError(t, err)
|
||||
|
||||
listFiles := responseToListFile(t, resp)
|
||||
if len(listFiles.Data) != 1 {
|
||||
t.Errorf("Expected 1 file, got %v files", len(listFiles.Data))
|
||||
}
|
||||
})
|
||||
t.Run("ListFilesEndpoint with invalid query parameter", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
resp, err := CallListFilesEndpoint(t, app, "not-so-fine-tune")
|
||||
assert.NoError(t, err)
|
||||
assert.Equal(t, 200, resp.StatusCode)
|
||||
|
||||
listFiles := responseToListFile(t, resp)
|
||||
|
||||
if len(listFiles.Data) != 0 {
|
||||
t.Errorf("Expected 0 file, got %v files", len(listFiles.Data))
|
||||
}
|
||||
})
|
||||
t.Run("GetFilesContentsEndpoint get file content", func(t *testing.T) {
|
||||
t.Cleanup(tearDown())
|
||||
req := httptest.NewRequest("GET", "/files", nil)
|
||||
resp, _ := app.Test(req)
|
||||
assert.Equal(t, 200, resp.StatusCode)
|
||||
|
||||
var listFiles schema.ListFiles
|
||||
if err := json.Unmarshal(bodyToByteArray(resp, t), &listFiles); err != nil {
|
||||
t.Errorf("Failed to decode response: %v", err)
|
||||
return
|
||||
}
|
||||
|
||||
if len(listFiles.Data) != 0 {
|
||||
t.Errorf("Expected 0 file, got %v files", len(listFiles.Data))
|
||||
}
|
||||
})
|
||||
}
|
||||
|
func CallListFilesEndpoint(t *testing.T, app *fiber.App, purpose string) (*http.Response, error) {
    var target string
    if purpose != "" {
        target = fmt.Sprintf("/files?purpose=%s", purpose)
    } else {
        target = "/files"
    }
    req := httptest.NewRequest("GET", target, nil)
    return app.Test(req)
}

func CallFilesContentEndpoint(t *testing.T, app *fiber.App, fileId string) (*http.Response, error) {
    request := httptest.NewRequest("GET", "/files?file_id="+fileId, nil)
    return app.Test(request)
}

func CallFilesUploadEndpoint(t *testing.T, app *fiber.App, fileName, tag, purpose string, fileSize int, appConfig *config.ApplicationConfig) (*http.Response, error) {
    testName := strings.Split(t.Name(), "/")[1]

    // Create a test file of the requested size
    file := createTestFile(t, testName+"-"+fileName, fileSize, appConfig)

    // Create a new multipart HTTP request
    body, writer := newMultipartFile(file.Name(), tag, purpose)

    req := httptest.NewRequest(http.MethodPost, "/files", body)
    req.Header.Set(fiber.HeaderContentType, writer.FormDataContentType())
    return app.Test(req)
}

func CallFilesUploadEndpointWithCleanup(t *testing.T, app *fiber.App, fileName, tag, purpose string, fileSize int, appConfig *config.ApplicationConfig) schema.File {
    // Create a test file of the requested size
    testName := strings.Split(t.Name(), "/")[1]
    file := createTestFile(t, testName+"-"+fileName, fileSize, appConfig)

    // Create a new multipart HTTP request
    body, writer := newMultipartFile(file.Name(), tag, purpose)

    req := httptest.NewRequest(http.MethodPost, "/files", body)
    req.Header.Set(fiber.HeaderContentType, writer.FormDataContentType())
    resp, err := app.Test(req)
    assert.NoError(t, err)
    f := responseToFile(t, resp)

    //id := f.ID
    //t.Cleanup(func() {
    //    _, err := CallFilesDeleteEndpoint(t, app, id)
    //    assert.NoError(t, err)
    //    assert.Empty(t, UploadedFiles)
    //})

    return f
}

func CallFilesDeleteEndpoint(t *testing.T, app *fiber.App, fileId string) (*http.Response, error) {
    target := fmt.Sprintf("/files/%s", fileId)
    req := httptest.NewRequest(http.MethodDelete, target, nil)
    return app.Test(req)
}

// Helper to create a multipart file upload body
func newMultipartFile(filePath, tag, purpose string) (*strings.Reader, *multipart.Writer) {
    body := new(strings.Builder)
    writer := multipart.NewWriter(body)
    file, _ := os.Open(filePath)
    defer file.Close()
    part, _ := writer.CreateFormFile(tag, filepath.Base(filePath))
    io.Copy(part, file)

    if purpose != "" {
        _ = writer.WriteField("purpose", purpose)
    }

    writer.Close()
    return strings.NewReader(body.String()), writer
}

// Helper to create test files
func createTestFile(t *testing.T, name string, sizeMB int, option *config.ApplicationConfig) *os.File {
    err := os.MkdirAll(option.UploadDir, 0750)
    if err != nil {
        t.Fatalf("Error MKDIR: %v", err)
    }

    file, err := os.Create(name)
    assert.NoError(t, err)
    file.WriteString(strings.Repeat("a", sizeMB*1024*1024)) // sizeMB MB file

    t.Cleanup(func() {
        os.Remove(name)
        os.RemoveAll(option.UploadDir)
    })
    return file
}

func bodyToString(resp *http.Response, t *testing.T) string {
    return string(bodyToByteArray(resp, t))
}

func bodyToByteArray(resp *http.Response, t *testing.T) []byte {
    bodyBytes, err := io.ReadAll(resp.Body)
    if err != nil {
        t.Fatal(err)
    }
    return bodyBytes
}

func responseToFile(t *testing.T, resp *http.Response) schema.File {
    var file schema.File
    responseToString := bodyToString(resp, t)

    err := json.NewDecoder(strings.NewReader(responseToString)).Decode(&file)
    if err != nil {
        t.Errorf("Failed to decode response: %s", err)
    }

    return file
}

func responseToListFile(t *testing.T, resp *http.Response) schema.ListFiles {
    var listFiles schema.ListFiles
    responseToString := bodyToString(resp, t)

    err := json.NewDecoder(strings.NewReader(responseToString)).Decode(&listFiles)
    if err != nil {
        log.Error().Err(err).Msg("failed to decode response")
    }

    return listFiles
}

@@ -41,6 +41,11 @@ func RegisterLocalAIRoutes(router *fiber.App,
        router.Get("/backends/jobs/:uuid", backendGalleryEndpointService.GetOpStatusEndpoint())
    }

    router.Post("/v1/detection",
        requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_DETECTION)),
        requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.DetectionRequest) }),
        localai.DetectionEndpoint(cl, ml, appConfig))

    router.Post("/tts",
        requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_TTS)),
        requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.TTSRequest) }),

@@ -54,38 +54,6 @@ func RegisterOpenAIRoutes(app *fiber.App,
    app.Post("/completions", completionChain...)
    app.Post("/v1/engines/:model/completions", completionChain...)

    // assistant
    app.Get("/v1/assistants", openai.ListAssistantsEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Get("/assistants", openai.ListAssistantsEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Post("/v1/assistants", openai.CreateAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Post("/assistants", openai.CreateAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Delete("/v1/assistants/:assistant_id", openai.DeleteAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Delete("/assistants/:assistant_id", openai.DeleteAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Get("/v1/assistants/:assistant_id", openai.GetAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Get("/assistants/:assistant_id", openai.GetAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Post("/v1/assistants/:assistant_id", openai.ModifyAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Post("/assistants/:assistant_id", openai.ModifyAssistantEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Get("/v1/assistants/:assistant_id/files", openai.ListAssistantFilesEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Get("/assistants/:assistant_id/files", openai.ListAssistantFilesEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Post("/v1/assistants/:assistant_id/files", openai.CreateAssistantFileEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Post("/assistants/:assistant_id/files", openai.CreateAssistantFileEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Delete("/v1/assistants/:assistant_id/files/:file_id", openai.DeleteAssistantFileEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Delete("/assistants/:assistant_id/files/:file_id", openai.DeleteAssistantFileEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Get("/v1/assistants/:assistant_id/files/:file_id", openai.GetAssistantFileEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
    app.Get("/assistants/:assistant_id/files/:file_id", openai.GetAssistantFileEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))

    // files
    app.Post("/v1/files", openai.UploadFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Post("/files", openai.UploadFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Get("/v1/files", openai.ListFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Get("/files", openai.ListFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Get("/v1/files/:file_id", openai.GetFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Get("/files/:file_id", openai.GetFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Delete("/v1/files/:file_id", openai.DeleteFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Delete("/files/:file_id", openai.DeleteFilesEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Get("/v1/files/:file_id/content", openai.GetFilesContentsEndpoint(application.BackendLoader(), application.ApplicationConfig()))
    app.Get("/files/:file_id/content", openai.GetFilesContentsEndpoint(application.BackendLoader(), application.ApplicationConfig()))

    // embeddings
    embeddingChain := []fiber.Handler{
        re.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_EMBEDDINGS)),

@@ -90,6 +90,14 @@
                hx-indicator=".htmx-indicator">
                <i class="fas fa-headphones mr-2"></i>Whisper
            </button>
            <button hx-post="browse/search/backends"
                class="inline-flex items-center rounded-full px-4 py-2 text-sm font-medium bg-red-900/60 text-red-200 border border-red-700/50 hover:bg-red-800 transition duration-200 ease-in-out"
                hx-target="#search-results"
                hx-vals='{"search": "object-detection"}'
                onclick="hidePagination()"
                hx-indicator=".htmx-indicator">
                <i class="fas fa-eye mr-2"></i>Object detection
            </button>
        </div>
    </div>
</div>

@@ -115,6 +115,14 @@
                hx-indicator=".htmx-indicator">
                <i class="fas fa-headphones mr-2"></i>Audio transcription
            </button>
            <button hx-post="browse/search/models"
                class="inline-flex items-center rounded-full px-4 py-2 text-sm font-medium bg-red-900/60 text-red-200 border border-red-700/50 hover:bg-red-800 transition duration-200 ease-in-out"
                hx-target="#search-results"
                hx-vals='{"search": "object-detection"}'
                onclick="hidePagination()"
                hx-indicator=".htmx-indicator">
                <i class="fas fa-eye mr-2"></i>Object detection
            </button>
        </div>
    </div>

@@ -120,3 +120,20 @@ type SystemInformationResponse struct {
    Backends []string       `json:"backends"`
    Models   []SysInfoModel `json:"loaded_models"`
}

type DetectionRequest struct {
    BasicModelRequest
    Image string `json:"image"`
}

type DetectionResponse struct {
    Detections []Detection `json:"detections"`
}

type Detection struct {
    X         float32 `json:"x"`
    Y         float32 `json:"y"`
    Width     float32 `json:"width"`
    Height    float32 `json:"height"`
    ClassName string  `json:"class_name"`
}

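For orientation, a minimal sketch of how these new detection types serialize with `encoding/json`, based purely on the struct tags above. The `schema` import path and the sample values are illustrative assumptions, not part of the change itself:

```go
package main

import (
    "encoding/json"
    "fmt"

    "github.com/mudler/LocalAI/core/schema" // assumed import path for the types above
)

func main() {
    // Build a response the way a detection handler might.
    resp := schema.DetectionResponse{
        Detections: []schema.Detection{
            {X: 100.5, Y: 150.2, Width: 200, Height: 300, ClassName: "dog"},
        },
    }
    out, _ := json.Marshal(resp)
    // Prints: {"detections":[{"x":100.5,"y":150.2,"width":200,"height":300,"class_name":"dog"}]}
    fmt.Println(string(out))
}
```

This is the same JSON shape the `/v1/detection` endpoint documents further down in the new object-detection page.
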
@@ -2,7 +2,6 @@ package schema

import (
    "context"
    "time"

    functions "github.com/mudler/LocalAI/pkg/functions"
)
@@ -115,37 +114,6 @@ type OpenAIModel struct {
    Object string `json:"object"`
}

type DeleteAssistantResponse struct {
    ID      string `json:"id"`
    Object  string `json:"object"`
    Deleted bool   `json:"deleted"`
}

// File represents the structure of a file object from the OpenAI API.
type File struct {
    ID        string    `json:"id"`         // Unique identifier for the file
    Object    string    `json:"object"`     // Type of the object (e.g., "file")
    Bytes     int       `json:"bytes"`      // Size of the file in bytes
    CreatedAt time.Time `json:"created_at"` // The time at which the file was created
    Filename  string    `json:"filename"`   // The name of the file
    Purpose   string    `json:"purpose"`    // The purpose of the file (e.g., "fine-tune", "classifications", etc.)
}

type ListFiles struct {
    Data   []File
    Object string
}

type AssistantFileRequest struct {
    FileID string `json:"file_id"`
}

type DeleteAssistantFileResponse struct {
    ID      string `json:"id"`
    Object  string `json:"object"`
    Deleted bool   `json:"deleted"`
}

type ImageGenerationResponseFormat string

type ChatCompletionResponseFormatType string

docs/content/docs/features/object-detection.md (new file, 193 lines)
@@ -0,0 +1,193 @@
+++
disableToc = false
title = "🔍 Object detection"
weight = 13
url = "/features/object-detection/"
+++

LocalAI supports object detection through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Currently, [RF-DETR](https://github.com/roboflow/rf-detr) is available as an implementation.

## Overview

Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.

**Key Features:**
- Real-time object detection
- High accuracy detection with bounding boxes
- Support for multiple hardware accelerators (CPU, NVIDIA GPU, Intel GPU, AMD GPU)
- Structured detection results with confidence scores
- Easy integration through the `/v1/detection` endpoint

## Usage

### Detection Endpoint

LocalAI provides a dedicated `/v1/detection` endpoint for object detection tasks. It returns structured detection results with bounding boxes and confidence scores.

### API Reference

To perform object detection, send a POST request to the `/v1/detection` endpoint:

```bash
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rfdetr-base",
    "image": "https://media.roboflow.com/dog.jpeg"
  }'
```

### Request Format

The request body should contain:

- `model`: The name of the object detection model (e.g., "rfdetr-base")
- `image`: The image to analyze, which can be:
  - A URL to an image
  - A base64-encoded image

### Response Format

The API returns a JSON response with detected objects:

```json
{
  "detections": [
    {
      "x": 100.5,
      "y": 150.2,
      "width": 200.0,
      "height": 300.0,
      "confidence": 0.95,
      "class_name": "dog"
    },
    {
      "x": 400.0,
      "y": 200.0,
      "width": 150.0,
      "height": 250.0,
      "confidence": 0.87,
      "class_name": "person"
    }
  ]
}
```

Each detection includes the following fields (a minimal client sketch follows the list):
- `x`, `y`: Coordinates of the bounding box top-left corner
- `width`, `height`: Dimensions of the bounding box
- `confidence`: Detection confidence score (0.0 to 1.0)
- `class_name`: The detected object class

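As a sketch of what a client looks like end to end, the snippet below posts a request to `/v1/detection` and decodes the documented response shape. It is a minimal example, not part of the LocalAI codebase: the local structs simply mirror the JSON fields above, and the server address and model name are assumptions you should adapt.

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// detection mirrors the documented response fields above.
type detection struct {
    X          float32 `json:"x"`
    Y          float32 `json:"y"`
    Width      float32 `json:"width"`
    Height     float32 `json:"height"`
    Confidence float32 `json:"confidence"`
    ClassName  string  `json:"class_name"`
}

type detectionResponse struct {
    Detections []detection `json:"detections"`
}

func main() {
    // Build the request body documented in "Request Format".
    payload, _ := json.Marshal(map[string]string{
        "model": "rfdetr-base",
        "image": "https://media.roboflow.com/dog.jpeg",
    })

    // Assumes LocalAI is listening on localhost:8080.
    resp, err := http.Post("http://localhost:8080/v1/detection", "application/json", bytes.NewReader(payload))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    var out detectionResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        log.Fatal(err)
    }
    for _, d := range out.Detections {
        fmt.Printf("%s at (%.1f, %.1f) %gx%g confidence %.2f\n",
            d.ClassName, d.X, d.Y, d.Width, d.Height, d.Confidence)
    }
}
```

The same request can of course be made with plain curl, as shown in the Examples section below.
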
## Backends

### RF-DETR Backend

The RF-DETR backend is implemented as a Python-based gRPC service that integrates seamlessly with LocalAI. It provides object detection capabilities using the RF-DETR model architecture and supports multiple hardware configurations:

- **CPU**: Optimized for CPU inference
- **NVIDIA GPU**: CUDA acceleration for NVIDIA GPUs
- **Intel GPU**: Intel oneAPI optimization
- **AMD GPU**: ROCm acceleration for AMD GPUs
- **NVIDIA Jetson**: Optimized for ARM64 NVIDIA Jetson devices

#### Setup

1. **Using the Model Gallery (Recommended)**

   The easiest way to get started is using the model gallery. The `rfdetr-base` model is available in the official LocalAI gallery:

   ```bash
   # Install and run the rfdetr-base model
   local-ai run rfdetr-base
   ```

   You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base".

2. **Manual Configuration**

   Create a model configuration file in your `models` directory:

   ```yaml
   name: rfdetr
   backend: rfdetr
   parameters:
     model: rfdetr-base
   ```

#### Available Models

Currently, the following model is available in the [Model Gallery]({{%relref "docs/features/model-gallery" %}}):

- **rfdetr-base**: Base model with balanced performance and accuracy

You can browse and install this model through the LocalAI web interface or using the command line.

## Examples

### Basic Object Detection

```bash
# Detect objects in an image from URL
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rfdetr-base",
    "image": "https://example.com/image.jpg"
  }'
```

### Base64 Image Detection

```bash
# Convert image to base64 and send
base64_image=$(base64 -w 0 image.jpg)
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"rfdetr-base\",
    \"image\": \"data:image/jpeg;base64,$base64_image\"
  }"
```

## Troubleshooting

### Common Issues

1. **Model Loading Errors**
   - Ensure the model file is properly downloaded
   - Check available disk space
   - Verify model compatibility with your backend version

2. **Low Detection Accuracy**
   - Ensure good image quality and lighting
   - Check if objects are clearly visible
   - Consider using a larger model for better accuracy

3. **Slow Performance**
   - Enable GPU acceleration if available
   - Use a smaller model for faster inference
   - Optimize image resolution

### Debug Mode

Enable debug logging for troubleshooting:

```bash
local-ai run --debug rfdetr-base
```

## Object Detection Category

LocalAI includes a dedicated **object-detection** category for models and backends that specialize in identifying and locating objects within images. This category currently includes:

- **RF-DETR**: Real-time transformer-based object detection

Additional object detection models and backends will be added to this category in the future. You can filter models by the `object-detection` tag in the model gallery to find all available object detection models.

## Related Features

- [🎨 Image generation]({{%relref "docs/features/image-generation" %}}): Generate images with AI
- [📖 Text generation]({{%relref "docs/features/text-generation" %}}): Generate text with language models
- [🔍 GPT Vision]({{%relref "docs/features/gpt-vision" %}}): Analyze images with language models
- [🚀 GPU acceleration]({{%relref "docs/features/GPU-acceleration" %}}): Optimize performance with GPU acceleration

@@ -173,7 +173,7 @@ Standard container images do not have pre-installed models.

| Description | Quay | Docker Hub |
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-gpu-nvidia-cuda-12` | `localai/localai:master-gpu-nvidia-cuda12` |
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-gpu-nvidia-cuda-12` | `localai/localai:master-gpu-nvidia-cuda-12` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-nvidia-cuda-12` | `localai/localai:latest-gpu-nvidia-cuda-12` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-gpu-nvidia-cuda-12` | `localai/localai:{{< version >}}-gpu-nvidia-cuda-12` |

@@ -1,3 +1,3 @@
{
    "version": "v3.2.1"
    "version": "v3.2.3"
}

@@ -1,4 +1,26 @@
---
- &rfdetr
  name: "rfdetr-base"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
  license: apache-2.0
  description: |
    RF-DETR is a real-time, transformer-based object detection model architecture developed by Roboflow and released under the Apache 2.0 license.
    RF-DETR is the first real-time model to exceed 60 AP on the Microsoft COCO benchmark alongside competitive performance at base sizes. It also achieves state-of-the-art performance on RF100-VL, an object detection benchmark that measures model domain adaptability to real-world problems. RF-DETR is the fastest and most accurate model of its size when compared to current real-time object detection models.
    RF-DETR is small enough to run on the edge using Inference, making it an ideal model for deployments that need both strong accuracy and real-time performance.
  tags:
    - object-detection
    - rfdetr
    - gpu
    - cpu
  urls:
    - https://github.com/roboflow/rf-detr
  overrides:
    backend: rfdetr
    parameters:
      model: rfdetr-base
  known_usecases:
    - detection
- name: "dream-org_dream-v0-instruct-7b"
  # chatml
  url: "github:mudler/LocalAI/gallery/chatml.yaml@master"

@@ -9,7 +9,7 @@ import (

var embeds = map[string]*embedBackend{}

func Provide(addr string, llm LLM) {
func Provide(addr string, llm AIModel) {
    embeds[addr] = &embedBackend{s: &server{llm: llm}}
}

@@ -42,6 +42,7 @@ type Backend interface {
    GenerateVideo(ctx context.Context, in *pb.GenerateVideoRequest, opts ...grpc.CallOption) (*pb.Result, error)
    TTS(ctx context.Context, in *pb.TTSRequest, opts ...grpc.CallOption) (*pb.Result, error)
    SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...grpc.CallOption) (*pb.Result, error)
    Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error)
    AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
    TokenizeString(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.TokenizationResponse, error)
    Status(ctx context.Context) (*pb.StatusResponse, error)

@@ -69,6 +69,10 @@ func (llm *Base) SoundGeneration(*pb.SoundGenerationRequest) error {
    return fmt.Errorf("unimplemented")
}

func (llm *Base) Detect(*pb.DetectOptions) (pb.DetectResponse, error) {
    return pb.DetectResponse{}, fmt.Errorf("unimplemented")
}

func (llm *Base) TokenizeString(opts *pb.PredictOptions) (pb.TokenizationResponse, error) {
    return pb.TokenizationResponse{}, fmt.Errorf("unimplemented")
}

@@ -504,3 +504,25 @@ func (c *Client) VAD(ctx context.Context, in *pb.VADRequest, opts ...grpc.CallOp
    client := pb.NewBackendClient(conn)
    return client.VAD(ctx, in, opts...)
}

func (c *Client) Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error) {
    if !c.parallel {
        c.opMutex.Lock()
        defer c.opMutex.Unlock()
    }
    c.setBusy(true)
    defer c.setBusy(false)
    c.wdMark()
    defer c.wdUnMark()
    conn, err := grpc.Dial(c.address, grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultCallOptions(
            grpc.MaxCallRecvMsgSize(50*1024*1024), // 50MB
            grpc.MaxCallSendMsgSize(50*1024*1024), // 50MB
        ))
    if err != nil {
        return nil, err
    }
    defer conn.Close()
    client := pb.NewBackendClient(conn)
    return client.Detect(ctx, in, opts...)
}

@@ -59,6 +59,10 @@ func (e *embedBackend) SoundGeneration(ctx context.Context, in *pb.SoundGenerati
    return e.s.SoundGeneration(ctx, in)
}

func (e *embedBackend) Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error) {
    return e.s.Detect(ctx, in)
}

func (e *embedBackend) AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error) {
    return e.s.AudioTranscription(ctx, in)
}

@@ -4,7 +4,7 @@ import (
    pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)

type LLM interface {
type AIModel interface {
    Busy() bool
    Lock()
    Unlock()
@@ -15,6 +15,7 @@ type LLM interface {
    Embeddings(*pb.PredictOptions) ([]float32, error)
    GenerateImage(*pb.GenerateImageRequest) error
    GenerateVideo(*pb.GenerateVideoRequest) error
    Detect(*pb.DetectOptions) (pb.DetectResponse, error)
    AudioTranscription(*pb.TranscriptRequest) (pb.TranscriptResult, error)
    TTS(*pb.TTSRequest) error
    SoundGeneration(*pb.SoundGenerationRequest) error

@@ -22,7 +22,7 @@ import (
// server is used to implement helloworld.GreeterServer.
type server struct {
    pb.UnimplementedBackendServer
    llm LLM
    llm AIModel
}

func (s *server) Health(ctx context.Context, in *pb.HealthMessage) (*pb.Reply, error) {
@@ -111,6 +111,18 @@ func (s *server) SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequ
    return &pb.Result{Message: "Sound Generation audio generated", Success: true}, nil
}

func (s *server) Detect(ctx context.Context, in *pb.DetectOptions) (*pb.DetectResponse, error) {
    if s.llm.Locking() {
        s.llm.Lock()
        defer s.llm.Unlock()
    }
    res, err := s.llm.Detect(in)
    if err != nil {
        return nil, err
    }
    return &res, nil
}

func (s *server) AudioTranscription(ctx context.Context, in *pb.TranscriptRequest) (*pb.TranscriptResult, error) {
    if s.llm.Locking() {
        s.llm.Lock()
@@ -251,7 +263,7 @@ func (s *server) VAD(ctx context.Context, in *pb.VADRequest) (*pb.VADResponse, e
    return &res, nil
}

func StartServer(address string, model LLM) error {
func StartServer(address string, model AIModel) error {
    lis, err := net.Listen("tcp", address)
    if err != nil {
        return err
@@ -269,7 +281,7 @@ func StartServer(address string, model LLM) error {
    return nil
}

func RunServer(address string, model LLM) (func() error, error) {
func RunServer(address string, model AIModel) (func() error, error) {
    lis, err := net.Listen("tcp", address)
    if err != nil {
        return nil, err

@@ -20,7 +20,7 @@ var dataURIPattern = regexp.MustCompile(`^data:([^;]+);base64,`)

// GetContentURIAsBase64 checks if the string is an URL, if it's an URL downloads the content in memory encodes it in base64 and returns the base64 string, otherwise returns the string by stripping base64 data headers
func GetContentURIAsBase64(s string) (string, error) {
    if strings.HasPrefix(s, "http") {
    if strings.HasPrefix(s, "http") || strings.HasPrefix(s, "https") {
        // download the image
        resp, err := base64DownloadClient.Get(s)
        if err != nil {

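The doc comment above describes the helper's contract; as a quick orientation, here is a minimal usage sketch. The import path is assumed from the repository layout, and error handling is kept deliberately short:

```go
package main

import (
    "fmt"
    "log"

    "github.com/mudler/LocalAI/pkg/utils" // assumed location of GetContentURIAsBase64
)

func main() {
    // Remote URL: the content is downloaded and base64-encoded in memory.
    b64, err := utils.GetContentURIAsBase64("https://media.roboflow.com/dog.jpeg")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(len(b64))

    // Data URI: the "data:image/png;base64," header is stripped and the payload returned as-is.
    payload, err := utils.GetContentURIAsBase64("data:image/png;base64,iVBORw0KGgo=")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(payload) // iVBORw0KGgo=
}
```
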
@@ -1,42 +0,0 @@
package utils

import (
    "encoding/json"
    "os"
    "path/filepath"

    "github.com/rs/zerolog/log"
)

func SaveConfig(filePath, fileName string, obj any) {
    file, err := json.MarshalIndent(obj, "", " ")
    if err != nil {
        log.Error().Err(err).Msg("failed to JSON marshal the uploadedFiles")
    }

    absolutePath := filepath.Join(filePath, fileName)
    err = os.WriteFile(absolutePath, file, 0600)
    if err != nil {
        log.Error().Err(err).Str("filepath", absolutePath).Msg("failed to save configuration file")
    }
}

func LoadConfig(filePath, fileName string, obj interface{}) {
    uploadFilePath := filepath.Join(filePath, fileName)

    _, err := os.Stat(uploadFilePath)
    if os.IsNotExist(err) {
        log.Debug().Msgf("No configuration file found at %s", uploadFilePath)
        return
    }

    file, err := os.ReadFile(uploadFilePath)
    if err != nil {
        log.Error().Err(err).Str("filepath", uploadFilePath).Msg("failed to read file")
    } else {
        err = json.Unmarshal(file, &obj)
        if err != nil {
            log.Error().Err(err).Str("filepath", uploadFilePath).Msg("failed to parse file as JSON")
        }
    }
}