Compare commits


16 Commits

Author SHA1 Message Date
Ettore Di Giacinto
ee625fc34e fix(backends gallery): pass-by backend galleries to the model service (#5906)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-25 16:38:09 +02:00
Ettore Di Giacinto
693aa0b5de Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-07-25 11:51:23 +02:00
Ettore Di Giacinto
3973e6e5da fix(install.sh): update to use the new binary naming (#5903)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-25 10:43:22 +02:00
LocalAI [bot]
fb6ec68090 chore: ⬆️ Update ggml-org/whisper.cpp to 7de8dd783f7b2eab56bff6bbc5d3369e34f0e77f (#5902)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-25 08:40:24 +02:00
LocalAI [bot]
0301fc7c46 chore: ⬆️ Update leejet/stable-diffusion.cpp to eed97a5e1d054f9c1e7ac01982ae480411d4157e (#5901)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-25 08:40:06 +02:00
LocalAI [bot]
813cb4296d chore: ⬆️ Update ggml-org/llama.cpp to 3f4fc97f1d745f1d5d3c853949503136d419e6de (#5900)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-25 08:39:44 +02:00
Ettore Di Giacinto
deda3a4972 Update build documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-24 22:53:08 +02:00
Ettore Di Giacinto
a28f27604a Update backends.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-07-24 16:18:25 +02:00
Richard Palethorpe
8fe9fa98f2 fix(stablediffusion-cpp): Switch back to upstream and update (#5880)
* sync(stablediffusion-cpp): Switch back to upstream and update

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(stablediffusion-ggml): NULL terminate options array to prevent segfault

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(build): Add BUILD_TYPE and BASE_IMAGE to all backends

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-07-24 16:03:18 +02:00
Nathaniel Hyson
4db1b80278 Update quickstart.md (#5898)
Fixed spelling mistake

Signed-off-by: Nathaniel Hyson <Shinrai@users.noreply.github.com>
2025-07-24 15:04:02 +02:00
Dave
b3c2a3c257 fix: untangle pkg and core (#5896)
* migrate core/system to pkg/system - it has no dependencies FROM core, and IS USED in pkg

Signed-off-by: Dave Lee <dave@gray101.com>

* move pkg/templates up to core/templates -- nothing in pkg references it, but it does reference core.

Signed-off-by: Dave Lee <dave@gray101.com>

* remove extra check, len of nil is 0

Signed-off-by: Dave Lee <dave@gray101.com>

* move pkg/startup to core/startup -- it does have important and unfixable dependencies on core

Signed-off-by: Dave Lee <dave@gray101.com>

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2025-07-24 15:03:41 +02:00
LocalAI [bot]
61c2304638 chore: ⬆️ Update ggml-org/llama.cpp to a86f52b2859dae4db5a7a0bbc0f1ad9de6b43ec6 (#5894)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-24 15:02:37 +02:00
Ettore Di Giacinto
92c5ab97e2 chore(Makefile): drop unused targets (#5893)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-24 14:49:50 +02:00
LocalAI [bot]
76e471441c chore: ⬆️ Update richiejp/stable-diffusion.cpp to 10c6501bd05a697e014f1bee3a84e5664290c489 (#5732)
⬆️ Update richiejp/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-23 21:09:02 +00:00
Dave
9cecf5e7ac fix: rename Dockerfile.go --> Dockerfile.golang to avoid IDE errors (#5892)
extract up and out Dockerfile.go --> Dockerfile.golang rename. Prevents syntax highlighting and IDE errors

Signed-off-by: Dave Lee <dave@gray101.com>
2025-07-23 21:33:26 +02:00
Ettore Di Giacinto
b7b3164736 chore: try to speedup build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-23 21:21:23 +02:00
50 changed files with 303 additions and 424 deletions


@@ -597,7 +597,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "piper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
# bark-cpp
- build-type: ''
@@ -610,7 +610,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "bark-cpp"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: ''
cuda-major-version: ""
@@ -659,7 +659,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "stablediffusion-ggml"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "12"
@@ -671,7 +671,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "stablediffusion-ggml"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "11"
@@ -683,7 +683,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "stablediffusion-ggml"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'sycl_f32'
cuda-major-version: ""
@@ -695,7 +695,7 @@ jobs:
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
skip-drivers: 'false'
backend: "stablediffusion-ggml"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'sycl_f16'
cuda-major-version: ""
@@ -707,7 +707,7 @@ jobs:
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
skip-drivers: 'false'
backend: "stablediffusion-ggml"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'vulkan'
cuda-major-version: ""
@@ -719,7 +719,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "stablediffusion-ggml"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "12"
@@ -731,7 +731,7 @@ jobs:
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "stablediffusion-ggml"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
# whisper
- build-type: ''
@@ -744,7 +744,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "12"
@@ -756,7 +756,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "11"
@@ -768,7 +768,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'sycl_f32'
cuda-major-version: ""
@@ -780,7 +780,7 @@ jobs:
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
skip-drivers: 'false'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'sycl_f16'
cuda-major-version: ""
@@ -792,7 +792,7 @@ jobs:
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
skip-drivers: 'false'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'vulkan'
cuda-major-version: ""
@@ -804,7 +804,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'cublas'
cuda-major-version: "12"
@@ -816,7 +816,7 @@ jobs:
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: 'hipblas'
cuda-major-version: ""
@@ -828,7 +828,7 @@ jobs:
runs-on: 'ubuntu-latest'
skip-drivers: 'false'
backend: "whisper"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
#silero-vad
- build-type: ''
@@ -841,7 +841,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "silero-vad"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
# local-store
- build-type: ''
@@ -854,7 +854,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "local-store"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
# huggingface
- build-type: ''
@@ -867,7 +867,7 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
backend: "huggingface"
dockerfile: "./backend/Dockerfile.go"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
llama-cpp-darwin:
runs-on: macOS-14


@@ -21,7 +21,7 @@ jobs:
variable: "BARKCPP_VERSION"
branch: "main"
file: "Makefile"
- repository: "richiejp/stable-diffusion.cpp"
- repository: "leejet/stable-diffusion.cpp"
variable: "STABLEDIFFUSION_GGML_VERSION"
branch: "master"
file: "backend/go/stablediffusion-ggml/Makefile"

.gitignore

@@ -12,6 +12,7 @@ prepare-sources
/backends
/backend-images
/result.yaml
protoc
*.log

Makefile

@@ -145,7 +145,7 @@ backends/stablediffusion-ggml: docker-build-stablediffusion-ggml docker-save-sta
backends/whisper: docker-build-whisper docker-save-whisper build
./local-ai backends install "ocifile://$(abspath ./backend-images/whisper.tar)"
backends/silero-vad: docker-build-silero-vad docker-save-silero-vad build
./local-ai backends install "ocifile://$(abspath ./backend-images/silero-vad.tar)"
@@ -242,10 +242,7 @@ help: ## Show this help.
########################################################
.PHONY: protogen
protogen: protogen-go protogen-python
.PHONY: protogen-clean
protogen-clean: protogen-go-clean protogen-python-clean
protogen: protogen-go
protoc:
@OS_NAME=$$(uname -s | tr '[:upper:]' '[:lower:]'); \
@@ -290,93 +287,6 @@ protogen-go-clean:
$(RM) pkg/grpc/proto/backend.pb.go pkg/grpc/proto/backend_grpc.pb.go
$(RM) bin/*
.PHONY: protogen-python
protogen-python: bark-protogen coqui-protogen chatterbox-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen faster-whisper-protogen
.PHONY: protogen-python-clean
protogen-python-clean: bark-protogen-clean coqui-protogen-clean chatterbox-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean faster-whisper-protogen-clean
.PHONY: bark-protogen
bark-protogen:
$(MAKE) -C backend/python/bark protogen
.PHONY: bark-protogen-clean
bark-protogen-clean:
$(MAKE) -C backend/python/bark protogen-clean
.PHONY: coqui-protogen
coqui-protogen:
$(MAKE) -C backend/python/coqui protogen
.PHONY: coqui-protogen-clean
coqui-protogen-clean:
$(MAKE) -C backend/python/coqui protogen-clean
.PHONY: diffusers-protogen
diffusers-protogen:
$(MAKE) -C backend/python/diffusers protogen
.PHONY: chatterbox-protogen
chatterbox-protogen:
$(MAKE) -C backend/python/chatterbox protogen
.PHONY: diffusers-protogen-clean
diffusers-protogen-clean:
$(MAKE) -C backend/python/diffusers protogen-clean
.PHONY: chatterbox-protogen-clean
chatterbox-protogen-clean:
$(MAKE) -C backend/python/chatterbox protogen-clean
.PHONY: faster-whisper-protogen
faster-whisper-protogen:
$(MAKE) -C backend/python/faster-whisper protogen
.PHONY: faster-whisper-protogen-clean
faster-whisper-protogen-clean:
$(MAKE) -C backend/python/faster-whisper protogen-clean
.PHONY: exllama2-protogen
exllama2-protogen:
$(MAKE) -C backend/python/exllama2 protogen
.PHONY: exllama2-protogen-clean
exllama2-protogen-clean:
$(MAKE) -C backend/python/exllama2 protogen-clean
.PHONY: rerankers-protogen
rerankers-protogen:
$(MAKE) -C backend/python/rerankers protogen
.PHONY: rerankers-protogen-clean
rerankers-protogen-clean:
$(MAKE) -C backend/python/rerankers protogen-clean
.PHONY: transformers-protogen
transformers-protogen:
$(MAKE) -C backend/python/transformers protogen
.PHONY: transformers-protogen-clean
transformers-protogen-clean:
$(MAKE) -C backend/python/transformers protogen-clean
.PHONY: kokoro-protogen
kokoro-protogen:
$(MAKE) -C backend/python/kokoro protogen
.PHONY: kokoro-protogen-clean
kokoro-protogen-clean:
$(MAKE) -C backend/python/kokoro protogen-clean
.PHONY: vllm-protogen
vllm-protogen:
$(MAKE) -C backend/python/vllm protogen
.PHONY: vllm-protogen-clean
vllm-protogen-clean:
$(MAKE) -C backend/python/vllm protogen-clean
prepare-test-extra: protogen-python
$(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/diffusers
@@ -449,19 +359,19 @@ backend-images:
mkdir -p backend-images
docker-build-llama-cpp:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg IMAGE_BASE=$(IMAGE_BASE) -t local-ai-backend:llama-cpp -f backend/Dockerfile.llama-cpp .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:llama-cpp -f backend/Dockerfile.llama-cpp .
docker-build-bark-cpp:
docker build -t local-ai-backend:bark-cpp -f backend/Dockerfile.go --build-arg BACKEND=bark-cpp .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:bark-cpp -f backend/Dockerfile.golang --build-arg BACKEND=bark-cpp .
docker-build-piper:
docker build -t local-ai-backend:piper -f backend/Dockerfile.go --build-arg BACKEND=piper .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:piper -f backend/Dockerfile.golang --build-arg BACKEND=piper .
docker-build-local-store:
docker build -t local-ai-backend:local-store -f backend/Dockerfile.go --build-arg BACKEND=local-store .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:local-store -f backend/Dockerfile.golang --build-arg BACKEND=local-store .
docker-build-huggingface:
docker build -t local-ai-backend:huggingface -f backend/Dockerfile.go --build-arg BACKEND=huggingface .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:huggingface -f backend/Dockerfile.golang --build-arg BACKEND=huggingface .
docker-save-huggingface: backend-images
docker save local-ai-backend:huggingface -o backend-images/huggingface.tar
@@ -470,7 +380,7 @@ docker-save-local-store: backend-images
docker save local-ai-backend:local-store -o backend-images/local-store.tar
docker-build-silero-vad:
docker build -t local-ai-backend:silero-vad -f backend/Dockerfile.go --build-arg BACKEND=silero-vad .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:silero-vad -f backend/Dockerfile.golang --build-arg BACKEND=silero-vad .
docker-save-silero-vad: backend-images
docker save local-ai-backend:silero-vad -o backend-images/silero-vad.tar
@@ -485,46 +395,46 @@ docker-save-bark-cpp: backend-images
docker save local-ai-backend:bark-cpp -o backend-images/bark-cpp.tar
docker-build-stablediffusion-ggml:
docker build -t local-ai-backend:stablediffusion-ggml -f backend/Dockerfile.go --build-arg BACKEND=stablediffusion-ggml .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:stablediffusion-ggml -f backend/Dockerfile.golang --build-arg BACKEND=stablediffusion-ggml .
docker-save-stablediffusion-ggml: backend-images
docker save local-ai-backend:stablediffusion-ggml -o backend-images/stablediffusion-ggml.tar
docker-build-rerankers:
docker build -t local-ai-backend:rerankers -f backend/Dockerfile.python --build-arg BACKEND=rerankers .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:rerankers -f backend/Dockerfile.python --build-arg BACKEND=rerankers .
docker-build-vllm:
docker build -t local-ai-backend:vllm -f backend/Dockerfile.python --build-arg BACKEND=vllm .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:vllm -f backend/Dockerfile.python --build-arg BACKEND=vllm .
docker-build-transformers:
docker build -t local-ai-backend:transformers -f backend/Dockerfile.python --build-arg BACKEND=transformers .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:transformers -f backend/Dockerfile.python --build-arg BACKEND=transformers .
docker-build-diffusers:
docker build -t local-ai-backend:diffusers -f backend/Dockerfile.python --build-arg BACKEND=diffusers .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:diffusers -f backend/Dockerfile.python --build-arg BACKEND=diffusers .
docker-build-kokoro:
docker build -t local-ai-backend:kokoro -f backend/Dockerfile.python --build-arg BACKEND=kokoro .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:kokoro -f backend/Dockerfile.python --build-arg BACKEND=kokoro .
docker-build-whisper:
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:whisper -f backend/Dockerfile.go --build-arg BACKEND=whisper .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:whisper -f backend/Dockerfile.golang --build-arg BACKEND=whisper .
docker-save-whisper: backend-images
docker save local-ai-backend:whisper -o backend-images/whisper.tar
docker-build-faster-whisper:
docker build -t local-ai-backend:faster-whisper -f backend/Dockerfile.python --build-arg BACKEND=faster-whisper .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:faster-whisper -f backend/Dockerfile.python --build-arg BACKEND=faster-whisper .
docker-build-coqui:
docker build -t local-ai-backend:coqui -f backend/Dockerfile.python --build-arg BACKEND=coqui .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:coqui -f backend/Dockerfile.python --build-arg BACKEND=coqui .
docker-build-bark:
docker build -t local-ai-backend:bark -f backend/Dockerfile.python --build-arg BACKEND=bark .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:bark -f backend/Dockerfile.python --build-arg BACKEND=bark .
docker-build-chatterbox:
docker build -t local-ai-backend:chatterbox -f backend/Dockerfile.python --build-arg BACKEND=chatterbox .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:chatterbox -f backend/Dockerfile.python --build-arg BACKEND=chatterbox .
docker-build-exllama2:
docker build -t local-ai-backend:exllama2 -f backend/Dockerfile.python --build-arg BACKEND=exllama2 .
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:exllama2 -f backend/Dockerfile.python --build-arg BACKEND=exllama2 .
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-transformers docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-bark docker-build-chatterbox docker-build-exllama2


@@ -193,6 +193,7 @@ For more information, see [💻 Getting started](https://localai.io/basics/getti
## 📰 Latest project news
- July 2025: All backends migrated outside of the main binary. LocalAI is now more lightweight, small, and automatically downloads the required backend to run the model. [Read the release notes](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
- June 2025: [Backend management](https://github.com/mudler/LocalAI/pull/5607) has been added. Attention: extras images are going to be deprecated from the next release! Read [the backend management PR](https://github.com/mudler/LocalAI/pull/5607).
- May 2025: [Audio input](https://github.com/mudler/LocalAI/pull/5466) and [Reranking](https://github.com/mudler/LocalAI/pull/5396) in llama.cpp backend, [Realtime API](https://github.com/mudler/LocalAI/pull/5392), Support to Gemma, SmollVLM, and more multimodal models (available in the gallery).
- May 2025: Important: image name changes [See release](https://github.com/mudler/LocalAI/releases/tag/v2.29.0)


@@ -11,7 +11,6 @@ ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG GRPC_VERSION=v1.65.0
ARG CMAKE_FROM_SOURCE=false
ARG CMAKE_VERSION=3.26.4
ARG PROTOBUF_VERSION=v21.12
ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
@@ -50,14 +49,6 @@ RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shall
make install && \
rm -rf /build
RUN git clone --recurse-submodules --branch ${PROTOBUF_VERSION} https://github.com/protocolbuffers/protobuf.git && \
mkdir -p /build/protobuf/build && \
cd /build/protobuf/build && \
cmake -Dprotobuf_BUILD_SHARED_LIBS=ON -Dprotobuf_BUILD_TESTS=OFF .. && \
make && \
make install && \
rm -rf /build
FROM ${BASE_IMAGE} AS builder
ARG BACKEND=rerankers
ARG BUILD_TYPE
@@ -189,9 +180,21 @@ COPY --from=grpc /opt/grpc /usr/local
COPY . /LocalAI
RUN make -C /LocalAI/backend/cpp/llama-cpp llama-cpp
RUN make -C /LocalAI/backend/cpp/llama-cpp llama-cpp-grpc
RUN make -C /LocalAI/backend/cpp/llama-cpp llama-cpp-rpc-server
## Otherwise just run the normal build
RUN <<EOT bash
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
cd /LocalAI/backend/cpp/llama-cpp && make llama-cpp-fallback && \
make llama-cpp-grpc && make llama-cpp-rpc-server; \
else \
cd /LocalAI/backend/cpp/llama-cpp && make llama-cpp-avx && \
make llama-cpp-avx2 && \
make llama-cpp-avx512 && \
make llama-cpp-fallback && \
make llama-cpp-grpc && \
make llama-cpp-rpc-server; \
fi
EOT
# Copy libraries using a script to handle architecture differences
RUN make -C /LocalAI/backend/cpp/llama-cpp package


@@ -17,8 +17,6 @@ if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
include_directories("${HOMEBREW_DEFAULT_PREFIX}/include")
endif()
set(Protobuf_USE_STATIC_LIBS OFF)
set(gRPC_USE_STATIC_LIBS OFF)
find_package(absl CONFIG REQUIRED)
find_package(Protobuf CONFIG REQUIRED)
find_package(gRPC CONFIG REQUIRED)


@@ -1,5 +1,5 @@
LLAMA_VERSION?=acd6cb1c41676f6bbb25c2a76fa5abeb1719301e
LLAMA_VERSION?=3f4fc97f1d745f1d5d3c853949503136d419e6de
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=
@@ -7,9 +7,10 @@ BUILD_TYPE?=
NATIVE?=false
ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
TARGET?=--target grpc-server
JOBS?=$(shell nproc)
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=ON -DLLAMA_CURL=OFF -DGGML_CPU_ALL_VARIANTS=ON -DGGML_BACKEND_DL=ON
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
ifeq ($(NATIVE),false)
@@ -89,12 +90,33 @@ else
LLAMA_VERSION=$(LLAMA_VERSION) $(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../$(VARIANT) grpc-server
endif
llama-cpp: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-build purge
$(info ${GREEN}I llama-cpp build info:${RESET})
CMAKE_ARGS="$(CMAKE_ARGS)" $(MAKE) VARIANT="llama-cpp-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-build/grpc-server llama-cpp
llama-cpp-avx2: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx2-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx2-build purge
$(info ${GREEN}I llama-cpp build info:avx2${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) VARIANT="llama-cpp-avx2-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx2-build/grpc-server llama-cpp-avx2
llama-cpp-avx512: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx512-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx512-build purge
$(info ${GREEN}I llama-cpp build info:avx512${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) VARIANT="llama-cpp-avx512-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx512-build/grpc-server llama-cpp-avx512
llama-cpp-avx: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx-build purge
$(info ${GREEN}I llama-cpp build info:avx${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx-build/grpc-server llama-cpp-avx
llama-cpp-fallback: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build purge
$(info ${GREEN}I llama-cpp build info:fallback${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build/grpc-server llama-cpp-fallback
llama-cpp-grpc: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build
@@ -139,8 +161,8 @@ grpc-server: llama.cpp llama.cpp/tools/grpc-server
@echo "Building grpc-server with $(BUILD_TYPE) build type and $(CMAKE_ARGS)"
ifneq (,$(findstring sycl,$(BUILD_TYPE)))
+bash -c "source $(ONEAPI_VARS); \
cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release $(TARGET)"
cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release -j $(JOBS) $(TARGET)"
else
+cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release $(TARGET)
+cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release -j $(JOBS) $(TARGET)
endif
cp llama.cpp/build/bin/grpc-server .


@@ -6,9 +6,34 @@ CURDIR=$(dirname "$(realpath $0)")
cd /
BINARY=llama-cpp
echo "CPU info:"
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
BINARY=llama-cpp-fallback
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/llama-cpp-avx ]; then
BINARY=llama-cpp-avx
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/llama-cpp-avx2 ]; then
BINARY=llama-cpp-avx2
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/llama-cpp-avx512 ]; then
BINARY=llama-cpp-avx512
fi
fi
## P2P/GRPC mode
if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
if [ -e $CURDIR/llama-cpp-grpc ]; then
BINARY=llama-cpp-grpc
@@ -31,3 +56,6 @@ fi
echo "Using binary: $BINARY"
exec $CURDIR/$BINARY "$@"
# In case we fail execing, just run fallback
exec $CURDIR/llama-cpp-fallback "$@"


@@ -18,8 +18,8 @@ GO_TAGS?=
LD_FLAGS?=
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/richiejp/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=53e3b17eb3d0b5760ced06a1f98320b68b34aaae
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=eed97a5e1d054f9c1e7ac01982ae480411d4157e
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -91,23 +91,18 @@ endif
# (ggml can have different backends cpu, cuda, etc., each backend generates a .a archive)
GGML_ARCHIVE_DIR := build/ggml/src/
ALL_ARCHIVES := $(shell find $(GGML_ARCHIVE_DIR) -type f -name '*.a')
ALL_OBJS := $(shell find $(GGML_ARCHIVE_DIR) -type f -name '*.o')
# Name of the single merged library
COMBINED_LIB := libggmlall.a
# Rule to merge all the .a files into one
# Instead of using the archives generated by GGML, use the object files directly to avoid overwriting objects with the same base name
$(COMBINED_LIB): $(ALL_ARCHIVES)
@echo "Merging all .a into $(COMBINED_LIB)"
@echo "Merging all .o into $(COMBINED_LIB): $(ALL_OBJS)"
rm -f $@
mkdir -p merge-tmp
for a in $(ALL_ARCHIVES); do \
( cd merge-tmp && ar x ../$$a ); \
done
( cd merge-tmp && ar rcs ../$@ *.o )
ar -qc $@ $(ALL_OBJS)
# Ensure we have a proper index
ranlib $@
# Clean up
rm -rf merge-tmp
build/libstable-diffusion.a:
@echo "Building SD with $(BUILD_TYPE) build type and $(CMAKE_ARGS)"


@@ -53,9 +53,43 @@ sd_ctx_t* sd_c;
sample_method_t sample_method;
// Copied from the upstream CLI
void sd_log_cb(enum sd_log_level_t level, const char* log, void* data) {
//SDParams* params = (SDParams*)data;
const char* level_str;
if (!log /*|| (!params->verbose && level <= SD_LOG_DEBUG)*/) {
return;
}
switch (level) {
case SD_LOG_DEBUG:
level_str = "DEBUG";
break;
case SD_LOG_INFO:
level_str = "INFO";
break;
case SD_LOG_WARN:
level_str = "WARN";
break;
case SD_LOG_ERROR:
level_str = "ERROR";
break;
default: /* Potential future-proofing */
level_str = "?????";
break;
}
fprintf(stderr, "[%-5s] ", level_str);
fputs(log, stderr);
fflush(stderr);
}
int load_model(char *model, char* options[], int threads, int diff) {
fprintf (stderr, "Loading model!\n");
sd_set_log_callback(sd_log_cb, NULL);
char *stableDiffusionModel = "";
if (diff == 1 ) {
stableDiffusionModel = model;
@@ -70,6 +104,8 @@ int load_model(char *model, char* options[], int threads, int diff) {
char *scheduler = "";
char *sampler = "";
fprintf(stderr, "parsing options\n");
// If options is not NULL, parse options
for (int i = 0; options[i] != NULL; i++) {
char *optname = strtok(options[i], ":");
@@ -98,10 +134,13 @@ int load_model(char *model, char* options[], int threads, int diff) {
}
}
fprintf(stderr, "parsed options\n");
int sample_method_found = -1;
for (int m = 0; m < N_SAMPLE_METHODS; m++) {
for (int m = 0; m < SAMPLE_METHOD_COUNT; m++) {
if (!strcmp(sampler, sample_method_str[m])) {
sample_method_found = m;
fprintf(stderr, "Found sampler: %s\n", sampler);
}
}
if (sample_method_found == -1) {
@@ -111,7 +150,7 @@ int load_model(char *model, char* options[], int threads, int diff) {
sample_method = (sample_method_t)sample_method_found;
int schedule_found = -1;
for (int d = 0; d < N_SCHEDULES; d++) {
for (int d = 0; d < SCHEDULE_COUNT; d++) {
if (!strcmp(scheduler, schedule_str[d])) {
schedule_found = d;
fprintf (stderr, "Found scheduler: %s\n", scheduler);
@@ -125,30 +164,28 @@ int load_model(char *model, char* options[], int threads, int diff) {
}
schedule_t schedule = (schedule_t)schedule_found;
fprintf (stderr, "Creating context\n");
sd_ctx_t* sd_ctx = new_sd_ctx(model,
clip_l_path,
clip_g_path,
t5xxl_path,
stableDiffusionModel,
vae_path,
"",
"",
"",
"",
"",
false,
false,
false,
threads,
SD_TYPE_COUNT,
STD_DEFAULT_RNG,
schedule,
false,
false,
false,
false);
sd_ctx_params_t ctx_params;
sd_ctx_params_init(&ctx_params);
ctx_params.model_path = model;
ctx_params.clip_l_path = clip_l_path;
ctx_params.clip_g_path = clip_g_path;
ctx_params.t5xxl_path = t5xxl_path;
ctx_params.diffusion_model_path = stableDiffusionModel;
ctx_params.vae_path = vae_path;
ctx_params.taesd_path = "";
ctx_params.control_net_path = "";
ctx_params.lora_model_dir = "";
ctx_params.embedding_dir = "";
ctx_params.stacked_id_embed_dir = "";
ctx_params.vae_decode_only = false;
ctx_params.vae_tiling = false;
ctx_params.free_params_immediately = false;
ctx_params.n_threads = threads;
ctx_params.rng_type = STD_DEFAULT_RNG;
ctx_params.schedule = schedule;
sd_ctx_t* sd_ctx = new_sd_ctx(&ctx_params);
if (sd_ctx == NULL) {
fprintf (stderr, "failed loading model (generic error)\n");
@@ -169,29 +206,22 @@ int gen_image(char *text, char *negativeText, int width, int height, int steps,
fprintf (stderr, "Generating image\n");
results = txt2img(sd_c,
text,
negativeText,
-1, //clip_skip
cfg_scale, // sfg_scale
3.5f,
0, // eta
width,
height,
sample_method,
steps,
seed,
1,
NULL,
0.9f,
20.f,
false,
"",
skip_layers.data(),
skip_layers.size(),
0,
0.01,
0.2);
sd_img_gen_params_t p;
sd_img_gen_params_init(&p);
p.prompt = text;
p.negative_prompt = negativeText;
p.guidance.txt_cfg = cfg_scale;
p.guidance.slg.layers = skip_layers.data();
p.guidance.slg.layer_count = skip_layers.size();
p.width = width;
p.height = height;
p.sample_method = sample_method;
p.sample_steps = steps;
p.seed = seed;
p.input_id_images_path = "";
results = generate_image(sd_c, &p);
if (results == NULL) {
fprintf (stderr, "NO results\n");


@@ -37,8 +37,8 @@ func (sd *SDGGML) Load(opts *pb.ModelOptions) error {
size := C.size_t(unsafe.Sizeof((*C.char)(nil)))
length := C.size_t(len(opts.Options))
options = (**C.char)(C.malloc(length * size))
view := (*[1 << 30]*C.char)(unsafe.Pointer(options))[0:len(opts.Options):len(opts.Options)]
options = (**C.char)(C.malloc((length + 1) * size))
view := (*[1 << 30]*C.char)(unsafe.Pointer(options))[0:len(opts.Options) + 1:len(opts.Options) + 1]
var diffusionModel int
@@ -66,6 +66,7 @@ func (sd *SDGGML) Load(opts *pb.ModelOptions) error {
for i, x := range oo {
view[i] = C.CString(x)
}
view[len(oo)] = nil
sd.cfgScale = opts.CFGScale


@@ -6,7 +6,7 @@ CMAKE_ARGS?=
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=1f5cf0b2888402d57bb17b2029b2caa97e5f3baf
WHISPER_CPP_VERSION?=7de8dd783f7b2eab56bff6bbc5d3369e34f0e77f
export WHISPER_CMAKE_ARGS?=-DBUILD_SHARED_LIBS=OFF
export WHISPER_DIR=$(abspath ./sources/whisper.cpp)


@@ -2,8 +2,8 @@ package application
import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/templates"
)
type Application struct {


@@ -10,8 +10,8 @@ import (
"github.com/mudler/LocalAI/core/services"
"github.com/mudler/LocalAI/internal"
coreStartup "github.com/mudler/LocalAI/core/startup"
"github.com/mudler/LocalAI/pkg/model"
pkgStartup "github.com/mudler/LocalAI/pkg/startup"
"github.com/mudler/LocalAI/pkg/xsysinfo"
"github.com/rs/zerolog/log"
)
@@ -55,11 +55,11 @@ func New(opts ...config.AppOption) (*Application, error) {
}
}
if err := pkgStartup.InstallModels(options.Galleries, options.BackendGalleries, options.ModelPath, options.BackendsPath, options.EnforcePredownloadScans, options.AutoloadBackendGalleries, nil, options.ModelsURL...); err != nil {
if err := coreStartup.InstallModels(options.Galleries, options.BackendGalleries, options.ModelPath, options.BackendsPath, options.EnforcePredownloadScans, options.AutoloadBackendGalleries, nil, options.ModelsURL...); err != nil {
log.Error().Err(err).Msg("error installing models")
}
if err := pkgStartup.InstallExternalBackends(options.BackendGalleries, options.BackendsPath, nil, options.ExternalBackends...); err != nil {
if err := coreStartup.InstallExternalBackends(options.BackendGalleries, options.BackendsPath, nil, options.ExternalBackends...); err != nil {
log.Error().Err(err).Msg("error installing external backends")
}


@@ -8,7 +8,7 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/pkg/startup"
"github.com/mudler/LocalAI/core/startup"
"github.com/rs/zerolog/log"
"github.com/schollz/progressbar/v3"
)


@@ -9,8 +9,8 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/startup"
"github.com/mudler/LocalAI/pkg/downloader"
"github.com/mudler/LocalAI/pkg/startup"
"github.com/rs/zerolog/log"
"github.com/schollz/progressbar/v3"
)


@@ -72,7 +72,7 @@ func (u *CreateOCIImageCMD) Run(ctx *cliContext.Context) error {
}
func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
if u.Args == nil || len(u.Args) == 0 {
if len(u.Args) == 0 {
return fmt.Errorf("no GGUF file provided")
}
// We try to guess only if we don't have a template defined already


@@ -2,7 +2,7 @@ package gallery
import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/system"
)
// BackendMetadata represents the metadata stored in a JSON file for each installed backend


@@ -8,9 +8,9 @@ import (
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/downloader"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/system"
"github.com/rs/zerolog/log"
)


@@ -7,7 +7,7 @@ import (
"runtime"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/system"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"gopkg.in/yaml.v2"


@@ -10,8 +10,8 @@ import (
"dario.cat/mergo"
"github.com/mudler/LocalAI/core/config"
lconfig "github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/downloader"
"github.com/mudler/LocalAI/pkg/system"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/rs/zerolog/log"


@@ -15,9 +15,10 @@ import (
)
type ModelGalleryEndpointService struct {
galleries []config.Gallery
modelPath string
galleryApplier *services.GalleryService
galleries []config.Gallery
backendGalleries []config.Gallery
modelPath string
galleryApplier *services.GalleryService
}
type GalleryModel struct {
@@ -25,11 +26,12 @@ type GalleryModel struct {
gallery.GalleryModel
}
func CreateModelGalleryEndpointService(galleries []config.Gallery, modelPath string, galleryApplier *services.GalleryService) ModelGalleryEndpointService {
func CreateModelGalleryEndpointService(galleries []config.Gallery, backendGalleries []config.Gallery, modelPath string, galleryApplier *services.GalleryService) ModelGalleryEndpointService {
return ModelGalleryEndpointService{
galleries: galleries,
modelPath: modelPath,
galleryApplier: galleryApplier,
galleries: galleries,
backendGalleries: backendGalleries,
modelPath: modelPath,
galleryApplier: galleryApplier,
}
}
@@ -79,6 +81,7 @@ func (mgs *ModelGalleryEndpointService) ApplyModelGalleryEndpoint() func(c *fibe
ID: uuid.String(),
GalleryElementName: input.ID,
Galleries: mgs.galleries,
BackendGalleries: mgs.backendGalleries,
}
return c.JSON(schema.GalleryResponse{ID: uuid.String(), StatusURL: fmt.Sprintf("%smodels/jobs/%s", utils.BaseURL(c), uuid.String())})


@@ -15,8 +15,8 @@ import (
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/templates"
"github.com/rs/zerolog/log"
"github.com/valyala/fasthttp"
@@ -175,7 +175,7 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
textContentToReturn = ""
id = uuid.New().String()
created = int(time.Now().Unix())
input, ok := c.Locals(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.OpenAIRequest)
if !ok || input.Model == "" {
return fiber.ErrBadRequest


@@ -15,9 +15,9 @@ import (
"github.com/gofiber/fiber/v2"
"github.com/google/uuid"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/templates"
"github.com/rs/zerolog/log"
"github.com/valyala/fasthttp"
)


@@ -12,8 +12,8 @@ import (
"github.com/google/uuid"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/templates"
"github.com/rs/zerolog/log"
)


@@ -16,12 +16,12 @@ import (
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/endpoints/openai/types"
"github.com/mudler/LocalAI/core/templates"
laudio "github.com/mudler/LocalAI/pkg/audio"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/pkg/grpc/proto"
model "github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/sound"
"github.com/mudler/LocalAI/pkg/templates"
"google.golang.org/grpc"
@@ -29,8 +29,8 @@ import (
)
const (
localSampleRate = 16000
remoteSampleRate = 24000
localSampleRate = 16000
remoteSampleRate = 24000
)
// A model can be "emulated" that is: transcribe audio to text -> feed text to the LLM -> generate audio as result
@@ -210,9 +210,9 @@ func registerRealtime(application *application.Application) func(c *websocket.Co
// TODO: Need some way to pass this to the backend
Threshold: 0.5,
// TODO: This is ignored and the amount of padding is random at present
PrefixPaddingMs: 30,
PrefixPaddingMs: 30,
SilenceDurationMs: 500,
CreateResponse: func() *bool { t := true; return &t }(),
CreateResponse: func() *bool { t := true; return &t }(),
},
},
InputAudioTranscription: &types.InputAudioTranscription{
@@ -233,7 +233,7 @@ func registerRealtime(application *application.Application) func(c *websocket.Co
// TODO: The API has no way to configure the VAD model or other models that make up a pipeline to fake any-to-any
// So possibly we could have a way to configure a composite model that can be used in situations where any-to-any is expected
pipeline := config.Pipeline{
VAD: "silero-vad",
VAD: "silero-vad",
Transcription: session.InputAudioTranscription.Model,
}
@@ -567,8 +567,8 @@ func updateTransSession(session *Session, update *types.ClientSession, cl *confi
trCur := session.InputAudioTranscription
if trUpd != nil && trUpd.Model != "" && trUpd.Model != trCur.Model {
pipeline := config.Pipeline {
VAD: "silero-vad",
pipeline := config.Pipeline{
VAD: "silero-vad",
Transcription: trUpd.Model,
}
@@ -684,7 +684,7 @@ func handleVAD(cfg *config.BackendConfig, evaluator *templates.Evaluator, sessio
sendEvent(c, types.InputAudioBufferClearedEvent{
ServerEventBase: types.ServerEventBase{
EventID: "event_TODO",
Type: types.ServerEventTypeInputAudioBufferCleared,
Type: types.ServerEventTypeInputAudioBufferCleared,
},
})
@@ -697,7 +697,7 @@ func handleVAD(cfg *config.BackendConfig, evaluator *templates.Evaluator, sessio
sendEvent(c, types.InputAudioBufferSpeechStartedEvent{
ServerEventBase: types.ServerEventBase{
EventID: "event_TODO",
Type: types.ServerEventTypeInputAudioBufferSpeechStarted,
Type: types.ServerEventTypeInputAudioBufferSpeechStarted,
},
AudioStartMs: time.Now().Sub(startTime).Milliseconds(),
})
@@ -719,7 +719,7 @@ func handleVAD(cfg *config.BackendConfig, evaluator *templates.Evaluator, sessio
sendEvent(c, types.InputAudioBufferSpeechStoppedEvent{
ServerEventBase: types.ServerEventBase{
EventID: "event_TODO",
Type: types.ServerEventTypeInputAudioBufferSpeechStopped,
Type: types.ServerEventTypeInputAudioBufferSpeechStopped,
},
AudioEndMs: time.Now().Sub(startTime).Milliseconds(),
})
@@ -728,9 +728,9 @@ func handleVAD(cfg *config.BackendConfig, evaluator *templates.Evaluator, sessio
sendEvent(c, types.InputAudioBufferCommittedEvent{
ServerEventBase: types.ServerEventBase{
EventID: "event_TODO",
Type: types.ServerEventTypeInputAudioBufferCommitted,
Type: types.ServerEventTypeInputAudioBufferCommitted,
},
ItemID: generateItemID(),
ItemID: generateItemID(),
PreviousItemID: "TODO",
})
@@ -833,9 +833,9 @@ func commitUtterance(ctx context.Context, utt []byte, cfg *config.BackendConfig,
func runVAD(ctx context.Context, session *Session, adata []int16) ([]*proto.VADSegment, error) {
soundIntBuffer := &audio.IntBuffer{
Format: &audio.Format{SampleRate: localSampleRate, NumChannels: 1},
Format: &audio.Format{SampleRate: localSampleRate, NumChannels: 1},
SourceBitDepth: 16,
Data: sound.ConvertInt16ToInt(adata),
Data: sound.ConvertInt16ToInt(adata),
}
float32Data := soundIntBuffer.AsFloat32Buffer().Data


@@ -11,9 +11,9 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/templates"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/gofiber/fiber/v2"


@@ -23,7 +23,7 @@ func RegisterLocalAIRoutes(router *fiber.App,
// LocalAI API endpoints
if !appConfig.DisableGalleryEndpoint {
modelGalleryEndpointService := localai.CreateModelGalleryEndpointService(appConfig.Galleries, appConfig.ModelPath, galleryService)
modelGalleryEndpointService := localai.CreateModelGalleryEndpointService(appConfig.Galleries, appConfig.BackendGalleries, appConfig.ModelPath, galleryService)
router.Post("/models/apply", modelGalleryEndpointService.ApplyModelGalleryEndpoint())
router.Post("/models/delete/:name", modelGalleryEndpointService.DeleteModelGalleryEndpoint())


@@ -180,6 +180,7 @@ func registerGalleryRoutes(app *fiber.App, cl *config.BackendConfigLoader, appCo
ID: uid,
GalleryElementName: galleryID,
Galleries: appConfig.Galleries,
BackendGalleries: appConfig.BackendGalleries,
}
go func() {
galleryService.ModelGalleryChannel <- op
@@ -219,6 +220,7 @@ func registerGalleryRoutes(app *fiber.App, cl *config.BackendConfigLoader, appCo
Delete: true,
GalleryElementName: galleryName,
Galleries: appConfig.Galleries,
BackendGalleries: appConfig.BackendGalleries,
}
go func() {
galleryService.ModelGalleryChannel <- op


@@ -278,6 +278,7 @@ func ensureService(ctx context.Context, n *node.Node, nd *NodeData, sserv string
port, err := freeport.GetFreePort()
if err != nil {
zlog.Error().Err(err).Msgf("Could not allocate a free port for %s", nd.ID)
cancel()
return
}


@@ -2,7 +2,7 @@ package services
import (
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/system"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/rs/zerolog/log"


@@ -7,8 +7,8 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/system"
"github.com/rs/zerolog/log"
)


@@ -7,7 +7,7 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/system"
"github.com/mudler/LocalAI/pkg/utils"
"gopkg.in/yaml.v2"
)


@@ -8,8 +8,8 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/downloader"
"github.com/mudler/LocalAI/pkg/system"
"github.com/rs/zerolog/log"
)


@@ -10,8 +10,8 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/system"
"github.com/mudler/LocalAI/pkg/downloader"
"github.com/mudler/LocalAI/pkg/system"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/rs/zerolog/log"
"gopkg.in/yaml.v2"


@@ -6,7 +6,7 @@ import (
"path/filepath"
"github.com/mudler/LocalAI/core/config"
. "github.com/mudler/LocalAI/pkg/startup"
. "github.com/mudler/LocalAI/core/startup"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"


@@ -3,8 +3,8 @@ package templates_test
import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
. "github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/functions"
. "github.com/mudler/LocalAI/pkg/templates"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"


@@ -1,7 +1,7 @@
package templates_test
import (
. "github.com/mudler/LocalAI/pkg/templates" // Update with your module path
. "github.com/mudler/LocalAI/core/templates" // Update with your module path
// Update with your module path
. "github.com/onsi/ginkgo/v2"


@@ -96,8 +96,8 @@ Your backend container should:
For getting started, see the available backends in LocalAI here: https://github.com/mudler/LocalAI/tree/master/backend .
- For Python based backends there is a template that can be used as starting point: https://github.com/mudler/LocalAI/tree/master/backend/python/common/template .
- For Golang based backends, you can see the `bark-cpp` backend as an example: https://github.com/mudler/LocalAI/tree/master/backend/go/bark
- For C++ based backends, you can see the `llama-cpp` backend as an example: https://github.com/mudler/LocalAI/tree/master/backend/cpp/llama
- For Golang based backends, you can see the `bark-cpp` backend as an example: https://github.com/mudler/LocalAI/tree/master/backend/go/bark-cpp
- For C++ based backends, you can see the `llama-cpp` backend as an example: https://github.com/mudler/LocalAI/tree/master/backend/cpp/llama-cpp
### Publishing Your Backend


@@ -9,13 +9,11 @@ ico = "rocket_launch"
### Build
LocalAI can be built as a container image or as a single, portable binary. Note that some model architectures might require Python libraries, which are not included in the binary. The binary contains only the core backends written in Go and C++.
LocalAI can be built as a container image or as a single, portable binary. Note that some model architectures might require Python libraries, which are not included in the binary.
LocalAI's extensible architecture allows you to add your own backends, which can be written in any language, and as such the container images contains also the Python dependencies to run all the available backends (for example, in order to run backends like __Diffusers__ that allows to generate images and videos from text).
In some cases you might want to re-build LocalAI from source (for instance to leverage Apple Silicon acceleration), or to build a custom container image with your own backends. This section contains instructions on how to build LocalAI from source.
This section contains instructions on how to build LocalAI from source.
#### Build LocalAI locally
@@ -24,7 +22,6 @@ In some cases you might want to re-build LocalAI from source (for instance to le
In order to build LocalAI locally, you need the following requirements:
- Golang >= 1.21
- Cmake/make
- GCC
- GRPC
@@ -36,20 +33,14 @@ To install the dependencies follow the instructions below:
Install `xcode` from the App Store
```bash
brew install abseil cmake go grpc protobuf protoc-gen-go protoc-gen-go-grpc python wget
```
After installing the above dependencies, you need to install grpcio-tools from PyPI. You could do this via a pip --user install or a virtualenv.
```bash
pip install --user grpcio-tools
brew install go protobuf protoc-gen-go protoc-gen-go-grpc wget
```
{{% /tab %}}
{{% tab tabName="Debian" %}}
```bash
apt install cmake golang libgrpc-dev make protobuf-compiler-grpc python3-grpc-tools
apt install golang make protobuf-compiler-grpc
```
After you have golang installed and working, you can install the required binaries for compiling the golang protobuf components via the following commands
@@ -63,10 +54,8 @@ go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f1
{{% /tab %}}
{{% tab tabName="From source" %}}
Specify `BUILD_GRPC_FOR_BACKEND_LLAMA=true` to build automatically the gRPC dependencies
```bash
make ... BUILD_GRPC_FOR_BACKEND_LLAMA=true build
make build
```
{{% /tab %}}
@@ -83,36 +72,6 @@ make build
This should produce the binary `local-ai`
Here is the list of the variables available that can be used to customize the build:
| Variable | Default | Description |
| ---------------------| ------- | ----------- |
| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`,`hipblas`, `sycl_f16`, `sycl_f32` |
| `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts` |
| `CLBLAST_DIR` | | Specify a CLBlast directory |
| `CUDA_LIBPATH` | | Specify a CUDA library path |
| `BUILD_API_ONLY` | false | Set to true to build only the API (no backends will be built) |
{{% alert note %}}
#### CPU flagset compatibility
LocalAI uses different backends based on ggml and llama.cpp to run models. If your CPU doesn't support common instruction sets, you can disable them during build:
```
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" make build
```
To have effect on the container image, you need to set `REBUILD=true`:
```
docker run quay.io/go-skynet/localai
docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
```
{{% /alert %}}
#### Container image
Requirements:
@@ -153,6 +112,9 @@ wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q2_K.gguf -O
# Use a template from the examples
cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/phi-2.Q2_K.tmpl
# Install the llama-cpp backend
./local-ai backends install llama-cpp
# Run LocalAI
./local-ai --models-path=./models/ --debug=true
@@ -186,131 +148,53 @@ sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
```
# reinstall build dependencies
brew reinstall abseil cmake go grpc protobuf wget
brew reinstall go grpc protobuf wget
make clean
make build
```
**Requirements**: OpenCV, Gomp
## Build backends
Image generation requires `GO_TAGS=stablediffusion` to be set during build:
LocalAI have several backends available for installation in the backend gallery. The backends can be also built by source. As backends might vary from language and dependencies that they require, the documentation will provide generic guidance for few of the backends, which can be applied with some slight modifications also to the others.
### Manually
Typically each backend include a Makefile which allow to package the backend.
In the LocalAI repository, for instance you can build `bark-cpp` by doing:
```
make GO_TAGS=stablediffusion build
git clone https://github.com/go-skynet/LocalAI.git
# Build the bark-cpp backend (requires cmake)
make -C LocalAI/backend/go/bark-cpp build package
# Build vllm backend (requires python)
make -C LocalAI/backend/python/vllm
```
### Build with Text to audio support
### With Docker
**Requirements**: piper-phonemize
Building with docker is simpler as abstracts away all the requirement, and focuses on building the final OCI images that are available in the gallery. This allows for instance also to build locally a backend and install it with LocalAI. You can refer to [Backends](https://localai.io/backends/) for general guidance on how to install and develop backends.
Text to audio support is experimental and requires `GO_TAGS=tts` to be set during build:
In the LocalAI repository, you can build `bark-cpp` by doing:
```
make GO_TAGS=tts build
git clone https://github.com/go-skynet/LocalAI.git
# Build the bark-cpp backend (requires docker)
make docker-build-bark-cpp
```
### Acceleration
#### OpenBLAS
Software acceleration.
Requirements: OpenBLAS
```
make BUILD_TYPE=openblas build
```
#### CuBLAS
Nvidia Acceleration.
Requirement: Nvidia CUDA toolkit
Note: CuBLAS support is experimental, and has not been tested on real HW. please report any issues you find!
```
make BUILD_TYPE=cublas build
```
More informations available in the upstream PR: https://github.com/ggerganov/llama.cpp/pull/1412
#### Hipblas (AMD GPU with ROCm on Arch Linux)
Packages:
```
pacman -S base-devel git rocm-hip-sdk rocm-opencl-sdk opencv clblast grpc
```
Library links:
```
export CGO_CFLAGS="-I/usr/include/opencv4"
export CGO_CXXFLAGS="-I/usr/include/opencv4"
export CGO_LDFLAGS="-L/opt/rocm/hip/lib -lamdhip64 -L/opt/rocm/lib -lOpenCL -L/usr/lib -lclblast -lrocblas -lhipblas -lrocrand -lomp -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link"
```
Build:
```
make BUILD_TYPE=hipblas GPU_TARGETS=gfx1030
```
#### ClBLAS
AMD/Intel GPU acceleration.
Requirement: OpenCL, CLBlast
```
make BUILD_TYPE=clblas build
```
To specify a clblast dir set: `CLBLAST_DIR`
#### Intel GPU acceleration
Intel GPU acceleration is supported via SYCL.
Requirements: [Intel oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) (see also [llama.cpp setup installations instructions](https://github.com/ggerganov/llama.cpp/blob/d71ac90985854b0905e1abba778e407e17f9f887/README-sycl.md?plain=1#L56))
```
make BUILD_TYPE=sycl_f16 build # for float16
make BUILD_TYPE=sycl_f32 build # for float32
```
#### Metal (Apple Silicon)
```
make build
# correct build type is automatically used on mac (BUILD_TYPE=metal)
# Set `gpu_layers: 256` (or equal to the number of model layers) to your YAML model config file and `f16: true`
```
### Windows compatibility
Make sure to give enough resources to the running container. See https://github.com/go-skynet/LocalAI/issues/2
### Examples
More advanced build options are available, for instance to build only a single backend.
#### Build only a single backend
You can control the backends that are built by setting the `GRPC_BACKENDS` environment variable. For instance, to build only the `llama-cpp` backend only:
Note that `make` is only by convenience, in reality it just runs a simple `docker` command as:
```bash
make GRPC_BACKENDS=backend-assets/grpc/llama-cpp build
docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) -t local-ai-backend:bark-cpp -f LocalAI/backend/Dockerfile.golang --build-arg BACKEND=bark-cpp .
```
By default, all the backends are built.
Note:
#### Specific llama.cpp version
To build with a specific version of llama.cpp, set `CPPLLAMA_VERSION` to the tag or wanted sha:
```
CPPLLAMA_VERSION=<sha> make build
```
- BUILD_TYPE can be either: `cublas`, `hipblas`, `sycl_f16`, `sycl_f32`, `metal`.
- BASE_IMAGE is tested on `ubuntu:22.04` (and defaults to it)


@@ -154,7 +154,7 @@ For instructions on using AIO images, see [Using container images]({{% relref "d
LocalAI is part of the Local family stack, along with LocalAGI and LocalRecall.
[LocalAGI](https://github.com/mudler/LocalAGI) is a powerful, self-hostable AI Agent platform designed for maximum privacy and flexibility which encompassess and uses all the softwre stack. It provides a complete drop-in replacement for OpenAI's Responses APIs with advanced agentic capabilities, working entirely locally on consumer-grade hardware (CPU and GPU).
[LocalAGI](https://github.com/mudler/LocalAGI) is a powerful, self-hostable AI Agent platform designed for maximum privacy and flexibility which encompassess and uses all the software stack. It provides a complete drop-in replacement for OpenAI's Responses APIs with advanced agentic capabilities, working entirely locally on consumer-grade hardware (CPU and GPU).
### Quick Start


@@ -757,7 +757,7 @@ install_binary_darwin() {
[ "$(uname -s)" = "Darwin" ] || fatal 'This script is intended to run on macOS only.'
info "Downloading LocalAI ${LOCALAI_VERSION}..."
curl --fail --show-error --location --progress-bar -o $TEMP_DIR/local-ai "https://github.com/mudler/LocalAI/releases/download/${LOCALAI_VERSION}/local-ai-Darwin-${ARCH}"
curl --fail --show-error --location --progress-bar -o $TEMP_DIR/local-ai "https://github.com/mudler/LocalAI/releases/download/${LOCALAI_VERSION}/local-ai-${LOCALAI_VERSION}-darwin-${ARCH}"
info "Installing to /usr/local/bin/local-ai"
install -o0 -g0 -m755 $TEMP_DIR/local-ai /usr/local/bin/local-ai
@@ -789,7 +789,7 @@ install_binary() {
fi
info "Downloading LocalAI ${LOCALAI_VERSION}..."
curl --fail --location --progress-bar -o $TEMP_DIR/local-ai "https://github.com/mudler/LocalAI/releases/download/${LOCALAI_VERSION}/local-ai-Linux-${ARCH}"
curl --fail --location --progress-bar -o $TEMP_DIR/local-ai "https://github.com/mudler/LocalAI/releases/download/${LOCALAI_VERSION}/local-ai-${LOCALAI_VERSION}-linux-${ARCH}"
for BINDIR in /usr/local/bin /usr/bin /bin; do
echo $PATH | grep -q $BINDIR && break || continue
@@ -868,7 +868,7 @@ OS="$(uname -s)"
ARCH=$(uname -m)
case "$ARCH" in
x86_64) ARCH="x86_64" ;;
x86_64) ARCH="amd64" ;;
aarch64|arm64) ARCH="arm64" ;;
*) fatal "Unsupported architecture: $ARCH" ;;
esac