Compare commits

..

74 Commits

Author SHA1 Message Date
Ettore Di Giacinto
56f44d448c chore(docs): decrease logo size, minor enhancements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-15 22:00:51 +02:00
Richard Palethorpe
0f0fafacd9 fix(stablediffusion): Avoid overwriting SYCL specific flags from outer make call (#5181)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-15 19:31:25 +02:00
Ettore Di Giacinto
4f239bac89 feat: rebrand - LocalAGI and LocalRecall joins the LocalAI stack family (#5159)
* wip

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update lotusdocs and hugo

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* rephrasing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Latest fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adjust readme section

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-15 17:51:24 +02:00
Ettore Di Giacinto
04d74ac648 chore(model gallery): add m1-32b (#5182)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-15 17:17:17 +02:00
Richard Palethorpe
18c3dc33ee fix(stablediffusion): Pass ROCM LD CGO flags through to recursive make (#5179)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-15 09:27:29 +02:00
LocalAI [bot]
508cfa7369 chore: ⬆️ Update ggml-org/llama.cpp to d6d2c2ab8c8865784ba9fef37f2b2de3f2134d33 (#5178)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-14 23:10:16 +02:00
Ettore Di Giacinto
1f94cddbae chore(model gallery): add nvidia_llama-3.1-8b-ultralong-4m-instruct (#5177)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:30:55 +02:00
Ettore Di Giacinto
21ae7b4cd4 chore(model gallery): add nvidia_llama-3.1-8b-ultralong-1m-instruct (#5176)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:28:09 +02:00
Ettore Di Giacinto
bef22ab547 chore(model gallery): add skywork_skywork-or1-32b-preview (#5175)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:25:43 +02:00
Ettore Di Giacinto
eb04e8cdcf chore(model gallery): add skywork_skywork-or1-math-7b (#5174)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:23:44 +02:00
Ettore Di Giacinto
17e533a086 chore(model gallery): add skywork_skywork-or1-7b-preview (#5173)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-14 12:20:20 +02:00
qwerty108109
4fc68409ff Update README.md (#5172)
Modified the README.md to separate out the different docker run commands to make it easier to copy into the terminal.

Signed-off-by: qwerty108109 <97707491+qwerty108109@users.noreply.github.com>
2025-04-14 10:48:10 +02:00
Richard Palethorpe
e587044449 fix(stablediffusion): Avoid GGML commit which causes CUDA compile error (#5170)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-14 09:29:09 +02:00
LocalAI [bot]
1f09db5161 chore: ⬆️ Update ggml-org/llama.cpp to 71e90e8813f90097701e62f7fce137d96ddf41e2 (#5171)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-13 21:46:07 +00:00
Ettore Di Giacinto
05b744f086 chore(model gallery): add daichi-12b (#5169)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 15:53:11 +02:00
Ettore Di Giacinto
89ca4bc02d chore(model gallery): add hamanasu-magnum-4b-i1 (#5168)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:37:59 +02:00
Ettore Di Giacinto
e626aa48a4 chore(model gallery): add hamanasu-adventure-4b-i1 (#5167)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:35:57 +02:00
Ettore Di Giacinto
752b5e0339 chore(model gallery): add mag-picaro-72b (#5166)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:34:14 +02:00
Ettore Di Giacinto
637d72d6e3 chore(model gallery): add lightthinker-qwen (#5165)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-13 14:31:05 +02:00
LocalAI [bot]
f3bfec580a chore: ⬆️ Update ggml-org/llama.cpp to bc091a4dc585af25c438c8473285a8cfec5c7695 (#5158)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-13 08:23:41 +00:00
Ettore Di Giacinto
165c1ddff3 chore(model gallery): add tesslate_gradience-t1-3b-preview (#5160)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-12 10:37:40 +02:00
Ettore Di Giacinto
fb83238e9e chore(model gallery): add zyphra_zr1-1.5b (#5157)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-11 10:06:05 +02:00
Ettore Di Giacinto
700bfa41c7 chore(model gallery): add agentica-org_deepcoder-1.5b-preview (#5156)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-11 10:03:59 +02:00
LocalAI [bot]
25bdc350df chore: ⬆️ Update ggml-org/llama.cpp to 64eda5deb9859e87a020e56bab5d2f9ca956f1de (#5155)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-10 21:44:55 +00:00
Richard Palethorpe
1b899e1a68 feat(stablediffusion): Enable SYCL (#5144)
* feat(sycl): Enable SYCL for stable diffusion

This is a pain because we compile with CGO, but SD is compiled with
CMake. I don't think we can easily use CMake to set the linker flags
necessary. Also I could not find pkg-config calls that would fully set
the flags, so some of them are set manually.

See https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
for reference. I also resorted to searching the shared object files in
MKLROOT/lib for the symbols.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ci): Don't set nproc on cmake

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-10 15:20:53 +02:00
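
The flags this commit sets by hand appear further down in this compare, in the stablediffusion-ggml Makefile diff (`CGO_LDFLAGS_SYCL` and `CGO_CXXFLAGS`). A rough, illustrative shell equivalent follows — assuming a standard oneAPI install where `setvars.sh` exports `MKLROOT` and `DNNLROOT`, and not guaranteed to be complete for every setup:

```bash
# Illustrative only: an approximation of the SYCL/MKL link flags the Makefile
# diff below sets via CGO_LDFLAGS_SYCL and CGO_CXXFLAGS. Assumes a standard
# oneAPI environment (MKLROOT/DNNLROOT come from setvars.sh).
source /opt/intel/oneapi/setvars.sh
export CGO_LDFLAGS="-fsycl -L${DNNLROOT}/lib -ldnnl ${MKLROOT}/lib/intel64/libmkl_sycl.a \
  -fiopenmp -fopenmp-targets=spir64 -lOpenCL $(pkg-config --libs mkl-static-lp64-gomp)"
export CGO_CXXFLAGS="-fiopenmp -fopenmp-targets=spir64 $(pkg-config --cflags mkl-static-lp64-gomp)"
```
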
Ettore Di Giacinto
3bf13f8c69 chore(model gallery): add soob3123_amoral-cogito-v1-preview-qwen-14b (#5154)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-10 10:07:56 +02:00
Ettore Di Giacinto
7a00729374 chore(model gallery): add trappu_magnum-picaro-0.7-v2-12b (#5153)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-10 10:03:42 +02:00
Ettore Di Giacinto
d484028532 feat(diffusers): add support for Lumina2Text2ImgPipeline (#4806)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-10 09:55:51 +02:00
LocalAI [bot]
0eb7fc2c41 chore: ⬆️ Update ggml-org/llama.cpp to d3bd7193ba66c15963fd1c59448f22019a8caf6e (#5152)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-09 22:01:25 +00:00
Ettore Di Giacinto
a69e30e0c9 chore(model gallery): add agentica-org_deepcoder-14b-preview (#5151)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:55:47 +02:00
Ettore Di Giacinto
9c018e6bff chore(model gallery): add deepcogito_cogito-v1-preview-llama-70b (#5150)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:54:59 +02:00
Ettore Di Giacinto
281e818047 chore(model gallery): add deepcogito_cogito-v1-preview-llama-70b (#5150)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:53:28 +02:00
Ettore Di Giacinto
270f0e2157 chore(model gallery): add deepcogito_cogito-v1-preview-qwen-32b (#5149)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:48:15 +02:00
Ettore Di Giacinto
673e59e76c chore(model gallery): add deepcogito_cogito-v1-preview-llama-3b (#5148)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 16:42:53 +02:00
LocalAI [bot]
5a8a2adb44 chore: ⬆️ Update ggml-org/llama.cpp to b32efad2bc42460637c3a364c9554ea8217b3d7f (#5146)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-09 15:39:04 +02:00
Ettore Di Giacinto
a7317d23bf chore(model gallery): add deepcogito_cogito-v1-preview-llama-8b (#5147)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-09 10:02:09 +02:00
Ettore Di Giacinto
2bab9b5fe2 fix: fix gallery name for cogito-v1-preview-qwen-14B
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-08 22:15:32 +02:00
Ettore Di Giacinto
081be3ba7d chore(model gallery): add cogito-v1-preview-qwen-14b (#5145)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 22:04:14 +02:00
Ettore Di Giacinto
25e6f21322 chore(deps): bump llama.cpp to 4ccea213bc629c4eef7b520f7f6c59ce9bbdaca0 (#5143)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 11:26:06 +02:00
Ettore Di Giacinto
b4df1c9cf3 fix(gemma): improve prompt for tool calls (#5142)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 10:12:42 +02:00
Ettore Di Giacinto
4fbd6609f2 chore(model gallery): add meta-llama_llama-4-scout-17b-16e-instruct (#5141)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 10:12:28 +02:00
Ettore Di Giacinto
7387932f89 chore(model gallery): add mensa-beta-14b-instruct-i1 (#5140)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 10:01:24 +02:00
Ettore Di Giacinto
59c37e67b2 chore(model gallery): add eurydice-24b-v2-i1 (#5139)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 09:56:29 +02:00
Ettore Di Giacinto
c09d227647 chore(model gallery): add watt-ai_watt-tool-70b (#5138)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 09:42:49 +02:00
Ettore Di Giacinto
547d322b28 chore(model gallery): add arliai_qwq-32b-arliai-rpr-v (#5137)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 09:40:26 +02:00
dependabot[bot]
a6f0bb410f chore(deps): bump securego/gosec from 2.22.0 to 2.22.3 (#5134)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.0 to 2.22.3.
- [Release notes](https://github.com/securego/gosec/releases)
- [Changelog](https://github.com/securego/gosec/blob/master/.goreleaser.yml)
- [Commits](https://github.com/securego/gosec/compare/v2.22.0...v2.22.3)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-07 21:09:45 +00:00
Ettore Di Giacinto
710f624ecd fix(webui): improve model display, do not block view (#5133)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-07 18:03:25 +02:00
LocalAI [bot]
5018452be7 chore: ⬆️ Update ggml-org/llama.cpp to 916c83bfe7f8b08ada609c3b8e583cf5301e594b (#5130)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-06 21:51:51 +00:00
Ettore Di Giacinto
ece239966f chore: ⬆️ Update ggml-org/llama.cpp to 6bf28f0111ff9f21b3c1b1eace20c590281e7ba6 (#5127)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 14:01:51 +02:00
Ettore Di Giacinto
3b8bc7e64c chore(model gallery): add open-thoughts_openthinker2-7b (#5129)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 10:53:22 +02:00
Ettore Di Giacinto
fc73b2b430 chore(model gallery): add open-thoughts_openthinker2-32b (#5128)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 10:48:21 +02:00
Ettore Di Giacinto
901dba6063 chore(model gallery): add gemma-3-27b-it-qat (#5124)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-05 08:46:49 +02:00
LocalAI [bot]
b88a7a4550 chore: ⬆️ Update ggml-org/llama.cpp to 3e1d29348b5d77269f6931500dd1c1a729d429c8 (#5123)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-04 21:49:53 +00:00
Ettore Di Giacinto
106e40845f chore(model gallery): add katanemo_arch-function-chat-3b (#5122) 2025-04-04 10:45:44 +02:00
Ettore Di Giacinto
0064bec8f5 chore(model gallery): add katanemo_arch-function-chat-1.5b (#5121) 2025-04-04 10:31:44 +02:00
Ettore Di Giacinto
9e6dbb0b5a chore(model gallery): add katanemo_arch-function-chat-7b (#5120)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:29:47 +02:00
Ettore Di Giacinto
d26e61388b chore(model gallery): add tesslate_synthia-s1-27b (#5119)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:27:52 +02:00
Ettore Di Giacinto
31a7084c75 chore(model gallery): add gemma-3-4b-it-qat (#5118)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:23:56 +02:00
Ettore Di Giacinto
128612a6fc chore(model gallery): add gemma-3-12b-it-qat (#5117)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-04 10:21:45 +02:00
LocalAI [bot]
6af3f46bc3 chore: ⬆️ Update ggml-org/llama.cpp to c262beddf29f3f3be5bbbf167b56029a19876956 (#5116)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-03 22:59:49 +00:00
Richard Palethorpe
d2cf8ef070 fix(sycl): kernel not found error by forcing -fsycl (#5115)
* chore(sycl): Update oneapi to 2025:1

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(sycl): Pass -fsycl flag as workaround

-fsycl should be set by llama.cpp's cmake file, but something goes wrong
and it doesn't appear to get added

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(build): Speed up llama build by using all CPUs

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-03 16:22:59 +02:00
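
The workaround is visible further down in the llama.cpp backend Makefile diff, where `-fsycl` is forced through `CMAKE_CXX_FLAGS`. As an illustrative sketch of the resulting configure step for the `sycl_f16` build type (simplified; not the exact invocation the Makefile runs):

```bash
# Illustrative only: forcing -fsycl explicitly, mirroring the CMAKE_ARGS added
# in the Makefile diff below for the sycl_f16 build type (sycl_f32 drops
# -DGGML_SYCL_F16=ON).
cmake .. -DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF \
  -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
  -DCMAKE_CXX_FLAGS="-fsycl" -DGGML_SYCL_F16=ON
cmake --build . --config Release --target grpc-server
```
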
Ettore Di Giacinto
259ad3cfe6 chore(model gallery): add all-hands_openhands-lm-1.5b-v0.1 (#5114)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:25:46 +02:00
Ettore Di Giacinto
18b320d577 chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c (#5110)
chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:23:14 +02:00
Ettore Di Giacinto
89e151f035 chore(model gallery): add all-hands_openhands-lm-7b-v0.1 (#5113)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:20:20 +02:00
Ettore Di Giacinto
22060f6410 chore(model gallery): add burtenshaw_gemmacoder3-12b (#5112)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:17:57 +02:00
Ettore Di Giacinto
7ee3288460 chore(model gallery): add all-hands_openhands-lm-32b-v0.1 (#5111)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:15:57 +02:00
LocalAI [bot]
cbbc954a8c chore: ⬆️ Update ggml-org/llama.cpp to f423981ac806bf031d83784bcb47d2721bc70f97 (#5108)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-02 09:22:53 +02:00
Ettore Di Giacinto
2c425e9c69 feat(loader): enhance single active backend by treating as singleton (#5107)
feat(loader): enhance single active backend by treating as singleton

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-01 20:58:11 +02:00
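
The related diffs further down pass the single-backend setting into the loader constructor (`model.NewModelLoader(modelPath, singleBackend)`) and release models with `defer loader.Close()`. From the user's side the mode is still toggled with the environment variable shown in the .env diff below; an illustrative, unverified invocation:

```bash
# Illustrative only: enable the single active backend mode this change reworks
# (variable name from the .env diff below; the model name is just an example).
LOCALAI_SINGLE_ACTIVE_BACKEND=true local-ai run llama-3.2-1b-instruct:q4_k_m
```
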
LocalAI [bot]
c59975ab05 chore: ⬆️ Update ggml-org/llama.cpp to c80a7759dab10657b9b6c3e87eef988a133b9b6a (#5105)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-01 00:01:34 +02:00
Ettore Di Giacinto
05f7004487 fix: race during stop of active backends (#5106)
* chore: drop double call to stop all backends, refactors

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: do lock when cycling to models to delete

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-01 00:01:10 +02:00
Ettore Di Giacinto
2f9203cd2a chore: drop remoteLibraryURL from kong vars (#5103)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-31 22:48:17 +02:00
LocalAI [bot]
f09b33f2ef docs: ⬆️ update docs version mudler/LocalAI (#5104)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-03-31 22:48:03 +02:00
Ettore Di Giacinto
65470b0ab1 Update README 2025-03-31 21:51:09 +02:00
Ettore Di Giacinto
9a23fe662b Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-03-31 19:35:34 +02:00
84 changed files with 2013 additions and 358 deletions

.env

@@ -29,6 +29,9 @@
## Enable/Disable single backend (useful if only one GPU is available)
# LOCALAI_SINGLE_ACTIVE_BACKEND=true
# Forces shutdown of the backends if busy (only if LOCALAI_SINGLE_ACTIVE_BACKEND is set)
# LOCALAI_FORCE_BACKEND_SHUTDOWN=true
## Specify a build type. Available: cublas, openblas, clblas.
## cuBLAS: This is a GPU-accelerated version of the complete standard BLAS (Basic Linear Algebra Subprograms) library. It's provided by Nvidia and is part of their CUDA toolkit.
## OpenBLAS: This is an open-source implementation of the BLAS library that aims to provide highly optimized code for various platforms. It includes support for multi-threading and can be compiled to use hardware-specific features for additional performance. OpenBLAS can run on many kinds of hardware, including CPUs from Intel, AMD, and ARM.

@@ -15,7 +15,7 @@ jobs:
strategy:
matrix:
include:
- base-image: intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04
- base-image: intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04
runs-on: 'ubuntu-latest'
platforms: 'linux/amd64'
runs-on: ${{matrix.runs-on}}

@@ -18,7 +18,7 @@ jobs:
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}
uses: securego/gosec@v2.22.0
uses: securego/gosec@v2.22.3
with:
# we let the report trigger content trigger a failure using the GitHub Security features.
args: '-no-fail -fmt sarif -out results.sarif ./...'

@@ -6,7 +6,7 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true
# llama.cpp versions
CPPLLAMA_VERSION?=4663bd353c61c1136cd8a97b9908755e4ab30cec
CPPLLAMA_VERSION?=d6d2c2ab8c8865784ba9fef37f2b2de3f2134d33
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
@@ -21,8 +21,8 @@ BARKCPP_REPO?=https://github.com/PABannier/bark.cpp.git
BARKCPP_VERSION?=v1.0.0
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=19d876ee300a055629926ff836489901f734f2b7
STABLEDIFFUSION_GGML_REPO?=https://github.com/richiejp/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=53e3b17eb3d0b5760ced06a1f98320b68b34aaae
ONNX_VERSION?=1.20.0
ONNX_ARCH?=x64
@@ -260,11 +260,7 @@ backend/go/image/stablediffusion-ggml/libsd.a: sources/stablediffusion-ggml.cpp
$(MAKE) -C backend/go/image/stablediffusion-ggml libsd.a
backend-assets/grpc/stablediffusion-ggml: backend/go/image/stablediffusion-ggml/libsd.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/backend/go/image/stablediffusion-ggml/ LIBRARY_PATH=$(CURDIR)/backend/go/image/stablediffusion-ggml/ \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/stablediffusion-ggml ./backend/go/image/stablediffusion-ggml/
ifneq ($(UPX),)
$(UPX) backend-assets/grpc/stablediffusion-ggml
endif
$(MAKE) -C backend/go/image/stablediffusion-ggml CGO_LDFLAGS="$(CGO_LDFLAGS)" stablediffusion-ggml
sources/onnxruntime:
mkdir -p sources/onnxruntime
@@ -809,7 +805,8 @@ docker-aio-all:
docker-image-intel:
docker build \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04 \
--progress plain \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.1.0-0-devel-ubuntu24.04 \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="none" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
@@ -817,7 +814,7 @@ docker-image-intel:
docker-image-intel-xpu:
docker build \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04 \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04 \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="none" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \

@@ -1,7 +1,6 @@
<h1 align="center">
<br>
<img height="300" src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"> <br>
LocalAI
<img height="300" src="./core/http/static/logo.png"> <br>
<br>
</h1>
@@ -48,9 +47,58 @@
[![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai)
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API thats compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
![screen](https://github.com/mudler/LocalAI/assets/2420543/20b5ccd2-8393-44f0-aaf6-87a23806381e)
## 📚🆕 Local Stack Family
🆕 LocalAI is now part of a comprehensive suite of AI tools designed to work together:
<table>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalAGI">
<img src="https://raw.githubusercontent.com/mudler/LocalAGI/refs/heads/main/webui/react-ui/public/logo_2.png" width="300" alt="LocalAGI Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalAGI">LocalAGI</a></h3>
<p>A powerful Local AI agent management platform that serves as a drop-in replacement for OpenAI's Responses API, enhanced with advanced agentic capabilities.</p>
</td>
</tr>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalRecall">
<img src="https://raw.githubusercontent.com/mudler/LocalRecall/refs/heads/main/static/localrecall_horizontal.png" width="300" alt="LocalRecall Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalRecall">LocalRecall</a></h3>
<p>A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.</p>
</td>
</tr>
</table>
## Screenshots
| Talk Interface | Generate Audio |
| --- | --- |
| ![Screenshot 2025-03-31 at 12-01-36 LocalAI - Talk](./docs/assets/images/screenshots/screenshot_tts.png) | ![Screenshot 2025-03-31 at 12-01-29 LocalAI - Generate audio with voice-en-us-ryan-low](./docs/assets/images/screenshots/screenshot_tts.png) |
| Models Overview | Generate Images |
| --- | --- |
| ![Screenshot 2025-03-31 at 12-01-20 LocalAI - Models](./docs/assets/images/screenshots/screenshot_gallery.png) | ![Screenshot 2025-03-31 at 12-31-41 LocalAI - Generate images with flux 1-dev](./docs/assets/images/screenshots/screenshot_image.png) |
| Chat Interface | Home |
| --- | --- |
| ![Screenshot 2025-03-31 at 11-57-44 LocalAI - Chat with localai-functioncall-qwen2 5-7b-v0 5](./docs/assets/images/screenshots/screenshot_chat.png) | ![Screenshot 2025-03-31 at 11-57-23 LocalAI API - c2a39e3 (c2a39e3639227cfd94ffffe9f5691239acc275a8)](./docs/assets/images/screenshots/screenshot_home.png) |
| Login | Swarm |
| --- | --- |
|![Screenshot 2025-03-31 at 12-09-59 ](./docs/assets/images/screenshots/screenshot_login.png) | ![Screenshot 2025-03-31 at 12-10-39 LocalAI - P2P dashboard](./docs/assets/images/screenshots/screenshot_p2p.png) |
## 💻 Quickstart
Run the installer script:
@@ -59,17 +107,21 @@ curl https://localai.io/install.sh | sh
```
Or run with docker:
### CPU only image:
```bash
# CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-cpu
# Nvidia GPU:
```
### Nvidia GPU:
```bash
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CPU and GPU image (bigger size):
```
### CPU and GPU image (bigger size):
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
```
### AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
```
@@ -88,10 +140,13 @@ local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
local-ai run oci://localai/phi-2:latest
```
[💻 Getting started](https://localai.io/basics/getting_started/index.html)
For more information, see [💻 Getting started](https://localai.io/basics/getting_started/index.html)
## 📰 Latest project news
- Apr 2025: [LocalAGI](https://github.com/mudler/LocalAGI) and [LocalRecall](https://github.com/mudler/LocalRecall) join the LocalAI family stack.
- Apr 2025: WebUI overhaul, AIO images updates
- Feb 2025: Backend cleanup, Breaking changes, new backends (kokoro, OutelTTS, faster-whisper), Nvidia L4T images
- Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
@@ -105,19 +160,6 @@ local-ai run oci://localai/phi-2:latest
Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
## 🔥🔥 Hot topics (looking for help):
- Multimodal with vLLM and Video understanding: https://github.com/mudler/LocalAI/pull/3729
- Realtime API https://github.com/mudler/LocalAI/issues/3714
- WebUI improvements: https://github.com/mudler/LocalAI/issues/2156
- Backends v2: https://github.com/mudler/LocalAI/issues/1126
- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
- Assistant API: https://github.com/mudler/LocalAI/issues/1273
- Vulkan: https://github.com/mudler/LocalAI/issues/1647
- Anthropic API: https://github.com/mudler/LocalAI/issues/1808
If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22
## 🚀 [Features](https://localai.io/features/)
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
@@ -131,12 +173,10 @@ If you want to help and contribute, issues up for grabs: https://github.com/mudl
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 📈 [Reranker API](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
- [Agentic capabilities](https://github.com/mudler/LocalAGI)
- 🔊 Voice activity detection (Silero-VAD support)
- 🌍 Integrated WebUI!
## 💻 Usage
Check out the [Getting started](https://localai.io/basics/getting_started/index.html) section in our documentation.
### 🔗 Community and integrations

@@ -2,7 +2,7 @@
## XXX: In some versions of CMake clip wasn't being built before llama.
## This is an hack for now, but it should be fixed in the future.
set(TARGET myclip)
add_library(${TARGET} clip.cpp clip.h llava.cpp llava.h)
add_library(${TARGET} clip.cpp clip.h clip-impl.h llava.cpp llava.h)
install(TARGETS ${TARGET} LIBRARY)
target_include_directories(myclip PUBLIC .)
target_include_directories(myclip PUBLIC ../..)

@@ -8,7 +8,7 @@ ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
TARGET?=--target grpc-server
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
# If build type is cublas, then we set -DGGML_CUDA=ON to CMAKE_ARGS automatically
ifeq ($(BUILD_TYPE),cublas)
@@ -36,11 +36,18 @@ else ifeq ($(OS),Darwin)
endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DCMAKE_CXX_FLAGS="-fsycl" \
-DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DCMAKE_CXX_FLAGS="-fsycl"
endif
llama.cpp:
@@ -77,4 +84,4 @@ ifneq (,$(findstring sycl,$(BUILD_TYPE)))
else
+cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release $(TARGET)
endif
cp llama.cpp/build/bin/grpc-server .
cp llama.cpp/build/bin/grpc-server .

@@ -509,15 +509,15 @@ struct llama_server_context
bool load_model(const common_params &params_)
{
params = params_;
if (!params.mmproj.empty()) {
if (!params.mmproj.path.empty()) {
multimodal = true;
LOG_INFO("Multi Modal Mode Enabled", {});
clp_ctx = clip_init(params.mmproj.c_str(), clip_context_params {
clp_ctx = clip_init(params.mmproj.path.c_str(), clip_context_params {
/* use_gpu */ has_gpu,
/*verbosity=*/ 1,
/*verbosity=*/ GGML_LOG_LEVEL_INFO,
});
if(clp_ctx == nullptr) {
LOG_ERR("unable to load clip model: %s", params.mmproj.c_str());
LOG_ERR("unable to load clip model: %s", params.mmproj.path.c_str());
return false;
}
@@ -531,7 +531,7 @@ struct llama_server_context
ctx = common_init.context.release();
if (model == nullptr)
{
LOG_ERR("unable to load model: %s", params.model.c_str());
LOG_ERR("unable to load model: %s", params.model.path.c_str());
return false;
}
@@ -2326,11 +2326,11 @@ static void params_parse(const backend::ModelOptions* request,
// this is comparable to: https://github.com/ggerganov/llama.cpp/blob/d9b33fe95bd257b36c84ee5769cc048230067d6f/examples/server/server.cpp#L1809
params.model = request->modelfile();
params.model.path = request->modelfile();
if (!request->mmproj().empty()) {
// get the directory of modelfile
std::string model_dir = params.model.substr(0, params.model.find_last_of("/\\"));
params.mmproj = model_dir + "/"+ request->mmproj();
std::string model_dir = params.model.path.substr(0, params.model.path.find_last_of("/\\"));
params.mmproj.path = model_dir + "/"+ request->mmproj();
}
// params.model_alias ??
params.model_alias = request->modelfile();
@@ -2405,7 +2405,7 @@ static void params_parse(const backend::ModelOptions* request,
scale_factor = request->lorascale();
}
// get the directory of modelfile
std::string model_dir = params.model.substr(0, params.model.find_last_of("/\\"));
std::string model_dir = params.model.path.substr(0, params.model.path.find_last_of("/\\"));
params.lora_adapters.push_back({ model_dir + "/"+request->loraadapter(), scale_factor });
}
params.use_mlock = request->mlock();

@@ -21,6 +21,7 @@ fi
## XXX: In some versions of CMake clip wasn't being built before llama.
## This is an hack for now, but it should be fixed in the future.
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
cp -rfv llama.cpp/examples/llava/clip-impl.h llama.cpp/examples/grpc-server/clip-impl.h
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h

@@ -8,6 +8,13 @@ ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
# keep standard at C11 and C++11
CXXFLAGS = -I. -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/thirdparty -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/ggml/include -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp -O3 -DNDEBUG -std=c++17 -fPIC
GOCMD?=go
CGO_LDFLAGS?=
# Avoid parent make file overwriting CGO_LDFLAGS which is needed for hipblas
CGO_LDFLAGS_SYCL=
GO_TAGS?=
LD_FLAGS?=
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -21,7 +28,7 @@ else ifeq ($(BUILD_TYPE),openblas)
# If build type is clblas (openCL) we set -DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DGGML_HIP=ON
# If it's OSX, DO NOT embed the metal library - -DGGML_METAL_EMBED_LIBRARY=ON requires further investigation
@@ -36,16 +43,35 @@ else ifeq ($(OS),Darwin)
endif
endif
# ifeq ($(BUILD_TYPE),sycl_f16)
# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON -DSD_SYCL=ON -DGGML_SYCL_F16=ON
# endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DSD_SYCL=ON \
-DGGML_SYCL_F16=ON
CC=icx
CXX=icpx
CGO_LDFLAGS_SYCL += -fsycl -L${DNNLROOT}/lib -ldnnl ${MKLROOT}/lib/intel64/libmkl_sycl.a -fiopenmp -fopenmp-targets=spir64 -lOpenCL
CGO_LDFLAGS_SYCL += $(shell pkg-config --libs mkl-static-lp64-gomp)
CGO_CXXFLAGS += -fiopenmp -fopenmp-targets=spir64
CGO_CXXFLAGS += $(shell pkg-config --cflags mkl-static-lp64-gomp )
endif
# ifeq ($(BUILD_TYPE),sycl_f32)
# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSD_SYCL=ON
# endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DSD_SYCL=ON
CC=icx
CXX=icpx
CGO_LDFLAGS_SYCL += -fsycl -L${DNNLROOT}/lib -ldnnl ${MKLROOT}/lib/intel64/libmkl_sycl.a -fiopenmp -fopenmp-targets=spir64 -lOpenCL
CGO_LDFLAGS_SYCL += $(shell pkg-config --libs mkl-static-lp64-gomp)
CGO_CXXFLAGS += -fiopenmp -fopenmp-targets=spir64
CGO_CXXFLAGS += $(shell pkg-config --cflags mkl-static-lp64-gomp )
endif
# warnings
CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
# CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
# Find all .a archives in ARCHIVE_DIR
# (ggml can have different backends cpu, cuda, etc., each backend generates a .a archive)
@@ -86,11 +112,24 @@ endif
$(MAKE) $(COMBINED_LIB)
gosd.o:
ifneq (,$(findstring sycl,$(BUILD_TYPE)))
+bash -c "source $(ONEAPI_VARS); \
$(CXX) $(CXXFLAGS) gosd.cpp -o gosd.o -c"
else
$(CXX) $(CXXFLAGS) gosd.cpp -o gosd.o -c
endif
libsd.a: gosd.o
cp $(INCLUDE_PATH)/build/libstable-diffusion.a ./libsd.a
$(AR) rcs libsd.a gosd.o
stablediffusion-ggml:
CGO_LDFLAGS="$(CGO_LDFLAGS) $(CGO_LDFLAGS_SYCL)" C_INCLUDE_PATH="$(INCLUDE_PATH)" LIBRARY_PATH="$(LIBRARY_PATH)" \
CC="$(CC)" CXX="$(CXX)" CGO_CXXFLAGS="$(CGO_CXXFLAGS)" \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o ../../../../backend-assets/grpc/stablediffusion-ggml ./
ifneq ($(UPX),)
$(UPX) ../../../../backend-assets/grpc/stablediffusion-ggml
endif
clean:
rm -rf gosd.o libsd.a build $(COMBINED_LIB)
rm -rf gosd.o libsd.a build $(COMBINED_LIB)

View File

@@ -19,7 +19,7 @@ import grpc
from diffusers import SanaPipeline, StableDiffusion3Pipeline, StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, \
EulerAncestralDiscreteScheduler, FluxPipeline, FluxTransformer2DModel
from diffusers import StableDiffusionImg2ImgPipeline, AutoPipelineForText2Image, ControlNetModel, StableVideoDiffusionPipeline
from diffusers import StableDiffusionImg2ImgPipeline, AutoPipelineForText2Image, ControlNetModel, StableVideoDiffusionPipeline, Lumina2Text2ImgPipeline
from diffusers.pipelines.stable_diffusion import safety_checker
from diffusers.utils import load_image, export_to_video
from compel import Compel, ReturnedEmbeddingsType
@@ -287,6 +287,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if request.LowVRAM:
self.pipe.enable_model_cpu_offload()
elif request.PipelineType == "Lumina2Text2ImgPipeline":
self.pipe = Lumina2Text2ImgPipeline.from_pretrained(
request.Model,
torch_dtype=torch.bfloat16)
if request.LowVRAM:
self.pipe.enable_model_cpu_offload()
elif request.PipelineType == "SanaPipeline":
self.pipe = SanaPipeline.from_pretrained(
request.Model,

@@ -16,7 +16,7 @@ type Application struct {
func newApplication(appConfig *config.ApplicationConfig) *Application {
return &Application{
backendLoader: config.NewBackendConfigLoader(appConfig.ModelPath),
modelLoader: model.NewModelLoader(appConfig.ModelPath),
modelLoader: model.NewModelLoader(appConfig.ModelPath, appConfig.SingleBackend),
applicationConfig: appConfig,
templatesEvaluator: templates.NewEvaluator(appConfig.ModelPath),
}

@@ -143,7 +143,7 @@ func New(opts ...config.AppOption) (*Application, error) {
}()
}
if options.LoadToMemory != nil {
if options.LoadToMemory != nil && !options.SingleBackend {
for _, m := range options.LoadToMemory {
cfg, err := application.BackendLoader().LoadBackendConfigFileByNameDefaultOptions(m, options)
if err != nil {

@@ -17,6 +17,7 @@ func ModelEmbedding(s string, tokens []int, loader *model.ModelLoader, backendCo
if err != nil {
return nil, err
}
defer loader.Close()
var fn func() ([]float32, error)
switch model := inferenceModel.(type) {

@@ -16,6 +16,7 @@ func ImageGeneration(height, width, mode, step, seed int, positive_prompt, negat
if err != nil {
return nil, err
}
defer loader.Close()
fn := func() error {
_, err := inferenceModel.GenerateImage(

@@ -53,6 +53,7 @@ func ModelInference(ctx context.Context, s string, messages []schema.Message, im
if err != nil {
return nil, err
}
defer loader.Close()
var protoMessages []*proto.Message
// if we are using the tokenizer template, we need to convert the messages to proto messages

@@ -40,10 +40,6 @@ func ModelOptions(c config.BackendConfig, so *config.ApplicationConfig, opts ...
grpcOpts := grpcModelOpts(c)
defOpts = append(defOpts, model.WithLoadGRPCLoadModelOpts(grpcOpts))
if so.SingleBackend {
defOpts = append(defOpts, model.WithSingleActiveBackend())
}
if so.ParallelBackendRequests {
defOpts = append(defOpts, model.EnableParallelRequests)
}
@@ -121,7 +117,7 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
triggers := make([]*pb.GrammarTrigger, 0)
for _, t := range c.FunctionsConfig.GrammarConfig.GrammarTriggers {
triggers = append(triggers, &pb.GrammarTrigger{
Word: t.Word,
Word: t.Word,
})
}
@@ -161,33 +157,33 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
DisableLogStatus: c.DisableLogStatus,
DType: c.DType,
// LimitMMPerPrompt vLLM
LimitImagePerPrompt: int32(c.LimitMMPerPrompt.LimitImagePerPrompt),
LimitVideoPerPrompt: int32(c.LimitMMPerPrompt.LimitVideoPerPrompt),
LimitAudioPerPrompt: int32(c.LimitMMPerPrompt.LimitAudioPerPrompt),
MMProj: c.MMProj,
FlashAttention: c.FlashAttention,
CacheTypeKey: c.CacheTypeK,
CacheTypeValue: c.CacheTypeV,
NoKVOffload: c.NoKVOffloading,
YarnExtFactor: c.YarnExtFactor,
YarnAttnFactor: c.YarnAttnFactor,
YarnBetaFast: c.YarnBetaFast,
YarnBetaSlow: c.YarnBetaSlow,
NGQA: c.NGQA,
RMSNormEps: c.RMSNormEps,
MLock: mmlock,
RopeFreqBase: c.RopeFreqBase,
RopeScaling: c.RopeScaling,
Type: c.ModelType,
RopeFreqScale: c.RopeFreqScale,
NUMA: c.NUMA,
Embeddings: embeddings,
LowVRAM: lowVRAM,
NGPULayers: int32(nGPULayers),
MMap: mmap,
MainGPU: c.MainGPU,
Threads: int32(*c.Threads),
TensorSplit: c.TensorSplit,
LimitImagePerPrompt: int32(c.LimitMMPerPrompt.LimitImagePerPrompt),
LimitVideoPerPrompt: int32(c.LimitMMPerPrompt.LimitVideoPerPrompt),
LimitAudioPerPrompt: int32(c.LimitMMPerPrompt.LimitAudioPerPrompt),
MMProj: c.MMProj,
FlashAttention: c.FlashAttention,
CacheTypeKey: c.CacheTypeK,
CacheTypeValue: c.CacheTypeV,
NoKVOffload: c.NoKVOffloading,
YarnExtFactor: c.YarnExtFactor,
YarnAttnFactor: c.YarnAttnFactor,
YarnBetaFast: c.YarnBetaFast,
YarnBetaSlow: c.YarnBetaSlow,
NGQA: c.NGQA,
RMSNormEps: c.RMSNormEps,
MLock: mmlock,
RopeFreqBase: c.RopeFreqBase,
RopeScaling: c.RopeScaling,
Type: c.ModelType,
RopeFreqScale: c.RopeFreqScale,
NUMA: c.NUMA,
Embeddings: embeddings,
LowVRAM: lowVRAM,
NGPULayers: int32(nGPULayers),
MMap: mmap,
MainGPU: c.MainGPU,
Threads: int32(*c.Threads),
TensorSplit: c.TensorSplit,
// AutoGPTQ
ModelBaseName: c.AutoGPTQ.ModelBaseName,
Device: c.AutoGPTQ.Device,

@@ -12,10 +12,10 @@ import (
func Rerank(request *proto.RerankRequest, loader *model.ModelLoader, appConfig *config.ApplicationConfig, backendConfig config.BackendConfig) (*proto.RerankResult, error) {
opts := ModelOptions(backendConfig, appConfig)
rerankModel, err := loader.Load(opts...)
if err != nil {
return nil, err
}
defer loader.Close()
if rerankModel == nil {
return nil, fmt.Errorf("could not load rerank model")

@@ -26,10 +26,10 @@ func SoundGeneration(
opts := ModelOptions(backendConfig, appConfig)
soundGenModel, err := loader.Load(opts...)
if err != nil {
return "", nil, err
}
defer loader.Close()
if soundGenModel == nil {
return "", nil, fmt.Errorf("could not load sound generation model")

@@ -20,6 +20,7 @@ func TokenMetrics(
if err != nil {
return nil, err
}
defer loader.Close()
if model == nil {
return nil, fmt.Errorf("could not loadmodel model")

@@ -14,10 +14,10 @@ func ModelTokenize(s string, loader *model.ModelLoader, backendConfig config.Bac
opts := ModelOptions(backendConfig, appConfig)
inferenceModel, err = loader.Load(opts...)
if err != nil {
return schema.TokenizeResponse{}, err
}
defer loader.Close()
predictOptions := gRPCPredictOpts(backendConfig, loader.ModelPath)
predictOptions.Prompt = s

@@ -24,6 +24,7 @@ func ModelTranscription(audio, language string, translate bool, ml *model.ModelL
if err != nil {
return nil, err
}
defer ml.Close()
if transcriptionModel == nil {
return nil, fmt.Errorf("could not load transcription model")

@@ -23,10 +23,10 @@ func ModelTTS(
) (string, *proto.Result, error) {
opts := ModelOptions(backendConfig, appConfig, model.WithDefaultBackendString(model.PiperBackend))
ttsModel, err := loader.Load(opts...)
if err != nil {
return "", nil, err
}
defer loader.Close()
if ttsModel == nil {
return "", nil, fmt.Errorf("could not load tts model %q", backendConfig.Model)

@@ -19,6 +19,8 @@ func VAD(request *schema.VADRequest,
if err != nil {
return nil, err
}
defer ml.Close()
req := proto.VADRequest{
Audio: request.Audio,
}

@@ -74,7 +74,7 @@ func (t *SoundGenerationCMD) Run(ctx *cliContext.Context) error {
AssetsDestination: t.BackendAssetsPath,
ExternalGRPCBackends: externalBackends,
}
ml := model.NewModelLoader(opts.ModelPath)
ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend)
defer func() {
err := ml.StopAllGRPC()

@@ -32,7 +32,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {
}
cl := config.NewBackendConfigLoader(t.ModelsPath)
ml := model.NewModelLoader(opts.ModelPath)
ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend)
if err := cl.LoadBackendConfigsFromPath(t.ModelsPath); err != nil {
return err
}

@@ -41,7 +41,7 @@ func (t *TTSCMD) Run(ctx *cliContext.Context) error {
AudioDir: outputDir,
AssetsDestination: t.BackendAssetsPath,
}
ml := model.NewModelLoader(opts.ModelPath)
ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend)
defer func() {
err := ml.StopAllGRPC()

@@ -142,9 +142,9 @@ func API(application *application.Application) (*fiber.App, error) {
httpFS := http.FS(embedDirStatic)
router.Use(favicon.New(favicon.Config{
URL: "/favicon.ico",
URL: "/favicon.svg",
FileSystem: httpFS,
File: "static/favicon.ico",
File: "static/favicon.svg",
}))
router.Use("/static", filesystem.New(filesystem.Config{

@@ -122,15 +122,15 @@ func modelModal(m *gallery.GalleryModel) elem.Node {
"id": modalName(m),
"tabindex": "-1",
"aria-hidden": "true",
"class": "hidden overflow-y-auto overflow-x-hidden fixed top-0 right-0 left-0 z-50 justify-center items-center w-full md:inset-0 h-[calc(100%-1rem)] max-h-full",
"class": "hidden fixed top-0 right-0 left-0 z-50 justify-center items-center w-full md:inset-0 h-full max-h-full bg-gray-900/50",
},
elem.Div(
attrs.Props{
"class": "relative p-4 w-full max-w-2xl max-h-full",
"class": "relative p-4 w-full max-w-2xl h-[90vh] mx-auto mt-[5vh]",
},
elem.Div(
attrs.Props{
"class": "relative p-4 w-full max-w-2xl max-h-full bg-white rounded-lg shadow dark:bg-gray-700",
"class": "relative bg-white rounded-lg shadow dark:bg-gray-700 h-full flex flex-col",
},
// header
elem.Div(
@@ -164,14 +164,13 @@ func modelModal(m *gallery.GalleryModel) elem.Node {
// body
elem.Div(
attrs.Props{
"class": "p-4 md:p-5 space-y-4",
"class": "p-4 md:p-5 space-y-4 overflow-y-auto flex-1 min-h-0",
},
elem.Div(
attrs.Props{
"class": "flex justify-center items-center",
},
elem.Img(attrs.Props{
// "class": "rounded-t-lg object-fit object-center h-96",
"class": "lazy rounded-t-lg max-h-48 max-w-96 object-cover mt-3 entered loaded",
"src": m.Icon,
"loading": "lazy",
@@ -232,7 +231,6 @@ func modelModal(m *gallery.GalleryModel) elem.Node {
),
),
)
}
func modelDescription(m *gallery.GalleryModel) elem.Node {

@@ -21,6 +21,7 @@ func StoresSetEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationConfi
if err != nil {
return err
}
defer sl.Close()
vals := make([][]byte, len(input.Values))
for i, v := range input.Values {
@@ -48,6 +49,7 @@ func StoresDeleteEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationCo
if err != nil {
return err
}
defer sl.Close()
if err := store.DeleteCols(c.Context(), sb, input.Keys); err != nil {
return err
@@ -69,6 +71,7 @@ func StoresGetEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationConfi
if err != nil {
return err
}
defer sl.Close()
keys, vals, err := store.GetCols(c.Context(), sb, input.Keys)
if err != nil {
@@ -100,6 +103,7 @@ func StoresFindEndpoint(sl *model.ModelLoader, appConfig *config.ApplicationConf
if err != nil {
return err
}
defer sl.Close()
keys, vals, similarities, err := store.Find(c.Context(), sb, input.Key, input.Topk)
if err != nil {

@@ -40,7 +40,7 @@ func TestAssistantEndpoints(t *testing.T) {
cl := &config.BackendConfigLoader{}
//configsDir := "/tmp/localai/configs"
modelPath := "/tmp/localai/model"
var ml = model.NewModelLoader(modelPath)
var ml = model.NewModelLoader(modelPath, false)
appConfig := &config.ApplicationConfig{
ConfigsDir: configsDir,

@@ -29,9 +29,9 @@ func Explorer(db *explorer.Database) *fiber.App {
httpFS := http.FS(embedDirStatic)
app.Use(favicon.New(favicon.Config{
URL: "/favicon.ico",
URL: "/favicon.svg",
FileSystem: httpFS,
File: "static/favicon.ico",
File: "static/favicon.svg",
}))
app.Use("/static", filesystem.New(filesystem.Config{

@@ -50,11 +50,10 @@ func RegisterLocalAIRoutes(router *fiber.App,
router.Post("/v1/vad", vadChain...)
// Stores
sl := model.NewModelLoader("")
router.Post("/stores/set", localai.StoresSetEndpoint(sl, appConfig))
router.Post("/stores/delete", localai.StoresDeleteEndpoint(sl, appConfig))
router.Post("/stores/get", localai.StoresGetEndpoint(sl, appConfig))
router.Post("/stores/find", localai.StoresFindEndpoint(sl, appConfig))
router.Post("/stores/set", localai.StoresSetEndpoint(ml, appConfig))
router.Post("/stores/delete", localai.StoresDeleteEndpoint(ml, appConfig))
router.Post("/stores/get", localai.StoresGetEndpoint(ml, appConfig))
router.Post("/stores/find", localai.StoresFindEndpoint(ml, appConfig))
if !appConfig.DisableMetrics {
router.Get("/metrics", localai.LocalAIMetricsEndpoint())

[Binary and oversized files, diffs not shown: one binary file removed or replaced (15 KiB before), one file whose diff was suppressed as too long (108 KiB after), core/http/static/logo.png added (new file, 893 KiB), and one further image added (930 KiB).]

@@ -12,7 +12,7 @@
<div class="max-w-md w-full bg-gray-800/90 border border-gray-700/50 rounded-xl overflow-hidden shadow-xl">
<div class="animation-container">
<div class="text-overlay">
<!-- <i class="fas fa-circle-nodes text-5xl text-blue-400 mb-2"></i> -->
<img src="static/logo.png" alt="LocalAI Logo" class="h-32">
</div>
</div>

@@ -3,7 +3,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{.Title}}</title>
<base href="{{.BaseURL}}" />
<link rel="icon" type="image/x-icon" href="favicon.ico" />
<link rel="shortcut icon" href="static/favicon.svg" type="image/svg">
<link rel="stylesheet" href="static/assets/highlightjs.css" />
<script defer src="static/assets/highlightjs.js"></script>
<script defer src="static/assets/alpine.js"></script>

@@ -4,10 +4,9 @@
<div class="flex items-center">
<!-- Logo Image -->
<a href="./" class="flex items-center group">
<img src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"
<img src="static/logo_horizontal.png"
alt="LocalAI Logo"
class="h-10 mr-3 rounded-lg border border-blue-600/30 shadow-md transition-all duration-300 group-hover:shadow-blue-500/20 group-hover:border-blue-500/50">
<span class="text-white text-xl font-bold bg-clip-text text-transparent bg-gradient-to-r from-blue-400 to-indigo-400">LocalAI</span>
class="h-14 mr-3 brightness-110 transition-all duration-300 group-hover:brightness-125">
</a>
</div>

@@ -4,10 +4,9 @@
<div class="flex items-center">
<!-- Logo Image -->
<a href="./" class="flex items-center group">
<img src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"
<img src="static/logo_horizontal.png"
alt="LocalAI Logo"
class="h-10 mr-3 rounded-lg border border-blue-600/30 shadow-md transition-all duration-300 group-hover:shadow-blue-500/20 group-hover:border-blue-500/50">
<span class="text-white text-xl font-bold bg-clip-text text-transparent bg-gradient-to-r from-blue-400 to-indigo-400">LocalAI</span>
</a>
</div>

[Twelve more new image/binary files added (diffs not shown); sizes: 506, 170, 893, 108 (diff suppressed as too long), 132, 284, 287, 506, 225, 418, 246, and 213 KiB.]

@@ -3,7 +3,7 @@
"baseUrl": ".",
"paths": {
"*": [
"../../../../.cache/hugo_cache/modules/filecache/modules/pkg/mod/github.com/gohugoio/hugo-mod-jslibs-dist/popperjs/v2@v2.21100.20000/package/dist/cjs/popper.js/*",
"../../../../.cache/hugo_cache/modules/filecache/modules/pkg/mod/github.com/gohugoio/hugo-mod-jslibs-dist/popperjs/v2@v2.21100.20000/package/dist/cjs/*",
"../../../../.cache/hugo_cache/modules/filecache/modules/pkg/mod/github.com/twbs/bootstrap@v5.3.2+incompatible/js/*"
]
}

@@ -48,9 +48,9 @@ defaultContentLanguage = 'en'
[params.docs] # Parameters for the /docs 'template'
logo = "https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"
logo_text = "LocalAI"
title = "LocalAI documentation" # default html title for documentation pages/sections
logo = "https://raw.githubusercontent.com/mudler/LocalAI/refs/heads/master/core/http/static/logo.png"
logo_text = ""
title = "LocalAI" # default html title for documentation pages/sections
pathName = "docs" # path name for documentation site | default "docs"
@@ -108,6 +108,7 @@ defaultContentLanguage = 'en'
# indexName = "" # Index Name to perform search on (or set env variable HUGO_PARAM_DOCSEARCH_indexName)
[params.analytics] # Parameters for Analytics (Google, Plausible)
# google = "G-XXXXXXXXXX" # Replace with your Google Analytics ID
# plausibleURL = "/docs/s" # (or set via env variable HUGO_PARAM_ANALYTICS_plausibleURL)
# plausibleAPI = "/docs/s" # optional - (or set via env variable HUGO_PARAM_ANALYTICS_plausibleAPI)
# plausibleDomain = "" # (or set via env variable HUGO_PARAM_ANALYTICS_plausibleDomain)
@@ -151,7 +152,7 @@ defaultContentLanguage = 'en'
[languages]
[languages.en]
title = "LocalAI documentation"
title = "LocalAI"
languageName = "English"
weight = 10
# [languages.fr]

@@ -13,6 +13,8 @@ LocalAI supports two modes of distributed inferencing via p2p:
- **Federated Mode**: Requests are shared between the cluster and routed to a single worker node in the network based on the load balancer's decision.
- **Worker Mode** (aka "model sharding" or "splitting weights"): Requests are processed by all the workers which contributes to the final inference result (by sharing the model weights).
A list of global instances shared by the community is available at [explorer.localai.io](https://explorer.localai.io).
## Usage
Starting LocalAI with `--p2p` generates a shared token for connecting multiple instances: and that's all you need to create AI clusters, eliminating the need for intricate network setups.

@@ -18,14 +18,45 @@ To access the WebUI with an API_KEY, browser extensions such as [Requestly](http
{{% /alert %}}
## Using the Bash Installer
## Quickstart
Install LocalAI easily using the bash installer with the following command:
```sh
### Using the Bash Installer
```bash
curl https://localai.io/install.sh | sh
```
### Run with docker:
```bash
# CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-cpu
# Nvidia GPU:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CPU and GPU image (bigger size):
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
```
### Load models:
```bash
# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# Start LocalAI with the phi-2 model directly from huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# Install and run a model from the Ollama OCI registry
local-ai run ollama://gemma:2b
# Run a model from a configuration file
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# Install and run a model from a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
```
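Once a model is installed, any OpenAI-compatible client can talk to it. A minimal sketch with `curl`, assuming LocalAI is listening on the default `localhost:8080` and that the model name matches the gallery entry used above (check `local-ai models list` or the `/v1/models` response for the exact identifier):
```bash
# List the models LocalAI currently exposes (OpenAI-compatible endpoint)
curl http://localhost:8080/v1/models

# Send a chat completion request to the installed model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b-instruct",
    "messages": [{"role": "user", "content": "Hello, what can you do?"}]
  }'
```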
For a full list of options, refer to the [Installer Options]({{% relref "docs/advanced/installer" %}}) documentation.
Binaries can also be [manually downloaded]({{% relref "docs/reference/binaries" %}}).

View File

@@ -1,4 +1,3 @@
+++
title = "Overview"
weight = 1
@@ -7,162 +6,96 @@ description = "What is LocalAI?"
tags = ["Beginners"]
categories = [""]
author = "Ettore Di Giacinto"
# This allows to overwrite the landing page
url = '/'
icon = "info"
+++
<p align="center">
<a href="https://localai.io"><img width=512 src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"></a>
</p>
# Welcome to LocalAI
<p align="center">
<a href="https://github.com/go-skynet/LocalAI/fork" target="blank">
<img src="https://img.shields.io/github/forks/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI forks"/>
</a>
<a href="https://github.com/go-skynet/LocalAI/stargazers" target="blank">
<img src="https://img.shields.io/github/stars/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI stars"/>
</a>
<a href="https://github.com/go-skynet/LocalAI/pulls" target="blank">
<img src="https://img.shields.io/github/issues-pr/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI pull-requests"/>
</a>
<a href='https://github.com/go-skynet/LocalAI/releases'>
<img src='https://img.shields.io/github/release/go-skynet/LocalAI?&label=Latest&style=for-the-badge'>
</a>
</p>
LocalAI is your complete AI stack for running AI models locally. It's designed to be simple, efficient, and accessible, providing a drop-in replacement for OpenAI's API while keeping your data private and secure.
<p align="center">
<a href="https://hub.docker.com/r/localai/localai" target="blank">
<img src="https://img.shields.io/badge/dockerhub-images-important.svg?logo=Docker" alt="LocalAI Docker hub"/>
</a>
<a href="https://quay.io/repository/go-skynet/local-ai?tab=tags&tag=latest" target="blank">
<img src="https://img.shields.io/badge/quay.io-images-important.svg?" alt="LocalAI Quay.io"/>
</a>
</p>
## Why LocalAI?
<p align="center">
<a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>
In today's AI landscape, privacy, control, and flexibility are paramount. LocalAI addresses these needs by:
<p align="center">
<a href="https://twitter.com/LocalAI_API" target="blank">
<img src="https://img.shields.io/twitter/follow/LocalAI_API?label=Follow: LocalAI_API&style=social" alt="Follow LocalAI_API"/>
</a>
<a href="https://discord.gg/uJAeKSAGDy" target="blank">
<img src="https://dcbadge.vercel.app/api/server/uJAeKSAGDy?style=flat-square&theme=default-inverted" alt="Join LocalAI Discord Community"/>
</a>
</p>
- **Privacy First**: Your data never leaves your machine
- **Complete Control**: Run models on your terms, with your hardware
- **Open Source**: MIT licensed and community-driven
- **Flexible Deployment**: From laptops to servers, with or without GPUs
- **Extensible**: Add new models and features as needed
> 💡 Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [💭Discord](https://discord.gg/uJAeKSAGDy)
>
> [💻 Quickstart](https://localai.io/basics/getting_started/) [🖼️ Models](https://models.localai.io/) [🚀 Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap) [🥽 Demo](https://demo.localai.io) [🌍 Explorer](https://explorer.localai.io) [🛫 Examples](https://github.com/go-skynet/LocalAI/tree/master/examples/)
## Core Components
LocalAI is more than just a single tool - it's a complete ecosystem:
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API compatible with the OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images and audio (and more) locally or on-prem with consumer-grade hardware, supporting multiple model families and architectures. It does not require a GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
1. **[LocalAI Core](https://github.com/mudler/LocalAI)**
- OpenAI-compatible API
- Multiple model support (LLMs, image, audio)
- No GPU required
- Fast inference with native bindings
- [Github repository](https://github.com/mudler/LocalAI)
2. **[LocalAGI](https://github.com/mudler/LocalAGI)**
- Autonomous AI agents
- No coding required
- WebUI and REST API support
- Extensible agent framework
- [Github repository](https://github.com/mudler/LocalAGI)
## Start LocalAI
3. **[LocalRecall](https://github.com/mudler/LocalRecall)**
- Semantic search
- Memory management
- Vector database
- Perfect for AI applications
- [Github repository](https://github.com/mudler/LocalRecall)
Start the image with Docker to have a functional clone of OpenAI! 🚀:
## Getting Started
```bash
docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
# Do you have an Nvidia GPU? Use one of these instead
# CUDA 11
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-11
# CUDA 12
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-12
```
Or just use the bash installer:
The fastest way to get started is with our one-line installer:
```bash
curl https://localai.io/install.sh | sh
```
See the [💻 Quickstart](https://localai.io/basics/getting_started/) for all the options and ways you can run LocalAI!
Or use Docker for a quick start:
## What is LocalAI?
```bash
docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
```
In a nutshell:
For more detailed installation options and configurations, see our [Getting Started guide](/basics/getting_started/).
- Local, OpenAI drop-in alternative REST API. You own your data.
- NO GPU required. NO Internet access is required either
- Optional GPU acceleration is available. See also the [build section](https://localai.io/basics/build/index.html).
- Supports multiple models
- 🏃 Once loaded the first time, it keeps models loaded in memory for faster inference
- ⚡ Doesn't shell out, but uses native bindings for faster inference and better performance.
## Key Features
LocalAI is focused on making AI accessible to anyone. Any contribution, feedback, and PR is welcome!
- **Text Generation**: Run various LLMs locally
- **Image Generation**: Create images with stable diffusion
- **Audio Processing**: Text-to-speech and speech-to-text
- **Vision API**: Image understanding and analysis
- **Embeddings**: Vector database support
- **Functions**: OpenAI-compatible function calling (see the sketch after this list)
- **P2P**: Distributed inference capabilities
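As a rough sketch of the function calling feature above, here is an OpenAI-style `tools` request (the `get_weather` function and the model name are hypothetical placeholders, and a function-calling-capable model must be installed first):
```bash
# Ask the model to call a (hypothetical) weather function via the OpenAI-compatible tools field
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b-instruct",
    "messages": [{"role": "user", "content": "What is the weather in Rome?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```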
Note that this started as a fun weekend project by [mudler](https://github.com/mudler) to create the necessary pieces for a full AI assistant like `ChatGPT`. The community is growing fast and we are working hard to make it better and more stable. If you want to help, please consider contributing (see below)!
## Community and Support
### 🚀 Features
LocalAI is a community-driven project. You can:
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `gpt4all.cpp`, ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🎨 [Image generation with stable diffusion](https://localai.io/features/image-generation)
- 🔥 [OpenAI functions](https://localai.io/features/openai-functions/) 🆕
- 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
- 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 💾 [Stores](https://localai.io/stores)
- 📈 [Reranker](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
- Join our [Discord community](https://discord.gg/uJAeKSAGDy)
- Check out our [GitHub repository](https://github.com/mudler/LocalAI)
- Contribute to the project
- Share your use cases and examples
## Contribute and help
## Next Steps
To help the project you can:
Ready to dive in? Here are some recommended next steps:
- If you have technical skills and want to contribute to development, have a look at the open issues. If you are new, you can start with the [good-first-issue](https://github.com/go-skynet/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) and [help-wanted](https://github.com/go-skynet/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) labels.
1. [Install LocalAI](/basics/getting_started/)
2. [Explore available models](https://models.localai.io)
3. [Model compatibility](/model-compatibility/)
4. [Try out examples](https://github.com/mudler/LocalAI-examples)
5. [Join the community](https://discord.gg/uJAeKSAGDy)
6. [Check the LocalAI Github repository](https://github.com/mudler/LocalAI)
7. [Check the LocalAGI Github repository](https://github.com/mudler/LocalAGI)
- If you don't have technical skills, you can still help by improving the documentation, [adding examples](https://github.com/go-skynet/LocalAI/tree/master/examples), or sharing your user stories with our community; any help and contribution is welcome!
## 🌟 Star history
## License
[![LocalAI Star history Chart](https://api.star-history.com/svg?repos=mudler/LocalAI&type=Date)](https://star-history.com/#mudler/LocalAI&Date)
## ❤️ Sponsors
> Do you find LocalAI useful?
Support the project by becoming [a backer or sponsor](https://github.com/sponsors/mudler). Your logo will show up here with a link to your website.
A huge thank you to our generous sponsors who support this project by covering CI expenses, and to everyone on our [Sponsor list](https://github.com/sponsors/mudler):
<p align="center">
<a href="https://www.spectrocloud.com/" target="blank">
<img width=200 src="https://github.com/user-attachments/assets/72eab1dd-8b93-4fc0-9ade-84db49f24962">
</a>
<a href="https://www.premai.io/" target="blank">
<img width=200 src="https://github.com/mudler/LocalAI/assets/2420543/42e4ca83-661e-4f79-8e46-ae43689683d6"> <br>
</a>
</p>
## 📖 License
LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/).
MIT - Author Ettore Di Giacinto
## 🙇 Acknowledgements
LocalAI couldn't have been built without the help of great software already available from the community. Thank you!
- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- https://github.com/tatsu-lab/stanford_alpaca
- https://github.com/cornelk/llama-go for the initial ideas
- https://github.com/antimatter15/alpaca.cpp
- https://github.com/EdVince/Stable-Diffusion-NCNN
- https://github.com/ggerganov/whisper.cpp
- https://github.com/saharNooby/rwkv.cpp
- https://github.com/rhasspy/piper
## 🤗 Contributors
This is a community project, a special thanks to our contributors! 🤗
<a href="https://github.com/go-skynet/LocalAI/graphs/contributors">
<img src="https://contrib.rocks/image?repo=go-skynet/LocalAI" />
</a>
LocalAI is MIT licensed, created and maintained by [Ettore Di Giacinto](https://github.com/mudler).

View File

@@ -2,38 +2,212 @@
# Hero
hero:
enable: false
enable: true
weight: 10
template: hero
backgroundImage:
path: "images/templates/hero"
filename:
desktop: "gradient-desktop.webp"
mobile: "gradient-mobile.webp"
badge:
text: "⭐ 31.8k+ stars on GitHub!"
color: primary
pill: false
soft: true
titleLogo:
path: "images/logos"
filename: "logo.png"
alt: "LocalAI Logo"
height: 340px
title: ""
subtitle: |
**The free, Open Source alternative to OpenAI and Anthropic. Your All-in-One Complete AI Stack** - Run powerful language models, autonomous agents, and document intelligence **locally** on your hardware.
**No cloud, no limits, no compromise.**
image:
path: "images"
filename: "localai_screenshot.png"
alt: "LocalAI Screenshot"
boxShadow: true
rounded: true
ctaButton:
icon: rocket_launch
btnText: "Get Started"
url: "/basics/getting_started/"
cta2Button:
icon: code
btnText: "View on GitHub"
url: "https://github.com/mudler/LocalAI"
info: |
**Drop-in replacement for the OpenAI API** - a modular suite of tools that work seamlessly together or independently.
Start with **[LocalAI](https://localai.io)**'s OpenAI-compatible API, extend with **[LocalAGI](https://github.com/mudler/LocalAGI)**'s autonomous agents, and enhance with **[LocalRecall](https://github.com/mudler/LocalRecall)**'s semantic search - all running locally on your hardware.
**Open Source** MIT Licensed.
# Feature Grid
featureGrid:
enable: false
enable: true
weight: 20
template: feature grid
title: Why choose LocalAI?
subtitle: |
**OpenAI API Compatible** - Run AI models locally with our modular ecosystem. From language models to autonomous agents and semantic search, build your complete AI stack without the cloud.
items:
- title: LLM Inferencing
icon: memory_alt
description: LocalAI is a free, **Open Source** OpenAI alternative. Run **LLMs**, generate **images**, **audio** and more **locally** with consumer grade hardware.
ctaLink:
text: learn more
url: /basics/getting_started/
- title: Agentic-first
icon: smart_toy
description: |
Extend LocalAI with LocalAGI, an autonomous AI agent platform that runs locally, no coding required.
Build and deploy autonomous agents with ease. Interact with REST APIs or use the WebUI.
ctaLink:
text: learn more
url: https://github.com/mudler/LocalAGI
- title: Memory and Knowledge base
icon: psychology
description:
Extend LocalAI with LocalRecall, a local REST API for semantic search and memory management. Perfect for AI applications.
ctaLink:
text: learn more
url: https://github.com/mudler/LocalRecall
- title: OpenAI Compatible
icon: api
description: Drop-in replacement for OpenAI API. Compatible with existing applications and libraries.
ctaLink:
text: learn more
url: /basics/getting_started/
- title: No GPU Required
icon: memory
description: Run on consumer grade hardware. No need for expensive GPUs or cloud services.
ctaLink:
text: learn more
url: /basics/getting_started/
- title: Multiple Models
icon: hub
description: |
Support for various model families including LLMs, image generation, and audio models.
Supports multiple backends for inferencing, including vLLM, llama.cpp, and more.
You can switch between them as needed and install them from the Web interface or the CLI.
ctaLink:
text: learn more
url: /model-compatibility
- title: Privacy Focused
icon: security
description: Keep your data local. No data leaves your machine, ensuring complete privacy.
ctaLink:
text: learn more
url: /basics/container/
- title: Easy Setup
icon: settings
description: Simple installation and configuration. Get started in minutes with Binaries installation, Docker, Podman, Kubernetes or local installation.
ctaLink:
text: learn more
url: /basics/getting_started/
- title: Community Driven
icon: groups
description: Active community support and regular updates. Contribute and help shape the future of LocalAI.
ctaLink:
text: learn more
url: https://github.com/mudler/LocalAI
- title: Extensible
icon: extension
description: Easy to extend and customize. Add new models and features as needed.
ctaLink:
text: learn more
url: /docs/integrations/
- title: Peer 2 Peer
icon: hub
description: |
LocalAI is designed for decentralized LLM inference, powered by a peer-to-peer system based on libp2p.
It can be used in a local or remote network, and is compatible with any LLM model.
It works both in federated mode and by splitting model weights.
ctaLink:
text: learn more
url: /features/distribute/
- title: Open Source
icon: code
description: MIT licensed. Free to use, modify, and distribute. Community contributions welcome.
ctaLink:
text: learn more
url: https://github.com/mudler/LocalAI
imageText:
enable: true
weight: 25
template: image text
title: LocalAI
subtitle: The Free, Open Source OpenAI Alternative
title: Run AI models locally with ease
subtitle: |
LocalAI makes it simple to run various AI models on your own hardware. From text generation to image creation, autonomous agents to semantic search - all orchestrated through a unified API.
list:
- text: Optimized, fast inference
icon: speed
- text: OpenAI API compatibility
icon: api
- text: Comprehensive support for many model architectures
icon: area_chart
- text: Multiple model support
icon: hub
- text: Easy to deploy with Docker
icon: accessibility
- text: Image understanding
icon: image
- text: Image generation
icon: image
- text: Audio generation
icon: music_note
- text: Voice activity detection
icon: mic
- text: Speech recognition
icon: mic
- text: Video generation
icon: movie
- text: Privacy focused
icon: security
- text: Autonomous agents with [LocalAGI](https://github.com/mudler/LocalAGI)
icon: smart_toy
- text: Semantic search with [LocalRecall](https://github.com/mudler/LocalRecall)
icon: psychology
- text: Agent orchestration
icon: hub
image:
path: "images/logos"
filename: "logo.png"
alt: "LocalAI logo" # Optional but recommended
path: "images"
filename: "imagen.png"
alt: "LocalAI Image generation"
imgOrder:
desktop: 2
@@ -41,10 +215,62 @@ imageText:
ctaButton:
text: Learn more
url: "/docs/"
url: "/basics/getting_started/"
# Image compare
imageCompare:
enable: false
weight: 30
template: image compare
title: LocalAI in Action
subtitle: See how LocalAI can transform your local AI experience with various models and capabilities.
items:
- title: Text Generation
config: {
startingPoint: 50,
addCircle: true,
addCircleBlur: false,
showLabels: true,
labelOptions: {
before: 'Dark',
after: 'Light',
onHover: false
}
}
imagePath: "images/screenshots"
imageBefore: "text_generation_input.webp"
imageAfter: "text_generation_output.webp"
- title: Image Generation
config: {
startingPoint: 50,
addCircle: true,
addCircleBlur: true,
showLabels: true,
labelOptions: {
before: 'Prompt',
after: 'Result',
onHover: true
}
}
imagePath: "images/screenshots"
imageBefore: "imagen_before.webp"
imageAfter: "imagen_after.webp"
- title: Audio Generation
config: {
startingPoint: 50,
addCircle: true,
addCircleBlur: false,
showLabels: true,
labelOptions: {
before: 'Text',
after: 'Audio',
onHover: false
}
}
imagePath: "images/screenshots"
imageBefore: "audio_generation_text.webp"
imageAfter: "audio_generation_waveform.webp"

View File

@@ -1,3 +1,3 @@
{
"version": "v2.26.0"
"version": "v2.27.0"
}

View File

View File

@@ -82,7 +82,7 @@
</span>
</button>
{{ end -}}
{{ if .Site.IsMultiLingual }}
{{ if hugo.IsMultilingual }}
<div class="dropdown">
<button class="btn btn-link btn-default dropdown-toggle ps-2" type="button" data-bs-toggle="dropdown" aria-expanded="false">
{{ site.Language.Lang | upper }}

View File

@@ -18,10 +18,10 @@
<!-- Custom CSS -->
{{- $options := dict "enableSourceMap" true }}
{{- if hugo.IsProduction}}
{{- $options := dict "enableSourceMap" false "outputStyle" "compressed" }}
{{- $options = dict "enableSourceMap" false "outputStyle" "compressed" }}
{{- end }}
{{- $style := resources.Get "/scss/style.scss" }}
{{- $style = $style | resources.ExecuteAsTemplate "/scss/style.scss" . | resources.ToCSS $options }}
{{- $style = $style | resources.ExecuteAsTemplate "/scss/style.scss" . | css.Sass $options }}
{{- if hugo.IsProduction }}
{{- $style = $style | minify | fingerprint "sha384" }}
{{- end -}}
@@ -39,7 +39,7 @@
<!-- Image Compare Viewer -->
{{ if ($.Scratch.Get "image_compare_enabled") }}
{{ $imagecompare := resources.Get "js/image-compare-viewer.min.js" }}
{{- if not .Site.IsServer }}
{{- if not hugo.IsDevelopment }}
{{- $js := (slice $imagecompare) | resources.Concat "/js/image-compare.js" | minify | fingerprint "sha384" }}
<script type="text/javascript" src="{{ $js.Permalink }}" integrity="{{ $js.Data.Integrity }}"></script>
{{- else }}
@@ -48,14 +48,14 @@
{{- end }}
{{- end }}
<!-- Plausible Analytics Config -->
{{- if not .Site.IsServer }}
{{- if not hugo.IsDevelopment }}
{{ if and (.Site.Params.plausible.scriptURL) (.Site.Params.plausible.dataDomain) -}}
{{- partialCached "head/plausible" . }}
{{- end -}}
{{- end -}}
<!-- Google Analytics v4 Config -->
{{- if not .Site.IsServer }}
{{- if .Site.GoogleAnalytics }}
{{- if not hugo.IsDevelopment }}
{{- if .Site.Params.analytics.google }}
{{- template "_internal/google_analytics.html" . -}}
{{- end -}}
{{- end -}}

View File

@@ -0,0 +1,57 @@
<!-- Navbar Start -->
<header id="topnav">
<div class="container d-flex justify-content-between align-items-center">
<!-- Logo container-->
<a class="logo" aria-label="Home" href='{{ relLangURL "" }}'>
</a>
<!-- End Logo container-->
<div class="d-flex align-items-center">
<div id="navigation">
<!-- Navigation Menu -->
<ul class="navigation-menu nav-right">
{{- range .Site.Menus.primary }}
<li><a href="{{ relLangURL .URL }}">{{ .Name }}</a></li>
{{ end }}
</ul><!--end navigation menu-->
</div><!--end navigation-->
<!-- Social Links Start -->
{{ with $.Scratch.Get "social_list" }}
<ul class="social-link d-flex list-inline mb-0">
{{ range . }}
{{ $path := printf "images/social/%s.%s" . "svg" }}
<li class="list-inline-item mb-0">
<a href="{{ if eq . `rss` }} {{ `index.xml` | absURL }} {{ else if eq . `bluesky` }} https://bsky.app/profile/{{ index site.Params.social . }} {{ else }} https://{{ . }}.com/{{ index site.Params.social . }} {{ end }}" alt="{{ . }}" rel="noopener noreferrer" target="_blank">
<div class="btn btn-icon btn-landing border-0">
{{ with resources.Get $path }}
{{ .Content | safeHTML }}
{{ end }}
</div>
</a>
</li>
{{ end }}
</ul>
{{ end }}
<!-- Social Links End -->
<div class="menu-extras ms-3 me-2">
<div class="menu-item">
<!-- Mobile menu toggle-->
<button class="navbar-toggle btn btn-icon btn-soft-light" id="isToggle" aria-label="toggleMenu" onclick="toggleMenu()">
<div class="lines">
<span></span>
<span></span>
<span></span>
</div>
</button>
<!-- End mobile menu toggle-->
</div>
</div>
</div>
</div><!--end container-->
</header><!--end header-->
<!-- Navbar End -->

View File

@@ -1 +1 @@
<a href="https://localai.io"><img src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"></a>
<a href="https://localai.io"><img src="https://raw.githubusercontent.com/mudler/LocalAI/refs/heads/master/core/http/static/logo.png"></a>

View File

@@ -1,4 +1,4 @@
[build]
[build.environment]
HUGO_VERSION = "0.121.2"
HUGO_VERSION = "0.146.3"
GO_VERSION = "1.22.2"

View File

Binary file not shown. (Image changed: 57 KiB before, 16 KiB after)

View File

Binary file not shown. (Image changed: 359 KiB before, 30 KiB after)

View File

Binary file not shown. (Image changed: 52 KiB before, 14 KiB after)

View File

Binary file not shown. (Image changed: 769 B before, 711 B after)

View File

Binary file not shown. (Image changed: 2.3 KiB before, 1.7 KiB after)

View File

Binary file not shown. (Image changed: 15 KiB before, 15 KiB after)

171
docs/static/favicon.svg vendored Normal file
View File

File diff suppressed because one or more lines are too long. (New image size: 108 KiB)

1
docs/static/site.webmanifest vendored Normal file
View File

@@ -0,0 +1 @@
{"name":"","short_name":"","icons":[{"src":"/android-chrome-192x192.png","sizes":"192x192","type":"image/png"},{"src":"/android-chrome-512x512.png","sizes":"512x512","type":"image/png"}],"theme_color":"#ffffff","background_color":"#ffffff","display":"standalone"}

View File

@@ -8,9 +8,7 @@ config_file: |
chat_message: |-
<start_of_turn>{{if eq .RoleName "assistant" }}model{{else}}{{ .RoleName }}{{end}}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content -}}
@@ -25,11 +23,14 @@ config_file: |
{{.Input}}
function: |
<start_of_turn>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
You have access to functions. If you decide to invoke any of the function(s),
you MUST put it in the format of
{"name": function name, "parameters": dictionary of argument name and its value}
You SHOULD NOT include any other text in the response if you call a function
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<end_of_turn>
{{.Input -}}
<start_of_turn>model

View File

@@ -78,6 +78,60 @@
- filename: gemma-3-1b-it-Q4_K_M.gguf
sha256: 8ccc5cd1f1b3602548715ae25a66ed73fd5dc68a210412eea643eb20eb75a135
uri: huggingface://ggml-org/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf
- !!merge <<: *gemma3
name: "gemma-3-12b-it-qat"
urls:
- https://huggingface.co/google/gemma-3-12b-it
- https://huggingface.co/vinimuchulski/gemma-3-12b-it-qat-q4_0-gguf
description: |
This model corresponds to the 12B instruction-tuned version of the Gemma 3 model in GGUF format using Quantization Aware Training (QAT). The GGUF corresponds to Q4_0 quantization.
Thanks to QAT, the model is able to preserve similar quality as bfloat16 while significantly reducing the memory requirements to load the model.
You can find the half-precision version here.
overrides:
parameters:
model: gemma-3-12b-it-q4_0.gguf
files:
- filename: gemma-3-12b-it-q4_0.gguf
sha256: 6f1bb5f455414f7b46482bda51cbfdbf19786e21a5498c4403fdfc03d09b045c
uri: huggingface://vinimuchulski/gemma-3-12b-it-qat-q4_0-gguf/gemma-3-12b-it-q4_0.gguf
- !!merge <<: *gemma3
name: "gemma-3-4b-it-qat"
urls:
- https://huggingface.co/google/gemma-3-4b-it
- https://huggingface.co/vinimuchulski/gemma-3-4b-it-qat-q4_0-gguf
description: |
This model corresponds to the 4B instruction-tuned version of the Gemma 3 model in GGUF format using Quantization Aware Training (QAT). The GGUF corresponds to Q4_0 quantization.
Thanks to QAT, the model is able to preserve similar quality as bfloat16 while significantly reducing the memory requirements to load the model.
You can find the half-precision version here.
overrides:
parameters:
model: gemma-3-4b-it-q4_0.gguf
files:
- filename: gemma-3-4b-it-q4_0.gguf
sha256: 2ca493d426ffcb43db27132f183a0230eda4a3621e58b328d55b665f1937a317
uri: huggingface://vinimuchulski/gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf
- !!merge <<: *gemma3
name: "gemma-3-27b-it-qat"
urls:
- https://huggingface.co/google/gemma-3-27b-it
- https://huggingface.co/vinimuchulski/gemma-3-27b-it-qat-q4_0-gguf
description: |
This model corresponds to the 27B instruction-tuned version of the Gemma 3 model in GGUF format using Quantization Aware Training (QAT). The GGUF corresponds to Q4_0 quantization.
Thanks to QAT, the model is able to preserve similar quality as bfloat16 while significantly reducing the memory requirements to load the model.
You can find the half-precision version here.
overrides:
parameters:
model: gemma-3-27b-it-q4_0.gguf
files:
- filename: gemma-3-27b-it-q4_0.gguf
sha256: 45e586879bc5f5d7a5b6527e812952057ce916d9fc7ba16f7262ec9972c9e2a2
uri: huggingface://vinimuchulski/gemma-3-27b-it-qat-q4_0-gguf/gemma-3-27b-it-q4_0.gguf
- !!merge <<: *gemma3
name: "qgallouedec_gemma-3-27b-it-codeforces-sft"
urls:
@@ -386,6 +440,78 @@
- filename: Gemma-3-Starshine-12B.i1-Q4_K_M.gguf
sha256: 4c35a678e3784e20a8d85d4e7045d965509a1a71305a0da105fc5991ba7d6dc4
uri: huggingface://mradermacher/Gemma-3-Starshine-12B-i1-GGUF/Gemma-3-Starshine-12B.i1-Q4_K_M.gguf
- !!merge <<: *gemma3
name: "burtenshaw_gemmacoder3-12b"
icon: https://cdn-uploads.huggingface.co/production/uploads/62d648291fa3e4e7ae3fa6e8/zkcBr2UZFDpALAsMdgbze.gif
urls:
- https://huggingface.co/burtenshaw/GemmaCoder3-12B
- https://huggingface.co/bartowski/burtenshaw_GemmaCoder3-12B-GGUF
description: |
This model is a fine-tuned version of google/gemma-3-12b-it on the open-r1/codeforces-cots dataset. It has been trained using TRL.
overrides:
parameters:
model: burtenshaw_GemmaCoder3-12B-Q4_K_M.gguf
files:
- filename: burtenshaw_GemmaCoder3-12B-Q4_K_M.gguf
sha256: 47f0a2848eeed783cb03336afd8cc69f6ee0e088e3cec11ab6d9fe16457dc3d4
uri: huggingface://bartowski/burtenshaw_GemmaCoder3-12B-GGUF/burtenshaw_GemmaCoder3-12B-Q4_K_M.gguf
- !!merge <<: *gemma3
name: "tesslate_synthia-s1-27b"
icon: https://cdn-uploads.huggingface.co/production/uploads/64d1129297ca59bcf7458d07/zgFDl7UvWhiPYqdote7XT.png
urls:
- https://huggingface.co/Tesslate/Synthia-S1-27b
- https://huggingface.co/bartowski/Tesslate_Synthia-S1-27b-GGUF
description: |
Synthia-S1-27b is a reasoning, AI model developed by Tesslate AI, fine-tuned specifically for advanced reasoning, coding, and RP usecases. Built upon the robust Gemma3 architecture, Synthia-S1-27b excels in logical reasoning, creative writing, and deep contextual understanding. It supports multimodal inputs (text and images) with a large 128K token context window, enabling complex analysis suitable for research, academic tasks, and enterprise-grade AI applications.
overrides:
parameters:
model: Tesslate_Synthia-S1-27b-Q4_K_M.gguf
files:
- filename: Tesslate_Synthia-S1-27b-Q4_K_M.gguf
sha256: d953bf7f802dc68f85a35360deb24b9a8b446af051e82c77f2f0759065d2aa71
uri: huggingface://bartowski/Tesslate_Synthia-S1-27b-GGUF/Tesslate_Synthia-S1-27b-Q4_K_M.gguf
- !!merge <<: *gemma3
name: "daichi-12b"
icon: https://cdn-uploads.huggingface.co/production/uploads/66c26b6fb01b19d8c3c2467b/RqjcprtID598UTzL4igkU.webp
urls:
- https://huggingface.co/Delta-Vector/Daichi-12B
- https://huggingface.co/Delta-Vector/Daichi-12B-GGUF
description: |
A merge between my Gemma finetune of Pascal-12B and Omega-Directive-G-12B, meant to give it more NSFW knowledge.
This model has short, sweet prose and is uncensored in roleplay.
The model is suited for traditional RP. All thanks to Tav for funding the train.
overrides:
parameters:
model: Omega-LN-SFT-Q4_K_M.gguf
files:
- filename: Omega-LN-SFT-Q4_K_M.gguf
sha256: 33fb1c61085f9b18074e320ac784e6dbc8a98fe20705f92773e055471fd3cb0f
uri: huggingface://Delta-Vector/Daichi-12B-GGUF/Omega-LN-SFT-Q4_K_M.gguf
- &llama4
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
license: llama4
tags:
- llm
- gguf
- gpu
- cpu
- llama3.3
name: "meta-llama_llama-4-scout-17b-16e-instruct"
urls:
- https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
- https://huggingface.co/bartowski/meta-llama_Llama-4-Scout-17B-16E-Instruct-GGUF
description: |
The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series, Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts.
overrides:
parameters:
model: meta-llama_Llama-4-Scout-17B-16E-Instruct-Q3_K_S.gguf
files:
- filename: meta-llama_Llama-4-Scout-17B-16E-Instruct-Q3_K_S.gguf
sha256: 48dfc18d40691b4190b7fecf1f89b78cadc758c3a27a9e2a1cabd686fdb822e3
uri: huggingface://bartowski/meta-llama_Llama-4-Scout-17B-16E-Instruct-GGUF/meta-llama_Llama-4-Scout-17B-16E-Instruct-Q3_K_S.gguf
- &eurollm
name: "eurollm-9b-instruct"
icon: https://openeurollm.eu/_next/static/media/logo-dark.e7001867.svg
@@ -1382,6 +1508,52 @@
- filename: Forgotten-Abomination-70B-v5.0.Q4_K_M.gguf
sha256: a5f5e712e66b855f36ff45175f20c24441fa942ca8af47bd6f49107c6e0f025d
uri: huggingface://mradermacher/Forgotten-Abomination-70B-v5.0-GGUF/Forgotten-Abomination-70B-v5.0.Q4_K_M.gguf
- !!merge <<: *llama33
name: "watt-ai_watt-tool-70b"
urls:
- https://huggingface.co/watt-ai/watt-tool-70B
- https://huggingface.co/bartowski/watt-ai_watt-tool-70B-GGUF
description: |
watt-tool-70B is a fine-tuned language model based on LLaMa-3.3-70B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL).
Model Description
This model is specifically designed to excel at complex tool usage scenarios that require multi-turn interactions, making it ideal for empowering platforms like Lupan, an AI-powered workflow building tool. By leveraging a carefully curated and optimized dataset, watt-tool-70B demonstrates superior capabilities in understanding user requests, selecting appropriate tools, and effectively utilizing them across multiple turns of conversation.
Target Application: AI Workflow Building as in https://lupan.watt.chat/ and Coze.
Key Features
Enhanced Tool Usage: Fine-tuned for precise and efficient tool selection and execution.
Multi-Turn Dialogue: Optimized for maintaining context and effectively utilizing tools across multiple turns of conversation, enabling more complex task completion.
State-of-the-Art Performance: Achieves top performance on the BFCL, demonstrating its capabilities in function calling and tool usage.
Based on LLaMa-3.1-70B-Instruct: Inherits the strong language understanding and generation capabilities of the base model.
overrides:
parameters:
model: watt-ai_watt-tool-70B-Q4_K_M.gguf
files:
- filename: watt-ai_watt-tool-70B-Q4_K_M.gguf
sha256: 93806a5482b9e40e50ffca7a72abe3414d384749cc9e3d378eab5db8a8154b18
uri: huggingface://bartowski/watt-ai_watt-tool-70B-GGUF/watt-ai_watt-tool-70B-Q4_K_M.gguf
- !!merge <<: *llama33
name: "deepcogito_cogito-v1-preview-llama-70b"
icon: https://huggingface.co/deepcogito/cogito-v1-preview-llama-70B/resolve/main/images/deep-cogito-logo.png
urls:
- https://huggingface.co/deepcogito/cogito-v1-preview-llama-70B
- https://huggingface.co/bartowski/deepcogito_cogito-v1-preview-llama-70B-GGUF
description: |
The Cogito LLMs are instruction tuned generative models (text in/text out). All models are released under an open license for commercial use.
Cogito models are hybrid reasoning models. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
The LLMs are trained using Iterated Distillation and Amplification (IDA) - a scalable and efficient alignment strategy for superintelligence using iterative self-improvement.
The models have been optimized for coding, STEM, instruction following and general helpfulness, and have significantly higher multilingual, coding and tool calling capabilities than size equivalent counterparts.
In both standard and reasoning modes, Cogito v1-preview models outperform their size equivalent counterparts on common industry benchmarks.
Each model is trained in over 30 languages and supports a context length of 128k.
overrides:
parameters:
model: deepcogito_cogito-v1-preview-llama-70B-Q4_K_M.gguf
files:
- filename: deepcogito_cogito-v1-preview-llama-70B-Q4_K_M.gguf
sha256: d1deaf80c649e2a9446463cf5e1f7c026583647f46e3940d2b405a57cc685225
uri: huggingface://bartowski/deepcogito_cogito-v1-preview-llama-70B-GGUF/deepcogito_cogito-v1-preview-llama-70B-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
@@ -2495,6 +2667,27 @@
- filename: Eximius_Persona_5B.Q4_K_M.gguf
sha256: 8a8e7a0fa1068755322c51900e53423d795e57976b4d95982242cbec41141c7b
uri: huggingface://mradermacher/Eximius_Persona_5B-GGUF/Eximius_Persona_5B.Q4_K_M.gguf
- !!merge <<: *llama32
name: "deepcogito_cogito-v1-preview-llama-3b"
icon: https://huggingface.co/deepcogito/cogito-v1-preview-llama-3B/resolve/main/images/deep-cogito-logo.png
urls:
- https://huggingface.co/deepcogito/cogito-v1-preview-llama-3B
- https://huggingface.co/bartowski/deepcogito_cogito-v1-preview-llama-3B-GGUF
description: |
The Cogito LLMs are instruction tuned generative models (text in/text out). All models are released under an open license for commercial use.
Cogito models are hybrid reasoning models. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
The LLMs are trained using Iterated Distillation and Amplification (IDA) - a scalable and efficient alignment strategy for superintelligence using iterative self-improvement.
The models have been optimized for coding, STEM, instruction following and general helpfulness, and have significantly higher multilingual, coding and tool calling capabilities than size equivalent counterparts.
In both standard and reasoning modes, Cogito v1-preview models outperform their size equivalent counterparts on common industry benchmarks.
Each model is trained in over 30 languages and supports a context length of 128k.
overrides:
parameters:
model: deepcogito_cogito-v1-preview-llama-3B-Q4_K_M.gguf
files:
- filename: deepcogito_cogito-v1-preview-llama-3B-Q4_K_M.gguf
sha256: 726a0ef5f818b8d238f2844f3204848bea66fb9c172b8ae0f6dc51b7bc081dd5
uri: huggingface://bartowski/deepcogito_cogito-v1-preview-llama-3B-GGUF/deepcogito_cogito-v1-preview-llama-3B-Q4_K_M.gguf
- &qwen25
name: "qwen2.5-14b-instruct" ## Qwen2.5
icon: https://avatars.githubusercontent.com/u/141221163
@@ -5410,6 +5603,359 @@
- filename: hammer2.0-7b-q5_k_m.gguf
sha256: 3682843c857595765f0786cf24b3d501af96fe5d99a9fb2526bc7707e28bae1e
uri: huggingface://Nekuromento/Hammer2.0-7b-Q5_K_M-GGUF/hammer2.0-7b-q5_k_m.gguf
- !!merge <<: *qwen25
icon: https://github.com/All-Hands-AI/OpenHands/blob/main/docs/static/img/logo.png?raw=true
name: "all-hands_openhands-lm-32b-v0.1"
urls:
- https://huggingface.co/all-hands/openhands-lm-32b-v0.1
- https://huggingface.co/bartowski/all-hands_openhands-lm-32b-v0.1-GGUF
description: |
Autonomous agents for software development are already contributing to a wide range of software development tasks. But up to this point, strong coding agents have relied on proprietary models, which means that even if you use an open-source agent like OpenHands, you are still reliant on API calls to an external service.
Today, we are excited to introduce OpenHands LM, a new open coding model that:
Is open and available on Hugging Face, so you can download it and run it locally
Is a reasonable size, 32B, so it can be run locally on hardware such as a single 3090 GPU
Achieves strong performance on software engineering tasks, including 37.2% resolve rate on SWE-Bench Verified
Read below for more details and our future plans!
What is OpenHands LM?
OpenHands LM is built on the foundation of Qwen Coder 2.5 Instruct 32B, leveraging its powerful base capabilities for coding tasks. What sets OpenHands LM apart is our specialized fine-tuning process:
We used training data generated by OpenHands itself on a diverse set of open-source repositories
Specifically, we use an RL-based framework outlined in SWE-Gym, where we set up a training environment, generate training data using an existing agent, and then fine-tune the model on examples that were resolved successfully
It features a 128K token context window, ideal for handling large codebases and long-horizon software engineering tasks
overrides:
parameters:
model: all-hands_openhands-lm-32b-v0.1-Q4_K_M.gguf
files:
- filename: all-hands_openhands-lm-32b-v0.1-Q4_K_M.gguf
sha256: f7c2311d3264cc1e021a21a319748a9c75b74ddebe38551786aa4053448e5e74
uri: huggingface://bartowski/all-hands_openhands-lm-32b-v0.1-GGUF/all-hands_openhands-lm-32b-v0.1-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "all-hands_openhands-lm-7b-v0.1"
icon: https://github.com/All-Hands-AI/OpenHands/blob/main/docs/static/img/logo.png?raw=true
urls:
- https://huggingface.co/all-hands/openhands-lm-7b-v0.1
- https://huggingface.co/bartowski/all-hands_openhands-lm-7b-v0.1-GGUF
description: |
This is a smaller 7B model trained following the recipe of all-hands/openhands-lm-32b-v0.1. Autonomous agents for software development are already contributing to a wide range of software development tasks. But up to this point, strong coding agents have relied on proprietary models, which means that even if you use an open-source agent like OpenHands, you are still reliant on API calls to an external service.
Today, we are excited to introduce OpenHands LM, a new open coding model that:
Is open and available on Hugging Face, so you can download it and run it locally
Is a reasonable size, 32B, so it can be run locally on hardware such as a single 3090 GPU
Achieves strong performance on software engineering tasks, including 37.2% resolve rate on SWE-Bench Verified
Read below for more details and our future plans!
What is OpenHands LM?
OpenHands LM is built on the foundation of Qwen Coder 2.5 Instruct 32B, leveraging its powerful base capabilities for coding tasks. What sets OpenHands LM apart is our specialized fine-tuning process:
We used training data generated by OpenHands itself on a diverse set of open-source repositories
Specifically, we use an RL-based framework outlined in SWE-Gym, where we set up a training environment, generate training data using an existing agent, and then fine-tune the model on examples that were resolved successfully
It features a 128K token context window, ideal for handling large codebases and long-horizon software engineering tasks
overrides:
parameters:
model: all-hands_openhands-lm-7b-v0.1-Q4_K_M.gguf
files:
- filename: all-hands_openhands-lm-7b-v0.1-Q4_K_M.gguf
sha256: d50031b04bbdad714c004a0dc117c18d26a026297c236cda36089c20279b2ec1
uri: huggingface://bartowski/all-hands_openhands-lm-7b-v0.1-GGUF/all-hands_openhands-lm-7b-v0.1-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "all-hands_openhands-lm-1.5b-v0.1"
icon: https://github.com/All-Hands-AI/OpenHands/blob/main/docs/static/img/logo.png?raw=true
urls:
- https://huggingface.co/all-hands/openhands-lm-1.5b-v0.1
- https://huggingface.co/bartowski/all-hands_openhands-lm-1.5b-v0.1-GGUF
description: |
This is a smaller 1.5B model trained following the recipe of all-hands/openhands-lm-32b-v0.1. It is intended to be used for speculative decoding. Autonomous agents for software development are already contributing to a wide range of software development tasks. But up to this point, strong coding agents have relied on proprietary models, which means that even if you use an open-source agent like OpenHands, you are still reliant on API calls to an external service.
Today, we are excited to introduce OpenHands LM, a new open coding model that:
Is open and available on Hugging Face, so you can download it and run it locally
Is a reasonable size, 32B, so it can be run locally on hardware such as a single 3090 GPU
Achieves strong performance on software engineering tasks, including 37.2% resolve rate on SWE-Bench Verified
Read below for more details and our future plans!
What is OpenHands LM?
OpenHands LM is built on the foundation of Qwen Coder 2.5 Instruct 32B, leveraging its powerful base capabilities for coding tasks. What sets OpenHands LM apart is our specialized fine-tuning process:
We used training data generated by OpenHands itself on a diverse set of open-source repositories
Specifically, we use an RL-based framework outlined in SWE-Gym, where we set up a training environment, generate training data using an existing agent, and then fine-tune the model on examples that were resolved successfully
It features a 128K token context window, ideal for handling large codebases and long-horizon software engineering tasks
overrides:
parameters:
model: all-hands_openhands-lm-1.5b-v0.1-Q4_K_M.gguf
files:
- filename: all-hands_openhands-lm-1.5b-v0.1-Q4_K_M.gguf
sha256: 30abd7860c4eb5f2f51546389407b0064360862f64ea55cdf95f97c6e155b3c6
uri: huggingface://bartowski/all-hands_openhands-lm-1.5b-v0.1-GGUF/all-hands_openhands-lm-1.5b-v0.1-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "katanemo_arch-function-chat-7b"
urls:
- https://huggingface.co/katanemo/Arch-Function-Chat-7B
- https://huggingface.co/bartowski/katanemo_Arch-Function-Chat-7B-GGUF
description: |
The Arch-Function-Chat collection builds upon the Katanemo's Arch-Function collection by extending its capabilities beyond function calling. This new collection maintains the state-of-the-art(SOTA) function calling performance of the original collection while adding powerful new features that make it even more versatile in real-world applications.
In addition to function calling capabilities, this collection now offers:
Clarify & refine: Generates natural follow-up questions to collect missing information for function calling
Interpret & respond: Provides human-friendly responses based on function execution results
Context management: Maintains context in complex multi-turn interactions
Note: Arch-Function-Chat is now the primary LLM used in the open source Arch Gateway - an AI-native proxy for agents. For more details about the project, check out the Github README.
overrides:
parameters:
model: katanemo_Arch-Function-Chat-7B-Q4_K_M.gguf
files:
- filename: katanemo_Arch-Function-Chat-7B-Q4_K_M.gguf
sha256: 6fd603511076ffea3697c8a76d82c054781c5e11f134b937a66cedfc49b3d2c5
uri: huggingface://bartowski/katanemo_Arch-Function-Chat-7B-GGUF/katanemo_Arch-Function-Chat-7B-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "katanemo_arch-function-chat-1.5b"
urls:
- https://huggingface.co/katanemo/Arch-Function-Chat-1.5B
- https://huggingface.co/bartowski/katanemo_Arch-Function-Chat-1.5B-GGUF
description: |
The Arch-Function-Chat collection builds upon the Katanemo's Arch-Function collection by extending its capabilities beyond function calling. This new collection maintains the state-of-the-art(SOTA) function calling performance of the original collection while adding powerful new features that make it even more versatile in real-world applications.
In addition to function calling capabilities, this collection now offers:
Clarify & refine: Generates natural follow-up questions to collect missing information for function calling
Interpret & respond: Provides human-friendly responses based on function execution results
Context management: Maintains context in complex multi-turn interactions
Note: Arch-Function-Chat is now the primary LLM used in the open source Arch Gateway - an AI-native proxy for agents. For more details about the project, check out the Github README.
overrides:
parameters:
model: katanemo_Arch-Function-Chat-1.5B-Q4_K_M.gguf
files:
- filename: katanemo_Arch-Function-Chat-1.5B-Q4_K_M.gguf
sha256: 5bfcb72803745c374a90b0ceb60f347a8c7d1239960cce6a2d22cc1276236098
uri: huggingface://bartowski/katanemo_Arch-Function-Chat-1.5B-GGUF/katanemo_Arch-Function-Chat-1.5B-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "katanemo_arch-function-chat-3b"
urls:
- https://huggingface.co/katanemo/Arch-Function-Chat-3B
- https://huggingface.co/bartowski/katanemo_Arch-Function-Chat-3B-GGUF
description: |
The Arch-Function-Chat collection builds upon the Katanemo's Arch-Function collection by extending its capabilities beyond function calling. This new collection maintains the state-of-the-art(SOTA) function calling performance of the original collection while adding powerful new features that make it even more versatile in real-world applications.
In addition to function calling capabilities, this collection now offers:
Clarify & refine: Generates natural follow-up questions to collect missing information for function calling
Interpret & respond: Provides human-friendly responses based on function execution results
Context management: Maintains context in complex multi-turn interactions
Note: Arch-Function-Chat is now the primary LLM used in the open source Arch Gateway - an AI-native proxy for agents. For more details about the project, check out the Github README.
overrides:
parameters:
model: katanemo_Arch-Function-Chat-3B-Q4_K_M.gguf
files:
- filename: katanemo_Arch-Function-Chat-3B-Q4_K_M.gguf
sha256: f59dbef397bf1364b5f0a2c23a7f67c40ec63cc666036c4e7615fa7d79d4e1a0
uri: huggingface://bartowski/katanemo_Arch-Function-Chat-3B-GGUF/katanemo_Arch-Function-Chat-3B-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "open-thoughts_openthinker2-32b"
icon: https://huggingface.co/datasets/open-thoughts/open-thoughts-114k/resolve/main/open_thoughts.png
urls:
- https://huggingface.co/open-thoughts/OpenThinker2-32B
- https://huggingface.co/bartowski/open-thoughts_OpenThinker2-32B-GGUF
description: |
This model is a fine-tuned version of Qwen/Qwen2.5-32B-Instruct on the OpenThoughts2-1M dataset.
The OpenThinker2-32B model is the highest performing open-data model. This model improves upon our previous OpenThinker-32B model, which was trained on 114k examples from OpenThoughts-114k. The numbers reported in the table below are evaluated with our open-source tool Evalchemy.
overrides:
parameters:
model: open-thoughts_OpenThinker2-32B-Q4_K_M.gguf
files:
- filename: open-thoughts_OpenThinker2-32B-Q4_K_M.gguf
sha256: e9c7bf7cb349cfe07b4550759a3b4d7005834d0fa7580b23e483cbfeecd7a982
uri: huggingface://bartowski/open-thoughts_OpenThinker2-32B-GGUF/open-thoughts_OpenThinker2-32B-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "open-thoughts_openthinker2-7b"
icon: https://huggingface.co/datasets/open-thoughts/open-thoughts-114k/resolve/main/open_thoughts.png
urls:
- https://huggingface.co/open-thoughts/OpenThinker2-7B
- https://huggingface.co/bartowski/open-thoughts_OpenThinker2-7B-GGUF
description: |
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the OpenThoughts2-1M dataset.
The OpenThinker2-7B model is the top 7B open-data reasoning model. It delivers performance comparable to state of the art 7B models like DeepSeek-R1-Distill-7B across a suite of tasks. This model improves upon our previous OpenThinker-7B model, which was trained on 114k examples from OpenThoughts-114k. The numbers reported in the table below are evaluated with our open-source tool Evalchemy.
overrides:
parameters:
model: open-thoughts_OpenThinker2-7B-Q4_K_M.gguf
files:
- filename: open-thoughts_OpenThinker2-7B-Q4_K_M.gguf
sha256: 481d785047d66ae2eeaf14650a9e659ec4f7766a6414b6c7e92854c944201734
uri: huggingface://bartowski/open-thoughts_OpenThinker2-7B-GGUF/open-thoughts_OpenThinker2-7B-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "arliai_qwq-32b-arliai-rpr-v1"
icon: https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/albSlnUy9dPVGVuLlsBua.jpeg
urls:
- https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v1
- https://huggingface.co/bartowski/ArliAI_QwQ-32B-ArliAI-RpR-v1-GGUF
description: |
RpR (RolePlay with Reasoning) is a new series of models from ArliAI. This series builds directly upon the successful dataset curation methodology and training methods developed for the RPMax series.
RpR models use the same curated, deduplicated RP and creative writing dataset used for RPMax, with a focus on variety to ensure high creativity and minimize cross-context repetition. Users familiar with RPMax will recognize the unique, non-repetitive writing style unlike other finetuned-for-RP models.
With the release of QwQ as the first high-performing open-source reasoning model that can be easily trained, it was clear that the available instruct and creative writing reasoning datasets contain only one response per example. This type of single-response dataset used for training reasoning models causes degraded output quality in long multi-turn chats, which is why Arli AI decided to create a real RP model capable of long multi-turn chat with reasoning.
In order to create RpR, we first had to actually create the reasoning RP dataset by re-processing our existing known-good RPMax dataset into a reasoning dataset. This was possible by using the base QwQ Instruct model itself to create the reasoning process for every turn in the RPMax dataset conversation examples, which is then further refined in order to make sure the reasoning is in-line with the actual response examples from the dataset.
Another important thing to get right is to make sure the model is trained on examples that present reasoning blocks in the same way as it encounters them during inference, that is, never seeing the reasoning blocks in its context. To achieve this, the training run was completed using axolotl with a manual template-free segments dataset, so that the model is never trained to see the reasoning block in the context, just like how the model will be used at inference time.
The result of training QwQ on this dataset with this method is consistently coherent and interesting outputs, even in long multi-turn RP chats. This is, as far as we know, the first correctly-trained reasoning model trained for RP and creative writing.
overrides:
parameters:
model: ArliAI_QwQ-32B-ArliAI-RpR-v1-Q4_K_M.gguf
files:
- filename: ArliAI_QwQ-32B-ArliAI-RpR-v1-Q4_K_M.gguf
sha256: b0f2ca8f62a5d021e20db40608a109713e9d23e75b68b3b71b7654c04d596dcf
uri: huggingface://bartowski/ArliAI_QwQ-32B-ArliAI-RpR-v1-GGUF/ArliAI_QwQ-32B-ArliAI-RpR-v1-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "mensa-beta-14b-instruct-i1"
icon: https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/DyO5Fvqwvee-UM9QqgWZS.png
urls:
- https://huggingface.co/prithivMLmods/Mensa-Beta-14B-Instruct
- https://huggingface.co/mradermacher/Mensa-Beta-14B-Instruct-i1-GGUF
description: |
weighted/imatrix quants of https://huggingface.co/prithivMLmods/Mensa-Beta-14B-Instruct
overrides:
parameters:
model: Mensa-Beta-14B-Instruct.i1-Q4_K_M.gguf
files:
- filename: Mensa-Beta-14B-Instruct.i1-Q4_K_M.gguf
sha256: 86ccd640d72dcf3129fdd5b94381a733a684672b22487784e388b2ee9de57760
uri: huggingface://mradermacher/Mensa-Beta-14B-Instruct-i1-GGUF/Mensa-Beta-14B-Instruct.i1-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "cogito-v1-preview-qwen-14B"
icon: https://huggingface.co/deepcogito/cogito-v1-preview-qwen-14B/resolve/main/images/deep-cogito-logo.png
urls:
- https://huggingface.co/deepcogito/cogito-v1-preview-qwen-14B
- https://huggingface.co/NikolayKozloff/cogito-v1-preview-qwen-14B-Q4_K_M-GGUF
description: |
The Cogito LLMs are instruction tuned generative models (text in/text out). All models are released under an open license for commercial use.
Cogito models are hybrid reasoning models. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
The LLMs are trained using Iterated Distillation and Amplification (IDA) - a scalable and efficient alignment strategy for superintelligence using iterative self-improvement.
The models have been optimized for coding, STEM, instruction following and general helpfulness, and have significantly higher multilingual, coding and tool calling capabilities than size equivalent counterparts.
In both standard and reasoning modes, Cogito v1-preview models outperform their size equivalent counterparts on common industry benchmarks.
Each model is trained in over 30 languages and supports a context length of 128k.
overrides:
parameters:
model: cogito-v1-preview-qwen-14b-q4_k_m.gguf
files:
- filename: cogito-v1-preview-qwen-14b-q4_k_m.gguf
sha256: 42ddd667bac3e5f0989f52b3dca5767ed15d0e5077c6f537e4b3873862ff7096
uri: huggingface://NikolayKozloff/cogito-v1-preview-qwen-14B-Q4_K_M-GGUF/cogito-v1-preview-qwen-14b-q4_k_m.gguf
- !!merge <<: *qwen25
name: "deepcogito_cogito-v1-preview-qwen-32b"
icon: https://huggingface.co/deepcogito/cogito-v1-preview-qwen-32B/resolve/main/images/deep-cogito-logo.png
urls:
- https://huggingface.co/deepcogito/cogito-v1-preview-qwen-32B
- https://huggingface.co/bartowski/deepcogito_cogito-v1-preview-qwen-32B-GGUF
description: |
The Cogito LLMs are instruction tuned generative models (text in/text out). All models are released under an open license for commercial use.
Cogito models are hybrid reasoning models. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
The LLMs are trained using Iterated Distillation and Amplification (IDA) - a scalable and efficient alignment strategy for superintelligence using iterative self-improvement.
The models have been optimized for coding, STEM, instruction following and general helpfulness, and have significantly higher multilingual, coding and tool calling capabilities than size equivalent counterparts.
In both standard and reasoning modes, Cogito v1-preview models outperform their size equivalent counterparts on common industry benchmarks.
Each model is trained in over 30 languages and supports a context length of 128k.
overrides:
parameters:
model: deepcogito_cogito-v1-preview-qwen-32B-Q4_K_M.gguf
files:
- filename: deepcogito_cogito-v1-preview-qwen-32B-Q4_K_M.gguf
sha256: 985f2d49330090e64603309f7eb61030769f25a5da027ac0b0a740858d087ad8
uri: huggingface://bartowski/deepcogito_cogito-v1-preview-qwen-32B-GGUF/deepcogito_cogito-v1-preview-qwen-32B-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "soob3123_amoral-cogito-v1-preview-qwen-14b"
urls:
- https://huggingface.co/soob3123/amoral-cogito-v1-preview-qwen-14B
- https://huggingface.co/bartowski/soob3123_amoral-cogito-v1-preview-qwen-14B-GGUF
description: |
Key Features:
- Neutral response protocol (bias dampening layers)
- Reduced refusal rate vs base Llama-3
- Moral phrasing detection/reformulation
Use Cases:
- Controversial topic analysis
- Ethical philosophy simulations
- Academic research requiring neutral framing
overrides:
parameters:
model: soob3123_amoral-cogito-v1-preview-qwen-14B-Q4_K_M.gguf
files:
- filename: soob3123_amoral-cogito-v1-preview-qwen-14B-Q4_K_M.gguf
sha256: c01a0b0c44345011dc61212fb1c0ffdba32f85e702d2f3d4abeb2a09208d6184
uri: huggingface://bartowski/soob3123_amoral-cogito-v1-preview-qwen-14B-GGUF/soob3123_amoral-cogito-v1-preview-qwen-14B-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "tesslate_gradience-t1-3b-preview"
urls:
- https://huggingface.co/Tesslate/Gradience-T1-3B-preview
- https://huggingface.co/bartowski/Tesslate_Gradience-T1-3B-preview-GGUF
description: |
This model is still in preview/beta and is still being worked on. It is released so the community can try out the new "Gradient Reasoning" approach, which aims to break problems down and reason faster.
You can enable thinking with a system prompt: "First, think step-by-step to reach the solution. Enclose your entire reasoning process within <|begin_of_thought|> and <|end_of_thought|> tags." Suggested sampling parameters: Temp 0.76, TopP 0.62, TopK 30-68, Rep 1.0, MinP 0.05 (see the example request after this entry).
overrides:
parameters:
model: Tesslate_Gradience-T1-3B-preview-Q4_K_M.gguf
files:
- filename: Tesslate_Gradience-T1-3B-preview-Q4_K_M.gguf
sha256: 119ccefa09e3756750a983301f8bbb95e6c8fce6941a5d91490dac600f887111
uri: huggingface://bartowski/Tesslate_Gradience-T1-3B-preview-GGUF/Tesslate_Gradience-T1-3B-preview-Q4_K_M.gguf
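A minimal, hypothetical request sketch for trying the suggested system prompt and sampling parameters against a locally running LocalAI instance through its OpenAI-compatible chat endpoint. The host/port, the installed model name, and whether the backend honours the non-standard top_k/min_p fields are assumptions, not guarantees made by this gallery entry.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Request body following the OpenAI chat-completions schema that LocalAI exposes.
	payload := map[string]any{
		"model": "tesslate_gradience-t1-3b-preview", // assumed local model name (as installed from the gallery)
		"messages": []map[string]string{
			{"role": "system", "content": "First, think step-by-step to reach the solution. Enclose your entire reasoning process within <|begin_of_thought|> and <|end_of_thought|> tags."},
			{"role": "user", "content": "How many prime numbers are there below 50?"},
		},
		// Sampling parameters suggested by the model card above.
		"temperature": 0.76,
		"top_p":       0.62,
		"top_k":       48,   // card suggests 30-68; top_k is a non-standard field, assumed accepted
		"min_p":       0.05, // non-standard field, assumed accepted by the backend
	}
	body, _ := json.Marshal(payload)

	// Assumed default LocalAI address.
	resp, err := http.Post("http://localhost:8080/v1/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}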
- !!merge <<: *qwen25
name: "lightthinker-qwen"
urls:
- https://huggingface.co/zjunlp/LightThinker-Qwen
- https://huggingface.co/mradermacher/LightThinker-Qwen-GGUF
description: |
LLMs have shown remarkable performance in complex reasoning tasks, but their efficiency is hindered by the substantial memory and computational costs associated with generating lengthy tokens. In this paper, we propose LightThinker, a novel method that enables LLMs to dynamically compress intermediate thoughts during reasoning. Inspired by human cognitive processes, LightThinker compresses verbose thought steps into compact representations and discards the original reasoning chains, thereby significantly reducing the number of tokens stored in the context window. This is achieved by training the model on when and how to perform compression through data construction, mapping hidden states to condensed gist tokens, and creating specialized attention masks.
overrides:
parameters:
model: LightThinker-Qwen.Q4_K_M.gguf
files:
- filename: LightThinker-Qwen.Q4_K_M.gguf
sha256: f52f27c23fa734b1a0306efd29fcb80434364e7a1077695574e9a4f5e48b7ed2
uri: huggingface://mradermacher/LightThinker-Qwen-GGUF/LightThinker-Qwen.Q4_K_M.gguf
- !!merge <<: *qwen25
name: "mag-picaro-72b"
icon: https://cdn-uploads.huggingface.co/production/uploads/66c26b6fb01b19d8c3c2467b/hrYOp7JiH7o5ij1WEoyCZ.png
urls:
- https://huggingface.co/Delta-Vector/Mag-Picaro-72B
- https://huggingface.co/mradermacher/Mag-Picaro-72B-GGUF
description: |
A scaled-up version of Mag-Picaro, funded by PygmalionAI as an alternative to their Magnum Large option.
Fine-tuned on top of Qwen-2-Instruct, Mag-Picaro was then slerp-merged at a 50/50 weight with Magnum-V2. If you like the model, support me on Ko-Fi: https://ko-fi.com/deltavector
overrides:
parameters:
model: Mag-Picaro-72B.Q4_K_M.gguf
files:
- filename: Mag-Picaro-72B.Q4_K_M.gguf
sha256: 3fda6cf318a9082ef7b502c4384ee3ea5f9f9f44268b852a2e46d71bcea29d5a
uri: huggingface://mradermacher/Mag-Picaro-72B-GGUF/Mag-Picaro-72B.Q4_K_M.gguf
- !!merge <<: *qwen25
name: "m1-32b"
urls:
- https://huggingface.co/Can111/m1-32b
- https://huggingface.co/mradermacher/m1-32b-GGUF
description: |
M1-32B is a 32B-parameter large language model fine-tuned from Qwen2.5-32B-Instruct on the M500 dataset—an interdisciplinary multi-agent collaborative reasoning dataset. M1-32B is optimized for improved reasoning, discussion, and decision-making in multi-agent systems (MAS), including frameworks such as AgentVerse.
Code: https://github.com/jincan333/MAS-TTS
overrides:
parameters:
model: m1-32b.Q4_K_M.gguf
files:
- filename: m1-32b.Q4_K_M.gguf
sha256: 1dfa3b6822447aca590d6f2881cf277bd0fbde633a39c5a20b521f4a59145e3f
uri: huggingface://mradermacher/m1-32b-GGUF/m1-32b.Q4_K_M.gguf
- &llama31
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1
icon: https://avatars.githubusercontent.com/u/153379578
@@ -7548,6 +8094,93 @@
- filename: TextSynth-8B.i1-Q4_K_M.gguf
sha256: 9186a8cb3a797cd2cd5b2eeaee99808674d96731824a9ee45685bbf480ba56c3
uri: huggingface://mradermacher/TextSynth-8B-i1-GGUF/TextSynth-8B.i1-Q4_K_M.gguf
- !!merge <<: *llama31
name: "deepcogito_cogito-v1-preview-llama-8b"
icon: https://huggingface.co/deepcogito/cogito-v1-preview-llama-8B/resolve/main/images/deep-cogito-logo.png
urls:
- https://huggingface.co/deepcogito/cogito-v1-preview-llama-8B
- https://huggingface.co/bartowski/deepcogito_cogito-v1-preview-llama-8B-GGUF
description: |
The Cogito LLMs are instruction tuned generative models (text in/text out). All models are released under an open license for commercial use.
Cogito models are hybrid reasoning models. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
The LLMs are trained using Iterated Distillation and Amplification (IDA) - a scalable and efficient alignment strategy for superintelligence using iterative self-improvement.
The models have been optimized for coding, STEM, instruction following and general helpfulness, and have significantly higher multilingual, coding and tool calling capabilities than size equivalent counterparts.
In both standard and reasoning modes, Cogito v1-preview models outperform their size equivalent counterparts on common industry benchmarks.
Each model is trained in over 30 languages and supports a context length of 128k.
overrides:
parameters:
model: deepcogito_cogito-v1-preview-llama-8B-Q4_K_M.gguf
files:
- filename: deepcogito_cogito-v1-preview-llama-8B-Q4_K_M.gguf
sha256: 445173fb1dacef3fa0be49ebb4512b948fdb1434d86732de198424695b017b50
uri: huggingface://bartowski/deepcogito_cogito-v1-preview-llama-8B-GGUF/deepcogito_cogito-v1-preview-llama-8B-Q4_K_M.gguf
- !!merge <<: *llama31
name: "hamanasu-adventure-4b-i1"
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
icon: https://cdn-uploads.huggingface.co/production/uploads/66c26b6fb01b19d8c3c2467b/o5WjJKA9f95ri9UzRxZQE.png
urls:
- https://huggingface.co/Delta-Vector/Hamanasu-Adventure-4B
- https://huggingface.co/mradermacher/Hamanasu-Adventure-4B-i1-GGUF
description: |
Thanks to PocketDoc's Adventure datasets, and taking his Dangerous Winds models as inspiration, I was able to fine-tune a small Adventure model that HATES the User.
The model is suited for Text Adventure. All thanks to Tav for funding the training run.
Support me and my finetunes on Ko-Fi https://ko-fi.com/deltavector
overrides:
parameters:
model: Hamanasu-Adventure-4B.i1-Q4_K_M.gguf
files:
- filename: Hamanasu-Adventure-4B.i1-Q4_K_M.gguf
sha256: d4f2bb3bdd99dbfe1019368813c8b6574c4c53748ff58e1b0cc1786b32cc9f5d
uri: huggingface://mradermacher/Hamanasu-Adventure-4B-i1-GGUF/Hamanasu-Adventure-4B.i1-Q4_K_M.gguf
- !!merge <<: *llama31
name: "hamanasu-magnum-4b-i1"
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
icon: https://cdn-uploads.huggingface.co/production/uploads/66c26b6fb01b19d8c3c2467b/o5WjJKA9f95ri9UzRxZQE.png
urls:
- https://huggingface.co/Delta-Vector/Hamanasu-Magnum-4B
- https://huggingface.co/mradermacher/Hamanasu-Magnum-4B-i1-GGUF
description: |
This is a model designed to replicate the prose quality of the Claude 3 series of models, specifically Sonnet and Opus. Made with a prototype Magnum V5 datamix.
The model is suited for traditional RP. All thanks to Tav for funding the training run.
Support me and my finetunes on Ko-Fi https://ko-fi.com/deltavector
overrides:
parameters:
model: Hamanasu-Magnum-4B.i1-Q4_K_M.gguf
files:
- filename: Hamanasu-Magnum-4B.i1-Q4_K_M.gguf
sha256: 7eb6d1bfda7c0a5bf62de754323cf59f14ddd394550a5893b7bd086fd1906361
uri: huggingface://mradermacher/Hamanasu-Magnum-4B-i1-GGUF/Hamanasu-Magnum-4B.i1-Q4_K_M.gguf
- !!merge <<: *llama31
name: "nvidia_llama-3.1-8b-ultralong-1m-instruct"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/1613114437487-60262a8e0703121c822a80b6.png
urls:
- https://huggingface.co/nvidia/Llama-3.1-8B-UltraLong-1M-Instruct
- https://huggingface.co/bartowski/nvidia_Llama-3.1-8B-UltraLong-1M-Instruct-GGUF
description: |
We introduce UltraLong-8B, a series of ultra-long context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B leverages a systematic training recipe that combines efficient continued pretraining with instruction tuning to enhance long-context understanding and instruction-following capabilities. This approach enables our models to efficiently scale their context windows without sacrificing general performance.
overrides:
parameters:
model: nvidia_Llama-3.1-8B-UltraLong-1M-Instruct-Q4_K_M.gguf
files:
- filename: nvidia_Llama-3.1-8B-UltraLong-1M-Instruct-Q4_K_M.gguf
sha256: 22e59b0eff7fd7b77403027fb758f75ad41c78a4f56adc10ca39802c64fe97fa
uri: huggingface://bartowski/nvidia_Llama-3.1-8B-UltraLong-1M-Instruct-GGUF/nvidia_Llama-3.1-8B-UltraLong-1M-Instruct-Q4_K_M.gguf
- !!merge <<: *llama31
name: "nvidia_llama-3.1-8b-ultralong-4m-instruct"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/1613114437487-60262a8e0703121c822a80b6.png
urls:
- https://huggingface.co/nvidia/Llama-3.1-8B-UltraLong-4M-Instruct
- https://huggingface.co/bartowski/nvidia_Llama-3.1-8B-UltraLong-4M-Instruct-GGUF
description: |
We introduce UltraLong-8B, a series of ultra-long context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B leverages a systematic training recipe that combines efficient continued pretraining with instruction tuning to enhance long-context understanding and instruction-following capabilities. This approach enables our models to efficiently scale their context windows without sacrificing general performance.
overrides:
parameters:
model: nvidia_Llama-3.1-8B-UltraLong-4M-Instruct-Q4_K_M.gguf
files:
- filename: nvidia_Llama-3.1-8B-UltraLong-4M-Instruct-Q4_K_M.gguf
sha256: c503c77c6d8cc4be53ce7cddb756cb571862f0422594c17e58a75d7be9f00907
uri: huggingface://bartowski/nvidia_Llama-3.1-8B-UltraLong-4M-Instruct-GGUF/nvidia_Llama-3.1-8B-UltraLong-4M-Instruct-Q4_K_M.gguf
- !!merge <<: *llama33
name: "llama-3.3-magicalgirl-2.5-i1"
icon: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/FGK0qBGmELj6DEUxbbrdR.png
@@ -8026,6 +8659,115 @@
- filename: Fallen-Safeword-70B-R1-v4.1.Q4_K_M.gguf
sha256: aed6bd5bb03b7bd886939237bc10ea6331d4feb5a3b6712e0c5474a778acf817
uri: huggingface://mradermacher/Fallen-Safeword-70B-R1-v4.1-GGUF/Fallen-Safeword-70B-R1-v4.1.Q4_K_M.gguf
- !!merge <<: *deepseek-r1
name: "agentica-org_deepcoder-14b-preview"
urls:
- https://huggingface.co/agentica-org/DeepCoder-14B-Preview
- https://huggingface.co/bartowski/agentica-org_DeepCoder-14B-Preview-GGUF
description: |
DeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25), representing an 8% improvement over the base model (53%) and achieving similar performance to OpenAI's o3-mini with just 14B parameters.
overrides:
parameters:
model: agentica-org_DeepCoder-14B-Preview-Q4_K_M.gguf
files:
- filename: agentica-org_DeepCoder-14B-Preview-Q4_K_M.gguf
sha256: 38f0f777de3116ca27d10ec84388b3290a1bf3f7db8c5bdc1f92d100e4231870
uri: huggingface://bartowski/agentica-org_DeepCoder-14B-Preview-GGUF/agentica-org_DeepCoder-14B-Preview-Q4_K_M.gguf
- !!merge <<: *deepseek-r1
name: "agentica-org_deepcoder-1.5b-preview"
urls:
- https://huggingface.co/agentica-org/DeepCoder-1.5B-Preview
- https://huggingface.co/bartowski/agentica-org_DeepCoder-1.5B-Preview-GGUF
description: |
DeepCoder-1.5B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context lengths.
Data: our training dataset consists of approximately 24K unique problem-test pairs compiled from:
- Taco-Verified
- PrimeIntellect SYNTHETIC-1
- LiveCodeBench v5 (5/1/23-7/31/24)
overrides:
parameters:
model: agentica-org_DeepCoder-1.5B-Preview-Q4_K_M.gguf
files:
- filename: agentica-org_DeepCoder-1.5B-Preview-Q4_K_M.gguf
sha256: 9ddd89eddf8d56b1c16317932af56dc59b49ca2beec735d1332f5a3e0f225714
uri: huggingface://bartowski/agentica-org_DeepCoder-1.5B-Preview-GGUF/agentica-org_DeepCoder-1.5B-Preview-Q4_K_M.gguf
- !!merge <<: *deepseek-r1
name: "zyphra_zr1-1.5b"
urls:
- https://huggingface.co/Zyphra/ZR1-1.5B
- https://huggingface.co/bartowski/Zyphra_ZR1-1.5B-GGUF
description: |
ZR1-1.5B is a small reasoning model trained extensively on both verified coding and mathematics problems with reinforcement learning. The model outperforms Llama-3.1-70B-Instruct on hard coding tasks and improves upon the base R1-Distill-1.5B model by over 50%, while achieving strong scores on math evaluations and a 37.91% pass@1 accuracy on GPQA-Diamond with just 1.5B parameters.
overrides:
parameters:
model: Zyphra_ZR1-1.5B-Q4_K_M.gguf
files:
- filename: Zyphra_ZR1-1.5B-Q4_K_M.gguf
sha256: 5442a9303f651eec30d8d17cd649982ddedf3629ff4faf3bf08d187900a7e7bd
uri: huggingface://bartowski/Zyphra_ZR1-1.5B-GGUF/Zyphra_ZR1-1.5B-Q4_K_M.gguf
- !!merge <<: *deepseek-r1
name: "skywork_skywork-or1-7b-preview"
urls:
- https://huggingface.co/Skywork/Skywork-OR1-7B-Preview
- https://huggingface.co/bartowski/Skywork_Skywork-OR1-7B-Preview-GGUF
description: |
The Skywork-OR1 (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. This series includes two general-purpose reasoning models, Skywork-OR1-7B-Preview and Skywork-OR1-32B-Preview, along with a math-specialized model, Skywork-OR1-Math-7B.
Skywork-OR1-Math-7B is specifically optimized for mathematical reasoning, scoring 69.8 on AIME24 and 52.3 on AIME25 — well ahead of all models of similar size.
Skywork-OR1-32B-Preview delivers the 671B-parameter Deepseek-R1 performance on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
Skywork-OR1-7B-Preview outperforms all similarly sized models in both math and coding scenarios.
The final release version will be available in two weeks.
overrides:
parameters:
model: Skywork_Skywork-OR1-7B-Preview-Q4_K_M.gguf
files:
- filename: Skywork_Skywork-OR1-7B-Preview-Q4_K_M.gguf
sha256: 5816934378dd1b9dd3a656efedef488bfa85eeeade467f99317f7cc4cbf6ceda
uri: huggingface://bartowski/Skywork_Skywork-OR1-7B-Preview-GGUF/Skywork_Skywork-OR1-7B-Preview-Q4_K_M.gguf
- !!merge <<: *deepseek-r1
name: "skywork_skywork-or1-math-7b"
urls:
- https://huggingface.co/Skywork/Skywork-OR1-Math-7B
- https://huggingface.co/bartowski/Skywork_Skywork-OR1-Math-7B-GGUF
description: |
The Skywork-OR1 (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. This series includes two general-purpose reasoning models, Skywork-OR1-7B-Preview and Skywork-OR1-32B-Preview, along with a math-specialized model, Skywork-OR1-Math-7B.
Skywork-OR1-Math-7B is specifically optimized for mathematical reasoning, scoring 69.8 on AIME24 and 52.3 on AIME25 — well ahead of all models of similar size.
Skywork-OR1-32B-Preview delivers the 671B-parameter Deepseek-R1 performance on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
Skywork-OR1-7B-Preview outperforms all similarly sized models in both math and coding scenarios.
The final release version will be available in two weeks.
overrides:
parameters:
model: Skywork_Skywork-OR1-Math-7B-Q4_K_M.gguf
files:
- filename: Skywork_Skywork-OR1-Math-7B-Q4_K_M.gguf
sha256: 4a28cc95da712d37f1aef701f3eff5591e437beba9f89faf29b2a2e7443dd170
uri: huggingface://bartowski/Skywork_Skywork-OR1-Math-7B-GGUF/Skywork_Skywork-OR1-Math-7B-Q4_K_M.gguf
- !!merge <<: *deepseek-r1
name: "skywork_skywork-or1-32b-preview"
urls:
- https://huggingface.co/Skywork/Skywork-OR1-32B-Preview
- https://huggingface.co/bartowski/Skywork_Skywork-OR1-32B-Preview-GGUF
description: |
The Skywork-OR1 (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. This series includes two general-purpose reasoning models, Skywork-OR1-7B-Preview and Skywork-OR1-32B-Preview, along with a math-specialized model, Skywork-OR1-Math-7B.
Skywork-OR1-Math-7B is specifically optimized for mathematical reasoning, scoring 69.8 on AIME24 and 52.3 on AIME25 — well ahead of all models of similar size.
Skywork-OR1-32B-Preview delivers the 671B-parameter Deepseek-R1 performance on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
Skywork-OR1-7B-Preview outperforms all similarly sized models in both math and coding scenarios.
The final release version will be available in two weeks.
overrides:
parameters:
model: Skywork_Skywork-OR1-32B-Preview-Q4_K_M.gguf
files:
- filename: Skywork_Skywork-OR1-32B-Preview-Q4_K_M.gguf
sha256: 304d4f6e6ac6c530b7427c30b43df3d19ae6160c68582b8815efb129533c2f0c
uri: huggingface://bartowski/Skywork_Skywork-OR1-32B-Preview-GGUF/Skywork_Skywork-OR1-32B-Preview-Q4_K_M.gguf
- &qwen2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## Start QWEN2
name: "qwen2-7b-instruct"
@@ -9340,6 +10082,40 @@
- filename: BlackSheep-24B.i1-Q4_K_M.gguf
sha256: 95ae096eca05a95591254babf81b4d5617ceebbe8eda04c6cf8968ef4a69fc80
uri: huggingface://mradermacher/BlackSheep-24B-i1-GGUF/BlackSheep-24B.i1-Q4_K_M.gguf
- !!merge <<: *mistral03
name: "eurydice-24b-v2-i1"
icon: https://cdn-uploads.huggingface.co/production/uploads/652c2a63d78452c4742cd3d3/Hm_tg4s0D6yWmtrTHII32.png
urls:
- https://huggingface.co/aixonlab/Eurydice-24b-v2
- https://huggingface.co/mradermacher/Eurydice-24b-v2-i1-GGUF
description: |
Eurydice 24b v2 is designed to be the perfect companion for multi-role conversations. It demonstrates exceptional contextual understanding and excels in creativity, natural conversation and storytelling. Built on Mistral 3.1, this model has been trained on a custom dataset specifically crafted to enhance its capabilities.
overrides:
parameters:
model: Eurydice-24b-v2.i1-Q4_K_M.gguf
files:
- filename: Eurydice-24b-v2.i1-Q4_K_M.gguf
sha256: fb4104a1b33dd860e1eca3b6906a10cacc5b91a2534db72d9749652a204fbcbf
uri: huggingface://mradermacher/Eurydice-24b-v2-i1-GGUF/Eurydice-24b-v2.i1-Q4_K_M.gguf
- !!merge <<: *mistral03
name: "trappu_magnum-picaro-0.7-v2-12b"
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
urls:
- https://huggingface.co/Trappu/Magnum-Picaro-0.7-v2-12b
- https://huggingface.co/bartowski/Trappu_Magnum-Picaro-0.7-v2-12b-GGUF
description: |
This model is a merge between Trappu/Nemo-Picaro-12B, a model trained on my own little dataset free of synthetic data, which focuses solely on storywriting and scenario prompting (example: [ Scenario: bla bla bla; Tags: bla bla bla ]), and anthracite-org/magnum-v2-12b.
The reason I decided to merge it with Magnum (and don't recommend Picaro alone) is that Picaro, aside from its obvious flaws (rampant impersonation, stupidity, etc.), is a one-trick pony and will be really rough for the average LLM user to handle. The idea was to have Magnum work as a stabilizer to fix the issues that emerge from the lack of multiturn/smart data in Picaro's dataset. It worked, I think. I enjoy the outputs and it's smart enough to work with.
The goal of this merge was to make a model that's good at storytelling/narration but also capable at other forms of creative writing such as RP or chatting. I don't think it's quite there yet, but it's something for sure.
overrides:
parameters:
model: Trappu_Magnum-Picaro-0.7-v2-12b-Q4_K_M.gguf
files:
- filename: Trappu_Magnum-Picaro-0.7-v2-12b-Q4_K_M.gguf
sha256: 989839dd7eab997a70eb8430b9df1138f9b0f35d58299d5007e6555a4a4a7f4c
uri: huggingface://bartowski/Trappu_Magnum-Picaro-0.7-v2-12b-GGUF/Trappu_Magnum-Picaro-0.7-v2-12b-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"

View File

@@ -74,10 +74,9 @@ Version: ${version}
),
kong.UsageOnError(),
kong.Vars{
"basepath": kong.ExpandPath("."),
"remoteLibraryURL": "https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/model_library.yaml",
"galleries": `[{"name":"localai", "url":"github:mudler/LocalAI/gallery/index.yaml@master"}]`,
"version": internal.PrintableVersion(),
"basepath": kong.ExpandPath("."),
"galleries": `[{"name":"localai", "url":"github:mudler/LocalAI/gallery/index.yaml@master"}]`,
"version": internal.PrintableVersion(),
},
)

View File

@@ -473,8 +473,6 @@ func (ml *ModelLoader) backendLoader(opts ...Option) (client grpc.Backend, err e
backend = realBackend
}
ml.stopActiveBackends(o.modelID, o.singleActiveBackend)
var backendToConsume string
switch backend {
@@ -497,17 +495,37 @@ func (ml *ModelLoader) backendLoader(opts ...Option) (client grpc.Backend, err e
}
func (ml *ModelLoader) stopActiveBackends(modelID string, singleActiveBackend bool) {
if !singleActiveBackend {
return
}
// If we can have only one backend active, kill all the others (except external backends)
if singleActiveBackend {
log.Debug().Msgf("Stopping all backends except '%s'", modelID)
err := ml.StopGRPC(allExcept(modelID))
if err != nil {
log.Error().Err(err).Str("keptModel", modelID).Msg("error while shutting down all backends except for the keptModel - greedyloader continuing")
}
// Stop all backends except the one we are going to load
log.Debug().Msgf("Stopping all backends except '%s'", modelID)
err := ml.StopGRPC(allExcept(modelID))
if err != nil {
log.Error().Err(err).Str("keptModel", modelID).Msg("error while shutting down all backends except for the keptModel - greedyloader continuing")
}
}
func (ml *ModelLoader) Close() {
if !ml.singletonMode {
return
}
ml.singletonLock.Unlock()
}
func (ml *ModelLoader) lockBackend() {
if !ml.singletonMode {
return
}
ml.singletonLock.Lock()
}
func (ml *ModelLoader) Load(opts ...Option) (grpc.Backend, error) {
ml.lockBackend() // grab the singleton lock if needed
o := NewOptions(opts...)
// Return earlier if we have a model already loaded
@@ -518,17 +536,20 @@ func (ml *ModelLoader) Load(opts ...Option) (grpc.Backend, error) {
return m.GRPC(o.parallelRequests, ml.wd), nil
}
ml.stopActiveBackends(o.modelID, o.singleActiveBackend)
ml.stopActiveBackends(o.modelID, ml.singletonMode)
// if a backend is defined, return the loader directly
if o.backendString != "" {
return ml.backendLoader(opts...)
}
// Otherwise scan for backends in the asset directory
var err error
// get backends embedded in the binary
autoLoadBackends, err := ml.ListAvailableBackends(o.assetDir)
if err != nil {
ml.Close() // we failed, release the lock
return nil, err
}
@@ -560,5 +581,7 @@ func (ml *ModelLoader) Load(opts ...Option) (grpc.Backend, error) {
}
}
ml.Close() // make sure to release the lock in case of failure
return nil, fmt.Errorf("could not load model - all backends returned error: %s", err.Error())
}

View File

@@ -18,16 +18,19 @@ import (
// TODO: Split ModelLoader and TemplateLoader? Just to keep things more organized. Left together to share a mutex until I look into that. Would split if we separate directories for .bin/.yaml and .tmpl
type ModelLoader struct {
ModelPath string
mu sync.Mutex
models map[string]*Model
wd *WatchDog
ModelPath string
mu sync.Mutex
singletonLock sync.Mutex
singletonMode bool
models map[string]*Model
wd *WatchDog
}
func NewModelLoader(modelPath string) *ModelLoader {
func NewModelLoader(modelPath string, singleActiveBackend bool) *ModelLoader {
nml := &ModelLoader{
ModelPath: modelPath,
models: make(map[string]*Model),
ModelPath: modelPath,
models: make(map[string]*Model),
singletonMode: singleActiveBackend,
}
return nml
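For context, a minimal usage sketch (not part of this diff) of the updated constructor: the single-active-backend behaviour is now decided once when the loader is built, rather than being passed as a per-Load option. The import path and the exact point where the singleton lock is released on the success path are assumptions.

package main

import (
	"log"

	"github.com/mudler/LocalAI/pkg/model" // assumed import path for this package
)

func main() {
	// true enables singleton mode: only one backend may be active at a time.
	loader := model.NewModelLoader("/models", true)

	backend, err := loader.Load(
		model.WithModel("my-model.gguf"), // model file under the model path
		model.WithModelID("my-model"),    // logical ID kept alive while other backends are stopped
	)
	if err != nil {
		log.Fatal(err)
	}
	// Close releases the singleton lock; on failure Load already calls it internally,
	// so releasing it here once the caller is done with the backend is an assumption.
	defer loader.Close()

	_ = backend // use the gRPC backend client here
}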
@@ -142,26 +145,6 @@ func (ml *ModelLoader) LoadModel(modelID, modelName string, loader func(string,
func (ml *ModelLoader) ShutdownModel(modelName string) error {
ml.mu.Lock()
defer ml.mu.Unlock()
model, ok := ml.models[modelName]
if !ok {
return fmt.Errorf("model %s not found", modelName)
}
retries := 1
for model.GRPC(false, ml.wd).IsBusy() {
log.Debug().Msgf("%s busy. Waiting.", modelName)
dur := time.Duration(retries*2) * time.Second
if dur > retryTimeout {
dur = retryTimeout
}
time.Sleep(dur)
retries++
if retries > 10 && os.Getenv("LOCALAI_FORCE_BACKEND_SHUTDOWN") == "true" {
log.Warn().Msgf("Model %s is still busy after %d retries. Forcing shutdown.", modelName, retries)
break
}
}
return ml.deleteProcess(modelName)
}

View File

@@ -17,10 +17,9 @@ type Options struct {
externalBackends map[string]string
grpcAttempts int
grpcAttemptsDelay int
singleActiveBackend bool
parallelRequests bool
grpcAttempts int
grpcAttemptsDelay int
parallelRequests bool
}
type Option func(*Options)
@@ -88,12 +87,6 @@ func WithContext(ctx context.Context) Option {
}
}
func WithSingleActiveBackend() Option {
return func(o *Options) {
o.singleActiveBackend = true
}
}
func WithModelID(id string) Option {
return func(o *Options) {
o.modelID = id

View File

@@ -21,7 +21,7 @@ var _ = Describe("ModelLoader", func() {
// Setup the model loader with a test directory
modelPath = "/tmp/test_model_path"
os.Mkdir(modelPath, 0755)
modelLoader = model.NewModelLoader(modelPath)
modelLoader = model.NewModelLoader(modelPath, false)
})
AfterEach(func() {

View File

@@ -9,25 +9,43 @@ import (
"strconv"
"strings"
"syscall"
"time"
"github.com/hpcloud/tail"
process "github.com/mudler/go-processmanager"
"github.com/rs/zerolog/log"
)
var forceBackendShutdown bool = os.Getenv("LOCALAI_FORCE_BACKEND_SHUTDOWN") == "true"
func (ml *ModelLoader) deleteProcess(s string) error {
model, ok := ml.models[s]
if !ok {
log.Debug().Msgf("Model %s not found", s)
return fmt.Errorf("model %s not found", s)
}
defer delete(ml.models, s)
retries := 1
for model.GRPC(false, ml.wd).IsBusy() {
log.Debug().Msgf("%s busy. Waiting.", s)
dur := time.Duration(retries*2) * time.Second
if dur > retryTimeout {
dur = retryTimeout
}
time.Sleep(dur)
retries++
if retries > 10 && forceBackendShutdown {
log.Warn().Msgf("Model %s is still busy after %d retries. Forcing shutdown.", s, retries)
break
}
}
log.Debug().Msgf("Deleting process %s", s)
m, exists := ml.models[s]
if !exists {
log.Error().Msgf("Model does not exist %s", s)
// Nothing to do
return nil
}
process := m.Process()
process := model.Process()
if process == nil {
log.Error().Msgf("No process for %s", s)
// Nothing to do as there is no process
@@ -44,9 +62,12 @@ func (ml *ModelLoader) deleteProcess(s string) error {
func (ml *ModelLoader) StopGRPC(filter GRPCProcessFilter) error {
var err error = nil
ml.mu.Lock()
defer ml.mu.Unlock()
for k, m := range ml.models {
if filter(k, m.Process()) {
e := ml.ShutdownModel(k)
e := ml.deleteProcess(k)
err = errors.Join(err, e)
}
}

View File

@@ -70,7 +70,7 @@ var _ = Describe("Integration tests for the stores backend(s) and internal APIs"
model.WithModel("test"),
}
sl = model.NewModelLoader("")
sl = model.NewModelLoader("", false)
sc, err = sl.Load(storeOpts...)
Expect(err).ToNot(HaveOccurred())
Expect(sc).ToNot(BeNil())
@@ -235,7 +235,7 @@ var _ = Describe("Integration tests for the stores backend(s) and internal APIs"
keys := [][]float32{{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}, {-1.0, 0.0, 0.0}}
vals := [][]byte{[]byte("x"), []byte("y"), []byte("z"), []byte("-z")}
err := store.SetCols(context.Background(), sc, keys, vals);
err := store.SetCols(context.Background(), sc, keys, vals)
Expect(err).ToNot(HaveOccurred())
_, _, sims, err := store.Find(context.Background(), sc, keys[0], 4)
@@ -247,7 +247,7 @@ var _ = Describe("Integration tests for the stores backend(s) and internal APIs"
keys := [][]float32{{1.0, 0.0, 1.0}, {0.0, 2.0, 0.0}, {0.0, 0.0, -1.0}, {-1.0, 0.0, -1.0}}
vals := [][]byte{[]byte("x"), []byte("y"), []byte("z"), []byte("-z")}
err := store.SetCols(context.Background(), sc, keys, vals);
err := store.SetCols(context.Background(), sc, keys, vals)
Expect(err).ToNot(HaveOccurred())
_, _, sims, err := store.Find(context.Background(), sc, keys[0], 4)
@@ -314,7 +314,7 @@ var _ = Describe("Integration tests for the stores backend(s) and internal APIs"
normalize(keys[6:])
err := store.SetCols(context.Background(), sc, keys, vals);
err := store.SetCols(context.Background(), sc, keys, vals)
Expect(err).ToNot(HaveOccurred())
expectTriangleEq(keys, vals)
@@ -341,7 +341,7 @@ var _ = Describe("Integration tests for the stores backend(s) and internal APIs"
c += 1
}
err := store.SetCols(context.Background(), sc, keys, vals);
err := store.SetCols(context.Background(), sc, keys, vals)
Expect(err).ToNot(HaveOccurred())
expectTriangleEq(keys, vals)