Compare commits


28 Commits

Author SHA1 Message Date
LocalAI [bot]
9ecfdc5938 chore: ⬆️ Update ggml-org/llama.cpp to 31c511a968348281e11d590446bb815048a1e912 (#6970)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-31 21:04:53 +00:00
Ettore Di Giacinto
c332ef5cce chore: fix linting issues
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 19:08:34 +01:00
Ettore Di Giacinto
6e7a8c6041 chore(model gallery): add qwen3-vl-2b-instruct (#6967)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 19:04:10 +01:00
Ettore Di Giacinto
43e707ec4f chore(model gallery): add qwen3-vl-2b-thinking (#6966)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 19:03:23 +01:00
Ettore Di Giacinto
fed3663a74 chore(model gallery): add qwen3-vl-4b-thinking (#6965)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 19:02:22 +01:00
Ettore Di Giacinto
5b72798db3 chore(model gallery): add qwen3-vl-32b-instruct (#6964)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 19:01:11 +01:00
Ettore Di Giacinto
d24d6d4e93 chore(model gallery): add qwen3-vl-4b-instruct (#6963)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 18:57:50 +01:00
Ettore Di Giacinto
50ee1fbe06 chore(model gallery): add qwen3-vl-30b-a3b-thinking (#6962)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 18:53:13 +01:00
Ettore Di Giacinto
19f3425ce0 chore(model gallery): add huihui-qwen3-vl-30b-a3b-instruct-abliterated (#6961)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 18:46:25 +01:00
Ettore Di Giacinto
a6ef245534 chore(model gallery): add qwen3-vl-30b-a3b-instruct (#6960)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-31 18:37:12 +01:00
LocalAI [bot]
88cb379c2d chore(model gallery): 🤖 add 1 new models via gallery agent (#6940)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-31 16:57:18 +01:00
LocalAI [bot]
0ddb2e8dcf chore: ⬆️ Update ggml-org/llama.cpp to 4146d6a1a6228711a487a1e3e9ddd120f8d027d7 (#6945)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-31 14:51:03 +00:00
Ettore Di Giacinto
91b9301bec Rename workflow from 'Bump dependencies' to 'Bump Documentation'
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-10-31 14:40:50 +01:00
Ettore Di Giacinto
fad5868f7b Rename job to 'bump-backends' in workflow
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-10-31 14:40:34 +01:00
LocalAI [bot]
1e5b9135df chore: ⬆️ Update ggml-org/llama.cpp to 16724b5b6836a2d4b8936a5824d2ff27c52b4517 (#6925)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 21:07:33 +00:00
LocalAI [bot]
36d19e23e0 chore(model gallery): 🤖 add 1 new models via gallery agent (#6921)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 18:58:08 +01:00
LocalAI [bot]
cba9d1aac0 chore(model gallery): 🤖 add 1 new models via gallery agent (#6919)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 17:26:18 +01:00
LocalAI [bot]
dd21a0d2f9 chore: ⬆️ Update ggml-org/llama.cpp to 3464bdac37027c5e9661621fc75ffcef3c19c6ef (#6896)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 14:17:58 +01:00
LocalAI [bot]
302a43b3ae chore(model gallery): 🤖 add 1 new models via gallery agent (#6911)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 09:54:24 +01:00
LocalAI [bot]
2955061b42 chore(model gallery): 🤖 add 1 new models via gallery agent (#6910)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 09:39:31 +01:00
LocalAI [bot]
84644ab693 chore(model gallery): 🤖 add 1 new models via gallery agent (#6908)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 09:20:23 +01:00
Ettore Di Giacinto
b8f40dde1e feat: do also text match (#6891)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-29 17:18:56 +01:00
LocalAI [bot]
a6c9789a54 chore(model gallery): 🤖 add 1 new models via gallery agent (#6884)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-29 10:56:57 +01:00
LocalAI [bot]
a48d9ce27c chore(model gallery): 🤖 add 1 new models via gallery agent (#6879)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-29 08:19:51 +01:00
LocalAI [bot]
fb825a2708 chore: ⬆️ Update ggml-org/llama.cpp to 851553ea6b24cb39fd5fd188b437d777cb411de8 (#6869)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-29 08:16:55 +01:00
LocalAI [bot]
5558dce449 chore: ⬆️ Update ggml-org/whisper.cpp to c62adfbd1ecdaea9e295c72d672992514a2d887c (#6868)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-28 21:12:05 +00:00
LocalAI [bot]
cf74a11e65 chore(model gallery): 🤖 add 1 new models via gallery agent (#6864)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-28 17:20:57 +01:00
LocalAI [bot]
86b5deec81 chore(model gallery): 🤖 add 1 new models via gallery agent (#6863)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-28 16:23:57 +01:00
7 changed files with 586 additions and 10 deletions

View File

@@ -1,10 +1,10 @@
-name: Bump dependencies
+name: Bump Backend dependencies
 on:
   schedule:
     - cron: 0 20 * * *
   workflow_dispatch:
 jobs:
-  bump:
+  bump-backends:
     strategy:
       fail-fast: false
       matrix:

View File

@@ -1,10 +1,10 @@
-name: Bump dependencies
+name: Bump Documentation
 on:
   schedule:
     - cron: 0 20 * * *
   workflow_dispatch:
 jobs:
-  bump:
+  bump-docs:
     strategy:
       fail-fast: false
       matrix:

View File

@@ -1,5 +1,5 @@
-LLAMA_VERSION?=5a4ff43e7dd049e35942bc3d12361dab2f155544
+LLAMA_VERSION?=31c511a968348281e11d590446bb815048a1e912
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
 CMAKE_ARGS?=

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=f16c12f3f55f5bd3d6ac8cf2f31ab90a42c884d5
+WHISPER_CPP_VERSION?=c62adfbd1ecdaea9e295c72d672992514a2d887c
 SO_TARGET?=libgowhisper.so
 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
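
Both version pins use make's `?=` conditional assignment, so the committed SHAs are only defaults: a command-line assignment overrides them for a one-off build against a different upstream commit. An illustrative invocation (the exact targets depend on the backend's Makefile):

```sh
# Build against a different upstream pin without editing the Makefile
# (illustrative; <commit-sha> is a placeholder, run from the backend directory).
make WHISPER_CPP_VERSION=<commit-sha>
```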

View File

@@ -61,12 +61,15 @@ func (gm GalleryElements[T]) Search(term string) GalleryElements[T] {
 	term = strings.ToLower(term)
 	for _, m := range gm {
 		if fuzzy.Match(term, strings.ToLower(m.GetName())) ||
 			fuzzy.Match(term, strings.ToLower(m.GetDescription())) ||
 			fuzzy.Match(term, strings.ToLower(m.GetGallery().Name)) ||
+			strings.Contains(strings.ToLower(m.GetName()), term) ||
+			strings.Contains(strings.ToLower(m.GetDescription()), term) ||
+			strings.Contains(strings.ToLower(m.GetGallery().Name), term) ||
 			strings.Contains(strings.ToLower(strings.Join(m.GetTags(), ",")), term) {
 			filteredModels = append(filteredModels, m)
 		}
 	}
 	return filteredModels
 }
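
The widened predicate accepts either a fuzzy hit or a plain substring hit on the name, description, and gallery name. A minimal sketch of the resulting behavior, assuming `fuzzy` is github.com/lithammer/fuzzysearch/fuzzy (the import is outside this hunk, so that is an assumption):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/lithammer/fuzzysearch/fuzzy"
)

// matches mirrors the widened predicate above: accept either a fuzzy
// (ordered-subsequence) hit or a plain substring hit, case-insensitively.
func matches(term, field string) bool {
	term = strings.ToLower(term)
	field = strings.ToLower(field)
	return fuzzy.Match(term, field) || strings.Contains(field, term)
}

func main() {
	fmt.Println(matches("q3vl", "Qwen3-VL-2B-Instruct"))    // true: fuzzy subsequence
	fmt.Println(matches("vl-2b", "Qwen3-VL-2B-Instruct"))   // true: exact substring
	fmt.Println(matches("granite", "Qwen3-VL-2B-Instruct")) // false: neither
}
```

Since an exact substring is itself an ordered subsequence, the added `strings.Contains` clauses mostly make plain-text containment an explicit guarantee, independent of the fuzzy matcher's exact semantics.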

View File

@@ -1,4 +1,186 @@
---
- &qwen3vl
url: "github:mudler/LocalAI/gallery/qwen3.yaml@master"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
license: apache-2.0
tags:
- llm
- gguf
- gpu
- image-to-text
- multimodal
- cpu
- qwen
- qwen3
- thinking
- reasoning
name: "qwen3-vl-30b-a3b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF
description: |
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions for flexible, on-demand deployment.
#### Key Enhancements:
* **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks.
* **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
* **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
* **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
* **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers.
* **Upgraded Visual Recognition**: Broader, higher-quality pretraining enables it to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc.
* **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
* **Text Understanding on par with pure LLMs**: Seamless text-vision fusion for lossless, unified comprehension.
#### Model Architecture Updates:
1. **Interleaved-MRoPE**: Full-frequency allocation over time, width, and height via robust positional embeddings, enhancing long-horizon video reasoning.
2. **DeepStack**: Fuses multi-level ViT features to capture fine-grained details and sharpen image-text alignment.
3. **Text-Timestamp Alignment:** Moves beyond T-RoPE to precise, timestamp-grounded event localization for stronger video temporal modeling.
This is the weight repository for Qwen3-VL-30B-A3B-Instruct.
overrides:
mmproj: mmproj/mmproj-F16.gguf
parameters:
model: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
sha256: 75d8f4904016d90b71509c8576ebd047a0606cc5aa788eada29d4bedf9b761a6
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
- filename: mmproj/mmproj-F16.gguf
sha256: 7e7cec67a3a887bddbf38099738d08570e85f08dd126578fa00a7acf4dacef01
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-30b-a3b-thinking"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF
description: |
Qwen3-VL-30B-A3B-Thinking is the reasoning-enhanced Thinking edition of the 30B-A3B (MoE) model in the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-F16.gguf
parameters:
model: Qwen3-VL-30B-A3B-Thinking-Q4_K_M.gguf
files:
- filename: Qwen3-VL-30B-A3B-Thinking-Q4_K_M.gguf
sha256: d3e12c6b15f59cc1c6db685d33eb510184d006ebbff0e038e7685e57ce628b3b
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/Qwen3-VL-30B-A3B-Thinking-Q4_K_M.gguf
- filename: mmproj/mmproj-F16.gguf
sha256: 7e7cec67a3a887bddbf38099738d08570e85f08dd126578fa00a7acf4dacef01
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-4b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-4B-Instruct-GGUF
description: |
Qwen3-VL-4B-Instruct is the 4B parameter model of the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-4B-Instruct-F16.gguf
parameters:
model: Qwen3-VL-4B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-4B-Instruct-Q4_K_M.gguf
sha256: d4dcd426bfba75752a312b266b80fec8136fbaca13c62d93b7ac41fa67f0492b
uri: huggingface://unsloth/Qwen3-VL-4B-Instruct-GGUF/Qwen3-VL-4B-Instruct-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-4B-Instruct-F16.gguf
sha256: 1b9f4e92f0fbda14d7d7b58baed86039b8a980fe503d9d6a9393f25c0028f1fc
uri: huggingface://unsloth/Qwen3-VL-4B-Instruct-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-32b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-32B-Instruct-GGUF
description: |
Qwen3-VL-32B-Instruct is the 32B parameter model of the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-32B-Instruct-F16.gguf
parameters:
model: Qwen3-VL-32B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-32B-Instruct-Q4_K_M.gguf
sha256: 17885d28e964b22b2faa981a7eaeeeb78da0972ee5f826ad5965f7583a610d9f
uri: huggingface://unsloth/Qwen3-VL-32B-Instruct-GGUF/Qwen3-VL-32B-Instruct-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-32B-Instruct-F16.gguf
sha256: 14b1d68befa75a5e646dd990c5bb429c912b7aa9b49b9ab18231ca5f750421c9
uri: huggingface://unsloth/Qwen3-VL-32B-Instruct-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-4b-thinking"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-4B-Thinking-GGUF
description: |
Qwen3-VL-4B-Thinking is the reasoning-enhanced Thinking edition of the 4B model in the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-4B-Thinking-F16.gguf
parameters:
model: Qwen3-VL-4B-Thinking-Q4_K_M.gguf
files:
- filename: Qwen3-VL-4B-Thinking-Q4_K_M.gguf
sha256: bd73237f16265a1014979b7ed34ff9265e7e200ae6745bb1da383a1bbe0f9211
uri: huggingface://unsloth/Qwen3-VL-4B-Thinking-GGUF/Qwen3-VL-4B-Thinking-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-4B-Thinking-F16.gguf
sha256: 72354fcd3fc75935b84e745ca492d6e78dd003bb5a020d71b296e7650926ac87
uri: huggingface://unsloth/Qwen3-VL-4B-Thinking-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-2b-thinking"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-2B-Thinking-GGUF
description: |
Qwen3-VL-2B-Thinking is the reasoning-enhanced Thinking edition of the 2B model in the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-2B-Thinking-F16.gguf
parameters:
model: Qwen3-VL-2B-Thinking-Q4_K_M.gguf
files:
- filename: Qwen3-VL-2B-Thinking-Q4_K_M.gguf
sha256: 5f282086042d96b78b138839610f5148493b354524090fadc5c97c981b70a26e
uri: huggingface://unsloth/Qwen3-VL-2B-Thinking-GGUF/Qwen3-VL-2B-Thinking-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-2B-Thinking-F16.gguf
sha256: 4eabc90a52fe890d6ca1dad92548782eab6edc91f012a365fff95cf027ba529d
uri: huggingface://unsloth/Qwen3-VL-2B-Thinking-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-2b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-2B-Instruct-GGUF
description: |
Qwen3-VL-2B-Instruct is the 2B parameter model of the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-2B-Instruct-F16.gguf
parameters:
model: Qwen3-VL-2B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-2B-Instruct-Q4_K_M.gguf
sha256: 858fcf2a39dc73b26dd86592cb0a5f949b59d1edb365d1dea98e46b02e955e56
uri: huggingface://unsloth/Qwen3-VL-2B-Instruct-GGUF/Qwen3-VL-2B-Instruct-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-2B-Instruct-F16.gguf
sha256: cd5a851d3928697fa1bd76d459d2cc409b6cf40c9d9682b2f5c8e7c6a9f9630f
uri: huggingface://unsloth/Qwen3-VL-2B-Instruct-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "huihui-qwen3-vl-30b-a3b-instruct-abliterated"
urls:
- https://huggingface.co/noctrex/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF
description: |
These are quantizations of the model Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated.
overrides:
mmproj: mmproj/mmproj-Huihui-Qwen3-VL-30B-A3B-F16.gguf
parameters:
model: Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-Q4_K_M.gguf
files:
- filename: Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-Q4_K_M.gguf
sha256: 1e94a65167a39d2ff4427393746d4dbc838f3d163c639d932e9ce983f575eabf
uri: huggingface://noctrex/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-Q4_K_M.gguf
- filename: mmproj/mmproj-Huihui-Qwen3-VL-30B-A3B-F16.gguf
sha256: 4bfd655851a5609b29201154e0bd4fe5f9274073766b8ab35b3a8acba0dd77a7
uri: huggingface://noctrex/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF/mmproj-F16.gguf
- &jamba
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/65e60c0ed5313c06372446ff/QwehUHgP2HtVAMW5MzJ2j.png
name: "ai21labs_ai21-jamba-reasoning-3b"
@@ -22795,3 +22977,389 @@
- filename: wraith-8b.i1-Q4_K_M.gguf
sha256: 180469f9de3e1b5a77b7cf316899dbe4782bd5e6d4f161fb18ea95aa612e6926
uri: huggingface://mradermacher/wraith-8b-i1-GGUF/wraith-8b.i1-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "pokee_research_7b"
urls:
- https://huggingface.co/Mungert/pokee_research_7b-GGUF
description: |
**Model Name:** Qwen2.5-7B-Instruct
**Base Model:** Qwen/Qwen2.5-7B
**Model Type:** Instruction-tuned large language model (7.61B parameters)
**License:** Apache 2.0
**Description:**
Qwen2.5-7B-Instruct is a powerful, instruction-following language model designed for advanced reasoning, coding, and multi-turn dialogue. Built on the Qwen2.5 architecture, it delivers state-of-the-art performance in understanding complex prompts, generating long-form text (up to 8K tokens), and handling structured outputs like JSON. It supports multilingual communication (29+ languages), including English, Chinese, and European languages, and excels in long-context tasks with support for up to 131,072 tokens.
Ideal for research, creative writing, coding assistance, and agent-based workflows, this model is optimized for real-world applications requiring robustness, accuracy, and scalability.
**Key Features:**
- 7.61 billion parameters
- Context length: 131K tokens (supports long-context via YaRN)
- Strong performance in math, coding, and factual reasoning
- Fine-tuned for instruction following and chat interactions
- Deployable with Hugging Face Transformers, vLLM, and llama.cpp
**Use Case:**
Perfect for developers, researchers, and enterprises building intelligent assistants, autonomous agents, or content generation systems.
**Citation:**
```bibtex
@misc{qwen2.5,
title = {Qwen2.5: A Party of Foundation Models},
url = {https://qwenlm.github.io/blog/qwen2.5/},
author = {Qwen Team},
month = {September},
year = {2024}
}
```
overrides:
parameters:
model: pokee_research_7b-q4_k_m.gguf
files:
- filename: pokee_research_7b-q4_k_m.gguf
sha256: 670706711d82fcdbae951fda084f77c9c479edf3eb5d8458d1cfddd46cf4b767
uri: huggingface://Mungert/pokee_research_7b-GGUF/pokee_research_7b-q4_k_m.gguf
- !!merge <<: *qwen3
name: "deepkat-32b-i1"
urls:
- https://huggingface.co/mradermacher/DeepKAT-32B-i1-GGUF
description: |
**DeepKAT-32B** is a high-performance, open-source coding agent built by merging two leading RL-tuned models—**DeepSWE-Preview** and **KAT-Dev**—on the **Qwen3-32B** base architecture using Arcee MergeKit's TIES method. This 32B-parameter model excels in complex software engineering tasks, including code generation, bug fixing, refactoring, and autonomous agent workflows with tool use.
Key strengths:
- Achieves ~62% SWE-Bench Verified score (on par with top open-source models).
- Strong performance in multi-file reasoning, multi-turn planning, and sparse reward environments.
- Optimized for agentic behavior with step-by-step reasoning and tool chaining.
Ideal for developers, AI researchers, and teams building intelligent code assistants or autonomous software agents.
> 🔗 **Base Model**: Qwen/Qwen3-32B
> 🛠️ **Built With**: MergeKit (TIES), RL-finetuned components
> 📊 **Benchmarks**: SWE-Bench Verified: ~62%, HumanEval Pass@1: ~85%
*Note: The model is a merge of two RL-tuned models and not a direct training from scratch.*
overrides:
parameters:
model: mradermacher/DeepKAT-32B-i1-GGUF
- !!merge <<: *granite4
name: "ibm-granite.granite-4.0-1b"
urls:
- https://huggingface.co/DevQuasar/ibm-granite.granite-4.0-1b-GGUF
description: |
### **Granite-4.0-1B**
*By IBM | Apache 2.0 License*
**Overview:**
Granite-4.0-1B is a lightweight, instruction-tuned language model designed for efficient on-device and research use. Built on a decoder-only dense transformer architecture, it delivers strong performance in instruction following, code generation, tool calling, and multilingual tasks—making it ideal for applications requiring low latency and minimal resource usage.
**Key Features:**
- **Size:** 1.6 billion parameters (1B Dense), optimized for efficiency.
- **Capabilities:**
- Text generation, summarization, question answering
- Code completion and function calling (e.g., API integration)
- Multilingual support (English, Spanish, French, German, Japanese, Chinese, Arabic, Korean, Portuguese, Italian, Dutch, Czech)
- Robust safety and alignment via instruction tuning and reinforcement learning
- **Architecture:** Uses GQA (Grouped Query Attention), SwiGLU activation, RMSNorm, shared input/output embeddings, and RoPE position embeddings.
- **Context Length:** Up to 128K tokens — suitable for long-form content and complex reasoning.
- **Training:** Finetuned from *Granite-4.0-1B-Base* using open-source datasets, synthetic data, and human-curated instruction pairs.
**Performance Highlights (1B Dense):**
- **MMLU (5-shot):** 59.39
- **HumanEval (pass@1):** 74
- **IFEval (Alignment):** 80.82
- **GSM8K (8-shot):** 76.35
- **SALAD-Bench (Safety):** 93.44
**Use Cases:**
- On-device AI applications
- Research and prototyping
- Fine-tuning for domain-specific tasks
- Low-resource environments with high performance expectations
**Resources:**
- [Hugging Face Model](https://huggingface.co/ibm-granite/granite-4.0-1b)
- [Granite Docs](https://www.ibm.com/granite/docs/)
- [GitHub Repository](https://github.com/ibm-granite/granite-4.0-nano-language-models)
> *“Make knowledge free for everyone.” – IBM Granite Team*
overrides:
parameters:
model: ibm-granite.granite-4.0-1b.Q4_K_M.gguf
files:
- filename: ibm-granite.granite-4.0-1b.Q4_K_M.gguf
sha256: 0e0ef42486b7f1f95dfe33af2e696df1149253e500c48f3fb8db0125afa2922c
uri: huggingface://DevQuasar/ibm-granite.granite-4.0-1b-GGUF/ibm-granite.granite-4.0-1b.Q4_K_M.gguf
- !!merge <<: *qwen3
name: "apollo-astralis-4b-i1"
urls:
- https://huggingface.co/mradermacher/apollo-astralis-4b-i1-GGUF
description: |
**Apollo-Astralis V1 4B**
*A warm, enthusiastic, and empathetic reasoning model built on Qwen3-4B-Thinking*
**Overview**
Apollo-Astralis V1 4B is a 4-billion-parameter conversational AI designed for collaborative, emotionally intelligent problem-solving. Developed by VANTA Research, it combines rigorous logical reasoning with a vibrant, supportive communication style—making it ideal for creative brainstorming, educational support, and personal development.
**Key Features**
- 🤔 **Explicit Reasoning**: Uses `</tool_call>` tags to break down thought processes step by step
- 💬 **Warm & Enthusiastic Tone**: Celebrates achievements with energy and empathy
- 🤝 **Collaborative Style**: Engages users with "we" language and clarifying questions
- 🔍 **High Accuracy**: Achieves 100% in enthusiasm detection and 90% in empathy recognition
- 🎯 **Fine-Tuned for Real-World Use**: Trained with LoRA on a dataset emphasizing emotional intelligence and consistency
**Base Model**
Built on **Qwen3-4B-Thinking** and enhanced with lightweight LoRA fine-tuning (33M trainable parameters).
Available in both full and quantized (GGUF) formats via Hugging Face and Ollama.
**Use Cases**
- Personal coaching & motivation
- Creative ideation & project planning
- Educational tutoring with emotional support
- Mental wellness conversations (complementary, not a replacement)
**License**
Apache 2.0 — open for research, commercial, and personal use.
**Try It**
👉 [Hugging Face Page](https://huggingface.co/VANTA-Research/apollo-astralis-v1-4b)
👉 [Ollama](https://ollama.com/vanta-research/apollo-astralis-v1-4b)
*Developed by VANTA Research — where reasoning meets warmth.*
overrides:
parameters:
model: apollo-astralis-4b.i1-Q4_K_M.gguf
files:
- filename: apollo-astralis-4b.i1-Q4_K_M.gguf
sha256: 94e1d371420b03710fc7de030c1c06e75a356d9388210a134ee2adb4792a2626
uri: huggingface://mradermacher/apollo-astralis-4b-i1-GGUF/apollo-astralis-4b.i1-Q4_K_M.gguf
- !!merge <<: *qwen3
name: "qwen3-vlto-32b-instruct-i1"
urls:
- https://huggingface.co/mradermacher/Qwen3-VLTO-32B-Instruct-i1-GGUF
description: |
**Model Name:** Qwen3-VL-32B-Instruct (Text-Only Variant: Qwen3-VLTO-32B-Instruct)
**Base Model:** Qwen/Qwen3-VL-32B-Instruct
**Repository:** [mradermacher/Qwen3-VLTO-32B-Instruct-i1-GGUF](https://huggingface.co/mradermacher/Qwen3-VLTO-32B-Instruct-i1-GGUF)
**Type:** Large Language Model (LLM), text-only (vision-language model stripped of its vision components)
**Architecture:** Qwen3-VL, adapted for pure text generation
**Size:** 32 billion parameters
**License:** Apache 2.0
**Framework:** Hugging Face Transformers
---
### 🔍 **Description**
This is a **text-only variant** of the powerful **Qwen3-VL-32B-Instruct** multimodal model, stripped of its vision components to function as a high-performance pure language model. The model retains the full text understanding and generation capabilities of its parent — including strong reasoning, long-context handling (up to 32K+ tokens), and advanced multimodal training-derived coherence — while being optimized for text-only tasks.
It was created by loading the weights from the full Qwen3-VL-32B-Instruct model into a text-only Qwen3 architecture, preserving all linguistic and reasoning strengths without the need for image input.
Perfect for applications requiring deep reasoning, long-form content generation, code synthesis, and dialogue — with all the benefits of the Qwen3 series, now in a lightweight, text-focused form.
---
### 📌 Key Features
- ✅ **High-Performance Text Generation:** Built on top of the state-of-the-art Qwen3-VL architecture
- ✅ **Extended Context Length:** Supports up to 32,768 tokens (ideal for long documents and complex tasks)
- ✅ **Strong Reasoning & Planning:** Excels at logic, math, coding, and multi-step reasoning
- ✅ **Optimized for GGUF Format:** Available in multiple quantized versions (IQ3_M, Q2_K, etc.) for efficient inference on consumer hardware
- ✅ **Free to Use & Modify:** Apache 2.0 license
---
### 📦 Use Case Suggestions
- Long-form writing, summarization, and editing
- Code generation and debugging
- AI agents and task automation
- High-quality chat and dialogue systems
- Research and experimentation with large-scale LLMs on local devices
---
### 📚 References
- Original Model: [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct)
- Technical Report: [Qwen3 Technical Report (arXiv)](https://arxiv.org/abs/2505.09388)
- Quantization by: [mradermacher](https://huggingface.co/mradermacher)
> ✅ **Note**: The model shown here is **not the original vision-language model** — it's a **text-only conversion** of the Qwen3-VL-32B-Instruct model, ideal for pure language tasks.
overrides:
parameters:
model: Qwen3-VLTO-32B-Instruct.i1-Q4_K_S.gguf
files:
- filename: Qwen3-VLTO-32B-Instruct.i1-Q4_K_S.gguf
sha256: 789d55249614cd1acee1a23278133cd56ca898472259fa2261f77d65ed7f8367
uri: huggingface://mradermacher/Qwen3-VLTO-32B-Instruct-i1-GGUF/Qwen3-VLTO-32B-Instruct.i1-Q4_K_S.gguf
- !!merge <<: *qwen3
name: "qwen3-vlto-32b-thinking"
urls:
- https://huggingface.co/mradermacher/Qwen3-VLTO-32B-Thinking-GGUF
description: |
**Model Name:** Qwen3-VLTO-32B-Thinking
**Model Type:** Large Language Model (Text-Only)
**Base Model:** Qwen/Qwen3-VL-32B-Thinking (vanilla Qwen3-VL-32B with vision components removed)
**Architecture:** Transformer-based, 32-billion parameter model optimized for reasoning and complex text generation.
### Description:
Qwen3-VLTO-32B-Thinking is a pure text-only variant of the Qwen3-VL-32B-Thinking model, stripped of its vision capabilities while preserving the full reasoning and language understanding power. It is derived by transferring the weights from the vision-language model into a text-only transformer architecture, maintaining the same high-quality behavior for tasks such as logical reasoning, code generation, and dialogue.
This model is ideal for applications requiring deep linguistic reasoning and long-context understanding without image input. It supports advanced multimodal reasoning capabilities *in text form*—perfect for research, chatbots, and content generation.
### Key Features:
- ✅ 32B parameters, high reasoning capability
- ✅ No vision components — fully text-only
- ✅ Trained for complex thinking and step-by-step reasoning
- ✅ Compatible with Hugging Face Transformers and GGUF inference tools
- ✅ Available in multiple quantization levels (Q2_K to Q8_0) for efficient deployment
### Use Case:
Ideal for advanced text generation, logical inference, coding, and conversational AI where vision is not needed.
> 🔗 **Base Model**: [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking)
> 📦 **Quantized Versions**: Available via [mradermacher/Qwen3-VLTO-32B-Thinking-GGUF](https://huggingface.co/mradermacher/Qwen3-VLTO-32B-Thinking-GGUF)
---
*Note: The original model was created by Alibaba's Qwen team. This variant was adapted by qingy2024 and quantized by mradermacher.*
overrides:
parameters:
model: Qwen3-VLTO-32B-Thinking.Q4_K_M.gguf
files:
- filename: Qwen3-VLTO-32B-Thinking.Q4_K_M.gguf
sha256: d88b75df7c40455dfa21ded23c8b25463a8d58418bb6296304052b7e70e96954
uri: huggingface://mradermacher/Qwen3-VLTO-32B-Thinking-GGUF/Qwen3-VLTO-32B-Thinking.Q4_K_M.gguf
- !!merge <<: *gemma3
name: "gemma-3-the-grand-horror-27b"
urls:
- https://huggingface.co/DavidAU/Gemma-3-The-Grand-Horror-27B-GGUF
description: |
The **Gemma-3-The-Grand-Horror-27B-GGUF** model is a **fine-tuned version** of Google's **Gemma 3 27B** language model, specifically optimized for **extreme horror-themed text generation**. It was trained using the **Unsloth framework** on a custom in-house dataset of horror content, resulting in a model that produces vivid, graphic, and psychologically intense narratives—featuring gore, madness, and disturbing imagery—often even when prompts don't explicitly request horror.
Key characteristics:
- **Base Model**: Gemma 3 27B (original by Google, not the quantized version)
- **Fine-tuned For**: High-intensity horror storytelling, long-form narrative generation, and immersive scene creation
- **Use Case**: Creative writing, horror RP, dark fiction, and experimental storytelling
- **Not Suitable For**: General use, children, sensitive audiences, or content requiring neutral/positive tone
- **Quantization**: Available in GGUF format (e.g., q3k, q4, etc.), making it accessible for local inference on consumer hardware
> ✅ **Note**: The model card you see is for a **quantized, fine-tuned derivative**, not the original. The true base model is **Gemma 3 27B**, available at: https://huggingface.co/google/gemma-3-27b
This model is not for all audiences — it generates content with a consistently dark, unsettling tone. Use responsibly.
overrides:
parameters:
model: Gemma-3-The-Grand-Horror-27B-Q4_k_m.gguf
files:
- filename: Gemma-3-The-Grand-Horror-27B-Q4_k_m.gguf
sha256: 46f0b06b785d19804a1a796bec89a8eeba8a4e2ef21e2ab8dbb8fa2ff0d675b1
uri: huggingface://DavidAU/Gemma-3-The-Grand-Horror-27B-GGUF/Gemma-3-The-Grand-Horror-27B-Q4_k_m.gguf
- !!merge <<: *qwen3
name: "qwen3-nemotron-32b-rlbff-i1"
urls:
- https://huggingface.co/mradermacher/Qwen3-Nemotron-32B-RLBFF-i1-GGUF
description: |
**Model Name:** Qwen3-Nemotron-32B-RLBFF
**Base Model:** Qwen/Qwen3-32B
**Developer:** NVIDIA
**License:** NVIDIA Open Model License
**Description:**
Qwen3-Nemotron-32B-RLBFF is a high-performance, fine-tuned large language model built on the Qwen3-32B foundation. It is specifically optimized to generate high-quality, helpful responses in a default thinking mode through advanced reinforcement learning with binary flexible feedback (RLBFF). Trained on the HelpSteer3 dataset, this model excels in reasoning, planning, coding, and information-seeking tasks while maintaining strong safety and alignment with human preferences.
**Key Performance (as of Sep 2025):**
- **MT-Bench:** 9.50 (near GPT-4-Turbo level)
- **Arena Hard V2:** 55.6%
- **WildBench:** 70.33%
**Architecture & Efficiency:**
- 32 billion parameters, based on the Qwen3 Transformer architecture
- Designed for deployment on NVIDIA GPUs (Ampere, Hopper, Turing)
- Achieves performance comparable to DeepSeek R1 and O3-mini at less than 5% of the inference cost
**Use Case:**
Ideal for applications requiring reliable, thoughtful, and safe responses—such as advanced chatbots, research assistants, and enterprise AI systems.
**Access & Usage:**
Available on Hugging Face with support for Hugging Face Transformers and vLLM.
**Cite:** [Wang et al., 2025 — RLBFF: Binary Flexible Feedback](https://arxiv.org/abs/2509.21319)
👉 *Note: The GGUF version (mradermacher/Qwen3-Nemotron-32B-RLBFF-i1-GGUF) is a user-quantized variant. The original model is available at nvidia/Qwen3-Nemotron-32B-RLBFF.*
overrides:
parameters:
model: Qwen3-Nemotron-32B-RLBFF.i1-Q4_K_M.gguf
files:
- filename: Qwen3-Nemotron-32B-RLBFF.i1-Q4_K_M.gguf
sha256: 000e8c65299fc232d1a832f1cae831ceaa16425eccfb7d01702d73e8bd3eafee
uri: huggingface://mradermacher/Qwen3-Nemotron-32B-RLBFF-i1-GGUF/Qwen3-Nemotron-32B-RLBFF.i1-Q4_K_M.gguf
- !!merge <<: *gptoss
name: "financial-gpt-oss-20b-q8-i1"
urls:
- https://huggingface.co/mradermacher/financial-gpt-oss-20b-q8-i1-GGUF
description: |
### **Financial GPT-OSS 20B (Base Model)**
**Model Type:** Causal Language Model (Fine-tuned for Financial Analysis)
**Architecture:** Mixture of Experts (MoE), 20B parameters, 32 experts (4 active per token)
**Base Model:** `unsloth/gpt-oss-20b-unsloth-bnb-4bit`
**Fine-tuned With:** LoRA (Low-Rank Adaptation) on financial conversation data
**Training Data:** 22,250 financial dialogue pairs covering stocks (AAPL, NVDA, TSLA, etc.), technical analysis, risk assessment, and trading signals
**Context Length:** 131,072 tokens
**Quantization:** Q8_0 GGUF (for efficient inference)
**License:** Apache 2.0
**Key Features:**
- Specialized in financial market analysis: technical indicators (RSI, MACD), risk assessments, trading signals, and price forecasts
- Handles complex financial queries with structured, actionable insights
- Designed for real-time use with low-latency inference (GGUF format)
- Supports S&P 500 stocks and major asset classes across tech, healthcare, energy, and finance sectors
**Use Case:** Ideal for traders, analysts, and developers building financial AI tools. Use with caution—**not financial advice**.
**Citation:**
```bibtex
@misc{financial-gpt-oss-20b-q8,
title={Financial GPT-OSS 20B Q8: Fine-tuned Financial Analysis Model},
author={beenyb},
year={2025},
publisher={Hugging Face Hub},
url={https://huggingface.co/beenyb/financial-gpt-oss-20b-q8}
}
```
overrides:
parameters:
model: financial-gpt-oss-20b-q8.i1-Q4_K_M.gguf
files:
- filename: financial-gpt-oss-20b-q8.i1-Q4_K_M.gguf
sha256: 14586673de2a769f88bd51f88464b9b1f73d3ad986fa878b2e0c1473f1c1fc59
uri: huggingface://mradermacher/financial-gpt-oss-20b-q8-i1-GGUF/financial-gpt-oss-20b-q8.i1-Q4_K_M.gguf
- !!merge <<: *llama3
name: "qwen3-grand-horror-light-1.7b"
urls:
- https://huggingface.co/mradermacher/Qwen3-Grand-Horror-Light-1.7B-GGUF
description: |
**Model Name:** Qwen3-Grand-Horror-Light-1.7B
**Base Model:** qingy2024/Qwen3-VLTO-1.7B-Instruct
**Model Type:** Fine-tuned Language Model (Text Generation)
**Size:** 1.7B parameters
**License:** Apache 2.0
**Language:** English
**Use Case:** Horror storytelling, creative writing, roleplay, scene generation
**Fine-Tuned On:** Custom horror dataset (`DavidAU/horror-nightmare1`)
**Training Method:** Fine-tuned via Unsloth
**Key Features:**
- Specialized in generating atmospheric, intense horror content with elements of madness, gore, and suspense
- Optimized for roleplay and narrative generation with low to medium horror intensity
- Supports high-quality output across multiple quantization levels (Q2_K to Q8_0, f16)
- Designed for use with tools like KoboldCpp, oobabooga/text-generation-webui, and Silly Tavern
- Recommended settings: Temperature 0.4-1.2, Repetition penalty 1.1, Smoothing factor 1.5 for smoother output
**Note:** This model is a fine-tuned variant of the Qwen3 series, not a quantized version. The original base model is available at [qingy2024/Qwen3-VLTO-1.7B-Instruct](https://huggingface.co/qingy2024/Qwen3-VLTO-1.7B-Instruct) and was further adapted for horror-themed creative writing.
**Ideal For:** Creators, writers, and roleplayers seeking a compact, expressive model for immersive horror storytelling.
overrides:
parameters:
model: Qwen3-Grand-Horror-Light-1.7B.Q4_K_M.gguf
files:
- filename: Qwen3-Grand-Horror-Light-1.7B.Q4_K_M.gguf
sha256: cbbb0c5f6874130a8ae253377fdc7ad25fa2c1e9bb45f1aaad88db853ef985dc
uri: huggingface://mradermacher/Qwen3-Grand-Horror-Light-1.7B-GGUF/Qwen3-Grand-Horror-Light-1.7B.Q4_K_M.gguf
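
A note on the gallery format above: `- &qwen3vl` defines a YAML anchor on the first entry, and each later `- !!merge <<: *qwen3vl` entry inherits its fields (url, icon, license, tags) through the merge key, restating only what differs. A minimal sketch of the pattern, with hypothetical entry names:

```yaml
# Minimal sketch of the anchor/merge pattern (hypothetical values).
- &family-base             # first entry doubles as the shared template
  license: apache-2.0
  tags:
    - llm
    - gguf
  name: "family-base-model"
- !!merge <<: *family-base # inherits license and tags from the anchor
  name: "family-variant"   # overrides only the fields it restates
```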

View File

@@ -6,15 +6,20 @@ config_file: |
     backend: "llama-cpp"
     template:
       chat_message: |
-        <|im_start|>{{ .RoleName }}
-        {{ if .FunctionCall -}}
-        {{ else if eq .RoleName "tool" -}}
+        <|im_start|>{{if eq .RoleName "tool" }}user{{else}}{{ .RoleName }}{{end}}
+        {{ if eq .RoleName "tool" -}}
+        <tool_response>
         {{ end -}}
         {{ if .Content -}}
         {{.Content }}
         {{ end -}}
+        {{ if eq .RoleName "tool" -}}
+        </tool_response>
+        {{ end -}}
         {{ if .FunctionCall -}}
+        <tool_call>
         {{toJson .FunctionCall}}
+        </tool_call>
         {{ end -}}<|im_end|>
       function: |
         <|im_start|>system
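
To make the template change concrete, the sketch below renders the new `chat_message` template for a tool-result turn. It assumes LocalAI evaluates these with Go's text/template and supplies a `toJson` helper (stubbed here); the `msg` struct merely mirrors the `.RoleName`/`.Content`/`.FunctionCall` references and is not the real LocalAI type:

```go
package main

import (
	"encoding/json"
	"os"
	"text/template"
)

// msg mirrors the fields the template dereferences; the real LocalAI
// rendering context is assumed, not verified.
type msg struct {
	RoleName     string
	Content      string
	FunctionCall map[string]any
}

const chatMessage = `<|im_start|>{{if eq .RoleName "tool" }}user{{else}}{{ .RoleName }}{{end}}
{{ if eq .RoleName "tool" -}}
<tool_response>
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if eq .RoleName "tool" -}}
</tool_response>
{{ end -}}
{{ if .FunctionCall -}}
<tool_call>
{{toJson .FunctionCall}}
</tool_call>
{{ end -}}<|im_end|>
`

func main() {
	// toJson comes from LocalAI's template environment; a stand-in here.
	tmpl := template.Must(template.New("chat").Funcs(template.FuncMap{
		"toJson": func(v any) string { b, _ := json.Marshal(v); return string(b) },
	}).Parse(chatMessage))

	// A tool-result turn renders as:
	//
	//   <|im_start|>user
	//   <tool_response>
	//   {"temperature":21}
	//   </tool_response>
	//   <|im_end|>
	if err := tmpl.Execute(os.Stdout, msg{RoleName: "tool", Content: `{"temperature":21}`}); err != nil {
		panic(err)
	}
}
```

The visible effect of the diff: tool results are now presented to the model as a user turn wrapped in `<tool_response>` tags, and assistant function calls are wrapped in `<tool_call>` tags, the tool-message convention Qwen-style ChatML templates expect.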