mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-16 12:59:33 -04:00
chore: add embeddingemma
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in:
111
.agents/adding-gallery-models.md
Normal file
111
.agents/adding-gallery-models.md
Normal file
@@ -0,0 +1,111 @@
|
||||
# Adding GGUF Models from HuggingFace to the Gallery
|
||||
|
||||
When adding a GGUF model from HuggingFace to the LocalAI model gallery, follow this guide.
|
||||
|
||||
## Gallery file
|
||||
|
||||
All models are defined in `gallery/index.yaml`. Find the appropriate section (embedding models near other embeddings, chat models near similar chat models) and add a new entry.
|
||||
|
||||
## Getting the SHA256
|
||||
|
||||
GGUF files on HuggingFace expose their SHA256 via the `x-linked-etag` HTTP header. Fetch it with:
|
||||
|
||||
```bash
|
||||
curl -sI "https://huggingface.co/<org>/<repo>/resolve/main/<filename>.gguf" | grep -i x-linked-etag
|
||||
```
|
||||
|
||||
The value (without quotes) is the SHA256 hash. Example:
|
||||
|
||||
```bash
|
||||
curl -sI "https://huggingface.co/ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/resolve/main/embeddinggemma-300m-qat-Q8_0.gguf" | grep -i x-linked-etag
|
||||
# x-linked-etag: "6fa0c02a9c302be6f977521d399b4de3a46310a4f2621ee0063747881b673f67"
|
||||
```
|
||||
|
||||
**Important**: Pay attention to exact filename casing — HuggingFace filenames are case-sensitive (e.g., `Q8_0` vs `q8_0`). Check the repo's file listing to get the exact name.
|
||||
|
||||
## Entry format — Embedding models
|
||||
|
||||
Embedding models use `gallery/virtual.yaml` as the base config and set `embeddings: true`:
|
||||
|
||||
```yaml
|
||||
- name: "model-name"
|
||||
url: github:mudler/LocalAI/gallery/virtual.yaml@master
|
||||
urls:
|
||||
- https://huggingface.co/<original-model-org>/<original-model-name>
|
||||
- https://huggingface.co/<gguf-org>/<gguf-repo-name>
|
||||
description: |
|
||||
Short description of the model, its size, and capabilities.
|
||||
tags:
|
||||
- embeddings
|
||||
overrides:
|
||||
backend: llama-cpp
|
||||
embeddings: true
|
||||
parameters:
|
||||
model: <filename>.gguf
|
||||
files:
|
||||
- filename: <filename>.gguf
|
||||
uri: huggingface://<gguf-org>/<gguf-repo-name>/<filename>.gguf
|
||||
sha256: <sha256-hash>
|
||||
```
|
||||
|
||||
## Entry format — Chat/LLM models
|
||||
|
||||
Chat models typically reference a template config (e.g., `gallery/gemma.yaml`, `gallery/chatml.yaml`) that defines the prompt format. Use YAML anchors (`&name` / `*name`) if adding multiple quantization variants of the same model:
|
||||
|
||||
```yaml
|
||||
- &model-anchor
|
||||
url: "github:mudler/LocalAI/gallery/<template>.yaml@master"
|
||||
name: "model-name"
|
||||
icon: https://example.com/icon.png
|
||||
license: <license>
|
||||
urls:
|
||||
- https://huggingface.co/<org>/<model>
|
||||
- https://huggingface.co/<gguf-org>/<gguf-repo>
|
||||
description: |
|
||||
Model description.
|
||||
tags:
|
||||
- llm
|
||||
- gguf
|
||||
- gpu
|
||||
- cpu
|
||||
overrides:
|
||||
parameters:
|
||||
model: <filename>-Q4_K_M.gguf
|
||||
files:
|
||||
- filename: <filename>-Q4_K_M.gguf
|
||||
sha256: <sha256>
|
||||
uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q4_K_M.gguf
|
||||
```
|
||||
|
||||
To add a variant (e.g., different quantization), use YAML merge:
|
||||
|
||||
```yaml
|
||||
- !!merge <<: *model-anchor
|
||||
name: "model-name-q8"
|
||||
overrides:
|
||||
parameters:
|
||||
model: <filename>-Q8_0.gguf
|
||||
files:
|
||||
- filename: <filename>-Q8_0.gguf
|
||||
sha256: <sha256>
|
||||
uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q8_0.gguf
|
||||
```
|
||||
|
||||
## Available template configs
|
||||
|
||||
Look at existing `.yaml` files in `gallery/` to find the right prompt template for your model architecture:
|
||||
|
||||
- `gemma.yaml` — Gemma-family models (gemma, embeddinggemma, etc.)
|
||||
- `chatml.yaml` — ChatML format (many Mistral/OpenHermes models)
|
||||
- `deepseek.yaml` — DeepSeek models
|
||||
- `virtual.yaml` — Minimal base (good for embedding models that don't need chat templates)
|
||||
|
||||
## Checklist
|
||||
|
||||
1. **Find the GGUF file** on HuggingFace — note exact filename (case-sensitive)
|
||||
2. **Get the SHA256** using the `curl -sI` + `x-linked-etag` method above
|
||||
3. **Choose the right template** config from `gallery/` based on model architecture
|
||||
4. **Add the entry** to `gallery/index.yaml` near similar models
|
||||
5. **Set `embeddings: true`** if it's an embedding model
|
||||
6. **Include both URLs** — the original model page and the GGUF repo
|
||||
7. **Write a description** — mention model size, capabilities, and quantization type
|
||||
@@ -13,6 +13,7 @@ This file is an index to detailed topic guides in the `.agents/` directory. Read
|
||||
| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
|
||||
| [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
|
||||
| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
|
||||
| [.agents/adding-gallery-models.md](.agents/adding-gallery-models.md) | Adding GGUF models from HuggingFace to the model gallery |
|
||||
|
||||
## Quick Reference
|
||||
|
||||
|
||||
@@ -7789,6 +7789,24 @@
|
||||
- filename: granite-embedding-125m-english-f16.gguf
|
||||
uri: huggingface://bartowski/granite-embedding-125m-english-GGUF/granite-embedding-125m-english-f16.gguf
|
||||
sha256: e2950cf0228514e0e81c6f0701a62a9e4763990ce660b4a3c0329cd6a4acd4b9
|
||||
- name: "embeddinggemma-300m"
|
||||
url: github:mudler/LocalAI/gallery/virtual.yaml@master
|
||||
urls:
|
||||
- https://huggingface.co/google/embeddinggemma-300m
|
||||
- https://huggingface.co/ggml-org/embeddinggemma-300m-qat-q8_0-GGUF
|
||||
description: |
|
||||
EmbeddingGemma 300M is a lightweight, high-quality embedding model from Google, based on the Gemma architecture. It produces 1024-dimensional embeddings optimized for retrieval and semantic similarity tasks. This GGUF version uses QAT (Quantization-Aware Training) Q8_0 quantization for efficient inference.
|
||||
tags:
|
||||
- embeddings
|
||||
overrides:
|
||||
backend: llama-cpp
|
||||
embeddings: true
|
||||
parameters:
|
||||
model: embeddinggemma-300m-qat-Q8_0.gguf
|
||||
files:
|
||||
- filename: embeddinggemma-300m-qat-Q8_0.gguf
|
||||
uri: huggingface://ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
|
||||
sha256: 6fa0c02a9c302be6f977521d399b4de3a46310a4f2621ee0063747881b673f67
|
||||
- name: "moe-girl-1ba-7bt-i1"
|
||||
icon: https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/kTXXSSSqpb21rfyOX7FUa.jpeg
|
||||
# chatml
|
||||
|
||||
Reference in New Issue
Block a user