docs: Add troubleshooting guide for embedding models (#9064) - Add section on using gallery models for embeddings - Document common issues with embedding model configuration - Add troubleshooting guide for Qwen3 embedding models - Include correct configuration examples for Qwen3-Embedding-4B - Document context size limits and dimension parameters - Add table of Qwen3 embedding model specifications Fixes #9064 Signed-off-by: localai-bot <localai-bot@localai.io> Co-authored-by: localai-bot <localai-bot@localai.io>
+++
disableToc = false
title = "🧠 Embeddings"
weight = 13
url = "/features/embeddings/"
+++
LocalAI supports generating embeddings for text or for a list of tokens.
For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
Model compatibility
The embedding endpoint is compatible with llama.cpp models, bert.cpp models, and sentence-transformers models available on Hugging Face.
Using Gallery Models
LocalAI provides a model gallery with pre-configured embedding models. To use a gallery model:
- Ensure the model is available in the gallery (check [Model Gallery]({{%relref "features/model-gallery" %}}))
- Use the model name directly in your API calls
Example gallery models:
- `qwen3-embedding-4b` - Qwen3 Embedding 4B model
- `qwen3-embedding-8b` - Qwen3 Embedding 8B model
- `qwen3-embedding-0.6b` - Qwen3 Embedding 0.6B model
Example: Using Qwen3-Embedding-4B from Gallery
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "My text to embed",
"model": "qwen3-embedding-4b",
"dimensions": 2560
}'
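The same request can also be issued from Python. The sketch below uses only the standard library and builds a request for the `/embeddings` endpoint shown in the curl example. The helper name `build_embedding_request` and the default base URL are illustrative assumptions, not part of LocalAI.

```python
import json
import urllib.request


def build_embedding_request(text, model, dimensions=None,
                            base_url="http://localhost:8080"):
    """Build a POST request for the /embeddings endpoint (hypothetical helper)."""
    payload = {"input": text, "model": model}
    if dimensions is not None:
        # Only send "dimensions" when the caller asks for a specific size.
        payload["dimensions"] = dimensions
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_embedding_request("My text to embed", "qwen3-embedding-4b", 2560)
# To send it against a running LocalAI instance:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.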
Manual Setup
Create a YAML config file in the models directory. Specify the backend and the model file.
name: text-embedding-ada-002 # The model name used in the API
parameters:
  model: <model_file>
backend: "<backend>"
embeddings: true
Huggingface embeddings
To use sentence-transformers models from Hugging Face, use the `sentencetransformers` embedding backend.
name: text-embedding-ada-002
backend: sentencetransformers
embeddings: true
parameters:
  model: all-MiniLM-L6-v2
The sentencetransformers backend uses Python sentence-transformers. For a list of all pre-trained models available see here: https://github.com/UKPLab/sentence-transformers#pre-trained-models
{{% notice note %}}
- The `sentencetransformers` backend is an optional backend of LocalAI and uses Python. If you are running `LocalAI` from the containers, it should already be configured for use.
- For local execution, you also have to specify the extra backend in the `EXTERNAL_GRPC_BACKENDS` environment variable.
  - Example: `EXTERNAL_GRPC_BACKENDS="sentencetransformers:/path/to/LocalAI/backend/python/sentencetransformers/sentencetransformers.py"`
- The `sentencetransformers` backend supports only embeddings of text, not of tokens. If you need to embed tokens, you can use the `bert` backend or `llama.cpp`.
- No models need to be downloaded before using the `sentencetransformers` backend. The models are downloaded automatically the first time the API is used.
{{% /notice %}}
Llama.cpp embeddings
Embeddings with llama.cpp are supported through the `llama-cpp` backend. They must be enabled by setting `embeddings` to `true` in the model configuration.
name: my-awesome-model
backend: llama-cpp
embeddings: true
parameters:
  model: ggml-file.bin
Then you can use the API to generate embeddings:
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "My text",
"model": "my-awesome-model"
}' | jq "."
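Once the response arrives, the vectors have to be pulled out of it. The sketch below assumes the response follows the OpenAI embeddings response shape referenced at the top of this page (a `data` list whose entries carry `index` and `embedding`). The helper name `extract_embeddings` and the sample payload are hypothetical and hand-written, not real model output.

```python
def extract_embeddings(response):
    """Return embedding vectors ordered by their "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written sample in the assumed response shape (not real model output).
sample = {
    "object": "list",
    "model": "my-awesome-model",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}
vectors = extract_embeddings(sample)
```

Sorting by `index` keeps each vector aligned with its input when several inputs are sent in one request.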
💡 Examples
- Example that uses LlamaIndex with LocalAI as the embedding provider: here.
⚠️ Common Issues and Troubleshooting
Issue: Embedding model not returning correct results
Symptoms:
- Model returns empty or incorrect embeddings
- API returns errors when calling embedding endpoint
Common Causes:
- Incorrect model filename: Ensure you're using the correct filename from the gallery or your model file location.
  - Gallery models use specific filenames (e.g., `Qwen3-Embedding-4B-Q4_K_M.gguf`)
  - Check the [Model Gallery]({{%relref "features/model-gallery" %}}) for correct filenames
- Context size mismatch: Ensure your `context_size` setting doesn't exceed the model's maximum context length.
  - Qwen3-Embedding-4B: max 32k (32768) context
  - Qwen3-Embedding-8B: max 32k (32768) context
  - Qwen3-Embedding-0.6B: max 32k (32768) context
- Missing `embeddings: true` flag: The model configuration must have `embeddings: true` set.
Correct Configuration Example:
name: qwen3-embedding-4b
backend: llama-cpp
embeddings: true
context_size: 32768
parameters:
  model: Qwen3-Embedding-4B-Q4_K_M.gguf
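The causes listed in this section can be checked mechanically before starting LocalAI. The following is a minimal sketch, assuming the YAML configuration has already been parsed into a Python dict by a YAML parser of your choice. The helper `check_embedding_config` is hypothetical and not part of LocalAI. The default limit of 32768 is the maximum context listed for the Qwen3 embedding models.

```python
def check_embedding_config(config, max_context=32768):
    """Return a list of problems found in a parsed model config (hypothetical helper)."""
    problems = []
    # The embeddings flag must be explicitly enabled.
    if config.get("embeddings") is not True:
        problems.append("embeddings must be set to true")
    # context_size, when present, must not exceed the model's maximum.
    context_size = config.get("context_size")
    if context_size is not None and context_size > max_context:
        problems.append(
            f"context_size {context_size} exceeds model maximum {max_context}"
        )
    # The model filename lives under parameters.model.
    model_file = (config.get("parameters") or {}).get("model")
    if not model_file:
        problems.append("parameters.model (the model filename) is missing")
    return problems


good = {
    "name": "qwen3-embedding-4b",
    "backend": "llama-cpp",
    "embeddings": True,
    "context_size": 32768,
    "parameters": {"model": "Qwen3-Embedding-4B-Q4_K_M.gguf"},
}
bad = {"name": "qwen3-embedding-4b", "backend": "llama-cpp", "context_size": 65536}
```

Calling `check_embedding_config(good)` returns an empty list, while `check_embedding_config(bad)` reports the missing flag, the oversized context, and the missing model filename.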
Issue: Dimension mismatch
Symptoms:
- Returned embedding dimensions don't match expected dimensions
Solution:
- Use the `dimensions` parameter in your API request to specify the output dimension
- Qwen3-Embedding models support dimensions from 32 up to 1024 (0.6B), 2560 (4B), or 4096 (8B)
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "My text",
"model": "qwen3-embedding-4b",
"dimensions": 1024
}'
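A client can guard against dimension mismatches by validating the requested size before sending the request and checking the length of the returned vector afterwards. The sketch below takes its limits from the Qwen3 specification table in this guide. The helper names are hypothetical.

```python
# Maximum output dimensions per gallery model, from the table in this guide.
MAX_DIMENSIONS = {
    "qwen3-embedding-0.6b": 1024,
    "qwen3-embedding-4b": 2560,
    "qwen3-embedding-8b": 4096,
}
MIN_DIMENSIONS = 32


def validate_dimensions(model, dimensions):
    """Raise ValueError if the requested size is outside the model's range."""
    limit = MAX_DIMENSIONS[model]
    if not MIN_DIMENSIONS <= dimensions <= limit:
        raise ValueError(
            f"{model} supports {MIN_DIMENSIONS} to {limit} dimensions, "
            f"got {dimensions}"
        )
    return dimensions


def has_expected_length(embedding, dimensions):
    """Check that a returned vector has the requested number of components."""
    return len(embedding) == dimensions
```

For example, `validate_dimensions("qwen3-embedding-4b", 1024)` passes, while requesting 4096 from the same model raises `ValueError`.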
Issue: Model not found
Symptoms:
- API returns 404 or "model not found" error
Solution:
- Ensure the model is properly configured in the models directory
- Check that the model name in your API request matches the `name` field in the configuration
- For gallery models, ensure the gallery is properly loaded
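One way to confirm which names the server knows about is to list the configured models and compare them with the name used in the request. The sketch below assumes LocalAI exposes the OpenAI-style `/v1/models` route and that the listing has a `data` array with an `id` per model. Both are assumptions to verify against your installation, and the helper names and sample listing are hypothetical.

```python
import json
import urllib.request


def list_model_ids(models_response):
    """Collect the "id" of every entry in a models listing."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_models(base_url="http://localhost:8080"):
    """Fetch the model listing; assumes the OpenAI-style /v1/models route."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return json.load(resp)


# Hand-written sample listing, not real server output.
sample = {"object": "list", "data": [{"id": "qwen3-embedding-4b", "object": "model"}]}
available = list_model_ids(sample)
```

Against a running instance, `list_model_ids(fetch_models())` would give the names to compare with the `model` field of the failing request.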
Qwen3 Embedding Models Specifics
The Qwen3 Embedding series models have these characteristics:
| Model | Parameters | Max Context | Max Dimensions | Supported Languages |
|---|---|---|---|---|
| qwen3-embedding-0.6b | 0.6B | 32k | 1024 | 100+ |
| qwen3-embedding-4b | 4B | 32k | 2560 | 100+ |
| qwen3-embedding-8b | 8B | 32k | 4096 | 100+ |
All models support:
- User-defined output dimensions (32 to max dimensions)
- Multilingual text embedding (100+ languages)
- Instruction-tuned embedding with custom instructions
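For instruction-tuned use, the instruction is typically prepended to the query text before it is sent for embedding. The sketch below assumes an `Instruct: ...` / `Query:...` prompt layout. That layout is an assumption here and should be checked against the model card of the model you run. The helper names and the sample task text are hypothetical.

```python
def with_instruction(task, query):
    """Prefix a query with a task instruction (assumed prompt layout)."""
    # Assumed layout; confirm against the model card of the model you run.
    return f"Instruct: {task}\nQuery:{query}"


def build_payload(model, task, query):
    """Build an /embeddings request body with an instructed query."""
    return {"model": model, "input": with_instruction(task, query)}


payload = build_payload(
    "qwen3-embedding-4b",
    "Given a web search query, retrieve relevant passages that answer the query",
    "What is LocalAI?",
)
```

In retrieval setups the instruction is usually applied to queries only, while the documents being indexed are embedded as plain text.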