diff --git a/docs/content/features/embeddings.md b/docs/content/features/embeddings.md index 192c4c87b..c6ee1ee31 100644 --- a/docs/content/features/embeddings.md +++ b/docs/content/features/embeddings.md @@ -1,4 +1,3 @@ - +++ disableToc = false title = "🧠 Embeddings" @@ -14,6 +13,28 @@ For the API documentation you can refer to the OpenAI docs: https://platform.ope The embedding endpoint is compatible with `llama.cpp` models, `bert.cpp` models and sentence-transformers models available in huggingface. +## Using Gallery Models + +LocalAI provides a model gallery with pre-configured embedding models. To use a gallery model: + +1. Ensure the model is available in the gallery (check [Model Gallery]({{%relref "features/model-gallery" %}})) +2. Use the model name directly in your API calls + +Example gallery models: +- `qwen3-embedding-4b` - Qwen3 Embedding 4B model +- `qwen3-embedding-8b` - Qwen3 Embedding 8B model +- `qwen3-embedding-0.6b` - Qwen3 Embedding 0.6B model + +### Example: Using Qwen3-Embedding-4B from Gallery + +```bash +curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{ + "input": "My text to embed", + "model": "qwen3-embedding-4b", + "dimensions": 2560 +}' +``` + ## Manual Setup Create a `YAML` config file in the `models` directory. Specify the `backend` and the model file. @@ -73,4 +94,78 @@ curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json ## 💡 Examples -- Example that uses LLamaIndex and LocalAI as embedding: [here](https://github.com/mudler/LocalAI-examples/tree/main/query_data). \ No newline at end of file +- Example that uses LLamaIndex and LocalAI as embedding: [here](https://github.com/mudler/LocalAI-examples/tree/main/query_data). + +## ⚠️ Common Issues and Troubleshooting + +### Issue: Embedding model not returning correct results + +**Symptoms:** +- Model returns empty or incorrect embeddings +- API returns errors when calling embedding endpoint + +**Common Causes:** + +1. **Incorrect model filename**: Ensure you're using the correct filename from the gallery or your model file location. + - Gallery models use specific filenames (e.g., `Qwen3-Embedding-4B-Q4_K_M.gguf`) + - Check the [Model Gallery]({{%relref "features/model-gallery" %}}) for correct filenames + +2. **Context size mismatch**: Ensure your `context_size` setting doesn't exceed the model's maximum context length. + - Qwen3-Embedding-4B: max 32k (32768) context + - Qwen3-Embedding-8B: max 32k (32768) context + - Qwen3-Embedding-0.6B: max 32k (32768) context + +3. **Missing `embeddings: true` flag**: The model configuration must have `embeddings: true` set. + +**Correct Configuration Example:** + +```yaml +name: qwen3-embedding-4b +backend: llama-cpp +embeddings: true +context_size: 32768 +parameters: + model: Qwen3-Embedding-4B-Q4_K_M.gguf +``` + +### Issue: Dimension mismatch + +**Symptoms:** +- Returned embedding dimensions don't match expected dimensions + +**Solution:** +- Use the `dimensions` parameter in your API request to specify the output dimension +- Qwen3-Embedding models support dimensions from 32 to 2560 (4B) or 4096 (8B) + +```bash +curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{ + "input": "My text", + "model": "qwen3-embedding-4b", + "dimensions": 1024 +}' +``` + +### Issue: Model not found + +**Symptoms:** +- API returns 404 or "model not found" error + +**Solution:** +- Ensure the model is properly configured in the models directory +- Check that the model name in your API request matches the `name` field in the configuration +- For gallery models, ensure the gallery is properly loaded + +## Qwen3 Embedding Models Specifics + +The Qwen3 Embedding series models have these characteristics: + +| Model | Parameters | Max Context | Max Dimensions | Supported Languages | +|-------|------------|-------------|----------------|---------------------| +| qwen3-embedding-0.6b | 0.6B | 32k | 1024 | 100+ | +| qwen3-embedding-4b | 4B | 32k | 2560 | 100+ | +| qwen3-embedding-8b | 8B | 32k | 4096 | 100+ | + +All models support: +- User-defined output dimensions (32 to max dimensions) +- Multilingual text embedding (100+ languages) +- Instruction-tuned embedding with custom instructions