Implement llama.cpp GenAI Provider (#21690)

* Implement llama.cpp GenAI Provider

* Add docs

* Update links

* Fix broken mqtt links

* Fix more broken anchors
Nicolas Mowen
2026-01-18 06:34:30 -07:00
parent 3826d72c2a
commit 20360db2c9
5 changed files with 145 additions and 5 deletions


@@ -5,7 +5,7 @@ title: Configuring Generative AI
## Configuration
A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 3 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.
A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 4 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.
To use Generative AI, you must define a single provider at the global level of your Frigate configuration. If the provider you choose requires an API key, you may either directly paste it in your configuration, or store it in an environment variable prefixed with `FRIGATE_`.
@@ -77,8 +77,46 @@ genai:
provider: ollama
base_url: http://localhost:11434
model: qwen3-vl:4b
provider_options: # other Ollama client options can be defined
keep_alive: -1
options:
num_ctx: 8192 # make sure the context matches other services that are using ollama
```
## llama.cpp
[llama.cpp](https://github.com/ggml-org/llama.cpp) is a C/C++ LLM inference engine that provides a high-performance, OpenAI-compatible server. Running llama.cpp directly gives you access to all of its native options and parameters.
:::warning
Using llama.cpp on CPU is not recommended; high inference times make using Generative AI impractical.
:::
For best performance, it is highly recommended to host the llama.cpp server on a machine with a discrete graphics card or on an Apple silicon Mac.
### Supported Models
You must use a vision-capable model with Frigate. The llama.cpp server supports a variety of vision models in GGUF format.
### Configuration
```yaml
genai:
provider: llamacpp
base_url: http://localhost:8080
model: your-model-name
provider_options:
temperature: 0.7
repeat_penalty: 1.05
top_p: 0.8
top_k: 40
min_p: 0.05
seed: -1
```
All llama.cpp native options can be passed through `provider_options`, including `temperature`, `top_k`, `top_p`, `min_p`, `repeat_penalty`, `repeat_last_n`, `seed`, `grammar`, and more. See the [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for a complete list of available parameters.
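For reference, a minimal sketch (assuming a llama.cpp server on `http://localhost:8080`) of how these options travel with the request, mirroring the provider implementation in this commit:

```python
import requests

# Options a user might set under `provider_options` in the Frigate config.
provider_options = {"temperature": 0.7, "top_k": 40, "min_p": 0.05}

payload = {
    "messages": [{"role": "user", "content": "Describe this scene."}],
    # llama.cpp accepts its native sampling options alongside the
    # standard OpenAI fields, so they are merged into the payload as-is.
    **provider_options,
}

resp = requests.post(
    "http://localhost:8080/v1/chat/completions", json=payload, timeout=30
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```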
## Google Gemini
Google Gemini has a [free tier](https://ai.google.dev/pricing) for the API; however, the limits may not be sufficient for standard Frigate usage. Choose a plan appropriate for your installation.
@@ -185,4 +223,4 @@ genai:
base_url: https://instance.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview
model: gpt-5-mini
api_key: "{FRIGATE_OPENAI_API_KEY}"
```


@@ -11,7 +11,7 @@ By default, descriptions will be generated for all tracked objects and all zones
Optionally, you can generate the description using a snapshot (if enabled) by setting `use_snapshot` to `True`. By default, this is set to `False`, which sends the uncompressed images from the `detect` stream collected over the object's lifetime to the model. Once the object lifecycle ends, only a single compressed and cropped thumbnail is saved with the tracked object. Using a snapshot might be useful when you want to _regenerate_ a tracked object's description as it will provide the AI with a higher-quality image (typically downscaled by the AI itself) than the cropped/compressed thumbnail. Using a snapshot otherwise has a trade-off in that only a single image is sent to your provider, which will limit the model's ability to determine object movement or direction.
Generative AI object descriptions can also be toggled dynamically for a camera via MQTT with the topic `frigate/<camera_name>/object_descriptions/set`. See the [MQTT documentation](/integrations/mqtt/#frigatecamera_nameobjectdescriptionsset).
Generative AI object descriptions can also be toggled dynamically for a camera via MQTT with the topic `frigate/<camera_name>/object_descriptions/set`. See the [MQTT documentation](/integrations/mqtt#frigatecamera_nameobject_descriptionsset).
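As an illustration, here is a minimal sketch of such a toggle using the `paho-mqtt` client; the broker address and the camera name `front_door` are assumptions for the example:

```python
import paho.mqtt.publish as publish

# Disable object descriptions for a hypothetical camera named "front_door".
# Frigate's MQTT set topics take "ON"/"OFF" payloads.
publish.single(
    "frigate/front_door/object_descriptions/set",
    payload="OFF",
    hostname="localhost",  # assumed broker address
)
```

Publishing `ON` to the same topic re-enables description generation.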
## Usage and Best Practices
@@ -75,4 +75,4 @@ Many providers also have a public facing chat interface for their models. Downlo
- OpenAI - [ChatGPT](https://chatgpt.com)
- Gemini - [Google AI Studio](https://aistudio.google.com)
- Ollama - [Open WebUI](https://docs.openwebui.com/)


@@ -7,7 +7,7 @@ Generative AI can be used to automatically generate structured summaries of revi
Summaries are requested automatically from your AI provider for alert review items once the activity has ended; they can optionally be enabled for detections as well.
Generative AI review summaries can also be toggled dynamically for a [camera via MQTT](/integrations/mqtt/#frigatecamera_namereviewdescriptionsset).
Generative AI review summaries can also be toggled dynamically for a [camera via MQTT](/integrations/mqtt#frigatecamera_namereview_descriptionsset).
## Review Summary Usage and Best Practices


@@ -14,6 +14,7 @@ class GenAIProviderEnum(str, Enum):
azure_openai = "azure_openai"
gemini = "gemini"
ollama = "ollama"
llamacpp = "llamacpp"
class GenAIConfig(FrigateBaseModel):

frigate/genai/llama_cpp.py (new file)

@@ -0,0 +1,101 @@
"""llama.cpp Provider for Frigate AI."""
import base64
import logging
from typing import Any, Optional
import requests
from frigate.config import GenAIProviderEnum
from frigate.genai import GenAIClient, register_genai_provider
logger = logging.getLogger(__name__)
@register_genai_provider(GenAIProviderEnum.llamacpp)
class LlamaCppClient(GenAIClient):
"""Generative AI client for Frigate using llama.cpp server."""
    # Sampling defaults tuned for local models; user-supplied
    # provider_options override these (see _init_provider).
    LOCAL_OPTIMIZED_OPTIONS = {
"temperature": 0.7,
"repeat_penalty": 1.05,
"top_p": 0.8,
}
    provider: Optional[str]  # base_url of the llama.cpp server, set by _init_provider
provider_options: dict[str, Any]
def _init_provider(self):
"""Initialize the client."""
self.provider_options = {
**self.LOCAL_OPTIMIZED_OPTIONS,
**self.genai_config.provider_options,
}
return (
self.genai_config.base_url.rstrip("/")
if self.genai_config.base_url
else None
)
def _send(self, prompt: str, images: list[bytes]) -> Optional[str]:
"""Submit a request to llama.cpp server."""
if self.provider is None:
logger.warning(
"llama.cpp provider has not been initialized, a description will not be generated. Check your llama.cpp configuration."
)
return None
try:
content = []
for image in images:
encoded_image = base64.b64encode(image).decode("utf-8")
content.append(
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{encoded_image}",
},
}
)
content.append(
{
"type": "text",
"text": prompt,
}
)
# Build request payload with llama.cpp native options
payload = {
"messages": [
{
"role": "user",
"content": content,
},
],
**self.provider_options,
}
            # llama.cpp exposes an OpenAI-compatible chat completions endpoint
            response = requests.post(
f"{self.provider}/v1/chat/completions",
json=payload,
timeout=self.timeout,
)
response.raise_for_status()
result = response.json()
if (
result is not None
and "choices" in result
and len(result["choices"]) > 0
):
choice = result["choices"][0]
if "message" in choice and "content" in choice["message"]:
return choice["message"]["content"].strip()
return None
except Exception as e:
logger.warning("llama.cpp returned an error: %s", str(e))
return None
def get_context_size(self) -> int:
"""Get the context window size for llama.cpp."""
return self.genai_config.provider_options.get("context_size", 4096)
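The option merge in `_init_provider` and the context-size fallback are plain dictionary semantics; a small standalone sketch of the behavior:

```python
# Later entries win when dicts are unpacked in order, so user-supplied
# provider_options override the LOCAL_OPTIMIZED_OPTIONS defaults.
defaults = {"temperature": 0.7, "repeat_penalty": 1.05, "top_p": 0.8}
user_options = {"temperature": 0.2, "top_k": 40}

merged = {**defaults, **user_options}
assert merged["temperature"] == 0.2  # user value wins
assert merged["repeat_penalty"] == 1.05  # untouched default survives

# get_context_size() falls back to 4096 when `context_size` is unset.
assert user_options.get("context_size", 4096) == 4096
```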