diff --git a/docs/docs/configuration/genai/config.md b/docs/docs/configuration/genai/config.md
index e1f79b744..6a004e353 100644
--- a/docs/docs/configuration/genai/config.md
+++ b/docs/docs/configuration/genai/config.md
@@ -5,7 +5,7 @@ title: Configuring Generative AI

 ## Configuration

-A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 3 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.
+A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 4 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.

 To use Generative AI, you must define a single provider at the global level of your Frigate configuration. If the provider you choose requires an API key, you may either directly paste it in your configuration, or store it in an environment variable prefixed with `FRIGATE_`.

@@ -77,8 +77,46 @@ genai:
   provider: ollama
   base_url: http://localhost:11434
   model: qwen3-vl:4b
+  provider_options: # other Ollama client options can be defined
+    keep_alive: -1
+    options:
+      num_ctx: 8192 # make sure the context matches other services that are using Ollama
 ```

+## llama.cpp
+
+[llama.cpp](https://github.com/ggml-org/llama.cpp) is an open-source C/C++ LLM inference engine that provides a high-performance inference server. Using llama.cpp directly gives you access to all native llama.cpp options and parameters.
+
+:::warning
+
+Using llama.cpp on CPU is not recommended; high inference times make using Generative AI impractical.
+
+:::
+
+It is highly recommended to host the llama.cpp server on a machine with a discrete graphics card, or on an Apple silicon Mac, for best performance.
+
+### Supported Models
+
+You must use a vision-capable model with Frigate. The llama.cpp server supports various vision models in GGUF format.
+
+### Configuration
+
+```yaml
+genai:
+  provider: llamacpp
+  base_url: http://localhost:8080
+  model: your-model-name
+  provider_options:
+    temperature: 0.7
+    repeat_penalty: 1.05
+    top_p: 0.8
+    top_k: 40
+    min_p: 0.05
+    seed: -1
+```
+
+All llama.cpp native options can be passed through `provider_options`, including `temperature`, `top_k`, `top_p`, `min_p`, `repeat_penalty`, `repeat_last_n`, `seed`, `grammar`, and more. See the [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for a complete list of available parameters.
+
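+Before pointing Frigate at the server, it can help to confirm that the endpoint answers on the OpenAI-compatible API Frigate uses. The snippet below is a minimal sketch, not part of Frigate itself, and assumes the server from the example configuration above is reachable at `http://localhost:8080`:
+
+```python
+import requests
+
+# Send a trivial text-only chat completion to the llama.cpp server.
+# The URL mirrors the `base_url` from the example configuration.
+response = requests.post(
+    "http://localhost:8080/v1/chat/completions",
+    json={
+        "messages": [{"role": "user", "content": "Reply with one word: ready"}],
+        "temperature": 0.7,
+    },
+    timeout=30,
+)
+response.raise_for_status()
+print(response.json()["choices"][0]["message"]["content"])
+```
+
+If the request fails, verify that the server was started with a vision-capable model before enabling the provider in Frigate.
+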
 ## Google Gemini

 Google Gemini has a [free tier](https://ai.google.dev/pricing) for the API; however, the limits may not be sufficient for standard Frigate usage. Choose a plan appropriate for your installation.
@@ -185,4 +223,4 @@ genai:
   base_url: https://instance.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview
   model: gpt-5-mini
   api_key: "{FRIGATE_OPENAI_API_KEY}"
-```
+```
\ No newline at end of file
diff --git a/docs/docs/configuration/genai/objects.md b/docs/docs/configuration/genai/objects.md
index e3ae31393..c878f5ec8 100644
--- a/docs/docs/configuration/genai/objects.md
+++ b/docs/docs/configuration/genai/objects.md
@@ -11,7 +11,7 @@ By default, descriptions will be generated for all tracked objects and all zones

 Optionally, you can generate the description using a snapshot (if enabled) by setting `use_snapshot` to `True`. By default, this is set to `False`, which sends the uncompressed images from the `detect` stream collected over the object's lifetime to the model. Once the object lifecycle ends, only a single compressed and cropped thumbnail is saved with the tracked object. Using a snapshot might be useful when you want to _regenerate_ a tracked object's description, as it will provide the AI with a higher-quality image (typically downscaled by the AI itself) than the cropped/compressed thumbnail. Using a snapshot otherwise has a trade-off in that only a single image is sent to your provider, which will limit the model's ability to determine object movement or direction.

-Generative AI object descriptions can also be toggled dynamically for a camera via MQTT with the topic `frigate/<camera_name>/object_descriptions/set`. See the [MQTT documentation](/integrations/mqtt/#frigatecamera_nameobjectdescriptionsset).
+Generative AI object descriptions can also be toggled dynamically for a camera via MQTT with the topic `frigate/<camera_name>/object_descriptions/set`. See the [MQTT documentation](/integrations/mqtt#frigatecamera_nameobject_descriptionsset).

 ## Usage and Best Practices

@@ -75,4 +75,4 @@ Many providers also have a public facing chat interface for their models. Downlo

 - OpenAI - [ChatGPT](https://chatgpt.com)
 - Gemini - [Google AI Studio](https://aistudio.google.com)
-- Ollama - [Open WebUI](https://docs.openwebui.com/)
+- Ollama - [Open WebUI](https://docs.openwebui.com/)
\ No newline at end of file
diff --git a/docs/docs/configuration/genai/review_summaries.md b/docs/docs/configuration/genai/review_summaries.md
index df287446c..c6f5e53ec 100644
--- a/docs/docs/configuration/genai/review_summaries.md
+++ b/docs/docs/configuration/genai/review_summaries.md
@@ -7,7 +7,7 @@ Generative AI can be used to automatically generate structured summaries of revi

 Requests for a summary are sent automatically to your AI provider for alert review items when the activity has ended; they can optionally be enabled for detections as well.

-Generative AI review summaries can also be toggled dynamically for a [camera via MQTT](/integrations/mqtt/#frigatecamera_namereviewdescriptionsset).
+Generative AI review summaries can also be toggled dynamically for a [camera via MQTT](/integrations/mqtt#frigatecamera_namereview_descriptionsset).
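+
+For example, summaries can be toggled off for a single camera from a script. The sketch below is illustrative only, assuming a broker at `mqtt.example.com`, a camera named `front_door`, and the `paho-mqtt` package installed; object descriptions are toggled the same way via the `object_descriptions/set` topic:
+
+```python
+import paho.mqtt.publish as publish
+
+# Frigate's MQTT "set" topics accept "ON"/"OFF" payloads.
+publish.single(
+    "frigate/front_door/review_descriptions/set",
+    payload="OFF",
+    hostname="mqtt.example.com",
+)
+```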

 ## Review Summary Usage and Best Practices

diff --git a/frigate/config/camera/genai.py b/frigate/config/camera/genai.py
index a4d9199af..3dd596c3b 100644
--- a/frigate/config/camera/genai.py
+++ b/frigate/config/camera/genai.py
@@ -14,6 +14,7 @@ class GenAIProviderEnum(str, Enum):
     azure_openai = "azure_openai"
     gemini = "gemini"
     ollama = "ollama"
+    llamacpp = "llamacpp"


 class GenAIConfig(FrigateBaseModel):
diff --git a/frigate/genai/llama_cpp.py b/frigate/genai/llama_cpp.py
new file mode 100644
index 000000000..45e364bc0
--- /dev/null
+++ b/frigate/genai/llama_cpp.py
@@ -0,0 +1,101 @@
+"""llama.cpp Provider for Frigate AI."""
+
+import base64
+import logging
+from typing import Any, Optional
+
+import requests
+
+from frigate.config import GenAIProviderEnum
+from frigate.genai import GenAIClient, register_genai_provider
+
+logger = logging.getLogger(__name__)
+
+
+@register_genai_provider(GenAIProviderEnum.llamacpp)
+class LlamaCppClient(GenAIClient):
+    """Generative AI client for Frigate using the llama.cpp server."""
+
+    # Conservative sampling defaults for local vision models; any
+    # user-supplied provider_options override these.
+    LOCAL_OPTIMIZED_OPTIONS = {
+        "temperature": 0.7,
+        "repeat_penalty": 1.05,
+        "top_p": 0.8,
+    }
+
+    provider: Optional[str]  # base_url of the llama.cpp server
+    provider_options: dict[str, Any]
+
+    def _init_provider(self):
+        """Initialize the client."""
+        self.provider_options = {
+            **self.LOCAL_OPTIMIZED_OPTIONS,
+            **self.genai_config.provider_options,
+        }
+        return (
+            self.genai_config.base_url.rstrip("/")
+            if self.genai_config.base_url
+            else None
+        )
+
+    def _send(self, prompt: str, images: list[bytes]) -> Optional[str]:
+        """Submit a request to the llama.cpp server."""
+        if self.provider is None:
+            logger.warning(
+                "llama.cpp provider has not been initialized; a description will not be generated. Check your llama.cpp configuration."
+            )
+            return None
+
+        try:
+            # Images are sent as base64 data URLs followed by the text prompt,
+            # using the OpenAI-compatible chat completions schema.
+            content = []
+            for image in images:
+                encoded_image = base64.b64encode(image).decode("utf-8")
+                content.append(
+                    {
+                        "type": "image_url",
+                        "image_url": {
+                            "url": f"data:image/jpeg;base64,{encoded_image}",
+                        },
+                    }
+                )
+            content.append(
+                {
+                    "type": "text",
+                    "text": prompt,
+                }
+            )
+
+            # context_size is a Frigate-side hint (see get_context_size), not a
+            # llama.cpp sampling option, so keep it out of the request payload.
+            request_options = {
+                k: v
+                for k, v in self.provider_options.items()
+                if k != "context_size"
+            }
+
+            # Build request payload with llama.cpp native options
+            payload = {
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": content,
+                    },
+                ],
+                **request_options,
+            }
+
+            response = requests.post(
+                f"{self.provider}/v1/chat/completions",
+                json=payload,
+                timeout=self.timeout,
+            )
+            response.raise_for_status()
+            result = response.json()
+
+            if (
+                result is not None
+                and "choices" in result
+                and len(result["choices"]) > 0
+            ):
+                choice = result["choices"][0]
+                if "message" in choice and "content" in choice["message"]:
+                    return choice["message"]["content"].strip()
+            return None
+        except Exception as e:
+            logger.warning("llama.cpp returned an error: %s", str(e))
+            return None
+
+    def get_context_size(self) -> int:
+        """Get the context window size for llama.cpp."""
+        return self.genai_config.provider_options.get("context_size", 4096)