Compare commits

42 Commits

| Author | SHA1 | Message | Date |
| ------ | ---- | ------- | ---- |
| Nicolas Mowen | 88bad3423b | Set model | 2026-02-26 15:28:58 -07:00 |
| Nicolas Mowen | f3cda9020b | Don't require download check | 2026-02-26 14:33:08 -07:00 |
| Nicolas Mowen | 0c333ec28a | Fix sending images | 2026-02-26 14:33:08 -07:00 |
| Nicolas Mowen | de986c7430 | undo | 2026-02-26 14:33:08 -07:00 |
| Nicolas Mowen | dd2d7aca19 | Basic docs | 2026-02-26 14:33:08 -07:00 |
| Nicolas Mowen | 3f1bf1ae12 | Add support for embedding via genai | 2026-02-26 14:33:08 -07:00 |
| Nicolas Mowen | d6e8cad32f | Add embed API support | 2026-02-26 14:33:07 -07:00 |
| Nicolas Mowen | 699d5ffa28 | Support GenAI for embeddings | 2026-02-26 14:32:33 -07:00 |
| Nicolas Mowen | f400e91ede | Add a starting state for chat | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | 3bac4b15ae | Add thumbnail images to object results | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | b2c424ad73 | Add support for markdown tables | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | c18846ac62 | Fix loading | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | b65ae76f0c | Cleanup UI bubbles | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | 5faf5e0d84 | Cleanup UI and prompt | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | 6837b9c89a | Cleanup | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | f04df4a144 | Add sub label to event tool filtering | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | e42f70eeec | Implement message editing | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | e7b2b919d5 | Improve default behavior | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | c68b7c9f46 | Improvements to UI | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | 8184ec5c8f | Add copy button | 2026-02-26 08:38:59 -07:00 |
| Nicolas Mowen | ef448a7f7c | Fix tool calling | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | f841ccdb63 | Undo | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 4b6228acd9 | Full streaming support | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 0b8d1ce568 | Support streaming | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 9ad7a2639f | Improve UI handling | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 089c2c1018 | Add title | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 3e97f9e985 | Show tool calls separately from message | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | eb9f16b4fa | More time parsing improvements | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 45c6be47d2 | Reduce fields in response | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 5a6c62a844 | Adjust timing format | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | f29fbe14ca | Improvements | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | cc941ab2db | Add markdown | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 56b3ebe791 | processing | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 6fdfe22f8c | Add chat history | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | 0cf713985f | Add basic chat page with entry | 2026-02-26 08:38:58 -07:00 |
| Nicolas Mowen | dc39d2f0ef | Set model in llama.cpp config | 2026-02-26 08:38:52 -07:00 |
| Nicolas Mowen | e6387dac05 | Fix import issues | 2026-02-26 08:38:52 -07:00 |
| Nicolas Mowen | c870ebea37 | Cleanup | 2026-02-26 08:38:52 -07:00 |
| Nicolas Mowen | 56a1a0f5e3 | Support getting client via manager | 2026-02-26 08:38:52 -07:00 |
| Nicolas Mowen | 67a245c8ef | Convert to roles list | 2026-02-26 08:38:52 -07:00 |
| Nicolas Mowen | a072600c94 | Add config migration | 2026-02-26 08:38:52 -07:00 |
| Nicolas Mowen | b603678b26 | GenAI client manager | 2026-02-26 08:38:52 -07:00 |
38 changed files with 3733 additions and 460 deletions

View File

@@ -5,96 +5,72 @@ title: Configuring Generative AI
## Configuration
A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 4 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI-Compatible section below.
A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 4 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.
To use Generative AI, you must define a single provider at the global level of your Frigate configuration. If the provider you choose requires an API key, you may either directly paste it in your configuration, or store it in an environment variable prefixed with `FRIGATE_`.
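As a rough illustration of the environment-variable option, the sketch below shows one way a `{FRIGATE_...}` placeholder in a config string could be filled from the environment. The `substitute_frigate_vars` helper and its regular expression are hypothetical and are not Frigate's actual implementation; only the `FRIGATE_` prefix and the `{FRIGATE_OPENAI_API_KEY}` placeholder style come from this documentation.

```python
import os
import re
from typing import Mapping, Optional

# Hypothetical pattern: matches placeholders such as {FRIGATE_OPENAI_API_KEY}.
_PLACEHOLDER = re.compile(r"\{(FRIGATE_[A-Z0-9_]+)\}")


def substitute_frigate_vars(
    raw_config: str, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Replace {FRIGATE_*} placeholders with values from the environment.

    Placeholders without a matching variable are left untouched so the
    problem stays visible instead of silently becoming an empty string.
    """
    env = os.environ if environ is None else environ

    def _replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        return env.get(name, match.group(0))

    return _PLACEHOLDER.sub(_replace, raw_config)


if __name__ == "__main__":
    fake_env = {"FRIGATE_OPENAI_API_KEY": "sk-example"}
    print(substitute_frigate_vars('api_key: "{FRIGATE_OPENAI_API_KEY}"', fake_env))
```

Keeping the key in an environment variable means the config file can be shared or committed without exposing the secret.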
## Local Providers
Local providers run on your own hardware and keep all data processing private. These require a GPU or dedicated hardware for best performance.
## Ollama
:::warning
Running Generative AI models on CPU is not recommended, as high inference times make using Generative AI impractical.
Using Ollama on CPU is not recommended, as high inference times make using Generative AI impractical.
:::
### Recommended Local Models
You must use a vision-capable model with Frigate. The following models are recommended for local deployment:
| Model | Notes |
| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `qwen3-vl` | Strong visual and situational understanding, strong ability to identify smaller objects and interactions with objects. |
| `qwen3.5` | Strong situational understanding, but missing DeepStack from qwen3-vl, leading to worse performance at identifying objects in people's hands and other small details. |
| `Intern3.5VL` | Relatively fast with good vision comprehension |
| `gemma3` | Slower model with good vision and temporal understanding |
| `qwen2.5-vl` | Fast but capable model with good vision comprehension |
:::info
Each model is available in multiple parameter sizes (3b, 4b, 8b, etc.). Larger sizes are more capable at complex tasks and at understanding situations, but require more memory and computational resources. It is recommended to try multiple models and experiment to see which performs best.
:::
:::note
You should have at least 8 GB of RAM available (or VRAM if running on GPU) to run the 7B models, 16 GB to run the 13B models, and 24 GB to run the 33B models.
:::
### Model Types: Instruct vs Thinking
Most vision-language models are available as **instruct** models, which are fine-tuned to follow instructions and respond concisely to prompts. However, some models (such as certain Qwen-VL or minigpt variants) offer both **instruct** and **thinking** versions.
- **Instruct models** are always recommended for use with Frigate. These models generate direct, relevant, actionable descriptions that best fit Frigate's object and event summary use case.
- **Reasoning / Thinking models** are fine-tuned for more free-form, open-ended, and speculative outputs, which are typically not concise and may not provide the practical summaries Frigate expects. For this reason, Frigate does **not** recommend or support using thinking models.
Some models are labeled as **hybrid** (capable of both thinking and instruct tasks). In these cases, it is recommended to disable reasoning / thinking. How to do so is generally model specific (see your model's documentation).
**Recommendation:**
Always select the `-instruct` or documented instruct/tagged variant of any model you use in your Frigate configuration. If in doubt, refer to your model provider's documentation or model library for guidance on the correct model variant to use.
### llama.cpp
[llama.cpp](https://github.com/ggml-org/llama.cpp) is a C++ implementation of LLaMA that provides a high-performance inference server.
It is highly recommended to host the llama.cpp server on a machine with a discrete graphics card, or on an Apple silicon Mac for best performance.
#### Supported Models
You must use a vision capable model with Frigate. The llama.cpp server supports various vision models in GGUF format.
#### Configuration
All llama.cpp native options can be passed through `provider_options`, including `temperature`, `top_k`, `top_p`, `min_p`, `repeat_penalty`, `repeat_last_n`, `seed`, `grammar`, and more. See the [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for a complete list of available parameters.
```yaml
genai:
provider: llamacpp
base_url: http://localhost:8080
model: your-model-name
provider_options:
context_size: 16000 # Tell Frigate your context size so it can send the appropriate amount of information.
```
### Ollama
[Ollama](https://ollama.com/) allows you to self-host large language models and keep everything running locally. It is highly recommended to host this server on a machine with an Nvidia graphics card, or on an Apple silicon Mac for best performance.
Most of the 7b parameter 4-bit vision models will fit inside 8GB of VRAM. There is also a [Docker container](https://hub.docker.com/r/ollama/ollama) available.
Parallel requests also come with some caveats. You will need to set `OLLAMA_NUM_PARALLEL=1` and choose `OLLAMA_MAX_QUEUE` and `OLLAMA_MAX_LOADED_MODELS` values that are appropriate for your hardware and preferences. See the [Ollama documentation](https://docs.ollama.com/faq#how-does-ollama-handle-concurrent-requests).
### Model Types: Instruct vs Thinking
Most vision-language models are available as **instruct** models, which are fine-tuned to follow instructions and respond concisely to prompts. However, some models (such as certain Qwen-VL or minigpt variants) offer both **instruct** and **thinking** versions.
- **Instruct models** are always recommended for use with Frigate. These models generate direct, relevant, actionable descriptions that best fit Frigate's object and event summary use case.
- **Thinking models** are fine-tuned for more free-form, open-ended, and speculative outputs, which are typically not concise and may not provide the practical summaries Frigate expects. For this reason, Frigate does **not** recommend or support using thinking models.
Some models are labeled as **hybrid** (capable of both thinking and instruct tasks). In these cases, Frigate will always use instruct-style prompts and specifically disables thinking-mode behaviors to ensure concise, useful responses.
**Recommendation:**
Always select the `-instruct` or documented instruct/tagged variant of any model you use in your Frigate configuration. If in doubt, refer to your model providers documentation or model library for guidance on the correct model variant to use.
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their model library](https://ollama.com/library). Note that Frigate will not automatically download the model you specify in your config, Ollama will try to download the model but it may take longer than the timeout, it is recommended to pull the model beforehand by running `ollama pull your_model` on your Ollama server/Docker container. Note that the model specified in Frigate's config must match the downloaded model tag.
:::info
Each model is available in multiple parameter sizes (3b, 4b, 8b, etc.). Larger sizes are more capable at complex tasks and at understanding situations, but require more memory and computational resources. It is recommended to try multiple models and experiment to see which performs best.
:::
:::tip
If you are trying to use a single model for Frigate and Home Assistant, it will need to support both vision and tool calling. qwen3-VL supports vision and tools simultaneously in Ollama.
:::
Note that Frigate will not automatically download the model you specify in your config. Ollama will try to download the model but it may take longer than the timeout, so it is recommended to pull the model beforehand by running `ollama pull your_model` on your Ollama server/Docker container. The model specified in Frigate's config must match the downloaded model tag.
The following models are recommended:
#### Configuration
| Model | Notes |
| ------------- | -------------------------------------------------------------------- |
| `qwen3-vl` | Strong visual and situational understanding, higher VRAM requirement |
| `Intern3.5VL` | Relatively fast with good vision comprehension |
| `gemma3` | Strong frame-to-frame understanding, slower inference times |
| `qwen2.5-vl` | Fast but capable model with good vision comprehension |
:::note
You should have at least 8 GB of RAM available (or VRAM if running on GPU) to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
:::
#### Ollama Cloud models
Ollama also supports [cloud models](https://ollama.com/cloud), where your local Ollama instance handles requests from Frigate, but model inference is performed in the cloud. Set up Ollama locally, sign in with your Ollama account, and specify the cloud model name in your Frigate config. For more details, see the Ollama cloud model [docs](https://docs.ollama.com/cloud).
### Configuration
```yaml
genai:
@@ -107,65 +83,49 @@ genai:
num_ctx: 8192 # make sure the context matches other services that are using ollama
```
### OpenAI-Compatible
## llama.cpp
Frigate supports any provider that implements the OpenAI API standard. This includes self-hosted solutions like [vLLM](https://docs.vllm.ai/), [LocalAI](https://localai.io/), and other OpenAI-compatible servers.
[llama.cpp](https://github.com/ggml-org/llama.cpp) is a C++ implementation of LLaMA that provides a high-performance inference server. Using llama.cpp directly gives you access to all native llama.cpp options and parameters.
:::tip
:::warning
For OpenAI-compatible servers (such as llama.cpp) that don't expose the configured context size in the API response, you can manually specify the context size in `provider_options`:
```yaml
genai:
provider: openai
base_url: http://your-llama-server
model: your-model-name
provider_options:
context_size: 8192 # Specify the configured context size
```
This ensures Frigate uses the correct context window size when generating prompts.
Using llama.cpp on CPU is not recommended, as high inference times make using Generative AI impractical.
:::
#### Configuration
It is highly recommended to host the llama.cpp server on a machine with a discrete graphics card, or on an Apple silicon Mac for best performance.
### Supported Models
You must use a vision capable model with Frigate. The llama.cpp server supports various vision models in GGUF format.
### Configuration
```yaml
genai:
provider: openai
base_url: http://your-server:port
api_key: your-api-key # May not be required for local servers
provider: llamacpp
base_url: http://localhost:8080
model: your-model-name
provider_options:
temperature: 0.7
repeat_penalty: 1.05
top_p: 0.8
top_k: 40
min_p: 0.05
seed: -1
```
To use a different OpenAI-compatible API endpoint, set the `OPENAI_BASE_URL` environment variable to your provider's API URL.
All llama.cpp native options can be passed through `provider_options`, including `temperature`, `top_k`, `top_p`, `min_p`, `repeat_penalty`, `repeat_last_n`, `seed`, `grammar`, and more. See the [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for a complete list of available parameters.
## Cloud Providers
Cloud providers run on remote infrastructure and require an API key for authentication. These services handle all model inference on their servers.
### Ollama Cloud
Ollama also supports [cloud models](https://ollama.com/cloud), where your local Ollama instance handles requests from Frigate, but model inference is performed in the cloud. Set up Ollama locally, sign in with your Ollama account, and specify the cloud model name in your Frigate config. For more details, see the Ollama cloud model [docs](https://docs.ollama.com/cloud).
#### Configuration
```yaml
genai:
provider: ollama
base_url: http://localhost:11434
model: cloud-model-name
```
### Google Gemini
## Google Gemini
Google Gemini has a [free tier](https://ai.google.dev/pricing) for the API; however, the limits may not be sufficient for standard Frigate usage. Choose a plan appropriate for your installation.
#### Supported Models
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their documentation](https://ai.google.dev/gemini-api/docs/models/gemini).
#### Get API Key
### Get API Key
To start using Gemini, you must first get an API key from [Google AI Studio](https://aistudio.google.com).
@@ -174,7 +134,7 @@ To start using Gemini, you must first get an API key from [Google AI Studio](htt
3. Click "Create API key in new project"
4. Copy the API key for use in your config
#### Configuration
### Configuration
```yaml
genai:
@@ -199,19 +159,19 @@ Other HTTP options are available, see the [python-genai documentation](https://g
:::
### OpenAI
## OpenAI
OpenAI does not have a free tier for their API. With the release of gpt-4o, pricing has been reduced and each generation should cost fractions of a cent if you choose to go this route.
#### Supported Models
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their documentation](https://platform.openai.com/docs/models).
#### Get API Key
### Get API Key
To start using OpenAI, you must first [create an API key](https://platform.openai.com/api-keys) and [configure billing](https://platform.openai.com/settings/organization/billing/overview).
#### Configuration
### Configuration
```yaml
genai:
@@ -220,19 +180,42 @@ genai:
model: gpt-4o
```
### Azure OpenAI
:::note
To use a different OpenAI-compatible API endpoint, set the `OPENAI_BASE_URL` environment variable to your provider's API URL.
:::
:::tip
For OpenAI-compatible servers (such as llama.cpp) that don't expose the configured context size in the API response, you can manually specify the context size in `provider_options`:
```yaml
genai:
provider: openai
base_url: http://your-llama-server
model: your-model-name
provider_options:
context_size: 8192 # Specify the configured context size
```
This ensures Frigate uses the correct context window size when generating prompts.
:::
## Azure OpenAI
Microsoft offers several vision models through Azure OpenAI. A subscription is required.
#### Supported Models
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models).
#### Create Resource and Get API Key
### Create Resource and Get API Key
To start using Azure OpenAI, you must first [create a resource](https://learn.microsoft.com/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource). You'll need your API key, model name, and resource URL, which must include the `api-version` parameter (see the example below).
#### Configuration
### Configuration
```yaml
genai:
@@ -240,4 +223,4 @@ genai:
base_url: https://instance.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview
model: gpt-5-mini
api_key: "{FRIGATE_OPENAI_API_KEY}"
```
```

View File

@@ -76,6 +76,40 @@ Switching between V1 and V2 requires reindexing your embeddings. The embeddings
:::
### GenAI Provider (llama.cpp)
Frigate can use a GenAI provider for semantic search embeddings when that provider has the `embeddings` role. Currently, only **llama.cpp** supports multimodal embeddings (both text and images).
To use llama.cpp for semantic search:
1. Configure a GenAI provider in your config with `embeddings` in its `roles`.
2. Set `semantic_search.model` to the GenAI config key (e.g. `default`).
3. Start the llama.cpp server with `--embeddings` and `--mmproj` for image support:
```yaml
genai:
default:
provider: llamacpp
base_url: http://localhost:8080
model: your-model-name
roles:
- embeddings
- vision
- tools
semantic_search:
enabled: True
model: default
```
The llama.cpp server must be started with `--embeddings` for the embeddings API, and `--mmproj <mmproj.gguf>` when using image embeddings. See the [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for details.
:::note
Switching between Jina models and a GenAI provider requires reindexing. Embeddings from different backends are incompatible.
:::
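The relationship between `semantic_search.model` and the `genai` section can be sketched in Python. This is only an illustration under the assumption that the config has been loaded as a plain dictionary; `resolve_embeddings_provider` is a hypothetical helper and not a Frigate function.

```python
from typing import Any, Dict


def resolve_embeddings_provider(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the genai entry referenced by semantic_search.model.

    Raises ValueError if the key is unknown or the provider does not
    list the `embeddings` role.
    """
    key = config["semantic_search"]["model"]
    providers = config.get("genai", {})
    if key not in providers:
        raise ValueError(f"semantic_search.model {key!r} is not a genai config key")
    provider = providers[key]
    if "embeddings" not in provider.get("roles", []):
        raise ValueError(f"genai provider {key!r} does not have the embeddings role")
    return provider


# Dictionary form of the YAML example above.
example_config = {
    "genai": {
        "default": {
            "provider": "llamacpp",
            "base_url": "http://localhost:8080",
            "model": "your-model-name",
            "roles": ["embeddings", "vision", "tools"],
        }
    },
    "semantic_search": {"enabled": True, "model": "default"},
}

if __name__ == "__main__":
    print(resolve_embeddings_provider(example_config)["provider"])
```

The point of the check is that `semantic_search.model` names a config key (here `default`), not the model file served by llama.cpp.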
### GPU Acceleration
The CLIP models are downloaded in ONNX format, and the `large` model can be accelerated using GPU hardware, when available. This depends on the Docker build that is used. You can also target a specific device in a multi-GPU installation.

View File

@@ -38,6 +38,7 @@ from frigate.config.camera.updater import (
CameraConfigUpdateTopic,
)
from frigate.ffmpeg_presets import FFMPEG_HWACCEL_VAAPI, _gpu_selector
from frigate.genai import GenAIClientManager
from frigate.jobs.media_sync import (
get_current_media_sync_job,
get_media_sync_job_by_id,
@@ -432,6 +433,7 @@ def config_set(request: Request, body: AppConfigSetBody):
if body.requires_restart == 0 or body.update_topic:
old_config: FrigateConfig = request.app.frigate_config
request.app.frigate_config = config
request.app.genai_manager = GenAIClientManager(config)
if body.update_topic:
if body.update_topic.startswith("config/cameras/"):

View File

@@ -1037,4 +1037,4 @@ async def get_allowed_cameras_for_filter(request: Request):
role = current_user["role"]
all_camera_names = set(request.app.frigate_config.cameras.keys())
roles_dict = request.app.frigate_config.auth.roles
return User.get_allowed_cameras(role, roles_dict, all_camera_names)
return User.get_allowed_cameras(role, roles_dict, all_camera_names)

View File

@@ -3,12 +3,13 @@
import base64
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
import time
from datetime import datetime
from typing import Any, Dict, Generator, List, Optional
import cv2
from fastapi import APIRouter, Body, Depends, Request
from fastapi.responses import JSONResponse
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel
from frigate.api.auth import (
@@ -20,16 +21,60 @@ from frigate.api.defs.request.chat_body import ChatCompletionRequest
from frigate.api.defs.response.chat_response import (
ChatCompletionResponse,
ChatMessageResponse,
ToolCall,
)
from frigate.api.defs.tags import Tags
from frigate.api.event import events
from frigate.genai import get_genai_client
from frigate.genai.utils import build_assistant_message_for_conversation
logger = logging.getLogger(__name__)
router = APIRouter(tags=[Tags.chat])
def _chunk_content(content: str, chunk_size: int = 80) -> Generator[str, None, None]:
"""Yield content in word-aware chunks for streaming."""
if not content:
return
words = content.split(" ")
current: List[str] = []
current_len = 0
for w in words:
current.append(w)
current_len += len(w) + 1
if current_len >= chunk_size:
yield " ".join(current) + " "
current = []
current_len = 0
if current:
yield " ".join(current)
def _format_events_with_local_time(
events_list: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Add human-readable local start/end times to each event for the LLM."""
result = []
for evt in events_list:
if not isinstance(evt, dict):
result.append(evt)
continue
copy_evt = dict(evt)
try:
start_ts = evt.get("start_time")
end_ts = evt.get("end_time")
if start_ts is not None:
dt_start = datetime.fromtimestamp(start_ts)
copy_evt["start_time_local"] = dt_start.strftime("%Y-%m-%d %I:%M:%S %p")
if end_ts is not None:
dt_end = datetime.fromtimestamp(end_ts)
copy_evt["end_time_local"] = dt_end.strftime("%Y-%m-%d %I:%M:%S %p")
except (TypeError, ValueError, OSError):
pass
result.append(copy_evt)
return result
class ToolExecuteRequest(BaseModel):
"""Request model for tool execution."""
@@ -53,19 +98,25 @@ def get_tool_definitions() -> List[Dict[str, Any]]:
"Search for detected objects in Frigate by camera, object label, time range, "
"zones, and other filters. Use this to answer questions about when "
"objects were detected, what objects appeared, or to find specific object detections. "
"An 'object' in Frigate represents a tracked detection (e.g., a person, package, car)."
"An 'object' in Frigate represents a tracked detection (e.g., a person, package, car). "
"When the user asks about a specific name (person, delivery company, animal, etc.), "
"filter by sub_label only and do not set label."
),
"parameters": {
"type": "object",
"properties": {
"camera": {
"type": "string",
"description": "Camera name to filter by (optional). Use 'all' for all cameras.",
"description": "Camera name to filter by (optional).",
},
"label": {
"type": "string",
"description": "Object label to filter by (e.g., 'person', 'package', 'car').",
},
"sub_label": {
"type": "string",
"description": "Name of a person, delivery company, animal, etc. When filtering by a specific name, use only sub_label; do not set label.",
},
"after": {
"type": "string",
"description": "Start time in ISO 8601 format (e.g., '2024-01-01T00:00:00Z').",
@@ -81,8 +132,8 @@ def get_tool_definitions() -> List[Dict[str, Any]]:
},
"limit": {
"type": "integer",
"description": "Maximum number of objects to return (default: 10).",
"default": 10,
"description": "Maximum number of objects to return (default: 25).",
"default": 25,
},
},
},
@@ -120,14 +171,13 @@ def get_tool_definitions() -> List[Dict[str, Any]]:
summary="Get available tools",
description="Returns OpenAI-compatible tool definitions for function calling.",
)
def get_tools(request: Request) -> JSONResponse:
def get_tools() -> JSONResponse:
"""Get list of available tools for LLM function calling."""
tools = get_tool_definitions()
return JSONResponse(content={"tools": tools})
async def _execute_search_objects(
request: Request,
arguments: Dict[str, Any],
allowed_cameras: List[str],
) -> JSONResponse:
@@ -137,23 +187,26 @@ async def _execute_search_objects(
This searches for detected objects (events) in Frigate using the same
logic as the events API endpoint.
"""
# Parse ISO 8601 timestamps to Unix timestamps if provided
# Parse after/before as server local time; convert to Unix timestamp
after = arguments.get("after")
before = arguments.get("before")
def _parse_as_local_timestamp(s: str):
s = s.replace("Z", "").strip()[:19]
dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
return time.mktime(dt.timetuple())
if after:
try:
after_dt = datetime.fromisoformat(after.replace("Z", "+00:00"))
after = after_dt.timestamp()
except (ValueError, AttributeError):
after = _parse_as_local_timestamp(after)
except (ValueError, AttributeError, TypeError):
logger.warning(f"Invalid 'after' timestamp format: {after}")
after = None
if before:
try:
before_dt = datetime.fromisoformat(before.replace("Z", "+00:00"))
before = before_dt.timestamp()
except (ValueError, AttributeError):
before = _parse_as_local_timestamp(before)
except (ValueError, AttributeError, TypeError):
logger.warning(f"Invalid 'before' timestamp format: {before}")
before = None
@@ -166,15 +219,14 @@ async def _execute_search_objects(
# Build query parameters compatible with EventsQueryParams
query_params = EventsQueryParams(
camera=arguments.get("camera", "all"),
cameras=arguments.get("camera", "all"),
label=arguments.get("label", "all"),
labels=arguments.get("label", "all"),
sub_labels=arguments.get("sub_label", "all").lower(),
zones=zones,
zone=zones,
after=after,
before=before,
limit=arguments.get("limit", 10),
limit=arguments.get("limit", 25),
)
try:
@@ -203,7 +255,6 @@ async def _execute_search_objects(
description="Execute a tool function call from an LLM.",
)
async def execute_tool(
request: Request,
body: ToolExecuteRequest = Body(...),
allowed_cameras: List[str] = Depends(get_allowed_cameras_for_filter),
) -> JSONResponse:
@@ -219,7 +270,7 @@ async def execute_tool(
logger.debug(f"Executing tool: {tool_name} with arguments: {arguments}")
if tool_name == "search_objects":
return await _execute_search_objects(request, arguments, allowed_cameras)
return await _execute_search_objects(arguments, allowed_cameras)
return JSONResponse(
content={
@@ -335,7 +386,7 @@ async def _execute_tool_internal(
This is used by the chat completion endpoint to execute tools.
"""
if tool_name == "search_objects":
response = await _execute_search_objects(request, arguments, allowed_cameras)
response = await _execute_search_objects(arguments, allowed_cameras)
try:
if hasattr(response, "body"):
body_str = response.body.decode("utf-8")
@@ -350,15 +401,109 @@ async def _execute_tool_internal(
elif tool_name == "get_live_context":
camera = arguments.get("camera")
if not camera:
logger.error(
"Tool get_live_context failed: camera parameter is required. "
"Arguments: %s",
json.dumps(arguments),
)
return {"error": "Camera parameter is required"}
return await _execute_get_live_context(request, camera, allowed_cameras)
else:
logger.error(
"Tool call failed: unknown tool %r. Expected one of: search_objects, get_live_context. "
"Arguments received: %s",
tool_name,
json.dumps(arguments),
)
return {"error": f"Unknown tool: {tool_name}"}
async def _execute_pending_tools(
pending_tool_calls: List[Dict[str, Any]],
request: Request,
allowed_cameras: List[str],
) -> tuple[List[ToolCall], List[Dict[str, Any]]]:
"""
Execute a list of tool calls; return (ToolCall list for API response, tool result dicts for conversation).
"""
tool_calls_out: List[ToolCall] = []
tool_results: List[Dict[str, Any]] = []
for tool_call in pending_tool_calls:
tool_name = tool_call["name"]
tool_args = tool_call.get("arguments") or {}
tool_call_id = tool_call["id"]
logger.debug(
f"Executing tool: {tool_name} (id: {tool_call_id}) with arguments: {json.dumps(tool_args, indent=2)}"
)
try:
tool_result = await _execute_tool_internal(
tool_name, tool_args, request, allowed_cameras
)
if isinstance(tool_result, dict) and tool_result.get("error"):
logger.error(
"Tool call %s (id: %s) returned error: %s. Arguments: %s",
tool_name,
tool_call_id,
tool_result.get("error"),
json.dumps(tool_args),
)
if tool_name == "search_objects" and isinstance(tool_result, list):
tool_result = _format_events_with_local_time(tool_result)
_keys = {
"id",
"camera",
"label",
"zones",
"start_time_local",
"end_time_local",
"sub_label",
"event_count",
}
tool_result = [
{k: evt[k] for k in _keys if k in evt}
for evt in tool_result
if isinstance(evt, dict)
]
result_content = (
json.dumps(tool_result)
if isinstance(tool_result, (dict, list))
else (tool_result if isinstance(tool_result, str) else str(tool_result))
)
tool_calls_out.append(
ToolCall(name=tool_name, arguments=tool_args, response=result_content)
)
tool_results.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content,
}
)
except Exception as e:
logger.error(
"Error executing tool %s (id: %s): %s. Arguments: %s",
tool_name,
tool_call_id,
e,
json.dumps(tool_args),
exc_info=True,
)
error_content = json.dumps({"error": f"Tool execution failed: {str(e)}"})
tool_calls_out.append(
ToolCall(name=tool_name, arguments=tool_args, response=error_content)
)
tool_results.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": error_content,
}
)
return (tool_calls_out, tool_results)
@router.post(
"/chat/completion",
response_model=ChatCompletionResponse,
dependencies=[Depends(allow_any_authenticated())],
summary="Chat completion with tool calling",
description=(
@@ -370,7 +515,7 @@ async def chat_completion(
request: Request,
body: ChatCompletionRequest = Body(...),
allowed_cameras: List[str] = Depends(get_allowed_cameras_for_filter),
) -> JSONResponse:
):
"""
Chat completion endpoint with tool calling support.
@@ -383,7 +528,7 @@ async def chat_completion(
6. Repeats until final answer
7. Returns response to user
"""
genai_client = get_genai_client(request.app.frigate_config)
genai_client = request.app.genai_manager.tool_client
if not genai_client:
return JSONResponse(
content={
@@ -395,9 +540,9 @@ async def chat_completion(
tools = get_tool_definitions()
conversation = []
current_datetime = datetime.now(timezone.utc)
current_datetime = datetime.now()
current_date_str = current_datetime.strftime("%Y-%m-%d")
current_time_str = current_datetime.strftime("%H:%M:%S %Z")
current_time_str = current_datetime.strftime("%I:%M:%S %p")
cameras_info = []
config = request.app.frigate_config
@@ -430,9 +575,12 @@ async def chat_completion(
system_prompt = f"""You are a helpful assistant for Frigate, a security camera NVR system. You help users answer questions about their cameras, detected objects, and events.
Current date and time: {current_date_str} at {current_time_str} (UTC)
Current server local date and time: {current_date_str} at {current_time_str}
When users ask questions about "today", "yesterday", "this week", etc., use the current date above as reference.
Do not start your response with phrases like "I will check...", "Let me see...", or "Let me look...". Answer directly.
Always present times to the user in the server's local timezone. When tool results include start_time_local and end_time_local, use those exact strings when listing or describing detection times—do not convert or invent timestamps. Do not use UTC or ISO format with Z for the user-facing answer unless the tool result only provides Unix timestamps without local time fields.
When users ask about "today", "yesterday", "this week", etc., use the current date above as reference.
When searching for objects or events, use ISO 8601 format for dates (e.g., {current_date_str}T00:00:00Z for the start of today).
Always be accurate with time calculations based on the current date provided.{cameras_section}{live_image_note}"""
@@ -472,6 +620,7 @@ Always be accurate with time calculations based on the current date provided.{ca
conversation.append(msg_dict)
tool_iterations = 0
tool_calls: List[ToolCall] = []
max_iterations = body.max_tool_iterations
logger.debug(
@@ -479,6 +628,81 @@ Always be accurate with time calculations based on the current date provided.{ca
f"{len(tools)} tool(s) available, max_iterations={max_iterations}"
)
# Use true LLM streaming when the client supports it and the request sets stream=True
if body.stream and hasattr(genai_client, "chat_with_tools_stream"):
stream_tool_calls: List[ToolCall] = []
stream_iterations = 0
async def stream_body_llm():
nonlocal conversation, stream_tool_calls, stream_iterations
while stream_iterations < max_iterations:
logger.debug(
f"Streaming LLM (iteration {stream_iterations + 1}/{max_iterations}) "
f"with {len(conversation)} message(s)"
)
async for event in genai_client.chat_with_tools_stream(
messages=conversation,
tools=tools if tools else None,
tool_choice="auto",
):
kind, value = event
if kind == "content_delta":
yield (
json.dumps({"type": "content", "delta": value}).encode(
"utf-8"
)
+ b"\n"
)
elif kind == "message":
msg = value
if msg.get("finish_reason") == "error":
yield (
json.dumps(
{
"type": "error",
"error": "An error occurred while processing your request.",
}
).encode("utf-8")
+ b"\n"
)
return
pending = msg.get("tool_calls")
if pending:
stream_iterations += 1
conversation.append(
build_assistant_message_for_conversation(
msg.get("content"), pending
)
)
executed_calls, tool_results = await _execute_pending_tools(
pending, request, allowed_cameras
)
stream_tool_calls.extend(executed_calls)
conversation.extend(tool_results)
yield (
json.dumps(
{
"type": "tool_calls",
"tool_calls": [
tc.model_dump() for tc in stream_tool_calls
],
}
).encode("utf-8")
+ b"\n"
)
break
else:
yield (json.dumps({"type": "done"}).encode("utf-8") + b"\n")
return
else:
yield json.dumps({"type": "done"}).encode("utf-8") + b"\n"
return StreamingResponse(
stream_body_llm(),
media_type="application/x-ndjson",
headers={"X-Accel-Buffering": "no"},
)
try:
while tool_iterations < max_iterations:
logger.debug(
@@ -500,119 +724,71 @@ Always be accurate with time calculations based on the current date provided.{ca
status_code=500,
)
assistant_message = {
"role": "assistant",
"content": response.get("content"),
}
if response.get("tool_calls"):
assistant_message["tool_calls"] = [
{
"id": tc["id"],
"type": "function",
"function": {
"name": tc["name"],
"arguments": json.dumps(tc["arguments"]),
},
}
for tc in response["tool_calls"]
]
conversation.append(assistant_message)
conversation.append(
build_assistant_message_for_conversation(
response.get("content"), response.get("tool_calls")
)
)
tool_calls = response.get("tool_calls")
if not tool_calls:
pending_tool_calls = response.get("tool_calls")
if not pending_tool_calls:
logger.debug(
f"Chat completion finished with final answer (iterations: {tool_iterations})"
)
final_content = response.get("content") or ""
if body.stream:
async def stream_body() -> Any:
if tool_calls:
yield (
json.dumps(
{
"type": "tool_calls",
"tool_calls": [
tc.model_dump() for tc in tool_calls
],
}
).encode("utf-8")
+ b"\n"
)
# Stream content in word-sized chunks for smooth UX
for part in _chunk_content(final_content):
yield (
json.dumps({"type": "content", "delta": part}).encode(
"utf-8"
)
+ b"\n"
)
yield json.dumps({"type": "done"}).encode("utf-8") + b"\n"
return StreamingResponse(
stream_body(),
media_type="application/x-ndjson",
)
return JSONResponse(
content=ChatCompletionResponse(
message=ChatMessageResponse(
role="assistant",
content=response.get("content"),
content=final_content,
tool_calls=None,
),
finish_reason=response.get("finish_reason", "stop"),
tool_iterations=tool_iterations,
tool_calls=tool_calls,
).model_dump(),
)
# Execute tools
tool_iterations += 1
logger.debug(
f"Tool calls detected (iteration {tool_iterations}/{max_iterations}): "
f"{len(tool_calls)} tool(s) to execute"
f"{len(pending_tool_calls)} tool(s) to execute"
)
tool_results = []
for tool_call in tool_calls:
tool_name = tool_call["name"]
tool_args = tool_call["arguments"]
tool_call_id = tool_call["id"]
logger.debug(
f"Executing tool: {tool_name} (id: {tool_call_id}) with arguments: {json.dumps(tool_args, indent=2)}"
)
try:
tool_result = await _execute_tool_internal(
tool_name, tool_args, request, allowed_cameras
)
if isinstance(tool_result, dict):
result_content = json.dumps(tool_result)
result_summary = tool_result
if isinstance(tool_result, dict) and isinstance(
tool_result.get("content"), list
):
result_count = len(tool_result.get("content", []))
result_summary = {
"count": result_count,
"sample": tool_result.get("content", [])[:2]
if result_count > 0
else [],
}
logger.debug(
f"Tool {tool_name} (id: {tool_call_id}) completed successfully. "
f"Result: {json.dumps(result_summary, indent=2)}"
)
elif isinstance(tool_result, str):
result_content = tool_result
logger.debug(
f"Tool {tool_name} (id: {tool_call_id}) completed successfully. "
f"Result length: {len(result_content)} characters"
)
else:
result_content = str(tool_result)
logger.debug(
f"Tool {tool_name} (id: {tool_call_id}) completed successfully. "
f"Result type: {type(tool_result).__name__}"
)
tool_results.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content,
}
)
except Exception as e:
logger.error(
f"Error executing tool {tool_name} (id: {tool_call_id}): {e}",
exc_info=True,
)
error_content = json.dumps(
{"error": f"Tool execution failed: {str(e)}"}
)
tool_results.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": error_content,
}
)
logger.debug(
f"Tool {tool_name} (id: {tool_call_id}) failed. Error result added to conversation."
)
executed_calls, tool_results = await _execute_pending_tools(
pending_tool_calls, request, allowed_cameras
)
tool_calls.extend(executed_calls)
conversation.extend(tool_results)
logger.debug(
f"Added {len(tool_results)} tool result(s) to conversation. "
@@ -631,6 +807,7 @@ Always be accurate with time calculations based on the current date provided.{ca
),
finish_reason="length",
tool_iterations=tool_iterations,
tool_calls=tool_calls,
).model_dump(),
)
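
Review note: the streaming branch above emits newline-delimited JSON events whose `type` is `content`, `tool_calls`, `error`, or `done`. The sketch below shows one way a client might fold those events into a final message. The helper name `reduce_chat_stream` and the shape of the returned dict are assumptions made for illustration; they are not part of this change.

```python
import json
from typing import Any, Iterable


def reduce_chat_stream(lines: Iterable[bytes]) -> dict[str, Any]:
    """Fold NDJSON chat events into one result (hypothetical client helper)."""
    content_parts: list[str] = []
    tool_calls: list[dict[str, Any]] = []
    error = None
    done = False

    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip empty lines between events
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "content":
            content_parts.append(event.get("delta", ""))
        elif kind == "tool_calls":
            # The server sends the cumulative list each time, so replace it.
            tool_calls = list(event.get("tool_calls", []))
        elif kind == "error":
            error = event.get("error")
            break
        elif kind == "done":
            done = True
            break

    return {
        "content": "".join(content_parts),
        "tool_calls": tool_calls,
        "error": error,
        "done": done,
    }
```

Replacing the tool call list on every `tool_calls` event, rather than extending it, matches the server loop shown above, which serializes the whole `stream_tool_calls` list after each round of tool execution.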

@@ -39,3 +39,7 @@ class ChatCompletionRequest(BaseModel):
"user message as multimodal content. Use with get_live_context for detection info."
),
)
stream: bool = Field(
default=False,
description="If true, stream the final assistant response in the body as newline-delimited JSON.",
)

@@ -5,8 +5,8 @@ from typing import Any, Optional
from pydantic import BaseModel, Field
class ToolCall(BaseModel):
"""A tool call from the LLM."""
class ToolCallInvocation(BaseModel):
"""A tool call requested by the LLM (before execution)."""
id: str = Field(description="Unique identifier for this tool call")
name: str = Field(description="Tool name to call")
@@ -20,11 +20,24 @@ class ChatMessageResponse(BaseModel):
content: Optional[str] = Field(
default=None, description="Message content (None if tool calls present)"
)
tool_calls: Optional[list[ToolCall]] = Field(
tool_calls: Optional[list[ToolCallInvocation]] = Field(
default=None, description="Tool calls if LLM wants to call tools"
)
class ToolCall(BaseModel):
"""A tool that was executed during the completion, with its response."""
name: str = Field(description="Tool name that was called")
arguments: dict[str, Any] = Field(
default_factory=dict, description="Arguments passed to the tool"
)
response: str = Field(
default="",
description="The response or result returned from the tool execution",
)
class ChatCompletionResponse(BaseModel):
"""Response from chat completion."""
@@ -35,3 +48,7 @@ class ChatCompletionResponse(BaseModel):
tool_iterations: int = Field(
default=0, description="Number of tool call iterations performed"
)
tool_calls: list[ToolCall] = Field(
default_factory=list,
description="List of tool calls that were executed during this completion",
)

@@ -33,6 +33,7 @@ from frigate.comms.event_metadata_updater import (
from frigate.config import FrigateConfig
from frigate.config.camera.updater import CameraConfigUpdatePublisher
from frigate.embeddings import EmbeddingsContext
from frigate.genai import GenAIClientManager
from frigate.ptz.onvif import OnvifController
from frigate.stats.emitter import StatsEmitter
from frigate.storage import StorageMaintainer
@@ -134,6 +135,7 @@ def create_fastapi_app(
app.include_router(record.router)
# App Properties
app.frigate_config = frigate_config
app.genai_manager = GenAIClientManager(frigate_config)
app.embeddings = embeddings
app.detected_frames_processor = detected_frames_processor
app.storage_maintainer = storage_maintainer

@@ -33,7 +33,6 @@ from frigate.api.defs.response.review_response import (
ReviewSummaryResponse,
)
from frigate.api.defs.tags import Tags
from frigate.config import FrigateConfig
from frigate.embeddings import EmbeddingsContext
from frigate.models import Recordings, ReviewSegment, UserReviewStatus
from frigate.review.types import SeverityEnum
@@ -747,9 +746,7 @@ async def set_not_reviewed(
description="Use GenAI to summarize review items over a period of time.",
)
def generate_review_summary(request: Request, start_ts: float, end_ts: float):
config: FrigateConfig = request.app.frigate_config
if not config.genai.provider:
if not request.app.genai_manager.vision_client:
return JSONResponse(
content=(
{

@@ -6,7 +6,7 @@ from pydantic import Field
from ..base import FrigateBaseModel
from ..env import EnvString
__all__ = ["GenAIConfig", "GenAIProviderEnum"]
__all__ = ["GenAIConfig", "GenAIProviderEnum", "GenAIRoleEnum"]
class GenAIProviderEnum(str, Enum):
@@ -17,6 +17,12 @@ class GenAIProviderEnum(str, Enum):
llamacpp = "llamacpp"
class GenAIRoleEnum(str, Enum):
tools = "tools"
vision = "vision"
embeddings = "embeddings"
class GenAIConfig(FrigateBaseModel):
"""Primary GenAI Config to define GenAI Provider."""
@@ -24,6 +30,14 @@ class GenAIConfig(FrigateBaseModel):
base_url: Optional[str] = Field(default=None, title="Provider base url.")
model: str = Field(default="gpt-4o", title="GenAI model.")
provider: GenAIProviderEnum | None = Field(default=None, title="GenAI provider.")
roles: list[GenAIRoleEnum] = Field(
default_factory=lambda: [
GenAIRoleEnum.embeddings,
GenAIRoleEnum.vision,
GenAIRoleEnum.tools,
],
title="GenAI roles (tools, vision, embeddings); one provider per role.",
)
provider_options: dict[str, Any] = Field(
default={}, title="GenAI Provider extra options."
)

@@ -1,5 +1,5 @@
from enum import Enum
from typing import Dict, List, Optional
from typing import Dict, List, Optional, Union
from pydantic import ConfigDict, Field
@@ -128,9 +128,10 @@ class SemanticSearchConfig(FrigateBaseModel):
reindex: Optional[bool] = Field(
default=False, title="Reindex all tracked objects on startup."
)
model: Optional[SemanticSearchModelEnum] = Field(
model: Optional[Union[SemanticSearchModelEnum, str]] = Field(
default=SemanticSearchModelEnum.jinav1,
title="The CLIP model to use for semantic search.",
title="The CLIP model or GenAI provider name for semantic search.",
description="Use 'jinav1' or 'jinav2' for the ONNX models, or a GenAI config key (e.g. 'default') when that provider has the embeddings role.",
)
model_size: str = Field(
default="small", title="The size of the embeddings model used."

@@ -45,7 +45,7 @@ from .camera.audio import AudioConfig
from .camera.birdseye import BirdseyeConfig
from .camera.detect import DetectConfig
from .camera.ffmpeg import FfmpegConfig
from .camera.genai import GenAIConfig
from .camera.genai import GenAIConfig, GenAIRoleEnum
from .camera.motion import MotionConfig
from .camera.notification import NotificationConfig
from .camera.objects import FilterConfig, ObjectConfig
@@ -347,9 +347,9 @@ class FrigateConfig(FrigateBaseModel):
default_factory=ModelConfig, title="Detection model configuration."
)
# GenAI config
genai: GenAIConfig = Field(
default_factory=GenAIConfig, title="Generative AI configuration."
# GenAI config (named provider configs: name -> GenAIConfig)
genai: Dict[str, GenAIConfig] = Field(
default_factory=dict, title="Generative AI configuration (named providers)."
)
# Camera config
@@ -431,6 +431,34 @@ class FrigateConfig(FrigateBaseModel):
# set notifications state
self.notifications.enabled_in_config = self.notifications.enabled
# validate genai: each role (tools, vision, embeddings) at most once
role_to_name: dict[GenAIRoleEnum, str] = {}
for name, genai_cfg in self.genai.items():
for role in genai_cfg.roles:
if role in role_to_name:
raise ValueError(
f"GenAI role '{role.value}' is assigned to both "
f"'{role_to_name[role]}' and '{name}'; each role may be assigned "
"to at most one provider."
)
role_to_name[role] = name
# validate semantic_search.model when it is a GenAI provider name
if self.semantic_search.enabled and isinstance(
self.semantic_search.model, str
):
if self.semantic_search.model not in self.genai:
raise ValueError(
f"semantic_search.model '{self.semantic_search.model}' is not a "
"valid GenAI config key. Must match a key in genai config."
)
genai_cfg = self.genai[self.semantic_search.model]
if GenAIRoleEnum.embeddings not in genai_cfg.roles:
raise ValueError(
f"GenAI provider '{self.semantic_search.model}' must have "
"'embeddings' in its roles for semantic search."
)
# set default min_score for object attributes
for attribute in self.model.all_attributes:
if not self.objects.filters.get(attribute):
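
Review note: the role check above can be exercised in isolation. The sketch below reproduces the same "each role at most once" rule over plain dicts and strings. The function name `map_roles_to_providers` and the use of string role names instead of `GenAIRoleEnum` are assumptions made to keep the example self-contained.

```python
def map_roles_to_providers(genai: dict[str, list[str]]) -> dict[str, str]:
    """Return role -> provider name, rejecting roles claimed twice.

    Hypothetical stand-in for the validation in FrigateConfig; `genai`
    maps a provider name to the list of roles it declares.
    """
    role_to_name: dict[str, str] = {}
    for name, roles in genai.items():
        for role in roles:
            if role in role_to_name:
                raise ValueError(
                    f"GenAI role '{role}' is assigned to both "
                    f"'{role_to_name[role]}' and '{name}'"
                )
            role_to_name[role] = name
    return role_to_name


# Two providers with disjoint roles pass validation.
ok = map_roles_to_providers(
    {"local": ["tools", "vision"], "cloud": ["embeddings"]}
)
```

Because `roles` defaults to all three roles in `GenAIConfig`, two named providers that both omit `roles` would collide under this rule, so multi-provider configs need explicit role lists.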

@@ -603,4 +603,4 @@ def get_optimized_runner(
provider_options=options,
),
model_type=model_type,
)
)

@@ -28,6 +28,7 @@ from frigate.types import ModelStatusTypesEnum
from frigate.util.builtin import EventsPerSecond, InferenceSpeed, serialize
from frigate.util.file import get_event_thumbnail_bytes
from .genai_embedding import GenAIEmbedding
from .onnx.jina_v1_embedding import JinaV1ImageEmbedding, JinaV1TextEmbedding
from .onnx.jina_v2_embedding import JinaV2Embedding
@@ -73,11 +74,13 @@ class Embeddings:
config: FrigateConfig,
db: SqliteVecQueueDatabase,
metrics: DataProcessorMetrics,
genai_manager=None,
) -> None:
self.config = config
self.db = db
self.metrics = metrics
self.requestor = InterProcessRequestor()
self.genai_manager = genai_manager
self.image_inference_speed = InferenceSpeed(self.metrics.image_embeddings_speed)
self.image_eps = EventsPerSecond()
@@ -104,7 +107,27 @@ class Embeddings:
},
)
if self.config.semantic_search.model == SemanticSearchModelEnum.jinav2:
model_cfg = self.config.semantic_search.model
is_genai_model = isinstance(model_cfg, str)
if is_genai_model:
embeddings_client = (
genai_manager.embeddings_client if genai_manager else None
)
if not embeddings_client:
raise ValueError(
f"semantic_search.model is '{model_cfg}' (GenAI provider) but "
"no embeddings client is configured. Ensure the GenAI provider "
"has 'embeddings' in its roles."
)
self.embedding = GenAIEmbedding(embeddings_client)
self.text_embedding = lambda input_data: self.embedding(
input_data, embedding_type="text"
)
self.vision_embedding = lambda input_data: self.embedding(
input_data, embedding_type="vision"
)
elif model_cfg == SemanticSearchModelEnum.jinav2:
# Single JinaV2Embedding instance for both text and vision
self.embedding = JinaV2Embedding(
model_size=self.config.semantic_search.model_size,
@@ -118,7 +141,8 @@ class Embeddings:
self.vision_embedding = lambda input_data: self.embedding(
input_data, embedding_type="vision"
)
else: # Default to jinav1
else:
# Default to jinav1
self.text_embedding = JinaV1TextEmbedding(
model_size=config.semantic_search.model_size,
requestor=self.requestor,
@@ -136,8 +160,11 @@ class Embeddings:
self.metrics.text_embeddings_eps.value = self.text_eps.eps()
def get_model_definitions(self):
# Version-specific models
if self.config.semantic_search.model == SemanticSearchModelEnum.jinav2:
model_cfg = self.config.semantic_search.model
if isinstance(model_cfg, str):
# GenAI provider: no ONNX models to download
models = []
elif model_cfg == SemanticSearchModelEnum.jinav2:
models = [
"jinaai/jina-clip-v2-tokenizer",
"jinaai/jina-clip-v2-model_fp16.onnx"
@@ -224,6 +251,14 @@ class Embeddings:
embeddings = self.vision_embedding(valid_thumbs)
if len(embeddings) != len(valid_ids):
logger.warning(
"Batch embed returned %d embeddings for %d thumbnails; skipping batch",
len(embeddings),
len(valid_ids),
)
return []
if upsert:
items = []
for i in range(len(valid_ids)):
@@ -246,9 +281,15 @@ class Embeddings:
def embed_description(
self, event_id: str, description: str, upsert: bool = True
) -> np.ndarray:
) -> np.ndarray | None:
start = datetime.datetime.now().timestamp()
embedding = self.text_embedding([description])[0]
embeddings = self.text_embedding([description])
if not embeddings:
logger.warning(
"Failed to generate description embedding for event %s", event_id
)
return None
embedding = embeddings[0]
if upsert:
self.db.execute_sql(
@@ -271,8 +312,32 @@ class Embeddings:
# upsert embeddings one by one to avoid token limit
embeddings = []
for desc in event_descriptions.values():
embeddings.append(self.text_embedding([desc])[0])
ids = []
for eid, desc in event_descriptions.items():
result = self.text_embedding([desc])
if not result:
logger.warning(
"Failed to generate description embedding for event %s", eid
)
continue
# Record the id alongside its embedding so both lists stay aligned
ids.append(eid)
embeddings.append(result[0])
if not embeddings:
logger.warning("No description embeddings generated in batch")
return np.array([])
if upsert:
@@ -314,7 +379,10 @@ class Embeddings:
batch_size = (
4
if self.config.semantic_search.model == SemanticSearchModelEnum.jinav2
if (
isinstance(self.config.semantic_search.model, str)
or self.config.semantic_search.model == SemanticSearchModelEnum.jinav2
)
else 32
)
current_page = 1
@@ -601,6 +669,8 @@ class Embeddings:
if trigger.type == "description":
logger.debug(f"Generating embedding for trigger description {trigger_name}")
embedding = self.embed_description(None, trigger.data, upsert=False)
if embedding is None:
return b""
return embedding.astype(np.float32).tobytes()
elif trigger.type == "thumbnail":
@@ -636,6 +706,8 @@ class Embeddings:
embedding = self.embed_thumbnail(
str(trigger.data), thumbnail, upsert=False
)
if embedding is None:
return b""
return embedding.astype(np.float32).tobytes()
else:
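
Review note: when some descriptions fail to embed, the ids and the embeddings have to stay index-aligned before they are upserted. The sketch below shows a single-pass way to do that. The function name `embed_descriptions` and the `embed` callable are assumptions for illustration; `embed` stands in for `self.text_embedding` and is expected to return an empty list on failure.

```python
from typing import Callable, Sequence


def embed_descriptions(
    descriptions: dict[str, str],
    embed: Callable[[list[str]], Sequence[Sequence[float]]],
) -> tuple[list[str], list[Sequence[float]]]:
    """Embed each description once, keeping only the ids that succeeded."""
    ids: list[str] = []
    vectors: list[Sequence[float]] = []
    for event_id, text in descriptions.items():
        result = embed([text])
        if not result:
            continue  # failed embedding: drop the id as well
        ids.append(event_id)
        vectors.append(result[0])
    return ids, vectors
```

Each description is embedded exactly once, which matters when the embedding call is a network request to a GenAI provider.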

@@ -0,0 +1,85 @@
"""GenAI-backed embeddings for semantic search."""
import io
import logging
from typing import TYPE_CHECKING
import numpy as np
from PIL import Image
if TYPE_CHECKING:
from frigate.genai import GenAIClient
logger = logging.getLogger(__name__)
EMBEDDING_DIM = 768
class GenAIEmbedding:
"""Embedding adapter that delegates to a GenAI provider's embed API.
Provides the same interface as JinaV2Embedding for semantic search:
__call__(inputs, embedding_type) -> list[np.ndarray]. Output embeddings are
truncated or zero-padded to 768 dimensions for Frigate's sqlite-vec schema.
"""
def __init__(self, client: "GenAIClient") -> None:
self.client = client
def __call__(
self,
inputs: list[str] | list[bytes] | list[Image.Image],
embedding_type: str = "text",
) -> list[np.ndarray]:
"""Generate embeddings for text or images.
Args:
inputs: List of strings (text) or bytes/PIL images (vision).
embedding_type: "text" or "vision".
Returns:
List of 768-dim numpy float32 arrays.
"""
if not inputs:
return []
if embedding_type == "text":
texts = [str(x) for x in inputs]
embeddings = self.client.embed(texts=texts)
elif embedding_type == "vision":
images: list[bytes] = []
for inp in inputs:
if isinstance(inp, bytes):
images.append(inp)
elif isinstance(inp, Image.Image):
buf = io.BytesIO()
inp.convert("RGB").save(buf, format="JPEG")
images.append(buf.getvalue())
else:
logger.warning(
"GenAIEmbedding: skipping unsupported vision input type %s",
type(inp).__name__,
)
if not images:
return []
embeddings = self.client.embed(images=images)
else:
raise ValueError(
f"Invalid embedding_type '{embedding_type}'. Must be 'text' or 'vision'."
)
result = []
for emb in embeddings:
arr = np.asarray(emb, dtype=np.float32).flatten()
if arr.size != EMBEDDING_DIM:
if arr.size > EMBEDDING_DIM:
arr = arr[:EMBEDDING_DIM]
else:
arr = np.pad(
arr,
(0, EMBEDDING_DIM - arr.size),
mode="constant",
constant_values=0,
)
result.append(arr)
return result
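
Review note: the size handling in `__call__` truncates longer vectors and zero-pads shorter ones. The sketch below shows the same rule on plain Python lists so it runs without NumPy. The names `fit_to_dim` and `dot` are assumptions for illustration only.

```python
def fit_to_dim(vector: list[float], dim: int = 768) -> list[float]:
    """Truncate or zero-pad a vector to exactly `dim` entries."""
    if len(vector) > dim:
        return vector[:dim]
    return vector + [0.0] * (dim - len(vector))


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product, used here to compare vectors before and after."""
    return sum(x * y for x, y in zip(a, b))


short_a = fit_to_dim([1.0, 2.0], dim=4)
short_b = fit_to_dim([3.0, 4.0], dim=4)
long_c = fit_to_dim([1.0, 2.0, 3.0], dim=2)
```

Zero-padding leaves dot products between padded vectors unchanged, while truncation discards components, so a provider that returns more than 768 dimensions loses some information under this rule.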

@@ -59,7 +59,7 @@ from frigate.data_processing.real_time.license_plate import (
from frigate.data_processing.types import DataProcessorMetrics, PostProcessDataEnum
from frigate.db.sqlitevecq import SqliteVecQueueDatabase
from frigate.events.types import EventTypeEnum, RegenerateDescriptionEnum
from frigate.genai import get_genai_client
from frigate.genai import GenAIClientManager
from frigate.models import Event, Recordings, ReviewSegment, Trigger
from frigate.util.builtin import serialize
from frigate.util.file import get_event_thumbnail_bytes
@@ -116,8 +116,10 @@ class EmbeddingMaintainer(threading.Thread):
models = [Event, Recordings, ReviewSegment, Trigger]
db.bind(models)
self.genai_manager = GenAIClientManager(config)
if config.semantic_search.enabled:
self.embeddings = Embeddings(config, db, metrics)
self.embeddings = Embeddings(config, db, metrics, self.genai_manager)
# Check if we need to re-index events
if config.semantic_search.reindex:
@@ -144,7 +146,6 @@ class EmbeddingMaintainer(threading.Thread):
self.frame_manager = SharedMemoryFrameManager()
self.detected_license_plates: dict[str, dict[str, Any]] = {}
self.genai_client = get_genai_client(config)
# model runners to share between realtime and post processors
if self.config.lpr.enabled:
@@ -203,12 +204,15 @@ class EmbeddingMaintainer(threading.Thread):
# post processors
self.post_processors: list[PostProcessorApi] = []
if self.genai_client is not None and any(
if self.genai_manager.vision_client is not None and any(
c.review.genai.enabled_in_config for c in self.config.cameras.values()
):
self.post_processors.append(
ReviewDescriptionProcessor(
self.config, self.requestor, self.metrics, self.genai_client
self.config,
self.requestor,
self.metrics,
self.genai_manager.vision_client,
)
)
@@ -246,7 +250,7 @@ class EmbeddingMaintainer(threading.Thread):
)
self.post_processors.append(semantic_trigger_processor)
if self.genai_client is not None and any(
if self.genai_manager.vision_client is not None and any(
c.objects.genai.enabled_in_config for c in self.config.cameras.values()
):
self.post_processors.append(
@@ -255,7 +259,7 @@ class EmbeddingMaintainer(threading.Thread):
self.embeddings,
self.requestor,
self.metrics,
self.genai_client,
self.genai_manager.vision_client,
semantic_trigger_processor,
)
)

@@ -7,15 +7,27 @@ import os
import re
from typing import Any, Optional
import numpy as np
from playhouse.shortcuts import model_to_dict
from frigate.config import CameraConfig, FrigateConfig, GenAIConfig, GenAIProviderEnum
from frigate.const import CLIPS_DIR
from frigate.data_processing.post.types import ReviewMetadata
from frigate.genai.manager import GenAIClientManager
from frigate.models import Event
logger = logging.getLogger(__name__)
__all__ = [
"GenAIClient",
"GenAIClientManager",
"GenAIConfig",
"GenAIProviderEnum",
"PROVIDERS",
"load_providers",
"register_genai_provider",
]
PROVIDERS = {}
@@ -293,6 +305,25 @@ Guidelines:
"""Get the context window size for this provider in tokens."""
return 4096
def embed(
self,
texts: list[str] | None = None,
images: list[bytes] | None = None,
) -> list[np.ndarray]:
"""Generate embeddings for text and/or images.
Returns list of numpy arrays (one per input). Expected dimension is 768
for Frigate semantic search compatibility.
Providers that support embeddings should override this method.
"""
logger.warning(
"%s does not support embeddings. "
"This method should be overridden by the provider implementation.",
self.__class__.__name__,
)
return []
def chat_with_tools(
self,
messages: list[dict[str, Any]],
@@ -352,19 +383,6 @@ Guidelines:
}
def get_genai_client(config: FrigateConfig) -> Optional[GenAIClient]:
"""Get the GenAI client."""
if not config.genai.provider:
return None
load_providers()
provider = PROVIDERS.get(config.genai.provider)
if provider:
return provider(config.genai)
return None
def load_providers():
package_dir = os.path.dirname(__file__)
for filename in os.listdir(package_dir):

@@ -1,18 +1,37 @@
"""llama.cpp Provider for Frigate AI."""
import base64
import io
import json
import logging
from typing import Any, Optional
import httpx
import numpy as np
import requests
from PIL import Image
from frigate.config import GenAIProviderEnum
from frigate.genai import GenAIClient, register_genai_provider
from frigate.genai.utils import parse_tool_calls_from_message
logger = logging.getLogger(__name__)
def _to_jpeg(img_bytes: bytes) -> bytes | None:
"""Convert image bytes to JPEG. llama.cpp/STB does not support WebP."""
try:
img = Image.open(io.BytesIO(img_bytes))
if img.mode != "RGB":
img = img.convert("RGB")
buf = io.BytesIO()
img.save(buf, format="JPEG", quality=85)
return buf.getvalue()
except Exception as e:
logger.warning("Failed to convert image to JPEG: %s", e)
return None
@register_genai_provider(GenAIProviderEnum.llamacpp)
class LlamaCppClient(GenAIClient):
"""Generative AI client for Frigate using llama.cpp server."""
@@ -67,6 +86,7 @@ class LlamaCppClient(GenAIClient):
# Build request payload with llama.cpp native options
payload = {
"model": self.genai_config.model,
"messages": [
{
"role": "user",
@@ -99,7 +119,179 @@ class LlamaCppClient(GenAIClient):
def get_context_size(self) -> int:
"""Get the context window size for llama.cpp."""
return self.genai_config.provider_options.get("context_size", 4096)
return self.provider_options.get("context_size", 4096)
def _build_payload(
self,
messages: list[dict[str, Any]],
tools: Optional[list[dict[str, Any]]],
tool_choice: Optional[str],
stream: bool = False,
) -> dict[str, Any]:
"""Build request payload for chat completions (sync or stream)."""
openai_tool_choice = None
if tool_choice:
if tool_choice == "none":
openai_tool_choice = "none"
elif tool_choice == "auto":
openai_tool_choice = "auto"
elif tool_choice == "required":
openai_tool_choice = "required"
payload: dict[str, Any] = {
"messages": messages,
"model": self.genai_config.model,
}
if stream:
payload["stream"] = True
if tools:
payload["tools"] = tools
if openai_tool_choice is not None:
payload["tool_choice"] = openai_tool_choice
provider_opts = {
k: v for k, v in self.provider_options.items() if k != "context_size"
}
payload.update(provider_opts)
return payload
def _message_from_choice(self, choice: dict[str, Any]) -> dict[str, Any]:
"""Parse OpenAI-style choice into {content, tool_calls, finish_reason}."""
message = choice.get("message", {})
content = message.get("content")
content = content.strip() if content else None
tool_calls = parse_tool_calls_from_message(message)
finish_reason = choice.get("finish_reason") or (
"tool_calls" if tool_calls else "stop" if content else "error"
)
return {
"content": content,
"tool_calls": tool_calls,
"finish_reason": finish_reason,
}
@staticmethod
def _streamed_tool_calls_to_list(
tool_calls_by_index: dict[int, dict[str, Any]],
) -> Optional[list[dict[str, Any]]]:
"""Convert streamed tool_calls index map to list of {id, name, arguments}."""
if not tool_calls_by_index:
return None
result = []
for idx in sorted(tool_calls_by_index.keys()):
t = tool_calls_by_index[idx]
args_str = t.get("arguments") or "{}"
try:
arguments = json.loads(args_str)
except json.JSONDecodeError:
arguments = {}
result.append(
{
"id": t.get("id", ""),
"name": t.get("name", ""),
"arguments": arguments,
}
)
return result if result else None
def embed(
self,
texts: list[str] | None = None,
images: list[bytes] | None = None,
) -> list[np.ndarray]:
"""Generate embeddings via llama.cpp /embeddings endpoint.
Supports batch requests. Uses content format with prompt_string and
multimodal_data for images (PR #15108). Server must be started with
--embeddings and --mmproj for multimodal support.
"""
if self.provider is None:
logger.warning(
"llama.cpp provider has not been initialized. Check your llama.cpp configuration."
)
return []
texts = texts or []
images = images or []
if not texts and not images:
return []
EMBEDDING_DIM = 768
content = []
for text in texts:
content.append({"prompt_string": text})
for img in images:
# llama.cpp uses STB which does not support WebP; convert to JPEG
jpeg_bytes = _to_jpeg(img)
to_encode = jpeg_bytes if jpeg_bytes is not None else img
encoded = base64.b64encode(to_encode).decode("utf-8")
# prompt_string must contain <__media__> placeholder for image tokenization
content.append(
{
"prompt_string": "<__media__>\n",
"multimodal_data": [encoded],
}
)
try:
response = requests.post(
f"{self.provider}/embeddings",
json={"model": self.genai_config.model, "content": content},
timeout=self.timeout,
)
response.raise_for_status()
result = response.json()
items = result.get("data", result) if isinstance(result, dict) else result
if not isinstance(items, list):
logger.warning("llama.cpp embeddings returned unexpected format")
return []
embeddings = []
for item in items:
emb = item.get("embedding") if isinstance(item, dict) else None
if emb is None:
logger.warning("llama.cpp embeddings item missing embedding field")
continue
arr = np.array(emb, dtype=np.float32)
orig_dim = arr.size
if orig_dim != EMBEDDING_DIM:
if orig_dim > EMBEDDING_DIM:
arr = arr[:EMBEDDING_DIM]
logger.debug(
"Truncated llama.cpp embedding from %d to %d dimensions",
orig_dim,
EMBEDDING_DIM,
)
else:
arr = np.pad(
arr,
(0, EMBEDDING_DIM - orig_dim),
mode="constant",
constant_values=0,
)
logger.debug(
"Padded llama.cpp embedding from %d to %d dimensions",
orig_dim,
EMBEDDING_DIM,
)
embeddings.append(arr)
return embeddings
except requests.exceptions.Timeout:
logger.warning("llama.cpp embeddings request timed out")
return []
except requests.exceptions.RequestException as e:
error_detail = str(e)
if hasattr(e, "response") and e.response is not None:
try:
error_detail = f"{str(e)} - Response: {e.response.text[:500]}"
except Exception:
pass
logger.warning("llama.cpp embeddings error: %s", error_detail)
return []
except Exception as e:
logger.warning("Unexpected error in llama.cpp embeddings: %s", str(e))
return []
def chat_with_tools(
self,
@@ -122,31 +314,8 @@ class LlamaCppClient(GenAIClient):
"tool_calls": None,
"finish_reason": "error",
}
try:
openai_tool_choice = None
if tool_choice:
if tool_choice == "none":
openai_tool_choice = "none"
elif tool_choice == "auto":
openai_tool_choice = "auto"
elif tool_choice == "required":
openai_tool_choice = "required"
payload = {
"messages": messages,
}
if tools:
payload["tools"] = tools
if openai_tool_choice is not None:
payload["tool_choice"] = openai_tool_choice
provider_opts = {
k: v for k, v in self.provider_options.items() if k != "context_size"
}
payload.update(provider_opts)
payload = self._build_payload(messages, tools, tool_choice, stream=False)
response = requests.post(
f"{self.provider}/v1/chat/completions",
json=payload,
@@ -154,60 +323,13 @@ class LlamaCppClient(GenAIClient):
)
response.raise_for_status()
result = response.json()
if result is None or "choices" not in result or len(result["choices"]) == 0:
return {
"content": None,
"tool_calls": None,
"finish_reason": "error",
}
choice = result["choices"][0]
message = choice.get("message", {})
content = message.get("content")
if content:
content = content.strip()
else:
content = None
tool_calls = None
if "tool_calls" in message and message["tool_calls"]:
tool_calls = []
for tool_call in message["tool_calls"]:
try:
function_data = tool_call.get("function", {})
arguments_str = function_data.get("arguments", "{}")
arguments = json.loads(arguments_str)
except (json.JSONDecodeError, KeyError, TypeError) as e:
logger.warning(
f"Failed to parse tool call arguments: {e}, "
f"tool: {function_data.get('name', 'unknown')}"
)
arguments = {}
tool_calls.append(
{
"id": tool_call.get("id", ""),
"name": function_data.get("name", ""),
"arguments": arguments,
}
)
finish_reason = "error"
if "finish_reason" in choice and choice["finish_reason"]:
finish_reason = choice["finish_reason"]
elif tool_calls:
finish_reason = "tool_calls"
elif content:
finish_reason = "stop"
return {
"content": content,
"tool_calls": tool_calls,
"finish_reason": finish_reason,
}
return self._message_from_choice(result["choices"][0])
except requests.exceptions.Timeout as e:
logger.warning("llama.cpp request timed out: %s", str(e))
return {
@@ -219,8 +341,7 @@ class LlamaCppClient(GenAIClient):
error_detail = str(e)
if hasattr(e, "response") and e.response is not None:
try:
error_body = e.response.text
error_detail = f"{str(e)} - Response: {error_body[:500]}"
error_detail = f"{str(e)} - Response: {e.response.text[:500]}"
except Exception:
pass
logger.warning("llama.cpp returned an error: %s", error_detail)
@@ -236,3 +357,111 @@ class LlamaCppClient(GenAIClient):
"tool_calls": None,
"finish_reason": "error",
}
async def chat_with_tools_stream(
self,
messages: list[dict[str, Any]],
tools: Optional[list[dict[str, Any]]] = None,
tool_choice: Optional[str] = "auto",
):
"""Stream chat with tools via OpenAI-compatible streaming API."""
if self.provider is None:
logger.warning(
"llama.cpp provider has not been initialized. Check your llama.cpp configuration."
)
yield (
"message",
{
"content": None,
"tool_calls": None,
"finish_reason": "error",
},
)
return
try:
payload = self._build_payload(messages, tools, tool_choice, stream=True)
content_parts: list[str] = []
tool_calls_by_index: dict[int, dict[str, Any]] = {}
finish_reason = "stop"
async with httpx.AsyncClient(timeout=float(self.timeout)) as client:
async with client.stream(
"POST",
f"{self.provider}/v1/chat/completions",
json=payload,
) as response:
response.raise_for_status()
async for line in response.aiter_lines():
if not line.startswith("data: "):
continue
data_str = line[6:].strip()
if data_str == "[DONE]":
break
try:
data = json.loads(data_str)
except json.JSONDecodeError:
continue
choices = data.get("choices") or []
if not choices:
continue
delta = choices[0].get("delta", {})
if choices[0].get("finish_reason"):
finish_reason = choices[0]["finish_reason"]
if delta.get("content"):
content_parts.append(delta["content"])
yield ("content_delta", delta["content"])
for tc in delta.get("tool_calls") or []:
idx = tc.get("index", 0)
fn = tc.get("function") or {}
if idx not in tool_calls_by_index:
tool_calls_by_index[idx] = {
"id": tc.get("id", ""),
"name": tc.get("name") or fn.get("name", ""),
"arguments": "",
}
t = tool_calls_by_index[idx]
if tc.get("id"):
t["id"] = tc["id"]
name = tc.get("name") or fn.get("name")
if name:
t["name"] = name
arg = tc.get("arguments") or fn.get("arguments")
if arg is not None:
t["arguments"] += (
arg if isinstance(arg, str) else json.dumps(arg)
)
full_content = "".join(content_parts).strip() or None
tool_calls_list = self._streamed_tool_calls_to_list(tool_calls_by_index)
if tool_calls_list:
finish_reason = "tool_calls"
yield (
"message",
{
"content": full_content,
"tool_calls": tool_calls_list,
"finish_reason": finish_reason,
},
)
except httpx.HTTPStatusError as e:
logger.warning("llama.cpp streaming HTTP error: %s", e)
yield (
"message",
{
"content": None,
"tool_calls": None,
"finish_reason": "error",
},
)
except Exception as e:
logger.warning(
"Unexpected error in llama.cpp chat_with_tools_stream: %s", str(e)
)
yield (
"message",
{
"content": None,
"tool_calls": None,
"finish_reason": "error",
},
)
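Review note: the streaming path above receives each tool call in fragments keyed by `index`, so the argument text has to be concatenated before it can be decoded. The sketch below restates that accumulation as a pure function over already-received SSE lines. The name `accumulate_stream` is hypothetical, and it simplifies the diff in two ways: it reads the name and arguments only from the nested `function` object, and it ignores non-string argument fragments. How `_streamed_tool_calls_to_list` decodes the final arguments is not shown in this diff, so the `json.loads` step at the end is an assumption.

```python
import json
from typing import Any, Iterable


def accumulate_stream(lines: Iterable[str]) -> dict[str, Any]:
    """Fold OpenAI-style SSE lines into {content, tool_calls, finish_reason}."""
    content_parts: list[str] = []
    calls: dict[int, dict[str, Any]] = {}
    finish_reason = "stop"

    for line in lines:
        if not line.startswith("data: "):
            continue  # skip SSE comments, blank lines, and other fields
        data_str = line[6:].strip()
        if data_str == "[DONE]":
            break
        try:
            data = json.loads(data_str)
        except json.JSONDecodeError:
            continue  # tolerate a malformed chunk instead of aborting
        choices = data.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        if choices[0].get("finish_reason"):
            finish_reason = choices[0]["finish_reason"]
        if delta.get("content"):
            content_parts.append(delta["content"])
        for tc in delta.get("tool_calls") or []:
            fn = tc.get("function") or {}
            slot = calls.setdefault(
                tc.get("index", 0), {"id": "", "name": "", "arguments": ""}
            )
            if tc.get("id"):
                slot["id"] = tc["id"]
            if fn.get("name"):
                slot["name"] = fn["name"]
            if isinstance(fn.get("arguments"), str):
                slot["arguments"] += fn["arguments"]  # fragments arrive in order

    # Assumption: arguments are decoded once, after the stream has ended.
    tool_calls = []
    for idx in sorted(calls):
        slot = calls[idx]
        try:
            arguments = json.loads(slot["arguments"] or "{}")
        except json.JSONDecodeError:
            arguments = {}
        tool_calls.append(
            {"id": slot["id"], "name": slot["name"], "arguments": arguments}
        )

    return {
        "content": "".join(content_parts).strip() or None,
        "tool_calls": tool_calls or None,
        "finish_reason": "tool_calls" if tool_calls else finish_reason,
    }
```

Keeping the raw argument string until the end matters because a single fragment such as `{"label":` is not valid JSON on its own.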

89
frigate/genai/manager.py Normal file

@@ -0,0 +1,89 @@
"""GenAI client manager for Frigate.
Manages GenAI provider clients from Frigate config. Configuration is read only
in _update_config(); no other code should read config.genai. Exposes clients
by role: tool_client, vision_client, embeddings_client.
"""
import logging
from typing import TYPE_CHECKING, Optional
from frigate.config import FrigateConfig
from frigate.config.camera.genai import GenAIRoleEnum
if TYPE_CHECKING:
from frigate.genai import GenAIClient
logger = logging.getLogger(__name__)
class GenAIClientManager:
"""Manages GenAI provider clients from Frigate config."""
def __init__(self, config: FrigateConfig) -> None:
self._config = config
self._tool_client: Optional[GenAIClient] = None
self._vision_client: Optional[GenAIClient] = None
self._embeddings_client: Optional[GenAIClient] = None
self._update_config()
def _update_config(self) -> None:
"""Build role clients from current Frigate config.genai.
Called from __init__ and can be called again when config is reloaded.
Each role (tools, vision, embeddings) gets the client for the provider
that has that role in its roles list.
"""
from frigate.genai import PROVIDERS, load_providers
self._tool_client = None
self._vision_client = None
self._embeddings_client = None
if not self._config.genai:
return
load_providers()
for _name, genai_cfg in self._config.genai.items():
if not genai_cfg.provider:
continue
provider_cls = PROVIDERS.get(genai_cfg.provider)
if not provider_cls:
logger.warning(
"Unknown GenAI provider %s in config, skipping.",
genai_cfg.provider,
)
continue
try:
client = provider_cls(genai_cfg)
except Exception as e:
logger.exception(
"Failed to create GenAI client for provider %s: %s",
genai_cfg.provider,
e,
)
continue
for role in genai_cfg.roles:
if role == GenAIRoleEnum.tools:
self._tool_client = client
elif role == GenAIRoleEnum.vision:
self._vision_client = client
elif role == GenAIRoleEnum.embeddings:
self._embeddings_client = client
@property
def tool_client(self) -> "Optional[GenAIClient]":
"""Client configured for the tools role (e.g. chat with function calling)."""
return self._tool_client
@property
def vision_client(self) -> "Optional[GenAIClient]":
"""Client configured for the vision role (e.g. review descriptions, object descriptions)."""
return self._vision_client
@property
def embeddings_client(self) -> "Optional[GenAIClient]":
"""Client configured for the embeddings role."""
return self._embeddings_client
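Review note: `_update_config()` walks the configured providers in order and assigns each listed role, so when two providers claim the same role the later entry replaces the earlier one. A minimal sketch of that selection rule follows. `Role`, `ProviderConfig`, and `assign_roles` are hypothetical stand-ins for `GenAIRoleEnum`, one `config.genai` entry, and the loop above; it tracks entry names rather than constructing real clients.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Role(str, Enum):  # hypothetical stand-in for GenAIRoleEnum
    tools = "tools"
    vision = "vision"
    embeddings = "embeddings"


@dataclass
class ProviderConfig:  # hypothetical stand-in for one config.genai entry
    provider: str
    roles: list[Role] = field(default_factory=list)


def assign_roles(genai: dict[str, ProviderConfig]) -> dict[Role, Optional[str]]:
    """Return, per role, the name of the config entry that ends up serving it."""
    assigned: dict[Role, Optional[str]] = {role: None for role in Role}
    for name, cfg in genai.items():
        for role in cfg.roles:
            assigned[role] = name  # a later entry overwrites an earlier one
    return assigned


example = {
    "local": ProviderConfig("llamacpp", [Role.tools, Role.embeddings]),
    "cloud": ProviderConfig("ollama", [Role.vision, Role.tools]),
}
result = assign_roles(example)
```

In this example `tools` ends up with `cloud` because it appears later in the mapping; a role that no entry lists stays `None`, matching the `Optional` properties above.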

View File

@@ -1,15 +1,16 @@
"""Ollama Provider for Frigate AI."""
import json
import logging
from typing import Any, Optional
from httpx import RemoteProtocolError, TimeoutException
from ollama import AsyncClient as OllamaAsyncClient
from ollama import Client as ApiClient
from ollama import ResponseError
from frigate.config import GenAIProviderEnum
from frigate.genai import GenAIClient, register_genai_provider
from frigate.genai.utils import parse_tool_calls_from_message
logger = logging.getLogger(__name__)
@@ -88,6 +89,73 @@ class OllamaClient(GenAIClient):
"num_ctx", 4096
)
def _build_request_params(
self,
messages: list[dict[str, Any]],
tools: Optional[list[dict[str, Any]]],
tool_choice: Optional[str],
stream: bool = False,
) -> dict[str, Any]:
"""Build request_messages and params for chat (sync or stream)."""
request_messages = []
for msg in messages:
msg_dict = {
"role": msg.get("role"),
"content": msg.get("content", ""),
}
if msg.get("tool_call_id"):
msg_dict["tool_call_id"] = msg["tool_call_id"]
if msg.get("name"):
msg_dict["name"] = msg["name"]
if msg.get("tool_calls"):
msg_dict["tool_calls"] = msg["tool_calls"]
request_messages.append(msg_dict)
request_params: dict[str, Any] = {
"model": self.genai_config.model,
"messages": request_messages,
**self.provider_options,
}
if stream:
request_params["stream"] = True
if tools:
request_params["tools"] = tools
if tool_choice:
request_params["tool_choice"] = (
"none"
if tool_choice == "none"
else "required"
if tool_choice == "required"
else "auto"
)
return request_params
def _message_from_response(self, response: dict[str, Any]) -> dict[str, Any]:
"""Parse Ollama chat response into {content, tool_calls, finish_reason}."""
if not response or "message" not in response:
return {
"content": None,
"tool_calls": None,
"finish_reason": "error",
}
message = response["message"]
content = message.get("content", "").strip() if message.get("content") else None
tool_calls = parse_tool_calls_from_message(message)
finish_reason = "error"
if response.get("done"):
finish_reason = (
"tool_calls" if tool_calls else "stop" if content else "error"
)
elif tool_calls:
finish_reason = "tool_calls"
elif content:
finish_reason = "stop"
return {
"content": content,
"tool_calls": tool_calls,
"finish_reason": finish_reason,
}
def chat_with_tools(
self,
messages: list[dict[str, Any]],
@@ -103,93 +171,12 @@ class OllamaClient(GenAIClient):
"tool_calls": None,
"finish_reason": "error",
}
try:
request_messages = []
for msg in messages:
msg_dict = {
"role": msg.get("role"),
"content": msg.get("content", ""),
}
if msg.get("tool_call_id"):
msg_dict["tool_call_id"] = msg["tool_call_id"]
if msg.get("name"):
msg_dict["name"] = msg["name"]
if msg.get("tool_calls"):
msg_dict["tool_calls"] = msg["tool_calls"]
request_messages.append(msg_dict)
request_params = {
"model": self.genai_config.model,
"messages": request_messages,
}
if tools:
request_params["tools"] = tools
if tool_choice:
if tool_choice == "none":
request_params["tool_choice"] = "none"
elif tool_choice == "required":
request_params["tool_choice"] = "required"
elif tool_choice == "auto":
request_params["tool_choice"] = "auto"
request_params.update(self.provider_options)
response = self.provider.chat(**request_params)
if not response or "message" not in response:
return {
"content": None,
"tool_calls": None,
"finish_reason": "error",
}
message = response["message"]
content = (
message.get("content", "").strip() if message.get("content") else None
request_params = self._build_request_params(
messages, tools, tool_choice, stream=False
)
tool_calls = None
if "tool_calls" in message and message["tool_calls"]:
tool_calls = []
for tool_call in message["tool_calls"]:
try:
function_data = tool_call.get("function", {})
arguments_str = function_data.get("arguments", "{}")
arguments = json.loads(arguments_str)
except (json.JSONDecodeError, KeyError, TypeError) as e:
logger.warning(
f"Failed to parse tool call arguments: {e}, "
f"tool: {function_data.get('name', 'unknown')}"
)
arguments = {}
tool_calls.append(
{
"id": tool_call.get("id", ""),
"name": function_data.get("name", ""),
"arguments": arguments,
}
)
finish_reason = "error"
if "done" in response and response["done"]:
if tool_calls:
finish_reason = "tool_calls"
elif content:
finish_reason = "stop"
elif tool_calls:
finish_reason = "tool_calls"
elif content:
finish_reason = "stop"
return {
"content": content,
"tool_calls": tool_calls,
"finish_reason": finish_reason,
}
response = self.provider.chat(**request_params)
return self._message_from_response(response)
except (TimeoutException, ResponseError, ConnectionError) as e:
logger.warning("Ollama returned an error: %s", str(e))
return {
@@ -204,3 +191,89 @@ class OllamaClient(GenAIClient):
"tool_calls": None,
"finish_reason": "error",
}
async def chat_with_tools_stream(
self,
messages: list[dict[str, Any]],
tools: Optional[list[dict[str, Any]]] = None,
tool_choice: Optional[str] = "auto",
):
"""Stream chat with tools; yields content deltas then final message."""
if self.provider is None:
logger.warning(
"Ollama provider has not been initialized. Check your Ollama configuration."
)
yield (
"message",
{
"content": None,
"tool_calls": None,
"finish_reason": "error",
},
)
return
try:
request_params = self._build_request_params(
messages, tools, tool_choice, stream=True
)
async_client = OllamaAsyncClient(
host=self.genai_config.base_url,
timeout=self.timeout,
)
content_parts: list[str] = []
final_message: dict[str, Any] | None = None
try:
stream = await async_client.chat(**request_params)
async for chunk in stream:
if not chunk or "message" not in chunk:
continue
msg = chunk.get("message", {})
delta = msg.get("content") or ""
if delta:
content_parts.append(delta)
yield ("content_delta", delta)
if chunk.get("done"):
full_content = "".join(content_parts).strip() or None
tool_calls = parse_tool_calls_from_message(msg)
final_message = {
"content": full_content,
"tool_calls": tool_calls,
"finish_reason": "tool_calls" if tool_calls else "stop",
}
break
finally:
await async_client.close()
if final_message is not None:
yield ("message", final_message)
else:
yield (
"message",
{
"content": "".join(content_parts).strip() or None,
"tool_calls": None,
"finish_reason": "stop",
},
)
except (TimeoutException, ResponseError, ConnectionError) as e:
logger.warning("Ollama streaming error: %s", str(e))
yield (
"message",
{
"content": None,
"tool_calls": None,
"finish_reason": "error",
},
)
except Exception as e:
logger.warning(
"Unexpected error in Ollama chat_with_tools_stream: %s", str(e)
)
yield (
"message",
{
"content": None,
"tool_calls": None,
"finish_reason": "error",
},
)

70
frigate/genai/utils.py Normal file

@@ -0,0 +1,70 @@
"""Shared helpers for GenAI providers and chat (OpenAI-style messages, tool call parsing)."""
import json
import logging
from typing import Any, List, Optional
logger = logging.getLogger(__name__)
def parse_tool_calls_from_message(
message: dict[str, Any],
) -> Optional[list[dict[str, Any]]]:
"""
Parse tool_calls from an OpenAI-style message dict.
Message may have "tool_calls" as a list of:
{"id": str, "function": {"name": str, "arguments": str}, ...}
Returns a list of {"id", "name", "arguments"} with arguments parsed as dict,
or None if no tool_calls. Used by Ollama and LlamaCpp (non-stream) responses.
"""
raw = message.get("tool_calls")
if not raw or not isinstance(raw, list):
return None
result = []
for tool_call in raw:
function_data = tool_call.get("function") or {}
try:
arguments_str = function_data.get("arguments") or "{}"
arguments = json.loads(arguments_str)
except (json.JSONDecodeError, KeyError, TypeError) as e:
logger.warning(
"Failed to parse tool call arguments: %s, tool: %s",
e,
function_data.get("name", "unknown"),
)
arguments = {}
result.append(
{
"id": tool_call.get("id", ""),
"name": function_data.get("name", ""),
"arguments": arguments,
}
)
return result if result else None
def build_assistant_message_for_conversation(
content: Any,
tool_calls_raw: Optional[List[dict[str, Any]]],
) -> dict[str, Any]:
"""
Build the assistant message dict in OpenAI format for appending to a conversation.
tool_calls_raw: list of {"id", "name", "arguments"} (arguments as dict), or None.
"""
msg: dict[str, Any] = {"role": "assistant", "content": content}
if tool_calls_raw:
msg["tool_calls"] = [
{
"id": tc["id"],
"type": "function",
"function": {
"name": tc["name"],
"arguments": json.dumps(tc.get("arguments") or {}),
},
}
for tc in tool_calls_raw
]
return msg
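Review note: the two helpers above are inverses for the common case, which a short round trip makes easy to check. The block below uses condensed restatements under the hypothetical names `parse_tool_calls` and `build_assistant_message` so it runs without Frigate installed; logging is dropped and the behavior is otherwise intended to match the diff. The `search_objects` call and its arguments are sample data.

```python
import json
from typing import Any, Optional


def parse_tool_calls(message: dict[str, Any]) -> Optional[list[dict[str, Any]]]:
    """Condensed restatement of parse_tool_calls_from_message (no logging)."""
    raw = message.get("tool_calls")
    if not raw or not isinstance(raw, list):
        return None
    parsed = []
    for call in raw:
        fn = call.get("function") or {}
        try:
            arguments = json.loads(fn.get("arguments") or "{}")
        except (json.JSONDecodeError, TypeError):
            arguments = {}  # undecodable or non-string arguments fall back to {}
        parsed.append(
            {"id": call.get("id", ""), "name": fn.get("name", ""), "arguments": arguments}
        )
    return parsed or None


def build_assistant_message(
    content: Any, tool_calls: Optional[list[dict[str, Any]]]
) -> dict[str, Any]:
    """Condensed restatement of build_assistant_message_for_conversation."""
    msg: dict[str, Any] = {"role": "assistant", "content": content}
    if tool_calls:
        msg["tool_calls"] = [
            {
                "id": tc["id"],
                "type": "function",
                "function": {
                    "name": tc["name"],
                    "arguments": json.dumps(tc.get("arguments") or {}),
                },
            }
            for tc in tool_calls
        ]
    return msg


wire = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "search_objects", "arguments": '{"label": "person"}'},
        }
    ],
}
parsed = parse_tool_calls(wire)
rebuilt = build_assistant_message(wire["content"], parsed)
```

One behavior worth noting: if a provider hands back `arguments` already decoded as a dict, `json.loads` raises `TypeError` and the parsed arguments become `{}`. Whether any supported provider does that is not established by this diff.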

View File

@@ -438,6 +438,13 @@ def migrate_018_0(config: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]
"""Handle migrating frigate config to 0.18-0"""
new_config = config.copy()
# Migrate GenAI to new format
genai = new_config.get("genai")
if genai and genai.get("provider"):
genai["roles"] = ["embeddings", "vision", "tools"]
new_config["genai"] = {"default": genai}
# Remove deprecated sync_recordings from global record config
if new_config.get("record", {}).get("sync_recordings") is not None:
del new_config["record"]["sync_recordings"]
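Review note: the GenAI part of this migration wraps the old single-provider block under a `default` key and grants it all three roles. The sketch below isolates that step as a hypothetical `migrate_genai` function with made-up provider and model values. Because `config.copy()` is shallow, the nested `genai` dict is shared with the input, so the input also gains the `roles` key; whether that matters depends on how the caller uses the original afterwards.

```python
from typing import Any


def migrate_genai(config: dict[str, Any]) -> dict[str, Any]:
    """Isolated restatement of the GenAI step in migrate_018_0."""
    new_config = config.copy()  # shallow: nested dicts are shared with `config`
    genai = new_config.get("genai")
    if genai and genai.get("provider"):
        genai["roles"] = ["embeddings", "vision", "tools"]
        new_config["genai"] = {"default": genai}
    return new_config


# Hypothetical pre-0.18 config; the provider and model values are illustrative.
old = {"genai": {"provider": "ollama", "model": "llava"}}
new = migrate_genai(old)
again = migrate_genai(new)  # already-migrated input has no top-level "provider"
```

Running the step on an already migrated config leaves it unchanged, since the wrapped form has no top-level `provider` key.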

1458
web/package-lock.json generated

File diff suppressed because it is too large

View File

@@ -71,6 +71,8 @@
"react-icons": "^5.5.0",
"react-konva": "^18.2.10",
"react-router-dom": "^6.30.3",
"react-markdown": "^9.0.1",
"remark-gfm": "^4.0.0",
"react-swipeable": "^7.0.2",
"react-tracked": "^2.0.1",
"react-transition-group": "^4.4.5",

View File

@@ -127,6 +127,7 @@
"cancel": "Cancel",
"close": "Close",
"copy": "Copy",
"copiedToClipboard": "Copied to clipboard",
"back": "Back",
"history": "History",
"fullscreen": "Fullscreen",
@@ -245,6 +246,7 @@
"uiPlayground": "UI Playground",
"faceLibrary": "Face Library",
"classification": "Classification",
"chat": "Chat",
"user": {
"title": "User",
"account": "Account",

View File

@@ -0,0 +1,24 @@
{
"title": "Frigate Chat",
"subtitle": "Your AI assistant for camera management and insights",
"placeholder": "Ask anything...",
"error": "Something went wrong. Please try again.",
"processing": "Processing...",
"toolsUsed": "Used: {{tools}}",
"showTools": "Show tools ({{count}})",
"hideTools": "Hide tools",
"call": "Call",
"result": "Result",
"arguments": "Arguments:",
"response": "Response:",
"send": "Send",
"suggested_requests": "Try asking:",
"starting_requests": {
"show_recent_events": "Show recent events",
"show_camera_status": "Show camera status"
},
"starting_requests_prompts": {
"show_recent_events": "Show me the recent events from the last hour",
"show_camera_status": "What is the current status of my cameras?"
}
}

View File

@@ -27,6 +27,7 @@ const Settings = lazy(() => import("@/pages/Settings"));
const UIPlayground = lazy(() => import("@/pages/UIPlayground"));
const FaceLibrary = lazy(() => import("@/pages/FaceLibrary"));
const Classification = lazy(() => import("@/pages/ClassificationModel"));
const Chat = lazy(() => import("@/pages/Chat"));
const Logs = lazy(() => import("@/pages/Logs"));
const AccessDenied = lazy(() => import("@/pages/AccessDenied"));
@@ -106,6 +107,7 @@ function DefaultAppView() {
<Route path="/logs" element={<Logs />} />
<Route path="/faces" element={<FaceLibrary />} />
<Route path="/classification" element={<Classification />} />
<Route path="/chat" element={<Chat />} />
<Route path="/playground" element={<UIPlayground />} />
</Route>
<Route path="/unauthorized" element={<AccessDenied />} />

View File

@@ -0,0 +1,42 @@
import { useApiHost } from "@/api";
type ChatEventThumbnailsRowProps = {
events: { id: string }[];
};
/**
* Horizontal scroll row of event thumbnail images for chat (e.g. after search_objects).
* Renders nothing when events is empty.
*/
export function ChatEventThumbnailsRow({
events,
}: ChatEventThumbnailsRowProps) {
const apiHost = useApiHost();
if (events.length === 0) return null;
return (
<div className="flex min-w-0 max-w-full flex-col gap-1 self-start">
<div className="scrollbar-container min-w-0 overflow-x-auto">
<div className="flex w-max gap-2">
{events.map((event) => (
<a
key={event.id}
href={`/explore?event_id=${event.id}`}
target="_blank"
rel="noopener noreferrer"
className="relative aspect-square size-32 shrink-0 overflow-hidden rounded-lg"
>
<img
className="size-full object-cover"
src={`${apiHost}api/events/${event.id}/thumbnail.webp`}
alt=""
loading="lazy"
/>
</a>
))}
</div>
</div>
</div>
);
}

View File

@@ -0,0 +1,208 @@
import { useState, useEffect, useRef } from "react";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";
import { useTranslation } from "react-i18next";
import copy from "copy-to-clipboard";
import { toast } from "sonner";
import { FaCopy, FaPencilAlt } from "react-icons/fa";
import { FaArrowUpLong } from "react-icons/fa6";
import { Button } from "@/components/ui/button";
import { Textarea } from "@/components/ui/textarea";
import {
Tooltip,
TooltipContent,
TooltipTrigger,
} from "@/components/ui/tooltip";
import { cn } from "@/lib/utils";
type MessageBubbleProps = {
role: "user" | "assistant";
content: string;
messageIndex?: number;
onEditSubmit?: (messageIndex: number, newContent: string) => void;
isComplete?: boolean;
};
export function MessageBubble({
role,
content,
messageIndex = 0,
onEditSubmit,
isComplete = true,
}: MessageBubbleProps) {
const { t } = useTranslation(["views/chat", "common"]);
const isUser = role === "user";
const [isEditing, setIsEditing] = useState(false);
const [draftContent, setDraftContent] = useState(content);
const editInputRef = useRef<HTMLTextAreaElement>(null);
useEffect(() => {
setDraftContent(content);
}, [content]);
useEffect(() => {
if (isEditing) {
editInputRef.current?.focus();
editInputRef.current?.setSelectionRange(
editInputRef.current.value.length,
editInputRef.current.value.length,
);
}
}, [isEditing]);
const handleCopy = () => {
const text = content?.trim() || "";
if (!text) return;
if (copy(text)) {
toast.success(t("button.copiedToClipboard", { ns: "common" }));
}
};
const handleEditClick = () => {
setDraftContent(content);
setIsEditing(true);
};
const handleEditSubmit = () => {
const trimmed = draftContent.trim();
if (!trimmed || onEditSubmit == null) return;
onEditSubmit(messageIndex, trimmed);
setIsEditing(false);
};
const handleEditCancel = () => {
setDraftContent(content);
setIsEditing(false);
};
const handleEditKeyDown = (e: React.KeyboardEvent<HTMLTextAreaElement>) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleEditSubmit();
}
if (e.key === "Escape") {
handleEditCancel();
}
};
if (isUser && isEditing) {
return (
<div className="flex w-full max-w-full flex-col gap-2 self-end">
<Textarea
ref={editInputRef}
value={draftContent}
onChange={(e) => setDraftContent(e.target.value)}
onKeyDown={handleEditKeyDown}
className="min-h-[80px] w-full resize-y rounded-lg bg-primary px-3 py-2 text-primary-foreground placeholder:text-primary-foreground/60"
placeholder={t("placeholder")}
rows={3}
/>
<div className="flex items-center gap-2 self-end">
<Button
variant="ghost"
size="sm"
className="text-muted-foreground hover:text-foreground"
onClick={handleEditCancel}
>
{t("button.cancel", { ns: "common" })}
</Button>
<Button
variant="select"
size="icon"
className="size-9 rounded-full"
disabled={!draftContent.trim()}
onClick={handleEditSubmit}
aria-label={t("send")}
>
<FaArrowUpLong size="16" />
</Button>
</div>
</div>
);
}
return (
<div
className={cn(
"flex flex-col gap-1",
isUser ? "items-end self-end" : "items-start self-start",
)}
>
<div
className={cn(
"rounded-lg px-3 py-2",
isUser ? "bg-primary text-primary-foreground" : "bg-muted",
)}
>
{isUser ? (
content
) : (
<ReactMarkdown
remarkPlugins={[remarkGfm]}
components={{
table: ({ node: _n, ...props }) => (
<table
className="my-2 w-full border-collapse border border-border"
{...props}
/>
),
th: ({ node: _n, ...props }) => (
<th
className="border border-border bg-muted/50 px-2 py-1 text-left text-sm font-medium"
{...props}
/>
),
td: ({ node: _n, ...props }) => (
<td
className="border border-border px-2 py-1 text-sm"
{...props}
/>
),
}}
>
{content}
</ReactMarkdown>
)}
</div>
<div className="flex items-center gap-0.5">
{isUser && onEditSubmit != null && (
<Tooltip>
<TooltipTrigger asChild>
<Button
variant="ghost"
size="icon"
className="size-7 text-muted-foreground hover:text-foreground"
onClick={handleEditClick}
aria-label={t("button.edit", { ns: "common" })}
>
<FaPencilAlt className="size-3" />
</Button>
</TooltipTrigger>
<TooltipContent>
{t("button.edit", { ns: "common" })}
</TooltipContent>
</Tooltip>
)}
{isComplete && (
<Tooltip>
<TooltipTrigger asChild>
<Button
variant="ghost"
size="icon"
className="size-7 text-muted-foreground hover:text-foreground"
onClick={handleCopy}
disabled={!content?.trim()}
aria-label={t("button.copy", { ns: "common" })}
>
<FaCopy className="size-3" />
</Button>
</TooltipTrigger>
<TooltipContent>
{t("button.copy", { ns: "common" })}
</TooltipContent>
</Tooltip>
)}
</div>
</div>
);
}

View File

@@ -0,0 +1,89 @@
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { FaArrowUpLong } from "react-icons/fa6";
import { useTranslation } from "react-i18next";
import { useState } from "react";
import type { StartingRequest } from "@/types/chat";
type ChatStartingStateProps = {
onSendMessage: (message: string) => void;
};
export function ChatStartingState({ onSendMessage }: ChatStartingStateProps) {
const { t } = useTranslation(["views/chat"]);
const [input, setInput] = useState("");
const defaultRequests: StartingRequest[] = [
{
label: t("starting_requests.show_recent_events"),
prompt: t("starting_requests_prompts.show_recent_events"),
},
{
label: t("starting_requests.show_camera_status"),
prompt: t("starting_requests_prompts.show_camera_status"),
},
];
const handleRequestClick = (prompt: string) => {
onSendMessage(prompt);
};
const handleSubmit = () => {
const text = input.trim();
if (!text) return;
onSendMessage(text);
setInput("");
};
const handleKeyDown = (e: React.KeyboardEvent<HTMLInputElement>) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit();
}
};
return (
<div className="flex size-full flex-col items-center justify-center gap-6 p-8">
<div className="flex flex-col items-center gap-2">
<h1 className="text-4xl font-bold text-foreground">{t("title")}</h1>
<p className="text-muted-foreground">{t("subtitle")}</p>
</div>
<div className="flex w-full max-w-2xl flex-col items-center gap-4">
<p className="text-center text-sm text-muted-foreground">
{t("suggested_requests")}
</p>
<div className="flex w-full flex-wrap justify-center gap-2">
{defaultRequests.map((request, idx) => (
<Button
key={idx}
variant="outline"
className="max-w-sm text-sm"
onClick={() => handleRequestClick(request.prompt)}
>
{request.label}
</Button>
))}
</div>
</div>
<div className="flex w-full max-w-2xl flex-row items-center gap-2 rounded-xl bg-secondary p-4">
<Input
className="h-12 w-full flex-1 border-transparent bg-transparent text-base shadow-none focus-visible:ring-0 dark:bg-transparent"
placeholder={t("placeholder")}
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={handleKeyDown}
/>
<Button
variant="select"
className="size-10 shrink-0 rounded-full"
disabled={!input.trim()}
onClick={handleSubmit}
>
<FaArrowUpLong size="18" />
</Button>
</div>
</div>
);
}

View File

@@ -0,0 +1,88 @@
import { useState } from "react";
import { useTranslation } from "react-i18next";
import {
Collapsible,
CollapsibleContent,
CollapsibleTrigger,
} from "@/components/ui/collapsible";
import { Button } from "@/components/ui/button";
import { cn } from "@/lib/utils";
import { ChevronDown, ChevronRight } from "lucide-react";
type ToolCallBubbleProps = {
name: string;
arguments?: Record<string, unknown>;
response?: string;
side: "left" | "right";
};
export function ToolCallBubble({
name,
arguments: args,
response,
side,
}: ToolCallBubbleProps) {
const { t } = useTranslation(["views/chat"]);
const [open, setOpen] = useState(false);
const isLeft = side === "left";
const normalizedName = name
.replace(/_/g, " ")
.split(" ")
.map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
.join(" ");
return (
<div
className={cn(
"rounded-lg px-3 py-2",
isLeft
? "self-start bg-muted"
: "self-end bg-primary text-primary-foreground",
)}
>
<Collapsible open={open} onOpenChange={setOpen}>
<CollapsibleTrigger asChild>
<Button
variant="ghost"
size="sm"
className={cn(
"h-auto w-full min-w-0 justify-start gap-2 whitespace-normal p-0 text-left text-xs hover:bg-transparent",
!isLeft && "hover:text-primary-foreground",
)}
>
{open ? (
<ChevronDown size={12} className="shrink-0" />
) : (
<ChevronRight size={12} className="shrink-0" />
)}
<span className="break-words font-medium">
{isLeft ? t("call") : t("result")} {normalizedName}
</span>
</Button>
</CollapsibleTrigger>
<CollapsibleContent>
<div className="mt-2 space-y-2">
{isLeft && args && Object.keys(args).length > 0 && (
<div className="text-xs">
<div className="font-medium text-muted-foreground">
{t("arguments")}
</div>
<pre className="scrollbar-container mt-1 max-h-32 overflow-auto whitespace-pre-wrap break-words rounded bg-muted/50 p-2 text-[10px]">
{JSON.stringify(args, null, 2)}
</pre>
</div>
)}
{!isLeft && response && response !== "" && (
<div className="text-xs">
<div className="font-medium opacity-80">{t("response")}</div>
<pre className="scrollbar-container mt-1 max-h-32 overflow-auto whitespace-pre-wrap break-words rounded bg-primary/20 p-2 text-[10px]">
{response}
</pre>
</div>
)}
</div>
</CollapsibleContent>
</Collapsible>
</div>
);
}

View File

@@ -6,7 +6,7 @@ import { isDesktop } from "react-device-detect";
import { FaCompactDisc, FaVideo } from "react-icons/fa";
import { IoSearch } from "react-icons/io5";
import { LuConstruction } from "react-icons/lu";
import { MdCategory, MdVideoLibrary } from "react-icons/md";
import { MdCategory, MdChat, MdVideoLibrary } from "react-icons/md";
import { TbFaceId } from "react-icons/tb";
import useSWR from "swr";
import { useIsAdmin } from "./use-is-admin";
@@ -18,6 +18,7 @@ export const ID_EXPORT = 4;
export const ID_PLAYGROUND = 5;
export const ID_FACE_LIBRARY = 6;
export const ID_CLASSIFICATION = 7;
export const ID_CHAT = 8;
export default function useNavigation(
variant: "primary" | "secondary" = "primary",
@@ -82,7 +83,15 @@ export default function useNavigation(
url: "/classification",
enabled: isDesktop && isAdmin,
},
{
id: ID_CHAT,
variant,
icon: MdChat,
title: "menu.chat",
url: "/chat",
enabled: isDesktop && isAdmin && config?.genai?.model !== "none",
},
] as NavData[],
[config?.face_recognition?.enabled, variant, isAdmin],
[config?.face_recognition?.enabled, config?.genai?.model, variant, isAdmin],
);
}

View File

@@ -1,3 +1,6 @@
/** ONNX embedding models that require local model downloads. GenAI providers are not in this list. */
export const JINA_EMBEDDING_MODELS = ["jinav1", "jinav2"] as const;
export const supportedLanguageKeys = [
"en",
"es",

226
web/src/pages/Chat.tsx Normal file

@@ -0,0 +1,226 @@
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { FaArrowUpLong } from "react-icons/fa6";
import { useTranslation } from "react-i18next";
import { useState, useCallback } from "react";
import axios from "axios";
import { ChatEventThumbnailsRow } from "@/components/chat/ChatEventThumbnailsRow";
import { MessageBubble } from "@/components/chat/ChatMessage";
import { ToolCallBubble } from "@/components/chat/ToolCallBubble";
import { ChatStartingState } from "@/components/chat/ChatStartingState";
import type { ChatMessage } from "@/types/chat";
import {
getEventIdsFromSearchObjectsToolCalls,
streamChatCompletion,
} from "@/utils/chatUtil";
export default function ChatPage() {
const { t } = useTranslation(["views/chat"]);
const [input, setInput] = useState("");
const [messages, setMessages] = useState<ChatMessage[]>([]);
const [isLoading, setIsLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const submitConversation = useCallback(
async (messagesToSend: ChatMessage[]) => {
if (isLoading) return;
const last = messagesToSend[messagesToSend.length - 1];
if (!last || last.role !== "user" || !last.content.trim()) return;
setError(null);
const assistantPlaceholder: ChatMessage = {
role: "assistant",
content: "",
toolCalls: undefined,
};
setMessages([...messagesToSend, assistantPlaceholder]);
setIsLoading(true);
const apiMessages = messagesToSend.map((m) => ({
role: m.role,
content: m.content,
}));
const baseURL = axios.defaults.baseURL ?? "";
const url = `${baseURL}chat/completion`;
const headers: Record<string, string> = {
"Content-Type": "application/json",
...(axios.defaults.headers.common as Record<string, string>),
};
await streamChatCompletion(url, headers, apiMessages, {
updateMessages: (updater) => setMessages(updater),
onError: (message) => setError(message),
onDone: () => setIsLoading(false),
defaultErrorMessage: t("error"),
});
},
[isLoading, t],
);
const sendMessage = useCallback(() => {
const text = input.trim();
if (!text || isLoading) return;
setInput("");
submitConversation([...messages, { role: "user", content: text }]);
}, [input, isLoading, messages, submitConversation]);
const handleEditSubmit = useCallback(
(messageIndex: number, newContent: string) => {
const newList: ChatMessage[] = [
...messages.slice(0, messageIndex),
{ role: "user", content: newContent },
];
submitConversation(newList);
},
[messages, submitConversation],
);
return (
<div className="flex size-full justify-center p-2">
<div className="flex size-full flex-col xl:w-[50%] 3xl:w-[35%]">
{messages.length === 0 ? (
<ChatStartingState
onSendMessage={(message) => {
setInput("");
submitConversation([{ role: "user", content: message }]);
}}
/>
) : (
<div className="scrollbar-container flex min-h-0 w-full flex-1 flex-col gap-2 overflow-y-auto">
{messages.map((msg, i) => {
const isStreamingPlaceholder =
i === messages.length - 1 &&
msg.role === "assistant" &&
isLoading &&
!msg.content?.trim() &&
!(msg.toolCalls && msg.toolCalls.length > 0);
if (isStreamingPlaceholder) {
return <div key={i} />;
}
return (
<div key={i} className="flex flex-col gap-2">
{msg.role === "assistant" && msg.toolCalls && (
<>
{msg.toolCalls.map((tc, tcIdx) => (
<div key={tcIdx} className="flex flex-col gap-2">
<ToolCallBubble
name={tc.name}
arguments={tc.arguments}
side="left"
/>
{tc.response && (
<ToolCallBubble
name={tc.name}
response={tc.response}
side="right"
/>
)}
</div>
))}
</>
)}
<MessageBubble
role={msg.role}
content={msg.content}
messageIndex={i}
onEditSubmit={
msg.role === "user" ? handleEditSubmit : undefined
}
isComplete={
msg.role === "user" ||
!isLoading ||
i < messages.length - 1
}
/>
{msg.role === "assistant" &&
(() => {
const isComplete = !isLoading || i < messages.length - 1;
if (!isComplete) return null;
const events = getEventIdsFromSearchObjectsToolCalls(
msg.toolCalls,
);
return <ChatEventThumbnailsRow events={events} />;
})()}
</div>
);
})}
{(() => {
const lastMsg = messages[messages.length - 1];
const showProcessing =
isLoading &&
lastMsg?.role === "assistant" &&
!lastMsg.content?.trim() &&
!(lastMsg.toolCalls && lastMsg.toolCalls.length > 0);
return showProcessing ? (
<div className="self-start rounded-lg bg-muted px-3 py-2 text-muted-foreground">
{t("processing")}
</div>
) : null;
})()}
{error && (
<p className="self-start text-sm text-destructive" role="alert">
{error}
</p>
)}
</div>
)}
{messages.length > 0 && (
<ChatEntry
input={input}
setInput={setInput}
sendMessage={sendMessage}
isLoading={isLoading}
placeholder={t("placeholder")}
/>
)}
</div>
</div>
);
}
type ChatEntryProps = {
input: string;
setInput: (value: string) => void;
sendMessage: () => void;
isLoading: boolean;
placeholder: string;
};
function ChatEntry({
input,
setInput,
sendMessage,
isLoading,
placeholder,
}: ChatEntryProps) {
const handleKeyDown = (e: React.KeyboardEvent<HTMLInputElement>) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
sendMessage();
}
};
return (
<div className="flex w-full flex-col items-center justify-center rounded-xl bg-secondary p-2">
<div className="flex w-full flex-row items-center gap-2">
<Input
className="w-full flex-1 border-transparent bg-transparent shadow-none focus-visible:ring-0 dark:bg-transparent"
placeholder={placeholder}
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={handleKeyDown}
aria-busy={isLoading}
/>
<Button
variant="select"
className="size-10 shrink-0 rounded-full"
disabled={!input.trim() || isLoading}
onClick={sendMessage}
>
<FaArrowUpLong size="16" />
</Button>
</div>
</div>
);
}
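
Review note: `handleEditSubmit` in the file above discards the edited message and everything after it, appends the new user text, and resubmits the shortened conversation. A minimal sketch of that truncation as a pure function follows; `truncateForEdit` and the `Msg` type are hypothetical names for illustration and are not part of this change.

```typescript
type Role = "user" | "assistant";
type Msg = { role: Role; content: string };

// Hypothetical helper mirroring handleEditSubmit: keep the messages before the
// edited index, then append the edited text as a fresh user message.
function truncateForEdit(
  messages: Msg[],
  index: number,
  newContent: string,
): Msg[] {
  return [...messages.slice(0, index), { role: "user", content: newContent }];
}

// Illustrative conversation (made-up content).
const history: Msg[] = [
  { role: "user", content: "Any cars today?" },
  { role: "assistant", content: "Two cars were detected." },
  { role: "user", content: "And people?" },
  { role: "assistant", content: "One person." },
];

// Editing the message at index 2 drops it and the assistant reply after it.
const edited = truncateForEdit(history, 2, "And dogs?");
```

Because the original array is never mutated, the result can be handed straight to a state setter or to `submitConversation`.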


@@ -23,6 +23,7 @@ import { toast } from "sonner";
import useSWR from "swr";
import useSWRInfinite from "swr/infinite";
import { useDocDomain } from "@/hooks/use-doc-domain";
import { JINA_EMBEDDING_MODELS } from "@/lib/const";
const API_LIMIT = 25;
@@ -293,7 +294,12 @@ export default function Explore() {
const modelVersion = config?.semantic_search.model || "jinav1";
const modelSize = config?.semantic_search.model_size || "small";
- // Text model state
+ // GenAI providers have no local models to download
+ const isGenaiEmbeddings =
+ typeof modelVersion === "string" &&
+ !(JINA_EMBEDDING_MODELS as readonly string[]).includes(modelVersion);
+ // Text model state (skipped for GenAI - no local models)
const { payload: textModelState } = useModelState(
modelVersion === "jinav1"
? "jinaai/jina-clip-v1-text_model_fp16.onnx"
@@ -328,6 +334,10 @@ export default function Explore() {
);
const allModelsLoaded = useMemo(() => {
if (isGenaiEmbeddings) {
return true;
}
return (
textModelState === "downloaded" &&
textTokenizerState === "downloaded" &&
@@ -335,6 +345,7 @@ export default function Explore() {
visionFeatureExtractorState === "downloaded"
);
}, [
isGenaiEmbeddings,
textModelState,
textTokenizerState,
visionModelState,
@@ -358,10 +369,11 @@ export default function Explore() {
!defaultViewLoaded ||
(config?.semantic_search.enabled &&
(!reindexState ||
- !textModelState ||
- !textTokenizerState ||
- !visionModelState ||
- !visionFeatureExtractorState))
+ (!isGenaiEmbeddings &&
+ (!textModelState ||
+ !textTokenizerState ||
+ !visionModelState ||
+ !visionFeatureExtractorState))))
) {
return (
<ActivityIndicator className="absolute left-1/2 top-1/2 -translate-x-1/2 -translate-y-1/2" />
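
Review note: the Explore change treats any configured model name outside `JINA_EMBEDDING_MODELS` as a GenAI provider with nothing to download. The sketch below isolates that check as a standalone function, assuming the same constant. The function form and the provider name used in the usage lines are hypothetical.

```typescript
const JINA_EMBEDDING_MODELS = ["jinav1", "jinav2"] as const;

// Standalone version of the check (hypothetical function form): a string that
// is not one of the local ONNX model names is treated as a GenAI provider.
function isGenaiEmbeddings(modelVersion: unknown): boolean {
  return (
    typeof modelVersion === "string" &&
    !(JINA_EMBEDDING_MODELS as readonly string[]).includes(modelVersion)
  );
}

const localModel = isGenaiEmbeddings("jinav1");
const genaiModel = isGenaiEmbeddings("my-genai-provider"); // made-up name
const nonString = isGenaiEmbeddings(undefined);
```

In the diff, `modelVersion` falls back to `"jinav1"` when unset, so the non-string branch is a safeguard rather than an expected path.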

web/src/types/chat.ts Normal file

@@ -0,0 +1,16 @@
export type ToolCall = {
name: string;
arguments?: Record<string, unknown>;
response?: string;
};
export type ChatMessage = {
role: "user" | "assistant";
content: string;
toolCalls?: ToolCall[];
};
export type StartingRequest = {
label: string;
prompt: string;
};
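
Review note: a `ToolCall` carries its `response` as a string, and for `search_objects` the thumbnail helper later in this diff parses that string as a JSON array of objects with an `id`. The sketch below builds one assistant message in that shape. The argument keys, message text, and event id are made up for illustration.

```typescript
type ToolCall = {
  name: string;
  arguments?: Record<string, unknown>;
  response?: string;
};

type ChatMessage = {
  role: "user" | "assistant";
  content: string;
  toolCalls?: ToolCall[];
};

// Illustrative assistant message; values are hypothetical.
const example: ChatMessage = {
  role: "assistant",
  content: "I found one matching event.",
  toolCalls: [
    {
      name: "search_objects",
      arguments: { label: "car" }, // assumed argument shape
      response: JSON.stringify([{ id: "example-event-id" }]),
    },
  ],
};

// The response stays a string until a consumer parses it.
const firstResponse = example.toolCalls?.[0]?.response ?? "[]";
const parsedIds = (JSON.parse(firstResponse) as { id: string }[]).map(
  (item) => item.id,
);
```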

web/src/utils/chatUtil.ts Normal file

@@ -0,0 +1,193 @@
import type { ChatMessage, ToolCall } from "@/types/chat";
export type StreamChatCallbacks = {
/** Update the messages array (e.g. pass to setState). */
updateMessages: (updater: (prev: ChatMessage[]) => ChatMessage[]) => void;
/** Called when the stream sends an error or fetch fails. */
onError: (message: string) => void;
/** Called when the stream finishes (success or error). */
onDone: () => void;
/** Message used when fetch throws and no server error is available. */
defaultErrorMessage?: string;
};
type StreamChunk =
| { type: "error"; error: string }
| { type: "tool_calls"; tool_calls: ToolCall[] }
| { type: "content"; delta: string };
/**
* POST to chat/completion with stream: true, parse NDJSON stream, and invoke
* callbacks so the caller can update UI (e.g. React state).
*/
export async function streamChatCompletion(
url: string,
headers: Record<string, string>,
apiMessages: { role: string; content: string }[],
callbacks: StreamChatCallbacks,
): Promise<void> {
const {
updateMessages,
onError,
onDone,
defaultErrorMessage = "Something went wrong. Please try again.",
} = callbacks;
try {
const res = await fetch(url, {
method: "POST",
headers,
body: JSON.stringify({ messages: apiMessages, stream: true }),
});
if (!res.ok) {
const errBody = await res.json().catch(() => ({}));
const message = (errBody as { error?: string }).error ?? res.statusText;
onError(message);
onDone();
return;
}
const reader = res.body?.getReader();
const decoder = new TextDecoder();
if (!reader) {
onError("No response body");
onDone();
return;
}
let buffer = "";
let hadStreamError = false;
const applyChunk = (data: StreamChunk) => {
if (data.type === "error") {
onError(data.error);
updateMessages((prev) =>
prev.filter((m) => !(m.role === "assistant" && m.content === "")),
);
return "break";
}
if (data.type === "tool_calls" && data.tool_calls?.length) {
updateMessages((prev) => {
const next = [...prev];
const lastMsg = next[next.length - 1];
if (lastMsg?.role === "assistant")
next[next.length - 1] = {
...lastMsg,
toolCalls: data.tool_calls,
};
return next;
});
return "continue";
}
if (data.type === "content" && data.delta !== undefined) {
updateMessages((prev) => {
const next = [...prev];
const lastMsg = next[next.length - 1];
if (lastMsg?.role === "assistant")
next[next.length - 1] = {
...lastMsg,
content: lastMsg.content + data.delta,
};
return next;
});
return "continue";
}
return "continue";
};
for (;;) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? "";
for (const line of lines) {
const trimmed = line.trim();
if (!trimmed) continue;
try {
const data = JSON.parse(trimmed) as StreamChunk & { type: string };
const result = applyChunk(data as StreamChunk);
if (result === "break") {
hadStreamError = true;
break;
}
} catch {
// skip malformed JSON lines
}
}
if (hadStreamError) break;
}
// Flush remaining buffer
if (!hadStreamError && buffer.trim()) {
try {
const data = JSON.parse(buffer.trim()) as StreamChunk & {
type: string;
delta?: string;
};
if (data.type === "content" && data.delta !== undefined) {
updateMessages((prev) => {
const next = [...prev];
const lastMsg = next[next.length - 1];
if (lastMsg?.role === "assistant")
next[next.length - 1] = {
...lastMsg,
content: lastMsg.content + data.delta!,
};
return next;
});
}
} catch {
// ignore final malformed chunk
}
}
if (!hadStreamError) {
updateMessages((prev) => {
const next = [...prev];
const lastMsg = next[next.length - 1];
if (lastMsg?.role === "assistant" && lastMsg.content === "")
next[next.length - 1] = { ...lastMsg, content: " " };
return next;
});
}
} catch {
onError(defaultErrorMessage);
updateMessages((prev) =>
prev.filter((m) => !(m.role === "assistant" && m.content === "")),
);
} finally {
onDone();
}
}
/**
* Parse search_objects tool call response(s) into event ids for thumbnails.
*/
export function getEventIdsFromSearchObjectsToolCalls(
toolCalls: ToolCall[] | undefined,
): { id: string }[] {
if (!toolCalls?.length) return [];
const results: { id: string }[] = [];
for (const tc of toolCalls) {
if (tc.name !== "search_objects" || !tc.response?.trim()) continue;
try {
const parsed = JSON.parse(tc.response) as unknown;
if (!Array.isArray(parsed)) continue;
for (const item of parsed) {
if (
item &&
typeof item === "object" &&
"id" in item &&
typeof (item as { id: unknown }).id === "string"
) {
results.push({ id: (item as { id: string }).id });
}
}
} catch {
// ignore parse errors
}
}
return results;
}
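
Review note: the read loop in `streamChatCompletion` relies on a common NDJSON buffering pattern: append each decoded chunk to a buffer, split on newlines, keep the last (possibly incomplete) piece for the next read, and parse the rest. The sketch below isolates that pattern without any network code. `createNdjsonParser` and the `Chunk` type are hypothetical names, not part of this change.

```typescript
// Hypothetical helper: push decoded text in, receive parsed JSON values out.
function createNdjsonParser<T>(onValue: (value: T) => void) {
  let buffer = "";

  const parseLine = (line: string) => {
    const trimmed = line.trim();
    if (!trimmed) return;
    try {
      onValue(JSON.parse(trimmed) as T);
    } catch {
      // Skip malformed lines, as the loop in the diff does.
    }
  };

  return {
    push(text: string) {
      buffer += text;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // last piece may be an incomplete line
      lines.forEach(parseLine);
    },
    flush() {
      // Handle a final line that arrived without a trailing newline.
      parseLine(buffer);
      buffer = "";
    },
  };
}

type Chunk = { type: string; delta?: string };
const seen: Chunk[] = [];
const parser = createNdjsonParser<Chunk>((chunk) => seen.push(chunk));

// A JSON object split across two network chunks is reassembled before parsing.
parser.push('{"type":"content","delta":"Hel');
parser.push('lo"}\n{"type":"content","delta":" world"}');
parser.flush();

const text = seen.map((chunk) => chunk.delta ?? "").join("");
```

Routing the flush through the same `parseLine` path means a trailing chunk of any type is handled identically to mid-stream chunks. The flush in the diff handles only `content` chunks, so a final `error` or `tool_calls` line without a trailing newline would be dropped there.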


@@ -46,6 +46,7 @@ i18n
"components/icons",
"components/player",
"views/events",
"views/chat",
"views/explore",
"views/live",
"views/settings",