Files
firmware/mcp-server/scripts/datadog-dashboard.json
Ben Meadors f6a954b97e Implement rotating JSONL recorder for persistent logging (#10428)
* Implement rotating JSONL recorder for persistent logging

* Fixes

* Update documentation and clean up imports in command files

* Address remaining recorder review feedback

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/2541773c-869a-463f-9fae-8505272c06ff

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* recorder: fix lock re-entry deadlock on start() and force_rotate_all()

The previous "Fixes" commit added `_files_snapshot()` which acquires
`self._lock` so handlers don't race with `stop()` clearing `_files`.
But two callers were already holding `self._lock` when they invoked
methods that go through the snapshot:

  - `start()` writes the `recorder_start` event from inside its `with
    self._lock:` block. `_write_event` -> `_files_snapshot` re-acquires
    the same non-reentrant `threading.Lock`, freezing process startup.

  - `force_rotate_all()` calls `self.status()` (which also acquires
    `self._lock`) while still holding the lock from rotating each file.

Both fixes release the lock before the call. The recorder_start marker
still lands in events.jsonl because the started/started_at flags are
already set when we write it.

Verified end-to-end against the standalone /tmp/verify_pr_fixes.py
harness — all 9 PR review-comment fixes pass, including pause/resume
event ordering and concurrent start/stop without KeyError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix markdown linting issues in leakhunt.md and repro.md

* Handle recorder startup and query review fixes

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Tighten recorder follow-up tests

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Stabilize recorder startup tests

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Remove brittle recorder startup test

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Polish recorder follow-up errors

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Refine recorder startup and regex errors

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Clean up recorder follow-up nits

Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/78540a9f-fe62-4350-b252-0ae5621f0b8a

Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>

* Trunk

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 09:22:40 -05:00

218 lines
6.1 KiB
JSON

{
"title": "Meshtastic Firmware — Recorder Stream",
"description": "Live view of `.mtlog/` streams shipped by `mtlog_to_datadog.py`. Heap, packet volume, log levels, errors. One row per port.",
"widgets": [
{
"definition": {
"title": "Free heap (bytes)",
"type": "timeseries",
"show_legend": true,
"requests": [
{
"queries": [
{
"name": "free_heap",
"data_source": "metrics",
"query": "avg:mesh.local.heap_free_bytes{service:meshtastic-firmware} by {port}"
}
],
"response_format": "timeseries",
"display_type": "line"
}
],
"yaxis": { "label": "bytes" }
}
},
{
"definition": {
"title": "Heap slope (bytes/min) — last 1h",
"type": "query_value",
"precision": 0,
"requests": [
{
"queries": [
{
"name": "slope",
"data_source": "metrics",
"query": "derivative(avg:mesh.local.heap_free_bytes{service:meshtastic-firmware})",
"aggregator": "avg"
}
],
"response_format": "scalar"
}
],
"conditional_formats": [
{ "comparator": "<", "value": -100, "palette": "white_on_red" },
{ "comparator": "<", "value": 0, "palette": "white_on_yellow" },
{ "comparator": ">=", "value": 0, "palette": "white_on_green" }
]
}
},
{
"definition": {
"title": "Total heap (bytes)",
"type": "timeseries",
"requests": [
{
"queries": [
{
"name": "total_heap",
"data_source": "metrics",
"query": "avg:mesh.local.heap_total_bytes{service:meshtastic-firmware} by {port}"
}
],
"response_format": "timeseries",
"display_type": "line"
}
]
}
},
{
"definition": {
"title": "Battery level (%)",
"type": "timeseries",
"requests": [
{
"queries": [
{
"name": "battery",
"data_source": "metrics",
"query": "avg:mesh.device.battery_level{service:meshtastic-firmware} by {port}"
}
],
"response_format": "timeseries",
"display_type": "line"
}
],
"yaxis": { "min": "0", "max": "105" }
}
},
{
"definition": {
"title": "Air utilization (TX %)",
"type": "timeseries",
"requests": [
{
"queries": [
{
"name": "airutil",
"data_source": "metrics",
"query": "avg:mesh.device.air_util_tx{service:meshtastic-firmware} by {port}"
}
],
"response_format": "timeseries",
"display_type": "line"
}
]
}
},
{
"definition": {
"title": "Channel utilization (%)",
"type": "timeseries",
"requests": [
{
"queries": [
{
"name": "chutil",
"data_source": "metrics",
"query": "avg:mesh.device.channel_utilization{service:meshtastic-firmware} by {port}"
}
],
"response_format": "timeseries",
"display_type": "line"
}
]
}
},
{
"definition": {
"title": "Log volume by level",
"type": "timeseries",
"show_legend": true,
"requests": [
{
"response_format": "timeseries",
"display_type": "bars",
"queries": [
{
"name": "log_count",
"data_source": "logs",
"indexes": ["*"],
"compute": { "aggregation": "count" },
"search": { "query": "service:meshtastic-firmware" },
"group_by": [
{
"facet": "@level",
"limit": 10,
"sort": { "order": "desc", "aggregation": "count" }
}
]
}
]
}
]
}
},
{
"definition": {
"title": "Recent ERROR / CRIT firmware logs",
"type": "list_stream",
"requests": [
{
"response_format": "event_list",
"query": {
"data_source": "logs_stream",
"query_string": "service:meshtastic-firmware (status:error OR @level:ERROR OR @level:CRIT)",
"indexes": [],
"sort": { "column": "timestamp", "order": "desc" }
},
"columns": [
{ "field": "timestamp", "width": "auto" },
{ "field": "host", "width": "auto" },
{ "field": "@port", "width": "auto" },
{ "field": "@level", "width": "auto" },
{ "field": "@thread", "width": "auto" },
{ "field": "message", "width": "stretch" }
]
}
]
}
},
{
"definition": {
"title": "Recorder marker events",
"type": "list_stream",
"requests": [
{
"response_format": "event_list",
"query": {
"data_source": "logs_stream",
"query_string": "service:meshtastic-firmware @level:MARK",
"indexes": [],
"sort": { "column": "timestamp", "order": "desc" }
},
"columns": [
{ "field": "timestamp", "width": "auto" },
{ "field": "host", "width": "auto" },
{ "field": "message", "width": "stretch" }
]
}
]
}
}
],
"template_variables": [
{
"name": "port",
"prefix": "port",
"available_values": [],
"default": "*"
},
{ "name": "host", "prefix": "host", "available_values": [], "default": "*" }
],
"layout_type": "ordered",
"notify_list": [],
"reflow_type": "auto"
}