diff --git a/.claude/commands/README.md b/.claude/commands/README.md new file mode 100644 index 000000000..3767dac98 --- /dev/null +++ b/.claude/commands/README.md @@ -0,0 +1,49 @@ +# Claude Code slash commands for the mcp-server test suite + +Three AI-assisted workflows wrapping `mcp-server/run-tests.sh` and the meshtastic MCP tools. Each one has a twin in `.github/prompts/` for Copilot users. + +| Slash command | What it does | Copilot equivalent | +| --------------------- | ------------------------------------------------------------------------- | ---------------------------------------- | +| `/test [args]` | Runs the test suite (auto-detects hardware) and interprets failures | `.github/prompts/mcp-test.prompt.md` | +| `/diagnose [role]` | Read-only device health report via the meshtastic MCP tools | `.github/prompts/mcp-diagnose.prompt.md` | +| `/repro [n=5]` | Re-runs one test N times, diffs firmware logs between passes and failures | `.github/prompts/mcp-repro.prompt.md` | + +## Why two surfaces + +The Claude Code commands and Copilot prompts cover the same three workflows but each speaks its host's idiom: + +- **Claude Code** (`/test`) uses `$ARGUMENTS` for pass-through, has direct access to Bash + all MCP tools registered in the user's settings, and runs in the terminal context. +- **Copilot** (`/mcp-test`) runs in VS Code's agent mode; it has terminal + MCP access too but typically asks the operator to confirm inputs interactively. + +A contributor using either IDE gets equivalent assistance. Keep the two in sync when behavior changes — the diff of intent should be minimal. + +## House rules + +- **No destructive writes without explicit operator approval.** Skills that could reflash, factory-reset, or reboot a device must describe the action and stop — the operator authorizes. +- **Interpret failures, don't just echo them.** The skill body should pull firmware log lines from `mcp-server/tests/report.html` (the `Meshtastic debug` section, attached by `tests/conftest.py::pytest_runtest_makereport`) and classify the failure. +- **Keep MCP tool calls sequential per port.** SerialInterface holds an exclusive port lock; two parallel tool calls on the same port deadlock. +- **Never speculate about root cause.** If the evidence doesn't support a classification, say "unknown" and list what you'd need to disambiguate. + +## Adding a new command + +1. Write the Claude Code version at `.claude/commands/.md` with YAML frontmatter: + + ```yaml + --- + description: one-line purpose (used for auto-invocation by the model) + argument-hint: [optional-hint] + --- + ``` + +2. Write the Copilot equivalent at `.github/prompts/mcp-.prompt.md` with: + + ```yaml + --- + mode: agent + description: ... + --- + ``` + +3. Add the row to the table above. Cross-link in both bodies. + +4. Smoke-test on Claude Code first (`/` should appear in autocomplete), then in VS Code Copilot (`/mcp-` in Chat). diff --git a/.claude/commands/diagnose.md b/.claude/commands/diagnose.md new file mode 100644 index 000000000..e4dbe7962 --- /dev/null +++ b/.claude/commands/diagnose.md @@ -0,0 +1,55 @@ +--- +description: Produce a device health report using the meshtastic MCP tools (device_info, list_nodes, get_config, short serial log capture) +argument-hint: [role=all|nrf52|esp32s3|] +--- + +# `/diagnose` — device health report + +Call the meshtastic MCP tool bundle and format a structured health report for one or all detected devices. Zero guesswork for the operator. + +## What to do + +1. **Enumerate hardware.** Call `mcp__meshtastic__list_devices(include_unknown=True)`. For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`. + +2. **Filter by `$ARGUMENTS`**: + - No args, `all` → every likely-meshtastic device. + - `nrf52` → only devices with `vid == 0x239a`. + - `esp32s3` → only devices with `vid == 0x303a` or `vid == 0x10c4`. + - A `/dev/cu.*` path → only that one port. + - Anything else → treat as a substring match against the `port` string. + +3. **For each selected device, in sequence (NOT parallel — SerialInterface holds an exclusive port lock):** + - `mcp__meshtastic__device_info(port=

)` — captures `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel`. + - `mcp__meshtastic__list_nodes(port=

)` — count of peers, which ones have `publicKey` set, SNR/RSSI distribution. + - `mcp__meshtastic__get_config(section="lora", port=

)` — region, preset, channel_num, tx_power, hop_limit. + - Optionally, if the device seems unhappy (fails to connect, `num_nodes==1` when ≥2 are plugged in, missing firmware*version), open a short firmware log window: `mcp__meshtastic__serial_open(port=

, env=)`, wait 3s, `serial_read(session_id=, max_lines=100)`, `serial_close(session_id=)`. The env should be inferred from the VID map in `mcp-server/run-tests.sh` (nrf52 → rak4631, esp32s3 → heltec-v3) unless `MESHTASTIC_MCP_ENV*` is set. + +4. **Render per-device report** as: + + ``` + [nrf52 @ /dev/cu.usbmodem1101] fw=2.7.23.bce2825, hw=RAK4631 + owner : Meshtastic 40eb / 40eb + region/band : US, channel 88, LONG_FAST + tx_power : 30 dBm, hop_limit=3 + peers : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm) + primary ch : McpTest + firmware : no panics in last 3s; NodeInfoModule emitted 2 broadcasts + ``` + + Keep it scannable. If a field is missing or abnormal (no pubkey for a known peer, region=UNSET, num_nodes inconsistent with the hub), flag it inline with a short `⚠︎ `. + +5. **Cross-device correlation** (only when >1 device is inspected): + - Do both sides see each other in `nodesByNum`? If one does and the other doesn't, that's asymmetric NodeInfo — flag it. + - Do the LoRa configs match? (region, channel_num, modem_preset should all agree; mismatch = no mesh) + - Do the primary channel NAMES match? Mismatch = different PSK = no decode. + +6. **Suggest next actions only for specific, recognisable failure modes**: + - Stale PKI pubkey one-way → "run `/test tests/mesh/test_direct_with_ack.py` — the retry + nodeinfo-ping heals this in the test path." + - Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`." + - Device unreachable → point at touch_1200bps + the CP2102-wedged-driver note in run-tests.sh. + +## What NOT to do + +- No writes. No `set_config`, no `reboot`, no `factory_reset`. This is a read-only diagnostic skill — if the operator wants to change state, they'll ask explicitly. +- No `flash` / `erase_and_flash`. Those are separate escalations. +- No holding SerialInterface across tool calls — open, query, close; next device. The port lock is exclusive. diff --git a/.claude/commands/repro.md b/.claude/commands/repro.md new file mode 100644 index 000000000..7ebf9ccd4 --- /dev/null +++ b/.claude/commands/repro.md @@ -0,0 +1,65 @@ +--- +description: Re-run a specific test N times in isolation to triage flakes, diff firmware logs between passes and failures +argument-hint: [count=5] +--- + +# `/repro` — flakiness triage for one test + +Re-run a single pytest node ID N times in isolation, track pass rate, and surface what's _different_ in the firmware logs between the passing attempts and the failing ones. Turns "it's flaky, I guess" into "it fails when X, passes when Y." + +## What to do + +1. **Parse `$ARGUMENTS`**: first token is the pytest node id (e.g. `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[nrf52->esp32s3]`); second token is an integer count (default `5`, cap at `20`). If the first token doesn't look like a test path (no `::` and no `tests/` prefix), treat the whole `$ARGUMENTS` as a `-k` filter instead. + +2. **Sanity-check the hub first** (so we're not measuring "nothing plugged in" N times): call `mcp__meshtastic__list_devices`. If the test name contains `nrf52` or `esp32s3` and the matching VID isn't present, stop and report — re-running won't help. + +3. **Loop N times**. For each iteration: + + ```bash + ./mcp-server/run-tests.sh --tb=short -p no:cacheprovider + ``` + + Capture: exit code, duration, and (on failure) the `Meshtastic debug` firmware log section from `mcp-server/tests/report.html`. `-p no:cacheprovider` suppresses pytest's `.pytest_cache` writes so iterations don't influence each other. + +4. **Track a small structured tally**: + + ``` + attempt 1: PASS (42s) + attempt 2: FAIL (128s) ← firmware log 200-line tail captured + attempt 3: PASS (39s) + attempt 4: FAIL (121s) + attempt 5: PASS (41s) + -------------------------------------- + pass rate: 3/5 (60%) | mean duration: 74s + ``` + +5. **On mixed outcomes**: diff the firmware log tails between a representative passing attempt and a representative failing attempt. Focus on: + - Error-level lines only present in failures (`PKI_UNKNOWN_PUBKEY`, `Alloc an err=`, `Skip send`, `No suitable channel`) + - Timing around the assertion event — did a broadcast go out, was there an ACK, did NAK fire? + - Device state fields that changed (nodesByNum entries, region/preset, channel_num) + + Surface the top 3 differences as a "passes when / fails when" table. Don't dump full logs — pull specific lines with uptime timestamps. + +6. **Classify the flake** into one of: + - **LoRa airtime collision** → pass rate improves with fewer concurrent transmitters; propose a `time.sleep` gap or retry bump in the test body. + - **PKI key staleness** → fails on first attempt, passes after self-heal; existing retry loop in `test_direct_with_ack.py` handles this. + - **NodeInfo cooldown** → `Skip send NodeInfo since we sent it <600s ago` in fail-only logs; needs `broadcast_nodeinfo_ping()` warmup. + - **Hardware-specific** (one direction fails, other passes; one device's firmware is older; driver wedged) → specific recovery pointer. + - **Genuinely unknown** → say so; don't invent a root cause. + +7. **Report back** with: + - Pass rate and mean duration. + - Classification + evidence (the specific log lines that support it). + - A suggested next step (re-run with specific args, open `/diagnose`, edit a specific test file, nothing). + +## Examples + +- `/repro tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[esp32s3->nrf52] 10` — runs 10 times, diffs firmware logs. +- `/repro broadcast_delivers` — no `::`, no `tests/`, so interpreted as `-k broadcast_delivers`; runs every matching test the default 5 times. +- `/repro tests/telemetry/test_device_telemetry_broadcast.py 3` — shorter run for a slow test. + +## Constraints + +- Don't exceed `count=20` per invocation — airtime and USB wear add up. If the user asks for 50, negotiate down. +- Don't rebuild firmware as part of triage; flakes that only reproduce under different firmware belong in a separate session. +- If the FIRST attempt fails AND the rest all pass, that's a classic "state leak from a prior test" → say so and suggest running with `--force-bake` or starting from a clean state rather than chasing the first failure. diff --git a/.claude/commands/test.md b/.claude/commands/test.md new file mode 100644 index 000000000..986ee1f31 --- /dev/null +++ b/.claude/commands/test.md @@ -0,0 +1,42 @@ +--- +description: Run the mcp-server test suite (auto-detects devices) and interpret the results +argument-hint: [pytest-args] +--- + +# `/test` — mcp-server test runner with interpretation + +Run `mcp-server/run-tests.sh` and make sense of the output so the operator doesn't have to. + +## What to do + +1. **Invoke the wrapper.** From the firmware repo root, run: + + ```bash + ./mcp-server/run-tests.sh $ARGUMENTS + ``` + + The wrapper auto-detects connected Meshtastic devices, maps each to its PlatformIO env, exports the required `MESHTASTIC_MCP_ENV_*` env vars, and invokes pytest. If the user passed no arguments, the wrapper supplies a sensible default set (`tests/ --html=tests/report.html --self-contained-html --junitxml=tests/junit.xml -v --tb=short`). A `--report-log=tests/reportlog.jsonl` arg is always appended (unless the operator passed their own). `--assume-baked` is deliberately NOT in the defaults — `test_00_bake.py` has its own skip-if-already-baked check and runs the ~8 s verification by default. Operators can opt into the fast path with `--assume-baked`, or force a reflash with `--force-bake`. + +2. **Read the pre-flight header.** First ~6 lines print the detected hub (role → port → env). If that line reads `detected hub : (none)`, the wrapper will narrow to `tests/unit` only — say so explicitly in your summary so the operator knows hardware tiers were skipped. + +3. **On pass**: one-line summary of the form `N passed, M skipped in `. Don't enumerate the 52 test names — the user can read those. Do mention if any test was SKIPPED for a NON-placeholder reason (e.g. "role not present on hub" is worth flagging). + +4. **On failure**: for every FAILED test, open `mcp-server/tests/report.html` and extract the `Meshtastic debug` section for that test. pytest-html embeds the firmware log stream + device state dump there; the 200-line firmware log tail is usually enough to explain the failure. Summarise: which test, one-line assertion message, the firmware log lines that matter (things like `PKI_UNKNOWN_PUBKEY`, `Skip send NodeInfo`, `Error=`, `Guru Meditation`, `assertion failed`). + +5. **Classify the failure** as one of: + - **Transient/flake**: LoRa collision, timing-sensitive assertion, first-attempt NAK + successful retry pattern. Propose `/repro ` to confirm. + - **Environmental**: device unreachable, port busy, CP2102 driver wedged. Suggest the specific recovery (replug USB, `touch_1200bps`, check `git status userPrefs.jsonc`). + - **Regression**: same assertion fails repeatedly, firmware log shows a new/unusual error. Surface the diff between expected and observed, identify the module likely responsible. + +6. **Never run destructive recovery automatically.** If a failure looks like it needs a reflash, factory*reset, or USB replug, \_describe what to do* — don't execute. The operator decides. + +## Arguments handling + +- No args → wrapper's defaults (full suite). +- `$ARGUMENTS` passed verbatim to the wrapper, which passes them to pytest. +- Common operator invocations: `/test tests/mesh`, `/test tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip`, `/test --force-bake`, `/test -k telemetry`. + +## Side-effects to mention in summary + +- The session fixture snapshots `userPrefs.jsonc` at session start and restores at teardown (plus on `atexit`). After a clean run, `git status userPrefs.jsonc` should be empty. If the wrapper's pre-flight printed a warning about a stale sidecar, call that out — means a prior session crashed. +- `mcp-server/tests/report.html` and `junit.xml` are regenerated on every run; the HTML is self-contained (shareable). diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 24e11bd4d..d12244229 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -429,6 +429,8 @@ Most workflows can be triggered manually via `workflow_dispatch` for testing. ## Testing +### Native unit tests (C++) + Unit tests in `test/` directory with 12 test suites: - `test_crypto/` - Cryptography @@ -446,6 +448,164 @@ Run with: `pio test -e native` Simulation testing: `bin/test-simulator.sh` +### Hardware-in-the-loop tests (`mcp-server/tests/`) + +Separate pytest suite that exercises real USB-connected Meshtastic devices. See the **MCP Server & Hardware Test Harness** section below for invocation, tier layout, and agent usage rules. + +## MCP Server & Hardware Test Harness + +The `mcp-server/` directory houses a firmware-aware [MCP](https://modelcontextprotocol.io/) server plus a pytest-based integration suite. AI agents that speak MCP get a well-defined tool surface for flashing, configuring, and inspecting physical Meshtastic devices — use it instead of hand-rolling `pio` or `meshtastic --port` calls where possible. `mcp-server/README.md` is the operator-facing setup doc; this section is the agent-facing usage contract. + +The repo registers the server via `.mcp.json` at the repo root — Claude Code picks it up automatically once `mcp-server/.venv/` is built (`cd mcp-server && python3 -m venv .venv && .venv/bin/pip install -e '.[test]'`). + +### When to use which surface + +| Goal | Tool | +| ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | +| Find a connected device | `mcp__meshtastic__list_devices` | +| Read a live node's config/state | `mcp__meshtastic__device_info`, `list_nodes`, `get_config` | +| Mutate a device (owner, region, channels, reboot) | `set_owner`, `set_config`, `set_channel_url`, `reboot`, `shutdown`, `factory_reset` — all require `confirm=True` | +| Flash firmware to a variant | `pio_flash` (any arch) or `erase_and_flash` (ESP32 factory install) | +| Stream serial logs while debugging | `serial_open` → `serial_read` loop → `serial_close` | +| Administer `userPrefs.jsonc` build-time constants | `userprefs_get`, `userprefs_set`, `userprefs_reset`, `userprefs_manifest` | +| Run the regression suite | `./mcp-server/run-tests.sh` (or `/test` slash command) | +| Diagnose a specific device | `/diagnose [role]` slash command (read-only) | +| Triage a flaky test | `/repro [count]` slash command | + +**One MCP call per port at a time.** `SerialInterface` holds an exclusive OS-level lock on the serial port for its lifetime. If a `serial_*` session is open on `/dev/cu.usbmodem101`, calling `device_info` on the same port will fail fast pointing at the active session. Sequence calls: open → read/mutate → close, then next device. Never parallelize tool calls on the same port. + +### MCP tool surface (~32 tools) + +Grouped by purpose. Full argument shapes in `mcp-server/README.md`; a few high-value signatures are called out here. + +- **Discovery & metadata**: `list_devices`, `list_boards`, `get_board` +- **Build & flash**: `build`, `clean`, `pio_flash`, `erase_and_flash` (ESP32 only), `update_flash` (ESP32 OTA), `touch_1200bps` +- **Serial sessions** (long-running, 10k-line ring buffer): `serial_open`, `serial_read`, `serial_list`, `serial_close` +- **Device reads**: `device_info`, `list_nodes` +- **Device writes** (all require `confirm=True`): `set_owner`, `get_config`, `set_config`, `get_channel_url`, `set_channel_url`, `send_text`, `reboot`, `shutdown`, `factory_reset`, `set_debug_log_api` +- **userPrefs admin** (build-time constants, not runtime config): `userprefs_get`, `userprefs_set`, `userprefs_reset`, `userprefs_manifest`, `userprefs_testing_profile` +- **Vendor escape hatches**: `esptool_chip_info`, `esptool_erase_flash`, `esptool_raw`, `nrfutil_dfu`, `nrfutil_raw`, `picotool_info`, `picotool_load`, `picotool_raw` + +`confirm=True` is a tool-level gate on top of whatever permission prompt your MCP host shows. **Don't bypass it** by asking the host to auto-approve — it exists specifically because MCP hosts sometimes remember "always allow this tool" and that's dangerous for `factory_reset` and `erase_and_flash`. + +### Hardware test suite (`mcp-server/run-tests.sh`) + +The wrapper auto-detects connected devices (VID → role map: `0x239A` → `nrf52`, `0x303A`/`0x10C4` → `esp32s3`), maps each role to a PlatformIO env (`nrf52` → `rak4631`, `esp32s3` → `heltec-v3`, overridable via `MESHTASTIC_MCP_ENV_`), then invokes pytest. Zero pre-flight config needed from the operator. + +Suite tiers (collected + run in this order via `pytest_collection_modifyitems`): + +1. `tests/unit/` — pure Python (boards parse, pio wrapper, userPrefs parse, testing profile). No hardware. +2. `tests/test_00_bake.py` — flashes each detected device with current `userPrefs.jsonc` merged with the session's test profile. Has its own skip-if-already-baked check comparing region + primary channel to the session profile; skips cheaply on warm devices. +3. `tests/mesh/` — multi-device mesh: bidirectional send, broadcast delivery, direct-with-ACK, mesh formation within 60s. Parametrized `[nrf52->esp32s3]` and `[esp32s3->nrf52]`. +4. `tests/telemetry/` — `DEVICE_METRICS_APP` broadcast timing. +5. `tests/monitor/` — boot-log panic check. +6. `tests/fleet/` — PSK seed session isolation. +7. `tests/admin/` — channel URL roundtrip, owner persistence across reboot. +8. `tests/provisioning/` — region + modem + slot bake, admin key presence, `UNSET` region blocks TX, userPrefs survive factory reset. + +Invocation patterns: + +```bash +./mcp-server/run-tests.sh # full suite (auto-bake-if-needed) +./mcp-server/run-tests.sh --force-bake # reflash before testing +./mcp-server/run-tests.sh --assume-baked # skip bake (caller vouches for device state) +./mcp-server/run-tests.sh tests/mesh # one tier +./mcp-server/run-tests.sh tests/mesh/test_direct_with_ack.py # one file +./mcp-server/run-tests.sh -k telemetry # name filter +``` + +**No hardware detected?** The wrapper auto-narrows to `tests/unit/` only and prints `detected hub : (none)` in the pre-flight header. Agents interpreting the output should call this out explicitly — a 52-test green run without hardware is qualitatively different from a 12-unit-test green run. + +**Artifacts every run produces:** + +- `mcp-server/tests/report.html` — self-contained pytest-html. Each test gets a `Meshtastic debug` section with the tail of firmware log + device state dump. **Open this first** on failures; it's the canonical evidence source. +- `mcp-server/tests/junit.xml` — CI-parseable. +- `mcp-server/tests/reportlog.jsonl` — pytest-reportlog stream (`$report_type` keyed JSONL). Consumed by the live TUI. +- `mcp-server/tests/fwlog.jsonl` — firmware log mirror from the `meshtastic.log.line` pubsub topic. Populated by the `_firmware_log_stream` autouse session fixture. + +### Live TUI (`meshtastic-mcp-test-tui`) + +A Textual-based live view that wraps `run-tests.sh`. Tails reportlog for per-test state, streams firmware logs, polls device state at startup + post-run (gated out of the active run because `hub_devices` holds exclusive port locks). Key bindings: + +| Key | Action | +| --- | ------------------------------------------------------------------------------------------------------------ | +| `r` | re-run focused test (leaf → that node id; internal node → directory or `-k`) | +| `f` | filter tree by substring | +| `d` | failure detail modal (pulls `longrepr` + captured stdout from the reportlog) | +| `g` | export reproducer bundle (tar.gz with README, test_report.json, time-filtered fwlog, devices.json, env.json) | +| `l` | toggle firmware log pane | +| `x` | tool coverage modal | +| `c` | cross-run history sparkline | +| `q` | quit (SIGINT → SIGTERM → SIGKILL escalation, 5-s windows each) | + +Launch: + +```bash +cd mcp-server +.venv/bin/meshtastic-mcp-test-tui # full suite +.venv/bin/meshtastic-mcp-test-tui tests/mesh # args pass through to pytest +``` + +The plain CLI stays primary; the TUI is for operators who want a live dashboard. Both consume the same `run-tests.sh`. + +### Slash commands (Claude Code + Copilot) + +Three AI-assisted workflows wrap the test harness. Claude Code operators get `/test`, `/diagnose`, `/repro`; Copilot operators get `/mcp-test`, `/mcp-diagnose`, `/mcp-repro`. Bodies: + +- `.claude/commands/{test,diagnose,repro}.md` +- `.github/prompts/mcp-{test,diagnose,repro}.prompt.md` + +`.claude/commands/README.md` is the index. + +House rules for agents running these prompts: + +- **Interpret failures, don't just echo them.** Pull firmware log tails from `report.html` and classify each failure as transient / environmental / regression. Use the exact format in `.claude/commands/test.md`. +- **No destructive writes without operator approval.** Any skill that could reflash, factory-reset, or reboot a device must describe the action and stop. The operator authorizes. +- **Sequential MCP calls per port.** See above. +- **"Unknown" is a valid classification.** If evidence doesn't support a root cause, say so and list what would disambiguate. Do not invent. + +### Key fixtures (test authors + agents debugging) + +`mcp-server/tests/conftest.py` provides: + +- **`_session_userprefs`** (autouse session) — snapshots `userPrefs.jsonc` at session start, merges the session test profile via `userprefs.merge_active(test_profile)`, restores at teardown. Four layers of safety: pytest teardown + `atexit` + sidecar file (`userPrefs.jsonc.mcp-session-bak`) + startup self-heal in `run-tests.sh`. **Do not edit `userPrefs.jsonc` from inside a test.** +- **`_firmware_log_stream`** (autouse session) — subscribes to `meshtastic.log.line` pubsub on every connected `SerialInterface` and mirrors lines to `tests/fwlog.jsonl`. Drives the TUI firmware-log pane. +- **`_debug_log_buffer`** (autouse per-test) — captures last 200 firmware log lines + device state for attachment to the pytest-html `Meshtastic debug` section on failure. +- **`hub_devices`** (session) — `dict[role, SerialInterface]` with session-long exclusive port locks. Reason the TUI's device poller is gated to startup + post-run only. +- **`baked_mesh`** — parametrized mesh-pair fixture; depends on `test_00_bake`. `pytest_generate_tests` in `conftest.py` auto-generates `[nrf52->esp32s3]` and `[esp32s3->nrf52]` variants. +- **`test_profile`** — session-scoped dict: region, primary channel, admin key, PSK seed. Derived from `MESHTASTIC_MCP_SEED` (defaults to `mcp--`). + +### Firmware integration points tied to the test harness + +Two firmware changes exist specifically so the test harness works reliably. **Keep these in mind when touching related code.** + +- **`src/mesh/StreamAPI.cpp` + `StreamAPI.h`** — `emitLogRecord` uses a dedicated `fromRadioScratchLog` + `txBufLog` pair and a `concurrency::Lock streamLock`. Before this fix, `debug_log_api_enabled=true` would tear `FromRadio` protobufs on the serial transport because `emitTxBuffer` and `emitLogRecord` shared a single scratch buffer. The conftest enables the log stream session-wide; without this fix the device would corrupt its own FromRadio replies mid-session. +- **`src/mesh/PhoneAPI.cpp`** — `ToRadio` `Heartbeat(nonce=1)` triggers `nodeInfoModule->sendOurNodeInfo(NODENUM_BROADCAST, true, 0, true)` for serial clients, mirroring the pre-existing behavior for TCP/UDP clients in `PacketAPI.cpp`. The mesh tests rely on this to force a NodeInfo broadcast right after connect so the peer discovers them before the test's first assertion. + +If you're modifying `StreamAPI`, `PhoneAPI`, `NodeInfoModule`, or `userPrefs` flow, run `./mcp-server/run-tests.sh` at minimum before asking for review. + +### Recovery playbooks + +| Symptom | First check | Fix | +| ---------------------------------------------------------- | ------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `userPrefs.jsonc` dirty after test run | `git status --porcelain userPrefs.jsonc` | If non-empty, re-run `./mcp-server/run-tests.sh` once — the pre-flight self-heal restores from sidecar. If still dirty, `git checkout userPrefs.jsonc`. | +| Port busy / wedged CP2102 on macOS | `lsof /dev/cu.usbserial-0001` | Kill the holder. USB replug if the kernel still reports busy. Often a stale `pio device monitor` or zombie `meshtastic_mcp` process. | +| nRF52 appears unresponsive | `list_devices` shows VID `0x239A` but `device_info` times out | `touch_1200bps(port=...)` drops it into the DFU bootloader → `pio_flash` re-installs. | +| Multiple MCP server processes | `ps aux \| grep meshtastic_mcp` shows >1 | Kill all but the one your MCP host spawned. Zombies hold ports and break tests. | +| Mesh formation fails, one side sees peer but other doesn't | `/diagnose` (or `list_nodes` on both sides) | Asymmetric NodeInfo. `test_direct_with_ack` has a heal path; `/repro` it a few times. If persistent, both devices' clocks may be out of sync with their NodeInfo cooldown. | +| "role not present on hub" in skip reasons | `list_devices` | Expected if a device is unplugged. Reconnect before re-running the tier. | +| Tests fail only on first attempt then pass on rerun | — | State leak from a prior session. Run with `--force-bake` to reset to a known state. | + +### Never do these without asking + +- `factory_reset` — wipes node identity; regenerates PKI keypair. Mesh peers will reject old DMs until re-exchange. Legitimate only when the operator explicitly wants it. +- `erase_and_flash` — full chip erase; destroys all on-device state. +- `esptool_erase_flash` / `esptool_raw` write/erase — bypasses pio's safety chain. +- `set_config` on `lora.region` — changes regulatory domain; requires physical-location context the operator has and the agent doesn't. +- `reboot` / `shutdown` mid-test — breaks fixture invariants. +- `push -f`, `rebase -i`, `reset --hard`, or any history-rewriting git operation. +- Clicking computer-use tools on web links in Mail/Messages/PDFs — open URLs via the claude-in-chrome MCP so the extension's link-safety checks apply. + ## Resources - [Documentation](https://meshtastic.org/docs/) diff --git a/.github/prompts/mcp-diagnose.prompt.md b/.github/prompts/mcp-diagnose.prompt.md new file mode 100644 index 000000000..188392043 --- /dev/null +++ b/.github/prompts/mcp-diagnose.prompt.md @@ -0,0 +1,57 @@ +--- +mode: agent +description: Device health report via the meshtastic MCP tools (Copilot equivalent of the Claude Code /diagnose slash command) +--- + +# `/mcp-diagnose` — device health report + +Equivalent of `.claude/commands/diagnose.md`. Use when the operator asks to "check the devices", "what's the mesh looking like", "is nrf52 alive", etc. + +This prompt assumes the meshtastic MCP server is registered with your VS Code Copilot agent. If it isn't, fall back to running `./mcp-server/run-tests.sh tests/unit` plus a short `device_info` script via the terminal. + +## What to do + +1. **Enumerate hardware** via the `list_devices` MCP tool (with `include_unknown=True`). For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`. + +2. **Apply the operator's filter** (if any): + - No filter → every likely-meshtastic device. + - `nrf52` → `vid == 0x239a` + - `esp32s3` → `vid == 0x303a` or `vid == 0x10c4` + - A `/dev/cu.*` path → only that port. + - Anything else → substring match on port. + +3. **For each selected device, in sequence (don't parallelize — SerialInterface holds an exclusive port lock):** + - `device_info(port=

)` → `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel` + - `list_nodes(port=

)` → peer count, which peers have `publicKey`, SNR/RSSI distribution + - `get_config(section="lora", port=

)` → region, preset, channel_num, tx_power, hop_limit + - If anything looks off (can't connect, `num_nodes` wrong, missing `firmware_version`), open a short firmware-log window: `serial_open(port=

, env=)`, wait 3 seconds, `serial_read(session_id, max_lines=100)`, `serial_close(session_id)`. Infer env from VID (0x239a → `rak4631`, 0x303a/0x10c4 → `heltec-v3`) unless an `MESHTASTIC_MCP_ENV_` env var overrides it. + +4. **Render per-device report** as a compact block: + + ``` + [nrf52 @ /dev/cu.usbmodem1101] fw=2.7.23.bce2825, hw=RAK4631 + owner : Meshtastic 40eb / 40eb + region/band : US, channel 88, LONG_FAST + tx_power : 30 dBm, hop_limit=3 + peers : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm) + primary ch : McpTest + firmware : no panics in last 3s + ``` + + Flag abnormalities inline with `⚠︎ ` — missing pubkey on a known peer, region UNSET, mismatched channel name, etc. + +5. **Cross-device correlation** (when >1 device selected): + - Do both see each other in `nodesByNum`? + - Do `region`, `channel_num`, `modem_preset` match across devices? + - Do the primary channel names match? (Different name → different PSK → no decode.) + +6. **Suggest next steps only for recognizable failure modes**, never speculatively: + - Stale PKI one-way → "`/mcp-test tests/mesh/test_direct_with_ack.py` — the test's retry+nodeinfo-ping heals this." + - Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`." + - Device unreachable → refer operator to the touch_1200bps + CP2102-wedged-driver notes in `run-tests.sh`. + +## Hard constraints + +- **Read-only.** No `set_config`, no `reboot`, no `factory_reset`, no `flash`. If the operator wants mutation, they'll escalate explicitly. +- **Open/query/close per device.** Never hold multiple SerialInterfaces to the same port. The port lock is exclusive. +- **Don't infer env beyond the VID map** — if the operator has an unusual board, ask them which env to use rather than guessing. diff --git a/.github/prompts/mcp-repro.prompt.md b/.github/prompts/mcp-repro.prompt.md new file mode 100644 index 000000000..62ffcbc52 --- /dev/null +++ b/.github/prompts/mcp-repro.prompt.md @@ -0,0 +1,67 @@ +--- +mode: agent +description: Re-run a specific test N times to triage flakes; diff firmware logs between passes and failures (Copilot equivalent of the Claude Code /repro slash command) +--- + +# `/mcp-repro` — flakiness triage for one test + +Equivalent of `.claude/commands/repro.md`. Use when the operator says "that one test is flaky — dig in", "repro the direct_with_ack failure", "why does X sometimes fail?". + +## What to do + +1. **Parse the operator's input** into two pieces: + - **Test identifier** — either a pytest node id (has `::` or starts with `tests/`) or a `-k`-style filter (plain substring like `direct_with_ack`). + - **Count** — integer, default `5`, cap at `20`. If the operator asks for 50, negotiate down and explain (airtime + USB wear). + +2. **Sanity-check the hub** via the `list_devices` MCP tool. If the test name references `nrf52` or `esp32s3` and the matching VID isn't present, stop and report — re-running won't help. + +3. **Loop** N times. Each iteration: + + ```bash + ./mcp-server/run-tests.sh --tb=short -p no:cacheprovider + ``` + + `-p no:cacheprovider` keeps pytest from caching anything between iterations. Capture: exit code, duration, and (on failure) the `Meshtastic debug` firmware-log section from `mcp-server/tests/report.html`. + +4. **Tally** results as you go: + + ``` + attempt 1: PASS (42s) + attempt 2: FAIL (128s) ← fw log captured + attempt 3: PASS (39s) + attempt 4: FAIL (121s) + attempt 5: PASS (41s) + -------------------------------------------------- + pass rate: 3/5 (60%) | mean duration: 74s + ``` + +5. **On mixed outcomes, diff the firmware logs** between one representative pass and one representative fail. Focus on: + - Error-level lines present only in failures (`PKI_UNKNOWN_PUBKEY`, `Alloc an err=`, `Skip send`, `No suitable channel`, `NAK`) + - Timing around the assertion point (broadcast sent? ACK received? retry fired?) + - Device-state fields that changed between attempts + + Surface the top 3 differences as a compact "passes when / fails when" table with uptime timestamps. Don't dump full logs. + +6. **Classify** the flake into one of: + - **LoRa airtime collision** — pass rate improves with fewer concurrent transmitters. Suggest a `time.sleep` gap or retry bump in the test body. + - **PKI key staleness** — first attempt fails, subsequent ones pass; existing retry-loop pattern in `test_direct_with_ack.py` is the fix. + - **NodeInfo cooldown** — `Skip send NodeInfo since we sent it <600s ago` in fail-only logs; needs a `broadcast_nodeinfo_ping()` warmup. + - **Hardware-specific** — one direction consistently fails, firmware versions differ, CP2102 driver wedged, etc. + - **Unknown** — say so. Don't invent a root cause. + +7. **Report back** with: + - Pass rate + mean duration. + - Classification + the specific log evidence for it. + - A concrete next step (tighter assertion, more retries, open `/mcp-diagnose`, file a bug, nothing). + +## Examples + +- `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[esp32s3->nrf52] 10` — 10 runs of that parametrized case. +- `broadcast_delivers` — no `::`, no `tests/`; treat as `-k broadcast_delivers`; runs every match 5 times. +- `tests/telemetry/test_device_telemetry_broadcast.py 3` — shorter count for a slow test. + +## Notes + +- If the FIRST attempt fails and the rest pass, that's a state-leak signature — suggest starting from `--force-bake` or a clean device state rather than chasing the first-failure firmware logs. +- If ALL N fail, this isn't a flake — it's a regression. Say so, stop iterating, escalate to `/mcp-test` for full-suite context. +- Don't rebuild firmware during triage. Flakes that only reproduce under different firmware belong in a separate session with a plan. diff --git a/.github/prompts/mcp-test.prompt.md b/.github/prompts/mcp-test.prompt.md new file mode 100644 index 000000000..092ad3d85 --- /dev/null +++ b/.github/prompts/mcp-test.prompt.md @@ -0,0 +1,51 @@ +--- +mode: agent +description: Run the mcp-server test suite and interpret results (Copilot equivalent of the Claude Code /test slash command) +--- + +# `/mcp-test` — mcp-server test runner with interpretation + +Equivalent of the Claude Code `/test` slash command in `.claude/commands/test.md`. Use this when the operator asks you to "run the tests", "check the mcp test suite", "run the mesh tests", etc. + +## What to do + +1. **Invoke the wrapper** from the firmware repo root: + + ```bash + ./mcp-server/run-tests.sh [pytest-args] + ``` + + If the operator specified a subset (e.g. "just the mesh tests"), pass it through as `tests/mesh` or a pytest `-k filter`. If they said nothing, use the wrapper's defaults (full suite with pytest-html report). + + The wrapper auto-detects connected Meshtastic devices, maps each to its PlatformIO env, exports the required env vars, and invokes pytest. Zero pre-flight config needed from the operator. + +2. **Read the pre-flight header** (first few lines of wrapper output). The `detected hub :` line lists role → port → env mappings. If it reads `(none)`, the wrapper narrowed to `tests/unit` only — call that out explicitly so the operator knows hardware tiers were skipped. + +3. **On pass**: one-line summary like `N passed, M skipped in `. Don't enumerate test names. DO mention any non-placeholder SKIPs (things like "role not present on hub") because they indicate missing hardware or setup issues. + +4. **On failure**: open `mcp-server/tests/report.html` (pytest-html output, self-contained) and extract the `Meshtastic debug` section for each failed test. That section includes a firmware log stream (last 200 lines) and device state dump. For each failure, summarise: + - test name + - one-line assertion message + - the specific firmware log lines that explain why (look for `PKI_UNKNOWN_PUBKEY`, `Skip send NodeInfo`, `Error=`, `Guru Meditation`, `assertion failed`, `No suitable channel`) + +5. **Classify each failure** as one of: + - **Transient flake** — LoRa collision, first-attempt NAK with self-heal pattern, timing-sensitive assertion. Suggest `/mcp-repro ` to confirm. + - **Environmental** — device unreachable, port busy, CP2102 driver wedged on macOS. Suggest specific recovery (USB replug, `touch_1200bps`, `git status userPrefs.jsonc`). + - **Regression** — same assertion fails repeatedly on re-runs, firmware log shows novel errors. Identify the firmware module likely responsible. + +6. **Do NOT run destructive recovery automatically**. If a failure looks like it needs a reflash, factory*reset, or replug — \_describe the steps* and let the operator decide. Never burn airtime or flash cycles without approval. + +## Arguments convention + +Operators generally invoke this prompt either with no arguments (full suite) or with a specific subset. Examples: + +- `tests/mesh` — one tier +- `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip` — one test +- `--force-bake` — reflash devices first +- `-k telemetry` — name-filter + +## Side-effects to confirm in your summary + +- `userPrefs.jsonc` should be clean after a successful run. The session fixture in `mcp-server/tests/conftest.py` (`_session_userprefs`) snapshots and restores. Check `git status --porcelain userPrefs.jsonc` and report if it's non-empty. +- `mcp-server/tests/report.html` and `junit.xml` regenerate on every run. +- The wrapper prints a warning if a `.mcp-session-bak` sidecar was left over from a crashed prior session and auto-restores from it — mention that if it happened. diff --git a/.gitignore b/.gitignore index 43cee78db..f1eb9d852 100644 --- a/.gitignore +++ b/.gitignore @@ -54,3 +54,5 @@ CMakeLists.txt # PYTHONPATH used by the Nix shell .python3 +.claude/scheduled_tasks.lock +userPrefs.jsonc.mcp-session-bak diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..cd043c087 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,113 @@ +# Agent instructions + +This repository is the [Meshtastic](https://meshtastic.org) firmware — a C++17 embedded codebase targeting ESP32 / nRF52 / RP2040 / STM32WL / Linux-Portduino LoRa mesh radios — plus a Python MCP server in `mcp-server/` that AI agents use to flash, configure, and test connected devices. + +## Primary instruction file + +**Read `.github/copilot-instructions.md` first.** That file is the canonical agent-facing document for this repo. It covers project layout, coding conventions (naming, module framework, Observer pattern, thread safety), the build system, CI/CD, the native C++ test suite, and — most importantly for automation work — the **MCP Server & Hardware Test Harness** section. Read it top-to-bottom before starting any non-trivial change. + +This file (`AGENTS.md`) is a short pointer + quick reference for agents that don't read `.github/copilot-instructions.md` by default. + +## Quick command reference + +| Action | Command | +| -------------------------------- | ----------------------------------------------------------------------------------- | +| Build a firmware variant | `pio run -e ` (e.g. `pio run -e rak4631`, `pio run -e heltec-v3`) | +| Clean + rebuild | `pio run -e -t clean && pio run -e ` | +| Flash a device | `pio run -e -t upload --upload-port ` (or use the `pio_flash` MCP tool) | +| Run firmware unit tests (native) | `pio test -e native` | +| Run MCP hardware tests | `./mcp-server/run-tests.sh` | +| Live TUI test runner | `mcp-server/.venv/bin/meshtastic-mcp-test-tui` | +| Format before commit | `trunk fmt` | +| Regenerate protobuf bindings | `bin/regen-protos.sh` | +| Generate CI matrix | `./bin/generate_ci_matrix.py all [--level pr]` | + +## MCP server (device + test automation) + +The `mcp-server/` package exposes ~32 MCP tools for device discovery, building, flashing, serial monitoring, and live-node administration. Tools are grouped as: + +- **Discovery**: `list_devices`, `list_boards`, `get_board` +- **Build & flash**: `build`, `clean`, `pio_flash`, `erase_and_flash` (ESP32 factory), `update_flash` (ESP32 OTA), `touch_1200bps` +- **Serial sessions**: `serial_open`, `serial_read`, `serial_list`, `serial_close` +- **Device reads**: `device_info`, `list_nodes` +- **Device writes** (require `confirm=True`): `set_owner`, `get_config`, `set_config`, `get_channel_url`, `set_channel_url`, `send_text`, `reboot`, `shutdown`, `factory_reset`, `set_debug_log_api` +- **userPrefs admin**: `userprefs_get`, `userprefs_set`, `userprefs_reset`, `userprefs_manifest`, `userprefs_testing_profile` +- **Vendor escape hatches**: `esptool_*`, `nrfutil_*`, `picotool_*` + +Setup: `cd mcp-server && python3 -m venv .venv && .venv/bin/pip install -e '.[test]'`. The repo registers the server via `.mcp.json` — Claude Code picks it up automatically. + +See `mcp-server/README.md` for argument shapes and the **MCP Server & Hardware Test Harness** section of `.github/copilot-instructions.md` for agent usage rules (tool surface, fixture contract, firmware integration points, recovery playbooks). + +## Slash commands (AI-assisted workflows) + +Three test-and-diagnose workflows exist as slash commands: + +- **`/test` (Claude Code) / `/mcp-test` (Copilot)** — run the hardware test suite and interpret failures +- **`/diagnose` / `/mcp-diagnose`** — read-only device health report +- **`/repro` / `/mcp-repro`** — flakiness triage: re-run one test N times, diff firmware logs between passes and failures + +Bodies live in `.claude/commands/` and `.github/prompts/` respectively. `.claude/commands/README.md` is the index. + +## House rules + +- **No destructive device operations without operator approval.** `factory_reset`, `erase_and_flash`, `reboot`, `shutdown`, history-rewriting git ops — describe the action and stop. Operator authorizes. +- **One MCP call per serial port at a time.** The port lock is exclusive; concurrent calls deadlock. Sequence: open → read/mutate → close, then next device. +- **`userPrefs.jsonc` is session state during tests.** The `_session_userprefs` fixture snapshots + restores it; never edit it from inside a test. +- **Don't speculate about firmware root causes.** When evidence doesn't support a classification, say "unknown" and list what would disambiguate. +- **Run `trunk fmt` before proposing a commit.** The `trunk_check` CI gate will reject unformatted code. +- **`confirm=True` on destructive MCP tools is a real gate, not a formality.** Don't bypass it via auto-approve settings. + +## Typical agent workflows + +### Flashing a device + +1. `list_devices` → find the port + likely VID +2. `list_boards` → confirm the env, or use the known default for the hardware +3. `pio_flash(env=..., port=..., confirm=True)` for any arch, or `erase_and_flash(env=..., port=..., confirm=True)` for an ESP32 factory install + +### Inspecting live node state + +1. `device_info(port=...)` — short summary (node num, firmware version, region, peer count) +2. `list_nodes(port=...)` — full peer table (SNR, RSSI, pubkey presence, last_heard) +3. `get_config(section="lora", port=...)` — LoRa settings for cross-device comparison + +Sequence these; don't parallelize on the same port. + +### Testing a firmware change + +1. Build locally: `pio run -e ` +2. Flash the test device: `pio_flash(env=..., port=..., confirm=True)` +3. Run the suite: `./mcp-server/run-tests.sh tests/` or `/test tests/` +4. On failure, open `mcp-server/tests/report.html` → `Meshtastic debug` section for the firmware log tail + device state dump +5. Iterate + +### Debugging a flaky test + +1. `/repro [count]` — re-runs the test N times, diffs firmware logs between passes and failures +2. If the first attempt always fails and the rest pass, that's a state-leak pattern → suggest `--force-bake` or a clean device state, don't chase the first failure +3. If all N fail, this isn't a flake — it's a regression. Stop iterating and escalate to `/test` for full-suite context. + +## Where to look + +| Path | What's there | +| --------------------------------- | ---------------------------------------------------------------------------------------------------- | +| `src/` | Firmware C++ source (`mesh/`, `modules/`, `platform/`, `graphics/`, `gps/`, `motion/`, `mqtt/`, …) | +| `src/mesh/` | Core: NodeDB, Router, Channels, CryptoEngine, radio interfaces, StreamAPI, PhoneAPI | +| `src/modules/` | Feature modules; `Telemetry/Sensor/` has 50+ I2C sensor drivers | +| `variants/` | 200+ hardware variant definitions (`variant.h` + `platformio.ini` per board) | +| `protobufs/` | `.proto` definitions; regenerate with `bin/regen-protos.sh` | +| `test/` | Firmware unit tests (12 suites; `pio test -e native`) | +| `mcp-server/` | Python MCP server + pytest hardware integration tests | +| `mcp-server/tests/` | Tiered pytest suite: `unit/`, `mesh/`, `telemetry/`, `monitor/`, `fleet/`, `admin/`, `provisioning/` | +| `.claude/commands/` | Claude Code slash command bodies | +| `.github/prompts/` | Copilot prompt bodies (mirrors of the Claude Code ones) | +| `.github/copilot-instructions.md` | **Primary agent instructions — read this** | +| `.github/workflows/` | CI pipelines | +| `.mcp.json` | MCP server registration for Claude Code | + +## Recovery one-liners + +- **`userPrefs.jsonc` dirty after a test run?** Re-run `./mcp-server/run-tests.sh` once (pre-flight self-heals from the sidecar). If still dirty: `git checkout userPrefs.jsonc`. +- **nRF52 not responding?** `mcp__meshtastic__touch_1200bps(port=...)` drops it into the DFU bootloader, then `pio_flash` re-installs. +- **Port busy?** `lsof ` to find the holder. Usually a stale `pio device monitor` or zombie `meshtastic_mcp` process. Kill it. +- **Multiple MCP servers running?** `ps aux | grep meshtastic_mcp` — zombies hold ports. Kill all but the one your host spawned. diff --git a/mcp-server/.gitignore b/mcp-server/.gitignore index b1cec3c3a..f5180bc71 100644 --- a/mcp-server/.gitignore +++ b/mcp-server/.gitignore @@ -10,6 +10,17 @@ build/ # Test harness artifacts tests/report.html tests/junit.xml +tests/reportlog.jsonl +tests/fwlog.jsonl +# Subprocess-output tee from pio/esptool/nrfutil/picotool (live flash +# progress for the TUI; also a post-run diagnostic for plain CLI runs). +tests/flash.log tests/tool_coverage.json tests/.coverage htmlcov/ +# Persistent run counter for meshtastic-mcp-test-tui header. +tests/.tui-runs +# Cross-run history (TUI duration sparkline). +tests/.history/ +# Reproducer bundles (TUI `x` export on failed tests). +tests/reproducers/ diff --git a/mcp-server/pyproject.toml b/mcp-server/pyproject.toml index 7ebaca7ff..d73bf795f 100644 --- a/mcp-server/pyproject.toml +++ b/mcp-server/pyproject.toml @@ -17,10 +17,19 @@ test = [ "pytest-timeout>=2.3", "coverage[toml]>=7", "pyyaml>=6", + # textual is required by the `meshtastic-mcp-test-tui` script (see + # `src/meshtastic_mcp/cli/test_tui.py`). Bundled into `test` rather than a + # separate `[tui]` extra because v1 expects test operators are the only + # consumers; revisit if install cost pushes back. + "textual>=0.50", ] [project.scripts] meshtastic-mcp = "meshtastic_mcp.__main__:main" +# Live TUI wrapping run-tests.sh — shells out to the same script the plain +# CLI uses, tails pytest-reportlog for per-test state, and polls the device +# list at startup + post-run (port lock forces it to stay idle during the run). +meshtastic-mcp-test-tui = "meshtastic_mcp.cli.test_tui:main" [build-system] requires = ["hatchling"] diff --git a/mcp-server/run-tests.sh b/mcp-server/run-tests.sh new file mode 100755 index 000000000..87dacd119 --- /dev/null +++ b/mcp-server/run-tests.sh @@ -0,0 +1,229 @@ +#!/usr/bin/env bash +# mcp-server hardware test runner. +# +# Auto-detects connected Meshtastic devices, maps each to its PlatformIO env +# via the same role table the pytest fixtures use, exports the right +# MESHTASTIC_MCP_ENV_* env vars, and invokes pytest. +# +# Usage: +# ./run-tests.sh # full suite, default pytest args +# ./run-tests.sh tests/mesh # subset (any pytest args pass through) +# ./run-tests.sh --force-bake # override one default with another +# MESHTASTIC_MCP_ENV_NRF52=foo ./run-tests.sh # override env per role +# MESHTASTIC_MCP_SEED=ci-run-42 ./run-tests.sh # override PSK seed +# +# If zero supported devices are detected, only the unit tier runs. +# +# Also restores `userPrefs.jsonc` from the session-backup sidecar if a prior +# run exited abnormally (belt to conftest.py's atexit suspenders). + +set -euo pipefail + +# cd to the script's directory so relative paths resolve consistently no +# matter where the user invoked from. +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +cd "$SCRIPT_DIR" + +VENV_PY="$SCRIPT_DIR/.venv/bin/python" +if [ ! -x "$VENV_PY" ]; then + echo "error: $VENV_PY not found or not executable." >&2 + echo " Bootstrap the venv first:" >&2 + echo " cd $SCRIPT_DIR && python3 -m venv .venv && .venv/bin/pip install -e '.[test]'" >&2 + exit 2 +fi + +# Resolve firmware root the same way conftest.py does (this script sits in +# mcp-server/, firmware repo root is one level up). +FIRMWARE_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" +USERPREFS_PATH="$FIRMWARE_ROOT/userPrefs.jsonc" +USERPREFS_SIDECAR="$USERPREFS_PATH.mcp-session-bak" + +# ---------- Pre-flight: recover stale userPrefs.jsonc from prior crash ---- +# If conftest.py's atexit hook didn't fire (SIGKILL, kernel panic, OS +# restart), the sidecar is the ground truth. Self-heal before running so we +# don't bake the previous run's dirty state into this run's firmware. +if [ -f "$USERPREFS_SIDECAR" ]; then + echo "[pre-flight] found $USERPREFS_SIDECAR from a prior abnormal exit;" >&2 + echo " restoring userPrefs.jsonc before starting." >&2 + cp "$USERPREFS_SIDECAR" "$USERPREFS_PATH" + rm -f "$USERPREFS_SIDECAR" +fi + +# If userPrefs.jsonc has uncommitted changes BEFORE the run starts, that's +# worth warning about — tests will snapshot this dirty state and restore to +# it at the end, which may not be what the operator wants. +if command -v git >/dev/null 2>&1; then + cd "$FIRMWARE_ROOT" + if [ -n "$(git status --porcelain userPrefs.jsonc 2>/dev/null)" ]; then + echo "[pre-flight] warning: userPrefs.jsonc has uncommitted changes." >&2 + echo " Tests will snapshot THIS state and restore to it" >&2 + echo " at teardown. If that's not intended, run:" >&2 + echo " git checkout userPrefs.jsonc" >&2 + echo " and re-invoke." >&2 + fi + cd "$SCRIPT_DIR" +fi + +# ---------- Seed default -------------------------------------------------- +# Per-machine default so repeated runs from the same operator land on the +# same PSK (makes --assume-baked valid across invocations). Operator can +# override with an explicit env var if they want isolation (e.g. CI). +if [ -z "${MESHTASTIC_MCP_SEED-}" ]; then + WHO="$(whoami 2>/dev/null || echo anon)" + HOST="$(hostname -s 2>/dev/null || echo host)" + export MESHTASTIC_MCP_SEED="mcp-${WHO}-${HOST}" +fi + +# ---------- Flash progress log -------------------------------------------- +# pio.py / hw_tools.py tee subprocess output (pio run -t upload, esptool, +# nrfutil, picotool) to this file line-by-line as it arrives when this env +# var is set. The TUI tails it so the operator sees live flash progress +# instead of 3 minutes of silence during `test_00_bake.py`. Plain CLI users +# also benefit — the log is a post-run diagnostic even without the TUI. +# Truncate at session start so each run gets a clean log. +export MESHTASTIC_MCP_FLASH_LOG="$SCRIPT_DIR/tests/flash.log" +: >"$MESHTASTIC_MCP_FLASH_LOG" + +# ---------- Detect connected hardware ------------------------------------- +# In-process call to the same Python API the test fixtures use, so the +# script never drifts from what pytest sees. Returns a JSON object +# {role: port, ...}. +ROLES_JSON="$( + "$VENV_PY" - <<'PY' +import json +import sys + +sys.path.insert(0, "src") +from meshtastic_mcp import devices + +# Role → canonical VID map. Kept in sync with +# `tests/conftest.py::hub_profile` defaults; if that changes, this must too. +ROLE_BY_VID = { + 0x239A: "nrf52", # Adafruit / RAK nRF52 native USB (app + DFU) + 0x303A: "esp32s3", # Espressif native USB (ESP32-S3) + 0x10C4: "esp32s3", # CP2102 USB-UART (common on Heltec/LilyGO ESP32 boards) +} + +out: dict[str, str] = {} +for dev in devices.list_devices(include_unknown=True): + vid_raw = dev.get("vid") or "" + try: + if isinstance(vid_raw, str) and vid_raw.startswith("0x"): + vid = int(vid_raw, 16) + else: + vid = int(vid_raw) + except (TypeError, ValueError): + continue + role = ROLE_BY_VID.get(vid) + # First port wins per role — matches hub_devices fixture semantics. + if role and role not in out: + out[role] = dev["port"] + +json.dump(out, sys.stdout) +PY +)" + +# ---------- Map role → pio env -------------------------------------------- +# Honor MESHTASTIC_MCP_ENV_ operator overrides; fall back to the +# same defaults hardcoded in tests/conftest.py::_DEFAULT_ROLE_ENVS. +resolve_env() { + local role="$1" + local default="$2" + local upper + upper="$(echo "$role" | tr '[:lower:]' '[:upper:]')" + local var="MESHTASTIC_MCP_ENV_${upper}" + eval "local override=\${$var:-}" + if [ -n "$override" ]; then + echo "$override" + else + echo "$default" + fi +} + +NRF52_PORT="$(echo "$ROLES_JSON" | "$VENV_PY" -c 'import json,sys; print(json.loads(sys.stdin.read()).get("nrf52", ""))')" +ESP32S3_PORT="$(echo "$ROLES_JSON" | "$VENV_PY" -c 'import json,sys; print(json.loads(sys.stdin.read()).get("esp32s3", ""))')" + +DETECTED="" +if [ -n "$NRF52_PORT" ]; then + NRF52_ENV="$(resolve_env nrf52 rak4631)" + export MESHTASTIC_MCP_ENV_NRF52="$NRF52_ENV" + DETECTED="${DETECTED} nrf52 @ ${NRF52_PORT} -> env=${NRF52_ENV}\n" +fi +if [ -n "$ESP32S3_PORT" ]; then + ESP32S3_ENV="$(resolve_env esp32s3 heltec-v3)" + export MESHTASTIC_MCP_ENV_ESP32S3="$ESP32S3_ENV" + DETECTED="${DETECTED} esp32s3 @ ${ESP32S3_PORT} -> env=${ESP32S3_ENV}\n" +fi + +# ---------- Pre-flight summary -------------------------------------------- +# Surface what pytest is about to do with respect to the bake phase: the +# operator should see "will verify + bake if needed" by default, so a +# 3-minute flash appearing mid-run isn't a surprise. Detection of the +# explicit overrides is best-effort — we just scan $@ for the known flags. +_bake_mode="auto (verify + bake if needed)" +for _arg in "$@"; do + case "$_arg" in + --assume-baked) _bake_mode="skip (--assume-baked)" ;; + --force-bake) _bake_mode="force (--force-bake)" ;; + esac +done + +echo "mcp-server test runner" +echo " firmware root : $FIRMWARE_ROOT" +echo " seed : $MESHTASTIC_MCP_SEED" +echo " bake : $_bake_mode" +if [ -n "$DETECTED" ]; then + echo " detected hub :" + printf "%b" "$DETECTED" +else + echo " detected hub : (none)" +fi +echo + +# ---------- Invoke pytest ------------------------------------------------- +# If no devices detected, only the unit tier would produce meaningful +# PASS/FAIL — every hardware test would SKIP with "role not present". We +# narrow to tests/unit explicitly so the summary reads as "no hardware, +# unit suite only" instead of "big skip count looks suspicious". +if [ -z "$DETECTED" ] && [ "$#" -eq 0 ]; then + echo "[pre-flight] no supported devices detected; running unit tier only." + echo + exec "$VENV_PY" -m pytest tests/unit -v --report-log=tests/reportlog.jsonl +fi + +# Default pytest args when the user passed none. Power users can invoke +# `./run-tests.sh tests/mesh -v --tb=long` and skip all of these defaults. +# +# NOTE: `--assume-baked` is DELIBERATELY omitted here. `tests/test_00_bake.py` +# has an internal skip-if-already-baked check (`_bake_role`: query device_info, +# compare region + primary_channel to the session profile, skip on match). +# So the fast path is ~8-10 s of verification overhead when the devices are +# already baked — negligible next to the 2-6 min suite runtime. Letting +# test_00_bake.py run means a fresh device, a re-seeded session, or a post- +# factory-reset device gets flashed automatically instead of silently +# skipping half the hardware tests with "not baked with session profile" +# errors. Power users who know their hardware is current and want to shave +# those seconds can pass `--assume-baked` explicitly. +if [ "$#" -eq 0 ]; then + set -- tests/ \ + --html=tests/report.html --self-contained-html \ + --junitxml=tests/junit.xml \ + -v --tb=short +fi + +# Always emit `tests/reportlog.jsonl` (unless the operator explicitly passed +# their own `--report-log=...`). Consumers — notably the +# `meshtastic-mcp-test-tui` TUI — tail the reportlog for live per-test state. +# Appending here means power-user invocations like `./run-tests.sh tests/mesh` +# also produce it, not just the all-defaults invocation. +_has_report_log=0 +for _arg in "$@"; do + case "$_arg" in + --report-log | --report-log=*) _has_report_log=1 ;; + esac +done +if [ "$_has_report_log" -eq 0 ]; then + set -- "$@" --report-log=tests/reportlog.jsonl +fi + +exec "$VENV_PY" -m pytest "$@" diff --git a/mcp-server/src/meshtastic_mcp/admin.py b/mcp-server/src/meshtastic_mcp/admin.py index 99dc12188..23cbaef27 100644 --- a/mcp-server/src/meshtastic_mcp/admin.py +++ b/mcp-server/src/meshtastic_mcp/admin.py @@ -36,11 +36,18 @@ def _require_confirm(confirm: bool, operation: str) -> None: def _message_to_dict(msg: Any) -> dict[str, Any]: - return json_format.MessageToDict( - msg, - preserving_proto_field_name=True, - including_default_value_fields=False, - ) + # `including_default_value_fields` was renamed to + # `always_print_fields_with_no_presence` in protobuf 5.26+. Pick whichever + # kwarg the installed version accepts so we work against both. + kwargs: dict[str, Any] = {"preserving_proto_field_name": True} + import inspect + + sig = inspect.signature(json_format.MessageToDict) + if "always_print_fields_with_no_presence" in sig.parameters: + kwargs["always_print_fields_with_no_presence"] = False + elif "including_default_value_fields" in sig.parameters: + kwargs["including_default_value_fields"] = False + return json_format.MessageToDict(msg, **kwargs) # ---------- owner ---------------------------------------------------------- @@ -291,6 +298,37 @@ def send_text( return {"ok": True, "packet_id": packet_id, "destination": destination} +# ---------- diagnostics ---------------------------------------------------- + + +def set_debug_log_api(enabled: bool, port: str | None = None) -> dict[str, Any]: + """Toggle `config.security.debug_log_api_enabled` on the local node. + + When enabled, firmware emits log lines as protobuf `LogRecord` messages + over the StreamAPI instead of raw text. meshtastic-python surfaces them + on pubsub topic `meshtastic.log.line`, which flows through the SAME + SerialInterface our tests already hold open — no `pio device monitor` + needed, no port-contention with admin/info calls. + + Firmware gate: `src/SerialConsole.cpp` (`usingProtobufs && + config.security.debug_log_api_enabled`). Setting persists in NVS; it + survives reboot. `factory_reset(full=False)` clears it unless it's + re-applied after reset. + + Previously-documented concurrency hazard (emitLogRecord sharing the + main packet-emission buffers) has been fixed — see `StreamAPI.h` + where the log path now owns dedicated `fromRadioScratchLog` / + `txBufLog` buffers, and `StreamAPI::emitTxBuffer` + + `StreamAPI::emitLogRecord` both serialize their `stream->write` + calls via `streamLock`. Leaving the flag on under traffic is safe. + """ + with connect(port=port) as iface: + sec = iface.localNode.localConfig.security + sec.debug_log_api_enabled = bool(enabled) + iface.localNode.writeConfig("security") + return {"ok": True, "debug_log_api_enabled": bool(enabled)} + + # ---------- admin actions -------------------------------------------------- @@ -315,7 +353,19 @@ def shutdown( def factory_reset( port: str | None = None, confirm: bool = False, full: bool = False ) -> dict[str, Any]: + """Tell the node to factory-reset its config. + + Works around a meshtastic-python 2.7.8 bug: `Node.factoryReset(full=True)` + internally does `p.factory_reset_config = True` where the field is + int32. protobuf 5.x rejects bool→int assignment as a TypeError. We build + the AdminMessage directly with int values (1=non-full, 2=full) and call + `_sendAdmin` to sidestep the SDK bug entirely. + """ _require_confirm(confirm, "factory_reset") + from meshtastic.protobuf import admin_pb2 # type: ignore[import-untyped] + with connect(port=port) as iface: - iface.localNode.factoryReset(full=full) + msg = admin_pb2.AdminMessage() + msg.factory_reset_config = 2 if full else 1 + iface.localNode._sendAdmin(msg) return {"ok": True, "full": full} diff --git a/mcp-server/src/meshtastic_mcp/cli/__init__.py b/mcp-server/src/meshtastic_mcp/cli/__init__.py new file mode 100644 index 000000000..04729b643 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/__init__.py @@ -0,0 +1,6 @@ +"""Command-line entry points that sit alongside the MCP server. + +Modules here are loaded on-demand by `[project.scripts]` entries in +`pyproject.toml`. They are NOT imported by `meshtastic_mcp.server` or the +admin/info tool surface — the MCP server stays pure stdio JSON-RPC. +""" diff --git a/mcp-server/src/meshtastic_mcp/cli/_flashlog.py b/mcp-server/src/meshtastic_mcp/cli/_flashlog.py new file mode 100644 index 000000000..889183bb3 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_flashlog.py @@ -0,0 +1,73 @@ +"""Flash progress log tailer for ``meshtastic-mcp-test-tui``. + +``pio.py`` / ``hw_tools.py`` tee subprocess output (``pio run -t upload``, +``esptool erase_flash``, ``nrfutil dfu``, etc.) to ``tests/flash.log`` +line-by-line as it arrives — controlled by the ``MESHTASTIC_MCP_FLASH_LOG`` +env var that ``run-tests.sh`` sets. The TUI tails that file so the operator +sees live flash progress in the pytest pane instead of 3 minutes of silence +during ``test_00_bake``. + +Separate from ``_fwlog.py`` because that one parses JSONL, this one +streams plain text lines. Same daemon-thread + EOF-backoff structure. +""" + +from __future__ import annotations + +import pathlib +import threading +import time +from typing import Callable + + +class FlashLogTailer(threading.Thread): + """Tail a plain-text log file, publish each stripped line via ``post``. + + ``post`` is invoked with a single ``str`` for every new line. Lines are + stripped of trailing newlines; empty lines after stripping are dropped. + + The file may not exist yet when this thread starts — it's truncated by + ``run-tests.sh`` at session start, but if the tailer races the shell, + we tolerate FileNotFoundError for up to ``wait_s`` seconds. + """ + + def __init__( + self, + path: pathlib.Path, + post: Callable[[str], None], + stop: threading.Event, + *, + wait_s: float = 30.0, + ) -> None: + super().__init__(daemon=True, name="flashlog-tail") + self._path = path + self._post = post + self._stop = stop + self._wait_s = wait_s + + def run(self) -> None: + deadline = time.monotonic() + self._wait_s + while not self._path.is_file(): + if self._stop.is_set() or time.monotonic() > deadline: + return + time.sleep(0.1) + try: + fh = self._path.open("r", encoding="utf-8", errors="replace") + except OSError: + return + try: + while not self._stop.is_set(): + line = fh.readline() + if not line: + time.sleep(0.05) + continue + line = line.rstrip("\r\n") + if not line: + continue + try: + self._post(line) + except Exception: + # A post failure (e.g. closed app) is terminal for this + # thread but we still want to close the file handle. + return + finally: + fh.close() diff --git a/mcp-server/src/meshtastic_mcp/cli/_fwlog.py b/mcp-server/src/meshtastic_mcp/cli/_fwlog.py new file mode 100644 index 000000000..7294ce2cd --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_fwlog.py @@ -0,0 +1,95 @@ +"""Firmware log tail worker for ``meshtastic-mcp-test-tui``. + +Complements v1's reportlog-tail worker. ``tests/conftest.py`` owns a +session-scoped autouse fixture (``_firmware_log_stream``) that mirrors +every ``meshtastic.log.line`` pubsub event to ``tests/fwlog.jsonl`` — +one JSON object per line: + + {"ts": 1729100000.123, "port": "/dev/cu.usbmodem1101", "line": "..."} + +The TUI tails that file from a worker thread; each new line becomes a +:class:`FirmwareLogLine` message posted to the App. Same pattern as the +reportlog tail worker — truncate on launch, tolerate missing file for +30 s, back off at EOF. + +Kept in its own module so the (large) ``test_tui.py`` stays focused on +the Textual App shell. +""" + +from __future__ import annotations + +import json +import pathlib +import threading +import time +from typing import Any, Callable + + +class FirmwareLogTailer(threading.Thread): + """Tail ``tests/fwlog.jsonl``, publish parsed records via ``post``. + + ``post`` is the App's ``post_message`` (or any callable that accepts a + single payload arg). We pass parsed dicts rather than constructing + Textual Message objects here — keeps this module free of the + textual dependency so it's unit-testable in a bare venv. + + Parameters + ---------- + path: + Path to ``tests/fwlog.jsonl``. The file may not exist yet at + startup — pytest only creates it once the session fixture runs. + post: + Callable invoked with a dict ``{"ts", "port", "line"}`` for every + new line parsed from the file. + stop: + An event the App sets to signal shutdown. + wait_s: + How long to poll for the file's creation before giving up. Default + 30 s; pytest collection on a cold cache can be slow. + """ + + def __init__( + self, + path: pathlib.Path, + post: Callable[[dict[str, Any]], None], + stop: threading.Event, + *, + wait_s: float = 30.0, + ) -> None: + super().__init__(daemon=True, name="fwlog-tail") + self._path = path + self._post = post + self._stop = stop + self._wait_s = wait_s + + def run(self) -> None: + deadline = time.monotonic() + self._wait_s + while not self._path.is_file(): + if self._stop.is_set() or time.monotonic() > deadline: + return + time.sleep(0.1) + try: + fh = self._path.open("r", encoding="utf-8") + except OSError: + return + try: + while not self._stop.is_set(): + line = fh.readline() + if not line: + time.sleep(0.05) + continue + line = line.strip() + if not line: + continue + try: + record = json.loads(line) + except json.JSONDecodeError: + continue + # Defensive: require the three fields we rely on. + if not isinstance(record, dict): + continue + if "line" not in record: + continue + self._post(record) + finally: + fh.close() diff --git a/mcp-server/src/meshtastic_mcp/cli/_history.py b/mcp-server/src/meshtastic_mcp/cli/_history.py new file mode 100644 index 000000000..639dcec5f --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_history.py @@ -0,0 +1,127 @@ +"""Cross-run history for ``meshtastic-mcp-test-tui``. + +Persists one JSON object per pytest run to +``mcp-server/tests/.history/runs.jsonl``. The TUI reads the last N +entries on launch to render a duration sparkline in the header — a +quick read on whether the suite is slowing down over time. + +Schema (keep small; the file can grow for months): + + {"run": 42, "ts": 1729100000.0, "duration_s": 387.2, + "passed": 52, "failed": 0, "skipped": 23, "exit_code": 0, + "seed": "mcp-user-host"} +""" + +from __future__ import annotations + +import json +import pathlib +import time +from dataclasses import asdict, dataclass +from typing import Iterable + +# Sparkline glyphs, low → high. 8 levels is the Unicode convention. +_SPARK_BLOCKS = "▁▂▃▄▅▆▇█" + + +@dataclass +class RunRecord: + run: int + ts: float + duration_s: float + passed: int + failed: int + skipped: int + exit_code: int + seed: str + + +class HistoryStore: + """Append-only JSONL store with bounded read. + + Writes are fsynced after each append (the file is tiny; fsync cost + is negligible and protects against truncation on a crash). + """ + + def __init__(self, path: pathlib.Path, *, keep_last: int = 50) -> None: + self._path = path + self._keep_last = keep_last + + def append(self, record: RunRecord) -> None: + try: + self._path.parent.mkdir(parents=True, exist_ok=True) + with self._path.open("a", encoding="utf-8") as fh: + fh.write(json.dumps(asdict(record)) + "\n") + fh.flush() + except Exception: + # Non-fatal: history is cosmetic. + pass + + def read_recent(self) -> list[RunRecord]: + """Return the last ``keep_last`` records in chronological order.""" + if not self._path.is_file(): + return [] + try: + lines = self._path.read_text(encoding="utf-8").splitlines() + except OSError: + return [] + out: list[RunRecord] = [] + # Parse tail-first so we don't waste work on a huge history. + for line in lines[-self._keep_last :]: + line = line.strip() + if not line: + continue + try: + raw = json.loads(line) + except json.JSONDecodeError: + continue + try: + out.append(RunRecord(**raw)) + except TypeError: + # Schema drift; skip the record rather than crash. + continue + return out + + def record_run( + self, + *, + run: int, + duration_s: float, + passed: int, + failed: int, + skipped: int, + exit_code: int, + seed: str, + ) -> RunRecord: + rec = RunRecord( + run=run, + ts=time.time(), + duration_s=float(duration_s), + passed=int(passed), + failed=int(failed), + skipped=int(skipped), + exit_code=int(exit_code), + seed=seed, + ) + self.append(rec) + return rec + + +def sparkline(values: Iterable[float], *, width: int = 20) -> str: + """Render a Unicode block-character sparkline from the last ``width`` values. + + Returns an empty string for empty input so the header handles + "no history yet" gracefully. + """ + buf = [v for v in values if v >= 0][-width:] + if not buf: + return "" + lo, hi = min(buf), max(buf) + if hi - lo < 1e-9: + return _SPARK_BLOCKS[len(_SPARK_BLOCKS) // 2] * len(buf) + n = len(_SPARK_BLOCKS) - 1 + out = [] + for v in buf: + idx = int(round((v - lo) / (hi - lo) * n)) + out.append(_SPARK_BLOCKS[max(0, min(n, idx))]) + return "".join(out) diff --git a/mcp-server/src/meshtastic_mcp/cli/_reproducer.py b/mcp-server/src/meshtastic_mcp/cli/_reproducer.py new file mode 100644 index 000000000..420da3c76 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_reproducer.py @@ -0,0 +1,214 @@ +"""Reproducer bundle builder for ``meshtastic-mcp-test-tui``. + +When the operator presses ``x`` on a failed test leaf, we package the +minimum viable failure context into a tarball under +``mcp-server/tests/reproducers/``: + +:: + + repro--.tar.gz + ├── README.md human-readable overview + ├── test_report.json the failing TestReport event from reportlog + ├── fwlog.jsonl firmware log filtered to the failure window + ├── devices.json per-device device_info + lora config snapshot + └── env.json seed, run #, pytest version, platform, hostname + +Separate module so the logic can be unit-tested without Textual. The +TUI glue is thin — one key binding calls :func:`build_reproducer_bundle` +with the focused test's state and shows the path in a modal. +""" + +from __future__ import annotations + +import io +import json +import pathlib +import platform +import re +import socket +import tarfile +import time +from dataclasses import dataclass +from typing import Any, Iterable + + +@dataclass +class ReproContext: + """Everything :func:`build_reproducer_bundle` needs. Shaped to map + cleanly onto the state the TUI already tracks — no extra data + collection required at export time.""" + + nodeid: str + longrepr: str + sections: list[tuple[str, str]] + start_ts: float | None + stop_ts: float | None + seed: str + run_number: int + exit_code: int | None + fwlog_path: pathlib.Path + output_dir: pathlib.Path + extra_device_rows: list[dict[str, Any]] # [{role, port, info, ...}, ...] + + +def _short_nodeid(nodeid: str) -> str: + """Collapse a pytest nodeid into a filename-safe slug (<= 60 chars).""" + # Drop the file path prefix; keep test name + parametrization. + tail = nodeid.split("::", 1)[-1] if "::" in nodeid else nodeid + slug = re.sub(r"[^A-Za-z0-9_.\-]", "_", tail) + return slug[:60].strip("_.-") or "test" + + +def _filtered_fwlog( + fwlog_path: pathlib.Path, + start_ts: float | None, + stop_ts: float | None, + *, + pad_s: float = 5.0, +) -> bytes: + """Return fwlog.jsonl lines whose ``ts`` lies in [start-pad, stop+pad].""" + if not fwlog_path.is_file(): + return b"" + if start_ts is None or stop_ts is None: + # Without a time window, include the whole file — rare; happens + # when a test fails in setup before pytest emitted a start ts. + try: + return fwlog_path.read_bytes() + except OSError: + return b"" + lo, hi = start_ts - pad_s, stop_ts + pad_s + out = io.BytesIO() + try: + with fwlog_path.open("r", encoding="utf-8") as fh: + for line in fh: + stripped = line.strip() + if not stripped: + continue + try: + record = json.loads(stripped) + except json.JSONDecodeError: + continue + ts = record.get("ts") + if not isinstance(ts, (int, float)): + continue + if lo <= ts <= hi: + out.write(line.encode("utf-8")) + except OSError: + return b"" + return out.getvalue() + + +def _readme(ctx: ReproContext) -> str: + t = time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime()) + return f"""# Reproducer bundle + +Exported by `meshtastic-mcp-test-tui` on {t}. + +## Failing test + +- **nodeid:** `{ctx.nodeid}` +- **seed:** `{ctx.seed}` +- **run #:** {ctx.run_number} +- **suite exit code (at export time):** {ctx.exit_code if ctx.exit_code is not None else "in progress"} + +## Files in this archive + +| File | Contents | +|---|---| +| `test_report.json` | The pytest-reportlog `TestReport` event for the failing test — includes `longrepr`, captured `sections` (stdout/stderr/log), `duration`, `location`, `keywords`. | +| `fwlog.jsonl` | Firmware log lines (from `meshtastic.log.line` pubsub) filtered to [start−5s, stop+5s] around the test's run window. Each line is `{{ts, port, line}}`. | +| `devices.json` | Per-device snapshot at export time: `device_info` + `lora` config per detected role. | +| `env.json` | Python version, platform, hostname, seed, run number. | + +## How to triage + +1. Open `test_report.json` and read `longrepr` + `sections` — most failures explain themselves there. +2. If the failure is a mesh/telemetry assertion, `fwlog.jsonl` is where the answer usually lives. Grep for `Error=`, `NAK`, `PKI_UNKNOWN_PUBKEY`, `Skip send`, `Guru Meditation`, or the uptime timestamps around the assertion event. +3. Compare `devices.json` against the expected state (e.g. `num_nodes >= 2`, `primary_channel == "McpTest"`, `region == "US"`). If fields disagree with the seed-derived USERPREFS profile, the device probably wasn't baked with this session's profile. + +## Reproducing locally + +```bash +cd mcp-server +MESHTASTIC_MCP_SEED='{ctx.seed}' .venv/bin/pytest '{ctx.nodeid}' --tb=long -v +``` +""" + + +def build_reproducer_bundle(ctx: ReproContext) -> pathlib.Path: + """Build a tarball under ``ctx.output_dir`` and return its path. + + Parent dirs are created as needed. Errors during optional sections + (devices, env) are swallowed — the bundle is still useful without + them; refusing to export because the device poller had a hiccup + would be worse than the export missing a file. + """ + ctx.output_dir.mkdir(parents=True, exist_ok=True) + ts = int(time.time()) + slug = _short_nodeid(ctx.nodeid) + archive_path = ctx.output_dir / f"repro-{ts}-{slug}.tar.gz" + + with tarfile.open(archive_path, "w:gz") as tar: + + def _add(name: str, data: bytes) -> None: + info = tarfile.TarInfo(name=name) + info.size = len(data) + info.mtime = ts + tar.addfile(info, io.BytesIO(data)) + + # README + _add("README.md", _readme(ctx).encode("utf-8")) + + # test_report.json — reconstruct from the fields the TUI stashes. + test_report = { + "nodeid": ctx.nodeid, + "outcome": "failed", + "longrepr": ctx.longrepr, + "sections": [list(s) for s in ctx.sections], + "start": ctx.start_ts, + "stop": ctx.stop_ts, + } + _add( + "test_report.json", + json.dumps(test_report, indent=2, default=str).encode("utf-8"), + ) + + # fwlog.jsonl (filtered) + _add("fwlog.jsonl", _filtered_fwlog(ctx.fwlog_path, ctx.start_ts, ctx.stop_ts)) + + # devices.json + try: + devices_payload = json.dumps( + ctx.extra_device_rows or [], indent=2, default=str + ) + except Exception: + devices_payload = "[]" + _add("devices.json", devices_payload.encode("utf-8")) + + # env.json + try: + from importlib.metadata import version as _pkg_version + + pytest_version = _pkg_version("pytest") + except Exception: + pytest_version = "unknown" + env_payload = { + "seed": ctx.seed, + "run": ctx.run_number, + "exit_code": ctx.exit_code, + "export_ts": ts, + "python": platform.python_version(), + "pytest": pytest_version, + "platform": f"{platform.system()} {platform.release()} {platform.machine()}", + "hostname": socket.gethostname(), + } + _add("env.json", json.dumps(env_payload, indent=2).encode("utf-8")) + + return archive_path + + +def iter_entries(archive_path: pathlib.Path) -> Iterable[str]: + """Yield member names — used by callers that want to confirm the bundle shape.""" + with tarfile.open(archive_path, "r:gz") as tar: + for m in tar.getmembers(): + yield m.name diff --git a/mcp-server/src/meshtastic_mcp/cli/test_tui.py b/mcp-server/src/meshtastic_mcp/cli/test_tui.py new file mode 100644 index 000000000..33201101b --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/test_tui.py @@ -0,0 +1,1782 @@ +"""Textual TUI wrapping `mcp-server/run-tests.sh`. + +Launch: ``meshtastic-mcp-test-tui [pytest-args]`` + +The TUI *wraps* ``run-tests.sh``; it never replaces it. Same script, same +env-var resolution, same ``userPrefs.jsonc`` session fixture. Four data +sources drive live state: + +1. ``tests/reportlog.jsonl`` — written by ``pytest-reportlog``. Tailed in a + worker thread; each JSON line is published as a :class:`ReportLogEvent` + message. This is the authoritative source for tree population + per-test + outcome. +2. The pytest subprocess ``stdout`` + ``stderr`` streams — line-by-line, + published as :class:`PytestLine` messages and rendered verbatim in the + pytest pane. +3. ``tests/fwlog.jsonl`` — firmware log stream. Written by the + ``_firmware_log_stream`` autouse session fixture in ``conftest.py`` + (mirrors every ``meshtastic.log.line`` pubsub event), tailed by the + :class:`FirmwareLogTailer` worker, displayed in a wrap-enabled + RichLog with cycleable port filter. +4. ``devices.list_devices()`` + ``info.device_info(port)`` — polled only at + startup and again after ``RunFinished``. Device polling while pytest + holds a SerialInterface would deadlock on the exclusive port lock; the + existing ``hub_devices`` fixture is session-scoped so there is no safe + "between tests" window. The header reflects this with a "(stale)" + marker while the run is active. + +Key bindings (see :class:`TestTuiApp.BINDINGS`): + ``r`` re-run focused ``f`` filter tree ``d`` failure detail + ``g`` open report.html ``l`` cycle firmware-log port filter + ``x`` export reproducer bundle ``c`` tool-coverage panel + ``q`` / Ctrl-C graceful quit with SIGINT → SIGTERM → SIGKILL escalation + +Shipped today (v1 + v2 slice): test tree + tier counters with progress bars, +pytest tail, live firmware log with port filter, device strip with +"currently running" status column, failure-detail modal, reproducer bundle +export (filters fwlog by test's start/stop timestamps), tool-coverage +modal, cross-run history sparkline in the header, clean SIGINT +propagation. Still open (see the plan file): mesh topology mini-diagram +and airtime / channel-utilization gauges. +""" + +from __future__ import annotations + +import argparse +import json +import os +import pathlib +import signal +import subprocess +import sys +import threading +import time +from dataclasses import dataclass, field +from typing import Any, Iterator + +# --------------------------------------------------------------------------- +# Configuration constants +# --------------------------------------------------------------------------- + +# Tier names that map nodeids like "tests//..." to counter buckets. +# Order here == display order in the tier-counters table. Matches the order +# `pytest_collection_modifyitems` in `conftest.py` uses: +# bake → unit → mesh → telemetry → monitor → fleet → admin → provisioning +# so the counters table reads top-to-bottom in execution order. +# +# "bake" is the synthetic tier for `tests/test_00_bake.py` — the file sits +# at the `tests/` root rather than under a tier subdirectory, so without +# this mapping `_tier_of_nodeid` would return "other" and the bake outcomes +# would be silently dropped from both the tier table and the history +# record (which sums tier counters to compute passed/failed/skipped). +TIERS = ( + "bake", + "unit", + "mesh", + "telemetry", + "monitor", + "fleet", + "admin", + "provisioning", +) + +# Relative paths from the mcp-server root. +_REPORTLOG_RELATIVE = "tests/reportlog.jsonl" +_FWLOG_RELATIVE = "tests/fwlog.jsonl" +# pio / esptool / nrfutil / picotool tee subprocess output here when +# `MESHTASTIC_MCP_FLASH_LOG` is set (see `pio._run_capturing`). run-tests.sh +# sets that env var; the TUI also sets it for direct `_spawn_pytest` calls +# so `r`-key re-runs that skip the wrapper still get tee'd output. +_FLASHLOG_RELATIVE = "tests/flash.log" +_REPORT_HTML_RELATIVE = "tests/report.html" +_TOOL_COVERAGE_RELATIVE = "tests/tool_coverage.json" +_HISTORY_RELATIVE = "tests/.history/runs.jsonl" +_REPRODUCERS_RELATIVE = "tests/reproducers" +_RUN_TESTS_RELATIVE = "run-tests.sh" +_RUN_COUNTER_RELATIVE = "tests/.tui-runs" + +# Graceful-shutdown budgets (seconds) for the pytest subprocess when the +# user hits `q`. Matches what the existing CLI's atexit + userprefs sidecar +# self-heal expects. +_SIGINT_GRACE_S = 5.0 +_SIGTERM_GRACE_S = 5.0 + + +# --------------------------------------------------------------------------- +# Path resolution +# --------------------------------------------------------------------------- + + +def _mcp_server_root() -> pathlib.Path: + """Locate the mcp-server directory (the one containing run-tests.sh).""" + here = pathlib.Path(__file__).resolve() + # Walk up until we find pyproject.toml with a matching project name, or + # default to the three-up ancestor (src/meshtastic_mcp/cli/test_tui.py → + # .../mcp-server). The walk-up protects against unusual checkouts. + for parent in (here.parent, *here.parents): + if (parent / "pyproject.toml").is_file() and ( + parent / "run-tests.sh" + ).is_file(): + return parent + return here.parents[3] + + +# --------------------------------------------------------------------------- +# Data classes +# --------------------------------------------------------------------------- + + +@dataclass +class LeafReport: + """Per-test state drawn from reportlog events. + + Outcomes mirror pytest's: "passed" | "failed" | "skipped" | "running". + """ + + nodeid: str + tier: str + outcome: str = "pending" + duration_s: float = 0.0 + longrepr: str = "" + # Captured stdout / stderr / firmware-log sections from the test's + # `TestReport.sections` — shown in the failure-detail modal. + sections: list[tuple[str, str]] = field(default_factory=list) + # Wall-clock start/stop from the TestReport event. Used by the + # reproducer exporter (`x`) to filter `tests/fwlog.jsonl` down to + # just the lines around the failure window. + start_ts: float | None = None + stop_ts: float | None = None + + +@dataclass +class TierCounters: + tier: str + passed: int = 0 + failed: int = 0 + skipped: int = 0 + running: int = 0 + remaining: int = 0 + + +@dataclass +class DeviceRow: + role: str | None + port: str + vid: str + pid: str + description: str + # Populated from info.device_info when available; empty dict when we + # haven't queried (or when the poller is paused). + info: dict[str, Any] = field(default_factory=dict) + + +@dataclass +class State: + """Shared state owned by the App; written by workers under `lock`. + + UI code reads via Textual Message handlers which run on the UI thread + in the order workers called `post_message` — so reads don't need the + lock themselves. + """ + + lock: threading.Lock = field(default_factory=threading.Lock) + tiers: dict[str, TierCounters] = field( + default_factory=lambda: {t: TierCounters(tier=t) for t in TIERS} + ) + leaves: dict[str, LeafReport] = field(default_factory=dict) + # Ordered list of nodeids in the order they were first seen — lets us + # rebuild the tree deterministically. + nodeid_order: list[str] = field(default_factory=list) + devices: list[DeviceRow] = field(default_factory=list) + run_active: bool = False + exit_code: int | None = None + # nodeid of the currently-running test. Set on `when="setup"` + + # outcome="passed" (body about to execute); cleared on `when="call"` + # (any outcome) or on `when="setup"` + outcome="failed" (no body + # window). Drives the device-table "Status" column so the operator + # can see which test is touching a given device right now. + running_nodeid: str | None = None + # `time.monotonic()` captured when `running_nodeid` was set. Surfaced + # as live-updating elapsed-time ("RUNNING: test_bake_nrf52 (1:23)") so + # an operator staring at a ~3 min `test_00_bake` or a `mesh_formation` + # with a 60 s ceiling has concrete evidence the test isn't stuck. + running_started_at: float | None = None + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _tier_of_nodeid(nodeid: str) -> str: + """Map a pytest nodeid to its tier bucket. Unknown → 'other'. + + `tests/test_00_bake.py::...` is special-cased to the synthetic `bake` + tier — it's a top-level file (no tier subdirectory) so the generic + "second path segment" logic would miss it and route the bake outcomes + into the non-existent `other` bucket. + """ + parts = nodeid.split("/", 2) + if len(parts) >= 2 and parts[0] == "tests": + # Bake file sits at `tests/test_00_bake.py` — dedicated bucket. + if parts[1].startswith("test_00_bake"): + return "bake" + candidate = parts[1] + if candidate in TIERS: + return candidate + return "other" + + +def _file_of_nodeid(nodeid: str) -> str: + """Extract the test file name (e.g. 'test_boards.py') from a nodeid.""" + left = nodeid.split("::", 1)[0] + return left.rsplit("/", 1)[-1] + + +def _testname_of_nodeid(nodeid: str) -> str: + """Extract the 'test_foo[param]' suffix from a nodeid, or the full thing.""" + if "::" in nodeid: + return nodeid.split("::", 1)[1] + return nodeid + + +def _roles_from_nodeid(nodeid: str) -> set[str]: + """Infer which device roles a parametrized test touches. + + Patterns we recognize (from the existing ``conftest.py`` parametrization + in ``pytest_generate_tests``): + + - ``test_foo[nrf52]`` → {"nrf52"} (baked_single) + - ``test_foo[nrf52->esp32s3]`` → {"nrf52", "esp32s3"} (mesh_pair) + + Unparametrized tests (no bracket) return an empty set — the caller + should fall back to "this test involves ALL detected devices" rather + than pretending it touches none. + """ + if "[" not in nodeid or not nodeid.endswith("]"): + return set() + try: + inner = nodeid.rsplit("[", 1)[1][:-1] + except Exception: + return set() + # Split on "->" for directed mesh pairs; otherwise treat as single role. + parts = [p.strip() for p in inner.split("->")] if "->" in inner else [inner.strip()] + return {p for p in parts if p} + + +def _parse_events(path: pathlib.Path) -> Iterator[dict[str, Any]]: + """Yield parsed JSON dicts from a reportlog file, skipping malformed lines. + + Used for smoke-testing the parser against a finished file; the live + worker has its own tail loop. + """ + if not path.is_file(): + return + with path.open("r", encoding="utf-8") as fh: + for line in fh: + line = line.strip() + if not line: + continue + try: + yield json.loads(line) + except json.JSONDecodeError: + continue + + +def _load_run_number(counter_path: pathlib.Path) -> int: + """Bump + persist a monotonic run counter used in the TUI header.""" + try: + n = int(counter_path.read_text().strip()) + except Exception: + n = 0 + n += 1 + try: + counter_path.parent.mkdir(parents=True, exist_ok=True) + counter_path.write_text(str(n)) + except Exception: + # Non-fatal: the counter is cosmetic. + pass + return n + + +def _resolve_seed() -> str: + """Mirror the default-seed resolution from run-tests.sh. + + Operator can override via MESHTASTIC_MCP_SEED. Matches the + per-user/per-host default so repeated invocations land on the same PSK + (makes --assume-baked valid across invocations). + """ + if explicit := os.environ.get("MESHTASTIC_MCP_SEED"): + return explicit + try: + who = os.environ.get("USER") or os.environ.get("LOGNAME") or "anon" + except Exception: + who = "anon" + try: + import socket + + host = socket.gethostname().split(".", 1)[0] + except Exception: + host = "host" + return f"mcp-{who}-{host}" + + +def _format_duration(seconds: float) -> str: + if seconds < 60: + return f"{seconds:5.1f}s" + m, s = divmod(int(seconds), 60) + return f"{m:d}:{s:02d}" + + +# --------------------------------------------------------------------------- +# Textual imports (lazy — only when main() runs, so `_parse_events` can be +# imported by smoke tests without requiring textual installed in every env) +# --------------------------------------------------------------------------- + + +def _import_textual() -> Any: + """Return a namespace carrying every Textual class we use. + + Deferred import keeps `_parse_events` + `_tier_of_nodeid` importable + from tests / smoke scripts without pulling in the UI stack. + """ + import textual + from textual.app import App, ComposeResult + from textual.binding import Binding + from textual.containers import Horizontal, Vertical + from textual.message import Message + from textual.screen import ModalScreen + from textual.widgets import DataTable, Footer, Input, RichLog, Static, Tree + + ns = argparse.Namespace() + ns.App = App + ns.Binding = Binding + ns.ComposeResult = ComposeResult + ns.DataTable = DataTable + ns.Footer = Footer + ns.Horizontal = Horizontal + ns.Input = Input + ns.Message = Message + ns.ModalScreen = ModalScreen + ns.RichLog = RichLog + ns.Static = Static + ns.Tree = Tree + ns.Vertical = Vertical + ns.textual = textual + return ns + + +# --------------------------------------------------------------------------- +# main() — the important scaffolding lives here so that when we bail out +# before entering the Textual event loop (missing terminal, --help, etc.) +# nothing has grabbed the screen yet. +# --------------------------------------------------------------------------- + + +def main(argv: list[str] | None = None) -> int: + """Entry point for `meshtastic-mcp-test-tui`.""" + argv = list(argv if argv is not None else sys.argv[1:]) + + parser = argparse.ArgumentParser( + prog="meshtastic-mcp-test-tui", + description=( + "Live Textual TUI wrapping mcp-server/run-tests.sh. " + "Passes any unrecognized arguments through to pytest." + ), + allow_abbrev=False, + ) + parser.add_argument( + "--no-tui", + action="store_true", + help=( + "Skip the TUI and exec run-tests.sh directly. Useful as a health " + "check that the wrapper argv+env resolution is working." + ), + ) + args, pytest_args = parser.parse_known_args(argv) + + root = _mcp_server_root() + run_tests = root / _RUN_TESTS_RELATIVE + reportlog = root / _REPORTLOG_RELATIVE + fwlog = root / _FWLOG_RELATIVE + flashlog = root / _FLASHLOG_RELATIVE + counter = root / _RUN_COUNTER_RELATIVE + + if not run_tests.is_file(): + print( + f"error: could not locate {_RUN_TESTS_RELATIVE} relative to " + f"{root}. Is this the mcp-server checkout?", + file=sys.stderr, + ) + return 2 + + # Always clear stale log files before launching pytest. The TUI's tail + # workers race pytest file-creation; starting from a known-empty state + # avoids mid-line-decode confusion from the prior run. The fwlog session + # fixture also truncates on its end, and run-tests.sh truncates the + # flashlog — triple-truncate is deliberate (whichever side creates the + # file first, it starts empty). + for p in (reportlog, fwlog, flashlog): + try: + p.unlink(missing_ok=True) + except Exception: + pass + + # Compute + persist the run counter for the header (cosmetic). + run_number = _load_run_number(counter) + seed = _resolve_seed() + # Export the seed so the subprocess inherits the SAME value the TUI + # displays. run-tests.sh computes its own fallback if unset, and we'd + # end up with a header / wrapper-header mismatch if we let that happen. + os.environ.setdefault("MESHTASTIC_MCP_SEED", seed) + # Turn on subprocess-output tee'ing so `pio._run_capturing` writes each + # line of pio / esptool / nrfutil / picotool output to `tests/flash.log` + # as it arrives. The TUI tails that file and routes each line to the + # pytest pane so the operator sees live flash progress during long + # `pio run -t upload` / `esptool erase_flash` operations. run-tests.sh + # also sets this when invoked directly — `setdefault` so the wrapper's + # value wins when present. + os.environ.setdefault("MESHTASTIC_MCP_FLASH_LOG", str(flashlog)) + + # --no-tui: exec run-tests.sh directly. Useful for diagnosing wrapper + # env / argv handling without getting into Textual's alternate screen. + if args.no_tui: + cmd = [str(run_tests), *pytest_args] + os.execv(str(run_tests), cmd) # noqa: S606 — intentional + + # Textual UI import is deferred so `--help` and `--no-tui` do not pay + # the ~40 MB startup cost. + try: + tx = _import_textual() + except ImportError as exc: + print( + f"error: textual is not installed ({exc}). Install with: " + f"pip install -e '.[test]'", + file=sys.stderr, + ) + return 2 + + # Narrow-terminal warning (see plan §8 risk 2). Textual itself degrades, + # but a heads-up helps a first-time user. + term = os.environ.get("TERM", "") + if term in ("", "dumb", "screen") and not os.environ.get("TEXTUAL_NO_TERM_HINT"): + print( + f"[hint] TERM={term!r} may render poorly. Try " + f"`TERM=xterm-256color meshtastic-mcp-test-tui ...` if the layout " + f"looks broken.", + file=sys.stderr, + ) + + app = _build_app( + tx=tx, + root=root, + run_tests=run_tests, + reportlog=reportlog, + fwlog=fwlog, + flashlog=flashlog, + seed=seed, + run_number=run_number, + pytest_args=pytest_args, + ) + + # App.run() returns the subprocess exit code via `app.exit(returncode)`. + return_value = app.run() + if isinstance(return_value, int): + return return_value + return 0 + + +# --------------------------------------------------------------------------- +# Everything below is only reachable once Textual is importable. `tx` is +# the namespace returned by `_import_textual()` so we don't scatter `from +# textual import ...` across the file. +# --------------------------------------------------------------------------- + + +def _build_app( + *, + tx: Any, + root: pathlib.Path, + run_tests: pathlib.Path, + reportlog: pathlib.Path, + fwlog: pathlib.Path, + flashlog: pathlib.Path, + seed: str, + run_number: int, + pytest_args: list[str], +) -> Any: + """Assemble TestTuiApp with its Textual-dependent inner classes. + + Keeping the class definitions inside a factory means `main()` can + short-circuit (--no-tui, terminal-check, argparse error) before we + force Textual's import cost. + """ + + # Helper modules — lazy-imported here so the top-of-file import cost + # only kicks in when main() has decided to run the TUI. + from . import _flashlog as _flashlog_mod + from . import _fwlog as _fwlog_mod + from . import _history as _history_mod + from . import _reproducer as _reproducer_mod + + # ---------------- Messages ---------------- + + class ReportLogEvent(tx.Message): + def __init__(self, event: dict[str, Any]) -> None: + self.event = event + super().__init__() + + class PytestLine(tx.Message): + def __init__(self, source: str, line: str) -> None: + self.source = source # "stdout" | "stderr" + self.line = line + super().__init__() + + class FirmwareLogLine(tx.Message): + def __init__(self, record: dict[str, Any]) -> None: + # {"ts": float, "port": str | None, "line": str} + self.record = record + super().__init__() + + class FlashLogLine(tx.Message): + """Plain-text line from `tests/flash.log` — pio / esptool / nrfutil / + picotool output tee'd by `pio._run_capturing`. Routed to the pytest + pane so the operator sees live flash progress during `test_00_bake` + instead of 3 minutes of pytest-captured silence.""" + + def __init__(self, line: str) -> None: + self.line = line + super().__init__() + + class DeviceSnapshot(tx.Message): + def __init__(self, rows: list[DeviceRow]) -> None: + self.rows = rows + super().__init__() + + class RunFinished(tx.Message): + def __init__(self, returncode: int) -> None: + self.returncode = returncode + super().__init__() + + # ---------------- Workers ---------------- + + class ReportlogWorker(threading.Thread): + """Tail `reportlog.jsonl`, publish each event.""" + + def __init__(self, app: Any, path: pathlib.Path, stop: threading.Event) -> None: + super().__init__(daemon=True, name="reportlog-tail") + self._app = app + self._path = path + self._stop = stop + + def run(self) -> None: + # Wait up to 30 s for pytest to create the file (first call on + # a cold cache can be slow). + wait_deadline = time.monotonic() + 30.0 + while not self._path.is_file(): + if self._stop.is_set() or time.monotonic() > wait_deadline: + return + time.sleep(0.1) + try: + fh = self._path.open("r", encoding="utf-8") + except OSError: + return + try: + while not self._stop.is_set(): + line = fh.readline() + if not line: + time.sleep(0.05) + continue + line = line.strip() + if not line: + continue + try: + event = json.loads(line) + except json.JSONDecodeError: + continue + self._app.post_message(ReportLogEvent(event)) + finally: + fh.close() + + class SubprocessReaderWorker(threading.Thread): + """Read one stream line-by-line and publish PytestLine messages.""" + + def __init__( + self, + app: Any, + stream: Any, + source: str, + stop: threading.Event, + ) -> None: + super().__init__(daemon=True, name=f"subprocess-{source}") + self._app = app + self._stream = stream + self._source = source + self._stop = stop + + def run(self) -> None: + try: + for line in iter(self._stream.readline, ""): + if self._stop.is_set(): + break + self._app.post_message( + PytestLine(source=self._source, line=line.rstrip("\n")) + ) + except Exception: + # stream closed / subprocess died; not fatal. + pass + + class DevicePollerWorker(threading.Thread): + """Poll list_devices() + device_info() at startup and after RunFinished. + + Deliberately NOT polling during the run — `hub_devices` is a + session-scoped fixture holding SerialInterfaces across the whole + session, and device_info() would deadlock on the exclusive port + lock. Header shows "(stale)" during the gap. + """ + + def __init__(self, app: Any, state: State, stop: threading.Event) -> None: + super().__init__(daemon=True, name="device-poller") + self._app = app + self._state = state + self._stop = stop + self._trigger = threading.Event() + + def trigger(self) -> None: + self._trigger.set() + + def run(self) -> None: + # Perform one poll at startup; then wait for explicit triggers. + self._poll_once() + while not self._stop.is_set(): + if self._trigger.wait(timeout=0.5): + self._trigger.clear() + if self._stop.is_set(): + break + with self._state.lock: + active = self._state.run_active + if active: + continue + self._poll_once() + + def _poll_once(self) -> None: + try: + from meshtastic_mcp import devices as devices_mod + from meshtastic_mcp import info as info_mod + except Exception as exc: # pragma: no cover + self._app.post_message( + PytestLine( + source="stderr", line=f"[tui] device import failed: {exc!r}" + ) + ) + return + rows: list[DeviceRow] = [] + try: + raw = devices_mod.list_devices(include_unknown=True) + except Exception as exc: + self._app.post_message( + PytestLine( + source="stderr", line=f"[tui] list_devices failed: {exc!r}" + ) + ) + return + for d in raw: + vid_raw = d.get("vid") or "" + try: + vid_i = ( + int(vid_raw, 16) + if isinstance(vid_raw, str) and vid_raw.startswith("0x") + else int(vid_raw) + ) + except (TypeError, ValueError): + vid_i = 0 + role = None + if vid_i == 0x239A: + role = "nrf52" + elif vid_i in (0x303A, 0x10C4): + role = "esp32s3" + if not role and not d.get("likely_meshtastic"): + continue + row = DeviceRow( + role=role, + port=d.get("port", ""), + vid=str(vid_raw), + pid=str(d.get("pid") or ""), + description=d.get("description", "") or "", + ) + if role: + try: + row.info = info_mod.device_info(port=row.port, timeout_s=6.0) + except Exception as exc: + row.info = {"error": repr(exc)} + rows.append(row) + self._app.post_message(DeviceSnapshot(rows=rows)) + + # ---------------- Modals ---------------- + + class FailureDetailScreen(tx.ModalScreen): + """Show a failed test's longrepr + captured sections.""" + + BINDINGS = [tx.Binding("escape,q", "dismiss", "close")] + + def __init__(self, leaf: LeafReport, report_html: pathlib.Path) -> None: + self._leaf = leaf + self._report_html = report_html + super().__init__() + + def compose(self) -> Any: + yield tx.Static( + f"[bold]{self._leaf.nodeid}[/bold] " + f"outcome=[red]{self._leaf.outcome}[/red] " + f"duration={_format_duration(self._leaf.duration_s)}", + id="failure-detail-header", + ) + log = tx.RichLog( + highlight=False, markup=False, wrap=False, id="failure-detail-log" + ) + yield log + yield tx.Static( + f"[dim]Full HTML report: {self._report_html}[/dim] [esc] close", + id="failure-detail-footer", + ) + + def on_mount(self) -> None: + log = self.query_one("#failure-detail-log", tx.RichLog) + if self._leaf.longrepr: + log.write(self._leaf.longrepr) + log.write("") + for section_name, section_text in self._leaf.sections: + log.write(f"--- {section_name} ---") + log.write(section_text) + log.write("") + if not self._leaf.longrepr and not self._leaf.sections: + log.write("(no longrepr or captured sections in reportlog event)") + + def action_dismiss(self, _result: Any = None) -> None: + self.dismiss() + + class FilterInputScreen(tx.ModalScreen[str]): + """Prompt the user for a tree filter substring (empty clears).""" + + BINDINGS = [tx.Binding("escape", "cancel", "cancel")] + + def compose(self) -> Any: + yield tx.Static("filter test tree (substring, empty = clear):") + yield tx.Input(placeholder="nodeid substring", id="filter-input") + + def on_input_submitted(self, event: Any) -> None: + self.dismiss(event.value.strip()) + + def action_cancel(self) -> None: + self.dismiss(None) + + class CoverageModal(tx.ModalScreen): + """Read `tests/tool_coverage.json` (written by `tests/tool_coverage.py` + at `pytest_sessionfinish`) and render a two-column summary of which + MCP tools got exercised by the run. `(no coverage data yet)` while + the run is in flight.""" + + BINDINGS = [tx.Binding("escape,q,c", "dismiss", "close")] + + def __init__(self, coverage_path: pathlib.Path) -> None: + self._path = coverage_path + super().__init__() + + def compose(self) -> Any: + yield tx.Static("[bold]MCP tool coverage[/bold]", id="coverage-header") + yield tx.RichLog( + highlight=False, markup=True, wrap=False, id="coverage-log" + ) + yield tx.Static( + f"[dim]{self._path}[/dim] [esc] close", + id="coverage-footer", + ) + + def on_mount(self) -> None: + log = self.query_one("#coverage-log", tx.RichLog) + if not self._path.is_file(): + log.write("(no coverage data — tool_coverage.json not written yet)") + log.write("") + log.write("Coverage is emitted at pytest_sessionfinish; this") + log.write("file appears after the suite completes.") + return + try: + data = json.loads(self._path.read_text(encoding="utf-8")) + except Exception as exc: + log.write(f"[red]failed to read {self._path}:[/red] {exc!r}") + return + calls = data.get("calls") or {} + if not calls: + log.write("(tool_coverage.json present but no calls recorded)") + return + exercised = sorted( + ((n, c) for n, c in calls.items() if c > 0), key=lambda x: -x[1] + ) + unexercised = sorted(n for n, c in calls.items() if c == 0) + log.write(f"[b]{len(exercised)} / {len(calls)} MCP tools exercised[/b]") + log.write("") + log.write("[green]exercised[/green] (count):") + for name, count in exercised: + log.write(f" {count:>4} {name}") + if unexercised: + log.write("") + log.write("[dim]not exercised:[/dim]") + for name in unexercised: + log.write(f" {name}") + + def action_dismiss(self, _result: Any = None) -> None: + self.dismiss() + + class ReproducerResultModal(tx.ModalScreen): + """Show the exported reproducer tarball path with a short instruction.""" + + BINDINGS = [tx.Binding("escape,q,enter", "dismiss", "close")] + + def __init__( + self, archive_path: pathlib.Path, error: str | None = None + ) -> None: + self._archive = archive_path + self._error = error + super().__init__() + + def compose(self) -> Any: + if self._error: + yield tx.Static(f"[red]Reproducer export failed:[/red] {self._error}") + else: + yield tx.Static("[bold green]Reproducer bundle written[/bold green]") + yield tx.Static(f"[cyan]{self._archive}[/cyan]") + yield tx.Static("") + yield tx.Static( + "Contains: README.md, test_report.json, fwlog.jsonl (time-filtered)," + ) + yield tx.Static( + "devices.json, env.json. Attach to an issue / paste the path in chat." + ) + yield tx.Static("") + yield tx.Static("[dim][esc] close[/dim]") + + def action_dismiss(self, _result: Any = None) -> None: + self.dismiss() + + # ---------------- App ---------------- + + class TestTuiApp(tx.App): + CSS = """ + Screen { layout: vertical; } + #header-bar { height: 2; padding: 0 1; background: $panel; } + #tier-table { height: auto; max-height: 11; } + #body { height: 1fr; } + #tree-pane { width: 50%; border-right: solid $primary-background; } + #right-pane { width: 50%; layout: vertical; } + #pytest-pane { height: 50%; border-bottom: solid $primary-background; } + #fwlog-header { height: 1; padding: 0 1; background: $panel; } + #fwlog-pane { height: 1fr; } + Tree { height: 100%; } + RichLog { height: 100%; } + #device-table { height: auto; max-height: 6; } + """ + + TITLE = "mcp-server test runner" + + BINDINGS = [ + tx.Binding("r", "rerun_focused", "re-run focused"), + tx.Binding("f", "filter_tree", "filter"), + tx.Binding("d", "failure_detail", "failure detail"), + tx.Binding("g", "open_html_report", "open report.html"), + tx.Binding("x", "export_reproducer", "export reproducer"), + tx.Binding("c", "coverage_panel", "coverage"), + tx.Binding("l", "cycle_fwlog_filter", "fw log filter"), + tx.Binding("q,ctrl+c", "quit_app", "quit"), + ] + + def __init__(self) -> None: + super().__init__() + self._state = State() + self._root = root + self._run_tests = run_tests + self._reportlog = reportlog + self._fwlog = fwlog + self._flashlog = flashlog + self._report_html = root / _REPORT_HTML_RELATIVE + self._tool_coverage = root / _TOOL_COVERAGE_RELATIVE + self._repro_dir = root / _REPRODUCERS_RELATIVE + self._seed = seed + self._run_number = run_number + self._pytest_args = pytest_args + self._start_time = time.monotonic() + self._proc: subprocess.Popen[str] | None = None + self._stop = threading.Event() + self._reportlog_worker: ReportlogWorker | None = None + self._stdout_worker: SubprocessReaderWorker | None = None + self._stderr_worker: SubprocessReaderWorker | None = None + self._device_worker: DevicePollerWorker | None = None + self._fwlog_worker: _fwlog_mod.FirmwareLogTailer | None = None + self._flashlog_worker: _flashlog_mod.FlashLogTailer | None = None + self._tree_filter: str = "" + self._sigint_count = 0 + # Firmware-log port filter: None = all, else exact port match. + self._fwlog_filter: str | None = None + # Ordered set of distinct ports we've seen firmware log lines + # from — the `l` key cycles through these. + self._fwlog_ports: list[str] = [] + # Cross-run history. + self._history_store = _history_mod.HistoryStore( + root / _HISTORY_RELATIVE, keep_last=40 + ) + self._history_cache = self._history_store.read_recent() + + # -------- composition / mount -------- + + def compose(self) -> Any: + yield tx.Static(self._header_text(), id="header-bar") + tier_table = tx.DataTable(id="tier-table", show_cursor=False) + yield tier_table + with tx.Horizontal(id="body"): + with tx.Vertical(id="tree-pane"): + yield tx.Tree("tests", id="test-tree") + with tx.Vertical(id="right-pane"): + with tx.Vertical(id="pytest-pane"): + yield tx.RichLog( + id="pytest-log", + highlight=False, + markup=False, + wrap=False, + max_lines=5000, + ) + yield tx.Static(self._fwlog_header_text(), id="fwlog-header") + with tx.Vertical(id="fwlog-pane"): + yield tx.RichLog( + id="fwlog-log", + highlight=False, + markup=False, + # `wrap=True` so long firmware log lines (some + # hit ~200 chars — full packet hex dumps plus + # source tags) don't get truncated at the + # right edge. The right pane is ~50% of the + # terminal so even a wide terminal has a + # ~90-char cap; plain truncation dropped the + # uptime counter or packet id off the end. + wrap=True, + max_lines=5000, + ) + yield tx.DataTable(id="device-table", show_cursor=False) + yield tx.Footer() + + def _fwlog_header_text(self) -> str: + filt = self._fwlog_filter or "(all ports)" + return f"firmware log filter: [b]{filt}[/b] [l] cycle" + + def on_mount(self) -> None: + # Tier-counters table. `add_column` (singular) lets us pick + # the key explicitly — `add_columns` (plural) in textual 8.x + # returns auto-generated keys that are tedious to track + # separately, and update_cell(column_key=