Jamie Pine 60369e9f00 feat: redundancy awareness & cross-volume file comparison (#3053)
* feat: add redundancy awareness & cross-volume file comparison

Surface content redundancy data so users can answer "if this drive dies,
what do I lose?" — builds on existing content identity and volume systems.

Backend:
- New `redundancy.summary` library query with per-volume at-risk vs
  redundant byte/file counts and a library-wide replication score
- Extend `SearchFilters` with `at_risk`, `on_volumes`, `not_on_volumes`,
  `min_volume_count`, `max_volume_count` filters
- Add composite index migration on entries(content_id, volume_id)
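
The new redundancy filters can be read as predicates over the set of volumes that hold a given piece of content. The sketch below rests on that reading. The struct mirrors the field names from this commit, but its types, the `u32` volume IDs, and the `matches` method are hypothetical, not Spacedrive's actual `SearchFilters`. It also assumes two things. First, "at risk" means the content exists on exactly one volume. Second, `on_volumes` requires presence on every listed volume.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of the redundancy fields added to `SearchFilters`.
/// Volume IDs are plain `u32`s here for illustration only.
#[derive(Default)]
struct RedundancyFilters {
    at_risk: Option<bool>,
    on_volumes: Vec<u32>,
    not_on_volumes: Vec<u32>,
    min_volume_count: Option<usize>,
    max_volume_count: Option<usize>,
}

impl RedundancyFilters {
    /// Returns true if content stored on `volumes` passes every filter that is set.
    fn matches(&self, volumes: &BTreeSet<u32>) -> bool {
        let count = volumes.len();
        // Assumption: "at risk" means only one volume holds this content.
        if let Some(want) = self.at_risk {
            if (count == 1) != want {
                return false;
            }
        }
        // Assumption: content must be present on every requested volume...
        if !self.on_volumes.iter().all(|v| volumes.contains(v)) {
            return false;
        }
        // ...and absent from every excluded one.
        if self.not_on_volumes.iter().any(|v| volumes.contains(v)) {
            return false;
        }
        self.min_volume_count.map_or(true, |min| count >= min)
            && self.max_volume_count.map_or(true, |max| count <= max)
    }
}
```

Under these assumptions, the "unique to A" side of a two-volume comparison would be `on_volumes = [A]` with `not_on_volumes = [B]`.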

Frontend:
- `/redundancy` dashboard with replication score, volume bars, at-risk callout
- `/redundancy/at-risk` paginated file list sorted by size
- `/redundancy/compare` two-volume comparison (unique/shared toggle)
- Sidebar ShieldCheck button linking to redundancy view

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* redundancy UI improvements + ZFS volume detection fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix ZFS pool capacity reporting and stats filtering

- ZFS: override total_capacity for pool-root volumes using zfs list
  used+available. df under-reports pool-root Size because it only
  counts the root dataset's own used bytes plus avail — on a 60 TB
  raidz2 pool this shows as ~15 TB instead of ~62 TB. The pool root's
  own used property includes descendants, so used+available is the
  real usable capacity.

- Library stats: drop volumes where is_user_visible=false AND
  re-apply should_hide_by_mount_path retroactively so stale DB rows
  (detected before the Linux visibility filters existed) don't
  inflate reported capacity.

- Extract should_hide_by_mount_path into volume/utils as a shared
  helper used by both the list query and the stats calculation.
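
The capacity correction in the first item amounts to summing two properties of the pool-root dataset. This is a minimal sketch. It assumes the values come from `zfs list -H -p -o name,used,available <pool>`, which prints tab-separated output with no header and exact byte counts. The function name is hypothetical, and the real detection code may query ZFS differently.

```rust
/// Hypothetical parser for one line of `zfs list -H -p -o name,used,available`.
/// Returns the dataset name and its usable capacity (used + available) in bytes.
fn parse_pool_capacity(line: &str) -> Option<(String, u64)> {
    let mut fields = line.trim_end().split('\t');
    let name = fields.next()?.to_string();
    let used: u64 = fields.next()?.parse().ok()?;
    let available: u64 = fields.next()?.parse().ok()?;
    // On the pool root, `used` already includes all descendant datasets,
    // so used + available is the pool's real usable capacity.
    Some((name, used.checked_add(available)?))
}
```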

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* show capacity and visibility in sd volume list

Makes it possible to verify library-level capacity aggregation from
the CLI — previously the list only showed mount, fingerprint, and
tracked/mounted state, which meant debugging the ZFS pool capacity
issue required querying the library DB directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* document filesystem support matrix and detection

New docs/core/filesystems.mdx covering per-filesystem capabilities
(CoW, pool-awareness, visibility filtering, capacity correction),
platform detection strategies, the FilesystemHandler trait, Linux/
macOS/ZFS visibility rules, the ZFS pool-root capacity problem and
fix, copy strategy selection, and known limitations.

Registered under File Management in both mint.json and docs.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* WIP: redundancy filter wiring across search, CLI, and UI

- core/src/ops/search: wire redundancy filters (at_risk, on_volumes,
  not_on_volumes, min/max volume_count) through the search query;
  fix UUID-to-SQLite BLOB literal so volume UUID comparisons actually
  match (volumes.uuid is stored as a 16-byte BLOB, quoted-string
  comparison silently returned zero rows).

- apps/cli: new redundancy subcommand + populate the new
  SearchFilters fields from search args.

- packages/interface: redundancy at-risk and compare pages reworked
  to consume the new filter surface; explorer context/hook updates
  to support redundancy-scoped views.
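
The UUID fix in the first item comes down to how the value is written into SQL. Because volumes.uuid holds 16 raw bytes, a quoted UUID string is TEXT and never compares equal to that BLOB. SQLite's blob literal syntax, X'<hex>', does match. The sketch below is minimal, and both helper names are hypothetical.

```rust
/// Hypothetical helper: render 16 UUID bytes as an SQLite BLOB literal, X'<hex>'.
fn uuid_blob_literal(uuid: &[u8; 16]) -> String {
    let hex: String = uuid.iter().map(|b| format!("{:02X}", b)).collect();
    format!("X'{}'", hex)
}

/// Hypothetical helper: build `column IN (X'..', X'..')` for a volume filter.
fn uuid_in_clause(column: &str, uuids: &[[u8; 16]]) -> String {
    if uuids.is_empty() {
        // An empty list matches nothing; `NOT 0` still matches everything.
        return "0".to_string();
    }
    let items: Vec<String> = uuids.iter().map(uuid_blob_literal).collect();
    format!("{} IN ({})", column, items.join(", "))
}
```

Binding the raw bytes as a query parameter achieves the same match without building literals. The hex form is only safe here because it is generated from bytes, never from user text.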

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* add web context menu renderer and UI polish

- New WebContextMenuProvider + Radix DropdownMenu-based renderer anchored
  at cursor via a 1x1 virtual trigger. Handles separators, submenus,
  disabled, and the danger variant via text-status-error.
- useContextMenu now routes web clicks through the provider instead of
  parking data in unused local state, and trims leading/trailing/adjacent
  separators so condition-filtered menus don't render orphaned lines.
- Drop app-frame corner rounding on the web build.
- Add shrink-0 to the sidebar space switcher so the scrollable sibling
  can't compress it vertically.
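
The separator cleanup in the second item is a small normalization pass. The web renderer itself is TypeScript. This sketch shows the same logic in Rust to keep one language across examples, and the `MenuEntry` type is hypothetical.

```rust
/// Hypothetical menu model: either a labelled item or a separator.
#[derive(Debug, Clone, PartialEq)]
enum MenuEntry {
    Item(String),
    Separator,
}

/// Drops leading, trailing, and consecutive separators so that items removed
/// by condition filters don't leave orphaned divider lines behind.
fn trim_separators(entries: Vec<MenuEntry>) -> Vec<MenuEntry> {
    let mut out: Vec<MenuEntry> = Vec::with_capacity(entries.len());
    for entry in entries {
        let is_separator = entry == MenuEntry::Separator;
        // Skip a separator at the start or directly after another separator.
        if is_separator && matches!(out.last(), None | Some(MenuEntry::Separator)) {
            continue;
        }
        out.push(entry);
    }
    // At most one trailing separator can remain at this point.
    if out.last() == Some(&MenuEntry::Separator) {
        out.pop();
    }
    out
}
```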

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* sd-server dev workflow: auto web build, shutdown watchdog, stable data dir

- build.rs runs `bun run build` in apps/web so `just dev-server` always
  embeds the latest UI. rerun-if-changed covers apps/web/src,
  packages/interface/src, and packages/ts-client/src so Rust-only edits
  skip the rebuild. Skips gracefully when bun isn't on PATH or
  SD_SKIP_WEB_BUILD is set; Dockerfile sets the latter since dist is
  pre-built and bun isn't in the Rust stage.
- Graceful shutdown was hanging because the browser holds the /events
  SSE stream open forever and axum waits for all connections to drain.
  After the first signal, arm a background force-exit that fires on a
  second Ctrl+C or after a 5s timeout, so the process can't hang.
- Debug builds were starting from a fresh tempfile::tempdir() on every
  run (the TempDir handle dropped at end of the closure, deleting the
  dir we just took a path to). Default to ~/.spacedrive in debug so data
  persists and `just dev-server` shares a data dir with the Tauri app.
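
The shutdown watchdog in the second item can be sketched with standard-library primitives. This sketch assumes a channel that receives one message per additional Ctrl+C after graceful shutdown has started. sd-server's real implementation is async, and its signal handling is not shown here. The `ForceExit` type and `wait_for_force_exit` function are hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the watchdog stopped waiting (hypothetical type).
#[derive(Debug, PartialEq)]
enum ForceExit {
    SecondSignal,
    Timeout,
    SignalSourceClosed,
}

/// Blocks until a second signal arrives or `grace` elapses. A caller would run
/// this on a background thread after the first signal and then call
/// `std::process::exit`, so a lingering SSE connection can't stall shutdown.
fn wait_for_force_exit(signals: &Receiver<()>, grace: Duration) -> ForceExit {
    match signals.recv_timeout(grace) {
        Ok(()) => ForceExit::SecondSignal,
        Err(RecvTimeoutError::Timeout) => ForceExit::Timeout,
        // The caller decides whether a dropped signal source should force exit.
        Err(RecvTimeoutError::Disconnected) => ForceExit::SignalSourceClosed,
    }
}
```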

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add Sources space item alongside Redundancy in default library layout

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add TrueNAS native build script

Uses zig cc as C/C++ compiler on TrueNAS Scale where /usr is
read-only and no system gcc exists. Dev tools live at
/mnt/pool/dev-tools/ (zig, cmake, make, extracted deb headers).

Builds sd-server + sd-cli in ~4 min on a 12-core NAS. AI feature
disabled (whisper.cpp C11 atomics incompatible with zig clang-18).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* never block RPC on synchronous statistics calculation

On first load (fresh library, all stats zero), libraries.info used
to calculate statistics synchronously before responding. On large
libraries during active indexing this hangs indefinitely — the
closure-table walk in calculate_file_statistics loads every
descendant ID into a Vec then issues a WHERE IN(...) with millions
of entries, which SQLite can't finish while the indexer is writing.

Now always return cached (possibly zero) stats and let the
background recalculate_statistics task fill them in. The UI
refreshes via the ResourceChanged event when the calculation
completes.
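
The pattern here is to answer from cache and refresh in the background, which can be sketched with std threads. `LibraryStats`, `StatsCache`, and the `on_changed` callback are hypothetical names, not Spacedrive's types. The callback stands in for the ResourceChanged event.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct LibraryStats {
    total_files: u64,
    total_bytes: u64,
}

#[derive(Clone, Default)]
struct StatsCache {
    inner: Arc<Mutex<LibraryStats>>,
}

impl StatsCache {
    /// What the RPC handler returns: the cached value, possibly all zeros,
    /// without ever waiting on a recalculation.
    fn current(&self) -> LibraryStats {
        *self.inner.lock().unwrap()
    }

    /// Runs `calculate` on a background thread, stores the result, then
    /// notifies listeners (standing in for a ResourceChanged event).
    fn refresh_in_background<F, N>(&self, calculate: F, on_changed: N) -> JoinHandle<()>
    where
        F: FnOnce() -> LibraryStats + Send + 'static,
        N: FnOnce(LibraryStats) + Send + 'static,
    {
        let inner = Arc::clone(&self.inner);
        thread::spawn(move || {
            let stats = calculate();
            *inner.lock().unwrap() = stats;
            on_changed(stats);
        })
    }
}
```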

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* self-heal protocol handler registration on re-init

Core::new() registers default protocol handlers after starting networking,
but swallows any failure (error is only logged). If the initial registration
fails — e.g. on a host where start_networking hasn't fully set up the event
loop command sender by the time register_default_protocol_handlers runs —
the registry is left empty. A subsequent call to Core::init_networking()
would see `services.networking().is_some()` and skip re-registration,
permanently leaving protocols unregistered for the life of the process.

sd-server calls init_networking() right after Core::new(), so it's the
client most exposed to this. Symptom: pairing over the web UI returns
"Pairing protocol not registered" while the same library works fine
from Tauri and mobile.

Fix: init_networking now queries the registry directly for the pairing
handler and re-registers the default set if it's missing, independent of
whether networking is already initialized.
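
The fix makes registration idempotent and keys it on the registry's actual contents rather than on whether networking exists. This is a minimal sketch. `ProtocolRegistry`, `ensure_default_handlers`, and every protocol name other than pairing are hypothetical. The real registry also stores handler objects, not strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the protocol registry: name -> handler label.
#[derive(Default)]
struct ProtocolRegistry {
    handlers: HashMap<String, String>,
}

/// Hypothetical default set; only "pairing" is named in the commit message.
const DEFAULT_PROTOCOLS: [&str; 3] = ["pairing", "sync", "file_transfer"];

impl ProtocolRegistry {
    fn is_registered(&self, name: &str) -> bool {
        self.handlers.contains_key(name)
    }

    fn register(&mut self, name: &str) {
        self.handlers.insert(name.to_string(), format!("{name} handler"));
    }
}

/// Checks the registry itself, using the pairing handler as the canary,
/// instead of trusting "networking is already initialized".
/// Returns true if the default set had to be (re)registered.
fn ensure_default_handlers(registry: &mut ProtocolRegistry) -> bool {
    if registry.is_registered("pairing") {
        return false;
    }
    for name in DEFAULT_PROTOCOLS {
        registry.register(name);
    }
    true
}
```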

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fall back to pkarr+DNS discovery when mDNS port is unavailable

Iroh's endpoint.bind() fails wholesale if any configured discovery service
fails to initialize. MdnsDiscovery requires binding UDP :5353, which on
most Linux systems (including TrueNAS) is already owned by avahi-daemon.
Result: endpoint creation errors out with "Service 'mdns' error", the
event loop never starts, command_sender stays None, and protocol
registration fails — so sd-server has no working networking at all.

Make mDNS best-effort: on any error whose message mentions "mdns",
retry endpoint creation with only pkarr + DNS discovery. Local-network
auto-discovery is lost but remote pairing via node ID (which uses n0's
DNS infrastructure, not mDNS) continues to work normally.
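
The retry logic can be expressed independently of Iroh. In this sketch, `bind` is a hypothetical closure that stands in for endpoint creation with a given set of discovery services. The real code configures Iroh's discovery types, which are not modeled here. As in the commit, the fallback triggers on any error whose message mentions "mdns".

```rust
/// Hypothetical discovery services an endpoint can be configured with.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Discovery {
    Mdns,
    Pkarr,
    Dns,
}

/// Tries binding with every discovery service. If the failure mentions mDNS,
/// retries once with only pkarr + DNS so remote pairing keeps working.
/// The returned bool reports whether mDNS ended up enabled.
fn bind_with_fallback<T, F>(mut bind: F) -> Result<(T, bool), String>
where
    F: FnMut(&[Discovery]) -> Result<T, String>,
{
    let first = bind(&[Discovery::Mdns, Discovery::Pkarr, Discovery::Dns]);
    match first {
        Ok(endpoint) => Ok((endpoint, true)),
        Err(e) if e.to_lowercase().contains("mdns") => {
            // Local-network auto-discovery is lost on this path.
            bind(&[Discovery::Pkarr, Discovery::Dns]).map(|endpoint| (endpoint, false))
        }
        Err(e) => Err(e),
    }
}
```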

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* succeed pairing if either mDNS or relay discovery wins

The dual-path discovery in start_pairing_as_joiner_with_code used
tokio::select! to race mDNS and relay. select! resolves on the first
branch to complete — including errors — so a host that can't bind
mDNS (e.g. a Linux box where avahi already owns UDP :5353) would fail
pairing wholesale: mDNS discovery errors out in <1ms with
"Failed to create mDNS discovery: Service 'mdns' error", that Err
wins the race, and relay discovery gets cancelled before it can even
begin.

Switch to futures::select_ok so we only return the error if EVERY
discovery path has failed. mDNS failing immediately now leaves relay
running to completion, which is the common case for remote pairing
into a NAS.
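
futures::select_ok returns the first Ok and only errors once every future has failed. This sketch reproduces those semantics with std threads and a channel so it stays dependency-free. It illustrates the behavior only and is not the async code in start_pairing_as_joiner_with_code. `first_ok` and `Task` are hypothetical names. Unlike select_ok, which drops the remaining futures, the losing threads here run to completion in the background.

```rust
use std::sync::mpsc;
use std::thread;

type Task<T, E> = Box<dyn FnOnce() -> Result<T, E> + Send>;

/// Runs every task concurrently and returns the first success. Errors are
/// collected and only returned once every task has failed.
fn first_ok<T, E>(tasks: Vec<Task<T, E>>) -> Result<T, Vec<E>>
where
    T: Send + 'static,
    E: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    for task in tasks {
        let tx = tx.clone();
        thread::spawn(move || {
            // The receiver may already be gone if another task won.
            let _ = tx.send(task());
        });
    }
    drop(tx); // so `recv` fails once every worker has finished
    let mut errors = Vec::new();
    while let Ok(result) = rx.recv() {
        match result {
            Ok(value) => return Ok(value),
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```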

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 18:56:51 -07:00
2025-11-06 05:25:15 -08:00

Spacedrive

One file manager for all your devices and clouds.
Powered by a Virtual Distributed File System, complete with apps for macOS, Windows, Linux, iOS, and Android.

v2.spacedrive.com | Discord | Getting Started


What is Spacedrive?

Spacedrive is a cross-device data platform. Index files, emails, notes, and external sources. Search everything. Sync via P2P. Keep AI agents safe with built-in screening.

  • Content identity — every file gets a BLAKE3 content hash. Same file on two devices produces the same hash. Spacedrive tracks redundancy and deduplication across all your machines.
  • Cross-device — see all your files across all your devices in one place. Files on disconnected devices stay in the index and appear as offline.
  • P2P sync — devices connect directly via Iroh/QUIC. No servers, no cloud, no single point of failure. Metadata syncs between devices. Files stay where they are.
  • Cloud volumes — index S3, Google Drive, Dropbox, OneDrive, Azure, and GCS as first-class volumes alongside local storage.
  • Nine views — grid, list, columns, media, size, recents, search, knowledge, and splat. QuickPreview for video, audio, code, documents, 3D, and images.
  • Local-first — everything runs on your machine. No data leaves your device unless you choose to sync between your own devices.
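
The redundancy tracking in the content identity bullet reduces to counting how many distinct volumes hold each content hash. This sketch computes at-risk and redundant bytes per volume, plus a library-wide score. Several details are assumptions for illustration: the entry tuple layout, the string volume names, and the score definition (the fraction of unique content bytes stored on two or more volumes). Spacedrive's `redundancy.summary` query may define these differently.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical per-volume totals.
#[derive(Debug, Default, PartialEq)]
struct VolumeRedundancy {
    at_risk_bytes: u64,   // content that exists only on this volume
    redundant_bytes: u64, // content also present on another volume
}

/// `entries` holds (content_hash, volume, size_in_bytes). Two copies on the
/// same volume still count as one volume, so they remain at risk.
fn summarize(entries: &[(&str, &str, u64)]) -> (BTreeMap<String, VolumeRedundancy>, f64) {
    // Which volumes hold each piece of content, and how large it is.
    let mut holders: HashMap<&str, (HashSet<&str>, u64)> = HashMap::new();
    for &(hash, volume, size) in entries {
        let slot = holders.entry(hash).or_insert_with(|| (HashSet::new(), size));
        slot.0.insert(volume);
    }

    let mut per_volume: BTreeMap<String, VolumeRedundancy> = BTreeMap::new();
    let (mut unique_bytes, mut replicated_bytes) = (0u64, 0u64);
    for (volumes, size) in holders.values() {
        unique_bytes += size;
        if volumes.len() > 1 {
            replicated_bytes += size;
        }
        for v in volumes {
            let stats = per_volume.entry(v.to_string()).or_default();
            if volumes.len() == 1 {
                stats.at_risk_bytes += size;
            } else {
                stats.redundant_bytes += size;
            }
        }
    }
    // An empty library has nothing at risk.
    let score = if unique_bytes == 0 {
        1.0
    } else {
        replicated_bytes as f64 / unique_bytes as f64
    };
    (per_volume, score)
}
```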

Is this a replacement for Finder or Explorer?

No. Spacedrive sits above your OS file manager and adds capabilities Finder/Explorer lack:

  • Portal across everything — search and browse files across local disks, external drives, NAS, cloud storage, and archived data sources from one interface.
  • Operating surface for files — content identity, sidecars, derivative artifacts, rich metadata, sync, and cross-device awareness built into the core model.
  • Embeddable and shareable — run it as a desktop app, headless server, hosted file service, or embed the interface and APIs into other products.
  • AI-ready by design — indexing and analysis pipelines prepare data ahead of time instead of giving agents raw shell access.
  • Safer access model — route AI and automation through structured APIs, permissions, and processing layers instead of direct file operations.

You still use your OS for low-level file interactions. Spacedrive adds the cross-platform, cross-device, cloud-aware, and automation-friendly layer on top.

Data Archival

Spacedrive indexes external data sources via script-based adapters: Gmail, Apple Notes, Chrome bookmarks, Obsidian, Slack, GitHub, calendar events, contacts. Each source becomes a searchable repository alongside your files.

Each adapter is a folder containing an adapter.toml manifest and a sync script, which can be written in any language. If it reads stdin and prints lines, it works.
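
Here is a toy adapter body in Rust that illustrates that contract. The input and output formats are assumptions: it treats each stdin line as a hypothetical `id<TAB>title` record and prints one JSON object per line. The adapter.toml fields and the line schema Spacedrive actually expects are not described here, so treat this as a shape, not a spec.

```rust
use std::io::{self, BufRead, Write};

/// Escapes a string for inclusion in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical adapter loop: reads `id<TAB>title` lines and emits one JSON
/// record per line. Lines without a tab are skipped.
fn run_adapter<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<usize> {
    let mut emitted = 0;
    for line in input.lines() {
        let line = line?;
        let Some((id, title)) = line.split_once('\t') else { continue };
        writeln!(output, "{{\"id\":\"{}\",\"title\":\"{}\"}}", json_escape(id), json_escape(title))?;
        emitted += 1;
    }
    Ok(emitted)
}
```

A real script's entry point would call `run_adapter(io::stdin().lock(), io::stdout().lock())`.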

Shipped adapters: Gmail, Apple Notes, Chrome Bookmarks, Chrome History, Safari History, Obsidian, OpenCode, Slack, macOS Contacts, macOS Calendar, GitHub.

Spacebot

Spacedrive integrates with Spacebot, an open source AI agent runtime. Spacebot runs as an optional separate process. Spacedrive provides the data, permission, and execution layer. Spacebot provides the intelligence.

Each Spacebot instance pairs with one Spacedrive node as its home device. That node authenticates the agent, maintains the device graph, resolves permissions, and forwards operations to peer devices. Every device in your library can reach Spacebot through the paired node over P2P (Iroh/QUIC) without direct network access. One agent runtime serves your entire device fleet.

When Spacebot spawns a worker, that worker can target any device in the library. File reads, shell commands, and operations proxy through Spacedrive to the target device. Talk to the agent from your phone while work executes on a server. Read files from a NAS, run commands on a workstation, report to a laptop — all in one task.

Every operation passes through Spacedrive's permission system: which devices the agent can access, which paths are readable or writable, which operations are allowed, and which require human confirmation. The paired node resolves effective policy before forwarding. One security model, one audit surface across all devices and clouds.
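
Those checks can be modeled as a small policy evaluation that the paired node runs before forwarding an operation. Everything in this sketch is a hypothetical model for illustration, not Spacedrive's actual permission schema. That includes the `Policy` fields, the `Op` variants, prefix-based path matching against canonicalized paths, and deny-by-default behavior.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Op {
    Read,
    Write,
    Shell,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Confirm,
    Deny,
}

/// Hypothetical effective policy for one agent.
struct Policy {
    devices: HashSet<String>,
    readable: Vec<PathBuf>,
    writable: Vec<PathBuf>,
    allowed_ops: HashSet<Op>,
    confirm_ops: HashSet<Op>,
}

impl Policy {
    /// Assumes `path` is already canonicalized (no `..` components).
    fn evaluate(&self, device: &str, path: &Path, op: Op) -> Decision {
        // Deny by default: device and operation must both be permitted.
        if !self.devices.contains(device) || !self.allowed_ops.contains(&op) {
            return Decision::Deny;
        }
        // Reads need a readable root; writes and shell need a writable one.
        let roots = if op == Op::Read { &self.readable } else { &self.writable };
        // Path::starts_with compares whole components, so /data/ab does not match /data/a.
        if !roots.iter().any(|root| path.starts_with(root)) {
            return Decision::Deny;
        }
        if self.confirm_ops.contains(&op) {
            Decision::Confirm
        } else {
            Decision::Allow
        }
    }
}
```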

File System Intelligence

Spacedrive adds intelligence to your filesystem by combining three layers:

  • File intelligence — derivative data like OCR, transcripts, extracted metadata, thumbnails, previews, classifications, and sidecars.
  • Directory intelligence — contextual knowledge attached to folders and subtrees ("active projects", "dormant archives", etc.).
  • Access intelligence — permissions and policy that apply across devices and clouds, routing agents through structured access instead of raw shell commands.

When an agent navigates through Spacedrive, it receives the file listing, subtree context, effective permissions, and summaries. Users can explain how they organize their system. Agents can add attributed notes. Jobs generate summaries from structure and activity. The intelligence stays attached to the filesystem, not buried in temporary session memory.

Safety Screening

When enabled, every record passes through a safety pipeline before becoming searchable:

  • Prompt Guard 2 — local classifier detects prompt injection in emails, messages, and documents before they enter the index.
  • Trust tiers — authored content (your notes) gets balanced screening, external content (email inbox) gets strict screening.
  • Quarantine system — flagged records are excluded from AI agent queries and can be reviewed in the desktop app.
  • Content fencing — search results include trust metadata so agents know what's safe vs untrusted.
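
Trust tiers amount to applying a different threshold to the same classifier output. This sketch assumes the injection score is a probability in [0, 1] from a classifier such as Prompt Guard 2. The threshold values are placeholders, not Spacedrive's configuration.

```rust
/// Where a record came from (hypothetical enum mirroring the tiers above).
#[derive(Clone, Copy, Debug)]
enum TrustTier {
    Authored, // e.g. your own notes: balanced screening
    External, // e.g. an email inbox: strict screening
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Indexed,
    Quarantined,
}

/// Placeholder thresholds: stricter tiers quarantine at a lower score.
fn threshold(tier: TrustTier) -> f32 {
    match tier {
        TrustTier::Authored => 0.9,
        TrustTier::External => 0.5,
    }
}

/// `injection_score` is assumed to be a 0.0..=1.0 probability from the classifier.
fn screen(injection_score: f32, tier: TrustTier) -> Verdict {
    if injection_score >= threshold(tier) {
        Verdict::Quarantined // excluded from agent queries, reviewable by the user
    } else {
        Verdict::Indexed
    }
}
```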

No other local data tool screens indexed content before exposing it to AI.


Architecture

The core is built on four principles:

  1. Virtual Distributed Filesystem (VDFS) — files and folders become first-class objects with rich metadata, independent of physical location. Every file gets a universal address (SdPath) that works across devices.

  2. Content Identity System — adaptive hashing (BLAKE3 with strategic sampling for large files) creates a unique fingerprint for every piece of content. Enables deduplication, redundancy tracking, and content-based operations.

  3. Transactional Actions — every file operation can be previewed before execution. See space savings, conflicts, and estimated time, then approve or cancel. Operations become durable jobs that survive network interruptions and device restarts.

  4. Leaderless Sync — peer-to-peer synchronization without central coordinators. Device-specific data uses state replication. Shared metadata uses an HLC-ordered log with deterministic conflict resolution.
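
The HLC-ordered log in point 4 relies on hybrid logical clocks. These are timestamps that track wall-clock time but still advance monotonically and respect causality when device clocks drift. The sketch below implements the standard HLC send and receive rules, with a node ID as the final tiebreaker. The field names and the last-writer-wins reading of "deterministic conflict resolution" are illustrative assumptions, not Spacedrive's sync types.

```rust
use std::cmp::max;

/// Hybrid logical clock timestamp. Derived Ord compares fields in order:
/// physical time, then counter, then node ID as a deterministic tiebreaker.
/// Under last-writer-wins, the edit with the greater Hlc would win.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    wall_ms: u64,
    counter: u32,
    node: u64,
}

impl Hlc {
    /// Timestamps a local event (or outgoing message) at physical time `now_ms`.
    fn tick(&mut self, now_ms: u64) -> Hlc {
        if now_ms > self.wall_ms {
            self.wall_ms = now_ms;
            self.counter = 0;
        } else {
            self.counter += 1;
        }
        *self
    }

    /// Merges a timestamp received from a peer, then timestamps the receive event.
    fn receive(&mut self, remote: Hlc, now_ms: u64) -> Hlc {
        let wall = max(max(self.wall_ms, remote.wall_ms), now_ms);
        self.counter = if wall == self.wall_ms && wall == remote.wall_ms {
            max(self.counter, remote.counter) + 1
        } else if wall == self.wall_ms {
            self.counter + 1
        } else if wall == remote.wall_ms {
            remote.counter + 1
        } else {
            0
        };
        self.wall_ms = wall;
        *self
    }
}
```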

The implementation is a single Rust crate with CQRS/DDD architecture. Every operation (file copy, tag create, search query) is a registered action or query with type-safe input/output that auto-generates TypeScript types for the frontend.
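
A registered query can be pictured as a trait with associated input and output types plus a name used for routing. This is a hypothetical sketch: the `Query` trait, `FileCount`, and the `example.file_count` name are illustrative. The real code also derives Specta types so the TypeScript client can be generated.

```rust
use std::collections::HashMap;

/// Hypothetical query trait: typed input/output and a stable wire name.
trait Query {
    const NAME: &'static str;
    type Input;
    type Output;
    fn execute(&self, input: Self::Input) -> Self::Output;
}

/// Hypothetical query answering "how many files does this library hold?".
struct FileCount {
    files_by_library: HashMap<String, u64>,
}

impl Query for FileCount {
    const NAME: &'static str = "example.file_count";
    type Input = String; // library name
    type Output = Option<u64>;

    fn execute(&self, library: String) -> Option<u64> {
        self.files_by_library.get(&library).copied()
    }
}
```

Because input and output are associated types, a caller passing the wrong input type fails at compile time rather than at runtime.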

Component          Technology
Language           Rust
Async runtime      Tokio
Database           SQLite (SeaORM + sqlx)
P2P                Iroh (QUIC, hole-punching, local discovery)
Content hashing    BLAKE3
Vector search      LanceDB + FastEmbed
Cloud storage      OpenDAL
Cryptography       Ed25519, X25519, ChaCha20-Poly1305, AES-GCM
Media              FFmpeg, libheif, Pdfium, Whisper
Desktop            Tauri 2
Mobile             React Native + Expo
Frontend           React 19, Vite, TanStack Query, Tailwind CSS v4
Design system      SpaceUI (shared component library)
Type generation    Specta

spacedrive/
├── core/                  # Rust engine (CQRS/DDD)
├── apps/
│   ├── tauri/             # Desktop app (macOS, Windows, Linux)
│   ├── mobile/            # React Native (iOS, Android)
│   ├── cli/               # CLI and daemon
│   ├── server/            # Headless server
│   └── web/               # Browser client
├── packages/
│   ├── interface/         # Shared React UI
│   ├── ts-client/         # Auto-generated TypeScript client
│   ├── ui/                # Component library
│   └── assets/            # Icons, images, SVGs
├── crates/                # Standalone Rust crates (ffmpeg, crypto, etc.)
├── adapters/              # Script-based data source adapters
└── schemas/               # TOML data type schemas

Getting Started

Requires Rust 1.81+, Bun 1.3+, just, and Python 3.9+ (for adapters).

git clone https://github.com/spacedriveapp/spacedrive
cd spacedrive

just setup        # bun install + native deps + cargo config
just dev-desktop  # launch the desktop app (auto-starts daemon)
just test         # run all workspace tests

Privacy & Security

Spacedrive is local-first. Your data stays on your devices.

  • End-to-End Encryption — all P2P traffic encrypted via QUIC/TLS
  • At-Rest Encryption — libraries can be encrypted on disk (SQLCipher)
  • No Telemetry — zero tracking or analytics
  • Self-Hostable — run your own relay servers
  • Data Sovereignty — you control where your data lives

Optional cloud integration is available for backup and remote access, but it's never required. The cloud service runs unmodified Spacedrive core as a standard P2P device with no special privileges.


Contributing


License

FSL-1.1-ALv2 — Functional Source License, converting to Apache 2.0 after two years.
