mirror of
https://github.com/spacedriveapp/spacedrive.git
synced 2026-06-04 05:37:16 -04:00
* feat: add redundancy awareness & cross-volume file comparison Surface content redundancy data so users can answer "if this drive dies, what do I lose?" — builds on existing content identity and volume systems. Backend: - New `redundancy.summary` library query with per-volume at-risk vs redundant byte/file counts and a library-wide replication score - Extend `SearchFilters` with `at_risk`, `on_volumes`, `not_on_volumes`, `min_volume_count`, `max_volume_count` filters - Add composite index migration on entries(content_id, volume_id) Frontend: - `/redundancy` dashboard with replication score, volume bars, at-risk callout - `/redundancy/at-risk` paginated file list sorted by size - `/redundancy/compare` two-volume comparison (unique/shared toggle) - Sidebar ShieldCheck button linking to redundancy view Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * redundancy UI improvements + ZFS volume detection fix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix ZFS pool capacity reporting and stats filtering - ZFS: override total_capacity for pool-root volumes using zfs list used+available. df under-reports pool-root Size because it only counts the root dataset's own used bytes plus avail — on a 60 TB raidz2 pool this shows as ~15 TB instead of ~62 TB. The pool root's own used property includes descendants, so used+available is the real usable capacity. - Library stats: drop volumes where is_user_visible=false AND re-apply should_hide_by_mount_path retroactively so stale DB rows (detected before the Linux visibility filters existed) don't inflate reported capacity. - Extract should_hide_by_mount_path into volume/utils as a shared helper used by both the list query and the stats calculation. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * show capacity and visibility in sd volume list Makes it possible to verify library-level capacity aggregation from the CLI — previously the list only showed mount, fingerprint, and tracked/mounted state, which meant debugging the ZFS pool capacity issue required querying the library DB directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * document filesystem support matrix and detection New docs/core/filesystems.mdx covering per-filesystem capabilities (CoW, pool-awareness, visibility filtering, capacity correction), platform detection strategies, the FilesystemHandler trait, Linux/ macOS/ZFS visibility rules, the ZFS pool-root capacity problem and fix, copy strategy selection, and known limitations. Registered under File Management in both mint.json and docs.json. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * WIP: redundancy filter wiring across search, CLI, and UI - core/src/ops/search: wire redundancy filters (at_risk, on_volumes, not_on_volumes, min/max volume_count) through the search query; fix UUID-to-SQLite BLOB literal so volume UUID comparisons actually match (volumes.uuid is stored as a 16-byte BLOB, quoted-string comparison silently returned zero rows). - apps/cli: new redundancy subcommand + populate the new SearchFilters fields from search args. - packages/interface: redundancy at-risk and compare pages reworked to consume the new filter surface; explorer context/hook updates to support redundancy-scoped views. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * add web context menu renderer and UI polish - New WebContextMenuProvider + Radix DropdownMenu-based renderer anchored at cursor via a 1x1 virtual trigger. Handles separators, submenus, disabled, and the danger variant via text-status-error. 
- useContextMenu now routes web clicks through the provider instead of parking data in unused local state, and trims leading/trailing/adjacent separators so condition-filtered menus don't render orphaned lines. - Drop app-frame corner rounding on the web build. - Add shrink-0 to the sidebar space switcher so the scrollable sibling can't compress it vertically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * sd-server dev workflow: auto web build, shutdown watchdog, stable data dir - build.rs runs `bun run build` in apps/web so `just dev-server` always embeds the latest UI. rerun-if-changed covers apps/web/src, packages/interface/src, and packages/ts-client/src so Rust-only edits skip the rebuild. Skips gracefully when bun isn't on PATH or SD_SKIP_WEB_BUILD is set; Dockerfile sets the latter since dist is pre-built and bun isn't in the Rust stage. - Graceful shutdown was hanging because the browser holds the /events SSE stream open forever and axum waits for all connections to drain. After the first signal, arm a background force-exit on second Ctrl+C or 5s timeout so the process can't stick. - Debug builds were starting from a fresh tempfile::tempdir() on every run (the TempDir handle dropped at end of the closure, deleting the dir we just took a path to). Default to ~/.spacedrive in debug so data persists and `just dev-server` shares a data dir with the Tauri app. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * add Sources space item alongside Redundancy in default library layout Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * add TrueNAS native build script Uses zig cc as C/C++ compiler on TrueNAS Scale where /usr is read-only and no system gcc exists. Dev tools live at /mnt/pool/dev-tools/ (zig, cmake, make, extracted deb headers). Builds sd-server + sd-cli in ~4 min on a 12-core NAS. AI feature disabled (whisper.cpp C11 atomics incompatible with zig clang-18). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * never block RPC on synchronous statistics calculation On first load (fresh library, all stats zero), libraries.info used to calculate statistics synchronously before responding. On large libraries during active indexing this hangs indefinitely — the closure-table walk in calculate_file_statistics loads every descendant ID into a Vec then issues a WHERE IN(...) with millions of entries, which SQLite can't finish while the indexer is writing. Now always return cached (possibly zero) stats and let the background recalculate_statistics task fill them in. The UI refreshes via the ResourceChanged event when the calculation completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * self-heal protocol handler registration on re-init Core::new() registers default protocol handlers after starting networking, but swallows any failure (error is only logged). If the initial registration fails — e.g. on a host where start_networking hasn't fully set up the event loop command sender by the time register_default_protocol_handlers runs — the registry is left empty. A subsequent call to Core::init_networking() would see `services.networking().is_some()` and skip re-registration, permanently leaving protocols unregistered for the life of the process. sd-server calls init_networking() right after Core::new(), so it's the client most exposed to this. Symptom: pairing over the web UI returns "Pairing protocol not registered" while the same library works fine from Tauri and mobile. Fix: init_networking now queries the registry directly for the pairing handler and re-registers the default set if it's missing, independent of whether networking is already initialized. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fall back to pkarr+DNS discovery when mDNS port is unavailable Iroh's endpoint.bind() fails wholesale if any configured discovery service fails to initialize. 
MdnsDiscovery requires binding UDP :5353, which on most Linux systems (including TrueNAS) is already owned by avahi-daemon. Result: endpoint creation errors out with "Service 'mdns' error", the event loop never starts, command_sender stays None, and protocol registration fails — so sd-server has no working networking at all. Make mDNS best-effort: on any error whose message mentions "mdns", retry endpoint creation with only pkarr + DNS discovery. Local-network auto-discovery is lost but remote pairing via node ID (which uses n0's DNS infrastructure, not mDNS) continues to work normally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * succeed pairing if either mDNS or relay discovery wins The dual-path discovery in start_pairing_as_joiner_with_code used tokio::select! to race mDNS and relay. select! resolves on the first branch to complete — including errors — so a host that can't bind mDNS (e.g. a Linux box where avahi already owns UDP :5353) would fail pairing wholesale: mDNS discovery errors out in <1ms with "Failed to create mDNS discovery: Service 'mdns' error", that Err wins the race, and relay discovery gets cancelled before it can even begin. Switch to futures::select_ok so we only return the error if EVERY discovery path has failed. mDNS failing immediately now leaves relay running to completion, which is the common case for remote pairing into a NAS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
224 lines
14 KiB
Plaintext
---
title: Filesystems
sidebarTitle: Filesystems
---

Spacedrive treats the native filesystem as the substrate for everything above it. Detection, capacity reporting, copy-on-write, visibility filtering, and same-storage checks all have filesystem-specific behavior because the abstractions leak: `df` lies about ZFS pool sizes, APFS volumes share containers, Btrfs subvolumes look independent but aren't, and Windows mount points rename themselves.

This page documents what Spacedrive knows about each filesystem, how it detects them, and where the abstraction boundaries are.

## Support Matrix

| Filesystem | Platform | CoW / Clones | Pool-aware | Visibility filter | Capacity correction |
|---|---|---|---|---|---|
| APFS | macOS, iOS | yes (clonefile) | yes (containers) | yes (system volumes) | no |
| Btrfs | Linux | yes (reflink) | yes (subvolumes) | yes (via Linux rules) | no |
| ZFS | Linux | yes (reflink on recent ZoL) | yes (pools) | yes (system pools, apps) | yes (pool root) |
| ReFS | Windows | yes (block clone) | no | no | no |
| NTFS | Windows | no | no | no | no |
| ext2/3/4 | Linux | no | no | yes (via Linux rules) | no |
| XFS | Linux | no | no | yes (via Linux rules) | no |
| FAT32, exFAT | all | no | no | no | no |
| HFS+ | macOS | no | no | yes (system volumes) | no |

"CoW / Clones" means that `std::fs::copy` and the `FastCopyStrategy` produce metadata-only copies when source and destination are on the same filesystem. Everything else falls back to `LocalStreamCopyStrategy`, which streams bytes with progress reporting.

## Detection

Volume detection runs at startup and on mount/unmount events. Each platform uses a different primary source:

### macOS (`core/src/volume/platform/macos.rs`)

Primary: `diskutil apfs list`, which gives APFS container topology, volume roles (`Data`, `System`, `VM`, `Preboot`, `Recovery`, `Update`), and mount points. Containers group volumes that share physical storage and space (`ApfsContainer`).

Fallback: `df -h -T` for non-APFS volumes (HFS+, external FAT32, etc.).

Classification:
- `/`, `/System/Volumes/Data`, `/System/Volumes/Preboot` etc. are system roles: hidden from the user-visible view but still fingerprinted.
- `/Volumes/*` mounts that aren't system roles are External.

### Linux (`core/src/volume/platform/linux.rs`)

Primary: `df -h -T`, which prints one line per mounted filesystem with device, type, size, available, and mount point.

Secondary: `/sys/block/<device>/queue/rotational` to distinguish SSD from HDD. `/proc/mounts` is also parseable via `parse_proc_mounts()` as an alternative source.

ZFS datasets get a second pass via `zfs list -H -o name,mountpoint,used,available,type -t filesystem` to enrich each volume with dataset/pool information (see [Capacity Reporting](#capacity-reporting) below).

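As a rough illustration of the primary Linux source, the sketch below parses one data row of GNU `df -h -T` output (columns: Filesystem, Type, Size, Used, Avail, Use%, Mounted on). The `DfRow` struct and `parse_df_row` function are hypothetical names for this example, not Spacedrive's actual parser, and sizes are kept as the human-readable strings `df -h` prints.

```rust
/// One data row of `df -h -T` output.
/// Hypothetical struct for illustration, not a Spacedrive type.
#[derive(Debug, PartialEq)]
pub struct DfRow {
    pub device: String,
    pub fs_type: String,
    pub size: String,
    pub used: String,
    pub available: String,
    pub mount_point: String,
}

/// Parse a single row. Returns `None` for the header row and for
/// lines that do not have all seven columns.
pub fn parse_df_row(line: &str) -> Option<DfRow> {
    let mut fields = line.split_whitespace();
    let device = fields.next()?.to_string();
    let fs_type = fields.next()?.to_string();
    let size = fields.next()?.to_string();
    let used = fields.next()?.to_string();
    let available = fields.next()?.to_string();
    let _use_percent = fields.next()?;

    // Mount points may contain spaces, so rejoin the remaining fields.
    // Runs of multiple spaces inside a mount point collapse to one here.
    let mount_point = fields.collect::<Vec<_>>().join(" ");

    // Real mount points are absolute paths; this also rejects the header,
    // whose last column reads "Mounted on".
    if !mount_point.starts_with('/') {
        return None;
    }

    Some(DfRow { device, fs_type, size, used, available, mount_point })
}
```

A caller would feed every line of the command output through `parse_df_row` and keep the `Some` results; converting the size strings to bytes is left out of the sketch.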
### Windows (`core/src/volume/platform/windows.rs`)

Uses Win32 APIs via `windows-sys`:
- `GetLogicalDrives` to enumerate drive letters.
- `GetVolumeInformationW` for filesystem type and volume label.
- `GetDiskFreeSpaceExW` for capacity.
- `GetVolumeNameForVolumeMountPointW` for the stable `\\?\Volume{GUID}\` path, used as a hardware identifier that survives drive letter changes.

### iOS (`core/src/volume/platform/ios.rs`)

Uses the macOS APFS code path but restricted to app-accessible volumes (sandboxed; detection is mostly informational).

## FilesystemHandler trait

`core/src/volume/fs/mod.rs` defines a trait each filesystem implements:

```rust
#[async_trait]
pub trait FilesystemHandler: Send + Sync {
    /// Add filesystem-specific fields to a Volume (dataset info, container, subvolume, etc.)
    async fn enhance_volume(&self, volume: &mut Volume) -> VolumeResult<()>;

    /// Can these two paths use fast same-storage operations (clone/reflink)?
    async fn same_physical_storage(&self, path1: &Path, path2: &Path) -> bool;

    /// Copy strategy to use for this filesystem
    fn get_copy_strategy(&self) -> Box<dyn CopyStrategy>;

    /// Filesystem-specific contains-path check (accounts for datasets/subvolumes/etc.)
    fn contains_path(&self, volume: &Volume, path: &Path) -> bool;
}
```

`get_filesystem_handler(FileSystem)` returns the right implementation, falling back to `GenericFilesystemHandler` for anything unrecognized.

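The dispatch-with-fallback pattern described above can be sketched in a heavily simplified, synchronous form. Everything here is an assumption made for illustration: the `Handler` trait, the reduced `FileSystem` enum, and `get_handler` are stand-ins, and the real trait is async and operates on `Volume` values rather than bare mount points.

```rust
use std::path::Path;

/// Reduced stand-in for the real `FileSystem` enum (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileSystem {
    Zfs,
    Ext4,
    Other,
}

/// Simplified, synchronous stand-in for `FilesystemHandler`.
pub trait Handler {
    fn name(&self) -> &'static str;

    /// Default contains-path check: component-wise prefix match, so
    /// `/mnt/pool2` is NOT considered inside `/mnt/pool`.
    fn contains_path(&self, mount_point: &Path, path: &Path) -> bool {
        path.starts_with(mount_point)
    }
}

pub struct ZfsHandler;
pub struct GenericHandler;

impl Handler for ZfsHandler {
    fn name(&self) -> &'static str {
        "zfs"
    }
}

impl Handler for GenericHandler {
    fn name(&self) -> &'static str {
        "generic"
    }
}

/// Return a dedicated handler where one exists, else the generic fallback.
pub fn get_handler(fs: FileSystem) -> Box<dyn Handler> {
    match fs {
        FileSystem::Zfs => Box::new(ZfsHandler),
        _ => Box::new(GenericHandler),
    }
}
```

Returning a boxed trait object keeps call sites independent of the concrete filesystem, which is what lets unrecognized types share one fallback path.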
## Per-filesystem details

### APFS (`core/src/volume/fs/apfs.rs`)

- **Containers**: APFS groups volumes into containers that share physical space. `ApfsContainer` is populated from `diskutil apfs list` and attached to each volume. `same_physical_storage` returns true when two paths are on volumes in the same container, which is when `clonefile(2)` produces instant clones.
- **Firmlinks**: macOS silently maps paths like `/Users` onto `/System/Volumes/Data/Users`. `generate_macos_path_mappings()` materializes these mappings so `contains_path` resolves correctly.
- **Role-based visibility**: volumes with roles `System`, `VM`, `Preboot`, `Recovery`, `Update` are marked `is_user_visible = false`. Only `Data` and unroled external volumes appear in the default UI.

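To make the firmlink bullet concrete, here is a minimal sketch of prefix-based path mapping. The `FIRMLINKS` table and `resolve_firmlink` function are hypothetical and contain only the `/Users` mapping mentioned above; what `generate_macos_path_mappings()` actually produces is not shown on this page.

```rust
use std::path::{Path, PathBuf};

/// Illustrative firmlink table: (user-facing prefix, real location).
/// Only the one mapping named in the text is listed; assume the real
/// table is longer.
const FIRMLINKS: &[(&str, &str)] = &[("/Users", "/System/Volumes/Data/Users")];

/// Rewrite a user-facing path to its location on the Data volume.
/// Paths that match no firmlink are returned unchanged.
pub fn resolve_firmlink(path: &Path) -> PathBuf {
    for (link, target) in FIRMLINKS {
        // `strip_prefix` matches whole path components, so `/Usersx`
        // does not match the `/Users` firmlink.
        if let Ok(rest) = path.strip_prefix(link) {
            let mut resolved = PathBuf::from(target);
            if !rest.as_os_str().is_empty() {
                resolved.push(rest);
            }
            return resolved;
        }
    }
    path.to_path_buf()
}
```

With a mapping like this applied first, a plain prefix check against the Data volume's mount point can recognize `/Users/...` paths as belonging to that volume.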
### Btrfs (`core/src/volume/fs/btrfs.rs`)

- **Subvolumes**: `btrfs subvolume show <path>` populates `SubvolumeInfo`. Subvolumes on the same Btrfs filesystem share storage.
- **Reflinks**: `same_physical_storage` checks whether two paths share the top-level Btrfs filesystem via `btrfs filesystem show`. If yes, reflinks work between them.

### ZFS (`core/src/volume/fs/zfs.rs`)

ZFS is the most-developed filesystem integration because TrueNAS Scale is a common Spacedrive server target.

- **Datasets and pools**: `zfs list` output is parsed once per detection pass via `fetch_zfs_list_output()` (not per volume, which matters on servers with many datasets). Each volume gets matched to its dataset via `find_dataset_for_path`, and the dataset's pool is extracted from the name (`pool/a/b` → pool `pool`).
- **Pool root capacity correction**: see [Capacity Reporting](#capacity-reporting).
- **System pool filter**: `is_system_zfs_pool` matches `boot-pool`, `rpool`, `zroot`. Datasets on these pools are marked `VolumeType::System`, `is_user_visible = false`, and never auto-tracked.
- **App-managed dataset filter**: `is_app_managed_dataset` matches names containing `/ix-applications/`, `/.ix-apps/`, `/docker/`, or `/containerd/`. These are hidden from user view. TrueNAS Scale apps create dozens of nested datasets per app; without this filter the volume list becomes unusable.
- **Clone support**: `supports_clones` returns true for any read-write dataset. ZoL 2.2+ supports reflinks; older versions fall back to streaming copy.

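The dataset-matching bullet can be sketched as follows: parse the tab-separated `zfs list -H` output once, then pick the dataset with the longest mountpoint that contains a given path. The `Dataset` struct and both function names are hypothetical; only the first two columns (`name`, `mountpoint`) are read, and the longest-prefix rule is an assumption about how `find_dataset_for_path` resolves nested datasets.

```rust
use std::path::Path;

/// Hypothetical reduced view of one `zfs list` row.
#[derive(Debug, Clone, PartialEq)]
pub struct Dataset {
    pub name: String,
    pub mountpoint: String,
}

impl Dataset {
    /// The pool is the first `/`-separated component: `pool/a/b` -> `pool`.
    pub fn pool_name(&self) -> &str {
        self.name.split('/').next().unwrap_or(self.name.as_str())
    }
}

/// Parse `zfs list -H -o name,mountpoint,...` output.
/// `-H` means no header and tab-separated columns.
pub fn parse_zfs_list(output: &str) -> Vec<Dataset> {
    output
        .lines()
        .filter_map(|line| {
            let mut cols = line.split('\t');
            let name = cols.next()?.trim();
            let mountpoint = cols.next()?.trim();
            // Skip datasets without a real mount path (e.g. "none", "legacy").
            if name.is_empty() || !mountpoint.starts_with('/') {
                return None;
            }
            Some(Dataset {
                name: name.to_string(),
                mountpoint: mountpoint.to_string(),
            })
        })
        .collect()
}

/// Most specific dataset whose mountpoint contains `path`.
pub fn dataset_for_path<'a>(datasets: &'a [Dataset], path: &Path) -> Option<&'a Dataset> {
    datasets
        .iter()
        .filter(|d| path.starts_with(&d.mountpoint))
        .max_by_key(|d| d.mountpoint.len())
}
```

Parsing once and reusing the `Vec<Dataset>` for every volume mirrors the "once per detection pass" point: the lookup per volume is then a cheap in-memory scan rather than another `zfs` process.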
### ReFS (`core/src/volume/fs/refs.rs`)

- **Block cloning**: checks for ReFS integrity stream support via `DeviceIoControl` / `FSCTL_DUPLICATE_EXTENTS_TO_FILE`. Sets `supports_block_cloning` on the volume.
- **Version gating**: ReFS 3.x supports block cloning; 2.x doesn't. The handler feature-detects rather than version-checks.

### NTFS (`core/src/volume/fs/ntfs.rs`)

NTFS has no CoW primitive, so `get_copy_strategy` returns `LocalStreamCopyStrategy`. The handler mainly exists to provide NTFS-aware `same_physical_storage` (it compares Volume GUIDs, not drive letters).

### Generic (`core/src/volume/fs/generic.rs`)

Fallback for ext2/3/4, XFS, FAT32, exFAT, HFS+, and anything unrecognized. `same_physical_storage` compares mount point roots. Copy strategy is always `LocalStreamCopyStrategy`.

## Visibility rules

Spacedrive tracks far more volumes than it shows. Hidden volumes still get stable fingerprints so locations on them survive remounts, but they don't clutter the default UI and aren't eligible for auto-tracking.

Two flags drive this:
- `is_user_visible: bool`: shown in the default volume list.
- `auto_track_eligible: bool`: picked up by `volumes.scan`. Always implies `is_user_visible`.

### Linux rules (`core/src/volume/utils.rs`)

`is_virtual_filesystem(fs_type)` drops virtual and pseudo filesystems that don't represent user storage: `tmpfs`, `proc`, `sysfs`, `devtmpfs`, `cgroup`, `cgroup2`, `squashfs`, `efivarfs`, `overlay`, `fuse`, and about 20 more. These are hidden even before classification.

`is_system_mount_point(path)` matches Linux OS paths:
- Exact: `/`, `/usr`, `/var`, `/etc`, `/opt`, `/srv`, `/root`, `/boot`, `/home`, `/run`, `/dev`, `/proc`, `/sys`, `/tmp`, `/audit`, `/data`, `/conf`, `/mnt`, `/lost+found`.
- Prefixes: `/boot/`, `/sys/`, `/proc/`, `/dev/`, `/run/`, `/var/log`, `/var/db/`, `/var/lib/systemd`, `/var/local/`, `/var/cache/`.

The exact-match list includes TrueNAS Scale's split-root datasets (it mounts `/usr`, `/var`, `/etc` as separate ZFS datasets for atomic OS updates).

`is_nested_app_mount(path)` matches container/app mounts:
- Anything under `ix-applications/` or `.ix-apps/` (TrueNAS apps; one app creates dozens of datasets).
- `docker/overlay2/`, `containerd/`, `kubelet/`, `snap/`.
- `.snapshots/`, `.zfs/snapshot/` (ZFS snapshot browsing mounts).

`should_hide_by_mount_path(path)` is the combined check. It's applied at:
1. **Detection**: so newly-discovered volumes get `is_user_visible = false` persistently.
2. **Volume list query** (`core/src/ops/volumes/list/query.rs`): retroactively for tracked volumes whose DB rows predate these filters.
3. **Stats calculation** (`core/src/library/mod.rs`): so `total_capacity` and `available_capacity` exclude hidden volumes even if the DB flag is stale.

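The path-based rules above compose naturally into one predicate. The sketch below assumes the combined check is a plain OR of the two path predicates and uses only a subset of the listed paths; the function names match the ones in the text, but the bodies are illustrative rather than copied from `core/src/volume/utils.rs`.

```rust
/// Exact-match system mount points (subset of the list above).
const SYSTEM_EXACT: &[&str] = &["/", "/usr", "/var", "/etc", "/boot", "/home", "/mnt"];

/// Prefix-match system mount points (subset of the list above).
const SYSTEM_PREFIXES: &[&str] = &["/boot/", "/sys/", "/proc/", "/dev/", "/run/", "/var/log"];

/// Path fragments that mark container, app, or snapshot mounts (subset).
const NESTED_APP_MARKERS: &[&str] = &[
    "ix-applications/",
    ".ix-apps/",
    "docker/overlay2/",
    ".zfs/snapshot/",
];

pub fn is_system_mount_point(path: &str) -> bool {
    SYSTEM_EXACT.contains(&path) || SYSTEM_PREFIXES.iter().any(|p| path.starts_with(p))
}

pub fn is_nested_app_mount(path: &str) -> bool {
    NESTED_APP_MARKERS.iter().any(|m| path.contains(m))
}

/// Combined check (assumed here to be a simple OR of the two predicates).
pub fn should_hide_by_mount_path(path: &str) -> bool {
    is_system_mount_point(path) || is_nested_app_mount(path)
}
```

Because the predicate depends only on the mount path, it can be re-run at query and stats time against rows that were stored before the filters existed, which is what makes the retroactive application in steps 2 and 3 possible.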
### ZFS-specific rules

Applied during ZFS enhancement after `should_hide_by_mount_path`:
- Datasets on `is_system_zfs_pool` pools (boot-pool, rpool, zroot) → hidden + `VolumeType::System`.
- Datasets matching `is_app_managed_dataset` → hidden.

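A minimal sketch of the two ZFS rules, assuming pool names are compared exactly and dataset names are matched by substring. The `Visibility` enum and `classify_dataset` are hypothetical helpers added for the example; the real code sets `is_user_visible` and `VolumeType` on the volume instead.

```rust
/// Pools that hold the operating system rather than user data.
pub fn is_system_zfs_pool(pool: &str) -> bool {
    matches!(pool, "boot-pool" | "rpool" | "zroot")
}

/// Datasets created and managed by container or app runtimes.
pub fn is_app_managed_dataset(dataset: &str) -> bool {
    ["/ix-applications/", "/.ix-apps/", "/docker/", "/containerd/"]
        .iter()
        .any(|marker| dataset.contains(marker))
}

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Visibility {
    Visible,
    Hidden,
    /// Hidden and additionally classified as a system volume.
    HiddenSystem,
}

/// Apply the system-pool rule first, then the app-managed rule.
pub fn classify_dataset(dataset: &str) -> Visibility {
    // `pool/a/b` -> `pool`
    let pool = dataset.split('/').next().unwrap_or(dataset);
    if is_system_zfs_pool(pool) {
        Visibility::HiddenSystem
    } else if is_app_managed_dataset(dataset) {
        Visibility::Hidden
    } else {
        Visibility::Visible
    }
}
```

Note that a substring pattern such as `/ix-applications/` only matches datasets nested below that name, not a dataset that ends in `ix-applications` itself.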
### macOS rules

APFS role-based: `System`, `VM`, `Preboot`, `Recovery`, `Update` roles are hidden. Everything under `/System/Volumes/*` except `/System/Volumes/Data` is also hidden by path.

## Capacity reporting

### The df-for-ZFS problem

`df -T` reports `Size = used + available` per mounted dataset. For a ZFS **leaf** dataset this is fine. For a ZFS **pool root** it's misleading:

```
$ df -T /mnt/pool
Filesystem  Type  Size   Used  Available  Mount
pool        zfs   15.0T  199M  14.9T      /mnt/pool
```

The pool root's own `used` is tiny (199 MB) because all the real data lives in descendant datasets. `df` doesn't know that. On a 60 TB pool that's 75% full, `df` says the pool root is "15 TB", which is essentially just the free space.

ZFS's native `used` property on the pool root *does* include descendants:

```
$ zfs get used,available pool
pool  used       47.0T
pool  available  14.9T
```

47.0 T + 14.9 T = 61.9 T ≈ 62 T, which is the real usable pool capacity after raidz2 parity.

### Correction

`enhance_volume_with_cached_output` in `zfs.rs` detects pool-root volumes (`dataset.name == dataset.pool_name`) and overwrites `total_capacity` with `used + available` from `zfs list`. Leaf datasets keep their df-derived values, which are accurate for single-dataset views.

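The correction rule is small enough to express directly. In this sketch the `DatasetUsage` struct and `corrected_total_capacity` function are hypothetical, and all values are assumed to be byte counts already parsed from `zfs list`.

```rust
/// Hypothetical per-dataset usage record, in bytes.
#[derive(Debug, Clone)]
pub struct DatasetUsage {
    pub name: String,
    pub pool_name: String,
    pub used: u64,
    pub available: u64,
}

/// For a pool root (dataset name equals pool name), the total is
/// `used + available` as reported by ZFS, because the root's `used`
/// includes all descendants. Any other dataset keeps the df-derived total.
pub fn corrected_total_capacity(ds: &DatasetUsage, df_total: u64) -> u64 {
    if ds.name == ds.pool_name {
        ds.used.saturating_add(ds.available)
    } else {
        df_total
    }
}
```

With the figures from the example above (47.0 T used, 14.9 T available), the pool root's total becomes 61.9 T instead of the roughly 15 T that `df` reports.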
### Library statistics

`calculate_volume_capacity` (and `_static`) in `core/src/library/mod.rs` aggregates per-volume capacity in five steps:

1. Filter by `volume_type` (`Primary`, `UserData`, `External`, `Secondary`).
2. Filter by visibility (`is_user_visible = true` *and* `!should_hide_by_mount_path(mount)`).
3. Deduplicate by fingerprint.
4. Sort by mount-path length (shortest first).
5. For each volume: skip if it's a subpath of an already-counted volume on the same device; otherwise add its capacity to the running totals.

Subpath dedup handles the common ZFS case: when `/mnt/pool` is tracked along with `/mnt/pool/footage` and `/mnt/pool/cctv`, only `/mnt/pool` gets counted (once).

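The aggregation steps above can be sketched as one function. The `Vol` struct, its field names, and `aggregate_capacity` are hypothetical; in particular `countable_type` stands in for the `volume_type` filter of step 1, and the hide predicate is passed in as a closure so the sketch stays self-contained.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical reduced volume record for this sketch.
#[derive(Debug, Clone)]
pub struct Vol {
    pub fingerprint: String,
    pub device_id: String,
    pub mount: String,
    /// True when `volume_type` is one of the counted types (step 1).
    pub countable_type: bool,
    pub is_user_visible: bool,
    pub total: u64,
    pub available: u64,
}

/// Returns `(total_capacity, available_capacity)` over the given volumes.
pub fn aggregate_capacity(volumes: &[Vol], should_hide: impl Fn(&str) -> bool) -> (u64, u64) {
    let mut seen = HashSet::new();
    let mut kept: Vec<&Vol> = volumes
        .iter()
        .filter(|v| v.countable_type) // step 1: volume type
        .filter(|v| v.is_user_visible && !should_hide(v.mount.as_str())) // step 2: visibility
        .filter(|v| seen.insert(v.fingerprint.as_str())) // step 3: fingerprint dedup
        .collect();

    // Step 4: shortest mount path first, so parents are counted before children.
    kept.sort_by_key(|v| v.mount.len());

    // Step 5: skip volumes nested under an already-counted one on the same device.
    let mut counted: Vec<&Vol> = Vec::new();
    let (mut total, mut available) = (0u64, 0u64);
    for v in kept {
        let nested = counted
            .iter()
            .any(|c| c.device_id == v.device_id && Path::new(&v.mount).starts_with(&c.mount));
        if nested {
            continue;
        }
        total += v.total;
        available += v.available;
        counted.push(v);
    }
    (total, available)
}
```

Sorting before the subpath pass matters: if a child such as `/mnt/pool/footage` were visited before `/mnt/pool`, both would be counted.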
### Pool-aware dedup limitation

Subpath dedup breaks if the user tracks only leaf datasets without the pool root. Each leaf reports the pool's full `available` as its own, so summing them over-counts by the pool's free space once per extra leaf.

On TrueNAS this doesn't bite because `df` always detects the pool root. For other setups, a proper fix requires either persisting `pool_name` on the volume record or a second dedup pass keyed on `(device_id, file_system=ZFS, available_capacity)`. Neither is implemented yet.

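To illustrate the over-count and the first of the two proposed fixes, the sketch below compares a naive sum with a pool-keyed sum. This is not implemented in Spacedrive: `LeafVolume`, its `pool_name` field, and both functions are hypothetical, and taking the maximum `available` per pool is an assumption made here to tolerate leaves that report slightly different values (for example under quotas).

```rust
use std::collections::HashMap;

/// Hypothetical leaf-dataset record with a persisted pool name.
#[derive(Debug, Clone)]
pub struct LeafVolume {
    pub pool_name: String,
    /// Free space as reported by this leaf (shared across the pool).
    pub available: u64,
}

/// Naive sum: every leaf contributes the shared free space again.
pub fn naive_available(leaves: &[LeafVolume]) -> u64 {
    leaves.iter().map(|l| l.available).sum()
}

/// Pool-keyed sum: count free space once per pool.
pub fn pool_aware_available(leaves: &[LeafVolume]) -> u64 {
    let mut per_pool: HashMap<&str, u64> = HashMap::new();
    for leaf in leaves {
        let slot = per_pool.entry(leaf.pool_name.as_str()).or_insert(0);
        *slot = (*slot).max(leaf.available);
    }
    per_pool.values().sum()
}
```

With two leaves on one pool that each report 15 units free, the naive sum yields 30 while the pool-keyed sum yields 15.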
## Copy strategies

`core/src/ops/files/copy/strategy.rs` defines three strategies:

- **`LocalMoveStrategy`**: `fs::rename()` for same-volume moves. Metadata-only.
- **`FastCopyStrategy`**: `std::fs::copy()`, which invokes platform CoW primitives (`clonefile` on APFS, `FICLONE`/`FICLONERANGE` on Btrfs/ZFS, block cloning on ReFS) when source and destination are on the same filesystem. Falls back to streaming if CoW fails.
- **`LocalStreamCopyStrategy`**: chunked buffered copy with progress events. Used for cross-volume copies and for filesystems without CoW.

`FilesystemHandler::get_copy_strategy` picks `FastCopyStrategy` for APFS, Btrfs, ZFS, and ReFS. Everything else gets `LocalStreamCopyStrategy`.

Note that `std::fs::copy` itself picks the right syscall. The `FastCopyStrategy`/`LocalStreamCopyStrategy` split is about *whether to try fast copy at all* and how to report progress, not about which syscall to call.

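The selection logic described in this section might be summarized as below. This is a sketch under assumptions: the `FileSystem` and `Strategy` enums and `pick_strategy` are illustrative stand-ins, and folding the move, same-storage, and CoW checks into one function is a simplification of however the real code divides that decision between the handler and the copy operation.

```rust
/// Reduced stand-in for the real `FileSystem` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileSystem {
    Apfs,
    Btrfs,
    Zfs,
    Refs,
    Ntfs,
    Ext4,
    Other,
}

/// Stand-ins for the three strategy types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Strategy {
    LocalMove,
    FastCopy,
    LocalStreamCopy,
}

/// Filesystems for which fast copy is attempted, per the text above.
pub fn supports_cow(fs: FileSystem) -> bool {
    matches!(
        fs,
        FileSystem::Apfs | FileSystem::Btrfs | FileSystem::Zfs | FileSystem::Refs
    )
}

/// Pick a strategy for one operation.
/// - `is_move` with `same_volume`: a rename is enough.
/// - `same_storage` on a CoW filesystem: try a clone/reflink copy.
/// - anything else: stream bytes with progress reporting.
pub fn pick_strategy(fs: FileSystem, is_move: bool, same_volume: bool, same_storage: bool) -> Strategy {
    if is_move && same_volume {
        Strategy::LocalMove
    } else if same_storage && supports_cow(fs) {
        Strategy::FastCopy
    } else {
        Strategy::LocalStreamCopy
    }
}
```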
See [File Copy Operations](/docs/core/file-copy-operations) for the higher-level copy/move API.

## Known limitations

- **Leaf-only ZFS dataset tracking**: see [Pool-aware dedup limitation](#pool-aware-dedup-limitation).
- **Windows detection is shallow**: we get capacity and FS type, but not the storage-pool topology that Storage Spaces / ReFS mirroring exposes. Same-pool detection across ReFS volumes isn't implemented.
- **Btrfs subvolume visibility**: we detect subvolumes but don't hide nested subvolumes created by Docker or snapper. An equivalent of the ZFS `is_app_managed_dataset` check would need a similar name-based filter.
- **Network filesystems (NFS, SMB)**: treated as `MountType::Network`, but with no protocol-aware capacity or CoW handling. `Available` comes from whatever the server reports via statvfs.
- **Encrypted volumes (LUKS, FileVault, BitLocker)**: opaque to us once mounted; they appear as whatever filesystem is layered on top.