mirror of
https://github.com/spacedriveapp/spacedrive.git
synced 2026-02-20 15:43:58 -05:00
This commit refines the volume ownership model by ensuring that entries and locations inherit ownership from their respective volumes. It updates the documentation to clarify the sync ownership flow, emphasizing the seamless transfer of ownership when external drives are connected to different devices. Additionally, it improves the overall clarity of the sync state machine and related processes, ensuring that the documentation accurately reflects the system's behavior and enhances understanding for developers and users.
568 lines
19 KiB
Plaintext
---
title: Data Model
sidebarTitle: Data Model
---

Spacedrive's data model powers a Virtual Distributed File System (VDFS) that unifies files across all your devices. It enables instant organization, content deduplication, and powerful semantic search while maintaining performance at scale.

## Core Design

The system separates concerns into distinct entities:

- **SdPath** - Addresses any file: local, on a peer device, in the cloud, or by content ID
- **Entry** - File and directory representation
- **ContentIdentity** - Unique file content for deduplication
- **UserMetadata** - Organization data (tags, notes, favorites)
- **Location** - Monitored directories
- **Device** - Individual machines in your network
- **Volume** - Real storage volumes: local, external, and cloud
- **Sidecar** - Derivative and associated data

## Domain vs. Database Entity Models

It is critical to understand the distinction between two data modeling layers in Spacedrive:

- **Domain Models**: These are the rich objects used throughout the application's business logic. They contain computed fields and methods that provide a high-level interface to the underlying data. For example, the `domain::File` structure combines data from several database models, such as `entities::entry`, `entities::content_identity`, and `entities::user_metadata`.
- **Database Entity Models**: These are simpler structs that map directly to the database tables (e.g., `entities::entry`). They represent the raw, persisted state of the data and are optimized for storage and query performance.

The code examples in this document generally refer to the database entity models, so they accurately represent what is stored on disk. The domain models provide a convenient abstraction over this raw data.

## SdPath

The `SdPath` enum is the universal addressing system for files across all storage backends:

```rust
pub enum SdPath {
    /// A direct pointer to a file on a specific local device
    Physical {
        device_slug: String, // The device slug (e.g., "jamies-macbook")
        path: PathBuf,       // The local filesystem path
    },

    /// A cloud storage path within a cloud volume
    Cloud {
        service: CloudServiceType, // The cloud service type (S3, GoogleDrive, etc.)
        identifier: String,        // The cloud identifier (bucket name, drive name, etc.)
        path: String,              // The cloud-native path (e.g., "photos/vacation.jpg")
    },

    /// An abstract, location-independent handle via content ID
    Content {
        content_id: Uuid, // The unique content identifier
    },

    /// A derivative data file (thumbnail, OCR text, embedding, etc.)
    Sidecar {
        content_id: Uuid,        // The content this sidecar is derived from
        kind: SidecarKind,       // The type of sidecar (thumb, ocr, embeddings, etc.)
        variant: SidecarVariant, // The specific variant (e.g., "grid@2x", "1080p")
        format: SidecarFormat,   // The storage format (webp, json, msgpack, etc.)
    },
}
```

This enum enables transparent operations across local filesystems, cloud storage, content-addressed files, and derivative data. The `Physical` variant handles traditional filesystem paths, `Cloud` manages cloud storage locations, `Content` enables deduplication-aware operations by referencing files by their content, and `Sidecar` addresses generated derivative data like thumbnails and embeddings.

### Unified Addressing

Spacedrive displays paths using a unified URI scheme that follows familiar conventions, such as service-native prefixes for cloud storage:

```rust
// Display user-friendly URIs
let uri = sd_path.display_with_context(&context).await;

// Examples:
// Physical: "local://jamies-macbook/Users/james/Documents/report.pdf"
// Cloud: "s3://my-bucket/photos/vacation.jpg"
// Content: "content://550e8400-e29b-41d4-a716-446655440000"
```

The addressing system uses:

- **Device slugs** for local paths (e.g., `local://jamies-macbook/path`)
- **Service-native URIs** for cloud storage (e.g., `s3://`, `gdrive://`, `onedrive://`)
- **Content UUIDs** for location-independent references

See [Unified Addressing](/docs/core/addressing) for complete details on URI formats and resolution.
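
The mapping from path variants to URIs can be sketched with standard-library types only. This is a minimal illustration, not the real `display_with_context` implementation: `SimplePath`, `CloudService`, and `to_uri` are hypothetical names, the content ID is a plain string instead of a `Uuid`, and the `Sidecar` variant is left out because its URI format is not shown above.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down list of cloud services for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CloudService {
    S3,
    GoogleDrive,
    OneDrive,
}

impl CloudService {
    /// URI scheme per service, taken from the prefixes listed above.
    fn scheme(self) -> &'static str {
        match self {
            CloudService::S3 => "s3",
            CloudService::GoogleDrive => "gdrive",
            CloudService::OneDrive => "onedrive",
        }
    }
}

/// Simplified stand-in for `SdPath` (assumption: no `Sidecar` variant).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SimplePath {
    Physical { device_slug: String, path: PathBuf },
    Cloud { service: CloudService, identifier: String, path: String },
    Content { content_id: String },
}

/// Render a path as a display URI following the examples above.
pub fn to_uri(p: &SimplePath) -> String {
    match p {
        SimplePath::Physical { device_slug, path } => {
            // Absolute paths already start with '/', so strip it to avoid "//".
            let raw = path.to_string_lossy();
            format!("local://{}/{}", device_slug, raw.trim_start_matches('/'))
        }
        SimplePath::Cloud { service, identifier, path } => {
            format!("{}://{}/{}", service.scheme(), identifier, path.trim_start_matches('/'))
        }
        SimplePath::Content { content_id } => format!("content://{}", content_id),
    }
}
```

Keeping the scheme lookup on the service type means adding a new cloud backend only touches one `match` arm in this sketch.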

## Entry

The `Entry` is the core entity representing a file or directory. The database entity (`entities::entry::Model`) stores the fundamental hierarchy and metadata.

```rust Expandable theme={null}
pub struct Entry {
    pub id: i32,                   // Database primary key
    pub uuid: Option<Uuid>,        // Global identifier (assigned immediately during indexing)
    pub name: String,              // File or directory name
    pub kind: i32,                 // 0=File, 1=Directory, 2=Symlink
    pub extension: Option<String>, // File extension (without dot)

    // Relationships
    pub parent_id: Option<i32>,   // Parent directory (self-referential)
    pub metadata_id: Option<i32>, // User metadata (when present)
    pub content_id: Option<i32>,  // Content identity (for deduplication)
    pub volume_id: Option<i32>,   // Volume this entry resides on (determines sync ownership)

    // Size and hierarchy
    pub size: i64,           // File size in bytes
    pub aggregate_size: i64, // Total size including children
    pub child_count: i32,    // Direct children count
    pub file_count: i32,     // Total files in subtree

    // Filesystem metadata
    pub permissions: Option<String>, // Unix-style permissions
    pub inode: Option<i64>,          // Platform-specific identifier

    // Timestamps
    pub created_at: DateTime<Utc>,
    pub modified_at: DateTime<Utc>,
    pub accessed_at: Option<DateTime<Utc>>,
    pub indexed_at: Option<DateTime<Utc>>, // When this entry was indexed, used for sync
}
```

### Ownership via Volume

Entries inherit sync ownership from their volume, not directly from a device. When you plug a portable drive into a different machine, updating the volume's device reference instantly transfers ownership of all entries on that volume. No bulk updates to entries are needed.

This design enables portable storage to move seamlessly between devices while maintaining correct sync behavior.
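
The indirection described above can be sketched with in-memory stand-ins for the tables. This is an illustration under simplifying assumptions: `Library`, `owner_of`, `attach_volume`, and `entries_owned_by` are hypothetical names, and devices are identified by a plain integer rather than the real key type.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-ins for the `volume` and `entry` tables.
pub struct Volume {
    pub id: i32,
    pub device_id: u32, // the device that currently owns this volume
}

pub struct Entry {
    pub id: i32,
    pub volume_id: Option<i32>,
}

pub struct Library {
    pub volumes: HashMap<i32, Volume>,
    pub entries: Vec<Entry>,
}

impl Library {
    /// Resolve the owner of an entry by following entry -> volume -> device.
    pub fn owner_of(&self, entry: &Entry) -> Option<u32> {
        let volume = self.volumes.get(&entry.volume_id?)?;
        Some(volume.device_id)
    }

    /// Re-home a volume: one field changes, no entry rows are touched.
    pub fn attach_volume(&mut self, volume_id: i32, new_device: u32) -> bool {
        match self.volumes.get_mut(&volume_id) {
            Some(v) => {
                v.device_id = new_device;
                true
            }
            None => false,
        }
    }

    /// Count the entries whose resolved owner is `device`.
    pub fn entries_owned_by(&self, device: u32) -> usize {
        self.entries
            .iter()
            .filter(|e| self.owner_of(e) == Some(device))
            .count()
    }
}
```

Because ownership is resolved at read time, the cost of moving a drive is independent of how many entries it holds.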

### UUID Assignment

All entries receive UUIDs immediately during indexing for UI caching compatibility. However, sync readiness is determined separately:

- **Directories** - Sync ready immediately (no content to identify)
- **Empty files** - Sync ready immediately (size = 0)
- **Regular files** - Sync ready only after content identification (`content_id` present)

This ensures files sync only after proper content identification, while allowing the UI to cache and track all entries from the moment they're discovered.
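
The three readiness rules above can be written as a single predicate. This is a sketch, not the actual implementation: `EntryKind` and `is_sync_ready` are hypothetical names, and symlinks are omitted because the rules above do not say how they are treated.

```rust
/// Hypothetical, reduced entry kind for this sketch (symlinks omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryKind {
    File,
    Directory,
}

/// Decide whether an entry may be synced, following the rules listed above.
pub fn is_sync_ready(kind: EntryKind, size: i64, content_id: Option<i32>) -> bool {
    match kind {
        // Directories have no content to identify.
        EntryKind::Directory => true,
        // Empty files have nothing to hash either.
        EntryKind::File if size == 0 => true,
        // Regular files wait for content identification.
        EntryKind::File => content_id.is_some(),
    }
}
```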

### Hierarchical Queries

A closure table enables efficient ancestor/descendant queries:

```rust
pub struct EntryClosure {
    pub ancestor_id: i32,
    pub descendant_id: i32,
    pub depth: i32, // 0=self, 1=child, 2=grandchild
}
```

This structure answers queries like "find all files under this directory" with a single lookup on `ancestor_id`, without recursion.
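
To make the closure table concrete, the sketch below builds closure rows from parent links and then answers a descendant query with a flat filter. It is an in-memory illustration only: `ClosureRow`, `build_closure`, and `descendants` are hypothetical names, and the input is assumed to be an acyclic tree.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClosureRow {
    pub ancestor_id: i32,
    pub descendant_id: i32,
    pub depth: i32,
}

/// Build closure rows from `(id, parent_id)` pairs (assumes no cycles).
pub fn build_closure(parents: &[(i32, Option<i32>)]) -> Vec<ClosureRow> {
    let parent_of: HashMap<i32, Option<i32>> = parents.iter().copied().collect();
    let mut rows = Vec::new();
    for &(id, _) in parents {
        // Walk upward, emitting one row per ancestor, including self at depth 0.
        let mut current = Some(id);
        let mut depth = 0;
        while let Some(ancestor) = current {
            rows.push(ClosureRow { ancestor_id: ancestor, descendant_id: id, depth });
            current = parent_of.get(&ancestor).copied().flatten();
            depth += 1;
        }
    }
    rows
}

/// All descendants of `ancestor`, found with a flat filter instead of recursion.
pub fn descendants(rows: &[ClosureRow], ancestor: i32) -> Vec<i32> {
    let mut ids: Vec<i32> = rows
        .iter()
        .filter(|r| r.ancestor_id == ancestor && r.depth > 0)
        .map(|r| r.descendant_id)
        .collect();
    ids.sort_unstable();
    ids
}
```

The trade-off is extra rows at write time (one per ancestor of each entry) in exchange for non-recursive reads.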

## ContentIdentity

ContentIdentity represents unique file content, enabling deduplication across your entire library:

```rust Expandable theme={null}
pub struct ContentIdentity {
    pub id: i32,
    pub uuid: Option<Uuid>, // Globally deterministic from content_hash only

    // Hashing
    pub content_hash: String,           // Fast sampled hash (for deduplication)
    pub integrity_hash: Option<String>, // Full hash (for validation)

    // Classification
    pub mime_type_id: Option<i32>, // FK to MimeType
    pub kind_id: i32,              // FK to ContentKind (enum)

    // Content metadata
    pub text_content: Option<String>,     // Extracted text
    pub image_media_data_id: Option<i32>, // FK to image-specific metadata
    pub video_media_data_id: Option<i32>, // FK to video-specific metadata
    pub audio_media_data_id: Option<i32>, // FK to audio-specific metadata

    // Statistics
    pub total_size: i64,  // Size of one instance
    pub entry_count: i32, // Entries sharing this content (in library)

    pub first_seen_at: DateTime<Utc>,
    pub last_verified_at: DateTime<Utc>,
}
```

### Two-Stage Hashing

- **Content Hash** - Fast sampling for deduplication
- **Integrity Hash** - Full file hash for verification
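
The difference between the two stages can be sketched as follows. The sampling layout (length plus 8-byte samples from the head, middle, and tail) and the use of the standard library's `DefaultHasher` are assumptions made for illustration; the actual sampling strategy and hash function are not specified here, and `sampled_hash` and `full_hash` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Bytes taken from each sampled region (assumed value for this sketch).
const SAMPLE: usize = 8;

/// Stage 1: hash the length plus small samples from head, middle, and tail.
pub fn sampled_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(data.len() as u64);
    if data.len() <= SAMPLE * 3 {
        // Small inputs are hashed in full.
        h.write(data);
    } else {
        let mid = data.len() / 2 - SAMPLE / 2;
        h.write(&data[..SAMPLE]);
        h.write(&data[mid..mid + SAMPLE]);
        h.write(&data[data.len() - SAMPLE..]);
    }
    h.finish()
}

/// Stage 2: hash every byte, so any change is detected.
pub fn full_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}
```

Two buffers of equal length that differ only in an unsampled region share a sampled hash, which is why a full hash is still useful for verification.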

### Deduplication

Multiple entries can point to the same ContentIdentity. When you have duplicate files, they all reference a single ContentIdentity record.

## UserMetadata

UserMetadata stores how you organize your files:

```rust Expandable theme={null}
pub struct UserMetadata {
    pub id: i32,
    pub uuid: Uuid,

    // Scope - exactly one must be set
    pub entry_uuid: Option<Uuid>,            // File-specific metadata
    pub content_identity_uuid: Option<Uuid>, // Content-universal metadata

    // Organization
    pub notes: Option<String>,
    pub favorite: bool,
    pub hidden: bool,
    pub custom_data: Json, // Extensible JSON fields

    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
```

### Metadata Scoping

UserMetadata can be scoped two ways:

- **Entry-Scoped** - Applies to a specific file instance
- **Content-Scoped** - Applies to all instances of the same content
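
The "exactly one must be set" rule on `entry_uuid` and `content_identity_uuid` can be made explicit by converting the two optional fields into an enum. This is a sketch with hypothetical names (`MetadataScope`, `ScopeError`, `resolve_scope`), using strings in place of `Uuid` to stay dependency-free.

```rust
/// The two scopes described above, as a single value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MetadataScope {
    Entry(String),
    Content(String),
}

/// Ways a stored row can violate the "exactly one" rule.
#[derive(Debug, PartialEq, Eq)]
pub enum ScopeError {
    NeitherSet,
    BothSet,
}

/// Turn the two optional columns into one scope, rejecting invalid rows.
pub fn resolve_scope(
    entry_uuid: Option<&str>,
    content_identity_uuid: Option<&str>,
) -> Result<MetadataScope, ScopeError> {
    match (entry_uuid, content_identity_uuid) {
        (Some(e), None) => Ok(MetadataScope::Entry(e.to_string())),
        (None, Some(c)) => Ok(MetadataScope::Content(c.to_string())),
        (None, None) => Err(ScopeError::NeitherSet),
        (Some(_), Some(_)) => Err(ScopeError::BothSet),
    }
}
```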

## Semantic Tags

Spacedrive uses a graph-based tagging system that understands context and relationships:

```rust Expandable theme={null}
pub struct Tag {
    pub id: i32,
    pub uuid: Uuid,

    // Core identity
    pub canonical_name: String,       // Primary name
    pub display_name: Option<String>, // Display variant

    // Semantic variants
    pub formal_name: Option<String>,  // Formal variant
    pub abbreviation: Option<String>, // Short form
    pub aliases: Option<Json>,        // Vec<String> as JSON

    // Context
    pub namespace: Option<String>, // Disambiguation namespace
    pub tag_type: String,          // "standard" | "organizational" | "privacy" | "system"

    // Visual properties
    pub color: Option<String>, // Hex color
    pub icon: Option<String>,  // Icon identifier
    pub description: Option<String>,

    // Behavior
    pub is_organizational_anchor: bool,
    pub privacy_level: String, // "normal" | "archive" | "hidden"
    pub search_weight: i32,    // Search ranking

    // Extensibility
    pub attributes: Option<Json>,        // HashMap<String, Value> as JSON
    pub composition_rules: Option<Json>, // Vec<CompositionRule> as JSON

    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub created_by_device: Option<Uuid>,
}
```

### Polymorphic Naming

The same name can mean different things in different contexts:

```rust
// Different tags with same name
Tag { canonical_name: "vacation", namespace: "travel", uuid: uuid_1 }
Tag { canonical_name: "vacation", namespace: "work", uuid: uuid_2 }
```
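
One way to picture namespace disambiguation is a lookup that returns every tag with a matching name unless a namespace narrows the result. This is only a sketch: `TagRef` and `resolve` are hypothetical, and the real resolution logic may weigh other fields such as aliases.

```rust
/// Reduced stand-in for `Tag`, keeping only the fields used for lookup.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TagRef {
    pub canonical_name: String,
    pub namespace: Option<String>,
    pub id: u32,
}

/// Find tags by name; an optional namespace narrows ambiguous matches.
pub fn resolve<'a>(tags: &'a [TagRef], name: &str, namespace: Option<&str>) -> Vec<&'a TagRef> {
    tags.iter()
        .filter(|t| t.canonical_name == name)
        .filter(|t| match namespace {
            Some(ns) => t.namespace.as_deref() == Some(ns),
            None => true, // no namespace given: keep every candidate
        })
        .collect()
}
```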

### Tag Relationships

Tags form hierarchies and semantic networks:

```rust
pub struct TagRelationship {
    pub parent_tag_id: i32,
    pub child_tag_id: i32,
    pub relationship_type: String, // "parent_child" | "synonym" | "related"
    pub strength: f32,             // 0.0-1.0
}
```

Examples:

- Parent/Child: "Animals" → "Dogs" → "Puppies"
- Synonyms: "Car" ↔ "Automobile"
- Related: "Photography" ↔ "Camera"

Tag hierarchies use closure tables for efficient ancestor/descendant queries, similar to entries.

## Location

Locations are directories that Spacedrive monitors:

```rust Expandable theme={null}
pub struct Location {
    pub id: i32,
    pub uuid: Uuid,
    pub device_id: i32,        // Device that owns this location
    pub entry_id: Option<i32>, // Root entry for this location (nullable during sync)
    pub name: Option<String>,  // User-friendly name

    // Indexing configuration
    pub index_mode: String, // "shallow" | "content" | "deep"
    pub scan_state: String, // "pending" | "scanning" | "completed" | "error"

    // Statistics
    pub last_scan_at: Option<DateTime<Utc>>,
    pub error_message: Option<String>,
    pub total_file_count: i64,
    pub total_byte_size: i64,
    pub job_policies: Option<String>, // JSON-serialized JobPolicies (local config)

    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
```

### Index Modes

- **Shallow** - Metadata only (fast, no content hashing)
- **Content** - Metadata + deduplication
- **Deep** - Full analysis including media extraction
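
Since `index_mode` is stored as a string, a typed wrapper makes the three modes easier to reason about. The sketch below parses the stored value and exposes what each mode implies according to the list above; `IndexMode`, `hashes_content`, and `extracts_media` are hypothetical names, not the crate's API.

```rust
use std::str::FromStr;

/// Modes ordered from least to most work, so comparisons express "at least".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum IndexMode {
    Shallow,
    Content,
    Deep,
}

impl FromStr for IndexMode {
    type Err = String;

    /// Parse the string form stored in `Location::index_mode`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "shallow" => Ok(IndexMode::Shallow),
            "content" => Ok(IndexMode::Content),
            "deep" => Ok(IndexMode::Deep),
            other => Err(format!("unknown index mode: {other}")),
        }
    }
}

impl IndexMode {
    /// Content hashing runs in `Content` mode and above.
    pub fn hashes_content(self) -> bool {
        self >= IndexMode::Content
    }

    /// Media extraction runs only in `Deep` mode.
    pub fn extracts_media(self) -> bool {
        self == IndexMode::Deep
    }
}
```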

## Device

Devices represent machines in your Spacedrive network:

```rust Expandable theme={null}
pub struct Device {
    pub id: i32,
    pub uuid: Uuid,
    pub name: String,
    pub slug: String, // URL-safe unique identifier

    // System information
    pub os: String,
    pub os_version: Option<String>,
    pub hardware_model: Option<String>,

    // Network
    pub network_addresses: Json, // Array of IP addresses
    pub is_online: bool,
    pub last_seen_at: DateTime<Utc>,

    // Capabilities
    pub capabilities: Json, // Feature flags

    // Sync
    pub sync_enabled: bool,
    pub last_sync_at: Option<DateTime<Utc>>,

    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
```

## Volume

Volumes track physical drives and partitions, as well as cloud volumes:

```rust Expandable theme={null}
pub struct Volume {
    pub id: i32,
    pub uuid: Uuid,
    pub device_id: Uuid,     // FK to Device
    pub fingerprint: String, // Stable identifier across mounts

    pub display_name: Option<String>,
    pub mount_point: Option<String>,
    pub file_system: Option<String>,

    // Capacity
    pub total_capacity: Option<i64>,
    pub available_capacity: Option<i64>,
    pub unique_bytes: Option<i64>, // Deduplicated storage usage

    // Performance
    pub read_speed_mbps: Option<i32>,
    pub write_speed_mbps: Option<i32>,
    pub last_speed_test_at: Option<DateTime<Utc>>,

    // Classification
    pub is_removable: Option<bool>,
    pub is_network_drive: Option<bool>,
    pub device_model: Option<String>,
    pub volume_type: Option<String>,
    pub is_user_visible: Option<bool>,     // Visible in UI
    pub auto_track_eligible: Option<bool>, // Eligible for auto-tracking
    pub cloud_identifier: Option<String>,  // Cloud volume identifier

    // Tracking
    pub tracked_at: DateTime<Utc>,
    pub last_seen_at: DateTime<Utc>,
    pub is_online: bool,
}
```

Volumes serve as the ownership anchor for entries. The `device_id` field determines which device owns all entries on this volume. When a portable drive moves between machines, updating this single field transfers ownership of the entire volume's contents. See [Library Sync](/docs/core/library-sync) for details on portable volume handling.

## Sidecar

Sidecars store generated content like thumbnails:

```rust Expandable theme={null}
pub struct Sidecar {
    pub id: i32,
    pub uuid: Uuid,
    pub content_uuid: Uuid, // FK to ContentIdentity

    // Classification
    pub kind: String,    // "thumbnail" | "preview" | "metadata"
    pub variant: String, // Size/quality variant
    pub format: String,  // File format

    // Storage
    pub rel_path: String,             // Relative path to sidecar
    pub source_entry_id: Option<i32>, // Reference to existing file

    // Metadata
    pub size: i64,
    pub checksum: Option<String>,
    pub status: String,         // "pending" | "processing" | "ready" | "error"
    pub source: Option<String>, // Source of the sidecar
    pub version: i32,

    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
```

<Note>
  Sidecars link to ContentIdentity, not Entry. This means one thumbnail serves
  all duplicate files.
</Note>

## Extension Models

Extensions create custom tables at runtime to store domain-specific data. These integrate seamlessly with core tagging and organization.

<Note type="warning">
  The extension system is currently a work in progress. The API and
  implementation details described here are subject to change.
</Note>

### Table Naming

Extension tables use prefixed naming:

```sql
-- Photos extension creates:
CREATE TABLE ext_photos_person (
    id BLOB PRIMARY KEY,
    name TEXT NOT NULL,
    birth_date TEXT,
    metadata_id INTEGER NOT NULL,
    FOREIGN KEY (metadata_id) REFERENCES user_metadata(id)
);

CREATE TABLE ext_photos_album (
    id BLOB PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    created_date TEXT,
    metadata_id INTEGER NOT NULL,
    FOREIGN KEY (metadata_id) REFERENCES user_metadata(id)
);
```
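
The table names above appear to follow an `ext_<extension>_<model>` pattern. A small helper can sketch that convention; `extension_table_name` is a hypothetical function, and the restriction to lowercase ASCII letters, digits, and underscores is an assumption added here so the result is safe to embed in SQL, not a documented rule.

```rust
/// Build a prefixed table name such as `ext_photos_person`.
/// Returns `None` when either part contains characters outside `[a-z0-9_]`
/// (an assumed restriction for this sketch).
pub fn extension_table_name(extension: &str, model: &str) -> Option<String> {
    let valid = |s: &str| {
        !s.is_empty()
            && s.chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
    };
    if valid(extension) && valid(model) {
        Some(format!("ext_{extension}_{model}"))
    } else {
        None
    }
}
```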

### Model Definition

Extensions define models using SDK macros:

```rust
#[model(
    table_name = "person",
    version = "1.0.0",
    scope = "content",
    sync_strategy = "shared"
)]
struct Person {
    #[primary_key]
    id: Uuid,
    name: String,
    birth_date: Option<String>,
    #[metadata]
    metadata_id: i32, // Links to UserMetadata
}
```

### Integration Benefits

1. **SQL Queries** - Direct database queries with JOINs and indexes
2. **Foreign Keys** - Referential integrity enforced by database
3. **Unified Organization** - Extension data can be tagged and searched
4. **Type Safety** - Compile-time schema validation

## Sync Architecture

### Device-Owned Resources

**Entities**: Device, Volume, Location, Entry

Ownership flows through volumes. A device owns its volumes. Locations and entries reference a volume, inheriting ownership from the volume's device. This indirection enables portable storage: when a drive moves between machines, updating the volume's device reference transfers ownership of all associated entries instantly.

Only the owning device can modify these resources. Last state wins.

### Shared Resources

**Entities**: Tag, UserMetadata, TagRelationship, ContentIdentity

Any device can modify shared resources. Changes are ordered using Hybrid Logical Clocks for consistency across devices.
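
For readers unfamiliar with Hybrid Logical Clocks, the sketch below shows the general technique: each timestamp pairs a wall-clock reading with a logical counter, and a node ID breaks ties. It is a generic illustration with hypothetical names (`Hlc`, `HlcClock`, `tick`, `observe`), not Spacedrive's implementation; the wall-clock time is passed in as a parameter so the behavior is deterministic.

```rust
/// A timestamp ordered by wall time, then counter, then node ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub wall_ms: u64,
    pub counter: u32,
    pub node: u32,
}

pub struct HlcClock {
    last: Hlc,
}

impl HlcClock {
    pub fn new(node: u32) -> Self {
        Self { last: Hlc { wall_ms: 0, counter: 0, node } }
    }

    /// Timestamp a local change.
    pub fn tick(&mut self, now_ms: u64) -> Hlc {
        if now_ms > self.last.wall_ms {
            self.last.wall_ms = now_ms;
            self.last.counter = 0;
        } else {
            // Wall clock did not advance: bump the logical counter instead.
            self.last.counter += 1;
        }
        self.last
    }

    /// Merge a timestamp received from another device, then stamp the receipt.
    pub fn observe(&mut self, remote: Hlc, now_ms: u64) -> Hlc {
        let max_wall = now_ms.max(self.last.wall_ms).max(remote.wall_ms);
        let counter = if max_wall == self.last.wall_ms && max_wall == remote.wall_ms {
            self.last.counter.max(remote.counter) + 1
        } else if max_wall == self.last.wall_ms {
            self.last.counter + 1
        } else if max_wall == remote.wall_ms {
            remote.counter + 1
        } else {
            0
        };
        self.last.wall_ms = max_wall;
        self.last.counter = counter;
        self.last
    }
}
```

Because `observe` never lets the local clock fall behind a timestamp it has seen, a change made after receiving a remote update always sorts after that update, even when the local wall clock lags.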

### Foreign Key Mapping

During sync, local integer IDs are mapped to UUIDs for the wire format, then mapped back to local integer IDs on the receiving device.
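
The round trip can be sketched with two lookup tables, one per side. All names here (`LocalEntry`, `WireEntry`, `to_wire`, `from_wire`) are hypothetical, UUIDs are plain strings, and returning `None` when a referenced record is unknown is an assumed way to signal a missing dependency.

```rust
use std::collections::HashMap;

/// Local row: the foreign key is a device-local integer.
pub struct LocalEntry {
    pub name: String,
    pub content_id: Option<i32>,
}

/// Wire form: the same reference expressed as a globally meaningful UUID.
pub struct WireEntry {
    pub name: String,
    pub content_uuid: Option<String>,
}

/// Sender side: swap the integer FK for the referenced record's UUID.
pub fn to_wire(e: &LocalEntry, id_to_uuid: &HashMap<i32, String>) -> Option<WireEntry> {
    let content_uuid = match e.content_id {
        Some(id) => Some(id_to_uuid.get(&id)?.clone()),
        None => None,
    };
    Some(WireEntry { name: e.name.clone(), content_uuid })
}

/// Receiver side: swap the UUID for this device's own integer ID.
pub fn from_wire(w: &WireEntry, uuid_to_id: &HashMap<String, i32>) -> Option<LocalEntry> {
    let content_id = match &w.content_uuid {
        Some(u) => Some(*uuid_to_id.get(u)?),
        None => None,
    };
    Some(LocalEntry { name: w.name.clone(), content_id })
}
```

The same content record can have a different integer ID on each device; only the UUID has to agree.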

## Query Patterns

Find files with a specific tag:

```rust
Entry::find()
    .inner_join(UserMetadata)
    .inner_join(UserMetadataTag)
    .inner_join(Tag)
    .filter(tag::Column::CanonicalName.eq("vacation"))
    .all(db).await?
```

Find duplicate files:

```rust
ContentIdentity::find()
    .find_with_related(Entry)
    .filter(content_identity::Column::EntryCount.gt(1))
    .all(db).await?
```

## Performance Optimizations

1. **Closure Tables** - Ancestor/descendant queries resolve with a single indexed lookup instead of a recursive traversal
2. **Directory Path Table** - The full path for every directory is stored in a dedicated `directory_paths` table. This is the source of truth for directory paths and avoids storing redundant path information on every file entry, making path-based updates significantly more efficient.
3. **Aggregate Columns** - Pre-computed size/count fields
4. **Deterministic UUIDs** - Consistent references across devices
5. **Integer PKs** - Fast local joins, UUIDs only for sync

The data model provides a foundation for powerful file management that scales from single devices to complex multi-device networks.