mirror of
https://github.com/spacedriveapp/spacedrive.git
synced 2026-04-24 08:28:27 -04:00
feat(whitepaper): Enhance Closure Table Indexing Proposal in Spacedrive documentation
- Revised the Closure Table Indexing Proposal to detail the transition from a materialized path-based system to a hybrid model utilizing a Closure Table, aimed at improving hierarchical query performance and scalability.
- Updated the current implementation analysis, highlighting performance bottlenecks and introducing the proposed schema changes, including the addition of a `parent_id` field and the creation of an `entry_closure` table.
- Expanded on the closure table solution, emphasizing its efficiency in managing ancestor-descendant relationships and the impact on filesystem indexing.
- Enhanced the conclusion to reflect the significance of these architectural changes for Spacedrive's long-term performance and data integrity.

These updates aim to provide a comprehensive understanding of the proposed indexing improvements and their implications for the Spacedrive architecture.
@@ -1,27 +1,38 @@
|
||||
# Closure Table Indexing Proposal for Spacedrive
|
||||
'''# Closure Table Indexing Proposal for Spacedrive
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This document explores how closure tables could improve Spacedrive's filesystem indexing performance, particularly for hierarchical queries and directory aggregation operations.
|
||||
This document proposes a shift from a materialized path-based indexing system to a hybrid model incorporating a **Closure Table**. This change will dramatically improve hierarchical query performance, address critical scaling bottlenecks, and enhance data integrity, particularly for move operations. The core of this proposal is to supplement the existing `entries` table with an `entry_closure` table and a `parent_id` field, enabling highly efficient and scalable filesystem indexing.
|
||||
|
||||
## Current Implementation Analysis
|
||||
## 1. Current Implementation Analysis
|
||||
|
||||
### Materialized Path Approach
|
||||
Spacedrive currently uses a materialized path approach where:
|
||||
- Each entry stores its `relative_path` (e.g., "Documents/Projects")
|
||||
- Full paths are reconstructed by combining `location_path + relative_path + name`
|
||||
- No explicit parent-child relationships in the database
|
||||
- Each entry stores its `relative_path` (e.g., "Documents/Projects").
|
||||
- Full paths are reconstructed by combining `location_path + relative_path + name`.
|
||||
- There are no explicit, indexed parent-child relationships in the database.
|
||||
|
||||
### Performance Bottlenecks
|
||||
1. **String-based path matching** for finding children/descendants
|
||||
2. **Sequential directory aggregation** from leaves to root
|
||||
3. **Inefficient ancestor queries** (finding all parents of a file)
|
||||
4. **Complex LIKE queries** for subtree operations
|
||||
This design leads to significant performance issues that will not scale:
|
||||
1. **String-based path matching** for finding children/descendants (`LIKE 'path/%'`). As written, these queries generally cannot use an index and fall back to full table scans.
|
||||
2. **Sequential directory aggregation** from leaves to root, which is slow and complex.
|
||||
3. **Inefficient ancestor queries** (e.g., for breadcrumbs), requiring multiple queries and string parsing in the application layer.
|
||||
|
||||
## Closure Table Solution
|
||||
## 2. The Closure Table Solution
|
||||
|
||||
### Concept
|
||||
A closure table stores all ancestor-descendant relationships explicitly:
|
||||
A closure table stores all ancestor-descendant relationships explicitly, turning slow string operations into highly efficient integer-based joins.
|
||||
|
||||
### Proposed Schema Changes
|
||||
|
||||
**1. Add `parent_id` to `entries` table:**
|
||||
This provides a direct, indexed link to a parent, simplifying relationship lookups during indexing.
|
||||
|
||||
```sql
|
||||
ALTER TABLE entries ADD COLUMN parent_id INTEGER REFERENCES entries(id) ON DELETE SET NULL;
|
||||
```
|
||||
|
||||
**2. Create `entry_closure` table:**
|
||||
|
||||
```sql
|
||||
CREATE TABLE entry_closure (
|
||||
@@ -29,189 +40,85 @@ CREATE TABLE entry_closure (
|
||||
descendant_id INTEGER NOT NULL,
|
||||
depth INTEGER NOT NULL,
|
||||
PRIMARY KEY (ancestor_id, descendant_id),
|
||||
FOREIGN KEY (ancestor_id) REFERENCES entries(id),
|
||||
FOREIGN KEY (descendant_id) REFERENCES entries(id)
|
||||
FOREIGN KEY (ancestor_id) REFERENCES entries(id) ON DELETE CASCADE,
|
||||
FOREIGN KEY (descendant_id) REFERENCES entries(id) ON DELETE CASCADE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_closure_descendant ON entry_closure(descendant_id);
|
||||
CREATE INDEX idx_closure_depth ON entry_closure(ancestor_id, depth);
|
||||
CREATE INDEX idx_closure_ancestor_depth ON entry_closure(ancestor_id, depth);
|
||||
```
|
||||
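*Note: `ON DELETE CASCADE` is crucial. When an entry is deleted, all its relationships in the closure table are automatically removed by the database. If the database is SQLite, this only takes effect when foreign key enforcement is enabled (`PRAGMA foreign_keys = ON`).*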
|
||||
|
||||
### Example Data
|
||||
For a path `/Documents/Projects/spacedrive/README.md`:
|
||||
```
entry_closure:
ancestor_id | descendant_id | depth
----------- | ------------- | -----
1           | 1             | 0      (Documents → Documents)
1           | 2             | 1      (Documents → Projects)
1           | 3             | 2      (Documents → spacedrive)
1           | 4             | 3      (Documents → README.md)
2           | 2             | 0      (Projects → Projects)
2           | 3             | 1      (Projects → spacedrive)
2           | 4             | 2      (Projects → README.md)
3           | 3             | 0      (spacedrive → spacedrive)
3           | 4             | 1      (spacedrive → README.md)
4           | 4             | 0      (README.md → README.md)
```
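The rows in this example can be reproduced with a small in-memory sketch. It is illustrative only: `Closure` and `example_chain` are hypothetical names, not Spacedrive code, and the set of tuples stands in for the `entry_closure` table. The sketch applies the usual closure-table insert rule: one self row at depth 0, plus a copy of every ancestor row of the parent with the depth incremented by one.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for the `entry_closure` table.
/// Each tuple is (ancestor_id, descendant_id, depth).
#[derive(Default)]
pub struct Closure {
    rows: BTreeSet<(i64, i64, u32)>,
}

impl Closure {
    /// Registers a new entry under an optional parent.
    pub fn insert(&mut self, id: i64, parent: Option<i64>) {
        // Self row: every entry is its own ancestor at depth 0.
        self.rows.insert((id, id, 0));
        if let Some(parent_id) = parent {
            // Copy each ancestor row of the parent, one level deeper.
            let inherited: Vec<(i64, i64, u32)> = self
                .rows
                .iter()
                .filter(|(_, d, _)| *d == parent_id)
                .map(|(a, _, depth)| (*a, id, depth + 1))
                .collect();
            self.rows.extend(inherited);
        }
    }

    /// All rows, ordered by (ancestor_id, descendant_id, depth).
    pub fn rows(&self) -> Vec<(i64, i64, u32)> {
        self.rows.iter().copied().collect()
    }
}

/// Builds the Documents(1) / Projects(2) / spacedrive(3) / README.md(4) chain.
pub fn example_chain() -> Closure {
    let mut c = Closure::default();
    c.insert(1, None);
    c.insert(2, Some(1));
    c.insert(3, Some(2));
    c.insert(4, Some(3));
    c
}
```

Calling `example_chain().rows()` yields one tuple per row of the table above. Parents must be inserted before their children, which matches a top-down filesystem traversal.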
|
||||
## 3. Critical Requirement: Inode-Based Change Detection
|
||||
|
||||
## Benefits for Spacedrive
|
||||
A core prerequisite for the closure table's integrity is the indexer's ability to reliably distinguish between a file **move** and a **delete/add** operation, especially when Spacedrive is catching up on offline changes.
|
||||
|
||||
### 1. Optimized Queries
|
||||
**The Problem:** Without proper move detection, moving a directory containing 10,000 files would be misinterpreted as 10,000 deletions and 10,000 creations, leading to a catastrophic and incorrect rebuild of the closure table.
|
||||
|
||||
**Get all children of a directory:**
|
||||
```sql
|
||||
-- Current approach (string matching)
|
||||
SELECT * FROM entries
|
||||
WHERE location_id = ? AND relative_path = ?;
|
||||
**The Solution:** The indexing process **must** be inode-aware.
|
||||
1. **Initial Scan:** Before scanning the filesystem, the indexer must load all existing entries for the target location into two in-memory maps:
|
||||
* `path_map: HashMap<PathBuf, Entry>`
|
||||
* `inode_map: HashMap<u64, Entry>`
|
||||
2. **Reconciliation:** When the indexer encounters a file on disk:
|
||||
* If the file's path is not in `path_map`, it then looks up the file's **inode** in `inode_map`.
|
||||
* If the inode is found, the indexer has detected a **move**. It must trigger a specific `EntryMoved` event/update.
|
||||
* If neither the path nor the inode is found, it is a genuinely new file.
|
||||
|
||||
-- Closure table approach (indexed lookup)
|
||||
SELECT e.* FROM entries e
|
||||
JOIN entry_closure c ON e.id = c.descendant_id
|
||||
WHERE c.ancestor_id = ? AND c.depth = 1;
|
||||
```
|
||||
This is the only way to guarantee the integrity of the hierarchy and prevent data corruption in the closure table.
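The reconciliation rule described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: `Entry`, `Change`, and `Reconciler` are hypothetical types invented for the example, and only the `path_map` / `inode_map` names come from the proposal. Note that inode numbers are only unique within a single filesystem and can be reused after a file is deleted, so a real implementation would likely need additional checks before trusting a match.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed-down stand-in for a stored entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: i64,
    pub path: PathBuf,
    pub inode: u64,
}

/// Outcome of reconciling one on-disk file against the index.
#[derive(Debug, PartialEq)]
pub enum Change {
    /// The path is already indexed; no hierarchy change is needed.
    Known { entry_id: i64 },
    /// A known inode was seen at an unknown path: treat as a move.
    Moved { entry_id: i64, from: PathBuf, to: PathBuf },
    /// Neither the path nor the inode is known: a genuinely new file.
    New,
}

pub struct Reconciler {
    path_map: HashMap<PathBuf, Entry>,
    inode_map: HashMap<u64, Entry>,
}

impl Reconciler {
    /// Loads the existing entries for a location into both lookup maps.
    pub fn new(existing: Vec<Entry>) -> Self {
        let mut path_map = HashMap::new();
        let mut inode_map = HashMap::new();
        for e in existing {
            path_map.insert(e.path.clone(), e.clone());
            inode_map.insert(e.inode, e);
        }
        Self { path_map, inode_map }
    }

    /// Classifies a file found on disk: path lookup first, then inode.
    pub fn classify(&self, path: &Path, inode: u64) -> Change {
        if let Some(e) = self.path_map.get(path) {
            return Change::Known { entry_id: e.id };
        }
        match self.inode_map.get(&inode) {
            Some(e) => Change::Moved {
                entry_id: e.id,
                from: e.path.clone(),
                to: path.to_path_buf(),
            },
            None => Change::New,
        }
    }
}
```

A `Change::Moved` result corresponds to the `EntryMoved` event in the text: the existing entry keeps its `id`, so only its closure rows need to be rewritten rather than deleted and recreated.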
|
||||
|
||||
**Get entire subtree:**
|
||||
```sql
|
||||
-- Current approach (complex LIKE)
|
||||
SELECT * FROM entries
|
||||
WHERE location_id = ?
|
||||
AND (relative_path = ? OR relative_path LIKE ?||'/%');
|
||||
|
||||
-- Closure table approach (simple join)
|
||||
SELECT e.* FROM entries e
|
||||
JOIN entry_closure c ON e.id = c.descendant_id
|
||||
WHERE c.ancestor_id = ? AND c.depth > 0
|
||||
ORDER BY c.depth;
|
||||
```
|
||||
|
||||
**Get all ancestors (breadcrumb):**
|
||||
```sql
|
||||
-- Current approach (requires application logic)
|
||||
-- Must parse path and query each component
|
||||
|
||||
-- Closure table approach (single query)
|
||||
SELECT e.* FROM entries e
|
||||
JOIN entry_closure c ON e.id = c.ancestor_id
|
||||
WHERE c.descendant_id = ?
|
||||
ORDER BY c.depth DESC;
|
||||
```
|
||||
|
||||
### 2. Improved Directory Aggregation
|
||||
|
||||
The current aggregation phase could be dramatically improved:
|
||||
|
||||
```sql
|
||||
-- Calculate directory sizes in one query
|
||||
WITH dir_sizes AS (
|
||||
SELECT
|
||||
c.ancestor_id as dir_id,
|
||||
SUM(e.size) as total_size,
|
||||
COUNT(DISTINCT e.id) as file_count
|
||||
FROM entry_closure c
|
||||
JOIN entries e ON c.descendant_id = e.id
|
||||
WHERE e.kind = 0 -- Files only
|
||||
GROUP BY c.ancestor_id
|
||||
)
|
||||
UPDATE entries
|
||||
SET size = dir_sizes.total_size
|
||||
FROM dir_sizes
|
||||
WHERE entries.id = dir_sizes.dir_id AND entries.kind = 1;
|
||||
```
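To make the grouping in this query concrete, here is a small in-memory sketch of the same aggregation. It assumes closure rows are available as `(ancestor_id, descendant_id, depth)` tuples and that file sizes are keyed by entry id; the function name `directory_totals` is hypothetical. Where the SQL filters directories with `entries.kind = 1` in the `UPDATE`, the sketch skips depth-0 rows instead, which has the same effect because any ancestor at depth greater than zero is necessarily a directory.

```rust
use std::collections::HashMap;

/// Sums file sizes per ancestor directory, mirroring
/// `GROUP BY c.ancestor_id` over closure rows joined to file entries.
///
/// `closure` holds (ancestor_id, descendant_id, depth) rows;
/// `file_sizes` maps file entry ids to sizes in bytes (directories are absent).
/// Returns directory id -> (total bytes, file count).
pub fn directory_totals(
    closure: &[(i64, i64, u32)],
    file_sizes: &HashMap<i64, u64>,
) -> HashMap<i64, (u64, usize)> {
    let mut totals: HashMap<i64, (u64, usize)> = HashMap::new();
    for &(ancestor, descendant, depth) in closure {
        // Skip self rows so a file is never counted as its own directory.
        if depth == 0 {
            continue;
        }
        // Only descendants that are files contribute to the totals.
        if let Some(&size) = file_sizes.get(&descendant) {
            let slot = totals.entry(ancestor).or_insert((0, 0));
            slot.0 += size;
            slot.1 += 1;
        }
    }
    totals
}
```

Each file contributes once to every directory above it, so a single pass over the closure rows replaces the leaf-to-root walk of the current aggregation phase.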
|
||||
|
||||
### 3. Fast Move Operations
|
||||
|
||||
Moving a directory and all its contents becomes much simpler:
|
||||
|
||||
```sql
|
||||
-- Update closure table for move operation
|
||||
-- 1. Remove old relationships
|
||||
DELETE FROM entry_closure
|
||||
WHERE descendant_id IN (
|
||||
SELECT descendant_id FROM entry_closure WHERE ancestor_id = ?
|
||||
) AND ancestor_id NOT IN (
|
||||
SELECT descendant_id FROM entry_closure WHERE ancestor_id = ?
|
||||
);
|
||||
|
||||
-- 2. Add new relationships (can be optimized with CTEs)
|
||||
```
|
||||
|
||||
## Implementation Strategy
|
||||
## 4. Implementation Strategy
|
||||
|
||||
### Hybrid Approach
|
||||
Keep the current materialized path system but add closure tables as an optimization:
|
||||
We will keep the current materialized path system for display purposes and backwards compatibility but add the closure table as the primary mechanism for all hierarchical operations.
|
||||
|
||||
1. **Maintain both systems** during transition
|
||||
2. **Use closure tables for**:
|
||||
- Directory aggregation
|
||||
- Subtree queries
|
||||
- Ancestor lookups
|
||||
- Move operations
|
||||
3. **Keep materialized paths for**:
|
||||
- Display purposes
|
||||
- Simple path construction
|
||||
- Backwards compatibility
|
||||
### Implementation Plan
|
||||
|
||||
### Migration Plan
|
||||
1. **Schema Migration:**
|
||||
* Create a new database migration file.
|
||||
* Add the `parent_id` column to the `entries` table.
|
||||
* Create the `entry_closure` table and its indexes as defined above.
|
||||
|
||||
1. **Add closure table** without removing existing structure
|
||||
2. **Populate closure table** during indexing:
|
||||
```rust
|
||||
// In indexing job
|
||||
fn process_entry(entry: &Entry, parent_id: Option<i32>) {
|
||||
// Insert entry
|
||||
let entry_id = insert_entry(entry);
|
||||
|
||||
// Build closure relationships
|
||||
if let Some(parent) = parent_id {
|
||||
// Insert all ancestor relationships
|
||||
insert_closure_relationships(entry_id, parent);
|
||||
}
|
||||
}
|
||||
```
|
||||
2. **Update Indexing Logic:**
|
||||
* Modify the `EntryProcessor::create_entry` function to accept a `parent_id`.
|
||||
* When a new entry is inserted, within the same database transaction:
|
||||
1. Insert the entry and get its new `id`.
|
||||
2. Insert the self-referential row into `entry_closure`: `(ancestor_id: id, descendant_id: id, depth: 0)`.
|
||||
3. If `parent_id` exists, execute the following query to copy the parent's ancestor relationships:
|
||||
```sql
|
||||
INSERT INTO entry_closure (ancestor_id, descendant_id, depth)
|
||||
SELECT p.ancestor_id, ? as descendant_id, p.depth + 1
|
||||
FROM entry_closure p
|
||||
WHERE p.descendant_id = ? -- parent_id
|
||||
```
|
||||
|
||||
3. **Update queries gradually** to use closure tables
|
||||
4. **Benchmark performance** improvements
|
||||
5. **Remove string-based queries** once proven
|
||||
3. **Refactor Core Operations:**
|
||||
|
||||
### Database Impact
|
||||
''' * **Move Operation:** This is the most complex part. When an `EntryMoved` event is handled, the entire operation **must be wrapped in a single database transaction** to ensure atomicity and prevent data corruption.
|
||||
1. **Disconnect Subtree:** Delete all hierarchical relationships for the moved node and its descendants, *except* for their own internal relationships.'''
|
||||
```sql
|
||||
DELETE FROM entry_closure
|
||||
WHERE descendant_id IN (SELECT descendant_id FROM entry_closure WHERE ancestor_id = ?1) -- The moved node and all of its descendants
AND ancestor_id NOT IN (SELECT descendant_id FROM entry_closure WHERE ancestor_id = ?1); -- Keep rows whose ancestor is inside the moved subtree (its internal relationships)
|
||||
```
|
||||
2. **Update `parent_id`:** Set the `parent_id` of the moved entry to its new parent.
|
||||
3. **Reconnect Subtree:** Connect the moved subtree to its new parent.
|
||||
```sql
|
||||
INSERT INTO entry_closure (ancestor_id, descendant_id, depth)
|
||||
SELECT p.ancestor_id, c.descendant_id, p.depth + c.depth + 1
|
||||
FROM entry_closure p, entry_closure c
|
||||
WHERE p.descendant_id = ?1 -- new_parent_id
|
||||
AND c.ancestor_id = ?2; -- moved_entry_id
|
||||
```
|
||||
|
||||
**Storage overhead:**
|
||||
- For a tree with N nodes and average depth D: ~N * D rows
|
||||
- Example: 1M files, avg depth 5 = ~5M closure rows
|
||||
- With 3 integers per row = ~60MB additional storage
|
||||
* **Delete Operation:** With `ON DELETE CASCADE` defined on the foreign keys, the database will handle this automatically. When an entry is deleted, all rows in `entry_closure` where it is an `ancestor_id` or `descendant_id` will be removed.
|
||||
|
||||
**Trade-offs:**
|
||||
- ✅ O(1) child lookups vs O(N) string matching
|
||||
- ✅ O(1) subtree queries vs O(N) LIKE queries
|
||||
- ✅ Parallel aggregation possible
|
||||
- ❌ More complex inserts/moves
|
||||
- ❌ Additional storage requirements
|
||||
4. **Refactor Hierarchical Queries:**
|
||||
* Gradually replace all `LIKE` queries for path matching with efficient `JOIN`s on the `entry_closure` table.
|
||||
* **Get Children:** `... WHERE c.ancestor_id = ? AND c.depth = 1`
|
||||
* **Get Descendants:** `... WHERE c.ancestor_id = ? AND c.depth > 0`
|
||||
* **Get Ancestors:** `... WHERE c.descendant_id = ? ORDER BY c.depth DESC`
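The move operation and the hierarchical queries in this plan can be exercised without a database. The following is a sketch under stated assumptions, not the Spacedrive implementation: `Rows`, `move_subtree`, `children`, and `ancestors` are hypothetical names, and a `BTreeSet` of `(ancestor_id, descendant_id, depth)` tuples stands in for `entry_closure`. Updating `parent_id` and wrapping the steps in a transaction are not modeled here.

```rust
use std::collections::BTreeSet;

/// (ancestor_id, descendant_id, depth) rows, as in `entry_closure`.
pub type Rows = BTreeSet<(i64, i64, u32)>;

/// Ids in the subtree rooted at `node`, including `node` itself.
fn subtree(rows: &Rows, node: i64) -> BTreeSet<i64> {
    rows.iter().filter(|r| r.0 == node).map(|r| r.1).collect()
}

/// Re-parents `moved` under `new_parent` using the two steps from the plan:
/// disconnect the subtree from outside ancestors, then reconnect it.
pub fn move_subtree(rows: &mut Rows, moved: i64, new_parent: i64) {
    let inside = subtree(rows, moved);
    // A node cannot be moved into its own subtree.
    assert!(!inside.contains(&new_parent), "cannot move a node under itself");

    // 1. Disconnect: drop rows linking outside ancestors to the subtree,
    //    keeping rows whose ancestor is itself inside the subtree.
    rows.retain(|r| !(inside.contains(&r.1) && !inside.contains(&r.0)));

    // 2. Reconnect: cross join the ancestors of the new parent (including
    //    itself) with the descendants of the moved node (including itself).
    let above: Vec<(i64, u32)> =
        rows.iter().filter(|r| r.1 == new_parent).map(|r| (r.0, r.2)).collect();
    let below: Vec<(i64, u32)> =
        rows.iter().filter(|r| r.0 == moved).map(|r| (r.1, r.2)).collect();
    for &(anc, up) in &above {
        for &(desc, down) in &below {
            rows.insert((anc, desc, up + down + 1));
        }
    }
}

/// Direct children: rows with depth = 1.
pub fn children(rows: &Rows, parent: i64) -> Vec<i64> {
    rows.iter().filter(|r| r.0 == parent && r.2 == 1).map(|r| r.1).collect()
}

/// Ancestor chain ordered root first and ending with the node itself,
/// matching `ORDER BY c.depth DESC`.
pub fn ancestors(rows: &Rows, node: i64) -> Vec<i64> {
    let mut found: Vec<(u32, i64)> =
        rows.iter().filter(|r| r.1 == node).map(|r| (r.2, r.0)).collect();
    found.sort_by(|a, b| b.0.cmp(&a.0));
    found.into_iter().map(|(_, id)| id).collect()
}
```

The number of rows touched by a move is proportional to the size of the moved subtree multiplied by the depth of its old and new locations, which is why the plan insists on a single transaction: a partial update would leave descendants attached to the wrong ancestors.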
|
||||
|
||||
## Benchmarking Metrics
|
||||
## 5. Conclusion
|
||||
|
||||
Compare before/after implementation:
|
||||
1. **Directory listing speed** (get children)
|
||||
2. **Subtree query performance** (get all descendants)
|
||||
3. **Aggregation phase duration**
|
||||
4. **Move operation speed**
|
||||
5. **Memory usage**
|
||||
6. **Database size**
|
||||
|
||||
## Conclusion
|
||||
|
||||
Closure tables could significantly improve Spacedrive's indexing performance, especially for:
|
||||
- Large directory trees
|
||||
- Deep hierarchies
|
||||
- Frequent directory aggregation
|
||||
- Complex hierarchical queries
|
||||
|
||||
The hybrid approach allows gradual migration while maintaining backwards compatibility. The storage overhead (estimated ~6% for typical filesystems) is justified by the performance gains for read-heavy operations.
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Create proof-of-concept branch
|
||||
2. Implement closure table schema
|
||||
3. Add closure maintenance to indexing job
|
||||
4. Benchmark with real-world data
|
||||
5. Make go/no-go decision based on results
|
||||
While this is a significant architectural change, it is essential for the long-term performance and scalability of Spacedrive. The current string-based path matching is a critical bottleneck that this proposal directly and correctly addresses using established database patterns. The hybrid approach and phased rollout plan provide a safe and manageable path to implementation.
|
||||
'''
|
||||
@@ -33,38 +33,37 @@
|
||||
\BOOKMARK [1][-]{section.6}{\376\377\0006\000\040\000I\000m\000p\000l\000e\000m\000e\000n\000t\000a\000t\000i\000o\000n\000\040\000a\000n\000d\000\040\000E\000v\000a\000l\000u\000a\000t\000i\000o\000n}{}% 33
|
||||
\BOOKMARK [2][-]{subsection.6.1}{\376\377\0006\000.\0001\000\040\000T\000e\000c\000h\000n\000o\000l\000o\000g\000y\000\040\000S\000t\000a\000c\000k}{section.6}% 34
|
||||
\BOOKMARK [2][-]{subsection.6.2}{\376\377\0006\000.\0002\000\040\000D\000a\000t\000a\000b\000a\000s\000e\000\040\000S\000c\000h\000e\000m\000a\000\040\000O\000p\000t\000i\000m\000i\000z\000a\000t\000i\000o\000n}{section.6}% 35
|
||||
\BOOKMARK [2][-]{subsection.6.3}{\376\377\0006\000.\0003\000\040\000C\000a\000s\000e\000\040\000S\000t\000u\000d\000y\000:\000\040\000H\000i\000e\000r\000a\000r\000c\000h\000i\000c\000a\000l\000\040\000Q\000u\000e\000r\000y\000\040\000O\000p\000t\000i\000m\000i\000z\000a\000t\000i\000o\000n\000\040\000[\000P\000l\000a\000n\000n\000e\000d\000]}{section.6}% 36
|
||||
\BOOKMARK [2][-]{subsection.6.4}{\376\377\0006\000.\0004\000\040\000T\000e\000s\000t\000i\000n\000g\000\040\000a\000n\000d\000\040\000V\000a\000l\000i\000d\000a\000t\000i\000o\000n\000\040\000F\000r\000a\000m\000e\000w\000o\000r\000k}{section.6}% 37
|
||||
\BOOKMARK [2][-]{subsection.6.5}{\376\377\0006\000.\0005\000\040\000C\000o\000m\000p\000a\000t\000i\000b\000i\000l\000i\000t\000y\000\040\000a\000n\000d\000\040\000I\000n\000t\000e\000r\000o\000p\000e\000r\000a\000b\000i\000l\000i\000t\000y}{section.6}% 38
|
||||
\BOOKMARK [2][-]{subsection.6.6}{\376\377\0006\000.\0006\000\040\000S\000c\000a\000l\000a\000b\000i\000l\000i\000t\000y\000\040\000L\000i\000m\000i\000t\000s\000\040\000a\000n\000d\000\040\000A\000r\000c\000h\000i\000t\000e\000c\000t\000u\000r\000a\000l\000\040\000B\000o\000u\000n\000d\000a\000r\000i\000e\000s}{section.6}% 39
|
||||
\BOOKMARK [2][-]{subsection.6.7}{\376\377\0006\000.\0007\000\040\000F\000a\000i\000l\000u\000r\000e\000\040\000R\000e\000c\000o\000v\000e\000r\000y\000\040\000S\000c\000e\000n\000a\000r\000i\000o\000s}{section.6}% 40
|
||||
\BOOKMARK [2][-]{subsection.6.8}{\376\377\0006\000.\0008\000\040\000E\000x\000t\000e\000n\000s\000i\000b\000i\000l\000i\000t\000y\000\040\000A\000r\000c\000h\000i\000t\000e\000c\000t\000u\000r\000e}{section.6}% 41
|
||||
\BOOKMARK [1][-]{section.7}{\376\377\0007\000\040\000R\000e\000s\000o\000u\000r\000c\000e\000\040\000E\000f\000f\000i\000c\000i\000e\000n\000c\000y\000\040\000a\000n\000d\000\040\000M\000o\000b\000i\000l\000e\000\040\000C\000o\000n\000s\000i\000d\000e\000r\000a\000t\000i\000o\000n\000s}{}% 42
|
||||
\BOOKMARK [2][-]{subsection.7.1}{\376\377\0007\000.\0001\000\040\000A\000d\000a\000p\000t\000i\000v\000e\000\040\000B\000a\000c\000k\000g\000r\000o\000u\000n\000d\000\040\000P\000r\000o\000c\000e\000s\000s\000i\000n\000g}{section.7}% 43
|
||||
\BOOKMARK [2][-]{subsection.7.2}{\376\377\0007\000.\0002\000\040\000S\000t\000o\000r\000a\000g\000e\000\040\000E\000f\000f\000i\000c\000i\000e\000n\000c\000y}{section.7}% 44
|
||||
\BOOKMARK [2][-]{subsection.7.3}{\376\377\0007\000.\0003\000\040\000N\000e\000t\000w\000o\000r\000k\000\040\000E\000f\000f\000i\000c\000i\000e\000n\000c\000y}{section.7}% 45
|
||||
\BOOKMARK [1][-]{section.8}{\376\377\0008\000\040\000S\000e\000c\000u\000r\000i\000t\000y\000\040\000a\000n\000d\000\040\000P\000r\000i\000v\000a\000c\000y\000\040\000M\000o\000d\000e\000l}{}% 46
|
||||
\BOOKMARK [2][-]{subsection.8.1}{\376\377\0008\000.\0001\000\040\000D\000a\000t\000a\000\040\000P\000r\000o\000t\000e\000c\000t\000i\000o\000n\000\040\000a\000t\000\040\000R\000e\000s\000t}{section.8}% 47
|
||||
\BOOKMARK [2][-]{subsection.8.2}{\376\377\0008\000.\0002\000\040\000N\000e\000t\000w\000o\000r\000k\000\040\000S\000e\000c\000u\000r\000i\000t\000y}{section.8}% 48
|
||||
\BOOKMARK [2][-]{subsection.8.3}{\376\377\0008\000.\0003\000\040\000C\000r\000e\000d\000e\000n\000t\000i\000a\000l\000\040\000M\000a\000n\000a\000g\000e\000m\000e\000n\000t}{section.8}% 49
|
||||
\BOOKMARK [2][-]{subsection.8.4}{\376\377\0008\000.\0004\000\040\000T\000h\000r\000e\000a\000t\000\040\000M\000o\000d\000e\000l}{section.8}% 50
|
||||
\BOOKMARK [2][-]{subsection.8.5}{\376\377\0008\000.\0005\000\040\000C\000e\000r\000t\000i\000f\000i\000c\000a\000t\000e\000\040\000P\000i\000n\000n\000i\000n\000g\000\040\000a\000n\000d\000\040\000A\000P\000I\000\040\000S\000e\000c\000u\000r\000i\000t\000y}{section.8}% 51
|
||||
\BOOKMARK [2][-]{subsection.8.6}{\376\377\0008\000.\0006\000\040\000R\000a\000t\000e\000\040\000L\000i\000m\000i\000t\000i\000n\000g\000\040\000a\000n\000d\000\040\000A\000b\000u\000s\000e\000\040\000P\000r\000e\000v\000e\000n\000t\000i\000o\000n}{section.8}% 52
|
||||
\BOOKMARK [2][-]{subsection.8.7}{\376\377\0008\000.\0007\000\040\000A\000u\000d\000i\000t\000\040\000L\000o\000g\000\040\000I\000m\000m\000u\000t\000a\000b\000i\000l\000i\000t\000y}{section.8}% 53
|
||||
\BOOKMARK [2][-]{subsection.8.8}{\376\377\0008\000.\0008\000\040\000S\000p\000a\000c\000e\000d\000r\000i\000v\000e\000\040\000C\000l\000o\000u\000d\000\040\000S\000e\000r\000v\000i\000c\000e\000\040\000P\000r\000i\000v\000a\000c\000y\000\040\000M\000o\000d\000e\000l}{section.8}% 54
|
||||
\BOOKMARK [2][-]{subsection.8.9}{\376\377\0008\000.\0009\000\040\000P\000r\000i\000v\000a\000c\000y\000-\000P\000r\000e\000s\000e\000r\000v\000i\000n\000g\000\040\000A\000I}{section.8}% 55
|
||||
\BOOKMARK [2][-]{subsection.8.10}{\376\377\0008\000.\0001\0000\000\040\000B\000a\000l\000a\000n\000c\000i\000n\000g\000\040\000P\000r\000i\000v\000a\000c\000y\000\040\000a\000n\000d\000\040\000P\000u\000b\000l\000i\000c\000\040\000S\000h\000a\000r\000i\000n\000g}{section.8}% 56
|
||||
\BOOKMARK [1][-]{section.9}{\376\377\0009\000\040\000P\000r\000a\000c\000t\000i\000c\000a\000l\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000R\000e\000s\000o\000l\000u\000t\000i\000o\000n}{}% 57
|
||||
\BOOKMARK [2][-]{subsection.9.1}{\376\377\0009\000.\0001\000\040\000M\000e\000t\000a\000d\000a\000t\000a\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000S\000c\000e\000n\000a\000r\000i\000o\000s}{section.9}% 58
|
||||
\BOOKMARK [2][-]{subsection.9.2}{\376\377\0009\000.\0002\000\040\000A\000d\000v\000a\000n\000c\000e\000d\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000S\000c\000e\000n\000a\000r\000i\000o\000s}{section.9}% 59
|
||||
\BOOKMARK [2][-]{subsection.9.3}{\376\377\0009\000.\0003\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000P\000r\000e\000v\000e\000n\000t\000i\000o\000n}{section.9}% 60
|
||||
\BOOKMARK [1][-]{section.10}{\376\377\0001\0000\000\040\000C\000o\000n\000c\000l\000u\000s\000i\000o\000n}{}% 61
|
||||
\BOOKMARK [2][-]{subsection.10.1}{\376\377\0001\0000\000.\0001\000\040\000K\000e\000y\000\040\000C\000o\000n\000t\000r\000i\000b\000u\000t\000i\000o\000n\000s\000\040\000a\000n\000d\000\040\000R\000e\000a\000l\000-\000W\000o\000r\000l\000d\000\040\000I\000m\000p\000a\000c\000t}{section.10}% 62
|
||||
\BOOKMARK [2][-]{subsection.10.2}{\376\377\0001\0000\000.\0002\000\040\000S\000y\000s\000t\000e\000m\000\040\000I\000n\000t\000e\000g\000r\000a\000t\000i\000o\000n}{section.10}% 63
|
||||
\BOOKMARK [2][-]{subsection.10.3}{\376\377\0001\0000\000.\0003\000\040\000V\000a\000l\000i\000d\000a\000t\000i\000o\000n\000\040\000i\000n\000\040\000P\000r\000o\000d\000u\000c\000t\000i\000o\000n}{section.10}% 64
|
||||
\BOOKMARK [2][-]{subsection.10.4}{\376\377\0001\0000\000.\0004\000\040\000F\000u\000t\000u\000r\000e\000\040\000W\000o\000r\000k\000\040\000a\000n\000d\000\040\000R\000o\000a\000d\000m\000a\000p}{section.10}% 65
|
||||
\BOOKMARK [2][-]{subsection.10.5}{\376\377\0001\0000\000.\0005\000\040\000L\000i\000m\000i\000t\000a\000t\000i\000o\000n\000s}{section.10}% 66
|
||||
\BOOKMARK [2][-]{subsection.10.6}{\376\377\0001\0000\000.\0006\000\040\000B\000r\000o\000a\000d\000e\000r\000\040\000I\000m\000p\000l\000i\000c\000a\000t\000i\000o\000n\000s}{section.10}% 67
|
||||
\BOOKMARK [1][-]{section*.46}{\376\377\000A\000c\000k\000n\000o\000w\000l\000e\000d\000g\000m\000e\000n\000t\000s}{}% 68
|
||||
\BOOKMARK [1][-]{appendix.A}{\376\377\000A\000\040\000G\000l\000o\000s\000s\000a\000r\000y\000\040\000o\000f\000\040\000T\000e\000r\000m\000s}{}% 69
|
||||
\BOOKMARK [1][-]{section*.56}{\376\377\000R\000e\000f\000e\000r\000e\000n\000c\000e\000s}{}% 70
|
||||
\BOOKMARK [2][-]{subsection.6.3}{\376\377\0006\000.\0003\000\040\000T\000e\000s\000t\000i\000n\000g\000\040\000a\000n\000d\000\040\000V\000a\000l\000i\000d\000a\000t\000i\000o\000n\000\040\000F\000r\000a\000m\000e\000w\000o\000r\000k}{section.6}% 36
|
||||
\BOOKMARK [2][-]{subsection.6.4}{\376\377\0006\000.\0004\000\040\000C\000o\000m\000p\000a\000t\000i\000b\000i\000l\000i\000t\000y\000\040\000a\000n\000d\000\040\000I\000n\000t\000e\000r\000o\000p\000e\000r\000a\000b\000i\000l\000i\000t\000y}{section.6}% 37
|
||||
\BOOKMARK [2][-]{subsection.6.5}{\376\377\0006\000.\0005\000\040\000S\000c\000a\000l\000a\000b\000i\000l\000i\000t\000y\000\040\000L\000i\000m\000i\000t\000s\000\040\000a\000n\000d\000\040\000A\000r\000c\000h\000i\000t\000e\000c\000t\000u\000r\000a\000l\000\040\000B\000o\000u\000n\000d\000a\000r\000i\000e\000s}{section.6}% 38
|
||||
\BOOKMARK [2][-]{subsection.6.6}{\376\377\0006\000.\0006\000\040\000F\000a\000i\000l\000u\000r\000e\000\040\000R\000e\000c\000o\000v\000e\000r\000y\000\040\000S\000c\000e\000n\000a\000r\000i\000o\000s}{section.6}% 39
|
||||
\BOOKMARK [2][-]{subsection.6.7}{\376\377\0006\000.\0007\000\040\000E\000x\000t\000e\000n\000s\000i\000b\000i\000l\000i\000t\000y\000\040\000A\000r\000c\000h\000i\000t\000e\000c\000t\000u\000r\000e}{section.6}% 40
|
||||
\BOOKMARK [1][-]{section.7}{\376\377\0007\000\040\000R\000e\000s\000o\000u\000r\000c\000e\000\040\000E\000f\000f\000i\000c\000i\000e\000n\000c\000y\000\040\000a\000n\000d\000\040\000M\000o\000b\000i\000l\000e\000\040\000C\000o\000n\000s\000i\000d\000e\000r\000a\000t\000i\000o\000n\000s}{}% 41
|
||||
\BOOKMARK [2][-]{subsection.7.1}{\376\377\0007\000.\0001\000\040\000A\000d\000a\000p\000t\000i\000v\000e\000\040\000B\000a\000c\000k\000g\000r\000o\000u\000n\000d\000\040\000P\000r\000o\000c\000e\000s\000s\000i\000n\000g}{section.7}% 42
|
||||
\BOOKMARK [2][-]{subsection.7.2}{\376\377\0007\000.\0002\000\040\000S\000t\000o\000r\000a\000g\000e\000\040\000E\000f\000f\000i\000c\000i\000e\000n\000c\000y}{section.7}% 43
|
||||
\BOOKMARK [2][-]{subsection.7.3}{\376\377\0007\000.\0003\000\040\000N\000e\000t\000w\000o\000r\000k\000\040\000E\000f\000f\000i\000c\000i\000e\000n\000c\000y}{section.7}% 44
|
||||
\BOOKMARK [1][-]{section.8}{\376\377\0008\000\040\000S\000e\000c\000u\000r\000i\000t\000y\000\040\000a\000n\000d\000\040\000P\000r\000i\000v\000a\000c\000y\000\040\000M\000o\000d\000e\000l}{}% 45
|
||||
\BOOKMARK [2][-]{subsection.8.1}{\376\377\0008\000.\0001\000\040\000D\000a\000t\000a\000\040\000P\000r\000o\000t\000e\000c\000t\000i\000o\000n\000\040\000a\000t\000\040\000R\000e\000s\000t}{section.8}% 46
|
||||
\BOOKMARK [2][-]{subsection.8.2}{\376\377\0008\000.\0002\000\040\000N\000e\000t\000w\000o\000r\000k\000\040\000S\000e\000c\000u\000r\000i\000t\000y}{section.8}% 47
|
||||
\BOOKMARK [2][-]{subsection.8.3}{\376\377\0008\000.\0003\000\040\000C\000r\000e\000d\000e\000n\000t\000i\000a\000l\000\040\000M\000a\000n\000a\000g\000e\000m\000e\000n\000t}{section.8}% 48
|
||||
\BOOKMARK [2][-]{subsection.8.4}{\376\377\0008\000.\0004\000\040\000T\000h\000r\000e\000a\000t\000\040\000M\000o\000d\000e\000l}{section.8}% 49
|
||||
\BOOKMARK [2][-]{subsection.8.5}{\376\377\0008\000.\0005\000\040\000C\000e\000r\000t\000i\000f\000i\000c\000a\000t\000e\000\040\000P\000i\000n\000n\000i\000n\000g\000\040\000a\000n\000d\000\040\000A\000P\000I\000\040\000S\000e\000c\000u\000r\000i\000t\000y}{section.8}% 50
|
||||
\BOOKMARK [2][-]{subsection.8.6}{\376\377\0008\000.\0006\000\040\000R\000a\000t\000e\000\040\000L\000i\000m\000i\000t\000i\000n\000g\000\040\000a\000n\000d\000\040\000A\000b\000u\000s\000e\000\040\000P\000r\000e\000v\000e\000n\000t\000i\000o\000n}{section.8}% 51
|
||||
\BOOKMARK [2][-]{subsection.8.7}{\376\377\0008\000.\0007\000\040\000A\000u\000d\000i\000t\000\040\000L\000o\000g\000\040\000I\000m\000m\000u\000t\000a\000b\000i\000l\000i\000t\000y}{section.8}% 52
|
||||
\BOOKMARK [2][-]{subsection.8.8}{\376\377\0008\000.\0008\000\040\000S\000p\000a\000c\000e\000d\000r\000i\000v\000e\000\040\000C\000l\000o\000u\000d\000\040\000S\000e\000r\000v\000i\000c\000e\000\040\000P\000r\000i\000v\000a\000c\000y\000\040\000M\000o\000d\000e\000l}{section.8}% 53
|
||||
\BOOKMARK [2][-]{subsection.8.9}{\376\377\0008\000.\0009\000\040\000P\000r\000i\000v\000a\000c\000y\000-\000P\000r\000e\000s\000e\000r\000v\000i\000n\000g\000\040\000A\000I}{section.8}% 54
|
||||
\BOOKMARK [2][-]{subsection.8.10}{\376\377\0008\000.\0001\0000\000\040\000B\000a\000l\000a\000n\000c\000i\000n\000g\000\040\000P\000r\000i\000v\000a\000c\000y\000\040\000a\000n\000d\000\040\000P\000u\000b\000l\000i\000c\000\040\000S\000h\000a\000r\000i\000n\000g}{section.8}% 55
|
||||
\BOOKMARK [1][-]{section.9}{\376\377\0009\000\040\000P\000r\000a\000c\000t\000i\000c\000a\000l\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000R\000e\000s\000o\000l\000u\000t\000i\000o\000n}{}% 56
|
||||
\BOOKMARK [2][-]{subsection.9.1}{\376\377\0009\000.\0001\000\040\000M\000e\000t\000a\000d\000a\000t\000a\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000S\000c\000e\000n\000a\000r\000i\000o\000s}{section.9}% 57
|
||||
\BOOKMARK [2][-]{subsection.9.2}{\376\377\0009\000.\0002\000\040\000A\000d\000v\000a\000n\000c\000e\000d\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000S\000c\000e\000n\000a\000r\000i\000o\000s}{section.9}% 58
|
||||
\BOOKMARK [2][-]{subsection.9.3}{\376\377\0009\000.\0003\000\040\000C\000o\000n\000f\000l\000i\000c\000t\000\040\000P\000r\000e\000v\000e\000n\000t\000i\000o\000n}{section.9}% 59
|
||||
\BOOKMARK [1][-]{section.10}{\376\377\0001\0000\000\040\000C\000o\000n\000c\000l\000u\000s\000i\000o\000n}{}% 60
|
||||
\BOOKMARK [2][-]{subsection.10.1}{\376\377\0001\0000\000.\0001\000\040\000K\000e\000y\000\040\000C\000o\000n\000t\000r\000i\000b\000u\000t\000i\000o\000n\000s\000\040\000a\000n\000d\000\040\000R\000e\000a\000l\000-\000W\000o\000r\000l\000d\000\040\000I\000m\000p\000a\000c\000t}{section.10}% 61
|
||||
\BOOKMARK [2][-]{subsection.10.2}{\376\377\0001\0000\000.\0002\000\040\000S\000y\000s\000t\000e\000m\000\040\000I\000n\000t\000e\000g\000r\000a\000t\000i\000o\000n}{section.10}% 62
|
||||
\BOOKMARK [2][-]{subsection.10.3}{\376\377\0001\0000\000.\0003\000\040\000V\000a\000l\000i\000d\000a\000t\000i\000o\000n\000\040\000i\000n\000\040\000P\000r\000o\000d\000u\000c\000t\000i\000o\000n}{section.10}% 63
|
||||
\BOOKMARK [2][-]{subsection.10.4}{\376\377\0001\0000\000.\0004\000\040\000F\000u\000t\000u\000r\000e\000\040\000W\000o\000r\000k\000\040\000a\000n\000d\000\040\000R\000o\000a\000d\000m\000a\000p}{section.10}% 64
|
||||
\BOOKMARK [2][-]{subsection.10.5}{\376\377\0001\0000\000.\0005\000\040\000L\000i\000m\000i\000t\000a\000t\000i\000o\000n\000s}{section.10}% 65
|
||||
\BOOKMARK [2][-]{subsection.10.6}{\376\377\0001\0000\000.\0006\000\040\000B\000r\000o\000a\000d\000e\000r\000\040\000I\000m\000p\000l\000i\000c\000a\000t\000i\000o\000n\000s}{section.10}% 66
|
||||
\BOOKMARK [1][-]{section*.46}{\376\377\000A\000c\000k\000n\000o\000w\000l\000e\000d\000g\000m\000e\000n\000t\000s}{}% 67
|
||||
\BOOKMARK [1][-]{appendix.A}{\376\377\000A\000\040\000G\000l\000o\000s\000s\000a\000r\000y\000\040\000o\000f\000\040\000T\000e\000r\000m\000s}{}% 68
|
||||
\BOOKMARK [1][-]{section*.56}{\376\377\000R\000e\000f\000e\000r\000e\000n\000c\000e\000s}{}% 69
|
||||
|
||||
Binary file not shown.
@@ -379,6 +379,7 @@ Unique ID & Globally unique, immutable identifier \\
Universal Path & Complete location with device ID \\
Name \& Type & File/directory name and type \\
\textbf{Metadata ID} & \textbf{Instant tagging without content analysis} \\
Parent ID & Direct link to parent Entry for fast hierarchy traversal \\
Content ID & Links to deduplication fingerprint \\
Discovery Time & First detection timestamp \\
\bottomrule
@@ -387,7 +388,7 @@ Discovery Time & First detection timestamp \\
\label{tab:entry-fields}
\end{table}

\textbf{Key Innovation}: Entries are created immediately during the discovery phase (detailed in Section~\ref{sec:indexing-engine}), enabling near-instant browsing and tagging. The \texttt{metadata\_id} field ensures every Entry can receive user metadata the moment it's discovered, while content analysis continues asynchronously in later phases. This ``metadata-first'' approach means users never wait to organize their files, whether browsing managed Locations or exploring external drives through ephemeral mode (Section~\ref{sec:indexing-scopes}).
\textbf{Key Innovation}: Entries are created immediately during the discovery phase (detailed in Section~\ref{sec:indexing-engine}), enabling near-instant browsing and tagging. The \texttt{metadata\_id} ensures every Entry can receive user metadata the moment it's discovered. At the same time, the \texttt{parent\_id} links the Entry directly to its parent, so the system can immediately place the new file in the correct hierarchical context and update the index transactionally while slower content analysis continues asynchronously. This ``metadata-first'' approach means users never wait to organize their files, whether browsing managed Locations or exploring external drives through ephemeral mode (Section~\ref{sec:indexing-scopes}).

This ``metadata-first'' capability is a direct result of the indexer's multi-phase architecture (detailed in Section~\ref{sec:indexing-engine}). The initial \textbf{Discovery phase} is a high-speed filesystem traversal that creates a lightweight \texttt{Entry} record for each item it finds. This record, containing the essential \texttt{metadata\_id}, is established almost instantly, allowing for immediate user interaction like tagging. Slower, content-aware operations like hashing and media analysis occur in subsequent, asynchronous phases, enriching the \texttt{Entry} over time without blocking initial organization.

@@ -923,6 +924,13 @@ async fn hash_remote_entry(op: &Operator, entry: &Entry) -> Result<ContentId> {

\footnotetext{OpenDAL v0.49 changelog emphasizes ranged reads and metadata alignment for services like S3/GCS~\cite{opendal-changelog-2025}.}

\subsubsection{High-Performance Hierarchical Indexing}
To overcome the scaling limitations of traditional path-based queries (e.g., \texttt{LIKE 'path/\%'}), Spacedrive's indexer implements a \textbf{Closure Table} for all hierarchical data. This standard database pattern pre-calculates and stores all ancestor-descendant relationships, replacing slow string pattern matching with efficient, indexed integer joins.

When a new entry is created, its \texttt{parent\_id} is used to transactionally populate the \texttt{entry\_closure} table with all its ancestor relationships in a single, atomic operation. This ensures the hierarchy is always consistent.
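The population step described above can be sketched in memory as follows. This is a minimal illustration, not Spacedrive's implementation: the \texttt{ClosureTable} type, its method names, and the depth-0 self row are assumptions made for the example, and a real database would perform the same copy as a single \texttt{INSERT ... SELECT} inside a transaction.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for the `entry_closure` table.
/// Each row is (ancestor_id, descendant_id, depth); every entry also
/// gets a depth-0 row pointing at itself (an assumption of this sketch).
#[derive(Default)]
pub struct ClosureTable {
    rows: HashSet<(u64, u64, u32)>,
}

impl ClosureTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert a new entry under `parent_id` (`None` for a root entry).
    /// Every row that ends at the parent is copied with the new entry as
    /// descendant and depth + 1, so all ancestor links appear together.
    pub fn insert_entry(&mut self, id: u64, parent_id: Option<u64>) {
        // Collect the new rows first, then add them in one step.
        let mut new_rows = vec![(id, id, 0)];
        if let Some(parent) = parent_id {
            for &(ancestor, descendant, depth) in &self.rows {
                if descendant == parent {
                    new_rows.push((ancestor, id, depth + 1));
                }
            }
        }
        self.rows.extend(new_rows);
    }

    /// Ancestors of `id` as (ancestor_id, depth), nearest first.
    pub fn ancestors(&self, id: u64) -> Vec<(u64, u32)> {
        let mut out: Vec<(u64, u32)> = self
            .rows
            .iter()
            .filter(|&&(_, d, depth)| d == id && depth > 0)
            .map(|&(a, _, depth)| (a, depth))
            .collect();
        out.sort_by_key(|&(_, depth)| depth);
        out
    }

    /// All descendants of `id`, sorted by id.
    pub fn descendants(&self, id: u64) -> Vec<u64> {
        let mut out: Vec<u64> = self
            .rows
            .iter()
            .filter(|&&(a, _, depth)| a == id && depth > 0)
            .map(|&(_, d, _)| d)
            .collect();
        out.sort_unstable();
        out
    }
}
```

Inserting entries 1 (root), 2 (child of 1), and 3 (child of 2) leaves entry 3 with two ancestor rows, so both the ancestor chain and the full subtree can be read back without parsing any path.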

A critical component of this system is the indexer's resilient change detection. To ensure data integrity, especially for offline changes, the indexer uses \textbf{inode tracking} to reliably differentiate between a file move and a delete followed by an add. When a file is moved within the same volume, its inode remains constant. The indexer leverages this to identify the operation as a move, triggering a safe, transactional update of the entry's path and its position in the closure table, rather than performing a destructive and incorrect delete-and-re-add. This preserves all user metadata and ensures the integrity of the hierarchical index.
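A minimal sketch of inode-based move detection, under stated assumptions: the \texttt{Change} enum and \texttt{detect\_changes} function are hypothetical names, both maps describe a single volume, and inode numbers are taken as stable identifiers. On Unix the inode could be read with \texttt{std::os::unix::fs::MetadataExt::ino}; a production indexer would likely also guard against inode reuse.

```rust
use std::collections::HashMap;

/// Hypothetical change record produced by comparing two scans.
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Moved { inode: u64, from: String, to: String },
    Added { inode: u64, path: String },
    Removed { inode: u64, path: String },
}

/// Compare the indexed state with a fresh scan of the same volume.
/// Both maps are keyed by inode; values are paths.
pub fn detect_changes(
    indexed: &HashMap<u64, String>,
    scanned: &HashMap<u64, String>,
) -> Vec<Change> {
    let mut changes = Vec::new();
    for (&inode, old_path) in indexed {
        match scanned.get(&inode) {
            // Same inode, different path: a move, not a delete plus add.
            Some(new_path) if new_path != old_path => changes.push(Change::Moved {
                inode,
                from: old_path.clone(),
                to: new_path.clone(),
            }),
            // Same inode, same path: nothing to do.
            Some(_) => {}
            // Inode no longer present: the entry was removed.
            None => changes.push(Change::Removed {
                inode,
                path: old_path.clone(),
            }),
        }
    }
    for (&inode, path) in scanned {
        if !indexed.contains_key(&inode) {
            changes.push(Change::Added {
                inode,
                path: path.clone(),
            });
        }
    }
    // Sort by inode so the result does not depend on map iteration order.
    changes.sort_by_key(|c| match c {
        Change::Moved { inode, .. }
        | Change::Added { inode, .. }
        | Change::Removed { inode, .. } => *inode,
    });
    changes
}
```

A \texttt{Moved} result is the case that would trigger the transactional path and closure-table update, leaving the entry's identity and metadata untouched.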

% --- SECTION 4.4: THE TRANSACTIONAL ACTION SYSTEM ---
\subsection{The Transactional Action System}
\label{sec:action-system}
@@ -1069,7 +1077,7 @@ The simulation engine operates through a three-step process: retrieving relevant

\textbf{Operational Conflict Detection}

The simulation engine proactively identifies operational conflicts that would cause traditional file operations to fail:
The closure table also dramatically accelerates the preview generation itself. When simulating a move or copy of a large directory, the system can determine the entire scope of the operation (all descendant files and folders) with a single indexed query instead of a slow recursive path scan. The simulation engine proactively identifies operational conflicts that would cause traditional file operations to fail:
\begin{itemize}[noitemsep, topsep=0pt]
\item \textbf{Storage Constraints}: Calculates exact space requirements and verifies availability on target devices
\item \textbf{Permission Violations}: Detects write-protected locations or access-restricted files before attempting operations
@@ -1561,8 +1569,9 @@ This architecture ensures the agent's role is simply to initiate the search; it
The \texttt{SearchJob} executes a hybrid temporal-semantic query designed for performance on consumer hardware. This \textbf{Temporal-First, Vector-Enhanced} approach operates in stages:

\begin{enumerate}[noitemsep, topsep=0pt]
\item \textbf{Temporal Filtering (FTS5)}: First, SQLite's FTS5 index performs a high-speed keyword search on filenames, paths, tags, and extracted text from sidecars. This rapidly narrows millions of entries down to a small, relevant candidate set, typically in under 55ms.
\item \textbf{Semantic Re-ranking}: For queries requiring semantic understanding, the system computes a vector embedding of the query using a lightweight local model (e.g., \texttt{all-MiniLM-L6-v2}). It then re-ranks only the candidate set from Stage 1 by cosine similarity against their pre-computed embeddings, adding minimal latency (typically under 40ms).
\item \textbf{Hierarchical Pre-filtering (Closure Table)}: If the search query includes a path scope (e.g., ``find photos in `Projects'\,''), the system first uses the closure table to get a list of all entry IDs within that scope. This provides a highly focused candidate set before any text search occurs.
\item \textbf{Temporal Filtering (FTS5)}: Next, SQLite's FTS5 index performs a high-speed keyword search only on the pre-filtered candidate set. This rapidly narrows millions of entries down to a small, relevant set, typically in under 55ms.
\item \textbf{Semantic Re-ranking}: For queries requiring semantic understanding, the system computes a vector embedding of the query using a lightweight local model (e.g., \texttt{all-MiniLM-L6-v2}). It then re-ranks only the candidate set from Stage 2 by cosine similarity against their pre-computed embeddings, adding minimal latency (typically under 40ms).
\end{enumerate}
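The staged narrowing described in the list can be sketched as follows. This is an illustration under assumptions, not the \texttt{SearchJob} itself: the \texttt{Candidate} struct and \texttt{search} function are hypothetical, a plain substring match stands in for the FTS5 query, the scope set stands in for the closure-table lookup, and embeddings are supplied by the caller rather than computed by a model.

```rust
use std::collections::HashSet;

/// Hypothetical searchable record: id, indexed text, precomputed embedding.
pub struct Candidate {
    pub id: u64,
    pub text: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity of two vectors; 0.0 if either has zero length.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Scope filter, then keyword filter, then re-rank by similarity.
/// `scope` plays the role of the entry IDs returned by the closure table.
pub fn search(
    entries: &[Candidate],
    scope: &HashSet<u64>,
    keyword: &str,
    query_embedding: &[f32],
) -> Vec<u64> {
    let needle = keyword.to_lowercase();
    let mut scored: Vec<(u64, f32)> = entries
        .iter()
        // Hierarchical pre-filter: keep only entries inside the scope.
        .filter(|c| scope.contains(&c.id))
        // Keyword filter: substring match as a stand-in for FTS5.
        .filter(|c| c.text.to_lowercase().contains(&needle))
        // Score only the surviving candidates.
        .map(|c| (c.id, cosine_similarity(&c.embedding, query_embedding)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

The ordering of the filters is the design point: each stage only sees what the previous one let through, so the comparatively expensive similarity computation runs on the smallest set.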

This hybrid process provides the power of semantic search with the speed of traditional indexed search, delivering sub-100ms response times on consumer hardware for libraries with over a million entries.
@@ -1991,6 +2000,8 @@ Spacedrive is implemented in \textbf{Rust} to leverage its guarantees of memory

\item \textbf{SeaORM with SQLite}: Type-safe database operations with ACID transactions and FTS

\item \textbf{O(1) Hierarchical Lookups}: Directory listings, subtree calculations, and ancestor lookups are performed in constant time via indexed closure table joins, replacing slow LIKE queries and scaling to millions of entries

\item \textbf{Event-Driven Architecture}: Custom EventBus for loose coupling and state propagation

\item \textbf{Job System}: MessagePack-serialized tasks with automatic resumability
@@ -2027,65 +2038,6 @@ This approach works well for simple file operations but has performance limitati

While effective for simple operations, this approach encounters performance limitations with deep hierarchies and complex aggregation queries.

\subsection{\plannedSection{Case Study: Hierarchical Query Optimization}}
Spacedrive plans to migrate from materialized paths to a closure table\footnote{A closure table pre-stores all folder relationships, using more storage for faster queries.} for directories. This changes O(N) operations to O(1) lookups.

\subsubsection{Performance Problem}
The current materialized path approach leads to inefficient operations for complex hierarchy queries:

\begin{itemize}
\item \textbf{String-based path matching} for ancestor/descendant relationships
\item \textbf{Sequential directory aggregation} requiring multiple database round-trips
\item \textbf{O(N) LIKE queries} for subtree operations on large directory trees
\item \textbf{Complex join patterns} for multi-level hierarchy operations
\end{itemize}

\subsubsection{Closure Table Solution}
The closure table stores all folder relationships:


\textbf{Closure Table: Pre-computed Relationships}

The planned optimization stores all parent-child relationships explicitly:

\begin{center}
\begin{tabular}{|l|p{4cm}|}
\hline
\footnotesize
\textbf{Relationship} & \textbf{Storage Method} \\
\hline
Direct Relationships & Every parent-child pair (folder contains file) \\
\hline
Indirect Relationships & Every ancestor-descendant pair with depth tracking \\
\hline
Performance Indexes & Optimized lookup tables for instant hierarchy queries \\
\hline
\end{tabular}
\end{center}

\textbf{Performance Transformation}

This approach converts complex hierarchy operations into simple, fast lookups:

\begin{itemize}[noitemsep, topsep=0pt]
\item \textbf{Directory Listing}: Instant retrieval of all files in a folder (no pattern matching)
\item \textbf{Subtree Operations}: Get entire folder contents with single query
\item \textbf{Size Calculations}: Calculate total folder size across all subdirectories in one operation
\item \textbf{Ancestor Lookup}: Find the parent folder chain instantly (no path parsing)
\end{itemize}
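Two of the operations listed above, directory listing and recursive size calculation, can be sketched against closure rows as follows. The \texttt{ClosureRow} struct, its field names, and both functions are assumptions for illustration; the linear scan over a slice stands in for what would be an indexed lookup on the ancestor column in a database.

```rust
use std::collections::HashMap;

/// One row of a hypothetical `entry_closure` table.
pub struct ClosureRow {
    pub ancestor: u64,
    pub descendant: u64,
    pub depth: u32,
}

/// Directory listing: direct children are the rows at depth 1.
pub fn children(closure: &[ClosureRow], dir: u64) -> Vec<u64> {
    let mut ids: Vec<u64> = closure
        .iter()
        .filter(|r| r.ancestor == dir && r.depth == 1)
        .map(|r| r.descendant)
        .collect();
    ids.sort_unstable();
    ids
}

/// Recursive folder size: sum the sizes of every descendant in one pass.
/// `sizes` maps entry id to size in bytes; entries without a size
/// (for example directories) are skipped.
pub fn subtree_size(closure: &[ClosureRow], sizes: &HashMap<u64, u64>, dir: u64) -> u64 {
    closure
        .iter()
        .filter(|r| r.ancestor == dir && r.depth > 0)
        .filter_map(|r| sizes.get(&r.descendant))
        .sum()
}
```

Neither function walks the tree level by level or inspects a path string; the depth column alone separates direct children from deeper descendants.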


\subsubsection{Performance Projections}
Based on algorithmic analysis and preliminary benchmarks:

\begin{itemize}[noitemsep, topsep=0pt]
\item \textbf{Directory listing}: O(N) string matching → O(1) indexed lookup
\item \textbf{Subtree traversal}: O(N) recursive queries → O(1) join operation
\item \textbf{Ancestor lookup}: O(D) path parsing → O(1) indexed lookup
\item \textbf{Bulk aggregation}: O(N×D) sequential → O(N) parallel processing
\end{itemize}

\textbf{Storage overhead}: For a filesystem with N entries and average depth D, closure tables require approximately N×D additional rows. For typical user filesystems (1M files, average depth 5), this represents approximately 60\,MB of additional storage, a reasonable trade-off for the dramatic performance improvements.
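
As a rough consistency check on that estimate, the row count and the implied size per row follow directly from the figures quoted above. The per-row size is not stated in the text; it is inferred here by taking the 60\,MB figure as given and reading MB as $10^{6}$ bytes:

\[
N \times D = 10^{6} \times 5 = 5 \times 10^{6} \;\mathrm{rows},
\qquad
\frac{60 \times 10^{6} \;\mathrm{bytes}}{5 \times 10^{6} \;\mathrm{rows}} = 12 \;\mathrm{bytes/row}.
\]

A figure of this order is plausible for a row holding two integer IDs and a depth value, though actual on-disk size would also depend on index overhead and the database's storage format.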

\subsection{Testing and Validation Framework}
Spacedrive employs a thorough testing strategy designed for real-world scenarios:
