- Introduced a new, dated sync design document (`SYNC_DESIGN_2025_08_19.md`) to reflect a more robust and refined architecture.
- The new design incorporates several key improvements based on a detailed analysis of the V2 codebase and data models.
Key changes in the new design include:
- A clear architectural distinction between finite "Jobs" (e.g., Backfill) and long-running "Services" (e.g., LiveSync).
- Explicit clarification that derived data, such as closure tables, will not be synced and must be rebuilt locally on each device.
- A new requirement for log redaction and compaction on the leader to ensure data privacy and efficient storage.
- Updated the original `SYNC_DESIGN.md` to include the same clarification that closure tables are not synced.
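The Job/Service distinction above can be illustrated with two small traits. This is a minimal sketch, not the design document's actual API: the trait names, the `JobStatus` enum, and the fields of `Backfill` and `LiveSync` are all assumptions made for illustration.

```rust
/// Result of advancing a finite job by one step (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum JobStatus {
    InProgress,
    Done,
}

/// A finite unit of work that eventually completes, e.g. Backfill.
pub trait Job {
    fn step(&mut self) -> JobStatus;
}

/// A long-running component with no natural end, e.g. LiveSync.
pub trait Service {
    fn start(&mut self);
    fn stop(&mut self);
    fn is_running(&self) -> bool;
}

/// Illustrative backfill that finishes after a fixed number of batches.
pub struct Backfill {
    remaining_batches: u32,
}

impl Backfill {
    pub fn new(batches: u32) -> Self {
        Self { remaining_batches: batches }
    }
}

impl Job for Backfill {
    fn step(&mut self) -> JobStatus {
        // Process one batch; the job is done once none remain.
        self.remaining_batches = self.remaining_batches.saturating_sub(1);
        if self.remaining_batches == 0 {
            JobStatus::Done
        } else {
            JobStatus::InProgress
        }
    }
}

/// Illustrative live-sync service that runs until explicitly stopped.
#[derive(Default)]
pub struct LiveSync {
    running: bool,
}

impl Service for LiveSync {
    fn start(&mut self) {
        self.running = true;
    }
    fn stop(&mut self) {
        self.running = false;
    }
    fn is_running(&self) -> bool {
        self.running
    }
}
```

The point of the split is that a caller can wait for a `Job` to report `Done`, while a `Service` only ever changes state when told to.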
- Removed the `start_networking` method call from the daemon initialization process, as it was redundant after `init_networking`.
- Commented out the volume detection initialization code for future review, improving code clarity.
- Enhanced the pairing protocol by establishing a persistent connection immediately after successful pairing, ensuring better device communication.
- Updated the handling of pairing messages to maintain a continuous stream for message exchanges, improving responsiveness.
- Improved logging throughout the networking and pairing processes for better visibility into operations and error handling.
These changes aim to streamline the networking setup and enhance the reliability of the pairing protocol within the system.
- Refactored the `test_database_creation_and_migration` function in `database_migration_test.rs` for improved readability and structure.
- Updated the whitepaper to replace the outdated indexing duration summary with a new performance benchmarks description, including details of the test hardware.
- Removed deprecated CSV inputs from the LaTeX document and adjusted related figures to enhance clarity in performance reporting.
These changes aim to improve the maintainability of the test code and enhance the accuracy of performance insights in the whitepaper.
- Introduced a `data_dir` field in `DeviceManager` to allow specification of a custom data directory for device configurations and keys.
- Updated `DeviceKeyManager` to support fallback file paths for master keys, ensuring consistent key management across different data directories.
- Enhanced the `DevicePersistence` and `DeviceRegistry` implementations to utilize the new data directory structure, improving the reliability of device connection persistence.
- Added comprehensive logging and retry mechanisms in the networking service to improve connection stability and visibility during device interactions.
These changes aim to enhance the flexibility and robustness of device management and key handling within the system.
- Added functionality to immediately mark the initiator device as connected upon successful pairing completion, ensuring accurate device status even if the completion message fails.
- Implemented logging for both successful and failed attempts to mark the device as connected, improving visibility into the pairing process.
- Updated file transfer tests to display full checksums instead of truncated versions, enhancing clarity in verification outputs.
These changes aim to improve the reliability of device connection status during pairing and enhance the clarity of file transfer verification.
- Introduced a new `LocationAdded` variant in the `ActionOutput` enum to encapsulate the location ID and optional job ID upon successful location addition.
- Updated the `handle_location_command` function to handle the new response format from the daemon, including improved error handling for unexpected responses.
- Modified the `LocationHandler` to extract and return the location ID and job ID from the action output, ensuring proper response structure.
- Refactored the `ActionHandler` for `LocationAddHandler` to utilize the new `LocationAdded` output format, enhancing clarity in action results.
These changes aim to improve the handling of location addition actions and provide clearer feedback in the system's output.
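A minimal sketch of what the `LocationAdded` variant and its handling might look like. The `u64` ID types, the `Other` variant, and the `describe` helper are assumptions for illustration; the real enum likely has more variants and different ID types.

```rust
/// Simplified, hypothetical stand-in for the real `ActionOutput` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum ActionOutput {
    /// Emitted when a location was added; `job_id` is set only if a job was started.
    LocationAdded { location_id: u64, job_id: Option<u64> },
    /// Placeholder for every other output kind (assumption).
    Other(String),
}

/// Turns an action output into a user-facing message, rejecting unexpected responses.
pub fn describe(output: &ActionOutput) -> Result<String, String> {
    match output {
        ActionOutput::LocationAdded { location_id, job_id: Some(job) } => {
            Ok(format!("location {location_id} added, job {job} started"))
        }
        ActionOutput::LocationAdded { location_id, job_id: None } => {
            Ok(format!("location {location_id} added"))
        }
        other => Err(format!("unexpected response: {other:?}")),
    }
}
```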
- Introduced a new `build.rs` file to automate the generation of build metadata, including Git SHA, commit timestamp, and branch information.
- Updated `Cargo.toml` to include new dependencies such as `comfy-table`, `dialoguer`, `indicatif`, `owo-colors`, `supports-color`, `console`, and `colored` for improved CLI output and user interaction.
- Added a new shell script `update_spacedrive.sh` for streamlined updates of the Spacedrive CLI, including Git pull and project build processes.
- Refactored benchmark scenarios by renaming and consolidating related functionality, enhancing clarity and maintainability.
These changes aim to improve the build process and enhance the user experience within the Spacedrive system.
- Introduced new JSON files for the `shape_large`, `shape_medium`, and `shape_small` content identification benchmarks, detailing performance metrics and hardware specifications.
- Updated the whitepaper metrics CSV to include the new benchmark results, enhancing the reporting structure for better analysis.
- Enhanced the LaTeX document to incorporate new tables and figures summarizing indexing durations and performance comparisons across different hardware setups.
These changes aim to improve the clarity and comprehensiveness of performance insights within the Spacedrive system.
- Updated the CSV structure for whitepaper metrics to include a new `Duration_s` column, providing insights into the time taken for each phase of the benchmarking process.
- Modified the CSV reporting logic in the Rust code to calculate and include phase-specific durations, improving the clarity and usability of performance data.
- Adjusted the LaTeX document to reflect the updated CSV format, ensuring accurate representation of benchmark results in the whitepaper.
These changes aim to improve the analysis of performance metrics within the Spacedrive system.
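The per-phase duration column could be produced along these lines. This is a sketch under assumptions: only the `Duration_s` column name comes from the change description; the `PhaseResult` struct and the other columns are hypothetical.

```rust
use std::time::Duration;

/// One benchmark phase and how long it took (hypothetical shape).
pub struct PhaseResult {
    pub phase: String,
    pub files: u64,
    pub duration: Duration,
}

/// Renders phase results as CSV, with the duration in seconds to millisecond precision.
pub fn to_csv(rows: &[PhaseResult]) -> String {
    let mut out = String::from("Phase,Files,Duration_s\n");
    for row in rows {
        out.push_str(&format!(
            "{},{},{:.3}\n",
            row.phase,
            row.files,
            row.duration.as_secs_f64()
        ));
    }
    out
}
```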
- Deleted obsolete Rust test files related to file copy and device persistence actions to streamline the codebase.
- Introduced new benchmark result JSON files for various scenarios, including `shape_large` and `shape_medium`, with performance metrics and hardware details.
- Updated the CSV reporting structure to reflect the latest benchmark results, improving clarity and usability for analysis.
These changes aim to optimize the codebase and provide more accurate performance insights within the Spacedrive system.
- Revised benchmark result JSON files for the `shape_small` and `shape_medium` scenarios, with significantly improved performance metrics and updated hardware details.
- Enhanced the CSV file for whitepaper metrics to include additional metrics for better analysis and reporting.
- Refactored CLI commands to support multiple scenario executions and improved hardware detection, increasing usability and flexibility in benchmarking tasks.
These changes aim to provide more accurate performance insights and streamline the benchmarking process within the Spacedrive system.
- Updated benchmark result JSON files for the `shape_small` and `shape_medium` scenarios, with improved performance metrics, new hardware details, and refined durations.
- Introduced a new CSV file for whitepaper metrics, consolidating benchmark results for better reporting and analysis.
- Refactored CLI commands to support multiple scenario executions and improved hardware detection, enhancing usability and flexibility in benchmarking tasks.
These changes aim to provide more accurate performance insights and streamline the benchmarking process within the Spacedrive system.
- Added a new dependency on `sysinfo` for improved hardware label detection in benchmark results.
- Updated benchmark result JSON files to include a more structured format with metadata and durations for better clarity.
- Refactored the CLI commands to utilize the new `BenchmarkRun` structure, enhancing the output of benchmark summaries.
- Removed outdated benchmark result files to streamline the reporting process.
These changes aim to improve the accuracy and usability of benchmark reporting within the Spacedrive system.
- Improved sidecar management by adding new database entities and migration scripts to support enhanced functionality.
- Updated benchmark YAML files to include additional scenarios, expanding performance testing coverage.
- Enhanced CLI commands with a new `recipe_filter` option for better control over recipe file selection.
These changes aim to strengthen the sidecar management system and improve the benchmarking framework's capabilities within the Spacedrive system.
- Updated `Cargo.toml` to modify the `sqlx` dependency, removing the `sqlite-fts5` feature for improved compatibility.
- Introduced new benchmark YAML files for `shape_large`, `shape_medium`, and `shape_small` scenarios, expanding performance testing coverage.
- Added a new `test_daemon.sh` script for testing the daemon's functionality, including logging and error handling.
- Implemented a sidecar management system with new database entities and migration scripts to support sidecar functionality.
- Enhanced the CLI commands to include a `recipe_filter` option for better control over recipe file selection.
These changes aim to improve the benchmarking framework's capabilities, enhance sidecar management, and streamline the testing process within the Spacedrive system.
- Revised the `rel_path` description to clarify the inclusion of sharding prefixes in the path structure.
- Updated directory structure examples to reflect the new two-level hex sharding under `content/`, improving organization and scalability.
- Enhanced the bootstrap scan description to specify the handling of the sharded directory tree, ensuring consistency between the database and filesystem.
These changes aim to improve clarity and understanding of the sidecar system's structure and its operational efficiency at scale.
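Two-level hex sharding can be sketched as follows. The exact layout is an assumption: this version uses the first two pairs of hex characters of a content ID as the two shard directories under `content/`, and the `sharded_rel_path` name and the per-ID subdirectory are hypothetical.

```rust
use std::path::PathBuf;

/// Builds a sharded relative path such as `content/ab/cd/abcd1234/<file_name>`.
/// Returns `None` if the ID is too short or is not hexadecimal.
pub fn sharded_rel_path(content_id: &str, file_name: &str) -> Option<PathBuf> {
    if content_id.len() < 4 || !content_id.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Normalize case so the same ID always maps to the same directory.
    let id = content_id.to_ascii_lowercase();
    let mut path = PathBuf::from("content");
    path.push(&id[0..2]); // first shard level
    path.push(&id[2..4]); // second shard level
    path.push(&id);
    path.push(file_name);
    Some(path)
}
```

With two hex characters per level, each level has at most 256 subdirectories, which keeps any single directory small even with many sidecars.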
- Modified the `sqlx` dependency in `Cargo.toml` to include the `sqlite-fts5` feature, enabling full-text search capabilities.
- Added an implementation note in the `SEARCH_DESIGN.md` documentation to clarify the execution of raw SQL statements for FTS5 using SeaORM's APIs, emphasizing the need for the `rusqlite` driver to be compiled with FTS5 support.
These changes aim to improve the functionality and documentation surrounding full-text search implementation within the system.
- Revised the `SearchMode` enum to include `Normal` and `Full`, replacing the previous modes for improved clarity.
- Enhanced the search workflow documentation, detailing the steps for indexing, sidecar generation, and progressive search execution.
- Clarified the asynchronous nature of job processing and the UI update mechanism during search execution.
These changes aim to streamline the search process and improve user understanding of the search functionality within the system.
- Updated the `SearchMode` enum to simplify search modes, replacing the previous definitions with three primary modes: `Fast`, `Normal`, and `Full`.
- Enhanced documentation to clarify the purpose and functionality of each search mode, improving understanding of the search optimization process.
These changes aim to streamline the search mode implementation and enhance the overall search experience within the system.
- Introduced a new optional argument `dataset_root` to the `Mkdata`, `MkdataAll`, `Run`, and `RunAll` commands, allowing users to prefix relative recipe locations with a specified path.
- Updated the `mkdata`, `mkdata_all`, `run_scenario`, and `run_all` functions to accommodate the new `dataset_root` parameter, improving flexibility in dataset generation and scenario execution.
- Enhanced documentation to reflect the changes in command options and usage, ensuring clarity for users.
These modifications aim to improve the usability of the CLI by providing more control over recipe path resolution within the Spacedrive system.
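The path resolution described above can be sketched as a small helper. The function name `resolve_location` is hypothetical, and leaving absolute locations untouched is an assumption inferred from the description, which mentions prefixing relative locations only.

```rust
use std::path::{Path, PathBuf};

/// Prefixes a relative recipe location with `dataset_root` when one is given.
/// Absolute locations, and all locations when no root is set, are returned unchanged.
pub fn resolve_location(dataset_root: Option<&Path>, location: &Path) -> PathBuf {
    match dataset_root {
        Some(root) if location.is_relative() => root.join(location),
        _ => location.to_path_buf(),
    }
}
```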
- Updated the benchmark results CSV to include additional metrics such as `Files_per_s`, `GB_per_s`, `Files`, `Dirs`, `GB`, `Errors`, and `Recipe` for improved clarity and detail.
- Refactored the CLI command to generate a results table that now accommodates the new output format, enhancing usability for users generating performance reports.
- Adjusted the whitepaper documentation to reflect the updated metrics and ensure consistency in reporting.
These changes aim to provide a more comprehensive view of benchmark performance and improve the overall reporting capabilities within the Spacedrive system.
- Introduced new benchmark result JSON files for various scenarios, including HDD and NVMe aggregation and content identification, to expand performance testing coverage.
- Added a new CLI command to generate a summary table from JSON results, allowing users to output results in markdown or CSV format for better reporting.
- Updated documentation to reflect the new command and its usage, improving clarity for users on how to generate and format benchmark results.
These changes aim to enhance the benchmarking framework's capabilities and improve usability for generating performance reports within the Spacedrive system.
- Updated the benchmarks documentation to improve clarity and consistency in command options, specifically changing `--out_json` to `--out-json`.
- Added a comprehensive roadmap outlining future benchmarks, including indexing pipeline, file operations, duplicates detection, search and querying, media pipeline, networking, and system metrics.
- Introduced enhancements for data generation, metrics telemetry, reporting, CLI ergonomics, and CI integration to improve the benchmarking framework's capabilities.
These changes aim to provide a clearer direction for future development and enhance the usability of the benchmarking tools within the Spacedrive system.
- Added new dependencies in `Cargo.toml` for `humantime` and `blake3` to support content generation features.
- Updated benchmark YAML files to include new `content_gen` settings for partial content generation and magic header writing.
- Refactored the dataset generation logic to incorporate content generation options, improving the fidelity of generated datasets.
- Modified benchmark result JSON files to reflect updated performance metrics and additional timing information.
These changes aim to enhance the accuracy and flexibility of the benchmarking framework within the Spacedrive system.
- Added new dependencies in `Cargo.toml` for `async-trait` and `blake3` to support asynchronous operations and hashing functionalities.
- Introduced a new library structure in `src/lib.rs` and created a new binary entry point in `src/bin/sd-bench-new.rs` for the benchmarking harness.
- Updated benchmark result JSON files for HDD and NVMe scenarios, reflecting changes in performance metrics and structure.
- Implemented a modular approach for dataset generation and scenario execution, improving the overall organization and maintainability of the benchmarking code.
These changes aim to improve the benchmarking capabilities and enhance the accuracy of performance metrics within the Spacedrive system.
- Updated benchmark JSON files for HDD and NVMe scenarios, reflecting improved performance metrics in `hdd_large-indexing-discovery.json`, `hdd_medium-indexing-discovery.json`, `nvme_mixed-indexing-discovery.json`, `nvme_small-indexing-discovery.json`, and `nvme_tiny-indexing-discovery.json`.
- Refactored entry processing logic in `entry.rs` to support bulk insertion of related rows, improving transaction efficiency and clarity.
- Enhanced the processing phase in `processing.rs` to accumulate entries for bulk insert, optimizing the indexing process.
These changes aim to improve the accuracy of benchmark results and streamline entry processing within the Spacedrive system.
- Updated benchmark JSON files for HDD and NVMe scenarios, reflecting improved performance metrics in `hdd_large-indexing-discovery.json`, `hdd_medium-indexing-discovery.json`, `nvme_mixed-indexing-discovery.json`, `nvme_small-indexing-discovery.json`, and `nvme_tiny-indexing-discovery.json`.
- Refactored entry processing logic in `entry.rs` to improve transaction handling and clarity, including the introduction of a new method for creating entries with a provided connection.
- Enhanced the processing phase in `processing.rs` to utilize a single transaction for batch entry creations, improving efficiency during the indexing process.
These changes aim to optimize the indexing discovery phase and ensure accurate benchmark results within the Spacedrive system.
- Introduced a new document outlining the Indexing Discovery Throughput Plan, detailing objectives, current baselines, measurement plans, design changes, and expected outcomes to improve discovery throughput and scalability.
- Enhanced the database connection handling in `mod.rs` by applying SQLite PRAGMAs for better write throughput, including settings for journal mode, synchronous behavior, temporary storage, cache size, and memory mapping.
- Updated the indexing state management to include new configuration parameters for discovery concurrency and channel capacities, facilitating improved performance during the indexing process.
These changes aim to optimize the indexing discovery phase and improve database performance within the Spacedrive system.
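For reference, the five PRAGMA categories listed above correspond to standard SQLite settings. The sketch below only builds the statements; the chosen values (`WAL`, `NORMAL`, `MEMORY`) and the `SqliteTuning` struct are assumptions, not the values used in `mod.rs`.

```rust
/// Tunable sizes for the connection (hypothetical struct).
pub struct SqliteTuning {
    pub cache_size_kib: u32,
    pub mmap_size_bytes: u64,
}

/// Returns the PRAGMA statements to run once on each new connection.
pub fn pragma_statements(tuning: &SqliteTuning) -> Vec<String> {
    vec![
        // Write-ahead logging lets readers proceed while a writer is active.
        "PRAGMA journal_mode = WAL;".to_string(),
        // Fewer fsync calls than FULL; assumed acceptable for an index that can be rebuilt.
        "PRAGMA synchronous = NORMAL;".to_string(),
        // Keep temporary tables and indices in memory.
        "PRAGMA temp_store = MEMORY;".to_string(),
        // A negative cache_size is interpreted by SQLite as KiB rather than pages.
        format!("PRAGMA cache_size = -{};", tuning.cache_size_kib),
        // Upper bound on memory-mapped I/O, in bytes.
        format!("PRAGMA mmap_size = {};", tuning.mmap_size_bytes),
    ]
}
```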
- Updated benchmark JSON files for HDD and NVMe scenarios with new metrics, including `hdd_large-indexing-discovery.json`, `hdd_medium-indexing-discovery.json`, `nvme_mixed-indexing-discovery.json`, `nvme_small-indexing-discovery.json`, and `nvme_tiny-indexing-discovery.json`.
- Refactored path resolution logic in the `addressing.rs` and `path_resolver.rs` files to improve efficiency and clarity, including enhanced batching for database queries.
- Improved the handling of directory paths and entry aggregation in the indexing phase, ensuring better performance and reliability.
These changes aim to enhance the accuracy of benchmark results and streamline path resolution processes within the Spacedrive system.
- Updated `nvme_tiny.yaml` to improve formatting consistency in file size buckets.
- Introduced new JSON result files for various HDD and NVMe scenarios, including `hdd_large-indexing-discovery.json`, `hdd_medium-indexing-discovery.json`, `nvme_mixed-indexing-discovery.json`, `nvme_small-indexing-discovery.json`, and `nvme_tiny-indexing-discovery.json`.
- Expanded the CLI with a new `RunAll` command to execute all recipes in a directory sequentially, generating JSON summaries for each run.
These changes aim to improve the organization and output of benchmark results, facilitating better analysis and reporting within the Spacedrive system.
- Added `benchdata` to `.gitignore` to exclude benchmark data from version control.
- Updated `Cargo.toml` to include the `benchmarks` workspace member, facilitating better organization of benchmark-related code.
- Enhanced the whitepaper by adding new sections and citations related to database schema optimization, testing frameworks, and performance benchmarks, improving the comprehensiveness of the documentation.
- Updated various auxiliary files to reflect changes in the whitepaper structure and content.
These changes aim to improve project organization and documentation clarity, supporting ongoing development and benchmarking efforts within the Spacedrive system.
- Refactored the `run_processing_phase` function to improve readability and maintainability.
- Reorganized imports for better clarity and consistency.
- Implemented a more efficient batching mechanism for processing entries, ensuring that entries are sorted by depth before processing.
- Enhanced logging to provide clearer insights into the processing steps and outcomes.
- Updated change detection logic to handle new, modified, and moved entries more effectively, improving overall performance during the indexing phase.
These changes aim to make the processing phase more efficient and easier to understand, improving indexing within the Spacedrive system.
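Sorting by depth before batching ensures that a parent directory is always processed before its children. A minimal sketch, assuming entries are identified by their paths; the function names and the alphabetical tie-break are illustrative choices, not taken from `run_processing_phase`.

```rust
use std::path::{Path, PathBuf};

/// Number of path components, used as the entry's depth.
pub fn depth(path: &Path) -> usize {
    path.components().count()
}

/// Sorts entries shallowest-first (ties broken by path) and splits them into batches.
pub fn batches_by_depth(mut entries: Vec<PathBuf>, batch_size: usize) -> Vec<Vec<PathBuf>> {
    entries.sort_by(|a, b| depth(a).cmp(&depth(b)).then_with(|| a.cmp(b)));
    entries
        .chunks(batch_size.max(1)) // guard against a zero batch size
        .map(|chunk| chunk.to_vec())
        .collect()
}
```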
- Introduced job logging capabilities, allowing detailed tracking of job execution and progress.
- Added configuration options for enabling job logging, including log directory, maximum file size, and debug log inclusion.
- Enhanced job context to support file logging, ensuring that job-related events are recorded in separate log files.
- Updated core initialization and job management to utilize the new logging features, improving observability and debugging capabilities.
These changes aim to provide better insights into job execution, facilitating easier troubleshooting and performance monitoring within the Spacedrive system.
- Added a detailed documentation file for refactoring `PathBuf` to `SdPath`, outlining architectural principles and implementation best practices.
- Introduced a new `addressing` module containing the `SdPath` enum and related path resolution logic, enhancing the structure of the Spacedrive codebase.
- Updated various components to utilize the new `SdPath` structure, ensuring consistency across the codebase.
- Refactored existing code to replace `shared::types::SdPath` with `domain::addressing::SdPath`, improving modularity and clarity.
These changes aim to streamline file operations and improve the resilience of the Spacedrive architecture by fully leveraging the capabilities of `SdPath`.
- Refactored `SdPath` from a struct to an enum, introducing `Physical` and `Content` variants for better path management.
- Updated path resolution logic to handle both physical and content-addressed paths, improving efficiency and safety.
- Modified various components, including job handling and CLI commands, to utilize the new `SdPath` structure, ensuring consistency across the codebase.
- Enhanced documentation to reflect changes in path handling and the new URI scheme for better usability.
These changes aim to streamline file operations and improve the resilience of the Spacedrive architecture.
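The enum form of `SdPath` might look roughly like this. Only the `Physical` and `Content` variant names come from the change description; the field names, the `u64` device ID, and the in-memory `index` standing in for a database lookup are assumptions.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Simplified sketch of the `SdPath` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum SdPath {
    /// A concrete file on a specific device.
    Physical { device_id: u64, path: PathBuf },
    /// A content-addressed reference that must be resolved to a physical path.
    Content { content_id: String },
}

impl SdPath {
    /// Resolves to a physical path. `index` is a hypothetical stand-in for the
    /// database that maps content IDs to known physical instances.
    pub fn resolve(&self, index: &HashMap<String, SdPath>) -> Option<SdPath> {
        match self {
            SdPath::Physical { .. } => Some(self.clone()),
            SdPath::Content { content_id } => index
                .get(content_id)
                .filter(|p| matches!(p, SdPath::Physical { .. }))
                .cloned(),
        }
    }
}
```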
- Revised section headings for consistency and improved readability.
- Expanded the impact section to provide detailed guidance on changes to the codebase.
- Added a new section on performant batch resolution to enhance path resolution efficiency.
- Updated examples and conclusions to reflect the refactor's significance in the Spacedrive architecture.
These changes aim to enhance the documentation's clarity and usability, ensuring that the design rationale and implementation details are easily understood.
- Created `mime_types` and `content_kinds` tables to serve as lookup tables for content management.
- Updated `content_identities` table to include foreign keys referencing the new lookup tables, enhancing data integrity.
- Modified migration scripts to include the creation and dropping of the new tables, ensuring proper database structure.
- Adjusted indexing to reflect changes in the `content_identities` table.
These changes aim to improve the organization and accessibility of content types within the database, facilitating better content management.
- Implemented functionality to create an entry for the current directory as the root location in the library demo.
- Updated the location model to associate the newly created entry with the current directory, improving the structure of location management.
- Enhanced the job registry by refining job handling and deserialization methods, ensuring better integration with the new location features.
These changes aim to improve the organization and accessibility of locations within the Spacedrive system, facilitating a more intuitive user experience.
- Introduced a new `deserialize_dyn_job` method in the `JobRegistry` to facilitate dynamic job deserialization, improving flexibility in job management.
- Updated the `JobRegistration` struct to include a function pointer for dynamic job deserialization, enhancing the extensibility of job types.
- Removed redundant raw job instance retrieval logic from `JobManager`, simplifying the codebase and leveraging the new registry capabilities.
- Improved documentation and formatting in the `spacedrive-jobs-derive` macro to enhance clarity for users.
These changes aim to improve the efficiency and maintainability of job handling within the system, enabling better dynamic job operations.
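The function-pointer approach to dynamic deserialization can be sketched as below. Everything except the names `JobRegistry`, `JobRegistration`, `DynJob`, and `deserialize_dyn_job` is an assumption, including the signatures, the `String` error type, and the toy `EchoJob`.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the real `DynJob` trait.
pub trait DynJob {
    fn describe(&self) -> String;
}

/// Function pointer that rebuilds a boxed job from its serialized bytes.
pub type DeserializeFn = fn(&[u8]) -> Result<Box<dyn DynJob>, String>;

pub struct JobRegistration {
    pub name: &'static str,
    pub deserialize_dyn: DeserializeFn,
}

#[derive(Default)]
pub struct JobRegistry {
    registrations: HashMap<&'static str, JobRegistration>,
}

impl JobRegistry {
    pub fn register(&mut self, registration: JobRegistration) {
        self.registrations.insert(registration.name, registration);
    }

    /// Looks up the job type by name and delegates to its registered function pointer.
    pub fn deserialize_dyn_job(&self, name: &str, data: &[u8]) -> Result<Box<dyn DynJob>, String> {
        let registration = self
            .registrations
            .get(name)
            .ok_or_else(|| format!("unknown job type: {name}"))?;
        (registration.deserialize_dyn)(data)
    }
}

/// Toy job used only to exercise the registry (hypothetical).
struct EchoJob(String);

impl DynJob for EchoJob {
    fn describe(&self) -> String {
        format!("echo: {}", self.0)
    }
}

fn deserialize_echo(data: &[u8]) -> Result<Box<dyn DynJob>, String> {
    let text = std::str::from_utf8(data).map_err(|e| e.to_string())?;
    Ok(Box::new(EchoJob(text.to_owned())))
}
```

Because the registry stores a plain function pointer per job type, the manager does not need to know any concrete job type to restore a job.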
- Removed the `as_any` method from the `DynJob` trait, simplifying the interface for dynamic job handling.
- Introduced `try_get_affected_resources` method in the `DynJob` trait, allowing jobs to report affected resources without requiring downcasting.
- Updated job implementations (e.g., `FileCopyJob`, `MoveJob`, `IndexerJob`, etc.) to utilize the new resource tracking method, improving consistency across job types.
- Refactored `JobManager` and `EntryStateService` to leverage the new resource tracking capabilities, enhancing the accuracy of job impact assessments.
These changes aim to improve the efficiency and clarity of job management within the system, facilitating better resource tracking and dynamic job operations.
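Reporting affected resources through the trait itself, rather than through `as_any` and downcasting, might look like this. The return type, the default implementation, the `job_name` method, and `CleanupJob` are assumptions made for the sketch.

```rust
/// Simplified stand-in for the real `DynJob` trait.
pub trait DynJob {
    fn job_name(&self) -> &'static str;

    /// Entry IDs this job touches, or `None` if the job does not track resources.
    fn try_get_affected_resources(&self) -> Option<Vec<i32>> {
        None
    }
}

pub struct FileCopyJob {
    pub entry_ids: Vec<i32>,
}

impl DynJob for FileCopyJob {
    fn job_name(&self) -> &'static str {
        "file_copy"
    }
    fn try_get_affected_resources(&self) -> Option<Vec<i32>> {
        Some(self.entry_ids.clone())
    }
}

/// Hypothetical job that relies on the default (no tracked resources).
pub struct CleanupJob;

impl DynJob for CleanupJob {
    fn job_name(&self) -> &'static str {
        "cleanup"
    }
}

/// Names of all jobs that report touching `entry_id`; no downcasting required.
pub fn jobs_affecting(jobs: &[Box<dyn DynJob>], entry_id: i32) -> Vec<&'static str> {
    jobs.iter()
        .filter(|job| {
            job.try_get_affected_resources()
                .map_or(false, |ids| ids.contains(&entry_id))
        })
        .map(|job| job.job_name())
        .collect()
}
```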
- Removed the `MaybeResourceful` trait and integrated its functionality into the `Resourceful` trait, streamlining resource tracking for jobs.
- Updated `JobManager` to utilize the new resource tracking capabilities, improving the accuracy of job impact assessments on entries.
- Enhanced dynamic job handling by implementing the `DynJob` trait across various job types, allowing for better type management and downcasting.
- Refactored job implementations to align with the new resourceful structure, ensuring consistency and maintainability across the job system.
These changes aim to improve the efficiency and clarity of job management within the system, facilitating better resource tracking and dynamic job operations.
- Introduced the `Resourceful` trait to allow jobs to report affected resources, enabling better tracking of job impacts on entries.
- Implemented methods in `JobManager` to find jobs affecting specific entry IDs, improving resource management capabilities.
- Updated `EntryStateService` to utilize the new resource tracking features, ensuring accurate state reporting for entries based on active jobs.
- Enhanced job serialization and deserialization processes to support dynamic job handling, allowing for more flexible job management.
These changes aim to improve the efficiency and accuracy of job processing within the system, facilitating better resource management and state tracking.
- Introduced `collection` and `collection_entry` entities to manage user-defined collections and their associated entries.
- Implemented a migration script to create the necessary database tables for collections and collection entries, ensuring proper relationships and constraints.
- Updated the `mod.rs` files to include new entities for easier access throughout the application.
- Enhanced the whitepaper to reflect the new user-managed collections feature, detailing its architecture and benefits.
These changes aim to improve the organizational capabilities of Spacedrive, allowing users to create and manage collections of files effectively.
- Introduced a new `entry_closure` table to efficiently manage hierarchical relationships between entries, enhancing query performance.
- Removed the `location_id` field from the `entries` table, establishing a clearer relationship through the closure table.
- Updated various modules, including `EntryProcessor` and `DatabasePersistence`, to utilize the closure table for managing entry relationships and indexing.
- Enhanced the `create_location` and `indexing` functions to support the new structure, ensuring data integrity and performance improvements.
- Refactored tests to validate the functionality of the closure table and its integration within the indexing process.
These changes aim to significantly improve the architecture of Spacedrive, enabling robust hierarchical data management and enhancing overall system performance.
- Updated the `LocationManager` to utilize a transaction-based approach for creating locations, ensuring atomicity and data integrity.
- Introduced a new `PathResolver` utility to handle full path resolution for entries, improving path management across various modules.
- Modified the `Location` and `Entry` entities to replace the `path` field with an `entry_id`, establishing a clearer relationship between locations and their corresponding entries.
- Enhanced the `create_location` function to insert directory paths into the new `directory_paths` table, facilitating better path tracking.
- Updated various modules to leverage the new path resolution logic, ensuring consistent path handling throughout the application.
These changes aim to improve the overall architecture of the location management system, enhancing performance and maintainability.
- Introduced a new `entry_closure` table to efficiently manage ancestor-descendant relationships, enhancing hierarchical queries.
- Added `parent_id` field to the `entries` table to establish parent-child relationships, facilitating a pure hierarchical model.
- Updated migration scripts to create the `entry_closure` table and modify existing tables accordingly.
- Enhanced `EntryProcessor` to handle entry creation and movement with closure table updates, ensuring data integrity and performance.
- Implemented `HierarchyQuery` module for efficient retrieval of children, descendants, and ancestors, improving query capabilities.
- Added comprehensive tests for closure table operations to validate functionality and performance.
These changes aim to significantly improve the architecture of Spacedrive, enabling robust hierarchical data management and enhancing overall system performance.
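A closure table stores one row per ancestor-descendant pair, including a depth-0 row linking each entry to itself. The in-memory sketch below derives those rows from `parent_id` links; the column names and the `closure_rows_for` helper are assumptions, and the real implementation operates on the database rather than a `HashMap`.

```rust
use std::collections::HashMap;

/// One row of the closure table (column names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClosureRow {
    pub ancestor_id: i32,
    pub descendant_id: i32,
    pub depth: u32,
}

/// Rows to insert when `entry_id` is created, given a child -> parent map.
pub fn closure_rows_for(entry_id: i32, parents: &HashMap<i32, i32>) -> Vec<ClosureRow> {
    // Every entry is its own ancestor at depth 0.
    let mut rows = vec![ClosureRow { ancestor_id: entry_id, descendant_id: entry_id, depth: 0 }];
    let mut current = entry_id;
    let mut depth: u32 = 0;
    // Walk up the parent chain, adding one row per ancestor.
    while let Some(&parent) = parents.get(&current) {
        depth += 1;
        if depth as usize > parents.len() {
            break; // defensive guard against a cyclic parent map
        }
        rows.push(ClosureRow { ancestor_id: parent, descendant_id: entry_id, depth });
        current = parent;
    }
    rows
}
```

With these rows in place, fetching all descendants of a directory becomes a single lookup on `ancestor_id` instead of a recursive walk.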
- Revised the Closure Table Indexing Proposal to detail the transition from a materialized path-based system to a hybrid model utilizing a Closure Table, aimed at improving hierarchical query performance and scalability.
- Updated the current implementation analysis, highlighting performance bottlenecks and introducing the proposed schema changes, including the addition of a `parent_id` field and the creation of an `entry_closure` table.
- Expanded on the closure table solution, emphasizing its efficiency in managing ancestor-descendant relationships and the impact on filesystem indexing.
- Enhanced the conclusion to reflect the significance of these architectural changes for Spacedrive's long-term performance and data integrity.
These updates aim to provide a comprehensive understanding of the proposed indexing improvements and their implications for the Spacedrive architecture.
- Replaced SHA-256 with BLAKE3 for hashing in the `ContentHashGenerator`, enhancing performance and security.
- Introduced constants for minimum file size and sampling parameters to improve clarity and maintainability.
- Updated the logic for generating full and sampled hashes based on new size thresholds, ensuring more efficient processing of small and large files.
- Enhanced documentation to reflect changes in hashing methodology and file handling strategies.
These updates aim to optimize the content identification process and improve the overall efficiency of the system.
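The full-versus-sampled decision can be sketched without the hashing itself. The constants below are hypothetical placeholders, not the values introduced in this change, and evenly spaced samples are an assumed strategy; the sketch only plans which byte ranges would be fed to the hasher.

```rust
/// Files smaller than this are hashed in full (hypothetical value).
pub const MIN_SAMPLED_SIZE: u64 = 100 * 1024;
/// Number of samples taken from a large file (hypothetical value).
pub const SAMPLE_COUNT: u64 = 4;
/// Size of each sample in bytes (hypothetical value).
pub const SAMPLE_SIZE: u64 = 10 * 1024;

/// Which bytes of a file should be hashed.
#[derive(Debug, PartialEq)]
pub enum HashPlan {
    /// Hash the entire file.
    Full,
    /// Hash only these `(offset, length)` ranges.
    Sampled(Vec<(u64, u64)>),
}

/// Chooses a full hash for small files and evenly spaced samples for large ones.
pub fn plan_hash(file_size: u64) -> HashPlan {
    if file_size < MIN_SAMPLED_SIZE {
        return HashPlan::Full;
    }
    // Space the samples so the first starts at 0 and the last ends at or before EOF.
    let stride = (file_size - SAMPLE_SIZE) / (SAMPLE_COUNT - 1);
    let ranges = (0..SAMPLE_COUNT).map(|i| (i * stride, SAMPLE_SIZE)).collect();
    HashPlan::Sampled(ranges)
}
```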