mirror of
https://github.com/spacedriveapp/spacedrive.git
synced 2026-04-24 16:41:07 -04:00
Enhance indexing system with new scopes and persistence modes
This update introduces a refined indexing system that supports multiple indexing scopes (current and recursive) and persistence options (persistent and ephemeral). The CLI commands have been updated to reflect these changes, allowing for quick scans and external path browsing without database writes. Documentation has been revised to include detailed descriptions of the new features, use cases, and command comparisons, ensuring clarity for users. Additionally, the codebase has been refactored to accommodate these enhancements, improving overall functionality and performance.
@@ -115,20 +115,43 @@ spacedrive location rescan <location-id> --force # Full rescan, ignore change d

**Note**: Location IDs are UUIDs displayed in the list command. All location operations work with the daemon automatically.

-### Indexing
+### Enhanced Indexing

The new indexing system supports different scopes and persistence modes:

```bash
# Start indexing with default settings (content mode)
spacedrive index ~/Desktop

# Quick scan of current directory only (no subdirectories)
spacedrive index quick-scan /path/to/directory --scope current

# Index with a specific mode
spacedrive index ~/Videos --mode shallow # Metadata only
spacedrive index ~/Photos --mode deep    # Full analysis

# Quick scan with ephemeral mode (no database writes)
spacedrive index quick-scan /path/to/directory --scope current --ephemeral

# Index and watch progress
spacedrive index ~/Documents --watch

# Browse external paths without adding to managed locations
spacedrive index browse /media/external-drive --scope current
spacedrive index browse /network/drive --scope recursive --content

# Index managed locations with a specific scope
spacedrive index location /managed/location --scope current --mode shallow
spacedrive index location <location-uuid> --scope recursive --mode deep

# Legacy full location indexing (backward compatibility)
spacedrive scan /path/to/directory --mode content --watch
```

**Index Scopes:**

- `current`: Index only the specified directory (single level)
- `recursive`: Index the directory and all subdirectories

**Index Modes:**

- `shallow`: Metadata only (fastest)
- `content`: Metadata + content hashing (moderate)
- `deep`: Full analysis including media metadata (slowest)

**Use Cases:**

- **UI Navigation**: `quick-scan --scope current` for instant directory viewing
- **External Browsing**: `index browse` for exploring non-managed paths (always ephemeral, no database writes)
- **Location Updates**: `location --scope current` to refresh specific directories

### Job Management

```bash
@@ -2,7 +2,7 @@

## Overview

-The Spacedrive indexing system is a sophisticated, multi-phase file indexing engine designed for high performance and reliability. It discovers, processes, and categorizes files while supporting incremental updates, change detection, and content-based deduplication.
+The Spacedrive indexing system is a sophisticated, multi-phase file indexing engine designed for high performance and reliability. It discovers, processes, and categorizes files while supporting incremental updates, change detection, and content-based deduplication. The system now supports multiple indexing scopes and ephemeral modes for different use cases.

## Architecture

@@ -22,6 +22,9 @@ The Spacedrive indexing system is a sophisticated, multi-phase file indexing eng

- **Content Deduplication**: Uses CAS (Content-Addressed Storage) IDs
- **Type Detection**: Sophisticated file type identification with MIME type support
- **Performance Optimized**: Batch processing, caching, and parallel operations
- **Flexible Scoping**: Current (single-level) vs Recursive (full tree) indexing
- **Ephemeral Mode**: In-memory indexing for browsing external paths
- **Persistence Options**: Database storage vs memory-only for different use cases

## Indexing Phases

@@ -76,6 +79,126 @@ Generates content identifiers and detects file types.

**Output**: Content identities linked to entries

## Indexing Scopes and Persistence

### Index Scopes

The indexing system supports two scopes for different use cases:

#### Current Scope

- **Description**: Index only the specified directory (single level)
- **Use Cases**: UI navigation, quick directory browsing, instant feedback
- **Performance**: <500ms for typical directories
- **Implementation**: Direct directory read without recursion

```rust
let config = IndexerJobConfig::ui_navigation(location_id, path);
// Results in a single-level scan optimized for UI responsiveness
```

#### Recursive Scope

- **Description**: Index the directory and all subdirectories
- **Use Cases**: Full location indexing, comprehensive file discovery
- **Performance**: Depends on directory tree size
- **Implementation**: Traditional recursive tree traversal

```rust
let config = IndexerJobConfig::new(location_id, path, mode);
// Default recursive behavior for complete coverage
```

### Persistence Modes

#### Persistent Mode

- **Storage**: Database (SQLite/PostgreSQL)
- **Use Cases**: Managed locations, permanent indexing
- **Features**: Full change detection, resumability, sync support
- **Lifecycle**: Permanent until explicitly removed

```rust
let mut config = IndexerJobConfig::new(location_id, path, mode);
config.persistence = IndexPersistence::Persistent;
```

#### Ephemeral Mode

- **Storage**: Memory (`EphemeralIndex`)
- **Use Cases**: External path browsing, temporary exploration
- **Features**: No database writes, session-based caching
- **Lifecycle**: Exists only during the application session

```rust
let config = IndexerJobConfig::ephemeral_browse(path, scope);
// Results stored in memory, automatic cleanup
```

### Enhanced Configuration

The new `IndexerJobConfig` provides fine-grained control:

```rust
pub struct IndexerJobConfig {
    pub location_id: Option<Uuid>,     // None for ephemeral
    pub path: SdPath,                  // Path to index
    pub mode: IndexMode,               // Shallow/Content/Deep
    pub scope: IndexScope,             // Current/Recursive
    pub persistence: IndexPersistence, // Persistent/Ephemeral
    pub max_depth: Option<u32>,        // Depth limiting
}
```

### Use Case Examples

#### UI Directory Navigation

```rust
// Fast current-directory scan for the UI
let config = IndexerJobConfig::ui_navigation(location_id, path);
// - Scope: Current (single level)
// - Mode: Shallow (metadata only)
// - Persistence: Persistent
// - Target: <500ms response time
```

#### External Path Browsing

```rust
// Browse a USB drive without adding it to the library
let config = IndexerJobConfig::ephemeral_browse(usb_path, IndexScope::Current);
// - Scope: Current or Recursive
// - Mode: Shallow (configurable)
// - Persistence: Ephemeral
// - Target: exploration without database pollution
```

#### Background Location Indexing

```rust
// Traditional full location scan
let config = IndexerJobConfig::new(location_id, path, IndexMode::Deep);
// - Scope: Recursive (default)
// - Mode: Deep (full analysis)
// - Persistence: Persistent
// - Target: complete coverage
```

### Ephemeral Index Structure

The `EphemeralIndex` provides temporary storage:

```rust
pub struct EphemeralIndex {
    pub entries: HashMap<PathBuf, EntryMetadata>,
    pub content_identities: HashMap<String, EphemeralContentIdentity>,
    pub created_at: Instant,
    pub last_accessed: Instant,
    pub root_path: PathBuf,
    pub stats: IndexerStats,
}
```

Features:

- **LRU Behavior**: Automatic cleanup based on access time
- **Memory Efficient**: Lightweight metadata storage
- **Session Scoped**: Cleared on application restart
- **Fast Access**: Direct HashMap lookups
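The LRU behavior above can be sketched as an idle-time sweep over cached indexes. This is an illustrative sketch, not Spacedrive's implementation: the `Session` type and its `sweep` policy are hypothetical stand-ins for however the core tracks `last_accessed` across ephemeral indexes.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Minimal stand-in for the ephemeral store: one timestamp per browsed root.
struct Session {
    indexes: HashMap<String, Instant>, // root path -> last_accessed
    max_idle_time: Duration,
}

impl Session {
    // Drop every index whose idle time meets or exceeds the configured
    // maximum, mirroring "automatic cleanup based on access time".
    // Returns how many indexes were evicted.
    fn sweep(&mut self, now: Instant) -> usize {
        let before = self.indexes.len();
        let max_idle = self.max_idle_time; // hoisted to avoid borrowing self in the closure
        self.indexes
            .retain(|_, last| now.duration_since(*last) < max_idle);
        before - self.indexes.len()
    }
}
```

In the real system the sweep would presumably run on the `cleanup_interval` timer rather than being called by hand.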
## Database Schema

### Core Tables

@@ -202,49 +325,85 @@ Change types detected:

- Reduces database round trips
- Improves memory efficiency

### Scope Optimizations

- **Current Scope**: Direct directory read without recursion (<500ms target)
- **Recursive Scope**: Efficient tree traversal with depth control
- **Ephemeral Mode**: Memory-only storage for external path browsing
- **Early Termination**: Configurable max_depth limiting
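The `max_depth` early termination amounts to a depth bound on traversal. A minimal synchronous sketch (the real indexer is async and batched; `walk` here is purely illustrative, with a depth of 1 corresponding to Current scope):

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Collect paths at most `max_depth` levels below `root`.
// Some(1) = the root's direct children only (Current scope);
// None = unlimited recursion (Recursive scope).
fn walk(root: &Path, max_depth: Option<u32>) -> std::io::Result<Vec<PathBuf>> {
    fn inner(dir: &Path, depth: u32, max: Option<u32>, out: &mut Vec<PathBuf>) -> std::io::Result<()> {
        if let Some(m) = max {
            if depth > m {
                return Ok(()); // early termination: do not descend further
            }
        }
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            out.push(path.clone());
            if path.is_dir() {
                inner(&path, depth + 1, max, out)?;
            }
        }
        Ok(())
    }
    let mut out = Vec::new();
    inner(root, 1, max_depth, &mut out)?;
    Ok(out)
}
```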
### Caching

- Path prefix cache
-- Parent entry ID cache
-- Change detection cache
+- Entry ID cache for parent lookups
+- Change detection cache for inode/timestamp comparisons
+- Ephemeral index LRU cache for session-based storage
+- Content identity cache for deduplication

### Parallelization

- Concurrent CAS ID generation
- Parallel file type detection
- Async I/O operations
- Batch processing across multiple threads
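Concurrent CAS ID generation can be sketched with plain threads. Note this is a stand-in: the hash below uses `DefaultHasher` rather than Spacedrive's actual content-addressing algorithm, and the real pipeline uses async I/O and bounded concurrency rather than one thread per buffer.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

// Stand-in content hash; the real indexer derives CAS IDs from file contents.
fn cas_id(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Hash a batch of buffers on worker threads, preserving input order so
// results can be zipped back to their source entries.
fn cas_ids_parallel(batches: Vec<Vec<u8>>) -> Vec<u64> {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|buf| thread::spawn(move || cas_id(&buf)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Identical content yields identical IDs, which is what makes CAS-based deduplication possible.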
### Database Optimizations

-- Bulk inserts
-- Prepared statements
-- Strategic indexing
+- Bulk inserts with transaction batching
+- Prepared statements for repeated operations
+- Strategic indexing on location_id and relative_path
+- Persistence abstraction for database vs memory storage

## Usage Examples

-### Starting an Indexing Job
+### Enhanced Indexing Jobs

```rust
-use sd_core_new::operations::indexing::{IndexerJob, IndexMode};
+use sd_core_new::operations::indexing::{
+    IndexerJob, IndexerJobConfig, IndexMode, IndexScope, IndexPersistence
+};

-// Create indexer job
-let job = IndexerJob::new(
-    location_id,
-    location_path,
-    IndexMode::Deep, // Shallow, Content, or Deep
-);
-
-// Dispatch through job manager
+// UI Navigation - Fast current directory scan
+let config = IndexerJobConfig::ui_navigation(location_id, path);
+let job = IndexerJob::new(config);
let handle = library.jobs().dispatch(job).await?;

-// Monitor progress
-while let Some(progress) = handle.progress().await {
-    println!("Progress: {:?}", progress);
-}
+// Ephemeral Browsing - External path exploration
+let config = IndexerJobConfig::ephemeral_browse(external_path, IndexScope::Current);
+let job = IndexerJob::new(config);
+let handle = library.jobs().dispatch(job).await?;
+
+// Traditional Location Indexing - Full recursive scan
+let config = IndexerJobConfig::new(location_id, path, IndexMode::Deep);
+let job = IndexerJob::new(config);
+let handle = library.jobs().dispatch(job).await?;
+
+// Custom Configuration - Fine-grained control
+let mut config = IndexerJobConfig::new(location_id, path, IndexMode::Content);
+config.scope = IndexScope::Current;
+config.max_depth = Some(2);
+let job = IndexerJob::new(config);
```

### Legacy API (Backward Compatibility)

```rust
// Old API still works for simple cases
let job = IndexerJob::from_location(location_id, path, IndexMode::Deep);
let job = IndexerJob::shallow(location_id, path);
let job = IndexerJob::with_content(location_id, path);
```

### Indexing Modes

-- **Shallow**: Metadata only (fastest)
-- **Content**: Includes CAS ID generation
-- **Deep**: Full analysis including thumbnails (future)
+- **Shallow**: Metadata only (fastest, <500ms for UI)
+- **Content**: Includes CAS ID generation (moderate performance)
+- **Deep**: Full analysis including thumbnails (comprehensive)

### Indexing Scopes

- **Current**: Single directory level (UI navigation, quick browsing)
- **Recursive**: Full directory tree (complete location indexing)

### Persistence Options

- **Persistent**: Database storage (managed locations, permanent data)
- **Ephemeral**: Memory storage (external browsing, temporary exploration)
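The persistence split boils down to a branch at write time. A toy sketch of that decision (the `route` function and Vec sinks are hypothetical; the real code writes either to the database or to the in-memory `EphemeralIndex`):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexPersistence {
    Persistent,
    Ephemeral,
}

// Send an indexed entry to the database sink or the in-memory sink,
// depending on the job's persistence setting.
fn route(
    persistence: IndexPersistence,
    db: &mut Vec<String>,
    mem: &mut Vec<String>,
    entry: String,
) {
    match persistence {
        IndexPersistence::Persistent => db.push(entry),
        IndexPersistence::Ephemeral => mem.push(entry),
    }
}
```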
## Metrics and Monitoring

@@ -305,6 +464,24 @@ IndexerConfig {
    max_concurrent_io: usize, // Default: 100
    enable_content_id: bool,  // Default: true
}

// Enhanced configuration with scope and persistence
IndexerJobConfig {
    location_id: Option<Uuid>,     // None for ephemeral jobs
    path: SdPath,                  // Target path
    mode: IndexMode,               // Shallow/Content/Deep
    scope: IndexScope,             // Current/Recursive
    persistence: IndexPersistence, // Persistent/Ephemeral
    max_depth: Option<u32>,        // Depth limiting for performance
}

// Ephemeral index settings
EphemeralConfig {
    max_entries: usize,            // Default: 10000
    cleanup_interval: Duration,    // Default: 5 minutes
    max_idle_time: Duration,       // Default: 30 minutes
    enable_content_analysis: bool, // Default: false
}
```
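Given the `EphemeralConfig` knobs above, the cleanup decision can be expressed as a simple predicate. This is a sketch of the implied policy, not the actual implementation: only two of the fields are mirrored here, and combining them with a logical OR is an assumption.

```rust
use std::time::Duration;

// Mirrors two of the EphemeralConfig knobs shown above.
struct EphemeralConfig {
    max_entries: usize,       // Default: 10000
    max_idle_time: Duration,  // Default: 30 minutes
}

// Evict an ephemeral index when it has sat idle too long, or when the
// session holds more entries than max_entries allows (assumed policy).
fn should_evict(cfg: &EphemeralConfig, idle: Duration, total_entries: usize) -> bool {
    idle >= cfg.max_idle_time || total_entries > cfg.max_entries
}
```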
## Integration Points

@@ -342,27 +519,71 @@ The indexer state is serialized using MessagePack for efficient storage and quic

## CLI Usage

-The indexing system is easily accessible through the CLI:
+The indexing system provides comprehensive CLI access with enhanced scope and persistence options:

### Enhanced Index Commands

```bash
# Start the daemon first
spacedrive start

# Quick scan for UI navigation (fast, current directory only)
spacedrive index quick-scan ~/Documents --scope current

# Quick scan with ephemeral mode (no database writes)
spacedrive index quick-scan /external/drive --scope current --ephemeral

# Browse external paths without adding to managed locations
spacedrive index browse /media/usb-drive --scope current
spacedrive index browse /network/share --scope recursive --content

# Index managed locations with specific scope and mode
spacedrive index location ~/Pictures --scope current --mode shallow
spacedrive index location <location-uuid> --scope recursive --mode deep
```

### Location Management

```bash
# Add locations with different indexing modes
spacedrive location add ~/Documents --mode shallow # Fast metadata only
spacedrive location add ~/Pictures --mode content  # With content hashing
spacedrive location add ~/Videos --mode deep       # Full media analysis

# Monitor indexing progress in real-time
spacedrive job monitor

# Check job status
spacedrive job list --status running

# Force re-indexing of a location
spacedrive location rescan <location-id> --force
```

### Legacy Commands (Backward Compatibility)

```bash
# Traditional indexing (creates location and starts full scan)
spacedrive scan ~/Desktop --mode content --watch
```

### Monitoring and Status

```bash
# Monitor indexing progress in real-time
spacedrive job monitor

# Check job status with scope/persistence info
spacedrive job list --status running

# Get detailed job information
spacedrive job info <job-id>
```

### Command Comparison

| Command | Scope | Persistence | Use Case |
|---------|-------|-------------|----------|
| `index quick-scan` | Current/Recursive | Persistent/Ephemeral | UI navigation, quick browsing |
| `index browse` | Current/Recursive | Ephemeral | External path exploration |
| `index location` | Current/Recursive | Persistent | Managed location updates |
| `scan` (legacy) | Recursive | Persistent | Traditional full indexing |
| `location add` | Recursive | Persistent | Add new managed locations |

For complete CLI documentation, see [CLI Documentation](./cli.md).

## Debugging

@@ -25,6 +25,14 @@ pub enum IndexMode {
    Deep,
}

#[derive(Clone, Debug, ValueEnum)]
pub enum IndexScope {
    /// Index only the current directory (single level)
    Current,
    /// Index recursively through all subdirectories
    Recursive,
}

impl From<IndexMode> for crate::location::IndexMode {
    fn from(mode: IndexMode) -> Self {
        match mode {

@@ -35,6 +43,54 @@ impl From<IndexMode> for crate::location::IndexMode {
    }
}

impl From<IndexScope> for crate::operations::indexing::IndexScope {
    fn from(scope: IndexScope) -> Self {
        match scope {
            IndexScope::Current => crate::operations::indexing::IndexScope::Current,
            IndexScope::Recursive => crate::operations::indexing::IndexScope::Recursive,
        }
    }
}

#[derive(Subcommand, Clone)]
pub enum IndexCommands {
    /// Quick scan of a directory (metadata only, current scope)
    QuickScan {
        /// Path to scan
        path: PathBuf,
        /// Scope: current or recursive
        #[arg(short, long, value_enum, default_value = "current")]
        scope: IndexScope,
        /// Run ephemerally (no database writes)
        #[arg(short, long)]
        ephemeral: bool,
    },

    /// Browse external paths without adding to managed locations
    Browse {
        /// Path to browse
        path: PathBuf,
        /// Scope: current or recursive
        #[arg(short, long, value_enum, default_value = "current")]
        scope: IndexScope,
        /// Enable content analysis
        #[arg(short, long)]
        content: bool,
    },

    /// Traditional full location indexing
    Location {
        /// Location ID or path
        identifier: String,
        /// Indexing mode
        #[arg(short, long, value_enum, default_value = "content")]
        mode: IndexMode,
        /// Scope: current or recursive
        #[arg(short, long, value_enum, default_value = "recursive")]
        scope: IndexScope,
    },
}

#[derive(Subcommand, Clone)]
pub enum LibraryCommands {
    /// Create a new library

@@ -561,7 +617,7 @@ pub async fn handle_job_command(
    Ok(())
}

-pub async fn handle_index_command(
+pub async fn handle_legacy_scan_command(
    path: PathBuf,
    mode: IndexMode,
    watch: bool,
@@ -662,6 +718,136 @@ pub async fn handle_status_command(
    Ok(())
}

pub async fn handle_index_command(
    cmd: IndexCommands,
    core: &Core,
    state: &CliState,
) -> Result<(), Box<dyn std::error::Error>> {
    use crate::{
        operations::indexing::{IndexerJob, IndexerJobConfig, IndexMode as JobIndexMode, IndexScope as JobIndexScope, IndexPersistence},
        shared::types::SdPath,
    };

    match cmd {
        IndexCommands::QuickScan { path, scope, ephemeral } => {
            if !path.exists() {
                return Err(format!("Path does not exist: {}", path.display()).into());
            }

            println!("🔍 {} scan of {}",
                if ephemeral { "Ephemeral quick" } else { "Quick" },
                path.display().to_string().bright_cyan());
            println!(" Scope: {}", format!("{:?}", scope).bright_yellow());

            // Create SdPath - for demo we'll use a nil device UUID
            let device = core.device.to_device()?;
            let sd_path = SdPath::new(device.id, path);

            let job = if ephemeral {
                IndexerJob::ephemeral_browse(sd_path, scope.into())
            } else {
                // Need a library for persistent jobs
                let library = get_current_library(core, state).await?;
                // For quick scan, we'll create a UI navigation job
                IndexerJob::ui_navigation(library.id(), sd_path)
            };

            // Dispatch the job
            let library = if ephemeral {
                // Use a temporary library for ephemeral jobs - in practice this should be handled differently
                get_current_library(core, state).await?
            } else {
                get_current_library(core, state).await?
            };

            let handle = library.jobs().dispatch(job).await?;

            println!("✅ Quick scan job started!");
            println!(" Job ID: {}", handle.id().to_string().bright_yellow());

            if ephemeral {
                println!(" Mode: Ephemeral (no database writes)");
            }

            println!("\n💡 Monitor progress with: {}", "spacedrive job monitor".bright_cyan());
        }

        IndexCommands::Browse { path, scope, content } => {
            if !path.exists() {
                return Err(format!("Path does not exist: {}", path.display()).into());
            }

            println!("🌐 Browsing {} (scope: {:?})",
                path.display().to_string().bright_cyan(),
                scope);

            let device = core.device.to_device()?;
            let sd_path = SdPath::new(device.id, path);

            // Create ephemeral job with appropriate mode
            let mut config = IndexerJobConfig::ephemeral_browse(sd_path, scope.into());
            if content {
                config.mode = JobIndexMode::Content;
                println!(" Content analysis: {}", "Enabled".bright_green());
            }

            let job = IndexerJob::new(config);

            // For browsing, we still need a library context but results won't be persisted
            let library = get_current_library(core, state).await?;
            let handle = library.jobs().dispatch(job).await?;

            println!("✅ Browse job started!");
            println!(" Job ID: {}", handle.id().to_string().bright_yellow());
            println!(" Mode: Ephemeral browsing");

            println!("\n💡 Monitor progress with: {}", "spacedrive job monitor".bright_cyan());
        }

        IndexCommands::Location { identifier, mode, scope } => {
            let library = get_current_library(core, state).await?;

            // Find location by ID or path
            let locations = entities::location::Entity::find()
                .all(library.db().conn())
                .await?;

            let location = locations.into_iter().find(|loc| {
                // Try to match by UUID first
                if let Ok(uuid) = identifier.parse::<Uuid>() {
                    loc.uuid == uuid
                } else {
                    // Match by path
                    loc.path == identifier
                }
            }).ok_or_else(|| format!("Location not found: {}", identifier))?;

            println!("📂 Indexing location: {}", location.name.as_deref().unwrap_or("Unnamed").bright_cyan());
            println!(" Path: {}", location.path.bright_blue());
            println!(" Mode: {:?}", mode);
            println!(" Scope: {:?}", scope);

            let device = core.device.to_device()?;
            let sd_path = SdPath::new(device.id, PathBuf::from(&location.path));

            // Create appropriate job configuration
            let mut config = IndexerJobConfig::new(location.uuid, sd_path, mode.into().into());
            config.scope = scope.into();

            let job = IndexerJob::new(config);
            let handle = library.jobs().dispatch(job).await?;

            println!("✅ Location indexing job started!");
            println!(" Job ID: {}", handle.id().to_string().bright_yellow());
            println!(" Location: {}", location.uuid.to_string().bright_yellow());

            println!("\n💡 Monitor progress with: {}", "spacedrive job monitor".bright_cyan());
        }
    }

    Ok(())
}

async fn get_current_library(
    core: &Core,
    state: &CliState,

@@ -39,8 +39,12 @@ pub enum Commands {
    #[command(subcommand)]
    Job(commands::JobCommands),

-    /// Start an indexing job
-    Index {
+    /// Enhanced indexing with scope and persistence options
+    #[command(subcommand)]
+    Index(commands::IndexCommands),
+
+    /// Start a traditional indexing job (legacy)
+    Scan {
        /// Path to index
        path: PathBuf,

@@ -139,8 +143,9 @@ pub async fn run() -> Result<(), Box<dyn std::error::Error>> {
        Commands::Library(cmd) => commands::handle_library_command(cmd, &core, &mut state).await?,
        Commands::Location(cmd) => commands::handle_location_command(cmd, &core, &mut state).await?,
        Commands::Job(cmd) => commands::handle_job_command(cmd, &core, &mut state).await?,
-        Commands::Index { path, mode, watch } => {
-            commands::handle_index_command(path, mode, watch, &core, &mut state).await?
+        Commands::Index(cmd) => commands::handle_index_command(cmd, &core, &mut state).await?,
+        Commands::Scan { path, mode, watch } => {
+            commands::handle_legacy_scan_command(path, mode, watch, &core, &mut state).await?
        }
        Commands::Monitor => monitor::run_monitor(&core).await?,
        Commands::Status => commands::handle_status_command(&core, &state).await?,

@@ -5,11 +5,13 @@ use crate::{
    shared::types::SdPath,
};
use serde::{Deserialize, Serialize};
-use std::time::Duration;
+use std::{time::Duration, collections::HashMap, sync::Arc, path::PathBuf};
use uuid::Uuid;
+use tokio::sync::RwLock;

use super::{
    state::{IndexerState, IndexerProgress, IndexerStats, IndexError, Phase},
    entry::EntryMetadata,
    metrics::{IndexerMetrics, PhaseTimer},
    phases,
};
@@ -25,17 +27,184 @@ pub enum IndexMode {
|
||||
Deep,
|
||||
}
|
||||
|
||||
/// Indexing scope determines how much of the directory tree to process
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub enum IndexScope {
|
||||
/// Index only the current directory (single level)
|
||||
Current,
|
||||
/// Index recursively through all subdirectories
|
||||
Recursive,
|
||||
}
|
||||
|
||||
impl Default for IndexScope {
|
||||
fn default() -> Self {
|
||||
IndexScope::Recursive
|
||||
}
|
||||
}
|
||||
|
||||
impl From<&str> for IndexScope {
|
||||
fn from(s: &str) -> Self {
|
||||
match s.to_lowercase().as_str() {
|
||||
"current" => IndexScope::Current,
|
||||
"recursive" => IndexScope::Recursive,
|
||||
_ => IndexScope::Recursive,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl std::fmt::Display for IndexScope {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
match self {
|
||||
IndexScope::Current => write!(f, "current"),
|
||||
IndexScope::Recursive => write!(f, "recursive"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Determines whether indexing results are persisted to database or kept in memory
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub enum IndexPersistence {
|
||||
/// Write all results to database (normal operation)
|
||||
Persistent,
|
||||
/// Keep results in memory only (for unmanaged paths)
|
||||
Ephemeral,
|
||||
}
|
||||
|
||||
impl Default for IndexPersistence {
|
||||
fn default() -> Self {
|
||||
IndexPersistence::Persistent
|
||||
}
|
||||
}
|
||||
|
||||
/// Enhanced configuration for indexer jobs
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct IndexerJobConfig {
|
||||
pub location_id: Option<Uuid>, // None for ephemeral indexing
|
||||
pub path: SdPath,
|
||||
pub mode: IndexMode,
|
||||
pub scope: IndexScope,
|
||||
pub persistence: IndexPersistence,
|
||||
pub max_depth: Option<u32>, // Override for Current scope or depth limiting
|
||||
}
|
||||
|
||||
impl IndexerJobConfig {
|
||||
/// Create a new configuration for persistent recursive indexing (traditional)
|
||||
pub fn new(location_id: Uuid, path: SdPath, mode: IndexMode) -> Self {
|
||||
Self {
|
||||
location_id: Some(location_id),
|
||||
path,
|
||||
mode,
|
||||
scope: IndexScope::Recursive,
|
||||
persistence: IndexPersistence::Persistent,
|
||||
max_depth: None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Create configuration for UI directory navigation (quick current scan)
|
||||
pub fn ui_navigation(location_id: Uuid, path: SdPath) -> Self {
|
||||
Self {
|
||||
location_id: Some(location_id),
|
||||
path,
|
||||
mode: IndexMode::Shallow,
|
||||
scope: IndexScope::Current,
|
||||
persistence: IndexPersistence::Persistent,
|
||||
max_depth: Some(1),
|
||||
}
|
||||
}
|
||||
|
||||
/// Create configuration for ephemeral path browsing (outside managed locations)
|
||||
pub fn ephemeral_browse(path: SdPath, scope: IndexScope) -> Self {
|
||||
Self {
|
||||
location_id: None,
|
||||
path,
|
||||
mode: IndexMode::Shallow,
|
||||
scope,
|
||||
persistence: IndexPersistence::Ephemeral,
|
||||
max_depth: if scope == IndexScope::Current { Some(1) } else { None },
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if this is an ephemeral (non-persistent) job
|
||||
pub fn is_ephemeral(&self) -> bool {
|
||||
self.persistence == IndexPersistence::Ephemeral
|
||||
}
|
||||
|
||||
/// Check if this is a current scope (single level) job
|
||||
pub fn is_current_scope(&self) -> bool {
|
||||
self.scope == IndexScope::Current
|
||||
}
|
||||
}
|
||||
|
||||
/// In-memory storage for ephemeral indexing results
|
||||
#[derive(Debug)]
|
||||
pub struct EphemeralIndex {
|
||||
pub entries: HashMap<PathBuf, EntryMetadata>,
|
||||
pub content_identities: HashMap<String, EphemeralContentIdentity>,
|
||||
pub created_at: std::time::Instant,
|
||||
pub last_accessed: std::time::Instant,
|
||||
pub root_path: PathBuf,
|
||||
pub stats: IndexerStats,
|
||||
}
|
||||
|
||||
/// Simplified content identity for ephemeral storage
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct EphemeralContentIdentity {
|
||||
pub cas_id: String,
|
||||
pub mime_type: Option<String>,
|
||||
pub file_size: u64,
|
||||
pub entry_count: u32,
|
||||
}
|
||||
|
||||
impl EphemeralIndex {
|
||||
pub fn new(root_path: PathBuf) -> Self {
|
||||
let now = std::time::Instant::now();
|
||||
Self {
|
||||
entries: HashMap::new(),
|
||||
content_identities: HashMap::new(),
|
||||
created_at: now,
|
||||
last_accessed: now,
|
||||
root_path,
|
||||
stats: IndexerStats::default(),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn add_entry(&mut self, path: PathBuf, metadata: EntryMetadata) {
|
||||
self.entries.insert(path, metadata);
|
||||
self.last_accessed = std::time::Instant::now();
|
||||
}
|
||||
|
||||
pub fn get_entry(&mut self, path: &PathBuf) -> Option<&EntryMetadata> {
|
||||
self.last_accessed = std::time::Instant::now();
|
||||
self.entries.get(path)
|
||||
}
|
||||
|
||||
pub fn add_content_identity(&mut self, cas_id: String, content: EphemeralContentIdentity) {
|
||||
self.content_identities.insert(cas_id, content);
|
||||
self.last_accessed = std::time::Instant::now();
|
||||
}
|
||||
|
||||
pub fn age(&self) -> Duration {
|
||||
self.created_at.elapsed()
|
||||
}
|
||||
|
||||
pub fn idle_time(&self) -> Duration {
|
||||
self.last_accessed.elapsed()
|
||||
}
|
||||
}

/// Indexer job - discovers and indexes files in a location
#[derive(Debug, Serialize, Deserialize)]
pub struct IndexerJob {
	pub location_id: Uuid,
	pub root_path: SdPath,
	pub mode: IndexMode,
	pub config: IndexerJobConfig,

	// Resumable state
	#[serde(skip_serializing_if = "Option::is_none")]
	state: Option<IndexerState>,

	// Ephemeral storage for non-persistent jobs
	#[serde(skip)]
	ephemeral_index: Option<Arc<RwLock<EphemeralIndex>>>,

	// Performance tracking
	#[serde(skip)]
	timer: Option<PhaseTimer>,
@@ -63,6 +232,14 @@ impl JobHandler for IndexerJob {
			self.timer = Some(PhaseTimer::new());
		}

		// Initialize ephemeral index if needed
		if self.config.is_ephemeral() && self.ephemeral_index.is_none() {
			let root_path = self.config.path.as_local_path()
				.ok_or_else(|| JobError::execution("Path not accessible locally".to_string()))?;
			self.ephemeral_index = Some(Arc::new(RwLock::new(EphemeralIndex::new(root_path.to_path_buf()))));
			ctx.log("Initialized ephemeral index for non-persistent job");
		}

		// Initialize or restore state
		let state = match &mut self.state {
			Some(state) => {
@@ -70,23 +247,30 @@ impl JobHandler for IndexerJob {
				state
			}
			None => {
				ctx.log("Starting new indexer job");
				self.state = Some(IndexerState::new(&self.root_path));
				ctx.log(format!("Starting new indexer job (scope: {}, persistence: {:?})",
					self.config.scope, self.config.persistence));
				self.state = Some(IndexerState::new(&self.config.path));
				self.state.as_mut().unwrap()
			}
		};

		// Get local path for operations
		let root_path = self.root_path.as_local_path()
		let root_path = self.config.path.as_local_path()
			.ok_or_else(|| JobError::execution("Location root path is not local".to_string()))?;

		// Main state machine loop
		loop {
			ctx.check_interrupt().await?;

			match state.phase.clone() {
			let current_phase = state.phase.clone();
			match current_phase {
				Phase::Discovery => {
					phases::run_discovery_phase(state, &ctx, root_path).await?;
					// Use scope-aware discovery
					if self.config.is_current_scope() {
						Self::run_current_scope_discovery_static(state, &ctx, root_path).await?;
					} else {
						phases::run_discovery_phase(state, &ctx, root_path).await?;
					}

					// Track batch info
					self.batch_info.0 = state.entry_batches.len() as u64;
@@ -99,24 +283,37 @@ impl JobHandler for IndexerJob {
				}

				Phase::Processing => {
					phases::run_processing_phase(
						self.location_id,
						state,
						&ctx,
						self.mode,
						root_path,
					).await?;

					// Update DB operation counts
					self.db_operations.1 += state.entry_batches.len() as u64 * 100; // Estimate
					if self.config.is_ephemeral() {
						let ephemeral_index = self.ephemeral_index.clone()
							.ok_or_else(|| JobError::execution("Ephemeral index not initialized".to_string()))?;
						Self::run_ephemeral_processing_static(state, &ctx, ephemeral_index).await?;
					} else {
						phases::run_processing_phase(
							self.config.location_id.expect("Location ID required for persistent jobs"),
							state,
							&ctx,
							self.config.mode,
							root_path,
						).await?;

						// Update DB operation counts
						self.db_operations.1 += state.entry_batches.len() as u64 * 100; // Estimate
					}
				}

				Phase::Aggregation => {
					phases::run_aggregation_phase(
						self.location_id,
						state,
						&ctx,
					).await?;
					if !self.config.is_ephemeral() {
						phases::run_aggregation_phase(
							self.config.location_id.expect("Location ID required for persistent jobs"),
							state,
							&ctx,
						).await?;
					} else {
						// Skip aggregation for ephemeral jobs
						ctx.log("Skipping aggregation phase for ephemeral job");
						state.phase = Phase::ContentIdentification;
						continue;
					}

					// Start content timer
					if let Some(timer) = &mut self.timer {
@@ -125,9 +322,15 @@ impl JobHandler for IndexerJob {
				}

				Phase::ContentIdentification => {
					if self.mode >= IndexMode::Content {
						phases::run_content_phase(state, &ctx).await?;
						self.db_operations.1 += state.entries_for_content.len() as u64;
					if self.config.mode >= IndexMode::Content {
						if self.config.is_ephemeral() {
							let ephemeral_index = self.ephemeral_index.clone()
								.ok_or_else(|| JobError::execution("Ephemeral index not initialized".to_string()))?;
							Self::run_ephemeral_content_phase_static(state, &ctx, ephemeral_index).await?;
						} else {
							phases::run_content_phase(state, &ctx).await?;
							self.db_operations.1 += state.entries_for_content.len() as u64;
						}
					} else {
						ctx.log("Skipping content identification phase (mode=Shallow)");
						state.phase = Phase::Complete;
@@ -137,8 +340,10 @@ impl JobHandler for IndexerJob {
				Phase::Complete => break,
			}

			// Checkpoint after each phase
			ctx.checkpoint().await?;
			// Checkpoint after each phase (only for persistent jobs)
			if !self.config.is_ephemeral() {
				ctx.checkpoint().await?;
			}
		}

		// Calculate final metrics
@@ -158,11 +363,16 @@ impl JobHandler for IndexerJob {

		// Generate final output
		Ok(IndexerOutput {
			location_id: self.location_id,
			location_id: self.config.location_id,
			stats: state.stats,
			duration: state.started_at.elapsed(),
			errors: state.errors.clone(),
			metrics: Some(metrics),
			ephemeral_results: if self.config.is_ephemeral() {
				self.ephemeral_index.clone()
			} else {
				None
			},
		})
	}

@@ -195,43 +405,165 @@ impl JobHandler for IndexerJob {
}

impl IndexerJob {
	/// Create a new indexer job
	pub fn new(location_id: Uuid, root_path: SdPath, mode: IndexMode) -> Self {
	/// Create a new indexer job with enhanced configuration
	pub fn new(config: IndexerJobConfig) -> Self {
		Self {
			location_id,
			root_path,
			mode,
			config,
			state: None,
			ephemeral_index: None,
			timer: None,
			db_operations: (0, 0),
			batch_info: (0, 0),
		}
	}

	/// Create a traditional persistent recursive indexer job
	pub fn from_location(location_id: Uuid, root_path: SdPath, mode: IndexMode) -> Self {
		Self::new(IndexerJobConfig::new(location_id, root_path, mode))
	}

	/// Create a shallow indexer job (metadata only)
	pub fn shallow(location_id: Uuid, root_path: SdPath) -> Self {
		Self::new(location_id, root_path, IndexMode::Shallow)
		Self::from_location(location_id, root_path, IndexMode::Shallow)
	}

	/// Create a content indexer job (with CAS IDs)
	pub fn with_content(location_id: Uuid, root_path: SdPath) -> Self {
		Self::new(location_id, root_path, IndexMode::Content)
		Self::from_location(location_id, root_path, IndexMode::Content)
	}

	/// Create a deep indexer job (full processing)
	pub fn deep(location_id: Uuid, root_path: SdPath) -> Self {
		Self::new(location_id, root_path, IndexMode::Deep)
		Self::from_location(location_id, root_path, IndexMode::Deep)
	}

	/// Create a UI navigation job (current scope, quick scan)
	pub fn ui_navigation(location_id: Uuid, path: SdPath) -> Self {
		Self::new(IndexerJobConfig::ui_navigation(location_id, path))
	}

	/// Create an ephemeral browsing job (no database writes)
	pub fn ephemeral_browse(path: SdPath, scope: IndexScope) -> Self {
		Self::new(IndexerJobConfig::ephemeral_browse(path, scope))
	}

	/// Run current scope discovery (single level only)
	async fn run_current_scope_discovery_static(
		state: &mut IndexerState,
		ctx: &JobContext<'_>,
		root_path: &std::path::Path,
	) -> JobResult<()> {
		use tokio::fs;
		use super::state::{DirEntry, EntryKind};
		use super::entry::EntryProcessor;

		ctx.log("Starting current scope discovery (single level)");

		let mut entries = fs::read_dir(root_path).await
			.map_err(|e| JobError::execution(format!("Failed to read directory: {}", e)))?;

		while let Some(entry) = entries.next_entry().await
			.map_err(|e| JobError::execution(format!("Failed to read directory entry: {}", e)))? {

			let path = entry.path();
			let metadata = entry.metadata().await
				.map_err(|e| JobError::execution(format!("Failed to read metadata: {}", e)))?;

			let entry_kind = if metadata.is_dir() { EntryKind::Directory }
				else if metadata.is_symlink() { EntryKind::Symlink }
				else { EntryKind::File };

			let dir_entry = DirEntry {
				path: path.clone(),
				kind: entry_kind,
				size: metadata.len(),
				modified: metadata.modified().ok(),
				inode: EntryProcessor::get_inode(&metadata),
			};

			state.pending_entries.push(dir_entry);
			state.items_since_last_update += 1;

			// Update stats
			match entry_kind {
				EntryKind::File => state.stats.files += 1,
				EntryKind::Directory => state.stats.dirs += 1,
				EntryKind::Symlink => state.stats.symlinks += 1,
			}
		}

		// Create single batch and move to processing
		if !state.pending_entries.is_empty() {
			let batch = state.create_batch();
			state.entry_batches.push(batch);
		}

		state.phase = Phase::Processing;
		ctx.log(format!("Current scope discovery complete: {} entries found", state.stats.files + state.stats.dirs));

		Ok(())
	}

	/// Run ephemeral processing (store in memory instead of database)
	async fn run_ephemeral_processing_static(
		state: &mut IndexerState,
		ctx: &JobContext<'_>,
		ephemeral_index: Arc<RwLock<EphemeralIndex>>,
	) -> JobResult<()> {
		use super::entry::EntryProcessor;

		ctx.log("Starting ephemeral processing");

		for batch in &state.entry_batches {
			for entry in batch {
				// Extract metadata
				let metadata = EntryProcessor::extract_metadata(&entry.path).await
					.map_err(|e| JobError::execution(format!("Failed to extract metadata: {}", e)))?;

				// Store in ephemeral index
				{
					let mut index = ephemeral_index.write().await;
					index.add_entry(entry.path.clone(), metadata);
					index.stats = state.stats; // Update stats
				}
			}
		}

		state.entry_batches.clear();
		state.phase = Phase::ContentIdentification;

		ctx.log("Ephemeral processing complete");
		Ok(())
	}

	/// Run ephemeral content identification
	async fn run_ephemeral_content_phase_static(
		state: &mut IndexerState,
		ctx: &JobContext<'_>,
		_ephemeral_index: Arc<RwLock<EphemeralIndex>>,
	) -> JobResult<()> {
		ctx.log("Starting ephemeral content identification");

		// For ephemeral jobs, we can skip heavy content processing or do it lightly
		// This is mainly for demonstration - in practice you might generate CAS IDs

		state.phase = Phase::Complete;
		ctx.log("Ephemeral content identification complete");

		Ok(())
	}
}

/// Job output with comprehensive results
#[derive(Debug, Serialize, Deserialize)]
pub struct IndexerOutput {
	pub location_id: Uuid,
	pub location_id: Option<Uuid>,
	pub stats: IndexerStats,
	pub duration: Duration,
	pub errors: Vec<IndexError>,
	pub metrics: Option<IndexerMetrics>,
	#[serde(skip)]
	pub ephemeral_results: Option<Arc<RwLock<EphemeralIndex>>>,
}

impl From<IndexerOutput> for JobOutput {

@@ -16,13 +16,19 @@ pub mod metrics;
pub mod phases;
pub mod progress;
pub mod change_detection;
pub mod persistence;

// Re-exports for convenience
pub use job::{IndexerJob, IndexMode};
pub use job::{
	IndexerJob, IndexMode, IndexScope, IndexPersistence,
	IndexerJobConfig, EphemeralIndex, EphemeralContentIdentity,
	IndexerOutput
};
pub use state::{IndexerState, IndexerProgress, IndexPhase, IndexerStats};
pub use entry::{EntryProcessor, EntryMetadata};
pub use filters::should_skip_path;
pub use metrics::IndexerMetrics;
pub use persistence::{IndexPersistence as PersistenceTrait, PersistenceFactory};

// Rules system will be integrated here in the future
// pub mod rules;

334 core-new/src/operations/indexing/persistence.rs Normal file
@@ -0,0 +1,334 @@
//! Persistence abstraction layer for indexing operations
//!
//! This module provides a unified interface for storing indexing results
//! either persistently in the database or ephemerally in memory.

use crate::{
	infrastructure::{
		database::entities,
		jobs::prelude::{JobContext, JobError, JobResult},
	},
	domain::content_identity::ContentKind,
	file_type::FileTypeRegistry,
};
use sea_orm::{ActiveModelTrait, ActiveValue::Set, QueryFilter, ColumnTrait};
use std::{path::Path, sync::Arc, collections::HashMap};
use uuid::Uuid;
use tokio::sync::RwLock;

use super::{
	state::{DirEntry, EntryKind, IndexerState},
	entry::EntryMetadata,
	job::{EphemeralIndex, EphemeralContentIdentity},
};

/// Abstraction for storing indexing results
#[async_trait::async_trait]
pub trait IndexPersistence: Send + Sync {
	/// Store an entry and return its ID
	async fn store_entry(
		&self,
		entry: &DirEntry,
		location_id: Option<i32>,
		location_root_path: &Path,
	) -> JobResult<i32>;

	/// Store content identity and link to entry
	async fn store_content_identity(
		&self,
		entry_id: i32,
		path: &Path,
		cas_id: String,
	) -> JobResult<()>;

	/// Get existing entries for change detection
	async fn get_existing_entries(
		&self,
		path: &Path,
	) -> JobResult<HashMap<std::path::PathBuf, (i32, Option<u64>, Option<std::time::SystemTime>)>>;

	/// Update an existing entry
	async fn update_entry(
		&self,
		entry_id: i32,
		entry: &DirEntry,
	) -> JobResult<()>;

	/// Check if this persistence layer supports operations
	fn is_persistent(&self) -> bool;
}

/// Database-backed persistence implementation
pub struct DatabasePersistence<'a> {
	ctx: &'a JobContext<'a>,
	location_id: i32,
	device_id: i32,
	entry_id_cache: Arc<RwLock<HashMap<std::path::PathBuf, i32>>>,
}

impl<'a> DatabasePersistence<'a> {
	pub fn new(
		ctx: &'a JobContext<'a>,
		location_id: i32,
		device_id: i32,
	) -> Self {
		Self {
			ctx,
			location_id,
			device_id,
			entry_id_cache: Arc::new(RwLock::new(HashMap::new())),
		}
	}
}

#[async_trait::async_trait]
impl<'a> IndexPersistence for DatabasePersistence<'a> {
	async fn store_entry(
		&self,
		entry: &DirEntry,
		_location_id: Option<i32>,
		location_root_path: &Path,
	) -> JobResult<i32> {
		use super::entry::EntryProcessor;

		// Calculate relative directory path from location root (without filename)
		let relative_path = if let Ok(rel_path) = entry.path.strip_prefix(location_root_path) {
			// Get parent directory relative to location root
			if let Some(parent) = rel_path.parent() {
				if parent == std::path::Path::new("") {
					String::new()
				} else {
					parent.to_string_lossy().to_string()
				}
			} else {
				String::new()
			}
		} else {
			String::new()
		};

		// Extract file extension (without dot) for files, None for directories
		let extension = match entry.kind {
			EntryKind::File => {
				entry.path.extension()
					.and_then(|ext| ext.to_str())
					.map(|ext| ext.to_lowercase())
			}
			EntryKind::Directory | EntryKind::Symlink => None,
		};

		// Get file name without extension (stem)
		let name = entry.path.file_stem()
			.map(|stem| stem.to_string_lossy().to_string())
			.unwrap_or_else(|| {
				entry.path.file_name()
					.map(|n| n.to_string_lossy().to_string())
					.unwrap_or_else(|| "unknown".to_string())
			});

		// Convert timestamps
		let modified_at = entry.modified
			.and_then(|t| chrono::DateTime::from_timestamp(
				t.duration_since(std::time::UNIX_EPOCH).ok()?.as_secs() as i64, 0
			))
			.unwrap_or_else(|| chrono::Utc::now());

		// Create entry
		let new_entry = entities::entry::ActiveModel {
			uuid: Set(Uuid::new_v4()),
			location_id: Set(self.location_id),
			relative_path: Set(relative_path),
			name: Set(name),
			kind: Set(EntryProcessor::entry_kind_to_int(entry.kind)),
			extension: Set(extension),
			metadata_id: Set(None), // User metadata only created when user adds metadata
			content_id: Set(None), // Will be set later if content indexing is enabled
			size: Set(entry.size as i64),
			aggregate_size: Set(0), // Will be calculated in aggregation phase
			child_count: Set(0), // Will be calculated in aggregation phase
			file_count: Set(0), // Will be calculated in aggregation phase
			created_at: Set(chrono::Utc::now()),
			modified_at: Set(modified_at),
			accessed_at: Set(None),
			permissions: Set(None), // TODO: Could extract from metadata
			inode: Set(entry.inode.map(|i| i as i64)),
			..Default::default()
		};

		let result = new_entry.insert(self.ctx.library_db()).await
			.map_err(|e| JobError::execution(format!("Failed to create entry: {}", e)))?;

		// Cache the entry ID for potential children
		{
			let mut cache = self.entry_id_cache.write().await;
			cache.insert(entry.path.clone(), result.id);
		}

		Ok(result.id)
	}

	async fn store_content_identity(
		&self,
		entry_id: i32,
		path: &Path,
		cas_id: String,
	) -> JobResult<()> {
		use super::entry::EntryProcessor;

		// Delegate to existing implementation
		EntryProcessor::create_content_identity(self.ctx, entry_id, path, cas_id).await
	}

	async fn get_existing_entries(
		&self,
		_path: &Path,
	) -> JobResult<HashMap<std::path::PathBuf, (i32, Option<u64>, Option<std::time::SystemTime>)>> {
		// TODO: Implement change detection query
		Ok(HashMap::new())
	}

	async fn update_entry(
		&self,
		entry_id: i32,
		entry: &DirEntry,
	) -> JobResult<()> {
		use super::entry::EntryProcessor;

		// Delegate to existing implementation
		EntryProcessor::update_entry(self.ctx, entry_id, entry).await
	}

	fn is_persistent(&self) -> bool {
		true
	}
}

/// In-memory ephemeral persistence implementation
pub struct EphemeralPersistence {
	index: Arc<RwLock<EphemeralIndex>>,
	next_entry_id: Arc<RwLock<i32>>,
}

impl EphemeralPersistence {
	pub fn new(index: Arc<RwLock<EphemeralIndex>>) -> Self {
		Self {
			index,
			next_entry_id: Arc::new(RwLock::new(1)),
		}
	}

	async fn get_next_id(&self) -> i32 {
		let mut id = self.next_entry_id.write().await;
		let current = *id;
		*id += 1;
		current
	}
}

#[async_trait::async_trait]
impl IndexPersistence for EphemeralPersistence {
	async fn store_entry(
		&self,
		entry: &DirEntry,
		_location_id: Option<i32>,
		_location_root_path: &Path,
	) -> JobResult<i32> {
		use super::entry::EntryProcessor;

		// Extract full metadata
		let metadata = EntryProcessor::extract_metadata(&entry.path).await
			.map_err(|e| JobError::execution(format!("Failed to extract metadata: {}", e)))?;

		// Store in ephemeral index
		{
			let mut index = self.index.write().await;
			index.add_entry(entry.path.clone(), metadata);

			// Update stats
			match entry.kind {
				EntryKind::File => index.stats.files += 1,
				EntryKind::Directory => index.stats.dirs += 1,
				EntryKind::Symlink => index.stats.symlinks += 1,
			}
			index.stats.bytes += entry.size;
		}

		Ok(self.get_next_id().await)
	}

	async fn store_content_identity(
		&self,
		_entry_id: i32,
		path: &Path,
		cas_id: String,
	) -> JobResult<()> {
		// Get file size
		let file_size = tokio::fs::metadata(path).await
			.map(|m| m.len())
			.unwrap_or(0);

		// Detect file type using the file type registry
		let registry = FileTypeRegistry::default();
		let mime_type = if let Ok(result) = registry.identify(path).await {
			result.file_type.primary_mime_type().map(|s| s.to_string())
		} else {
			None
		};

		let content_identity = EphemeralContentIdentity {
			cas_id: cas_id.clone(),
			mime_type,
			file_size,
			entry_count: 1,
		};

		{
			let mut index = self.index.write().await;
			index.add_content_identity(cas_id, content_identity);
		}

		Ok(())
	}

	async fn get_existing_entries(
		&self,
		_path: &Path,
	) -> JobResult<HashMap<std::path::PathBuf, (i32, Option<u64>, Option<std::time::SystemTime>)>> {
		// Ephemeral persistence doesn't support change detection
		Ok(HashMap::new())
	}

	async fn update_entry(
		&self,
		_entry_id: i32,
		_entry: &DirEntry,
	) -> JobResult<()> {
		// Updates not needed for ephemeral storage
		Ok(())
	}

	fn is_persistent(&self) -> bool {
		false
	}
}

/// Factory for creating appropriate persistence implementations
pub struct PersistenceFactory;

impl PersistenceFactory {
	/// Create a database persistence instance
	pub fn database<'a>(
		ctx: &'a JobContext<'a>,
		location_id: i32,
		device_id: i32,
	) -> Box<dyn IndexPersistence + 'a> {
		Box::new(DatabasePersistence::new(ctx, location_id, device_id))
	}

	/// Create an ephemeral persistence instance
	pub fn ephemeral(
		index: Arc<RwLock<EphemeralIndex>>,
	) -> Box<dyn IndexPersistence + Send + Sync> {
		Box::new(EphemeralPersistence::new(index))
	}
}

@@ -64,6 +64,9 @@ pub async fn run_aggregation_phase(
		total_found: state.stats,
		processing_rate: state.calculate_rate(),
		estimated_remaining: state.estimate_remaining(),
		scope: None,
		persistence: None,
		is_ephemeral: false,
	};
	ctx.progress(Progress::generic(indexer_progress.to_generic_progress()));

@@ -48,6 +48,9 @@ pub async fn run_content_phase(
		total_found: state.stats,
		processing_rate: state.calculate_rate(),
		estimated_remaining: state.estimate_remaining(),
		scope: None,
		persistence: None,
		is_ephemeral: false,
	};
	ctx.progress(Progress::generic(indexer_progress.to_generic_progress()));

@@ -47,6 +47,9 @@ pub async fn run_discovery_phase(
		total_found: state.stats,
		processing_rate: state.calculate_rate(),
		estimated_remaining: state.estimate_remaining(),
		scope: None,
		persistence: None,
		is_ephemeral: false,
	};
	ctx.progress(Progress::generic(indexer_progress.to_generic_progress()));

@@ -66,6 +66,9 @@ pub async fn run_processing_phase(
		total_found: state.stats,
		processing_rate: state.calculate_rate(),
		estimated_remaining: state.estimate_remaining(),
		scope: None,
		persistence: None,
		is_ephemeral: false,
	};
	ctx.progress(Progress::generic(indexer_progress.to_generic_progress()));

@@ -88,6 +88,9 @@ mod tests {
			total_found: IndexerStats::default(),
			processing_rate: 0.0,
			estimated_remaining: None,
			scope: None,
			persistence: None,
			is_ephemeral: false,
		};

		let generic = indexer_progress.to_generic_progress();
@@ -111,6 +114,9 @@ mod tests {
			},
			processing_rate: 25.5,
			estimated_remaining: Some(Duration::from_secs(120)),
			scope: None,
			persistence: None,
			is_ephemeral: false,
		};

		let generic = indexer_progress.to_generic_progress();
@@ -131,6 +137,9 @@ mod tests {
			total_found: IndexerStats::default(),
			processing_rate: 12.0,
			estimated_remaining: Some(Duration::from_secs(30)),
			scope: None,
			persistence: None,
			is_ephemeral: false,
		};

		let generic = indexer_progress.to_generic_progress();
@@ -148,6 +157,9 @@ mod tests {
			total_found: IndexerStats::default(),
			processing_rate: 0.0,
			estimated_remaining: Some(Duration::from_secs(5)),
			scope: None,
			persistence: None,
			is_ephemeral: false,
		};

		let generic = indexer_progress.to_generic_progress();

@@ -16,6 +16,11 @@ pub struct IndexerProgress {
	pub total_found: IndexerStats,
	pub processing_rate: f32,
	pub estimated_remaining: Option<Duration>,
	#[serde(skip_serializing_if = "Option::is_none")]
	pub scope: Option<super::job::IndexScope>,
	#[serde(skip_serializing_if = "Option::is_none")]
	pub persistence: Option<super::job::IndexPersistence>,
	pub is_ephemeral: bool,
}

/// Statistics collected during indexing