Navidrome Scanner: Technical Overview
This document provides a comprehensive technical explanation of Navidrome's music library scanner system.
Architecture Overview
The Navidrome scanner is built on a multi-phase pipeline architecture designed for efficient processing of music files. It systematically traverses file system directories, processes metadata, and maintains a database representation of the music library. For performance, phases that depend on each other run sequentially, while independent phases run in parallel.
```mermaid
flowchart TD
    subgraph "Scanner Execution Flow"
        Controller[Scanner Controller] --> Scanner[Scanner Implementation]
        Scanner --> Phase1[Phase 1: Folders Scan]
        Phase1 --> Phase2[Phase 2: Missing Tracks]
        Phase2 --> ParallelPhases
        subgraph ParallelPhases["Parallel Execution"]
            Phase3[Phase 3: Refresh Albums]
            Phase4[Phase 4: Playlist Import]
        end
        ParallelPhases --> FinalSteps[Final Steps: GC + Stats]
    end

    %% Triggers that can initiate a scan
    FileChanges[File System Changes] -->|Detected by| Watcher[Filesystem Watcher]
    Watcher -->|Triggers| Controller
    ScheduledJob[Scheduled Job] -->|Based on Scanner.Schedule| Controller
    ServerStartup[Server Startup] -->|If Scanner.ScanOnStartup=true| Controller
    ManualTrigger[Manual Scan via UI/API] -->|Admin user action| Controller
    CLICommand[Command Line: navidrome scan] -->|Direct invocation| Controller
    PIDChange[PID Configuration Change] -->|Forces full scan| Controller
    DBMigration[Database Migration] -->|May require full scan| Controller
    Scanner -.->|Alternative| External[External Scanner Process]
```
The execution flow shows that Phases 1 and 2 run sequentially, while Phases 3 and 4 execute in parallel to maximize performance before the final processing steps.
Core Components
Scanner Controller (controller.go)
This is the entry point for all scanning operations. It provides:
- Public API for initiating scans and checking scan status
- Event broadcasting to notify clients about scan progress
- Serialization of scan operations (prevents concurrent scans)
- Progress tracking and monitoring
- Error collection and reporting
```go
type Scanner interface {
	// ScanAll starts a full scan of the music library. This is a blocking operation.
	ScanAll(ctx context.Context, fullScan bool) (warnings []string, err error)
	Status(context.Context) (*StatusInfo, error)
}
```
Scanner Implementation (scanner.go)
This is the primary implementation, which orchestrates the four-phase scanning pipeline. Each phase implements the phase interface:
```go
type phase[T any] interface {
	producer() ppl.Producer[T]
	stages() []ppl.Stage[T]
	finalize(error) error
	description() string
}
```
This design enables:
- Type-safe pipeline construction with generics
- Modular phase implementation
- Separation of concerns
- Easy measurement of performance
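To make the phase pattern concrete, the following sketch drives a toy phase through a producer, its stages, and `finalize`. It does not use the real `ppl` package: `producerFn`, `stageFn`, `runPhase`, and `upperPhase` are simplified, hypothetical stand-ins, and everything runs on a single goroutine.

```go
package main

import (
	"fmt"
	"strings"
)

// producerFn and stageFn are simplified stand-ins for ppl.Producer and ppl.Stage.
type producerFn[T any] func(emit func(T) error) error
type stageFn[T any] func(T) (T, error)

type phase[T any] interface {
	producer() producerFn[T]
	stages() []stageFn[T]
	finalize(error) error
	description() string
}

// runPhase passes every produced item through all stages in order,
// then hands the resulting error (nil on success) to finalize.
func runPhase[T any](p phase[T]) error {
	stages := p.stages()
	err := p.producer()(func(item T) error {
		for _, s := range stages {
			var err error
			if item, err = s(item); err != nil {
				return err
			}
		}
		return nil
	})
	return p.finalize(err)
}

// upperPhase is a toy phase that upper-cases folder names and collects them.
type upperPhase struct {
	in, out []string
}

func (p *upperPhase) producer() producerFn[string] {
	return func(emit func(string) error) error {
		for _, s := range p.in {
			if err := emit(s); err != nil {
				return err
			}
		}
		return nil
	}
}

func (p *upperPhase) stages() []stageFn[string] {
	return []stageFn[string]{
		func(s string) (string, error) { return strings.ToUpper(s), nil },
		func(s string) (string, error) { p.out = append(p.out, s); return s, nil },
	}
}

func (p *upperPhase) finalize(err error) error { return err }
func (p *upperPhase) description() string      { return "upper-case folder names" }

func main() {
	p := &upperPhase{in: []string{"rock", "jazz"}}
	err := runPhase[string](p)
	fmt.Println(p.description(), "->", p.out, "error:", err)
}
```

Because the item type is a type parameter, a stage written for one phase's data cannot be wired into a phase that carries a different type, which is the type-safety benefit the list above refers to.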
External Scanner (external.go)
The External Scanner is a specialized implementation that offloads the scanning process to a separate subprocess. This is specifically designed to address memory management challenges in long-running Navidrome instances.
```go
// scannerExternal is a scanner that runs an external process to do the scanning. It is used to avoid
// memory leaks or retention in the main process, as the scanner can consume a lot of memory. The
// external process will be spawned with the same executable as the current process, and will run
// the "scan" command with the "--subprocess" flag.
//
// The external process will send progress updates to the main process through its STDOUT, and the main
// process will forward them to the caller.
```
```mermaid
sequenceDiagram
    participant MP as Main Process
    participant ES as External Scanner
    participant SP as Subprocess (navidrome scan --subprocess)
    participant FS as File System
    participant DB as Database

    Note over MP: DevExternalScanner=true
    MP->>ES: ScanAll(ctx, fullScan)
    activate ES
    ES->>ES: Locate executable path
    ES->>SP: Start subprocess with args:<br>scan --subprocess --configfile ... etc.
    activate SP
    Note over ES,SP: Create pipe for communication
    par Subprocess executes scan
        SP->>FS: Read files & metadata
        SP->>DB: Update database
    and Main process monitors progress
        loop For each progress update
            SP->>ES: Send encoded progress info via stdout pipe
            ES->>MP: Forward progress info
        end
    end
    SP-->>ES: Subprocess completes (success/error)
    deactivate SP
    ES-->>MP: Return aggregated warnings/errors
    deactivate ES
```
Technical details:
- Process Isolation
  - Spawns a separate process using the same executable
  - Uses the `--subprocess` flag to indicate it's running as a child process
  - Preserves configuration by passing required flags (`--configfile`, `--datafolder`, etc.)
- Inter-Process Communication
  - Uses a pipe attached to the subprocess's stdout to stream progress updates to the main process
  - Encodes/decodes progress updates using Go's `gob` encoding for efficient binary transfer
  - Properly handles process termination and error propagation
- Memory Management Benefits
  - Scanning operations can be memory-intensive, especially with large music libraries
  - Memory leaks or excessive allocations are automatically cleaned up when the process terminates
  - Main Navidrome process remains stable even if scanner encounters memory-related issues
- Error Handling
  - Detects non-zero exit codes from the subprocess
  - Propagates error messages back to the main process
  - Ensures resources are properly cleaned up, even in error conditions
Scanning Process Flow
Phase 1: Folder Scan (phase_1_folders.go)
This phase handles the initial traversal and media file processing.
```mermaid
flowchart TD
    A[Start Phase 1] --> B{Full Scan?}
    B -- Yes --> C[Scan All Folders]
    B -- No --> D[Scan Modified Folders]
    C --> E[Read File Metadata]
    D --> E
    E --> F[Create Artists]
    E --> G[Create Albums]
    F --> H[Save to Database]
    G --> H
    H --> I[Mark Missing Folders]
    I --> J[End Phase 1]
```
Technical implementation details:
- Folder Traversal
  - Uses `walkDirTree` to traverse the directory structure
  - Handles symbolic links and hidden files
  - Processes `.ndignore` files for exclusions
  - Maps files to appropriate types (audio, image, playlist)
- Metadata Extraction
  - Processes files in batches (defined by `filesBatchSize = 200`)
  - Extracts metadata using the configured storage backend
  - Converts raw metadata to `MediaFile` objects
  - Collects and normalizes tag information
- Album and Artist Creation
  - Groups tracks by album ID
  - Creates album records from track metadata
  - Handles album ID changes by tracking previous IDs
  - Creates artist records from track participants
- Database Persistence
  - Uses transactions for atomic updates
  - Preserves album annotations across ID changes
  - Updates library-artist mappings
  - Marks missing tracks for later processing
  - Pre-caches artwork for performance
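Two of the steps above, batched file processing and grouping tracks by album ID, can be sketched as follows. The `track` struct and both helper functions are hypothetical simplifications; only the batch size of 200 comes from this document.

```go
package main

import "fmt"

const filesBatchSize = 200 // batch size quoted in this document

// track is a hypothetical, trimmed-down stand-in for a MediaFile.
type track struct {
	Path    string
	AlbumID string
}

// batches splits paths into consecutive chunks of at most size entries.
func batches(paths []string, size int) [][]string {
	var out [][]string
	for size > 0 && len(paths) > 0 {
		n := size
		if len(paths) < n {
			n = len(paths)
		}
		out = append(out, paths[:n])
		paths = paths[n:]
	}
	return out
}

// groupByAlbum buckets tracks by album ID so one album record can be built per bucket.
func groupByAlbum(tracks []track) map[string][]track {
	groups := make(map[string][]track)
	for _, t := range tracks {
		groups[t.AlbumID] = append(groups[t.AlbumID], t)
	}
	return groups
}

func main() {
	paths := make([]string, 450)
	for i := range paths {
		paths[i] = fmt.Sprintf("track-%03d.flac", i)
	}
	for i, b := range batches(paths, filesBatchSize) {
		fmt.Printf("batch %d: %d files\n", i, len(b))
	}

	albums := groupByAlbum([]track{
		{Path: "a/1.flac", AlbumID: "al-1"},
		{Path: "a/2.flac", AlbumID: "al-1"},
		{Path: "b/1.flac", AlbumID: "al-2"},
	})
	fmt.Println("albums:", len(albums))
}
```

Bounding the batch size caps how many files' metadata are held in memory at once, regardless of how large a single folder is.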
Phase 2: Missing Tracks Processing (phase_2_missing_tracks.go)
This phase identifies tracks that have moved or been deleted.
```mermaid
flowchart TD
    A[Start Phase 2] --> B[Load Libraries]
    B --> C[Get Missing and Matching Tracks]
    C --> D[Group by PID]
    D --> E{Match Type?}
    E -- Exact --> F[Update Path]
    E -- Same PID --> G[Update If Only One]
    E -- Equivalent --> H[Update If No Better Match]
    F --> I[End Phase 2]
    G --> I
    H --> I
```
Technical implementation details:
- Track Identification Strategy
  - Uses persistent identifiers (PIDs) to track tracks across scans
  - Loads missing tracks and potential matches from the database
  - Groups tracks by PID to limit comparison scope
- Match Analysis
  - Applies three levels of matching criteria:
    - Exact match (full metadata equivalence)
    - Single match for a PID
    - Equivalent match (same base path or similar metadata)
  - Prioritizes matches in order of confidence
- Database Update Strategy
  - Preserves the original track ID
  - Updates the path to the new location
  - Deletes the duplicate entry
  - Uses transactions to ensure atomicity
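The matching and update steps above can be sketched with an in-memory map standing in for the database. Everything here is a hypothetical simplification: the `track` fields, the use of a single `Tags` string as "full metadata", and the reading of "same base path" as "same path without extension" are assumptions, and the order of the three checks simply follows the list above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// track is a hypothetical, trimmed-down record; Tags stands in for the full metadata.
type track struct {
	ID, PID, Path, Tags string
}

// groupByPID limits comparisons to tracks that share a persistent ID.
func groupByPID(tracks []track) map[string][]track {
	groups := make(map[string][]track)
	for _, t := range tracks {
		groups[t.PID] = append(groups[t.PID], t)
	}
	return groups
}

// basePath drops the extension, so "a/song.mp3" and "a/song.flac" compare equal.
func basePath(p string) string { return strings.TrimSuffix(p, filepath.Ext(p)) }

// bestMatch picks a candidate for a missing track, strongest criterion first.
// All candidates are assumed to share the missing track's PID.
func bestMatch(missing track, candidates []track) (track, bool) {
	for _, c := range candidates { // 1. exact: identical metadata
		if c.Tags == missing.Tags {
			return c, true
		}
	}
	if len(candidates) == 1 { // 2. the only candidate for this PID
		return candidates[0], true
	}
	for _, c := range candidates { // 3. equivalent: same base path
		if basePath(c.Path) == basePath(missing.Path) {
			return c, true
		}
	}
	return track{}, false
}

// relocate keeps the missing track's ID, points it at the new path, and drops the duplicate.
func relocate(db map[string]track, missing, match track) {
	missing.Path = match.Path
	db[missing.ID] = missing
	delete(db, match.ID)
}

func main() {
	db := map[string]track{
		"t1": {ID: "t1", PID: "p1", Path: "old/song.flac", Tags: "x"}, // missing
		"t9": {ID: "t9", PID: "p1", Path: "new/song.flac", Tags: "x"}, // newly found
	}
	missing := db["t1"]
	found := groupByPID([]track{db["t9"]})[missing.PID]
	if m, ok := bestMatch(missing, found); ok {
		relocate(db, missing, m)
	}
	fmt.Println(db)
}
```

Keeping the original ID is what lets annotations such as ratings and play counts survive a move; in the real scanner the update and delete would run inside one transaction.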
Phase 3: Album Refresh (phase_3_refresh_albums.go)
This phase updates album information based on the latest track metadata.
```mermaid
flowchart TD
    A[Start Phase 3] --> B[Load Touched Albums]
    B --> C[Filter Unmodified]
    C --> D{Changes Detected?}
    D -- Yes --> E[Refresh Album Data]
    D -- No --> F[Skip]
    E --> G[Update Database]
    F --> H[End Phase 3]
    G --> H
    H --> I[Refresh Statistics]
```
Technical implementation details:
- Album Selection Logic
  - Loads albums that have been "touched" in previous phases
  - Uses a producer-consumer pattern for efficient processing
  - Retrieves all media files for each album for completeness
- Change Detection
  - Rebuilds album metadata from associated tracks
  - Compares album attributes for changes
  - Skips albums with no media files
  - Avoids unnecessary database updates
- Statistics Refreshing
  - Updates album play counts
  - Updates artist play counts
  - Maintains consistency between related entities
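The change-detection step can be sketched as "rebuild, compare, write only on difference". The `album` and `mediaFile` structs below are hypothetical and carry only a few fields; the real records hold far more attributes.

```go
package main

import "fmt"

// mediaFile and album are hypothetical, trimmed-down records for this sketch.
type mediaFile struct {
	Album    string
	Duration int // seconds
}

type album struct {
	ID        string
	Name      string
	SongCount int
	Duration  int // seconds
}

// buildAlbum derives album attributes from the album's tracks.
func buildAlbum(id string, files []mediaFile) album {
	a := album{ID: id}
	for _, f := range files {
		a.Name = f.Album
		a.SongCount++
		a.Duration += f.Duration
	}
	return a
}

// refreshAlbum returns the rebuilt album and true only when a database update is needed.
// Albums without media files are skipped.
func refreshAlbum(current album, files []mediaFile) (album, bool) {
	if len(files) == 0 {
		return current, false
	}
	rebuilt := buildAlbum(current.ID, files)
	if rebuilt == current { // all fields are comparable, so == compares every attribute
		return current, false
	}
	return rebuilt, true
}

func main() {
	stored := album{ID: "al-1", Name: "Kind of Blue", SongCount: 1, Duration: 540}
	files := []mediaFile{
		{Album: "Kind of Blue", Duration: 540},
		{Album: "Kind of Blue", Duration: 600},
	}
	updated, changed := refreshAlbum(stored, files)
	fmt.Println("changed:", changed, "album:", updated)
}
```

Comparing the rebuilt record against the stored one is what avoids a database write for albums that were touched but did not actually change.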
Phase 4: Playlist Import (phase_4_playlists.go)
This phase imports and updates playlists from the file system.
```mermaid
flowchart TD
    A[Start Phase 4] --> B{AutoImportPlaylists?}
    B -- No --> C[Skip]
    B -- Yes --> D{Admin User Exists?}
    D -- No --> E[Log Warning & Skip]
    D -- Yes --> F[Load Folders with Playlists]
    F --> G{For Each Folder}
    G --> H[Read Directory]
    H --> I{For Each Playlist}
    I --> J[Import Playlist]
    J --> K[Pre-cache Artwork]
    K --> L[End Phase 4]
    C --> L
    E --> L
```
Technical implementation details:
- Playlist Discovery
  - Loads folders known to contain playlists
  - Focuses on folders that have been touched in previous phases
  - Handles both playlist formats (M3U, NSP)
- Import Process
  - Uses the core.Playlists service for import
  - Handles both regular and smart playlists
  - Updates existing playlists when changed
  - Pre-caches playlist cover art
- Configuration Awareness
  - Respects the AutoImportPlaylists setting
  - Requires an admin user for playlist import
  - Logs appropriate messages for configuration issues
Final Processing Steps
After the four main phases, several finalization steps occur:
- Garbage Collection
  - Removes dangling tracks with no files
  - Cleans up empty albums
  - Removes orphaned artists
  - Deletes orphaned annotations
- Statistics Refresh
  - Updates artist song and album counts
  - Refreshes tag usage statistics
  - Updates aggregate metrics
- Library Status Update
  - Marks scan as completed
  - Updates last scan timestamp
  - Stores persistent ID configuration
- Database Optimization
  - Performs database maintenance
  - Optimizes tables and indexes
  - Reclaims space from deleted records
File System Watching
The watcher system (watcher.go) provides real-time monitoring of file system changes:
```mermaid
flowchart TD
    A[Start Watcher] --> B[For Each Library]
    B --> C[Start Library Watcher]
    C --> D[Monitor File Events]
    D --> E{Change Detected?}
    E -- Yes --> F[Wait for More Changes]
    F --> G{Time Elapsed?}
    G -- Yes --> H[Trigger Scan]
    G -- No --> F
    H --> I[Wait for Scan Completion]
    I --> D
```
Technical implementation details:
- Event Throttling
  - Uses a timer to batch changes
  - Prevents excessive rescanning
  - Configurable wait period
- Library-specific Watching
  - Each library has its own watcher goroutine
  - Translates paths to library-relative paths
  - Filters irrelevant changes
- Platform Adaptability
  - Uses storage-provided watcher implementation
  - Supports different notification mechanisms per platform
  - Graceful fallback when watching is not supported
Edge Cases and Optimizations
Handling Album ID Changes
The scanner carefully manages album identity across scans:
- Tracks previous album IDs to handle ID generation changes
- Preserves annotations when IDs change
- Maintains creation timestamps for consistent sorting
Detecting Moved Files
Moved files are identified by matching missing tracks against newly found ones:
- Groups missing and new files by their Persistent ID
- Applies multiple matching strategies in priority order
- Updates paths rather than creating duplicate entries
Resuming Interrupted Scans
If a scan is interrupted:
- The next scan detects this condition
- Forces a full scan if the previous one was a full scan
- Continues from where it left off for incremental scans
Memory Efficiency
Several strategies minimize memory usage:
- Batched file processing (200 files at a time)
- External scanner process option
- Database-side filtering where possible
- Stream processing with pipelines
Concurrency Control
The scanner implements a sophisticated concurrency model to optimize performance:
- Phase-Level Parallelism:
  - Phases 1 and 2 run sequentially due to their dependencies
  - Phases 3 and 4 run in parallel using the `chain.RunParallel()` function
  - Final steps run sequentially to ensure data consistency
- Within-Phase Concurrency:
  - Each phase has configurable concurrency for its stages
  - For example, `phase_1_folders.go` processes folders concurrently: `ppl.NewStage(p.processFolder, ppl.Name("process folder"), ppl.Concurrency(conf.Server.DevScannerThreads))`
  - Multiple stages can exist within a phase, each with its own concurrency level
- Pipeline Architecture Benefits:
  - Producer-consumer pattern minimizes memory usage
  - Work is streamed through stages rather than accumulated
  - Back-pressure is automatically managed
- Thread Safety Mechanisms:
  - Atomic counters for statistics gathering
  - Mutex protection for shared resources
  - Transactional database operations
Configuration Options
The scanner's behavior can be customized through several configuration settings that directly affect its operation:
Core Scanner Options
| Setting | Description | Default |
|---|---|---|
| `Scanner.Enabled` | Whether the automatic scanner is enabled | true |
| `Scanner.Schedule` | Cron expression or duration for scheduled scans (e.g., "@daily") | "0" (disabled) |
| `Scanner.ScanOnStartup` | Whether to scan when the server starts | true |
| `Scanner.WatcherWait` | Delay before triggering scan after file changes detected | 5s |
| `Scanner.ArtistJoiner` | String used to join multiple artists in track metadata | " • " |
Playlist Processing
| Setting | Description | Default |
|---|---|---|
| `PlaylistsPath` | Path(s) to search for playlists (supports glob patterns) | "" |
| `AutoImportPlaylists` | Whether to import playlists during scanning | true |
Performance Options
| Setting | Description | Default |
|---|---|---|
| `DevExternalScanner` | Use external process for scanning (reduces memory issues) | true |
| `DevScannerThreads` | Number of concurrent processing threads during scanning | 5 |
Persistent ID Options
| Setting | Description | Default |
|---|---|---|
| `PID.Track` | Format for track persistent IDs (critical for tracking moved files) | "musicbrainz_trackid\|albumid,discnumber,tracknumber,title" |
| `PID.Album` | Format for album persistent IDs (affects album grouping) | "musicbrainz_albumid\|albumartistid,album,albumversion,releasedate" |
These options can be set in the Navidrome configuration file (e.g., navidrome.toml) or via environment variables with the ND_ prefix (e.g., ND_SCANNER_ENABLED=false). For environment variables, dots in option names are replaced with underscores.
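The naming rule for environment variables can be written as a one-line helper. The function `envName` is hypothetical and exists only to illustrate the mapping described above (prefix `ND_`, dots replaced with underscores, upper-cased as in the `ND_SCANNER_ENABLED` example).

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a configuration option name to its environment variable name:
// dots become underscores, the result is upper-cased, and "ND_" is prepended.
func envName(option string) string {
	return "ND_" + strings.ToUpper(strings.ReplaceAll(option, ".", "_"))
}

func main() {
	for _, opt := range []string{"Scanner.Enabled", "Scanner.WatcherWait", "PID.Track"} {
		fmt.Printf("%-20s -> %s\n", opt, envName(opt))
	}
}
```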
Conclusion
The Navidrome scanner combines a phase-based pipeline, explicit handling of edge cases such as moved files and album ID changes, and optimizations such as batched processing and an optional external scanner process. Together these allow it to handle large libraries while maintaining data integrity and keeping the server responsive.