
Parallel Region Processing Architecture

Executive Summary

This document outlines a comprehensive plan to parallelize the Arnis world generation pipeline by splitting large user-selected areas into smaller processing units (1 Minecraft region = 512×512 blocks per unit). The goal is to:

  1. Reduce memory usage by ~90% by processing and flushing regions incrementally
  2. Utilize multiple CPU cores for parallel generation
  3. Maintain visual consistency across region boundaries (colors, elevation, etc.)

Current Architecture Analysis

Processing Pipeline Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           CURRENT PROCESSING FLOW                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  [1/7] Fetch Data (retrieve_data.rs)                                        │
│     └── Downloads OSM data for entire bbox from Overpass API                 │
│     └── Single HTTP request with full bounding box                          │
│                                                                              │
│  [2/7] Parse Data (osm_parser.rs)                                           │
│     └── Transforms lat/lon → Minecraft X/Z coordinates                      │
│     └── Clips ways/relations to bounding box (clipping.rs)                  │
│     └── Sorts elements by priority                                          │
│                                                                              │
│  [3/7] Fetch Elevation (ground.rs / elevation_data.rs)                      │
│     └── Downloads Terrarium tiles for entire bbox                           │
│     └── Builds height grid matching world dimensions                        │
│                                                                              │
│  [4/7] Process Data (data_processing.rs)                                    │
│     └── Pre-computes flood fills in parallel (floodfill_cache.rs)          │
│     └── Builds highway connectivity map                                     │
│     └── Collects building footprints                                        │
│                                                                              │
│  [5/7] Process Terrain + Elements (data_processing.rs)                      │
│     └── Iterates ALL elements sequentially                                  │
│     └── Calls element_processing/* for each element type                    │
│     └── Places blocks via WorldEditor                                       │
│                                                                              │
│  [6/7] Generate Ground (data_processing.rs)                                 │
│     └── Iterates ALL blocks in bbox                                         │
│     └── Sets grass, dirt, stone, bedrock layers                            │
│                                                                              │
│  [7/7] Save World (world_editor/mod.rs → java.rs)                          │
│     └── Iterates ALL regions in memory                                      │
│     └── Writes .mca files in parallel                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Key Data Structures and Memory Usage

WorldToModify (world_editor/common.rs)

pub struct WorldToModify {
    pub regions: FnvHashMap<(i32, i32), RegionToModify>,  // Key: (region_x, region_z)
}

pub struct RegionToModify {
    pub chunks: FnvHashMap<(i32, i32), ChunkToModify>,    // 32×32 chunks per region
}

pub struct ChunkToModify {
    pub sections: FnvHashMap<i8, SectionToModify>,       // 24 sections per chunk (-4 to 19)
}

pub struct SectionToModify {
    pub blocks: [Block; 4096],                            // 16×16×16 = 4096 blocks
    pub properties: FnvHashMap<usize, Value>,            // Block properties (stairs, slabs, etc.)
}

Memory estimate per region:

  • Section: ~4KB (blocks) + ~variable (properties)
  • Chunk: ~24 sections × 4KB = ~96KB minimum, typically ~200-500KB with properties
  • Region: ~1024 chunks × 300KB = ~300MB per region
  • For a 10×10 region area: ~30GB of memory required!

Why Elements Are "Scattered"

The current design processes elements in OSM priority order (entrance → building → highway → waterway → water → barrier → other), NOT by spatial location. This means:

  1. A building in region (0,0) might be followed by a highway in region (5,5)
  2. Each set_block() call potentially accesses different regions
  3. ALL regions must remain in memory until the end because any element might touch any region
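
To make the scattering concrete, here is a minimal sketch (using a hypothetical helper, not the actual WorldEditor API) of how a world coordinate resolves into the nested region/chunk/section maps; the exact key layout and block-index order in the codebase may differ:

// Hypothetical helper illustrating region/chunk/section addressing;
// the real WorldEditor may use different key conventions.
fn locate(x: i32, y: i32, z: i32) -> ((i32, i32), (i32, i32), i8, usize) {
    let region = (x >> 9, z >> 9);               // 512 blocks per region
    let chunk = ((x >> 4) & 31, (z >> 4) & 31);  // chunk position within the region (0..32)
    let section = (y >> 4) as i8;                // 16-block-tall vertical section
    // Index into SectionToModify.blocks ([Block; 4096]); assumes Y-Z-X ordering
    let block_index = ((y & 15) as usize) * 256 + ((z & 15) as usize) * 16 + ((x & 15) as usize);
    (region, chunk, section, block_index)
}

Two consecutive elements in priority order can map to regions hundreds of blocks apart, so every touched region stays resident until the final save.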

Proposed Architecture: Region-Based Parallel Processing

Core Concept

Split the user-selected area into processing units of 1 Minecraft region each (512×512 blocks = 32×32 chunks). Process each unit independently in parallel, then flush to disk immediately.

┌─────────────────────────────────────────────────────────────────────────────┐
│                      PROPOSED PARALLEL PROCESSING FLOW                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  GLOBAL PHASE (Run Once for Entire Area)                                    │
│  ═══════════════════════════════════════                                    │
│                                                                              │
│  [1] Fetch Elevation Data for ENTIRE bbox                                   │
│      └── Must be consistent across all units                                │
│      └── Store as shared read-only Arc<Ground>                              │
│                                                                              │
│  [2] Compute Processing Unit Grid                                           │
│      └── Divide bbox into N×N region units                                  │
│      └── Create sub-bboxes with small overlap for boundary elements         │
│                                                                              │
│  PARALLEL PHASE (Per Processing Unit)                                       │
│  ═════════════════════════════════════                                      │
│                                                                              │
│  For each processing unit (in parallel, using N-1 CPU cores):               │
│                                                                              │
│    [3] Fetch OSM Data for Unit's Sub-BBox                                   │
│        └── Separate Overpass API query per unit                             │
│        └── Include small buffer zone for boundary elements                  │
│                                                                              │
│    [4] Parse & Clip Elements to Unit Bounds                                 │
│        └── Same as current, but for smaller area                            │
│                                                                              │
│    [5] Pre-compute Flood Fills                                              │
│        └── Only for elements in this unit                                   │
│                                                                              │
│    [6] Process Elements                                                     │
│        └── Generate buildings, roads, etc.                                  │
│        └── Use deterministic RNG keyed by element ID                        │
│                                                                              │
│    [7] Generate Ground Layer                                                │
│        └── Only for this unit's blocks                                      │
│                                                                              │
│    [8] Save Regions to Disk                                                 │
│        └── Write .mca files immediately                                     │
│        └── FREE MEMORY for this unit                                        │
│                                                                              │
│  FINALIZATION PHASE                                                         │
│  ══════════════════                                                         │
│                                                                              │
│  [9] Wait for all units to complete                                         │
│  [10] Generate map preview (optional)                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Critical Considerations

1. Deterministic Randomness ALREADY IMPLEMENTED

The codebase already has deterministic_rng.rs which provides:

// Creates RNG seeded by element ID - same element always produces same random values
pub fn element_rng(element_id: u64) -> ChaCha8Rng {
    ChaCha8Rng::seed_from_u64(element_id)
}

// For coordinate-based randomness
pub fn coord_rng(x: i32, z: i32, element_id: u64) -> ChaCha8Rng

Impact on buildings crossing boundaries:

  • Building colors are chosen using element_rng(element.id) in buildings.rs
  • Even if a building is processed in two different units, SAME element ID → SAME color
  • The existing implementation already supports this use case!
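
A minimal sketch of how this plays out, assuming a hypothetical wall-color palette (the real selection logic lives in buildings.rs): two units that both see the same building ID pick the same color.

use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha8Rng;

// Hypothetical palette; buildings.rs has its own block choices.
const WALL_COLORS: [&str; 4] = ["white_terracotta", "light_gray_concrete", "bricks", "sandstone"];

fn pick_wall_color(element_id: u64) -> &'static str {
    let mut rng = ChaCha8Rng::seed_from_u64(element_id);  // same seeding as element_rng()
    WALL_COLORS[rng.gen_range(0..WALL_COLORS.len())]
}

fn main() {
    // Unit A and Unit B both process building 123456 and must agree on its color
    assert_eq!(pick_wall_color(123_456), pick_wall_color(123_456));
}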

Files using deterministic RNG:

  • element_processing/buildings.rs - wall colors, window styles, accent blocks
  • element_processing/natural.rs - grass/flower distribution
  • element_processing/tree.rs - tree variations

2. Elevation Data Consistency ⚠️ REQUIRES CHANGES

Current behavior:

  • Elevation is fetched once in ground.rs via Ground::new_enabled()
  • Height grid dimensions match the world's XZ dimensions
  • Lookup uses relative coordinates: ground.level(XZPoint::new(x - min_x, z - min_z))

Problem:

  • If each unit downloads its own elevation tiles, slight differences in tile boundaries or interpolation could cause height discontinuities at unit boundaries

Solution:

  1. Download elevation ONCE for the entire area before parallel processing starts
  2. Pass Arc<Ground> (read-only) to all processing units
  3. The Ground::level() function already uses world-relative coordinates, so no changes needed

// Proposed: Global elevation fetch before parallel processing
let global_ground = Arc::new(Ground::new_enabled(&args.bbox, args.scale, args.ground_level));

// Each processing unit receives a clone of the Arc
for unit in processing_units {
    let ground_ref = Arc::clone(&global_ground);
    // spawn task with ground_ref
}

3. Element Clipping ⚠️ REQUIRES NEW LOGIC

Current clipping (clipping.rs):

  • Uses Sutherland-Hodgman algorithm to clip polygons to user's bbox
  • Works on the OUTER boundary of the entire selected area

New requirement:

  • Need to clip elements to each processing unit's internal boundary
  • But with OVERLAP to handle elements that straddle unit boundaries

Proposed approach:

┌─────────────────────────────────────────────────────────────────┐
│                    UNIT BOUNDARY HANDLING                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Example: 4 processing units arranged in a 2×2 grid              │
│                                                                  │
│      Unit A          │       Unit B                              │
│   (regions 0,0-1,1)  │   (regions 2,0-3,1)                      │
│                      │                                           │
│   ┌──────────────────┼──────────────────┐                       │
│   │                  │                  │                       │
│   │     ████████     │                  │  ← Building straddles │
│   │     █ BLD  █─────┼──────────────────│    Unit A and B       │
│   │     ████████     │                  │                       │
│   │                  │                  │                       │
│   ├──────────────────┼──────────────────┤                       │
│   │                  │                  │                       │
│   │                  │                  │                       │
│   │                  │                  │                       │
│   │                  │                  │                       │
│   └──────────────────┴──────────────────┘                       │
│      Unit C          │       Unit D                              │
│   (regions 0,2-1,3)  │   (regions 2,2-3,3)                      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Strategy for boundary elements:

  1. Expanded Fetch BBox: Each unit fetches OSM data with a buffer zone (e.g., +256 blocks)
  2. Clip to Processing BBox: Clip elements to the unit's actual processing bounds
  3. Process Normally: Elements partially in the unit are still processed, just clipped
  4. Deterministic Results: Same element in adjacent units produces identical blocks due to RNG seeding

Example: Building straddling Unit A and B

| Step  | Unit A                             | Unit B                              |
|-------|------------------------------------|-------------------------------------|
| Fetch | Gets building (with buffer)        | Gets building (with buffer)         |
| Clip  | Clips to Unit A bounds → left half | Clips to Unit B bounds → right half |
| Color | element_rng(building_id) → BLUE    | element_rng(building_id) → BLUE     |
| Place | Places left half in blue           | Places right half in blue           |

Result: a seamless blue building across the boundary.

4. OSM Data Downloading Strategy ⚠️ REQUIRES CAREFUL DESIGN

Options:

Option A: Download Once Globally, Distribute to Units

┌─────────────────────────────────────────────────────────────────┐
│                                                                  │
│  [1] Download ALL OSM data for entire bbox (single API call)    │
│  [2] Parse into ProcessedElements                                │
│  [3] For each processing unit:                                   │
│      └── Filter elements that intersect unit's bbox             │
│      └── Clip filtered elements to unit bounds                  │
│      └── Send to parallel processor                             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Pros:

  • Single Overpass API call (respects rate limits)
  • No duplicate data transfer
  • Elements are already parsed, just need filtering

Cons:

  • Must keep all elements in memory during distribution phase
  • For very large areas, this might still be memory-intensive

Option B: Download Per Unit (Simpler, Higher Bandwidth)

┌─────────────────────────────────────────────────────────────────┐
│                                                                  │
│  For each processing unit (sequentially or with rate limiting): │
│      [1] Download OSM data for unit's expanded bbox             │
│      [2] Parse into ProcessedElements                            │
│      [3] Send to parallel processor                             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Pros:

  • Lower peak memory usage
  • Simpler code structure

Cons:

  • Multiple API calls (may hit rate limits)
  • Duplicate data transfer for overlapping areas
  • Slower due to network latency

Recommendation: Option A with Streaming

Download once, but use a streaming approach to distribute elements to units:

// Pseudo-code for element distribution
fn distribute_elements_to_units(
    elements: Vec<ProcessedElement>,
    units: &[ProcessingUnit],
) -> Vec<Vec<ProcessedElement>> {
    let mut unit_elements = vec![Vec::new(); units.len()];
    
    for element in elements {
        let element_bbox = compute_element_bbox(&element);
        for (i, unit) in units.iter().enumerate() {
            if unit.expanded_bbox.intersects(&element_bbox) {
                // Clone element for each unit that needs it
                // (or use Arc for large elements)
                unit_elements[i].push(element.clone());
            }
        }
    }
    
    unit_elements
}
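
The pseudo-code above assumes an axis-aligned intersection test on block-coordinate bounding boxes. A minimal sketch of such a predicate (the ElementBBox type here is hypothetical; the codebase's XZBBox may already provide something equivalent):

#[derive(Clone, Copy)]
struct ElementBBox {
    min_x: i32,
    max_x: i32,
    min_z: i32,
    max_z: i32,
}

impl ElementBBox {
    // Two axis-aligned boxes overlap iff their ranges overlap on both axes
    fn intersects(&self, other: &ElementBBox) -> bool {
        self.min_x <= other.max_x
            && self.max_x >= other.min_x
            && self.min_z <= other.max_z
            && self.max_z >= other.min_z
    }
}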

5. Flood Fill Cache ⚠️ REQUIRES CHANGES

Current behavior:

  • FloodFillCache::precompute() runs in parallel for ALL elements
  • Results are stored in a FnvHashMap<u64, Vec<(i32, i32)>>
  • Cache is consumed during sequential element processing

Problem:

  • If we process units in parallel, each unit needs its own flood fill cache
  • But we don't want to re-compute the same flood fills multiple times

Solution A: Per-Unit Flood Fill (Simpler)

// Each unit computes flood fills only for its elements
fn process_unit(unit_elements: Vec<ProcessedElement>) {
    let flood_fill_cache = FloodFillCache::precompute(&unit_elements, timeout);
    // Process elements using this cache
}

Pros: Simple, no coordination needed

Cons: Elements at boundaries may be flood-filled twice

Solution B: Global Flood Fill + Distribution (More Complex)

// Compute flood fills globally, then distribute to units
let global_cache = FloodFillCache::precompute(&all_elements, timeout);

// For each unit, create a view into the global cache
let unit_caches: Vec<_> = units.iter()
    .map(|unit| global_cache.filter_for_bbox(&unit.bbox))
    .collect();

Recommendation: Start with Solution A. The overhead of re-computing some flood fills at boundaries is acceptable given the simplicity.

6. Building Footprints Bitmap ⚠️ REQUIRES CHANGES

Current behavior:

  • BuildingFootprintBitmap is a memory-efficient bitmap covering the entire world
  • Used to prevent trees from spawning inside buildings
  • Created AFTER flood fill precomputation

Problem:

  • With parallel processing, each unit only knows about buildings in its own area
  • A tree in Unit B might spawn inside a building that exists in Unit A (near boundary)

Solution:

  • Compute building footprints GLOBALLY before parallel processing
  • Use Arc<BuildingFootprintBitmap> shared across all units (read-only)

// Global building footprint computation
let all_building_coords = compute_all_building_footprints(&all_elements, &global_xzbbox);
let global_footprints = Arc::new(BuildingFootprintBitmap::from(all_building_coords));

// Each unit receives Arc clone
for unit in units {
    let footprints = Arc::clone(&global_footprints);
    // spawn task
}

7. Highway Connectivity ⚠️ REQUIRES CHANGES

Current behavior:

  • highways::build_highway_connectivity_map() creates a map of connected highway segments
  • Used for intersection detection and road marking placement

Problem:

  • Highway segments crossing unit boundaries won't see their full connectivity

Solution:

  • Build highway connectivity map GLOBALLY before parallel processing
  • Pass as Arc<HighwayConnectivityMap> to all units
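
A minimal sketch of what building the map globally could look like, assuming a hypothetical node-to-ways representation (the actual structure in element_processing/highways.rs may differ):

use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical shape: for every highway node id, the ids of the ways that use it
type HighwayConnectivityMap = HashMap<u64, Vec<u64>>;

fn build_highway_connectivity(highway_ways: &[(u64, Vec<u64>)]) -> Arc<HighwayConnectivityMap> {
    let mut map: HighwayConnectivityMap = HashMap::new();
    for (way_id, node_ids) in highway_ways {
        for node_id in node_ids {
            map.entry(*node_id).or_default().push(*way_id);
        }
    }
    // Wrapped in Arc so every processing unit can share it read-only
    Arc::new(map)
}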

8. Water Areas and Ring Merging ALREADY SUPPORTED

Current behavior:

  • Water relations contain multiple ways that must be merged into closed rings
  • merge_way_segments() in water_areas.rs handles this
  • Clipping happens AFTER ring merging via clip_water_ring_to_bbox()
  • Water uses inverse_floodfill() which iterates over bounding box (not flood fill)

Why water CAN be clipped per-unit:

  1. Ring merging happens on the UNCLIPPED ways (preserved in osm_parser.rs)
  2. After merging, clip_water_ring_to_bbox() clips the assembled polygon
  3. The inverse_floodfill algorithm iterates block-by-block within bounds
  4. Each unit can independently clip and fill its portion of a water body

No special handling needed - water relations work the same as other elements:

  • Distribute relation to units that intersect its bbox
  • Each unit clips to its own bounds
  • Each unit fills its portion independently

9. Element Priority Order ⚠️ MUST BE PRESERVED

Current behavior:

  • Elements are sorted by priority before processing (osm_parser.rs):
    const PRIORITY_ORDER: [&str; 6] = [
        "entrance", "building", "highway", "waterway", "water", "barrier",
    ];
    
  • This ensures entrances are placed before buildings (so doors work)
  • Buildings before highways (so sidewalks don't overwrite buildings)

Requirement:

  • Each unit must process its elements in the SAME priority order
  • This is natural: just sort the unit's elements the same way
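
Sorting per unit can reuse the same key function; a minimal sketch (the element kind accessor and the (kind, id) pair stand in for ProcessedElement):

const PRIORITY_ORDER: [&str; 6] = [
    "entrance", "building", "highway", "waterway", "water", "barrier",
];

// Elements whose kind appears earlier in PRIORITY_ORDER come first;
// anything not listed ("other") sorts last
fn priority_key(kind: &str) -> usize {
    PRIORITY_ORDER
        .iter()
        .position(|p| *p == kind)
        .unwrap_or(PRIORITY_ORDER.len())
}

fn sort_unit_elements(unit_elements: &mut Vec<(String, u64)>) {
    unit_elements.sort_by_key(|(kind, _)| priority_key(kind));
}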

10. SPONGE Block as Placeholder ⚠️ MINOR CONSIDERATION

Current behavior:

  • SPONGE block is used as a blacklist marker in some places
  • Example: editor.set_block(actual_block, x, 0, z, None, Some(&[SPONGE]));
  • Prevents certain blocks from overwriting sponge blocks

Impact on parallel processing:

  • None - this is a per-block check, not cross-region coordination
  • Each unit handles its own sponge blocks independently

11. Tree Placement and Building Footprints ⚠️ REQUIRES GLOBAL FOOTPRINTS

Current behavior:

  • Trees check building_footprints.contains(x, z) before spawning
  • Prevents trees from appearing inside buildings
  • Uses coord_rng(x, z, element_id) for deterministic placement

Problem with per-unit footprints:

  • A tree near a unit boundary might not see a building from the adjacent unit
  • Could spawn a tree inside a building that exists in neighbor unit

Solution (already planned):

  • Compute building footprints GLOBALLY before parallel processing
  • Pass as Arc<BuildingFootprintBitmap> to all units
  • Tree placement will correctly avoid all buildings

12. Relations with Multiple Members Across Units ⚠️ REQUIRES CAREFUL HANDLING

Current behavior:

  • Relations (buildings, landuse, leisure, natural) process each member way
  • Member ways can be scattered across the entire bbox

Example: Building relation with courtyard

Building Relation:
  - Outer way 1 (in Unit A)
  - Outer way 2 (in Unit A and B)  ← straddles boundary
  - Inner way (courtyard, in Unit A)

Strategy:

  1. Distribute entire relation to all units that any member touches
  2. Each unit clips all members to its bounds
  3. Each unit processes the clipped relation independently
  4. Deterministic RNG ensures consistent colors/styles

Important: The relation-level tags (e.g., building:levels) must be preserved for all units processing that relation.


Proposed Processing Unit Structure

ProcessingUnit Definition

struct ProcessingUnit {
    /// Which region this unit covers (1 region per unit)
    region_x: i32,
    region_z: i32,
    
    /// Minecraft coordinate bounds for this unit (512×512 blocks)
    min_x: i32,  // region_x * 512
    max_x: i32,  // region_x * 512 + 511
    min_z: i32,  // region_z * 512
    max_z: i32,  // region_z * 512 + 511
    
    /// Expanded bounds for element fetching (includes buffer for boundary elements)
    fetch_min_x: i32,
    fetch_max_x: i32,
    fetch_min_z: i32,
    fetch_max_z: i32,
}

Unit Grid Calculation

fn compute_processing_units(
    global_xzbbox: &XZBBox,
    buffer_blocks: i32,     // e.g., 64-128 blocks overlap
) -> Vec<ProcessingUnit> {
    let blocks_per_region = 512;  // 32 chunks × 16 blocks
    
    // Calculate which regions are covered by the bbox
    let min_region_x = global_xzbbox.min_x() >> 9;  // divide by 512
    let max_region_x = global_xzbbox.max_x() >> 9;
    let min_region_z = global_xzbbox.min_z() >> 9;
    let max_region_z = global_xzbbox.max_z() >> 9;
    
    let mut units = Vec::new();
    
    // Create one unit per region
    for rx in min_region_x..=max_region_x {
        for rz in min_region_z..=max_region_z {
            // Compute Minecraft coordinate bounds for this region
            let min_x = rx * blocks_per_region;
            let max_x = min_x + blocks_per_region - 1;
            let min_z = rz * blocks_per_region;
            let max_z = min_z + blocks_per_region - 1;
            
            // Add buffer for fetch bounds (clamped to global bbox)
            let fetch_min_x = (min_x - buffer_blocks).max(global_xzbbox.min_x());
            let fetch_max_x = (max_x + buffer_blocks).min(global_xzbbox.max_x());
            let fetch_min_z = (min_z - buffer_blocks).max(global_xzbbox.min_z());
            let fetch_max_z = (max_z + buffer_blocks).min(global_xzbbox.max_z());
            
            units.push(ProcessingUnit {
                region_x: rx,
                region_z: rz,
                min_x, max_x, min_z, max_z,
                fetch_min_x, fetch_max_x, fetch_min_z, fetch_max_z,
            });
        }
    }
    
    units
}

Parallel Execution Strategy

fn process_units_parallel(
    units: Vec<ProcessingUnit>,
    elements: &[ProcessedElement],
    global_ground: Arc<Ground>,
    global_building_footprints: Arc<BuildingFootprintBitmap>,
    global_highway_connectivity: Arc<HighwayConnectivityMap>,
    args: &Args,
) {
    // Use CPU-1 cores: build a dedicated rayon thread pool so the cap
    // actually applies (the default global pool uses all available cores)
    let num_threads = std::thread::available_parallelism()
        .map(|n| n.get().saturating_sub(1).max(1))
        .unwrap_or(1);
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .build()
        .expect("failed to build rayon thread pool");
    
    pool.install(|| units.into_par_iter()
        .with_min_len(1)  // Process 1 unit per task
        .for_each(|unit| {
            // 1. Filter elements that intersect this unit's fetch bounds
            let unit_elements = filter_elements_for_unit(elements, &unit);
            
            // 2. Clip elements to unit's actual bounds
            let clipped_elements = clip_elements_to_unit(unit_elements, &unit);
            
            // 3. Create per-unit structures
            let unit_xzbbox = XZBBox::new(unit.min_x, unit.max_x, unit.min_z, unit.max_z);
            let mut editor = WorldEditor::new(args.path.clone(), &unit_xzbbox, ...);
            editor.set_ground(Arc::clone(&global_ground));
            
            // 4. Pre-compute flood fills for this unit's elements
            let flood_fill_cache = FloodFillCache::precompute(&clipped_elements, args.timeout.as_ref());
            
            // 5. Process elements (same as current, just for this unit)
            for element in clipped_elements {
                process_element(&mut editor, &element, ...);
            }
            
            // 6. Generate ground layer for this unit
            generate_ground_for_unit(&mut editor, &unit, &global_ground);
            
            // 7. Save region immediately and FREE MEMORY
            editor.save_single_region(unit.region_x, unit.region_z);
            drop(editor);  // Release memory
        }));
}

Memory Usage Comparison

Understanding Minecraft Region Sizes

1 Region = 32×32 chunks = 512×512 blocks (horizontally)
1 Chunk  = 16×16×384 blocks (with sections from Y=-64 to Y=319)

Current Architecture (All Regions in Memory)

| Stage                       | Memory Usage        |
|-----------------------------|---------------------|
| OSM Data (parsed)           | ~50-200 MB          |
| Flood Fill Cache            | ~100-500 MB         |
| Building Footprints         | ~10-50 MB           |
| WorldToModify (all regions) | ~300 MB × N regions |
| Total for 100 regions       | ~30+ GB             |

Unit Size Analysis

The optimal unit size depends on balancing:

  1. Memory per unit - Larger units = more memory
  2. Parallelism overhead - Smaller units = more coordination
  3. Boundary overhead - More units = more elements processed multiple times

| Unit Size                | Blocks    | Memory per Unit | Parallel Units (8 cores) | Peak Memory |
|--------------------------|-----------|-----------------|--------------------------|-------------|
| 1 region (32×32 chunks)  | 512×512   | ~300 MB         | 7 units                  | ~2.5 GB     |
| 2×2 regions              | 1024×1024 | ~1.2 GB         | 7 units                  | ~9 GB       |
| 4×4 regions              | 2048×2048 | ~4.8 GB         | 7 units                  | ~35 GB      |

Recommendation: 1 Region Per Unit

1 region per unit is optimal because:

  1. Lowest memory footprint - Only ~300 MB per unit
  2. Natural alignment - Regions are the atomic save unit in Minecraft (.mca files)
  3. Maximum parallelism - More units = better CPU utilization
  4. Simple boundary logic - No partial region handling

Memory calculation for 7 parallel units (8-core CPU, using 7):

  • Per-unit WorldToModify: ~300 MB
  • Per-unit flood fill cache: ~50 MB
  • Per-unit OSM elements: ~20 MB
  • Peak memory: ~370 MB × 7 = ~2.6 GB

Plus global shared data:

  • Elevation data: ~50-100 MB
  • Building footprints: ~10-50 MB
  • Highway connectivity: ~20-50 MB

Total peak: ~3 GB (vs ~30 GB for 100 regions currently!)

Why Not Smaller Than 1 Region?

  • Regions are the minimum save unit for Minecraft
  • Going smaller would require buffering partial regions
  • No memory benefit (still need full region in memory to save)

Implementation Phases

Phase 1: Refactor Global Data Preparation

Goal: Extract global computations that must run before parallel processing

Changes:

  1. Move elevation fetching to a separate global phase
  2. Move building footprint collection to global phase
  3. Move highway connectivity map building to global phase
  4. Create shared data structures with Arc<T>

Files affected:

  • data_processing.rs - restructure generate_world_with_options()
  • ground.rs - no changes, already returns Ground
  • floodfill_cache.rs - add method to collect building footprints globally
  • element_processing/highways.rs - extract connectivity map building
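
One way to package the shared read-only outputs of this phase, sketched with a hypothetical GlobalContext type (the field types refer to the structures discussed earlier in this document):

use std::sync::Arc;

// Hypothetical bundle of the global, read-only data handed to every unit
struct GlobalContext {
    ground: Arc<Ground>,
    building_footprints: Arc<BuildingFootprintBitmap>,
    highway_connectivity: Arc<HighwayConnectivityMap>,
}

impl GlobalContext {
    // Cloning only bumps the Arc reference counts, so handing a copy
    // to each parallel unit is cheap
    fn for_unit(&self) -> GlobalContext {
        GlobalContext {
            ground: Arc::clone(&self.ground),
            building_footprints: Arc::clone(&self.building_footprints),
            highway_connectivity: Arc::clone(&self.highway_connectivity),
        }
    }
}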

Phase 2: Implement Processing Unit Grid

Goal: Add logic to divide the world into processing units

Changes:

  1. Create processing_unit.rs module
  2. Implement grid computation
  3. Implement element-to-unit distribution
  4. Add unit-level bounding box clipping

New files:

  • src/processing_unit.rs

Phase 3: Parallelize Unit Processing

Goal: Process units in parallel using rayon

Changes:

  1. Create per-unit WorldEditor instances
  2. Implement unit processing function
  3. Add parallel execution with CPU cap
  4. Implement region saving after unit completion

Files affected:

  • data_processing.rs - main parallel loop
  • world_editor/mod.rs - support per-unit saving

Phase 4: Handle Boundary Cases

Goal: Ensure seamless results across unit boundaries

Changes:

  1. Verify deterministic RNG produces identical results
  2. Implement special handling for large water bodies
  3. Add boundary verification tests
  4. Optimize overlap buffer size

Files affected:

  • element_processing/water_areas.rs - global water handling
  • clipping.rs - potential optimizations

Phase 5: Optimize Memory Management

Goal: Fine-tune memory usage and parallelism

Changes:

  1. Implement memory pressure monitoring
  2. Add dynamic unit size adjustment
  3. Optimize flood fill cache memory
  4. Profile and optimize hot paths

Testing Strategy

Unit Tests

  1. Deterministic RNG Test

    • Process same building in two units
    • Verify identical colors/styles
  2. Elevation Consistency Test

    • Check ground level at unit boundaries
    • Verify no height discontinuities
  3. Clipping Accuracy Test

    • Verify elements clipped correctly at unit boundaries
    • Check polygon integrity after clipping
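
A sketch of test 1, asserting that the same seed always yields the same sequence (the production test would call element_rng and compare the resulting block choices):

#[cfg(test)]
mod determinism_tests {
    use rand::{Rng, SeedableRng};
    use rand_chacha::ChaCha8Rng;

    // Same element id -> same random sequence, regardless of which unit runs it
    #[test]
    fn same_element_id_gives_same_sequence() {
        let mut unit_a = ChaCha8Rng::seed_from_u64(42);
        let mut unit_b = ChaCha8Rng::seed_from_u64(42);
        for _ in 0..16 {
            assert_eq!(unit_a.gen::<u32>(), unit_b.gen::<u32>());
        }
    }
}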

Integration Tests

  1. Small Area Test

    • Process 2×2 region area
    • Verify world loads correctly in Minecraft
  2. Boundary Building Test

    • Create world with buildings at unit boundaries
    • Verify buildings are complete and correctly colored
  3. Large Water Body Test

    • Process area with lake spanning multiple units
    • Verify water body is continuous

Performance Tests

  1. Memory Usage Test

    • Monitor peak memory during processing
    • Compare with current architecture
  2. CPU Utilization Test

    • Verify parallel units use expected cores
    • Measure speedup vs sequential processing

Configuration Options

Proposed CLI Arguments

/// Number of CPU cores to use for parallel processing
/// (0 = auto-detect: available cores minus one; set to 1 to disable parallel processing)
#[arg(long, default_value_t = 0)]
pub parallel_cores: usize,

/// Buffer size for boundary overlap in blocks (default: 64)
/// Larger values ensure buildings at boundaries are complete but increase processing time
#[arg(long, default_value_t = 64)]
pub boundary_buffer: i32,
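
A sketch of how the parallel_cores value could be resolved at startup (the helper name is hypothetical):

// 0 = auto-detect: use all available cores minus one, but never less than one
fn effective_cores(parallel_cores: usize) -> usize {
    if parallel_cores > 0 {
        return parallel_cores;
    }
    std::thread::available_parallelism()
        .map(|n| n.get().saturating_sub(1).max(1))
        .unwrap_or(1)
}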

Risk Assessment

High Risk

| Risk                                          | Mitigation                                      |
|-----------------------------------------------|-------------------------------------------------|
| Elevation discontinuities at boundaries       | Use global elevation data (already planned)     |
| Race conditions in file writing               | Each unit writes different regions (no overlap) |
| Trees spawning inside buildings at boundaries | Use global building footprints bitmap           |

Medium Risk

| Risk                                        | Mitigation                                     |
|---------------------------------------------|------------------------------------------------|
| Overpass API rate limiting                  | Download once globally, distribute elements    |
| Complex relations broken at boundaries      | Distribute full relation to all touching units |
| Highway connectivity missing at boundaries  | Build connectivity map globally                |

Low Risk

| Risk                                  | Mitigation                                                   |
|---------------------------------------|--------------------------------------------------------------|
| Different random values at boundaries | Deterministic RNG already implemented                        |
| Performance regression                | Benchmark before/after, make parallel optional               |
| Water bodies split incorrectly        | Water already supports clipping via clip_water_ring_to_bbox  |

Questions to Resolve

  1. Should we support Bedrock format with this change?

    • Bedrock writes to a single .mcworld file (LevelDB database)
    • May need different handling (write to temp, merge at end)
    • Could be deferred to a follow-up implementation
  2. What buffer size for boundary overlap?

    • Current thinking: 64-128 blocks should be sufficient
    • Most buildings are smaller than this
    • Larger buffers = more duplicate processing
  3. Should flood fills be computed globally or per-unit?

    • Per-unit is simpler and avoids coordination
    • Some redundant computation at boundaries (acceptable)
    • Recommendation: Start per-unit
  4. How to report progress across parallel units?

    • Current progress is linear (element by element)
    • With parallel, need aggregated progress reporting
    • Option: Track completed regions, report as percentage
  5. Should we limit parallelism based on available RAM?

    • Could detect system RAM and adjust parallel units
    • Or just document memory requirements per parallel unit
    • Recommendation: Start with CPU-1 cores, let users override
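
For question 4, one low-overhead option is a shared atomic counter of completed units; a minimal sketch (the reporting hook is hypothetical):

use std::sync::atomic::{AtomicUsize, Ordering};

static COMPLETED_UNITS: AtomicUsize = AtomicUsize::new(0);

// Called by each unit right after its region is flushed to disk
fn report_unit_done(total_units: usize) {
    let done = COMPLETED_UNITS.fetch_add(1, Ordering::Relaxed) + 1;
    let percent = done as f64 / total_units as f64 * 100.0;
    println!("Units completed: {done}/{total_units} ({percent:.1}%)");
}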

Summary

The proposed parallel region processing architecture will:

  1. Reduce memory usage by ~90% by processing 1 region at a time per unit (~300 MB vs ~30 GB for 100 regions)
  2. Utilize multiple CPU cores through rayon-based parallel processing (CPU-1 cores)
  3. Maintain visual consistency using deterministic RNG and global shared data
  4. Be backward compatible with a --no-parallel flag for the current behavior

The main implementation work is:

  • Refactoring to extract global computations (elevation, building footprints, highway connectivity)
  • Adding element-to-unit distribution logic with proper clipping
  • Per-unit WorldEditor instances with immediate region saving

The design is simpler than originally thought because:

  • Water relations already support clipping (no special handling)
  • Deterministic RNG already exists (no changes needed)
  • Priority order is preserved naturally (just sort per-unit)

Estimated implementation effort: 3-4 weeks for a fully tested solution.