
Parallel Region Processing Architecture

Executive Summary

This document outlines a comprehensive plan to parallelize the Arnis world generation pipeline by splitting large user-selected areas into smaller processing units (1 Minecraft region = 512×512 blocks per unit). The goal is to:

  1. Reduce memory usage by ~90% by processing and flushing regions incrementally
  2. Utilize multiple CPU cores for parallel generation
  3. Maintain visual consistency across region boundaries (colors, elevation, etc.)

Current Architecture Analysis

Processing Pipeline Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           CURRENT PROCESSING FLOW                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  [1/7] Fetch Data (retrieve_data.rs)                                        │
│     └── Downloads OSM data for entire bbox from Overpass API                 │
│     └── Single HTTP request with full bounding box                          │
│                                                                              │
│  [2/7] Parse Data (osm_parser.rs)                                           │
│     └── Transforms lat/lon → Minecraft X/Z coordinates                      │
│     └── Clips ways/relations to bounding box (clipping.rs)                  │
│     └── Sorts elements by priority                                          │
│                                                                              │
│  [3/7] Fetch Elevation (ground.rs / elevation_data.rs)                      │
│     └── Downloads Terrarium tiles for entire bbox                           │
│     └── Builds height grid matching world dimensions                        │
│                                                                              │
│  [4/7] Process Data (data_processing.rs)                                    │
│     └── Pre-computes flood fills in parallel (floodfill_cache.rs)          │
│     └── Builds highway connectivity map                                     │
│     └── Collects building footprints                                        │
│                                                                              │
│  [5/7] Process Terrain + Elements (data_processing.rs)                      │
│     └── Iterates ALL elements sequentially                                  │
│     └── Calls element_processing/* for each element type                    │
│     └── Places blocks via WorldEditor                                       │
│                                                                              │
│  [6/7] Generate Ground (data_processing.rs)                                 │
│     └── Iterates ALL blocks in bbox                                         │
│     └── Sets grass, dirt, stone, bedrock layers                            │
│                                                                              │
│  [7/7] Save World (world_editor/mod.rs → java.rs)                          │
│     └── Iterates ALL regions in memory                                      │
│     └── Writes .mca files in parallel                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Key Data Structures and Memory Usage

WorldToModify (world_editor/common.rs)

pub struct WorldToModify {
    pub regions: FnvHashMap<(i32, i32), RegionToModify>,  // Key: (region_x, region_z)
}

pub struct RegionToModify {
    pub chunks: FnvHashMap<(i32, i32), ChunkToModify>,    // 32×32 chunks per region
}

pub struct ChunkToModify {
    pub sections: FnvHashMap<i8, SectionToModify>,       // 24 sections per chunk (-4 to 19)
}

pub struct SectionToModify {
    pub blocks: [Block; 4096],                            // 16×16×16 = 4096 blocks
    pub properties: FnvHashMap<usize, Value>,            // Block properties (stairs, slabs, etc.)
}

Memory estimate per region:

  • Section: ~4KB (blocks) + ~variable (properties)
  • Chunk: ~24 sections × 4KB = ~96KB minimum, typically ~200-500KB with properties
  • Region: ~1024 chunks × 300KB = ~300MB per region
  • For a 10×10 region area: ~30GB of memory required!

Why Elements Are "Scattered"

The current design processes elements in OSM priority order (entrance → building → highway → waterway → water → barrier → other), NOT by spatial location. This means:

  1. A building in region (0,0) might be followed by a highway in region (5,5)
  2. Each set_block() call potentially accesses different regions
  3. ALL regions must remain in memory until the end because any element might touch any region
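
To make the scattering concrete, here is a minimal sketch (using a hypothetical helper, not the actual WorldEditor API) of how a world coordinate resolves into the nested region/chunk/section maps; the exact key layout and block-index order in the codebase may differ:

// Hypothetical helper illustrating region/chunk/section addressing;
// the real WorldEditor may use different key conventions.
fn locate(x: i32, y: i32, z: i32) -> ((i32, i32), (i32, i32), i8, usize) {
    let region = (x >> 9, z >> 9);               // 512 blocks per region
    let chunk = ((x >> 4) & 31, (z >> 4) & 31);  // chunk position within the region (0..32)
    let section = (y >> 4) as i8;                // 16-block-tall vertical section
    // Index into SectionToModify.blocks ([Block; 4096]); assumes Y-Z-X ordering
    let block_index = ((y & 15) as usize) * 256 + ((z & 15) as usize) * 16 + ((x & 15) as usize);
    (region, chunk, section, block_index)
}

Two consecutive elements in priority order can map to regions hundreds of blocks apart, so every touched region stays resident until the final save.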

Proposed Architecture: Region-Based Parallel Processing

Core Concept

Split the user-selected area into processing units of 1 Minecraft region each (512×512 blocks = 32×32 chunks). Process each unit independently in parallel, then flush to disk immediately.

┌─────────────────────────────────────────────────────────────────────────────┐
│                      PROPOSED PARALLEL PROCESSING FLOW                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  GLOBAL PHASE (Run Once for Entire Area)                                    │
│  ═══════════════════════════════════════                                    │
│                                                                              │
│  [1] Fetch Elevation Data for ENTIRE bbox                                   │
│      └── Must be consistent across all units                                │
│      └── Store as shared read-only Arc<Ground>                              │
│                                                                              │
│  [2] Compute Processing Unit Grid                                           │
│      └── Divide bbox into N×N region units                                  │
│      └── Create sub-bboxes with small overlap for boundary elements         │
│                                                                              │
│  PARALLEL PHASE (Per Processing Unit)                                       │
│  ═════════════════════════════════════                                      │
│                                                                              │
│  For each processing unit (in parallel, using N-1 CPU cores):               │
│                                                                              │
│    [3] Fetch OSM Data for Unit's Sub-BBox                                   │
│        └── Separate Overpass API query per unit                             │
│        └── Include small buffer zone for boundary elements                  │
│                                                                              │
│    [4] Parse & Clip Elements to Unit Bounds                                 │
│        └── Same as current, but for smaller area                            │
│                                                                              │
│    [5] Pre-compute Flood Fills                                              │
│        └── Only for elements in this unit                                   │
│                                                                              │
│    [6] Process Elements                                                     │
│        └── Generate buildings, roads, etc.                                  │
│        └── Use deterministic RNG keyed by element ID                        │
│                                                                              │
│    [7] Generate Ground Layer                                                │
│        └── Only for this unit's blocks                                      │
│                                                                              │
│    [8] Save Regions to Disk                                                 │
│        └── Write .mca files immediately                                     │
│        └── FREE MEMORY for this unit                                        │
│                                                                              │
│  FINALIZATION PHASE                                                         │
│  ══════════════════                                                         │
│                                                                              │
│  [9] Wait for all units to complete                                         │
│  [10] Generate map preview (optional)                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Critical Considerations

1. Deterministic Randomness ALREADY IMPLEMENTED

The codebase already has deterministic_rng.rs which provides:

// Creates RNG seeded by element ID - same element always produces same random values
pub fn element_rng(element_id: u64) -> ChaCha8Rng {
    ChaCha8Rng::seed_from_u64(element_id)
}

// For coordinate-based randomness
pub fn coord_rng(x: i32, z: i32, element_id: u64) -> ChaCha8Rng

Impact on buildings crossing boundaries:

  • Building colors are chosen using element_rng(element.id) in buildings.rs
  • Even if a building is processed in two different units, SAME element ID → SAME color
  • The existing implementation already supports this use case!
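
A minimal sketch of how this plays out, assuming a hypothetical wall-color palette (the real selection logic lives in buildings.rs): two units that both see the same building ID pick the same color.

use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha8Rng;

// Hypothetical palette; buildings.rs has its own block choices.
const WALL_COLORS: [&str; 4] = ["white_terracotta", "light_gray_concrete", "bricks", "sandstone"];

fn pick_wall_color(element_id: u64) -> &'static str {
    let mut rng = ChaCha8Rng::seed_from_u64(element_id);  // same seeding as element_rng()
    WALL_COLORS[rng.gen_range(0..WALL_COLORS.len())]
}

fn main() {
    // Unit A and Unit B both process building 123456 and must agree on its color
    assert_eq!(pick_wall_color(123_456), pick_wall_color(123_456));
}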

Files using deterministic RNG:

  • element_processing/buildings.rs - wall colors, window styles, accent blocks
  • element_processing/natural.rs - grass/flower distribution
  • element_processing/tree.rs - tree variations

2. Elevation Data Consistency ⚠️ REQUIRES CHANGES

Current behavior:

  • Elevation is fetched once in ground.rs via Ground::new_enabled()
  • Height grid dimensions match the world's XZ dimensions
  • Lookup uses relative coordinates: ground.level(XZPoint::new(x - min_x, z - min_z))

Problem:

  • If each unit downloads its own elevation tiles, slight differences in tile boundaries or interpolation could cause height discontinuities at unit boundaries

Solution:

  1. Download elevation ONCE for the entire area before parallel processing starts
  2. Pass Arc<Ground> (read-only) to all processing units
  3. The Ground::level() function already uses world-relative coordinates, so no changes needed

// Proposed: Global elevation fetch before parallel processing
let global_ground = Arc::new(Ground::new_enabled(&args.bbox, args.scale, args.ground_level));

// Each processing unit receives a clone of the Arc
for unit in processing_units {
    let ground_ref = Arc::clone(&global_ground);
    // spawn task with ground_ref
}

3. Element Clipping ⚠️ REQUIRES NEW LOGIC

Current clipping (clipping.rs):

  • Uses Sutherland-Hodgman algorithm to clip polygons to user's bbox
  • Works on the OUTER boundary of the entire selected area

New requirement:

  • Need to clip elements to each processing unit's internal boundary
  • But with OVERLAP to handle elements that straddle unit boundaries

Proposed approach:

┌─────────────────────────────────────────────────────────────────┐
│                    UNIT BOUNDARY HANDLING                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Example: 4 processing units arranged in a 2×2 grid              │
│                                                                  │
│      Unit A          │       Unit B                              │
│   (regions 0,0-1,1)  │   (regions 2,0-3,1)                      │
│                      │                                           │
│   ┌──────────────────┼──────────────────┐                       │
│   │                  │                  │                       │
│   │     ████████     │                  │  ← Building straddles │
│   │     █ BLD  █─────┼──────────────────│    Unit A and B       │
│   │     ████████     │                  │                       │
│   │                  │                  │                       │
│   ├──────────────────┼──────────────────┤                       │
│   │                  │                  │                       │
│   │                  │                  │                       │
│   │                  │                  │                       │
│   │                  │                  │                       │
│   └──────────────────┴──────────────────┘                       │
│      Unit C          │       Unit D                              │
│   (regions 0,2-1,3)  │   (regions 2,2-3,3)                      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Strategy for boundary elements:

  1. Expanded Fetch BBox: Each unit fetches OSM data with a buffer zone (e.g., +256 blocks)
  2. Clip to Processing BBox: Clip elements to the unit's actual processing bounds
  3. Process Normally: Elements partially in the unit are still processed, just clipped
  4. Deterministic Results: Same element in adjacent units produces identical blocks due to RNG seeding

Example: Building straddling Unit A and B

| Step  | Unit A                             | Unit B                              |
|-------|------------------------------------|-------------------------------------|
| Fetch | Gets building (with buffer)        | Gets building (with buffer)         |
| Clip  | Clips to Unit A bounds → left half | Clips to Unit B bounds → right half |
| Color | element_rng(building_id) → BLUE    | element_rng(building_id) → BLUE     |
| Place | Places left half in blue           | Places right half in blue           |

Result: a seamless blue building across the boundary.

4. OSM Data Downloading Strategy ⚠️ REQUIRES CAREFUL DESIGN

Options:

Option A: Download Once Globally, Distribute to Units

┌─────────────────────────────────────────────────────────────────┐
│                                                                  │
│  [1] Download ALL OSM data for entire bbox (single API call)    │
│  [2] Parse into ProcessedElements                                │
│  [3] For each processing unit:                                   │
│      └── Filter elements that intersect unit's bbox             │
│      └── Clip filtered elements to unit bounds                  │
│      └── Send to parallel processor                             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Pros:

  • Single Overpass API call (respects rate limits)
  • No duplicate data transfer
  • Elements are already parsed, just need filtering

Cons:

  • Must keep all elements in memory during distribution phase
  • For very large areas, this might still be memory-intensive

Option B: Download Per Unit (Simpler, Higher Bandwidth)

┌─────────────────────────────────────────────────────────────────┐
│                                                                  │
│  For each processing unit (sequentially or with rate limiting): │
│      [1] Download OSM data for unit's expanded bbox             │
│      [2] Parse into ProcessedElements                            │
│      [3] Send to parallel processor                             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Pros:

  • Lower peak memory usage
  • Simpler code structure

Cons:

  • Multiple API calls (may hit rate limits)
  • Duplicate data transfer for overlapping areas
  • Slower due to network latency

Recommendation: Option A with Streaming

Download once, but use a streaming approach to distribute elements to units:

// Pseudo-code for element distribution
fn distribute_elements_to_units(
    elements: Vec<ProcessedElement>,
    units: &[ProcessingUnit],
) -> Vec<Vec<ProcessedElement>> {
    let mut unit_elements = vec![Vec::new(); units.len()];
    
    for element in elements {
        let element_bbox = compute_element_bbox(&element);
        for (i, unit) in units.iter().enumerate() {
            if unit.expanded_bbox.intersects(&element_bbox) {
                // Clone element for each unit that needs it
                // (or use Arc for large elements)
                unit_elements[i].push(element.clone());
            }
        }
    }
    
    unit_elements
}
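
The pseudo-code above assumes an axis-aligned intersection test on block-coordinate bounding boxes. A minimal sketch of such a predicate (the ElementBBox type here is hypothetical; the codebase's XZBBox may already provide something equivalent):

#[derive(Clone, Copy)]
struct ElementBBox {
    min_x: i32,
    max_x: i32,
    min_z: i32,
    max_z: i32,
}

impl ElementBBox {
    // Two axis-aligned boxes overlap iff their ranges overlap on both axes
    fn intersects(&self, other: &ElementBBox) -> bool {
        self.min_x <= other.max_x
            && self.max_x >= other.min_x
            && self.min_z <= other.max_z
            && self.max_z >= other.min_z
    }
}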

5. Flood Fill Cache ⚠️ REQUIRES CHANGES

Current behavior:

  • FloodFillCache::precompute() runs in parallel for ALL elements
  • Results are stored in a FnvHashMap<u64, Vec<(i32, i32)>>
  • Cache is consumed during sequential element processing

Problem:

  • If we process units in parallel, each unit needs its own flood fill cache
  • But we don't want to re-compute the same flood fills multiple times

Solution A: Per-Unit Flood Fill (Simpler)

// Each unit computes flood fills only for its elements
fn process_unit(unit_elements: Vec<ProcessedElement>) {
    let flood_fill_cache = FloodFillCache::precompute(&unit_elements, timeout);
    // Process elements using this cache
}

Pros: Simple, no coordination needed

Cons: Elements at boundaries may be flood-filled twice

Solution B: Global Flood Fill + Distribution (More Complex)

// Compute flood fills globally, then distribute to units
let global_cache = FloodFillCache::precompute(&all_elements, timeout);

// For each unit, create a view into the global cache
let unit_caches: Vec<_> = units.iter()
    .map(|unit| global_cache.filter_for_bbox(&unit.bbox))
    .collect();

Recommendation: Start with Solution A. The overhead of re-computing some flood fills at boundaries is acceptable given the simplicity.

6. Building Footprints Bitmap ⚠️ REQUIRES CHANGES

Current behavior:

  • BuildingFootprintBitmap is a memory-efficient bitmap covering the entire world
  • Used to prevent trees from spawning inside buildings
  • Created AFTER flood fill precomputation

Problem:

  • With parallel processing, each unit only knows about buildings in its own area
  • A tree in Unit B might spawn inside a building that exists in Unit A (near boundary)

Solution:

  • Compute building footprints GLOBALLY before parallel processing
  • Use Arc<BuildingFootprintBitmap> shared across all units (read-only)

// Global building footprint computation
let all_building_coords = compute_all_building_footprints(&all_elements, &global_xzbbox);
let global_footprints = Arc::new(BuildingFootprintBitmap::from(all_building_coords));

// Each unit receives Arc clone
for unit in units {
    let footprints = Arc::clone(&global_footprints);
    // spawn task
}

7. Highway Connectivity ⚠️ REQUIRES CHANGES

Current behavior:

  • highways::build_highway_connectivity_map() creates a map of connected highway segments
  • Used for intersection detection and road marking placement

Problem:

  • Highway segments crossing unit boundaries won't see their full connectivity

Solution:

  • Build highway connectivity map GLOBALLY before parallel processing
  • Pass as Arc<HighwayConnectivityMap> to all units
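
A minimal sketch of what building the map globally could look like, assuming a hypothetical node-to-ways representation (the actual structure in element_processing/highways.rs may differ):

use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical shape: for every highway node id, the ids of the ways that use it
type HighwayConnectivityMap = HashMap<u64, Vec<u64>>;

fn build_highway_connectivity(highway_ways: &[(u64, Vec<u64>)]) -> Arc<HighwayConnectivityMap> {
    let mut map: HighwayConnectivityMap = HashMap::new();
    for (way_id, node_ids) in highway_ways {
        for node_id in node_ids {
            map.entry(*node_id).or_default().push(*way_id);
        }
    }
    // Wrapped in Arc so every processing unit can share it read-only
    Arc::new(map)
}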

8. Water Areas and Ring Merging ALREADY SUPPORTED

Current behavior:

  • Water relations contain multiple ways that must be merged into closed rings
  • merge_way_segments() in water_areas.rs handles this
  • Clipping happens AFTER ring merging via clip_water_ring_to_bbox()
  • Water uses inverse_floodfill() which iterates over bounding box (not flood fill)

Why water CAN be clipped per-unit:

  1. Ring merging happens on the UNCLIPPED ways (preserved in osm_parser.rs)
  2. After merging, clip_water_ring_to_bbox() clips the assembled polygon
  3. The inverse_floodfill algorithm iterates block-by-block within bounds
  4. Each unit can independently clip and fill its portion of a water body

No special handling needed - water relations work the same as other elements:

  • Distribute relation to units that intersect its bbox
  • Each unit clips to its own bounds
  • Each unit fills its portion independently

9. Element Priority Order ⚠️ MUST BE PRESERVED

Current behavior:

  • Elements are sorted by priority before processing (osm_parser.rs):
    const PRIORITY_ORDER: [&str; 6] = [
        "entrance", "building", "highway", "waterway", "water", "barrier",
    ];
    
  • This ensures entrances are placed before buildings (so doors work)
  • Buildings before highways (so sidewalks don't overwrite buildings)

Requirement:

  • Each unit must process its elements in the SAME priority order
  • This is natural: just sort the unit's elements the same way
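
Sorting per unit can reuse the same key function; a minimal sketch (the element kind accessor and the (kind, id) pair stand in for ProcessedElement):

const PRIORITY_ORDER: [&str; 6] = [
    "entrance", "building", "highway", "waterway", "water", "barrier",
];

// Elements whose kind appears earlier in PRIORITY_ORDER come first;
// anything not listed ("other") sorts last
fn priority_key(kind: &str) -> usize {
    PRIORITY_ORDER
        .iter()
        .position(|p| *p == kind)
        .unwrap_or(PRIORITY_ORDER.len())
}

fn sort_unit_elements(unit_elements: &mut Vec<(String, u64)>) {
    unit_elements.sort_by_key(|(kind, _)| priority_key(kind));
}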

10. SPONGE Block as Placeholder ⚠️ MINOR CONSIDERATION

Current behavior:

  • SPONGE block is used as a blacklist marker in some places
  • Example: editor.set_block(actual_block, x, 0, z, None, Some(&[SPONGE]));
  • Prevents certain blocks from overwriting sponge blocks

Impact on parallel processing:

  • None - this is a per-block check, not cross-region coordination
  • Each unit handles its own sponge blocks independently

11. Tree Placement and Building Footprints ⚠️ REQUIRES GLOBAL FOOTPRINTS

Current behavior:

  • Trees check building_footprints.contains(x, z) before spawning
  • Prevents trees from appearing inside buildings
  • Uses coord_rng(x, z, element_id) for deterministic placement

Problem with per-unit footprints:

  • A tree near a unit boundary might not see a building from the adjacent unit
  • Could spawn a tree inside a building that exists in neighbor unit

Solution (already planned):

  • Compute building footprints GLOBALLY before parallel processing
  • Pass as Arc<BuildingFootprintBitmap> to all units
  • Tree placement will correctly avoid all buildings

12. Relations with Multiple Members Across Units ⚠️ REQUIRES CAREFUL HANDLING

Current behavior:

  • Relations (buildings, landuse, leisure, natural) process each member way
  • Member ways can be scattered across the entire bbox

Example: Building relation with courtyard

Building Relation:
  - Outer way 1 (in Unit A)
  - Outer way 2 (in Unit A and B)  ← straddles boundary
  - Inner way (courtyard, in Unit A)

Strategy:

  1. Distribute entire relation to all units that any member touches
  2. Each unit clips all members to its bounds
  3. Each unit processes the clipped relation independently
  4. Deterministic RNG ensures consistent colors/styles

Important: The relation-level tags (e.g., building:levels) must be preserved for all units processing that relation.


Proposed Processing Unit Structure

ProcessingUnit Definition

struct ProcessingUnit {
    /// Which region this unit covers (1 region per unit)
    region_x: i32,
    region_z: i32,
    
    /// Minecraft coordinate bounds for this unit (512×512 blocks)
    min_x: i32,  // region_x * 512
    max_x: i32,  // region_x * 512 + 511
    min_z: i32,  // region_z * 512
    max_z: i32,  // region_z * 512 + 511
    
    /// Expanded bounds for element fetching (includes buffer for boundary elements)
    fetch_min_x: i32,
    fetch_max_x: i32,
    fetch_min_z: i32,
    fetch_max_z: i32,
}

Unit Grid Calculation

fn compute_processing_units(
    global_xzbbox: &XZBBox,
    buffer_blocks: i32,     // e.g., 64-128 blocks overlap
) -> Vec<ProcessingUnit> {
    let blocks_per_region = 512;  // 32 chunks × 16 blocks
    
    // Calculate which regions are covered by the bbox
    let min_region_x = global_xzbbox.min_x() >> 9;  // divide by 512
    let max_region_x = global_xzbbox.max_x() >> 9;
    let min_region_z = global_xzbbox.min_z() >> 9;
    let max_region_z = global_xzbbox.max_z() >> 9;
    
    let mut units = Vec::new();
    
    // Create one unit per region
    for rx in min_region_x..=max_region_x {
        for rz in min_region_z..=max_region_z {
            // Compute Minecraft coordinate bounds for this region
            let min_x = rx * blocks_per_region;
            let max_x = min_x + blocks_per_region - 1;
            let min_z = rz * blocks_per_region;
            let max_z = min_z + blocks_per_region - 1;
            
            // Add buffer for fetch bounds (clamped to global bbox)
            let fetch_min_x = (min_x - buffer_blocks).max(global_xzbbox.min_x());
            let fetch_max_x = (max_x + buffer_blocks).min(global_xzbbox.max_x());
            let fetch_min_z = (min_z - buffer_blocks).max(global_xzbbox.min_z());
            let fetch_max_z = (max_z + buffer_blocks).min(global_xzbbox.max_z());
            
            units.push(ProcessingUnit {
                region_x: rx,
                region_z: rz,
                min_x, max_x, min_z, max_z,
                fetch_min_x, fetch_max_x, fetch_min_z, fetch_max_z,
            });
        }
    }
    
    units
}

Parallel Execution Strategy

fn process_units_parallel(
    units: Vec<ProcessingUnit>,
    elements: &[ProcessedElement],
    global_ground: Arc<Ground>,
    global_building_footprints: Arc<BuildingFootprintBitmap>,
    global_highway_connectivity: Arc<HighwayConnectivityMap>,
    args: &Args,
) {
    // Use CPU-1 cores: build a dedicated rayon thread pool so the cap
    // actually applies (the default global pool uses all available cores)
    let num_threads = std::thread::available_parallelism()
        .map(|n| n.get().saturating_sub(1).max(1))
        .unwrap_or(1);
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .build()
        .expect("failed to build rayon thread pool");
    
    pool.install(|| units.into_par_iter()
        .with_min_len(1)  // Process 1 unit per task
        .for_each(|unit| {
            // 1. Filter elements that intersect this unit's fetch bounds
            let unit_elements = filter_elements_for_unit(elements, &unit);
            
            // 2. Clip elements to unit's actual bounds
            let clipped_elements = clip_elements_to_unit(unit_elements, &unit);
            
            // 3. Create per-unit structures
            let unit_xzbbox = XZBBox::new(unit.min_x, unit.max_x, unit.min_z, unit.max_z);
            let mut editor = WorldEditor::new(args.path.clone(), &unit_xzbbox, ...);
            editor.set_ground(Arc::clone(&global_ground));
            
            // 4. Pre-compute flood fills for this unit's elements
            let flood_fill_cache = FloodFillCache::precompute(&clipped_elements, args.timeout.as_ref());
            
            // 5. Process elements (same as current, just for this unit)
            for element in clipped_elements {
                process_element(&mut editor, &element, ...);
            }
            
            // 6. Generate ground layer for this unit
            generate_ground_for_unit(&mut editor, &unit, &global_ground);
            
            // 7. Save region immediately and FREE MEMORY
            editor.save_single_region(unit.region_x, unit.region_z);
            drop(editor);  // Release memory
        }));
}

Memory Usage Comparison

Understanding Minecraft Region Sizes

1 Region = 32×32 chunks = 512×512 blocks (horizontally)
1 Chunk  = 16×16×384 blocks (with sections from Y=-64 to Y=319)

Current Architecture (All Regions in Memory)

| Stage                       | Memory Usage        |
|-----------------------------|---------------------|
| OSM Data (parsed)           | ~50-200 MB          |
| Flood Fill Cache            | ~100-500 MB         |
| Building Footprints         | ~10-50 MB           |
| WorldToModify (all regions) | ~300 MB × N regions |
| Total for 100 regions       | ~30+ GB             |

Unit Size Analysis

The optimal unit size depends on balancing:

  1. Memory per unit - Larger units = more memory
  2. Parallelism overhead - Smaller units = more coordination
  3. Boundary overhead - More units = more elements processed multiple times

| Unit Size                | Blocks    | Memory per Unit | Parallel Units (8 cores) | Peak Memory |
|--------------------------|-----------|-----------------|--------------------------|-------------|
| 1 region (32×32 chunks)  | 512×512   | ~300 MB         | 7 units                  | ~2.5 GB     |
| 2×2 regions              | 1024×1024 | ~1.2 GB         | 7 units                  | ~9 GB       |
| 4×4 regions              | 2048×2048 | ~4.8 GB         | 7 units                  | ~35 GB      |

Recommendation: 1 Region Per Unit

1 region per unit is optimal because:

  1. Lowest memory footprint - Only ~300 MB per unit
  2. Natural alignment - Regions are the atomic save unit in Minecraft (.mca files)
  3. Maximum parallelism - More units = better CPU utilization
  4. Simple boundary logic - No partial region handling

Memory calculation for 7 parallel units (8-core CPU, using 7):

  • Per-unit WorldToModify: ~300 MB
  • Per-unit flood fill cache: ~50 MB
  • Per-unit OSM elements: ~20 MB
  • Peak memory: ~370 MB × 7 = ~2.6 GB

Plus global shared data:

  • Elevation data: ~50-100 MB
  • Building footprints: ~10-50 MB
  • Highway connectivity: ~20-50 MB

Total peak: ~3 GB (vs ~30 GB for 100 regions currently!)

Why Not Smaller Than 1 Region?

  • Regions are the minimum save unit for Minecraft
  • Going smaller would require buffering partial regions
  • No memory benefit (still need full region in memory to save)

Implementation Phases

Phase 1: Refactor Global Data Preparation

Goal: Extract global computations that must run before parallel processing

Changes:

  1. Move elevation fetching to a separate global phase
  2. Move building footprint collection to global phase
  3. Move highway connectivity map building to global phase
  4. Create shared data structures with Arc<T>

Files affected:

  • data_processing.rs - restructure generate_world_with_options()
  • ground.rs - no changes, already returns Ground
  • floodfill_cache.rs - add method to collect building footprints globally
  • element_processing/highways.rs - extract connectivity map building
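
One way to package the shared read-only outputs of this phase, sketched with a hypothetical GlobalContext type (the field types refer to the structures discussed earlier in this document):

use std::sync::Arc;

// Hypothetical bundle of the global, read-only data handed to every unit
struct GlobalContext {
    ground: Arc<Ground>,
    building_footprints: Arc<BuildingFootprintBitmap>,
    highway_connectivity: Arc<HighwayConnectivityMap>,
}

impl GlobalContext {
    // Cloning only bumps the Arc reference counts, so handing a copy
    // to each parallel unit is cheap
    fn for_unit(&self) -> GlobalContext {
        GlobalContext {
            ground: Arc::clone(&self.ground),
            building_footprints: Arc::clone(&self.building_footprints),
            highway_connectivity: Arc::clone(&self.highway_connectivity),
        }
    }
}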

Phase 2: Implement Processing Unit Grid

Goal: Add logic to divide the world into processing units

Changes:

  1. Create processing_unit.rs module
  2. Implement grid computation
  3. Implement element-to-unit distribution
  4. Add unit-level bounding box clipping

New files:

  • src/processing_unit.rs

Phase 3: Parallelize Unit Processing

Goal: Process units in parallel using rayon

Changes:

  1. Create per-unit WorldEditor instances
  2. Implement unit processing function
  3. Add parallel execution with CPU cap
  4. Implement region saving after unit completion

Files affected:

  • data_processing.rs - main parallel loop
  • world_editor/mod.rs - support per-unit saving

Phase 4: Handle Boundary Cases

Goal: Ensure seamless results across unit boundaries

Changes:

  1. Verify deterministic RNG produces identical results
  2. Implement special handling for large water bodies
  3. Add boundary verification tests
  4. Optimize overlap buffer size

Files affected:

  • element_processing/water_areas.rs - global water handling
  • clipping.rs - potential optimizations

Phase 5: Optimize Memory Management

Goal: Fine-tune memory usage and parallelism

Changes:

  1. Implement memory pressure monitoring
  2. Add dynamic unit size adjustment
  3. Optimize flood fill cache memory
  4. Profile and optimize hot paths

Testing Strategy

Unit Tests

  1. Deterministic RNG Test

    • Process same building in two units
    • Verify identical colors/styles
  2. Elevation Consistency Test

    • Check ground level at unit boundaries
    • Verify no height discontinuities
  3. Clipping Accuracy Test

    • Verify elements clipped correctly at unit boundaries
    • Check polygon integrity after clipping
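
A sketch of test 1, asserting that the same seed always yields the same sequence (the production test would call element_rng and compare the resulting block choices):

#[cfg(test)]
mod determinism_tests {
    use rand::{Rng, SeedableRng};
    use rand_chacha::ChaCha8Rng;

    // Same element id -> same random sequence, regardless of which unit runs it
    #[test]
    fn same_element_id_gives_same_sequence() {
        let mut unit_a = ChaCha8Rng::seed_from_u64(42);
        let mut unit_b = ChaCha8Rng::seed_from_u64(42);
        for _ in 0..16 {
            assert_eq!(unit_a.gen::<u32>(), unit_b.gen::<u32>());
        }
    }
}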

Integration Tests

  1. Small Area Test

    • Process 2×2 region area
    • Verify world loads correctly in Minecraft
  2. Boundary Building Test

    • Create world with buildings at unit boundaries
    • Verify buildings are complete and correctly colored
  3. Large Water Body Test

    • Process area with lake spanning multiple units
    • Verify water body is continuous

Performance Tests

  1. Memory Usage Test

    • Monitor peak memory during processing
    • Compare with current architecture
  2. CPU Utilization Test

    • Verify parallel units use expected cores
    • Measure speedup vs sequential processing

Configuration Options

Proposed CLI Arguments

/// Number of CPU cores to use for parallel processing
/// (0 = auto-detect: available cores minus one; set to 1 to disable parallel processing)
#[arg(long, default_value_t = 0)]
pub parallel_cores: usize,

/// Buffer size for boundary overlap in blocks (default: 64)
/// Larger values ensure buildings at boundaries are complete but increase processing time
#[arg(long, default_value_t = 64)]
pub boundary_buffer: i32,
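
A sketch of how the parallel_cores value could be resolved at startup (the helper name is hypothetical):

// 0 = auto-detect: use all available cores minus one, but never less than one
fn effective_cores(parallel_cores: usize) -> usize {
    if parallel_cores > 0 {
        return parallel_cores;
    }
    std::thread::available_parallelism()
        .map(|n| n.get().saturating_sub(1).max(1))
        .unwrap_or(1)
}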

Risk Assessment

High Risk

| Risk                                          | Mitigation                                      |
|-----------------------------------------------|-------------------------------------------------|
| Elevation discontinuities at boundaries       | Use global elevation data (already planned)     |
| Race conditions in file writing               | Each unit writes different regions (no overlap) |
| Trees spawning inside buildings at boundaries | Use global building footprints bitmap           |

Medium Risk

| Risk                                        | Mitigation                                     |
|---------------------------------------------|------------------------------------------------|
| Overpass API rate limiting                  | Download once globally, distribute elements    |
| Complex relations broken at boundaries      | Distribute full relation to all touching units |
| Highway connectivity missing at boundaries  | Build connectivity map globally                |

Low Risk

| Risk                                  | Mitigation                                                   |
|---------------------------------------|--------------------------------------------------------------|
| Different random values at boundaries | Deterministic RNG already implemented                        |
| Performance regression                | Benchmark before/after, make parallel optional               |
| Water bodies split incorrectly        | Water already supports clipping via clip_water_ring_to_bbox  |

Questions to Resolve

  1. Should we support Bedrock format with this change?

    • Bedrock writes to a single .mcworld file (LevelDB database)
    • May need different handling (write to temp, merge at end)
    • Could be deferred to a follow-up implementation
  2. What buffer size for boundary overlap?

    • Current thinking: 64-128 blocks should be sufficient
    • Most buildings are smaller than this
    • Larger buffers = more duplicate processing
  3. Should flood fills be computed globally or per-unit?

    • Per-unit is simpler and avoids coordination
    • Some redundant computation at boundaries (acceptable)
    • Recommendation: Start per-unit
  4. How to report progress across parallel units?

    • Current progress is linear (element by element)
    • With parallel, need aggregated progress reporting
    • Option: Track completed regions, report as percentage
  5. Should we limit parallelism based on available RAM?

    • Could detect system RAM and adjust parallel units
    • Or just document memory requirements per parallel unit
    • Recommendation: Start with CPU-1 cores, let users override
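
For question 4, one low-overhead option is a shared atomic counter of completed units; a minimal sketch (the reporting hook is hypothetical):

use std::sync::atomic::{AtomicUsize, Ordering};

static COMPLETED_UNITS: AtomicUsize = AtomicUsize::new(0);

// Called by each unit right after its region is flushed to disk
fn report_unit_done(total_units: usize) {
    let done = COMPLETED_UNITS.fetch_add(1, Ordering::Relaxed) + 1;
    let percent = done as f64 / total_units as f64 * 100.0;
    println!("Units completed: {done}/{total_units} ({percent:.1}%)");
}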

Summary

The proposed parallel region processing architecture will:

  1. Reduce memory usage by ~90% by processing 1 region at a time per unit (~300 MB vs ~30 GB for 100 regions)
  2. Utilize multiple CPU cores through rayon-based parallel processing (CPU-1 cores)
  3. Maintain visual consistency using deterministic RNG and global shared data
  4. Be backward compatible with a --no-parallel flag for the current behavior

The main implementation work is:

  • Refactoring to extract global computations (elevation, building footprints, highway connectivity)
  • Adding element-to-unit distribution logic with proper clipping
  • Per-unit WorldEditor instances with immediate region saving

The design is simpler than originally thought because:

  • Water relations already support clipping (no special handling)
  • Deterministic RNG already exists (no changes needed)
  • Priority order is preserved naturally (just sort per-unit)

Estimated implementation effort: 3-4 weeks for a fully tested solution.