Add complete spell visual pipeline resolving the DBC chain
(Spell → SpellVisual → SpellVisualKit → SpellVisualEffectName → M2)
with precast/cast/impact phases, bone-attached positioning, and
automatic dual-hand mirroring.
Ribbon rendering fixes:
- Parse visibility track as uint8 (was read as float, suppressing
all ribbon edges due to ~1.4e-45 failing the >0.5 check)
- Filter garbage emitters with bone=UINT_MAX unconditionally
- Guard against NaN spine positions from corrupt bone data
- Resolve ribbon textures via direct index, not textureLookup table
- Fall back to bone 0 when ribbon bone index is out of range
Particle rendering fixes:
- Reduce spell particle scale from 5x to 1.5x (was oversized)
- Exempt spell effect instances from position-based deduplication
Spell handler integration:
- Trigger precast visuals on SMSG_SPELL_START with server castTimeMs
- Trigger cast/impact visuals on SMSG_SPELL_GO
- Cancel precast visuals on cast interrupt/failure/movement
M2 classifier expansion:
- Add AmbientEmitterType enum for sound system integration
- Add 20+ foliage tokens, 4 spell effect tokens, isSmallFoliage flag
- Add markModelAsSpellEffect() to override disableAnimation
DBC layouts:
- Add SpellVisualID field to Spell.dbc for all expansion configs
Signed-off-by: Pavel Okhlopkov <pavel.okhlopkov@flant.com>
All M2 pipelines used VK_CULL_MODE_NONE, so back-facing polygons always
rendered. On NPCs whose torso meshes are single-layer geometry this
made the interior cavity visible through the back.
Create backface-culled pipeline variants (VK_CULL_MODE_BACK_BIT) and
select them at draw time unless the material has the TwoSided flag
(0x04). Foliage/ground-detail forceCutout batches and the shadow
pipeline keep VK_CULL_MODE_NONE since those cards are inherently
two-sided.
Implement GPU-driven Hierarchical-Z occlusion culling for M2 doodads
using a depth pyramid built from the previous frame's depth buffer.
The cull shader projects bounding spheres via prevViewProj (temporal
reprojection) and samples the HiZ pyramid to reject hidden objects
before the main render pass.
Key implementation details:
- Separate early compute submission (beginSingleTimeCommands + fence
wait) eliminates 2-frame visibility staleness
- Conservative safeguards prevent false culls: screen-edge guard,
full VP row-vector AABB projection (Cauchy-Schwarz), 50% sphere
inflation, depth bias, mip+1, min screen size threshold, camera
motion dampening (auto-disable on fast rotations), and per-instance
previouslyVisible flag tracking
- Graceful fallback to frustum-only culling if HiZ init fails
Fix dark WMO interiors by gating shadow map sampling on isInterior==0
in the WMO fragment shader. Interior groups (flag 0x2000) now rely
solely on pre-baked MOCV vertex-color lighting + MOHD ambient color.
Disable interiorDarken globally (was incorrectly darkening outdoor M2s
when camera was inside a WMO). Use isInsideInteriorWMO() instead of
isInsideWMO() for correct indoor detection.
New files:
- hiz_system.hpp/cpp: pyramid image management, compute pipeline,
descriptors, mip-chain build dispatch, resize handling
- hiz_build.comp.glsl: MAX-depth 2x2 reduction compute shader
- m2_cull_hiz.comp.glsl: frustum + HiZ occlusion cull compute shader
- test_indoor_shadows.cpp: 14 unit tests for shadow/interior contracts
Modified:
- CullUniformsGPU expanded 128->272 bytes (HiZ params, viewProj,
prevViewProj)
- Depth buffer images gain VK_IMAGE_USAGE_SAMPLED_BIT for HiZ reads
- wmo.frag.glsl: interior branch before unlit, shadow skip for 0x2000
- Render graph: hiz_build + compute_cull disabled (run in early compute)
- .gitignore: ignore compiled .spv binaries
- MEGA_BONE_MAX_INSTANCES: 2048 -> 4096
Signed-off-by: Pavel Okhlopkov <pavel.okhlopkov@flant.com>
MAX_CULL_INSTANCES was 16384 but game object instances were allocated
at indices 20000+, beyond the cull buffer. Increased to 65536.
Also compute fallback boundRadius from vertex extents when M2 header
reports 0, and seed bones for global-sequence-only animated models.
M2 GPU instancing
- M2InstanceGPU SSBO (96 B/entry, double-buffered, 16384 max)
- Group opaque instances by (modelId, LOD); single vkCmdDrawIndexed per group
- boneBase field indexes into mega bone SSBO via gl_InstanceIndex
Indirect terrain drawing
- 24 MB mega index buffer (6M uint32) + 64 MB mega vertex buffer
- CPU builds VkDrawIndexedIndirectCommand per visible chunk
- Single VB/IB bind per frame; shadow pass reuses mega buffers
- Replaced vkCmdDrawIndexedIndirect with direct vkCmdDrawIndexed to fix
host-mapped buffer race condition that caused terrain flickering
GPU frustum culling (compute shader)
- m2_cull.comp.glsl: 64-thread workgroups, sphere-vs-6-planes + distance cull
- CullInstanceGPU SSBO input, uint visibility[] output, double-buffered
- dispatchCullCompute() runs before main pass via render graph node
Consolidated bone matrix SSBOs
- 16 MB double-buffered mega bone SSBO (2048 instances × 128 bones)
- Eliminated per-instance descriptor sets; one megaBoneSet_ per frame
- prepareRender() packs bone matrices consecutively into current frame slot
Render graph / frame graph
- RenderGraph: RGResource handles, RGPass nodes, Kahn topological sort
- Automatic VkImageMemoryBarrier/VkBufferMemoryBarrier between passes
- Passes: minimap_composite, worldmap_composite, preview_composite,
shadow_pass, reflection_pass, compute_cull
- beginFrame() uses buildFrameGraph() + renderGraph_->execute(cmd)
Pipeline derivatives
- PipelineBuilder::setFlags/setBasePipeline for VK_PIPELINE_CREATE_DERIVATIVE_BIT
- M2 opaque = base; alphaTest/alpha/additive are derivatives
- Applied to terrain (wireframe) and WMO (alpha-test) renderers
Rendering bug fixes:
- fix(shadow): compute lightSpaceMatrix before updatePerFrameUBO to eliminate
one-frame lag that caused shadow trails and flicker on moving objects
- fix(shadow): scale depth bias with shadowDistance_ instead of hardcoded 0.8f
to prevent acne at close range and gaps at far range
- fix(visibility): WMO group distance threshold 500u → 1200u to match terrain
view distance; buildings were disappearing on the horizon
- fix(precision): camera near plane 0.05 → 0.5 (ratio 600K:1 → 60K:1),
eliminating Z-fighting and improving frustum plane extraction stability
- fix(streaming): terrain load radius 4 → 6 tiles (~2133u → ~3200u) to exceed
M2 render distance (2800u) and eliminate pop-in when camera turns;
unload radius 7 → 9; spawn radius 3 → 4
- fix(visibility): ground-detail M2 distance multiplier 0.75 → 0.9 to reduce
early pop of grass and debris
M2 vertex bone indices are indices into boneLookupTable, not direct bone
array indices. Without remapping, vertices weighted to higher bone
indices (cloak, cape, hair) get the wrong bone transform, causing
vertices to project wildly outward from the character.
destroyInstanceBones loops over both frame slots (i=0,1), deferring
both boneSet[0] and boneSet[1] to the current frame's fence. When
currentFrame=0, boneSet[1] is freed after only slot 0's fence completes
while slot 1's command buffer may still be using it.
Switch both M2Renderer and CharacterRenderer bone destruction from
deferAfterFrameFence to deferAfterAllFrameFences to ensure all
in-flight frames have completed before freeing cross-slot resources.
M2 destroyInstanceBones and WMO destroyGroupGPU freed descriptor sets
and buffers immediately during tile streaming, while in-flight command
buffers still referenced them — causing DEVICE_LOST on AMD RADV.
Now defers GPU resource destruction via deferAfterFrameFence in streaming
paths (removeInstance, removeInstances, unloadModel). Immediate
destruction preserved for shutdown/clear paths that vkDeviceWaitIdle
first.
Also: vkDeviceWaitIdle before WMO backfillNormalMaps descriptor rebinds,
and fillModeNonSolid added to required device features for wireframe
pipelines on AMD.
Three bugs found via AMD RADV crash log:
1. Water reflection render pass used BOTTOM_OF_PIPE as srcStageMask but
pipelines were created against the main pass (EARLY_FRAGMENT_TESTS |
COLOR_ATTACHMENT_OUTPUT). AMD enforces strict render pass compatibility
→ SIGSEGV when scene renders into reflection texture.
2. samplerAnisotropy was never enabled during device creation despite being
used in sampler creation — now requested via PhysicalDeviceSelector.
3. Shadow texture descriptor pool was reset each frame while prior frame's
command buffers might still reference it. Split into per-frame-slot pools
so each reset is fence-guarded.
M2 particle/ribbon/batch, terrain layer, and WMO material texture
resolution paths were silently falling back to white textures when
indices were out of range — making missing texture issues hard to
diagnose. Add LOG_WARNING at each silent failure point with model
name, index details, and array sizes.
- Name portal spin wrap value as kTwoPi constant
- Name particle animTime wrap as kParticleWrapMs (3333ms) with
why-comment: covers longest known emission cycle (~3s torch/campfire)
while preventing float precision loss over hours of runtime
- Add FBlock interpolation documentation: explain what FBlocks are
(particle lifetime curves) and note that float/vec3 variants share
identical logic and must be updated together
All 8 rand() calls for animation time offsets and variation timers in
m2_renderer.cpp used C rand() which defaults to seed 1 without srand(),
producing identical sequences every launch. Trees, torches, and grass
all swayed in sync. Replaced with std::mt19937 seeded from
random_device. Same fix for 4 mount idle fidget/sound timer sites in
renderer.cpp which mixed rand() with the mt19937 already present.
M2 renderer: move 3 per-frame local containers to member variables:
- particleGroups_ (unordered_map): reuse bucket structure across frames
- ribbonDraws_ (vector): reuse draw call buffer
- shadowTexSetCache_ (unordered_map): reuse descriptor cache
Eliminates ~3 heap allocations per frame in particle/ribbon/shadow passes.
UI polish:
- Nameplate hover tooltip showing level, class (players), guild name
- Bag window titles show slot counts: "Backpack (12/16)"
Player report: CMSG_COMPLAIN packet builder and reportPlayer() method.
"Report Player" option in target frame right-click menu for other players.
Server response handler (SMSG_COMPLAIN_RESULT) was already implemented.
M2 renderer: skip bone matrix computation for instances beyond 150 units
(LOD 3 threshold). These models use minimal static geometry with no visible
skeletal animation. Last-computed bone matrices are retained for GPU upload.
Removes unnecessary float matrix operations for hundreds of distant NPCs
in crowded zones.
Water renderer: add per-surface AABB frustum culling before draw calls.
Computes tight AABB from surface corners and height range, tests against
camera frustum. Skips descriptor binding and vkCmdDrawIndexed for surfaces
outside the view. Handles both ADT and WMO water (rotated step vectors).
Mail: change money/COD fields from uint32 to uint64 in CMSG_SEND_MAIL and
SMSG_MAIL_LIST_RESULT for WotLK 3.3.5a. Classic keeps uint32 on the wire.
Fixes money truncation and packet misalignment causing mail failures.
Other-player capes: add cape texture loading to setOnlinePlayerEquipment().
The cape geoset was enabled but no texture was loaded, leaving capes blank.
Now mirrors the local-player path: looks up ItemDisplayInfo.dbc, finds cape
texture candidates, applies via setGroupTextureOverride/setTextureSlotOverride.
Zone toasts: suppress duplicate zone toast when the zone text overlay is
already showing the same zone name. Fixes double "Entering: Stormwind City".
Network: enable TCP_NODELAY on both auth and world sockets after connect(),
disabling Nagle's algorithm to eliminate up to 200ms buffering delay on
small packets (movement, spell casts, chat).
Rendering: track material and bone descriptor sets in M2 renderer to skip
redundant vkCmdBindDescriptorSets calls between batches sharing same textures.
- Hoist DBC field index lookups before loops in game_handler (7 DBC iteration loops)
- Cache getSkybox()/getPosition() calls instead of redundant per-frame queries
- Merge textureHasAlphaByPtr_ + textureColorKeyBlackByPtr_ into single map
- Add constexpr for DEG_TO_RAD, reciprocal constants, physics delta
- Add reserve() for WMO/M2 collision grid queries and portal BFS
- Frustum plane normalize: inversesqrt instead of length+divide
- M2 particle emission: inversesqrt for direction normalization
- Parse creature display IDs from query response
- UI: show spell names/IDs as fallback instead of "Unknown"
Squared distance optimizations across 30 files:
- Convert glm::length() comparisons to glm::dot() (no sqrt)
- Use glm::inversesqrt() for check-then-normalize patterns (1 rsqrt vs 2 sqrt)
- Defer sqrt to after early-out checks in collision/movement code
- Hottest paths: camera_controller (21), weather particles, WMO collision,
transport movement, creature interpolation, nameplate culling
Container and algorithm improvements:
- std::map<string> → std::unordered_map for asset/DBC/MPQ/warden caches
- std::mutex → std::shared_mutex for asset_manager and mpq_manager caches
- std::sort → std::partial_sort in lighting_manager (top-2 of N volumes)
- Double-lookup find()+operator[] → insert_or_assign in game_handler
- Add reserve() for per-frame vectors: weather, swim_effects, WMO/M2 collision
Threading and synchronization:
- Replace 1ms busy-wait polling with condition_variable in character_renderer
- Move timestamp capture before mutex in logger
- Use memory_order_acquire/release for normal map completion signaling
API additions:
- DBC getStringView()/getStringViewByOffset() for zero-copy string access
- Parse creature display IDs from SMSG_CREATURE_QUERY_SINGLE_RESPONSE
Add [[nodiscard]] to VkShaderModule::loadFromFile, Shader::loadFromFile/
loadFromSource, AssetManifest::load, DbcLoader::load — all return bool
indicating success/failure that callers should check.
Suppress with (void) at 17 call sites where validity is checked via
isValid() after loading rather than the return value (m2_renderer
recreatePipelines, swim_effects recreatePipelines).
Create a VkPipelineCache at device init, loaded from disk if available.
All 65 pipeline creation calls across 19 renderer files now use the
shared cache. On shutdown, the cache is serialized to disk so subsequent
launches skip redundant shader compilation.
Cache path: ~/.local/share/wowee/pipeline_cache.bin (Linux),
~/Library/Caches/wowee/ (macOS), %APPDATA%\wowee\ (Windows).
Stale/corrupt caches are handled gracefully (fallback to empty cache).
Lava M2 models used independent static-local start times in pass 1
and pass 2 for UV scroll animation. Since static locals initialize
on first call, the two timers started at slightly different times
(microseconds to frames apart), causing a permanent UV offset mismatch
between passes — visible as texture flicker/jumping on lava surfaces.
Replace both function-scoped statics with a single file-scoped
kLavaAnimStart constant, ensuring both passes compute identical UV
offsets from the same epoch.
Apply at-rest values from M2 color alpha and transparency animation
tracks to batch rendering opacity. This fixes models that should render
as semi-transparent (ghosts, ethereal effects, fading doodads) but were
previously rendering at full opacity.
The fix multiplies colorAlphas[batch.colorIndex] and
textureWeights[batch.transparencyIndex] into batchOpacity during model
setup. Zero values are skipped to avoid the edge case where animated
tracks start at 0 (invisible) and animate up — baking that first
keyframe would make the entire batch permanently invisible.
Spell visual effects previously used a fixed 3.5s duration for all
effects, causing some to linger too long and overlap during combat.
Now queries the M2 model's default animation duration via the new
getInstanceAnimDuration() method and clamps it to 0.5-5s. Effects
without animations fall back to a 2s default. This makes spell impacts
feel more responsive and reduces visual clutter.
Models that lose all instances are no longer immediately evicted from
GPU memory. Instead they get a 60-second grace period, preventing the
thrash cycle where GO models (barrels, chests, herbs) were evicted
every 5 seconds and re-loaded when the same object type respawned.
M2Renderer::removeInstance() was calling rebuildSpatialIndex() for every
single removal, causing 25-90ms frame hitches during entity despawns.
Now uses O(1) lookup via instanceIndexById, incremental spatial grid
cell removal, and swap-remove from the instance vector. The auxiliary
index vectors are rebuilt cheaply since they're small.
Two follow-up fixes for the ribbon emitter implementation and the
transport-doodad stall fix:
1. loadModel() rejected any M2 with no vertices AND no particles, but
ribbon-only spell-effect models (e.g. weapon trail or aura ribbons)
have neither. These models were silently invisible even though the
ribbon rendering pipeline added in 1108aa9 is fully capable of
rendering them. Extended the guard to also accept models that have
ribbon emitters, matching the particle-emitter precedent.
2. processPendingTransportDoodads() ignored the bool return of
loadModel(), calling createInstance() even when the model was
rejected, generating spurious "Cannot create instance: model X not
loaded" warnings for every failed doodad path. Check the return
value and continue to the next doodad on failure.
Three root causes identified from wowee.log crash at frame 134368:
1. processPendingTransportDoodads() was doing N separate synchronous
GPU uploads (vkQueueSubmit + vkWaitForFences per texture per doodad).
With 30+ doodads × multiple textures, this caused the 489ms stall in
the 'gameobject/transport queues' update stage. Fixed by wrapping the
entire batch in beginUploadBatch()/endUploadBatch() so all texture
layout transitions are submitted in a single async command buffer.
2. Game objects whose M2 model has no geometry/particles (empty or
unsupported format) were retried every frame because loadModel()
returns false without adding to gameObjectDisplayIdModelCache_.
Added gameObjectDisplayIdFailedCache_ to permanently skip these
display IDs after the first failure, stopping the per-frame spam.
3. renderM2Ribbons() only checked ribbonPipeline_ != null, not
ribbonAdditivePipeline_. If additive pipeline creation failed, any
ribbon with additive blending would call vkCmdBindPipeline with
VK_NULL_HANDLE, causing VK_ERROR_DEVICE_LOST on the GPU side.
Extended the early-return guard to cover both ribbon pipelines.
Parse M2RibbonEmitter data (WotLK format) from M2 files — bone index,
position, color/alpha/height tracks, edgesPerSecond, edgeLifetime,
gravity. Add CPU-side trail simulation per instance (edge birth at bone
world position, lifetime expiry, gravity droop). New m2_ribbon.vert/frag
shaders render a triangle-strip quad per emitter using the existing
particleTexLayout_ descriptor set. Supports both alpha-blend and additive
pipeline variants based on material blend mode. Fixes invisible spell
trail effects (~5-10%% of spell visuals) that were silently skipped.
cleanupUnusedModels() runs every 5 seconds and freed vertex/index buffers
without waiting for the GPU to finish the previous frame's command buffer.
This caused VK_ERROR_DEVICE_LOST (-4) after extended gameplay when tiles
stream out and their models are freed mid-render.
Add vkDeviceWaitIdle() before the buffer destroy loop in both M2Renderer
and WMORenderer cleanupUnusedModels(). The wait only happens when there are
models to remove, so quiet sessions have no overhead.
Floating-point fmod() loses precision with large accumulated time values, causing
subtle jumps/hitches in animation loops. Replace with iterative duration subtraction
to keep animationTime bounded and maintain precision, consistent with the fix
applied to character_renderer.cpp.
Applies to:
- M2 creature/object animation loops (main update)
- M2 particle-only instance wrapping (3333ms limit)
- M2 global sequence timing resolution
- M2 animated particle tile indexing
- Mount bobbing motion (sinusoidal rider motion)
- Character footstep trigger timing
- Mount footstep trigger timing
All timing computations now use the same precision-preserving approach.
Addresses sparseness in lava/magma effects noted in status documentation.
Higher emission rate (48 vs 32 per second) makes lava/slime areas visually
denser and more immersive while staying within GPU budget constraints.
Double the smoke particle emission rate to create visually richer lava and magma
effects. Current implementation emitted only 16 particles/sec per emitter (~88 in
steady state), which appeared sparse especially in multi-emitter lava areas.
Increasing to 32/sec provides denser steam/smoke effects (~176 in steady state)
while remaining well under the 1000 particle cap. This tuning opportunity was
documented in status.md as a known gap in visual completeness.
Move ShadowParamsUBO from 5 separate shadow rendering functions (2 in
m2_renderer, 1 in terrain_renderer, 1 in wmo_renderer) into shared
vk_frame_data.hpp header. Eliminates 5 identical local struct definitions
and improves consistency across all shadow pass implementations. Structure
layout matches shader std140 uniform buffer requirements.
Move envSizeMBOrDefault and envSizeOrDefault from 4 separate rendering
modules (character_renderer, m2_renderer, terrain_renderer, wmo_renderer)
into shared vk_utils.hpp header as inline functions. Use the most robust
version which includes overflow checking for MB-to-bytes conversion. This
eliminates 7 identical local function definitions and improves consistency
across all rendering modules.
Move ShadowPush from 4 separate rendering modules (character_renderer,
m2_renderer, terrain_renderer, wmo_renderer) into shared vk_frame_data.hpp
header. This eliminates 4 identical local struct definitions and ensures
consistency across all shadow rendering passes. Add vk_frame_data.hpp include
to character_renderer.cpp.
Pre-allocate one stable VkDescriptorSet per particle emitter at model
upload time (particleTexSets[]) instead of allocating a new set from
materialDescPool_ every frame for each particle group. The per-frame
path exhausted the 8192-set pool in ~14 s at 60 fps with 10 active
particle emitters, causing GPU device-lost crashes. The old path is
kept as an explicit fallback but should never be reached in practice.