Sparse Fluid Simulation in DirectX Alex Dunn Dev. Tech. – NVIDIA [email protected] Eulerian Simulation ● Grid based. ● Great for simulating gaseous fluid; smoke, flame, clouds. ● It just works-> Basic Algorithm Inject 2x Velocity Advect Pressure 2x Pressure Vorticity Evolve 1x Vorticity Why Are Small Volumes Bad? ● Fluid isn’t box shaped. ● ● Simulated separately. ● ● ● Often fluid will clip bounds. Lots of state changes. No interaction between volumes. Tricky to render. ● ● No sorting between volumes. Each volume rendered separately. Problem! N-order problem ● ● ● ● ● 64^3 = ~0.25m cells 128^3 = ~2m cells 256^3 = ~16m cells … Applies to: ● ● Computational complexity Memory requirements 8192 7168 Memory (Mb) ● Texture3D - 4x16F 6144 5120 4096 3072 2048 1024 0 0 256 512 768 1024 Dimensions (X = Y = Z) And that’s just 1 texture… What Do We Do? ● Typically, fluid doesn’t occupy every cell! ● Why waste cycles/memory on empty cells? ● Options? ● ● Trees – Tight fitting but poor lookup speed Bricks – Good fit and fast lookup Bricks ● Split simulation space into groups of cells (each known as a brick). ● Simulate each brick independently. Storage ● Storage of bricks can take one of two forms: ● Compressed; allocate bricks as needed. ● Uncompressed; allocate all bricks. Compressed Storage Kind of like, vector<Brick>. Pros: • Good memory consumption. Only store bricks we care about. Cons: • • Allocation strategies. Expensive neighbourhood lookup. • “Software translation” 1 Brick = 43 = 64 1 Brick = (1+4+1)3 = 216 • New problem; • “6n2 +12n + 8” problem. Uncompressed Storage Allocate everything; forget about unoccupied cells Pros: • Simulation is coherent in memory. Cons: • No reduction in memory used. Brick Map ● We need to track which bricks are occupied! ● New Texture3D<uint> 1 voxel per brick ● ● ● ● 0 Unoccupied 1 Occupied Could also use packed binary grids [Holger15], but this requires atomics Tracking Bricks ● Initialise with emitter ● ● Perform on the CPU Expansion (unoccupied -> occupied) ● ● Read velocity If axial velocity enough to traverse boundary ●Expand ● in that axis Reduction (occupied -> unoccupied) ● ● Handled automatically Calculate occupied bricks every frame Sparse Algorithm Clear Tiles Inject Reset all tiles to 0 (unmapped) in tile map. Advect Pressure Vorticity Evolve* Fill List Read value from tile map. Texture3D<uint> g_OccupiedMapRO; AppendBuffer<uint3> g_ListRW; Append tile coordinate to list if occupied. if(g_OccupiedMapRO[idx] != 0) { g_ListRW.Append(idx); } *Also handles expansion Numbers ● Show performance of uncompressed grid ● Explain how memory consumption stays the same when using uncompressed storage. ● Can we do better? Enter; DirectX 11.3 ● ● ● Volume Tiled Resources (VTR)! Extends 2D functionality in DX11.2. 1 Tile = 64KB Bits/Pixel 8 16 32 64 128 Tile Dimensions 64x32x32 32x32x32 32x32x16 32x16x16 16x16x16 Tiled Resources • Only mapped memory is allocated in VRAM • Fast HW translation (same as regular paged memory) • All samplers supported Gotcha: Tile mappings must be updated from CPU Idea 1. Tile == Brick 2. Create fluid textures as tiled resources. 3. Create tile pool per texture. 4. Map tiles to memory using list of occupied tiles. 5. Handle expansion/reduction of tiles in simulation. Drawbacks ● Tile mappings updated by CPU. ● ● Possible using CPU read backs. Must use triple buffer for speedy Map/Unmap. ● Means 3 frame latency. ● Predict tile mappings ahead of time. Frame GPU/CPU Status N •CPU issues render calls for current frame. N+1 •GPU executing calls sent from CPU during frame N. •CPU issues render calls for current frame. N+2 •GPU finished executing calls sent from CPU during frame N. Results ready. •GPU executing calls sent from CPU during frame N+1. •CPU issues render calls for current frame. N+3 •GPU finished executing calls sent from CPU during frame N+1. Results ready. •GPU executing calls sent from CPU during frame N+2. •CPU issues render calls for current frame. N+4 ... GPU Simulation; CPU Prediction ● We always know the maximum velocity of the fluid. ● Add a skirt of “probable” tiles around “definite” tiles. New Algorithm Yes Is Data Available ? Readback Tiles Fill List No Emitter Tiles Predict Tiles Build List Map Tiles Evolve Vorticity Pressure Advect Inject Clear Tiles Demo Numbers Questions? Alex Dunn - [email protected] Twitter: @AlexWDunn Thanks for attending.
© Copyright 2024