Rendering Fields of Grass
using DirectX 11 in
GRID Autosport
Richard Kettlewell
Codemasters
MOTIVATION
 Current implementation engineered for PS3/Xbox 360
 High-end PC can do much better
 DirectX 11
 Compute Shaders
 Lots of interesting techniques online
 Outerra (http://goo.gl/tYlcjN)
 Nvidia (http://goo.gl/F43iTY)
GOALS
 High density
 Keep all data on GPU, for efficiency
 Get rid of polygonal look of terrain
 Flat polys with grass textures are unconvincing
 Interaction
 Wind
 Deformation
OUR APPROACH
 Generate
 Populate Append Buffer with blades of grass
 Render
 Read Append Buffer
 Construct geometry in Vertex Shader
 Rasterise using Alpha-To-Coverage
 No sorting required
ART PROCESS
 Simple world space map
 RGB defines grass colour
 Alpha defines grass height
 2K x 2K
 Wastes resolution
 Simplest approach given time constraints
 UV mapping onto terrain would be better
 Doesn’t scale well for large point-to-point tracks
GENERATING GRASS
 Render Terrain using custom shader
 Orthographic top-down render, centred around viewer
 Output to Append Buffer, not Render Target
 Every pixel could be a blade of grass
 Debug mode outputs to render target, for visualisation
GENERATING GRASS
 Every pixel could be a blade of grass
 Control density using viewport size
 Spreads the pixels over more/less distance
 Need to cull unimportant blades
 Set Scissor Rectangle around view segment
 Frustum cull against main scene camera
 Read world space map (discard if alpha < threshold)
 Scissor Rectangle
 Create bounding box from circle segment
 View position
 2 extents
 Any axis intersection
 Extra points around viewer
 Fixes problem when looking down
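The Scissor Rectangle bullets above can be sketched on the CPU as follows. This is a minimal illustration, not the shipped code: it assumes a top-down 2D view where the visible region is a circle segment defined by a yaw angle, a half field of view and a radius. The names `Rect`, `viewSegmentBounds` and `viewerMargin` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

struct Rect { float minX, minY, maxX, maxY; };

static const float kPi = 3.14159265f;

// Expand the rectangle so that it contains the point (x, y).
static void grow(Rect& r, float x, float y)
{
    r.minX = std::min(r.minX, x);
    r.minY = std::min(r.minY, y);
    r.maxX = std::max(r.maxX, x);
    r.maxY = std::max(r.maxY, y);
}

// True if 'angle' lies within 'halfWidth' radians of 'centre' (modulo 2*pi).
static bool angleInArc(float angle, float centre, float halfWidth)
{
    float delta = std::remainder(angle - centre, 2.0f * kPi); // in [-pi, pi]
    return std::fabs(delta) <= halfWidth;
}

// Bounding box of the view segment: view position, the 2 extents of the arc,
// and any axis direction that falls inside the arc.
Rect viewSegmentBounds(float px, float py, float yaw, float halfFov,
                       float radius, float viewerMargin)
{
    // Extra points around the viewer (assumed here to be a small square margin).
    Rect r{px - viewerMargin, py - viewerMargin,
           px + viewerMargin, py + viewerMargin};

    // The 2 extents of the segment.
    grow(r, px + radius * std::cos(yaw - halfFov), py + radius * std::sin(yaw - halfFov));
    grow(r, px + radius * std::cos(yaw + halfFov), py + radius * std::sin(yaw + halfFov));

    // Any axis intersection: the arc bulges furthest along +X, +Y, -X, -Y.
    for (int i = 0; i < 4; ++i)
    {
        float axisAngle = static_cast<float>(i) * 0.5f * kPi;
        if (angleInArc(axisAngle, yaw, halfFov))
            grow(r, px + radius * std::cos(axisAngle), py + radius * std::sin(axisAngle));
    }
    return r;
}
```

The resulting rectangle would then be converted to pixels of the orthographic render and set as the scissor rectangle.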
GENERATING GRASS
 LODs
 Vital for performance
 Distance based
 Each LOD discards increasing amounts of grass
 Remaining blades are scaled up to fill gaps
 Feather distances randomly, to break up transitions
 Randomise distance calculation
 Fade grass height towards zero over last 15%
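A rough CPU sketch of the LOD bullets above. Only the 15% height fade comes from the slides. The halving of density per LOD, the width scale factor, the LOD cap and all names (`evaluateLod`, `lodInterval`, `featherRange`) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>

struct LodResult
{
    bool  keep;        // false: the blade is discarded
    float widthScale;  // remaining blades are scaled up to fill gaps
    float heightScale; // fades to zero near the maximum distance
};

// rndFeather and rndKeep are per-blade random numbers in [0, 1).
LodResult evaluateLod(float distance, float maxDistance, float lodInterval,
                      float featherRange, float rndFeather, float rndKeep)
{
    LodResult out{false, 1.0f, 0.0f};
    if (distance >= maxDistance)
        return out;

    // Randomise the distance calculation to break up LOD transitions.
    float feathered = std::max(0.0f, distance + (rndFeather - 0.5f) * featherRange);

    // Assumption: each LOD keeps half as many blades as the previous one.
    int   lod          = std::min(static_cast<int>(feathered / lodInterval), 8);
    float keepFraction = 1.0f / static_cast<float>(1 << lod);

    out.keep       = rndKeep < keepFraction;
    out.widthScale = 1.0f / keepFraction; // assumed compensation factor

    // Fade grass height towards zero over the last 15% of the range.
    float remaining = 1.0f - distance / maxDistance;
    out.heightScale = std::min(1.0f, remaining / 0.15f);
    return out;
}
```

In the real pipeline this decision would run in the Generate stage, so discarded blades never reach the Append Buffer.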
RANDOMISATION
 Generate texture at load time
 Fill a 64x64 RGBA texture with rand()
 Provides 4 random numbers per grass blade
 Align texture to orthographic projection
 Used for
 Rotation
 Position
 Scale
 Varying Albedo
 Etc
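A minimal sketch of the load-time random texture, assuming 8 bits per channel (the slides only say "RGBA") and wrap addressing so that each pixel of the orthographic render always maps to the same texel. `makeRandomTexture` and `sampleRandom` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr int kRandomTextureSize = 64;

struct Rgba8 { std::uint8_t r, g, b, a; };

// Fill a 64x64 RGBA texture with rand(); 4 random numbers per texel.
std::vector<Rgba8> makeRandomTexture(unsigned seed)
{
    std::srand(seed);
    std::vector<Rgba8> texels(kRandomTextureSize * kRandomTextureSize);
    for (Rgba8& t : texels)
    {
        t.r = static_cast<std::uint8_t>(std::rand() & 0xFF);
        t.g = static_cast<std::uint8_t>(std::rand() & 0xFF);
        t.b = static_cast<std::uint8_t>(std::rand() & 0xFF);
        t.a = static_cast<std::uint8_t>(std::rand() & 0xFF);
    }
    return texels;
}

// Look up the texel for a pixel of the orthographic render, with wrapping.
// The unsigned conversion makes negative coordinates wrap correctly,
// because 2^32 is a multiple of 64.
Rgba8 sampleRandom(const std::vector<Rgba8>& texels, int px, int py)
{
    unsigned x = static_cast<unsigned>(px) % kRandomTextureSize;
    unsigned y = static_cast<unsigned>(py) % kRandomTextureSize;
    return texels[y * kRandomTextureSize + x];
}
```

The four channels could then drive rotation, position offset, scale and albedo variation for the blade generated at that pixel.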
APPEND BUFFER
 Represents every valid pixel from Generate stage
 DirectX 11 Structured Buffer
 Each element represents one grass blade
 Output to this instead of Render Target
 16 byte aligned
 Pack 16-bit values where possible
 f32tof16
 f16tof32

    struct Instance
    {
        float3 position;
        float  specular;
        float3 albedo;
        uint   vertexOffsetAndSkew;
        float2 rotation;
        float2 scale;
    };
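A CPU-side mirror of the Instance struct can check the alignment claim at compile time. The sketch below assumes tightly packed 4-byte scalars, which gives three 16-byte rows (48 bytes). The packing helpers only show how two 16-bit values share one `uint`. Treating `vertexOffsetAndSkew` as two 16-bit fields is a guess based on its name, and the conversion that `f32tof16` performs on the GPU is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>

// Mirror of the HLSL Instance struct.
struct Instance
{
    float         position[3];
    float         specular;            // completes the first 16-byte row
    float         albedo[3];
    std::uint32_t vertexOffsetAndSkew; // completes the second 16-byte row
    float         rotation[2];         // sin/cos pair
    float         scale[2];            // completes the third 16-byte row
};

static_assert(sizeof(Instance) == 48, "Instance should be three 16-byte rows");
static_assert(sizeof(Instance) % 16 == 0, "stride should be 16 byte aligned");
static_assert(offsetof(Instance, albedo) == 16, "albedo should start the second row");

// Pack two 16-bit values into one 32-bit word (low half first).
std::uint32_t pack16x2(std::uint16_t low, std::uint16_t high)
{
    return static_cast<std::uint32_t>(low) | (static_cast<std::uint32_t>(high) << 16);
}

std::uint16_t unpackLow(std::uint32_t packed)
{
    return static_cast<std::uint16_t>(packed & 0xFFFFu);
}

std::uint16_t unpackHigh(std::uint32_t packed)
{
    return static_cast<std::uint16_t>(packed >> 16);
}
```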
DrawInstancedIndirect
 Allows the GPU to control Draw arguments
 Because we don’t know how many grass instances the GPU generated
 Avoids copying the AppendBuffer structure count back to CPU
 Same as DrawInstanced, except arguments come from GPU buffer
 VertexCountPerInstance
 InstanceCount
 StartVertexLocation
 StartInstanceLocation
 Create ID3D11Buffer with D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS
 Use CopyStructureCount to copy size of Append Buffer into Constant Buffer
 Populate buffer using Compute Shader
 Dispatch a single thread
 Is there a better way?
    // buffer
    RWBuffer<uint> g_drawInstancedBuffer : register( u0 );

    // vertex buffer counter
    cbuffer BufferCounter : register( b12 )
    {
        uint numInstances;
    }

    [numthreads( 1, 1, 1 )]
    void cp()
    {
        g_drawInstancedBuffer[ 0 ] = 6u;           // vertexCountPerInstance
        g_drawInstancedBuffer[ 1 ] = numInstances; // instanceCount
        g_drawInstancedBuffer[ 2 ] = 0u;           // startVertexLocation
        g_drawInstancedBuffer[ 3 ] = 0u;           // startInstanceLocation
    }
DrawInstancedIndirect
 Avoid dispatching high instance counts with low vertex counts
 http://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau
 Prefer dispatching a single large instance
 Reconstruct vertex/instance ID in Vertex Shader
 Use SV_VertexID
    // buffer
    RWBuffer<uint> g_drawInstancedBuffer : register( u0 );

    // vertex buffer counter
    cbuffer BufferCounter : register( b12 )
    {
        uint numInstances;
    }

    [numthreads( 1, 1, 1 )]
    void cp()
    {
        g_drawInstancedBuffer[ 0 ] = 6u * numInstances; // vertexCountPerInstance
        g_drawInstancedBuffer[ 1 ] = 1u;                // instanceCount
        g_drawInstancedBuffer[ 2 ] = 0u;                // startVertexLocation
        g_drawInstancedBuffer[ 3 ] = 0u;                // startInstanceLocation
    }
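With a single large instance, the Vertex Shader has to recover the blade index and the corner index from SV_VertexID. The arithmetic is sketched below in C++, assuming 6 vertices per blade as in the shader above. The struct and function names are illustrative only.

```cpp
#include <cstdint>

constexpr std::uint32_t kVertsPerBlade = 6u;

// Same layout as the four uints written by the compute shader.
struct DrawInstancedIndirectArgs
{
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

// One large instance containing every blade.
DrawInstancedIndirectArgs makeSingleInstanceArgs(std::uint32_t numBlades)
{
    return DrawInstancedIndirectArgs{kVertsPerBlade * numBlades, 1u, 0u, 0u};
}

struct BladeVertex
{
    std::uint32_t blade;  // index into the Append Buffer
    std::uint32_t corner; // 0..5, which vertex of the blade's two triangles
};

// Reconstruct the blade and corner from a flat vertex ID.
BladeVertex decodeVertexId(std::uint32_t vertexId)
{
    return BladeVertex{vertexId / kVertsPerBlade, vertexId % kVertsPerBlade};
}
```

For example, vertex ID 13 belongs to blade 2 and is corner 1 of that blade.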
INITIAL RESULTS
GEOMETRY OR FINS?
 Initial implementation used geometry
 Inspired by Outerra tech
 Heavily vertex bound
 Difficult to make grass look soft with few verts per blade
 Difficult to achieve desired grass density
 We were using only 5 verts per blade
 Contributing to spiky results
 Could tessellate close grass?
[Figure: Outerra grass geometry]
GEOMETRY OR FINS?
 Use fins instead?
 More traditional approach to rendering grass
 Alpha Testing/ATOC
 Each ‘blade’ now represents one billboard
 Easier to add variety via UV shifting
 Softer grass can be painted into texture
 Most of existing Generation tech still valid
RENDERING
 Vertex data hardcoded in shader
 Use SV_VertexID to generate it
 Construct matrix from position and rotation
 Apply scale to all verts
 Apply skew to top verts
 Texture format (DXT1)
 Red: Diffuse tint
 Green: Specular map
 Blue: Alpha
 Negative LOD Bias
 3/4 mip

    // sin/cos for rotation matrix
    float s = instance.rotation.x;
    float c = instance.rotation.y;

    // world matrix
    float3 worldPosition = instance.position;
    float4x4 m = float4x4(
        float4(  c, 0, s, worldPosition.x ),
        float4(  0, 1, 0, worldPosition.y ),
        float4( -s, 0, c, worldPosition.z ),
        float4(  0, 0, 0, 1 )
    );
LIGHTING
 Calculated per instance
 More efficient than per vertex/per pixel
 Inaccurate for large billboards
 Normals
 Combine terrain and billboard normal
 Randomise albedo
 Small amount of noise makes big difference
 Darken terrain under grass
 Terrain shader reads grass map for height
 Specular
 Use terrain normal and apply random reduction factor
 Fade effects in distance for smooth transition to terrain
SHADOWS
 Game creates a screen-space mask from depth pre-pass
 Pixel Shaders read mask instead of cascades
 One sample per grass instance
 What if grass instance is partially occluded?
 Solution
 Read shadow cascades directly
SSAO
 Same problem as shadows (screen-space mask)
 Expensive to add grass to depth pre-pass
 Must cope with screen-space problem (no shadow cascades!)
 SSAO also includes undercar shadow
 Leaks around car edges
 Solution
 Use depth buffer to compare 2 sample points
 Read SSAO from sample with furthest depth value
 Solves car occluding grass
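The SSAO selection above reduces to a small comparison. This sketch assumes a conventional depth buffer where larger values are further from the camera. The slides do not say where the two sample points are taken, so `ScreenSample` and `grassSsao` are purely illustrative.

```cpp
struct ScreenSample
{
    float depth; // depth buffer value at the sample point
    float ssao;  // SSAO mask value at the same point
};

// Read SSAO from whichever sample has the furthest depth value, so a nearer
// occluder (e.g. the car) does not leak its occlusion onto the grass.
float grassSsao(ScreenSample a, ScreenSample b)
{
    return (a.depth >= b.depth) ? a.ssao : b.ssao;
}
```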
SELF OCCLUSION
 Tall grass should occlude neighbours
 Treat height map like normal map
 Sample neighbours to estimate slope
 Use normal and sun direction to estimate occlusion
 Artist controlled strength
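One possible reading of the self occlusion bullets, as a CPU sketch. Slope comes from central differences of the grass height map, and the darkening compares the sloped normal against flat ground so that level grass is unaffected. The comparison against flat ground and all names are assumptions rather than the game's actual formula.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// hLeft/hRight/hDown/hUp are neighbouring grass heights around the blade.
// sunDir is normalised and points towards the sun (Y up).
// Returns a lighting multiplier in [0, 1]; 1 means no self occlusion.
float selfOcclusion(float hLeft, float hRight, float hDown, float hUp,
                    float texelSize, Vec3 sunDir, float strength)
{
    // Sample neighbours to estimate slope.
    float dx = (hRight - hLeft) / (2.0f * texelSize);
    float dz = (hUp - hDown) / (2.0f * texelSize);

    // Treat the height map like a normal map: n = normalise(-dx, 1, -dz).
    float len = std::sqrt(dx * dx + 1.0f + dz * dz);
    Vec3  n{-dx / len, 1.0f / len, -dz / len};

    // Compare lighting on the slope with lighting on flat ground.
    float lit  = n.x * sunDir.x + n.y * sunDir.y + n.z * sunDir.z;
    float flat = sunDir.y;

    // Artist controlled strength.
    float occlusion = std::min(1.0f, std::max(0.0f, (flat - lit) * strength));
    return 1.0f - occlusion;
}
```

Blades on the side of tall grass facing away from the sun are darkened, while blades on the sunlit side and on level ground are left alone.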
DEFORMATION
 Cars/dynamic objects should flatten grass
 Render objects into F32 height texture
 Pass 1: Render centred around viewer
 Pass 2: Update texture into world space tiled texture
 Prevents texel swimming
 Fade edges of texture as it wraps around
 Use skidmarks not wheels
 Read height value in Generate stage
 If height intersects grass, modify the albedo, scale and skew, to appear squashed/flattened
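The world space tiled texture in Pass 2 can be addressed with wrap-around indexing, so a given world position always lands on the same texel wherever the viewer is. That is what prevents texel swimming. A minimal sketch, with `texelSize` and `textureSize` as assumed parameters:

```cpp
#include <cmath>

struct Texel { int x, y; };

// Wrap an integer index into [0, size), including for negative values.
static int wrapIndex(long index, int size)
{
    long m = index % size;
    return static_cast<int>(m < 0 ? m + size : m);
}

// Map a world space position (x, z) to a texel of the tiled deformation texture.
Texel worldToTiledTexel(float worldX, float worldZ, float texelSize, int textureSize)
{
    long ix = static_cast<long>(std::floor(worldX / texelSize));
    long iz = static_cast<long>(std::floor(worldZ / texelSize));
    return Texel{wrapIndex(ix, textureSize), wrapIndex(iz, textureSize)};
}
```

Because the texture wraps, texels that re-enter on the far side hold stale data, which is why the slides fade the edges of the texture as it wraps around.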
PERFORMANCE
 Worst Case (ms)
 1920 x 1200
 4xMSAA
GPU                 Generate   Render   Total
AMD R9 290X         1.3        1.5      2.8
Nvidia GTX 780Ti    1.4        1.8      3.2
Nvidia GTX 560Ti    3.9        3.6      7.5
Intel HD5200        5.1        9.4*     14.5
*MSAA Disabled
PERFORMANCE
 Average Case (ms)
 1920 x 1200
 4xMSAA
GPU                 Generate   Render   Total
AMD R9 290X         1.5        0.2      1.7
Nvidia GTX 780Ti    1.6        0.3      1.9
Nvidia GTX 560Ti    4.1        0.8      4.9
Intel HD5200        5.2        2.0*     7.2
*MSAA Disabled
FUTURE IMPROVEMENTS
 One Generate per LOD
 Wind
 Prototyped, but too subtle on short grass
 Similar to deformation
 Render car speed instead of height
 Bleed speed values out over texture
 Read in Generate stage, to increase existing sine wave sway
 Flowers
 Meshes
 Gravel / small rocks
 Improve art authoring pipeline
 World space map is naïve, wastes texture space
 Translucency
THANKS FOR LISTENING!
Questions?