Conditional Statement Divergent

Nullspace · June 1, 2011, 8:11pm

I am implementing a blur compute shader and reading the texture values into shared memory. My thread group is 256x1x1. Each thread reads one texel into shared memory. An ATI presentation recommends that the boundary threads (how many depends on the blur radius) each read an additional texel, since blurring 256 pixels requires 256 + 2*BlurRadius samples.

So I have code like: if(localThreadID.x < gBlurRadius) read extra sample, and similar code for the right boundary.

256 threads means there are 8 warps. So only 2 warps (one at each boundary) out of the 8 should be divergent, correct?

Is this something I should worry about? Is it possible to improve? Someone told me the gain from loading into shared memory outweighs the divergence cost, so I should not worry about it.

tera · June 2, 2011, 9:48am

For blur radii that are not too large, this operation will be entirely memory bound. For large blur radii, the convolution in shared memory will dominate the time spent. In both cases you don’t need to care about the cost of divergence.
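The warp count from the question can be checked with a small host-side model. This is only a sketch: it assumes a warp size of 32 (NVIDIA; ATI wavefronts are wider), assumes the two boundary tests are `tid < radius` and `tid >= groupSize - radius`, and the function name `countDivergentWarps` is made up for illustration.

```cpp
#include <cassert>

// Counts warps in which a boundary branch is taken by some lanes but not all.
// Assumed branches (hypothetical, modelled on the question):
//   left:  tid <  radius
//   right: tid >= groupSize - radius
int countDivergentWarps(int groupSize, int warpSize, int radius) {
    assert(warpSize > 0 && groupSize % warpSize == 0);
    int divergent = 0;
    for (int warpStart = 0; warpStart < groupSize; warpStart += warpSize) {
        int left = 0;   // lanes taking the left-boundary branch
        int right = 0;  // lanes taking the right-boundary branch
        for (int lane = 0; lane < warpSize; ++lane) {
            int tid = warpStart + lane;
            if (tid < radius) ++left;
            if (tid >= groupSize - radius) ++right;
        }
        // A branch diverges only when the warp is split on it.
        bool leftSplit = (left != 0 && left != warpSize);
        bool rightSplit = (right != 0 && right != warpSize);
        if (leftSplit || rightSplit) ++divergent;
    }
    return divergent;
}
```

For a 256-thread group, `countDivergentWarps(256, 32, 5)` gives 2, matching the estimate in the question. A radius that is an exact multiple of the warp size, such as 32, gives 0, because the boundary warps then take the branch uniformly.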

You might achieve small gains by using a texture to read the data, or by having the same warp read values at both boundaries.
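One possible reading of "the same warp reads both boundaries" is to let the first 2*radius threads fetch all the extra samples, so the only branch that can split a warp is `tid < 2 * radius`. The host-side sketch below models that index mapping. The shared-memory layout, the constant `kGroupSize`, and the function names are assumptions for illustration, not something taken from the ATI presentation.

```cpp
#include <cassert>

constexpr int kGroupSize = 256;  // thread group width from the question

// Assumed shared-memory layout:
//   [0, radius)                            left halo
//   [radius, radius + 256)                 centre texels, one per thread
//   [radius + 256, 2 * radius + 256)       right halo
// Returns the halo slot thread tid loads in addition to its centre texel,
// or -1 if the thread loads no extra sample.
int extraHaloSlot(int tid, int radius) {
    assert(radius >= 0 && 2 * radius <= kGroupSize);
    if (tid >= 2 * radius) return -1;              // the only test that can split a warp
    return tid < radius ? tid : tid + kGroupSize;  // pick the left or right halo slot
}

// Counts warps split by the single "tid < 2 * radius" test.
int countDivergentWarpsPacked(int warpSize, int radius) {
    int divergent = 0;
    for (int warpStart = 0; warpStart < kGroupSize; warpStart += warpSize) {
        int taking = 0;
        for (int lane = 0; lane < warpSize; ++lane) {
            if (extraHaloSlot(warpStart + lane, radius) >= 0) ++taking;
        }
        if (taking != 0 && taking != warpSize) ++divergent;
    }
    return divergent;
}
```

With this mapping at most one warp is split on the test, instead of one at each end of the group. Whether the left/right choice inside the branch becomes a select or a second branch depends on the compiler, so any gain should be measured rather than assumed.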
