CUDA Pro Tip: Minimize the Tail Effect

jwitsoe · June 4, 2014, 2:17pm

Originally published at: https://developer.nvidia.com/blog/cuda-pro-tip-minimize-the-tail-effect/

When I work on the optimization of CUDA kernels, I sometimes see a discrepancy between Achieved and Theoretical Occupancies. The Theoretical Occupancy is the ratio between the number of threads which may run on each multiprocessor (SM) and the maximum number of executable threads per SM (2048 on the Kepler architecture). This value is estimated…
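As a rough illustration of the ratio described above, the sketch below computes theoretical occupancy from a block size and a per-SM block limit. The 2048-thread figure comes from the post. The 16-blocks-per-SM cap for Kepler-class GPUs and the hypothetical `resource_block_limit` parameter are assumptions added for illustration. `resource_block_limit` stands in for whatever limit registers or shared memory impose. This is not a reproduction of NVIDIA's occupancy calculator.

```python
def theoretical_occupancy(block_size, resource_block_limit=None,
                          max_threads_per_sm=2048, max_blocks_per_sm=16):
    """Fraction of an SM's thread slots that resident blocks can fill.

    Defaults assume a Kepler-class SM (2048 threads, 16 blocks per SM).
    block_size is assumed to be a multiple of the 32-thread warp size.
    resource_block_limit is a hypothetical stand-in for the block limit
    imposed by registers or shared memory; None means no extra limit.
    """
    # Blocks that fit by thread count and by the hardware block cap.
    blocks = min(max_threads_per_sm // block_size, max_blocks_per_sm)
    if resource_block_limit is not None:
        blocks = min(blocks, resource_block_limit)
    return blocks * block_size / max_threads_per_sm


if __name__ == "__main__":
    print(theoretical_occupancy(256))     # 8 blocks * 256 threads -> 1.0
    print(theoretical_occupancy(64))      # capped at 16 blocks -> 0.5
    print(theoretical_occupancy(256, 5))  # 5 blocks * 256 threads -> 0.625
```

Achieved occupancy is measured while the kernel runs, so it can fall below this number. One example is when the last wave of blocks only partially fills the SMs.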

anon83584961 · June 6, 2014, 3:23am

You say the GPU arranges blocks in a grid into waves, and allocates them to SMs on a per-wave basis, not a block-by-block basis.
Does this mean an idle SM with free resources will not be assigned a ready block until every SM on the device is able to accept a new block?

anon18555434 · June 6, 2014, 8:04am

Waves are an easy abstraction, but the work is launched on a block-by-block basis (so the answer to your question is no). Even if you have a grid of blocks that fills a couple of full waves, you may still have a strong tail effect if a few blocks run significantly longer than the others. It's a rather classical scheduling problem.
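The reply's point can be seen in a small list-scheduling simulation: blocks are dispatched one at a time as slots free up, yet a few long blocks still produce a tail. This is a toy model, not a description of the hardware block scheduler. It rests on three assumptions. The device is treated as a fixed pool of identical block slots (the hypothetical `slots` parameter, roughly the number of SMs times the resident blocks per SM). Blocks are handed out in launch order to whichever slot frees up first. Block durations are made-up units.

```python
import heapq


def simulate(durations, slots):
    """Greedy block-by-block dispatch: each block goes to the earliest free slot.

    durations: per-block run times in arbitrary units, in launch order.
    slots: total concurrent block slots on the device (an assumption of this
           toy model, roughly num_SMs * resident_blocks_per_SM).
    Returns (makespan, busy_fraction). busy_fraction is the share of slot-time
    spent running blocks, a rough proxy for achieved vs. theoretical occupancy.
    """
    free_at = [0.0] * slots  # time at which each slot next becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest-available slot
        heapq.heappush(free_at, start + d)  # slot is busy until start + d
    makespan = max(free_at)
    busy_fraction = sum(durations) / (slots * makespan)
    return makespan, busy_fraction


if __name__ == "__main__":
    slots = 4
    # Two full waves of equal blocks: no tail.
    print(simulate([1] * 8, slots))         # (2.0, 1.0)
    # One extra block: a partial last wave leaves 3 of 4 slots idle.
    print(simulate([1] * 9, slots))         # (3.0, 0.75)
    # Two full waves, but the last block is 3x longer: strong tail.
    print(simulate([1] * 7 + [3], slots))   # (4.0, 0.625)
    # Same work with the long block launched first: shorter tail.
    print(simulate([3] + [1] * 7, slots))   # makespan 3.0, busy about 0.83
```

The last two runs contain identical work and differ only in launch order. That gap is the "classical scheduling problem" the reply refers to: when a long block is dispatched late, little other work remains to overlap with it.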
