If the CUDA Occupancy Calculator states that your kernel is limited to 8 blocks per multiprocessor, then, in order to load-balance effectively, the number of blocks you launch should be a multiple of the number of multiprocessors times the maximum blocks per multiprocessor:
Example:
My kernel is limited to 8 blocks per multiprocessor
My graphics card is a GTX280, which has 30 multiprocessors
30 * 8 = 240 blocks per device (or a multiple thereof) to ensure the grid is balanced across the hardware (see the sketch below)
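A minimal sketch of computing that block count programmatically, assuming a recent CUDA runtime: `cudaOccupancyMaxActiveBlocksPerMultiprocessor` gives the same blocks-per-SM figure as the spreadsheet calculator, and `multiProcessorCount` replaces the hard-coded 30. The kernel name `myKernel`, the 256-thread block size, and the `numWaves` value are placeholders for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel, used only so the occupancy query has something to inspect.
__global__ void myKernel(float *data) { /* ... */ }

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Ask the runtime how many resident blocks of myKernel fit on one SM
    // for the chosen block size (256 threads assumed here).
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, myKernel, 256 /*threads per block*/, 0 /*dynamic smem*/);

    // One "wave" = every SM filled with its maximum number of resident blocks.
    // On a GTX280 with an 8-blocks-per-SM kernel this is 30 * 8 = 240.
    int blocksPerWave = prop.multiProcessorCount * blocksPerSM;
    printf("%d SMs x %d blocks/SM = %d blocks per wave\n",
           prop.multiProcessorCount, blocksPerSM, blocksPerWave);

    // Launch a whole number of waves so the grid balances across the hardware.
    int numWaves = 4;  // assumption: choose this from your problem size
    myKernel<<<numWaves * blocksPerWave, 256>>>(nullptr);
    cudaDeviceSynchronize();
    return 0;
}
```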
Yes, that is optimal if each block takes the same amount of time. If you had 241 blocks, for example, the 241st block would run in a second wave by itself while the rest of the device sat idle, so your total running time could be roughly twice that of 240 blocks.
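A rough sketch of that tail effect, assuming every block takes the same time: runtime scales with the number of waves the grid needs, not the raw block count, so 241 blocks cost two waves where 240 cost one.

```cpp
#include <cstdio>

int main() {
    // GTX280 example from above: 30 SMs * 8 blocks/SM = 240 blocks per wave.
    const int blocksPerWave = 30 * 8;
    for (int n : {240, 241}) {
        int waves = (n + blocksPerWave - 1) / blocksPerWave;  // ceiling divide
        printf("%d blocks -> %d wave(s)\n", n, waves);        // 240 -> 1, 241 -> 2
    }
    return 0;
}
```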