Max size of each dimension in 1.3 compute capable devices?

Hello,

I am using Tesla c1060 with AMD Phenom 9850 quad.

On page 82 of the NVIDIA_CUDA_Programming_Guide_2.1 it is mentioned that, for 1.0 compute capable devices, the maximum
size of each dimension of a grid of thread blocks is 65535. For 1.3 compute capable devices, nothing is mentioned about the
maximum size of each dimension, so I "assume" the 1.0 restrictions still apply.

I have launched a grid with threads_per_block = 512 and number_of_blocks = ceil((1024 * 1024 * 10) / 512).
Kernel launch: kernel_name <<<number_of_blocks, threads_per_block>>> (argument_list)
The kernel launch is successful and the computations complete. Total_threads_created = 1024 * 1024 * 10.
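For concreteness, here is a minimal, self-contained sketch of the kind of launch I mean (the kernel body and the names kernel_name and d_out are illustrative placeholders, not my actual code):

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each thread writes its global index, guarded
// against the padding threads in the last block.
__global__ void kernel_name(int *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = idx;
}

int main()
{
    const int n = 1024 * 1024 * 10;      // total threads wanted
    const int threads_per_block = 512;
    const int number_of_blocks =         // ceil(n / threads_per_block)
        (n + threads_per_block - 1) / threads_per_block;   // = 20480

    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));

    kernel_name<<<number_of_blocks, threads_per_block>>>(d_out, n);
    cudaError_t err = cudaGetLastError();   // catches an invalid launch config
    printf("launch: %s, blocks = %d\n", cudaGetErrorString(err), number_of_blocks);

    cudaFree(d_out);
    return 0;
}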

How is this possible? Does the CUDA runtime check for this constraint and schedule the number of
blocks accordingly, i.e., only that many blocks run at any given time so that
total_number_of_running_threads <= 65535? If such a scheduling procedure exists, is it specific to 1.3
compute capable devices only?

Ashish

The thing to notice here is that they said the number of “thread blocks,” not total threads. You are launching 20480 blocks, which is below the 65535 limit.
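If you want to verify the limits on your own card rather than reading them off the guide, you can query them at runtime; a minimal sketch, assuming device 0 is the Tesla C1060:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0 assumed

    // maxGridSize is the per-dimension limit on the grid of thread blocks
    // (65535 per dimension on compute capability 1.x);
    // maxThreadsPerBlock is a separate, per-block limit.
    printf("compute capability: %d.%d\n", prop.major, prop.minor);
    printf("max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}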

Thank you!