I’m very confused about the thread count limits.
I’ve read the CUDA Programming Guide and many articles to understand this, but I still can’t.
I have an 8500 GT, which supports compute capability 1.1.
Compute capability 1.1 specifies that:
first, the maximum number of threads per block is 512;
second, the maximum block dimensions are 512 x 512 x 64 and the maximum grid dimensions are 65535 x 65535.
When I read these restrictions, I understood that
the dimension limits are the ones that matter when programming,
and that the thread count limit only describes how many threads the device can process at once.
Hence, I wrote sample programs that ignored the thread count limit,
but as the programs grew larger, the kernels stopped working!
I think the kernels fail because I define block dimensions whose total exceeds the maximum thread count.
So I tested some code and found that the block dimensions must not exceed the maximum thread count, i.e. blockDim.x * blockDim.y * blockDim.z must be at most 512.
I can’t fully trust this result (because I still don’t completely understand CUDA internals…).
It seems to contradict the second specification I mentioned above.
Can anyone clarify the relationship between the maximum thread count and the maximum block dimensions?
Please help.
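512 is the maximum number of threads per block. The 512 x 512 x 64 figures limit each dimension separately; the product of the three dimensions still has to be 512 or less.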
On the hardware side, the relevant unit is the multiprocessor: on older cards each one can hold at most 768 threads at a time (for example, three blocks of 256 threads), while on GT200 the limit is 1024 threads.
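
To make the two rules concrete, here is a minimal host-side sketch in plain C++ (it should also compile under nvcc). It hard-codes the compute 1.1 numbers quoted in the question and checks a block shape against both the per-dimension caps and the total thread cap. The names BlockLimits, kCompute11, blockShapeIsValid and residentBlocksByThreadCount are made up for illustration and are not part of the CUDA API. The second function only divides thread counts; it assumes nothing else (registers, shared memory, or a per-multiprocessor block cap) is the limiting factor.

```cpp
// Hypothetical helper types and functions for illustration only;
// none of these names come from the CUDA toolkit.
struct BlockLimits {
    unsigned maxX, maxY, maxZ;    // separate cap on each block dimension
    unsigned maxThreadsPerBlock;  // cap on blockDim.x * blockDim.y * blockDim.z
};

// Values for compute capability 1.1, as quoted in the question.
const BlockLimits kCompute11 = {512, 512, 64, 512};

// A block shape is usable only if BOTH rules hold:
//   1. each dimension is within its own cap, and
//   2. the product of the three dimensions is within the thread cap.
bool blockShapeIsValid(unsigned x, unsigned y, unsigned z,
                       const BlockLimits& lim) {
    if (x == 0 || y == 0 || z == 0) return false;
    if (x > lim.maxX || y > lim.maxY || z > lim.maxZ) return false;
    // Use a 64-bit product so large inputs cannot overflow.
    unsigned long long total = 1ULL * x * y * z;
    return total <= lim.maxThreadsPerBlock;
}

// Upper bound on how many blocks fit on one multiprocessor, looking at
// thread count alone (assumption: no other resource is the bottleneck).
unsigned residentBlocksByThreadCount(unsigned threadsPerBlock,
                                     unsigned maxThreadsPerMultiprocessor) {
    if (threadsPerBlock == 0) return 0;
    return maxThreadsPerMultiprocessor / threadsPerBlock;
}
```

With these limits, a 16 x 16 x 2 block (512 threads) passes, while 32 x 32 x 1 (1024 threads) is rejected even though every dimension is under its own cap, which matches what the question observed. In a real CUDA program the same numbers can be read at runtime from cudaGetDeviceProperties (the maxThreadsPerBlock and maxThreadsDim fields), and checking cudaGetLastError after a launch should show when a launch configuration was rejected.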