Hey folks,
Is there any cap on the maximum number of threads in a multiprocessor?
I know that deviceQuery reports the maximum number of threads in a block.
What I want to know is the maximum number of threads for a multiprocessor, if there is one.
From Appendix A of the programming guide:
Compute capability 1.0:
The maximum number of active threads per multiprocessor is 768.
Compute capability 1.2:
The maximum number of active threads per multiprocessor is 1024.
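For anyone who would rather read these limits from the device than from the appendix, here is a minimal sketch using `cudaGetDeviceProperties`. It assumes a toolkit recent enough that `cudaDeviceProp` has the `maxThreadsPerMultiProcessor` field (older toolkits only expose the per-block limit), and the helper name `queryThreadLimits` is made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reads the per-block and per-multiprocessor thread
// limits of `device`. Assumes cudaDeviceProp has maxThreadsPerMultiProcessor,
// which is only true for newer toolkits.
cudaError_t queryThreadLimits(int device, int *perBlock, int *perSM) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return err;
    }
    *perBlock = prop.maxThreadsPerBlock;
    *perSM = prop.maxThreadsPerMultiProcessor;
    printf("compute capability %d.%d, %d multiprocessors\n",
           prop.major, prop.minor, prop.multiProcessorCount);
    printf("max threads per block:          %d\n", *perBlock);
    printf("max threads per multiprocessor: %d\n", *perSM);
    return cudaSuccess;
}
```

Calling `queryThreadLimits(0, &perBlock, &perSM)` from `main` should print both limits for the first device in the system.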
I'm aware of this. What I mean to ask is: when I define grid size, do I need to take into consideration how many threads there are in the block?
Even if only 1024 active threads are permitted, there can be more threads in one multiprocessor, right?
Even if only 1024 active threads are permitted, there can be more threads in one multiprocessor, right?
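Yes. A multiprocessor can be assigned more than 1024 threads over the lifetime of a kernel, but at most that many are resident (actually occupying SM resources) at any given moment. The remaining blocks wait and are scheduled as resident blocks finish.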
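To put rough numbers on that, here is a small host-side sketch of how many blocks can be resident on one multiprocessor at a time. It only looks at the thread limit and a per-multiprocessor block limit (the programming guide gives 8 resident blocks for compute capability 1.x); register and shared memory usage can lower the result further and are ignored here. The function name `residentBlocksPerSM` is just for illustration.

```cuda
#include <algorithm>

// Upper bound on the number of blocks resident on one multiprocessor,
// considering only the active-thread limit and the resident-block limit.
// Assumption: registers and shared memory are not the limiting factor.
int residentBlocksPerSM(int maxThreadsPerSM, int threadsPerBlock,
                        int maxBlocksPerSM) {
    if (threadsPerBlock <= 0) {
        return 0;  // not a valid block size
    }
    // Blocks are resident as a whole, so round down.
    int byThreads = maxThreadsPerSM / threadsPerBlock;
    return std::min(byThreads, maxBlocksPerSM);
}
```

With the 1024-thread limit and 256 threads per block this gives 4 resident blocks; with the 768-thread limit and 512 threads per block it gives 1, so a third of the thread slots sit unused. Any further blocks in the grid simply wait their turn.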
when I define grid size, do I need to take into consideration how many threads there are in the block?
Yes. Again, the answer is to use ‘deviceQuery’ :)
If the ‘max. number of threads per block’ it reports is 512 and you ask for a block of dim3(512, 512, 1), which is 262,144 threads, your kernel will not get launched!
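A failed launch like this is silent unless you check for it. The sketch below launches a do-nothing kernel with a given block shape and returns the status from `cudaGetLastError`; the kernel `emptyKernel` and the helper `tryLaunch` are hypothetical names used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel; it exists only so there is something to launch.
__global__ void emptyKernel() {}

// Launches emptyKernel on a one-block grid with the given block shape and
// returns the resulting status (launch error, or the result of the sync).
cudaError_t tryLaunch(dim3 block) {
    cudaGetLastError();  // clear any earlier non-sticky error
    emptyKernel<<<1, block>>>();
    cudaError_t err = cudaGetLastError();  // reports launch-time failures
    if (err != cudaSuccess) {
        fprintf(stderr, "launch with %u x %u x %u threads failed: %s\n",
                block.x, block.y, block.z, cudaGetErrorString(err));
        return err;
    }
    return cudaDeviceSynchronize();  // wait for the kernel to finish
}
```

Something like `tryLaunch(dim3(512, 512, 1))` should come back with an error such as `cudaErrorInvalidConfiguration`, whereas a block shape within the per-block limit, for example `dim3(16, 16, 1)`, should launch normally on a working device.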
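The number of threads per block must not exceed the maximum number of threads per block for your CUDA device, which is itself no larger than the maximum number of active threads per multiprocessor. A block is never split across multiprocessors, so when a block launches, all of its threads must be active on one multiprocessor at the same time.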