I have two CUDA streams and two different kernels. To get them to execute in parallel, I reduced the number of threads per block, and now I am seeing some parallel behavior.
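For context, here is a minimal sketch of the kind of setup I mean. It is not my actual code: `kernelA`, `kernelB`, the helper `blocksFor`, the array size, and the block sizes of 512 and 256 are placeholders chosen only to match the numbers used in this question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels (hypothetical); the real ones do different work.
__global__ void kernelA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f;
}

__global__ void kernelB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = y[i] + 1.0f;
}

// Number of blocks needed to cover n elements (ceiling division).
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int n = 4096;        // assumed problem size
    const int threadsA = 512;  // assumed block size for the first kernel
    const int threadsB = 256;  // assumed block size for the second kernel

    float *x = nullptr, *y = nullptr;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // One stream per kernel, so the two launches are allowed to overlap.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    kernelA<<<blocksFor(n, threadsA), threadsA, 0, s1>>>(x, n);
    kernelB<<<blocksFor(n, threadsB), threadsB, 0, s2>>>(y, n);

    // Wait for both streams and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return err == cudaSuccess ? 0 : 1;
}
```

Whether the two kernels actually overlap with these sizes depends on the device, so please treat the numbers as illustrative only.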
My question is about the number of threads per SM. Is it possible for one SM to run 512 threads while another SM runs 256 threads? If the maximum number of threads per SM is 512, am I wasting half of the threads on the second SM, the one running only 256 threads?
Also, the relationship between the number of CUDA kernels and the number of CUDA threads, and the maximum of each, is not clear to me.
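In case it helps frame the question, this is roughly how I understand the per-device limits can be read from the CUDA runtime. It is only a sketch: the helper name `printLimits` is my own, and I am assuming device 0 is the GPU in question. The 512 figure above is my assumption and may differ from what this reports on a given card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the thread-related limits of one device.
// Returns the CUDA error code so the caller can tell whether the query worked.
cudaError_t printLimits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("device name:              %s\n", prop.name);
    printf("number of SMs:            %d\n", prop.multiProcessorCount);
    printf("max threads per block:    %d\n", prop.maxThreadsPerBlock);
    printf("max threads per SM:       %d\n", prop.maxThreadsPerMultiProcessor);
    printf("concurrent kernels (0/1): %d\n", prop.concurrentKernels);
    return cudaSuccess;
}

int main() {
    // Assumption: the GPU of interest is device 0.
    return printLimits(0) == cudaSuccess ? 0 : 1;
}
```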