I’m developing a program using CUDA.
My question is:
What happens if I launch a kernel with more parallel threads (i.e., a larger total grid dimension) than the maximum number of threads my hardware can run concurrently?
I’m using GTX 280 (compute capability 1.3).
Does the hardware internally schedule all the threads of the large launch, even when their number exceeds that maximum?
Or do I have to divide the work into kernels of a reasonable size and launch them several times?
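For context, here is roughly what my launch looks like (a minimal sketch; the kernel name `scale` and the element count are placeholders, not my real code). The total thread count here is far larger than the number of threads a GTX 280 can have resident at once:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: each thread processes one element, with a
// bounds check so surplus threads in the last block do nothing.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 22;           // ~4M elements (placeholder size)
    const int threadsPerBlock = 256; // within the 512-thread per-block limit of CC 1.3
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock; // ceiling division

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // Is a launch this large legal, or must I split it myself?
    scale<<<blocks, threadsPerBlock>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```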