Determining the kernel dimension


I’m developing a program using CUDA.

My question is:

What happens if I launch a kernel with more parallel threads (i.e., a larger dimension size) than the maximum number of threads my hardware supports?

I’m using a GTX 280 (compute capability 1.3).

Does the hardware internally schedule all the threads of such a large launch, even when the count exceeds that maximum?

Or do I have to divide the work into kernels of reasonable size and execute them several times?

That is 65535 * 65535 * 512 threads. That should be enough, as reading 1 bit per thread would already mean you need almost 256 GB of memory on your card.
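The arithmetic behind the 65535 * 65535 * 512 figure can be checked with a few lines of host-side C++. This is only a sketch, assuming the compute capability 1.x launch limits of 65535 blocks in each of two grid dimensions and 512 threads per block. The constant and function names are made up for illustration and are not part of the CUDA API.

```cpp
#include <cstdint>

// Launch limits assumed for compute capability 1.x devices (hypothetical names).
constexpr std::uint64_t kMaxGridDimX = 65535;       // blocks along grid x
constexpr std::uint64_t kMaxGridDimY = 65535;       // blocks along grid y
constexpr std::uint64_t kMaxThreadsPerBlock = 512;  // threads in one block

// Largest number of threads a single kernel launch can address.
constexpr std::uint64_t maxThreadsPerLaunch() {
    return kMaxGridDimX * kMaxGridDimY * kMaxThreadsPerBlock;
}

// Bytes of input needed if every thread reads exactly one bit.
constexpr std::uint64_t bytesAtOneBitPerThread() {
    return maxThreadsPerLaunch() / 8;
}
```

Under these assumptions maxThreadsPerLaunch() comes to 2,198,956,147,200 threads and bytesAtOneBitPerThread() to 274,869,518,400 bytes, just under 256 * 2^30, which matches the "almost 256 GB" estimate.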

Thanks for your reply, Riedijk.

Is it really possible to launch 65535 * 65535 * 512 threads? I thought that was a theoretical number. Am I wrong?

I read in an article that compute capability 1.3 devices can have up to 32 warps running concurrently on each multiprocessor.

So the maximum total number of concurrent threads is:

32 warps/multiprocessor * 32 threads/warp * 30 multiprocessors (on the GTX 280) = 30720 threads.
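The same product written out as host-side C++, in case it helps to plug in numbers for another card. This is a sketch only: the three constants are the figures quoted above for a GTX 280, and the names are made up for illustration.

```cpp
// Figures quoted above for a GTX 280 (hypothetical constant names).
constexpr unsigned kWarpsPerMultiprocessor = 32;  // resident warps per multiprocessor
constexpr unsigned kThreadsPerWarp = 32;          // threads in one warp
constexpr unsigned kMultiprocessors = 30;         // multiprocessors on the card

// Threads that can be resident on the whole device at the same time.
constexpr unsigned maxResidentThreads() {
    return kWarpsPerMultiprocessor * kThreadsPerWarp * kMultiprocessors;
}
```

With these values maxResidentThreads() evaluates to 30720. This counts threads resident at one instant and says nothing yet about how many threads a single launch may request.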

If this is true, do you mean that regardless of this physical limit of 30720, I can launch more than 30720 threads (e.g., 50000)? In other words, even though my hardware can physically run at most 30720 threads at once, will a launch with more than 30720 threads still be scheduled properly internally?

Sure, they can be “internally scheduled,” meaning that N CTAs (blocks) are launched on each SM at a time, and the remaining CTAs are launched on that SM only as resident ones complete.
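To get a rough feel for this, here is a host-side C++ sketch that estimates how many blocks fit on the device at once and how many rounds a launch needs at minimum. It assumes a GTX 280 with 30 SMs, at most 1024 resident threads and 8 resident blocks per SM, and it ignores register and shared-memory usage, which can only lower the resident count. All names are made up for illustration, and real hardware refills SMs block by block rather than in strict rounds.

```cpp
// Limits assumed for a GTX 280, compute capability 1.3 (hypothetical names).
constexpr unsigned kNumSMs = 30;             // multiprocessors on the card
constexpr unsigned kMaxThreadsPerSM = 1024;  // 32 warps * 32 threads
constexpr unsigned kMaxBlocksPerSM = 8;      // resident blocks per SM

// Integer division rounded up.
constexpr unsigned ceilDiv(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

// Blocks required to cover totalThreads with the given block size.
constexpr unsigned blocksNeeded(unsigned totalThreads, unsigned threadsPerBlock) {
    return ceilDiv(totalThreads, threadsPerBlock);
}

// Blocks one SM can hold at once: limited by the thread count and by the
// block count, whichever is smaller. Registers and shared memory are ignored.
constexpr unsigned residentBlocksPerSM(unsigned threadsPerBlock) {
    return (kMaxThreadsPerSM / threadsPerBlock < kMaxBlocksPerSM)
               ? kMaxThreadsPerSM / threadsPerBlock
               : kMaxBlocksPerSM;
}

// Blocks the whole device can hold at once.
constexpr unsigned residentBlocksOnDevice(unsigned threadsPerBlock) {
    return residentBlocksPerSM(threadsPerBlock) * kNumSMs;
}

// Lower bound on the number of rounds needed to run every block.
constexpr unsigned minSchedulingRounds(unsigned totalThreads, unsigned threadsPerBlock) {
    return ceilDiv(blocksNeeded(totalThreads, threadsPerBlock),
                   residentBlocksOnDevice(threadsPerBlock));
}
```

For the 50000-thread example with 256 threads per block, blocksNeeded gives 196 blocks and residentBlocksOnDevice gives 120, so the launch needs at least 2 rounds. A single kernel launch handles this without being split up by hand.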