I launched a kernel in emulation mode with the following configuration:
dim3 grid(32, 24);
dim3 threads(20, 20);
Everything worked fine with this.
When I tried to do the following:
dim3 grid(1, 1);
dim3 threads(640, 480);
It seemed like the kernel didn’t execute at all.
The big problem with this is that I am not even notified by an error or an exception that the kernel didn’t execute.
After the kernel I have this line of code:
CUT_CHECK_ERROR("Kernel execution failed");
But even with this I didn’t get any notification and the program ran as usual.
So how should I decide how many threads to use in each block, especially when I can't tell when the number of threads is too large?
It is odd that CUT_CHECK_ERROR did not report an error. It should have reported "too many resources requested for launch".
It is true that it is not always easy to tell when you are using too many threads. But there is a hard upper limit of 512 threads per block, and 640*480 = 307200 >> 512, so this is your problem. To find out whether the limit for your kernel is below 512, compile to a cubin and look at the number of registers used. You can either enter that number into the occupancy analysis tool or solve N_threads * registers_used <= 8192 for the largest N_threads.
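To make that arithmetic concrete, here is a rough host-side sketch of the check. The two constants are the figures quoted in this thread (512 threads per block, 8192 registers), so treat them as assumptions that may not hold for other hardware. The helper names maxThreadsPerBlock and blockFits are made up for illustration and are not part of CUDA or cutil.

```cpp
#include <algorithm>
#include <cassert>

// Limits as quoted in this thread; assumed, and possibly different on other GPUs.
const int kMaxThreadsPerBlock = 512;   // hard upper limit on threads in one block
const int kRegistersPerBlock  = 8192;  // budget in N_threads * registers_used <= 8192

// Hypothetical helper: largest block size that satisfies both the register
// budget and the hard per-block limit. registersPerThread is the register
// count read from the compiled cubin.
int maxThreadsPerBlock(int registersPerThread) {
    assert(registersPerThread > 0);
    int byRegisters = kRegistersPerBlock / registersPerThread;  // integer division rounds down
    return std::min(byRegisters, kMaxThreadsPerBlock);
}

// Hypothetical helper: does a 2D block of x * y threads fit under that limit?
bool blockFits(int x, int y, int registersPerThread) {
    return x * y <= maxThreadsPerBlock(registersPerThread);
}
```

With a kernel that uses 10 registers per thread, for example, blockFits(20, 20, 10) is true (400 threads), while blockFits(640, 480, 10) is false. At 32 registers per thread the register budget becomes the tighter bound, 8192 / 32 = 256, so even the 20x20 block no longer fits.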