Kernel execution fails: because of memory? (function memory)


I am trying to call a device function from a kernel.
At first I was not able to launch the kernel with a big “block size” and a small “grid size”,
but by reducing the “block size” and increasing the “grid size” I was able to launch the kernel successfully.

Can anyone explain in detail how memory is used by a function called from a kernel?
Does the function use global memory, local memory, etc.? (We haven’t used any shared memory or “register memory”.)

Please also explain whether anything here could make the kernel execution fail.

See Appendix A of the Programming Guide. It lists the maximum and minimum limits for various parameters.

For example, a block cannot have more than 512 threads (on compute capability 1.x devices), and so on.
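
If it helps, here is a minimal sketch of how those limits, and the resources a kernel actually uses, can be checked at run time instead of read from the appendix. The names scale and myKernel are hypothetical stand-ins for the device function and kernel in the question, and device 0 is assumed. One likely (but unconfirmed) explanation for the symptom described: device functions are normally inlined into the kernel, and their local variables live in registers even if you never ask for them, so registers per thread times threads per block can exceed what a block is allowed, and a large block then fails to launch while a smaller one works.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device function, standing in for the one in the question.
__device__ float scale(float x) { return 2.0f * x; }

// Hypothetical kernel that calls the device function.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(data[i]);
}

int main()
{
    // Per-device limits (assumes device 0).
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("registers per block:   %d\n", prop.regsPerBlock);
    printf("shared mem per block:  %zu bytes\n", prop.sharedMemPerBlock);

    // Resources used by this kernel, including any inlined device functions.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, myKernel);
    printf("registers per thread:  %d\n", attr.numRegs);
    printf("local mem per thread:  %zu bytes\n", attr.localSizeBytes);
    printf("static shared memory:  %zu bytes\n", attr.sharedSizeBytes);
    printf("max block size for this kernel: %d\n", attr.maxThreadsPerBlock);

    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Use the largest block size this particular kernel can be launched with.
    int block = attr.maxThreadsPerBlock;
    int grid = (n + block - 1) / block;
    myKernel<<<grid, block>>>(d_data, n);

    // A launch that exceeds a limit is reported here, not as a crash.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));

    // Errors raised while the kernel runs show up at synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("execution failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}
```

Checking cudaGetLastError() right after the launch is the important part: without it, a launch rejected for too many threads or too many registers fails silently. Compiling with nvcc --ptxas-options=-v prints the same register and local memory numbers at build time.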