When I call the kernel function with kernelfun<<<102400,1024>>>, the call is simply skipped and the kernel never runs.
If I instead divide my task into 1024 parts and launch each part with <<<100,1024>>>, it works well.
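A kernel launch that fails does not stop the program on its own; the error only becomes visible if it is queried. Below is a minimal sketch of a checked launch. It assumes a hypothetical `kernelfun(int *out, long long n)` signature, because the real one is not shown. `cudaGetLastError` reports launch-configuration errors such as a grid that is too large for the device, and `cudaDeviceSynchronize` reports errors raised while the kernel runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernelfun; the real signature is not in the post.
__global__ void kernelfun(int *out, long long n)
{
    long long i = (long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1;
}

// Launches kernelfun and reports any error instead of failing silently.
// Returns cudaSuccess only if both the launch and the execution succeeded.
cudaError_t launchChecked(int *d_out, long long n, unsigned int blocks, unsigned int threads)
{
    kernelfun<<<blocks, threads>>>(d_out, n);
    cudaError_t err = cudaGetLastError();   // launch-time errors, e.g. invalid grid size
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel runs
    if (err != cudaSuccess)
        std::fprintf(stderr, "kernelfun<<<%u,%u>>> failed: %s\n",
                     blocks, threads, cudaGetErrorString(err));
    return err;
}
```

Wrapping the 102400-block launch this way should print the reason it was rejected (if it was) instead of leaving the output silently unwritten.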
I used deviceQuery to check my GPUs.
There are two GPUs in the desktop, a GTX780 and a GT610, and I use the GT610 for display.
For GTX780, the maximum grid size is <2147483647,65535,65535>.
For GT610, the maximum grid size is <65535,65535,65535>.
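The same limits can be read from inside the program with `cudaGetDeviceProperties`, which also shows which device the runtime is actually using. The sketch below prints each device's compute capability next to its `maxGridSize[0]`. The helper names are illustrative. One assumption worth checking, since the compile command is not shown: as far as I know, the 2147483647 x-dimension limit only applies to code compiled for compute capability 3.0 or higher. A binary built with an older `-arch` (for example `sm_20`) may therefore still be held to 65535 blocks even on the GTX780, which is compute capability 3.5.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns maxGridSize[0] for the given device, or -1 if the query fails
// (invalid device index, no CUDA device, driver problem).
int maxGridX(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    return prop.maxGridSize[0];
}

// Prints each device with its compute capability and x-dimension grid limit,
// marking the device the runtime currently uses with '*'.
void listDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return;
    int current = -1;
    cudaGetDevice(&current);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess)
            continue;
        std::printf("%c device %d: %s, compute capability %d.%d, maxGridSize[0] = %d\n",
                    d == current ? '*' : ' ', d, prop.name,
                    prop.major, prop.minor, prop.maxGridSize[0]);
    }
}
```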
However, the program should automatically choose the GTX780, so running with more than 65535 blocks should not be a problem.
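Rather than relying on the runtime's default device, the program can pick a device whose x-dimension limit covers the launch and call `cudaSetDevice` before any other CUDA work. This is a sketch with hypothetical helper names. The selection rule (first device whose `maxGridSize[0]` is large enough) is only one possible policy, and the fixed 64-device cap is an assumption made to keep the example simple.

```cuda
#include <cuda_runtime.h>

// Pure helper: index of the first device whose x-grid limit can hold
// blocksNeeded blocks, or -1 if none can.
int chooseDevice(const int *maxGridX, int count, long long blocksNeeded)
{
    for (int d = 0; d < count; ++d)
        if ((long long)maxGridX[d] >= blocksNeeded)
            return d;
    return -1;
}

// Queries every device, picks one via chooseDevice and makes it current.
// Returns the chosen device index, or -1 on failure.
int selectDeviceFor(long long blocksNeeded)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count <= 0)
        return -1;
    if (count > 64) count = 64;            // assumption: at most 64 devices
    int limits[64];
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        limits[d] = (cudaGetDeviceProperties(&prop, d) == cudaSuccess)
                        ? prop.maxGridSize[0] : 0;
    }
    int dev = chooseDevice(limits, count, blocksNeeded);
    if (dev >= 0 && cudaSetDevice(dev) != cudaSuccess)
        return -1;
    return dev;
}
```

On the machine described here, calling `selectDeviceFor(102400)` at the start of `main` should make the GTX780 current, since it is the only device reporting a limit above 65535.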
How can I fix this problem?