Invalid argument error with grid x dimension > 65535

Hi there,
I’m new to CUDA and am trying to write a kernel that needs more than 65535 blocks in the x-dimension. Launching with 65535 blocks works, but going over that limit gives a runtime invalid argument error when I launch the kernel. I’ve tried commenting out the entire kernel body, in case the error came from the kernel code itself, and I get the same result. I’m using a GTX 980 Ti, which reports a maximum grid x-dimension of 2^31 - 1, consistent with compute capability 5.2. What am I doing wrong?

Sample code:

int threadsPerBlock = 128;
dim3 blocks(65535, 1, 1);
processDataKernel<<<blocks, threadsPerBlock>>>(…);

This executes properly.

dim3 blocks(65537, 1, 1);
processDataKernel<<<blocks, threadsPerBlock>>>(…);

This fails with a runtime invalid argument error.

Can anyone help? Thanks!!

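The maximum x-dimension of a grid of thread blocks is 2^31-1 only from compute capability 3.0 onward; below that it is 65535. The limit that applies depends on the architecture the code was compiled for, not just on what the device reports, so you need to compile for CC 3.0 or higher (for example, -arch=sm_52 for your 980 Ti). Some older nvcc versions default to a target below CC 3.0, which would explain the error.
See http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-compilation for details on compilation.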
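
To make the advice above concrete, here is a minimal sketch of how the problem could be reproduced and checked. It is not the original poster's code: emptyKernel is a hypothetical stand-in for processDataKernel, the file name grid_limit.cu is made up, and device 0 is assumed to be the GPU in question. The program prints what the device reports, launches 65537 blocks, and reports any launch error. Built for an architecture below CC 3.0 it should show the invalid argument error, and built with -arch=sm_52 the launch should succeed.

```cuda
// Assumed build command (file name is hypothetical):
//   nvcc -arch=sm_52 grid_limit.cu -o grid_limit
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for processDataKernel; it does no work.
__global__ void emptyKernel() {}

int main() {
    // Query what the device itself supports (device 0 is an assumption).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device: %s (CC %d.%d), max grid x = %d\n",
                prop.name, prop.major, prop.minor, prop.maxGridSize[0]);

    // Same launch configuration as in the question.
    int threadsPerBlock = 128;
    dim3 blocks(65537, 1, 1);
    emptyKernel<<<blocks, threadsPerBlock>>>();

    // Launch configuration errors are reported here, before the kernel runs.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("Kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Wait for the kernel and pick up any error raised during execution.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("Kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("Launch with %u blocks succeeded\n", blocks.x);
    return 0;
}
```

Note that the device query alone does not reveal the problem, since the GPU reports 2^31-1 either way. Checking cudaGetLastError() right after the launch is what surfaces the mismatch with the compile target.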

Thank you for the help! That fixed it!