addKernel Launch Failed: Invalid Argument

Why does a kernel launch fail when I use more than 100 MB of global memory on a compute capability 3.0 device with 2 GB of total global memory available?

I’ve simply modified the default project code for CUDA 7.5 in Visual Studio 2013 to take the number of elements in each of the 3 arrays (a, b, c) as a command-line argument. The kernel launches without error for sizes < 8388000 and fails with an error for size = 8389000.

Did you modify the project settings to compile for a cc3.0 device? (project…properties…CUDA…device)

I don’t happen to have that code in front of me, but if memory serves the default project is set up to compile for a cc2.0 target (limited to 65535 blocks in the x grid dimension), and I suspect that code launches blocks of 128 threads. 128 threads per block * 65535 blocks = 8388480 threads, which falls right between your last working size and your first failing size.

If you compile for a cc3.0 device, the limit on the x grid dimension rises to 2^31-1 (2147483647), so a launch of that size fits easily.
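
To make the arithmetic concrete, here is a rough host-side sketch that checks whether a 1D launch fits under the x grid limit. It assumes one thread per element and 128 threads per block, which is my guess about the default project and not something I have confirmed. The names blocksNeeded, launchFits and the two constants are made up for illustration; the limit values are the ones listed in the programming guide table for cc2.x and cc3.0.

```cpp
#include <cassert>
#include <cstdint>

// Maximum x-dimension of a grid of thread blocks, per the CUDA
// programming guide's per-compute-capability table.
constexpr std::int64_t kMaxGridX_cc2x = 65535;       // compute capability 2.x
constexpr std::int64_t kMaxGridX_cc3x = 2147483647;  // 2^31 - 1, compute capability 3.0

// Blocks needed to cover n elements with one thread per element
// (hypothetical helper, assumes a 1D launch).
std::int64_t blocksNeeded(std::int64_t n, std::int64_t threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
}

// True if that block count does not exceed the given x grid limit.
bool launchFits(std::int64_t n, std::int64_t threadsPerBlock,
                std::int64_t maxGridX) {
    return blocksNeeded(n, threadsPerBlock) <= maxGridX;
}
```

With 128 threads per block, launchFits(8388000, 128, kMaxGridX_cc2x) is true (65532 blocks) and launchFits(8389000, 128, kMaxGridX_cc2x) is false (65540 blocks), which lines up with the sizes you reported. Against kMaxGridX_cc3x both fit.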

Again, I don’t have the code in front of me. This is just conjecture.

You’ll see something like compute_20,sm_20 in the project properties under the device setting. Change that to compute_30,sm_30.

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability

That was it: it works now, thanks.

Chris