Why does a kernel launch fail when I use more than 100 MB of global memory on a compute capability 3.0 device with 2 GB of global memory?
I’ve simply modified the default CUDA 7.5 project code in Visual Studio 2013 so that it takes the number of elements in each of the 3 arrays (a, b, c) as a command-line argument. The kernel launches without error for sizes < 8388000 but fails with an error for size = 8389000.
Did you modify the project settings to compile for a cc3.0 device? (Project > Properties > CUDA C/C++ > Device)
I don’t happen to have that code in front of me, but if memory serves, the default project is set up to compile for a cc2.0 target (limited to 65535 blocks in the x grid dimension), and I suspect that code launches blocks of 128 threads. 128 threads × 65535 blocks = 8388480 elements, which falls between the size that works (8388000) and the size that fails (8389000).
If you compile for a cc3.0 device, the limit on the x grid dimension becomes much larger: 2^31-1.
Again, I don’t have the code in front of me. This is just conjecture.
Under the Device setting in the project properties you’ll see something like compute_20,sm_20. Change that to compute_30,sm_30.
Programming Guide :: CUDA Toolkit Documentation
That was it: it works now, thanks.
Chris