When I run my kernel,
int gridx = 16; int gridy = 16;
int blockx = 16; int blocky = 16;
dim3 dimGrid(gridx, gridy, 1);
dim3 dimBlock(blockx, blocky, 1);
I get no problems, but if I use 32 instead of 16 throughout, I get an error, which I found by calling ‘cudaGetLastError()’ immediately after the kernel call. The cudaError is ‘cudaErrorInvalidConfiguration’, and the string is “invalid configuration argument”.
I can see that I’m getting a much faster runtime using 32, but I’m concerned about this (non-halting) error. I’m not using a lot of threads here. Any thoughts?
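A note on the numbers, offered as a possible explanation rather than a diagnosis: a 16x16 block holds 256 threads, while a 32x32 block holds 1024. Devices of compute capability 1.x allow at most 512 threads per block, so 1024 would be rejected there with exactly this error. When a launch is rejected the kernel does not run at all, which could also account for the much shorter measured runtime. The sketch below shows one way to compare a block shape with the limits the device reports. The helper names `threadsPerBlock` and `blockFitsDevice` are hypothetical and not part of the original code, and device 0 is an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): threads in one block.
static unsigned int threadsPerBlock(dim3 block) {
    return block.x * block.y * block.z;
}

// Hypothetical helper: prints the block size next to the limit reported by
// the device and returns true if the block shape is within the limits.
static bool blockFitsDevice(dim3 block, int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
        return false;
    }

    unsigned int total = threadsPerBlock(block);
    std::printf("block %ux%ux%u = %u threads, device limit = %d\n",
                block.x, block.y, block.z, total, prop.maxThreadsPerBlock);

    // Check the total thread count and each dimension separately.
    return total   <= (unsigned int)prop.maxThreadsPerBlock
        && block.x <= (unsigned int)prop.maxThreadsDim[0]
        && block.y <= (unsigned int)prop.maxThreadsDim[1]
        && block.z <= (unsigned int)prop.maxThreadsDim[2];
}
```

Calling `blockFitsDevice(dimBlock, 0)` before the launch, assuming device 0 is the one in use, would show whether the 32x32 configuration exceeds what the card supports. If it does, a 32x16 block (512 threads) with a correspondingly larger grid may be worth trying.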