Hi,
When I run my kernel,
integrate_kernel<<<dimGrid, dimBlock>>>(foo,bar);
using
int gridx = 16;
int gridy = 16;
int blockx = 16;
int blocky = 16;
dim3 dimGrid(gridx, gridy, 1);
dim3 dimBlock(blockx, blocky, 1);
I get no problems, but if I use 32 instead of 16 throughout, I get an error, which I found by calling ‘cudaGetLastError()’ immediately after the kernel call. The cudaError is ‘cudaErrorInvalidConfiguration’, and the string is “invalid configuration argument”.
I can see that I’m getting a much faster runtime using 32, but I’m concerned about this (non-halting) error. I’m not using a lot of threads here. Any thoughts?
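For reference, here is a rough sketch of the arithmetic behind the two configurations, in case the block size is what the error is about. The helper names (threadsPerBlock, totalThreads, blockFits) are made up for illustration and are not part of CUDA. The per-block limit is passed in as a parameter so the sketch needs no GPU; on a real device it would presumably be read from cudaDeviceProp::maxThreadsPerBlock via cudaGetDeviceProperties. As far as I know that limit is 512 on compute capability 1.x devices and 1024 on 2.0 and later. The sketch only checks the total thread count per block, not the separate per-dimension limits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of threads in one block of size bx x by x bz.
constexpr std::size_t threadsPerBlock(unsigned bx, unsigned by, unsigned bz) {
    return static_cast<std::size_t>(bx) * by * bz;
}

// Hypothetical helper: total threads launched by a 2D grid of gx x gy blocks.
constexpr std::size_t totalThreads(unsigned gx, unsigned gy,
                                   unsigned bx, unsigned by, unsigned bz) {
    return static_cast<std::size_t>(gx) * gy * threadsPerBlock(bx, by, bz);
}

// Hypothetical helper: true if the block's thread count is within the limit.
// maxThreadsPerBlock is an assumption supplied by the caller; on real hardware
// it would come from cudaDeviceProp::maxThreadsPerBlock.
inline bool blockFits(unsigned bx, unsigned by, unsigned bz,
                      std::size_t maxThreadsPerBlock) {
    assert(bx > 0 && by > 0 && bz > 0);  // a zero dimension is never a valid launch
    return threadsPerBlock(bx, by, bz) <= maxThreadsPerBlock;
}
```

With 16 throughout, threadsPerBlock(16, 16, 1) is 256 and totalThreads(16, 16, 16, 16, 1) is 65,536. With 32 throughout, the block holds 1024 threads and the launch asks for 1,048,576 threads in total, so blockFits(32, 32, 1, 512) is false while blockFits(32, 32, 1, 1024) is true. If the launch is rejected, the kernel may not be running at all, which might also explain the shorter runtime.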