Hi,
I have a kernel that works perfectly on a GTX285 but returns "unspecified launch failure" on a GTX480 when compiled with nvcc 3.0 or nvcc 3.1 beta. I found a workaround, but I don't understand why it fixes the problem, given that the original code works perfectly on the GTX285.
The kernel fails when I copy a matrix that has a negative stride into a new, C-contiguous matrix. If the strides are positive, it works correctly.
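To make the failing case concrete, here is a host-side C sketch of the same index decomposition the kernel performs, with `i_d` declared as a signed `int`. The function name `copy_strided` is hypothetical, and the 2x3 matrix read with its rows reversed (start pointer at the last row, row stride -3 elements) is an illustrative example, not the data from the attached file.

```c
/* Host-side reference for the strided copy (hypothetical helper, not part
 * of cuda_error.cu). Strides are in elements, as in the kernel's a_str and
 * z_str, and may be negative. dim[] entries are assumed to be positive. */
static void copy_strided(unsigned int nd, const int *dim,
                         const float *a_data, const int *a_str,
                         float *z_data, const int *z_str)
{
    /* Total number of elements = product of the dimensions. */
    unsigned int numEls = 1;
    for (unsigned int d = 0; d < nd; ++d)
        numEls *= (unsigned int)dim[d];

    for (unsigned int i = 0; i < numEls; ++i) {
        unsigned int ii = i;
        const float *a_i = a_data;
        float *z_i = z_data;
        /* Peel off one index per dimension, last (fastest varying) first. */
        for (unsigned int _d = 0; _d < nd; ++_d) {
            unsigned int d = nd - _d - 1;
            /* Signed index, so i_d * stride is an int product and may be negative. */
            int i_d = (int)(ii % (unsigned int)dim[d]);
            ii /= (unsigned int)dim[d];
            a_i += i_d * a_str[d];
            z_i += i_d * z_str[d];
        }
        z_i[0] = a_i[0];
    }
}
```

Called on `storage = {0, 1, 2, 3, 4, 5}` with `a_data = storage + 3`, `dim = {2, 3}`, `a_str = {-3, 1}` and `z_str = {3, 1}`, it fills `z` with `{3, 4, 5, 0, 1, 2}`: the two rows in reverse order, stored contiguously.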
Here is the code of the kernel.
```cuda
__global__ void k_elemwise_unary_rowmajor_copy2 (unsigned int numEls,
                                                 unsigned int nd,
                                                 const int * dim,
                                                 const float * a_data, const int * a_str,
                                                 float * z_data, const int * z_str)
{
    const unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int numThreads = blockDim.x * gridDim.x;
    for (unsigned int i = idx; i < numEls; i += numThreads)
    {
        unsigned int ii = i;
        const float * a_i = a_data;
        float * z_i = z_data;
        for (unsigned int _d = 0; _d < nd; ++_d)
        {
            unsigned int d = nd - _d-1;
            unsigned int i_d = ii % dim[d]; /* BUG REMOVE THE unsigned TO FIX THE BUG */
            ii = ii / dim[d];
            a_i += i_d * a_str[d]; /* increment our a and z pointers by i_d elements */
            z_i += i_d * z_str[d];
        }
        z_i[0] = a_i[0];
    }
}
```
N.B. This version of the code is meant to be generic (it works for any number of dimensions), not fast.
If I change the line

```cuda
unsigned int i_d = ii % dim[d]; /* BUG REMOVE THE unsigned TO FIX THE BUG */
```

to

```cuda
int i_d = ii % dim[d]; /* BUG REMOVE THE unsigned TO FIX THE BUG */
```

the bug goes away.
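A possible lead, offered as an assumption rather than a confirmed diagnosis: with `unsigned int i_d`, the product `i_d * a_str[d]` is evaluated in `unsigned int`, because the `int` stride is converted to `unsigned int` under C's usual arithmetic conversions. A stride of -3 with `i_d == 1` therefore yields 4294967293 instead of -3. If device addresses are effectively 32 bits wide on the GTX285 (compute capability 1.3), adding that value wraps around to the intended address; with the wider addresses of the Fermi-based GTX480 it would instead land roughly 16 GiB past the buffer, which would be consistent with an unspecified launch failure. The host-side C sketch below models only that arithmetic; `offset_unsigned`, `offset_signed`, `addr32`, `addr64` and the base address 0x1000 are hypothetical.

```c
#include <stdint.h>

/* What the kernel computes when i_d is unsigned int: the int stride is
 * converted to unsigned int, so a negative stride becomes a huge value. */
static unsigned int offset_unsigned(unsigned int i_d, int stride)
{
    return i_d * stride;
}

/* The workaround: both operands are int, so the product stays negative. */
static int offset_signed(int i_d, int stride)
{
    return i_d * stride;
}

/* Byte address after "a_i += off" for a float pointer, when the address
 * arithmetic is truncated to 32 bits (wraps modulo 2^32). */
static uint32_t addr32(uint32_t base, unsigned int off)
{
    return base + off * (uint32_t)sizeof(float);
}

/* Same update when the address is kept at 64 bits: no wrap-around. */
static uint64_t addr64(uint64_t base, unsigned int off)
{
    return base + (uint64_t)off * sizeof(float);
}
```

For example, `offset_unsigned(1u, -3)` is 4294967293 while `offset_signed(1, -3)` is -3. Feeding the unsigned offset to `addr32(0x1000u, ...)` gives 0x1000 - 12 (three floats back, the intended element), whereas `addr64(0x1000u, ...)` gives 0x1000 + 17179869172.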
It is important that I understand this bug, because I have another bug, also related to negative strides, for which I have not been able to find a workaround.
I have attached a full example that exhibits the problem on the GTX480.
Does anyone have a clue why this fails?
Thanks.
Edit note: made the post clearer.
cuda_error.cu (3.37 KB)