This is the kernel for my histogram. In EmuDebug mode it works fine, but in Release mode it produces wrong results. In this code, a is the input matrix and b is the output vector; *e is the value of each cell of the matrix. I think there is a synchronization problem.
__global__ void test(int *a, int *b, size_t pitch){
    int iy = blockDim.y * blockIdx.y + threadIdx.y; // line
    int ix = blockDim.x * blockIdx.x + threadIdx.x; // col.
    if (ix >= C || iy >= L)
        return;
    int *e = (int *)((char *)a + iy * pitch) + ix;
    b[*e] += 1;
    __syncthreads();
}
…
…
…
int bx = (C + BLOCK_SIZE-1) / BLOCK_SIZE;
int by = (L + BLOCK_SIZE-1) / BLOCK_SIZE;
dim3 blocks(bx, by);
dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
test<<< blocks, threads >>>(a, b, pitch);
cudaThreadSynchronize();
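One possible explanation, offered as a guess rather than a confirmed diagnosis: b[*e] += 1 is a read-modify-write, so when several threads hit the same bin at the same time on real hardware, some increments can be lost. __syncthreads() only synchronizes the threads of one block and does not make that update atomic. In emulation the threads do not truly run in parallel, which may be why the problem does not show up there. Below is a minimal sketch that replaces the increment with atomicAdd, assuming a device that supports integer atomics on global memory (compute capability 1.1 or later). The kernel name histAtomic, the values of C, L, BLOCK_SIZE and BINS, and the test data are all hypothetical, since the post does not give them. It also uses cudaDeviceSynchronize, the newer name for cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes for illustration; the post does not state them.
#define C 64           // columns
#define L 48           // lines (rows)
#define BLOCK_SIZE 16
#define BINS 256       // assumed number of histogram bins

__global__ void histAtomic(const int *a, int *b, size_t pitch) {
    int iy = blockDim.y * blockIdx.y + threadIdx.y; // line
    int ix = blockDim.x * blockIdx.x + threadIdx.x; // column
    if (ix >= C || iy >= L)
        return;
    // Step to row iy in bytes (pitch), then to column ix in ints.
    const int *e = (const int *)((const char *)a + iy * pitch) + ix;
    // Atomic read-modify-write: concurrent hits on one bin are serialized.
    atomicAdd(&b[*e], 1);
}

int main() {
    static int h_a[L][C];
    int h_b[BINS];
    // Made-up input: every value stays inside [0, BINS).
    for (int y = 0; y < L; ++y)
        for (int x = 0; x < C; ++x)
            h_a[y][x] = (x + y) % BINS;

    int *d_a, *d_b;
    size_t pitch;
    cudaMallocPitch((void **)&d_a, &pitch, C * sizeof(int), L);
    cudaMalloc((void **)&d_b, BINS * sizeof(int));
    cudaMemcpy2D(d_a, pitch, h_a, C * sizeof(int),
                 C * sizeof(int), L, cudaMemcpyHostToDevice);
    cudaMemset(d_b, 0, BINS * sizeof(int)); // bins must start at zero

    dim3 blocks((C + BLOCK_SIZE - 1) / BLOCK_SIZE,
                (L + BLOCK_SIZE - 1) / BLOCK_SIZE);
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    histAtomic<<<blocks, threads>>>(d_a, d_b, pitch);
    cudaDeviceSynchronize();

    cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost);
    // Sanity check: the bin counts should add up to the number of cells.
    long total = 0;
    for (int i = 0; i < BINS; ++i)
        total += h_b[i];
    printf("sum of bins = %ld, cells = %d\n", total, C * L);

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

If the sum of the bins equals the number of cells with atomicAdd but not with the plain increment, that would point to lost updates rather than to a missing barrier. Atomics on global memory can be slow when many threads target the same bin; per-block partial histograms in shared memory are a common refinement, but that goes beyond this sketch.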
Thanks in advance.
Sidney Lima
sidney@sidneylima.com
www.sidneylima.com