This is my histogram kernel. In the EmuDebug (device emulation) build it works well, but in the Release build it does not. In this code, `a` is the input matrix and `b` is the output vector (the histogram bins); `*e` is the value of each cell of the matrix. I think the problem is related to synchronization.

    __global__ void test(int *a, int *b, size_t pitch){
        int iy = blockDim.y * blockIdx.y + threadIdx.y; // row
        int ix = blockDim.x * blockIdx.x + threadIdx.x; // column
        if (ix >= C || iy >= L)
            return;
        int *e = (int*)((char*)a + iy * pitch) + ix;
        b[*e] += 1;
        __syncthreads();
    }

    …
    …
    …

    int bx = (C + BLOCK_SIZE-1) / BLOCK_SIZE;
    int by = (L + BLOCK_SIZE-1) / BLOCK_SIZE;
    dim3 blocks(bx, by);
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    test<<< blocks, threads >>>(a, b, pitch);
    cudaThreadSynchronize();
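A likely cause, offered as a sketch rather than a confirmed diagnosis: `b[*e] += 1` is a read-modify-write, so when several threads find the same value in their cells they can all read the same old count, and increments are lost. Device emulation runs the threads serially on the host, which would explain why EmuDebug hides the problem. `__syncthreads()` cannot fix this, because it only orders threads inside one block, and calling it after some threads have already returned is only allowed when the whole block takes the same branch. The version below uses `atomicAdd` and drops the barrier. It assumes the bins are zeroed before the launch and that cell values are valid bin indices. `histogram_kernel`, `gpu_histogram`, `rows`, `cols` and `num_bins` are hypothetical names standing in for the question's `test`, `L`, `C` and the unstated bin count, and `BLOCK_SIZE` is set to 16 only for illustration. `cudaDeviceSynchronize()` is the current replacement for the deprecated `cudaThreadSynchronize()`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assumption: 16x16 blocks, chosen only for illustration.
#define BLOCK_SIZE 16

// One thread per matrix cell. atomicAdd makes each bin update indivisible,
// so threads that hit the same bin no longer overwrite each other's counts.
// No __syncthreads() is needed: threads share no data through shared memory.
__global__ void histogram_kernel(const int* a, size_t pitch, int rows, int cols,
                                 int* b, int num_bins)
{
    int iy = blockDim.y * blockIdx.y + threadIdx.y;  // row
    int ix = blockDim.x * blockIdx.x + threadIdx.x;  // column
    if (ix >= cols || iy >= rows)
        return;

    // pitch is in bytes, so step to the row through a char pointer.
    const int* row = (const int*)((const char*)a + iy * pitch);
    int value = row[ix];

    // Hypothetical guard (not in the question): skip out-of-range values.
    if (value >= 0 && value < num_bins)
        atomicAdd(&b[value], 1);
}

// Hypothetical host helper: copies a dense rows x cols matrix into a pitched
// device allocation, builds the histogram and copies the bins back.
cudaError_t gpu_histogram(const int* h_matrix, int rows, int cols,
                          int* h_bins, int num_bins)
{
    assert(rows > 0 && cols > 0 && num_bins > 0);
    int* d_a = nullptr;
    int* d_b = nullptr;
    size_t pitch = 0;

    cudaError_t err = cudaMallocPitch((void**)&d_a, &pitch, cols * sizeof(int), rows);
    if (err == cudaSuccess)
        err = cudaMalloc((void**)&d_b, num_bins * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy2D(d_a, pitch, h_matrix, cols * sizeof(int),
                           cols * sizeof(int), rows, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemset(d_b, 0, num_bins * sizeof(int));  // bins start at zero
    if (err == cudaSuccess) {
        dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
        dim3 blocks((cols + BLOCK_SIZE - 1) / BLOCK_SIZE,
                    (rows + BLOCK_SIZE - 1) / BLOCK_SIZE);
        histogram_kernel<<<blocks, threads>>>(d_a, pitch, rows, cols, d_b, num_bins);
        err = cudaGetLastError();       // launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // errors raised while the kernel runs
    if (err == cudaSuccess)
        err = cudaMemcpy(h_bins, d_b, num_bins * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_a);  // cudaFree(nullptr) is a no-op
    cudaFree(d_b);
    return err;
}
```

Global-memory atomics serialize when many threads hit a few popular bins; a common refinement is to accumulate a per-block histogram in shared memory and merge it into `b` with one `atomicAdd` per bin, but the plain version is enough to make the Release results match the emulation build.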

Thanks in advance.

Sidney Lima

sidney@sidneylima.com

www.sidneylima.com