Hi,
I have a problem with a kernel function. I have 1024 threads, each of which has to increment an element of a matrix Cmat.
Here is my kernel (it computes a co-occurrence matrix):
[codebox]__global__ void kernel_cooc(unsigned char* Image, int M, int N, float* Cmat, unsigned char nbNiveaux, int Tx, int Ty, float increment)
{   // for a 32*32 image: 2 * 2 blocks of 16 * 16 threads.
    unsigned int i = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
    unsigned int j = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;
    if ((i < (M - Ty)) && (j < (N - Tx)))
    {
        // pixel (i, j) and its neighbour shifted by (Ty, Tx)
        unsigned int index_a = i + __umul24(j, M);
        unsigned int index_b = i + Ty + __umul24((j + Tx), M);
        // grey levels of the two pixels select the cell of Cmat to update
        unsigned int index_cmat = Image[index_a] - 1 + __umul24(Image[index_b] - 1, nbNiveaux);
        Cmat[index_cmat] += increment;
    }
}
[/codebox]
The problem is that several threads may need to update the same element of Cmat at the same time. In my tests, elements of Cmat are increased by only 1 or 2 when they should be increased by 10, for example.
How can I manage concurrent access to my matrix?
Thanks
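
For reference, one common way to avoid losing updates like this is to make the read-modify-write on Cmat atomic. The sketch below is only an illustration under stated assumptions, not a definitive fix: it assumes a device of compute capability 2.0 or higher (where `atomicAdd` on a `float` in global memory is available), grey levels in the range 1..nbNiveaux, and non-negative offsets Tx and Ty. The names `kernel_cooc_atomic` and `run_cooc` are hypothetical and do not come from the post, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Same indexing as the kernel in the post, but the update of Cmat goes through
// atomicAdd so that concurrent increments of the same cell are not lost.
// Assumption: compute capability >= 2.0 (float atomicAdd in global memory).
__global__ void kernel_cooc_atomic(const unsigned char* Image, int M, int N,
                                   float* Cmat, unsigned char nbNiveaux,
                                   int Tx, int Ty, float increment)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < (unsigned int)(M - Ty) && j < (unsigned int)(N - Tx))
    {
        unsigned int index_a = i + j * M;
        unsigned int index_b = i + Ty + (j + Tx) * M;
        // Assumption: grey levels lie in 1..nbNiveaux, as the "- 1" in the post suggests.
        unsigned int index_cmat = (Image[index_a] - 1) + (Image[index_b] - 1) * nbNiveaux;
        atomicAdd(&Cmat[index_cmat], increment);  // atomic read-modify-write
    }
}

// Hypothetical host helper (not from the post): copies the image to the device,
// runs the kernel with 16 * 16 thread blocks, and returns Cmat as a host vector.
std::vector<float> run_cooc(const std::vector<unsigned char>& image, int M, int N,
                            unsigned char nbNiveaux, int Tx, int Ty, float increment)
{
    const size_t cells = size_t(nbNiveaux) * nbNiveaux;
    unsigned char* dImage = nullptr;
    float* dCmat = nullptr;
    cudaMalloc(&dImage, image.size());
    cudaMalloc(&dCmat, cells * sizeof(float));
    cudaMemcpy(dImage, image.data(), image.size(), cudaMemcpyHostToDevice);
    cudaMemset(dCmat, 0, cells * sizeof(float));  // all-zero bytes == 0.0f

    dim3 block(16, 16);
    dim3 grid((M + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    kernel_cooc_atomic<<<grid, block>>>(dImage, M, N, dCmat, nbNiveaux, Tx, Ty, increment);

    std::vector<float> cmat(cells);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(cmat.data(), dCmat, cells * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dImage);
    cudaFree(dCmat);
    return cmat;
}
```

A simple way to check the behaviour is a constant image: with a 32*32 image filled with grey level 1, Tx = 1, Ty = 0 and an increment of 1, every valid pixel pair hits Cmat[0], so the returned value should equal the number of pairs, 32 * 31. Heavy contention on a few cells serializes the atomic updates, so a per-block partial histogram in shared memory may be worth considering if performance matters.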