Hi, :rolleyes:

I have a problem with a kernel function. I have 1024 threads, each of which has to increment an element of a matrix Cmat.

Here is my kernel (it computes a co-occurrence matrix):

[codebox]__global__ void kernel_cooc(unsigned char* Image, int M, int N, float* Cmat, unsigned char nbNiveaux, int Tx, int Ty, float increment)
{ // for a 32*32 image: 2 * 2 blocks of 16 * 16 threads.
    unsigned int i = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
    unsigned int j = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;
    if ((i < (M - Ty)) && (j < (N - Tx)))
    {
        unsigned int index_a = i + __umul24(j, M);
        unsigned int index_b = i + Ty + __umul24((j + Tx), M);
        unsigned int index_cmat = Image[index_a] - 1 + __umul24(Image[index_b] - 1, nbNiveaux);
        Cmat[index_cmat] += increment;
    }
}
[/codebox]

The problem is that several threads may need to update the same element of the matrix Cmat at the same time. When I test, elements of Cmat increase by only 1 or 2 when they should increase by 10, for example. :blink:

How can I manage concurrent access to my matrix?
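To make the collision concrete, here is a minimal sketch, not the kernel above: every thread adds to one shared cell, once with a plain `+=` and once with `atomicAdd`. The names `add_racy`, `add_atomic` and `run_demo` are hypothetical, invented for this illustration, and the sketch assumes a device that supports `atomicAdd` on `float` (compute capability 2.0 or newer).

```cuda
#include <cuda_runtime.h>

// Hypothetical demo kernel: every thread adds `increment` to the SAME cell,
// the worst case of the collision described in this post.
__global__ void add_racy(float* cell, float increment)
{
    // Plain read-modify-write: concurrent threads can overwrite each other,
    // so some increments may be lost.
    *cell += increment;
}

// Hypothetical demo kernel: same update, done atomically.
__global__ void add_atomic(float* cell, float increment)
{
    // atomicAdd performs the read-modify-write as one indivisible operation.
    // Assumption: the float overload is available (compute capability >= 2.0).
    atomicAdd(cell, increment);
}

// Hypothetical host helper: launches 2 x 2 blocks of 16 x 16 threads
// (1024 threads in total) and returns the final value of the cell.
// Error checking is omitted to keep the sketch short.
float run_demo(bool use_atomic)
{
    float* d_cell = nullptr;
    float h_cell = 0.0f;
    cudaMalloc(&d_cell, sizeof(float));
    cudaMemcpy(d_cell, &h_cell, sizeof(float), cudaMemcpyHostToDevice);

    dim3 grid(2, 2), block(16, 16);
    if (use_atomic)
        add_atomic<<<grid, block>>>(d_cell, 1.0f);
    else
        add_racy<<<grid, block>>>(d_cell, 1.0f);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_cell, d_cell, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_cell);
    return h_cell;
}
```

Under these assumptions, `run_demo(true)` should return 1024, while `run_demo(false)` may return less because of lost updates. In `kernel_cooc` the analogous change would be `atomicAdd(&Cmat[index_cmat], increment);` in place of the `+=` line, at the cost of serializing threads that hit the same element.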

Thanks