I have a float *A that holds a matrix of width wA, pitch pA and height hA.
I try to store parts of A in shared memory shared_A.
In the example below, I try to store the part [lig_A,lig_A+BLOCK_SIZE][col_A,col_A+BLOCK_SIZE].
I also use a thread block of size (BLOCK_SIZE,1,1).
What I don’t understand is that the following makes my GeForce crash:
tmp = A[(lig_A+l)*pA + col_A+tx]; shared_A[tx][l] = tmp;
But when I use the following line instead, it works:
tmp = A[(lig_A+l)*pA + col_A+tx]; shared_A[tx][l] = 1.0;//Just an example
I don’t understand why I can store 1.0 in shared memory but not tmp, which is a float too…
It’s driving me crazy!
Maybe I do something wrong but I don’t understand what…
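For context, here is a minimal sketch of the kind of tile load described above, written as a small standalone program so it can be checked. It is not my actual kernel. The names load_tile, count_tile_mismatches and out are made up for illustration, and BLOCK_SIZE is set to 16 as an example. It also assumes pA is counted in float elements. If pA comes straight from cudaMallocPitch it is in bytes and has to be divided by sizeof(float) before being used as an index. The sketch also guards every read against the matrix bounds.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16  // example value, not taken from the original post

// Loads the BLOCK_SIZE x BLOCK_SIZE tile of A starting at (lig_A, col_A)
// into shared memory, then copies it to `out` so the host can inspect it.
// ASSUMPTION: pA is the pitch in float elements, not in bytes.
__global__ void load_tile(const float *A, int pA, int wA, int hA,
                          int lig_A, int col_A, float *out)
{
    __shared__ float shared_A[BLOCK_SIZE][BLOCK_SIZE];
    int tx = threadIdx.x;

    for (int l = 0; l < BLOCK_SIZE; ++l) {
        int row = lig_A + l;
        int col = col_A + tx;
        // Only read inside the matrix; pad with 0 outside of it.
        float tmp = (row < hA && col < wA) ? A[row * pA + col] : 0.0f;
        shared_A[tx][l] = tmp;  // same [tx][l] layout as in the question
    }
    __syncthreads();

    for (int l = 0; l < BLOCK_SIZE; ++l)
        out[l * BLOCK_SIZE + tx] = shared_A[tx][l];
}

// Hypothetical host helper: runs the kernel on a test matrix and returns
// the number of tile elements that differ from the host copy (-1 on error).
int count_tile_mismatches(int wA, int hA, int lig_A, int col_A)
{
    std::vector<float> h_A(wA * hA);
    for (int i = 0; i < wA * hA; ++i) h_A[i] = (float)i;

    float *d_A = nullptr, *d_out = nullptr;
    size_t pitch_bytes = 0;
    if (cudaMallocPitch((void **)&d_A, &pitch_bytes,
                        wA * sizeof(float), hA) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_out,
                   BLOCK_SIZE * BLOCK_SIZE * sizeof(float)) != cudaSuccess) {
        cudaFree(d_A);
        return -1;
    }
    cudaMemcpy2D(d_A, pitch_bytes, h_A.data(), wA * sizeof(float),
                 wA * sizeof(float), hA, cudaMemcpyHostToDevice);

    // cudaMallocPitch reports the pitch in bytes: convert it to elements.
    int pA = (int)(pitch_bytes / sizeof(float));
    load_tile<<<1, BLOCK_SIZE>>>(d_A, pA, wA, hA, lig_A, col_A, d_out);

    std::vector<float> h_out(BLOCK_SIZE * BLOCK_SIZE);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out,
                                 h_out.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_A);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int l = 0; l < BLOCK_SIZE; ++l)
        for (int t = 0; t < BLOCK_SIZE; ++t) {
            int row = lig_A + l, col = col_A + t;
            float expected = (row < hA && col < wA) ? h_A[row * wA + col] : 0.0f;
            if (h_out[l * BLOCK_SIZE + t] != expected) ++mismatches;
        }
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", count_tile_mismatches(64, 64, 16, 32));
    return 0;
}
```

One thing I am not sure about in my own case: when tmp is never used (the version that stores 1.0), the compiler is allowed to drop the read of A completely, so an out-of-range index would only show up in the version that really stores tmp.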
Thanks for your help!