Hi. I wrote a matrix multiplication for A*A^T. When I compared the actual result to the expected result, I saw that some values along the diagonal are incorrect.

RTX A4000, CUDA 11.4, Ubuntu 10.04.6

The matrix is 5x4 (5 rows, 4 columns). The values are cuFloatComplex with magnitudes around 10^15.

Launch configuration: 5x5 threads per block, 1x1 blocks.

The thread I am looking at is threadIdx.x = 2, threadIdx.y = 2.

For this example I multiply only the first element of the row, so each sum reduces to:

z * Conj(z)

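    acc_sum1.x = 0;
    acc_sum1.y = 0;
    acc_sum2.x = 0;
    acc_sum2.y = 0;
    for (int k = 0; k < 1; k++)
    {
        // First line: indices computed from the thread coordinates
        acc_sum1 = cuCaddf(acc_sum1, cuCmulf(matrix[k + threadIdx.y * 4], cuConjf(matrix[k + threadIdx.x * 4])));
        // Second line: the same element with a hard-coded index (2 * 4 = 8)
        acc_sum2 = cuCaddf(acc_sum2, cuCmulf(matrix[8], cuConjf(matrix[8])));
    }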

The value of the first element of row 2 (matrix[8]) is:

Real -3647191470047232, Imag 1640025074696192

The result of the first line (acc_sum1), as real, imaginary:

15991687666823789327821742014464, 230493328208896511180800

The result of the second line (acc_sum2), which is the correct result:

15991687666823789327821742014464, 0

Can someone tell me why the results are different?
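To narrow this down, here is a small host-side sketch of two ways the imaginary part of z * conj(z) can be evaluated. It assumes the imaginary part is computed as a.x*b.y + a.y*b.x, which is how cuCmulf is written in cuComplex.h. Complex32, conj32, imag_separate and imag_fused are hypothetical helper names for illustration, not CUDA API. Whether the compiler actually fuses one of the products into an FMA in my kernel is an assumption, not something I have confirmed.

```cpp
#include <cmath>

// Hypothetical stand-in for cuFloatComplex (x = real part, y = imaginary part).
struct Complex32 { float x; float y; };

// Complex conjugate, mirroring what cuConjf does.
Complex32 conj32(Complex32 a) { return {a.x, -a.y}; }

// Imaginary part of a * b with BOTH products rounded to float before the add.
// The volatile temporaries keep the host compiler from fusing the operations.
float imag_separate(Complex32 a, Complex32 b) {
    volatile float p1 = a.x * b.y;  // rounded product
    volatile float p2 = a.y * b.x;  // rounded product
    return p1 + p2;
}

// Same expression, but the second product is fused into the add (assumed
// contraction): a.y * b.x stays exact inside the FMA and only the final
// result is rounded.
float imag_fused(Complex32 a, Complex32 b) {
    volatile float p1 = a.x * b.y;  // rounded product
    return std::fma(a.y, b.x, p1);
}
```

Calling both helpers with z = {-3647191470047232.0f, 1640025074696192.0f} and conj32(z), the separate version returns 0, while the fused version returns the rounding error of the first product, 6397475 * 2^55 = 230493328208896511180800, which is the imaginary value I see in the first line. If contraction is the cause, compiling with nvcc --fmad=false, or writing the products with __fmul_rn and __fadd_rn, should make both lines agree.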