Hi,
please consider this section of the kernel function I'm currently writing:
[codebox]
__global__ void processL (const double *x, double *y, const double *v, const int r, const int c)
{
    extern __shared__ double s_y[];   // dynamically sized shared-memory buffer
    double tmp = 0.0;                 // accumulator: must start at zero before the += below
    int y_index, v_index;
    ....
    if (column_counter > 0)
    {
        tmp += s_y [ y_index-1] * v [ v_index ];
        v_index++;
    }
    s_y [ y_index ] = x [ y_index ] - tmp;
    ....
}
[/codebox]
When I check the results of processL, the local variable tmp never seems to change its value, as if the
multiplication of the elements of the s_y and v arrays failed. Indeed, stepping through the code with
cuda-gdb, I get the following:
[codebox]
(cuda-gdb)
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
processL () at /home/user/trests.cu:102
102 if (column_counter > 0)
(cuda-gdb) print tmp
$1 = 0
(cuda-gdb) next
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
processL () at /home/user/trests.cu:104
104 tmp += s_y [ y_index - 1 ] * v [ v_index ] ;
(cuda-gdb) print s_y [ y_index - 1 ] * v [ v_index ]
$2 = 9.4772098437180056e-15
(cuda-gdb) next
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
processL () at /home/user/trests.cu:105
105 v_index++;
(cuda-gdb) print tmp
$3 = 0
(cuda-gdb) print s_y [ y_index - 1 ]
$4 = -1.1546319456101628e-14
(cuda-gdb) print v [ v_index ]
$5 = -0.82079920616693092
(cuda-gdb) next
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
processL () at /home/user/trests.cu:108
108 s_y [ y_index ] = x [ y_index ] - tmp;
(cuda-gdb)
[/codebox]
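One way to sanity-check the numbers in this session is to redo the multiplication on the host, using the values cuda-gdb printed for $4 and $5. The sketch below is plain host C++ (it also compiles as host code in a .cu file). The helper name debugger_product is hypothetical, and the sketch assumes that the 17 significant digits cuda-gdb printed are enough to reproduce each double exactly.

[codebox]
#include <cassert>
#include <cmath>
#include <cstdio>

// Values copied from the cuda-gdb session above.
const double s_y_prev = -1.1546319456101628e-14; // $4: s_y[y_index - 1]
const double v_cur    = -0.82079920616693092;    // $5: v[v_index]

// Hypothetical helper: recompute the product the kernel adds to tmp.
double debugger_product() {
    return s_y_prev * v_cur;
}

int main() {
    const double p = debugger_product();
    const double expected = 9.4772098437180056e-15; // $2 as printed by cuda-gdb
    const double rel = std::fabs(p - expected) / expected;
    // The host product should agree with $2 to (nearly) full double precision.
    assert(rel < 1e-10);
    std::printf("product = %.17g (relative diff from $2: %g)\n", p, rel);
    return 0;
}
[/codebox]

If the recomputed product matches $2, the multiplication itself appears to be fine, which would point at the += into tmp, or at how its value is being reported, rather than at the operands.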
So this is what happens: tmp is zero both before and after the accumulation statement. Why?
Is 9.4e-15 simply rounded to zero, or is there another reason?
I'm running this code on a GeForce GTX 295, and the kernel is compiled with the option -arch=sm_13.
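On the rounding question: 9.4e-15 is a normal number in both double and single precision (the smallest normal values, DBL_MIN and FLT_MIN, are about 2.2e-308 and 1.2e-38), so adding it to 0.0 should not round to zero. The host-side sketch below checks the single accumulation step shown in the kernel. The float variant only illustrates what the arithmetic would look like if doubles were demoted to single precision, as older nvcc versions did when not targeting sm_13. It is an assumption, not a model of this sm_13 build, and both helper names are hypothetical.

[codebox]
#include <cassert>
#include <cfloat>
#include <cstdio>

// Hypothetical helper: one accumulation step in double precision,
// starting from tmp = 0 and adding the product cuda-gdb printed as $2.
double accumulate_double() {
    double tmp = 0.0;
    tmp += 9.4772098437180056e-15;
    return tmp;
}

// Hypothetical helper: the same step in single precision, illustrating
// double-to-float demotion (an assumption; not what an sm_13 build does).
float accumulate_float() {
    float tmp = 0.0f;
    tmp += 9.4772098437180056e-15f;
    return tmp;
}

int main() {
    const double d = accumulate_double();
    const float f = accumulate_float();
    // 9.4e-15 is far above the smallest normal value of either type,
    // so neither addition rounds (or flushes) it to zero.
    assert(d > DBL_MIN);
    assert(f > FLT_MIN);
    std::printf("double tmp = %.17g (DBL_MIN = %g)\n", d, DBL_MIN);
    std::printf("float  tmp = %.9g (FLT_MIN = %g)\n", static_cast<double>(f), static_cast<double>(FLT_MIN));
    return 0;
}
[/codebox]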
Thank you in advance.