Different CUDA kernel results

“The difference is only one line: //sum = 0;”

How do you know this?

“But it should not play any role, because there is a condition:
if (y>=(M-1)) sum = 0;”

So why not use the debugger to find out why the condition is ‘violated’?
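If attaching a debugger is inconvenient, a small probe kernel can answer the same question. The original kernel is not shown in this thread, so the following is only a sketch under assumptions: the names `countUnreset` and `countNotReset`, the 2D indexing, and the sizes `M = 4`, `N = 8` are all hypothetical. It replaces the commented-out `sum = 0;` with a sentinel value and counts the threads for which the quoted condition never overwrites it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel under discussion (the real one is not shown).
// Counts the threads whose `sum` is NOT reset by the quoted condition.
__global__ void countUnreset(int M, int N, int *notReset)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= N || y >= M) return;          // assumed bounds check

    const float SENTINEL = -12345.0f;      // marks "never assigned"
    float sum = SENTINEL;                  // takes the place of the commented-out "sum = 0;"

    if (y >= (M - 1)) sum = 0;             // the condition quoted above

    // Any thread that still holds the sentinel was not covered by the condition.
    if (sum == SENTINEL) atomicAdd(notReset, 1);
}

// Host helper: launches the probe and returns the count, or -1 on a CUDA error.
int countNotReset(int M, int N)
{
    int *dCount = nullptr;
    int hCount = -1;
    if (cudaMalloc(&dCount, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(dCount, 0, sizeof(int));

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    countUnreset<<<grid, block>>>(M, N, dCount);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&hCount, dCount, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        hCount = -1;
    cudaFree(dCount);
    return hCount;
}

int main()
{
    const int M = 4, N = 8;                // assumed sizes, for illustration only
    int n = countNotReset(M, N);
    printf("%d of %d threads were not reset by the condition\n", n, M * N);
    return 0;
}
```

With these assumed sizes, only the row `y == 3` satisfies `y >= (M-1)`, so the program prints `24 of 32 threads were not reset by the condition`. If the real kernel behaves the same way, every thread with `y < M-1` depends on the explicit `sum = 0;`, which would explain why commenting it out changes the results.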