I have done the cudaMalloc and the rest of the setup. The kernel should fill the left and top borders of the matrix (it is a 1-D array, but logically I use length as the number of columns of the matrix) with multiples of -2 (0, -2, -4, ...).
For example, if length = 4 and the size of Smatrix = 20, the output should be:
0 -2 -4 -6
-2 0 0 0
-4 0 0 0
-6 0 0 0
-8 0 0 0
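To pin down what "correct" means for any length and size, here is a minimal CPU reference sketch, assuming a row-major layout (row = idx / length, column = idx % length) and that size is a multiple of length. The names border_value and fill_border_reference are hypothetical. The body of the loop is what a single GPU thread would do for its own idx, and it writes the interior zeros explicitly instead of relying on the buffer already being zero, since memory returned by cudaMalloc is not cleared.

```c
#include <assert.h>

/* Hypothetical helper: the value the thread handling element idx
 * should write. Row-major layout: row = idx / length, col = idx % length. */
int border_value(int idx, int length)
{
    int r = idx / length;
    int c = idx % length;
    if (r == 0) return -2 * c;   /* top border: 0, -2, -4, ... */
    if (c == 0) return -2 * r;   /* left border: -2, -4, -6, ... */
    return 0;                    /* interior, written explicitly */
}

/* Hypothetical CPU reference for the whole matrix; each loop iteration
 * corresponds to the work of one GPU thread. */
void fill_border_reference(int *m, int length, int size)
{
    assert(length > 0 && size % length == 0); /* size must be rows * length */
    for (int idx = 0; idx < size; ++idx)
        m[idx] = border_value(idx, length);
}
```

Filling a host array with fill_border_reference(m, 4, 20) reproduces the 5 x 4 example above, and copying the GPU result back with cudaMemcpy and comparing it element by element against this reference shows exactly which indices go wrong for the length = 7, size = 119 case (17 rows of 7).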
It does give the expected output for this input, but when length is 7 and the size of Smatrix is 119, the output is weird, like:
Amazing, man… I really appreciate that you responded so promptly… I will surely go forward with the implementation you suggested… Thank you…
And yes, I guess the problem I am having is mainly that I am not setting the grid and block dimensions properly.
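A frequent grid/block mistake with a 1-D launch is computing the number of blocks with plain integer division, which rounds down and leaves the tail of the array unprocessed (for 119 elements and 32 threads per block, 119 / 32 = 3 blocks = 96 threads, so 23 elements are never written), or rounding up but forgetting the bounds check, so the spare threads in the last block write past the end. The usual fix is ceiling division on the host plus an if (idx < n) guard in the kernel, with idx = blockIdx.x * blockDim.x + threadIdx.x. Below is a minimal sketch that emulates such a launch in plain C so it runs without a GPU; blocks_for and emulate_launch are hypothetical helpers, not part of the CUDA API, and 32 threads per block is just an assumed example value.

```c
#include <assert.h>

/* Ceiling division: smallest number of blocks with
 * blocks * threads_per_block >= n. Plain n / threads_per_block
 * would round down and skip the last n % threads_per_block elements. */
int blocks_for(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* CPU emulation of kernel<<<blocks, threads_per_block>>>(...):
 * counts how many times each element is touched. */
void emulate_launch(int *hits, int n, int threads_per_block)
{
    int blocks = blocks_for(n, threads_per_block);
    for (int b = 0; b < blocks; ++b) {                 /* blockIdx.x  */
        for (int t = 0; t < threads_per_block; ++t) {  /* threadIdx.x */
            int idx = b * threads_per_block + t;       /* global index */
            if (idx < n)   /* guard: the last block may have spare threads */
                hits[idx] += 1;
        }
    }
}
```

With the guard and the rounded-up block count, every one of the 119 elements is touched exactly once; dropping either piece reproduces the two failure modes described above.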
Ah…
At my university I use two different machines, one with a GeForce 8800 GTX and the other with a GTX 260.
It works fine on the 260 but gives the above output on the 8800 GTX. I don't know why this happens.
But as my submission deadline is close, I guess I will switch over to the 260 instead of the 8800, but I am curious why there is such a discrepancy between the two…