I am developing an application in which I adapt the scalar product sample to my needs.
Since all I need is a scalar product, it shouldn't be that difficult: I should only have
to change the size of the buffers allocated in GPU memory (is that right?).
The trouble is that the results I obtain are incorrect and make no sense at all.
The program works in the sense that if, inside the kernel, I assign an arbitrary value to the output variable, main receives that value correctly.
So the problem is not in passing data from the kernel back to main, nor in the
opposite direction (I'm pretty sure about that). The problem is in the calculation of
the scalar product itself.
I didn't change a line of that part, so I don't understand why it doesn't work…
What are the things I should pay attention to?
And what is the purpose of the tree-like reduction loops at the end of the kernel code?
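
To make the last question concrete, this is roughly the pattern I mean. It is my own simplified sketch, not the code from the sample: the names `dotPartial`, `dotOnGpu`, `THREADS` and the fixed count of 32 blocks are made up for illustration, and it assumes the block size is a power of two. Error checking is left out.

```cuda
#include <cuda_runtime.h>

#define THREADS 256  // assumed block size; must be a power of two for the loop below

// Each block writes one partial sum; the host adds the partial sums afterwards.
__global__ void dotPartial(const float *a, const float *b, float *partial, int n)
{
    __shared__ float cache[THREADS];

    // Grid-stride loop: each thread accumulates its own share of the products.
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        sum += a[i] * b[i];
    cache[threadIdx.x] = sum;
    __syncthreads();

    // Tree-like reduction: the number of active threads is halved at each step,
    // so THREADS values are combined in log2(THREADS) steps.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();  // every thread must reach this before the next step
    }

    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];
}

// Hypothetical host wrapper: copies the inputs, launches the kernel,
// and sums the per-block results on the CPU.
float dotOnGpu(const float *hA, const float *hB, int n)
{
    const int blocks = 32;  // assumed grid size
    float *dA, *dB, *dPartial;
    float hPartial[blocks];

    // The sizes used here must match the n passed to the kernel.
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dPartial, blocks * sizeof(float));
    cudaMemcpy(dA, hA, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, n * sizeof(float), cudaMemcpyHostToDevice);

    dotPartial<<<blocks, THREADS>>>(dA, dB, dPartial, n);
    cudaMemcpy(hPartial, dPartial, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    float result = 0.0f;
    for (int i = 0; i < blocks; ++i)
        result += hPartial[i];

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dPartial);
    return result;
}
```

In this sketch the allocation sizes, the copy sizes and the `n` passed to the kernel all have to agree, and the reduction loop only works if the block size is a power of two. I suspect my problem is somewhere among those assumptions, but I am not sure.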
Thanks a lot