I have written a simple program that does nothing but take in an array of 10 floats and return an array of 10 floats. In the kernel I am just trying to store some arbitrary value, or just the thread index; I am still trying to get a feel for CUDA. Unfortunately, when I look at the resulting array I see only two values, and the rest of it is garbage. I am attaching the code snippet; any help in debugging the issue would be much appreciated.
I am currently using CUDA 2.3 on a GeForce 8600 GT on a Windows machine.
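
A minimal sketch of the kind of program described above might look like the following. It is not the attached code: the kernel name `fillIndex`, the single-block launch, and the error handling are assumptions made for illustration. It only shows one way to write each thread's index into a 10-element float array and copy the result back.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // number of floats, as described in the post

// Hypothetical kernel: each thread stores its own global index.
__global__ void fillIndex(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)            // guard in case more threads than elements are launched
        data[i] = (float)i;
}

int main()
{
    float host[N] = {0};
    float *dev = NULL;
    size_t bytes = N * sizeof(float);  // sizes passed to CUDA calls are in bytes

    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the input array to the device.
    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        printf("copy to device failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    // One block of N threads, one thread per element.
    fillIndex<<<1, N>>>(dev, N);
    err = cudaGetLastError();          // catches launch configuration errors
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    // cudaMemcpy waits for the kernel to finish before copying back.
    err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("copy to host failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    for (int i = 0; i < N; ++i)
        printf("%d: %f\n", i, host[i]);

    cudaFree(dev);
    return 0;
}
```

In a sketch like this, the allocation and both copies use the same byte count, `N * sizeof(float)`, and every CUDA call's return value is checked so that a failed allocation, launch, or copy is reported instead of leaving uninitialized values in the output.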