My first program with CUDA: need some help

I have written a simple program that does nothing but take in an array of 10 floats and return an array of 10 floats. In the kernel I am just trying to store some arbitrary value, or just the thread index. I am still trying to get a feel for CUDA. Unfortunately, when I look at the resulting array I see only two values, and the rest of it is garbage. I am attaching the code snippet; any help in debugging the issue would be much appreciated.

I am currently using CUDA 2.3 on an 8600 GT on a Windows machine.

Can anybody help me with this issue? I am still trying to debug the problem. :(

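The transfer size is too small. The size argument to cudaMemcpy is in bytes, not elements. You did this properly for the malloc but not for the transfer. If you passed 10 as the size, only 10 bytes (two and a half floats) get copied, which would explain why only two values come back.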

So change it to:

cudaMemcpy((void*)arrS,(const void*)arr,10*sizeof(float),cudaMemcpyHostToDevice);

launch(arrS);

cudaMemcpy((void*)dst,(const void*)arrS,10*sizeof(float),cudaMemcpyDeviceToHost);
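
For reference, here is a minimal sketch of the whole round trip with the byte count computed once and reused for the allocation and both copies. The kernel name fillWithIndex, the helper roundTrip, the buffer names, and the 1-block launch configuration are placeholders of mine, not taken from your code, so treat it as an illustration rather than a drop-in replacement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // element count used in this thread

// Hypothetical kernel: each thread stores its own index in the array.
__global__ void fillWithIndex(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = (float)i;
}

// Copies an array to the device, runs the kernel, and copies it back.
// Returns the number of mismatched elements, or -1 on a CUDA error.
int roundTrip(void)
{
    float host[N] = {0};
    float result[N] = {0};
    float *dev = NULL;

    // Sizes passed to cudaMalloc and cudaMemcpy are in bytes:
    // N * sizeof(float), not N.
    const size_t bytes = N * sizeof(float);

    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess)
        return -1;

    if (cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return -1;
    }

    // One block of N threads is enough for such a small array.
    fillWithIndex<<<1, N>>>(dev, N);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dev);
        return -1;
    }

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(result, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess)
        return -1;

    // Every element should now hold its own index.
    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        if (result[i] != (float)i)
            ++mismatches;
    return mismatches;
}
```

Calling roundTrip() from main and checking for a return value of 0 is a quick sanity test. Note that a cudaMemcpy with a too-small size does not report an error; it just copies fewer bytes, so checking return codes alone would not have caught the original problem.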

Thank you, SPWorley. That was a really silly mistake on my part, but I was really struggling with this one. Everything else was running fine, just not this.