MemCopy problem with CUDA: can't copy data


I have a problem copying CUDA device memory outside the kernel, and I would appreciate some advice.

I work in debug emulation mode, with these declarations:
__constant__ float * data1;
__device__ float * data2;

First I fill data1 with real data, and then run the kernel, which uses data1 as input and writes into data2 as output.
After the kernel has finished, I try to copy data2 into data1 from the application, similar to ping-pong in Cg-GL.

I use cudaMemcpy() in the application to do this, passing the two pointers above.
The compiler has no problem with it, but when I print the values I see that no actual copy was done.

Is this possible at all?
Or is it maybe a bug in debug emulation mode?

Please advise.
Thanks, Oded

You have to copy all data to and from the device with the cudaMemcpy() function. You cannot access host data on your GPU.
So your kernel cannot access data1.

You should:

fill data1 on the host
allocate a g_data1 variable on the device
cudaMemcpy data1 to g_data1
run your kernel
allocate h_data2 on your host
cudaMemcpy data2 from device to host (h_data2)

Then you will find the right values in h_data2.
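
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the original poster's code: scaleKernel, roundTrip, and the device buffer g_data2 are hypothetical names, the kernel body (doubling each element) is a placeholder, and both device buffers are assumed to be allocated with cudaMalloc().

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: reads g_data1 and writes twice each value to g_data2.
__global__ void scaleKernel(const float *g_data1, float *g_data2, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        g_data2[i] = 2.0f * g_data1[i];
    }
}

// Copies data1 to the device, runs the kernel, and copies the result
// back into h_data2. Returns 0 on success, -1 if any CUDA call fails.
int roundTrip(const float *data1, float *h_data2, int n)
{
    size_t bytes = n * sizeof(float);
    float *g_data1 = NULL;  // device copy of the input
    float *g_data2 = NULL;  // device buffer for the output

    // Allocate device memory, then copy the host input to the device.
    bool ok = cudaMalloc((void **)&g_data1, bytes) == cudaSuccess
           && cudaMalloc((void **)&g_data2, bytes) == cudaSuccess
           && cudaMemcpy(g_data1, data1, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(g_data1, g_data2, n);

        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(h_data2, g_data2, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(g_data1);  // cudaFree(NULL) is a harmless no-op
    cudaFree(g_data2);
    return ok ? 0 : -1;
}
```

Calling roundTrip(data1, h_data2, n) from the application with two host arrays of n floats should leave the kernel's output in h_data2.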

With CUDA, you really don’t have to do it that way anymore. You can read and write directly to the stream.
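
For the ping-pong pattern described in the question, one common way to avoid the host round trip is to keep both buffers in device memory and swap the two pointers between kernel launches. The sketch below assumes both buffers come from cudaMalloc(). stepKernel, pingPong, d_in, and d_out are hypothetical names, and the kernel body (adding 1 to each element) is a placeholder. A cudaMemcpy() with cudaMemcpyDeviceToDevice between the two buffers would also work, but swapping pointers avoids the copy entirely.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: one update step, reading src and writing dst.
__global__ void stepKernel(const float *src, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = src[i] + 1.0f;
    }
}

// Runs `steps` kernel passes over h_data, ping-ponging between two
// device buffers. Returns 0 on success, -1 if any CUDA call fails.
int pingPong(float *h_data, int n, int steps)
{
    size_t bytes = n * sizeof(float);
    float *d_in = NULL;
    float *d_out = NULL;

    bool ok = cudaMalloc((void **)&d_in, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_in, h_data, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    for (int s = 0; ok && s < steps; ++s) {
        stepKernel<<<blocks, threads>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess;

        // Swap roles: this pass's output becomes the next pass's input.
        float *tmp = d_in;
        d_in = d_out;
        d_out = tmp;
    }

    // After the final swap, d_in points at the most recent output.
    ok = ok && cudaMemcpy(h_data, d_in, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? 0 : -1;
}
```

The data only crosses the bus twice, once in and once out, however many passes are run.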