I have a problem copying CUDA device memory outside the kernel, and I would love some advice.
I am working in debug emulation mode.
__constant__ float * data1;
__device__ float * data2;
First I fill data1 with real data, and then run the kernel, which reads data1 as input and writes its output into data2.
After the kernel finishes, I try to copy data2 into data1 from the host application, similar to ping-ponging between buffers in Cg/OpenGL.
I do this with cudaMemcpy() in the host code, passing the two pointers declared above.
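A likely issue is that data1 and data2 are declared at file scope with __constant__ and __device__, which makes them device symbols rather than host pointers, so host code should reach them through the symbol-based runtime calls. Also, as `float *` declarations they are only pointers that point nowhere until something assigns them. The sketch below is a minimal illustration, not a definitive fix: it assumes fixed-size arrays instead of pointers, and N, step and ping_pong are hypothetical names. Because kernels cannot write __constant__ memory, the data2 to data1 copy goes through cudaMemcpyToSymbol with a device-to-device kind.

```cuda
#include <cuda_runtime.h>

// Hypothetical element count; __constant__ memory is limited to 64 KB total.
#define N 256

__constant__ float data1[N];  // kernel input (read-only inside kernels)
__device__   float data2[N];  // kernel output (global memory)

// Hypothetical kernel: reads data1, writes data2.
__global__ void step(void)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        data2[i] = data1[i] * 2.0f;
}

// Fills data1 from `init`, runs `iterations` ping-pong steps, then copies the
// final contents of data1 into `result`. Returns the first CUDA error seen.
cudaError_t ping_pong(const float* init, float* result, int iterations)
{
    // Host -> constant memory: address data1 by symbol, not by pointer.
    cudaError_t err = cudaMemcpyToSymbol(data1, init, N * sizeof(float));
    if (err != cudaSuccess) return err;

    // Look up the device address of data2 so it can be used as a copy source.
    float* d2 = NULL;
    err = cudaGetSymbolAddress((void**)&d2, data2);
    if (err != cudaSuccess) return err;

    for (int it = 0; it < iterations; ++it) {
        step<<<(N + 127) / 128, 128>>>();
        if ((err = cudaGetLastError()) != cudaSuccess) return err;

        // data2 (global) -> data1 (constant); ordered after the kernel
        // because both run on the default stream.
        err = cudaMemcpyToSymbol(data1, d2, N * sizeof(float), 0,
                                 cudaMemcpyDeviceToDevice);
        if (err != cudaSuccess) return err;
    }

    // Constant memory -> host, again addressed by symbol.
    return cudaMemcpyFromSymbol(result, data1, N * sizeof(float));
}
```

Usage: call `ping_pong(init, out, 3)` with two host arrays of N floats; with the doubling kernel above, each element of `out` ends up as 8 times its starting value.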
It compiles without complaint, but when I print the values I see that no copy actually happened.
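The compiler most likely accepts the call because, in host code, the names data1 and data2 refer to host-side shadow copies of the variables, so cudaMemcpy receives their host values (here, never-assigned pointers) rather than device addresses. If you would rather keep plain cudaMemcpy, one option is to look up the real device addresses with cudaGetSymbolAddress first. This is a sketch under the same assumptions as before: fixed-size arrays, and N and copy_data2_to_data1 are hypothetical names.

```cuda
#include <cuda_runtime.h>

#define N 256  // hypothetical element count

__constant__ float data1[N];
__device__   float data2[N];

// Copies data2 into data1 with plain cudaMemcpy, using the device addresses
// of both symbols instead of the host-side names themselves.
cudaError_t copy_data2_to_data1(void)
{
    void* d1 = NULL;
    void* d2 = NULL;
    cudaError_t err = cudaGetSymbolAddress(&d1, data1);  // constant symbol
    if (err != cudaSuccess) return err;
    err = cudaGetSymbolAddress(&d2, data2);              // global symbol
    if (err != cudaSuccess) return err;
    return cudaMemcpy(d1, d2, N * sizeof(float), cudaMemcpyDeviceToDevice);
}
```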
Is this possible at all?
Or could it be a bug in debug emulation mode?