About using cudaMemcpy

I compile the CUDA code in EmuRelease mode and call cudaMemcpy to copy the device data to a host array.
That works, and the host data is correct. :)

But when I run the code in Release mode (on the GPU), cudaMemcpy seems to do nothing.
My host array is not updated from the device data: all of the host values are zero.

What is wrong with my CUDA code?
I need some help. Thanks!

Most likely your kernel did not run. Possible causes:

  • an unspecified launch failure
  • you request too many resources (e.g. too many threads per block)
  • you hit the 5-second watchdog timeout (which applies when the GPU also drives a display)
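To rule out the resource cause, one option is to read the device's per-block thread limit at runtime, size the grid from it, and check cudaGetLastError() right after the launch. This is a minimal sketch assuming a 1-D problem; fillKernel and blocksFor are hypothetical names used only for illustration. It only covers the thread-count limit: a kernel that uses many registers or a lot of shared memory can still fail with a resources error at a legal block size.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: writes each element's global index.
__global__ void fillKernel(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;  // guard: the last block may be only partly used
}

// Hypothetical helper: blocks needed to cover n elements (ceiling division).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int n = 1000;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 1;
    }
    // Stay within the per-block thread limit reported by the device.
    int threads = prop.maxThreadsPerBlock < 256 ? prop.maxThreadsPerBlock : 256;
    int blocks = blocksFor(n, threads);

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return 1;
    fillKernel<<<blocks, threads>>>(d_out, n);

    // The <<<...>>> launch has no return value; a bad configuration
    // is reported by the next call to cudaGetLastError().
    cudaError_t err = cudaGetLastError();
    std::printf("%d blocks x %d threads: %s\n", blocks, threads,
                cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```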

You should use CUT_CHECK_ERROR (and compile in debug mode) to check for errors. Also, please post your kernel code and the code where you launch the kernel; with the information you have provided, it is hard to help you.
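CUT_CHECK_ERROR belongs to the cutil helper library that shipped with older SDK samples, not to the CUDA runtime itself. A roughly equivalent check written against the runtime API alone might look like the sketch below; the macro name CHECK_CUDA and the kernel scaleKernel are hypothetical. The two checks after the launch matter: cudaGetLastError() reports launch-configuration errors, while cudaDeviceSynchronize() reports errors that happen while the kernel runs, such as a watchdog timeout or an unspecified launch failure. Older toolkits call the latter cudaThreadSynchronize().

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print file/line and abort if a runtime call failed.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical example kernel: multiplies every element by factor.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 256;
    float *d_data = nullptr;
    CHECK_CUDA(cudaMalloc(&d_data, n * sizeof(float)));
    CHECK_CUDA(cudaMemset(d_data, 0, n * sizeof(float)));

    scaleKernel<<<1, n>>>(d_data, n, 2.0f);
    CHECK_CUDA(cudaGetLastError());       // errors in the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors while the kernel ran

    CHECK_CUDA(cudaFree(d_data));
    std::printf("kernel ran without reported errors\n");
    return 0;
}
```

Because this macro does not depend on a debug define, it stays active in Release builds, which is where the problem in the question shows up.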

Just to make sure: did you double-check the cudaMemcpyDeviceToHost vs. cudaMemcpyHostToDevice parameters?
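For reference, here is a minimal round trip that uses both direction flags, with the destination pointer always first and the source second. The kernel addOne and the helper roundTrip are hypothetical names for illustration. A call whose kind does not match its pointers typically fails and leaves the host array unchanged, so checking each return value, as done here, makes that kind of mistake visible instead of silent.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: adds 1 to every element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: copy host -> device, run the kernel, copy
// device -> host. Returns true if every element came back incremented.
bool roundTrip(int n) {
    int *h = new int[n];
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d = nullptr;
    bool ok = cudaMalloc(&d, n * sizeof(int)) == cudaSuccess;
    // cudaMemcpy(dst, src, bytes, kind): here dst is device, src is host.
    ok = ok && cudaMemcpy(d, h, n * sizeof(int),
                          cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        addOne<<<(n + 255) / 256, 256>>>(d, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // Now dst is host, src is device. On the default stream this copy
    // waits for the kernel to finish before it starts.
    ok = ok && cudaMemcpy(h, d, n * sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == i + 1);

    cudaFree(d);  // cudaFree(nullptr) is a no-op
    delete[] h;
    return ok;
}

int main() {
    std::printf("round trip %s\n", roundTrip(1000) ? "ok" : "FAILED");
    return 0;
}
```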