Hi, I get rather unexpected results when transferring a large data array from the CUDA device to the host PC in RELEASE and DEBUG builds. In both cases I used exactly the same code and parameters.
For example:
Array size: ~4 MB
Transfer time in DEBUG: ~4.5 ms
Transfer time in RELEASE: ~128 ms
Time measurements are taken with CUDA timers around the memory copy function call, with no other code in between. I checked it a couple of times, for different grid sizes and the like, but I always get similar results.
HW is GeForce 8800 GTX.
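For reference, here is a minimal sketch of the kind of measurement described above. It is not the exact code used here. It assumes CUDA events as the timer and pageable host memory from plain malloc. The helper names time_d2h_copy_ms and print_d2h_sweep are hypothetical. The explicit synchronization before the start event is there because kernel launches are asynchronous, so without it any kernel still running would be counted as part of the copy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: times one device-to-host copy of `bytes` bytes.
// Returns the elapsed time in milliseconds, or a negative value on failure.
float time_d2h_copy_ms(size_t bytes)
{
    void *d_buf = NULL;
    void *h_buf = malloc(bytes);   // pageable host memory (an assumption)
    cudaEvent_t start, stop;
    float ms = -1.0f;

    if (h_buf == NULL) return -1.0f;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) { free(h_buf); return -1.0f; }
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Drain any earlier asynchronous work (e.g. a kernel launch) so that
    // it is not measured as part of the copy.
    cudaDeviceSynchronize();

    cudaEventRecord(start, 0);
    cudaError_t err = cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);    // wait until the stop event has completed

    if (err == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    free(h_buf);
    return ms;
}

// Hypothetical helper: prints copy times for sizes from 1 KB to 4 MB,
// which makes any fixed per-call overhead visible at the small end.
void print_d2h_sweep(void)
{
    for (size_t bytes = 1024; bytes <= 4u * 1024 * 1024; bytes *= 4) {
        printf("%8zu bytes: %.3f ms\n", bytes, time_d2h_copy_ms(bytes));
    }
}
```

Calling print_d2h_sweep() from main in both the DEBUG and the RELEASE build would show whether the difference comes from the copy itself or from work that is still pending when the timer starts.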
Do you have any idea what could have gone wrong here?
Also, it looks like there is always some latency of ~0.02-0.03 ms when issuing a memory copy call, and the transfer times do not increase linearly with the size of the transferred data, at least not for relatively small sizes. Is there more detailed information on this topic somewhere?
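One way to picture this is a simple fixed-overhead-plus-bandwidth model. This is an assumption made for illustration, not a documented property of the hardware. The symbols t_0 (per-call latency), B (sustained bandwidth) and n (bytes transferred) are introduced here. The numbers plugged in are the ones quoted in this post, taking the DEBUG figure as representative of the copy itself.

```latex
% Assumed model: fixed per-call overhead plus a bandwidth-limited term
t(n) \approx t_0 + \frac{n}{B}

% Rough values from the figures above
t_0 \approx 0.02\text{--}0.03\ \text{ms}, \qquad
B \approx \frac{4\ \text{MB}}{4.5\ \text{ms}} \approx 0.9\ \text{GB/s}

% Size at which overhead and transfer term are equal
n^{*} = t_0 \, B \approx 0.025\ \text{ms} \times 0.9\ \text{GB/s} \approx 22\ \text{KB}
```

Under this model, copies much smaller than roughly 20 KB would be dominated by the fixed overhead, so the measured time would look almost flat rather than proportional to the size.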