DEBUG vs RELEASE data transfer times: unexpected results

RoofTopG · January 25, 2008, 1:26pm

Hi, I get some rather unexpected results when transferring a large data array from the CUDA device to the PC in RELEASE and DEBUG modes. In both cases I used exactly the same code and parameters.

For example: array size ~4 MB
Transfer time in DEBUG: ~4.5 ms
Transfer time in RELEASE: ~128 ms
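
For scale, the two timings above can be converted into effective transfer rates. This is only a back-of-the-envelope sketch in Python; it assumes that "~4 MB" means 4 * 1024 * 1024 bytes, and the function name is made up for illustration.

```python
# Effective transfer rate implied by each reported timing.
# Assumption: "~4 MB" is taken as 4 * 1024 * 1024 bytes.
ARRAY_BYTES = 4 * 1024 * 1024


def effective_rate_mb_per_s(num_bytes, elapsed_ms):
    """Return the average transfer rate in MB/s (1 MB = 10**6 bytes)."""
    return num_bytes / (elapsed_ms / 1000.0) / 1e6


debug_rate = effective_rate_mb_per_s(ARRAY_BYTES, 4.5)      # DEBUG timing
release_rate = effective_rate_mb_per_s(ARRAY_BYTES, 128.0)  # RELEASE timing

print(f"DEBUG:   {debug_rate:.0f} MB/s")
print(f"RELEASE: {release_rate:.0f} MB/s")
print(f"ratio:   {debug_rate / release_rate:.1f}x")
```

Under that assumption the DEBUG figure works out to roughly 930 MB/s and the RELEASE figure to roughly 33 MB/s, a factor of about 28.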

Time measurements are taken with CUDA timers placed directly around the memory copy function call, with no other code in between. I checked it a couple of times, with different grid sizes and so on, but I always get similar results.

HW is GeForce 8800 GTX.

Do you have any idea what could have gone wrong here?

Also, it looks like there is always a latency of ~0.02-0.03 ms when issuing a memory copy call, and the transfer times do not increase linearly with the size of the transferred data, at least not for relatively small sizes. Is there more detailed information on this topic somewhere?
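
One common way to picture a fixed per-call latency combined with size-dependent transfer time is an affine cost model: time = latency + bytes / bandwidth. The Python sketch below is illustrative only, not a measurement. The latency is assumed to be the midpoint of the range quoted here, the bandwidth is assumed to be what 4 * 1024 * 1024 bytes in 4.5 ms would imply, and all names are hypothetical.

```python
# Simple affine cost model for one copy: time = fixed latency + bytes / bandwidth.
# Assumptions (illustrative, not measured here):
#   LATENCY_MS: midpoint of the ~0.02-0.03 ms per-call latency quoted in the post.
#   BANDWIDTH_BYTES_PER_MS: what 4 * 1024 * 1024 bytes in 4.5 ms would imply.
LATENCY_MS = 0.025
BANDWIDTH_BYTES_PER_MS = 4 * 1024 * 1024 / 4.5


def copy_time_ms(num_bytes):
    """Predicted time of a single copy under the affine model."""
    return LATENCY_MS + num_bytes / BANDWIDTH_BYTES_PER_MS


def latency_share(num_bytes):
    """Fraction of the predicted time spent on the fixed per-call latency."""
    return LATENCY_MS / copy_time_ms(num_bytes)


# Size at which the latency and the payload time are equal under this model.
break_even_bytes = LATENCY_MS * BANDWIDTH_BYTES_PER_MS

for size in (1024, 16 * 1024, 256 * 1024, 4 * 1024 * 1024):
    print(f"{size:>8} B: {copy_time_ms(size):.4f} ms, "
          f"latency share {latency_share(size):.0%}")
print(f"break-even size: {break_even_bytes:.0f} B")
```

With these assumed numbers, a 1 KB copy is almost entirely latency, a 4 MB copy is almost entirely payload, and the two contributions are equal at roughly 23 KB. Below that size the measured time would barely change with the amount of data, which would look like non-proportional scaling.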

AndreiB · January 25, 2008, 1:54pm

DEBUG builds are always slower because they omit optimization and include additional checks, such as stack checks, which can slow down execution considerably. So I guess this is a host code problem.

I suggest you enable the CUDA profiler and check the memcpy timings for the debug and release builds; they should be very close.

As for memcpy performance, yes, there is some overhead in issuing a copy operation; on my machine it is about 15-25 us. I haven’t seen this described in the documentation, so you may want to try searching this forum.

RoofTopG · January 25, 2008, 2:27pm

Sorry, there was a mistake in my post.

The strange thing about the data transfer times I’m getting is that the RELEASE values are much worse than the DEBUG values.

kuisma · January 25, 2008, 2:56pm

Do you run DEBUG in emulation mode? There is quite a big overhead in starting a GPU kernel, and your times indicate a small workload, so maybe the overhead here is much more than the gain…?

AndreiB · January 25, 2008, 3:36pm

Anyway, try the CUDA profiler. It will tell you whether the problem is in the driver or in the host code (I’m almost sure it’s somewhere in your code or nvcc options).

seb · January 26, 2008, 2:46pm

Do you have two CUDA-compatible cards in the system?