The timing for the SDK 1.1 ‘template’ project seems to be badly broken, with the debug version running more than 250× FASTER than the release version.
I have run this on both Linux with CUDA 1.1 and the Mac OS X beta, with similar results.
I haven’t found this reported already in these forums; I apologise in advance if this is a well-worn topic. If someone has an explanation, PLEASE point me at it or explain, because it’s driving me nuts.
Example run
NVIDIA_CUDA_SDK$ for i in 1 2 3 4 5 6 7 8 9 0
> do
> echo -n 'release: '; ./bin/linux/release/template -noprompt
> echo -n 'debug: '; ./bin/linux/debug/template -noprompt
> done
release: Processing time: 34.004002 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
release: Processing time: 34.013000 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
release: Processing time: 34.319000 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
release: Processing time: 34.131001 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
release: Processing time: 34.127998 (ms)
Test PASSED
debug: Processing time: 0.158000 (ms)
Test PASSED
release: Processing time: 34.119999 (ms)
Test PASSED
debug: Processing time: 0.129000 (ms)
Test PASSED
release: Processing time: 34.422001 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
release: Processing time: 34.138000 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
release: Processing time: 34.001999 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
release: Processing time: 34.471001 (ms)
Test PASSED
debug: Processing time: 0.130000 (ms)
Test PASSED
As you can see, the release version consistently runs in about 34 ms, and the debug version in about 0.13 ms, so debug is more than 250× faster than release.
This was run on:
SDK 1.1 compiled with Tools 1.1
Athlon 64 X2 4600+, 4GB RAM, Asus M2N32-SLI Deluxe
GeForce 8800 GTS, 320MB memory, VBIOS 60.80.0d.00.61 - running as the only display
OpenSuse 10.2, with standard, out-of-the-box gcc (GCC) 4.1.2 20061115
NVIDIA Driver Version: 169.09
I have also run release and debug versions on a MacBook Pro, 2.2GHz Core 2 Duo,
with GeForce 8600M GT, and 128MB VRAM
OS X 10.5.2, with the Leopard graphics updates, gcc 4.0.1
While this MacBook Pro configuration isn’t recommended, there is enough VRAM to run the debug version and pass the tests, so I assume it is working correctly. It gives similarly ‘wrong’ results, though slower of course. The details are in my posting in the Mac OS X forum.
I have tried other projects, e.g. matrixMul, which work as I would expect:
NVIDIA_CUDA_SDK$ ./bin/linux/release/matrixMul -noprompt
Processing time: 0.158000 (ms)
Test PASSED
NVIDIA_CUDA_SDK$ ./bin/linux/debug/matrixMul -noprompt
Processing time: 0.179000 (ms)
Test PASSED
Another project that seems to give counter-intuitive results is simpleTemplates, where release is much slower than debug, but only for the float case:
NVIDIA_CUDA_SDK$ ./bin/linux/release/simpleTemplates -noprompt
Processing time: 34.540001 (ms)
Test PASSED
Processing time: 0.080000 (ms)
Test PASSED
NVIDIA_CUDA_SDK$ ./bin/linux/debug/simpleTemplates -noprompt
Processing time: 0.131000 (ms)
Test PASSED
Processing time: 0.100000 (ms)
Test PASSED
I have added direct calls to gettimeofday() on Linux and to the OS X timing counters (mach_absolute_time()), and they confirm that the reported numbers are accurate: release really is much slower than debug.
So, HELP! What is the explanation?