Different behavior on omitting --device-debug

Flatval · February 13, 2017, 1:32pm

Hello!

I have an application that does image processing for computer vision with CUDA; specifically, it is to be used for drone navigation in a known environment. Several things are bugging me, the first being the most important:

When compiled with -G (or --device-debug), the application produces the expected results, although it runs slowly. Conversely, disabling debugging lets the program run in real time, but the result is slightly off and becomes useless over time.

I am compiling with -arch compute_50 -code sm_50. Both the latest (compute/sm_62) and the earliest (compute/sm_20) supported alternatives give an "invalid texture" error. I am using nvcc from the CUDA 8.0 SDK.

Furthermore, when compiled with -Xptxas -g, the code runs slowly and also produces the wrong result, which implies that this option differs from -G, contrary to what is described in the docs: http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options

The program is deterministic, and the observations above are reproducible. Does -G do anything that could produce this behavior, that is, the program running fine with -G and not without it?

By the way, I have run cuda-memcheck on the program both with and without debugging enabled, and neither run reported any errors.

njuffa · February 13, 2017, 10:50pm

(1) cuda-memcheck can find many problems but not all of them. For example, it cannot find all race conditions, and it cannot find accesses that are out of bounds for a particular array, but are within the allocated memory.

(2) Throwing the debugging switch turns off all compiler optimizations. This can change the numerical properties of floating-point computation, for example by turning off the contraction of FMUL and FADD into FMA. This in turn could interfere with convergence criteria, number of loop iterations, etc.

(3) Presumably -G affects the code generation of both the nvvm and ptxas portions of the toolchain, as both nvvm and ptxas are optimizing compilers. So changing the ptxas switch alone is unlikely to result in the same machine code.
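
To make point (2) concrete, here is a minimal host-side C++ sketch (not taken from the application in question) of how contracting a multiply and an add into one FMA changes a result. The input values are picked for illustration only, std::fma stands in for the GPU's FMA instruction, and volatile is used solely to keep the host compiler from fusing the "separate" version on its own.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Multiply, round the product to float, then add (no contraction).
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // volatile keeps the rounded product separate
    return product + c;
}

// One fused multiply-add: the exact product feeds the addition, with a single rounding.
float fused(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Prints both results for inputs picked so that rounding the product matters.
void demo() {
    // a*a = 1 + 2^-11 + 2^-24 exactly, which needs more bits than a float has.
    volatile float a = 1.0f + std::ldexp(1.0f, -12);
    volatile float c = -(1.0f + std::ldexp(1.0f, -11));
    float separate = mul_then_add(a, a, c);
    float contracted = fused(a, a, c);
    std::printf("separate: %g\nfused:    %g\n", separate, contracted);
    assert(separate != contracted);  // the two code paths disagree for these inputs
}
```

In IEEE-754 single precision the separate version returns exactly 0, while the fused version returns 2^-24. A loop that tests a residual against zero would therefore terminate differently in the two builds. If contraction is the suspect, one way to check is to build the optimized version with nvcc's -fmad=false and see whether the results move toward the -G output.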

The most likely cause of your troubles is a latent bug in the code which is exposed at higher optimization levels. This could be due to invoking undefined behavior as defined by C++, or something CUDA-specific, such as violating the rules on the use of synchronization barriers, or using warp-synchronous programming without proper safeguards. A compiler issue is possible, but unlikely.

I would suggest using code instrumentation and the CUDA debugger to get to the bottom of this.
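
As one possible form of such instrumentation, and as an illustration of point (1), here is a minimal host-side C++ sketch. All names in it (CheckedView, run_demo, unchecked_overrun) are hypothetical and not from the application in question. Two logical arrays share one allocation; an off-by-one write past the first array stays inside the allocation, so a checker that tracks allocations has nothing to report, whereas an explicit index check catches it.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical view of a logical sub-array inside a larger allocation.
struct CheckedView {
    float* base;             // start of this sub-array
    std::size_t length;      // number of elements that logically belong to it
    std::size_t violations;  // how many out-of-range accesses were attempted

    // Writes value at index i if i is in range; otherwise records the violation.
    bool store(std::size_t i, float value) {
        if (i >= length) {
            ++violations;
            std::fprintf(stderr, "index %zu out of range (length %zu)\n", i, length);
            return false;
        }
        base[i] = value;
        return true;
    }
};

// Off-by-one loop over the first of two 4-element arrays in one 8-element pool.
// Returns the number of violations the instrumented accessor caught.
std::size_t run_demo() {
    std::vector<float> pool(8, 0.0f);
    CheckedView first{pool.data(), 4, 0};
    for (std::size_t i = 0; i <= 4; ++i) {  // bug: should be i < 4
        first.store(i, 1.0f);
    }
    return first.violations;
}

// Same off-by-one without the check: the write lands in the second array.
float unchecked_overrun() {
    std::vector<float> pool(8, 0.0f);
    float* first = pool.data();       // elements 0..3
    float* second = pool.data() + 4;  // elements 4..7
    for (std::size_t i = 0; i <= 4; ++i) {
        first[i] = 1.0f;
    }
    return second[0];  // silently overwritten by the loop above
}
```

In a kernel, the same kind of check could be written with device-side assert() or printf(); treat that as a suggestion to adapt, not a drop-in fix.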

Flatval · February 16, 2017, 4:48pm

I finally managed to track down the bug. It actually was a matter of race conditions surfacing once the optimizations kicked in.

I believe the main problem here was my own limited knowledge of the cuda-memcheck tool, as running cuda-memcheck --tool racecheck pointed out the crucial spots right away.

Anyway, thank you for your reply!
