How to debug a program that only bugs in release mode? Debug and emu do not show the problem at all

jam11 · April 6, 2010, 6:25pm

Why would this happen? This is the same program compiled three times: release, debug, and emurelease.
Could it be due to rounding? The numbers I can check in the host code do start to differ slightly between the various modes as the program runs.

_Big_Mac · April 6, 2010, 7:00pm

Ah, the dreaded Heisenbug.

These often happen due to race conditions, but that’s as much as my crystal ball is willing to tell me right now.

YDD · April 6, 2010, 7:10pm

It can also be due to differing orders of floating point operations, depending on the precise definition of “slightly different.” However, my crystal ball is also suffering from extreme fogging right now.
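To make the ordering point concrete: floating point addition is not associative, so the same numbers summed in a different order can give a different result. The sketch below is plain host-side C++ and is not taken from the original program; the function names `sum_forward` and `sum_backward` and the input values mentioned afterwards are illustrative assumptions, and it assumes arithmetic is carried out in IEEE single precision.

```cpp
#include <cstddef>
#include <vector>

// Sum the values from first to last, as a simple serial loop would.
float sum_forward(const std::vector<float>& v) {
    float s = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        s += v[i];
    }
    return s;
}

// Sum the same values from last to first. Mathematically identical,
// but each intermediate result is rounded to float, so the answer can differ.
float sum_backward(const std::vector<float>& v) {
    float s = 0.0f;
    for (std::size_t i = v.size(); i > 0; --i) {
        s += v[i - 1];
    }
    return s;
}
```

For the input {1e8f, -1e8f, 1.0f}, `sum_forward` gives 1 and `sum_backward` gives 0, because the 1 is lost when it is added to -1e8 in single precision. Differences of this kind can explain numbers drifting apart between build modes, though on their own they would not usually explain a kernel failing to launch.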

avidday · April 6, 2010, 7:23pm

Define “bugs”.

Seriously, if you expect help, you are going to have to describe the problem with more precision. We are not playing “20 questions” here…

jam11 · April 6, 2010, 10:25pm

It seems that over a very narrow range of input numbers, one of the kernels will not launch at all except in debug or emulation mode.
With 256 threads per block:
N = 29000 to about 31000

kernel<<< N/256 , 256 >>> …
I worked around the problem by changing N.

This is still very strange!

cbuchner1 · April 7, 2010, 6:28am

I’d put in a cudaThreadSynchronize() between all CUDA operations
and check for error codes following that

Then when an operation fails, you will know.

avidday · April 7, 2010, 6:54am

Error checking, now there is an idea…

Sarnath · April 7, 2010, 12:32pm

Possibly something to do with your register count and the number of threads per block you are spawning…

Instead of changing N, just change the way you are spawning the kernel to (N/32, 32) and see if it works…
OR
If you are knowledgeable enough, you can work out the register math yourself from the cubin…

jam11 · April 7, 2010, 1:09pm

None of these suggestions applies to the main problem, which is that the same program compiled with different flags (dbg, emu, none) behaves differently at runtime.
To summarize:

WHEN : only in release mode at some specific input values
WHERE: in one kernel of about 10
WHAT : the kernel does not launch, with the message: unspecified launch failure
WHY : I do not know
WHO : probably me : mistake in variable declaration ?

It is just annoying, because it works with new values and that is good enough right now.

eyalhir74 · April 7, 2010, 2:00pm

That usually indicates an invalid memory access in your kernel (out of bounds access).

Probably for certain configurations of your kernel the out of bounds access does not occur, while for other configurations it does and hence fails.

eyal

avidday · April 7, 2010, 2:12pm

Huzzah! We now know what the problem actually is. Unspecified launch failures are usually out of bounds memory access. I can think of at least three reasons why it might only be appearing in “release” mode :

1. In emulation everything is in the same memory space and your code is probably silently reading/writing over memory inside the process image, which you won’t see unless you use something like valgrind. The warp size is one, which also eliminates about 99.9% of race conditions that might otherwise happen on the device.

2. In debugging mode compiler optimizations are disabled and registers are spilled to local memory, so potential subtle race conditions in poorly written code aren’t exposed.

3. Your hardware is flaky, and only when the code is running at full speed does it start misbehaving (I had this happen once).

My suggestion is to use emulation + valgrind or gpu-ocelot to run the code and see what happens. I have found ocelot to be flawless at detecting illegal memory access. Alternatively, if you are running Linux, there is the new cuda-memcheck utility which should do the same thing as Ocelot, but on the device. If I were to guess, I would say you have an indexing or addressing snafu in your kernel code, mostly because you are getting the failures over an apparently narrow range of execution parameters.

EDIT: eyal beat me to most of it.
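As an illustration of how an indexing problem can depend on N, here is a host-side C++ sketch that emulates the usual global index calculation (block index times block size plus thread index) for a launch like `kernel<<< N/256 , 256 >>>`. The kernel's real indexing was never posted, so this is an assumption and not a diagnosis; the function names are hypothetical.

```cpp
// Grid size as written in the thread: integer division truncates.
int blocks_truncated(int n, int threads_per_block) {
    return n / threads_per_block;
}

// Grid size rounded up so that every element gets a thread.
int blocks_rounded_up(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Elements that no thread covers when the grid is too small.
int count_unprocessed(int n, int blocks, int threads_per_block) {
    int covered = blocks * threads_per_block;
    return covered < n ? n - covered : 0;
}

// Emulate idx = block * threads_per_block + thread on the host and count
// the threads whose index falls outside [0, n). In a real kernel each of
// these needs an "if (idx < n)" guard before it touches memory.
int count_out_of_range(int n, int blocks, int threads_per_block) {
    int bad = 0;
    for (int b = 0; b < blocks; ++b) {
        for (int t = 0; t < threads_per_block; ++t) {
            int idx = b * threads_per_block + t;
            if (idx >= n) {
                ++bad;
            }
        }
    }
    return bad;
}
```

For N = 29000 and 256 threads per block, the truncated grid has 113 blocks and leaves 72 elements untouched, while the rounded-up grid has 114 blocks and produces 184 threads with an index past the end of the data. Whether either effect matters depends on what the kernel does with its index, which is exactly what valgrind, Ocelot, or cuda-memcheck would reveal.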
