question about debugging

julandcuda · November 27, 2007, 10:03am

Hello, I have a question.

I would like to know what the differences are between Debug, EmuDebug, EmuRelease and Release.

Thanks

LuYao · November 27, 2007, 1:15pm

First of all, Debug and Release both run on the GPU, and Debug is not really a debug mode there, because you cannot debug on the GPU.
EmuDebug and EmuRelease are emulation versions that run your GPU program on the CPU, so you can debug in EmuDebug mode to find some low-level errors. Unfortunately, there is still a huge difference between EmuDebug and Debug, and it is up to you to find a real way to debug on the GPU, which I think is a real challenge.

Good luck!

julandcuda · November 27, 2007, 2:19pm

You mean that when I want to debug I use EmuDebug, and when I want to run my program I use Release. Is that correct?

In that case, what are Debug and EmuRelease for?

paulius · November 27, 2007, 10:46pm

Debug mode will “activate” all the macros from cutil.h, such as CUDA_SAFE_CALL etc.

Contrary to what LuYao said, it is possible to debug on the GPU. A debugger, running code on the GPU, has been demonstrated at SuperComputing 07. It will be released in the future.

Paulius

Morph208 · November 28, 2007, 1:06am

Well, I’m not sure we can say LuYao was wrong. At the moment it’s not possible to debug on the GPU. I think that’s what he meant.

And yes, we heard about this debugger. I’m going to ask the obvious question, as always after one of your ‘announcements’: can we have a clue about a possible release date (for developers and for the public)? Is it a matter of weeks or months?

LuYao · November 28, 2007, 3:26am

A debugger on the GPU is crucial to my projects. Is that true? Do you have any detailed information?
At the moment there is no way to debug on the GPU.
My program runs in EmuDebug mode but not in Debug mode, which causes me a lot of trouble, and I don’t even know where the mistake happens. Some high-level logical errors cannot be found in EmuDebug. That’s TERRIBLE.

paulius · November 28, 2007, 3:27am

We don’t comment on release dates, sorry.

MisterAnderson42 · November 28, 2007, 1:35pm

If you can’t run in Debug mode, then there is likely a problem in your code somewhere. What exactly is the issue?

Debug mode enables debugging and asserts on the CPU, but keeps the GPU code running on the GPU. So your kernels still run at nearly full speed and you can debug any problems in the rest of your code. It should be the standard mode you develop in because any crash will give you a back trace, you can step through CPU functions, and CUDA_SAFE_CALL will tell you whenever a CUDA call results in an error. Sure, you can’t debug the kernels themselves in Debug mode, but you can still debug EVERYTHING ELSE.

When it comes to actually debugging a kernel call itself, I have yet to run into a problem that I couldn’t solve by setting breakpoints in kernel calls in EmuDebug. Sure, the emu mode is slow, but then you just run your kernel with a reduced number of blocks or on a smaller problem size.

Release mode compiles your CPU code with full compiler optimizations, removes the information needed for generating back traces and stepping through code, disables asserts, and turns CUDA_SAFE_CALL into a no-op (thus ignoring any CUDA error). It is the mode you should compile in for full performance, but with significantly reduced error checking.

EmuDebug emulates kernel calls on the CPU and compiles your code with no optimizations and with debug symbols for back traces and stepping through code.

EmuRelease emulates kernel calls on the CPU and compiles your code with full optimizations and no debug symbols. It is perhaps the least useful of all the modes, but if you have a really subtle bug that shows up in Release but not in Debug, and you think it might be related to a kernel (however unlikely), then you can compile in EmuRelease and printf-debug the kernel calls.
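The Debug versus Release behaviour of a checking macro can be sketched in plain C++. This is only an illustration of the idea, not the cutil.h source: `FakeStatus`, `failing_call`, `SKETCH_DEBUG` and the `SKETCH_*` macros are all made-up names, and the sketch assumes that in the release-style variant the wrapped call still executes and only the check disappears.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a CUDA status code; a real program would use
// the status type returned by the CUDA runtime instead.
enum FakeStatus { FAKE_SUCCESS = 0, FAKE_ERROR = 1 };

// Counts the failures that the checking macro noticed.
static int g_reported_errors = 0;

// Hypothetical API call that always fails, standing in for a failing CUDA call.
inline FakeStatus failing_call() { return FAKE_ERROR; }

// Debug-style variant: run the call, then inspect and report its status.
#define SKETCH_CHECKED_CALL(call)                                   \
    do {                                                            \
        FakeStatus status_ = (call);                                \
        if (status_ != FAKE_SUCCESS) {                              \
            std::fprintf(stderr, "error %d at %s:%d\n",             \
                         (int)status_, __FILE__, __LINE__);         \
            ++g_reported_errors;                                    \
        }                                                           \
    } while (0)

// Release-style variant: the call still runs, but its status is discarded.
#define SKETCH_UNCHECKED_CALL(call) (void)(call)

// The build configuration picks one variant (SKETCH_DEBUG is a made-up symbol).
#ifdef SKETCH_DEBUG
#define SKETCH_SAFE_CALL(call) SKETCH_CHECKED_CALL(call)
#else
#define SKETCH_SAFE_CALL(call) SKETCH_UNCHECKED_CALL(call)
#endif

// Runs one failing call through the selected macro and returns how many
// errors were reported: 1 in a debug-style build, 0 otherwise.
inline int run_sketch() {
    g_reported_errors = 0;
    SKETCH_SAFE_CALL(failing_call());
    return g_reported_errors;
}
```

Built without `SKETCH_DEBUG`, the failure passes silently, which is why a program that only misbehaves in Release is worth re-running in Debug first.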

LuYao · November 28, 2007, 3:26pm

I’m porting a program to the GPU and I just need to rewrite the kernel.

The main flow of the program is:

read the input file from hard disk into host memory
transfer the data to device memory
run the kernel

Each block has one thread processing its own part of the input data. All the threads work separately, so I needn’t care about syncthreads. After some parameter declarations, in each loop iteration a thread reads some input data from device memory into a buffer-in zone in registers and puts the results into a buffer-out zone, also in registers. When the buffer-out zone is full, the results are transferred to the general result output array in device memory. The input and output array offsets have been calculated many times and I’m sure they are all right.

transfer the results back to host memory
write the results into files

The whole program has been checked many times, and what puzzles me is that I can run it in Emu mode and the results are correct, but when I run it on the GPU the whole machine gets stuck and even the mouse cannot move while the kernel is running. Sometimes it shows CUDA errors and sometimes it just reboots. Of course the result file is empty.

I’ve also tried many ways to “debug” on the GPU, and I found that when I remove the code that transfers results from registers to device memory, the kernel runs well. And of course it’s not just that piece of code: all code that writes to device memory behaves the same. But when I remove all the computation and leave just the part that reads and writes device memory, it runs well too. (Writes fail but reads succeed; meanwhile, all register accesses are just fine.) But that’s not the end. In the declaration part, each thread writes some parameter values into device memory because they are too big to put into registers. This write to device memory has just the same form as the one that transfers the results. So I think there must be some logical error in the process that causes the write to device memory to fail. But I cannot detect it because everything works perfectly in Emu mode. That’s painful!

MisterAnderson42 · November 28, 2007, 3:54pm

Just a note: if you comment out the global memory write, the dead code optimization will remove all the computations.

How long does your kernel take to run in emulation mode? The symptoms you describe when running the kernel (can’t move the mouse) are normal for a kernel that takes more than 1 second to execute. Does the kernel die and result in CUDA errors after running for 5 seconds? You may be hitting the watchdog timeout then. Have you tried a smaller dataset?

You say: “Each block have one tread processing its own part of input data” which has me worried because 1 thread per block is not a very optimal way to use the device when the warp size is 32. How many blocks are you launching? Even an empty kernel with 65,000 blocks takes a while to launch due to scheduling overhead.
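The block-count arithmetic behind this concern can be sketched on the host in plain C++. The helper names below are hypothetical, and the 65,000-item figure is borrowed from the example above purely for illustration; it is not LuYao's actual problem size.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed so that blocks * threads_per_block >= num_items
// (integer ceiling division).
inline std::size_t blocks_needed(std::size_t num_items,
                                 std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (num_items + threads_per_block - 1) / threads_per_block;
}

// Global index of a thread, mirroring the usual
// blockIdx.x * blockDim.x + threadIdx.x computation inside a kernel.
inline std::size_t global_index(std::size_t block,
                                std::size_t threads_per_block,
                                std::size_t thread) {
    return block * threads_per_block + thread;
}

// True when this thread has an item to process. The guard matters for the
// last block, which may be only partially filled.
inline bool has_work(std::size_t index, std::size_t num_items) {
    return index < num_items;
}
```

With one thread per block, 65,000 items need 65,000 blocks; with 32 threads per block, the same items need `blocks_needed(65000, 32)`, which is 2,032 blocks, with the final block only partly used.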

Neeraj_Kulkarni · November 28, 2007, 5:20pm

I had similar problems when I was working on buffering mechanisms.

[1] When you are waiting for the buff-out to be full, are other blocks idling (waiting) for others? If yes, try launching another kernel that does this for you.

[2] Try re-checking for conflicts between offsets. Dump them into device memory and compare them with those calculated on the CPU. The warp size in CPU emulation is 1, so it is highly unlikely that you will catch concurrent-write errors in emulation mode.

[3] Make sure the offsets do not extend beyond the allocated memory.
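Checks [2] and [3] can be run on the host once the offsets have been copied back. The following is a minimal C++ sketch under stated assumptions: `WriteRange` is a hypothetical description of the region one thread writes, every range is assumed to be non-empty, and the bounds check is assumed to run before the overlap check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of the output region one thread writes:
// elements [offset, offset + count) of the result array.
struct WriteRange {
    std::size_t offset;
    std::size_t count;
};

// True when every range lies inside an array of `allocated` elements.
inline bool all_in_bounds(const std::vector<WriteRange>& ranges,
                          std::size_t allocated) {
    for (const WriteRange& r : ranges) {
        // Phrased this way so that offset + count cannot overflow.
        if (r.offset > allocated || r.count > allocated - r.offset) {
            return false;
        }
    }
    return true;
}

// True when no two ranges share an element. Takes a copy because it sorts.
// Assumes non-empty ranges that already passed all_in_bounds.
inline bool no_overlap(std::vector<WriteRange> ranges) {
    std::sort(ranges.begin(), ranges.end(),
              [](const WriteRange& a, const WriteRange& b) {
                  return a.offset < b.offset;
              });
    // After sorting by start, any overlap shows up between neighbours.
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        const WriteRange& prev = ranges[i - 1];
        if (prev.offset + prev.count > ranges[i].offset) {
            return false;
        }
    }
    return true;
}
```

If either check fails for the offsets the GPU actually used, the write path is the place to look, even when emulation produced correct results.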

Hope this helps,

Cheers,

Neeraj

LuYao · November 29, 2007, 9:14am

All the threads work separately, and even when I use just one block and one thread the same thing happens.

Thanks for your advice and I will check it again!

Wish you good luck too!
