I am wondering how and if people are using the CUDA device emulation mode. Please post your use-cases, even if they are duplicates of someone else’s.
Paulius
I verified my Hough transform implementation in emulation mode, in particular to check whether any out-of-bounds memory accesses were occurring.
Unfortunately, emulation mode was insufficient to find race conditions that occurred during writes to shared memory locations. They only became apparent when running on the device.
Looking forward to CUDA 2.1 and an SSE + multicore target, to get an emulation mode that is actually fast enough for production use. Having one code path to run on the GPU and/or CPU will be useful.
I rarely use emulation. I found that most problems tend to be threading and race related, and the emulator doesn’t show those often (makes sense, it’s single threaded itself!).
So when I DO run it, it’s to check that my suspected threading bug really is threading and not logic.
The other reason I don’t use emulation… it’s just way way way way too slow, to the point of changing the application itself… I am trying for 1M shaded pixels at 30 frames a second, but the emulator is a slideshow. This wouldn’t matter for numerical computing though.
I find emulation mode to be an invaluable tool for debugging. I have my build system set up to run all unit test cases as emulation builds under valgrind. I run this battery of tests every time I make significant changes to the GPU kernels. Sure, it is slow and the tests need to be left to run over lunch, but with it I have found dozens of subtle memory access bugs that did not show up right away in other testing. These (e.g., not requesting enough dynamic shared memory bytes, forgetting a boundary check, etc.) might otherwise have only started to cause unidentifiable problems months later and might have required days to debug.
I mainly use it with very small data sets for some kinds of debugging. I find it is much easier to print values directly from the kernels than to write them to some debug output array and examine it later on the CPU.
Of course, this works for finding logic bugs, incorrect inputs, etc. but not race conditions and the like.
Oh, and I also sometimes use it to run with valgrind like MisterAnderson, if I suspect some sort of memory problem.
I use emulation mode quite often with the visual debugger in Visual Studio. Stepping through, breakpointing conditionally, mouse-overing variables, and evaluating expressions in the watch window (very handy, that last one) are all invaluable. Sometimes I also use Emu mode because nvcc takes a long time to compile my kernels.
I use emu mode if my code doesn’t work after I implement it. I run with very small problem sizes (8, 16) and just step through.
I use it often when developing and debugging.
When I have a bug in a kernel, emulation mode is the first thing I use to try to track it down. For a complex kernel, I usually compile and run in emulation mode first so I can “debug print” and run gdb. Once everything looks good, I finally try it on the device.
There were times when I tried to use emulation mode and it did not work (about a year ago, when I started with CUDA). I use CUDA mainly from MATLAB and have no idea whether I can use emulation mode there. The trouble is that I currently do not have the time to find out whether and how it works, so I am not using it, even though it would help in some hard-to-debug cases: the errors I make are usually not races and such, but other types of errors. (I am not really a C kind of guy, but I have gotten to the point where my mind works quite well for writing parallel algorithms.)
It should just work^TM. A library generated from CUDA code has the same external interface to MATLAB (or Python, as I’m using it) whether it was compiled in device emulation mode or not. The caller doesn’t know the difference. With MATLAB, you just need to figure out how to set up the command line arguments of nvcc so you can build in emulation mode.
And how do you then actually debug? Do you start MATLAB in the debugger? And how do you set a breakpoint? (Here you can see my complete lack of C programming experience shining ;))
I mostly don’t use emulation mode. The reason is simple – I use the Driver API.
I’ve used it a few times in the past but can’t say that it was really helpful.
I use it for debugging on small datasets. However, I prefer visualizing my data and actually understanding the problem, especially as emulation is too slow to really make use of conditional breakpoints to catch the bugs that show up only in very special, random-looking cases.
I also used it for some prototyping before I got ahold of a GT200.