PC crashing every time the CUDA program crashes: any way to prevent this / is it normal for this to happen?

Hi all,

I’m writing a ray tracer in CUDA, and when my program crashes (due to a bad memory access or whatever) my PC stops responding and I have to restart the machine. Is this normal, and is there any way to stop it from happening?

That depends on what operating system you’re using. If you’re on Linux with a console-only setup (using the graphics card just for CUDA), then I would guess that yes, you could recover without restarting (though I’m not a Linux user, so I can’t say for sure). If you’re on Windows XP/Vista, then probably not; if your bad memory accesses corrupt the memory the operating system has reserved for its own graphics, most OSes don’t cope well with that sort of thing.

Use the emulator to work out whatever memory-access bugs you can first (and remember, the emulator probably won’t catch cases where you pass a host pointer to the device, which will crash the real version). If you’re familiar with Linux, give valgrind a try; a lot of the Linux folks here use it to check for memory-access problems in their kernels (even if they end up running the kernels on a Windows machine).
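To make the host-pointer mistake concrete, here is a minimal sketch (the kernel, names, and sizes are all made up for illustration) showing the bug and the cudaMalloc/cudaMemcpy pattern that avoids it:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= 2.0f;
    }

    int main()
    {
        const int n = 1024;
        float host[n];
        for (int i = 0; i < n; ++i) host[i] = (float)i;

        // WRONG: passing a host pointer straight to the kernel.
        // The emulator happily dereferences it; real hardware crashes.
        // scale<<<4, 256>>>(host, n);

        // RIGHT: allocate device memory and copy explicitly.
        float *dev = 0;
        cudaMalloc((void **)&dev, n * sizeof(float));
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<4, 256>>>(dev, n);
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);

        printf("host[10] = %f\n", host[10]);
        return 0;
    }

In the emulator both versions “work”, because device code runs on the host, which is exactly why this class of bug only shows up on real hardware.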

Ok, cheers. I’m running XP using my primary graphics card. I thought the OS would have realised the program/graphics card was stuck and timed the program out? I thought there was something built into the driver layer that would reset the graphics card in XP/Vista.

There may be something in Vista; I don’t know for sure. It uses a different driver model and supports virtualization of graphics memory, but you’ll have to wait for someone who knows more to answer.

XP (and Vista, to a lesser extent) allows drivers to do pretty much whatever they want in terms of reading/writing memory, which is why you always get the “Are you sure?” prompt when you try to install an unsigned driver (a network driver from an untrusted source could carry a keylogger, for example). Since you’re basically executing code at the driver level (even via the runtime, if that’s what you use), there is always the danger that your memory accesses will go unchecked and crash the system/driver/kernel if they are incorrect.
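One habit that helps on the kernel side is to guard every global-memory write with an explicit bounds check, since nothing at the driver level will do it for you. A minimal sketch (the kernel is just illustrative):

    __global__ void fill(float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // The grid is usually rounded up past n, so without this check
        // the extra threads would write past the end of the buffer --
        // exactly the kind of unchecked access that can take the
        // driver down with it.
        if (i < n)
            out[i] = 1.0f;
    }

    // Launch with the grid rounded up to cover n elements:
    //   int threads = 256;
    //   int blocks  = (n + threads - 1) / threads;
    //   fill<<<blocks, threads>>>(d_out, n);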

Hmm… Did the watchdog-timer fix introduce a “No watchdog” bug?? :-)

I believe that the memory space on a GPU isn’t virtualised. This means that any kernel can stomp on any memory on the card. If you end up writing to buffers the OS is using for the main display, I imagine that interesting behaviour could result. I’ve certainly managed to crash X on my Linux box with invalid CUDA memory accesses on the GPU running my display. I saw X go to 100% in top, just before my screen locked up. I could ssh in, but decided that shutdown -r was the simplest incantation which could fix things.
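Since then I’ve taken to checking for errors after every launch, so I at least know which kernel died before the screen goes. A minimal sketch, assuming the runtime API (checkLastKernel is just a name I made up; older toolkits used cudaThreadSynchronize where this uses cudaDeviceSynchronize):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical helper: call after every kernel launch so a failed
    // launch or an aborted kernel is reported immediately rather than
    // surfacing as a lockup several calls later.
    static void checkLastKernel(const char *label)
    {
        cudaError_t err = cudaGetLastError();   // launch-time errors
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();      // execution-time errors
        if (err != cudaSuccess)
            fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
    }

    // Usage:
    //   myKernel<<<blocks, threads>>>(args);
    //   checkLastKernel("myKernel");

The synchronize costs you overlap between host and device, so it’s the kind of thing you’d wrap in a debug-only macro rather than leave in release builds.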