Program runs with status D

Hi All

I'm doing some tests using CUDA. I tested my program in emulation mode and it worked fine. Now I want to run it on the graphics card with more data, but it gets stuck: if I run ps ux, the job's status is D+, and I cannot kill it. I tried to run the deviceQuery program and it also ended up in status D+. One of the problems I'm facing is that the code runs on a node where I don't have root access.

Any help?

Thanks in advance.
Luis

The graphics card seems to be blocked; I cannot run anything else on it. I found the following information, which may be useful:

This has nothing to do with CUDA. You ran the machine out of memory, and the Linux OOM (out-of-memory) killer took your process because it was trying to allocate all the available memory. But the OOM mechanism isn't very elegant or clean, and it seems to have left things in a bad state (probably with a CUDA context still on the card, or with the CUDA driver in some wait state that it can't recover from).

The machine will have to be rebooted. Check your code.
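One way to follow the "check your code" advice is a pre-flight check before allocating the larger data set. This is a minimal sketch in Python, assuming a Linux host with glibc: it compares the size of a planned host buffer with the physical RAM the kernel currently reports as free, so a run can be scaled down instead of being taken by the OOM killer. The names free_ram_bytes and fits_in_ram, and the 80% safety margin, are hypothetical. Because Linux overcommits memory, passing the check is not a guarantee. For device memory, the CUDA runtime offers the analogous cudaMemGetInfo.

```python
import os


def free_ram_bytes() -> int:
    """Physical RAM the kernel currently reports as free (Linux/glibc).

    Uses sysconf(_SC_AVPHYS_PAGES); page cache that could be reclaimed is
    not counted, so this is a conservative figure.
    """
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def fits_in_ram(n_elements: int, bytes_per_element: int = 4,
                safety: float = 0.8) -> bool:
    """Hypothetical check: True if a host buffer of n_elements items would use
    at most `safety` (80% by default) of the currently free RAM."""
    needed = n_elements * bytes_per_element
    return needed <= safety * free_ram_bytes()


if __name__ == "__main__":
    # 2**20 float32 values is 4 MiB; 2**40 float32 values is 4 TiB.
    for n in (1 << 20, 1 << 40):
        verdict = "ok" if fits_in_ram(n) else "too large, shrink the problem"
        print(f"{n} floats: {verdict}")
```

Checking the size up front matters because, with overcommit, the allocation itself often succeeds and the OOM killer only strikes later, when the pages are actually touched.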

Hi Luis,

I'm having the same trouble. Since I updated to CUDA 3.0, when running some random CUDA code I get status D. This state means "uninterruptible sleep (usually I/O)", and the + means the job is running in the foreground. What version of CUDA are you using? By the way, a process in this state cannot be killed: it is blocked inside the kernel and does not act on signals (not even SIGKILL) until that wait finishes, so at this moment the only solution is to reboot. If you're using CUDA 3.0, try 2.3.
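The state letter that ps shows comes from procfs, so it can be inspected without root access. This is a minimal sketch in Python, assuming Linux: process_state reads the state field of /proc/<pid>/stat, and the hypothetical helper stuck_processes lists a user's processes currently in state D. It only detects the problem and cannot clear it, for the reason given above.

```python
import os

# Meanings of the common one-letter states reported by ps and procfs on Linux.
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def process_state(pid: int) -> str:
    """Return the one-letter state of `pid` from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Format is "pid (comm) state ...". The command name may itself contain
    # spaces or ')', so locate the LAST ')' and skip the following space.
    return data[data.rindex(")") + 2]


def stuck_processes(user_uid: int):
    """Hypothetical helper: yield (pid, command) for D-state processes
    whose /proc entry is owned by `user_uid`."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid = int(entry)
        try:
            if os.stat(f"/proc/{pid}").st_uid != user_uid:
                continue
            if process_state(pid) == "D":
                with open(f"/proc/{pid}/comm") as f:
                    yield pid, f.read().strip()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited meanwhile or is not readable


if __name__ == "__main__":
    for pid, name in stuck_processes(os.getuid()):
        print(f"{pid} {name}: {STATE_NAMES['D']}")
```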

P.S. My code has been running flawlessly on CUDA 2.3, and it doesn't run out of memory (we have a Tesla cluster node with 48 GB of RAM).