CUDA causes system freeze, system has to be reset to work again ...

In the last few weeks I have observed that CUDA can freeze the whole system, so that you can’t do anything at all.

I asked around and I’m not the only one who has had such an experience.

Probably there is something wrong with my kernel, but this behavior is annoying, because you
have to reboot.

I had hoped that system freezes caused by user-level programs were a problem of the past and that
we had overcome them with the introduction of virtual address spaces.

How is memory handled on the GPU? Is there any memory protection, or can my CUDA
kernel overwrite the complete memory? My kernel can at least freeze my machine, and
I don’t like this at all.

My machine is Win XP x64 with 6 GB RAM and the GPU is Quadro FX 4800.
I’m using CUDA 2.3 and compile for 32-bit.

Hello.

I’ve been having the same problem.

I think there is no memory protection and your kernel is overwriting some special address. On Linux you can use “dmesg” and look for NVIDIA “Xid” messages.

If your kernel overwrites a special address and triggers Xid messages, and that happens several times, the driver eventually freezes the system.

I’m using the 3.0 tools and have found no solution.
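One way to rule out stray writes from your own kernel is to bounds-check every index and to check the error status after each launch. Below is a minimal sketch of that idea. The kernel `scale`, the macro `CUDA_CHECK` and the sizes are made up for illustration, and it uses `cudaDeviceSynchronize`, which only exists in newer toolkits (on 2.x/3.x the equivalent call is `cudaThreadSynchronize`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical helper: number of blocks needed to cover n elements (rounds up).
int gridSizeFor(int n, int threads)
{
    return (n + threads - 1) / threads;
}

// Hypothetical kernel: each thread checks its index against n before writing.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // threads past the end of the buffer do nothing
        data[i] *= factor;
}

int main()
{
    const int n = 1000;  // deliberately not a multiple of the block size
    const int threads = 256;
    const int blocks = gridSizeFor(n, threads);

    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = NULL;
    CUDA_CHECK(cudaMalloc((void **)&dev, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice));

    scale<<<blocks, threads>>>(dev, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));

    printf("host[0] = %f, host[n-1] = %f\n", host[0], host[n - 1]);
    return 0;
}
```

This does not add any protection on the GPU side. It only makes sure the kernel stays inside its own buffer and that a failed launch is reported instead of being silently ignored.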

The same is happening to me now. This interesting but weird behavior began just a few days ago. The program now kills my computer if it is run several times with very small input values (smaller than any practical input would be). I believe the kernel does return, because the screen blinks from time to time. I can even try to move the program window, but it “scatters” and the machine eventually freezes. I wish I knew what causes this behavior…

Bad programs and old drivers are primarily responsible…

If possible, don’t run CUDA on the card that drives the display…
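To see which card that is, you can list the devices and look at the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` (the deviceQuery sample prints it as “Run time limit on kernels”). A rough sketch follows. The function name `pickComputeOnlyDevice` is made up, and treating “no run time limit” as “not driving a display” is only a heuristic, not something the driver guarantees.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints every device and returns the index of the first
// one without a run time limit on kernels, or -1 if there is none.
int pickComputeOnlyDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;

    int pick = -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        printf("device %d: %s, run time limit on kernels: %s\n",
               dev, prop.name, prop.kernelExecTimeoutEnabled ? "yes" : "no");
        // Assumption: a device without the limit is not driving a display.
        if (pick < 0 && !prop.kernelExecTimeoutEnabled)
            pick = dev;
    }
    return pick;
}

int main()
{
    int dev = pickComputeOnlyDevice();
    if (dev < 0) {
        printf("no device without a run time limit found\n");
        return 0;
    }
    if (cudaSetDevice(dev) != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed\n", dev);
        return 1;
    }
    printf("using device %d for compute\n", dev);
    return 0;
}
```

With a single card, as in most of the setups in this thread, this will simply report that no such device exists.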

The program was known to work fine a week ago. I don’t remember changing anything in the code, so I wonder where this came from.

I’m using CUDA 2.3, driver version is 190.62, OS is MS Windows Server 2003 x64 Enterprise.

It seems to me that some resources are not freed when the host thread exits, and that later calls therefore run short of resources…

Could it be a result of overloading the hardware? I run 512 computationally intensive threads per kernel, with at least 64 blocks in the grid. Unfortunately, I have not profiled it yet because that takes too much time.
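A quick way to test that suspicion is to print the free device memory at a few points and to release everything explicitly before the program exits. This is only a sketch. The helper `freeDeviceBytes` and the buffer size are made up, and `cudaMemGetInfo` and `cudaDeviceReset` come from newer runtime versions (on 2.x the teardown call is `cudaThreadExit`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: free device memory in bytes, or 0 if the query fails.
size_t freeDeviceBytes()
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess)
        return 0;
    return freeBytes;
}

int main()
{
    printf("free at start:       %llu bytes\n",
           (unsigned long long)freeDeviceBytes());

    float *buf = NULL;
    const size_t bytes = 64u * 1024u * 1024u;  // arbitrary size for the demo
    cudaError_t err = cudaMalloc((void **)&buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free after cudaMalloc: %llu bytes\n",
           (unsigned long long)freeDeviceBytes());

    // ... kernels that use buf would run here ...

    err = cudaFree(buf);  // release the buffer explicitly
    if (err != cudaSuccess)
        fprintf(stderr, "cudaFree failed: %s\n", cudaGetErrorString(err));
    printf("free after cudaFree:   %llu bytes\n",
           (unsigned long long)freeDeviceBytes());

    // Tear down this process's context instead of relying on thread exit.
    cudaDeviceReset();
    return 0;
}
```

If the “free at start” number keeps shrinking from one run to the next, something is not being released between runs.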
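Before profiling, it may be worth checking whether the kernel can be launched with that many threads at all. A register-heavy kernel can have a lower per-block limit than the device maximum, and the launch then fails with an error that is easy to miss. The sketch below assumes the 512 is the block size. The kernel `heavy` and the helper `launchFits` are stand-ins, and the call form `cudaFuncGetAttributes(&attr, heavy)` is the one from recent toolkits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for a computationally intensive kernel.
__global__ void heavy(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = (float)i;
    for (int k = 0; k < 1000; ++k)
        x = x * 0.999f + 0.001f;
    out[i] = x;
}

// Hypothetical helper: true if `heavy` can run with this many threads per block.
bool launchFits(int threadsPerBlock)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, heavy) != cudaSuccess)
        return false;
    printf("registers per thread: %d, max threads per block: %d\n",
           attr.numRegs, attr.maxThreadsPerBlock);
    return threadsPerBlock <= attr.maxThreadsPerBlock;
}

int main()
{
    const int threads = 512, blocks = 64;  // configuration from the post above
    const int n = threads * blocks;

    if (!launchFits(threads)) {
        fprintf(stderr, "%d threads per block is too many for this kernel\n", threads);
        return 1;
    }

    float *out = NULL;
    if (cudaMalloc((void **)&out, n * sizeof(float)) != cudaSuccess)
        return 1;

    heavy<<<blocks, threads>>>(out, n);
    cudaError_t err = cudaGetLastError();  // reports a failed launch
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // reports errors during execution
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

This only tells you whether the launch configuration is valid. It says nothing about how long the kernel runs, which matters separately on a card that also drives the display.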