I can run CUDA programs three times, then the GPU stops responding

I can run a program that contains a CUDA kernel only three times; if I run a CUDA program again after that, it hangs.

What kinds of programs hang:

  1. Simplest CUDA program ever, just an empty kernel call:
#include "cuda.h"

__global__ void my_kernel()
{
  ;
}

int main()
{
  my_kernel<<<1, 1>>>();
  return 0;
}
$ nvcc -g -G simplest.cu -o simplest
$ ./simplest
  2. Any program from the SDK samples

The mix of programs does not matter: I can run deviceQuery, then simplest, then deviceQuery, and the next launch still hangs.

I captured nvidia-smi output before running any program and after the first, second, and third runs; the results are the same each time. The output is here: http://pastebin.com/V3De5mZB

For example, when I run deviceQuery repeatedly, I get output like this: http://pastebin.com/PmFbkrmT . This is the history of launching deviceQuery. At first I get good output, on the third run I get good output after a 10-second lag, and on the launch after the lag I get:

cudaGetDeviceCount returned 10
-> invalid device ordinal
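
For anyone who wants to reproduce that check outside deviceQuery, here is a minimal sketch that makes the same first call and prints both the numeric code and the runtime's own description of it. The function name report_devices is hypothetical, not something from the SDK. Numeric error values can differ between toolkit versions, so the string from cudaGetErrorString is the more reliable thing to read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 0 if the device count could be queried
// and every device reported its properties, 1 otherwise.
int report_devices()
{
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    // Same shape as the message quoted above: code first, then its meaning.
    fprintf(stderr, "cudaGetDeviceCount returned %d\n-> %s\n",
            (int)err, cudaGetErrorString(err));
    return 1;
  }

  printf("%d CUDA device(s) found\n", count);
  for (int d = 0; d < count; ++d) {
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, d);
    if (err != cudaSuccess) {
      fprintf(stderr, "cudaGetDeviceProperties(%d) returned %d\n-> %s\n",
              d, (int)err, cudaGetErrorString(err));
      return 1;
    }
    printf("device %d: %s (compute capability %d.%d)\n",
           d, prop.name, prop.major, prop.minor);
  }
  return 0;
}

int main()
{
  return report_devices();
}
```

Running this between the other launches should show at which run the driver stops reporting the card, without the rest of deviceQuery's output in the way.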

Additional info: I am using a different card (the integrated one) to render the screen. I have one ZOTAC GT630, which is based on a Fermi chip. When I boot the system, /dev/nvidiactl and /dev/nvidia0 do not exist, so I create them by typing in a terminal: sudo nvidia-script.sh , a script into which I copied the device-node creation code. I am using the 32-bit version of Ubuntu 12.04.

How can I deal with this? I am so pressed for time to finish the job that I am actually scripting machine resets as a workaround ;)

I run Ubuntu, albeit x64 versions, and I’ve never had to create those /dev(ices) manually. If you can start over with a fresh install, use the drivers available in the Ubuntu repositories… You shouldn’t have any weird issues or have to create anything manually. If the default nvidia-drivers in 12.04’s repositories are too old for you, you can do either:

sudo apt-add-repository ppa:ubuntu-x-swat/x-updates
or
sudo add-apt-repository ppa:xorg-edgers/ppa
and then
sudo apt-get update
sudo apt-get install PACKAGE

where PACKAGE stands for the NVIDIA driver package of the version you want. The packages available in the repositories listed above can be browsed here:
http://www.ubuntuupdates.org/ppa/ubuntu-x-swat
http://www.ubuntuupdates.org/ppa/xorg-edgers

It could also be that your empty kernel just happens to leave the card in a weird or unreliable state. Have you tried other SDK examples? Do those work correctly? If so, I’d try compiling and running something that does real work instead of nothing, and see whether the issue goes away.
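
As a rough sketch of what "something that makes sense" could look like: the program below has the kernel write real data, checks the return value of every runtime call, waits for the kernel with cudaDeviceSynchronize, and tears the context down explicitly with cudaDeviceReset before exiting. The names CHECK, fill_kernel and verify are made up for illustration, and I am only assuming that explicit synchronization and reset are worth ruling out here, not claiming they are the cause.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: report the failing call and exit main with 1.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      fprintf(stderr, "%s failed: %d (%s)\n", #call, (int)err_,   \
              cudaGetErrorString(err_));                          \
      return 1;                                                   \
    }                                                             \
  } while (0)

// Each thread writes twice its global index, so the result is easy to verify.
__global__ void fill_kernel(int *out, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2 * i;
}

// Host-side check that the kernel really ran and wrote what we expect.
static bool verify(const int *host, int n)
{
  for (int i = 0; i < n; ++i)
    if (host[i] != 2 * i) return false;
  return true;
}

int main()
{
  const int n = 256;
  int host[n];
  int *dev = NULL;

  CHECK(cudaMalloc((void **)&dev, n * sizeof(int)));
  fill_kernel<<<(n + 127) / 128, 128>>>(dev, n);
  CHECK(cudaGetLastError());       // errors from the launch itself
  CHECK(cudaDeviceSynchronize());  // wait for the kernel; errors from execution
  CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost));
  CHECK(cudaFree(dev));
  CHECK(cudaDeviceReset());        // explicit context teardown before exit

  bool ok = verify(host, n);
  printf("%s\n", ok ? "result OK" : "result MISMATCH");
  return ok ? 0 : 1;
}
```

If this also hangs after three runs, the failing call printed by CHECK (or the point where it stops printing anything) at least narrows down where the driver gets stuck.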

I installed the drivers that come with CUDA (https://developer.nvidia.com/cuda-downloads), the Ubuntu 11.04 version.

The empty kernel is the smallest test case I could reduce the problem to. Everything meaningful that I tried hangs the device after three runs.

The reason why I didn’t recommend installing the drivers that come with CUDA is because those standalone drivers don’t tend to behave well when your kernel gets updated – i.e. your updated kernel might boot without an NVIDIA module generated.

I’ll admit I don’t have experience with running off integrated graphics on a desktop since my board doesn’t have that capability. What I do know is that in my case, the Ubuntu repository versions install the device(s) automatically. Perhaps it doesn’t happen in your case because you’re driving your display with the integrated video… not sure if the repository packaged drivers would behave differently. The only other reason you might need to create those manually is if you’re running a headless server, which might or might not parallel driving the display with integrated graphics.

Other than what I’ve mentioned, my other suggestion is to switch motherboard slots (if possible) and try running the code and see if you have issues. I don’t think this card draws any external power from PCI-E 6/8 pin connectors, but you should also make sure that your power supply is stable.

To figure out the problem or rule out possible culprits, I’d suggest trying a different OS on that system (maybe even Windows, to rule out any weird Linux issues that might come up; you can always make a backup with CloneZilla if need be) or even moving the card to another system and seeing if it works there… just some ideas for you to attempt to figure out where the issue lies.