nvidia-smi hangs and cannot be killed, even by SIGKILL

I just finished carefully installing the latest NVIDIA driver on a new Amazon g2.2xlarge EC2 instance that I would like to use for some machine learning.

Before installing CUDA and other packages, I would like to verify that the driver has been installed correctly.

My understanding is that nvidia-smi is the tool for this job.

When I run nvidia-smi without any options, it starts printing its usual report, but the process is killed partway through.

ubuntu@ip-10-220-191-26:~$ nvidia-smi 
Tue Apr  5 05:51:06 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.39     Driver Version: 352.39         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
Killed
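
Since the shell prints "Killed", something is delivering SIGKILL to nvidia-smi; the kernel OOM killer is one common source, but that is only a guess. One general check is to look in the kernel log for OOM activity or NVIDIA driver (NVRM/Xid) messages, for example:

# look for OOM-killer activity and NVIDIA kernel driver messages
dmesg | grep -iE 'oom|killed process|nvrm|xid' | tail -n 20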

However, if I ask it to list the GPUs (-L) or even print the help page (-h), it just hangs.

ubuntu@ip-10-220-191-26:~$ nvidia-smi -L

I cannot even kill it with SIGKILL. I have to reboot the machine.

ubuntu@ip-10-220-191-26:~$ ps aux | grep smi
ubuntu    3919  0.0  0.0  14120   932 pts/0    D+   05:37   0:00 nvidia-smi -h
ubuntu    3991  0.0  0.0  14120   928 pts/1    D+   05:38   0:00 nvidia-smi -L
ubuntu    4064  0.0  0.0  10460   928 pts/2    S+   05:42   0:00 grep --color=auto smi
ubuntu@ip-10-220-191-26:~$ kill 3919
ubuntu@ip-10-220-191-26:~$ kill 3991
ubuntu@ip-10-220-191-26:~$ ps aux | grep smi
ubuntu    3919  0.0  0.0  14120   932 pts/0    D+   05:37   0:00 nvidia-smi -h
ubuntu    3991  0.0  0.0  14120   928 pts/1    D+   05:38   0:00 nvidia-smi -L
ubuntu    4066  0.0  0.0  10460   932 pts/2    S+   05:43   0:00 grep --color=auto smi

I am uncertain how to debug this problem.

I would just like to verify that the driver is properly installed and communicating with the GPUs.

Try updating to a newer driver. You may be interested in this forum thread:

[url]https://devtalk.nvidia.com/default/topic/880246/cuda-setup-and-installation/cuda-7-5-unstable-on-ec2-/[/url]

I would recommend trying the 352.79 driver:

[url]http://www.nvidia.com/download/driverResults.aspx/98334/en-us[/url]

If you previously installed the NVIDIA driver using the runfile installer, you can simply install the 352.79 runfile downloaded from the link above.

If you used a package manager method (apt-get) to install the driver, you will want to start over with a clean AMI that does not have an NVIDIA driver loaded, and install the 352.79 driver using the runfile installer.
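
Very roughly, the runfile route on a fresh Ubuntu AMI looks something like the sketch below. The exact .run filename depends on what the download page above gives you, and the flags shown are just one reasonable choice, not the only way to run the installer:

# the installer builds a kernel module, so it needs a compiler and headers for the running kernel
sudo apt-get update
sudo apt-get install -y build-essential linux-headers-$(uname -r)

# make the downloaded installer executable and run it non-interactively
chmod +x NVIDIA-Linux-x86_64-352.79.run
sudo ./NVIDIA-Linux-x86_64-352.79.run --silent

# if the nouveau driver is loaded, the installer will offer to blacklist it and ask for a reboot first

# verify the result
nvidia-smi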