nvidia-smi

I discovered this issue while debugging a failure of 'gpuburn' (http://wili.cc/blog/gpu-burn.html), which would quickly segfault, and tracked it down to the temperature-update method it implements.

Apart from gpuburn doing some very ugly things to parse out the temperatures, there seems to be an issue with the looping capability of nvidia-smi on a GTX 970: it no longer seems to work.

Known working: the driver release bundled with CUDA 6.5 has no issue making this call. The issue seems to have appeared sometime after 343.19 but before 343.36.

#######
FAILING
#######

root@localhost:~# nvidia-smi -l 100 -q -d "TEMPERATURE"
Failed to register events for GPU 0000:82:00.0: N/A

root@localhost:~# nvidia-smi -lms 10000 -q -d "TEMPERATURE"
Failed to register events for GPU 0000:82:00.0: N/A

#######
WORKING
#######

root@localhost:~# nvidia-smi -q -d "TEMPERATURE"

==============NVSMI LOG==============

Timestamp : Sat Jan 3 18:16:51 2015
Driver Version : 346.22

Attached GPUs : 2
GPU 0000:05:00.0
    Temperature
        GPU Current Temp : 53 C
        GPU Shutdown Temp : 106 C
        GPU Slowdown Temp : 101 C

GPU 0000:82:00.0
    Temperature
        GPU Current Temp : 48 C
        GPU Shutdown Temp : N/A
        GPU Slowdown Temp : N/A
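
Since the one-shot query still works on the affected driver, one possible workaround is to do the looping outside nvidia-smi and parse each report yourself. The following is only a rough sketch of that idea, not what gpuburn actually does: the names parse_temperatures and poll are made up for illustration, and the regular expressions assume the report looks like the output above (a "GPU <bus id>" line followed by a "GPU Current Temp : NN C" line).

```python
import re
import subprocess
import time

# Matches lines such as "GPU Current Temp : 53 C", whatever the column padding.
# A value of "N/A" does not match and is simply skipped.
TEMP_RE = re.compile(r"GPU Current Temp\s*:\s*(\d+)\s*C")

# Matches the per-GPU header lines such as "GPU 0000:05:00.0".
GPU_RE = re.compile(
    r"^\s*GPU\s+([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])\s*$"
)


def parse_temperatures(report):
    """Return {bus_id: temperature in C} from `nvidia-smi -q -d TEMPERATURE` text."""
    temps = {}
    bus_id = None
    for line in report.splitlines():
        gpu = GPU_RE.match(line)
        if gpu:
            bus_id = gpu.group(1)
            continue
        temp = TEMP_RE.search(line)
        if temp and bus_id is not None:
            temps[bus_id] = int(temp.group(1))
    return temps


def poll(interval_s=100, count=None):
    """Run the single-shot query every interval_s seconds instead of using -l.

    count=None loops forever; otherwise stop after `count` queries.
    """
    done = 0
    while count is None or done < count:
        result = subprocess.run(
            ["nvidia-smi", "-q", "-d", "TEMPERATURE"],
            capture_output=True, text=True, check=True,
        )
        print(parse_temperatures(result.stdout))
        done += 1
        if count is None or done < count:
            time.sleep(interval_s)
```

Calling poll(interval_s=100) would roughly stand in for the failing "-l 100" invocation. Because every query is a separate process, this avoids the event registration that fails above, at the cost of starting nvidia-smi once per sample.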

nvidia-bug-report.log.gz (222 KB)

Driver version 346.35 rectifies this issue.