I’ve found various ways to query nvidia-settings for the information, for example:
nvidia-settings -g gpucoretemp
or
nvidia-settings -q=gpu:0
but whatever I do, I get a complaint about the display:
$ nvidia-settings -g gpucoretemp
ERROR: Cannot open display ‘’.
$ nvidia-settings -q=gpu:0
ERROR: The control display is undefined; please run nvidia-settings --help for usage information.
I assume this is somehow related to this being a headless server.
So, any suggestions?
nvidia-settings will also work, but as you noticed, you have to be picky with the query flags, and it may not list all your GPUs if they aren’t in the xorg.conf files.
Also note that nvidia-smi device numbers are not the same as nvidia-settings device numbers, which are not the same as CUDA device numbers… always confusing.
In the PC I’m running now, I have 3 flavors of cards installed at once. To nvidia-settings, the GT240 is gpu:0. To nvidia-smi, it’s GPU 2. To CUDA, it’s device 3.
I always add a query flag to my own tools that simply lists the CUDA devices, so I can decide manually which ones to activate, since you can’t really query this reliably.
I build this list via simple calls to cudaGetDeviceProperties for each device number.
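If you'd rather not write CUDA code just to see what is installed, here is a rough Python sketch of a similar listing. Note that this is not the cudaGetDeviceProperties approach described above: it swaps in nvidia-smi's CSV query mode, and it assumes a driver recent enough that nvidia-smi supports --query-gpu. The helper names (parse_gpu_list, list_gpus) are made up for illustration. The PCI bus ID is included because it is the one identifier that can be matched across tools when the index numbers disagree.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; pci.bus_id is useful for matching
# a card across tools whose device numbering differs.
QUERY_FIELDS = ["index", "name", "pci.bus_id"]


def parse_gpu_list(csv_text):
    """Turn 'csv,noheader' output from nvidia-smi into one dict per GPU."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


def list_gpus():
    """Run nvidia-smi and return the parsed GPU list (needs the NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_list(result.stdout)
```

Calling list_gpus() on a machine with the driver installed should give one dict per card, keyed by "index", "name" and "pci.bus_id". The index here is nvidia-smi's numbering, so it still may not match the CUDA device number.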
Temperature
    GPU Current Temp          : 72 C
    GPU Shutdown Temp         : 99 C
    GPU Slowdown Temp         : 96 C
    GPU Max Operating Temp    : N/A
    Memory Current Temp       : N/A
    Memory Max Operating Temp : N/A

Temperature
    GPU Current Temp          : 73 C
    GPU Shutdown Temp         : 97 C
    GPU Slowdown Temp         : 92 C
    GPU Max Operating Temp    : N/A
    Memory Current Temp       : N/A
    Memory Max Operating Temp : N/A
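If you want to feed a listing like the one above into a monitoring script, something along these lines might work. It is only a sketch: it assumes each GPU's section starts with a line reading "Temperature" and that readings look like "72 C", as in the output above. The function name parse_temperature_blocks is made up for illustration.

```python
import re

# Matches "Some Label : some value", with or without leading indentation.
_FIELD = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")
# Matches a reading such as "72 C".
_CELSIUS = re.compile(r"(-?\d+)\s*C")


def parse_temperature_blocks(text):
    """Return one dict per 'Temperature' section found in the text.

    Readings of the form '<int> C' become ints; anything else
    (such as 'N/A') becomes None.
    """
    blocks = []
    for line in text.splitlines():
        if line.strip() == "Temperature":
            blocks.append({})  # start a new section
            continue
        match = _FIELD.match(line)
        if match and blocks:
            reading = _CELSIUS.fullmatch(match.group("value"))
            blocks[-1][match.group("key")] = (
                int(reading.group(1)) if reading else None
            )
    return blocks
```

Each returned dict is keyed by the label text, so for example block["GPU Slowdown Temp"] - block["GPU Current Temp"] gives the headroom before the card starts throttling.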