Monitoring critical temperatures

caezar_jc · October 10, 2014, 2:12pm

Hello, I am currently using the Nvidia API to get information about the graphics card's thermal sensor. I have a few questions I would like to clear up.

First, do some or all of Nvidia's graphics card drivers implement safety measures, such as lowering the GPU's clock (maybe putting it in an idle state) or shutting down the computer, to avoid damaging the graphics card due to overheating?

Second, is there some way to get the critical temperatures that the GPU or memory can reach, maybe from the driver itself or through some Nvidia API function?

And last, NVAPI has a thermal module that gives information about some sensors. It is not very clear to me whether the following attribute refers to the “critical temperature” mentioned above, which the GPU can reach before safety measures are taken, or whether it only means the maximum value that can be read from the sensors.

From the documentation:
NvU32 NV_GPU_THERMAL_SETTINGS_V1::defaultMaxTemp

The max default temperature value of the thermal sensor in degree Celsius.

Reference: NVAPI Reference Documentation

Thanks very much!

ccooper · March 30, 2015, 8:25pm

Caezar_JC,
Some of the GPUs do support thermal throttling and will shut down in order to protect the device. On Linux, for example, when this happens lspci will report that the card has dropped off the bus… reboot and she will come back to ya. To get your current GPU temperatures, you can use the Nvidia Management Library (NVML). It is a C-based API that exposes low-level GPU information. One of the key tools that uses NVML is Nvidia-SMI, a command-line utility that reports this information as XML or plain text.

NVML = https://developer.nvidia.com/nvidia-management-library-nvml
Nvidia-SMI = NVIDIA System Management Interface | NVIDIA Developer
Nvidia-SMI Commands = http://developer.download.nvidia.com/compute/cuda/6_0/rel/gdk/nvidia-smi.331.38.pdf
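
As a rough illustration of the Nvidia-SMI route, here is a minimal Python sketch that shells out to nvidia-smi and parses its CSV output. It assumes nvidia-smi is on the PATH and that the installed driver supports the --query-gpu option with the index, name and temperature.gpu fields. The helper names (parse_temperatures, read_temperatures, over_limit) and the 90 °C figure in the usage note are made up for the example; they are not values reported by the driver, and the real slowdown/shutdown thresholds have to come from NVML or the card's documentation.

```python
import subprocess

# nvidia-smi query: one CSV line per GPU, no header row, no unit suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temperatures(csv_text):
    """Turn 'index, name, temperature' lines into a list of dicts."""
    readings = []
    for line in csv_text.strip().splitlines():
        # Split off the first and last fields so the name stays intact.
        index, rest = line.split(",", 1)
        name, temp = rest.rsplit(",", 1)
        try:
            temp_c = int(temp)
        except ValueError:
            # Some boards report a placeholder instead of a number.
            temp_c = None
        readings.append(
            {"index": int(index), "name": name.strip(), "temp_c": temp_c}
        )
    return readings


def read_temperatures():
    """Run nvidia-smi and return the parsed readings (needs an Nvidia GPU)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temperatures(result.stdout)


def over_limit(readings, limit_c):
    """Return the readings at or above limit_c, skipping unknown values."""
    return [
        r for r in readings
        if r["temp_c"] is not None and r["temp_c"] >= limit_c
    ]
```

Calling over_limit(read_temperatures(), 90) would list the GPUs currently at or above a hypothetical 90 °C limit. Parsing is kept separate from the subprocess call so it can be checked on a machine without a GPU.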

Have a Good One,
ccooper
