Monitoring critical temperatures

Hello, I am currently using the Nvidia API to get information from the graphics card's thermal sensor, and I have a few questions I would like to clear up.

First, do some or all of Nvidia's graphics card drivers implement safety measures, such as lowering the GPU's clock (maybe setting it to an idle state) or shutting down the computer, to avoid damaging the graphics card due to overheating?

Second, is there some way to get the critical temperatures that the GPU or memory can reach, maybe from the driver itself or through some Nvidia API function?

And lastly, NVAPI has a thermal module that gives information about some sensors. It is not clear to me whether the following attribute refers to the “critical temperature” mentioned above, which the GPU can reach before some safety measure is taken, or whether it only means the maximum value that can be read from the sensor.

From the documentation:
NvU32 NV_GPU_THERMAL_SETTINGS_V1::defaultMaxTemp

The max default temperature value of the thermal sensor in degree Celsius.

Reference: http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__vidio.html

Thanks very much!

Caezar_JC,
Some GPUs do support thermal throttling and will shut down in order to protect the device. On Linux, for example, when this happens lspci will report that the GPU has dropped off the bus… reboot and she will come back to ya. To get your GPU's current temperature you can use the Nvidia Management Library (NVML). It is a C-based API that exposes low-level GPU information. One of the key tools built on NVML is Nvidia-SMI, a command-line utility that outputs the needed information as XML or plain text.

NVML = https://developer.nvidia.com/nvidia-management-library-nvml
Nvidia-SMI = https://developer.nvidia.com/nvidia-system-management-interface
Nvidia-SMI Commands = http://developer.download.nvidia.com/compute/cuda/6_0/rel/gdk/nvidia-smi.331.38.pdf
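
As a rough illustration of the plain-text route, here is a minimal Python sketch that runs nvidia-smi -q -d TEMPERATURE for one GPU and collects whatever "label : value" lines appear under the Temperature heading. The function names are made up for this example, and the exact labels (current, slowdown, shutdown temperatures and so on) depend on the driver version and the GPU, so the parser does not assume any particular one.

```python
import shutil
import subprocess


def parse_temperature_section(report: str) -> dict:
    """Collect 'label : value' pairs found under the 'Temperature' heading.

    Hypothetical helper: it relies only on the indented layout of the
    plain-text report, not on any specific label being present.
    """
    readings = {}
    in_section = False
    section_indent = 0
    for line in report.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "Temperature":
            in_section = True
            section_indent = indent
            continue
        if in_section:
            if indent <= section_indent:
                # A line at the same or shallower depth ends the section.
                in_section = False
                continue
            label, sep, value = stripped.partition(":")
            if sep:
                readings[label.strip()] = value.strip()
    return readings


def read_gpu_temperatures(gpu_index: int = 0) -> dict:
    """Query one GPU through nvidia-smi; return {} if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return {}
    result = subprocess.run(
        ["nvidia-smi", "-q", "-d", "TEMPERATURE", "-i", str(gpu_index)],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return {}
    return parse_temperature_section(result.stdout)


if __name__ == "__main__":
    for label, value in read_gpu_temperatures().items():
        print(f"{label}: {value}")
```

If you would rather call NVML directly from C, the matching functions are nvmlDeviceGetTemperature for the current reading and nvmlDeviceGetTemperatureThreshold for the slowdown and shutdown thresholds; not every GPU reports all of them.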

Have a Good One,
ccooper