I am undertaking a study of how GPU temperature affects power consumption across a variety of workloads. As is well known, the GPU's leakage current falls as its temperature falls. For this study we are using a liquid-cooled GTX 580, and we have constructed a power-monitoring device for each of the PCIe power inputs to the GPU with sub-1% accuracy. At the moment, however, the main source of error is the temperature monitoring, for which I am currently relying on nvidia-smi.
nvidia-smi only reports the temperature to the nearest degree. Is the temperature sensor built into the card capable of finer resolution, and if so, is there an easy way to access and log it? (Linux or Windows is fine.)
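For reference, here is essentially what my current logging reduces to: a minimal sketch against NVML (compile with -lnvidia-ml). As far as I can tell, nvmlDeviceGetTemperature hands back an unsigned int in whole degrees C, i.e. the same granularity nvidia-smi prints, which makes me suspect the limitation is in the reporting interface rather than necessarily the sensor itself.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <nvml.h>

int main(void)
{
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex(0, &dev);   /* GPU 0; adjust if needed */
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "get handle: %s\n", nvmlErrorString(rc));
        nvmlShutdown();
        return 1;
    }

    /* Log a timestamp,temperature CSV pair once per second.
       Note the API returns whole degrees C only. */
    for (;;) {
        unsigned int temp;
        rc = nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);
        if (rc == NVML_SUCCESS)
            printf("%ld,%u\n", (long)time(NULL), temp);
        fflush(stdout);
        sleep(1);
    }
}
```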
If the built-in monitoring is insufficient for 0.1 °C accuracy, can anyone recommend an alternative method?
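In case it helps frame suggestions: one option I am considering is strapping an external digital sensor to the water block. A DS18B20 on the Linux 1-wire bus, for instance, exposes readings in millidegrees through sysfs (0.0625 °C steps in 12-bit mode, though its quoted absolute accuracy is only ±0.5 °C, so it would need calibrating against a reference). A minimal sketch, assuming the w1_therm driver is loaded; the sensor ID in the path is hypothetical and would be replaced by whatever enumerates on the bus:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Hypothetical sensor ID; substitute the ID listed
       under /sys/bus/w1/devices on the actual system. */
    const char *path = "/sys/bus/w1/devices/28-000004b5d7f2/w1_slave";
    char buf[256];

    FILE *f = fopen(path, "r");
    if (!f) { perror("fopen"); return 1; }
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The second line of w1_slave ends with "t=<millidegrees C>";
       a production version should also check the "crc=... YES" flag
       on the first line before trusting the value. */
    char *t = strstr(buf, "t=");
    if (!t) { fprintf(stderr, "no reading\n"); return 1; }
    long milli = strtol(t + 2, NULL, 10);
    printf("%.3f C\n", milli / 1000.0);
    return 0;
}
```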
To give a taste of the power-efficiency improvement that is possible, I attach some results plots. The first shows the reduction in power draw for a constant workload over a 32-82 °C temperature range. The second shows Gflops/W as a function of temperature. Note that when running air cooled, this kernel typically sits in the 80s (°C), so the reduction achieved through liquid cooling is significant.
The kernel we are using here is the one reported in http://arxiv.org/abs/1107.4264, which for the parameters chosen in this study sustains in excess of 1 Tflops, so it is quite brutal in terms of power draw.