Tesla Temperature Monitoring

I’d like to read out the current temperature of a Tesla C1060 card.

Could anybody please give me a hint on where to find code snippets in C, or a script, that do this?

Thanks from the maintenance man.

NVAPI is what you want:
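
If a script is enough, another route besides NVAPI is to ask the `nvidia-smi` tool for the temperature in CSV form and parse the result. Below is a minimal Python sketch. It assumes `nvidia-smi` is on the PATH and that the installed driver supports the `--query-gpu` option and reports a temperature for the card, which may not hold for an older board such as the C1060. The helper names `parse_temperatures` and `read_temperatures` are made up for illustration.

```python
import subprocess


def parse_temperatures(csv_text):
    """Turn 'index, temperature' CSV lines into {gpu_index: degrees_C}.

    Lines whose temperature field is not a plain number (for example a
    'not supported' marker) are skipped.
    """
    temps = {}
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            temps[int(fields[0])] = int(fields[1])
    return temps


def read_temperatures():
    """Run nvidia-smi once and return the parsed temperatures."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)
```

Calling `read_temperatures()` on a machine with a working driver should give a dictionary keyed by GPU index. Keeping the parsing separate from the `subprocess` call makes it easy to check against saved output.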

We are running an app on a K80. For the first 2-3 minutes it does just fine, but the temperature of one of the GPUs climbs steadily to 90C after 3 minutes, and the clock speeds then throttle to between a third and an eighth of what they were. The card has only a passive heat sink. Has anyone else overcome this hurdle?

$ nvidia-smi
| NVIDIA-SMI 340.32     Driver Version: 340.32         |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   91C    P0   110W / 149W |    940MiB / 11519MiB |     98%      Default |
|   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
| N/A   63C    P0   120W / 149W |    940MiB / 11519MiB |     64%      Default |

The K80 is designed to be installed in an OEM-qualified server that the OEM has designed and certified for it. It sounds like you have plugged this card into some other platform. In that case, this is exactly what you should expect: the K80 card by itself does not provide adequate cooling.

A proper K80 OEM server monitors this temperature and varies airflow across the passive heatsink accordingly, to manage cooling.
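
To see how close a card gets to the throttling point described above while a job runs, a script can poll the temperature and flag any GPU at or above a limit. This is a rough Python sketch, not a substitute for proper airflow. It assumes `nvidia-smi` is on the PATH with `--query-gpu` support. The 90 C default is taken from the report in this thread rather than from any specification, and the names `hot_gpus` and `watch`, the 5 second interval, and the poll count are all illustrative.

```python
import subprocess
import time

# Assumed query: one "index, temperature" line per GPU, no header, no units.
QUERY = ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"]


def hot_gpus(csv_text, limit_c=90):
    """Return [(gpu_index, temp_C)] for GPUs at or above limit_c."""
    hot = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue  # skip lines without a numeric index and temperature
        index, temp = int(fields[0]), int(fields[1])
        if temp >= limit_c:
            hot.append((index, temp))
    return hot


def watch(limit_c=90, interval_s=5.0, polls=12):
    """Poll nvidia-smi a fixed number of times and print hot GPUs."""
    for _ in range(polls):
        result = subprocess.run(QUERY, capture_output=True,
                                text=True, check=True)
        for index, temp in hot_gpus(result.stdout, limit_c):
            print(f"GPU {index} at {temp} C (limit {limit_c} C)")
        time.sleep(interval_s)
```

Running `watch()` alongside the application would show roughly when each GPU crosses the limit, which can help confirm that the slowdown coincides with the temperature rise.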
