We are running an application on a Tesla K80. For the first 2-3 minutes it runs fine, but the temperature of one of the two GPUs climbs steadily and reaches about 90C after roughly 3 minutes. The clock speeds then throttle to between one third and one eighth of what they were. The card has only a passive heat sink. Has anyone else overcome this hurdle?
TIA.
$ nvidia-smi
+------------------------------------------------------+
| NVIDIA-SMI 340.32     Driver Version: 340.32         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   91C    P0   110W / 149W |    940MiB / 11519MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
| N/A   63C    P0   120W / 149W |    940MiB / 11519MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+
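
In case it helps anyone reproduce this, here is a rough sketch of how the temperature and SM clock of each GPU could be logged over time to see exactly when the throttling starts. It assumes the installed driver's nvidia-smi supports `--query-gpu` with the `index`, `temperature.gpu` and `clocks.sm` fields (I have not confirmed this for 340.32). The function names and the 90C threshold are made up for illustration.

```python
import subprocess
import time

# Fields passed to `nvidia-smi --query-gpu`. Assumption: the installed
# driver supports all three; an unsupported field would not parse as int.
QUERY = "index,temperature.gpu,clocks.sm"


def parse_line(line):
    """Parse one CSV row 'index, temp, sm_clock' into a tuple of ints."""
    index, temp_c, sm_mhz = (int(field.strip()) for field in line.split(","))
    return index, temp_c, sm_mhz


def hot_gpus(rows, limit_c=90):
    """Return the indices of GPUs at or above limit_c (illustrative threshold)."""
    return [index for index, temp_c, _ in rows if temp_c >= limit_c]


def sample():
    """Run nvidia-smi once and return one parsed row per GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"]
    ).decode()
    return [parse_line(line) for line in out.splitlines() if line.strip()]


def monitor(interval_s=5, samples=60):
    """Print a timestamped reading every interval_s seconds, samples times."""
    for _ in range(samples):
        rows = sample()
        print(time.strftime("%H:%M:%S"), rows, "hot:", hot_gpus(rows))
        time.sleep(interval_s)
```

Calling `monitor()` in a second terminal while the application runs would give a time series of (GPU index, temperature in C, SM clock in MHz), so the clock drop can be lined up against the temperature curve.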