How much cooling is required for a K40c?

I have a K40c (the one with a fan) plugged into a workstation that is used heavily, day and night, for GPU computations. The temperature reported by nvidia-smi easily climbs to 73-74C whenever an intensive job is running. The maximum I have seen is 75C.

So I have been keeping a 1-ton AC switched on day and night, but since the workstation is in a room where people keep coming and going, the temperature isn’t maintained very well. Also, I sit near the workstation, so I can’t set the room temperature very low.

The technical specs for the K40c say that 70C is the max operating temperature! Is my GPU being hurt by the heat, and should I arrange better cooling? How much cooling is right? And how does one decide how much cooling would be right? I have read a bit about BTU and AC tonnage, but I am confused.

PS. Sorry if the topic is in the wrong forum. Please point me to the right one in that case :)

Those temp numbers seem about right for heavy use, as that is about what I have observed running large tasks on a GTX 980 or GTX Titan X.

Maybe you could add some extra cooling inside the case via an additional fan or a liquid cooling system, but in general I would not worry too much. Tesla GPUs have a long warranty/service period and run at lower clock speeds than the consumer GTX GPUs, so I would be very surprised if there were to be a major issue caused by slightly higher temps.

My concern is that since the technical specs give 70C as the max operating temperature, isn’t it worrying when the GPU regularly hovers around 73-74C? Also, I have verified that the GPU can be kept at around 65C just by adding a temporary extra AC unit, but I want to know the exact cooling requirement so that I can arrange for it.

I would rather invest in cooling now than servicing later.
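On the BTU and tonnage question, a rough heat-load estimate can be sketched as below. It relies on two standard conversions (1 W is about 3.412 BTU/h, and 1 ton of refrigeration is 12,000 BTU/h). The 235 W figure is assumed to be the K40c's board power limit, and the 600 W whole-workstation figure is purely hypothetical; measure your own system at the wall for a real number. This only covers the heat the machine itself adds to the room, not people, doors, sunlight, or other equipment.

```python
# Rough heat-load arithmetic: electrical power drawn by the workstation
# ends up as heat that the room's AC has to remove.

WATTS_TO_BTU_PER_HOUR = 3.412    # 1 W of dissipated power is about 3.412 BTU/h
BTU_PER_HOUR_PER_TON = 12_000    # 1 ton of refrigeration = 12,000 BTU/h


def heat_load_btu_per_hour(watts):
    """Convert a steady power draw in watts to a heat load in BTU/h."""
    return watts * WATTS_TO_BTU_PER_HOUR


def ac_tons(watts):
    """Express the same heat load as a fraction of AC tonnage."""
    return heat_load_btu_per_hour(watts) / BTU_PER_HOUR_PER_TON


gpu_watts = 235.0      # assumption: K40c board power limit
system_watts = 600.0   # hypothetical: whole workstation under load

for label, watts in [("GPU only", gpu_watts), ("whole system", system_watts)]:
    print(f"{label}: {heat_load_btu_per_hour(watts):.0f} BTU/h, "
          f"{ac_tons(watts):.3f} ton")
```

Under these assumptions the workstation accounts for only a small fraction of a 1-ton unit's capacity, which suggests that how well cool air reaches the GPU intake matters more than raw AC tonnage.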

You may want to check the airflow inside the computer case to make sure airflow to the GPU is unobstructed.

The way I read table 8 in the document referenced is that this specifies the environment in which the K40c can operate. The operating temperature is specified as -10 to 70 deg C, which I interpret to mean that the air at intake must be in that temperature range.

The GPU’s internal temperature can well be higher than that; each type of GPU has its own specific limit. My Quadro regularly hits 80 deg C under heavy load. GPUs apply clock throttling if they draw too much power or hit their specified thermal limit. Before they do that, they will increase fan speed to the maximum, which should be noticeable due to the noise. I think nvidia-smi can show you whether clock throttling is being applied for thermal reasons.

[Later] nvidia-smi -q shows the following relevant data:
GPU Shutdown Temp
GPU Slowdown Temp
Fan Speed
Clocks Throttle Reasons
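
To watch a few of these values over time rather than reading the full -q dump, a small script along these lines may help. It is only a sketch: it assumes nvidia-smi is on the PATH and that the installed driver accepts the query field names listed in QUERY_FIELDS (they can be checked with nvidia-smi --help-query-gpu, and some may be reported as not supported on a given board). The helper names parse_status_line and read_gpu_status are made up for this example.

```python
import subprocess

# Assumed query field names; verify them with `nvidia-smi --help-query-gpu`
# because availability can differ between driver versions and boards.
QUERY_FIELDS = [
    "temperature.gpu",
    "fan.speed",
    "clocks_throttle_reasons.hw_slowdown",
    "clocks_throttle_reasons.sw_power_cap",
]


def parse_status_line(line):
    """Turn one CSV line of nvidia-smi output into a field -> value dict."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def read_gpu_status():
    """Run nvidia-smi once and return one dict per GPU in the machine."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return [parse_status_line(line)
            for line in result.stdout.splitlines() if line.strip()]


# Example usage on a machine with an NVIDIA driver installed:
#     for gpu in read_gpu_status():
#         print(gpu)
```

If the hw_slowdown field stays "Not Active" while a heavy job is running, the GPU is not throttling its clocks for hardware reasons such as temperature at that moment.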