I have a workstation with two K80 cards. When I start running my CUDA application, within 10 minutes the cards reach a temperature of 90 deg C and the application becomes very slow. Once the temperature crosses 93 deg C, it terminates abnormally with ‘unknown error’. Does anybody have a solution to the overheating problem of the K80?
The K80 is only designed to work correctly when installed in an OEM server that has been certified for use with the K80. It is not a workstation product. It requires server-provided forced air cooling, which a workstation generally does not provide.
The main point txbob makes is that the K80 is a passively cooled GPU, meaning it relies on fans inside the enclosure to provide adequate airflow across the K80 heatsink (see K80 specifications for airflow requirements). I have come across at least one workstation design that appears to accommodate passively-cooled GPUs:
I have no affiliation with SuperMicro, haven’t used such a system, and cannot tell you whether the system on the linked webpage can in fact be used with K80s. Generally speaking, it is highly advisable to buy K80-based systems only from system integrators that are on NVIDIA’s partner list.
We have had numerous questions in these forums from people who tried to roll their own K80-based system and ran into trouble of one sort or another. In general, such issues are not resolvable remotely, and I believe (don’t know for sure) that NVIDIA’s position is that only K80 systems acquired from approved system integrators are supported.
List of partners: http://www.nvidia.com/object/where-to-buy-tesla.html
There is a YouTube video by someone who added a dedicated external exhaust fan for his K80 in a rather high-powered desktop system. It requires some amount of sheet metal work.
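To check whether a cooling change actually keeps the cards below the temperatures reported in the first post (slowdown at around 90 deg C, abnormal termination above 93 deg C), it can help to log GPU temperatures while the application runs. Below is a minimal sketch in Python, assuming `nvidia-smi` is on the PATH. The two thresholds are taken from the original poster's observations, not from NVIDIA's specifications, and the function names are made up for illustration.

```python
import subprocess

# Thresholds taken from the symptoms reported in this thread.
# They are assumptions for illustration, not official NVIDIA limits.
SLOWDOWN_C = 90
FAILURE_C = 93


def parse_temperatures(csv_text):
    """Parse lines of the form 'index, temperature' into (index, temp_c) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, temp = (field.strip() for field in line.split(","))
        readings.append((int(index), int(temp)))
    return readings


def classify(temp_c):
    """Map a temperature to a rough status based on the thresholds above."""
    if temp_c > FAILURE_C:
        return "critical"
    if temp_c >= SLOWDOWN_C:
        return "throttling likely"
    return "ok"


def read_gpu_temperatures():
    """Query nvidia-smi once and return a list of (index, temp_c) tuples."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,temperature.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_temperatures(result.stdout)


def main():
    # Each K80 board shows up as two GPUs, so two cards yield four readings.
    for index, temp_c in read_gpu_temperatures():
        print(f"GPU {index}: {temp_c} C ({classify(temp_c)})")
```

Calling `main()` periodically (for example once every few seconds from a loop or a cron job) while the CUDA application is running gives a simple temperature log. If the readings still climb toward 90 deg C, the airflow across the heatsinks is most likely still insufficient.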