When running under Linux, what part of the system is supposed to be responsible for temperature management and fan control?
I’m asking this question because with a nice working install of 64-bit Ubuntu Linux, CUDA and any 64-bit driver up to and including 180.22, I get no fan management. When the machine boots up, the GPU fan is at 40% speed according to nvclock. The fan speed stays constant no matter what I run on the card.
When I run nvclock, and request automatic fan control, it seems to adjust the fan speed once, according to the temperature at that precise moment. So when I run batches of stuff, the card heats up but the fan doesn’t respond.
To counter this, I tried slapping together a nasty little fan management daemon, using Ruby and the “daemonize” library. This calls the “auto” fan function in NVClock every 10 seconds. It’s a pretty bad solution but it looks like it works OK.
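For anyone who wants to try the same workaround, here is a minimal sketch of that polling loop. It is written in Python rather than the Ruby and "daemonize" combination described above, and it is not the original script. It assumes nvclock accepts `-F auto` to apply its automatic fan setting once and `-f` to force the change; check `nvclock --help` on your build first. The function name `poll_auto_fan` and its parameters are made up for illustration.

```python
import subprocess
import time

# Assumption: "-F auto" asks nvclock to set the fan once from the current
# temperature, and "-f" forces the change on cards it is unsure about.
NVCLOCK_AUTO_FAN = ["nvclock", "-f", "-F", "auto"]


def poll_auto_fan(interval_s=10.0, max_iterations=None,
                  run=subprocess.run, sleep=time.sleep):
    """Re-issue nvclock's one-shot auto fan command every interval_s seconds.

    max_iterations=None loops until interrupted. The run and sleep hooks
    exist only so the loop can be exercised without real hardware.
    Returns the number of times the command was issued.
    """
    issued = 0
    while max_iterations is None or issued < max_iterations:
        # check=False: a non-zero exit from nvclock should not stop the loop.
        run(NVCLOCK_AUTO_FAN, check=False)
        issued += 1
        sleep(interval_s)
    return issued
```

Calling `poll_auto_fan()` with the defaults repeats the command every 10 seconds until the process is interrupted. Like the original, it only papers over the fact that the "auto" setting is applied once per call.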
It doesn’t seem like the state of affairs I’d expect to see … how is this SUPPOSED to work? Is it unreasonable to expect the driver to manage the fans?
The thing is that the NVIDIA software doesn’t do temperature / fan management, so I have had to resort to non-NVIDIA software to do it. My question still stands:
How is this SUPPOSED to work? Is it unreasonable to expect the driver to manage the fans?
Thanks for the replies, guys. I’ll try not to be too melodramatic.
It’s a GTX 280. I got it up to 70° C just now, in maybe 10 minutes, maybe less. I was running a program that sleeps the GPU maybe 50% of the time, in alternating 10ms intervals, so I’d like to try one of the hotter-running sample projects.
How do those numbers look?
Edit – 75 core, 58 ambient, after 10 minutes of Mandelbrot. Any particular tests you’d like me to try?
Edit 2 – 80 core, 62 ambient, after ~15 minutes of postProcessGL and Enemy Territory: Quake Wars in a window.
Thanks for the info … so, can I conclude that something is wrong with my setup or config? Are these temperatures as high as I think? Any ideas for things to try?
If you run nvidia-settings, you’ll be able to see what the card and driver believe are the acceptable safe limits for the temperature. I don’t think the numbers you’re reporting are excessive. If the fan never spins up past its default speed as you approach the upper safe limit, that would be cause for concern. And if the fan cannot bring the temperature back into a safe range, that would definitely be a problem. Artificially spinning up the fan with nvclock is likely to just shorten the life of the fan if it’s running at a higher rate more often than is needed.
As these cards are running quite a bit hotter, in terms of absolute temperature, than other components these days, it is kind of a leap of faith to trust that the fan should stay so quiet … I’m not disputing the OK-ness of the temperatures, just stating my impressions. It would be very useful - and would give peace of mind - to see the range of OK temperature readings and the expected and OK behavior of the fans. Somewhere on a spec page on nvidia.com or something. An explicit listing of this would probably eliminate misunderstandings like mine, and aid in configuring and managing the system. I was a bit worried actually :)