Controlling fan speed of Titan and Titan X with TCC enabled

I was easily able to activate TCC on the cards and it works great for my needs, except for one small thing.
There is no longer any way to control the fans, and the built-in fan curve is bad. I have seen cards going to 85 degrees and above while the fans rise from 20% only to about 40% and stay there, with the cards throttling down a lot.

Also, after TCC is activated, the cards are not detected by any of the software I would normally use for fan control, not even NVIDIA Inspector.

So, does anyone have any idea how to control the fans on cards with TCC activated?
Even just setting a fixed speed from the command line would help.
Thanks

What is the temperature inside the computer case? Is there adequate air flow? Is it possible air flow is obstructed by cabling or other plug-in cards? Are there dust accumulations (“dust bunnies”)?

I am surprised whenever I hear about such throttling issues because I have used numerous GPUs over the years (both consumer and professional), often high-end cards running at full speed for extended periods of time, but I have never run into such an issue.

I am also not aware of issues with “bad fan curves”, which does not mean they could not exist with some VBIOS versions. Are you running the original VBIOS installed on the card?

It is not a problem with airflow.
When TCC is off and I have control of the fans, they go up to 100% because I use a very aggressive fan curve, and all is fine. I use the cards for GPU rendering.

The problem is that as soon as I activate TCC mode I don't have any control over the fans on the cards.
Not a single program even sees the cards, so none of them can control anything.
The fans then top out at around 40%, which is far from enough to cool rendering GPUs, four of them in a stack.

So the problem is how to make the fans on the cards go to 100% speed when the cards are in TCC mode.

A GPU in TCC mode is a 3D controller, not a graphics card, which is presumably why the programs you normally use to control the GPU fan don’t work. I don’t know what to do about it.
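One thing that may still work: nvidia-smi usually continues to enumerate GPUs in TCC mode even when consumer tools cannot see them, and it can at least read fan speed and temperature (as far as I know, it offers no way to set the fan speed). As a rough sketch, assuming nvidia-smi is on the PATH and supports these query fields, a small wrapper could look like this:

```python
import csv
import io
import subprocess

# Fields supported by nvidia-smi's CSV query interface; fan speed is
# read-only here, so this only lets you monitor, not control, the fans.
QUERY = "index,name,fan.speed,temperature.gpu"

def parse_gpu_csv(text):
    """Parse 'nvidia-smi --format=csv,noheader' output into dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        idx, name, fan, temp = [f.strip() for f in fields]
        rows.append({
            "index": int(idx),
            "name": name,
            "fan_pct": int(fan.rstrip(" %")),
            "temp_c": int(temp),
        })
    return rows

def query_gpus():
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        text=True,
    )
    return parse_gpu_csv(out)

if __name__ == "__main__":
    try:
        gpus = query_gpus()
    except (OSError, subprocess.CalledProcessError):
        gpus = []  # nvidia-smi not available on this machine
    for gpu in gpus:
        print(f"GPU {gpu['index']} ({gpu['name']}): "
              f"fan {gpu['fan_pct']}%, {gpu['temp_c']} C")
```

At minimum this lets you verify from the command line that the TCC devices are still visible to the driver and watch how hot they actually get.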

Your reference to “4 GPUs in a stack” seems to imply that you have actively cooled GPUs that are placed too close together to ensure adequate airflow (and possibly that there is hot air from one GPU flowing to the next one in the stack), which is presumably why you have been forced to manually increase fan speed for adequate cooling to begin with. This does not sound like a properly engineered enclosure to me.

The only idea I have is to use a powerful fan to push cool air from outside the case into the (presumably very narrow) gaps between the GPUs. That is the kind of hacky “chickenwire & duct tape” approach I have used for cobbled-together, insufficiently cooled systems before.

4 GPUs in a stack means there are four of them installed in one case.
They are all NVIDIA reference blower designs, i.e. they exhaust hot air out of the case.
There is no issue with temperature at all when the fan speed increases properly as the temperature rises.
After hours of rendering they don’t go much over 75 degrees.
So the only issue is that once the cards are in TCC mode the fans never go to full speed. Of course the temperature then goes well beyond 80 degrees, which is the problem.
For example, how are Tesla cards cooled?
They do intensive calculations as well, and are even used for rendering, so I assume there must be some fan control :)

If none of the GPUs overheats (75 degrees seems perfectly fine, even 80 would be OK), the only other reason I can think of that would cause them to down-clock is that they exceed the power limit, or that the power supply does not deliver enough power. nvidia-smi can show the maximum power rating for the GPU as well as the current power draw.
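As a sketch of that check (the query field names are taken from nvidia-smi's CSV query interface; the helper names are my own), something like this could report how close a card is running to its power limit:

```python
import subprocess

def power_headroom(draw_w, limit_w):
    """Fraction of the power limit currently unused (0.0 = at the limit)."""
    return max(0.0, (limit_w - draw_w) / limit_w)

def parse_power_line(line):
    """Parse one 'power.draw, power.limit' CSV line, e.g. '231.45 W, 250.00 W'."""
    draw, limit = [float(f.strip().rstrip(" W")) for f in line.split(",")]
    return draw, limit

def check_power(index=0):
    # Query a single GPU's current draw and enforced limit via nvidia-smi.
    out = subprocess.check_output(
        ["nvidia-smi", "-i", str(index),
         "--query-gpu=power.draw,power.limit", "--format=csv,noheader"],
        text=True,
    )
    draw, limit = parse_power_line(out.strip())
    print(f"GPU {index}: {draw:.1f} W of {limit:.1f} W "
          f"({power_headroom(draw, limit):.0%} headroom)")
```

Running `check_power(i)` for each of the four cards during a render would show quickly whether any of them is pinned at its limit.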

What does not make any sense to me is the statement that the GPUs do not overheat at stock fan speeds, but nonetheless down-clock unless cooled more aggressively. Something is very wrong in that scenario, but I cannot remotely diagnose what it is. You also stated earlier that you saw GPUs go up to 85 degrees; how does that jibe with your information that “after hours” of operation they only reach 75 degrees? Power consumption will differ widely based on workload. Are you quoting temperatures from two different workloads by any chance?

As a sanity check, I would make sure that all the power connectors are plugged in, and make sure the power supply has sufficient output to drive four GPUs. With some power supplies you may need to take care how GPUs are matched to “rails”. My recommendation would be to use a power supply that is rated at 1.5x the combined peak power consumption of the GPUs plus the CPU. For example, if each of the four GPUs is specified for 235W max power, use a 1500W PSU.

How are Tesla cards cooled? It depends on whether you have an actively or passively cooled model. The actively cooled ones come with a fan, basically the same way as a consumer GPU. The passively cooled models have a heat fin assembly, and require that the fans in the server enclosure blow just the right amount of air over these fins. This usually means you need to buy such a system from an integrator that partners with NVIDIA so you can be sure the cooling is set up correctly.

Some adventurous souls have tried integrating passively cooled Teslas into their own systems, and often it doesn’t work right due to insufficient cooling. That does not mean it can’t be done, one just has to have the knowledge and experience to set this up correctly, and few people have that.

I think you misunderstood me.
There are 2 scenarios:

  1. GPUs in standard, non-TCC mode.
    All fan control works with programs such as Afterburner, for example. There is a temperature curve, and when the cards are rendering the fans go up to 100% as the heat rises, keeping the cards at around 75 degrees or so.

  2. GPUs in TCC mode.
    Fan control does not work; programs such as Afterburner can’t even detect the cards, so the fan curve has no effect. Then, even when rendering and card temperatures rise above 85 degrees, the fans stay at a maximum of about 40%.
    That is not enough to properly cool the cards.

All my systems have high-quality 1500W power supplies, with all power cables connected, both to the GPUs and all the additional ones to the motherboards as well.
So the only issue is that once I put a card into TCC mode I cannot control the fan speed, and the resulting maximum fan speed of 40% is not enough to properly cool the cards.
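Since there seems to be no way to force the fans to full speed in TCC mode, one stopgap is a watchdog that polls temperatures via nvidia-smi and reacts (alerting, or pausing the render queue) before the cards cook. A minimal sketch, assuming nvidia-smi is available and using an arbitrary example threshold of 83 degrees:

```python
import subprocess
import time

TEMP_LIMIT_C = 83  # example threshold; pick something below the throttle point

def too_hot(temps, limit=TEMP_LIMIT_C):
    """Return the indices of GPUs at or above the temperature limit."""
    return [i for i, t in enumerate(temps) if t >= limit]

def read_temps():
    # One temperature per line, one line per GPU.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
        text=True,
    )
    return [int(line.strip()) for line in out.splitlines() if line.strip()]

def watchdog(poll_seconds=10):
    while True:
        hot = too_hot(read_temps())
        if hot:
            # Hook your own reaction here: pause the render queue,
            # send an alert, etc. This sketch just reports.
            print(f"WARNING: GPUs {hot} at or above {TEMP_LIMIT_C} C")
        time.sleep(poll_seconds)
```

This does not fix the fan curve, but at least it catches the overheating before the cards spend hours throttled.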