Ubuntu Linux: Titan RTX locked at 300MHz, 100W power draw

Hi, my Titan RTX seems to be locked at 300MHz and 100W power draw.

I’ve gone through all of the nvidia-smi output, but nothing looks unusual other than SW Power Cap: Active.

But my other Titan reports the same setting and is not locked at 300MHz and 100W power draw.
The second Titan seems to behave normally as far as clock rate goes, and its power draw is 10-20W at idle and up to 280W when it’s working.

Any pointers on how to troubleshoot this? What can I post here to help troubleshoot it?

Has anybody had a similar experience, or does this sound familiar to anyone?

The problem Titan RTX seems to be locked at 300MHz at idle. Under full CUDA load it might drop to 150MHz, but it never goes above 300MHz, even though GPU utilization reads 100%. It pulls 100W at idle and doesn’t go past 170W under full CUDA load, even though utilization reads 100%.

Power limit is set at 280W on both Titans.
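
For anyone wanting to compare the two cards side by side, here is a minimal Python sketch that asks nvidia-smi for clocks, power, temperature, and a few throttle reasons per GPU. The helper names (`parse_gpu_csv`, `active_throttle_reasons`, `query_gpus`) are made up for illustration, and the query field names are assumed to match the installed driver; `nvidia-smi --help-query-gpu` lists what is actually supported.

```python
import csv
import io
import subprocess

# Assumed field names; verify with `nvidia-smi --help-query-gpu` on your driver.
FIELDS = [
    "index",
    "name",
    "clocks.sm",
    "power.draw",
    "power.limit",
    "temperature.gpu",
    "clocks_throttle_reasons.sw_power_cap",
    "clocks_throttle_reasons.hw_slowdown",
    "clocks_throttle_reasons.sw_thermal_slowdown",
]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (v.strip() for v in row))) for row in rows if row]


def active_throttle_reasons(gpu):
    """Return the short names of throttle reasons reported as Active."""
    prefix = "clocks_throttle_reasons."
    return [
        key[len(prefix):]
        for key, value in gpu.items()
        if key.startswith(prefix) and value == "Active"
    ]


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Running `query_gpus()` once at idle and once under CUDA load, then printing `active_throttle_reasons(gpu)` for each card, should show whether the two Titans really report the same throttle state.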

Thanks,
Angel

Does the situation persist across power cycles (including hard shutdown with power removed)?

Do the cards swap roles (of being throttled) if you swap the PCIe power cables connected to the GPUs?

You could attempt to use the nvflash tool to read the BIOS from the “good” GPU and flash it to the “bad” GPU, provided that the cards are identical (including the brand of the GPU memory chips).

Hi, thanks. No, the situation does not persist across a hard shutdown with power completely removed from everything. But it quickly returns after the first load.

We found the problem. After removing the shroud, we found that the thermal paste had completely dried up.
We applied fresh thermal paste and the problem is gone. It was the fast temperature spike that gave it away.
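
In case it helps someone else spot the same thing, here is a rough Python sketch of checking a temperature log for that kind of fast spike. It assumes you have already collected (seconds, temperature in °C) samples, for example by polling `temperature.gpu` with nvidia-smi once per second while starting a load. The function names and the 5 °C/s threshold are hypothetical, picked only for illustration and not taken from any NVIDIA guidance.

```python
def max_heating_rate(samples):
    """Return the steepest temperature rise in degrees C per second.

    `samples` is a list of (seconds, temp_c) pairs in time order.
    """
    rates = [
        (t1 - t0) / (s1 - s0)
        for (s0, t0), (s1, t1) in zip(samples, samples[1:])
        if s1 > s0  # skip duplicate or out-of-order timestamps
    ]
    return max(rates, default=0.0)


def looks_like_poor_contact(samples, threshold_c_per_s=5.0):
    """Flag a log whose fastest rise meets a (hypothetical) threshold."""
    return max_heating_rate(samples) >= threshold_c_per_s
```

A heatsink with good contact usually warms up gradually under load, so a jump of many degrees within a second or two of starting a job is a hint to look at the cooler rather than at power settings.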

Can you tell me whether removing the shroud and replacing the thermal paste invalidates the warranty? (Founders Edition Titan RTX)

Thanks again,
Angel

P.S. I appreciate all the advanced card protection systems you’ve incorporated into the card.

Probably best to discuss your situation with the support people from wherever you bought it. If you bought it directly from NVIDIA, they have support.