Quadro T2000 throttles down to 300MHz and stays there

I have what looks like the same issue with a Dell Precision 7730 (Intel Core i9-8950HK, Quadro P5200), and have had it for more than a year already (though it happens only on battery). Hopefully it will be resolved for good this time, because it’s embarrassing.

It definitely doesn’t look like a temperature, power or hardware problem on my end. I have tested everything to death, and it’s not even CPU / GPU throttling, as temps and power are fine.

It looks like very low graphics performance on battery: even web pages are very laggy when scrolling (which is laughable on such a machine; even when throttled to something like Graphics 427 MHz / Memory 7230 MHz it definitely shouldn’t behave that way).

On AC it works like a beast.

I can’t test 450.51 yet, because it’s not available on Manjaro Stable (which is still on 440.100). I will try later… Hopefully it will finally work.

@amrits Using driver version 450.51 from the graphics-drivers PPA (/graphics-drivers/ppa/ubuntu), the problem persists.

With Ubuntu 18.04 and Kernel 5.3.0-62, nvidia-driver-418 and lower can be installed, but do not seem to function correctly. nvidia-driver-430 and higher, up to the version 450.51.06-0ubuntu1, suffer from the mentioned throttling issue.

Windows with version 440 has generally lower GPU temps (70-71 instead of 75+) with the workload that leads to throttling issues in Linux. No throttling takes place when using Windows, but that might be due to the lower temperature. In Linux, limiting CPU frequency leads to significantly lower GPU temperatures as well and throttling can thus be avoided.

For the sake of comparison, I have a Dell Precision 5540 with a Quadro T2000. For the first few minutes of a heavy workload, everything is fine, regardless of whether the charger is plugged in or not. After some time at around 74-75°C with a utilisation of approximately 50% and only 26W draw (of apparently 60W maximum), the frequency instantly drops from 1500+MHz to a measly 300MHz. If the workload is not removed, the system frame rate drops to a point where it’s barely usable. The GPU frequency stays low until the next reboot.
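Readings like the ones above (temperature, utilisation, power draw, graphics clock) can be logged with nvidia-smi's CSV query mode. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH and that the installed driver supports the listed query fields. The helper names (`build_command`, `parse_row`, `log_samples`) are made up for illustration and are not part of any NVIDIA tool.

```python
import csv
import io
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`; assumed to be
# supported by the installed driver.
FIELDS = ["timestamp", "temperature.gpu", "utilization.gpu",
          "power.draw", "clocks.gr"]


def build_command(interval_s=1):
    """Return an nvidia-smi command that prints one CSV row per interval."""
    return ["nvidia-smi",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
            "-l", str(interval_s)]


def _to_float(text):
    """Convert a field to float; values such as '[N/A]' become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_row(line):
    """Parse one CSV row into a dict keyed by the query field names."""
    values = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    row = dict(zip(FIELDS, values))
    for key in FIELDS[1:]:  # everything except the timestamp is numeric
        row[key] = _to_float(row[key])
    return row


def log_samples(count, interval_s=1):
    """Collect `count` samples from a running nvidia-smi (needs a GPU)."""
    proc = subprocess.Popen(build_command(interval_s),
                            stdout=subprocess.PIPE, text=True)
    samples = []
    try:
        for line in proc.stdout:
            if line.strip():
                samples.append(parse_row(line))
            if len(samples) >= count:
                break
    finally:
        proc.terminate()
    return samples
```

Calling `log_samples(600)` while the workload runs would give ten minutes of one-second samples, which should be enough to catch the moment the clock falls to 300MHz.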

It could be because your heatsink is clogged, which causes the temperature to rise, which is causing thermal throttling.

The throttling itself is not the issue. The failure to increase the GPU frequency to more than 300MHz afterwards is the problem. This can only be fixed by a reboot, which should not be the case.

EDIT: Also, it might be useful to implement more gradual throttling instead of dropping from 1500+MHz to 300MHz without any intermediate steps, but that too is not the issue at hand.

Have you monitored the GPU temp to see if it stays high? Mine stays high while the OS is running, even if it is virtually idle.

On my system, when booting, it gradually climbs from around 40°C to 90°C. It might then oscillate between 90°C and 70°C, but it generally stays nearer 90°C.

When I reboot, the GPU does nothing during the boot process. So it has cooled down by the time the OS is loaded again.

Please see the CSV file in the first post. It answers all of these questions (whether the heatsink is clogged, whether the temperature stays high, etc.) in full.

I think the throttling somewhat works before the final drop to 300. The CSV file in the first post shows some fluctuation in the clocks before the final drop to 300.
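For anyone who wants to check their own log for the same pattern, here is a rough sketch that finds where the clock falls to the floor for good and how much it fluctuated before that. It assumes the graphics clock column has already been extracted into a list of MHz values; the layout of the CSV in the first post is not assumed, and the function names and the 10MHz tolerance are arbitrary choices for illustration.

```python
def find_final_drop(clocks_mhz, floor_mhz=300, tolerance_mhz=10):
    """Index of the first sample in the trailing run at the floor clock.

    Returns None if the log is empty or does not end at the floor.
    """
    at_floor = [abs(c - floor_mhz) <= tolerance_mhz for c in clocks_mhz]
    if not at_floor or not at_floor[-1]:
        return None
    idx = len(at_floor) - 1
    # Walk backwards while the previous sample is also at the floor.
    while idx > 0 and at_floor[idx - 1]:
        idx -= 1
    return idx


def summarize_before_drop(clocks_mhz, floor_mhz=300, tolerance_mhz=10):
    """Report the clock range seen before the final drop, if there is one."""
    idx = find_final_drop(clocks_mhz, floor_mhz, tolerance_mhz)
    if idx is None or idx == 0:
        return None
    before = clocks_mhz[:idx]
    return {"drop_index": idx,
            "min_before": min(before),
            "max_before": max(before)}
```

A wide gap between `min_before` and `max_before` would match the fluctuation described above; a brief dip to 300MHz followed by recovery is deliberately not counted as the final drop.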

This is to keep everyone informed that we are actively working on finding the root cause for the issue.
I will keep this thread posted with updates.

The funny thing is that I bought this laptop for work about 9 months ago and it cost me quite a lot of money. If I had needed the GPU so far, I probably would have returned it as defective and gone with an alternative, as it would be 100% unusable in any GPU workload. It is quite a shame, as this is a Quadro and not some mere GeForce from Walmart.

Hi All,

Can someone please remove all load from the GPU once the issue reproduces and wait for the temperature to drop?
This should bring the clock values back to normal, although the cycle may then repeat.
We verified this internally and were able to avoid a reboot.
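The check requested above could be scripted roughly as follows. This is only a sketch: the 300MHz stuck value comes from this thread, while the 50MHz margin, the 10-minute timeout and the function names are assumptions. Note that an idle GPU may also report a low clock, so the result is only meaningful once some load has been reapplied.

```python
import subprocess
import time


def read_clock_nvidia_smi():
    """Read the current graphics clock in MHz via nvidia-smi (needs a GPU)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=clocks.gr",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return float(out.splitlines()[0])


def wait_for_recovery(read_clock_mhz, stuck_mhz=300, margin_mhz=50,
                      timeout_s=600, interval_s=10,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the clock rises above stuck_mhz + margin_mhz.

    Returns the seconds elapsed at recovery, or None on timeout.
    `sleep` and `clock` are injectable so the loop can be tested
    without waiting.
    """
    start = clock()
    while True:
        if read_clock_mhz() > stuck_mhz + margin_mhz:
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Running `wait_for_recovery(read_clock_nvidia_smi)` after stopping the heavy workload would report how long the clock took to come back, or `None` if it never did within the timeout.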

We are still actively debugging the issue to find the root cause and will keep you updated.

Hi All,

Please provide input based on my last comment.

Hello, I tried to upgrade to Focal (Ubuntu 20.04) on my Precision 5540 (i5) with a T2000. I had to revert because I could not use the T2000 correctly:

  • The 410 driver seems not usable with 20.04
  • I still have the 300MHz limit with the other drivers.

And to answer the question: whatever the temperature, the T2000 does not go over 300MHz when not using the 410 driver.

I want to add some information which could help to track down the problem. When I start the 5540 plugged into a USB-C power supply that does not deliver enough watts, I get a warning message from the BIOS that the computer may not run at full throttle. If I continue, the GPU is always limited to 300MHz, even if I then plug in the regular adapter.

If I reboot with the regular power adapter, it gets back to normal.

(all this is with the 410 driver, the only one allowing to actually use the GPU)

Thanks Benjamin for the response.
Did the clock values not go beyond 300 MHz after you killed all load on the GPU and the temperature dropped?
In our local setup, once we killed the workload on the GPU and monitored the clocks, they recovered in 5-6 minutes…

Our team is investigating the root cause and will keep you updated.

Hello. I have the same problem: the T2000 switches to 300MHz after some time during a computing task (CUDA, TensorFlow). It looks like it switches to a different performance mode for some reason. I’m using PRIME in on-demand mode, but the problem persists if I switch to nvidia mode. The laptop is a brand new Dell Precision 5540 that sits on a desk, so the issue is not with the thermals, even though it looks very similar to thermal throttling.
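One way to see why the driver picked the low clock is to query the performance state and the active throttle reasons, e.g. `nvidia-smi --query-gpu=pstate,clocks_throttle_reasons.active --format=csv,noheader`, and decode the bitmask. The sketch below assumes the bit values match the `nvmlClocksThrottleReason*` constants in `nvml.h`; please verify them against the header shipped with your driver. The short names are my own labels, not NVIDIA's.

```python
# Bit values ASSUMED to match the nvmlClocksThrottleReason* constants
# in nvml.h; check the header that ships with your driver version.
THROTTLE_REASONS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}


def decode_throttle_reasons(mask):
    """Turn a throttle-reason bitmask (int or hex string) into names."""
    if isinstance(mask, str):
        mask = int(mask.strip(), 16)  # accepts an optional "0x" prefix
    names = [name for bit, name in sorted(THROTTLE_REASONS.items())
             if mask & bit]
    # Report any bits this table does not know about instead of hiding them.
    unknown = mask & ~sum(THROTTLE_REASONS)
    if unknown:
        names.append(f"unknown_0x{unknown:x}")
    return names
```

If the stuck state shows a power-related reason rather than a thermal one, that would support the impression that temperature is not the actual trigger.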

I’ve attached two bug reports, normal is for the state of GPU without the computing task and the other was created during the computing, when the GPU was in 300 MHz state. I hope that they’ll help you to pinpoint the issue.

nvidia-bug-report.log.gz (951.2 KB)
nvidia-bug-report-normal.log.gz (887.8 KB)

I’ve also found that disconnecting the AC supply turns off this throttling for some time during the computing.

sopsaare, benjamin.werner, whaag, genis_valentin, can you try updating the driver to version 450.80 and the kernel to version 5.8 (or at least >5.4)? I’ve updated my OS from Ubuntu 20.04 to Ubuntu 20.10 (which updated the kernel from 5.4 to 5.8) and I no longer have this issue.
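To check quickly whether a machine already meets the suggested combination, something like the following could be used. It is a sketch only: the thresholds are the ones named in the post above, the function names are illustrative, and the kernel string is assumed to look like the output of `uname -r`.

```python
import platform
import re


def parse_version(text):
    """Turn '450.80.02' or '5.8.0-25-generic' into a tuple of ints.

    Anything after the first '-' (distro build suffix) is ignored.
    """
    return tuple(int(p) for p in re.findall(r"\d+", text.split("-")[0]))


def meets_suggestion(driver, kernel, min_driver="450.80", min_kernel="5.8"):
    """True if both the driver and kernel are at least the suggested versions."""
    return (parse_version(driver) >= parse_version(min_driver)
            and parse_version(kernel) >= parse_version(min_kernel))


def current_kernel():
    """Kernel release of the running system, as `uname -r` would print it."""
    return platform.release()
```

The driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed in together with `current_kernel()`.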

Indeed, it seems OK with the 20.04 + 450.88 combination. Many thanks for the heads-up!

kiselev.06.01 wrote:
"I’ve also found that disconnecting the AC supply turns off this throttling for some time during the computing."

I reproduced this exactly.
I disconnected my power plug for a second, and the clock during my CUDA/TensorFlow computations jumped right to 1725+ MHz (where before it was stuck at 300 MHz). I had tried sudo tlp ac before, and nothing changed. Is this probably a Dell BIOS problem?
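When repeating this experiment it helps to record the AC state next to each clock reading. Here is a small sketch that reads it from the Linux power supply class in sysfs. It assumes the adapter shows up under `/sys/class/power_supply` with type `Mains` or `USB`; directory names differ between machines, and `ac_online` is just an illustrative name.

```python
from pathlib import Path


def ac_online(root="/sys/class/power_supply"):
    """Report whether an external power supply is online.

    Returns True if any Mains/USB supply reports online, False if such
    supplies exist but none is online, and None if none were found.
    """
    found = None
    for supply in sorted(Path(root).glob("*")):
        type_file = supply / "type"
        online_file = supply / "online"
        if not (type_file.is_file() and online_file.is_file()):
            continue  # batteries usually have no 'online' attribute
        if type_file.read_text().strip() not in ("Mains", "USB"):
            continue
        if online_file.read_text().strip() == "1":
            return True
        found = False
    return found
```

Logging this value together with the graphics clock would show whether the jump back to full speed always coincides with the unplug event.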

I have a Quadro T2000 in a Dell Precision 5540, running RHEL8 with NVIDIA driver 450.66.

Not fixed in 455.45.01
(rhel8)