Quadro T2000 throttles down to 300MHz and stays there

sopsaare · November 20, 2019, 1:41am

Hello,

My configuration is a Dell Precision 5540 with an i7-9850H and a Quadro T2000 (specifically NVIDIA Corporation TU117GLM [Quadro T2000 Mobile / Max-Q]), running Fedora 31 (5.3.9-300.fc31.x86_64) with NVIDIA driver 440.31 installed from the akmod package.

The problem I’m encountering is that in basically all 3D applications, after a while (5 minutes or so), the GPU starts to throttle. That is normal and understandable, especially on a laptop. The problem is that it throttles down to 300MHz and does not clock any higher until a reboot, which makes it basically unusable. The PowerMizer Preferred Mode setting does not affect this at all.

For demonstration purposes I wrote a script that reads the current GPU frequency and temperature and appends them to a CSV file (attached), and I ran the TW: WH 2 benchmarks in this order:

battle benchmark (avg fps 47.7)
skaven benchmark (avg fps 48.6)
battle benchmark (avg fps 12.7); as can be seen, the effect on performance is huge

In the attached CSV file from the script, a drop in frequency can be observed at timestamp 1574211968198, which happened near the end of the skaven benchmark. The temperature then starts going down, but the clock speed never picks up. I will also attach screenshots from the runs and provide the nvidia-bug-report.log.
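A minimal sketch of how the CSV could be checked for this pattern, assuming each row has the layout the sampling script writes (epoch milliseconds, temperature in C, clock in MHz). The function name `findStuckThrottle` and the 300MHz cap are illustrative choices, not part of the original script.

```javascript
// Hypothetical helper: finds the first sample after which the GPU clock
// never rises above `capMHz` again, and reports how far the temperature
// fell afterwards. Rows are expected as "epochMs,tempC,clockMHz".
function findStuckThrottle(csvText, capMHz = 300) {
    const samples = csvText
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .map((line) => {
            const [ts, temp, clock] = line.split(",").map(Number);
            return { ts, temp, clock };
        })
        // Skip headers or malformed rows
        .filter((s) => [s.ts, s.temp, s.clock].every(Number.isFinite));

    // Walk backwards to find where the "at or below the cap" tail begins
    let start = samples.length;
    while (start > 0 && samples[start - 1].clock <= capMHz) {
        start -= 1;
    }
    // No low tail at all, or the clock never exceeded the cap: nothing to report
    if (start === samples.length || start === 0) {
        return null;
    }
    const tail = samples.slice(start);
    return {
        dropTimestamp: tail[0].ts,
        tempAtDrop: tail[0].temp,
        minTempAfter: Math.min(...tail.map((s) => s.temp)),
        maxClockAfter: Math.max(...tail.map((s) => s.clock)),
    };
}
```

Called on the file contents (for example read with `fs.readFileSync(path, "utf8")`), a result whose `minTempAfter` is well below the throttling temperature while `maxClockAfter` stays at the cap would match the "cools down but never clocks back up" behaviour.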

I do realize this might be a compatibility issue with my laptop manufacturer, since the CPU starts to throttle at the same time as the GPU. Yet the CPU recovers normally as soon as the temperatures drop, while the GPU does not recover without a reboot.

Files:

nvidia-bug-report.log.gz https://drive.google.com/open?id=16ErfsUudEoH4dkbsHN0BIarUEE03JFTb
frequency and temperature csv https://drive.google.com/open?id=11uDnWnEw3SW-b6knXNG1ZuiRJBHXH5_L
first benchmark https://drive.google.com/open?id=1_VF2Q5e3ea4F0vZUxj1zCqULdHSXlVlQ
second benchmark https://drive.google.com/open?id=1Zk4r3j9EuMs0tKJ6LSzWIwKsCOlX0Jjr
third benchmark https://drive.google.com/open?id=1RxVZuFE_Gv5wm_mKpnKhswyaQa98vSqv

Hopefully someone here can figure something out. If there are any parameters or settings I could test, I’m more than willing to give them a try!

generix · November 20, 2019, 9:12am

You should check the GPU temperatures using the nvidia-settings GUI, or install nvidia-smi and use that. Maybe a bad hysteresis entry in the BIOS is causing this, so please check for a system BIOS update.
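To illustrate what a bad hysteresis entry could do, here is a toy model of a throttle controller that clamps the clock once the temperature reaches a trip point and only releases the clamp below a lower release point. All thresholds are hypothetical and this is not how the driver or firmware is known to work; it only shows that if the release point is below any temperature the GPU actually reaches, the clamp never lifts.

```javascript
// Toy thermal-throttle controller with hysteresis (hypothetical values).
// Once temperature reaches tripC the clock is clamped to throttledMHz;
// the clamp is lifted only when temperature falls below releaseC.
function simulateThrottle(temps, { tripC, releaseC, maxMHz, throttledMHz }) {
    let throttled = false;
    return temps.map((t) => {
        if (!throttled && t >= tripC) {
            throttled = true;
        } else if (throttled && t < releaseC) {
            throttled = false;
        }
        return throttled ? throttledMHz : maxMHz;
    });
}

// A sane table releases a few degrees below the trip point...
const sane = { tripC: 80, releaseC: 75, maxMHz: 1815, throttledMHz: 300 };
// ...while a broken one releases at a temperature the GPU never reaches.
const broken = { ...sane, releaseC: 20 };

const temps = [70, 78, 80, 79, 72, 60, 50];
console.log(simulateThrottle(temps, sane));   // clock recovers once below 75C
console.log(simulateThrottle(temps, broken)); // clock stays clamped
```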

sopsaare · November 20, 2019, 2:50pm

Thank you for your reply. The script I wrote uses nvidia-settings to query the temperature and frequency; it just runs in the background and produces time-series data.

I have checked for firmware/BIOS updates and already have the most current one installed (it came as an OTA update).

generix · November 20, 2019, 5:20pm

https://devtalk.nvidia.com/default/topic/1062020/linux/quadro-t2000-max-q-support/post/5385557/#5385557

sopsaare · November 26, 2019, 4:53pm

Bump maybe?

generix · November 26, 2019, 5:06pm

Since this seems to be an issue with that specific notebook model, did you try to raise an issue with Dell so they could contact nvidia? Furthermore, you could also mail the problem description and nvidia-bug-report.log to linux-bugs[at]nvidia.com

sopsaare · November 26, 2019, 8:37pm

I did contact Dell, but once again they were not helpful at all.

I think this is a problem in the NVIDIA driver (and confusion about the Max-Q vs. non-Max-Q variant).

One possibility is that the NVIDIA driver does not support thermal management on the T2000 Max-Q, and my laptop has that particular chip in it. Or, most probably, the chip is 100% the same across all models, but the Max-Q just has different power/temperature management that the Linux driver does not support.

generix · November 26, 2019, 9:27pm

You shouldn’t pay too much attention to the “Max-Q” tag; rather, ignore it. These are the same chips sharing the same PCI ID, which is why the name shows as “T2000 / Max-Q”; the Max-Q parts are just vendor-specific models with a lowered TDP.
Just assume that you have a regular T2000. You could install nvidia-smi and check whether it displays the power budget. https://www.notebookcheck.net/NVIDIA-Quadro-T2000-Max-Q-Graphics-Card.424172.0.html
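A sketch of reading the power budget from Node, assuming `nvidia-smi` is installed and on the PATH. The query fields are standard `nvidia-smi --query-gpu` fields (`nvidia-smi --help-query-gpu` lists them all); the helper names `parseSmiLine` and `queryPowerBudget` are hypothetical. Mobile GPUs may report `[N/A]` or `[Not Supported]` for power fields, which the parser maps to `null`.

```javascript
const { execFile } = require('child_process');

// Fields passed to `nvidia-smi --query-gpu`
const FIELDS = ["name", "power.draw", "power.limit", "enforced.power.limit", "clocks.max.gr"];

// Parse one line of `--format=csv,noheader,nounits` output into an object.
// Bracketed values such as "[N/A]" become null; numeric fields are parsed.
function parseSmiLine(line, fields = FIELDS) {
    const values = line.split(",").map((v) => v.trim());
    const result = {};
    fields.forEach((field, i) => {
        const raw = values[i];
        if (raw === undefined || raw.startsWith("[")) {
            result[field] = null;
        } else if (field === "name") {
            result[field] = raw;
        } else {
            result[field] = Number.parseFloat(raw);
        }
    });
    return result;
}

// Query all GPUs once; calls back with an array of parsed objects (one per GPU).
function queryPowerBudget(callback) {
    const args = [`--query-gpu=${FIELDS.join(",")}`, "--format=csv,noheader,nounits"];
    execFile("nvidia-smi", args, (err, stdout) => {
        if (err) {
            callback(err, null);
            return;
        }
        const gpus = stdout
            .split("\n")
            .filter((l) => l.trim())
            .map((l) => parseSmiLine(l));
        callback(null, gpus);
    });
}
```

For example, `queryPowerBudget((err, gpus) => console.log(err || gpus))` prints either the error or one object per GPU; running it before and after the throttling starts would show whether the reported limits change.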

sopsaare · November 26, 2019, 9:33pm

Thanks for that answer!

In the .csv file I can see clock speeds going up to 1815MHz so definitely not the lowest power budget model.

I will look into installing nvidia-smi, but from what I understood it needs the Xorg stuff installed, and the newest Fedora is not running that.

generix · November 26, 2019, 9:44pm

BTW, did you ever install Windows to check whether the same issue happens there?

sopsaare · November 27, 2019, 2:57am

Hi,

No, I didn’t try it on Windows, as I have no interest in running my work PC with that. Fortunately I don’t need CUDA or anything else taxing the graphics card right now, but soonish I may well be running CUDA/OpenCL stuff, and at that point I would not like to have it throttling like this.

I have been toying with graphics cards and PCs since the days of GeForce 2 / 3DLabs, and I’m pretty sure I have a driver problem, which is also why I haven’t spent much time on switching OSes.

I will try Windows at some point when I have time to move all my work stuff to a USB HDD, or install Windows on one.

I will also try to use back channel communication with Dell (and NV) if no-one picks this up.

But thanks for all the help @generix!

genis_valentin · January 9, 2020, 3:35pm

Hello,

I have the Dell 5540 with Ubuntu 18.04 and experienced the same problem. I noticed that the frequency is limited to 300 MHz only AFTER a CUDA application has run. The clock frequency is not limited as long as CUDA is not used. For example, if I run glmark2 right after booting, the clock frequency stays at 1860 MHz.

sopsaare · January 13, 2020, 1:25pm

@genis_valentin are you sure that it stays there, i.e. did you actually hit the throttling point? For me it goes down only once it hits the 80C mark.

I had this script running in the background to take the readings:

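const { exec } = require('child_process');

// nvidia-settings queries for the GPU temperature and current clock frequencies
const TEMP_COMMAND = "nvidia-settings -q ThermalSensorReading";
const CLOCK_COMMAND = "nvidia-settings -q GPUCurrentClockFreqs";

// Take the number after "):" in output such as
// "Attribute 'GPUCurrentClockFreqs' (host:0[gpu:0]): 300,405."
// parseFloat stops at the first non-numeric character, so for the clock
// query only the graphics clock (before the comma) is kept.
const getValue = (out) => Number.parseFloat(out.split("):")[1].trim());

// Non-blocking sleep, so the event loop stays free while waiting
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run a shell command and resolve with its stdout; reject on failure so the
// caller is never left waiting on a promise that does not settle.
const execute = (command) => new Promise((resolve, reject) => {
    exec(command, (err, stdout, stderr) => {
        if (err) {
            reject(err);
            return;
        }
        // Warnings go to stderr so they do not end up in the CSV
        if (stderr) console.error(`stderr: ${stderr}`);
        resolve(stdout);
    });
});

const main = async () => {
    while (true) {
        try {
            const temp = getValue(await execute(TEMP_COMMAND));
            const freq = getValue(await execute(CLOCK_COMMAND));
            // One CSV row per sample: epoch ms, temperature (C), clock (MHz)
            console.log(`${Date.now()},${temp},${freq}`);
        } catch (err) {
            console.error(err);
        }
        await sleep(1000);
    }
};

main();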

and I ran it as “node script.js >> temp_and_freq.csv”. As soon as it hits 80C (visible in the CSV file from the first post), it throttles to 300MHz, and that is the maximum it reaches from then on.

And for @generix: I installed Windows in a dual-boot setup and had zero issues there, so this is a Linux-only issue. I haven’t tested with the new drivers (440.44) yet.

benjamin.werner · March 1, 2020, 7:53am

Hello,
I recently discovered the issue. Is there anything new? I tried the 440.64 driver, released only three days ago, but am still stuck at 300MHz.

I have not been able to go beyond 300MHz, even without launching CUDA applications first.

genis_valentin · March 1, 2020, 2:31pm

Hello!

Thanks @sopsaare for the script. I used it to monitor the GPU frequency and saw that it oscillates between 75 MHz and 300 MHz (for a T2000 card) right from system boot. This happens with the 418, 430, 435 and 440 driver series. When I reverted to driver 410, which came with the original Ubuntu installation from Dell, the GPU clock frequency oscillated between 300 and 2100 MHz as expected.
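A sketch for checking which driver series is loaded and whether it falls in the series reported in this thread (418, 430, 435, 440 stuck; 410 fine). It assumes the loaded version is readable from `/proc/driver/nvidia/version` with a "Kernel Module  <version>" fragment on the first line; that format is taken from typical output and may differ between releases. The names `parseDriverVersion`, `checkInstalledDriver` and `SERIES_REPORTED_STUCK` are hypothetical.

```javascript
const fs = require('fs');

// Driver series reported in this thread as staying at 75-300 MHz on this laptop
const SERIES_REPORTED_STUCK = new Set([418, 430, 435, 440]);

// Extract "major.minor[.patch]" from text such as
// "NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.64  ..."
function parseDriverVersion(text) {
    const match = /Kernel Module\s+((\d+)\.\d+(?:\.\d+)?)/.exec(text);
    if (!match) return null;
    return { version: match[1], major: Number(match[2]) };
}

// Returns null if the file is missing (module not loaded) or unparseable
function checkInstalledDriver(path = "/proc/driver/nvidia/version") {
    let text;
    try {
        text = fs.readFileSync(path, "utf8");
    } catch (err) {
        return null;
    }
    const info = parseDriverVersion(text);
    if (!info) return null;
    return { ...info, reportedStuck: SERIES_REPORTED_STUCK.has(info.major) };
}
```

`console.log(checkInstalledDriver())` would print something like `{ version, major, reportedStuck }` on a machine with the NVIDIA module loaded, or `null` otherwise.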

I attach the bug reports produced with each driver:

410.104 driver (OneDrive link)

440.59 driver (OneDrive link)

benjamin.werner · March 1, 2020, 3:46pm

@genis_valentin, I can’t thank you enough for this information. Indeed, reverting to the 410.104 driver changes everything. The clock goes up, the graphics benchmarks and TensorFlow computations are much better, and even videos on the web are smooth again.

I should have noticed that it was better at the beginning.

Cheers,

B

sopsaare · March 2, 2020, 4:05pm

Good work @genis_valentin.

Now we just need to file a bug with NVIDIA or hope that someone picks this thread up from here. It might also be a problem only with this specific Dell model + specific NVIDIA model, but I highly doubt that.

Unfortunately 410 is quite an old driver already, so it feels bad to revert to such an old driver :(

genis_valentin · March 10, 2020, 5:39pm

I filed a bug, let’s hope that they solve the issue in future releases! It is a pity to be stuck with 410 as it does not have prime offloading.

https://developer.nvidia.com/nvidia_bug/2884316
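For reference, PRIME render offload (added in the 435 driver series) is enabled per process through environment variables documented in the NVIDIA driver README. A minimal Node sketch, assuming a driver and X server already set up for render offload as that README describes; `primeOffloadEnv` and `runOffloaded` are hypothetical helper names.

```javascript
const { spawn } = require('child_process');

// Environment variables from the NVIDIA README's PRIME render offload chapter:
// route OpenGL (GLX) and Vulkan rendering of one process to the NVIDIA GPU.
function primeOffloadEnv(baseEnv = process.env) {
    return {
        ...baseEnv,
        __NV_PRIME_RENDER_OFFLOAD: "1",
        __GLX_VENDOR_LIBRARY_NAME: "nvidia",
        __VK_LAYER_NV_optimus: "NVIDIA_only",
    };
}

// Hypothetical helper: launch a program (e.g. glmark2) with offload enabled
function runOffloaded(command, args = []) {
    return spawn(command, args, { env: primeOffloadEnv(), stdio: "inherit" });
}
```

For example, `runOffloaded("glmark2")` would run the same benchmark mentioned earlier in the thread on the NVIDIA GPU, which is the workflow that is unavailable when staying on 410.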

amrits · July 1, 2020, 5:57am

Hi All,

Please help verify with the latest beta driver release and share the results.

amrits · July 20, 2020, 7:14am

Hi All,

Please help verify with the latest driver release and share the results.
