dynamic downclocking doesn't work

Hi,

In machine (A) I have a Tesla C2050 card alongside some GeForce cards (GTX 280 and 580). While the performance level of the GTX cards is adjusted dynamically, the C2050 is always stuck at the maximum level (2). As a result, it runs quite hot and, most annoyingly, loud. The issue is 100% reproducible with both driver v260.19.29 and v270.18.

The Tesla is also supposed to get downclocked when not in use, right?

On the other hand, machine (B) has a GTX 470 card that down-clocks perfectly at first, but after a while gets stuck at the maximum performance level (3). The machine is used for development, so it might very well be that buggy code and/or a debugger/profiler messes with the driver and causes this annoyance. As in the previous case, the result is that the card runs hot, and only rebooting solves the issue. The issue was observed with 260.19.x drivers.

Does anyone have an idea how to prevent this problem from occurring, or how to fix it without rebooting?

I’m not sure whether the two issues are related, but they might very well be. Is anyone else experiencing similar things? Can I assume that these are driver bugs?

Cheers,

Szilard

System A:

Hardware:

    MB: Tyan S7025

    CPU: Xeon X5660

    GPU: Tesla C2050, GTX 580, GTX 280

Software:

    OS: Ubuntu 10.04.1 (up to date packages)

    NVIDIA driver: 260.19.29, 270.18

System B:

Hardware:

    MB: Gigabyte MA790FXT

    CPU: AMD Phenom II X6 1090T

    GPU: GTX 470, GTX 260

Software:

    OS: Ubuntu 10.04.1 (up to date packages)

    NVIDIA driver: 260.19.29

Additionally, I’ve just noticed something strange: the [font=“Courier New”]nvidia-settings[/font] tool reports inconsistent performance levels. When the settings are queried through the command line, [font=“Courier New”]nvidia-settings[/font] shows the following:

[...]

  Attribute 'GPUPowerSource' (localhost:14[gpu:1]): 0.

    'GPUPowerSource' is an integer attribute.

    'GPUPowerSource' is a read-only attribute.

    'GPUPowerSource' can use the following target types: X Screen, GPU.

Attribute 'GPUCurrentPerfMode' (localhost:14[gpu:1]): 0.

    'GPUCurrentPerfMode' is an integer attribute.

    'GPUCurrentPerfMode' is a read-only attribute.

    'GPUCurrentPerfMode' can use the following target types: X Screen, GPU.

Attribute 'GPUCurrentPerfLevel' (localhost:14[gpu:1]): 0.

    'GPUCurrentPerfLevel' is an integer attribute.

    'GPUCurrentPerfLevel' is a read-only attribute.

    'GPUCurrentPerfLevel' can use the following target types: X Screen, GPU.

Attribute 'GPUAdaptiveClockState' (localhost:14[gpu:1]): 1.

    'GPUAdaptiveClockState' is an integer attribute.

    'GPUAdaptiveClockState' is a read-only attribute.

    'GPUAdaptiveClockState' can use the following target types: X Screen, GPU.

Attribute 'GPUPerfModes' (localhost:14[gpu:1]): perf=0, nvclock=50, memclock=135, processorclock=101 ;

  perf=1, nvclock=405, memclock=324, processorclock=810 ; perf=2, nvclock=405, memclock=1674,

  processorclock=810 ; perf=3, nvclock=607, memclock=1674, processorclock=1215 

    'GPUPerfModes' is a string attribute.

    'GPUPerfModes' is a read-only attribute.

    'GPUPerfModes' can use the following target types: X Screen, GPU.

[...]

This indicates that GPU:1, which is the Tesla C2050, is running at level 0. However, the GUI version shows level 2.

Also, the temperature reported by both nvidia-settings and nvidia-smi is quite high (>80 °C), and the card blows out quite hot air, so I’m fairly sure it’s actually running at the maximum performance level.

Bump!

I’m still having the same issue, my C2050 gets stuck at maximum performance level and turns the office into a sauna!