Ubuntu 22.04 // RTX 4090 // always P2: how to switch to P0 permanently or lock clock frequencies

Hi!
System: Ubuntu 22.04 (LXD VM, not a container) with an RTX 4090 attached via PCI passthrough.
Task: training a simple GAN.
nvidia-smi always shows P2:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:06:00.0 Off |                  Off |
| 44%   66C    P2   391W / 450W |   5784MiB / 24564MiB |     98%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    611675      C   ...ep/jupyterlab/bin/python3     5782MiB |
+-----------------------------------------------------------------------------+
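To check the performance state and clocks without reading the whole table, nvidia-smi can print selected fields as CSV. The sketch below assumes Python 3.9+ and uses field names listed by nvidia-smi --help-query-gpu (index, pstate, clocks.sm, clocks.max.sm, clocks.mem, power.draw); the helper names are hypothetical, and read_gpu_state() only works where the NVIDIA driver and nvidia-smi are installed.

```python
import subprocess

# Field names accepted by `nvidia-smi --query-gpu` (list them with `nvidia-smi --help-query-gpu`).
FIELDS = ["index", "pstate", "clocks.sm", "clocks.max.sm", "clocks.mem", "power.draw"]


def query_command() -> list[str]:
    """Build the nvidia-smi command that prints one CSV line per GPU."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]


def parse_query_csv(text: str) -> list[dict[str, str]]:
    """Turn csv,noheader,nounits output into one {field: value} dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def read_gpu_state() -> list[dict[str, str]]:
    """Run nvidia-smi locally; raises if the binary is missing or the call fails."""
    result = subprocess.run(query_command(), capture_output=True, text=True, check=True)
    return parse_query_csv(result.stdout)
```

Calling read_gpu_state() while the GAN is training would be expected to show P2 in the pstate field, matching the table above.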

nvidia-smi dmon shows the GPU clock changing (boosting, if I'm using the term correctly) while the task runs:

gpu  pwr  gtemp  mtemp    sm    mem    enc    dec   mclk   pclk
Idx    W      C      C     %      %      %      %    MHz    MHz

0    393     65      -    98     36      0      0  10251   2550 
0    393     63      -    97     36      0      0  10251   2490 
0    394     64      -    97     35      0      0  10251   2550 
0    395     66      -    97     35      0      0  10251   2595 
0    393     66      -    98     37      0      0  10251   2640 
0    395     64      -    98     37      0      0  10251   2310 
0    395     64      -    98     37      0      0  10251   2490 
0    395     65      -    97     36      0      0  10251   2640 
0    396     67      -    97     36      0      0  10251   2655 
0    397     67      -    98     36      0      0  10251   2490 
0    395     65      -    98     36      0      0  10251   2505 
0    396     64      -    98     36      0      0  10251   2610 
0    394     65      -    98     36      0      0  10251   2610 
0    394     67      -    98     35      0      0  10251   2670 
0    395     68      -    98     36      0      0  10251   2670 
0    394     64      -    98     36      0      0  10251   2295 
0    391     65      -    97     35      0      0  10251   2310 
0    391     66      -    97     35      0      0  10251   2505 
0    391     65      -    97     35      0      0  10251   2445 
0    394     64      -    97     33      0      0  10251   2430 
0    394     65      -    97     33      0      0  10251   2550 
0    394     64      -    97     33      0      0  10251   2550 
0    395     67      -    98     37      0      0  10251   2640 
0    398     66      -    98     38      0      0  10251   2640 
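A small parser makes the spread in this dmon log easier to quantify. This is a sketch assuming the column order printed above; parse_dmon and clock_summary are illustrative names, not part of any NVIDIA tool.

```python
from statistics import mean

# Column order as printed by `nvidia-smi dmon` in the output above.
COLUMNS = ["gpu", "pwr", "gtemp", "mtemp", "sm", "mem", "enc", "dec", "mclk", "pclk"]


def parse_dmon(text: str) -> list[dict[str, str]]:
    """Keep only data rows (first field is a GPU index); skip headers and blanks."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, parts)))
    return rows


def clock_summary(rows: list[dict[str, str]], column: str = "pclk") -> dict[str, float]:
    """Min / max / mean (MHz) of one clock column."""
    values = [int(r[column]) for r in rows]
    return {"min": min(values), "max": max(values), "mean": round(mean(values), 1)}
```

Fed the 24 rows above, it gives a pclk range of 2295 to 2670 MHz, while mclk stays at 10251 MHz in every row.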

I want to lock the GPU clock permanently at, for example, 2640 MHz.
I tried several ways (without really trying to understand what exactly I was doing) to switch to P0 or to lock the memory and GPU clocks via nvidia-settings, with no luck.

Question: is there a step-by-step guide on how to lock (fix) the clock frequencies?

I understand that there can be lots of tricky cases and that I need to study the topic more, but maybe there is a common set of instructions that usually works for locking clocks, i.e. without deep debugging of the specific problems that might prevent the clocks from being locked on my exact system and card.

The funny thing is that the manufacturer's (Palit) utility, ThunderMaster, can lock both the base and boost GPU clocks at the desired level, but there is no Linux version of ThunderMaster.

I don't think there is a fundamental difference between ThunderMaster and nvidia-settings/nvidia-smi in how they talk to the card at a low level when sending a "lock the GPU clocks" command. So there should be a working way to do it, provided NVIDIA exposes that interaction (e.g. via /proc) through nvidia-settings/nvidia-smi and it is not specific to Palit's VBIOS.

But I still cannot find a working guide. Simple things such as sudo nvidia-smi -lgc 2610 just don't work, even though the command reports "GPU clocks set to "(gpuClkMin 2610, gpuClkMax 2610)"".
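For reference, the lock/reset sequence can be written as a dry-run script. It only uses flags listed by nvidia-smi --help (-i, -pm, -lgc, -rgc); the function names are hypothetical, and it assumes -lgc accepts a MIN,MAX pair. Note that -lgc sets an allowed range rather than guaranteeing a clock, so the GPU can still run below the minimum when power or thermal limits apply. The lock is also assumed not to survive a reboot, so "permanently" would mean re-running it at boot.

```python
import subprocess


def lock_commands(gpu: int, min_mhz: int, max_mhz: int) -> list[list[str]]:
    """nvidia-smi invocations that enable persistence mode and lock the GPU clock range."""
    if not 0 < min_mhz <= max_mhz:
        raise ValueError("expected 0 < min_mhz <= max_mhz")
    target = ["nvidia-smi", "-i", str(gpu)]
    return [
        target + ["-pm", "1"],  # persistence mode: keep the driver initialised when idle
        target + ["-lgc", f"{min_mhz},{max_mhz}"],  # lock graphics clock range, in MHz
    ]


def reset_commands(gpu: int) -> list[list[str]]:
    """Undo the lock with -rgc (reset GPU clocks)."""
    return [["nvidia-smi", "-i", str(gpu), "-rgc"]]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or run them with sudo when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print("sudo " + " ".join(cmd))
        else:
            subprocess.run(["sudo", *cmd], check=True)
```

run_all(lock_commands(0, 2610, 2610)) prints "sudo nvidia-smi -i 0 -pm 1" and "sudo nvidia-smi -i 0 -lgc 2610,2610"; pass dry_run=False to actually execute them.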

To be fair, nvidia-smi -lgc does work in two cases:

  • at idle (nvidia-smi dmon shows the clock locked at the requested value);
  • as an upper limit under load (when the value is below the supported maximum).

But under load, with the lock set to the supported maximum, the GPU clock still varies, so it is probably governed by other limits: power, PX mode, thermal protection?
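One way to test the power/thermal hypothesis is to read the driver's active clock throttle reasons while the load runs, either with nvidia-smi -q -d PERFORMANCE or with the CSV field clocks_throttle_reasons.active, which reports a hex bitmask. The decoder below is a sketch: the bit values follow NVML's nvmlClocksThrottleReason* constants, the field name is the one used by 525-series drivers (newer drivers may rename it), and decode_throttle_reasons is a hypothetical helper.

```python
from typing import Union

# Bit values of NVML's nvmlClocksThrottleReason* constants (nvml.h).
THROTTLE_REASONS = {
    0x1: "GpuIdle",
    0x2: "ApplicationsClocksSetting",
    0x4: "SwPowerCap",
    0x8: "HwSlowdown",
    0x10: "SyncBoost",
    0x20: "SwThermalSlowdown",
    0x40: "HwThermalSlowdown",
    0x80: "HwPowerBrakeSlowdown",
    0x100: "DisplayClockSetting",
}

# Command that prints the active-reason bitmask (field name as on 525-series drivers).
QUERY = ["nvidia-smi", "--query-gpu=clocks_throttle_reasons.active", "--format=csv,noheader"]


def decode_throttle_reasons(mask: Union[int, str]) -> list[str]:
    """Map a bitmask (int or hex string such as '0x0000000000000004') to reason names."""
    if isinstance(mask, str):
        mask = int(mask.strip(), 16)  # base-16 int() accepts the optional 0x prefix
    names = [name for bit, name in THROTTLE_REASONS.items() if mask & bit]
    unknown = mask & ~sum(THROTTLE_REASONS)  # bits not covered by the table above
    if unknown:
        names.append(f"Unknown(0x{unknown:x})")
    return names
```

If the SM clock drops while the mask contains SwPowerCap, the 450 W power limit would be the likely cause; SwThermalSlowdown or HwThermalSlowdown would point to temperature instead.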