RTX 3090 power throttles at around 300 W

I just noticed, using mangohud, that my 3090 is already power throttling at around 300 W.
I had not turned this warning on before, so I do not know whether this has been an issue for a while, but I have experienced complete GPU crashes, for example while gaming or generating images with SD (Stable Diffusion).

Then, two days ago, a game crashed and my GPU no longer showed up, not even in lspci. Rebooting, or powering the computer off and back on, did not help, but for some reason removing the card and reseating it brought it back. That is when I started looking into power consumption and power levels.

nvidia-smi -q tells me the limit should be 350 W (max 366 W), not 300 W.

I have a 1000 W PSU.
The only recent change is that I added a Samsung 990 PRO M.2 SSD, and I used one of the M.2 slots on the PCIe lanes instead of a CPU-connected one. Could that be related?

I DID notice that one of the contacts on the GPU's connector is about a millimetre shorter than the rest. Could that be related?

Please advise. I still have warranty on the card (and on the motherboard, if you think that is the problem).

The shorter contact is normal; it is there to allow for PCIe hotplug.
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

Thank you for replying so fast.

This is on Manjaro (unstable branch) with kernel 6.8 and X11.
The motherboard is on the latest firmware (F22).
If you want, I can also run it on a pure Arch install, using Wayland, with the card actually in use while the script runs (I use PRIME), etc. Just let me know. :)
nvidia-bug-report.log.gz (726.6 KB)

The temperature is 49°C at idle; please monitor temperatures under load.

I was gaming just before. :)
It never goes above 75-80°C, even at 90-100% load over long periods. That is why I started suspecting power consumption and enabled the “check” in mangohud.
When power goes even slightly above 300 W, a big red “power throttling” warning appears and spikes show up in the frametime histogram.
[Screenshot: mangohud overlay]
VSync is on; that is why it is locked at 60 FPS on my old a** monitor. xD

$ nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Mon Apr 15 14:03:53 2024
Driver Version                            : 550.67
CUDA Version                              : 12.4

Attached GPUs                             : 1
GPU 00000000:01:00.0
    GPU Power Readings
        Power Draw                        : 25.68 W
        Current Power Limit               : 350.00 W
        Requested Power Limit             : 350.00 W
        Default Power Limit               : 350.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 366.00 W
    Power Samples
        Duration                          : 51.10 sec
        Number of Samples                 : 119
        Max                               : 35.56 W
        Min                               : 23.39 W
        Avg                               : 24.89 W
    GPU Memory Power Readings 
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
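To compare the observed throttle point with what the driver reports, here is a minimal Python sketch that pulls the numeric fields out of the “GPU Power Readings” block of `nvidia-smi -q -d POWER` text. It assumes the exact layout pasted above (other driver versions may differ), and `parse_gpu_power_readings` and `headroom_w` are hypothetical helper names, not part of any NVIDIA tool.

```python
import re

# Excerpt of the nvidia-smi -q -d POWER output quoted in this thread.
SAMPLE = """\
    GPU Power Readings
        Power Draw                        : 25.68 W
        Current Power Limit               : 350.00 W
        Requested Power Limit             : 350.00 W
        Default Power Limit               : 350.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 366.00 W
    Power Samples
        Duration                          : 51.10 sec
"""

# "<name> : <number> W"; lines reading "N/A" simply do not match.
FIELD = re.compile(r"^\s*(?P<key>[A-Za-z ]+?)\s*:\s*(?P<val>[\d.]+)\s*W\s*$")


def parse_gpu_power_readings(text: str) -> dict[str, float]:
    """Return the watt values listed under 'GPU Power Readings'."""
    readings: dict[str, float] = {}
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "GPU Power Readings":
            in_block = True
            continue
        if in_block:
            # A non-empty line without a colon is the next section header.
            if stripped and ":" not in stripped:
                break
            match = FIELD.match(line)
            if match:
                readings[match.group("key")] = float(match.group("val"))
    return readings


def headroom_w(readings: dict[str, float], observed_cap_w: float) -> float:
    """Gap between the enforced limit and where throttling was observed."""
    return readings["Current Power Limit"] - observed_cap_w


if __name__ == "__main__":
    r = parse_gpu_power_readings(SAMPLE)
    print(r["Current Power Limit"], r["Max Power Limit"], headroom_w(r, 300.0))
```

With the values above, the current limit is 350 W, so throttling at roughly 300 W means about 50 W of the reported limit is never reached.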

Edit
Or am I misunderstanding something here, and it SHOULD already throttle at 300 W?
I’m a bit nervous about using the card to its full potential now, and I do still have warranty…

I wouldn’t know why; if it shows 350 W, it should go up to 350 W. Maybe check nvidia-smi -q while it is throttling to see which reason is displayed.

Thank you again for responding.
nvidia-smi -q lists nothing as throttling. It just shows the same temperature and power consumption as mangohud.
I wonder where mangohud gets the information that the GPU is throttling.
I did notice that the cutoff is not exactly 300 W; it is a bit below that, somewhere around 29x W.
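On where the warning might come from: assuming mangohud, like nvidia-smi, reads NVML's clocks-event (formerly clocks-throttle) reason bitmask, its “power throttling” label would map to bits in that mask. Below is a minimal Python sketch that decodes such a mask into names; the bit values follow NVML's `nvml.h`, while `decode_reasons` is a hypothetical helper name and how mangohud groups these bits is an assumption, not something confirmed in this thread.

```python
# Clocks-event reason bits as defined in NVML's nvml.h.
REASONS = {
    0x001: "GPU Idle",
    0x002: "Applications Clocks Setting",
    0x004: "SW Power Cap",
    0x008: "HW Slowdown",
    0x010: "Sync Boost",
    0x020: "SW Thermal Slowdown",
    0x040: "HW Thermal Slowdown",
    0x080: "HW Power Brake Slowdown",
    0x100: "Display Clock Setting",
}


def decode_reasons(mask: int) -> list[str]:
    """Translate a clocks-event bitmask into readable names, lowest bit first."""
    return [name for bit, name in sorted(REASONS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_reasons(0x4))
```

On a live system the mask would come from NVML itself, for example pynvml's `nvmlDeviceGetCurrentClocksThrottleReasons(handle)`; a value of `0x4` decodes to just “SW Power Cap”.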

I would not be so worried if the GPU hadn’t completely died on me a few days ago. I paid a LOT OF MONEY to NVIDIA for this GPU!!! I paid more for it than for any complete computer in my entire life, and I have owned computers since late 1980.

Here is another NVIDIA bug report (this time on a pure Arch install), run while mangohud was constantly reporting “THROTTLING POWER”:
nvidia-bug-report.log.gz (896.2 KB)

Thanks in advance from a VERY worried Nvidia user. :(

    Clocks Event Reasons
        SW Power Cap                      : Active

Previously, this was called “Throttling Reasons”, which made more sense.
So it really is the driver that’s limiting it. The only reasons I can think of are either a driver bug or, since you’re running in PRIME mode without monitors connected to the NVIDIA GPU, 50 W being reserved for the display engines. Far-fetched, though.
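Spotting the active reason by eye in a long `nvidia-smi -q` dump is error-prone, so here is a minimal Python sketch that lists the entries under “Clocks Event Reasons” marked Active. The sample is illustrative: only the SW Power Cap line is taken from the output quoted above, and the other lines assume the same `name : Not Active` layout. Because older drivers used a different header name, the header is a parameter; `active_event_reasons` is a hypothetical helper name.

```python
SAMPLE = """\
    Clocks Event Reasons
        Idle                              : Not Active
        SW Power Cap                      : Active
        HW Slowdown                       : Not Active
"""


def active_event_reasons(text: str, header: str = "Clocks Event Reasons") -> list[str]:
    """Return the names marked 'Active' in the section under `header`."""
    active: list[str] = []
    header_indent = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if header_indent is None:
            if line.strip() == header:
                header_indent = indent
            continue
        if indent <= header_indent:
            break  # dedent back to header level: the next section has started
        name, sep, value = line.partition(":")
        if sep and value.strip() == "Active":
            active.append(name.strip())
    return active


if __name__ == "__main__":
    print(active_event_reasons(SAMPLE))
```

While mangohud shows the warning, the live text could be fed in with `subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout`; for the sample above it returns only “SW Power Cap”.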

I was contemplating including the information above, but you figured it out anyway. :)
The reason is that I HAVE to connect my monitor to the motherboard HDMI port to get reverse PRIME to work (otherwise everything would run on the NVIDIA GPU rather than on my CPU's integrated graphics; for some reason xrandr won't move the output on this machine, on either X11 or Wayland).

To test this further, I could connect my monitor to the NVIDIA card, retry, and see if it still throttles. Maybe there is a limit on how much it can draw while it has to “reroute” the signal to the motherboard HDMI…

Edit
So @generix, I connected the monitor to a port on the GPU, ran the bug report script, started a demanding game, and ran it once more:
nvidia-bug-report-nv_hdmi.log.gz (924.7 KB)
nvidia-bug-report-nv_hdmi-game.log.gz (958.9 KB)

I had to use X11, since Wayland bugged out completely: visual bugs everywhere, flickering in Steam, and the game freezing on load. :-/

I added fan speed to mangohud yesterday, and it looks fine in reverse PRIME, usually running at 65-85% while gaming.
But look at this when I used the GPU's HDMI port!
[Screenshot_20240416_192029]
Something fishy is going on here.
Could be mangohud though, idk.
Maybe you see something more in the logs?

Edit 2
Maybe this information could be useful:

inxi -Fazy
System:
  Kernel: 6.8.5-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    clocksource: tsc avail: hpet,acpi_pm parameters: BOOT_IMAGE=/vmlinuz-linux
    root=UUID=78235068-f1ca-4363-99b1-474175fb3438 rw rootflags=subvol=@
    zswap.enabled=0 rootfstype=btrfs modeset=1 sysrq_always_enabled=1 fbdev=1
    nvidia_drm.modeset=1 loglevel=3 amd-pstate=guided
  Desktop: KDE Plasma v: 6.0.3 tk: Qt v: N/A info: frameworks v: 6.1.0
    wm: kwin_wayland vt: 2 dm: SDDM Distro: Arch Linux
Machine:
  Type: Desktop Mobo: Gigabyte model: X670 AORUS ELITE AX v: x.x
    serial: <superuser required> uuid: <superuser required> UEFI: American
    Megatrends LLC. v: F22 date: 03/11/2024
CPU:
  Info: model: AMD Ryzen 9 7900X bits: 64 type: MT MCP arch: Zen 4 gen: 5
    level: v4 note: check built: 2022+ process: TSMC n5 (5nm) family: 0x19 (25)
    model-id: 0x61 (97) stepping: 2 microcode: 0xA601206
  Topology: cpus: 1x cores: 12 tpc: 2 threads: 24 smt: enabled cache:
    L1: 768 KiB desc: d-12x32 KiB; i-12x32 KiB L2: 12 MiB desc: 12x1024 KiB
    L3: 64 MiB desc: 2x32 MiB
  Speed (MHz): avg: 3304 high: 5402 min/max: 400/5733 boost: enabled scaling:
    driver: amd-pstate governor: schedutil cores: 1: 3005 2: 3005 3: 3005 4: 3005
    5: 3005 6: 3005 7: 3005 8: 3855 9: 4110 10: 3005 11: 5402 12: 3005 13: 3005
    14: 3005 15: 3372 16: 3005 17: 3005 18: 3005 19: 3039 20: 3005 21: 3121
    22: 3005 23: 5328 24: 3005 bogomips: 225258
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities:
  Type: gather_data_sampling status: Not affected
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed status: Not affected
  Type: spec_rstack_overflow mitigation: Safe RET
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Enhanced / Automatic IBRS; IBPB: conditional;
    STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not
    affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA GA102 [GeForce RTX 3090] vendor: eVga.com. driver: nvidia
    v: 550.67 alternate: nouveau,nvidia_drm non-free: 550.xx+ status: current
    (as of 2024-04; EOL~2026-12-xx) arch: Ampere code: GAxxx
    process: TSMC n7 (7nm) built: 2020-2023 pcie: gen: 1 speed: 2.5 GT/s
    lanes: 16 link-max: gen: 4 speed: 16 GT/s ports: active: none empty: DP-1,
    DP-2, DP-3, HDMI-A-1 bus-ID: 01:00.0 chip-ID: 10de:2204 class-ID: 0300
  Device-2: AMD Raphael vendor: Gigabyte driver: amdgpu v: kernel
    arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4
    speed: 16 GT/s lanes: 16 ports: active: HDMI-A-2 empty: DP-4, DP-5, DP-6,
    Writeback-1 bus-ID: 15:00.0 chip-ID: 1002:164e class-ID: 0300 temp: 52.0 C
  Display: wayland server: X.org v: 1.21.1.13 with: Xwayland v: 23.2.6
    compositor: kwin_wayland driver: X: loaded: modesetting,nvidia
    alternate: fbdev,nouveau,nv,vesa dri: radeonsi gpu: nvidia,amdgpu
    display-ID: 0
  Monitor-1: HDMI-A-2 res: 1920x1080 size: N/A modes: N/A
  API: EGL v: 1.5 hw: drv: nvidia drv: amd radeonsi platforms: device: 0
    drv: nvidia device: 2 drv: radeonsi device: 3 drv: swrast gbm: drv: nvidia
    surfaceless: drv: nvidia wayland: drv: radeonsi x11: drv: radeonsi
    inactive: device-1
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa v: 24.0.5-arch1.1
    glx-v: 1.4 direct-render: yes renderer: AMD Radeon Graphics (radeonsi
    raphael_mendocino LLVM 17.0.6 DRM 3.57 6.8.5-arch1-1) device-ID: 1002:164e
    memory: 500 MiB unified: no display-ID: :0.0
  API: Vulkan v: 1.3.279 layers: 7 device: 0 type: discrete-gpu
    name: NVIDIA GeForce RTX 3090 driver: nvidia v: 550.67 device-ID: 10de:2204
    surfaces: xcb,xlib,wayland
Audio:
  Device-1: NVIDIA GA102 High Definition Audio vendor: eVga.com.
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 01:00.1 chip-ID: 10de:1aef class-ID: 0403
  Device-2: AMD Rembrandt Radeon High Definition Audio driver: snd_hda_intel
    v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 15:00.1
    chip-ID: 1002:1640 class-ID: 0403
  Device-3: AMD Family 17h/19h HD Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 15:00.6 chip-ID: 1022:15e3 class-ID: 0403
  API: ALSA v: k6.8.5-arch1-1 status: kernel-api tools: N/A
  Server-1: JACK v: 1.9.22 status: off tools: N/A
  Server-2: PipeWire v: 1.0.5 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8125 2.5GbE vendor: Gigabyte driver: r8169 v: kernel
    pcie: gen: 2 speed: 5 GT/s lanes: 1 port: e000 bus-ID: 0e:00.0
    chip-ID: 10ec:8125 class-ID: 0200
  IF: enp14s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Device-2: MEDIATEK MT7922 802.11ax PCI Express Wireless Network Adapter
    driver: mt7921e v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 0f:00.0
    chip-ID: 14c3:0616 class-ID: 0280
  IF: wlan0 state: down mac: <filter>
  Info: services: NetworkManager,systemd-timesyncd
Bluetooth:
  Device-1: MediaTek Wireless_Device driver: btusb v: 0.8 type: USB rev: 2.1
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-7:3 chip-ID: 0e8d:0616
    class-ID: e001 serial: <filter>
  Report: rfkill ID: hci0 rfk-id: 0 state: down bt-service: disabled
    rfk-block: hardware: no software: no address: see --recommends
Drives:
  Local Storage: total: 6.08 TiB used: 1.97 TiB (32.4%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:3 vendor: Samsung model: SSD 980 PRO 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: 5B2QGXA7 temp: 46.9 C
    scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:0 vendor: Samsung model: SSD 990 PRO 2TB
    size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: 0B2QJXG7 temp: 43.9 C
    scheme: GPT
  ID-3: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 860 EVO 500GB
    size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: 4B6Q scheme: GPT
  ID-4: /dev/sdb maj-min: 8:16 vendor: Intel model: SSDSC2BF180A4H
    size: 167.68 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: LH6i scheme: GPT
  ID-5: /dev/sdc maj-min: 8:32 vendor: Western Digital
    model: WD30EFRX-68EUZN0 size: 2.73 TiB block-size: physical: 4096 B
    logical: 512 B speed: 6.0 Gb/s tech: HDD rpm: 5400 serial: <filter>
    fw-rev: 0A80 scheme: GPT
Partition:
  ID-1: / raw-size: 100 GiB size: 100 GiB (100.00%) used: 17.85 GiB (17.8%)
    fs: btrfs dev: /dev/sda2 maj-min: 8:2
  ID-2: /boot raw-size: 512 MiB size: 511 MiB (99.80%)
    used: 241.2 MiB (47.2%) fs: vfat dev: /dev/sda1 maj-min: 8:1
  ID-3: /home raw-size: 301.26 GiB size: 301.26 GiB (100.00%)
    used: 137.25 GiB (45.6%) fs: btrfs dev: /dev/sda3 maj-min: 8:3
  ID-4: /var/log raw-size: 100 GiB size: 100 GiB (100.00%)
    used: 17.85 GiB (17.8%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: no
  ID-1: swap-1 type: zram size: 10.17 GiB used: 0 KiB (0.0%) priority: 100
    comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 24 dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 58.8 C mobo: 46.0 C gpu: amdgpu temp: 53.0 C
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 32 GiB note: est. available: 30.5 GiB used: 2.79 GiB (9.2%)
  Processes: 442 Power: uptime: 16m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 12.16 GiB services: org_kde_powerdevil,upowerd
    Init: systemd v: 255 default: graphical tool: systemctl
  Packages: pm: pacman pkgs: 1046 libs: 323 tools: yay pm: flatpak pkgs: 0
    Compilers: gcc: 13.2.1 Shell: Zsh v: 5.9 default: Bash v: 5.2.26
    running-in: yakuake inxi: 3.3.34

BUMP
PLEASE NVIDIA RESPOND TO ME!!!

I paid more than $2200 for this card and now I cannot even use it for fear of it breaking!

PLEASE RESPOND AND ADVISE ME WHAT TO DO!
PLEASE PLEASE PLEASE!!!