RTX 3060 Laptop, Stuck at 45 W

I have a Dell G15 5525 Ryzen Edition with an RTX 3060. When I boot, the power limit is 115 W as expected. After some use, probably because the laptop heats up, the power limit drops to 45 W. It never rises back to 115 W, and I have to reboot to restore it.

On driver version 525, I could set the power limit to something lower than 115 W at boot, and the 45 W limit would never trigger. Now I have to restart my laptop every 15 minutes to make full use of my GPU!

On Windows, this 45 W limit automatically rises back to 115 W after some time. I can only assume this is a bug in the Linux driver. Is there any way to solve this? Thank you in advance.

Please post the output of nvidia-smi -q.
Unfortunately, the Linux driver doesn’t return to the normal power limit once the GPU cools down after hitting a temperature limit. Instead of setting a power limit, you could try limiting the clocks, e.g. nvidia-smi -lgc 210,1400
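For anyone who wants to script the clock-lock workaround, here is a minimal Python sketch. It only builds the `nvidia-smi -lgc` call suggested above and its counterpart `nvidia-smi -rgc`, which resets the locked clocks. The function names are made up for illustration, the 210 and 1400 MHz values are simply the ones from the reply, and both commands normally need root.

```python
import subprocess
from typing import List


def lock_clocks_cmd(min_mhz: int, max_mhz: int) -> List[str]:
    """Build the nvidia-smi call that locks GPU clocks to a MHz range."""
    if not 0 < min_mhz <= max_mhz:
        raise ValueError("expected 0 < min_mhz <= max_mhz")
    return ["nvidia-smi", "-lgc", f"{min_mhz},{max_mhz}"]


def reset_clocks_cmd() -> List[str]:
    """Build the nvidia-smi call that removes a previous clock lock."""
    return ["nvidia-smi", "-rgc"]


def run(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if nvidia-smi fails."""
    subprocess.run(cmd, check=True)
```

Calling `run(lock_clocks_cmd(210, 1400))` at boot would apply the cap, and `run(reset_clocks_cmd())` would undo it. Whether 1400 MHz is low enough to stay clear of the thermal limit on this laptop is something to find by experiment.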

Thank you for your reply. Here is the output:

==============NVSMI LOG==============

Timestamp : Tue Feb 27 19:41:41 2024
Driver Version : 545.29.06
CUDA Version : 12.3

Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce RTX 3060 Laptop GPU
Product Brand : GeForce
Product Architecture : Ampere
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : HMM
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-f9fd6dc8-c77c-90c5-3a83-0d9ccb5defe5
Minor Number : 0
VBIOS Version : 94.06.29.00.3B
MultiGPU Board : No
Board ID : 0x100
Board Part Number : N/A
GPU Part Number : 2560-775-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G001.0000.03.03
OEM Object : 2.0
ECC Object : N/A
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 545.29.06
GPU C2C Mode : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x256010DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x0B5E1028
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Device Current : 4
Device Max : 4
Host Max : 4
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6144 MiB
Reserved : 415 MiB
Used : 37 MiB
Free : 5691 MiB
BAR1 Memory Usage
Total : 8192 MiB
Used : 34 MiB
Free : 8158 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 38 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 105 C
GPU Slowdown Temp : 102 C
GPU Max Operating Temp : 87 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
GPU Power Readings
Power Draw : 31.36 W
Current Power Limit : 130.00 W
Requested Power Limit : N/A
Default Power Limit : 115.00 W
Min Power Limit : 1.00 W
Max Power Limit : 140.00 W
GPU Memory Power Readings
Power Draw : N/A
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 2025 MHz
SM : 2025 MHz
Memory : 7000 MHz
Video : 1785 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 1081.250 mV
Fabric
State : N/A
Status : N/A
Processes : None

I will try to set the graphics clock to a lower value.
Is there a reason why the driver does not return to the original power limit after cooling down?
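As an aside, the power fields in a log like the one above can be pulled out with a few lines of Python, which makes it easier to compare a healthy dump with one taken after the limit has dropped. This is only a sketch: `power_readings` is a hypothetical helper, it keeps the first numeric value seen for each key (so the N/A entries under Module Power Readings are ignored), and it assumes the `key : value W` layout shown in the log. In the dump above it would report a current limit of 130.00 W against a default of 115.00 W.

```python
import re
from typing import Dict

# Matches lines such as "Current Power Limit : 130.00 W"; N/A lines do not match.
_WATT_LINE = re.compile(
    r"^\s*(?P<key>[A-Za-z .]+?)\s*:\s*(?P<val>\d+(?:\.\d+)?) W\s*$"
)


def power_readings(smi_q_text: str) -> Dict[str, float]:
    """Return the first numeric wattage found for each key in `nvidia-smi -q` text."""
    readings: Dict[str, float] = {}
    for line in smi_q_text.splitlines():
        m = _WATT_LINE.match(line)
        if m:
            # setdefault keeps the first occurrence of a repeated key
            readings.setdefault(m.group("key"), float(m.group("val")))
    return readings
```

Feeding it the text of `nvidia-smi -q` before and after the throttling event should show which of the limits actually changed.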

Missing feature in the Linux driver? Good question, IDK.
Besides, the temperature settings look quite normal. Did you check at what temperature the throttling starts?

Well, depending on the power mode of the laptop, the GPU temperature limit is set to either 78, 79 or 87 °C, with an accompanying fan profile. The limit is fixed by the power mode in effect when the nvidia module is loaded and does not change without a reboot, even when the laptop's power profile changes.

It is hard to say exactly when the 45 W limit takes effect, though it almost always happens under heavy GPU usage, e.g. when training an ML model.

In nvtop, I do not see the temp reach 87 Celsius, but maybe it happens only for an instant?
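One way to check whether the temperature briefly touches the limit is to log temperature and power limit at a short interval and look at the sample where the limit drops. Below is a rough sketch, assuming the `temperature.gpu`, `power.draw` and `power.limit` fields of `nvidia-smi --query-gpu` are reported on this driver. The function names, the 0.5 s interval and the 100 W threshold are illustrative, and a polling loop can still miss a spike shorter than the interval.

```python
import subprocess
import time
from typing import List, NamedTuple, Optional

QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


class Sample(NamedTuple):
    temp_c: float
    draw_w: float
    limit_w: float


def parse_sample(line: str) -> Sample:
    """Parse one CSV line; raises ValueError if a field is not numeric (e.g. N/A)."""
    temp, draw, limit = (float(field) for field in line.strip().split(","))
    return Sample(temp, draw, limit)


def first_limit_drop(samples: List[Sample], threshold_w: float = 100.0) -> Optional[int]:
    """Index of the first sample whose power limit is below threshold_w, else None."""
    for i, s in enumerate(samples):
        if s.limit_w < threshold_w:
            return i
    return None


def poll(interval_s: float = 0.5, threshold_w: float = 100.0) -> None:
    """Print samples for the first GPU until the power limit falls below threshold_w."""
    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        s = parse_sample(out.splitlines()[0])
        print(time.strftime("%H:%M:%S"), s)
        if s.limit_w < threshold_w:
            break
        time.sleep(interval_s)
```

Running `poll()` in a terminal while the training job runs would show the last temperatures before the limit falls to 45 W, which should tell whether the drop lines up with 78, 79 or 87 °C.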