Usually, in the output of “nvidia-smi.exe -q -d TEMPERATURE”, the GPU Max Operating Temp is within a few degrees of the GPU Slowdown Temp. On the mobile 3060 (laptop), and possibly other mobile cards, I’m noticing that the GPU Max Operating Temp is comparatively quite low.
For my mobile 3060 the GPU Max Operating Temp is 87C while the GPU Slowdown Temp is 102C. That’s a significant difference (15C), and I can’t think of a scenario where it begins throttling at 87C and eventually hits 102C. As far as I know, the GPU Max Operating Temp is where it starts to throttle clocks through SW, and the GPU Slowdown Temp is where it throttles via HW.
Does anyone know why this GPU Max Operating Temp is so low, especially for a laptop GPU? I thought laptop GPUs were able to run hotter, and the high slowdown temp definitely indicates that. I’ve noticed a lot of desktop GPUs have max operating temps in the 90s.
Pasting my nvidia-smi query below. Personally I think there’s an error and the GPU Max Operating Temp for the laptop 3060 should be 100C; even the desktop 3060 has a GPU Max Operating Temp of 93C.
==============NVSMI LOG==============
Timestamp : Thu Aug 10 01:07:48 2023
Driver Version : 536.67
CUDA Version : 12.2
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce RTX 3060 Laptop GPU
Product Brand : GeForce
Product Architecture : Ampere
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : N/A
Addressing Mode : N/A
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : N/A
GPU UUID : GPU-b4686bb3-b616-e0b7-8466-e4934d3b9a00
Minor Number : N/A
VBIOS Version : 94.06.17.00.31
MultiGPU Board : No
Board ID : 0x100
Board Part Number : N/A
GPU Part Number : 2520-775-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G001.0000.03.03
OEM Object : 2.0
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x252010DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x70F21558
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Device Current : 3
Device Max : 4
Host Max : 3
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 5000 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P8
Clocks Event Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6144 MiB
Reserved : 135 MiB
Used : 506 MiB
Free : 5502 MiB
BAR1 Memory Usage
Total : 8192 MiB
Used : 1 MiB
Free : 8191 MiB
Conf Compute Protected Memory Usage
Total : N/A
Used : N/A
Free : N/A
Compute Mode : Default
Utilization
Gpu : 11 %
Memory : 23 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 38 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 105 C
GPU Slowdown Temp : 102 C
GPU Max Operating Temp : 87 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
GPU Power Readings
Power Draw : 15.00 W
Current Power Limit : 120.00 W
Requested Power Limit : N/A
Default Power Limit : 115.00 W
Min Power Limit : 1.00 W
Max Power Limit : 130.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 555 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 637.500 mV
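For anyone who wants to compare these thresholds across machines, here is a minimal Python sketch that pulls the temperature fields out of the text output and computes the gap between GPU Slowdown Temp and GPU Max Operating Temp. It assumes the "Name : value C" layout shown in the log above; the function names (parse_temperatures, headroom) are just illustrative, and the sample text is copied from the Temperature section of this log rather than read from a live GPU.

```python
import re

# Sample copied from the Temperature section of the log above.
SAMPLE = """\
Temperature
GPU Current Temp : 38 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 105 C
GPU Slowdown Temp : 102 C
GPU Max Operating Temp : 87 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
"""

# Matches "Field Name : value" lines; section headers without a colon are skipped.
_LINE = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def parse_temperatures(text):
    """Return {field name: degrees C as int, or None when the value is N/A}."""
    temps = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue
        key, value = match.group("key"), match.group("value")
        if "Temp" not in key:
            continue  # ignore non-temperature fields if a full log is passed in
        number = re.fullmatch(r"(-?\d+)\s*C", value)
        temps[key] = int(number.group(1)) if number else None
    return temps


def headroom(temps):
    """Gap in degrees C between the slowdown and max operating thresholds."""
    max_operating = temps.get("GPU Max Operating Temp")
    slowdown = temps.get("GPU Slowdown Temp")
    if max_operating is None or slowdown is None:
        return None
    return slowdown - max_operating


if __name__ == "__main__":
    temps = parse_temperatures(SAMPLE)
    print(temps)
    print("Slowdown minus Max Operating:", headroom(temps), "C")
```

To run it against a live system, the SAMPLE string could be replaced with the captured output of the nvidia-smi command quoted at the top of this post. Fields reported as N/A come back as None, so the gap is only computed when both thresholds are present.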
Hi there @raiger and welcome to the NVIDIA developer forums!
I am not an expert in thermal distribution theory, but the values seem correct to me.
You have to take into account that a laptop form factor has completely different thermal constraints than a standard PC with an AIC GPU. Cooling systems differ from design to design and usually involve a holistic approach of cooling all components in an integrated cooling loop. That means the temperatures of the CPU, RAM and other components will influence overall system values. So if, for example, the GPU temp AND the CPU temp spike, you can easily gain tens of degrees Celsius in a jiffy. That is why the (system) shutdown temp for the GPU is so high. On the other hand, since heat dissipation is so difficult, the temperature at which the GPU is clocked down will be much lower, to safeguard against the whole system overheating too easily.
Again, this differs from manufacturer to manufacturer, but will be similar in any laptop design. Besides, the GPU is a mobile version, so you can’t directly compare its values to desktop GPU values in any case.