JetPack 6: tegrastats missing GPU temperature

I recently upgraded my Jetson Orin AGX 32GB Developer Kit from JetPack 5 to JetPack 6 and now I do not see a GPU temperature in the tegrastats output.

❯ cat /etc/nv_tegra_release
# R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
❯ sudo tegrastats
10-11-2024 16:37:03 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.781C soc2@43.812C soc0@45.531C tj@47.875C soc1@44.562C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:37:04 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.812C soc2@43.812C soc0@45.437C tj@47.812C soc1@44.906C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:37:05 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.812C soc2@43.781C soc0@45.468C tj@47.812C soc1@44.562C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:37:06 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.906C soc2@43.843C soc0@45.437C tj@47.906C soc1@44.625C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW

In JetPack 5, there was a gpu@XX.XXXC entry after the cpu@ temperature, where XX.XXXC would be replaced with the current GPU temperature. I am using the development kit carrier board.

Based on another post about tegrastats values missing in JetPack 6, I verified that the nvgpu module was loaded.

❯ lsmod | grep "nvgpu"
nvgpu                2654208  0
host1x                180224  9 host1x_nvhost,host1x_fence,tegra_se,nvgpu,tegra_drm,nvhost_nvdla,nvidia_drm,nvhost_pva,nvidia_modeset
mc_utils               16384  3 nvidia,nvgpu,tegra_camera_platform
nvmap                 204800  1 nvgpu

I also ran the sudo modprobe nvgpu command. It printed no error message, and lsmod showed nvgpu with output identical to the above. I then tried the --readall flag.

❯ sudo tegrastats --readall
10-11-2024 16:45:48 RAM 1538/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@48.187C soc2@43.906C soc0@45.593C tj@48.187C soc1@44.812C VDD_GPU_SOC 2387mW/2387mW VDD_CPU_CV 397mW/397mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:45:49 RAM 1538/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@48.062C soc2@43.875C soc0@45.625C tj@48.062C soc1@44.687C VDD_GPU_SOC 2387mW/2387mW VDD_CPU_CV 0mW/199mW VIN_SYS_5V0 3735mW/3735mW

The gpu@XX.XXXC temperature key-value is still missing. Adding the --verbose flag yielded the following:

❯ sudo tegrastats --verbose --readall
ERROR: failed to read /sys/devices/system/cpu/cpu8/cpufreq/cpuinfo_cur_freq

ERROR: failed to read /sys/devices/system/cpu/cpu9/cpufreq/cpuinfo_cur_freq

ERROR: failed to read /sys/devices/system/cpu/cpu10/cpufreq/cpuinfo_cur_freq

ERROR: failed to read /sys/devices/system/cpu/cpu11/cpufreq/cpuinfo_cur_freq

WARNING: failed to open /sys/kernel/debug/gpu_pci/clocks/gpcclk

WARNING: failed to open /sys/bus/pci/drivers/nvgpu/module/load

WARNING: failed to open /sys/kernel/debug/nvmap/iram/size

WARNING: failed to open /sys/kernel/debug/tegra_denver/nvmstats/instantaneous_stats
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone2/temp

ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone3/temp

ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone1/temp

ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone4/temp

WARNING: failed to open /sys/bus/i2c/devices/0-0040/name

WARNING: failed to open /sys/bus/i2c/devices/0-0041/name

WARNING: failed to open /sys/bus/i2c/devices/6-0040/name

WARNING: failed to open /sys/bus/i2c/devices/7-0040/name

WARNING: failed to open /sys/bus/i2c/devices/2-0040/name

WARNING: failed to open /sys/class/hwmon/hwmon3/in1_label

WARNING: failed to open /sys/class/hwmon/hwmon3/label

Is one of these errors or warnings causing the missing GPU temperature?
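The thermal_zone read errors can also be examined outside of tegrastats by reading the zones directly. The following is only a minimal sketch, assuming the standard Linux thermal sysfs layout (a `type` and a `temp` file per zone, with `temp` in millidegrees Celsius). The function name `read_thermal_zones` is made up for illustration, and the zone type names on a given JetPack release are not assumed here.

```python
from pathlib import Path
from typing import Dict, Optional


def read_thermal_zones(root: str = "/sys/class/thermal") -> Dict[str, Optional[float]]:
    """Map each thermal zone's type name to its temperature in Celsius.

    A zone whose temp file cannot be read is reported as None
    instead of raising an exception.
    """
    readings: Dict[str, Optional[float]] = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
        except OSError:
            continue  # no type file, so there is nothing to label the reading with
        try:
            # The kernel reports millidegrees Celsius as an integer string.
            readings[name] = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            readings[name] = None  # zone exists but is not readable right now
    return readings
```

Calling `read_thermal_zones()` on the device and printing the result should show which zone types return a value and which fail to read, which may help match the failing thermal_zone numbers to sensor names.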

The other post mentions passing a configuration file to tegrastats. I tried this by creating a file with the exact contents from that post.

❯ cat tegra_stats_conf
APE,/sys/kernel/debug/bpmp/debug/clk/ape/rate
EMC_FREQ,/sys/kernel/debug/bpmp/debug/clk/emc/rate
EMC_LOAD,/sys/kernel/actmon_avg_activity/mc_all
GR3D_FREQ,/sys/kernel/debug/bpmp/debug/clk/gpcclk/rate
IGPU_LOAD,/sys/devices/gpu.0/load
NVENC_ENBL,/sys/kernel/debug/clk/nvenc/clk_enable_count
NVENC1_ENBL,/sys/kernel/debug/clk/nvenc1/clk_enable_count
NVENC,/sys/kernel/debug/bpmp/debug/clk/nvenc/rate
NVENC1,/sys/kernel/debug/bpmp/debug/clk/nvenc1/rate
NVDEC_ENBL,/sys/kernel/debug/clk/nvdec/clk_enable_count
NVDEC1_ENBL,/sys/kernel/debug/clk/nvdec1/clk_enable_count
NVDEC,/sys/kernel/debug/bpmp/debug/clk/nvdec/rate
NVDEC1,/sys/kernel/debug/bpmp/debug/clk/nvdec1/rate
NVJPG_ENBL,/sys/kernel/debug/clk/nvjpg/clk_enable_count
NVJPG,/sys/kernel/debug/bpmp/debug/clk/nvjpg/rate
VIC_FREQ,/sys/kernel/debug/clk/vic/clk_rate
VIC_LOAD,/sys/kernel/debug/vic/actmon_avg_norm

Then,

❯ sudo tegrastats --verbose --load_cfg tegra_stats_conf
ERROR: failed to read /sys/devices/system/cpu/cpu8/cpufreq/cpuinfo_cur_freq

ERROR: failed to read /sys/devices/system/cpu/cpu9/cpufreq/cpuinfo_cur_freq

ERROR: failed to read /sys/devices/system/cpu/cpu10/cpufreq/cpuinfo_cur_freq

ERROR: failed to read /sys/devices/system/cpu/cpu11/cpufreq/cpuinfo_cur_freq

WARNING: failed to open /sys/kernel/debug/gpu_pci/clocks/gpcclk

WARNING: failed to open /sys/bus/pci/drivers/nvgpu/module/load

WARNING: failed to open /sys/kernel/debug/nvmap/iram/size

WARNING: failed to open /sys/kernel/debug/tegra_denver/nvmstats/instantaneous_stats
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone2/temp

ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone3/temp

ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone1/temp

ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone4/temp

WARNING: failed to open /sys/bus/i2c/devices/0-0040/name

WARNING: failed to open /sys/bus/i2c/devices/0-0041/name

WARNING: failed to open /sys/bus/i2c/devices/6-0040/name

WARNING: failed to open /sys/bus/i2c/devices/7-0040/name

WARNING: failed to open /sys/bus/i2c/devices/2-0040/name

WARNING: failed to open /sys/class/hwmon/hwmon3/in1_label

WARNING: failed to open /sys/class/hwmon/hwmon3/label

WARNING: failed to open /sys/kernel/actmon_avg_activity/mc_all

WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/gpcclk/rate

WARNING: failed to open /sys/devices/gpu.0/load

WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvenc1/rate

WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvdec1/rate

WARNING: failed to open /sys/kernel/debug/vic/actmon_avg_norm

WARNING: failed to open /sys/kernel/actmon_avg_activity/mc_all

WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/gpcclk/rate

WARNING: failed to open /sys/devices/gpu.0/load

WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvenc1/rate

WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvdec1/rate

WARNING: failed to open /sys/kernel/debug/vic/actmon_avg_norm

10-11-2024 16:55:47 RAM 1539/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,3%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0] NVENC1 off NVDEC1 off APE 174 cpu@48.062C soc2@43.968C soc0@45.75C tj@48.062C soc1@44.875C VDD_GPU_SOC 2387mW/2387mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW

Still no luck getting the GPU temperature, and I do not think those configuration paths are valid anymore.

Any information and/or direction on getting the GPU temperature would be greatly appreciated.

Thank you.

Hi chris.field,

Could you share the result of the following command on your devkit?

# nvpmodel -q --verbose

and also the full dmesg for further check.

Have you tried to run any application to check if GPU info would show?

Hello KevinFFF,

Could you share the result of the following command on your devkit?

Yes. Please see below.

❯ nvpmodel -q --verbose
NVPM VERB: Config file: /etc/nvpmodel.conf
NVPM VERB: parsing done for /etc/nvpmodel.conf
NVPM VERB: Current mode: NV Power Mode: MODE_30W
2
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_0: PATH /sys/devices/system/cpu/cpu0/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_1: PATH /sys/devices/system/cpu/cpu1/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_2: PATH /sys/devices/system/cpu/cpu2/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_3: PATH /sys/devices/system/cpu/cpu3/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_4: PATH /sys/devices/system/cpu/cpu4/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_5: PATH /sys/devices/system/cpu/cpu5/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_6: PATH /sys/devices/system/cpu/cpu6/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_7: PATH /sys/devices/system/cpu/cpu7/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_8: PATH /sys/devices/system/cpu/cpu8/online: REAL_VAL: 0 CONF_VAL: 0
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_9: PATH /sys/devices/system/cpu/cpu9/online: REAL_VAL: 0 CONF_VAL: 0
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_10: PATH /sys/devices/system/cpu/cpu10/online: REAL_VAL: 0 CONF_VAL: 0
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_11: PATH /sys/devices/system/cpu/cpu11/online: REAL_VAL: 0 CONF_VAL: 0
NVPM VERB: PARAM TPC_POWER_GATING: ARG TPC_PG_MASK: PATH /sys/devices/platform/gpu.0/tpc_pg_mask: REAL_VAL: 240 CONF_VAL: 240
NVPM VERB: PARAM GPU_POWER_CONTROL_ENABLE: ARG GPU_PWR_CNTL_EN: PATH /sys/devices/platform/gpu.0/power/control: REAL_VAL: auto CONF_VAL: on
NVPM VERB: PARAM CPU_A78_0: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq: REAL_VAL: 729600 CONF_VAL: 729600
NVPM VERB: PARAM CPU_A78_0: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 1728000 CONF_VAL: 1728000
NVPM VERB: PARAM CPU_A78_1: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq: REAL_VAL: 729600 CONF_VAL: 729600
NVPM VERB: PARAM CPU_A78_1: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq: REAL_VAL: 1728000 CONF_VAL: 1728000
NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/devices/platform/17000000.gpu/devfreq_dev/min_freq: REAL_VAL: 306000000 CONF_VAL: 0
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/platform/17000000.gpu/devfreq_dev/max_freq: REAL_VAL: 612000000 CONF_VAL: 612000000
NVPM VERB: PARAM GPU_POWER_CONTROL_DISABLE: ARG GPU_PWR_CNTL_DIS: PATH /sys/devices/platform/gpu.0/power/control: REAL_VAL: auto CONF_VAL: auto
NVPM VERB: PARAM EMC: ARG MAX_FREQ: PATH /sys/kernel/nvpmodel_clk_cap/emc: REAL_VAL: 3199000000 CONF_VAL: 9223372036854775807
NVPM VERB: PARAM DLA0_CORE: ARG MAX_FREQ: PATH /sys/devices/platform/bus@0/13e00000.host1x/15880000.nvdla0/clk_cap/dla0_core: REAL_VAL: 1369600000 CONF_VAL: 1369600000
NVPM VERB: PARAM DLA1_CORE: ARG MAX_FREQ: PATH /sys/devices/platform/bus@0/13e00000.host1x/158c0000.nvdla1/clk_cap/dla1_core: REAL_VAL: 1369600000 CONF_VAL: 1369600000
NVPM VERB: PARAM DLA0_FALCON: ARG MAX_FREQ: PATH /sys/devices/platform/bus@0/13e00000.host1x/15880000.nvdla0/clk_cap/dla0_falcon: REAL_VAL: 729600000 CONF_VAL: 729600000
NVPM VERB: PARAM DLA1_FALCON: ARG MAX_FREQ: PATH /sys/devices/platform/bus@0/13e00000.host1x/158c0000.nvdla1/clk_cap/dla1_falcon: REAL_VAL: 729600000 CONF_VAL: 729600000
NVPM VERB: PARAM PVA0_VPS: ARG MAX_FREQ: PATH /sys/devices/platform/bus@0/13e00000.host1x/16000000.pva0/clk_cap/pva0_vps: REAL_VAL: 512000000 CONF_VAL: 512000000
NVPM VERB: PARAM PVA0_AXI: ARG MAX_FREQ: PATH /sys/devices/platform/bus@0/13e00000.host1x/16000000.pva0/clk_cap/pva0_cpu_axi: REAL_VAL: 358400000 CONF_VAL: 358400000
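For a long listing like this, it can help to pick out only the parameters whose REAL_VAL differs from CONF_VAL. The following is a rough sketch that assumes the line format shown in the output above; `ParamState` and `find_mismatches` are hypothetical names. In the output above, three lines differ (GPU_POWER_CONTROL_ENABLE, GPU MIN_FREQ, and EMC MAX_FREQ), and a difference does not by itself indicate a problem or a link to the missing temperature.

```python
import re
from typing import List, NamedTuple


class ParamState(NamedTuple):
    param: str
    arg: str
    path: str
    real: str
    conf: str


# Matches lines such as:
# NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/...: REAL_VAL: 1 CONF_VAL: 0
_LINE = re.compile(
    r"NVPM VERB: PARAM (?P<param>\S+): ARG (?P<arg>\S+): PATH (?P<path>\S+): "
    r"REAL_VAL: (?P<real>\S+) CONF_VAL: (?P<conf>\S+)"
)


def find_mismatches(output: str) -> List[ParamState]:
    """Return the parameters whose real value differs from the configured one."""
    found = []
    for line in output.splitlines():
        m = _LINE.search(line)
        if m and m.group("real") != m.group("conf"):
            found.append(ParamState(**m.groupdict()))
    return found
```

The text to scan could be captured with `nvpmodel -q --verbose > nvpmodel.txt` and read back from that file.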

and also the full dmesg for further check.

I have also uploaded/attached the output from the dmesg on the Jetson Orin 32GB devkit. It seemed a little too long to post directly in the topic.

Have you tried to run any application to check if GPU info would show?

I have tried the jtop utility in the jetson-stats package and the GPU temperature appears as “Offline” with JetPack 6. Other temperatures appear as expected along with the other stats/metrics, such as GPU Usage, CPU Usage, Memory Usage, etc. It was investigating the “Offline” status of the GPU temperature in jtop that prompted me to discover the GPU temperature is missing from the tegrastats output.

Is there another application I should use to check the GPU information?

dmesg.log (72.1 KB)

Could you share the result of cat /etc/nv_boot_control.conf?

Could you share the result of cat /etc/nv_boot_control.conf?

Yes, please see the following output.

❯ cat /etc/nv_boot_control.conf
TNSPEC 3701-500-0000-J.0-1-1-jetson-agx-orin-devkit-
COMPATIBLE_SPEC 3701-300-0000--1--jetson-agx-orin-devkit-
TEGRA_BOOT_STORAGE nvme0n1
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0

It seems your tegrastats output is expected, since you have not run anything that uses the GPU.
(i.e., the GPU is idle, so it shows GR3D_FREQ 0%@[0,0]; the GPU temperature will show when the GR3D_FREQ load is not 0%.)

Have you also tried using Jetson Power GUI to check the GPU status?

You can also refer to Activating GPU Power Rails on AGX Orin without a GUI - #5 by KevinFFF for a topic similar to yours.

KevinFFF,

Thank you for the information.

Have you also tried using Jetson Power GUI to check the GPU status?

No, I have not tried using the Jetson Power GUI tool. I am running my AGX Orin 32GB dev kit in headless mode. I tried connecting to the device with SSH X-forwarding enabled, and then running the /usr/share/nvpmodel_indicator/nvpmodel_indicator.py command. I get a segmentation fault:

(nvpmodel_indicator.py:3472): Gtk-CRITICAL **: 10:17:13.786: gtk_icon_theme_get_for_screen: assertion 'GDK_IS_SCREEN (screen)' failed
/usr/share/nvpmodel_indicator/./nvpmodel_indicator.py:237: Warning: invalid (NULL) pointer instance
  indicator = appindicator.Indicator.new(INDICATOR_ID, ICON_DEFAULT,
/usr/share/nvpmodel_indicator/./nvpmodel_indicator.py:237: Warning: g_signal_connect_data: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
  indicator = appindicator.Indicator.new(INDICATOR_ID, ICON_DEFAULT,

(nvpmodel_indicator.py:3472): Gtk-CRITICAL **: 10:17:13.791: _gtk_style_provider_private_get_settings: assertion 'GTK_IS_STYLE_PROVIDER_PRIVATE (provider)' failed

(nvpmodel_indicator.py:3472): Gtk-CRITICAL **: 10:17:13.791: _gtk_style_provider_private_get_settings: assertion 'GTK_IS_STYLE_PROVIDER_PRIVATE (provider)' failed

(nvpmodel_indicator.py:3472): Gtk-CRITICAL **: 10:17:13.791: _gtk_style_provider_private_get_settings: assertion 'GTK_IS_STYLE_PROVIDER_PRIVATE (provider)' failed
[1]    3472 segmentation fault (core dumped)  python3 ./nvpmodel_indicator.py

Next, I tried connecting a monitor, mouse, and keyboard to my AGX Orin 32GB devkit, but I get an error from the X server and I cannot launch the GNOME desktop/GUI. Only the console appears.

Trying to use the Jetson Power GUI tool is probably off topic.

You can also refer to Activating GPU Power Rails on AGX Orin without a GUI - #5 by KevinFFF for a topic similar to yours.

Interesting! Thank you for sharing this topic. Looking at that topic, this comment offers the following output from tegrastats for L4T v35 as part of JetPack v5:

05-10-2024 11:54:03 RAM 3274/54718MB (lfb 10932x4MB) SWAP 0/27359MB (cached 0MB) CPU [2%@729,0%@729,0%@729,6%@729,0%@729,0%@729,0%@729,0%@729,0%@1497,0%@1497,0%@1497,13%@1497] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] VIC_FREQ 921 APE 174 CV0@-256C CPU@52.656C Tboard@42C SOC2@49.125C Tdiode@42.5C SOC0@50.687C CV1@-256C GPU@-256C tj@52.562C SOC1@50.312C CV2@-256C VDD_GPU_SOC 2154mW/2154mW VDD_CPU_CV 718mW/718mW VIN_SYS_5V0 7862mW/7862mW NC 0mW/0mW VDDQ_VDD2_1V8AO 796mW/796mW NC 0mW/0mW

The GR3D_FREQ is 0%@[0,0] but the GPU@-256C temperature is present. The -256C indicates the temperature is not working but the GPU@ “tag” is nonetheless still present in the tegrastats output. It is possible the GPU temperature is reporting -256C for the user in the topic because it is a custom carrier board.

I started jtop and ran an application that used the GPU. I saw the GPU usage reach 93% in jtop, and the GPU temperature changed from Offline to 43.44C. Then I stopped the application that used the GPU. The GPU usage returned to 0.0%, yet the GPU temperature did not change back to Offline. It continued to report the correct GPU temperature.

I exited out of jtop and ran sudo tegrastats from the command line.

10-16-2024 10:35:20 RAM 5764/30697MB (lfb 6x4MB) SWAP 0/15348MB (cached 0MB) CPU [5%@729,6%@729,5%@729,7%@729,6%@729,5%@729,3%@729,5%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[305,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@48.187C soc2@43.875C soc0@45.531C gpu@44.25C tj@48.187C soc1@44.75C VDD_GPU_SOC 2785mW/2785mW VDD_CPU_CV 397mW/397mW VIN_SYS_5V0 4145mW/4145mW

The gpu@44.25C “tag” and temperature appear even though GR3D_FREQ shows 0% load after stopping the GPU application.
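To check this observation across a longer log, each tegrastats line can be reduced to the GR3D_FREQ load, the bracketed frequency values, and whether a gpu@ temperature is present. This is only a sketch based on the line format seen in the outputs in this topic; `GpuSample` and `gpu_sample` are hypothetical names, and the bracketed values are kept as plain integers without assuming their unit.

```python
import re
from typing import List, NamedTuple, Optional


class GpuSample(NamedTuple):
    load_percent: int
    freqs: List[int]
    has_gpu_temp: bool


# e.g. "GR3D_FREQ 0%@[305,0]" or "GR3D_FREQ 0%@[0]"
_GR3D = re.compile(r"GR3D_FREQ (\d+)%@\[([\d,]*)\]")
# e.g. "gpu@44.25C" (JetPack 6 output) or "GPU@-256C" (JetPack 5 output)
_GPU_TEMP = re.compile(r"\bgpu@-?\d+(?:\.\d+)?C", re.IGNORECASE)


def gpu_sample(line: str) -> Optional[GpuSample]:
    """Summarize the GPU fields of one tegrastats line, or None if GR3D_FREQ is absent."""
    m = _GR3D.search(line)
    if m is None:
        return None
    freqs = [int(f) for f in m.group(2).split(",") if f]
    return GpuSample(int(m.group(1)), freqs, _GPU_TEMP.search(line) is not None)
```

Applied to the idle lines at the top of this topic and to the line above, this would give a 0% load in both cases, with the temperature flag false before the GPU was used and true afterwards.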

Thus, it appears the GPU temperature is offline (absent from the output) until the first use of the GPU. This is a behavior change for tegrastats.

JetPack v5

The GPU temperature is offline until first GPU use and the gpu@-256C temperature is used to indicate “offline”. I do not have a JetPack v5 device at the moment to confirm the JetPack v5 behavior, but this topic seems to confirm this behavior.

Unconfirmed if the GPU temperature continues to be “online” after first GPU use.

JetPack v6

The GPU temperature is offline until first GPU use, but tegrastats indicates “offline” by not providing any gpu “tagged” temperature at all. After first GPU use, the GPU temperature stays “online”: even when the GPU usage returns to 0%, the temperature continues to be reported.
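The two ways of signalling “offline” described above can be told apart in code. A minimal sketch, assuming the -256C value seen in the JetPack v5 output is a sentinel for an unavailable sensor and that the tag may be upper or lower case; the function name `gpu_temp_state` and the state labels are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Matches "gpu@44.25C" as well as "GPU@-256C".
_GPU_TEMP = re.compile(r"\bgpu@(-?\d+(?:\.\d+)?)C", re.IGNORECASE)
OFFLINE_SENTINEL = -256.0  # assumption: value reported when the sensor is unavailable


def gpu_temp_state(line: str) -> Tuple[str, Optional[float]]:
    """Classify the GPU temperature of one tegrastats line.

    Returns ("absent", None) when there is no gpu@ tag at all,
    ("offline", None) when the tag carries the sentinel value,
    and ("online", value) otherwise.
    """
    m = _GPU_TEMP.search(line)
    if m is None:
        return ("absent", None)
    value = float(m.group(1))
    if value == OFFLINE_SENTINEL:
        return ("offline", None)
    return ("online", value)
```

Keeping “absent” and “offline” as separate states preserves the difference between the two releases, in case a caller wants to log which behavior it observed.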

Does this “first use” behavior have anything to do with the GR3D_FREQ field reading GR3D_FREQ 0%@[0,0] before first use and changing to GR3D_FREQ 0%@[305,0] afterwards?

Thank you

It may not be related to “first use” but directly related to whether the GPU is running.

May I know your use case for this GPU temperature?

We use the GPU temperature, along with all the other metrics from tegrastats, to monitor performance during inference with a variety of different model architectures. In JetPack v5, all the metrics appeared all of the time, but when we upgraded to JetPack v6, our tegrastats parser assumed the format and output would be the same and that all metrics would be present. JetPack v6 changed the output behavior of tegrastats, and the GPU temperature is no longer always included in the output string.
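One way to make such a parser tolerate the change is to collect whatever name@valueC pairs are present and then project them onto a fixed column list, filling gaps with None. The following is only a sketch under assumptions: it is not our actual parser, the names `parse_temperatures` and `to_row` are hypothetical, keys are lowercased because the two releases differ in case, and -256 is treated as “no reading”.

```python
import re
from typing import Dict, List, Optional

# A temperature entry looks like "cpu@47.781C", "Tboard@42C" or "GPU@-256C".
# CPU entries such as "0%@729" do not match: no leading letter, no trailing C.
_TEMP = re.compile(r"\b([A-Za-z]\w*)@(-?\d+(?:\.\d+)?)C\b")


def parse_temperatures(line: str) -> Dict[str, Optional[float]]:
    """Return the temperatures found in one tegrastats line, keyed by lowercase name."""
    temps: Dict[str, Optional[float]] = {}
    for name, value in _TEMP.findall(line):
        reading = float(value)
        # Assumption: -256 marks a sensor that is not available.
        temps[name.lower()] = None if reading == -256.0 else reading
    return temps


def to_row(line: str, columns: List[str]) -> List[Optional[float]]:
    """Project one line onto a fixed column order; missing sensors become None."""
    temps = parse_temperatures(line)
    return [temps.get(column) for column in columns]
```

With columns such as `["cpu", "gpu", "tj"]`, a line without a gpu@ entry yields None in the second position rather than shifting or breaking the row, so downstream logging keeps a stable schema on both releases.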