I recently upgraded my Jetson AGX Orin 32GB Developer Kit from JetPack 5 to JetPack 6, and I no longer see a GPU temperature in the tegrastats output.
❯ cat /etc/nv_tegra_release
# R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
❯ sudo tegrastats
10-11-2024 16:37:03 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.781C soc2@43.812C soc0@45.531C tj@47.875C soc1@44.562C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:37:04 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.812C soc2@43.812C soc0@45.437C tj@47.812C soc1@44.906C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:37:05 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.812C soc2@43.781C soc0@45.468C tj@47.812C soc1@44.562C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:37:06 RAM 1543/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@47.906C soc2@43.843C soc0@45.437C tj@47.906C soc1@44.625C VDD_GPU_SOC 2386mW/2386mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
In JetPack 5, there was a gpu@XX.XXXC entry after the cpu@ temperature, where XX.XXX was the current GPU temperature. I am using the development kit carrier board.
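To make the comparison concrete, here is a minimal sketch that pulls the name@valueC temperature entries out of a tegrastats line and checks whether a gpu entry is present. The regular expression and the function name parse_temps are my own assumptions based on the output format shown above, not anything documented by tegrastats.

```python
import re

# Matches entries such as "cpu@47.781C"; the pattern is inferred from the
# sample output above and is an assumption, not a documented format.
TEMP_RE = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C\b")


def parse_temps(line):
    """Return a dict mapping sensor name to temperature in degrees Celsius."""
    return {name: float(value) for name, value in TEMP_RE.findall(line)}


# Tail of the first tegrastats line from the JetPack 6 output above.
sample = (
    "APE 174 cpu@47.781C soc2@43.812C soc0@45.531C tj@47.875C "
    "soc1@44.562C VDD_GPU_SOC 2386mW/2386mW"
)

if __name__ == "__main__":
    temps = parse_temps(sample)
    print(sorted(temps))
    print("gpu entry present:", "gpu" in temps)
```

The CPU entries such as 0%@729 are not matched because the character before the @ is not a word character and the value does not end in C.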
Based on another post that had similar issues with tegrastats values missing in JetPack 6, I verified that the nvgpu module was loaded.
❯ lsmod | grep "nvgpu"
nvgpu 2654208 0
host1x 180224 9 host1x_nvhost,host1x_fence,tegra_se,nvgpu,tegra_drm,nvhost_nvdla,nvidia_drm,nvhost_pva,nvidia_modeset
mc_utils 16384 3 nvidia,nvgpu,tegra_camera_platform
nvmap 204800 1 nvgpu
I also tried sudo modprobe nvgpu. It produced no error message, and lsmod showed the same nvgpu output as above. I then tried the --readall flag.
❯ sudo tegrastats --readall
10-11-2024 16:45:48 RAM 1538/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@48.187C soc2@43.906C soc0@45.593C tj@48.187C soc1@44.812C VDD_GPU_SOC 2387mW/2387mW VDD_CPU_CV 397mW/397mW VIN_SYS_5V0 3735mW/3735mW
10-11-2024 16:45:49 RAM 1538/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@48.062C soc2@43.875C soc0@45.625C tj@48.062C soc1@44.687C VDD_GPU_SOC 2387mW/2387mW VDD_CPU_CV 0mW/199mW VIN_SYS_5V0 3735mW/3735mW
The gpu@XX.XXXC temperature key-value pair is still missing. Adding the --verbose flag yielded the following:
❯ sudo tegrastats --verbose --readall
ERROR: failed to read /sys/devices/system/cpu/cpu8/cpufreq/cpuinfo_cur_freq
ERROR: failed to read /sys/devices/system/cpu/cpu9/cpufreq/cpuinfo_cur_freq
ERROR: failed to read /sys/devices/system/cpu/cpu10/cpufreq/cpuinfo_cur_freq
ERROR: failed to read /sys/devices/system/cpu/cpu11/cpufreq/cpuinfo_cur_freq
WARNING: failed to open /sys/kernel/debug/gpu_pci/clocks/gpcclk
WARNING: failed to open /sys/bus/pci/drivers/nvgpu/module/load
WARNING: failed to open /sys/kernel/debug/nvmap/iram/size
WARNING: failed to open /sys/kernel/debug/tegra_denver/nvmstats/instantaneous_stats
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone2/temp
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone3/temp
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone1/temp
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone4/temp
WARNING: failed to open /sys/bus/i2c/devices/0-0040/name
WARNING: failed to open /sys/bus/i2c/devices/0-0041/name
WARNING: failed to open /sys/bus/i2c/devices/6-0040/name
WARNING: failed to open /sys/bus/i2c/devices/7-0040/name
WARNING: failed to open /sys/bus/i2c/devices/2-0040/name
WARNING: failed to open /sys/class/hwmon/hwmon3/in1_label
WARNING: failed to open /sys/class/hwmon/hwmon3/label
Is one of these errors or warnings causing the missing GPU temperature?
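Since several of the errors point at thermal_zone temp files, one way to narrow this down is to list each thermal zone with its type and try to read its temperature directly. The following is a minimal sketch, assuming the standard Linux thermal sysfs layout (a type file and a temp file in millidegrees Celsius per zone); the function name read_thermal_zones is hypothetical, and which zone, if any, corresponds to the GPU on this board is something I have not confirmed.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """List (zone, type, temp_c) for each thermal zone under base.

    temp_c is None when the temp file cannot be read or parsed, which is
    the situation the tegrastats ERROR lines above seem to describe.
    """
    zones = []
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
        except OSError:
            zone_type = "unknown"
        try:
            # Linux thermal sysfs reports millidegrees Celsius.
            temp_c = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            temp_c = None
        zones.append((zone.name, zone_type, temp_c))
    return zones


if __name__ == "__main__":
    for name, zone_type, temp_c in read_thermal_zones():
        shown = "unreadable" if temp_c is None else f"{temp_c:.3f}C"
        print(f"{name}: {zone_type} -> {shown}")
```

A zone that shows up with a type but an unreadable temperature would line up with the "failed to read" errors, while a missing zone would suggest the sensor is not exposed at all.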
The other post mentions passing a configuration file to tegrastats. I tried this by creating a file with the exact contents given in that post.
❯ cat tegra_stats_conf
APE,/sys/kernel/debug/bpmp/debug/clk/ape/rate
EMC_FREQ,/sys/kernel/debug/bpmp/debug/clk/emc/rate
EMC_LOAD,/sys/kernel/actmon_avg_activity/mc_all
GR3D_FREQ,/sys/kernel/debug/bpmp/debug/clk/gpcclk/rate
IGPU_LOAD,/sys/devices/gpu.0/load
NVENC_ENBL,/sys/kernel/debug/clk/nvenc/clk_enable_count
NVENC1_ENBL,/sys/kernel/debug/clk/nvenc1/clk_enable_count
NVENC,/sys/kernel/debug/bpmp/debug/clk/nvenc/rate
NVENC1,/sys/kernel/debug/bpmp/debug/clk/nvenc1/rate
NVDEC_ENBL,/sys/kernel/debug/clk/nvdec/clk_enable_count
NVDEC1_ENBL,/sys/kernel/debug/clk/nvdec1/clk_enable_count
NVDEC,/sys/kernel/debug/bpmp/debug/clk/nvdec/rate
NVDEC1,/sys/kernel/debug/bpmp/debug/clk/nvdec1/rate
NVJPG_ENBL,/sys/kernel/debug/clk/nvjpg/clk_enable_count
NVJPG,/sys/kernel/debug/bpmp/debug/clk/nvjpg/rate
VIC_FREQ,/sys/kernel/debug/clk/vic/clk_rate
VIC_LOAD,/sys/kernel/debug/vic/actmon_avg_norm
Then,
❯ sudo tegrastats --verbose --load_cfg tegra_stats_conf
ERROR: failed to read /sys/devices/system/cpu/cpu8/cpufreq/cpuinfo_cur_freq
ERROR: failed to read /sys/devices/system/cpu/cpu9/cpufreq/cpuinfo_cur_freq
ERROR: failed to read /sys/devices/system/cpu/cpu10/cpufreq/cpuinfo_cur_freq
ERROR: failed to read /sys/devices/system/cpu/cpu11/cpufreq/cpuinfo_cur_freq
WARNING: failed to open /sys/kernel/debug/gpu_pci/clocks/gpcclk
WARNING: failed to open /sys/bus/pci/drivers/nvgpu/module/load
WARNING: failed to open /sys/kernel/debug/nvmap/iram/size
WARNING: failed to open /sys/kernel/debug/tegra_denver/nvmstats/instantaneous_stats
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone2/temp
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone3/temp
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone1/temp
ERROR: failed to read /sys/devices/virtual/thermal/thermal_zone4/temp
WARNING: failed to open /sys/bus/i2c/devices/0-0040/name
WARNING: failed to open /sys/bus/i2c/devices/0-0041/name
WARNING: failed to open /sys/bus/i2c/devices/6-0040/name
WARNING: failed to open /sys/bus/i2c/devices/7-0040/name
WARNING: failed to open /sys/bus/i2c/devices/2-0040/name
WARNING: failed to open /sys/class/hwmon/hwmon3/in1_label
WARNING: failed to open /sys/class/hwmon/hwmon3/label
WARNING: failed to open /sys/kernel/actmon_avg_activity/mc_all
WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/gpcclk/rate
WARNING: failed to open /sys/devices/gpu.0/load
WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvenc1/rate
WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvdec1/rate
WARNING: failed to open /sys/kernel/debug/vic/actmon_avg_norm
WARNING: failed to open /sys/kernel/actmon_avg_activity/mc_all
WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/gpcclk/rate
WARNING: failed to open /sys/devices/gpu.0/load
WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvenc1/rate
WARNING: failed to open /sys/kernel/debug/bpmp/debug/clk/nvdec1/rate
WARNING: failed to open /sys/kernel/debug/vic/actmon_avg_norm
10-11-2024 16:55:47 RAM 1539/30697MB (lfb 253x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@729,3%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0] NVENC1 off NVDEC1 off APE 174 cpu@48.062C soc2@43.968C soc0@45.75C tj@48.062C soc1@44.875C VDD_GPU_SOC 2387mW/2387mW VDD_CPU_CV 0mW/0mW VIN_SYS_5V0 3735mW/3735mW
Still no luck getting the GPU temperature. Since several of the paths in that configuration fail to open, I do not think those entries are valid anymore.
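To see which entries in such a configuration file still point at something, the NAME,path lines can be checked against the filesystem. This is a minimal sketch, assuming the file uses only the comma-separated format shown above; check_cfg_paths is a hypothetical helper, and the debugfs paths under /sys/kernel/debug normally need root to be visible, so it should be run with sudo.

```python
import os
import sys


def check_cfg_paths(cfg_text, exists=os.path.exists):
    """Return (name, path, found) for each NAME,path line in cfg_text."""
    results = []
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a NAME,path pair.
        if not line or "," not in line:
            continue
        name, path = line.split(",", 1)
        path = path.strip()
        results.append((name.strip(), path, exists(path)))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, path, found in check_cfg_paths(handle.read()):
            status = "ok" if found else "MISSING"
            print(f"{status:8}{name:14}{path}")
```

For example, saving this as a script and passing tegra_stats_conf as the only argument prints one line per entry, marking each path as ok or MISSING.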
Any information and/or direction on getting the GPU temperature would be greatly appreciated.
Thank you.