Tegrastats stops showing some information while running inside a Docker container

Hello guys,

I’m currently running some Docker containers on multiple Jetson Nanos. To monitor all of them, each one has a small container in which a cron job runs some commands to collect system stats and send them to a web page. One of the commands I use to monitor system resources is Tegrastats. This small container runs smoothly most of the time, since it is only a mix of small system commands, grep, sed, etc. But at some point the Tegrastats command stops giving information about GPU frequency (the EMC_FREQ and GR3D_FREQ fields in Tegrastats’ output), which makes the command pipeline fail from then on.

Just for reference, Tegrastats normally outputs something like:

RAM 2324/3964MB (lfb 21x1MB) SWAP 327/1982MB (cached 10MB) CPU [18%@1479,21%@1479,17%@1479,13%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@40C CPU@43C PMIC@100C GPU@40C AO@46C thermal@41.5C POM_5V_IN 5623/6058 POM_5V_GPU 2249/3295 POM_5V_CPU 1357/633

But at some point (that I haven’t determined) it stops showing EMC_FREQ and GR3D_FREQ, so the output looks something like:

RAM 2358/3964MB (lfb 21x1MB) SWAP 327/1982MB (cached 10MB) CPU [12%@307,18%@307,13%@307,18%@307] PLL@39C CPU@43C PMIC@100C GPU@40C AO@46C thermal@41.75C POM_5V_IN 6912/6969 POM_5V_GPU 4331/4369 POM_5V_CPU 384/345

Just to clarify:

  • When the Docker container starts, Tegrastats inside the container works normally.
  • Tegrastats can keep working normally inside the container for days.
  • While the container fails to get the normal Tegrastats output, Tegrastats runs normally if it is run on the host. (Running Tegrastats manually inside the Docker container gives the output missing both frequencies.)
  • The command used is “tegrastats” not “sudo tegrastats”, since it gives the information I need anyway.
  • The Docker container runs with “runtime nvidia”

Is there something I could be missing that makes Tegrastats stop getting the GPU usage? Is there anything I could check for information on errors? Any service that might need a restart?
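
Until the cause is found, the monitoring pipeline could at least tolerate the missing fields instead of failing. The following is only a minimal sketch: the function name tegrastats_field is hypothetical, and it assumes each field name in the Tegrastats line is followed by its value as the next space-separated token, as in the samples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of tegrastats): print the token that follows
# a named field in one line of tegrastats output, or "NA" if the field is absent.
# Usage: tegrastats_field FIELD "LINE"
tegrastats_field() {
    printf '%s\n' "$2" | awk -v f="$1" '
        {
            # Walk the space-separated tokens and stop at the first match.
            for (i = 1; i < NF; i++)
                if ($i == f) { print $(i + 1); found = 1; exit }
        }
        END { if (!found) print "NA" }'
}
```

A cron script could then call, for example, tegrastats_field GR3D_FREQ "$line" and send "NA" to the web page when the field has disappeared, which also makes the moment of failure visible in the collected data.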

Hi,
You need to run sudo tegrastats or the information may not be precise. Please execute with sudo and check again.

So, I was having the same issue. I wanted to get the GPU frequency inside a Docker container and stumbled upon what I believe to be the root cause:

/usr/bin/jetson_clocks does not have access to the clocks.

Outside the container:

# /usr/bin/jetson_clocks --show
SOC family:tegra194  Machine:NVIDIA Jetson Xavier NX Waggle Wild Sage
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=1907200 MaxFreq=1907200 CurrentFreq=1907200 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1907200 MaxFreq=1907200 CurrentFreq=1907200 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1907200 MaxFreq=1907200 CurrentFreq=1907200 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1907200 MaxFreq=1907200 CurrentFreq=1907200 IdleStates: C1=0 c6=0
cpu4: Online=1 Governor=schedutil MinFreq=1907200 MaxFreq=1907200 CurrentFreq=1907200 IdleStates: C1=0 c6=0
cpu5: Online=1 Governor=schedutil MinFreq=1907200 MaxFreq=1907200 CurrentFreq=1907200 IdleStates: C1=0 c6=0
GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: speed=130
NV Power Mode: MODE_15W_6CORE

Inside the container:

# /usr/bin/jetson_clocks
cat: /sys/kernel/debug/bpmp/debug/clk/emc/max_rate: No such file or directory
/usr/bin/jetson_clocks: line 328: /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked: No such file or directory
/usr/bin/jetson_clocks: line 329: /sys/kernel/debug/bpmp/debug/clk/emc/rate: No such file or directory
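
A quick way to test for this condition from a script is to check whether the clock node named in the errors above is readable. This is a rough sketch only: the function name bpmp_clk_visible is made up for illustration, and the path is simply the one from the error messages on this Xavier NX, so it may differ on other modules or L4T releases. Note that debugfs is normally readable by root only, so the check is expected to fail for a non-root user even on the host.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the EMC clock rate node under debugfs is
# readable. The optional argument overrides the debugfs root, which is
# assumed to be /sys/kernel/debug by default.
# Usage: bpmp_clk_visible && echo "clock nodes visible"
bpmp_clk_visible() {
    root="${1:-/sys/kernel/debug}"
    [ -r "$root/bpmp/debug/clk/emc/rate" ]
}
```

Running it at container start and logging the result would show whether the clock nodes were ever visible inside the container.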

I am still working through this but wanted to inquire if @rlienlafsoto was able to solve the original issue.

Reference: Error when using jtop in docker container · Issue #63 · rbonghi/jetson_stats · GitHub

I was able to get the GPU frequency in a Docker container by volume-mounting the /sys/kernel/debug path:

# docker run --privileged --rm -it -v /sys/kernel/debug:/sys/kernel/debug --entrypoint tegrastats joe:latest
RAM 1838/7771MB (lfb 515x4MB) SWAP 31/20269MB (cached 6MB) CPU [14%@1906,16%@1906,11%@1906,15%@1906,15%@1906,11%@1905] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 0% bg 5% AO@43C GPU@44C PMIC@100C AUX@43.5C CPU@44.5C thermal@44.4C VDD_IN 5661/5661 VDD_CPU_GPU_CV 1873/1873 VDD_SOC 1154/1154

Without the volume mount:

# docker run --privileged --rm -it --entrypoint tegrastats joe:latest
RAM 1841/7771MB (lfb 513x4MB) SWAP 31/20269MB (cached 6MB) CPU [15%@1906,24%@1907,20%@1906,17%@1906,36%@1906,10%@1907] EMC_FREQ 0% GR3D_FREQ 0% AO@43C GPU@44C PMIC@100C AUX@43.5C CPU@45C thermal@44.45C VDD_IN 6259/6259 VDD_CPU_GPU_CV 2468/2468 VDD_SOC 1154/1154
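
The outputs in this thread show three different states for GR3D_FREQ: usage plus frequency (0%@1109), usage only (0%), and the field missing entirely. A monitoring script could tell them apart with something like the sketch below. The function name gr3d_state and its three labels are invented for illustration, and the patterns are based only on the sample lines posted here.

```shell
#!/bin/sh
# Hypothetical helper: classify the GR3D_FREQ field of one tegrastats line.
# Prints "full" (e.g. 0%@1109), "usage-only" (e.g. 0%), or "missing".
# Usage: gr3d_state "LINE"
gr3d_state() {
    printf '%s\n' "$1" | awk '
        {
            # Remember the token that follows GR3D_FREQ, if any.
            for (i = 1; i < NF; i++)
                if ($i == "GR3D_FREQ") { v = $(i + 1); break }
        }
        END {
            if (v ~ /^[0-9]+%@[0-9]+$/) print "full"
            else if (v ~ /^[0-9]+%$/) print "usage-only"
            else print "missing"
        }'
}
```

By this classification the original poster's "normal" output would be usage-only, and the failing output would be missing.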