I'm currently running some Docker containers on multiple Jetson Nanos. To monitor all of them, each one has a small container where a cron job runs some commands to collect system stats and send them to a web page. One of the commands I use to monitor system resources is Tegrastats. This small container runs smoothly most of the time, since it is only a mix of small system commands (grep, sed, etc.). But at some point the Tegrastats command stops reporting the GPU frequency information (the EMC_FREQ and GR3D_FREQ fields in Tegrastats' output), which makes the command pipeline fail from then on.
Just as a reference, Tegrastats normally outputs something like:
RAM 2324/3964MB (lfb 21x1MB) SWAP 327/1982MB (cached 10MB) CPU [18%@1479,21%@1479,17%@1479,13%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@40C CPU@43C PMIC@100C GPU@40C AO@46C firstname.lastname@example.orgC POM_5V_IN 5623/6058 POM_5V_GPU 2249/3295 POM_5V_CPU 1357/633
But at some point (which I haven't determined) it stops showing EMC_FREQ and GR3D_FREQ, so the output looks something like:
RAM 2358/3964MB (lfb 21x1MB) SWAP 327/1982MB (cached 10MB) CPU [12%@307,18%@307,13%@307,18%@307] PLL@39C CPU@43C PMIC@100C GPU@40C AO@46C email@example.comC POM_5V_IN 6912/6969 POM_5V_GPU 4331/4369 POM_5V_CPU 384/345
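In case it helps frame the problem, here is a minimal sketch of how the parsing step could tolerate the missing fields instead of failing. It assumes Python is available in the monitoring container (my actual pipeline uses grep and sed), and `parse_freqs` is a hypothetical helper name, not part of Tegrastats. It only works around the symptom; it does not explain why the fields disappear.

```python
import re

# Hypothetical helper for illustration; not part of tegrastats or any NVIDIA tool.
# Matches "EMC_FREQ 0%" as well as variants such as "EMC_FREQ 0%@1600".
FREQ_PATTERN = re.compile(r"\b(EMC_FREQ|GR3D_FREQ) (\d+)%")


def parse_freqs(line):
    """Return the EMC_FREQ and GR3D_FREQ percentages, or None for a missing field."""
    found = {name: int(value) for name, value in FREQ_PATTERN.findall(line)}
    return {key: found.get(key) for key in ("EMC_FREQ", "GR3D_FREQ")}


# Shortened versions of the two sample lines above.
normal = "RAM 2324/3964MB (lfb 21x1MB) CPU [18%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@40C"
degraded = "RAM 2358/3964MB (lfb 21x1MB) CPU [12%@307] PLL@39C"

print(parse_freqs(normal))
print(parse_freqs(degraded))
```

With something like this, a line without the two fields yields `None` values that the reporting step can flag, rather than breaking the rest of the pipeline.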
Just to clarify:
- When the Docker container starts, Tegrastats inside the container works normally.
- Tegrastats can keep working normally inside the container for days.
- While the container fails to get the normal Tegrastats output, Tegrastats still runs normally if it is run on the host. (Running Tegrastats manually inside the Docker container gives the output missing both frequency fields.)
- The command used is “tegrastats”, not “sudo tegrastats”, since it gives the information I need anyway.
- The Docker container runs with “runtime nvidia”
Is there something I could be missing that makes Tegrastats stop reporting the GPU usage? Is there anything I could check for information on errors? Is there any service that might need a restart?