GPU usage rises and frames drop after hot-surface-alert (or other popup alerts) appear

Hi,
This may be similar to

My camera pipeline normally does not lose frames and consumes 55-60% of GPU time on average, as reported by tegrastats.
But when the temperature reaches 70°C, the hot-surface alert appears, GPU usage suddenly jumps to 95%+ and frame loss occurs.

I distilled this down to an interaction between the GTK popup notification used by /usr/share/nvpmodel_indicator/nvpmodel_indicator.py and NvEglRenderer in full-screen mode.

To reproduce, first create this small Python script, which calls the same API as nvpmodel_indicator.py:

cat <<EOF > warning.py
import os
import gi
gi.require_version("Gtk", "3.0")
gi.require_version('AppIndicator3', '0.1')
gi.require_version('Notify', '0.7')
# Same imports as nvpmodel_indicator.py, although only Notify is exercised below
from gi.repository import Gtk as gtk
from gi.repository import AppIndicator3 as appindicator
from gi.repository import Notify as notify
from gi.repository import GLib
INDICATOR_ID = 'test'
ICON_DEFAULT = os.path.abspath('/usr/share/nvpmodel_indicator/nv_logo.svg')
ICON_WARNING = 'dialog-warning'
# Raise a popup notification the same way the hot-surface alert does
notify.init(INDICATOR_ID)
warning = notify.Notification()
warning.update("Sample warning message")
warning.show()
EOF

Verify that the Orin AGX DevKit is in MAXN mode, then run jetson_clocks and tegrastats (with a sed filter for clarity):

sudo nvpmodel -q # Should print NV Power Mode: MAXN
sudo /usr/bin/jetson_clocks
tegrastats | sed -n 's@\(..-..-.... ..:..:..\).*\(GR3D_FREQ\) \(....\).*\(tj\)\(.....\).*@\1,\2,\3,\4,\5@p'

Note that GR3D_FREQ currently reads 0%.

In another shell run some GPU code with NvEglRenderer:

DISPLAY=:0 ./video_dec_cuda /usr/src/jetson_multimedia_api/data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --fullscreen -fps 60

Observe the GR3D_FREQ numbers in the middle of the video (your numbers may vary depending on display resolution):

11-01-2024 20:44:42,GR3D_FREQ,10% ,tj,@48.0
11-01-2024 20:44:43,GR3D_FREQ,12% ,tj,@48.0
11-01-2024 20:44:44,GR3D_FREQ,7% c,tj,@48.1
11-01-2024 20:44:45,GR3D_FREQ,7% c,tj,@48.1
11-01-2024 20:44:46,GR3D_FREQ,7% c,tj,@48.0
11-01-2024 20:44:47,GR3D_FREQ,6% c,tj,@48.3

Now run

DISPLAY=:0 python3 warning.py

And again:

DISPLAY=:0 ./video_dec_cuda /usr/src/jetson_multimedia_api/data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --fullscreen -fps 60

Now GR3D_FREQ has increased almost 4x! It does not lose frames because this video is so small, but if you replace it with a larger one, it will lose frames!

11-01-2024 20:46:02,GR3D_FREQ,38% ,tj,@47.7
11-01-2024 20:46:03,GR3D_FREQ,38% ,tj,@47.9
11-01-2024 20:46:04,GR3D_FREQ,40% ,tj,@47.7
11-01-2024 20:46:05,GR3D_FREQ,39% ,tj,@47.8
11-01-2024 20:46:06,GR3D_FREQ,37% ,tj,@47.8

I am not sure, but I suspect the same problem happens with other Ubuntu notifications such as the dreaded Software Updates reminder.
In the ncu profiler output I see that some CUDA operations take many times longer than in the normal case, without any obvious blocker.
Maybe the X window manager or compositor is consuming that much GPU? How can I control that?
Should we disable all popup notifications, and what is the best way to do that?
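
As a first step, a quick way to check which window manager/compositor is actually running should be the standard X11 xprop queries below (I have not verified the exact output on Orin):

# Ask the root window which client window belongs to the window manager
DISPLAY=:0 xprop -root -notype _NET_SUPPORTING_WM_CHECK
# Then query that window id for the window manager name (e.g. "GNOME Shell" / "Mutter")
DISPLAY=:0 xprop -id <window_id_from_previous_command> -notype _NET_WM_NAME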

Thank you

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide of deep learning frameworks on Jetson:

3. Tutorial

Startup deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and the customized app (if any) with us so we can reproduce it locally.

Thanks!

Please update to r35.6 to verify, since it includes the following change:

Camera improvements
Updated the Argus library, resulting in up to a 40% reduction in CPU load for CSI camera capture with Argus cameras.

I am not using Argus, and I reproduced the problem with the video_dec_cuda sample, which does not use Argus either.

When I run video_dec_cuda before a GNOME notification is shown,
htop shows that video_dec_cuda consumes most of the CPU time and tegrastats shows less than 10% GPU usage.

When I run video_dec_cuda while a GNOME notification is shown,
htop shows “./video_dec_cuda” consuming 11% CPU and “/usr/bin/gnome-shell” consuming 9% CPU, and tegrastats shows more than 40% GPU usage.

So, video_dec_cuda appears to have a costly interaction with gnome-shell or some other X-related process, such as the compositor.
By the way, can you tell me which compositor Orin is using? Is it a separate process or a library inside the X server process?

I can fix this problem by disabling notifications with “gsettings set org.gnome.desktop.notifications show-banners false”,
but there may be other hidden Ubuntu services that can also activate suddenly and ruin my CUDA pipeline.
I need to identify and disable all the system services that may use the GPU behind my back.
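
For reference, a minimal sketch of the clean-up I have in mind (the update-notifier schema and the autostart check below are assumptions on my part; please verify them on your image before relying on them):

# Suppress GNOME notification banners -- the workaround that already works for me
gsettings set org.gnome.desktop.notifications show-banners false
# List notification-related schemas/keys before guessing at names
gsettings list-recursively | grep -i notif
# Assumed schema/key for the Ubuntu update reminder -- verify it exists on Jetson first
gsettings set com.ubuntu.update-notifier no-show-notifications true
# Check whether the nvpmodel indicator that raises the hot-surface alert is autostarted
ls /etc/xdg/autostart/ | grep -i nvpmodel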

What is annoying is that I cannot find a way to profile the GPU system-wide (tegrastats is not enough).

I am looking at GPU Utilization(%) for Individual Process and How to get GPU usage by an specific application(PID) - #6 by diogojusten,
but I cannot figure out how to attach nsys to /usr/bin/gnome-shell or /usr/lib/xorg/Xorg.
Do you know how to do that?

Hi,

ncu supports attach mode, but the app needs to be launched with --mode launch.
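
A minimal sketch of the launch-and-attach flow (the --mode and --hostname options are taken from recent Nsight Compute releases; please confirm with ncu --help on your version):

# Shell 1: launch the target under ncu; the process is suspended before its first CUDA API call
DISPLAY=:0 ncu --mode launch ./video_dec_cuda /usr/src/jetson_multimedia_api/data/Video/sample_outdoor_car_1080p_10fps.h264 H264
# Shell 2: attach to the suspended process and collect the profile
ncu --mode attach --hostname 127.0.0.1

Note that this only profiles CUDA work in processes launched this way; it cannot attach to an already running gnome-shell or Xorg.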

Thanks.
