XServer crashes when running an app in full-screen mode

Hello, we are developing an app on an Orin NX device with JetPack R35.4.1. The app runs AI detection, so GPU usage is high. It runs well in non-full-screen mode, but to improve fluency we tried running it in full-screen mode. In that mode, the XServer may crash when window focus is switched between the full-screen app and a terminal or another app. It seems that the XServer sometimes crashes when switching into full-screen mode under high GPU usage.
The Xorg log is as follows:
[ 556.214] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[ 556.215] (EE) NVIDIA(0): recover…
[ 556.251] (II) NVIDIA(0): Error recovery was successful.
[ 556.258] (EE)
[ 556.258] (EE) Backtrace:
[ 556.264] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x188) [0xaaaab9b48168]
[ 556.264] (EE) unw_get_proc_info failed: no unwind info found [-10]
[ 556.264] (EE)
[ 556.264] (EE) Segmentation fault at address 0x34
[ 556.264] (EE)
Fatal server error:
[ 556.264] (EE) Caught signal 11 (Segmentation fault). Server aborting
[ 556.264] (EE)
[ 556.264] (EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
[ 556.264] (EE) Please also check the log file at “/var/log/Xorg.0.log” for additional information.
[ 556.264] (EE)
[ 556.331] (EE) Server terminated with error (1). Closing log file.

Are there any suggestions for this issue?
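
For a longer “/var/log/Xorg.0.log”, a small filter can pull out only the error lines and the timestamp of the first segmentation fault, which makes it easier to line the crash up with other logs. This is only a sketch: it assumes the usual “[ seconds] (EE) message” line layout seen in the log above, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed Xorg log layout: "[ 556.264] (EE) message". Lines without a
# timestamp (e.g. "Fatal server error:") are ignored.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\((..)\)\s*(.*)$")


def error_lines(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (timestamp, message) for every non-empty (EE) line."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group(2) == "EE" and match.group(3):
            found.append((float(match.group(1)), match.group(3)))
    return found


def first_segfault_time(lines: Iterable[str]) -> Optional[float]:
    """Timestamp of the first (EE) line mentioning a segmentation fault."""
    for timestamp, message in error_lines(lines):
        if "Segmentation fault" in message:
            return timestamp
    return None


# A few lines copied from the log in this post.
sample = [
    "[ 556.251] (II) NVIDIA(0): Error recovery was successful.",
    "[ 556.258] (EE)",
    "[ 556.264] (EE) Segmentation fault at address 0x34",
]
print(error_lines(sample))
print(first_segfault_time(sample))
```

To run it on the real file, pass `open("/var/log/Xorg.0.log")` in place of `sample`.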

For this kind of issue, please share a method to reproduce it on an NVIDIA devkit.

We need you to provide the method so that we can check it locally.

I suggest you also monitor “tegrastats” (and log it, e.g., “tegrastats 2>&1 | tee log_stats.txt”) from a serial console (a serial console keeps logging even when much else crashes and fails; the remote computer will be able to save the log even if the Jetson fails). If the system as a whole does not fail, then ssh would also be a way to record this, but serial console is on the extreme side of reliable.

I say this because the integrated GPU (iGPU) of the Jetson shares memory with the rest of the Jetson and does not have its own memory. Combine this with the segmentation fault. Perhaps it just ran out of memory? You could include both the mentioned “/var/log/Xorg.0.log” and the serial console tegrastats.
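
Once “log_stats.txt” exists, a quick way to check the out-of-memory idea is to look for the sample with the highest RAM use and the GPU load around it. The following is a rough sketch, not an official tool: it assumes each tegrastats line contains “RAM used/totalMB” and “GR3D_FREQ load%” fields (the exact layout differs between releases), and the names and the sample values are made up for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Sample(NamedTuple):
    ram_used_mb: int
    ram_total_mb: int
    gpu_load_pct: Optional[int]  # None if the line has no GR3D_FREQ field


# Assumed field layout; adjust the patterns if your release prints differently.
RAM_RE = re.compile(r"\bRAM (\d+)/(\d+)MB")
GPU_RE = re.compile(r"\bGR3D_FREQ (\d+)%")


def parse_tegrastats(lines: Iterable[str]) -> List[Sample]:
    """Extract RAM use and GPU load from each line that has a RAM field."""
    samples = []
    for line in lines:
        ram = RAM_RE.search(line)
        if ram is None:
            continue  # not a tegrastats sample line
        gpu = GPU_RE.search(line)
        samples.append(
            Sample(
                int(ram.group(1)),
                int(ram.group(2)),
                int(gpu.group(1)) if gpu else None,
            )
        )
    return samples


def peak_ram(samples: List[Sample]) -> Optional[Sample]:
    """Sample with the highest RAM use, or None for an empty log."""
    return max(samples, key=lambda s: s.ram_used_mb, default=None)


# Made-up lines in the assumed format, only to show the call.
demo = [
    "RAM 3021/6847MB (lfb 12x4MB) SWAP 0/3423MB (cached 0MB) GR3D_FREQ 13%",
    "RAM 6510/6847MB (lfb 1x4MB) SWAP 900/3423MB (cached 2MB) GR3D_FREQ 99%",
]
print(peak_ram(parse_tegrastats(demo)))
```

For the real log, replace `demo` with `open("log_stats.txt")`. If the peak sample is close to the total RAM just before the crash, that would support the shared-memory theory; if not, the cause is probably elsewhere.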

Hello, sorry for the late reply. We use Qt to develop the app, and the app plays video continuously.
The steps to reproduce are:

  1. First, open an app that plays video in full-screen mode.
  2. Use Alt+Tab to switch between the full-screen app and others.

After a few switches, the error may happen.
Also, I will try it on a devkit and log the tegrastats output.
Thanks for the suggestion.

Please clarify whether any kind of Qt application can reproduce this issue, or whether it only reproduces with the one you are using now.

Also, you mentioned high GPU load. I don’t think a single Qt application alone would lead to high GPU load.

You must share all the application info that would lead to this error.

Yes, I think just a Qt application with video playback in full-screen mode will cause the problem. Our other teams have faced the same issue too.

My application receives video from a camera device and detects objects with AI models. The more models we use, the higher the GPU usage.

I tried stopping all models and only playing the video. GPU usage in this situation is about 13%, with the GPU frequency set to 408MHz. When I switch focus between the app and a terminal, the usage may jump to 99% and make the XServer crash.

Could you share the application you are using to reproduce this issue?

This would create a very large log if using strace, but have you considered building a debug version of your application, and then running it in gdb? It is possible that this would lead straight to the issue.

If you are in a terminal in the GUI and you find your display context with “echo $DISPLAY”, then in either serial console or ssh or one of the local pure text consoles you could “export DISPLAY=:0” (or whatever the context is, export that value), and then start your app from that text terminal. The actual GUI would show the application despite the debug session being in a non-GUI terminal. Should you get it to fail within the GUI, you will probably be able to get a stack trace with the backtrace function in gdb (just type “bt”).
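
As a sketch of the same idea in script form, the helper below builds a non-interactive gdb command line plus an environment with DISPLAY set, so it can be launched from ssh or a serial console. The helper name is hypothetical, “./viewer_test_sample” and “:0” are placeholders for your own binary and display, and it assumes gdb is installed on the Jetson. The gdb options used (“-batch”, “-ex”, “--args”) are standard ones.

```python
import os
import shlex
from typing import Dict, List, Optional, Tuple


def gdb_invocation(
    binary: str,
    display: str = ":0",
    app_args: Optional[List[str]] = None,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build (argv, env) for running `binary` under gdb on a given X display."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display  # same effect as "export DISPLAY=:0" in the shell
    argv = [
        "gdb",
        "-batch",                      # exit after the commands below finish
        "-ex", "run",                  # start the program
        "-ex", "thread apply all bt",  # after it stops, print every thread's stack
        "--args", binary,
    ]
    argv.extend(app_args or [])
    return argv, env


argv, env = gdb_invocation("./viewer_test_sample", display=":0")
print("DISPLAY=" + env["DISPLAY"], " ".join(shlex.quote(a) for a in argv))
```

To actually start the session, pass the result to `subprocess.run(argv, env=env)` and redirect the output to a file if you want to keep the backtrace.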

strace and perhaps ltrace are useful too, but much more difficult to use with this since their output doesn’t relate directly to your code and has enormous volume. If you suspect that the system or a library is at fault, then that is when you might migrate to strace or ltrace (I suggest not doing so unless you are certain it is some outside failure).

Sorry, I can’t provide the app directly.
Maybe I can make a simple sample.

I tried using gdb to find the cause of the crash, but the log seems to trace back to X11 too.

The X11 connection broke: I/O error (code 1)
XIO: fatal IO error 0 (Success) on X server “:1”
after 3707 requests (3707 known processed) with 0 events remaining.
The X11 connection broke (error 1). Did the X11 server die?
[Thread 0xffffbc400900 (LWP 6267) exited]
QObject::~QObject: Timers cannot be stopped from another thread
[Thread 0xffffb8474900 (LWP 6291) exited]
[Thread 0xffffd0b4a900 (LWP 6238) exited]
[Thread 0xffffbd603900 (LWP 6241) exited]
--Type <RET> for more, q to quit, c to continue without paging--

Thread 14 “QSGRenderThread” received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb40eb900 (LWP 6298)]
0x0000ffffed61e448 in __run_exit_handlers (status=1, listp=0xffffed7566b8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
run_dtors=run_dtors@entry=true) at exit.c:77
77 exit.c: No such file or directory.
(gdb) bt
#0 0x0000ffffed61e448 in __run_exit_handlers
(status=1, listp=0xffffed7566b8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:77
#1 0x0000ffffed61e60c in __GI_exit (status=<optimized out>) at exit.c:139
#2 0x0000ffffe9c487e8 in _XDefaultIOError () at /lib/aarch64-linux-gnu/libX11.so.6
#3 0x0000ffffbc6cb5dc in ioErrorHandler(Display*) (dpy=0xaaaaac34c800)
at /media/nvidia/qt5/qtbase/src/plugins/platforms/xcb/qxcbconnection_basic.cpp:106
#4 0x0000ffffe9c48ab0 in _XIOError () at /lib/aarch64-linux-gnu/libX11.so.6
#5 0x0000ffffe9c46be8 in _XRead () at /lib/aarch64-linux-gnu/libX11.so.6
#6 0x0000ffffba305450 in () at /usr/lib/aarch64-linux-gnu/tegra/libGLX_nvidia.so.0
#7 0x0000ffffba29a034 in () at /usr/lib/aarch64-linux-gnu/tegra/libGLX_nvidia.so.0
#8 0x0000ffffba3037c0 in () at /usr/lib/aarch64-linux-gnu/tegra/libGLX_nvidia.so.0
#9 0x0000ffffba303b68 in () at /usr/lib/aarch64-linux-gnu/tegra/libGLX_nvidia.so.0
#10 0x0000ffffb95b6584 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.35.4.1
#11 0x0000ffffb95b8414 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.35.4.1
#12 0x0000ffffb93eebbc in () at /usr/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.35.4.1
#13 0x0000ffffb93eedb8 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.35.4.1
#14 0x0000ffffb95069f0 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.35.4.1
#15 0x0000ffffb95ec1e4 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.35.4.1
#16 0x0000ffffb92aed60 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.35.4.1
#17 0x0000fffff513eb90 in QOpenGLFunctions::glClear(unsigned int) (this=0xaaaaac9b8d48, mask=17664)
at …/…/include/QtGui/…/…/…/…/qt5/qtbase/src/gui/opengl/qopenglfunctions.h:628
#18 0x0000fffff4831efc in QSGBindable::clear(QFlags<QSGAbstractRenderer::ClearModeBit>) const (this=0xffffb40eadb8, mode=…)
at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/coreapi/qsgrenderer.cpp:78
#19 0x0000fffff4847b2c in QSGBatchRenderer::Renderer::renderBatches() (this=0xaaaaad6e0000)
at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/coreapi/qsgbatchrenderer.cpp:4042
#20 0x0000fffff484966c in QSGBatchRenderer::Renderer::render() (this=0xaaaaad6e0000)
at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/coreapi/qsgbatchrenderer.cpp:4363
#21 0x0000fffff4832614 in QSGRenderer::renderScene(QSGBindable const&) (this=0xaaaaad6e0000, bindable=warning: RTTI symbol not found for class ‘QSGRenderer::renderScene(unsigned int)::B’
…)
at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/coreapi/qsgrenderer.cpp:264
--Type <RET> for more, q to quit, c to continue without paging--
#22 0x0000fffff48323f0 in QSGRenderer::renderScene(unsigned int) (this=0xaaaaad6e0000, fboId=0)
at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/coreapi/qsgrenderer.cpp:212
#23 0x0000fffff48d3030 in QSGDefaultRenderContext::renderNextFrame(QSGRenderer*, unsigned int) (this=
0xaaaaacb869c0, renderer=0xaaaaad6e0000, fboId=0) at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/qsgdefaultrendercontext.cpp:228
#24 0x0000fffff496bae4 in QQuickWindowPrivate::renderSceneGraph(QSize const&, QSize const&) (this=0xaaaaac504c00, size=…, surfaceSize=…)
at /media/nvidia/qt5/qtdeclarative/src/quick/items/qquickwindow.cpp:617
#25 0x0000fffff48ea31c in QSGRenderThread::syncAndRender(QImage*) (this=0xaaaaabb498c0, grabImage=0x0)
at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/qsgthreadedrenderloop.cpp:837
#26 0x0000fffff48eb40c in QSGRenderThread::run() (this=0xaaaaabb498c0)
at /media/nvidia/qt5/qtdeclarative/src/quick/scenegraph/qsgthreadedrenderloop.cpp:1043
#27 0x0000ffffedb85a04 in QThreadPrivate::start(void*) (arg=0xaaaaabb498c0) at /media/nvidia/qt5/qtbase/src/corelib/thread/qthread_unix.cpp:329
#28 0x0000ffffeda16624 in start_thread (arg=0xffffedb85834 <QThreadPrivate::start(void*)>) at pthread_create.c:477
#29 0x0000ffffed6b949c in thread_start () at …/sysdeps/unix/sysv/linux/aarch64/clone.S:78

Yes, please share a simple sample that can reproduce this issue on an Orin devkit.

This is not in order, and may not be useful. Just some random comments as I look at the debug content…

  • SIGSEGV implies a memory error. Something in QSGRenderThread failed.
  • This backtrace is from a program without debug symbols. You could get a lot better information if compiled for debug.
  • ltrace might be interesting since the program seems to be talking to “/usr/lib/aarch64-linux-gnu/tegra/libGLX_nvidia.so.0” and “/lib/aarch64-linux-gnu/libX11.so.6”. Debug symbols though would be far more helpful with less effort.
  • One of the “middle” parts of the stack frame mentions “/media/nvidia/qt5/qtbase/src/plugins/platforms/xcb/qxcbconnection_basic.cpp:106”. Perhaps you could add debug prints around this function or call or within this function. Debug symbols would perhaps allow you to set a breakpoint and step into it.
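
To see at a glance which libraries or source files the frames fall in, the pasted backtrace can be summarized with a short script. This is only a sketch: it assumes the text layout shown above, where each frame starts with “#N” and the location follows “at” (sometimes on the next line), and the function names are hypothetical.

```python
import os
import re
from collections import Counter
from typing import Iterable, List

FRAME_START = re.compile(r"^#\d+\s")
LOCATION = re.compile(r"(?:^|\s)(?:at|from) (\S+)")


def join_frames(lines: Iterable[str]) -> List[str]:
    """Merge continuation lines so that each list entry is one whole frame."""
    frames: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or "continue without paging" in line:
            continue  # skip blanks and gdb pager prompts
        if FRAME_START.match(line):
            frames.append(line)
        elif frames:
            frames[-1] += " " + line
    return frames


def frame_object(frame: str) -> str:
    """Base name of the library or source file a frame points at."""
    matches = LOCATION.findall(frame)
    if not matches:
        return "<unknown>"
    path = re.sub(r":\d+$", "", matches[-1])  # drop a trailing ":line"
    return os.path.basename(path)


def frames_per_object(lines: Iterable[str]) -> Counter:
    """Count frames per library or source file."""
    return Counter(frame_object(f) for f in join_frames(lines))


# A few frames copied from the backtrace in this thread.
bt = [
    "#5 0x0000ffffe9c46be8 in _XRead () at /lib/aarch64-linux-gnu/libX11.so.6",
    "#6 0x0000ffffba305450 in () at /usr/lib/aarch64-linux-gnu/tegra/libGLX_nvidia.so.0",
    "#27 0x0000ffffedb85a04 in QThreadPrivate::start(void*) (arg=0xaaaaabb498c0)",
    "at /media/nvidia/qt5/qtbase/src/corelib/thread/qthread_unix.cpp:329",
]
for name, count in frames_per_object(bt).most_common():
    print(count, name)
```

Frames that resolve only to a “.so” name are the ones without symbols, which is where debug builds or ltrace would add the most.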

Any “gutted” version of this program which includes the QSGRenderThread might reproduce the issue. Maybe.

Hello, I made a simple sample that just selects an mp4 file and plays it. I tested it on the devkit, and the issue still happens. Just playing the video and using Alt+Tab to switch focus makes the X driver crash after a few times.

I put the sample on Google Cloud; the link is as follows:

Please test it on your side, thank you.

If the app shows “Could not find the Qt platform plugin "xcb" …”,
please use “LD_LIBRARY_PATH=lib ./viewer_test_sample” to load the libs from the lib folder.

Hello, can the issue be reproduced?

Sorry, I was out, so I have not checked this yet.

Just to clarify: is this issue specific to Qt? If I try something else based on the X server, will it not hit this crash?

I guess it would happen too. Maybe you can try something other than Qt, just set to full-screen mode. It looks like switching between full-screen mode and non-full-screen mode causes the crash.

We will start with another kind of application first. Even your “simple Qt sample” is a 665MB tarball.
I don’t think it is ideal to use this to debug.

Some more clarification.

Do any of the basic samples from the Qt website reproduce this issue?

For example,

We are not actively working on Qt. So we prefer a simple method/sample to reproduce this issue.

I use Qt5 for development.
The sample code is simple, but I included the Qt dependency libs with it, which is why it looks so big.

The sample code should be less than a hundred lines.