Hi,
I have encountered a problem on the NVIDIA Jetson AGX Orin Developer Kit with L4T 35.3.1 and 35.4.1: the display driver crashes with the following message when I resize, minimize, or maximize the window of an application built with Qt 5.14.2 and Qt Quick.
The issue occurs on Orin platforms (AGX Orin and Orin NX) with L4T 35.3.1 and 35.4.1, whereas I could not reproduce it on Xavier platforms (AGX Xavier and Xavier NX) with the same L4T versions.
Since the crash message contains nvgpu logs, I should mention that the application uses QOpenGLFunctions, the Qt Quick libraries, and QtGraphicalEffects. Awaiting your support. Thanks in advance.
Hi @DaneLLL
I have attached the installer .run file for a sample application below. Before running the installer, please install the dependencies with the following command: “sudo apt install libqt5widgets5 libqt5xml5 libqt5qml5”.
The above application was built using Qt 5.14.2 and QML.
To reproduce the display driver crash, resize the application window rapidly, or repeatedly maximize and restore the window. Awaiting your support.
Hi @DaneLLL
As mentioned in my first post, I have tried the same with the latest L4T 35.4.1 (included in JetPack 5.1.2), and the results are the same. Awaiting your support.
Hi,
Since we cannot reproduce the issue on the developer kit, additional software packages may be triggering it. Could you do a clean re-flash and try our steps?
Hi @DaneLLL,
I re-flashed my NVIDIA Jetson AGX Orin Developer Kit with L4T 35.4.1 (JetPack 5.1.2) and tried to reproduce the issue, and I was clearly able to reproduce it again. I also noticed that the sampleApp installer I shared previously did not work because of some missing libraries, so I am not sure how you were able to run the application and conclude that the issue does not reproduce. Since you could not reproduce it, here are more specifics. When the application built with Qt and Qt Quick is resized, the display driver crashes and nvgpu logs appear in dmesg. One of the following three outcomes was observed:
The nvgpu crash messages appeared in the dmesg log and the display driver recovered (only sometimes).
The nvgpu crash messages appeared in the dmesg log and the display driver did not recover; only restarting the board helped. (This happens most of the time.)
The nvgpu crash message appeared in the dmesg log and the board restarted automatically (I don't know why).
I have attached a working installer .run file for the same sample application I shared previously. Before running the installer, please install the dependencies with the following command: “sudo apt install libqt5widgets5 libqt5xml5 libqt5qml5”.
Extract the zip and run the installer to install the sample application. Then go to the installation directory and start the application with ./App. To reproduce the display driver crash, resize the application window rapidly, or repeatedly minimize and restore it.
I have attached the screen recording and crash log for your reference. Please take a look.
We would like to test your app in other environments (different branches or CUDA versions).
However, the sampleApp.run installer links against OpenCV 4.2.0, which is too old to use.
How can we change the linked version?
Alternatively, could you attach a sample linked against OpenCV 4.8?
Hi,
The SampleApp.run that I shared in my last post does not have any OpenCV dependency, and I can reproduce the nvgpu crash with it. Could you please download the SampleApp.zip from my last post and test with that? For a better understanding, I also recommend looking at the dmesg log and screen recording I shared along with SampleApp.zip. Thanks. Awaiting your support.
We did see the same nvgpu error on JetPack 5, and our internal team is checking it.
When testing on JetPack 6, which contains a newer CUDA and GPU driver, the app fails to start because the OpenCV 4.2 library is missing, even though OpenCV 4.8.0 built with -D BUILD_opencv_world=ON is installed.
That’s why we thought the installer might depend on a specific OpenCV version.
$ ./App
./App: error while loading shared libraries: libopencv_world.so.4.2: cannot open shared object file: No such file or directory
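Before running a prebuilt binary on a new JetPack image, a quick way to see which shared libraries it cannot resolve is to look for "not found" entries in ldd output, which is how the missing libopencv_world.so.4.2 shows up. The Python sketch below assumes ldd is on the PATH (as on standard L4T images); the helper names missing_libraries and check_binary are hypothetical and not part of any NVIDIA or Qt tool.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reports as unresolved."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "\tlibfoo.so.1 => not found".
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary and return its unresolved shared libraries."""
    # check=False: ldd exits non-zero for non-ELF or missing files.
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

For example, check_binary("./App") on the JetPack 6 setup described above would be expected to list libopencv_world.so.4.2. Note that ldd may execute parts of the binary's loader path, so it should only be run on binaries you trust.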
But as you mentioned, it looks like we did not use the installer from your last post (the checksum is different).
We will give it a quick try and update here.
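Because the two sides ended up testing different files, comparing a SHA-256 digest of the downloaded installer against one posted by the reporter is a simple way to confirm that both are running the same build. The minimal Python sketch below uses only hashlib; the function names are illustrative. Running sha256sum SampleApp.run on the Jetson prints the same hex digest.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large installers do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path: str, expected_hex: str) -> bool:
    """Compare a local file against a digest reported by the other side."""
    return sha256_of(path) == expected_hex.strip().lower()
```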
Our internal team has some feedback about this issue.
The error is triggered from user space, so please double-check the implementation of the “App” binary.
[ 788.176304] nvgpu: 17000000.gpu nvgpu_cic_mon_report_err_safety_services:97 [ERR] Error reporting is not supported in this platform
[ 788.176316] nvgpu: 17000000.gpu nvgpu_gr_intr_handle_sm_exception:365 [ERR] sm machine check err. gpc_id(0), tpc_id(0), offset(0)
[ 788.176327] nvgpu: 17000000.gpu gv11b_gr_intr_record_sm_error_state:1949 [ERR] sm err state gpc_id(0), tpc_id(0), offset(128), sm_id(15), hww_global_esr 4,hww_warp_esr 184549408, hww_warp_esr_pc 0x0
[ 788.176534] nvgpu: 17000000.gpu gr_intr_handle_exception_interrupts:759 [ERR] set gr exception notifier
[ 788.176539] nvgpu: 17000000.gpu nvgpu_set_err_notifier_locked:143 [ERR] error notifier set to 13 for ch 510 owned by App
[ 788.176626] __ga10b__ 510-ga10b, TSG: 2, pid 4448, thread name App, refs: 5, deterministic: no, domain name: (no domain)
[ 788.176627] __ga10b__ channel status: in use on_pbdma, on_eng, pbdma_busy, eng_busy busy
[ 788.176628] __ga10b__ RAMFC: TOP: 8000001ffec6a890 PUT: 001ffec6a8a4 GET: 001ffec6a890 FETCH: 000000000000 HEADER: 2140006c COUNT: 11110000 SEMAPHORE: addr 002000170000 payload 0000000000004b0f execute 00081003
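When collecting crash logs across several runs or JetPack versions, it can help to pull the same few fields out of each dmesg capture so they can be compared side by side. The Python sketch below is hypothetical tooling, not an NVIDIA utility: it only matches the nvgpu line formats quoted above (timestamp, error notifier, channel, owning process, and the hww_global_esr / hww_warp_esr values, converted to hex), assumes the default dmesg timestamp format, and does not try to decode what notifier 13 or the ESR bits mean.

```python
import re
from typing import Dict

# Patterns for the nvgpu line formats quoted in this thread (assumed stable).
# TIMESTAMP_RE expects the default "[ seconds.micros]" prefix, not dmesg -T.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
NOTIFIER_RE = re.compile(r"error notifier set to (\d+) for ch (\d+) owned by (\S+)")
ESR_RE = re.compile(r"hww_global_esr (\d+),\s*hww_warp_esr (\d+)")


def summarize_nvgpu(dmesg_text: str) -> Dict[str, object]:
    """Extract comparable fields from nvgpu [ERR] lines in a dmesg capture."""
    summary: Dict[str, object] = {}
    err_lines = 0
    for line in dmesg_text.splitlines():
        # Only count kernel lines emitted by nvgpu at error level.
        if "nvgpu:" not in line or "[ERR]" not in line:
            continue
        err_lines += 1
        ts = TIMESTAMP_RE.match(line)
        if ts and "first_err_time" not in summary:
            summary["first_err_time"] = float(ts.group(1))
        notifier = NOTIFIER_RE.search(line)
        if notifier:
            summary["notifier"] = int(notifier.group(1))
            summary["channel"] = int(notifier.group(2))
            summary["owner"] = notifier.group(3)
        esr = ESR_RE.search(line)
        if esr:
            # Hex makes the raw register values easier to compare across runs.
            summary["global_esr"] = hex(int(esr.group(1)))
            summary["warp_esr"] = hex(int(esr.group(2)))
    summary["err_lines"] = err_lines
    return summary
```

For the log above, this would report notifier 13 on channel 510 owned by App, with hww_warp_esr 184549408 shown as 0xb000020. Reading dmesg may require root, so capturing it with sudo dmesg right after the crash and saving it to a file before parsing is one option.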
Hi @AastaLLL,
Thanks for the update. I understand from your post above that the error is triggered from user space, but I cannot tell what triggers it. SampleApp uses the same code base on all platforms, including the NVIDIA Jetson AGX Xavier, Xavier NX, and even x64 platforms, and I have not seen this kind of error on any of them. The only difference I can see is the platform itself, which is why I wanted to gather as much information as possible from the platform side.