Running the oceanFFT CUDA sample to test the GPU on AGX Orin causes a kernel crash and reboot

[com COM18] (2024-08-26_101738) COM18 (USB-SERIAL CH340 (COM18)).log (230.1 KB)
kern.log (68.2 KB)
nvmap_trace.zip (2.1 MB)
test_shell.txt (243 Bytes)

Hello, we built the /usr/local/cuda-11.4/samples/5_Simulations/oceanFFT/oceanFFT.cpp sample to get the oceanFFT binary, then used the test_shell.txt shell script to test the CUDA GPU. The Orin console (see the CH340 COM18 log) shows a crash very quickly, and the Orin reboots automatically. nvmap_trace.zip contains the nvmap trace event log we captured, and kern.log is /var/log/kern.log. We use JetPack 5.1.1. We want to know: what causes this kernel crash (and automatic reboot), and how can we fix it?


Hi,

Do you run multiple oceanFFT instances at the same time?
If yes, do you see the same issue when running a single oceanFFT app?

Thanks.

Thanks. Yes, we run multiple oceanFFT instances at the same time. With a single oceanFFT app, we do not hit this issue as quickly.
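
For anyone trying to reproduce this, here is a minimal sketch of launching several copies of one program at the same time. It is not the content of test_shell.txt; the function name `run_concurrently` and the instance count are assumptions made up for illustration.

```python
import subprocess
import sys

def run_concurrently(cmd, count):
    """Start `count` copies of `cmd` at once and return their exit codes.

    `cmd` is an argument list, e.g. ["./oceanFFT"] (path is an assumption).
    """
    # Launch every process first so that they all run simultaneously.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(count)
    ]
    # Then wait for each one and collect its exit status.
    return [p.wait() for p in procs]

# Hypothetical usage on the device (instance count chosen arbitrarily):
#   codes = run_concurrently(["./oceanFFT"], 8)
#   print(codes)
```

A non-zero entry in the returned list shows which instance exited abnormally, which may help line up a failure with the timestamps in kern.log.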

Hi,

Do you mean the crash still occurs with a single oceanFFT but just takes more time?

Thanks.

Hi,
We have not tested with a single oceanFFT. If you want, we can try it.

Hi,

We test with multiple oceanFFT instances at the same time because we have a ROS application that uses the CUDA API call “gpuConvertUYVYtoBGR((uint8_t *)p, _pCudaOutBuffer, _width, _height)”, and we run multiple instances of that ROS application at the same time. Under this condition (with our ROS application), we get a similar crash and the Orin reboots automatically, but it does not happen as quickly (it may take a day or more). Today we found that running multiple oceanFFT instances at the same time triggers the crash very quickly, so we want to know: what causes this kernel crash (and automatic reboot), and how can we fix it? I think the same fix may also apply to our ROS application.

Thanks.

Hi,

We are trying to reproduce this issue in our environment.
Will provide more info to you later.

Thanks.

Hi,
OK.

Thanks.

Hi,

How long does it take to reproduce this issue?
We ran the oceanFFT sample for around an hour and it worked correctly (num up to 346).

Thanks.

Hi,
We tested with a 4K display (HDMI display mode). With num in the range of 100 to 200, the issue reproduces.

Thanks

Hi,

Is upgrading to JetPack 6 an option for you?

We tested it on Orin with JetPack 6 and were not able to reproduce this issue.
So it’s recommended to give it a try.

Thanks.

Hi, AastaLLL

Our project is based on JetPack 5.1.1. To upgrade to JetPack 6, we would need to change our own drivers, rootfs, and application source code to adapt to the new JetPack 6 kernel, rootfs, and APIs, which takes a lot of time.

So can you restore the firmware to JetPack 5.1.1 and check this issue?

Help us…

Thanks.

Hi,

We can check this issue on JetPack 5.1.1 again.
Will provide more info to you later.

Thanks.

Hi, AastaLLL

Thanks very much!

Hi,

We tested this issue with JetPack 5.1.1 for 3 hours (num=500).
But the apps ran well without issue.

Thanks.

Hi, AastaLLL

Did you test with a 4K display (HDMI display mode)? Can you also tell us your test conditions, so that we can test again under the same conditions?

Thanks.