Nvgpu driver reports error

Hi all,

I have been experiencing a recurring issue with my Jetson AGX Orin recently, but it's not clear to me what is causing it. We have about 8-10 units running essentially the same software and configuration, and so far we have not seen this issue on any of the others.

From a little bit of testing, it looks like the problem is related to the startup of one of our inference systems (models being loaded and started by the Stereolabs ZED SDK) and/or the pair of Stereolabs ZED2 cameras. The cameras frequently fail to register (they register initially but then disconnect), and when they do register, we get the GPU error report. I have attached a copy of one of the recent dmesg logs.

Does anyone have any insight into this issue? Does anything in the logs stand out as a possible cause? Any advice or assistance is greatly appreciated!

weird_gpu_failure.log (93.5 KB)
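
In case it helps anyone triaging a similar log, here is a minimal sketch of how the dmesg output could be filtered down to the lines that matter for this symptom. The two keyword patterns are assumptions for illustration (driver messages tagged `nvgpu` and kernel USB disconnect messages); they are not taken from the attached log and may need adjusting.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; adjust them to what actually appears in your log.
PATTERNS = {
    "nvgpu": re.compile(r"nvgpu", re.IGNORECASE),
    "usb_disconnect": re.compile(r"usb .*disconnect", re.IGNORECASE),
}

def scan_dmesg(lines):
    """Return (counts, hits): a count per category and the matching lines in log order."""
    counts = Counter()
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append((name, line.rstrip("\n")))
    return counts, hits
```

Calling `scan_dmesg(open("weird_gpu_failure.log", errors="replace"))` and printing `hits` in order should show whether the camera disconnects come before or after the GPU messages.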

Cheers,
Stephen Abraham

Hi,

Which JetPack version do you use?
We saw a similar issue last year, and it should already be fixed in our current release.

Thanks

We are running JetPack 5.0.2 on this Orin (all 8-9 of the AGX Orin systems I mentioned above are running JetPack 5.0.2).
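
To confirm that every unit really is on the same release, something like the sketch below could be run on each device. It assumes the L4T release string lives in `/etc/nv_tegra_release` and has the usual `# R<major> (release), REVISION: <minor>, ...` layout; the helper name `parse_l4t_release` is hypothetical.

```python
import re

def parse_l4t_release(text):
    """Extract an L4T version such as '35.1.0' from release-file text, or None if absent.

    Assumes the first line looks like:
    '# R35 (release), REVISION: 1.0, GCID: ..., BOARD: ..., ...'
    """
    match = re.search(r"R(\d+) \(release\), REVISION: (\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"
```

On a device this would be used as `parse_l4t_release(open("/etc/nv_tegra_release").read())`; the result is the L4T version, which then has to be mapped to a JetPack version using NVIDIA's release notes.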

What was the issue you faced? Can it happen like this, where only a few devices experience it?

Hi,

In general, this kind of error is hard to reproduce,
so it may well show up on only one of your devices.

Thanks.