We have our custom BSP (based on L4T32.4.3) for AGX Xavier and custom carrier board.
One of our customer reported below issue. can you please suggest on this?
I have been troubleshooting a problem on the custom carrier board with AGX xavier that causes tensorflow not to function. The problem appears upon reboot after the installation of tensorflow’s dependencies through the metapackage nvidia-jetpack.
Our application is functional before the reboot and is able to utilize the new packages. After reboot the system freezes when our application accesses cuda through tensorflow (see attached dmesg output).
It seems that the gpu driver crashes. I have included the following: apt history log, outputs of the cuda utilities deviceQuery and bandwidthTest before and after restarting, dmesg output after the error.
I have referred to the following nvidia forum link describing a similar problem: Cuda hangs after installation of jetpack and reboot - #4 by AastaLLL
apt_history.txt (8.7 KB)
bandwidthTest_after_reboot.txt (114 Bytes)
apt_history.txt (8.7 KB)
bandwidthTest_after_reboot.txt (114 Bytes)
bandwidthTest_before_reboot.txt (585 Bytes)
deviceQuery_after_reboot.txt (2.3 KB)
deviceQuery_after_reboot.txt (2.3 KB)
dmesg_bandwidthTest_after_reboot.txt (16.1 KB)