Hi @WayneWWW, here is some more detail.
Our first encounter with this issue was when nvidia-container-cli started giving the following error:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected: unknown.
ERRO error waiting for container: context canceled
We have seen this issue intermittently since making the following changes. By intermittent, I mean roughly once every 4-5 boots. A bad shutdown also seems to trigger it, and a subsequent restart then resolves it.
- L4T update to 32.7.2 and L4T_DOCKER to
- Enabling the PLL (pllaon) as the CAN clock source so the bus operates at exactly 250000 bps:
+ pll_source = "pllaon";
+ clocks = <&bpmp_clks TEGRA194_CLK_CAN1_CORE>,
+ <&bpmp_clks TEGRA194_CLK_CAN1_HOST>,
+ <&bpmp_clks TEGRA194_CLK_CAN1>,
+ <&bpmp_clks TEGRA194_CLK_PLLAON>;
+ clock-names = "can_core", "can_host", "can", "pllaon";
compatible = "nvidia,cboot-options-v1";
boot-order = "sd", "usb", "emmc", "nvme", "net";
tftp-server-ip = /bits/ 8 <192 168 0 1>;
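The PLL change in the list above is about hitting 250000 bps exactly. As a generic illustration (not specific to the Tegra/MTTCAN driver), a CAN controller derives its bit rate as f_clk / (prescaler * time quanta per bit), where one bit is 8 to 25 time quanta (sync segment + tseg1 + tseg2). The sketch below uses a hypothetical helper `find_exact_timing` and an assumed prescaler limit to search for an integer combination; it only succeeds when the clock is an exact multiple of the bit rate. The clock rates in the usage are hypothetical values chosen for the arithmetic, not measured pllaon or oscillator rates.

```python
from typing import Optional, Tuple

# Assumption: generic CAN bit timing, bitrate = f_clk / (brp * n_tq),
# where n_tq = 1 (sync segment) + tseg1 + tseg2 and must lie in 8..25.
MIN_TQ, MAX_TQ = 8, 25


def find_exact_timing(f_clk_hz: int, bitrate: int,
                      max_brp: int = 64) -> Optional[Tuple[int, int]]:
    """Return (prescaler, time quanta per bit) that yields `bitrate` exactly.

    Returns None when no integer combination exists, i.e. the clock is not
    an exact multiple of the bit rate within the assumed limits.
    max_brp is a hypothetical controller limit, not a Tegra value.
    """
    for brp in range(1, max_brp + 1):
        for n_tq in range(MIN_TQ, MAX_TQ + 1):
            if f_clk_hz == bitrate * brp * n_tq:
                return brp, n_tq
    return None


if __name__ == "__main__":
    # Hypothetical clock rates, chosen only to illustrate the arithmetic.
    print(find_exact_timing(40_000_000, 250_000))  # (8, 20)
    print(find_exact_timing(38_400_000, 250_000))  # None
```

With a 40 MHz clock the ratio 40000000 / 250000 = 160 factors as 8 * 20, so an exact setting exists; with 38.4 MHz the ratio is 153.6, so any setting leaves a bit-rate error, which is the kind of mismatch a dedicated PLL source avoids.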
Waiting in the debug (micro-USB) terminal before typing boot+Enter does not seem to have an effect, though this might just have been coincidental. The only concrete information I have at this point is that nvidia-docker containers do not start whenever "pva0: failed to get free Queue" appears in the micro-USB debug logs:
nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected
Here is a dmesg dump, as requested. A monitor is connected, and the display works fine and shows our configured splash screen.
dmesg_dump.txt (77.7 KB)
On a good boot we get:
[ 0.853567] iommu: Adding device 16000000.pva0 to group 48
[ 0.854228] iommu: Adding device 16800000.pva1 to group 49
[ 1.636979] pva 16000000.pva0: initialized
[ 1.668024] pva 16800000.pva1: initialized
On a bad boot we get:
[ 0.856987] iommu: Adding device 16000000.pva0 to group 48
[ 0.857619] iommu: Adding device 16800000.pva1 to group 49
[ 1.508835] pva 16000000.pva0: initialized
[ 1.540228] pva 16800000.pva1: initialized
[ 13.835500] pva 16000000.pva0: failed to get free Queue
[ 13.835657] pva 16000000.pva0: failed to get free Queue
[ 13.837465] pva 16000000.pva0: failed to get free Queue
[ 13.846573] pva 16000000.pva0: failed to get free Queue
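Since the only reliable signal so far is the "failed to get free Queue" message, a small check could flag a bad boot before any containers are started. This is a minimal sketch, assuming the log text is available as a string (from dmesg or a saved serial log); `classify_boot` and `BootReport` are hypothetical names, and the heuristic encodes nothing more than the correlation observed in this thread.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Pattern for the dmesg form seen above: "pva 16000000.pva0: initialized".
PVA_INIT = re.compile(r"pva ([^\s:]+): initialized")
# Matches both the dmesg form ("pva 16000000.pva0: failed ...")
# and the serial-console form ("pva0: failed ...").
PVA_QUEUE_ERR = re.compile(r"pva\S*: failed to get free Queue")


@dataclass
class BootReport:
    initialized: List[str] = field(default_factory=list)  # PVA devices that reported "initialized"
    queue_errors: int = 0  # number of "failed to get free Queue" lines

    @property
    def looks_bad(self) -> bool:
        # Heuristic from this thread: every queue error so far has coincided
        # with nvidia-docker containers failing to start.
        return self.queue_errors > 0


def classify_boot(log_text: str) -> BootReport:
    """Scan kernel/serial log text and count PVA init lines and queue errors."""
    report = BootReport()
    for line in log_text.splitlines():
        init = PVA_INIT.search(line)
        if init:
            report.initialized.append(init.group(1))
        elif PVA_QUEUE_ERR.search(line):
            report.queue_errors += 1
    return report
```

On the device, one way to feed it is `classify_boot(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)`; reading dmesg may require root if `kernel.dmesg_restrict` is set. Note that both good and bad boots report both PVAs as initialized, so only the error count distinguishes them.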