With two Intel RealSense cameras and a single USB 2D camera on the USB bus, I can easily reproduce the following crash while starting our system:
[ 145.335287] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 145.346662] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[ 145.353938] mc-err: status = 0x6000004a; addr = 0x00000000
[ 145.359668] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 145.366483] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 145.377880] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[ 145.385156] mc-err: status = 0x6000004a; addr = 0x00000000
[ 145.390865] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 145.397651] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 145.409028] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[ 145.416295] mc-err: status = 0x6000004a; addr = 0x00000000
[ 145.422018] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 145.428839] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 145.440199] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[ 145.447484] mc-err: status = 0x6000004a; addr = 0x00000000
[ 145.453210] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 145.459989] mc-err: Too many MC errors; throttling prints
[ 150.237341] tegra-xusb-mbox 70098000.mailbox: Controller firmware hang
[ 150.244101] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_OWNER 0x0
[ 150.251061] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_CMD 0x80000000
[ 150.258478] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_DATA_IN 0x0
[ 150.265585] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_DATA_OUT 0x0
[ 155.381216] xhci-tegra 70090000.xusb: HC died; cleaning up
[ 155.500053] uvcvideo: Failed to query (131) UVC probe control : -110 (exp. 34).
[ 173.833378] xhci-tegra 70090000.xusb: Stopped the command ring failed, maybe the host is dead
[ 173.878749] xhci-tegra 70090000.xusb: Abort command ring failed
[ 173.884678] xhci-tegra 70090000.xusb: HC died; cleaning up
The system must be rebooted to restore USB connectivity. The crash happens with both the new L4T 28.1 release and the previous L4T 24.2.1 release. In the best case this is a firmware bug in the XUSB controller; in the worst case it is a hardware fault in the controller.
Is this a known error and is someone working on it?
Are there any known workarounds?
Hi brmrbt,
To clarify: you have two Intel RealSense cameras and one USB 2D camera connected to the default carrier board, three USB cameras in total? What brand is the USB 2D camera?
The issue also happens with only the two RealSense R200 cameras connected, perhaps slightly less often than when the 2D camera is also connected.
The crash is very rare when only the depth stream is enabled on the two R200s, but if I enable the two infrared streams and the 2D stream on both cameras in addition to the depth stream, the error appears quickly. It occurs when our application starts. This crash happened after 6 starts:
[619544.138239] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=fffffffff2
[619544.150155] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[619544.157738] mc-err: status = 0x6000004a; addr = 0x00000000
[619544.163689] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[619544.170784] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=fffffffff2
[619544.182350] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[619544.189681] mc-err: status = 0x6000004a; addr = 0x00000000
[619544.195439] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[619544.202250] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=fffffffff2
[619544.213666] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[619544.220989] mc-err: status = 0x6000004a; addr = 0x00000000
[619544.226746] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[619544.233552] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=fffffffff2
[619544.244945] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[619544.252260] mc-err: status = 0x6000004a; addr = 0x00000000
[619544.258002] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[619544.264791] mc-err: Too many MC errors; throttling prints
[619549.039820] tegra-xusb-mbox 70098000.mailbox: Controller firmware hang
[619549.046674] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_OWNER 0x0
[619549.053630] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_CMD 0x80000000
[619549.061005] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_DATA_IN 0x0
[619549.068117] tegra-xusb-mbox 70098000.mailbox: XUSB_CFG_ARU_MBOX_DATA_OUT 0x0
[619550.866909] xhci-tegra 70090000.xusb: xHCI host not responding to stop endpoint command.
[619550.875279] xhci-tegra 70090000.xusb: Assuming host is dying, halting host.
[619550.947317] xhci-tegra 70090000.xusb: Host not halted after 16000 microseconds.
[619550.954954] xhci-tegra 70090000.xusb: Non-responsive xHCI host is not halting.
[619550.962341] xhci-tegra 70090000.xusb: Completing active URBs anyway.
The application runs on the latest L4T 28.1 release with librealsense and ROS Kinetic, on a Jetson evaluation board with an external USB 3.0 hub from TP-Link.
Hi brmbrt,
This is not a kernel crash; the application is trying to access an illegal memory address, which leads to the SMMU errors. Please check the RealSense software stack.
After this crash all USB devices are dead, including the Ethernet controller.
I'm fairly sure an application should not be able to cause that with a null pointer exception.
My explanation for the SMMU error is that the XUSB controller uses address 0 as a DMA target, which makes the XUSB hang on the request; the controller then crashes and brings down the kernel xHCI driver.
This line gives the source of the fault:
mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
csr_xusb_hostr is the source of the fault, and as far as I can see that is the XUSB host controller, not an application running on the CPU.
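To make that attribution explicit across a whole log, here is a minimal Python sketch that pairs each `mc-err: (N) <client>: <reason>` line with the `status`/`addr` line that follows it. The line formats are copied from the logs in this thread; `McFault` and `parse_mc_faults` are hypothetical helpers, and treating the number in parentheses as a memory-client index is an assumption.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Matches e.g. "mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry"
_CLIENT_RE = re.compile(r"mc-err: \((\d+)\) (\w+): (.+)")
# Matches e.g. "mc-err: status = 0x6000004a; addr = 0x00000000"
_STATUS_RE = re.compile(r"mc-err: status = (0x[0-9a-fA-F]+); addr = (0x[0-9a-fA-F]+)")


class McFault(NamedTuple):
    mc_id: int    # number printed in parentheses (assumed memory-client index)
    client: str   # memory client name, e.g. "csr_xusb_hostr"
    reason: str   # error text printed after the client name
    status: int   # raw status word, not decoded here
    addr: int     # faulting address


def parse_mc_faults(lines: Iterable[str]) -> List[McFault]:
    """Pair each client/reason line with the status/addr line that follows it."""
    faults: List[McFault] = []
    pending: Optional[Tuple[int, str, str]] = None
    for line in lines:
        m = _CLIENT_RE.search(line)
        if m:
            # Remember the client line until its status/addr line arrives.
            pending = (int(m.group(1)), m.group(2), m.group(3).strip())
            continue
        m = _STATUS_RE.search(line)
        if m and pending is not None:
            faults.append(McFault(*pending, int(m.group(1), 16), int(m.group(2), 16)))
            pending = None
    return faults
```

Run over the excerpts posted in this thread, every fault it finds reports client `csr_xusb_hostr` with `addr == 0`, i.e. a read issued by the XUSB host side rather than by a CPU process.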
The problem happens when we start streaming from two RealSense cameras at the same time.
I ran the same test with two USB 2D cameras and it did not fail (I could not start three USB 2.0 cameras because of bandwidth limitations).
The crash is very rare if we start only one stream from each RealSense camera (one in 5000 starts) but becomes very frequent when several streams are started on each camera. The RealSense cameras are unusual in two ways: they have multiple streams (left and right infrared streams, a depth stream, and a separate 2D camera stream), and they use a USB 3.0 interface instead of the USB 2.0 interface typical of standard USB cameras.
I think reproducing this issue would require at least three USB 3.0 2D cameras. Is it possible to get debug information out of the XUSB controller? The failure is very easy for us to reproduce. Alternatively, we could send you a pair of R200s along with a test program I write that triggers the failure for you to run.
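An automated reproduction such as the test program offered above needs to judge each run as pass or fail from the kernel log. A minimal Python sketch, assuming the log is available as a list of lines (for example captured from `dmesg`): it records the first timestamp of each stage of the failure seen in this thread, from the XUSB SMMU fault through the firmware hang to the host controller dying. The stage names and the `first_seen`/`xusb_failed` helpers are hypothetical; the matched substrings are copied from the logs posted here and may be worded differently on other kernel versions.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Failure stages in the order they appear in the logs in this thread.
# Substrings copied from those logs; other kernels may word them differently.
STAGES: List[Tuple[str, str]] = [
    ("smmu_fault", "csr_xusb_hostr: EMEM decode error"),
    ("firmware_hang", "Controller firmware hang"),
    ("host_not_responding", "xHCI host not responding"),
    ("hc_died", "HC died; cleaning up"),
]

# Kernel timestamp prefix such as "[ 145.346662]" or "[619544.138239]".
_TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def first_seen(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Map each stage to the timestamp (seconds) of its first occurrence."""
    seen: Dict[str, Optional[float]] = {}
    for line in lines:
        for name, needle in STAGES:
            if name not in seen and needle in line:
                m = _TS_RE.match(line)
                seen[name] = float(m.group(1)) if m else None
    return seen


def xusb_failed(lines: Iterable[str]) -> bool:
    """Treat a run as failed if the XUSB SMMU fault or a dead host controller shows up."""
    seen = first_seen(lines)
    return "smmu_fault" in seen or "hc_died" in seen
```

For example, a harness could read `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()` after each application start and stop at the first run where `xusb_failed` returns True; clearing or offsetting the log between runs is left to the harness.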
[ 3069.032122] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 3069.049210] mc-err: (20) csr_xusb_hostr: EMEM decode error on PDE or PTE entry
[ 3069.060769] mc-err: status = 0x6000004a; addr = 0x00000000
[ 3069.067793] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 3072.237743] tegra-xhci tegra-xhci: Firmware reinit.
[ 3074.389626] tegra-xhci tegra-xhci: xHCI host not responding to stop endpoint command.
[ 3074.389754] tegra-xhci tegra-xhci: Assuming host is dying, halting host.
[ 3074.504890] tegra-xhci tegra-xhci: Host not halted after 16000 microseconds.
[ 3074.504981] tegra-xhci tegra-xhci: Non-responsive xHCI host is not halting.
[ 3074.505052] tegra-xhci tegra-xhci: Completing active URBs anyway.
[ 3074.506254] uvcvideo: Failed to query (SET_CUR) UVC control 1 on unit 7: -110 (exp. 2).
[ 3074.506750] uvcvideo: Failed to query (SET_CUR) UVC control 1 on unit 7: -110 (exp. 2).
[ 3074.506780] tegra-xhci tegra-xhci: HC died; cleaning up
[ 3074.506971] tegra_xhci_hcd_reinit: hcd_reinit is disabled
This was with kernel 3.10.96. I tried removing the xhci kernel module, which worked, but reinserting it caused the system to freeze.
If you think running on R28.1 will make a difference, I will do that. Our current production system runs the previous release with the 3.10.96 kernel, so I wanted to test there first.