USB errors when using more than one USB 3.0 camera

Hi!

I am getting USB errors when using more than one Basler USB 3.0 camera.
(Jetson TX1 with the latest L4T 24.1)

If I use only one camera at a time, I occasionally get this error:

Failed to open device '2676:ba02:2:2:9' for XML file download. Tried 5 times. Error: 'UX Status: Libusb error: LIBUSB_ERROR_OTHER.'

This occurs when my application tries to open the camera. Fortunately, if I execute the same sample program again, it usually runs with no error.
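
Since re-running the program usually succeeds, one possible workaround is to retry the open step inside the application. Below is a minimal Python sketch of that idea. `open_fn` is a hypothetical callable standing in for whatever opens the camera (it is not a Pylon API), and the attempt count and delay are arbitrary assumptions.

```python
import time


def open_with_retry(open_fn, attempts=3, delay_s=0.5,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call open_fn() until it succeeds or the attempts run out.

    open_fn is a hypothetical placeholder for the camera-open call.
    The wait grows linearly with each failed attempt.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return open_fn()
        except retry_on as err:
            last_error = err
            if attempt < attempts:
                sleep(delay_s * attempt)  # wait a little longer each time
    raise last_error
```

This only hides a transient open failure. It would not help with errors that occur in the middle of grabbing.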

It gets complicated when I try to control 2 cameras at a time.
Sometimes I get the same error message as above, sometimes the “LIBUSB_ERROR_NO_DEVICE” errors shown below, and sometimes plain “timeout” or “failed to read maximum device responsetime” errors. The errors below usually occur while grabbing images, while the others usually occur when the application enumerates and opens the connected cameras.

CompleteXfers: Instance = 3, idx = 21, pDestBuffer = 0x0x241ca8. Trailer is corrupted.
CompleteXfers: Instance = 3, idx = 0, pDestBuffer = 0x0x242320. Leader is corrupted.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Unable to reset pipe 0, status=0xe2100004
Unable to transmit data, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Failed to submit transfer status=0xe200000f
BeginAsyncRead, numBytes=1024, status=0xE200000F
BeginDataXfer: Instance = 3, BeginAsyncRead failed bufferIdx = 6, XferIdx = 0, pDestBuffer = 0x0x242da0, status = 0xe200000f, size = 1024
[  261.380012] Warning: Grab_Cameras: PID 2372: Using deprecated CP15 barrier instruction
[  266.507493] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  266.710924] Warning: Grab_Cameras: PID 2381: Using deprecated CP15 barrier instruction
[  267.870698] Warning: Grab_Cameras: PID 2390: Using deprecated CP15 barrier instruction
[  269.044604] Warning: Grab_Cameras: PID 2399: Using deprecated CP15 barrier instruction
[  274.962412] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[  274.962505] xhci_hcd 0000:01:00.0: Assuming host is dying, halting host.
[  274.966055] xhci_hcd 0000:01:00.0: HC died; cleaning up
[  274.972675] usb 2-1: USB disconnect, device number 2
[  274.980629] usb 2-2: USB disconnect, device number 3

I have tried connecting the 2 cameras through two different PCIe USB cards:
ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)

Installing the same card in an Ubuntu PC and running the same sample program causes no errors at all, so I assume the USB driver on the Jetson TX1 is the main cause.
Is there a patch, or does anyone have any ideas?
Thank you!

You may want to see what shows up in the output of dmesg from this sequence (check “dmesg | tail” prior to this so you’ll know which lines were already there):

insert camera 1
remove camera 1
insert camera 1
insert camera 2
...run the software showing errors...
remove camera 2
remove camera 1
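
The before/after comparison can also be scripted. The sketch below (Python; the function names are made up for illustration) takes two saved dmesg dumps and keeps only the lines whose timestamp is newer than anything in the first dump. It assumes the default `[seconds.micros]` timestamp prefix seen in this thread and drops lines that have no timestamp.

```python
import re

# Matches the "[  261.380012]" prefix that dmesg prints by default.
_TS = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamp(line):
    """Return the dmesg timestamp of a line as a float, or None."""
    m = _TS.match(line)
    return float(m.group(1)) if m else None


def new_lines(before, after):
    """Return lines of `after` that are newer than every line in `before`."""
    seen = [t for t in map(timestamp, before.splitlines()) if t is not None]
    cutoff = max(seen) if seen else float("-inf")
    result = []
    for line in after.splitlines():
        t = timestamp(line)
        if t is not None and t > cutoff:
            result.append(line)
    return result
```

For example, save `dmesg` to a file before plugging in the cameras and again after the failure, then pass the contents of both files to `new_lines`.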

It would also be useful to see the “lspci -vvv” output for just one camera before any errors show up.

Unfortunately, I currently have only one camera and can’t reproduce the error messages with just one.

Anyway, these are the log messages when I connect one camera:

[   43.115344] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[   43.138388] usb 2-2: Parent hub missing LPM exit latency info.  Power management will be impacted.
[   43.143657] usb 2-2: New USB device found, idVendor=2676, idProduct=ba02
[   43.143759] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   43.143848] usb 2-2: Product: acA1300-200um
[   43.145820] usb 2-2: Manufacturer: Basler
[   43.145917] usb 2-2: SerialNumber: 21812928

When I execute the grabbing program:

[  124.626647] Warning: Grab_Cameras: PID 2125: Using deprecated CP15 barrier instruction
[  124.969407] warning: `Grab_MultipleCameras' uses 32-bit capabilities (legacy support in use)

And when I disconnect the camera:

[  225.613511] usb 2-2: USB disconnect, device number 2

lspci -vvv response:

ubuntu@tegra-ubuntu:~$ lspci -vvv
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: 13000000-130fffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: <access denied>
        Kernel driver in use: pcieport

01:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller (prog-if 30 [XHCI])
        Subsystem: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 130
        Region 0: Memory at 13000000 (64-bit, non-prefetchable) 
        Capabilities: <access denied>
        Kernel driver in use: xhci_hcd

I will test with 2 cameras, too, as soon as possible.
But it’s kind of strange: in 4 out of 5 attempts the program runs successfully, but once it fails with a LIBUSB error.
I assume USB transfers on the Jetson are not stable when several devices communicate at almost full speed. It must be a USB driver problem, since everything works well on the Ubuntu host PC.

The output of lspci -vvv for the controller was a bit short. Was it run with sudo? If not, try this:

sudo lspci -s '01:00.0' -vvv

This part I’ve not seen before, but it makes me wonder:

Parent hub missing LPM exit latency info.  Power management will be impacted.

I’m wondering if disabling ondemand would help:

sudo update-rc.d -f ondemand remove

…followed by performance mode for USB as per here:
http://elinux.org/Jetson/Performance
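
To confirm what the system is actually running with before and after such changes, the relevant settings can be read back from sysfs. This is a rough Python sketch under some assumptions: it reads the standard Linux `usbcore` autosuspend parameter and the cpufreq governor of CPU 0, which may or may not be the exact settings the linked page changes, and `read_power_settings` is a name made up here.

```python
from pathlib import Path


def read_power_settings(sysfs_root="/sys"):
    """Read the USB autosuspend value and the CPU 0 governor, if present.

    Returns a dict; a value is None when the file cannot be read.
    The paths are the usual Linux sysfs locations (an assumption for L4T).
    """
    root = Path(sysfs_root)
    paths = {
        "usb_autosuspend": root / "module/usbcore/parameters/autosuspend",
        "cpu0_governor": root / "devices/system/cpu/cpu0/cpufreq/scaling_governor",
    }
    settings = {}
    for name, path in paths.items():
        try:
            settings[name] = path.read_text().strip()
        except OSError:
            settings[name] = None
    return settings
```

An autosuspend value of `-1` means USB autosuspend is disabled. The governor string shows whether the CPU is still being scaled on demand.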

Sorry…executed without sudo.

ubuntu@tegra-ubuntu:~$ sudo lspci -s '01:00.0' -vvv
01:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller (prog-if 30 [XHCI])
        Subsystem: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 130
        Region 0: Memory at 13000000 (64-bit, non-prefetchable) 
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00002080
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <2us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <2us, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [280 v1] #19
        Capabilities: [300 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Kernel driver in use: xhci_hcd

Thanks for the advice! I’ll try it as soon as possible!

I don’t know if the following matters at all; it’s just an observation which may or may not have meaning. So far as I know, the PCIe on the JTX1 is capable of revision 1 or 2 behavior…not revision 3. My understanding of de-emphasis is that on revision 1 it is always -3.5dB, and on revision 2 it is up to the root complex to pick either -3.5dB or -6dB. The -6dB setting is for longer PCIe traces, such as might be seen on a larger circuit board, whereas -3.5dB would be appropriate for the shorter trace lengths of the JTX1. If the issues are because of PCIe, this might be relevant…if the issues are solely USB, then the de-emphasis will have no bearing on the problem. It also would not matter if the built-in USB3 port is used, as this does not use PCIe.
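
To compare these link parameters between the Jetson and a PC without reading the whole dump, the few relevant fields can be pulled out of the `lspci -vvv` text. A small Python sketch follows. The field names come from the output posted above, while `link_summary` itself is a name invented for this example.

```python
import re


def link_summary(lspci_text):
    """Extract link speed/width and de-emphasis from `lspci -vvv` output."""
    out = {}
    for key in ("LnkCap", "LnkSta"):
        # The trailing colon keeps "LnkSta:" from matching "LnkSta2:".
        m = re.search(rf"{key}:.*?Speed ([\d.]+GT/s), Width (x\d+)", lspci_text)
        if m:
            out[key] = (m.group(1), m.group(2))
    m = re.search(r"Current De-emphasis Level: (-[\d.]+dB)", lspci_text)
    if m:
        out["de_emphasis"] = m.group(1)
    return out
```

Running it on the dump above would report a 5GT/s x2 link for both capability and status, with -6dB de-emphasis. The same call on the PC's output shows whether the card negotiates differently there.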

Hi!

I have tested the kit with 2 cameras again. These are the log messages I get.

Inserting camera 1:

[   76.889972] usb 2-1: new SuperSpeed USB device number 3 using xhci_hcd
[   76.913131] usb 2-1: Parent hub missing LPM exit latency info.  Power management will be impacted.
[   76.918371] usb 2-1: New USB device found, idVendor=2676, idProduct=ba02
[   76.918481] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   76.918817] usb 2-1: Product: acA1300-200um
[   76.918908] usb 2-1: Manufacturer: Basler
[   76.918991] usb 2-1: SerialNumber: 21927838

Removing camera 1:

[  113.211289] usb 2-1: USB disconnect, device number 3

Inserting camera 1,2:

[  145.745723] usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd
[  145.768347] usb 2-1: Parent hub missing LPM exit latency info.  Power management will be impacted.
[  145.773184] usb 2-1: New USB device found, idVendor=2676, idProduct=ba02
[  145.773296] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  145.773384] usb 2-1: Product: acA1300-200um
[  145.773465] usb 2-1: Manufacturer: Basler
[  145.773540] usb 2-1: SerialNumber: 21927838
[  152.342443] usb 2-2: new SuperSpeed USB device number 5 using xhci_hcd
[  152.364396] usb 2-2: Parent hub missing LPM exit latency info.  Power management will be impacted.
[  152.368501] usb 2-2: New USB device found, idVendor=2676, idProduct=ba02
[  152.368614] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  152.368708] usb 2-2: Product: acA1300-200um
[  152.368787] usb 2-2: Manufacturer: Basler
[  152.368865] usb 2-2: SerialNumber: 21927840

Executing the program multiple times:

[  218.120761] Warning: Grab_MultipleCa: PID 2218: Using deprecated CP15 barrier instruction
[  218.484020] warning: `Grab_MultipleCa' uses 32-bit capabilities (legacy support in use)
[  274.973610] Warning: Grab_MultipleCa: PID 2265: Using deprecated CP15 barrier instruction
[  276.173416] Warning: Grab_MultipleCa: PID 2274: Using deprecated CP15 barrier instruction
[  277.381178] Warning: Grab_MultipleCa: PID 2283: Using deprecated CP15 barrier instruction
[  278.460816] Warning: Grab_MultipleCa: PID 2292: Using deprecated CP15 barrier instruction
[  279.683402] Warning: Grab_MultipleCa: PID 2301: Using deprecated CP15 barrier instruction
[  280.901609] Warning: Grab_MultipleCa: PID 2310: Using deprecated CP15 barrier instruction
[  282.131217] Warning: Grab_MultipleCa: PID 2319: Using deprecated CP15 barrier instruction
[  283.135978] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  283.136231] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 6 ep 2 with no TDs queued?
[  283.331148] Warning: Grab_MultipleCa: PID 2328: Using deprecated CP15 barrier instruction
[  284.553554] Warning: Grab_MultipleCa: PID 2337: Using deprecated CP15 barrier instruction
[  285.802104] Warning: Grab_MultipleCa: PID 2346: Using deprecated CP15 barrier instruction
[  287.042466] Warning: Grab_MultipleCa: PID 2355: Using deprecated CP15 barrier instruction
[  288.336083] Warning: Grab_MultipleCa: PID 2364: Using deprecated CP15 barrier instruction
[  289.623044] Warning: Grab_MultipleCa: PID 2373: Using deprecated CP15 barrier instruction
[  290.841261] Warning: Grab_MultipleCa: PID 2382: Using deprecated CP15 barrier instruction
[  291.898524] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  291.899578] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  291.899800] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  291.900631] xhci_hcd 0000:01:00.0: ERROR Unknown event condition, HC probably busted
[  291.900784] xhci_hcd 0000:01:00.0: ERROR Unknown event condition, HC probably busted
[  291.900947] xhci_hcd 0000:01:00.0: ERROR Unknown event condition, HC probably busted
[  291.901113] xhci_hcd 0000:01:00.0: ERROR Unknown event condition, HC probably busted
[  291.901280] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
[  291.909552] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX
[  291.921469] Grab_MultipleCa[2382]: unhandled level 3 translation fault (11) at 0x0000000c, esr 0x92000007
[  291.921477] pgd = ffffffc0d3c6c000
[  291.924881] [0000000c] *pgd=0000000151c6e003, *pmd=0000000152eef003, *pte=0000000000000000
[  291.933230]
[  291.933238] CPU: 2 PID: 2382 Comm: Grab_MultipleCa Not tainted 3.10.96+ #1
[  291.933243] task: ffffffc0e42d1300 ti: ffffffc0d20a8000 task.ti: ffffffc0d20a8000
[  291.933251] PC is at 0xf67aa3dc
[  291.933255] LR is at 0xf67c6b30
[  291.933259] pc : [<00000000f67aa3dc>] lr : [<00000000f67c6b30>] pstate: a00d0010
[  291.933263] sp : 00000000ffc57810
[  291.933266] x12: 0000000000000000
[  291.933271] x11: 0000000000000001 x10: 0000000000000001
[  291.933278] x9 : 0000000000000314 x8 : 00000000f680f000
[  291.933284] x7 : 00000000ffc578fc x6 : 00000000e2000008
[  291.933290] x5 : 00000000001fd7e4 x4 : 00000000001a7f08
[  291.933297] x3 : 0000000000000000 x2 : 0000000000000001
[  291.933303] x1 : 0000000000000000 x0 : 00000000e2000102
[  291.933309]
[  291.933318] Library at 0xf67aa3dc: 0xf6793000 /opt/pylon5/lib/libuxapi-5.0.1.so
[  291.940619] Library at 0xf67c6b30: 0xf6793000 /opt/pylon5/lib/libuxapi-5.0.1.so
[  291.947938] vdso base = 0xf757e000
[  291.969016] Warning: Grab_MultipleCa: PID 2391: Using deprecated CP15 barrier instruction
[  293.241971] Warning: Grab_MultipleCa: PID 2401: Using deprecated CP15 barrier instruction
[  294.460103] Warning: Grab_MultipleCa: PID 2410: Using deprecated CP15 barrier instruction
[  295.679521] Warning: Grab_MultipleCa: PID 2419: Using deprecated CP15 barrier instruction
[  296.970025] Warning: Grab_MultipleCa: PID 2428: Using deprecated CP15 barrier instruction
[  297.986077] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  297.986292] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  297.986397] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX
[  297.987292] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 6 ep 2 with no TDs queued?
[  297.987578] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 6 ep 2 with no TDs queued?
[  298.190852] Warning: Grab_MultipleCa: PID 2437: Using deprecated CP15 barrier instruction
[  299.461588] Warning: Grab_MultipleCa: PID 2446: Using deprecated CP15 barrier instruction
[  300.701441] Warning: Grab_MultipleCa: PID 2455: Using deprecated CP15 barrier instruction
[  301.979318] Warning: Grab_MultipleCa: PID 2464: Using deprecated CP15 barrier instruction
[  303.010343] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  303.014905] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  303.025907] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 6 ep 2 with no TDs queued?

Removing camera 1,2:

[  393.153021] usb 2-2: USB disconnect, device number 5
[  413.472538] usb 2-1: USB disconnect, device number 4

This time the output of the program was a LIBUSB TIMEOUT message:

Unable to ensure stalled pipe 0, status=0xe2100007 LIBUSB_ERROR_TIMEOUT
Unable to ensure stalled pipe 0, status=0xe2100007 LIBUSB_ERROR_TIMEOUT

Unable to ensure stalled pipe 0, status=0xe2100007 LIBUSB_ERROR_TIMEOUT
: Failed to stall pipe. (0xe2100007)
An exception occurred.
PrepareGrab failed for device '2676:ba02:2:2:5'. Error: 'UX Status: Libusb error: LIBUSB_ERROR_TIMEOUT.'

After that, I have executed the whole process again from the beginning. This time a LIBUSB NO_DEVICE error was the output of the program:

Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Unable to reset pipe 0, status=0xe2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Unable to reset pipe 0, status=0xe2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Unable to reset pipe 0, status=0xe2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed. Resetting pipe.
Unable to stall pipe 0, status=0xe2100004 LIBUSB_ERROR_NO_DEVICE
Failed to stall control channel pipe 1, status=0xE2100004
Unable to reset pipe 0, status=0xe2100004
Failed to reset control channel pipe 1, status=0xE2100004
Unable to transmit data, this may occur due to removal of a device, status=LIBUSB_ERROR_NO_DEVICE, bytes read=0.
: Sending read mem command failed.
: Failed to read SI Control value. (0xe2000009)
CompleteXfers: Instance = 3: Unexpected timeout while waiting for aborted requests.
ABORT PIPE DOES NOT WORK. MUST RECOVER FROM BAD SETUP. CYCLING PORT TO RECOVER.
CompleteXfers: Instance = 3: Unexpected timeout while waiting for aborted requests.
ABORT PIPE DOES NOT WORK. MUST RECOVER FROM BAD SETUP. CYCLING PORT TO RECOVER.
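
The numeric status values in this output appear to carry the libusb error in their low byte: 0xe2100004 is printed next to LIBUSB_ERROR_NO_DEVICE (libusb value -4) and 0xe2100007 next to LIBUSB_ERROR_TIMEOUT (libusb value -7). The Python sketch below decodes on that basis. The libusb names and values are the real ones, but the mapping from the 0xe21000NN status to a libusb code is only a guess from those two pairs, not documented Pylon behavior.

```python
# libusb error codes as defined in libusb.h
LIBUSB_ERRORS = {
    -1: "LIBUSB_ERROR_IO",
    -2: "LIBUSB_ERROR_INVALID_PARAM",
    -3: "LIBUSB_ERROR_ACCESS",
    -4: "LIBUSB_ERROR_NO_DEVICE",
    -5: "LIBUSB_ERROR_NOT_FOUND",
    -6: "LIBUSB_ERROR_BUSY",
    -7: "LIBUSB_ERROR_TIMEOUT",
    -8: "LIBUSB_ERROR_OVERFLOW",
    -9: "LIBUSB_ERROR_PIPE",
    -10: "LIBUSB_ERROR_INTERRUPTED",
    -11: "LIBUSB_ERROR_NO_MEM",
    -12: "LIBUSB_ERROR_NOT_SUPPORTED",
    -99: "LIBUSB_ERROR_OTHER",
}


def guess_libusb_name(status):
    """Guess the libusb error behind a 0xe21000NN status value.

    ASSUMPTION: the low byte is the negated libusb code. This is inferred
    from only two value/name pairs in the log. Returns None otherwise.
    """
    if (status & 0xFFFFFF00) != 0xE2100000:
        return None
    return LIBUSB_ERRORS.get(-(status & 0xFF))
```

Values with a different prefix, such as 0xe2000009 or 0xe200000f, are left undecoded because nothing in the log says what they mean.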

In this second case, the only difference in dmesg is that the two cameras disconnect automatically after the error message shows up:

[  221.380962] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[  221.381062] xhci_hcd 0000:01:00.0: Assuming host is dying, halting host.
[  221.386245] xhci_hcd 0000:01:00.0: HC died; cleaning up
[  221.392606] usb 2-1: USB disconnect, device number 3
[  221.399515] usb 2-2: USB disconnect, device number 4
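
When repeating these runs many times, it helps to reduce each dmesg capture to a count of distinct xhci_hcd messages plus a flag for whether the host controller died. A minimal Python sketch is below. It assumes the line format shown in the logs above, and `summarize_xhci` is a name made up for illustration.

```python
import re
from collections import Counter

# "[  283.135978] xhci_hcd 0000:01:00.0: <message>"
_XHCI = re.compile(r"^\[\s*\d+\.\d+\]\s+xhci_hcd\s+(\S+):\s+(.*)$")


def summarize_xhci(dmesg_text):
    """Count xhci_hcd messages and report whether the controller died."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = _XHCI.match(line)
        if m:
            counts[m.group(2).strip()] += 1
    died = any("HC died" in msg for msg in counts)
    return counts, died
```

Comparing the summaries for the two PCIe cards, and for the on-board port, would show whether the same warnings precede every failure.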

I hope it’s not a hardware problem, as you mentioned before. Another possibility is that the 32-bit Pylon software is not stable on the Jetson TX1. (I have only tested the 64-bit version of Pylon on the Ubuntu host PC.)

Anyway, I am now testing the board with one camera connected to the PCIe extension card and one to the on-board USB port, and so far I have seen no error messages like the above. Furthermore, if I connect the 2 cameras to a USB hub which is connected to the on-board USB port, I get no errors either. So it seems to be a PCIe-related problem…

Any help and comment would be appreciated! Thanks!

Does the “hub missing LPM exit latency info” message show up with only one type of PCIe USB3 card? Were you able to try removing “ondemand” as mentioned above? Since the hub does not support low power mode, it would be good to see what happens when low power mode is disabled…if the hub never sees a low power mode, then its lack of support for it no longer matters.

I glanced over the warnings about the CP15 barrier instruction. The gist is that older ARM architectures did not necessarily do out-of-order scheduling of code…later on, to improve cache performance (fewer cache misses), some out-of-order execution abilities were added. However, there were times when the automatic out-of-order mechanism had to be circumvented when running older code, as out-of-order execution altered the result…barrier instructions are how code disables some of this out-of-order behavior to get the old behavior back. Here is a quote from ARM’s documents about this instruction:

Note that the CP15 equivalent barrier instructions available in ARMv6 are deprecated in ARMv7

This seems to imply that even on ARMv7 the CP15 barrier instructions were not really intended to be used and were meant for the older ARMv6 days. (Presumably the out-of-order changes for cache efficiency did not exist before ARMv6, and ARMv6 is where barriers were sometimes needed to run older code, but I may be wrong.) If this is the case, then ARMv7 would complain with a warning about using deprecated barrier code.

Now we are in ARMv8-a (64-bit) and ARMv8 (32-bit compatibility to ARMv7 armhf). So the assembler being run is probably from 32-bit ARMv8 mode running out of date ARMv7 code which shouldn’t have the barrier instructions in the first place (though it seems it should be harmless). I’m asking myself this question…where did the JTX1 get code in a kernel driver which was not only 32-bit, but 32-bit from an older compiler? It seems a newer compiler would not have used barrier code.

Possibility 1…there is a driver or kernel module or kernel code inserted as a module which is third party and not directly part of this compiled kernel.

Possibility 2…the compiler itself, while building the 32-bit ARMv8 part of the kernel, is still using barrier code (even though it should not). This could be an issue of the compiler used to build the kernel.

I doubt either possibility should cause outright failure, but there is separate and different kernel code executed when deprecated barrier code is detected…if there is a bug in executing deprecated code features this might cause actual failure.

Have you custom built the kernel? Have you used any outside source for drivers, especially anything precompiled? Note that the ondemand part of the question is basically separate, and may still be an issue…the barrier instruction warnings might have no effect.
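
One quick check related to the 32-bit question is whether a given binary or library (for example the sample program, or the libuxapi library named in the fault trace) is a 32-bit or 64-bit ELF file. The Python sketch below reads the ELF identification bytes directly. The ELF layout is standard, and `elf_class` is just a name chosen for this example.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)
    # Bytes 0-3 are the magic number; byte 4 (EI_CLASS) is 1 for
    # 32-bit objects and 2 for 64-bit objects.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    return {1: 32, 2: 64}.get(ident[4])
```

A result of 32 for the application would be consistent with the “uses 32-bit capabilities” kernel message.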

We have the same problem with Basler USB 3.0 cameras.
But we have noticed a strange thing:
the message

CompleteXfers: Instance = 3, idx = 20, pDestBuffer = 0x0x4bb9a8. Trailer status field is not zero. Status = 41216

disappears when we connect the display via HDMI.
There is also a correlated issue with performance.
Our stereo processing algorithm, which uses CUDA, runs faster when a display is connected and slower when it is not.
We connect the cameras either via a USB 3.0 hub or directly: one to the USB 2.0 port and the other to the USB 3.0 port.

@vixtor-qm: Is any kind of remote connection involved in this, e.g., ssh to a Jetson from a remote host?

Yes, we run the app via ssh

@linuxdev, have you noticed something similar?

@vixtor-qm: Life gets more complicated when using a remote connection. One has to look at how X11 deals with remote display migration. I believe this is probably getting in your way.

What it comes down to is that X11 can separate the machine the code runs on from the machine the code results are displayed to. When that happens the code for actual rendering also migrates…part of the non-rendering code runs on the original host, but OpenGL type computation migrates to the machine doing the display. This could be prevented by running a virtual X11 client/server on the Jetson, and then viewing via a remote desktop type of software (purposely not using X11 forwarding…the software renders to a virtual context on the Jetson, then remote desktop software views what was rendered).

The hardware accelerated access to the video and to the GPU in general is tied to the video driver, which in turn is tied to the X11 server. It is obvious that when doing remote display you want to use your remote machine for graphics, as that is where you display it…what the software often does not know is that the GPU used in CUDA is not for graphics display, so all GPU/CUDA seems to migrate with this to the remote desktop. If it happens that your desktop does not support the version of OpenGL the application wants to use, then you’d get a missing OpenGL type error during that remote display…else the display would work, and if the video card on the remote display is quite fast, you’ll get an amazing performance increase. The same is true for CUDA…if your remote desktop has the proper version of CUDA installed, and you migrate to your desktop, then you’ll get an amazing performance boost…else you’ll get an error about missing CUDA even though you know your Jetson has what it needs.

Here is a related post about such remote display software, though this was on a JTK1 instead of JTX1:
https://devtalk.nvidia.com/default/topic/828974/jetson-tk1/-howto-install-virtualgl-and-turbovnc-to-jetson-tk1

The reason this grabs my attention is because of the changes with or without HDMI. The changes seem to be too dramatic and somewhat odd…unless the machine doing the rendering is completely different. If you “echo $DISPLAY” from a direct login it will differ from the remote ssh login…CUDA sees the same difference between ssh and direct.
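To compare the two logins side by side, something like the following could be run once from the direct console and once from the ssh session. It is only a rough sketch: the function name `describe_session` and the labels it returns are made up for illustration, and the rules are heuristics (OpenSSH normally sets `SSH_CONNECTION`/`SSH_CLIENT`, and X forwarding usually shows up as a `localhost:N` display). It only reports where X11 output would go; it says nothing about what CUDA actually does.

```python
import os


def describe_session(env):
    """Heuristically summarize a login: is it over ssh, and where does DISPLAY point?"""
    display = env.get("DISPLAY", "")
    # OpenSSH exports these variables for remote logins.
    over_ssh = "SSH_CONNECTION" in env or "SSH_CLIENT" in env
    if not display:
        target = "no X display"
    elif display.startswith("localhost:"):
        # Typical value when ssh X11 forwarding is active, e.g. localhost:10.0
        target = "forwarded X display"
    elif display.startswith(":"):
        # Typical value for a display attached to this machine, e.g. :0
        target = "local X display"
    else:
        target = "remote X display"
    return {"over_ssh": over_ssh, "display": display, "target": target}


if __name__ == "__main__":
    print(describe_session(os.environ))
```

If both logins report the same target (for example "no X display" over ssh without forwarding), then display migration is probably not what differs between the two runs.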

Thanks for the explanation, but my case is a bit different. My application is not intended to render anything and I’m not using any X forwarding at all. It just grabs images from the cameras, computes the stereo map and sends it via Ethernet.
That’s it. I can guess how CUDA performance might be related to plugging/unplugging the external display and activating OpenGL, but what is completely unclear to me is why the USB-related code is affected by that. I saw similar behavior when the cameras were connected via a long USB 3.0 cable, which probably caused signal saturation and voltage drop. But here I see no obvious reason, unless attaching a display activates some power switch and some internal “power controller” starts to supply enough power for the hardware. Or it disables a power-saving mode, so the GPU increases its frequency and consumes more power. What do you guys think, does that make sense?

I could easily see performance scaling changing things with or without a display. Even differences in how memory is used might change things with or without a display. One thing you can do is make sure performance is maxed out and scaling is disabled, then test again to see how things change with/without HDMI.

You can disable init’s scaling via:

sudo update-rc.d -f ondemand remove

Then follow the performance information:
http://elinux.org/Jetson/TX1_Controlling_Performance

The JTK1 performance information shows USB autosuspend removal which you would also want:
http://elinux.org/Jetson/Performance

See how things change or remain the same with those performance settings in place.
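Before re-testing, it may help to confirm that the settings actually took effect. The sketch below reads two standard Linux sysfs locations: the per-CPU `scaling_governor` files and the `usbcore` `autosuspend` parameter (a value of `-1` means autosuspend is disabled). The helper names `read_setting` and `report` are made up here, and the expectation that every governor reads `performance` is an assumption about what the linked pages set up, not something confirmed in this thread.

```python
from pathlib import Path


def read_setting(path):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def report(cpu_root="/sys/devices/system/cpu",
           autosuspend="/sys/module/usbcore/parameters/autosuspend"):
    """Collect the cpufreq governor of each CPU and the USB autosuspend value."""
    governors = {}
    for gov in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        # gov looks like .../cpu0/cpufreq/scaling_governor, so parts[-3] is "cpu0"
        governors[gov.parts[-3]] = read_setting(gov)
    return {"governors": governors, "usb_autosuspend": read_setting(autosuspend)}


if __name__ == "__main__":
    print(report())
```

Offline cores may not expose a `cpufreq` directory, so a missing CPU in the output is not necessarily an error.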

I will be in my lab in 9 hours, will try and let you know. Thanks

Unfortunately, it didn’t help.

One thing I’m wondering about is if maybe it is related to more bandwidth being required (data starvation). Typically an x86_64 desktop system does have greater data throughput. With two cameras attached, what is the output of “lsusb -t”? Also, how much bandwidth are the cameras using, what is the color depth, resolution, and frame rate of each camera?
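For a rough idea of whether the bus could be starved, the raw image payload can be estimated from resolution, bit depth and frame rate. The numbers below are placeholders, not the settings of the cameras in this thread, and the usable-throughput budget is an assumption: USB 3.0 signals at 5 Gbit/s, which 8b/10b encoding reduces to 500 MB/s before protocol overhead, so real-world budgets are lower.

```python
def stream_mbytes_per_s(width, height, bits_per_pixel, fps):
    """Raw image payload of one camera in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bits_per_pixel / 8 * fps / 1e6


def total_load(cameras, usable_mbytes_per_s):
    """Return (combined MB/s, fraction of the assumed usable bus throughput)."""
    total = sum(stream_mbytes_per_s(**cam) for cam in cameras)
    return total, total / usable_mbytes_per_s


if __name__ == "__main__":
    # Hypothetical settings: two identical 8-bit mono cameras.
    cameras = [dict(width=1920, height=1200, bits_per_pixel=8, fps=40)] * 2
    # Assumed practical budget for a single USB 3.0 host controller.
    total, fraction = total_load(cameras, usable_mbytes_per_s=350.0)
    print(f"{total:.2f} MB/s, {fraction:.0%} of the assumed budget")
```

If both cameras share one host controller (which “lsusb -t” would show), they also share this budget; on separate controllers each camera gets its own.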

Yes, the “hub missing LPM exit latency info” message shows up only if I connect the cameras via the PCIe card.
I ran all the tests again after removing the “ondemand” feature, and you were right, it solved many problems, such as the “fail to read device response time” errors when opening the cameras, and it also solved the errors when using an external USB hub. Sorry for leaving out this information, and thanks for the idea!

But unfortunately LIBUSB_ERROR_TIMEOUT errors still occur, in about 1 of 20 attempts when I execute the camera grabbing sample program. This happens with both the ASMedia and the Renesas PCI cards. But when I use the Renesas card, only these three kinds of weird warnings show up in the dmesg log, which simplifies the case:
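An intermittent failure like this is easier to compare across cards when it is counted the same way each time. Here is a minimal sketch that runs a command repeatedly and counts a run as failed if it exits non-zero, hangs past a timeout, or prints a line containing `LIBUSB_ERROR`. The function name `failure_rate`, the default of 20 runs and the 60 second timeout are arbitrary choices for illustration, and the command to run (for example a hypothetical `./grab_sample`) has to be supplied on the command line.

```python
import subprocess
import sys


def failure_rate(command, runs=20, timeout_s=60):
    """Run `command` `runs` times and return (number of failed runs, runs)."""
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(command, capture_output=True,
                                    text=True, timeout=timeout_s)
            output = result.stdout + result.stderr
            failed = result.returncode != 0 or "LIBUSB_ERROR" in output
        except subprocess.TimeoutExpired:
            # A hung run counts as a failure too.
            failed = True
        failures += int(failed)
    return failures, runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py <program> [args...]
    failed, total = failure_rate(sys.argv[1:])
    print(f"{failed} of {total} runs failed")
```

With a failure rate near 1 in 20, a single batch of 20 runs is noisy, so several batches per card give a fairer comparison.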

Parent hub missing LPM exit latency info.  Power management will be impacted.
Warning: Grab_MultipleCa: PID 2419: Using deprecated CP15 barrier instruction 
xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?

I saw some similar posts on the internet about the last xhci_hcd message, but couldn’t solve the problem yet.
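To see how often each of the three warnings appears per run or per card, the dmesg output could be tallied with a small script like this. It is only a sketch: the category names are made up, the patterns are taken from the three messages quoted above, and the sample lines are those same three messages rather than a full log.

```python
import re
from collections import Counter

# One pattern per warning quoted in this thread; the keys are arbitrary labels.
PATTERNS = {
    "lpm_latency": re.compile(r"missing LPM exit latency info"),
    "cp15_barrier": re.compile(r"deprecated CP15 barrier instruction"),
    "xhci_short_tx": re.compile(r"needs XHCI_TRUST_TX_LENGTH quirk"),
}

SAMPLE = [
    "Parent hub missing LPM exit latency info.  Power management will be impacted.",
    "Warning: Grab_MultipleCa: PID 2419: Using deprecated CP15 barrier instruction",
    "xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?",
]


def classify(lines):
    """Count how many lines match each known warning pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    print(dict(classify(SAMPLE)))
```

To use it on a real log, the lines of a saved dmesg dump can be passed to `classify` in place of `SAMPLE`.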

I have patched the kernel for the following reason, but changed nothing else: https://devtalk.nvidia.com/default/topic/946129/jetson-tx1/switching-to-tty1-6-/

Oh, I almost forgot. I have tried a third PCIe card as well, to make it more complex XD.
It is the USB3.0RX4-P4-PCIE board, which has 4 Renesas USB host controllers and a PLX PCI bridge.
Using this PCI card I can use 4 cameras at a time at full speed with no errors at all. However, “Parent hub missing LPM exit latency info” and “CP15 barrier instruction” still show up, but no other warnings like the ones above.
So this PCI card may solve my problem, but it would be great to be able to use other 2-port cards as well, which have no bridge IC.

Because the CP15 barrier instruction warning involves code from older compilers or perhaps compatibility modes, what compilers are you using for the kernel build? I keep wondering what would perhaps start working if the code which detects the CP15 barrier code does not have to run (I doubt any modern code has hit that safety net in a long time). I am really curious as to how that code got into the kernel. There can be an enormous difference between the assembler generated for the same exact kernel when compiler versions change.

Can you give more information on the design difference between the working PCIe card and the others so far as the bridge IC you mentioned? I agree it would be good to know why the other cards fail, it is worth documenting.