TX1 hangs just after boot up when PCIe card is connected

Hi,

I have a PCIe-based frame grabber that I’m using with a TX1. When I start the TX1 with the PCIe card inserted, it boots for 10-odd seconds and then hangs at the Ubuntu login screen. I have to keep rebooting until it comes up without hanging. This hang/crash happens very frequently. Here’s the kernel log I captured through the serial console.

http://pastebin.com/r9ZTvGcy

At times, it doesn’t even reach the login screen and hangs earlier, just after boot. In those cases, none of the error messages shown in the post below appear; it just times out and reboots. Any suggestions on how to debug this would be helpful.

Thanks!

This is where it’s crashing.

[   13.462130] vgaarb: this pci device is not a vga device
ubuntu@tegra-ubuntu:~$ [   15.273550] tegra210_mixer tegra210-mixer: ASoC: hw_params() failed: -22
[   15.280270] tegra-snd-t210ref-mobile-rt565x sound.27: ASoC: PRE_PMU: TX1 Transmit-MIXER1-1 Receive event failed: -22
[   36.338754] Bad mode in Error handler detected, code 0xbf000002
[   36.344668] CPU: 3 PID: 469 Comm: avahi-daemon Not tainted 3.10.96 #1
[   36.351095] task: ffffffc0fa07e2c0 ti: ffffffc0f92c4000 task.ti: ffffffc0f92c4000
[   36.358564] PC is at 0x7fa35733f0
[   36.361869] LR is at 0x7fa372a6c8
[   36.365176] pc : [<0000007fa35733f0>] lr : [<0000007fa372a6c8>] pstate: 60000000
[   36.372553] sp : ffffffc0f92c7ff0
[   36.375857] x29: 0000007ff4927e90 x28: 0000000000000004
[   36.381174] x27: 0000000000000005 x26: 000000000042c970
[   36.386489] x25: 00000000004302c0 x24: 000000000042c000
[   36.391805] x23: 000000000042c000 x22: 0000000000430210
[   36.397121] x21: 0000000000000001 x20: 0000007fa3776710
[   36.402437] x19: 0000000000450410 x18: 0000007ff4927ae0
[   36.407753] x17: 0000007fa357337c x16: 0000007fa373d060
[   36.413068] x15: 003760357316ea62 x14: 0000000000000000
[   36.418382] x13: 00000003e8000000 x12: 0000000000000018
[   36.423700] x11: 00000000000e2d1c x10: 0000000057a2e53c
[   36.429016] x9 : 0000000000000b96 x8 : 0000000000000049
[   36.434331] x7 : 0000000000000000 x6 : 0000007fa377e000
[   36.439645] x5 : 0000000000000000 x4 : 0000000000000000
[   36.444961] x3 : 0000000000000000 x2 : 0000007ff4927ec0
[   36.450276] x1 : 000000000000000a x0 : 0000000000000000
[   36.455589]
[   36.457078] Bad mode in Synchronous Abort handler detected, code 0x8600000f
[   36.464024] CPU: 3 PID: 469 Comm: avahi-daemon Not tainted 3.10.96 #1
[   36.470449] task: ffffffc0fa07e2c0 ti: ffffffc0f92c4000 task.ti: ffffffc0f92c4000
[   36.477915] PC is at 0x7fa372a6c8
[   36.481219] LR is at 0x7fa372a6c8
[   36.484523] pc : [<0000007fa372a6c8>] lr : [<0000007fa372a6c8>] pstate: 800003c5
[   36.491900] sp : ffffffc0f92c7ed0
[   36.495204] x29: 0000007ff4927e90 x28: 0000000000000004
[   36.500520] x27: 0000000000000005 x26: 000000000042c970
[   36.505837] x25: 00000000004302c0 x24: 000000000042c000
[   36.511153] x23: 0000000060000000 x22: 0000007fa35733f0
[   36.516470] x21: ffffffc0f92c7ff0 x20: 0000007fa3776710
[   36.521788] x19: 0000000000450410 x18: 0000007ff4927ae0
[   36.527104] x17: 0000007fa357337c x16: 000000000000000a
[   36.532418] x15: 0000000000000020 x14: 000000000000005d
[   36.537732] x13: 0000000000000039 x12: 0000000000000038
[   36.543047] x11: 0000000000000035 x10: 0000000000000035
[   36.548363] x9 : 0000000000000035 x8 : 0000000000000006
[   36.553679] x7 : ffffffc0002b1994 x6 : ffffffc001166e68
[   36.558994] x5 : 0000000000000000 x4 : ffffffc0fa07e2c0
[   36.564308] x3 : ffffffc0f92c7db0 x2 : 0000000000000000
[   36.569624] x1 : ffffffc0f92c4000 x0 : 0000000000000000
[   36.574939]
[   36.574939] SP: 0xffffffc0f92c7e50:
[   36.579891] 7e50  a3776710 0000007f f92c7ff0 ffffffc0 a35733f0 0000007f 60000000 00000000
[   36.588130] 7e70  0042c000 00000000 004302c0 00000000 0042c970 00000000 00000005 00000000
[   36.596369] 7e90  00000004 00000000 f4927e90 0000007f a372a6c8 0000007f f92c7ed0 ffffffc0
[   36.604608] 7eb0  a372a6c8 0000007f 800003c5 00000000 00450410 00000000 a3776710 0000007f
[   36.612846] 7ed0  00000000 00000000 0000000a 00000000 f4927ec0 0000007f 00000000 00000000
[   36.621085] 7ef0  00000000 00000000 00000000 00000000 a377e000 0000007f 00000000 00000000
[   36.629322] 7f10  00000049 00000000 00000b96 00000000 57a2e53c 00000000 000e2d1c 00000000
[   36.637559] 7f30  00000018 00000000 e8000000 00000003 00000000 00000000 7316ea62 00376035
[   36.645798]
[   36.645798] X1: 0xffffffc0f92c3f80:
[   36.650750] 3f80  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   36.658991] 3fa0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   36.667229] 3fc0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   36.675470] 3fe0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   36.683710] 4000  00000001 00000000 ffffffff ffffffff fa07e2c0 ffffffc0 000bc6ec ffffffc0
[   36.691946] 4020  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   36.700186] 4040  00000000 00000000 00000001 00000003 00000100 00000000 57ac6e9d 00000000
[   36.708426] 4060  00000019 052c0c00 00003049 001909c0 0c000000 3fef053a 09c00000 00000019
[   36.716666]
[   36.716666] X3: 0xffffffc0f92c7d30:
[   36.721618] 7d30  f92c7d60 ffffffc0 0008c530 ffffffc0 00000001 00000000 00ac7b24 ffffffc0
[   36.729854] 7d50  00000004 00000000 00000000 00000000 f92c7da0 ffffffc0 000debbc ffffffc0
[   36.738090] 7d70  f92c7dc0 ffffffc0 f92c7de0 ffffffc0 f92c7da0 ffffffc0 000debc0 ffffffc0
[   36.746329] 7d90  00000001 00000000 000bb180 ffffffc0 f92c7dc0 ffffffc0 00ac7b24 ffffffc0
[   36.754565] 7db0  00000000 00000000 f92c4000 ffffffc0 00000000 00000000 f92c7db0 ffffffc0
[   36.762804] 7dd0  fa07e2c0 ffffffc0 00000000 00000000 01166e68 ffffffc0 002b1994 ffffffc0
[   36.771043] 7df0  00000006 00000000 00000035 00000000 00000035 00000000 00000035 00000000
[   36.779283] 7e10  00000038 00000000 00000039 00000000 0000005d 00000000 00000020 00000000
[   36.787520]
[   36.787520] X4: 0xffffffc0fa07e240:
[   36.792472] e240  ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[   36.800716] e260  ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[   36.808953] e280  fa928240 ffffffc0 fa1b42c0 ffffffc0 000002c0 00000000 fa07e2c0 ffffffc0
[   36.817195] e2a0  00000002 ffffffff ffff0000 ffffffff 00000001 ffffffff ffffffff ffffffff
[   36.825437] e2c0  00000000 00000000 f92c4000 ffffffc0 00000002 00404140 00000000 00000000
[   36.833675] e2e0  00000000 00000000 00000001 00000001 00000078 00000078 00000078 00000000
[   36.841914] e300  00ad0910 ffffffc0 00000400 00000000 00400000 00000000 fcdee719 ffffffc0
[   36.850153] e320  00000000 00000000 00000000 00000000 e61540f0 ffffffc0 fcdee730 ffffffc0
[   36.858393]
[   36.858393] X6: 0xffffffc001166de8:
[   36.863343] 6de8  ff18fa40 ffffffc0 00000000 00000000 f6000000 00000000 00200000 00000000
[   36.871582] 6e08  00000000 00000000 00010000 00000000 00080000 00000000 00000000 00000000
[   36.879823] 6e28  00080000 00000000 00000000 00000000 00000001 00000000 00000000 00000000
[   36.888063] 6e48  00000000 00000010 00000000 00000001 00000001 00000001 00000001 00000001
[   36.896301] 6e68  00000000 00000000 00f5b328 ffffffc0 05890589 00000000 ff1c0000 ffffffc0
[   36.904539] 6e88  00010000 00000000 ff8069c0 ffffffc0 00000001 00000000 01166ea0 ffffffc0
[   36.912779] 6ea8  01166ea0 ffffffc0 00000000 00000000 00000000 00000000 002b1830 ffffffc0
[   36.921020] 6ec8  00000000 00000000 002b1c5c ffffffc0 002b0f40 ffffffc0 002b1994 ffffffc0
[   36.929265]
[   36.929265] X7: 0xffffffc0002b1914:
[   36.934215] 1914  f9401ba2 d0006541 9100e021 52800400 94014017 aa0003f3 b50001a0 d0008060
[   36.942453] 1934  39488800 d2800014 35000160 52801cc1 d0006540 91012000 97f7cf0f 52800021
[   36.950694] 1954  d0008060 39088801 14000003 94012aa0 aa0003f4 2a1403e2 aa1303e1 aa1503e0
[   36.958931] 1974  940005b9 aa1303e0 97fb68ed aa1403e0 a94153f3 f94013f5 a8c47bfd d65f03c0
[   36.967169] 1994  a9bd7bfd 910003fd a90153f3 a9025bf5 aa0403f5 aa0503f4 f94044d3 7100081f
[   36.975405] 19b4  54000101 f9400660 b4000820 2a0503e2 aa0403e1 940005a4 52800000 1400004d
[   36.983644] 19d4  71000c1f 54000101 f9400a60 b4000740 2a0503e2 aa0403e1 9400059b 52800000
[   36.991882] 19f4  14000044 7100101f 54000101 f9400e60 b4000660 2a0503e2 aa0403e1 94000592
[   37.000121]
[   37.000121] X21: 0xffffffc0f92c7f70:
[   37.005160] 7f70  a3776710 0000007f 00000001 00000000 00430210 00000000 0042c000 00000000
[   37.013397] 7f90  0042c000 00000000 004302c0 00000000 0042c970 00000000 00000005 00000000
[   37.021636] 7fb0  00000004 00000000 f4927e90 0000007f a372a6c8 0000007f f92c7ff0 ffffffc0
[   37.029873] 7fd0  a35733f0 0000007f 60000000 00000000 00450410 00000000 00000049 00000000
[   37.038110] 7ff0  4bbe080a 930b0000 1000004b 00000000 30373a49 38353939 493a450a 4d4d5f44
[   37.046347] 8010  4e41435f 41444944 313d4554 0000000a 00000000 00000000 00000000 00000000
[   37.054584] 8030  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   37.062826] 8050  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   37.071067]
[   37.072555] Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
[   37.078459] Enter nvdumper_crash_setup_regs

Which L4T version is this?

Is it possible for you to put this PCIe card in another Linux computer, and show the output for this from “sudo lspci -vvv” (this is a lot of output, you’d only care about the output specific to this card)?
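A minimal sketch of one way to cut the “lspci -vvv” output down to just this card. lspci separates devices with a blank line, so awk’s paragraph mode can pick out the matching block. The helper name “pci_device_block” and the match string are illustrative assumptions; lspci’s own “-s” (slot) and “-d” (vendor:device) selectors would also narrow the output if the slot or IDs are known.

```shell
#!/bin/sh
# Hypothetical helper: print only the lspci device block(s) matching a pattern.
# lspci -vvv separates devices with a blank line, so awk paragraph mode
# (RS = "") treats each device as a single record.
pci_device_block() {
    awk -v pat="$1" 'BEGIN { RS = ""; ORS = "\n\n" } $0 ~ pat'
}

# Typical use on the machine holding the card (root is needed to see the
# full capability list):
#   sudo lspci -vvv | pci_device_block 'Techwell'
```

The pattern is matched against the whole device block, so a driver name or a slot address such as “01:00.0” works as well as the vendor string.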

This is L4T 24.1. Currently, I don’t have access to another computer. I can share the output as soon as it’s possible. Meanwhile, I can share the output of sudo lspci -vvv on my Jetson TX1 (when it boots and runs successfully). Here’s the relevant log:

01:00.0 Unclassified device [0004]: Intersil Techwell Device 6869 (rev 01)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 130
        Region 0: Memory at 20000000 (32-bit, prefetchable) 
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Kernel driver in use: tw6869

At times there’s also a “BUG: Soft Lockup” error after the login screen (during the hang), but only some of the time.

So far from the soft lockup plus a successful lspci, it seems there may be a threading issue regarding the driver to the card. The card itself does not seem to use any standardized device class, so the driver is specific to that board (the device is not a generic/general/standard class so far as driver handling is concerned). According to this, PCI handed this device to driver “tw6869”. Beyond that I do not know what is going on in the driver. Was this a third party driver?

That might be possible. I am using this driver:
https://github.com/FrankBau/tw6869
The driver doesn’t work out of the box as I kept getting a resource collision error on dmesg.
So I also had to apply a patch to the drivers/pci/quirks.c file to make it work:

+static void fixup_tw6869_class(struct pci_dev* dev)
+{
+       dev_info(&dev->dev, "Setting PCI class for tw6868 PCIe device\n");
+       dev->class = PCI_CLASS_MULTIMEDIA_VIDEO;
+}
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 0, fixup_tw6869_class);

It’ll be very difficult to know what’s going on without an actual device and the driver source as edited…even then it may not be easy. From what I’ve seen in the URL you gave, the driver author may be able to make a suggestion. Most likely the driver has been functioning on a typical x86_64 desktop distribution, so something slightly different in design might be needed for Jetson.

If lucky, the author will have worked on the code which causes the “Bad mode in Error handler detected, code 0xbf000002” message…he will possibly be able to go straight to the part of the code which generated this and be able to adjust. This may not “fix” the driver, but it would prevent the driver from the soft lockup and better information would be available for any other issues.
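For reference, the two “code” values in the log can be split into the standard AArch64 exception syndrome (ESR) fields, assuming the kernel is printing the raw ESR value there. The sketch below only names the two exception classes seen in this log; the function name “decode_esr” is made up for illustration.

```shell
#!/bin/sh
# Sketch: split an AArch64 ESR value into EC (bits 31:26), IL (bit 25)
# and ISS (bits 24:0). Only the exception classes seen in this thread's
# log are given names; everything else is reported as not decoded.
# The body runs in a subshell so its variables stay local.
decode_esr() (
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))
    il=$(( (esr >> 25) & 0x1 ))
    iss=$(( esr & 0x1ffffff ))
    case "$ec" in
        33) name="Instruction Abort, same Exception level" ;;  # EC 0x21
        47) name="SError interrupt" ;;                         # EC 0x2f
        *)  name="not decoded in this sketch" ;;
    esac
    printf 'EC=0x%02x IL=%d ISS=0x%07x (%s)\n' "$ec" "$il" "$iss" "$name"
)

decode_esr 0xbf000002
decode_esr 0x8600000f
```

The first code decodes to the SError class, which is an asynchronous abort reported by the system rather than by a particular instruction. That would be consistent with a failed bus access (for example to the PCIe device) being reported while avahi-daemon happened to be running, though the log alone does not prove it.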

Do you think the problem is with the driver itself or the quirk that I applied to make it work?

I couldn’t say for sure, but odds are high it is with the driver itself. The question is what base kernel version was the driver designed for? For example, if it is running normally on a 4.x version kernel, there’s a lot that might go wrong putting it in a 3.x kernel. Just going from x86_64 to aarch64 would have effects on many drivers even if they are from the same base kernel version.

The driver and the quirk work on an IMX6 board. I actually found that patch on the IMX6 forum, where they’re using the same driver. Here’s the relevant post, with the driver and patch links in the comments:

https://community.nxp.com/thread/319973

IMX linux kernel is probably 3.10.x, so it shouldn’t be a kernel issue. And the architecture is also ARM based.

Correct me if I’m wrong, but I think the IMX6 is 32-bit ARMv7, while the JTX1 is 64-bit ARMv8-a. Despite a lot of similarities, they are still different architectures (ARMv8 can enter a 32-bit compatibility mode to execute older 32-bit ARM code, but normal operation uses a different instruction set). There are likely some differences in DMA between the two, but I couldn’t tell you what to look for. I’m not sure what would be required for a “proper” port to ARMv8-a.

Our company also manufactures a TW6869 based frame grabber “C351”, or DarkCrystal SD Capture Mini-PCIe Quad, which can capture 4 SD video streams simultaneously.

http://www.avermedia.com/professional/product/c351/overview

I have tested C351 on Jetson TX1 running L4T R24.1 with our own proprietary driver, and it seems to work fine. Here’s a “lspci -vvv” output of our C351 frame grabber on Jetson TX1, in case it helps.

01:00.0 Non-VGA unclassified device: Intersil Techwell Device 6869 (rev 01)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 130
        Region 0: Memory at 20000000 (32-bit, prefetchable) 
        Capabilities: <access denied>
        Kernel driver in use: TW6869

This probably wouldn’t help for the other frame grabber, but it is still interesting to compare. Your lspci info will be truncated though unless called with “sudo”. Then you could see things like how fast data lanes are running.
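As a rough sketch, the negotiated link speed and width can be pulled out of the full (sudo) output with a small filter. It assumes the “LnkSta:” line format shown in the listings in this thread; the helper name “link_status” is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract negotiated PCIe link speed and width from
# "lspci -vvv" text on stdin. Assumes a line of the form
#   LnkSta: Speed 2.5GT/s, Width x1, ...
# The "LnkSta2:" line is not matched because of the colon after "LnkSta".
link_status() {
    sed -n 's/^[[:space:]]*LnkSta:[[:space:]]*Speed \([^,]*\), Width \(x[0-9][0-9]*\).*/\1 \2/p'
}

# Typical use (slot address taken from the listings in this thread):
#   sudo lspci -vvv -s 01:00.0 | link_status
```

Without sudo the capabilities section is replaced by “<access denied>”, so the filter prints nothing.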

Did you make any changes to the kernel to make it work? Or does it work as expected by just compiling from source and loading it on the TX1? Did you have to add a quirk like I did? In my case the kernel didn’t recognize the TW6869 class without it.

No, I don’t need to make any change to the kernel (3.10.96-tegra).

I do need to do the following to make our C351 driver work on Jetson TX1 though:

  1. Add “vmalloc=320M cma=64M coherent_pool=48M” to kernel cmdline option, since our C351 driver would need those resources to work properly.
  2. Load these 2 kernel modules: “videobuf-vmalloc.ko” and “videobuf-dma-sg.ko”.
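A small sketch for checking whether the running kernel was actually booted with those options. The helper name “missing_cmdline_opts” is made up for illustration, and the option list is simply the one given above for the C351 driver; another driver may need different values or none at all.

```shell
#!/bin/sh
# Hypothetical helper: print each required key that has no "key=value"
# entry in the given kernel command line.
# $1 is the command line text, the remaining arguments are the keys.
# The body runs in a subshell so its variables stay local.
missing_cmdline_opts() (
    cmdline=" $1 "
    shift
    for key in "$@"; do
        case "$cmdline" in
            *" $key="*) ;;                 # present, nothing to report
            *) printf '%s\n' "$key" ;;     # absent
        esac
    done
)

# Typical use on the TX1:
#   missing_cmdline_opts "$(cat /proc/cmdline)" vmalloc cma coherent_pool
# The two helper modules can then be loaded with modprobe, for example:
#   sudo modprobe videobuf-vmalloc && sudo modprobe videobuf-dma-sg
```

An empty result means all the listed options are present on the command line.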

Here’s the more comprehensive version of “lspci -vvv” for your reference.

01:00.0 Non-VGA unclassified device: Intersil Techwell Device 6869 (rev 01)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 130
        Region 0: Memory at 20000000 (32-bit, prefetchable) 
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Kernel driver in use: TW6869

Interesting…both the original device and the listed working device are “Intersil Techwell Device 6869 (rev 01)” on lspci. So even if they are different devices, they use the same “chipset”. Both also show a TW6869 kernel driver in use (“tw6869” for the original poster, “TW6869” for the working device). However, the original poster had to compile the driver from an outside source, and ended up with a probable threading issue…@jkjung, is there any way the original poster could get a compiled kernel module of your driver to try? I would bet that with that, plus the kernel command line edits you gave, his camera would work.

On an unrelated note about PCIe which I’ve noticed and wonder about for gen. 1 devices sometimes not showing up on lspci, it seems de-emphasis may be handled incorrectly in the root complex. In gen. 1 de-emphasis is fixed at -3.5dB, it wasn’t until gen. 2 that -6dB was added as an option. The basic idea seems to be that the increased de-emphasis would be used to support longer traces for PCIe devices which were physically further from the root complex. A gen. 1 device would be “hard coded” to behave with expectations of -3.5dB. A motherboard supporting gen. 2 would have a fixed -3.5dB for a PCIe slot close to the root complex, and would have a fixed -6dB de-emphasis for PCIe slots further away (mixing -3.5 and -6dB wouldn’t break things, but correct matching would improve signal). It wasn’t until gen. 3 that de-emphasis became “adjustable” with the endpoint participating in discovery of best de-emphasis. What I’m wondering about is whether the eye pattern is actually better at -6dB for this slot which is close to root complex, or if -3.5dB would be better? If it turns out that -3.5dB is better, then using -6dB could be part of the reason why spread spectrum would cause some of the cheaper PCIe cards to not quite show up.

I’m still confused about how the driver works without a patch to the drivers/pci/quirks.c file. There’s no entry for TW686* PCI cards there, so no class would be assigned to those cards, which is exactly the issue I had earlier (I was getting a PCI type 0 class 0 error in dmesg while loading). Here’s the quirk that was eventually added to the Linux kernel in a later version:

https://github.com/torvalds/linux/commit/3657cebda5eb9dc1c4c6a0ea5b38bfef70aea50a

I have it working now, somehow. I built the driver as a module so that it doesn’t load with the kernel. The system boots up perfectly without any errors (yet). Then I insert the module with insmod and it works as expected. I still haven’t tested its robustness, but it seems to work this way. I’ll probably add the insmod command to a startup script so it loads automatically after boot. Thanks for all your help @linuxdev and @jkjung.
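A rough sketch of what such a startup snippet could look like. The module path is a placeholder and the module name “tw6869” is assumed from the “Kernel driver in use” line; both would need adjusting to the actual build.

```shell
#!/bin/sh
# Sketch of a late-boot loader for the frame grabber module.
# MODULE_PATH is a placeholder, not a real location: point it at the
# tw6869.ko that was actually built.
MODULE_NAME=tw6869
MODULE_PATH=${MODULE_PATH:-/path/to/tw6869.ko}

# Succeed if the named module appears in a /proc/modules-style listing.
# $1 is the module name, $2 the listing file (defaults to /proc/modules).
module_listed() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Load the module once; do nothing if it is already present.
load_tw6869() {
    if module_listed "$MODULE_NAME"; then
        echo "$MODULE_NAME already loaded"
    else
        insmod "$MODULE_PATH"
    fi
}

# Call load_tw6869 from a script that runs as root late in boot
# (for example /etc/rc.local, where that file is used).
```

Checking the module list first keeps the script safe to run twice, since insmod fails if the module is already loaded.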

I would still be interested to know if depmod and modprobe fail, while insmod at a later time works. Glad it works now though.

I noticed your company makes interesting products. Have you tested the CE511-HN on TX1? It seems to be a PCIe x4 card.