tw686x driver issue on JetPack 4.2

Hi everyone,
We are using JetPack 4.2 on a TX2.
We found that the kernel includes the tw6869 driver in the drivers/media/pci/tw686x directory.
We compiled it as a module, tw686x.ko.
However, inserting the tw686x.ko module crashes the kernel.
We have tried the following methods, and all of them crash the kernel:
1) insmod tw686x.ko
2) insmod tw686x.ko dma_mode=contig
3) insmod tw686x.ko dma_mode=sg

Could anyone give us some guidance on how to debug this? Thanks a lot!
Here is the crash log:

nvidia@nvidia-desktop:~$ sudo insmod tw686x.ko 
[sudo] password for nvidia: 
nvidia@nvidia-desktop:~$ dmesg | tail
[   10.098330] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   10.154890] CFGP2P-ERROR) wl_cfgp2p_add_p2p_disc_if : 
[   10.154893] P2P interface registered
[   10.165385] WLC_E_IF: NO_IF set, event Ignored
[   10.175133] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   12.023158] fuse init (API version 7.26)
[   12.074332] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   12.074342] Bluetooth: BNEP socket layer initialized
[   33.131499] tw6869: PCI 0000:01:00.0, IRQ 381, MMIO 0x40100000 (memcpy mode)
[   33.131543] tw686x 0000:01:00.0: enabling device (0000 -> 0002)
nvidia@nvidia-desktop:~$ [   38.448347] CPU4: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000102, esr=bf40c000
[   38.448361] CPU5: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000103, esr=bf40c000
[   38.448376] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf40c000
[   38.644638] ROC:IOB Machine Check Error:
[   38.738893] CPU5: SError detected, daif=140, spsr=0x40000045, mpidr=80000103, esr=bf40c000
[   38.741516] ROC:IOB Machine Check Error:
[   38.741616] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf000002
[   38.935148] ROC:CCE Machine Check Error:
[   38.935437] ROC:IOB Machine Check Error:
[   38.935517] CPU4: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000102, esr=bf000002
[   39.029321] tegra-pcie 10003000.pcie-controller: PCIE: Transcation timeout, signature: dead2009
[   39.032017] ROC:IOB Machine Check Error:
[   39.032096] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf00c002
[   39.225706] ROC:IOB Machine Check Error:
[   39.229731] Bad mode in Error handler detected on CPU4, code 0xbf000002 -- SError
[   39.419308] ROC:IOB Machine Check Error:
[   39.419317]  Address Type = Secure DRAM
[   39.419331]  Address = 0x0 (Unknown Device)
[   39.419386] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf00c002
[   39.516208] ROC:IOB Machine Check Error:
[   39.516220]  Address Type = Secure DRAM
[   39.516246]  Address = 0x0 (Unknown Device)
[   39.516302] CPU5: SError detected, daif=140, spsr=0x40000045, mpidr


nvidia@nvidia-desktop:~$ sudo insmod tw686x.ko dma_mode=contig
[sudo] password for nvidia: 
nvidia@nvidia-desktop:~$ [   31.699053] CPU4: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000102, esr=bf40c000
[   31.699061] CPU5: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000103, esr=bf40c000
[   31.699068] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf40c000
[   31.894259] ROC:IOB Machine Check Error:
[   31.988348] CPU5: SError detected, daif=140, spsr=0x40000045, mpidr=80000103, esr=bf40c000
[   31.991039] ROC:IOB Machine Check Error:
[   31.991112] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf000002
[   32.184671] ROC:CCE Machine Check Error:
[   32.184931] ROC:IOB Machine Check Error:
[   32.184992] CPU4: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000102, esr=bf000002
[   32.278849] tegra-pcie 10003000.pcie-controller: PCIE: Transcation timeout, signature: dead2009
[   32.281540] ROC:IOB Machine Check Error:
[   32.281601] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf00c002
[   32.475225] ROC:IOB Machine Check Error:
[   32.479214] Bad mode in Error handler detected on CPU4, code 0xbf000002 -- SError
[   32.668823] ROC:IOB Machine Check Error:
[   32.668828]  Address Type = Secure DRAM
[   32.668835]  Address = 0x0 (Unknown Device)
[   32.668884] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf00c002
[   32.862492] ROC:IOB Machine Check Error:
[   32.862498]  Address Type = Secure DRAM
[   32.862512]  Address = 0x0 (Unknown Device)

It would be better to consult the vendor to check whether this driver supports ARM systems.

Hello, I have the same problem, but I have an extra question that can only be answered by NVIDIA developers. I did some checking on where errors of this type come from:

CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf00c002

and I found this happens in the function:

asmlinkage void handle_serr(unsigned long daif, unsigned long spsr, struct pt_regs *regs)

which is found in the file arch/arm64/kernel/traps.c.
However, when I compare this with the original kernel code from GitHub, I can't find this function, so it seems it was added by NVIDIA developers. Furthermore, I can't find this function being called anywhere else in the code.
So I was wondering how it gets called in the first place. Can someone answer that?
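
As a side note on the log line quoted above, the esr value can be split into its fields by hand. The following is a minimal userspace sketch, assuming the value follows the standard AArch64 ESR_ELx layout (exception class in bits [31:26], IL in bit 25, ISS in bits [24:0]). The struct and helper names (esr_fields, esr_decode, esr_class_name) are made up for illustration and are not taken from the NVIDIA kernel sources.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the three top-level fields of an ESR_ELx value. */
struct esr_fields {
    unsigned ec;   /* bits [31:26]: exception class */
    unsigned il;   /* bit 25: instruction length flag */
    uint32_t iss;  /* bits [24:0]: instruction specific syndrome */
};

/* Split a 32-bit ESR value into EC, IL and ISS. */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;
    f.ec  = (esr >> 26) & 0x3f;
    f.il  = (esr >> 25) & 0x1;
    f.iss = esr & 0x01ffffff;
    return f;
}

/* Only the class relevant to this thread is named; 0x2f is the SError class. */
static const char *esr_class_name(unsigned ec)
{
    return ec == 0x2f ? "SError interrupt" : "other";
}

/* Print one decoded value in a single line. */
static void esr_print(uint32_t esr)
{
    struct esr_fields f = esr_decode(esr);
    printf("esr=0x%08x ec=0x%02x (%s) il=%u iss=0x%07x\n",
           (unsigned)esr, f.ec, esr_class_name(f.ec), f.il, (unsigned)f.iss);
}
```

Calling esr_print(0xbf40c000) or esr_print(0xbf00c002) with the values from the log shows that they share the SError exception class and differ only in the ISS bits. What those ISS bits mean is implementation specific, so this sketch does not try to interpret them.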

Try using GFP_DMA instead of GFP_KERNEL. This limits the MSI address to the 32-bit region, which is needed for some PCIe endpoints that only support 32-bit MSIs.
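
To illustrate what "within the 32-bit region" means for a buffer, here is a small userspace sketch. It is not kernel code and not part of the tw686x driver. The helper names (addr_mask, buffer_fits_mask) are hypothetical, and addr_mask only mimics the idea behind the kernel's DMA_BIT_MASK() macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set, similar in spirit to DMA_BIT_MASK(). */
static uint64_t addr_mask(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Return true if every byte of the buffer [addr, addr + size) can be
 * addressed using only `bits` address bits.
 */
static bool buffer_fits_mask(uint64_t addr, uint64_t size, unsigned bits)
{
    if (size == 0)
        return false;               /* treat an empty buffer as invalid */

    uint64_t last = addr + size - 1;
    if (last < addr)
        return false;               /* address arithmetic wrapped around */

    return last <= addr_mask(bits);
}
```

For example, buffer_fits_mask(0x40100000, 0x1000, 32) is true, while a buffer starting at 0x100000000 does not fit, because its address needs a 33rd bit. A device that can only generate 32-bit addresses cannot reach the second buffer.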

Wow, that fixed it! Thank you so much!
I simply changed the GFP_KERNEL allocations in the tw686x_probe function to GFP_DMA, and now everything works like a charm.
I've also seen the GFP_DMA32 flag being used. Wouldn't it have been even better to use that one?

@gert
I believe GFP_DMA32 is meant for x86 systems; on ARM systems the two flags may be the same.