PCIe WiFi module doesn't work

We installed an external PCIe WiFi module on a TX2. If we don’t change anything in the DTB, we can ping the network. But when we use scp to transfer a very large file (over 1 GB), we get an arm-smmu issue, just like the one described in https://devtalk.nvidia.com/default/topic/1035808/jetson-tx2/altera-fpga-dma-to-tx2-via-pcie-problem/. After following the instructions there to disable the SMMU (by changing the DTB), I get the error messages below after boot. Has anybody met this kind of issue before?

[ 35.356379] hif_pci_probe: ramdump base 0xffffffc1dda00000 size 2095136
[ 35.368471] R0: wlan: [1045:E :HDD] hdd_apply_cfg_ini: Reg Parameter gRrmOperChanMax > allowed Maximum [8 > 7]. Enforcing Default= 4
[ 35.382876] R0: wlan: [1045:E :HDD] hdd_apply_cfg_ini: Reg Parameter gRrmNonOperChanMax > allowed Maximum [8 > 7]. Enforcing Default= 4
[ 35.398800] R0: wlan: [1045:E :HDD] hdd_apply_cfg_ini: Reg Parameter gtsf_gpio_pin > allowed Maximum [255 > 254]. Enforcing Default= 255
[ 35.413618] R0: wlan: [1045:E :HDD] hdd_apply_cfg_ini: Reg Parameter 5g_rssi_boost_threshold < allowed Minimum [4294967236 < 18446744073709551546]. Enforcing Default= 18446744073709551556
[ 35.432728] R0: wlan: [1045:E :HDD] hdd_apply_cfg_ini: Reg Parameter 5g_rssi_penalize_threshold < allowed Minimum [4294967226 < 18446744073709551536]. Enforcing Default= 18446744073709551546
[ 35.452298] R0: wlan: [1045:E :HDD] Name = [gEnableHostapdEdcaLocal] Value = [0]
[ 35.461023] R0: wlan: [1045:E :HDD] Name = [g_sta_change_cc_via_beacon] Value = [0]
[ 35.470984] R0: [insmod][17:00:22.093996] wlan: [1045:E :HDD] Name = [gEnableHostapdEdcaLocal] Value = [0]
[ 35.482173] R0: [insmod][17:00:22.105186] wlan: [1045:E :HDD] Name = [g_sta_change_cc_via_beacon] Value = [0]
[ 35.499454] NUM_DEV=1 FWMODE=0x2 FWSUBMODE=0x0 FWBR_BUF 0
[ 35.509121] ol_download_firmware: Using 0x1234 for the remainder of init
[ 35.536659] R0: [insmod][17:00:22.159667] wlan: [1045:E :VOS] __ol_transfer_bin_file: transferring file: otp30.bin size 24209 bytes done!
[ 35.550600] ol_download_firmware: chip_id:0x5030000 board_id:0x0
[ 35.559331] Board extended Data download address: 0x0
[ 35.571375] R0: [insmod][17:00:22.194386] wlan: [1045:E :VOS] __ol_transfer_bin_file: transferring file: bdwlan30.bin size 8124 bytes done!
[ 35.585708] __ol_transfer_bin_file: no Setup file defined
[ 36.120760] R0: [insmod][17:00:22.743768] wlan: [1045:E :VOS] __ol_transfer_bin_file: transferring file: qwlan30.bin size 653005 bytes done!
[ 36.135179] +HTCCreate .. HIF :ffffffc1e23e1000 
[ 36.141465] -HTCCreate (0xffffffc1e5b33000) 
[ 36.147645] R0: [insmod][17:00:22.770656] wlan: [1045:F :WDA] WMA --> wmi_unified_attach - success
[ 36.158169] ol_if_dfs_attach: called; ptr=ffffffc1e0815fa8, radar_info=ffffffc1e0ee3628
[ 36.167867] R0: [insmod][17:00:22.790878] wlan: [1045:E :SAP] dfs_init_radar_filters[217]: Unknown dfs domain 0 
[ 36.179634] send_filled_buffers_to_user: Send Failed -3 drop_count = 1
Segmentation fault
[ 36.179660] +HWT
[ 36.179663] pipe_num:0 pipe_info:0xffffffc1e23e10b8
worker@master:~$ [ 36.179703] pipe_num:3 pipe_info:0xffffffc1e23e1220
[ 36.179712] pipe_num:4 pipe_info:0xffffffc1e23e1298
[ 36.179777] Unable to handle kernel paging request at virtual address ffffffbfe08cf600
[ 36.179778] pgd = ffffffc1e4b12000
[ 36.179781] [ffffffbfe08cf600] *pgd=0000000000000000, *pud=0000000000000000
[ 36.179784] Internal error: Oops: 96000146 [#1] PREEMPT SMP
[ 36.179793] Modules linked in: wlan(+) ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat iptable_filter ip_tables pci_tegra bluedroid_pm
[ 36.179796] CPU: 5 PID: 1045 Comm: insmod Not tainted 4.4.38+ #2
[ 36.179797] Hardware name: quill (DT)
[ 36.179799] task: ffffffc1e2063200 ti: ffffffc1e0ee0000 task.ti: ffffffc1e0ee0000
[ 36.179805] PC is at __dma_inv_range+0x38/0x50
[ 36.179808] LR is at arm_dma_sync_single_for_device+0x2c/0x38
[ 36.179809] pc : [<ffffffc00009a6a8>] lr : [<ffffffc000095264>] pstate: 40000045
[ 36.179810] sp : ffffffc1e0ee3620
[ 36.179812] x29: ffffffc1e0ee3620 x28: ffffffc1e2b26600 
[ 36.179814] x27: 00000000608cf600 x26: ffffffc1e2859e00 
[ 36.179816] x25: 00000000f005ba11 x24: 0000000000000000 
[ 36.179818] x23: 0000000000000800 x22: ffffffc1e23e1158 
[ 36.179819] x21: ffffffc07b398800 x20: ffffffc1e23e1148 
[ 36.179821] x19: ffffffc1e23e1130 x18: 00000000fffffff0 
[ 36.179822] x17: 0000000000000000 x16: 000000000002e7a0 
[ 36.179824] x15: ffffffc001298858 x14: ffffffc00143dbd8 
[ 36.179826] x13: ffffffc001298000 x12: ffffffc00143c000 
[ 36.179827] x11: 0000000000000000 x10: 0000000000000068 
[ 36.179829] x9 : 0000000000000000 x8 : ffffffc000095270 
[ 36.179830] x7 : ffffffc1e200c098 x6 : ffffffc1e08cf600 
[ 36.179832] x5 : ffffffc00127f000 x4 : 0000000000000800 
[ 36.179833] x3 : 000000000000003f x2 : 0000000000000040 
[ 36.179835] x1 : ffffffbfe08cfe00 x0 : ffffffbfe08cf600 
[ 36.179835] 
[ 36.179837] Process insmod (pid: 1045, stack limit = 0xffffffc1e0ee0020)
[ 36.179838] Call trace:
[ 36.179840] [<ffffffc00009a6a8>] __dma_inv_range+0x38/0x50
[ 36.180440] [<ffffffbffc289288>] hif_post_recv_buffers_for_pipe+0x118/0x4f0 [wlan]
[ 36.181220] [<ffffffbffc2896b4>] hif_post_recv_buffers+0x54/0x88 [wlan]
[ 36.181854] [<ffffffbffc28b12c>] HIFStart+0x5c/0x128 [wlan]
[ 36.182420] [<ffffffbffc27f9f0>] HTCWaitTarget+0x30/0x278 [wlan]
[ 36.182977] [<ffffffbffc205684>] vos_open+0x4a4/0x8f0 [wlan]
[ 36.183513] [<ffffffbffc087514>] hdd_wlan_startup+0x67c/0x1f50 [wlan]
[ 36.184065] [<ffffffbffc2901e4>] hif_pci_probe+0x644/0x7e0 [wlan]
[ 36.184072] [<ffffffc000394918>] pci_device_probe+0xa0/0x118
[ 36.184076] [<ffffffc000589424>] driver_probe_device+0xcc/0x408
[ 36.184078] [<ffffffc0005897fc>] __driver_attach+0x9c/0xa0
[ 36.184081] [<ffffffc000587334>] bus_for_each_dev+0x64/0xa0
[ 36.184083] [<ffffffc000588d20>] driver_attach+0x20/0x28
[ 36.184085] [<ffffffc000588850>] bus_add_driver+0x1d0/0x298
[ 36.184088] [<ffffffc00058a5e0>] driver_register+0x60/0xf8
[ 36.184090] [<ffffffc000393700>] __pci_register_driver+0x38/0x40
[ 36.184657] [<ffffffbffc28fb00>] hif_register_driver+0x18/0x38 [wlan]
[ 36.185197] [<ffffffbffc07ae80>] hdd_hif_register_driver+0x30/0x120 [wlan]
[ 36.185714] [<ffffffbffc5f2104>] hdd_module_init+0x104/0x230 [wlan]
[ 36.185718] [<ffffffc000081c70>] do_one_initcall+0xd0/0x1d8
[ 36.185722] [<ffffffc00017084c>] do_init_module+0x64/0x1b0
[ 36.185726] [<ffffffc000122a88>] load_module+0xec0/0x11b8
[ 36.185727] [<ffffffc000123008>] SyS_finit_module+0xc8/0xf8
[ 36.185730] [<ffffffc000084ff0>] el0_svc_naked+0x24/0x28
[ 36.185948] ---[ end trace d9f04c3ecf264a17 ]---

Can you please share the make and model of the WiFi device you are using?
Also, it would be great if you could share the output of ‘sudo lspci -vv’.

We are using the i1465-sp, whose main chipset is the QCA6574A-1.
The output of sudo lspci -vv is shown below.

worker@master:~$ lspci
00:03.0 PCI bridge: NVIDIA Corporation Device 10e6 (rev a1)
01:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
worker@master:~$ sudo lspci -vv
00:03.0 PCI bridge: NVIDIA Corporation Device 10e6 (rev a1) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 388
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Memory behind bridge: 50200000-503fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=30us PortTPowerOnTime=70us
	Kernel driver in use: pcieport

01:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 388
	Region 0: Memory at 50200000 (64-bit, non-prefetchable) [disabled] 
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit-
		Address: 00000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [148 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [168 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=50us PortTPowerOnTime=10us
	Kernel modules: wlan

Do you need more info?

Hi,
We haven’t verified this particular card.
BTW, I see that this card supports Active State Power Management (ASPM) L1 Sub-States. I’m not sure whether having those enabled is causing any issue.
Can you please disable ASPM completely and see whether you still observe the issue?
Running echo performance > /sys/module/pcie_aspm/parameters/policy (as root) would disable all ASPM states.
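
To confirm that the policy change took effect, the policy file can be read back. The following is only a minimal sketch: it assumes the usual sysfs layout in which the file lists every policy and marks the active one with square brackets (for example "default [performance] powersave"). The helper name active_aspm_policy is hypothetical; the path is the one quoted above.

```python
import re
from pathlib import Path

# Path quoted in the reply above; it exists only on kernels built with PCIe ASPM support.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the policy name marked with [brackets], or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    if POLICY_PATH.exists():
        print("active ASPM policy:", active_aspm_policy(POLICY_PATH.read_text()))
    else:
        print("no pcie_aspm policy file on this system")
```

If the printed policy is not "performance" after the echo, the write probably did not succeed (for example because the redirection was not executed as root).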

After some testing and investigation, we finally decided to keep arm-smmu enabled.
But when we do a large file transfer, we get the following crash message.

[  522.884358] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x805bc000, fsynr=0x200003, cb=22, sid=17(0x11 - AFI), pgd=2568e0003, pud=2568e0003, pmd=24a591003, pte=0
[  522.899791] (255) csr_afir: EMEM address decode error
[  522.904881]   status = 0x2032700e; addr = 0x3ffffffc0
[  522.909962]   secure: yes, access-type: read
[  522.914253] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[  522.924086] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[  522.933914] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
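
For anyone comparing several of these faults, the fields of the context fault line can be pulled out with a small script. This is a sketch only; the function name parse_context_fault is hypothetical, and it assumes that the table entries (pgd, pud, pmd, pte) are printed in hex without a 0x prefix while cb and sid are decimal, which is what the line above appears to show.

```python
import re

# Fault line copied from the log above (timestamp removed).
FAULT_LINE = (
    "arm-smmu 12000000.iommu: Unhandled context fault: iova=0x805bc000, "
    "fsynr=0x200003, cb=22, sid=17(0x11 - AFI), pgd=2568e0003, "
    "pud=2568e0003, pmd=24a591003, pte=0"
)


def parse_context_fault(line):
    """Extract the numeric fields of an 'Unhandled context fault' line."""
    fields = {}
    # Assumption: these fields are hex, with or without a leading 0x.
    for key in ("iova", "fsynr", "pgd", "pud", "pmd", "pte"):
        m = re.search(rf"\b{key}=(?:0x)?([0-9a-fA-F]+)", line)
        if m:
            fields[key] = int(m.group(1), 16)
    # Assumption: context bank and stream ID are decimal (sid=17 is shown as 0x11).
    for key in ("cb", "sid"):
        m = re.search(rf"\b{key}=(\d+)", line)
        if m:
            fields[key] = int(m.group(1))
    return fields


if __name__ == "__main__":
    fault = parse_context_fault(FAULT_LINE)
    print("iova:", hex(fault["iova"]), "cb:", fault["cb"], "sid:", fault["sid"])
    print("last-level entry empty:", fault["pte"] == 0)
```

A pte of 0 suggests that no last-level mapping existed for that iova at the moment of the fault, while the pgd, pud and pmd levels were populated.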

My first thought is that the SMMU doesn’t map iova 0x805bc000 correctly. After checking the ptdump of 10003000.pcie-controller, I found that 0x805bc000 doesn’t belong to the mapped iova range of 10003000.pcie-controller.
For reference, the dump reports total mapped iova=10224KB. Where can I enlarge that part of the iova space, or how can I reduce the iova space used by the PCIe driver, so that the two match each other?

Can you tell me whether my understanding is correct?

root@master:/sys/kernel/debug/12000000.iommu/masters/10003000.pcie-controller/cb022# cat ptdump 
va=0x0000000080000000 pa=0x00000000f8000000 *pte=0x00600000f8000f43
va=0x0000000080002000 pa=0x00000000f8002000 *pte=0x00600000f8002f43
va=0x0000000080003000 pa=0x00000000f8003000 *pte=0x00600000f8003f43
va=0x0000000080005000 pa=0x00000000f8001000 *pte=0x00600000f8001f43
va=0x0000000080007000 pa=0x00000000f8004000 *pte=0x00600000f8004f43
...............................
va=0x00000000825ed000 pa=0x00000000f335b000 *pte=0x00600000f335bf43
va=0x00000000825ef000 pa=0x0000000250a2a000 *pte=0x0060000250a2af43
va=0x00000000825f1000 pa=0x0000000250a37000 *pte=0x0060000250a37f43
va=0x00000000825f3000 pa=0x00000000f3326000 *pte=0x00600000f3326f43
total mapped iova=10224KB
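
The check described above (is the faulting iova present in the ptdump?) can be scripted instead of done by eye. This is a rough sketch under stated assumptions: each ptdump line is taken to describe one 4 KB page, and the names mapped_pages and lookup are hypothetical. With 4 KB pages, the reported 10224KB corresponds to 10224 / 4 = 2556 mapped pages.

```python
import re

PAGE_SIZE = 0x1000  # assumption: 4 KB pages, consistent with the va values in the dump

PTDUMP_LINE = re.compile(r"va=0x([0-9a-f]+)\s+pa=0x([0-9a-f]+)\s+\*pte=0x([0-9a-f]+)")


def mapped_pages(ptdump_text):
    """Return {page-aligned iova: physical address} for every line of a ptdump."""
    pages = {}
    for m in PTDUMP_LINE.finditer(ptdump_text):
        pages[int(m.group(1), 16)] = int(m.group(2), 16)
    return pages


def lookup(pages, iova):
    """Return the physical address for an iova, or None if its page is not mapped."""
    base = iova & ~(PAGE_SIZE - 1)
    if base not in pages:
        return None
    return pages[base] + (iova - base)


# A few lines copied from the dump above; a real check needs the complete file.
SAMPLE = """\
va=0x0000000080000000 pa=0x00000000f8000000 *pte=0x00600000f8000f43
va=0x0000000080002000 pa=0x00000000f8002000 *pte=0x00600000f8002f43
va=0x0000000080003000 pa=0x00000000f8003000 *pte=0x00600000f8003f43
"""

if __name__ == "__main__":
    pages = mapped_pages(SAMPLE)
    for iova in (0x80002010, 0x80001000, 0x805bc000):
        pa = lookup(pages, iova)
        print(hex(iova), "->", hex(pa) if pa is not None else "not mapped in this sample")
```

Feeding the complete ptdump output to mapped_pages would show whether the page containing 0x805bc000 was mapped when the dump was taken. Since DMA mappings are usually created and torn down dynamically, a dump taken after the fault may not reflect the state at the moment of the fault.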