Ath9k driver causes csr_afir: EMEM address decode error (FIXED)

Hi,

I am working with an Atheros AR9580 card and the Jetson TX1 developer kit.

I have compiled the stock Linux ath9k driver, but I am continually seeing a significant number of “EMEM address decode error” messages:

[ 73.355182] mc-err: (0) csr_afir: EMEM address decode error
[ 73.360788] mc-err: status = 0x2000500e; addr = 0x7a743f40
[ 73.366523] mc-err: secure: no, access-type: read, SMMU fault: none
[ 73.415047] mc-err: (0) csr_afir: EMEM address decode error
[ 73.420702] mc-err: status = 0x2004600e; addr = 0x7a744180
[ 73.426628] mc-err: secure: no, access-type: read, SMMU fault: none
[ 73.474921] mc-err: Too many MC errors; throttling prints
[ 114.065542] mc-err: (0) csr_afir: EMEM address decode error
[ 114.071227] mc-err: status = 0x2004100e; addr = 0x7a744840
[ 114.077211] mc-err: secure: no, access-type: read, SMMU fault: none
[ 114.093414] mc-err: (0) csw_afiw: EMEM address decode error
[ 114.099179] mc-err: status = 0x20011031; addr = 0x7a92e060
[ 114.105003] mc-err: secure: no, access-type: write, SMMU fault: none
[ 114.118055] mc-err: (0) csw_afiw: EMEM address decode error
[ 114.123684] mc-err: status = 0x20011031; addr = 0x78188060
[ 114.129543] mc-err: secure: no, access-type: write, SMMU fault: none
[ 114.185170] mc-err: (0) csr_afir: EMEM address decode error
[ 114.190835] mc-err: status = 0x2004600e; addr = 0x7b5baf80
[ 114.196815] mc-err: secure: no, access-type: read, SMMU fault: none

I have tried to link the address back to the driver code, but I can't find anything in the driver that uses this address range. Is this a problem with the driver or something else? (I find it hard to believe that it's the driver, since this driver has been in use for quite some time.) I have connected the card to the M.2 connector on the dev board, which I believe is port 1 (1 lane).
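One way to start narrowing this down is to tabulate the faults by memory client and address instead of reading them one at a time. The following is a minimal Python sketch, assuming the three-line mc-err record layout shown in the log above. The names `parse_mc_errors` and `summarize` are made up for this illustration and are not part of any NVIDIA tool.

```python
import re
from collections import defaultdict

# Patterns for the three lines that make up one mc-err record in the log above.
HEADER = re.compile(r"mc-err: \(\d+\) (\w+): (.+)")
STATUS = re.compile(r"mc-err: status = (0x[0-9a-fA-F]+); addr = (0x[0-9a-fA-F]+)")
ACCESS = re.compile(r"access-type: (\w+)")


def parse_mc_errors(text):
    """Return one dict per mc-err record found in the dmesg text."""
    records = []
    current = None
    for line in text.splitlines():
        match = HEADER.search(line)
        if match:
            # A header line starts a new record.
            current = {"client": match.group(1), "error": match.group(2).strip()}
            records.append(current)
            continue
        if current is None:
            continue
        match = STATUS.search(line)
        if match:
            current["status"] = int(match.group(1), 16)
            current["addr"] = int(match.group(2), 16)
            continue
        match = ACCESS.search(line)
        if match:
            current["access"] = match.group(1)
    return records


def summarize(records):
    """Map each client name to (fault count, lowest address, highest address)."""
    by_client = defaultdict(list)
    for record in records:
        if "addr" in record:
            by_client[record["client"]].append(record["addr"])
    return {c: (len(a), min(a), max(a)) for c, a in by_client.items()}


# A few lines copied from the log in this post.
SAMPLE = """\
[ 73.355182] mc-err: (0) csr_afir: EMEM address decode error
[ 73.360788] mc-err: status = 0x2000500e; addr = 0x7a743f40
[ 73.366523] mc-err: secure: no, access-type: read, SMMU fault: none
[ 73.474921] mc-err: Too many MC errors; throttling prints
[ 114.093414] mc-err: (0) csw_afiw: EMEM address decode error
[ 114.099179] mc-err: status = 0x20011031; addr = 0x7a92e060
[ 114.105003] mc-err: secure: no, access-type: write, SMMU fault: none
"""

if __name__ == "__main__":
    for client, (count, low, high) in summarize(parse_mc_errors(SAMPLE)).items():
        print(f"{client}: {count} fault(s), addr {low:#x}..{high:#x}")
```

Feeding it the full `dmesg` output instead of `SAMPLE` shows at a glance which clients fault and over what address span. The throttled lines are never printed, so the counts are only a lower bound.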

Is there a way to get a stack trace of the section of code causing the error?

Any help is appreciated.

If it helps, here is the output from lspci:

01:00.0 Network controller: Qualcomm Atheros AR9580 Wireless Network Adapter (rev 01)
	Subsystem: Device 1c14:0059
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 130
	Region 0: Memory at 13000000 (64-bit, non-prefetchable) 
	[virtual] Expansion ROM at 20000000 [disabled] 
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L0s Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [300 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: ath9k

I don’t have one of these to test with, and do not know specifics…however, there is always firmware which goes along with wireless devices. Have you installed any required firmware files?

FYI, the lspci shows the device is at least working correctly with the PCIe.

There is no firmware for this device. I have it working on an older kernel (2.6.37) with the stock drivers on a different processor, but I am trying to upgrade the hardware.

Am I correct in understanding that the SMMU is used to map the virtual addresses from the hardware to physical memory space without the kernel intervention?

I do not know details on SMMU.

As for firmware, wireless devices are not your average device. Device access may work through default device configuration, but this is highly unlikely in this case. Kernel 2.6.37 is much older than the L4T 3.10 kernels…I would double check on firmware.

What happens is that firmware is loaded into the device, not into Linux…Linux only acts as a firmware server from which the device downloads its firmware. Changing the contents of /lib/firmware can change how the driver must access the device. Data offsets, sizes, and functions can all change when the firmware changes. Change those firmware access details and the driver is no longer valid…the driver would be accessing the wrong memory location and/or size. The 2.6.37 kernel had firmware, and the driver was matched to it…the question is whether the 3.10 driver needs updated firmware…likely it does.

I do not know which firmware is used for the Atheros…perhaps someone here knows. There is a strong chance that this causes memory error messages.

The ath9k is a memory mapped chip, rather than an offload cpu that requires a firmware download.

Refer to ath9k - Debian Wiki or https://wikidevi.com/wiki/Ath9k

“[Ath9k] does not require a binary HAL (hardware abstraction layer) and no firmware is required to be loaded from userspace.”

I am wondering if this could be the cause of my problems.

commit 878749aabadc3e12f57271a2c41581f7b46c3bb8
Author: Hiroshi Doyu <hdoyu@nvidia.com>
Date: Wed Oct 15 03:42:28 2014 +0300

iommu/tegra: of: disable IOMMU PCIe temporary

HACK: Disable IOMMU PCIe till dynamic loadable module issue is solved.

Bug 1561604
Bug 200168018

Change-Id: Id0960a4e7f6001175cc6dc21b8403d2865e51a1c
Signed-off-by: Martin Gao <marting@nvidia.com>
Reviewed-on: http://git-master/r/998410
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Reviewed-by: Vinayak Pane <vpane@nvidia.com>

diff --git a/drivers/iommu/of_tegra-smmu.c b/drivers/iommu/of_tegra-smmu.c
index 512fa6b..d71e89c 100644
--- a/drivers/iommu/of_tegra-smmu.c
+++ b/drivers/iommu/of_tegra-smmu.c
@@ -166,6 +166,7 @@ u64 tegra_smmu_of_get_swgids(struct device *dev,
 	u64 fixup, swgids = 0;
 
 	if (dev_is_pci(dev)) {
+		return SWGIDS_ERROR_CODE;
 		swgids = TEGRA_SWGROUP_BIT(AFI);
 		goto try_fixup;
 	}

Can you please give the DMA capabilities of this WiFi chip?
Is it capable of 64-bit addressing or only capable of 32-bit addresses?

I see from ath9k driver that the DMA hardware in the chip is only 32-bit capable.
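To make the 32-bit limit concrete, here is a small Python sketch of the arithmetic. `dma_bit_mask` mirrors what the kernel's `DMA_BIT_MASK(n)` macro computes. `dma_reachable` and the 64-byte buffer length are hypothetical and used only for illustration.

```python
def dma_bit_mask(bits):
    """Highest bus address an n-bit DMA engine can emit."""
    return (1 << bits) - 1


def dma_reachable(addr, length, bits=32):
    """True if the whole buffer [addr, addr + length) lies within the mask."""
    return addr >= 0 and length > 0 and addr + length - 1 <= dma_bit_mask(bits)


# Fault addresses copied from the mc-err lines earlier in this thread.
LOGGED = [0x7a743f40, 0x7a744180, 0x7a92e060, 0x78188060, 0x7b5baf80]

if __name__ == "__main__":
    print(f"32-bit mask: {dma_bit_mask(32):#x}")
    for addr in LOGGED:
        print(f"{addr:#x} reachable with 32 bits: {dma_reachable(addr, 64)}")
    # A buffer placed at the 4 GiB boundary is out of reach for a 32-bit engine.
    print(f"0x100000000 reachable with 32 bits: {dma_reachable(0x1_0000_0000, 64)}")
```

All of the logged addresses fit within 32 bits, so the faults are not simply the card being handed an address above 4 GiB. The sketch does not model whether those values are physical addresses or addresses that the SMMU is expected to translate.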
If you are using the 24.1 release, please apply the following changes and it should work:

--- a/arch/arm64/boot/dts/tegra210-soc-base.dtsi
+++ b/arch/arm64/boot/dts/tegra210-soc-base.dtsi
@@ -1252,6 +1252,7 @@
 			  0x82000000 0 0x13000000 0x0 0x13000000 0 0x0d000000   /* non-prefetchable memory (208 MiB) */
 			  0xc2000000 0 0x20000000 0x0 0x20000000 0 0x20000000>; /* prefetchable memory (512 MiB) */
 
+		iommus = <&smmu TEGRA_SWGROUP_AFI>;
 		status = "disabled";
 
 		pci@1,0 {

and

--- a/drivers/iommu/of_tegra-smmu.c
+++ b/drivers/iommu/of_tegra-smmu.c
@@ -166,7 +166,6 @@ u64 tegra_smmu_of_get_swgids(struct device *dev,
 	u64 fixup, swgids = 0;
 
 	if (dev_is_pci(dev)) {
-		return SWGIDS_ERROR_CODE;
 		swgids = TEGRA_SWGROUP_BIT(AFI);
 		goto try_fixup;
 	}

I am getting the opposite error on a Jetson with kernel 3.10.67. How would I resolve a 32-bit DMA error?

ath9k: 32-bit dma not available

ath9k: probe of 0000:01:00.0 failed with error -5
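
As a first step in reading that message: the kernel reports probe failures as negative errno values, so "error -5" can be decoded with the standard errno table. A minimal Python sketch (the helper name `describe_kernel_errno` is made up for this illustration):

```python
import errno
import os


def describe_kernel_errno(ret):
    """Map a negative kernel return code (e.g. -5) to its symbolic name and message."""
    code = abs(ret)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    name, message = describe_kernel_errno(-5)
    print(f"-5 is -{name}: {message}")
```

Here -5 maps to EIO, a generic I/O error. Together with the "32-bit dma not available" line, this suggests the driver's request for a 32-bit DMA mask was rejected by the platform during probe, rather than a fault in the card itself. That reading is an inference from the two log lines and has not been checked against the 3.10.67 source.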