SMMU Global Fault and EMEM Decode Error on Jetson Orin with Custom nveqos-based Ethernet Driver

Hello,

I am currently developing a high-performance EtherCAT Master Stack for the NVIDIA Jetson platform. As we have several customers looking to deploy our solution on Jetson-based hardware, we are working on writing our own low-level driver for the integrated Ethernet controller.

Specifically, we are targeting the EQOS controller handled by the nveqos driver, at base address 0x2310000. The DMA engine appears to start and attempts to read from an address, but the IOMMU reports an error.

Our module initializes and prints:

DMA buffer before align: VirtAddr 0x0000ffff8bb0a000, VirtAddrUncached 0x0000ffff8bb0a000, PhysAddr 0x0000000080800000
VirtAddr: DmaDtorBase 0x0000ffff8bb1f000, DmaBase 0x0000ffff8bb0a000
PhysAddr: DmaDtorBase 0x0000000080815000, DmaBase 0x0000000080800000
PhyInit(): Found PHY RTL8211B, id = 0x001CC916, address =0x00

When the DMA transfers start, dmesg shows:

[ 209.118108] atemsys: mmap: mapped IO memory, Phys:0x2310000 UVirt:0x0000ffff8c011000 Size:65536
[ 209.120356] atemsys: mmap: mapped DMA memory, Phys:0x0000000080800000 KVirt:0xffff800015bb5000 UVirt:0x0000ffff8bb0a000 Size:90112
[ 210.931909] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 210.931930] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 210.936722] tegra-mc 2c00000.memory-controller: unknown: secure read @0x000000ffffffff00: EMEM address decode error (EMEM decode error)
[ 210.936742] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: VPR violation ((null))
[ 210.936757] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: Route Sanity error ((null))
[ 210.936821] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 210.936826] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.033032] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.033044] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.037770] tegra-mc 2c00000.memory-controller: unknown: secure read @0x000000ffffffff00: EMEM address decode error (EMEM decode error)
[ 211.037779] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: VPR violation ((null))
[ 211.037786] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: Route Sanity error ((null))
[ 211.037824] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.037827] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.042566] tegra-mc 2c00000.memory-controller: unknown: secure read @0x000000ffffffff00: EMEM address decode error (EMEM decode error)
[ 211.042575] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: VPR violation ((null))
[ 211.042582] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: Route Sanity error ((null))
[ 211.042639] tegra-mc 2c00000.memory-controller: unknown: secure read @0x000000ffffffff00: EMEM address decode error (EMEM decode error)
[ 211.042654] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.042657] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.047385] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.047389] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.052106] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.052110] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.056816] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.056819] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.061532] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.061536] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 211.066239] arm-smmu 8000000.iommu: Unexpected global fault, this could be serious
[ 211.066241] arm-smmu 8000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000000, GFSYNR2 0x00000000
[ 213.937962] atemsys: dev_munmap: 0xffff800015bb5000 → 0x0000000080800000 (90112)
[ 213.939500] atemsys: device_release, pDevDesc = 0xffff00008b704c00
[ 213.939519] atemsys: 2310000.ethernet: Cleanup: pDevDesc = 0xffff00008b704c00

I have already tried removing the iommus and dma-coherent properties from the ethernet node (both are commented out below):

ethernet@2310000 {
	status = "okay";
	nvidia,mac-addr-idx = <0x05>;
	nvidia,phy-reset-gpio = <0xf9 0x35 0x00>;
	phy-mode = "rgmii-id";
	phy-handle = <0x14a>;
	nvidia,max-platform-mtu = <0x3fff>;
	compatible = "atemsys";
	atemsys-Ident = "DW3504";
	atemsys-Instance = <0x1>;
	reg = <0x00 0x2310000 0x00 0x10000 0x00 0x23d0000 0x00 0x10000 0x00 0x2300000 0x00 0x10000>;
	reg-names = "mac\0macsec-base\0hypervisor";
	interrupts = <0x00 0xc2 0x04 0x00 0xba 0x04 0x00 0xbb 0x04 0x00 0xbc 0x04 0x00 0xbd 0x04 0x00 0xbe 0x04 0x00 0xbf 0x04>;
	interrupt-names = "common\0vm0\0vm1\0vm2\0vm3\0macsec-ns-irq\0macsec-s-irq";
	resets = <0x06 0x11 0x06 0x16>;
	reset-names = "mac\0macsec_ns_rst";
	clocks = <0x06 0x120 0x06 0x20 0x06 0x22 0x06 0x21 0x06 0x23 0x06 0x08 0x06 0x46 0x06 0x17 0x06 0x19e 0x06 0x19f 0x06 0x19d>;
	clock-names = "pllrefe_vcoout\0eqos_axi\0eqos_rx\0eqos_ptp_ref\0eqos_tx\0axi_cbb\0eqos_rx_m\0eqos_rx_input\0eqos_macsec_tx\0eqos_tx_divider\0eqos_macsec_rx";
	interconnects = <0x5b 0x8e 0x5b 0x8f>;
	interconnect-names = "dma-mem\0write";
	//iommus = <0x10a 0x03>;
	nvidia,num-dma-chans = <0x08>;
	nvidia,num-mtl-queues = <0x08>;
	nvidia,mtl-queues = <0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07>;
	nvidia,dma-chans = <0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07>;
	nvidia,tc-mapping = <0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07>;
	nvidia,residual-queue = <0x01>;
	nvidia,rx-queue-prio = <0x02 0x01 0x30 0x48 0x00 0x00 0x00 0x00>;
	nvidia,tx-queue-prio = <0x00 0x07 0x02 0x03 0x00 0x00 0x00 0x00>;
	nvidia,rxq_enable_ctrl = <0x02 0x02 0x02 0x02 0x02 0x02 0x02 0x02>;
	nvidia,vm-irq-config = <0x14b>;
	nvidia,dcs-enable = <0x01>;
	nvidia,macsec-enable = <0x00>;
	nvidia,pad_calibration = <0x01>;
	nvidia,pad_auto_cal_pd_offset = <0x00>;
	nvidia,pad_auto_cal_pu_offset = <0x00>;
	nvidia,rx_riwt = <0x200>;
	nvidia,rx_frames = <0x40>;
	nvidia,tx_usecs = <0x100>;
	nvidia,tx_frames = <0x05>;
	nvidia,promisc_mode = <0x01>;
	nvidia,slot_num_check = <0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00>;
	nvidia,slot_intvl_vals = <0x00 0x7d 0x7d 0x7d 0x7d 0x7d 0x7d 0x7d>;
	nvidia,ptp_ref_clock_speed = <0xc6aea16>;
	nvidia,instance_id = <0x04>;
	nvidia,ptp-rx-queue = <0x03>;
	pinctrl-names = "mii_rx_disable\0mii_rx_enable";
	pinctrl-0 = <0x14c>;
	pinctrl-1 = <0x14d>;
	nvidia,dma_rx_ring_sz = <0x400>;
	nvidia,dma_tx_ring_sz = <0x400>;
	//dma-coherent;

Best regards,

Andreas

Hello,

Thanks for visiting the NVIDIA Developer Forums.
To ensure better visibility and support, I’ve moved your post to the Jetson category, where it’s more appropriate.

Cheers,
Tom

Hi,
We don’t support this use case. By default, the driver is for the Orin NX module. Please check which interface (such as PCIe, USB, …) your component is connected to, and follow the standard Linux interface to develop the driver.

Thank you for your reply. However, there seems to be a misunderstanding regarding our hardware setup.

We are not using an external PCIe or USB Ethernet component. We are writing a custom low-level driver for the integrated Tegra EQOS Ethernet MAC present directly on the Jetson AGX Orin SoC (base address 0x2310000).

As seen in the dmesg logs provided, the memory transactions are being blocked by the SoC’s internal memory controller (tegra-mc) and the IOMMU (arm-smmu), resulting in: EMEM address decode error, Route Sanity error …

This is strictly a Tegra-specific hardware/SoC issue, not a generic Linux interface problem. The internal Memory Controller, Stream IDs (SIDs), and hardware memory firewalls are proprietary and specifically configured by NVIDIA.

Even after disabling the iommus property in the device tree for the ethernet node, the tegra-mc still denies the DMA transactions. This suggests that the memory region our DMA is trying to access is locked at the hardware level, or the EQOS MAC is asserting a Stream ID that the Memory Controller rejects without the SMMU translating it.

Hi,
The topic is in the Orin NX category. Do you use AGX Orin or Orin NX? Which JetPack version is used? Do you use a developer kit or a custom board? We would like to confirm the platform and software version.

Hello,

I use the Jetson AGX Orin 32GB H01 Kit with NVIDIA Jetson Linux 36.4.3.

Hi,
We have the guidance to enable MGBE in SW and HW:
Jetson Download Center | NVIDIA Developer
Jetson AGX Orin Platform Adaptation and Bring-Up — NVIDIA Jetson Linux Developer Guide 1 documentation

If you follow the design guide, you should only need to configure the device tree, and the driver is supposed to work. There is no need to modify the driver code.

Hi DaneLLL,

Thank you for the documentation links.

To clarify our use case: we are not using the standard network driver (nveqos) for this interface because we are developing a custom high-performance real-time Ethernet driver that runs in user space.

We are using a custom kernel module (atemsys) to map the hardware registers and to allocate contiguous DMA memory. The actual driver logic (initializing the MAC, setting up descriptors, starting DMA) is implemented in our user-space application, based on the nveqos logic.

The problem we are facing is related to the SMMU/IOMMU configuration on the AGX Orin. When we allocate memory in our kernel module (using dma_alloc_coherent) and pass the address to the EQOS hardware descriptors, we encounter SMMU faults (e.g., GFSR 0x80000002, Unexpected global fault, EMEM address decode error) as soon as the hardware tries to access that memory.

It seems the EQOS hardware is using a Stream ID that is not authorized to access the memory allocated by our driver, or the SMMU translation is failing.

Are there compatible-string matches inside the SMMU driver in which our custom driver/module needs to be listed?

Could you please advise on the following for a custom driver scenario:

  1. How should we configure the Device Tree to ensure that memory allocated by our custom driver is correctly mapped for the EQOS hardware’s Stream ID?

  2. Is there a supported way to put the SMMU for EQOS into “bypass” mode, so that we can test whether this resolves the issue?

We tried removing iommus and dma-coherent properties from the device tree, but this resulted in the errors mentioned above.

Thanks for your support.