DMA on /dev/ttyTHS1 corrupts received data

Hello!

I’ve been having a strange problem using the /dev/ttyTHS1 UART (&uarta in the device tree, pins 203 and 205) on a Jetson Orin Nano.

When receiving small packets (~12 bytes) at any baud rate (tested from 115200 to 500000), everything works fine. However, when receiving larger packets (~24 bytes), I get errors in dmesg and corrupted data.

dmesg output during the errors

    [  867.891028] nvidia_smmu_context_fault_bank: 24 callbacks suppressed
    [  867.891044] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.891318] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.891456] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.891593] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.891861] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.891997] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.892265] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.892402] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202000, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.892538] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202040, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.892806] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x80202040, fsynr=0x450011, cbfrsynra=0xc04, cb=0
    [  867.893348] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: VPR violation ((null))
    [  867.893357] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: Route Sanity error ((null))
    [  867.895636] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: VPR violation ((null))
    [  867.895643] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: Route Sanity error ((null))
    [  867.897913] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: VPR violation ((null))
    [  867.897919] tegra-mc 2c00000.memory-controller: unknown: secure write @0x00000003ffffff00: Route Sanity error ((null))

my-overlay.dtsi

    fragment@6 {
        target = <&uarta>;
        __overlay__ {
            status = "okay";
        };
    };

The errors pointed to DMA, so I disabled it on /dev/ttyTHS1 by overriding dma-names with "none", which makes the driver fall back to PIO. The data is now received correctly, even for larger packets.

my-overlay-fix.dtsi

    fragment@6 {
        target = <&uarta>;
        __overlay__ {
            /*
             * Override the inherited dma-names so the driver's
             * of_property_match_string(np, "dma-names", "rx"/"tx")
             * lookups find no match, setting use_rx_pio/use_tx_pio = true.
             * /delete-property/ would be cleaner, but overlays cannot
             * delete properties, so it has no effect via fdtoverlay.
             */
            dma-names = "none", "none";
            status = "okay";
        };
    };
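
The overlay comment explains why this works: the driver looks up "rx" and "tx" in dma-names and falls back to PIO when a lookup fails. As a simplified userspace illustration of that decision (not the driver's actual code), the hypothetical find_dma_name() below stands in for of_property_match_string() and returns -1 instead of a kernel error code when nothing matches:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for of_property_match_string(): returns the index
 * of `want` in the dma-names list, or -1 if it is absent. */
static int find_dma_name(const char *const *names, size_t count, const char *want)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], want) == 0)
            return (int)i;
    }
    return -1;
}

/* Mirrors the probe-time decision: a missing "rx" entry means RX uses PIO,
 * and a missing "tx" entry means TX uses PIO. */
struct uart_mode {
    bool use_rx_pio;
    bool use_tx_pio;
};

static struct uart_mode pick_mode(const char *const *names, size_t count)
{
    struct uart_mode m;
    m.use_rx_pio = find_dma_name(names, count, "rx") < 0;
    m.use_tx_pio = find_dma_name(names, count, "tx") < 0;
    return m;
}
```

With dma-names = "none", "none" both lookups fail, so both directions run in PIO mode; with "rx", "tx" both find a match and DMA stays enabled.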

This is a very strange problem, and I can’t explain why DMA would have these issues. The larger packets are not being sent in rapid bursts either; they arrive at intervals of several hundred milliseconds.

By comparison, on a Jetson TX2 NX this works fine without any changes to the UART device tree configuration.

What are the options for getting DMA to work correctly?

First, be aware that one UART can have two drivers servicing it at the same time. One is the legacy driver, which produces the “/dev/ttyS#” device special file; the other produces the “/dev/ttyTHS#” name format, where “THS” means “Tegra High Speed”, referring to the DMA-capable driver you mentioned. The legacy driver is supported during boot stages, whereas the DMA version is generally not supported in most, if not all, boot environments unless someone has ported it there manually.

One possible source of corruption is the non-DMA driver servicing actual data at the same time (or holding buffered data). The most common case is the serial console writing output to that device (once fully booted, this can be either the DMA or the non-DMA version). If you run “ls -l” on the device special file you are using and it shows group “dialout”, then the data is not coming from the serial console. If the group is “tty”, then you have a competing data source going through the UART.
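
If you want to script the same check rather than reading “ls -l” output, a minimal C sketch could look like this. The helper names device_group() and looks_like_console() are hypothetical, and the “tty” versus “dialout” rule is only the rule of thumb described above, not a guarantee:

```c
#include <grp.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Writes the name of the group owning `path` (the group `ls -l` shows)
 * into `out`, falling back to the numeric GID if no group entry exists.
 * Returns 0 on success, -1 if stat() fails. */
static int device_group(const char *path, char *out, size_t out_len)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    const struct group *gr = getgrgid(st.st_gid);
    if (gr != NULL)
        snprintf(out, out_len, "%s", gr->gr_name);
    else
        snprintf(out, out_len, "%u", (unsigned)st.st_gid);
    return 0;
}

/* Heuristic from the advice above: group "tty" hints at a serial console
 * or other competing user of the port; "dialout" does not. */
static int looks_like_console(const char *group)
{
    return strcmp(group, "tty") == 0;
}
```

For example, calling device_group("/dev/ttyTHS1", buf, sizeof buf) and then looks_like_console(buf) gives the same answer as inspecting the group column by hand.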

The other issue is that the clock used for the serial UART will have some error at higher speeds. I would not be surprised if this is the cause, but I do not know of a solution. One way to improve tolerance to clock error at higher speeds is to use two stop bits. Not all devices support two stop bits, but if yours do, try it.
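
On Linux, two stop bits are selected with the CSTOPB flag in termios. The sketch below assumes a raw 8-bit link with no parity; configure_uart() is a hypothetical helper, and the device on the other end is assumed to be set up the same way:

```c
#define _DEFAULT_SOURCE /* for cfmakeraw() on glibc */
#include <stdbool.h>
#include <termios.h>

/* Puts `t` into raw 8N1/8N2 mode at `speed`. Returns 0 on success, -1 if
 * the speed is rejected. Apply the result with tcsetattr(fd, TCSANOW, t)
 * on an open UART descriptor. */
static int configure_uart(struct termios *t, speed_t speed, bool two_stop_bits)
{
    cfmakeraw(t);                  /* no echo or line editing, 8 data bits, no parity */
    t->c_cflag |= CLOCAL | CREAD;  /* ignore modem control lines, enable receiver */

    if (two_stop_bits)
        t->c_cflag |= CSTOPB;      /* 2 stop bits */
    else
        t->c_cflag &= ~CSTOPB;     /* 1 stop bit */

    if (cfsetispeed(t, speed) != 0 || cfsetospeed(t, speed) != 0)
        return -1;
    return 0;
}
```

In practice you would open the port with open("/dev/ttyTHS1", O_RDWR | O_NOCTTY), read the current settings with tcgetattr(), call configure_uart() with the desired rate (B500000 is a Linux-specific constant), and write them back with tcsetattr().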

I understand, but /dev/ttyS# is not being used; only /dev/ttyTHS1 is in use, so that is clearly not the issue. The boot-time dmesg output also confirms that /dev/ttyTHS1 was brought up successfully.

I am fairly sure this is an NVIDIA misconfiguration of the UART DMA. Otherwise, why would sending work while receiving fails and produces SMMU errors?
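
The fault lines themselves point in that direction. Decoding them with the context-bank FSR/FSYNR0 bit layout from the Linux arm-smmu driver header (drivers/iommu/arm/arm-smmu/arm-smmu.h; this is my reading of those definitions, so treat it as an assumption), fsr=0x402 is a translation fault and fsynr=0x450011 has the write-not-read bit set. The faulting accesses are therefore writes, which is consistent with the RX DMA path storing received bytes to memory, while TX DMA only reads. The struct and decode_fault() are hypothetical helpers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined for the SMMUv2 context-bank registers in the
 * Linux arm-smmu driver (assumed, see lead-in). */
#define FSR_TF         (1u << 1)            /* translation fault */
#define FSR_FORMAT(x)  (((x) >> 9) & 0x3u)  /* FSR[10:9] format field */
#define FSYNR0_WNR     (1u << 4)            /* 1 = faulting access was a write */

struct smmu_fault {
    bool translation_fault;
    bool is_write;
    unsigned format;
};

/* Extracts the fields of interest from the fsr= and fsynr= values that
 * arm-smmu prints in "Unhandled context fault" messages. */
static struct smmu_fault decode_fault(uint32_t fsr, uint32_t fsynr)
{
    struct smmu_fault f;
    f.translation_fault = (fsr & FSR_TF) != 0;
    f.is_write = (fsynr & FSYNR0_WNR) != 0;
    f.format = FSR_FORMAT(fsr);
    return f;
}
```

For the values in the log above, decode_fault(0x402, 0x450011) reports a translation fault on a write access, which matches the "secure write" messages from tegra-mc.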

NVIDIA would have to answer that. Clock rate might still be an issue at higher speeds, though, since TX and RX, although they work together, are technically separate halves of the UART. Have you tried testing in loopback mode? The Jetson clock itself is known to become an issue at higher speeds, which did not exist long ago, and the UART inherits some of that from older silicon.
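
A loopback test for the packet sizes in question could look like the following minimal sketch. It assumes the UART's TX pin is jumpered to its own RX pin and that the port is already open and configured in raw mode with termios. The test pattern, helper names, and timeout handling are hypothetical choices, not an NVIDIA tool:

```c
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Fills buf with a simple test pattern so corrupted or dropped bytes
 * are easy to spot. */
static void fill_pattern(uint8_t *buf, size_t len, uint8_t seed)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(seed + i * 7u);
}

/* Returns the index of the first differing byte, or -1 if identical. */
static long first_mismatch(const uint8_t *a, const uint8_t *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}

/* Writes `len` bytes, then reads them back, waiting up to timeout_ms for
 * each chunk. Returns the number of bytes read (short on timeout), or -1
 * on an I/O error. */
static long loopback_roundtrip(int fd, const uint8_t *tx, uint8_t *rx,
                               size_t len, int timeout_ms)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, tx + sent, len - sent);
        if (n < 0)
            return -1;
        sent += (size_t)n;
    }

    size_t got = 0;
    while (got < len) {
        struct pollfd p = { .fd = fd, .events = POLLIN };
        int r = poll(&p, 1, timeout_ms);
        if (r < 0)
            return -1;
        if (r == 0)
            break;                 /* timed out: report a short count */
        ssize_t n = read(fd, rx + got, len - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        got += (size_t)n;
    }
    return (long)got;
}
```

Running this at each baud rate with both 12-byte and 24-byte buffers would show whether the corruption tracks speed (pointing at clocking) or packet size (pointing at the DMA path).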

Hi therealmatiss,

Are you using the devkit or a custom board for Orin Nano?
Which JetPack version are you using?

If you want to run UART in DMA mode, please refer to the following thread for the fix.
Solved: UART/Serial Port not working after upgradint to Jetpack 6.2.2 (Orin Nano/NX) - #7 by KevinFFF

It’s JetPack 6.2.2. The issue is reproducible on both our custom board and the devkit.

I tried your suggestion and it works after adding this to my overlay:

    fragment@6 {
        target = <&uarta>;
        __overlay__ {
            dmas = <&gpcdma 8>, <&gpcdma 8>;
            dma-names = "rx", "tx";
            iommus = <&smmu_niso0 TEGRA234_SID_GPCDMA>;
            status = "okay";
        };
    };

Thank you for the response. Is this going to be fixed in the next release?

Yes, it will be fixed in the next official JetPack release.