We are working on UART communication between a Jetson Nano (developer kit) and a custom controller connected via the M.2 interface. We frequently see that the data read by the Linux driver on the Nano is not the same as what was transmitted by the custom controller FW.
We see two extra random bytes between two packets. We do not see these extra bytes in the analyser log (captured by probing the UART Rx test point on the custom controller).
We have tested the same custom board with the Xavier NX platform on the /dev/ttyTHS1 interface, and it works fine with the same configuration.
We suspect something is wrong with the Linux driver, which is not handling received data properly.
Keep in mind that UARTs have settings for stop bits. The default is 115200 8N1, i.e., one stop bit. If one end is set up to use a different number of stop bits, then receiving an extra byte or two is what you told it to do. UARTs are not plug-and-play and have no means of asking each other what the settings are. Quite possibly the issue is just one side being configured differently than the other.
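Since neither end can interrogate the other, the only defence is to compare the two configurations side by side. A minimal sketch of that idea (plain Python; the parameter values are illustrative, not taken from the boards in question):

```python
# Each UART's framing is a purely local setting; a mismatch is invisible to the
# hardware and only shows up as corrupted data. Values below are illustrative.

# 115200 8N1 on one side...
SIDE_A = dict(baudrate=115200, bytesize=8, parity="N", stopbits=1)
# ...and 115200 8N2 on the other: side A will misinterpret the extra stop bit.
SIDE_B = dict(baudrate=115200, bytesize=8, parity="N", stopbits=2)

def settings_match(a: dict, b: dict) -> bool:
    """UARTs cannot negotiate, so every framing parameter must agree exactly."""
    return a == b
```

If `settings_match` is False for the two ends' documented configurations, that should be fixed before suspecting the driver.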
Loopback is nice because a UART will (mostly) always agree with itself on settings when talking to itself. This is where TX and RX are wired together (and possibly CTS/RTS). You'd simply open a terminal program on that port with the chosen settings. If what you type echoes back correctly, then you know the UART is working as expected.
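The same loopback check can be scripted. A sketch, assuming pyserial (as the attached serial_test.py uses); the port name and pattern length are placeholders, and `run_loopback` requires the TX-to-RX jumper to be in place:

```python
def make_pattern(length: int) -> bytes:
    """Repeating 0x00..0xFF sequence so a corrupted byte is easy to locate."""
    return bytes(i % 256 for i in range(length))

def find_mismatches(sent: bytes, received: bytes):
    """Return (offset, sent_byte, received_byte) for every differing position."""
    n = min(len(sent), len(received))
    bad = [(i, sent[i], received[i]) for i in range(n) if sent[i] != received[i]]
    if len(sent) != len(received):
        bad.append((n, None, None))  # marker for a length mismatch
    return bad

def run_loopback(port_name: str, baud: int = 115200, length: int = 1024):
    """Write the pattern out TX and compare the RX echo against it."""
    import serial  # pip install pyserial
    pattern = make_pattern(length)
    with serial.Serial(port_name, baud, timeout=2) as port:
        port.write(pattern)
        echoed = port.read(len(pattern))
    return find_mismatches(pattern, echoed)
```

With the jumper in place, `run_loopback("/dev/ttyTHS1")` should return an empty list; any entries are candidate corruption offsets.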
We have performed a loopback test by shorting the Tx & Rx pins, transferred data for a long duration, and captured logs. We see the same random-byte corruption in the loopback test as well. We are attaching the test Python script and log file for reference. Please refer to lines 2748, 3475 and 3679 in “loopback_logs.txt” for the data corruption. loopback_logs.txt (32.0 KB) serial_test.py (315 Bytes)
In terms of wiring contributing to noise, is the jumper from TX to RX short and straight, or does it contain a segment of wiring which might be a bit longer and perhaps curved?
→ Wiring is short.
What speeds have you tested with when there were data corruption issues?
→ We have tested at 9600, 115200 & 3000000 baud.
If you loopback test with two stop bits, does the corruption reduce?
→ No. We observe the same corruption.
If you loopback test at slower speeds, does corruption reduce?
→ Yes. At lower speeds the corruption reduces, but it is still there.
If you jumper CTS and RTS and enable flow control, does corruption decrease?
→ After shorting Tx/Rx and RTS/CTS, we were not able to send/receive any data, but we were able to send/receive data when only Rx/Tx was shorted.
CTS/RTS should have no effect at all unless CTS/RTS flow control is enabled. Can you check this again at 115200 8N1, with CTS/RTS jumpered, and see if it works with flow control off (corruption is ok; I just want to see if the hardware jumper itself has an effect when the software is told not to use it, in which case data should flow)? With hardware flow control on, there should be data transfer if and only if CTS/RTS is jumpered. Any different behavior is suspicious of hardware or driver failure.
I will recommend something like gtkterm for testing.
We are using attached python script for loopback testing. We are setting “rtscts” to “True” to enable hardware flow control and setting it to “False” to disable hardware flow control.
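For reference, a hedged sketch of how a pyserial script would toggle that flag (this is my reconstruction, not the attached serial_test.py; the port name is a placeholder):

```python
def uart_kwargs(rtscts_enabled: bool) -> dict:
    """115200 8N1; only rtscts differs between the two experiment runs."""
    return dict(
        baudrate=115200,
        bytesize=8,       # 8 data bits
        parity="N",       # no parity
        stopbits=1,       # one stop bit
        rtscts=rtscts_enabled,  # True: TX waits for CTS to be asserted
        timeout=2,
    )

# Usage (requires pyserial and the real port):
#   import serial
#   port = serial.Serial("/dev/ttyTHS1", **uart_kwargs(rtscts_enabled=True))
```

Keeping every parameter except `rtscts` identical between the two runs is what makes the comparison meaningful.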
Here is the result of the suggested experiments.
Speed 115200 8N1 with CTS/RTS jumpered but flow control disabled
→ We observe data loss over a long run. We performed the experiment for 10 minutes. Attaching logs for reference.
Speed 115200 8N1 with CTS/RTS jumpered but flow control enabled
→ We performed the experiment for 10 minutes, but we stopped receiving data after a few seconds. Attaching logs for reference.
This is becoming a bit more peculiar. One variable I cannot predict is whether program-based setup of the UART (in the Python script) might have an issue. I say this because, although a legacy driver would have standardized IOCTL calls for making settings changes, I don’t know if the same is true for the Tegra High Speed (THS) driver.
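One way to take the program-based setup out of the equation is to read back, via the stdlib termios module, what the driver actually holds after the script configures the port. A Linux-only sketch (the CRTSCTS fallback value is hedged because not every Python build exports that constant):

```python
import termios

# Python's termios exports CRTSCTS on Linux; fall back to the usual Linux value.
CRTSCTS = getattr(termios, "CRTSCTS", 0o20000000000)

def describe_port(fd: int) -> dict:
    """Read back the termios state the driver actually applied to this fd."""
    iflag, oflag, cflag, lflag, ispeed, ospeed = termios.tcgetattr(fd)[:6]
    return {
        "two_stop_bits": bool(cflag & termios.CSTOPB),
        "rtscts": bool(cflag & CRTSCTS),
        # Speed fields are termios codes (e.g. termios.B115200), not raw baud.
        "ispeed_code": ispeed,
        "ospeed_code": ospeed,
    }
```

Open the port with the Python script and then call `describe_port(port.fileno())`; if `rtscts` does not match what the script requested, the settings path into the THS driver is suspect.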
I do know that when using gtkterm to simply type values in loopback and look for the echo, this has functioned correctly. I am interested in knowing whether the CTS/RTS testing (with and without the jumper, with and without gtkterm setting CTS/RTS) also behaves badly (I suspect gtkterm will behave badly and the Python method of making those settings won’t have anything to do with the issue, but if both have the same behavior, then we know for sure the issue has nothing to do with the method of setup).
There is one other possible influence on this: the device tree might have passed something unexpected. Beyond that, it seems more likely to be a hardware or driver failure. I’m not sure if there is a device tree setting which might cause this; perhaps someone from NVIDIA knows what influences enabling or disabling of CTS/RTS with the THS driver via device tree? If it isn’t related to the device tree, then I don’t know of any other possible software causes, and it starts looking more like a hardware issue.