TX2 UART2_RX read data error

I am using UART2_RX (pin B16) to receive data. When I run the detection program, the host PC sends heartbeat messages. Sometimes the received data is corrupted or lost for several seconds (maybe 3 seconds), and then it recovers.
The same application on a TK1 module works fine and never fails.

UART2_RX is configured for 9600 baud, 8 data bits, and 1 stop bit.

The Jetson_TX2_Series_Modules_DataSheet says:
NOTE: The UART receiver input has low baud rate tolerance in 1-stop bit mode. External devices must use 2 stop bits.
In 1-stop bit mode, the Tegra UART receiver can lose sync between Tegra receiver and the external transmitter
resulting in data errors/corruption. In 2-stop bit mode, the extra stop bit allows the Tegra UART receiver logic to
align properly with the UART transmitter.

I’m not sure if it’s related to this note.

You always want both sides to use the same number of stop bits. At higher bit rates the UART will require two stop bits (a hardware limitation; typically rates up to 115200 can work with one stop bit). A slow bit rate like 9600 probably shouldn’t require two stop bits.
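If you want to try two stop bits on the Jetson side, a minimal sketch using Python's standard `termios` module might look like this. The device node `/dev/ttyTHS2` and the helper names are assumptions for illustration; check which `/dev/ttyTHS*` node corresponds to UART2 (pin B16) on your carrier board, and remember that the transmitter must be switched to two stop bits as well.

```python
import os
import termios

# Hypothetical device node: check which /dev/ttyTHS* is UART2 on your board.
DEVICE = "/dev/ttyTHS2"


def apply_9600_8n2(attrs, two_stop_bits=True):
    """Return a modified copy of a tcgetattr() list: raw mode, 9600 baud,
    8 data bits, no parity, and two (or one) stop bits."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cc = list(cc)
    # Raw input: no break/parity marking, no CR/NL translation, no XON/XOFF.
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK | termios.ISTRIP
               | termios.INLCR | termios.IGNCR | termios.ICRNL | termios.IXON)
    oflag &= ~termios.OPOST
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON
               | termios.ISIG | termios.IEXTEN)
    # 8 data bits, no parity, receiver enabled, ignore modem control lines.
    cflag &= ~(termios.CSIZE | termios.PARENB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    if two_stop_bits:
        cflag |= termios.CSTOPB   # 2 stop bits (both ends must agree)
    else:
        cflag &= ~termios.CSTOPB  # 1 stop bit (the current setup)
    # read() blocks until at least one byte has arrived, no inter-byte timer.
    cc[termios.VMIN] = 1
    cc[termios.VTIME] = 0
    return [iflag, oflag, cflag, lflag, termios.B9600, termios.B9600, cc]


def open_uart(path=DEVICE, two_stop_bits=True):
    """Open the port and apply the settings above; returns a raw file descriptor."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    attrs = apply_9600_8n2(termios.tcgetattr(fd), two_stop_bits)
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd
```

Calling `open_uart(two_stop_bits=False)` reproduces the current 8N1 setup, so the same code can be used to A/B test the two modes.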

But when I read the data from the UART buffer, a few frames are lost. Then, a few seconds later, it works again.
It looks like the UART resets. I have checked the received data on both the TX2 and my PC, and the data received by my PC is fine.

There may be underflow/overflow, especially during periods when the system is switching between energy-saving modes. You might try testing again in max performance mode. The following is correct for more recent releases, but needs a bit of modification for earlier releases:

# Make max performance range available:
sudo nvpmodel -m 0
# Force to high end of current performance range:
sudo jetson_clocks

Incidentally, if it is underflow/overflow, then you may be interested in using CTS/RTS flow control.
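One way to tell overflow apart from framing problems, without extra hardware, is to read the kernel's per-port error counters before and after a failure. The sketch below assumes the Tegra UART driver updates the standard serial_core counters (it may or may not on your L4T release), and it hard-codes the `TIOCGICOUNT` request number from the asm-generic ioctl table that arm64 uses; treat both as assumptions to verify against your kernel headers.

```python
import fcntl
import struct

# TIOCGICOUNT from asm-generic/ioctls.h (the table arm64 uses). Assumed value;
# verify against your kernel headers.
TIOCGICOUNT = 0x545D

# Layout of struct serial_icounter_struct from <linux/serial.h>:
# 11 named ints followed by int reserved[9].
FIELDS = ("cts", "dsr", "rng", "dcd", "rx", "tx",
          "frame", "overrun", "parity", "brk", "buf_overrun")
FMT = "20i"


def parse_icount(raw):
    """Turn the raw ioctl buffer into a {counter_name: value} dict."""
    values = struct.unpack(FMT, raw)
    return dict(zip(FIELDS, values))  # zip drops the reserved ints


def read_icount(fd):
    """Ask the serial driver for its cumulative traffic/error counters."""
    buf = bytes(struct.calcsize(FMT))
    return parse_icount(fcntl.ioctl(fd, TIOCGICOUNT, buf))


def counter_delta(before, after):
    """Only the counters that changed between two snapshots."""
    return {k: after[k] - before[k] for k in FIELDS if after[k] != before[k]}
```

Take `before = read_icount(fd)` when the test starts and compare it with a fresh snapshot after an error is logged. If `overrun` or `buf_overrun` grows when frames go missing, that points to the receiver not being serviced fast enough (the case CTS/RTS would help); if `frame` grows instead, that is more consistent with the stop-bit sync problem described in the datasheet note.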

Thanks, I will try and report the result.

Hello, I have done what you suggested, but the receive errors still occur.
Apart from CTS/RTS flow control, is there any other way to solve this problem? Our hardware design is already fixed.

Do you have the ability to put a serial analyzer on the port to see traffic? Or perhaps at least an oscilloscope. At minimum it would be good to know that data is being sent when the receive side fails. Better would be a protocol analyzer that can verify which settings (baud rate, data bits, stop bits) the actually sent data uses.

Can you verify if the issue changes or stays the same when switching between one stop bit and two stop bits? Can you verify if data comes through correctly at least part of the time? Can you verify the voltage levels are correct and symmetric/matching regardless of which side is sending the data?

Please ignore my colleague’s joke above.
1. I have observed the waveform going into the TX2 with an oscilloscope, and it looks fine. To make sure the signal arriving at the TX2’s receive pin is correct, I also connected my computer at the receiving end. The data received on my PC is all correct. I am confident the sending end is configured correctly, because we have a product that uses a TK1, and connecting the same sending device to the TK1 causes no problems.
2. I haven’t tried two stop bits yet; I will test it. A strange phenomenon is that buffer data loss is very likely to occur just before or after the detection program is started (at which point GPU usage becomes high) or stopped. I send heartbeat clock information to the TX2 every second. When the error happens, one or more bytes are dropped from the first frame, and then the next two frames (each frame contains 15 bytes) are not read at all (I use the interrupt to receive the data). After that, the data is OK again. I have ensured that the signal level into the TX2 is 1.8V.

Under a high load it seems plausible that serial data could be dropped. I am not saying this is acceptable, but I am saying this needs to be considered now as a cause. Is it correct that the read error does not exist under light load, and seems to be associated with system load?

Assuming the serial UART does work correctly under some circumstances, I would be interested in knowing the load on CPU0 (the first core) compared to the other cores when the UART works versus when it fails. There are several ways to measure CPU load, but I suggest installing htop (“sudo apt-get install htop”). Run it, and during the test, simply watch the bar graph at the top for the first core.
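If you want numbers to log alongside the test rather than watching htop, per-core load can be sampled from `/proc/stat`. This minimal sketch also reports the share of time each core spent in hard and soft IRQ handling, which is relevant to the CPU0 question; the function names are illustrative.

```python
import time


def parse_cpu_lines(text):
    """Map 'cpuN' -> (busy, irq, total) jiffies from /proc/stat text.

    Columns: user nice system idle iowait irq softirq steal [guest guest_nice];
    guest time is already counted in user/nice, so only the first 8 are summed.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr".
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        values = [int(v) for v in parts[1:9]]
        total = sum(values)
        idle = values[3] + values[4]  # idle + iowait
        irq = values[5] + values[6]   # hard IRQ + softirq
        result[parts[0]] = (total - idle, irq, total)
    return result


def usage_between(before, after):
    """Percent busy and percent in IRQ handling per core between two snapshots."""
    usage = {}
    for cpu, (busy1, irq1, total1) in after.items():
        busy0, irq0, total0 = before.get(cpu, (0, 0, 0))
        dt = total1 - total0
        if dt > 0:
            usage[cpu] = (100.0 * (busy1 - busy0) / dt, 100.0 * (irq1 - irq0) / dt)
    return usage


def sample(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart."""
    with open("/proc/stat") as f:
        first = parse_cpu_lines(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_cpu_lines(f.read())
    return usage_between(first, second)
```

Running `sample()` in a loop and printing `cpu0` next to the other cores while the detection program starts and stops would show whether CPU0 stands out at the moment frames are lost.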

Some notes on why I am asking about this…

There are times when a hardware interrupt can only be serviced on the first core. I have not looked at this on the Jetsons in a long time, so anyone reading this might want to see if this is still the case, but there is an IRQ aggregator, and that aggregator on some of the Tegra hardware (and I think TX2) can only be handled on that first core. Software interrupts (ksoftirq) can be handled on any core. In cases where the first core (CPU0) is under load it may be that the driver is not available fast enough for the UART to do what it needs. This would be IRQ starvation, but I’m not sure this is what is going on, I am only suggesting this as a possibility.
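A quick way to see which core is actually servicing the UART interrupt is to read `/proc/interrupts` and look at the per-CPU counts. The sketch below assumes the UART's line contains the string `serial` (here it is assumed the Tegra serial driver registers its IRQ under a device name ending in `.serial`; check the real file on your system and pass a different `match` if needed).

```python
def irq_counts(text, match="serial"):
    """Return {irq_number: {cpu_name: count}} for /proc/interrupts lines whose
    description contains `match` (assumed to identify the UART)."""
    lines = text.splitlines()
    cpus = lines[0].split()  # header row, e.g. "CPU0 CPU1 ..."
    result = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep or match not in rest:
            continue
        fields = rest.split()
        # The first len(cpus) fields are the per-CPU interrupt counts.
        counts = [int(f) for f in fields[:len(cpus)] if f.isdigit()]
        result[irq.strip()] = dict(zip(cpus, counts))
    return result


def read_irq_counts(match="serial"):
    """Snapshot the live counters."""
    with open("/proc/interrupts") as f:
        return irq_counts(f.read(), match)
```

If only the `CPU0` column increases while heartbeats arrive, the interrupt is being handled on the first core, and CPU0 load during the failures becomes the number to watch. The file `/proc/irq/<N>/smp_affinity_list` shows which cores the kernel is allowed to route that IRQ to.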

Are you running the Jetson in max performance mode with all cores enabled? You did not mention the release you are using, but in more recent releases you would make all cores and clock ranges available, followed by forcing to max of those ranges, via this:

sudo nvpmodel -m 0
sudo jetson_clocks

Regardless of whether this is actually a CPU0 IRQ starvation or not, I think that running in max performance mode would be your next check, but this would be more important if there is a driver dependent upon the first core.

Thanks. I used htop to watch the first CPU core (in max performance mode); its usage is only around 30%.

If you have more suggestions, please tell me.
I will keep investigating, and if it is solved I will reply.

You mentioned it is an RX error on the Jetson side, and that it is a “sometimes” error. During that error does the data go completely missing, or is it corrupted? Can you say more about the data itself, e.g., is it just some fixed set of bytes, ASCII text, so on?

What about the program which actually reads the heartbeat? Does the program buffer the data? Is the program in kernel space or is the program a user space program?

Also, can you describe the physical wiring? Is this twisted pair and shielded? Is this short/long?

I send my heartbeat data once per second, like this (hex data):
EF 03 03 00 01 02 0F 07 00 04 00 00 00 00 0F
EF 03 03 00 01 02 10 07 00 04 00 00 00 00 10
EF 03 03 00 01 02 11 07 00 04 00 00 00 00 11
EF 03 03 00 01 02 12 07 00 04 00 00 00 00 12
EF 03 03 00 01 02 13 07 00 04 00 00 00 00 13
EF 03 03 00 01 02 14 07 00 04 00 00 00 00 14
EF 03 03 00 01 02 15 07 00 04 00 00 00 00 15
EF 03 03 00 01 02 16 07 00 04 00 00 00 00 16
EF 03 03 00 01 02 17 07 00 04 00 00 00 00 17
EF 03 03 00 01 02 18 07 00 04 00 00 00 00 18
EF 03 03 00 01 02 19 07 00 04 00 00 00 00 19
EF 03 03 00 01 02 1A 07 00 04 00 00 00 00 1A
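The CRC algorithm is not stated, but in every sample frame above the last byte equals the XOR of bytes 1 through 13 (counting from 0), which for these frames also equals byte 6, the value that increments each second. The sketch below therefore treats it as an XOR checksum and byte 6 as a sequence counter; both are inferences from the samples, not a confirmed protocol spec. The counter makes it easy to log exactly which heartbeats were lost.

```python
FRAME_LEN = 15
HEADER = 0xEF
SEQ_INDEX = 6  # assumed counter byte (0x0F, 0x10, ... in the samples)


def xor_checksum(frame):
    """XOR of bytes 1..13: an assumption that matches every sample frame."""
    value = 0
    for b in frame[1:FRAME_LEN - 1]:
        value ^= b
    return value


def frame_ok(frame):
    """Length, header, and (assumed) checksum all match."""
    return (len(frame) == FRAME_LEN and frame[0] == HEADER
            and frame[FRAME_LEN - 1] == xor_checksum(frame))


def missing_sequence_numbers(frames):
    """List counter values skipped between consecutive valid frames (wraps at 0xFF)."""
    missing = []
    prev = None
    for frame in frames:
        seq = frame[SEQ_INDEX]
        if prev is not None and seq != prev:
            expected = (prev + 1) & 0xFF
            while expected != seq:
                missing.append(expected)
                expected = (expected + 1) & 0xFF
        prev = seq
    return missing
```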

And the receive errors on the TX2 look like this:
2019-12-21 01:29:54.400 ERROR head value error:00000008!!!
2019-12-21 01:29:54.420 ERROR uart read data error:00,error len:8!!!

2019-12-20 23:06:37.833 ERROR uart read data error:00,error len:9!!!
2019-12-20 23:06:37.853 ERROR uart read data error:00,error len:6!!!

2019-12-20 15:08:10.149 ERROR min read 4!!! error,read len:4,read_data ret:00000000
2019-12-20 15:08:10.171 ERROR uart read data error:00,error len:1!!!

My interrupt-based receive method is: receive the first 10 bytes via the interrupt, then wait for the other 5 bytes to be copied from kernel space into my program’s buffer. Then I combine these 15 bytes and check the CRC in my program.
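Reading a fixed 10 bytes and then 5 bytes assumes the data always arrives in those chunks; if one byte is dropped, later frame boundaries could stay misaligned until something resets. A more tolerant approach is to append whatever each read returns to a buffer and search for the 0xEF header plus a valid check byte. This is a sketch under assumptions: the 0xEF header and 15-byte length come from the frames above, while the XOR check is inferred from those samples rather than from the stated CRC.

```python
import os

FRAME_LEN = 15
HEADER = 0xEF


class FrameAssembler:
    """Collect arbitrary-sized reads; return complete frames that pass the check."""

    def __init__(self):
        self.buf = bytearray()
        self.discarded = 0  # bytes thrown away while resynchronizing

    @staticmethod
    def _checksum_ok(frame):
        # Assumed XOR over bytes 1..13, inferred from the sample heartbeats.
        value = 0
        for b in frame[1:FRAME_LEN - 1]:
            value ^= b
        return value == frame[FRAME_LEN - 1]

    def feed(self, data):
        self.buf.extend(data)
        frames = []
        while True:
            start = self.buf.find(HEADER)
            if start < 0:                  # no header at all: drop everything
                self.discarded += len(self.buf)
                self.buf.clear()
                break
            if start > 0:                  # skip junk before the header
                self.discarded += start
                del self.buf[:start]
            if len(self.buf) < FRAME_LEN:  # partial frame: wait for more bytes
                break
            candidate = bytes(self.buf[:FRAME_LEN])
            if self._checksum_ok(candidate):
                frames.append(candidate)
                del self.buf[:FRAME_LEN]
            else:                          # false header: step past it and rescan
                self.discarded += 1
                del self.buf[:1]
        return frames


def frames_from_fd(fd, chunk=64):
    """Yield frames from a raw-mode serial fd until read() returns no data."""
    assembler = FrameAssembler()
    while True:
        data = os.read(fd, chunk)
        if not data:
            return
        yield from assembler.feed(data)
```

`discarded` counts bytes thrown away while resynchronizing, which is worth logging next to the existing error messages.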

I use RS-422 for transmission, and the cable is twisted pair. I am confident it is not caused by interference: I have checked the received data on both the TX2 and a PC, and the data arriving at my PC is fine while, at the same time, the TX2 reports errors. A product that uses a TK1 in the same environment works fine. This error appears on all TX2 units, not just one or two.

When the data goes wrong, the next two seconds of data are lost; it is as if the UART had died.

For the error information, is it possible to see a side-by-side comparison of input line and output line? For example, if you send and receive correctly, it might show this:

TX:EF 03 03 00 01 02 0F 07 00 04 00 00 00 00 0F
RX:EF 03 03 00 01 02 0F 07 00 04 00 00 00 00 0F

The reason I ask is that I’m interested in seeing if there is bit shifting going on, or if it is a random drop or corruption (which would indicate different causes).
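If capturing both streams is practical, a small helper can print them in the TX/RX format suggested above and add a per-byte XOR line, which makes flipped bits easy to spot. The classification is deliberately crude: RX shorter than TX and still a subsequence of it is reported as dropped bytes, equal length with differences as corruption, and anything else is left for manual inspection. The function names are illustrative.

```python
def hex_line(data):
    return " ".join(f"{b:02X}" for b in data)


def is_subsequence(short, full):
    """True if `short` can be obtained from `full` by deleting bytes."""
    it = iter(full)
    return all(b in it for b in short)  # `in` advances the shared iterator


def classify(tx, rx):
    """Crude first guess at what happened to a frame."""
    if tx == rx:
        return "match"
    if len(rx) < len(tx) and is_subsequence(rx, tx):
        return "dropped %d byte(s)" % (len(tx) - len(rx))
    if len(rx) == len(tx):
        return "corrupted"
    return "other"


def side_by_side(tx, rx):
    """TX/RX hex lines, plus a per-byte XOR line when the lengths match."""
    lines = ["TX:  " + hex_line(tx), "RX:  " + hex_line(rx)]
    if len(tx) == len(rx):
        lines.append("XOR: " + hex_line(a ^ b for a, b in zip(tx, rx)))
    lines.append(classify(tx, rx))
    return "\n".join(lines)
```

A shift tends to show up as XOR masks that change in a regular way across consecutive bytes, while random corruption shows isolated, unrelated masks.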

Would it be practical for you to temporarily try lines of all 0xFF, all 0x00, or all “0x55 0xAA”? These are somewhat easier to see shifts versus corruption if the original lines are not easy to capture.
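A sketch for generating those test frames on the sending side and finding where the received stream first departs from the pattern. It assumes the sender can be driven from a Python script with an already-configured file descriptor; the 15-byte length and 1-second interval only mimic the heartbeat, and the receiver check assumes the captured data starts on a pattern boundary. As a side benefit, 0x55 sent as 8N1 produces a strictly alternating 0/1 bit stream on the line (start bit 0, then 1,0,1,0,... LSB first, then the stop bit 1), which is convenient for checking bit timing on an oscilloscope.

```python
import os
import time

# Test units suggested in the thread.
PATTERNS = {"ff": b"\xff", "00": b"\x00", "55aa": b"\x55\xaa"}


def pattern_bytes(name, length=15):
    """Repeat a named pattern out to `length` bytes (15 mimics the heartbeat size)."""
    unit = PATTERNS[name]
    return (unit * (length // len(unit) + 1))[:length]


def send_pattern(fd, name, count=60, interval=1.0):
    """Sender side: write `count` pattern frames, one every `interval` seconds."""
    frame = pattern_bytes(name)
    for _ in range(count):
        os.write(fd, frame)
        time.sleep(interval)


def first_mismatch(name, received):
    """Receiver side: index of the first byte that breaks the pattern, or None.
    Assumes `received` starts on a pattern boundary."""
    expected = pattern_bytes(name, len(received))
    for i, (want, got) in enumerate(zip(expected, received)):
        if want != got:
            return i
    return None
```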

If you have a protocol analyzer, and the TX/RX can be added side-by-side with the program’s idea of what was sent and received, then this too might help. Any number of the above tests would be of interest, depending on what is practical for you to test.

I haven’t worked much with RS-422, but do you have some sort of adapter close to the UART at both sides of the connection? I am thinking that the built-in hardware will have the logical ability to work with this, but not the electrical capabilities with the longer lines and higher speeds. In particular, the integrated UARTs are known to need two stop bits at higher data rates (which would be true even with an adapter). Whatever description you can give of the physical wiring and any kind of adapter would be useful.

Also, the UARTs are by default 3.3V TTL level, and so if your RS-422 uses 5V, then success would be hit and miss when directly connecting without level shifting. Level shifting will in turn add jitter and latency, and could get in the way at higher speeds.

Thanks, I will use two stop bits and do more test.