TX2 serial UART ttyTHS2 sometimes receives wrong messages

I am using multiple TX2 boards for high-speed object tracking; each TX2 is connected to a camera.
I combine two TX2s into a stereo camera system.
The cameras are triggered at the same time and send their images to the TX2s for image processing.
After processing, I get the coordinates (x, y) of the object in the image, an image counter, and a flag indicating whether the object was detected.
To form a stereo camera system, one TX2 has to send this information to the other TX2 over a serial port (ttyTHS2).
This is the part that confuses me.
The serial port works correctly most of the time, but occasionally it gives me a wrong message.
I use a thread with a while loop that keeps waiting on and reading from the serial port.

while(1) {
     get_time1
     read(...) 
     get_time2
     calculate the time interval ( time2 - time1 )
     process the message I receive
     save the message and time interval into file for debug
     send to another thread to do opencv triangulatePoints
}

I got 3 different types of wrong message in different tests.
The messages below are in the format (# while loop counter , x y image_counter object_flag. , time interval).

Type 1:

# 774 , 284 361 774 1. , 0.003909
# 775 , 283 361 775 1. , 0.004055
# 776 , 284 361 776 1. , 0.003854
# 777 , 284 361 776 , 0.004098
# 778 , 777 1.1 , 0.000002
# 779 , 285 361 778 1. , 0.003771
# 780 , 284 361 779 1. , 0.003956
# 781 , 284 361 780 1. , 0.003952
# 782 , 284 361 781 1. , 0.003951

Type 2:

# 665 , 284 291 665 1. , 0.004000
# 666 , 284 291 666 1. , 0.003923
# 667 , 284 291 667 1. , 0.003948
# 668 , 284 291 668 1 , 0.003917
# 669 , 1.4 , 0.000034
# 670 , 284 291 669 1. , 0.003862
# 671 , 283 291 670 1. , 0.004009
# 672 , 284 291 671 1. , 0.004011
# 673 , 284 291 672 1. , 0.003938

Type 3:

# 610 , 216 286 610 1. , 0.003913
# 611 , 222 276 611 1. , 0.003957
# 612 , 217 285 612 1. , 0.003973
# 613 , 217 286 613 1. , 0.003991
# 614 , 216 286 614 1.216 286 615 1. , 0.009115
# 615 , 217 286 616 1.216 286 615 1. , 0.002787
# 616 , 216 286 617 1.216 286 615 1. , 0.003928
# 617 , 217 286 618 1.216 286 615 1. , 0.003980
# 618 , 216 286 619 1.216 286 615 1. , 0.003935

I use a dot to mark the end of each message, and I trigger the cameras every 4 ms, so the messages should arrive with a period of about 4 ms.
In the logs above, types 1 and 2 are missing the dot, and one message is cut into two parts.
Looking at the time intervals, the second part arrives right after the first one.
Also, a strange number that is not part of my message appears after the dot.

In type 3, the serial buffer keeps repeating one section of a message, and it continues to do so until the end of my program.
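
One thing worth checking against the type 1 and type 2 logs: a read() on a serial port returns whatever bytes have arrived so far, not necessarily one whole message, so a single message can be split across two calls. The sketch below (Python for brevity; `DotFramer` and `feed` are hypothetical names, not part of the original program) accumulates the bytes from each read and only hands back complete messages that end in the dot. Whether this is the actual cause here is an assumption.

```python
class DotFramer:
    """Reassemble dot-terminated messages from arbitrary read() chunks."""

    def __init__(self, terminator=b"."):
        self._buf = bytearray()      # bytes received but not yet emitted
        self._term = terminator

    def feed(self, chunk):
        """Add one read() result; return the list of complete messages."""
        self._buf.extend(chunk)
        frames = []
        while True:
            idx = self._buf.find(self._term)
            if idx < 0:
                break                # no full message yet, keep the remainder
            frames.append(bytes(self._buf[:idx]))
            del self._buf[:idx + len(self._term)]
        return frames
```

Each result of read() would be passed to `feed()`, and only the returned messages would be parsed, so a message that arrives in two pieces is joined before processing and leftover bytes from an earlier read are never parsed twice.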

Once, I tried

dmesg | grep tty

and I got the message

RxData DMA copy to tty layer failed

This makes me wonder whether the error comes from the memory copy stage.

Right now I am using 8N1 at a baud rate of 115200.
The error does not appear every time.
I cannot see any pattern in how often it occurs.
Sometimes it appears twice in one run of the program, and sometimes the program runs fine 5 times in a row.
It bothers me a lot, since I previously ran the same code successfully on a TK1.
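
As a rough sanity check on the numbers above: with 8N1 each byte costs 10 bits on the wire (1 start + 8 data + 1 stop), so 115200 baud carries 11520 bytes per second. A small sketch, assuming the bytes on the wire look like the logged text `284 361 774 1.` (the exact wire format is an assumption, and `frame_time_s` is a hypothetical helper):

```python
def frame_time_s(n_bytes, baud=115200, bits_per_byte=10):
    """Time on the wire for n_bytes at the given baud rate (8N1 = 10 bits/byte)."""
    return n_bytes * bits_per_byte / baud


message = b"284 361 774 1."        # assumed wire format, taken from the log
period_s = 0.004                   # camera trigger period from the post

t = frame_time_s(len(message))
print(f"{len(message)} bytes take {t * 1e3:.3f} ms of a {period_s * 1e3:.0f} ms period")
```

Under that assumption a message occupies the line for roughly 1.2 ms of each 4 ms period, so raw bandwidth alone does not look like the limit.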

Please help with this strange problem.

hello hzchen,

It looks like some messages are being dropped. May I ask whether you enabled RTS/CTS for UART flow control?
thanks

I tried enabling CTS/RTS, but I still receive type 3 messages.
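
For reference, hardware flow control is switched on through the CRTSCTS bit in the port's termios control flags, it has to be set on both boards, and the RTS/CTS lines must actually be wired between them. A minimal sketch in Python, assuming a Linux system; `with_rtscts` and `enable_rtscts` are hypothetical helper names, not something from the original program:

```python
import termios

CFLAG = 2  # index of c_cflag in the list returned by termios.tcgetattr()


def with_rtscts(attrs):
    """Return a copy of a termios attribute list with RTS/CTS enabled."""
    new_attrs = list(attrs)
    new_attrs[CFLAG] |= termios.CRTSCTS
    return new_attrs


def enable_rtscts(fd):
    """Apply RTS/CTS to an already opened serial port file descriptor."""
    attrs = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, with_rtscts(attrs))
```

`enable_rtscts()` would be called once on the descriptor returned by opening /dev/ttyTHS2, after the baud rate and 8N1 settings have been applied.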

hello hzchen,

could you please check with the getty service running in the background,
for example,
$ sudo /sbin/getty -a ubuntu -L 115200 ttyTHS<port> &

Sorry, I am not familiar with getty. I have some questions about how to check this.
Should I still run my program on the TX2 and run the above command in another terminal?
If so, how do I check the getty result?
Or should I just run the above getty command to capture the serial port directly, without my original program?

To disable getty (which is what you would want to do to use the serial device for your own purpose instead of as a terminal), run this from a terminal (other than the serial console):

sudo systemctl stop nvgetty.service
sudo systemctl disable nvgetty.service

To see whether getty is running on a particular ttyTHS, look at the device file's group ownership. Because /dev/ttyTHS2 is just a different driver for the same UART that also corresponds to /dev/ttyS2, if either of these is in group “tty”, then the UART is part of a serial terminal and is not available for your own custom purpose. Run “ls -l /dev/ttyTHS2” and “ls -l /dev/ttyS2”. The group will be either “tty” (getty running, device not usable for custom purposes) or “dialout” (device available for your purpose and not currently used as a serial console).

Example, still running getty (group tty):

# ls -l /dev/ttyTHS2 /dev/ttyS2
crw-rw---- 1 root tty   4, 66 Oct 24 12:10 /dev/ttyS2
crw-rw---- 1 root tty 238,  2 Oct 24 12:10 /dev/ttyTHS2

Example, not running getty (group dialout):

# ls -l /dev/ttyTHS2 /dev/ttyS2
crw-rw---- 1 root dialout   4, 66 Oct 24 12:10 /dev/ttyS2
crw-rw---- 1 root dialout 238,  2 Oct 24 12:10 /dev/ttyTHS2

If you look at all ttyTHS# and all ttyS#, then anything dialout is free, and anything tty is spoken for:
ls -l /dev/ttyTHS* /dev/ttyS*

The old “dialout” group was invented for a general class of serial devices used as a modem to dial out to the internet on. The “tty” group is named after terminals, where tty was a teletype.
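
The group check described above can also be done from a program before opening the port. A minimal sketch in Python, using the rule of thumb from this post ("tty" means taken, "dialout" means free); `device_group`, `describe`, and `report` are hypothetical helper names:

```python
import grp
import os


def device_group(path):
    """Return the group name that owns a device file (or the numeric gid)."""
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)              # gid has no entry in the group database


def describe(group):
    """Map a group name to the rule of thumb used in this thread."""
    if group == "tty":
        return "in use as a serial terminal"
    if group == "dialout":
        return "free for custom use"
    return "unknown"


def report(paths=("/dev/ttyTHS2", "/dev/ttyS2")):
    """Print the status of each device node; both nodes share one UART."""
    for path in paths:
        print(path, describe(device_group(path)))
```

Calling `report()` on the TX2 prints one line per device node; if either node of the pair shows "tty", the UART should be treated as taken.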

After running getty in the background:
/sbin/getty -a nvidia -L 115200 ttyTHS2 &
I tried a serial read, and most of the messages I received were wrong.
Many of them were missing the head of the message.

I also tried to follow the steps above to stop and disable getty with the commands
sudo systemctl stop nvgetty.service
sudo systemctl disable nvgetty.service

And I got
Failed to stop nvgetty.service: Unit nvgetty.service not loaded.

I ran systemctl list-units --all and systemctl list-unit-files,
but I cannot find nvgetty.service.
I wonder whether I missed or misunderstood a step.
BTW, I am using JetPack 3.3.

FYI, JetPack 3.3 is very old, so I don’t know whether it follows all of the same setup rules. In that L4T series (L4T is what actually gets flashed; JetPack is just a front end) the NVIDIA content was shipped as a loose set of files, not as packages. It is possible that the naming of services, and which content is actually run as a service, has changed. What will not change is that group “dialout” appears on the device files of available UARTs, and that group “tty” means the port is running as a serial console and should not be used for custom purposes (unless the console is disabled first).

Did you use sudo with the getty command? It won’t work if you don’t.

Before starting (after a fresh boot), what do you see from:
systemctl list-units | egrep nvgetty

If you then run the previous getty command in another terminal, and without background, does it show any errors?
sudo /sbin/getty -a nvidia -L 115200 ttyTHS2

I ran sudo -s and then used the command.

It works the same as “sudo /sbin/getty -a nvidia -L 115200 ttyTHS2”.

But I am curious about the result of “ls -l /dev/ttyTHS2 /dev/ttyS2”:
Before getty
crw-rw---- 1 root dialout 4, 66 Oct 29 16:23 /dev/ttyS2
crw-rw---- 1 root dialout 238, 2 Oct 29 16:23 /dev/ttyTHS2

After getty
crw-rw---- 1 root dialout 4, 66 Oct 29 16:23 /dev/ttyS2
crw------- 1 nvidia tty 238, 2 Oct 29 17:21 /dev/ttyTHS2
Only one of the two changed to tty after getty.

Whether I run getty directly or in the background,
the serial result looks the same: some of the messages are still missing the head.

Maybe I can flash a newer JetPack with a newer L4T later.
But on JetPack 3.3, I get nothing from “systemctl list-units | egrep nvgetty”.

This means getty attached to “/dev/ttyS2”. Typically the serial console avoids the THS driver because it is not available in early boot stages. The legacy S driver is used for serial console. You should not use both drivers at the same time on the same UART, and so this means ttyTHS2 is not available, and that ttyS2 is a serial console.

By this, do you mean no units are showing?

…if so, I am thinking this is just the earlier L4T design. The newer releases will use this as a service unit.