Tegra HSUART speeds at and above 4Mbaud lose characters, crash system

I think I am using an AGX Xavier; I am accessing it remotely via SSH.

The ultimate goal is to transfer serial data at the higher baud rates (12Mbaud+). As that does not currently work for me, I am asking for any advice on what steps I can take to fix or improve my approach.

The current experience is that

  • the Tegra HSUART starts dropping loopback characters at 4Mbaud, if not at lower speeds
  • the entire system crashes at speeds above 4Mbaud (8Mbaud, 12Mbaud, 12.5Mbaud),
    • and eventually fails to recover fully on subsequent reboots, requiring a hard reset.
    • This is a hard crash, with no discernible kernel logs.

The code I am using is at this Github repository: https://github.com/drbitboy/sst. That seems to work correctly on other, properly functioning serial ports on other Linux platforms.

Some of the details of my setup are in the Jetson/ subdirectory of that Github repository; let me know if more information would be useful.

Notes

  • I have modified the group ownership and permissions of /dev/ttyTHS0 so a non-root user can run the application.
  • I have disabled the systemd nvgetty.service.

hello drbitboya,

may I also know which JetPack release version you’re using?
please check $ cat /etc/nv_tegra_release for confirmation.

what serial port configuration are you using? have you tried 2 stop bits?
please also check the TTY settings, for example, $ sudo stty -F /dev/ttyTHS1
thanks

Thank you.

% more /etc/nv_tegra_release

# R35 (release), REVISION: 1.0, GCID: 31346300, BOARD: t186ref, EABI: aarch64, DATE: Thu Aug 25 18:41:45 UTC 2022

Yes, I am using two stop bits. Please look at the TTY Settings section of the README of the repo; that refers you to raw.sst.settings.txt which has the stty-style settings that I use, including cstopb (two stop bits) not -cstopb (one stop bit).

Also note that even if I had been using just one stop bit, that still does not explain why the system crashes, and eventually leaves the serial port in an unusable state, at baudrates above 4Mbaud, which is the other problem to be solved.

TL;DR

If you look at the Github repo link /drbitboy/sst I put in the original post, you would see that

  1. the file raw.sst.settings.txt has the “raw” settings from/to the stty command that I use for /dev/ttyTHS0, and that file has cstopb, which means two stop bits, not -cstopb,
  2. the include file raw_settings.h has that same setting, cstopb in string static char const raw_settings[], and
    2.1. the stty_raw_config(…) routine, starting at line 279 in that same include file raw_settings.h, uses that string to configure the serial port.
  3. Here is the stty output after running the sst app of that repo with option --do-raw-config, with the stop bit option shown to be cstopb (two stop bits) and not -cstopb (one stop bit):

% stty -a -F /dev/ttyTHS0

speed 115200 baud; rows 0; columns 0; line = 0;
intr = ; quit = ; erase = ; kill = ; eof = ; eol = ; eol2 = ; swtch = ; start = ; stop = ;
susp = ; rprnt = ; werase = ; lnext = ; discard = ; min = 1; time = 0;
parenb -parodd -cmspar cs8 -hupcl cstopb cread clocal crtscts
-ignbrk -brkint -ignpar -parmrk inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff -iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt -echoctl -echoke -flusho -extproc


Akismet, the automated moderator, hid my reply, probably after I edited it a few times.

Here is the release info:

% more /etc/nv_tegra_release

# R35 (release), REVISION: 1.0, GCID: 31346300, BOARD: t186ref, EABI: aarch64, DATE: Thu Aug 25 18:41:45 UTC 2022

and yes I am using two stop bits; refer to the Github repo link for details of how I am doing that, and see below:

% stty -a -F /dev/ttyTHS0

speed 115200 baud; rows 0; columns 0; line = 0;
intr = ; quit = ; erase = ; kill = ; eof = ; eol = ; eol2 = ; swtch = ; start = ; stop = ;
susp = ; rprnt = ; werase = ; lnext = ; discard = ; min = 1; time = 0;
parenb -parodd -cmspar cs8 -hupcl cstopb cread clocal crtscts
-ignbrk -brkint -ignpar -parmrk inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff -iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt -echoctl -echoke -flusho -extproc

Btw, as noted in the repo, COLUMNS=1 stty -a -F /dev/ttyTHS0 is a more useful way to check the settings, because you can pipe the output to grep cstopb.
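The same check can also be made from C rather than by grepping stty output. The following is a minimal sketch, not code from the sst repo: the function names stop_bits_from_cflag and query_stop_bits are hypothetical, and it assumes only the standard POSIX termios interface, where CSTOPB set in c_cflag means two stop bits.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Number of stop bits implied by a termios c_cflag value:
 * CSTOPB set means two stop bits, clear means one. */
static int stop_bits_from_cflag(tcflag_t cflag)
{
    return (cflag & CSTOPB) ? 2 : 1;
}

/* Open a serial device without making it the controlling terminal and
 * return its configured stop bits (1 or 2), or -1 on any error. */
static int query_stop_bits(const char *path)
{
    struct termios tio;
    int fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);

    if (fd < 0)
        return -1;

    int rc = tcgetattr(fd, &tio);
    close(fd);
    return rc == 0 ? stop_bits_from_cflag(tio.c_cflag) : -1;
}
```

For the settings shown above, query_stop_bits("/dev/ttyTHS0") would be expected to return 2, provided the caller has permission to open the device.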

hello drbitboya,

please also gather the last kernel logs before the system crash.
you may set up a terminal running $ dmesg --follow to keep receiving logs.
please share the details for reference, thanks

Here it is. This is after it had failed several times. The logging might be different after a hard reset.

Has anyone run my sst (Serial Stress Test) code on another system? It would be useful to know if this is a problem with our particular unit and configuration or if it is a problem with the package itself.

The code is straightforward to download, build and run. Let me know if the documentation is not sufficient.

hello drbitboya,

I’ve not yet run that sst (Serial Stress Test) locally.
anyway, there’s a kernel message indicating an error… configured baud rate is out of range by 625. this 625 is the difference between the configured baud rate and the required baud rate.

according to device tree,
$public_sources/kernel_src/hardware/nvidia/soc/t19x/kernel-dts/tegra194-soc/tegra194-soc-uart.dtsi
its default baud-rate setting is 115200/8n1; have you also updated this property for testing?

        uarta: serial@3100000 {
                compatible = "nvidia,tegra186-hsuart";
...
                reset-names = "serial";
                nvidia,adjust-baud-rates = <115200 115200 100>;

Yes, my sst app can set the baud rate. See stty_set_speed(…) in raw_settings.h: for baud rates at 4Mbaud and below, it uses the macros from (#include <asm-generic/termbits.h>?) specific to each baud rate; above 4Mbaud, it uses the BOTHER macro.
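For readers unfamiliar with BOTHER, here is a minimal sketch of the usual Linux mechanism for requesting a non-standard rate through the termios2 ioctls. It is not the code from raw_settings.h: the function name set_custom_baud is hypothetical, and the sketch assumes a Linux system where <asm/termbits.h> provides struct termios2 (that header should not be mixed with <termios.h> in the same file). It reads the settings back after writing them, because a successful ioctl only shows that the request was accepted; the value read back is what the tty layer stored and may not reflect what the hardware clock actually achieved.

```c
#include <sys/ioctl.h>
#include <asm/termbits.h>   /* struct termios2, BOTHER, CBAUD */

/* Request an arbitrary baud rate on an open tty and return the output
 * rate the kernel reports back, or -1 if any ioctl fails. */
static long set_custom_baud(int fd, unsigned int baud)
{
    struct termios2 tio;

    if (ioctl(fd, TCGETS2, &tio) < 0)
        return -1;

    tio.c_cflag &= ~CBAUD;   /* clear the standard Bnnn speed code */
    tio.c_cflag |= BOTHER;   /* take the speed from c_ispeed/c_ospeed */
    tio.c_ispeed = baud;
    tio.c_ospeed = baud;

    if (ioctl(fd, TCSETS2, &tio) < 0)
        return -1;

    /* Read back what the tty layer now holds for this port. */
    if (ioctl(fd, TCGETS2, &tio) < 0)
        return -1;

    return (long)tio.c_ospeed;
}
```

Comparing the return value with the requested rate, and watching the kernel log at the same time, is one way to see whether the driver complained about a rate that the ioctl itself accepted.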

  1. Is it acceptable that the Jetson should crash from an incorrect setting of the baud rate?
  2. Also, after a crash and reboot, why does the HSUART not work at all at any baud rate, even baud rates at which it worked before, requiring a hard reset to fix?
  3. What is wrong with how I set the baud rate (see routine sst_set_speed in raw_settings.h)? I have tried other rates with no success.
  4. If those kernel log errors are the problem, then why does the application code return a success response to the user code when setting those baud rates?

I ran it with a BOTHER baud rate of 8000625, and the kernel log said the difference was 624, and the system crashed.

I then ran it with a BOTHER baud rate of 8390625 (= 8M + 625*625), and there was no kernel log output (sudo dmesg --follow) at all, and the system crashed.

From everything I have done so far, I see no evidence that the HSUART can run at anything above 4Mbaud, and it also has trouble at that rate.

Also, any attempt to run at advertised baud rates above 4Mbaud will cause the system to crash.

hello drbitboya,

some comments to follow up on this thread.

are you running the UART with HW Flow control or not (RTS & CTS connections)?
if HW Flow control is disabled, then missing characters are expected at higher baudrates. When enabled properly, we should not see any missing characters.

for setting a custom baudrate, the UART clock needs to run at (baudrate * 16) Hz, however, this is not always achievable.
the HSUART supports baudrates up to 12M, but this should not be used without HW Flow control (otherwise missing characters would be seen) and custom baudrates are not always guaranteed to work. all the standard baudrates, however, are supported.
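The clock relationship described above can be worked through numerically. The sketch below is illustrative arithmetic only, not the driver's actual algorithm: the function names are hypothetical, the use of a nearest-integer divisor is an assumption, and the 128 MHz clock in the usage note is a made-up value chosen for the example, not a documented Tegra clock rate.

```c
#include <stdint.h>

/* UART input clock (Hz) needed for a baud rate, assuming 16x
 * oversampling and a divisor of 1. */
static uint64_t required_uart_clock_hz(uint32_t baud)
{
    return (uint64_t)baud * 16u;
}

/* Baud rate that results if the clock tree delivers clock_hz and the
 * UART uses the nearest integer divisor (at least 1). Assumption: this
 * rounding rule is for illustration, not taken from the driver. */
static uint32_t achieved_baud(uint64_t clock_hz, uint32_t requested_baud)
{
    uint64_t per_divisor = required_uart_clock_hz(requested_baud);
    uint64_t divisor = (clock_hz + per_divisor / 2) / per_divisor;

    if (divisor == 0)
        divisor = 1;
    return (uint32_t)(clock_hz / (16u * divisor));
}

/* Signed difference, achieved minus requested, in baud. */
static int64_t baud_error(uint64_t clock_hz, uint32_t requested_baud)
{
    return (int64_t)achieved_baud(clock_hz, requested_baud)
         - (int64_t)requested_baud;
}
```

Under these assumptions, 12000000 baud needs a 192000000 Hz clock. If the clock could only reach a hypothetical 128000000 Hz, a request for 8000625 baud would yield 8000000 baud, a difference of -625. Whether this is how the kernel arrived at the figure in its log message is not established here.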

here’s a kernel patch to fix the UART error handler, b3e3f3f.diff (1.8 KB)
could you please give it a try for the OS crash issues?
thanks

Thanks Jerry!

Just to confirm: if there needs to be HW flow control (RTS/CTS) for this to work, then large-scale/continuous data transfer at baudrates above some level (e.g. 4Mbaud? 12.5Mbaud?) is not possible. Is that correct?

Not sure if I have the resources to test that kernel patch, but I will kick it up the chain.

Best regards,

Brian T. Carcich

I had been setting crtscts in the termios settings, but we did not have the pins connected.

Those pins are now connected, and communication is still unreliable at 4Mbaud, and still crashes above that speed.

Here is what I see from [dmesg --follow] at 4Mbaud.

[ 41.953090] vdd-3v3-slt: disabling
[ 290.855641] ttyTHS ttyTHS0: 1 input overrun(s)
[ 361.931287] ttyTHS ttyTHS0: 1 input overrun(s)
[ 440.812792] ttyTHS ttyTHS0: 29 input overrun(s)
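When comparing runs, it can help to total the overruns reported in a captured log. The following is a minimal sketch that assumes the log lines have exactly the form shown above; the function names overruns_in_line and total_overruns are hypothetical and not part of the sst repo.

```c
#include <stdio.h>
#include <string.h>

/* Extract the overrun count from one kernel log line of the form
 * "[ 290.855641] ttyTHS ttyTHS0: 1 input overrun(s)".
 * Returns the count, or 0 if the line is not an overrun report. */
static unsigned long overruns_in_line(const char *line)
{
    const char *colon = strrchr(line, ':');
    unsigned long count = 0;
    int end = 0;

    if (colon == NULL)
        return 0;

    /* %n is only stored if the whole pattern matched, so a line that
     * merely starts with a number after the colon is rejected. */
    if (sscanf(colon + 1, " %lu input overrun(s)%n", &count, &end) == 1
        && end > 0)
        return count;
    return 0;
}

/* Sum the overrun counts over an array of log lines. */
static unsigned long total_overruns(const char *const *lines, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += overruns_in_line(lines[i]);
    return total;
}
```

Applied to the four lines above, the sketch counts 31 overruns (1 + 1 + 29), with the vdd-3v3-slt line contributing nothing.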

Hi Again Jerry,

Btw, I have been using /dev/ttyTHS0. I stopped the serial-getty@ttyTCU0.service process. But I was wondering if there might be other processes writing to /dev/ttyTHS0, e.g. I have seen some posts on this forum about kernel messages and serial ports. Is there any documentation about this you can point me to?

Thanks.

hello jerry,

I set up a cross-compiling environment (based on this). I successfully built the base arm kernel, both with and without the real-time patches (./kernel-5.10/scripts/rt-patch.sh), including reverting the real-time patches and building the kernel again.

However, when I applied the diff file you uploaded, the build failed. Here is the log.
build_tegra-patch-b3e3f3f.log (108.9 KB)

Hi @JerryChang,

I figured out why that kernel build failed: there was summat missing in that kernel patch/commit b3e3f3f.diff.

This kernel patch b3e3f3f+drbitboya.diff (438 Bytes) is incremental from the result of that patch; this patch adds one line necessary to ensure the kernel compiles successfully; whether that patched serial-tegra.c does everything you intended is more than I have resources to check right now.

This kernel patch drbitboya.diff (2.1 KB) combines that first kernel patch b3e3f3f.diff and my incremental patch (b3e3f3f+drbitboya.diff above), i.e. it is applied to the version of serial-tegra.c (git object hash 1a46a86) that was in the repo before your patch.

Best regards,

Brian Carcich
