UART communication randomly blocks

I’m using UART on my Jetson Nano 2GB to communicate with a motor controller, and it works fine most of the time. However, an issue recently started to occur: after some time, a call to read() on the serial port blocks and never returns. When I attach gdb, I see that the process is inside read(). Some more information:

  • I use C to communicate (I can show code if required).
  • I set VTIME and VMIN to 0.
  • Baud rate is 921600 with 2 stop bits.
  • I use /dev/ttyTHS1 on pins 8 and 10.
  • /var/log/kern.log shows nothing.
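
For reference, the settings in the list above could be applied roughly as follows. This is only a sketch, not the code from the original post: the function names fill_polling_settings and configure_port are made up for illustration, and 8 data bits with no parity is an assumption, since the post only states the baud rate, the stop bits, VMIN and VTIME.

```c
#include <termios.h>

/* Fill *tio for raw I/O at 921600 baud, 2 stop bits, polling reads.
 * Assumption: 8 data bits, no parity (not stated in the post). */
int fill_polling_settings(struct termios *tio)
{
    /* Raw mode: no input translation, no output processing, no line editing. */
    tio->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                      INLCR | IGNCR | ICRNL | IXON);
    tio->c_oflag &= ~OPOST;
    tio->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);

    /* 8 data bits, no parity, 2 stop bits (CSTOPB), receiver enabled. */
    tio->c_cflag &= ~(CSIZE | PARENB);
    tio->c_cflag |= CS8 | CSTOPB | CLOCAL | CREAD;

    /* VMIN = 0 and VTIME = 0: read() never waits for data. */
    tio->c_cc[VMIN] = 0;
    tio->c_cc[VTIME] = 0;

    if (cfsetispeed(tio, B921600) != 0)
        return -1;
    return cfsetospeed(tio, B921600);
}

/* Apply the settings to an already opened port, e.g. /dev/ttyTHS1. */
int configure_port(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)
        return -1;
    if (fill_polling_settings(&tio) != 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}
```

With VMIN and VTIME both set to 0, read() returns immediately with whatever bytes are available, and returns 0 if there are none.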

I found out that the code to handle this bus is in Linux_for_Tegra\source\public\kernel\kernel-4.9\drivers\tty\serial\serial-tegra.c, but I didn’t find anything obvious there that would help me. Any hint on what to look at there?
How can I debug the kernel, so I can see where it hangs?

Thanks and best regards!

hello heyho123,

Please open a terminal and run $ dmesg --follow in the background to gather the logs.
Then reproduce the issue, check for failures on the kernel side, and share the error messages for reference.

Would it be possible for you to test with two stop bits instead of one?

Thanks JerryChang, I’ll test with dmesg --follow as soon as I can and report then.

@linuxdev: I am already using two stop bits. I can test with one, if you think that would help.

No need to test single stop bit. Two is already the more reliable case.

Okay, I was finally able to reproduce this while running dmesg --follow. Sadly, there was no output. At least none after booting.

One difference, though: this time, the serial port on the J12 Button Header on pins 3/4 via /dev/ttyS0 hung while reading. (I have two motor controllers connected, one on each serial port.)
Everything else is the same though. __libc_read() is called and then there is no return until I kill the process.

Is there a way to make a memory dump that can be inspected to find out where in the kernel it hangs?
Or should I maybe add logging output in serial-tegra.c?
Could this be caused by a recent OS update? I installed security updates recently and I didn’t have any problems like this until then.

I should also mention that I followed these instructions on how to make /dev/ttyS0 work.
Is it possible that this tegra_combined_uart driver causes this?

I couldn’t say much as to how to debug in this case. I’d honestly think the best method is to use a logic analyzer and watch the actual traffic since it might be a signal issue and might not be a software issue. An example is if the UART at one end sends the data, but the other end does not realize anything was sent…then it would wait forever. However, it could be something in the software, but it is unlikely to be the case since __libc_read() is quite reliable and well-tested.

Serial protocol analyzers are probably the cheapest of all analyzers (though overall not cheap if for example it is on an MSO). Is there any chance you can throw a protocol analyzer on it?

Thanks for the reply!
I agree that it is probably caused by a signal issue. But still, it shouldn’t cause a blocking call, because I set VTIME to zero. If there is no data to read, it should return 0 or -1.
And sure, the bug isn’t in __libc_read() directly, but that function probably calls a driver from NVIDIA and my guess is that there is some bug.

I already figured out how to get the kernel stack:

sudo cat /proc/7224/stack
[<ffffff80080863bc>] __switch_to+0x9c/0xc0
[<ffffff80080c5f48>] ptrace_stop+0x128/0x230
[<ffffff80080c76f0>] get_signal+0x338/0x578
[<ffffff800808b090>] do_signal+0x70/0x500
[<ffffff800808b698>] do_notify_resume+0x90/0xb0
[<ffffff800808379c>] work_pending+0x8/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

But this doesn’t make sense to me. I expected the bottom of this stack to be a read syscall or something.

I’ll look into buying a logic analyzer, but I’d prefer to solve this via software first.

Ok, it was my own stupidity…
The read call wasn’t blocking after all, but returning 0 as it should. My own code was calling it in a loop, without me realizing it.
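
A loop that keeps calling a non-blocking read() until data arrives looks, from the outside, just like a read() that never returns. One way to make the waiting explicit is to let poll() wait for data with a bounded timeout, so the caller can tell "no data yet" apart from an error. The following is only a sketch of that idea, not the code from this thread; the name read_with_timeout is made up for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for data on fd, then read at most len bytes.
 * Returns the number of bytes read, 0 if nothing arrived before the
 * timeout, or -1 on error (errno is set by poll() or read()). */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    /* Retry if a signal interrupts the wait. Note that each retry
     * restarts the full timeout, so the total wait can be longer. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;   /* timed out, no data available */

    return read(fd, buf, len);
}
```

A caller can then count consecutive timeouts and give up or log a message after a fixed number of them, instead of spinning forever.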

I’d say you do have a logic analyzer! :)
