Hello,
We are having some issues with our software, which runs on the Orin and uses the NVIDIA GTE (Generic Timestamp Engine).
We have a software service that uses a GPIO input pin for precise timing of a peripheral. Seemingly at random in the field, the software crashes due to a timing issue, and when it dies, the kernel takes ownership of the GPIO pin we were using.
We cannot get the kernel to release the pin after this event without fully rebooting the Orin.
Using gpioinfo, we see that after reboot the pin shows as:
line 13: "PCC.01" input
When the software is running, it appears as:
line 13: "PCC.01" input consumer="gevent13"
When the software crashes due to the aforementioned issue, we get:
line 13: "PCC.01" input consumer="kernel"
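To catch the transition between these three states in the field, the gpioinfo output could be classified automatically. The following is only a minimal sketch: it assumes lines in the abbreviated form quoted above (real gpioinfo output has extra columns that vary between libgpiod versions, so the pattern may need adjusting), and the names `LINE_RE` and `classify` are hypothetical helpers, not part of any NVIDIA or libgpiod tool.

```python
import re

# Assumption: lines look like the abbreviated ones quoted in this post, e.g.
#   line 13: "PCC.01" input consumer="kernel"
LINE_RE = re.compile(
    r'line\s+(?P<offset>\d+):\s+"(?P<name>[^"]*)"\s+(?P<direction>input|output)'
    r'(?:\s+consumer="(?P<consumer>[^"]*)")?'
)


def classify(line):
    """Return a dict describing who holds the line, or None if unparseable."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    consumer = match.group("consumer")
    if consumer is None:
        state = "free"             # no consumer shown, as after a reboot
    elif consumer == "kernel":
        state = "held-by-kernel"   # the stuck state described in this post
    else:
        state = "named-consumer"   # e.g. "gevent13" while the service runs
    return {
        "offset": int(match.group("offset")),
        "name": match.group("name"),
        "consumer": consumer,
        "state": state,
    }
```

A watchdog could feed each line of gpioinfo output through `classify` and log a timestamp whenever line 13 reports `held-by-kernel`, which might help correlate the stuck state with other field events.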
Using lsfd, I don’t see any process holding the GPIO’s file descriptor open after the process crashes either. It seems as though the pin is owned by the kernel, which won’t release it.
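The lsfd check above can also be scripted so it runs automatically right after a crash. This is a rough sketch, assuming that any descriptor tied to the GPIO character device has a /proc/<pid>/fd link target containing the substring "gpio" (for example /dev/gpiochipN); that substring is my assumption, and `find_gpio_fd_holders` is a hypothetical helper name. It needs root to see other users' processes.

```python
import os


def find_gpio_fd_holders(proc_root="/proc", needle="gpio"):
    """List (pid, fd, target) for every open fd whose link target contains needle."""
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir and readlink
            if needle in target:
                holders.append((int(entry), int(fd), target))
    return sorted(holders)
```

An empty list from `find_gpio_fd_holders()` while gpioinfo still reports consumer="kernel" would be consistent with what I see from lsfd: nothing in userspace holds the line, so the stale request appears to live inside the kernel.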
I have not been able to re-create the issue on a benchtop; we only seem to see it in the field. I have re-created the “timing issue” that causes the software to crash, but on my bench with my board it dies gracefully and the pin is released.
When the problem occurs, no dmesg messages are printed at the time of the crash, only afterwards when the process restarts, where it prints:
[Thu May 9 00:40:20 2024] tegra_gte c1e0000.gte: failed request irq
[Thu May 9 00:40:24 2024] tegra_gte c1e0000.gte: failed to request gpio
[Thu May 9 00:40:37 2024] tegra_gte c1e0000.gte: failed to request gpio
[Thu May 9 00:40:51 2024] tegra_gte c1e0000.gte: failed to request gpio
It says “failed request irq” the first time the process restarts, and “failed to request gpio” on every subsequent attempt.
I was wondering if anyone has seen the GTE go into this state before, so that we can inform our debugging. So far I have tried a number of hardware and software steps to re-create the problem, without success.