Orin NX Generic Timestamp Engine (GTE) Crashing Issues

Hello,

We are having some issues with our software that runs on the Orin and uses the NVIDIA GTE (Generic Timestamp Engine).

We have a software service that uses a GPIO input pin for precise timing of a peripheral. Seemingly at random in the field, the software crashes due to a timing issue, and when it dies, the kernel takes ownership of the GPIO pin we were using.

We cannot get the kernel to release the pin after this event without fully rebooting the Orin.

Using gpioinfo, we see that after reboot, the pin shows as:
line 13: "PCC.01" input

When the software is running, it appears as:
line 13: "PCC.01" input consumer="gevent13"

When the software crashes due to the aforementioned issue, we get:
line 13: "PCC.01" input consumer="kernel"

Using lsfd, I don’t see any process holding the GPIO’s file descriptor after the process crashes either. It seems as though the pin is owned by the kernel and it won’t release it.
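To make the three states above easy to check from a script (for example before the service restarts), here is a minimal sketch of a helper that extracts the consumer of a named line. It assumes gpioinfo prints lines in the `line 13: "PCC.01" input consumer="..."` form quoted above; the function name `gpio_consumer` is made up for illustration, and other libgpiod versions format this output differently.

```shell
#!/bin/sh
# Hypothetical helper: print the consumer of a named GPIO line.
# Reads gpioinfo-style text on stdin, in the format quoted in this post.
# Prints "unclaimed" if the line has no consumer= field, and "not-found"
# if the line name never appears in the input.
gpio_consumer() {
    awk -v name="\"$1\"" '
        $3 == name {
            found = 1
            if (match($0, /consumer="[^"]*"/)) {
                # Skip the 10 characters of consumer=" and drop the closing quote.
                print substr($0, RSTART + 10, RLENGTH - 11)
            } else {
                print "unclaimed"
            }
            exit
        }
        END { if (!found) print "not-found" }
    '
}
```

Usage would be something like `gpioinfo | gpio_consumer PCC.01`. A result of `kernel` corresponds to the stuck state described here, while `unclaimed` corresponds to the state after a reboot.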

I have not been able to re-create the issue on a benchtop; we only seem to see it in the field. I have re-created the “timing issue” that causes the software to crash, but on my bench with my board it dies gracefully and the pin is released.

When the problem occurs, no dmesg messages are printed at the time of the crash, only afterwards when the process restarts, where it prints:

[Thu May  9 00:40:20 2024] tegra_gte c1e0000.gte: failed request irq
[Thu May  9 00:40:24 2024] tegra_gte c1e0000.gte: failed to request gpio
[Thu May  9 00:40:37 2024] tegra_gte c1e0000.gte: failed to request gpio
[Thu May  9 00:40:51 2024] tegra_gte c1e0000.gte: failed to request gpio

It says “failed request irq” the first time it restarts, and “failed to request gpio” on every subsequent try.
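For anyone comparing logs across units, a small sketch like the following can tally the two failure messages. It only matches the exact strings shown in the dmesg excerpt above; the function name `gte_failure_summary` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: count the two tegra_gte failure messages quoted above.
# Reads kernel log text on stdin and prints one summary line.
gte_failure_summary() {
    awk '
        /tegra_gte/ && /failed request irq/     { irq++ }
        /tegra_gte/ && /failed to request gpio/ { gpio++ }
        END { printf "irq=%d gpio=%d\n", irq, gpio }
    '
}
```

It could be fed with `sudo dmesg -T | gte_failure_summary`. For the four log lines above it prints `irq=1 gpio=3`, which matches the pattern of one IRQ failure followed by repeated GPIO request failures.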

I was wondering if anyone has seen the GTE go into this state before, so that we can inform our debugging. So far I have tried a number of hardware and software steps to re-create the problem, without success.

Sorry for the late response.
Is this still an issue that needs support? Are there any results you can share?

Hi! Yes, this is still an issue. No further updates yet; we are still running into it intermittently. We are looking for any guidance on what additional information we could collect, or any insight into the GTE.

Hi vikram.bhat,

Are you using the devkit or custom board for Orin NX?
What’s your Jetpack version in use?

For GTE, have you referred to the Generic Timestamp Engine page in the NVIDIA Jetson Linux Developer Guide for details?

It seems this pin is busy and cannot be requested currently.
Are you using PCC.01 for GTE?
Have you confirmed that it is not used by others?

Hi Kevin,

We are using a custom board for the Orin NX.

We are currently on Jetpack version 5.1.2

We have looked through that guide but haven’t found anything yet that points to the issue we are having. Most of the time the GTE seems to run “nominally” without errors, however when we do hit it seemingly randomly, we aren’t able to resolve it without a reboot.

The error presents as someone else using the pin (PCC.01 is the pin we are timestamping), but no other user space process is grabbing the pin from what we can tell (using gpioinfo and lsfd). It seems like the GTE isn’t releasing the pin properly when the program exits, but I’m not sure how to dig in further.

Could you check on a devkit whether the same issue occurs?

Have you tried whether another AON GPIO pin works?

Could you share the output of the following command while the pin is in this state?

$ sudo cat /sys/kernel/debug/gpio|grep PCC.01

Thanks for your reply. We will check if we can run it on a devkit, but the process that’s dying depends on having an IMU connected, so it may be difficult to replicate.

We haven’t tried another GPIO yet, as the signal is wired to this one. Do you have any insight into whether this GPIO is different, or expected to behave differently, from other AON GPIOs?

Thanks for the command. The next time the issue comes up, I will run it.

That is just to help clarify whether the issue is specific to this GPIO (PCC.01).

You can also check if there’s any other driver/service/process running when you hit the issue.

Hi, I’m working with Vikram on this problem.

I can reliably get it into this state by creating a soft lockup, if you want to reproduce it on your end:

sudo stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 5s --softlockup 10

After this returns, we can’t register the GPIO anymore.

Obviously soft lockups are not ideal, but we want to be able to recover our timestamp engine without rebooting if something goes wrong.

root@ubuntu:/sys/class/gpio# sudo stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 5s --softlockup 10
stress-ng: info:  [12611] dispatching hogs: 8 cpu, 4 io, 2 vm, 4 fork, 10 softlockup
stress-ng: info:  [12611] successful run completed in 5.09s
root@ubuntu:/sys/class/gpio# sudo cat /sys/kernel/debug/gpio|grep PCC.01
 gpio-329 (PCC.01              )
root@ubuntu:/sys/class/gpio# echo 329 > export 
root@ubuntu:/sys/class/gpio# ls
export  gpiochip316  gpiochip348  PCC.01  unexport
root@ubuntu:/sys/class/gpio# cat PCC.01/value 
1
root@ubuntu:/sys/class/gpio# cat PCC.01/direction 
out

I’ve tried running the command you shared on my devkit.
After running that command, I can still export and use PCC.01.
Please check on your side.

You need to be running the gte_mon example program with PCC.01 before running those steps for it to fail.

The pin by itself will be available if the GTE is not involved.
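Putting the ordering together, the reproduction could be sketched roughly as below. This is only an outline under assumptions: `GTE_CMD` is a placeholder for however you launch gte_mon (or your own GTE consumer) on PCC.01, since its arguments are not given in this thread, and the 2 second wait and the `DRY_RUN` switch are arbitrary additions for illustration. Only the stress-ng and debugfs commands are taken from the posts above. It is meant to be run as root.

```shell
#!/bin/sh
# GTE_CMD: placeholder, set to your own command that claims PCC.01 via the GTE.
# DRY_RUN=1 only prints the steps instead of executing them.
GTE_CMD="${GTE_CMD:-}"
DRY_RUN="${DRY_RUN:-0}"

run() {
    # Echo the command, and execute it unless in dry-run mode.
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

repro() {
    if [ -z "$GTE_CMD" ]; then
        echo "set GTE_CMD to your GTE consumer command" >&2
        return 2
    fi
    # 1. Start the GTE consumer first so the line is claimed.
    echo "+ $GTE_CMD &"
    if [ "$DRY_RUN" != "1" ]; then
        $GTE_CMD &
        gte_pid=$!
        sleep 2   # arbitrary settle time (assumption)
    fi
    # 2. Induce the soft lockup with the stress-ng command from this thread.
    run stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 5s --softlockup 10
    # 3. Stop the consumer if it survived, then inspect the line.
    [ "$DRY_RUN" = "1" ] || kill "$gte_pid" 2>/dev/null
    run grep PCC.01 /sys/kernel/debug/gpio
}
```

Calling `repro` with `DRY_RUN=1` is a way to review the sequence before running it on hardware. Whether the consumer exits on its own during the lockup or has to be killed is not established in this thread, so step 3 covers both cases.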