Nvethernet PTP bug

I found two issues running linuxptp on a Jetson AGX Orin 32 GB + Forge carrier board, using the 10 GbE port. If I read the docs correctly, the carrier board should not influence this, as it only adds a PHY, while timestamping happens in the MAC, which is integrated into the Tegra SoC.

I run the default L4T from JetPack 5.1.1 with the carrier board BSP.

1) Polling for the TX timestamp sometimes takes too long

The default setting for tx_timestamp_timeout in linuxptp is 10 milliseconds. With other NICs this is never a problem, but with the 10 GbE port it irregularly happens that this timeout is too short and timestamp polling fails (and I transmit almost nothing over the port). I’m now trying 20 ms and haven’t seen the problem yet. Can you confirm that such a long delay is expected?

When this happens, dmesg is silent and linuxptp debug log says:

ptp4l[14427.236]: rms   29 max   49 freq +16416 +/-  40 delay  8765 +/-   2
ptp4l[14428.235]: rms   36 max   65 freq +16438 +/-  45
ptp4l[14428.764]: port 1 (mgbe0): delay timeout
ptp4l[14428.774]: timed out while polling for tx timestamp
ptp4l[14428.774]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[14428.774]: port 1 (mgbe0): send delay request failed
ptp4l[14428.774]: port 1 (mgbe0): SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[14428.824]: waiting 2^{4} seconds to clear fault on port 1 (mgbe0)
ptp4l[14444.824]: clearing fault on port 1 (mgbe0)
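For context on what this timeout covers: linuxptp retrieves hardware TX timestamps from the socket error queue, and tx_timestamp_timeout bounds how long it polls for one. Below is a simplified sketch of that mechanism; the function name and error handling are mine for illustration, not the actual linuxptp code:

```c
#include <poll.h>

/* Simplified sketch of how linuxptp waits for a hardware TX timestamp.
 * The timestamp is delivered on the socket's error queue, and
 * tx_timestamp_timeout bounds how long poll() waits for it. */
int wait_tx_timestamp(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = 0 };

    /* A pending error-queue message is reported via POLLERR even
     * though no other events are requested. */
    int res = poll(&pfd, 1, timeout_ms);
    if (res < 1 || !(pfd.revents & POLLERR))
        return -1;  /* "timed out while polling for tx timestamp" */

    /* On success, linuxptp then calls recvmsg() with MSG_ERRQUEUE and
     * extracts the SCM_TIMESTAMPING control message. */
    return 0;
}
```

So the driver has the whole timeout window to publish the timestamp on the error queue; if it takes longer, ptp4l declares the fault seen in the log above.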

2) tegra_register_hwtime_source adds duplicate entries

I found this bug because of the previous one. When linuxptp failed to poll a timestamp, it reset the synchronization. This reset calls (among others) the SIOCSHWTSTAMP ioctl, handled by ether_handle_hwtstamp_ioctl in nvethernet/ptp.c. Whenever this ioctl handler is called, it calls tegra_register_hwtime_source from platform/tegra/ptp-notifier.c.

This function has a 5-item buffer for callbacks and net devices. However, when it is called multiple times for the same device, it doesn’t detect that it already has a callback for this device and adds a new one. After 5 calls the callback buffer is full, and this starts appearing in the log:

[14551.813229] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[14551.813259] Maximum registrations reached

I think the function should check if it already has a callback for the given netdev, and if it does, it should replace it.
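To illustrate the suggested fix, here is a minimal userspace sketch; the structure and function names are illustrative, not the actual kernel code:

```c
#include <stddef.h>

#define MAX_HWTIME_SOURCES 5

/* Hypothetical model of the ptp-notifier registration table. */
struct hwtime_source {
    void *ndev;                   /* net device the callback belongs to */
    int (*get_time)(void *ndev);  /* timestamp callback */
};

static struct hwtime_source sources[MAX_HWTIME_SOURCES];
static int num_sources;

/* Suggested behavior: replace an existing entry for the same net
 * device instead of appending a duplicate. */
int register_hwtime_source(void *ndev, int (*cb)(void *))
{
    int i;

    /* If this device is already registered, just update its callback. */
    for (i = 0; i < num_sources; i++) {
        if (sources[i].ndev == ndev) {
            sources[i].get_time = cb;
            return 0;
        }
    }

    if (num_sources >= MAX_HWTIME_SOURCES)
        return -1;  /* "Maximum registrations reached" */

    sources[num_sources].ndev = ndev;
    sources[num_sources].get_time = cb;
    num_sources++;
    return 0;
}
```

With this change, repeated SIOCSHWTSTAMP ioctls on the same interface would never exhaust the buffer.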

To replicate issue 2:

sudo hwstamp_ctl -i mgbe1 -r 12 -t 1  # Run this command several times

hwstamp_ctl is part of linuxptp (GitHub: richardcochran/linuxptp, the user-space PTP stack for GNU/Linux).

Update to issue #1: even with a 20 ms timeout, I got one timed-out poll after approx. 1 hour.

Could you share the commands to reproduce issue 1 too?

To replicate issue 1, download the master branch of linuxptp on two computers, build, and install it. Copy the configs folder to /etc/linuxptp. Connect the two computers to the same wired network.

Run this command on the other (PTP-capable) computer:

sudo ptp4l -H -i eno1 -f /etc/linuxptp/automotive-master.cfg --step_threshold=0.001 -m -l6

And run this on the Orin:

sudo ptp4l -H -i mgbe0 -f /etc/linuxptp/automotive-slave.cfg --step_threshold=0.001 -m -l7

You can try setting different values of tx_timestamp_timeout in automotive-slave.cfg; the default is 10. With 10 or 5 ms, the timeout occurs much more often.
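For reference, this is the fragment I change in automotive-slave.cfg; tx_timestamp_timeout is a [global] option with the value in milliseconds:

```
[global]
tx_timestamp_timeout	20
```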


I let linuxptp run over the weekend with a 20 ms timeout, and here’s the log of events from when timestamp polling timed out. Looking at the timestamps at the beginning of the lines, you can tell how often polling fails - approx. once every 5000 seconds. linuxptp is configured by default to send one delay request per second, and delay request packets are exactly the ones that need to poll the TX timestamp. So, generalizing a bit, it seems that with a 20 ms timeout there is a ~1/5000 chance that a TX timestamp poll will fail.

[27886.568202] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[27886.568261] Maximum registrations reached
[37519.798199] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[37519.798258] Maximum registrations reached
[43891.268918] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[43891.268978] Maximum registrations reached
[48917.816830] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[48917.816897] Maximum registrations reached
[53400.743378] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[53400.743443] Maximum registrations reached
[58950.664277] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[58950.664339] Maximum registrations reached
[64958.753548] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[64958.753607] Maximum registrations reached
[67624.451120] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[67624.451184] Maximum registrations reached
[80001.141025] Maximum registrations reached
[86908.296045] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[86908.296105] Maximum registrations reached
[88827.543319] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[88827.543385] Maximum registrations reached
[91939.849184] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[91939.849240] Maximum registrations reached
[105350.831237] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[105350.831270] Maximum registrations reached
[105472.919689] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[105472.919747] Maximum registrations reached
[116061.185679] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[116061.185735] Maximum registrations reached
[118566.006938] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[118566.007006] Maximum registrations reached
[120003.362164] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[120003.362229] Maximum registrations reached
[121265.013041] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[121265.013103] Maximum registrations reached
[140214.452390] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[140214.452449] Maximum registrations reached
[146416.118631] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[146416.118693] Maximum registrations reached
[148250.666738] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[148250.666795] Maximum registrations reached
[150054.758534] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[150054.758600] Maximum registrations reached
[153808.534033] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[153808.534066] Maximum registrations reached
[173535.119174] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[173535.119237] Maximum registrations reached
[181168.894124] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[181168.894183] Maximum registrations reached
[184647.896640] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[184647.896703] Maximum registrations reached
[192097.708232] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[192097.708296] Maximum registrations reached
[196066.864019] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[196066.864078] Maximum registrations reached
[198730.586151] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[198730.586215] Maximum registrations reached
[206010.531109] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[206010.531174] Maximum registrations reached
[220313.686634] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[220313.686691] Maximum registrations reached
[220341.278806] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[220341.278872] Maximum registrations reached
[226589.942661] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[226589.942721] Maximum registrations reached
[229599.062060] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[229599.062117] Maximum registrations reached
[229921.762971] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[229921.763032] Maximum registrations reached
[234684.570617] nvethernet 6810000.ethernet: config.flags = 0x0, tx_type = 0x1,rx_filter = 0xc
[234684.570670] Maximum registrations reached

I’m seeing the same issue, but not even 20ms works that well. Was there ever a resolution here?

@WayneWWW I’m sad this bug report hasn’t been added to the L4T release notes - neither under known issues nor under fixed issues.

Issue 2 has a complete analysis; now it only awaits about 10 minutes of work from your kernel developers.

Issue 1 needs more investigation on your side. It is definitely very unusual that TX packet timestamps cannot be read back in < 20 ms.

Now I’ve updated to L4T 35.4.1 and started seeing a new issue in linuxptp (new Issue 3):

clockcheck: clock frequency changed unexpectedly!

This happens when the freq value written by a clock_adjtime() call doesn’t match what is read back later.

I’ve added a few debug prints to clockcheck.c (linuxptp, richardcochran/linuxptp on GitHub) to print the last value set and the value read back. Here’s the result:

clockcheck: clock frequency changed unexpectedly! 520300 1036783
clockcheck: clock frequency changed unexpectedly! 371436 1036783
clockcheck: clock frequency changed unexpectedly! 247989 1036784
clockcheck: clock frequency changed unexpectedly! 142558 1036784
clockcheck: clock frequency changed unexpectedly! -25609 1036784
clockcheck: clock frequency changed unexpectedly! -90039 1036784

The values read back increase by approximately 1 each second. Given that this is a frequency offset, that is super weird!

The code that handles the calls to clock_adjtime() is here:

This was definitely not happening on 35.2.1. What I can google about this error suggests that something else is controlling the same clock. However, I’m pretty sure that isn’t the case here. The ptp4l daemon runs on mgbe0, which is tied to clock /dev/ptp2. I don’t run anything else using this clock, and sudo lsof /dev/ptp2 confirms that.

On the other hand, linuxptp manages to sync the clock pretty well (~80 ns), so it seems that only the reported freq value is wrong, while the clock actually reacts to the given adjustment commands.
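For clarity, this is roughly what the clockcheck logic in linuxptp does; the snippet below is a simplified userspace model with illustrative names, not the actual clockcheck.c code. It remembers the frequency last written via clock_adjtime() and warns when the next read-back differs:

```c
/* Simplified model of the check behind the
 * "clock frequency changed unexpectedly!" warning. Values are in ppb. */
struct clockcheck {
    int freq_known;       /* have we set a frequency yet? */
    long last_set_freq;   /* last value written with clock_adjtime() */
};

/* Record the frequency the servo just wrote. */
void clockcheck_set_freq(struct clockcheck *cc, long freq)
{
    cc->last_set_freq = freq;
    cc->freq_known = 1;
}

/* Returns 1 when the frequency read back from the kernel does not
 * match what was last set - the condition producing the warning. */
int clockcheck_freq_changed(const struct clockcheck *cc, long readback)
{
    return cc->freq_known && readback != cc->last_set_freq;
}
```

In my logs above, the left number is what was last set and the right one is the read-back, so this check fires on every sample even though the clock behaves as if the adjustments were applied.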

Hi @peckama2

Sorry, that was my mistake. This was not recorded in my list, so I forgot to track it.

I will use rel-35.4.1 to check this issue.

BTW, what is the method to reproduce issue 3?

Thanks, the repro for Issue 3 is the same as for Issue 2.


But if the way to reproduce it is just running this command several times, do you mean you sometimes hit issue 3 and sometimes issue 2?

Ah, I’m sorry, I meant the reproducer for Issue 1: running a PTP slave on the Jetson and a PTP master on some other computer connected to the mgbe0 port.

Could you help clarify how long it takes to reproduce these issues (both issue 1 and issue 2)?

We tried to reproduce them, but so far we haven’t seen either.

Thank you for trying.

Issue 2 should appear after 5 or 6 runs of the command. You don’t even need a cable connected to the port.

Issue 1 is trickier. After the update to 35.4.1, I haven’t seen it happen (yet) even with the default 10 ms timeout. However, if you first just want to verify that you can get the same error, decrease tx_timestamp_timeout to 1 ms and you should get the error within a minute. It also seems to be tied to the link speed: on a 10 Gbps link, even with a 1 ms timeout it takes about a minute to get the error, while on a 1 Gbps link I got the error almost instantly (~5 seconds). Unfortunately, I don’t remember how often I got the error with the 10 ms timeout when I reported the issue; it definitely wasn’t too often. You can see in Nvethernet PTP bug - #6 by peckama2 that with a 20 ms timeout, the probability of an error was ~1 in 5000 seconds.

Issue 3 appears instantly.

OK, maybe I will start with issue 2 first, as you already provided the code analysis (many thanks for that).

Regarding issue 1, I’ve now started it with the 10 ms (default) timeout on a 1 Gbps link, and turned off the errors caused by Issue 3 to get a clearer view of the log. I can see that the error is still there, approx. every 500 seconds.

On a 10 Gbps link I’ve registered two errors so far: one after 4000 seconds, the other after another 2000 seconds.

Since the default interface name is eth0, shall I run your command as

sudo hwstamp_ctl -i eth0 -r 12 -t 1

And with a port-to-port connection of two Ethernet ports?

Hmm, I don’t know what you mean. These bug reports are about the 10 GbE ports. I don’t know how they’re named on devkits, but I see the names mgbe0 and mgbe1 (along with eth0 and eth1 representing the 1 GbE ports).

Hi,

We actually don’t have 2 MGBE ports on the boards, only one. I mean there is actually no support for multiple MGBE ports on Orin AGX; this is mentioned in the design guide document.

If you want us to fix this issue, please reproduce it on an NV devkit and tell us the method to reproduce.