Nvethernet PTP bug

I’ve got my hands on the devkit. Results are:

  • Issue 1: Reproduced with tx_timestamp_timeout set to 1 over a 10 Gbps link. MTBF was around 60 seconds. However, the errors only started appearing after some run time: at first I thought they weren’t occurring at all, but after leaving the setup running for approximately an hour, the log showed errors at about that rate.
    • I’m now testing with the timeout set to 10 and will report the results later.
    • To clean up the log a bit so that the error stands out, pass -l6 instead of -l7 to ptp4l on the Orin; that level still shows everything that’s needed.
    • Also, is PTP sync actually running in your testbed? I.e., are you seeing lines with rms XXXX max YYYY logged every second, with XXXX being a relatively small number (< 10 000)?
    • You can also try changing delay_mechanism to E2E, but you need to do it on both the master and the slave machines. This is the only deviation from the default configs that I use (see the config sketch after this list).
  • Issue 2: Reproduced with sudo hwstamp_ctl -i eth0 -r 12 -t 1 and linuxptp built from the master branch (the flag values are decoded after this list).
  • Issue 3: Doesn’t happen here, and I’m not sure why. I’ll try flashing another module with the CT carrier to rule out a glitch in the existing configuration.
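
For reference, a minimal sketch of how I run the Issue 1 test. The interface name eth0 and the config file path are placeholders I picked for illustration; tx_timestamp_timeout, delay_mechanism, and summary_interval are standard linuxptp options.

# Minimal ptp4l config sketch (path and interface are assumptions).
cat > /tmp/ptp-test.cfg <<'EOF'
[global]
# How long to wait for the hardware TX time stamp, in milliseconds.
tx_timestamp_timeout 10
# End-to-end delay mechanism; must match on master and slave.
delay_mechanism E2E
# Summary statistics interval, as a power of two in seconds (0 = 1 s).
summary_interval 0
EOF

# -m logs to stdout; -l6 keeps the log readable while still showing
# the time-stamp errors and the rms/max statistics.
sudo ptp4l -i eth0 -f /tmp/ptp-test.cfg -m -l6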
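
And the Issue 2 reproduction command with the flag values decoded; the numbers map to the hardware time-stamping constants in the kernel’s linux/net_tstamp.h:

# -t 1  = HWTSTAMP_TX_ON: enable hardware time stamping of outgoing packets.
# -r 12 = HWTSTAMP_FILTER_PTP_V2_EVENT: time-stamp any PTPv2 event
#         packet, whether transported over L2 or UDP.
sudo hwstamp_ctl -i eth0 -r 12 -t 1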

I completely trust that your comments are valid ;) I just haven’t seen any rationale for the removal, and a change this important deserves one, I’d say. So the current status is that the ConnectTech Forge board works with both 10 GbE ports, but some future update might break it. Am I understanding that correctly?

I’m a bit confused because their DT config says:

ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-2,hsio-uphy-config-0,gbe0-enable-10g,gbe1-enable-10g"

This should select Config #1 with only one MGBE interface (mode 22, not 25), yet both mgbe0 and mgbe1 show up and work. Do you have an explanation for this?
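
For completeness, this is how I verify which controllers actually came up; a quick sketch using generic tools, with eth0 standing in for whichever interface names your system enumerates:

# List the network interfaces the kernel created.
ls /sys/class/net

# Look for the nvethernet/MGBE probe messages.
sudo dmesg | grep -i -e mgbe -e nvethernet

# Confirm the negotiated link speed on each port.
sudo ethtool eth0 | grep -i speed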