NVPPS PTP synchronization

Hello,
I am experimenting with fsync-groups with the TSC generator and NVPPS and while I am getting encouraging results, the most important part seems to not be working as implied by the Orin Timesync document. Specifically:

  • I enabled fsync-groups in the dtsi, changing from status="disabled" to status="okay"
  • I have an external PTP master connected to J11 Port 4, and ptp4l is running on mgbe2_0 and shows a valid synchronization
  • I use ioctl to poke nvpps (timer mode, results in nanoseconds) to get both the tsc and ptp timestamps of the last event; both show reasonable values
  • I take the offset from the ptp timestamp to the tsc timestamp, modulo 1 second, convert it to tsc ticks, and run nvsipl_camera with the -F flag (fsync-groups), specifying group 0 and the tsc tick count at which triggering should start (plus something like 2 seconds to let it initialize first)
  • I am probing the output of the camera trigger and comparing it to the PPS output of my PTP grandmaster

Based on the documentation, I have effectively told it ‘start at the TSC equivalent of the PTP second’, so I would expect the rising edge of the trigger to coincide with the PPS output of the grandmaster, plus or minus the lock-in value (by default 20us) of the TSC to PTP synchronization system.

Instead, I see the rising edges start nearly perfectly aligned every time (off by about 2 microseconds), and then the fsync pulse starts moving backwards relative to grandmaster PPS output, sliding continuously at the rate of a few microseconds per second, until it’s well over 20 microseconds.

This tells me that whatever mechanism is aligning TSC and PTP is not working accurately. I can verify that (1) the ptp4l instance is working correctly and I am timesync’d to the external grandmaster, (2) dmesg tells me that it is getting lock state updates, and while it does say that it can’t get concurrent timestamps from mgbe2_0, it appears that it is in fact working correctly (I suspect that message is from when I was starting ptp4l on mgbe2_0), and (3) the fsync groups are working as expected, though I note that I have to rmmod and modprobe cam_fsync every time, as nvsipl_camera doesn’t appear to cleanly disable cam_fsync after it quits. Here’s the dmesg output for nvpps:

[ 1.386912] nvpps c6a0104.nvpps: nvpps_probe
[ 1.386921] nvpps c6a0104.nvpps: PPS GPIO not provided in DT, only Timer mode available
[ 1.386923] nvpps c6a0104.nvpps: using ptp notifier method with interface(mgbe2_0)
[ 1.386925] nvpps c6a0104.nvpps: tsc_res_ns(32)
[ 1.387024] nvpps c6a0104.nvpps: nvpps cdev(509:0)
[ 1.387155] nvpps c6a0104.nvpps: tegra_gte_register_event err = -22
[ 1.387170] nvpps c6a0104.nvpps: TSC config ptx 0x313
[ 10.591468] nvpps nvpps0: tsc_lock_stat:25
[ 12.639477] nvpps nvpps0: tsc_lock_stat:45
[ 3636.575468] nvpps nvpps0: tsc_lock_stat:c5
[ 3637.599469] nvpps nvpps0: tsc_lock_stat:e5
[ 3656.065262] nvpps nvpps0: failed to get PTP_TSC concurrent timestamp from interface(mgbe2_0)
[10444.127463] nvpps nvpps0: tsc_lock_stat:45
[10446.175462] nvpps nvpps0: tsc_lock_stat:65
[10448.223462] nvpps nvpps0: tsc_lock_stat:85
[10450.271478] nvpps nvpps0: tsc_lock_stat:a5
[10452.319471] nvpps nvpps0: tsc_lock_stat:c5
[11225.439395] nvpps nvpps0: tsc_lock_stat:15
[11227.487391] nvpps nvpps0: tsc_lock_stat:35
[11232.607388] nvpps nvpps0: tsc_lock_stat:95
[11243.871387] nvpps nvpps0: tsc_lock_stat:45
[11279.711375] nvpps nvpps0: tsc_lock_stat:85
[11281.759380] nvpps nvpps0: tsc_lock_stat:a5

Can you please help me figure out why PPS to TSC syntonization isn’t working as described in the documentation?

Where did you get the lock-in value?

Could you provide additional details on how you observed the drift? For instance, did you perform signal probing? If so, could you specify where and the exact procedure followed?

Looking in nvpps_main.c, the driver module code for nvpps, if the ptp lock-in value is not specified in the dtsi it defaults to 20us (at least as per comments in the code - comments in the nvpps.txt file seem to indicate it’s even tighter, 1us).

I am probing the multi-function pin assigned frame sync GPIO on the serializer of a camera module attached to the Drive AGX and comparing it to the PPS output of my PTP grandmaster.

Are these details relevant to answering my question?

From nvpps.txt:

NVIDIA nvpps driver bindings

Nvpps is a Linux Kernel mode driver to support the Xavier & Orin time domain
correlation feature.

Required properties:

  • compatibles: should be “nvpps,tegra194-nvpps”

Optional properties:

  • gpios: GPIO number and active level for the PPS input signal
  • memmap_phc_regs: boolean flag to indicate MAC PHC regs to be memory mapped
    for getting PTP time. If not defined ptp notifer method will
    be used with selected interface
  • interface: NW interface name to be used for MAC PHC regs. This field can be
    set to ‘eqos_0’, ‘mgbe0_0’, ‘mgbe1_0’, ‘mgbe2_0’ or ‘mgbe3_0’ for Orin.
    For Xavier, it shoud be set to ‘eqos_0’. If undef, default to ‘eqos_0’
  • sec_interface: NW interface name to be used to calculate PTP Time offset.
    set to ‘eqos_0’, ‘mgbe0_0’, ‘mgbe1_0’, ‘mgbe2_0’ or ‘mgbe3_0’ for Orin.
    For Xavier, Leave this undefined. For Orin, If undef default to ‘eqos_0’
  • ptp_tsc_k_int: Specifies the integer part of the factor used to calculate the delta to
    apply to NUM when the fast convergence algorithm is enabled when syncing
    or locking TSC time with PTP time domain.
    The value is a 8bit hexa-decimal value. If unspecified, NvPPS driver uses
    0x70 as default value
  • ptp_tsc_lock_threshold: specifies the threshold value which is used by HW to determine
    if the TSC PTP sync/Lock is lost. The lock is deemed to be lost if the HW
    determined absolute diff between PTP & TSC time exceed this value.
    The value is a 16bit hexa-decimal value. The minimum value(0x1F) supported
    correspond to 1us and max value(0xFFFF) supported correspond to approx 2.1ms.
    If unspecified, NvPPS driver uses 0x26C(corresponding to 20us) by default
  • ptp_tsc_sync_dis: boolean flag to indicate if nvpps should disable PTP TSC sync logic.
    The default behaviour is to keep PTP TSC sync logic enabled.

(bolding mine)

The relevant dtsi entry (drive-linux/kernel/source/hardware/nvidia/platform/t23x/automotive/kernel-dts/p3710/common/tegra234-p3710-0010.dtsi)

    nvpps {
            compatible = "nvidia,tegra194-nvpps";
            #address-cells = <1>;
            #size-cells = <1>;
            reg = <0x0 0xc6a0104 0x0 0x118>;
            interface = "mgbe2_0";
            sec_interface = "eqos_0";
            status = "disabled";
    };

Based on this, I would expect that it would attempt to syntonize TSC to mgbe2_0’s PHC (/dev/ptp2, for the record) with a K value of 0x70 and a locking threshold of 20us.

Please monitor the TSC to PTP delta with the NVPPS_GETEVENT ioctl snippet provided below to identify any potential drift issue.

static __s64 tsc_2_ptp_delta(__s64 tsc, __s64 ptp)
{
	static __s64	tsc_prev = 0;
	static __s64	ptp_prev = 0;
	__s64		ptp_delta, tsc_delta;

	ptp_delta = ptp - ptp_prev;
	tsc_delta = tsc - tsc_prev;
	/* remember the previous value */
	tsc_prev = tsc;
	ptp_prev = ptp;
	return tsc_delta - ptp_delta;
}

/* get the timestamps */
if (ioctl(fd, NVPPS_GETEVENT, &ts) != 0) {
    fprintf(stderr, "ioctl failed for NVPPS_GETEVENT err %s\n", strerror(errno));
} else {
    fprintf(stdout,
            "evt, %d, tsc, %llu, phc, %llu, sec_phc %llu, delta, %lld, "
            "latency %llu, ptp_offset %lld, tsc_res_ns, %llu, evt_mode, %d, tsc_mode, %d\n",
            ts.evt_nb, ts.tsc, ts.ptp, ts.secondary_ptp,
            tsc_2_ptp_delta((__s64)ts.tsc, (__s64)ts.ptp),
            ts.irq_latency, (__s64)(ts.secondary_ptp - ts.ptp),
            ts.tsc_res_ns, ts.evt_mode, ts.tsc_mode);
}

Thanks for the response.

I don’t recognize the tsc_2_ptp_delta() function that the snippet calls. Is that something trivial that can be pasted in this forum?

static __s64 tsc_2_ptp_delta(__s64 tsc, __s64 ptp)
{
	static __s64	tsc_prev = 0;
	static __s64	ptp_prev = 0;
	__s64		ptp_delta, tsc_delta;

	ptp_delta = ptp - ptp_prev;
	tsc_delta = tsc - tsc_prev;
	/* remember the previous value */
	tsc_prev = tsc;
	ptp_prev = ptp;
	return tsc_delta - ptp_delta;
}

Running that every second for several seconds, I get the attached output. It would appear from the delta that it’s not changing much second to second, which would imply that the syntonization feature is in fact working as expected. It still raises the question of why FSYNC walks relative to PPS…
timesync_data.txt (21.2 KB)

Incoming data dump:

I ran the following set of commands for around a minute:

DATE=$(date +%s%N); OFFSET=$(~/eva_nvpps); START_TIME=$(($(($((DATE-$((DATE%1000000000))))-OFFSET))+45000000000)); sudo ./nvsipl_camera -c HZKJ_IMX728_ES2_V2_070FOV_CPHY_x4 -m "1 0 0 0" -sR -M -v 4 -F '0 '"$((START_TIME/32))"'' >> "$((START_TIME/32))".txt

The eva_nvpps is just a quick and dirty executable that gets the offset from PTP to TSC. Here are the environment variables it came up with:

START_TIME: 88746125260559
OFFSET: 1698779554874739441
DATE: 1698868256565350833

The nvsipl_camera executable captured that start time as 2773316414392

All of which seem pretty reasonable - the goal here was to (1) catch the current date in nanoseconds (which is synchronized via phc2sys from a ptp instance running on mgbe2_0), (2) truncate the time to a whole second, subtract the offset to get rid of the TAI offset, and add a bit (45 seconds here) to ensure we don’t start in the past, (3) convert that start time into TSC ticks, and then (4) launch nvsipl_camera.

Gathering all the information from the output and reformatting it, I got a good list of TSC Start Times and TSC Capture Times. Doing the same conversion backwards (plus a bit of a fudge factor to account for the fact that Linux can’t do things instantaneously), I got the following graph of what I think the TSC timestamp should be (starting at the top of the second and incrementing by 1/30 of a second every frame) versus what the TSC Start of Capture timestamps show it actually is.

As you can see, with the fudge factor it starts at near zero (I observe this physically too, so it’s not so unbelievable), and then slowly grows until it hits nearly 20us, then jumps back down, then ratchets back up past 20us, growing all the while.

I admit my testing methodology is less than ideal for proving out a problem, but this does seem to indicate that camera images are being triggered with a slowly drifting offset from the desired PPS-aligned time.

Could you kindly provide a picture detailing how the PTP GM is connected to the development kit?

Additionally, have you reached out to Quanta regarding this issue? If so, I’d appreciate any details or responses you’ve received. I’m also currently investigating this matter with our team.

Hello,

the PTP GM is connected to J11 Port 4 via an NVIDIA media converter.

I have not spoken to Quanta on this - as fsync is entirely controlled by NVIDIA, there really doesn’t seem to be a point in including them in this.

I would suggest contacting Quanta in parallel. While NVIDIA controls the fsync signal source, involving Quanta could be beneficial. They closely collaborate with our camera team and might offer valuable insights, especially considering you’re probing the signal on the deserializer within a Quanta camera module.

Do you have access to another camera module to conduct similar testing on it? This comparison might provide valuable insights and help ascertain if the issue is specific to the current module or more broadly applicable.

Hello,

I’ll send them an email and see if I can get any support from them.

Some interesting additional information: I hooked the PTP master to an 802.1AS-aware switch and connected two Drive AGXes, each with their own individual cameras, established PTP sync, and then executed the command above.

Probing the signals, the two FSYNC signals on the two cameras across the two Drive AGXes followed each other very closely, bouncing around 0.9-1.2 microseconds, and clearly servoed to keep in line with one another.

Both drifted identically away from the master PPS line, however, moving ‘forward’ (rising edge of the FSYNC signal occurring sooner and sooner relative to the PPS line) at that same rate of about 1us per second.

Starting one Drive AGX some time after starting the first would result in the FSYNC signal of that Drive AGX starting aligned well with the PPS line and starting its drift anew, and therefore would see its rising edge happen well after the already-running Drive AGX, exactly what one would expect if both systems are synchronized but running juuuuust a little off from 30 Hz. Starting the capture process always resets the FSYNC to the top of the PPS signal, and the longer it runs, the further it drifts, at a very predictable rate.

The fact that two running together drift together in lockstep makes me think the PPS synchronization is accurate, but the frequency generator is off by a smidge.

I hate to ask the question, but could it be because 30 Hz is not an even multiple of a TSC with 32 nanosecond resolution? 1e9 nanoseconds per second / 32 nanoseconds per TSC tick gives 31250000 TSC ticks per second; divide that by 30 and you get 1041666.667 TSC ticks per frame. Assuming it rounds that down, you get 1041666 TSC ticks per frame; multiply that back up by 30 frames/sec and 32 nanoseconds per tick and you get 999999360 nanoseconds, 640 nanoseconds short of a full second every second, or a drift of about 0.64 nanoseconds (EDIT: I meant microseconds) per second, on the order of what I observe. Does this follow? I don’t have access to the register map of the cdi-tsc, so I don’t know if that’s how it does its thing.

I figured this is something I can actually test: in that aforementioned dtsi I changed the frequency generators to 10 Hz (all of them; reading the driver, it does some kind of LCM algorithm, so none of them could be divisible by 3), reflashed, and ran my code - sure enough, it locks dead on and stays locked.

So, seems like the frequency generator isn’t capable of generating exactly 30 Hz, and doesn’t have a mechanism to correct for frequency drift to keep cameras aligned relative to the top of the PPS second - I’d be interested in learning what NVIDIA plans to do about that, but in the meantime, I think I have my answer.

The use of the DIV_ROUND_CLOSEST() macro by the CDI TSC driver for calculations does indeed correlate with your analysis regarding the possible discrepancy. However, I’m curious about the observed drift rate of a few microseconds per second, which seems higher than the estimated 0.64 nanoseconds. Is there a specific reason for this discrepancy in the observed drift rate compared to the calculated value?

That’s reasonable, but I think I misspoke - observing the drift more closely, it does align with 0.64 microseconds per second. I estimated 1 microsecond per second elsewhere, I think that was translated to ‘a few’ to establish that it wasn’t precise, but in reality it was lower than what that initial phrase implied.

Oh, I see I badly misrepresented that response - when I do the math, I get a discrepancy of 640 nanoseconds, or 0.64 microseconds, not 0.64 nanoseconds as it says above - that’s three orders of magnitude difference, so I can see why you were confused. What I wanted to say was that if I assume the TSC is the time base for the frequency generator, and the TSC has a resolution of 32 nanoseconds, then requesting a frequency of 30 Hz will result in a period of 1041666 TSC ticks, which, when multiplied by 30 fps and then 32 nanoseconds per tick, gives 999999360 nanoseconds, which is 640 nanoseconds shy of a full second (and thus fast by 0.64 microseconds per second). Clearer?

Thank you for the detailed clarification and rectification. Your explanation regarding the discrepancy of 640 nanoseconds in relation to the TSC’s resolution and the requested frequency of 30 Hz is much clearer now.