I’m working with a Xavier-to-Xavier link via a PCIe switch (NTB). I managed to establish a link between the two devices and used both ntb_perf and ntb-tool to run some basic tests. I’m now trying to measure end-to-end latency, and for that I modified the ntb-tool test utility. The idea is to synchronize both Xaviers to nanosecond precision, with up to 100 ns of allowed deviation. Currently PTP (ptp4l) is used, and its output while synchronization is active suggests the required precision is achieved, with approx. ±10 ns deviation most of the time. The following image serves as proof.
sudo ptp4l -i eth0 -m -s is used on the Slave device (Xavier B) and sudo ptp4l -i eth0 -m on the Master device (Xavier A). The PTP sync exchange runs continuously while the NTB test is running. However, after executing a variety of tests, I suspect that either the actual clock sync is not as good as ptp4l suggests, or I misunderstand how I’m supposed to use this time in the kernel.
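One way to check the first suspicion from userspace is to compare the NIC's PTP hardware clock (PHC) with CLOCK_REALTIME on each Xavier. With hardware timestamping, ptp4l disciplines the PHC, and the system clock only follows it if something like phc2sys is also running. The following is a minimal sketch under assumptions, not a definitive tool: /dev/ptp0 is assumed to be the PHC behind eth0 (ethtool -T eth0 reports the index), and ts_delta_ns() and phc_vs_system_offset() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Dynamic POSIX clock id for an open PHC file descriptor,
 * following the convention used by the kernel's testptp.c. */
#define CLOCKFD 3
#define FD_TO_CLOCKID(fd) ((clockid_t)((((unsigned int)~(fd)) << 3) | CLOCKFD))

/* Signed difference a - b in nanoseconds. */
static int64_t ts_delta_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * 1000000000LL
         + (int64_t)(a->tv_nsec - b->tv_nsec);
}

/* Stores (PHC - CLOCK_REALTIME) in *offset_ns.
 * Returns 0 on success, -1 if the PHC cannot be opened or read.
 * phc_path is an assumption, e.g. "/dev/ptp0". */
static int phc_vs_system_offset(const char *phc_path, int64_t *offset_ns)
{
    struct timespec sys_before, phc, sys_after;
    int rc = -1;
    int fd;

    assert(offset_ns != NULL);

    fd = open(phc_path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Bracket the PHC read with two system clock reads and compare
     * the PHC against their midpoint to cancel most of the read delay. */
    if (clock_gettime(CLOCK_REALTIME, &sys_before) == 0 &&
        clock_gettime(FD_TO_CLOCKID(fd), &phc) == 0 &&
        clock_gettime(CLOCK_REALTIME, &sys_after) == 0) {
        int64_t half = ts_delta_ns(&sys_after, &sys_before) / 2;

        *offset_ns = ts_delta_ns(&phc, &sys_before) - half;
        rc = 0;
    }

    close(fd);
    return rc;
}
```

If the reported offset drifts over time on either device, the system clock is probably not following the PHC, and ktime_get_real_ts64() would then not reflect the ±10 ns that ptp4l prints. A steady offset of a whole number of seconds may only reflect the TAI/UTC difference between the PTP timescale and the system clock.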
I’m using ktime_get_real_ts64() to get a timestamp on the Xavier A side and send it with memcpy_toio(), which performs a PCIe transfer to the inbound window of Xavier B (that is, the outbound window of Xavier A). Xavier B polls for data in its inbound window, reads the initial timestamp using memcpy_fromio(), gets the current timestamp using ktime_get_real_ts64(), and reports the end-to-end latency as the difference between the two timestamps.
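The measurement described above can be sketched in userspace C as follows. This is only an illustration of the arithmetic, not the actual test code: clock_gettime(CLOCK_REALTIME) stands in for ktime_get_real_ts64(), and struct ts_msg, ts_to_ns(), stamp_msg() and one_way_latency_ns() are hypothetical names. The timestamp is packed into a single 64-bit nanosecond value (in the kernel, ktime_get_real_ns() returns the same kind of value), so that the receiver reads one field instead of separate seconds and nanoseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical payload written into the peer's inbound window. */
struct ts_msg {
    int64_t tx_ns; /* sender wall-clock time, ns since the epoch */
};

/* Convert a timespec to a single signed nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Sender side (Xavier A): stamp the message right before the write. */
static void stamp_msg(struct ts_msg *msg)
{
    struct timespec now;

    clock_gettime(CLOCK_REALTIME, &now);
    msg->tx_ns = ts_to_ns(&now);
}

/* Receiver side (Xavier B): signed one-way latency in nanoseconds.
 * The result is kept signed on purpose: a negative value cannot be
 * real latency and points at an offset between the two clocks. */
static int64_t one_way_latency_ns(const struct ts_msg *msg,
                                  const struct timespec *rx)
{
    return ts_to_ns(rx) - msg->tx_ns;
}
```

Keeping the result signed makes a clock offset visible: the computed value is the true latency plus the offset between the two system clocks, so results that swing between positive and negative suggest the offset dominates the measurement.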
Now, is the ktime_xxx API tied to the clock source that is actually being synced between the two Xaviers? Or should I read a hardware timer or some other source instead? I’m aware there could be a bug in my code, and I’m still investigating that. I’m getting ± 10 ms latency, which is unrealistically high; it should be in the range of microseconds (us) at most, assuming the kernel executes the PCIe transfer immediately on the Xavier A side. This is of course a big assumption; however, there’s no way the kernel would wait milliseconds (ms) to carry out a single PCIe write while idling.
Another question: is PTP the best option for nanosecond peer-to-peer synchronization? Are there other, more appropriate ways of resolving this matter (assuming PTP is the problem in my case)?
P.S.: The minimum round-trip time (RTT) using doorbell (DB) registers (without using PTP, just measuring from Xavier A) is approx. 3-4 us. I’m assuming that correct test code and proper PTP sync would give me similar results. For further clarification I can post the actual test code.