We have a cluster of Xavier NX modules connected through a PCIe switch in an NTB configuration. Using clustering drivers, we see a TLP payload size of only 16 bytes in traffic between two NX nodes, as measured with a Chiplink PCIe analyzer. I am attaching a screenshot of my test, which shows a TLP payload of 4 data words = 16 bytes. I am concerned because this limits the performance of PCIe communication between Gen3 devices. Can we increase the TLP size in the PCIe driver or the device tree for Xavier NX, or is there another way to improve performance?
According to the Technical Reference Manual (TRM), a maximum payload size of 256 bytes is achievable. I am not sure how to reach it, or how else to improve performance.
Please guide me as soon as possible. Your help will be much appreciated.
This does not seem to be a solution.
We have more or less exactly the same problem, and setpci only sets the maximum allowable payload size. The issue seems to be that we can't get the CPU to actually generate the bigger packets.
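To make the "maximum allowable size" point concrete, here is a minimal sketch that decodes the payload-related fields of a PCIe Device Control register value, such as one read back with setpci. The function name and the sample value 0x2030 are made up for illustration. The field positions (bits 7:5 for Max_Payload_Size, bits 14:12 for Max_Read_Request_Size, each encoded as 128 << code) follow the PCIe capability layout. These fields only cap how large a TLP may be; they do not make the sender emit TLPs of that size.

```python
def decode_devctl(devctl: int) -> dict:
    """Decode the size limits in a PCIe Device Control register value.

    Hypothetical helper for illustration; it reports ceilings only,
    not the TLP sizes actually seen on the link.
    """
    mps_code = (devctl >> 5) & 0x7    # bits 7:5: Max_Payload_Size
    mrrs_code = (devctl >> 12) & 0x7  # bits 14:12: Max_Read_Request_Size
    for name, code in (("MPS", mps_code), ("MRRS", mrrs_code)):
        if code > 5:  # encodings 110b and 111b are reserved
            raise ValueError(f"reserved {name} encoding: {code}")
    return {
        "max_payload_bytes": 128 << mps_code,
        "max_read_request_bytes": 128 << mrrs_code,
    }


# Sample (made-up) register value: MPS code 1, MRRS code 2.
print(decode_devctl(0x2030))
```

A link configured this way permits 256-byte payloads, yet 16-byte TLPs like the ones in the analyzer capture are still perfectly legal on it.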
That the TLP payload is 16 bytes is shown in the original post above. We know the TLPs are small because the transfer speed over PCIe is bottlenecked. Generating larger TLPs would radically increase transfer speed: we are seeing a drastic reduction compared to x86, where write combining lets us send larger TLPs, usually 64 bytes.
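A rough back-of-the-envelope sketch of why payload size matters so much: every TLP carries a fixed overhead (framing, sequence number, header, LCRC) regardless of payload. The 24-byte overhead used here is an assumption for illustration, since the exact figure depends on the link generation and header format, and the function name is hypothetical. The sketch also ignores DLLP traffic and flow-control effects.

```python
ASSUMED_OVERHEAD_BYTES = 24  # assumption: per-TLP framing + header + LCRC


def tlp_efficiency(payload_bytes: int,
                   overhead_bytes: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Fraction of bytes on the link that are payload, for one TLP."""
    return payload_bytes / (payload_bytes + overhead_bytes)


# Compare the 16-byte TLPs from the capture, the 64-byte TLPs seen with
# write combining on x86, and the 256-byte maximum from the TRM.
for size in (16, 64, 256):
    print(f"{size:>3} B payload: {tlp_efficiency(size):.0%} of link bytes are data")
```

Under that assumption, 16-byte payloads leave well under half of the link bandwidth for data, while 64-byte payloads recover most of it.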