What size PCIe TLPs are used for P2P-enabled UVA CUDA memory copies?

I am wondering what the Max Payload Size of the MWr and MRd Transaction Layer Packets is when running cudaMemcpy() between two GTX 1080 GPUs with peer-to-peer access enabled through Unified Virtual Addressing.
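For reference, this is a minimal sketch of the kind of transfer I mean. Device IDs 0 and 1 and the buffer size are just assumptions for illustration; error checking is omitted for brevity:

```cuda
// Hypothetical sketch: a peer-to-peer cudaMemcpy between two GPUs under UVA.
// Assumes devices 0 and 1 are P2P-capable GTX 1080s on the same PCIe root.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("P2P not supported between devices 0 and 1\n");
        return 1;
    }

    // Enable peer access in both directions.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    const size_t bytes = 64 * 1024 * 1024;  // 64 MiB, arbitrary
    void *src = nullptr, *dst = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Under UVA, cudaMemcpyDefault lets the runtime resolve each pointer
    // to its device and issue the copy directly over PCIe (the TLPs in
    // question), rather than staging through host memory.
    cudaMemcpy(dst, src, bytes, cudaMemcpyDefault);
    cudaDeviceSynchronize();

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```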

This is the type of low-level implementation-specific detail that is unlikely to be publicly documented anywhere. I have never seen any NVIDIA-provided documentation that spoke to such details.

If it is important to know this information for your use case, I would suggest hooking up a logic analyzer to examine PCIe traffic. That way you will know for sure. I would be curious to know for what purpose one would need this information, if you are allowed to share that.

If I remember comments correctly (that’s a big if!) that were made in these forums by someone looking into communication between an FPGA and a GPU (I don’t recall which type!), the packet size they observed being used by the GPU was 128 bytes. Again: I may misremember the number and I did not look into this myself.

Yes, I think it's 128 bytes max payload.

https://devtalk.nvidia.com/default/topic/1026924/cuda-programming-and-performance/-about-how-to-set-gpu-max-payload-size-please-help-me/

It’s not user configurable or controllable.
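Even though it isn't configurable, the Max Payload Size negotiated for the link can at least be read out of PCIe config space with lspci; the payload of any MWr TLP the GPU emits cannot exceed that value (though it may be smaller). The bus ID below is an assumption; substitute your GPU's:

```shell
# Find the GPU's PCI bus ID (assumes nvidia-smi is installed).
nvidia-smi --query-gpu=index,pci.bus_id --format=csv

# Dump the PCIe capability block for that device (example bus ID 01:00.0).
# "DevCap: MaxPayload" is what the device supports; "DevCtl: MaxPayload"
# is what the platform actually configured.
sudo lspci -vv -s 01:00.0 | grep -E "MaxPayload|MaxReadReq"
```

This tells you the upper bound the platform imposes, which is often 128 or 256 bytes on consumer chipsets; it does not prove what payload the GPU's DMA engine actually uses per packet, which is what a protocol analyzer would show.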