PCIe 4.0 vs 3.0 for the RTX 3090: host<->device memory transfer bandwidth

I tried to search for this and couldn’t really find an answer.

Is there a decrease in memory transfer bandwidth moving from PCI-E 4.0 to PCI-E 3.0 in terms of copying from host -> device or device -> host using standard methods (e.g. cudaMemcpy calls)?

If so, what is the quantifiable difference?

I haven’t tested it, but my general expectation would be approximately a 2x difference.

I don’t have access to the latest hardware, but published PCIe gen4 x16 throughput measurements I have seen fall in the 23.5 GB/s to 26 GB/s range — about 2x the roughly 12 GB/s commonly observed on PCIe gen3 x16. That matches expectations from the respective PCIe specifications (16 gigatransfers per second for gen4 vs 8 gigatransfers per second for gen3, per lane).
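The 2x ratio falls straight out of the link parameters. A minimal back-of-the-envelope calculation (assuming an x16 link and the 128b/130b line encoding both gen3 and gen4 use; the measured 12 and 23.5–26 GB/s figures sit below these theoretical peaks because of packet/protocol overhead):

```python
# Theoretical peak bandwidth of a PCIe x16 link.
# Gen3 signals at 8 GT/s per lane, gen4 at 16 GT/s; both use
# 128b/130b encoding, so 128 payload bits ride on every 130 line bits.

LANES = 16
ENCODING = 128 / 130  # 128b/130b line-code efficiency

def x16_peak_gbps(gigatransfers_per_sec):
    """Theoretical peak in GB/s for an x16 link (1 bit per transfer per lane)."""
    return gigatransfers_per_sec * LANES * ENCODING / 8  # bits -> bytes

gen3 = x16_peak_gbps(8.0)   # ~15.75 GB/s theoretical peak
gen4 = x16_peak_gbps(16.0)  # ~31.51 GB/s theoretical peak

print(f"gen3 x16 peak: {gen3:.2f} GB/s")
print(f"gen4 x16 peak: {gen4:.2f} GB/s")
print(f"ratio: {gen4 / gen3:.1f}x")
```

Since the encoding efficiency is identical on both generations, the theoretical ratio is exactly 2x, consistent with the measured ~12 vs ~23.5–26 GB/s numbers above.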

Those sources that ran application benchmarks generally showed only small (< 10%) performance differences between PCIe gen3 and PCIe gen4. That does not rule out use cases that benefit heavily from the faster link, but it is not surprising either: various existing compute applications I have looked at use only 20% to 40% of the full PCIe gen3 link bandwidth, so they are not limited by the link in the first place.
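A quick Amdahl-style sketch of why the application-level deltas stay small: even for an application that *is* link-limited during its transfers, doubling link bandwidth only halves the transfer portion of the runtime, so the overall gain depends on what fraction of runtime the transfers occupy (and applications that use only a fraction of the gen3 link bandwidth may see no change at all):

```python
# Overall speedup from a 2x faster link, assuming transfers are
# link-limited and occupy a fraction f of total runtime:
# new_time = (1 - f) + f/2, so speedup = 1 / (1 - f + f/2).

def speedup_from_2x_link(f):
    """Overall speedup when the transfer fraction f of runtime is halved."""
    return 1.0 / (1.0 - f + f / 2.0)

for f in (0.05, 0.10, 0.20, 0.40):
    print(f"transfer fraction {f:.0%}: overall speedup {speedup_from_2x_link(f):.3f}x")
```

For a transfer fraction of 10% the overall gain is about 5%; even at 20% it is only about 11%. So the sub-10% benchmark deltas are consistent with transfers being a modest share of runtime in those workloads.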

Thanks to both for the responses. I had also seen the application/gaming benchmarks, but they didn’t mention transfer bandwidth specifically as it pertained to them, which is why I asked.

For at least one existing real-time streaming/processing application that is time-constrained in its processing and looking to take advantage of Ampere (vs Volta), shaving half off the data-transfer time at the start of each processing cycle is a worthwhile gain. Granted, this could probably be alleviated otherwise with GPUDirect RDMA, but that is quite a bit more lift.
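To put rough numbers on that kind of per-cycle saving (the block size here is purely hypothetical; the link rates are the observed gen3/gen4 cudaMemcpy figures from above):

```python
# Hypothetical real-time budget illustration. BLOCK_BYTES is an
# invented per-cycle input size; the GB/s rates approximate observed
# host->device copy throughput on gen3 and gen4 x16 links.

BLOCK_BYTES = 512 * 1024**2  # hypothetical 512 MiB input per cycle
GEN3_GBPS = 12.0             # ~observed gen3 x16 rate
GEN4_GBPS = 24.0             # ~observed gen4 x16 rate

def copy_ms(n_bytes, gbps):
    """Milliseconds to copy n_bytes at gbps GB/s (GB = 1e9 bytes)."""
    return n_bytes / (gbps * 1e9) * 1e3

t3 = copy_ms(BLOCK_BYTES, GEN3_GBPS)
t4 = copy_ms(BLOCK_BYTES, GEN4_GBPS)
print(f"gen3: {t3:.1f} ms, gen4: {t4:.1f} ms, saved per cycle: {t3 - t4:.1f} ms")
```

For a transfer that sits on the critical path of a fixed real-time budget, reclaiming tens of milliseconds per cycle this way can matter, which is the scenario described above.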
