I assume you are referring to transfers between the host system and the GPU, which use the PCIe interconnect, unless you are one of the lucky few who get to use a PowerPC system with NVLink.
PCIe is a packetized transport, which means there is a fixed per-transfer overhead on top of the payload. Small transfers are dominated by that overhead, so effective throughput increases with growing transfer size. Maximum transfer rates (~12 GB/sec per direction for a PCIe gen3 x16 link, using pinned host memory) are typically reached for transfer sizes in the 8 MB to 16 MB range.
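If a rough model helps (a back-of-the-envelope sketch of my own, not anything from the PCIe specification), approximate the time for a transfer of s bytes as

T(s) = T_0 + s / B_max

where T_0 is the fixed per-transfer overhead and B_max is the peak link rate. The effective throughput is then

B_eff(s) = s / T(s) = B_max / (1 + T_0 * B_max / s)

which climbs toward B_max as s grows, and explains why the measured curve flattens out once transfers reach several megabytes.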
Note that while fewer, larger transfers are therefore more efficient overall than many small transfers, batching data into larger transfers can increase the latency seen by any individual piece of data, which may matter in the context of your application.
You don’t need a paper to assess the transfer rates at different transfer sizes; you can simply measure them yourself. CUDA ships with a sample application called bandwidthTest that you could modify to suit your needs. A minimal sketch of such a measurement is shown below.
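In case it is useful as a starting point, here is my own simplified stand-in for bandwidthTest (error checking omitted for brevity). It times host-to-device copies from pinned memory at power-of-two sizes using CUDA events:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t maxBytes = 64 << 20;  // test sizes up to 64 MB
    void *hbuf, *dbuf;
    cudaMallocHost(&hbuf, maxBytes);   // pinned host memory
    cudaMalloc(&dbuf, maxBytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // warm-up copy so one-time initialization cost is not measured
    cudaMemcpy(dbuf, hbuf, maxBytes, cudaMemcpyHostToDevice);

    for (size_t bytes = 1 << 10; bytes <= maxBytes; bytes <<= 1) {
        const int reps = 20;           // average over several copies
        cudaEventRecord(start);
        for (int i = 0; i < reps; i++) {
            cudaMemcpy(dbuf, hbuf, bytes, cudaMemcpyHostToDevice);
        }
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbPerSec = (double)bytes * reps / (ms * 1e-3) / 1e9;
        printf("%10zu bytes : %7.2f GB/sec\n", bytes, gbPerSec);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dbuf);
    cudaFreeHost(hbuf);
    return 0;
}
```

Swap the direction of the cudaMemcpy calls (and the buffer arguments) to measure device-to-host rates, which can differ slightly from host-to-device rates on some platforms.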