Host/Device Array Transfer

Quick question on this thread:

What is the runtime of a data transfer from host to device, or device to host, for an array? Is it O(n), or is there some hardware feature that makes it faster? I'm trying to determine when the transfer cost will outweigh the benefit of parallelizing a small loop. Thanks!

It’s O(n), where n is the number of bytes to transfer, plus a fixed per-transfer overhead. For small arrays that constant overhead can be the larger part of the cost.
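To put rough numbers on that, here is a minimal sketch of the linear-plus-constant model in Python. The function name is hypothetical, and the default latency (10 microseconds) and bandwidth (12 GB/s) are placeholder assumptions, not measurements; you would want to time real copies on your own system.

```python
def transfer_time_s(num_bytes, latency_s=10e-6, bandwidth_bytes_per_s=12e9):
    """Estimate one host<->device copy: fixed overhead plus a term linear in size.

    latency_s and bandwidth_bytes_per_s are assumed placeholder values.
    """
    return latency_s + num_bytes / bandwidth_bytes_per_s


# Compare a small, a medium, and a large transfer under the assumed parameters.
for n in (4 * 1024, 4 * 1024**2, 4 * 1024**3):
    print(f"{n:>12} bytes: {transfer_time_s(n) * 1e6:14.1f} us")
```

Under these assumptions the 4 KiB copy is almost entirely fixed overhead, while the 4 GiB copy is almost entirely the O(n) term.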

If you were hoping for O(log(n)), I have to disappoint you. :)

Oh, but if only some sneaky hardware designer could invent a way to transfer n bytes in less than O(n) scaling! Memory walls would be a thing of the past!

You might not be able to do better than O(n), but with some parallel data transfer it might be possible to reduce the time by a constant factor of 4, 8, etc. That detail is lost in the big-O model.

It all goes over PCI Express, so the real limit is the practical bandwidth of that bus.
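For the original question (when does the transfer outweigh the benefit?), a rough break-even estimate might look like the sketch below. It assumes the array is copied to the device and back once, that CPU and GPU work both scale linearly per element, and that the latency and bandwidth defaults are placeholders rather than measured values. The function name and all parameters are hypothetical.

```python
def break_even_elements(cpu_s_per_elem, gpu_s_per_elem, bytes_per_elem,
                        latency_s=10e-6, bandwidth_bytes_per_s=12e9):
    """Return the element count above which offloading is faster, or None.

    Model (an assumption, not a measurement):
      CPU time = cpu_s_per_elem * n
      GPU time = 2 * latency_s                                # copy in + copy out
               + 2 * n * bytes_per_elem / bandwidth_bytes_per_s
               + gpu_s_per_elem * n
    """
    # Time saved per element once the bus cost for that element is paid.
    saving = (cpu_s_per_elem - gpu_s_per_elem
              - 2 * bytes_per_elem / bandwidth_bytes_per_s)
    if saving <= 0:
        return None  # the copies cost more than the speedup at every size
    return 2 * latency_s / saving


# Hypothetical loop: 1 ns per float on the CPU, 0.1 ns per float on the GPU.
print(break_even_elements(1e-9, 1e-10, bytes_per_elem=4))
```

If the function returns None, the per-element bus cost alone exceeds the compute saving, so under this model the loop is better left on the host regardless of array size.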