I am running some initial benchmarks on the CARMA kit and found that the memory transfer latency is higher than I expected. The numbers look like this:
Elements Transferred : 1 ( float: 4 bytes)
H2D: 114.940798 us
H2D Pinned: 125.577605 us
D2H: 170.166397 us
D2H Pinned: 125.494397 us
Elements Transferred : 4096 ( float: 4 bytes)
H2D: 186.67 us
H2D Pinned: 181.747 us
D2H: 260.294 us
D2H Pinned: 158.3555 us
Here H2D is a host-to-device transfer and D2H is a device-to-host transfer. All timings are in microseconds (us), and every element is a 4-byte float.
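To put these numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the timings listed above. The function name and the idea of treating the 1-element time as the fixed per-call overhead are my own simplifying assumptions, not something I measured separately.

```python
# Timings copied from the measurements above, in microseconds.
# Outer keys are the number of 4-byte floats transferred.
TIMINGS_US = {
    1: {
        "H2D": 114.940798,
        "H2D Pinned": 125.577605,
        "D2H": 170.166397,
        "D2H Pinned": 125.494397,
    },
    4096: {
        "H2D": 186.67,
        "H2D Pinned": 181.747,
        "D2H": 260.294,
        "D2H Pinned": 158.3555,
    },
}
BYTES_PER_ELEMENT = 4


def effective_mb_per_s(num_bytes, time_us):
    """Effective throughput in MB/s (1 byte/us equals 1 MB/s, MB = 10^6 bytes)."""
    return num_bytes / time_us


for name, big_time in TIMINGS_US[4096].items():
    small_time = TIMINGS_US[1][name]
    throughput = effective_mb_per_s(4096 * BYTES_PER_ELEMENT, big_time)
    # Assumption: the 1-element time approximates the fixed per-call overhead.
    overhead_share = small_time / big_time
    print(f"{name}: {throughput:.1f} MB/s, ~{overhead_share:.0%} fixed overhead")
```

If that assumption roughly holds, most of the time for a 16 KB copy is fixed per-call cost rather than actual data movement.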
Can anybody confirm these numbers and explain why the latency is so high?