When I put timing events around a cudaMemcpy I get a nearly constant time for the copy. It doesn't matter whether it's 1,000 float3 or 500,000.
It's only 0.3 ms slower to copy the large pile than the small pile (5.3 ms vs 5 ms).
Are my timing events misplaced (one before and one after the copy), or is something else going on?
It is to be expected that the time for small transfers is dominated by fixed overhead and therefore appears constant. For large transfers it should increase linearly with size. 500,000 float3 structs (12 bytes each) is still only about 6 MB.
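To see where the linear regime starts, it may help to time the same copy over a wider range of sizes. The following is only a rough sketch, not the original poster's code: the helper names `bytes_for` and `time_copy_ms`, the `CHECK` macro, the list of sizes, and the use of pageable host memory from `malloc` are all assumptions made for illustration. It does one untimed warm-up copy first so that one-time setup costs are not counted in the measurement.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: bytes occupied by n float3 elements (12 bytes each).
static size_t bytes_for(size_t n) { return n * sizeof(float3); }

// Hypothetical helper: time one host-to-device copy of n float3 elements.
static float time_copy_ms(size_t n) {
    float3 *h = (float3 *)malloc(bytes_for(n));  // pageable host memory (assumption)
    float3 *d = NULL;
    if (h == NULL) { fprintf(stderr, "malloc failed\n"); exit(EXIT_FAILURE); }
    memset(h, 0, bytes_for(n));
    CHECK(cudaMalloc((void **)&d, bytes_for(n)));

    // Untimed warm-up copy, so one-time setup is not part of the measurement.
    CHECK(cudaMemcpy(d, h, bytes_for(n), cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // One event directly before the copy, one directly after.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpy(d, h, bytes_for(n), cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // wait until the stop event has completed

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d));
    free(h);
    return ms;
}

int main() {
    // Sizes chosen for illustration: from about 12 KB up to about 120 MB.
    const size_t counts[] = {1000, 50000, 500000, 5000000, 10000000};
    for (size_t i = 0; i < sizeof(counts) / sizeof(counts[0]); ++i) {
        float ms = time_copy_ms(counts[i]);
        printf("%10zu float3 (%12zu bytes): %.3f ms\n",
               counts[i], bytes_for(counts[i]), ms);
    }
    return 0;
}
```

If the times at the large end of the table grow roughly in proportion to the byte count while the small end stays flat, the events are placed correctly and the flat part is the fixed overhead.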