I have recently started using CUDA, and something strange has been confusing me for a long time. Can you help me?
The problem is this: when I time cudaMemcpy, copying about 1.5 MB from host to device costs very little, and my timer reports 0 ms. But copying about 0.6 MB from device to host takes as much as 15 ms!
I don't know why this happens. Is something wrong in my program, or is it simply slow for the GPU to transfer data back to the host?
int ffFitSize = ffFitWid * ffFitHei * sizeof(uchar3);
cudaMemcpy(outputImage, ffFit, ffFitSize, cudaMemcpyDeviceToHost);
It is just these two statements. I also found that if ffFit is float4 the copy takes 16 ms, but if it is uchar4 it takes much less, around 2 ms. Why does this happen?
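For reference, here is a minimal sketch of how the device-to-host copy could be timed on its own, in case the way I measure is part of the problem. Kernel launches are asynchronous, and a blocking cudaMemcpy waits for earlier GPU work to finish, so a host timer placed around the copy may also include that earlier work. The helper name timeDeviceToHostCopyMs and the 512 x 384 image size are assumptions made up for this sketch, not taken from my program, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original program): returns the time in
// milliseconds of one device-to-host copy of `bytes` bytes, measured with
// CUDA events.
float timeDeviceToHostCopyMs(size_t bytes)
{
    void *dSrc = nullptr;
    void *hDst = malloc(bytes);          // pageable host buffer
    cudaMalloc(&dSrc, bytes);
    cudaMemset(dSrc, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Wait for all earlier GPU work, so it is not counted in the copy time.
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    cudaMemcpy(hDst, dSrc, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);          // make sure `stop` has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dSrc);
    free(hDst);
    return ms;
}

int main()
{
    // Assumed image dimensions, for illustration only.
    const size_t width = 512, height = 384;

    // Same size formula as in the post, for two element types.
    const size_t bytesUchar4 = width * height * sizeof(uchar4);
    const size_t bytesFloat4 = width * height * sizeof(float4);

    printf("uchar4: %zu bytes, %.3f ms\n",
           bytesUchar4, timeDeviceToHostCopyMs(bytesUchar4));
    printf("float4: %zu bytes, %.3f ms\n",
           bytesFloat4, timeDeviceToHostCopyMs(bytesFloat4));
    return 0;
}
```

Since sizeof(float4) is 16 bytes and sizeof(uchar4) is 4 bytes, the float4 buffer is four times larger for the same image, so some difference between the two cases would be expected from the size alone.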