Hi all
Could anyone let me know what I'm doing wrong here? I seem to be getting really slow transfer speeds from the device to the host.
The program basically writes some data into two arrays on the device, and I then need to get that data back to the host.
So I'm currently using the following code to do this:
CUDA_SAFE_CALL(cudaMemcpy(Liquid,Liquidd,memsize,cudaMemcpyDeviceToHost));
CUDA_SAFE_CALL(cudaMemcpy(Crystal,Crystald,memsize,cudaMemcpyDeviceToHost));
Each array I'm copying is 2048 x 2048 x sizeof(int) bytes, and I'm transferring two of them.
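For reference, here is a minimal sketch of the size and throughput arithmetic, assuming a 4-byte `int` (typical, but platform dependent). `bytes_per_array` and `mib_per_second` are hypothetical helper names, not part of the CUDA API.

```cuda
#include <stddef.h>

/* Bytes in one width x height array of int (assumes sizeof(int) == 4 for the numbers below). */
size_t bytes_per_array(size_t width, size_t height)
{
    return width * height * sizeof(int);
}

/* Effective throughput in MiB/s for `bytes` moved in `seconds`. */
double mib_per_second(size_t bytes, double seconds)
{
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}
```

With these, `bytes_per_array(2048, 2048)` is 16,777,216 bytes (16 MiB), so the two arrays total 32 MiB, and `mib_per_second(2 * 16777216, 4.75)` comes out at roughly 6.7 MiB/s.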
Now, if I just run the program, I get a 4.75-second pause while it executes those two lines of code, but if I pause before them (with a scanf call, for example), the delay is reduced.
Any ideas how I can do this more quickly? As far as I can see I'm transferring 32 MB of data (two 16 MB arrays), and it's taking nearly 5 seconds.
I'm pretty sure the device-to-host bandwidth isn't around 7 MB/s, so I guess I'm doing something wrong.
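One likely explanation, offered as a hedged sketch rather than a diagnosis: kernel launches are asynchronous, and `cudaMemcpy` waits for earlier work on the default stream to finish, so the copy calls may be absorbing the kernel's run time. That would also fit the observation that pausing before the copies shortens the delay. The sketch below times the kernel and the two copies separately with CUDA events. `fill_kernel` and `time_kernel_and_copies` are hypothetical names, and the kernel is only a placeholder for the real one.

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical placeholder for the real kernel that fills both arrays. */
__global__ void fill_kernel(int *liquid, int *crystal, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        liquid[i] = i;
        crystal[i] = -i;
    }
}

/* Store the result of a CUDA call in err and jump to cleanup on failure. */
#define CHECK(call) do { err = (call); if (err != cudaSuccess) goto cleanup; } while (0)

/* Times the kernel and the two device-to-host copies separately, in milliseconds. */
cudaError_t time_kernel_and_copies(int width, int height,
                                   float *kernel_ms, float *copy_ms)
{
    cudaError_t err = cudaSuccess;
    int count = width * height;
    size_t memsize = (size_t)count * sizeof(int);
    int *liquid = NULL, *crystal = NULL;      /* host buffers */
    int *liquid_d = NULL, *crystal_d = NULL;  /* device buffers */
    cudaEvent_t start = NULL, kernel_done = NULL, copy_done = NULL;
    int threads = 256;
    int blocks = (count + threads - 1) / threads;

    liquid = (int *)malloc(memsize);
    crystal = (int *)malloc(memsize);
    if (liquid == NULL || crystal == NULL) {
        err = cudaErrorMemoryAllocation;
        goto cleanup;
    }

    CHECK(cudaMalloc((void **)&liquid_d, memsize));
    CHECK(cudaMalloc((void **)&crystal_d, memsize));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&kernel_done));
    CHECK(cudaEventCreate(&copy_done));

    CHECK(cudaEventRecord(start, 0));
    fill_kernel<<<blocks, threads>>>(liquid_d, crystal_d, count);
    CHECK(cudaGetLastError());                /* reports launch errors only */
    CHECK(cudaEventRecord(kernel_done, 0));
    CHECK(cudaEventSynchronize(kernel_done)); /* host waits here for the kernel */

    /* With the kernel already finished, these calls measure only the copies. */
    CHECK(cudaMemcpy(liquid, liquid_d, memsize, cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(crystal, crystal_d, memsize, cudaMemcpyDeviceToHost));
    CHECK(cudaEventRecord(copy_done, 0));
    CHECK(cudaEventSynchronize(copy_done));

    CHECK(cudaEventElapsedTime(kernel_ms, start, kernel_done));
    CHECK(cudaEventElapsedTime(copy_ms, kernel_done, copy_done));

cleanup:
    if (start) cudaEventDestroy(start);
    if (kernel_done) cudaEventDestroy(kernel_done);
    if (copy_done) cudaEventDestroy(copy_done);
    cudaFree(liquid_d);   /* cudaFree(NULL) is a no-op */
    cudaFree(crystal_d);
    free(liquid);
    free(crystal);
    return err;
}
```

If the kernel time accounts for most of the 4.75 seconds and the copy time is small, the transfer itself is not the bottleneck.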
Thanks
Mark