I wanted to check my upload and download speeds to the GPU with the CUDA Profiler 1.0.
I’m uploading and downloading exactly 2073600 bytes.
The results are as follows:
method=[ memcopy ] gputime=[ 387.664 ]
method=[ memcopy ] gputime=[ 1188.992 ]
The upload is done by cudaMemcpyToArray and the download is done by cudaMemcpy.
The download speed seems reasonable,
but the upload speed is a little weird. If it takes 387 microseconds to push
2073600 bytes to the device, then the throughput must be roughly 5 GB/s,
which is way above the maximum throughput of my tiny PCI-Express 1.1 bus :).
Maybe I’m just calculating the throughput wrong, but
could anybody give me some info on how the profiler gets those timings?
If I simply time the host code that issues the memcpy, I get around 1.5 GB/s.