Copying an image using cudaMemcpy is terribly slow

I have decoded a bitmap image into an array. To apply the filter I have to copy it to the GPU, but when I used cudaMemcpy the 3 MB image took 20 seconds to copy.

In the SDK they use OpenGL buffers and the image gets copied instantly, but I don't want to use OpenGL. Is there an alternative way to copy the image to the graphics card quickly?

Thanks in advance


May I ask how you are measuring the time?

I had a similar issue. In my case it was because the first call into CUDA takes much longer than later calls, since it includes the one-time initialization of the runtime. A delay of 20 seconds is still a lot, though (I used to see 5 to 6 seconds). So if your measurement includes the first CUDA call, it probably does not reflect the copy itself. In any case the copy should never take that long: the CPU-to-GPU transfer bandwidth is high, so a 3 MB image should not take anywhere near this much time.
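
To separate the startup cost from the transfer itself, something along these lines might help. It is only a sketch, not your actual code: the 3 MB buffer is dummy data standing in for the decoded bitmap, the names `hostImage`, `deviceImage`, and `CHECK` are made up for illustration, and `cudaFree(0)` is used here as a common way to trigger initialization before the timed region. The copy is timed with CUDA events so that only the `cudaMemcpy` is measured.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err));                         \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    // Assumption: roughly 3 MB of pixel data, as in the question.
    const size_t imageBytes = 3 * 1024 * 1024;

    // Dummy host buffer standing in for the decoded bitmap.
    unsigned char *hostImage = (unsigned char *)malloc(imageBytes);
    if (!hostImage) return EXIT_FAILURE;
    for (size_t i = 0; i < imageBytes; ++i)
        hostImage[i] = (unsigned char)(i & 0xFF);

    // Warm-up: make the one-time runtime initialization happen here,
    // outside the timed region.
    CHECK(cudaFree(0));

    unsigned char *deviceImage = NULL;
    CHECK(cudaMalloc((void **)&deviceImage, imageBytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Time only the host-to-device copy.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpy(deviceImage, hostImage, imageBytes,
                     cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("Host-to-device copy of %zu bytes: %.3f ms\n", imageBytes, ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(deviceImage));
    free(hostImage);
    return 0;
}
```

This should build with something like `nvcc copy_timing.cu -o copy_timing` (the file name is arbitrary). If the printed time is small but the program as a whole still takes many seconds, the delay is in initialization rather than in the copy.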