Initial image transfer to GPU slow

I’ve been using NPP to experiment with replacing portions of our existing algorithms that use Intel’s IPP. I started with the provided samples and have been able to get several operations that are of interest at my current job working. I’ve been doing a fair amount of performance measurement to get numbers that will convince others of the potential of GPUs. One thing I’ve noticed, however, is that the first image transfer to the GPU is very slow: 600-1000 ms, regardless of image size. After that first transfer completes, subsequent transfers are fast (less than 30 ms, even for the largest images I’m using, which are 55 megapixels).

To get better numbers in my prototype code, I simply send the sample Lena image to the GPU before doing anything with the images I’m interested in. That first transfer absorbs the 600-1000 ms, and the image I actually want to use then transfers in 30 ms or less. I’ve posted a snippet of the code I use to load and transfer both images below.

Is there an explanation for this? It seems that some sort of “warm up” or initialization time is required before the GPU is ready to process images. Is that the case? If not, what can be done to speed up the initial image transfer to the GPU?


// Load Lena image on GPU since for some reason this makes subsequent transfers much faster
npp::ImageCPU_8u_C1 lenaHostSrc;
std::string lenaFile = "../../data/Lena.pgm";
npp::loadImage(lenaFile, lenaHostSrc);
npp::ImageNPP_8u_C1 lenaDeviceSrc(lenaHostSrc);

// Load gray-scale image from disk and upload to the device
npp::ImageCPU_8u_C1 oHostSrc;
npp::loadImage(sFilename, oHostSrc);
npp::ImageNPP_8u_C1 oDeviceSrc(oHostSrc);

The first time you make a call that uses the GPU, the CUDA runtime has to initialize all of its GPU state. That one-time initialization is the overhead you’re seeing, which is why it doesn’t depend on image size.
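
To keep that one-time cost out of your transfer numbers, you can pay it explicitly before you start timing. Below is a minimal sketch of the measurement pattern in plain C++. It does not call NPP or CUDA: `FakeDevice`, `ensureInitialized`, `timeMs`, `UploadTimings`, and `measureWithWarmup` are hypothetical names invented for illustration, and the 50 ms setup delay is an arbitrary stand-in for the real initialization time.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GPU runtime (not NPP or CUDA): the first call
// that touches the "device" pays a fixed setup cost; later calls do not.
class FakeDevice {
public:
    // Simulated one-time initialization; the 50 ms figure is arbitrary.
    void ensureInitialized() {
        if (!initialized_) {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            initialized_ = true;
        }
    }

    // Simulated host-to-device copy; triggers initialization lazily.
    void upload(const std::vector<unsigned char>& host) {
        ensureInitialized();
        assert(initialized_);
        deviceCopy_ = host;
    }

    bool initialized() const { return initialized_; }

private:
    bool initialized_ = false;
    std::vector<unsigned char> deviceCopy_;
};

// Wall-clock duration of fn in milliseconds.
double timeMs(const std::function<void()>& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct UploadTimings {
    double warmupMs;        // explicit initialization, reported separately
    double firstUploadMs;   // first transfer after the warm-up
    double secondUploadMs;  // a later transfer, for comparison
};

// Pay the setup cost explicitly, then time two uploads of the same image.
UploadTimings measureWithWarmup(FakeDevice& device,
                                const std::vector<unsigned char>& image) {
    UploadTimings t{};
    t.warmupMs = timeMs([&] { device.ensureInitialized(); });
    t.firstUploadMs = timeMs([&] { device.upload(image); });
    t.secondUploadMs = timeMs([&] { device.upload(image); });
    return t;
}
```

In a real CUDA program, a commonly used way to force the initialization up front is a cheap runtime call such as `cudaFree(0)` at startup, which plays the role of `ensureInitialized()` here and replaces the dummy Lena upload. Reporting the warm-up time as its own number, rather than hiding it, also gives a more honest picture when comparing against IPP.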