The time of putting image data into CUDA memory increases a lot when the image becomes larger

I use TensorRT to do inference. When the input image size is 400×225, everything is fine: the time to copy the image data into CUDA memory is 1 ms. However, that time increases a lot, to 10 ms, when the image size is 500×288.

The code below shows how I put the image data into CUDA memory. mInputCPU is host memory mapped to GPU memory, allocated with cudaHostAlloc().

// Convert the image from OpenCV's interleaved BGR (HWC) layout
// to the planar RGB (CHW) layout expected by the network.
for (uint32_t y = 0; y < INPUT_H; y++)
{
    for (uint32_t x = 0; x < INPUT_W; x++)
    {
        cv::Vec3b intensity = dstImage.at<cv::Vec3b>(y, x);
        mInputCPU[imgPixels * 0 + y * INPUT_W + x] = float(intensity.val[2]); // R plane
        mInputCPU[imgPixels * 1 + y * INPUT_W + x] = float(intensity.val[1]); // G plane
        mInputCPU[imgPixels * 2 + y * INPUT_W + x] = float(intensity.val[0]); // B plane
    }
}
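For reference, here is a minimal sketch of how mInputCPU could be allocated as mapped pinned memory with cudaHostAlloc(). The allocation code is not shown in the post, so the flag choice and the device-pointer name mInputGPU are assumptions for illustration:

#include <cuda_runtime.h>

// Sketch only: assumes zero-copy mapped pinned memory; the
// original flags and variable names may differ.
// INPUT_W and INPUT_H are the dimensions used in the loop above.
const size_t imgPixels  = INPUT_W * INPUT_H;
const size_t inputBytes = imgPixels * 3 * sizeof(float);

float* mInputCPU = nullptr; // host-side pointer
float* mInputGPU = nullptr; // device-side alias (hypothetical name)

// Allocate pinned host memory mapped into the device address space.
cudaHostAlloc((void**)&mInputCPU, inputBytes, cudaHostAllocMapped);

// Retrieve the device pointer that refers to the same memory.
cudaHostGetDevicePointer((void**)&mInputGPU, mInputCPU, 0);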

Thanks

Hello,

This question is more suitable for the CUDA Programming and Performance forum:
https://devtalk.nvidia.com/default/board/57/cuda-programming-and-performance/

When posting, please provide details on the platform you are using:

Linux distro and version
GPU type
NVIDIA driver version
CUDA version
cuDNN version
Python version [if using Python]
TensorFlow version
TensorRT version

Thanks