The time to copy image data into CUDA memory increases a lot when the image becomes larger

I use TensorRT to do inference on the TX2 platform. The JetPack version is 3.3. When the input image size is 400×225, everything is fine: copying the image data into CUDA memory takes 1 ms. However, when the image size is 500×288, that copy time increases a lot, to 10 ms.

The code below is how I put image data into CUDA memory. mInputCPU is mapped to GPU memory, allocated with cudaHostAlloc().

for (uint32_t y = 0; y < INPUT_H; y++)
{
    for (uint32_t x = 0; x < INPUT_W; x++)
    {
        // Read the BGR pixel and write it out as planar RGB floats (CHW layout).
        const cv::Vec3b intensity = dstImage.at<cv::Vec3b>(y, x);
        mInputCPU[imgPixels * 0 + y * INPUT_W + x] = float(intensity.val[2]);  // R
        mInputCPU[imgPixels * 1 + y * INPUT_W + x] = float(intensity.val[1]);  // G
        mInputCPU[imgPixels * 2 + y * INPUT_W + x] = float(intensity.val[0]);  // B
    }
}
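
For reference, the allocation I described above looks roughly like this (a minimal sketch; my actual allocation code isn't shown here, and mInputGPU is just an illustrative name for the device-side alias):

#include <cuda_runtime.h>

float* mInputCPU = nullptr;  // host-side pointer to pinned, mapped memory
float* mInputGPU = nullptr;  // device-side alias of the same allocation

const size_t inputSize = 3 * INPUT_W * INPUT_H * sizeof(float);

// Zero-copy (mapped) pinned memory: the GPU accesses it directly,
// with no explicit cudaMemcpy. Error checking omitted for brevity;
// cudaSetDeviceFlags(cudaDeviceMapHost) may be required beforehand.
cudaHostAlloc((void**)&mInputCPU, inputSize, cudaHostAllocMapped);
cudaHostGetDevicePointer((void**)&mInputGPU, (void*)mInputCPU, 0);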

Thanks

Hi,

Here are two suggestions for you:

1. Please remember to maximize the CPU/GPU clocks first:

sudo ./jetson_clocks.sh

2. Could you share the block/thread numbers you use for launching the kernel?
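
For context, a typical 2-D launch configuration scales the grid with the image size, so a larger input launches proportionally more blocks. A minimal sketch of what we are asking about (preprocessKernel, its arguments, and the 16×16 block size are placeholders, not your actual code):

// Hypothetical kernel, for illustration only.
__global__ void preprocessKernel(float* out, int width, int height)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;
    // ... per-pixel work ...
}

// 16x16 threads per block; the grid is sized to cover the whole image.
const dim3 block(16, 16);
const dim3 grid((INPUT_W + block.x - 1) / block.x,
                (INPUT_H + block.y - 1) / block.y);
preprocessKernel<<<grid, block>>>(mInputGPU, INPUT_W, INPUT_H);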

Thanks.