The time to copy image data into CUDA memory increases a lot when the image becomes larger

I use TensorRT to do inference on the TX2 platform. The JetPack version is 3.3. When the input image size is 400×225, everything is fine: copying the image data into CUDA memory takes 1 ms. However, when the image size is 500×288, that copy time increases a lot, to 10 ms.

The code below is how I put image data into CUDA memory. mInputCPU is mapped to GPU memory, allocated with cudaHostAlloc().

for (uint32_t y = 0; y < INPUT_H; y++)
{
    for (uint32_t x = 0; x < INPUT_W; x++)
    {
        // Read the BGR pixel and write it out as planar RGB floats (CHW layout).
        const cv::Vec3b intensity = dstImage.at<cv::Vec3b>(y, x);
        mInputCPU[imgPixels * 0 + y * INPUT_W + x] = float(intensity.val[2]);  // R
        mInputCPU[imgPixels * 1 + y * INPUT_W + x] = float(intensity.val[1]);  // G
        mInputCPU[imgPixels * 2 + y * INPUT_W + x] = float(intensity.val[0]);  // B
    }
}
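
For reference, the allocation I described above looks roughly like this (a minimal sketch; my actual allocation code isn't shown here, and mInputGPU is just an illustrative name for the device-side alias):

#include <cuda_runtime.h>

float* mInputCPU = nullptr;  // host-side pointer to pinned, mapped memory
float* mInputGPU = nullptr;  // device-side alias of the same allocation

const size_t inputSize = 3 * INPUT_W * INPUT_H * sizeof(float);

// Zero-copy (mapped) pinned memory: the GPU accesses it directly,
// with no explicit cudaMemcpy. Error checking omitted for brevity;
// cudaSetDeviceFlags(cudaDeviceMapHost) may be required beforehand.
cudaHostAlloc((void**)&mInputCPU, inputSize, cudaHostAllocMapped);
cudaHostGetDevicePointer((void**)&mInputGPU, (void*)mInputCPU, 0);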

Thanks

Hi,

Here are two suggestions for you:

1. Please remember to maximize the CPU/GPU clocks first:

sudo ./jetson_clocks.sh

2. Could you share the block/thread numbers you use for launching the kernel?
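
For context, a typical 2-D launch configuration scales the grid with the image size, so a larger input launches proportionally more blocks. A minimal sketch of what we are asking about (preprocessKernel, its arguments, and the 16×16 block size are placeholders, not your actual code):

// Hypothetical kernel, for illustration only.
__global__ void preprocessKernel(float* out, int width, int height)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;
    // ... per-pixel work ...
}

// 16x16 threads per block; the grid is sized to cover the whole image.
const dim3 block(16, 16);
const dim3 grid((INPUT_W + block.x - 1) / block.x,
                (INPUT_H + block.y - 1) / block.y);
preprocessKernel<<<grid, block>>>(mInputGPU, INPUT_W, INPUT_H);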

Thanks.