I use TensorRT to do inference on the TX2 platform, with JetPack 3.3. When the input image size is 400x225, everything is fine: copying the image data into CUDA memory takes 1 ms. However, when the image size is 500x288, the copy time increases a lot, to about 10 ms.
The code below is how I put the image data into CUDA memory. mInputCPU is a host buffer mapped to GPU memory, allocated with cudaHostAlloc().
// Convert the BGR HWC cv::Mat into planar RGB (CHW) floats
// in the mapped host buffer.
const uint32_t imgPixels = INPUT_H * INPUT_W;

for( uint32_t y = 0; y < INPUT_H; y++ )
{
    for( uint32_t x = 0; x < INPUT_W; x++ )
    {
        const cv::Vec3b intensity = dstImage.at<cv::Vec3b>(y, x);
        mInputCPU[imgPixels * 0 + y * INPUT_W + x] = float(intensity.val[2]); // R
        mInputCPU[imgPixels * 1 + y * INPUT_W + x] = float(intensity.val[1]); // G
        mInputCPU[imgPixels * 2 + y * INPUT_W + x] = float(intensity.val[0]); // B
    }
}
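For context, a minimal sketch of how such a zero-copy mapped buffer is typically set up with cudaHostAlloc(); the variable names and the device-pointer variable mInputGPU are my assumptions based on the snippet above, not the exact code:

#include <cuda_runtime.h>

// Sketch (assumed names): allocate pinned host memory that is also
// mapped into the device address space. On TX2, CPU and GPU share
// the same DRAM, so the GPU can read this buffer directly.
float* mInputCPU = nullptr;  // host-side pointer
float* mInputGPU = nullptr;  // device-side alias of the same memory

const size_t inputSize = 3 * INPUT_H * INPUT_W * sizeof(float);

cudaHostAlloc((void**)&mInputCPU, inputSize, cudaHostAllocMapped);
cudaHostGetDevicePointer((void**)&mInputGPU, mInputCPU, 0);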
Thanks