Hi, I am using TensorRT on a Jetson TX2 for live video parsing. For memory transfers, I compared two approaches: CUDA zero copy and cudaMemcpy. The model is GoogLeNet and the data type is FP32.

- With CUDA zero copy, I call

  context.execute(1, buffers);

  and this takes about 20 ms.

- With cudaMemcpy, the time I measure includes cudaMalloc, cudaMemcpy, and context.execute(1, buffers); in total this takes 16 ms.
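For reference, this is roughly how I set up the two paths (a sketch, not my exact code: the buffer sizes `inSize`/`outSize`, the binding indices, and the surrounding engine/context creation are assumptions):

```cpp
// Assumes an already-created TensorRT execution context with two
// bindings (input at index 0, output at index 1) and FP32 data.

// --- Path 1: CUDA zero copy (mapped pinned host memory) ---
float *inHost, *outHost;
void *buffers[2];
cudaSetDeviceFlags(cudaDeviceMapHost);                       // enable mapped host memory
cudaHostAlloc((void**)&inHost,  inSize  * sizeof(float), cudaHostAllocMapped);
cudaHostAlloc((void**)&outHost, outSize * sizeof(float), cudaHostAllocMapped);
cudaHostGetDevicePointer(&buffers[0], inHost,  0);           // device view of host buffer
cudaHostGetDevicePointer(&buffers[1], outHost, 0);
// fill inHost with the current frame, then:
context.execute(1, buffers);                                 // ~20 ms measured

// --- Path 2: explicit device buffers + cudaMemcpy ---
void *devBuffers[2];
cudaMalloc(&devBuffers[0], inSize  * sizeof(float));
cudaMalloc(&devBuffers[1], outSize * sizeof(float));
cudaMemcpy(devBuffers[0], inHost, inSize * sizeof(float), cudaMemcpyHostToDevice);
context.execute(1, devBuffers);
cudaMemcpy(outHost, devBuffers[1], outSize * sizeof(float), cudaMemcpyDeviceToHost);
// cudaMalloc + cudaMemcpy + execute + cudaMemcpy: ~16 ms measured in total
```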

Is there a problem with my setup? Shouldn't CUDA zero copy be faster than cudaMemcpy?

I also timed the same model with Caffe + cuDNN 5, and it takes 21 ms, which is almost the same as TensorRT with CUDA zero copy.

So, could you please give me some advice on memory handling with TensorRT for live video parsing?

Thanks.