Hi, I am using TensorRT on a Jetson TX2 for live video parsing. For the memory copy step, I compared two approaches: CUDA zero copy and cudaMemcpy. The model I use is GoogLeNet, and the data type is FP32.
- With CUDA zero copy (mapped host memory), inference takes about 20 ms.
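The original code for this path was not included in the post; below is a minimal sketch of what a zero-copy setup on the TX2 typically looks like, assuming GoogLeNet's standard 3x224x224 input and 1000-class output (the buffer sizes and variable names are illustrative, not taken from the post):

```cpp
#include <cuda_runtime.h>

// Enable mapped host allocations before any other CUDA call
// (required for zero copy on some configurations).
cudaSetDeviceFlags(cudaDeviceMapHost);

void *hostInput, *devInput, *hostOutput, *devOutput;
size_t inSize  = 3 * 224 * 224 * sizeof(float);  // assumed GoogLeNet input
size_t outSize = 1000 * sizeof(float);           // assumed 1000-class output

// Allocate mapped (zero-copy) host memory and get device-side aliases.
cudaHostAlloc(&hostInput,  inSize,  cudaHostAllocMapped);
cudaHostAlloc(&hostOutput, outSize, cudaHostAllocMapped);
cudaHostGetDevicePointer(&devInput,  hostInput,  0);
cudaHostGetDevicePointer(&devOutput, hostOutput, 0);

// Fill hostInput with the preprocessed frame, then run inference
// directly on the mapped pointers -- no explicit cudaMemcpy needed.
void* buffers[2] = { devInput, devOutput };
context->execute(1, buffers);
// Results are readable from hostOutput after execution completes.
```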
- With cudaMemcpy, the time I measure includes cudaMalloc, cudaMemcpy, and context.execute(1, buffers); this takes about 16 ms in total.
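For comparison, a sketch of the cudaMemcpy path being timed, under the same assumed buffer sizes (this is illustrative; the post does not include the actual code):

```cpp
#include <cuda_runtime.h>

void *devInput, *devOutput;
size_t inSize  = 3 * 224 * 224 * sizeof(float);  // assumed GoogLeNet input
size_t outSize = 1000 * sizeof(float);           // assumed 1000-class output
float* hostInput  = /* preprocessed frame in pageable host memory */;
float* hostOutput = new float[1000];

// Everything below is inside the timed region.
cudaMalloc(&devInput,  inSize);
cudaMalloc(&devOutput, outSize);
cudaMemcpy(devInput, hostInput, inSize, cudaMemcpyHostToDevice);

void* buffers[2] = { devInput, devOutput };
context->execute(1, buffers);

cudaMemcpy(hostOutput, devOutput, outSize, cudaMemcpyDeviceToHost);
```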
Is there a problem with my setup? Shouldn't CUDA zero copy be faster than cudaMemcpy?
I also timed Caffe with cuDNN 5: it takes 21 ms, which is almost the same as TensorRT with CUDA zero copy.
So, could you please give me some advice on memory copy with TensorRT for live video parsing?