Inference Time on TX2 with MobileNet

Hi, when I test a YOLOv3-MobileNet model on TX2, the inference time is 40 ms in FP32 and 36 ms in FP16, so there is almost no difference between FP16 and FP32. The input resolution is 416*416. Regarding the depthwise conv layer: I used the group parameter of a regular convolution layer to express the depthwise convolution instead of implementing it as a plugin layer. Could this be the cause of the timing problem? Hope you could provide some ideas. Thanks!
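
To illustrate what I mean, here is a minimal sketch (not my actual code, and written against the TensorRT C++ network-definition API rather than my Caffe prototxt; the tensor and weight names are placeholders) of a depthwise convolution expressed through the group count of a standard convolution:

```cpp
// Sketch only: a depthwise convolution built as a grouped standard
// convolution with the TensorRT C++ API. "input", "dwWeights" and
// "dwBias" are hypothetical placeholders.
#include <NvInfer.h>

using namespace nvinfer1;

IConvolutionLayer* addDepthwiseConv(INetworkDefinition& network,
                                    ITensor& input,
                                    int channels,       // input channel count
                                    Weights dwWeights,  // channels x 1 x 3 x 3
                                    Weights dwBias)
{
    // One output map per input channel, 3x3 kernel as in MobileNet.
    IConvolutionLayer* conv =
        network.addConvolution(input, channels, DimsHW{3, 3}, dwWeights, dwBias);

    // Group count equal to the channel count makes the convolution
    // depthwise -- this is what the Caffe "group" parameter maps to
    // when the model is imported.
    conv->setNbGroups(channels);
    conv->setStride(DimsHW{1, 1});
    conv->setPadding(DimsHW{1, 1});
    return conv;
}
```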

Hi,

May I know the TensorRT version you use?

There is a known performance issue with depthwise convolution in previous TensorRT versions.
An improved implementation is available in TensorRT 5.0.
https://devtalk.nvidia.com/default/topic/1025870/depthwise-convolution-is-very-slow-using-tensorrt3-0/

FP16 mode: with the current version, the end-to-end inference speedups for MobileNet v1/v2 are 2.39x ~ 2.90x (Xavier iGPU) or 1.52x ~ 2.08x (Xavier dGPU), depending on batch size, compared with TRT v4.0.

Please wait for our next JetPack release to get TensorRT 5.0 for TX2.

Thanks

Hi AastaLLL, thanks for your reply! I use JetPack 3.3 with TRT version 4.0 (with a caffemodel). So do you mean the problem is not how I implement the depthwise convolution layer, but the TRT version? And when will the new version be released? My graduation project needs TRT acceleration. Thanks!!

There is another question: the accuracy of my caffemodel in both FP32 and FP16 modes (TensorRT) is much worse than in Caffe. I don't think the FP32 accuracy should drop so much. What do you think is the cause? Thanks!

Hi,

We cannot disclose our future plans for TensorRT 5.0 on TX2.
But if you just want to give it a try, you can use JetPack 4.1.1, which meets the requirement.
(However, we have not fully tested JetPack 4.1.1 + TX2.)

For the accuracy issue, it can be improved by training the model with FP16 precision.
The difference between FP32 and FP16 is small, but it can be amplified by layers such as softmax and ReLU.
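
It may also help to double-check that reduced precision is actually enabled at engine-build time. Below is a minimal sketch (assuming the standard TensorRT C++ builder API; setFp16Mode() is the TensorRT 5 name, older releases exposed setHalf2Mode() instead, and "network" is an already-populated INetworkDefinition):

```cpp
// Sketch only: build an engine with FP16 enabled when the GPU
// reports fast-FP16 support.
#include <NvInfer.h>

using namespace nvinfer1;

ICudaEngine* buildEngine(IBuilder& builder, INetworkDefinition& network)
{
    builder.setMaxBatchSize(1);
    builder.setMaxWorkspaceSize(1 << 28);  // 256 MB scratch space

    if (builder.platformHasFastFp16())
    {
        // Without this flag the engine is built in FP32 even if an
        // FP16 run was intended, which would give nearly identical
        // timings for the two modes.
        builder.setFp16Mode(true);
    }

    return builder.buildCudaEngine(network);
}
```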

Thanks.