TensorRT 6.0.1 performs worse than TensorRT 5.1.6 on Jetson AGX Xavier

Hi,

I am currently trying some examples from Docker containers on the new JetPack release (4.3). I wanted to run some benchmarks comparing TensorRT 6.0.1 and cuDNN 7.6.3 with the older versions (5.1.6 and 7.5.0, respectively). I have some issues regarding this, as follows:

1- 24 FPS with --> protobuf=3.6.1, tensorflow-gpu=1.13.1+nv19.3, tensorrt=5.1.6, cudnn=7.5.0 opencv=3.3.1
2- 9.6 FPS with --> protobuf=3.8.0, tensorflow-gpu=1.15.0+nv19.11, tensorrt=6.0.1, cudnn=7.6.3, opencv=4.1.1
3- 8.5 FPS with --> protobuf=3.6.1, tensorflow-gpu=1.15.0+nv19.11, tensorrt=6.0.1, cudnn=7.6.3, opencv=4.1.1
4- There seems to be no support for TensorFlow 1.14.0 with TensorRT 6.0.1? (So I am currently building it from source.)
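
For reference, the FPS numbers above can be reproduced with a simple timing harness. This is a minimal sketch, assuming `run_inference` stands in for one forward pass of whatever detector is being benchmarked; the warm-up count and frame count are arbitrary choices, not values from the measurements above:

```python
import time

def measure_fps(run_inference, num_frames=100, warmup=10):
    """Average frames per second of an inference callable.

    Warm-up iterations are excluded so that one-time costs
    (engine build, memory allocation) do not skew the result.
    """
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(num_frames):
        run_inference()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed
```

Running the same harness across the different TensorRT/cuDNN stacks keeps the comparison apples-to-apples.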

The main question is: why is there such a performance drop with the newer versions of TensorRT and cuDNN?

Hi,

Please note that TF-TRT combines the implementations of TensorFlow and TensorRT.
A possible cause is that API changes in newer TensorFlow versions leave fewer operations supported by TensorRT.
Falling back to more TensorFlow implementations can lead to the performance degradation you see.
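
One way to quantify how much of the graph actually runs in TensorRT is to count the `TRTEngineOp` nodes in the converted graph versus the ops left as native TensorFlow. Below is a minimal sketch; the counting helper is pure Python, and the commented lines show the TF 1.x conversion API that would produce the op list on a real Jetson setup (the file name and blacklist node are assumptions):

```python
def count_converted_ops(op_types):
    """Split a list of graph op types into TF-TRT engines vs. native TF ops.

    A converted graph where most nodes remain native TensorFlow ops will
    fall back to the slower TensorFlow implementations at runtime.
    """
    trt_engines = sum(1 for op in op_types if op == "TRTEngineOp")
    native = len(op_types) - trt_engines
    return trt_engines, native

# With a real frozen graph under TF 1.15, the op list would come from the
# converted GraphDef, e.g.:
#   from tensorflow.python.compiler.tensorrt import trt_convert as trt
#   converter = trt.TrtGraphConverter(input_graph_def=frozen_graph,
#                                     nodes_blacklist=["NMS"])
#   trt_graph = converter.convert()
#   op_types = [node.op for node in trt_graph.node]
```

If the engine count drops between TensorFlow versions, that supports the fewer-supported-ops explanation.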

One way to confirm this is to try JetPack 4.2 with tensorflow-gpu=1.15.0+nv19.11.
https://developer.download.nvidia.com/compute/redist/jp/v42/tensorflow-gpu/
Could you give it a try and share the result?

Thanks.

Actually, I think the main reason for this performance issue is the need to force the NMS layers to run on the GPU instead of the CPU when working with TensorFlow 1.15.0. If you don’t force it, the process throws Aborted (core dumped) when TF-TRT compiles the model.

The full error is:

F tensorflow/core/util/device_name_utils.cc:92] Check failed: IsJobName(job)
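
The forcing itself amounts to rewriting the `device` field of the NMS nodes in the frozen GraphDef before handing it to TF-TRT. Here is a minimal sketch of that rewrite, using plain dicts to stand in for GraphDef nodes; the op keyword and device string are assumptions, and with a real graph the same loop would run over `graph_def.node` and set `node.device`:

```python
def force_ops_to_gpu(nodes, op_keyword="NonMaxSuppression",
                     gpu_device="/device:GPU:0"):
    """Pin every op whose type matches `op_keyword` to the GPU.

    `nodes` stands in for the `node` list of a frozen GraphDef;
    returns how many nodes were re-pinned.
    """
    pinned = 0
    for node in nodes:
        if op_keyword in node["op"]:
            node["device"] = gpu_device
            pinned += 1
    return pinned
```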

Also, here is the perf result you asked for:
4- 6.5 FPS with --> protobuf=3.6.1, tensorflow-gpu=1.15.0+nv19.11, tensorrt=5.1.6, cudnn=7.5.0, opencv=3.4.6

After all of the steps above, I think all we can do here is post an issue on TensorFlow’s GitHub repo about this problem, right?

Hi,

Sorry for the late update.

Which model do you use?
If you are using a standard ssd_mobilenet, it’s recommended to convert the TF model into a pure TensorRT engine.
TensorRT is optimized for the Jetson platform and should give you much better performance.

Thanks.