TensorRT 5.0RC on Xavier (JetPack 4.0EA) vs TensorRT 4.0GA on TX2 (JetPack 3.3)

Problem:

I am running the same C++ program with the same UFF file on both platforms. TensorRT 5.0RC on Xavier reports a UFFParser error, while TensorRT 4.0GA on TX2 works fine. (The same code and file also work with TensorRT 5.0.0.10 and TensorRT 4.0.1.6 on a 1080Ti.)

When running with FP32:
UFFParser: Parser error: test_net/concat_2: Concat operation axis is out of bounds for layer test_net/concat_2

When running with FP16:
ERROR: UFFParser: Parser error: test_net/BatchNorm_11/moving_variance: Weight 76611.585938 is outside of [65504.000000, -65504.000000].

Questions:

  1. I also tried generating the UFF file with version 0.5.1 of uff, but still got the same errors for both FP32 and FP16 on Xavier. Should I (or can I) downgrade my Xavier to JetPack 3.3 with TensorRT 4.0GA? Or should I modify the network?

  2. For FP16, does the FP16 option rescale the weights in the network? If not, do I need to train in FP16 to make FP16 inference work on Xavier?

Thanks!

Jetson Xavier (Jetpack 4.0EA)
Linux distro and version 18.04
CUDA version 10.0
CUDNN version 7.3
Tensorflow version 1.11
TensorRT version 5.0RC

Jetson TX2 (Jetpack 3.3)
Linux distro and version 16.04
CUDA version 9.0
CUDNN version 7.1.5
Tensorflow version 1.9
TensorRT version 4.0GA

1080Ti
Linux distro and version 16.04
nvidia driver version 390.48
CUDA version 9.0
CUDNN version 7.1.3
Tensorflow version 1.8
TensorRT version 5.0.0.10 and 4.0.1.6

Hello,

Can you share the UFF and pb files that produce this error?

Regarding the FP16 issue, you’ll need to make sure the weights are in the FP16 range when running inference.

One way around this is to run the parser in higher precision (FP32) mode and the builder in FP16 mode, so that the builder will try to convert the weights. The function to enable this is IBuilder::setFp16Mode().
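
To find out which weights fall outside the FP16 range before handing the model to the parser, something like the following rough sketch might help. It assumes the weights have already been pulled out of the model into a plain dict of tensor name to list of floats; the helper name find_fp16_overflows, the moving_mean entry and the sample values are hypothetical, and only the 65504 bound and the moving_variance value come from the error message above.

```python
import math

# Largest finite magnitude representable in IEEE 754 half precision (FP16).
FP16_MAX = 65504.0


def find_fp16_overflows(weights):
    """Return (name, value) pairs whose magnitude exceeds the FP16 range.

    `weights` is a hypothetical mapping of tensor name -> iterable of floats;
    how the values are extracted from the model is outside this sketch.
    """
    overflows = []
    for name, values in weights.items():
        for v in values:
            # Only finite values are compared against the bound.
            if math.isfinite(v) and abs(v) > FP16_MAX:
                overflows.append((name, v))
    return overflows


# Illustrative data only; the last value mirrors the weight in the error.
sample = {
    "test_net/BatchNorm_11/moving_mean": [0.5, -12.25],
    "test_net/BatchNorm_11/moving_variance": [1.0, 76611.585938],
}

for name, value in find_fp16_overflows(sample):
    print(f"{name}: {value} is outside the FP16 range")
```

This only reports the offending tensors; whether to retrain, clip, or rely on the builder's conversion is a separate decision.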

Thanks for the offline help on the first parser error!

Parser error: test_net/concat_2: Concat operation axis is out of bounds for layer test_net/concat_2

This error occurs because the axis for concatV2 has to be zero or a positive integer; a negative axis such as -1 is rejected.
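
For anyone hitting the same thing, the fix on the TensorFlow side is to pass the equivalent non-negative axis when building the graph. A minimal sketch of that conversion is below; the helper name normalize_axis is hypothetical, and rank is assumed to be the rank of the tensors being concatenated in the TensorFlow graph.

```python
def normalize_axis(axis, rank):
    """Convert a possibly negative axis into its non-negative equivalent.

    Follows the usual convention that axis -1 refers to the last dimension,
    so for a tensor of the given rank, axis -k maps to rank - k.
    """
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of bounds for rank {rank}")
    return axis + rank if axis < 0 else axis


# For a 4-D tensor, concatenating on the last axis:
print(normalize_axis(-1, 4))  # prints 3
```

The resulting value can then be used in place of the negative axis in the concat call before the graph is exported and converted to UFF again.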

Hi JF_brt, thanks very much. I got a similar error in TensorRT 5.0, while the same model works in 4.0. Following your suggestion, I changed the negative axis value in “concat” to the equivalent positive one, and it works. It seems that some difference in the UFF parser between the two TensorRT versions causes this.