Misc Error in transformWeightsIfFP: 1 when building an ONNX model with TensorRT in FP16

Please provide the following info (check/uncheck the boxes after clicking “+ Create Topic”):
Software Version
NVIDIA DRIVE™ Software 9.0 (Linux)

Target Operating System
Linux

Hardware Platform
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)

SDK Manager Version
1.4.1.7402

Host Machine Version
native Ubuntu 16.04

Hi.

I encountered an error while building a TensorRT CUDA engine.

Situation:
• TensorRT version: 5.0.2 (DRIVE Software 9.0)
• Parser: custom onnx-tensorrt based on v5.0
• Precision: FP16

An error is raised when building the CUDA engine from the parsed ONNX network.
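
For reference, here is a minimal sketch of the failing step, simplified to the standard TensorRT Python API (our actual pipeline goes through the custom C++ onnx-tensorrt parser, and the file name below is just a placeholder):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def build_fp16_engine(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX model into a TensorRT network definition.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("ONNX parsing failed")

    # Request FP16 precision for the engine build.
    builder.fp16_mode = True
    builder.max_workspace_size = 1 << 30

    # The transformWeightsIfFP error is raised during this call.
    return builder.build_cuda_engine(network)

engine = build_fp16_engine("model.onnx")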

Could you please explain why this happens, and is there any solution?
Also, isn’t libnvonnxparser.so in DRIVE Software 9.0 based on version 5.0?

Error log:

[TRT] Tactic 0 time 0.002048
[TRT] Adding reformat layer: (Unnamed Layer* 5) [Convolution] + (Unnamed Layer* 6) [Activation] reformatted input 0 (140) from Float(1,256,65536,4194304) to Half(1,256,65536,4194304)
[TRT] Adding reformat layer: (Unnamed Layer* 41) [Pooling] reformatted input 0 (173) from Half(1,32,1024,524288) to Half(64,2048,1:8,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 42) [Convolution] + (Unnamed Layer* 43) [Activation] reformatted input 0 (174) from Half(64,2048,1:8,65536) to Half(1,32,1024,524288)
[TRT] Adding reformat layer: (Unnamed Layer* 48) [Convolution] + (Unnamed Layer* 49) [Activation] output to be reformatted 0 (182) from Float(1,16,256,131072) to Half(1,16,256,131072)
[TRT] Adding reformat layer: (Unnamed Layer* 67) [Shuffle] output to be reformatted 0 (output) from Float(1,16,1024,65536) to Half(1,16,1024,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 69) [Shuffle] output to be reformatted 0 (202) from Float(1,32,2048,131072) to Half(1,32,2048,131072)
[TRT] Adding reformat layer: (Unnamed Layer* 71) [Shuffle] output to be reformatted 0 (204) from Float(1,16,1024,65536) to Half(1,16,1024,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 73) [Shuffle] output to be reformatted 0 (206) from Float(1,28,1792,114688) to Half(1,28,1792,114688)
[TRT] Adding reformat layer: (Unnamed Layer* 75) [Shuffle] output to be reformatted 0 (208) from Float(1,12,768,49152) to Half(1,12,768,49152)
[TRT] Adding reformat layer: (Unnamed Layer* 77) [Shuffle] output to be reformatted 0 (210) from Float(1,16,1024,65536) to Half(1,16,1024,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 78) [Convolution] || (Unnamed Layer* 80) [Convolution] || (Unnamed Layer* 82) [Convolution] || (Unnamed Layer* 84) [Convolution] || (Unnamed Layer* 86) [Convolution] || (Unnamed Layer* 88) [Convolution] reformatted input 0 (178) from Half(1,32,1024,1048576) to Float(1,32,1024,1048576)
[TRT] For layer (Unnamed Layer* 0) [Convolution] + (Unnamed Layer* 1) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 2) [Convolution] + (Unnamed Layer* 3) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 4) [Pooling] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 50) [Convolution] + (Unnamed Layer* 51) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 52) [Convolution] + (Unnamed Layer* 53) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 54) [Convolution] + (Unnamed Layer* 55) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 56) [Convolution] + (Unnamed Layer* 57) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 58) [Convolution] + (Unnamed Layer* 59) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 60) [Convolution] + (Unnamed Layer* 61) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 62) [Convolution] + (Unnamed Layer* 63) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 64) [Convolution] + (Unnamed Layer* 65) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 78) [Convolution] || (Unnamed Layer* 80) [Convolution] || (Unnamed Layer* 82) [Convolution] || (Unnamed Layer* 84) [Convolution] || (Unnamed Layer* 86) [Convolution] || (Unnamed Layer* 88) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 79) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 81) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 83) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 85) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 87) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 89) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 90) [Convolution] || (Unnamed Layer* 92) [Convolution] || (Unnamed Layer* 94) [Convolution] || (Unnamed Layer* 96) [Convolution] || (Unnamed Layer* 98) [Convolution] || (Unnamed Layer* 100) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 91) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 93) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 95) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 97) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 99) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 101) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 102) [Convolution] || (Unnamed Layer* 104) [Convolution] || (Unnamed Layer* 106) [Convolution] || (Unnamed Layer* 108) [Convolution] || (Unnamed Layer* 110) [Convolution] || (Unnamed Layer* 112) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 103) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 105) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 107) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 109) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 111) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 113) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 114) [Convolution] || (Unnamed Layer* 116) [Convolution] || (Unnamed Layer* 118) [Convolution] || (Unnamed Layer* 120) [Convolution] || (Unnamed Layer* 122) [Convolution] || (Unnamed Layer* 124) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 115) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 117) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 119) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 121) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 123) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 125) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 126) [Convolution] || (Unnamed Layer* 128) [Convolution] || (Unnamed Layer* 130) [Convolution] || (Unnamed Layer* 132) [Convolution] || (Unnamed Layer* 134) [Convolution] || (Unnamed Layer* 136) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 127) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 129) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 131) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 133) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 135) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 137) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 138) [Convolution] || (Unnamed Layer* 140) [Convolution] || (Unnamed Layer* 142) [Convolution] || (Unnamed Layer* 144) [Convolution] || (Unnamed Layer* 146) [Convolution] || (Unnamed Layer* 148) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 139) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 141) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 143) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 145) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 147) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 149) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] Formats and tactics selection completed in 118.896 seconds.
[TRT] After reformat layers: 99 layers
[TRT] Block size 268435456
[TRT] Block size 268435456
[TRT] Block size 33554432
[TRT] Block size 16777216
[TRT] Block size 4194304
[TRT] Block size 1048576
[TRT] Block size 65536
[TRT] Block size 16384
[TRT] Block size 4096
[TRT] Total Activation Memory: 592531456
[TRT] …/builder/cudnnBuilderWeightConverters.cpp (310) - Misc Error in transformWeightsIfFP: 1 (Weights are outside of fp16 range. A possible fix is to retrain the model with regularization to bring the magnitude of the weights down.)
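
For reference, a quick way to check whether any initializers in the exported ONNX file actually exceed the FP16 range, as the message suggests (a hypothetical helper; it assumes the onnx and numpy Python packages and uses a placeholder file name):

import numpy as np
import onnx
from onnx import numpy_helper

FP16_MAX = 65504.0  # largest finite float16 value

def find_out_of_range_weights(onnx_path):
    # Return (name, max |value|) for every initializer outside the FP16 range.
    model = onnx.load(onnx_path)
    offenders = []
    for init in model.graph.initializer:
        w = numpy_helper.to_array(init).astype(np.float64)
        max_abs = float(np.abs(w).max()) if w.size else 0.0
        if max_abs > FP16_MAX:
            offenders.append((init.name, max_abs))
    return offenders

for name, max_abs in find_out_of_range_weights("model.onnx"):
    print(name, "max |w| =", max_abs)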

Dear @dkas2,
Could you check with the latest DRIVE release and let us know the status?

Thanks for the reply.

Because of company policy, we cannot use the latest version.

Is there any other solution?

Regards.

Dear @dkas2,
DRIVE SW 9.0 is a very old release. Could you share the ONNX file? Let me check with the latest version and see whether I notice the same issue.

Dear @dkas2 ,
Could you provide any update?

Hi,

Sorry for the late reply; unfortunately, we cannot share the ONNX file :(

We analyzed the situation further, and here are our updates:

  • The error occurs when the TensorRT engine starts building, right after the ONNX file is parsed
  • The default ONNX parser library (shipped with DriveWorks version 9) works
  • Our custom ONNX parser library (onnx-tensorrt v5.0 from GitHub plus some custom operators) does not work with DriveWorks

Since we cannot use the latest version of DriveWorks, we analyzed the situation and see two ways to solve it:

  • Train with half precision in PyTorch and export the ONNX file in full precision
  • Or fix the custom ONNX parser so that it preserves the requested precision

Training with half precision is easy to apply, so we will try this method first.
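
As a rough sketch of that first approach (a placeholder network and input shape below; the real model differs), assuming it works as we expect:

import torch
import torch.nn as nn

# Placeholder network standing in for the real model.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
).cuda().half()

# ... FP16 training loop runs here ...

# Cast back to float32 before export: the trained values were produced as
# float16, so after the cast every weight is guaranteed to fit the FP16 range.
model = model.float().eval()

dummy_input = torch.randn(1, 3, 256, 256, device="cuda")
torch.onnx.export(model, dummy_input, "model_fp32.onnx")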

I’ll let you know about our progress :)