Software Version: NVIDIA DRIVE™ Software 9.0 (Linux)
Target Operating System: Linux
Hardware Platform: NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
SDK Manager Version: 1.4.1.7402
Host Machine Version: native Ubuntu 16.04
Hi,
I encountered an error while building a TensorRT CUDA engine.
Situation:
** TensorRT version: 5.0.2 (DRIVE Software 9.0)
** Parser: custom onnx-tensorrt based on v5.0
** Precision: FP16
The error occurs when building the CUDA engine from the parsed ONNX network.
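For context, the build path is roughly the following. This is a minimal sketch of the TensorRT 5 C++ API as I use it; gLogger, "model.onnx", and the workspace size are placeholders, and the real code goes through our custom onnx-tensorrt parser rather than the stock one:

// Minimal sketch of the build path (TensorRT 5 C++ API).
// gLogger, "model.onnx", and the workspace size are placeholders;
// the real code substitutes our custom onnx-tensorrt parser.
#include <NvInfer.h>
#include <NvOnnxParser.h>

nvinfer1::ICudaEngine* buildFp16Engine(nvinfer1::ILogger& gLogger)
{
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    nvinfer1::INetworkDefinition* network = builder->createNetwork();

    // Parse the ONNX model into the network definition.
    nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile("model.onnx",
                          static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    // Request FP16 kernels; TensorRT may still pick FP32 per layer.
    builder->setFp16Mode(true);
    builder->setMaxWorkspaceSize(1ULL << 30);  // placeholder workspace size

    // This is the call that produces the error log below.
    nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

    parser->destroy();
    network->destroy();
    builder->destroy();
    return engine;
}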
Could you please explain why this happens, and suggest any solutions?
Also, isn't the libnvonnxparser.so shipped with DRIVE Software 9.0 version 5.0?
Error log:
[TRT] Tactic 0 time 0.002048
[TRT] Adding reformat layer: (Unnamed Layer* 5) [Convolution] + (Unnamed Layer* 6) [Activation] reformatted input 0 (140) from Float(1,256,65536,4194304) to Half(1,256,65536,4194304)
[TRT] Adding reformat layer: (Unnamed Layer* 41) [Pooling] reformatted input 0 (173) from Half(1,32,1024,524288) to Half(64,2048,1:8,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 42) [Convolution] + (Unnamed Layer* 43) [Activation] reformatted input 0 (174) from Half(64,2048,1:8,65536) to Half(1,32,1024,524288)
[TRT] Adding reformat layer: (Unnamed Layer* 48) [Convolution] + (Unnamed Layer* 49) [Activation] output to be reformatted 0 (182) from Float(1,16,256,131072) to Half(1,16,256,131072)
[TRT] Adding reformat layer: (Unnamed Layer* 67) [Shuffle] output to be reformatted 0 (output) from Float(1,16,1024,65536) to Half(1,16,1024,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 69) [Shuffle] output to be reformatted 0 (202) from Float(1,32,2048,131072) to Half(1,32,2048,131072)
[TRT] Adding reformat layer: (Unnamed Layer* 71) [Shuffle] output to be reformatted 0 (204) from Float(1,16,1024,65536) to Half(1,16,1024,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 73) [Shuffle] output to be reformatted 0 (206) from Float(1,28,1792,114688) to Half(1,28,1792,114688)
[TRT] Adding reformat layer: (Unnamed Layer* 75) [Shuffle] output to be reformatted 0 (208) from Float(1,12,768,49152) to Half(1,12,768,49152)
[TRT] Adding reformat layer: (Unnamed Layer* 77) [Shuffle] output to be reformatted 0 (210) from Float(1,16,1024,65536) to Half(1,16,1024,65536)
[TRT] Adding reformat layer: (Unnamed Layer* 78) [Convolution] || (Unnamed Layer* 80) [Convolution] || (Unnamed Layer* 82) [Convolution] || (Unnamed Layer* 84) [Convolution] || (Unnamed Layer* 86) [Convolution] || (Unnamed Layer* 88) [Convolution] reformatted input 0 (178) from Half(1,32,1024,1048576) to Float(1,32,1024,1048576)
[TRT] For layer (Unnamed Layer* 0) [Convolution] + (Unnamed Layer* 1) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 2) [Convolution] + (Unnamed Layer* 3) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 4) [Pooling] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 50) [Convolution] + (Unnamed Layer* 51) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 52) [Convolution] + (Unnamed Layer* 53) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 54) [Convolution] + (Unnamed Layer* 55) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 56) [Convolution] + (Unnamed Layer* 57) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 58) [Convolution] + (Unnamed Layer* 59) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 60) [Convolution] + (Unnamed Layer* 61) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 62) [Convolution] + (Unnamed Layer* 63) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 64) [Convolution] + (Unnamed Layer* 65) [Activation] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 78) [Convolution] || (Unnamed Layer* 80) [Convolution] || (Unnamed Layer* 82) [Convolution] || (Unnamed Layer* 84) [Convolution] || (Unnamed Layer* 86) [Convolution] || (Unnamed Layer* 88) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 79) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 81) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 83) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 85) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 87) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 89) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 90) [Convolution] || (Unnamed Layer* 92) [Convolution] || (Unnamed Layer* 94) [Convolution] || (Unnamed Layer* 96) [Convolution] || (Unnamed Layer* 98) [Convolution] || (Unnamed Layer* 100) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 91) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 93) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 95) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 97) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 99) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 101) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 102) [Convolution] || (Unnamed Layer* 104) [Convolution] || (Unnamed Layer* 106) [Convolution] || (Unnamed Layer* 108) [Convolution] || (Unnamed Layer* 110) [Convolution] || (Unnamed Layer* 112) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 103) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 105) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 107) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 109) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 111) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 113) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 114) [Convolution] || (Unnamed Layer* 116) [Convolution] || (Unnamed Layer* 118) [Convolution] || (Unnamed Layer* 120) [Convolution] || (Unnamed Layer* 122) [Convolution] || (Unnamed Layer* 124) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 115) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 117) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 119) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 121) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 123) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 125) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 126) [Convolution] || (Unnamed Layer* 128) [Convolution] || (Unnamed Layer* 130) [Convolution] || (Unnamed Layer* 132) [Convolution] || (Unnamed Layer* 134) [Convolution] || (Unnamed Layer* 136) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 127) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 129) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 131) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 133) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 135) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 137) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 138) [Convolution] || (Unnamed Layer* 140) [Convolution] || (Unnamed Layer* 142) [Convolution] || (Unnamed Layer* 144) [Convolution] || (Unnamed Layer* 146) [Convolution] || (Unnamed Layer* 148) [Convolution] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 139) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 141) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 143) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 145) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 147) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] For layer (Unnamed Layer* 149) [Shuffle] a higher-precision implementation was chosen than was requested because it resulted in faster network performance
[TRT] Formats and tactics selection completed in 118.896 seconds.
[TRT] After reformat layers: 99 layers
[TRT] Block size 268435456
[TRT] Block size 268435456
[TRT] Block size 33554432
[TRT] Block size 16777216
[TRT] Block size 4194304
[TRT] Block size 1048576
[TRT] Block size 65536
[TRT] Block size 16384
[TRT] Block size 4096
[TRT] Total Activation Memory: 592531456
[TRT] …/builder/cudnnBuilderWeightConverters.cpp (310) - Misc Error in transformWeightsIfFP: 1 (Weights are outside of fp16 range. A possible fix is to retrain the model with regularization to bring the magnitude of the weights down.)
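From the final message, the builder appears to reject FP32 weights whose magnitude falls outside the representable FP16 range (roughly ±65504). To narrow down which layer is responsible, I used approximately the following check on the parsed network before calling buildCudaEngine. This is my own debugging sketch, not an official tool; checkFp16Range is a name I made up, and it only inspects convolution kernel weights (biases and other layer types could be scanned the same way through their respective accessors):

// Debugging sketch: flag convolution kernel weights whose magnitude
// exceeds the largest finite FP16 value (65504).
#include <cmath>
#include <cstdio>
#include <NvInfer.h>

void checkFp16Range(nvinfer1::INetworkDefinition* network)
{
    const float kFp16Max = 65504.0f;  // largest finite half-precision value
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        if (layer->getType() != nvinfer1::LayerType::kCONVOLUTION)
            continue;
        auto* conv = static_cast<nvinfer1::IConvolutionLayer*>(layer);
        nvinfer1::Weights w = conv->getKernelWeights();
        if (w.type != nvinfer1::DataType::kFLOAT)
            continue;
        const float* values = static_cast<const float*>(w.values);
        for (int64_t j = 0; j < w.count; ++j)
        {
            if (std::fabs(values[j]) > kFp16Max)
            {
                std::printf("layer %d (%s): weight[%lld] = %g is outside FP16 range\n",
                            i, layer->getName(), static_cast<long long>(j), values[j]);
                break;  // one report per layer is enough
            }
        }
    }
}

If this flags some weights, I assume the fix suggested in the error message (retraining with regularization to bring the weight magnitudes down) would apply.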