Hello everyone,
Recently I updated to TRT 5 and tried to convert the same model, but got the issue below. Has anybody hit the same issue?
UFFParser: Parser error: res_aspp_g/decoder/resnet/bn_conv1/batch_normalization/moving_variance: Weight 78177.664062 at index 4 is outside of [-65504.000000, 65504.000000]. Please try running the parser in a higher precision mode and setting the builder to fp16 mode instead.
Failed to parse UFF
I am facing the same error. I am using a U-Net network for segmentation, with CUDA 10 and TensorRT 5 on the NVIDIA DRIVE development platform (Xavier). The model is written in Keras. The file with a representation of our model can be found here:
UFFParser: Parser error: batch_normalization_11/moving_variance: Weight 70463.468750 at index 3 is outside of [-65504.000000, 65504.000000]. Please try running the parser in a higher precision mode and setting the builder to fp16 mode instead.
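The bounds in the error message are simply the finite range of IEEE half precision (binary16), whose maximum representable value is 65504. A quick sketch of why these particular `moving_variance` weights are rejected (using NumPy, which is an assumption here, not part of the thread):

```python
import numpy as np

# FP16 (IEEE binary16) has a maximum finite value of 65504,
# matching the [-65504, 65504] range in the parser error.
fp16_max = np.finfo(np.float16).max
print(fp16_max)  # 65504.0

# The moving_variance weight from the error message overflows
# to inf when cast to half precision, which the parser rejects.
weight = 70463.468750
print(np.float16(weight))  # inf
```

This is why the parser must read the weights in FP32 and let the builder handle the FP16 conversion internally.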
Per engineering, please make sure you are not running the parser in FP16 mode; the parser should stay in FP32 while the builder runs in FP16. To run the builder in half mode you can use builder->setFp16Mode(true);
I have checked that. I am using the trtexec binary built for the platform, which I got by installing TensorRT. When the --fp16 arg is turned on, there is a line in the configureBuilder function which does that:
builder->setFp16Mode(gParams.fp16);
And this gParams.fp16 is set to true in the parseArgs function (via parseBool) when the --fp16 arg is used.
As stated in the error message, you need to run the parser in a higher precision mode (FP32) and the builder in FP16.
As the builder is already set to FP16, you only need to change the precision of the parser at line 283 in trtexec from:
if (!parser->parse(gParams.uffFile.c_str(), *network, gParams.fp16 ? DataType::kHalf : DataType::kFLOAT))
return nullptr;
to:
if (!parser->parse(gParams.uffFile.c_str(), *network, DataType::kFLOAT))
return nullptr;
Hello, with standard TRT 5.0.2 and your UFF file, I'm seeing:
root@ff618c3382c2:/home/scratch.zhenyih_sw/tensorrt# trtexec --uff=/home/scratch.zhenyih_sw/tensorrt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
uff: /home/scratch.zhenyih_sw/tensorrt/model34_sub.uff
uffInput: input_1,3,256,512
output: conv2d_9/sub
avgRuns: 1
iterations: 10
batch: 1
fp16
allowGPUFallback
useDLACore: 1
Parameter check failed at: ../builder/builder.cpp::setDefaultDeviceType::228, condition: mHwContext.hasDLA() && mHwContext.getNbDLAEngines() > 0
name=input_1, bindingIndex=0, buffers.size()=2
name=conv2d_9/sub, bindingIndex=1, buffers.size()=2
Average over 1 runs is 8.64989 ms (host walltime is 8.95034 ms, 99% percentile time is 8.64989).
Average over 1 runs is 8.59574 ms (host walltime is 8.79849 ms, 99% percentile time is 8.59574).
Average over 1 runs is 8.63802 ms (host walltime is 8.81568 ms, 99% percentile time is 8.63802).
Average over 1 runs is 8.63338 ms (host walltime is 8.90014 ms, 99% percentile time is 8.63338).
Average over 1 runs is 8.63222 ms (host walltime is 8.80926 ms, 99% percentile time is 8.63222).
Average over 1 runs is 8.65533 ms (host walltime is 8.87169 ms, 99% percentile time is 8.65533).
Average over 1 runs is 8.62499 ms (host walltime is 8.83119 ms, 99% percentile time is 8.62499).
Average over 1 runs is 8.63654 ms (host walltime is 8.85553 ms, 99% percentile time is 8.63654).
Average over 1 runs is 8.62941 ms (host walltime is 8.81509 ms, 99% percentile time is 8.62941).
Average over 1 runs is 8.62736 ms (host walltime is 8.8101 ms, 99% percentile time is 8.62736).
root@d64645ab85f9:/mnt# TensorRT-#####/targets/x86_64-linux-gnu/bin/trtexec --uff=/mnt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
&&&& RUNNING TensorRT.trtexec # TensorRT-#####/targets/x86_64-linux-gnu/bin/trtexec --uff=/mnt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
[I] uff: /mnt/model34_sub.uff
[I] uffInput: input_1,3,256,512
[I] output: conv2d_9/sub
[I] avgRuns: 1
[I] iterations: 10
[I] batch: 1
[I] fp16
[I] allowGPUFallback
[I] useDLACore: 1
Trying to use DLA core 1 on a platform that doesn't have any DLA cores
[I] [TRT] Detected 1 input and 1 output network tensors.
[I] Average over 1 runs is 4.73395 ms (host walltime is 4.96019 ms, 99% percentile time is 4.73395).
[I] Average over 1 runs is 4.69914 ms (host walltime is 4.95625 ms, 99% percentile time is 4.69914).
[I] Average over 1 runs is 4.6848 ms (host walltime is 4.94229 ms, 99% percentile time is 4.6848).
[I] Average over 1 runs is 4.69504 ms (host walltime is 4.92772 ms, 99% percentile time is 4.69504).
[I] Average over 1 runs is 4.6848 ms (host walltime is 4.92634 ms, 99% percentile time is 4.6848).
[I] Average over 1 runs is 4.70118 ms (host walltime is 4.93648 ms, 99% percentile time is 4.70118).
[I] Average over 1 runs is 4.69094 ms (host walltime is 4.92564 ms, 99% percentile time is 4.69094).
[I] Average over 1 runs is 4.70528 ms (host walltime is 4.97176 ms, 99% percentile time is 4.70528).
[I] Average over 1 runs is 4.76672 ms (host walltime is 5.03453 ms, 99% percentile time is 4.76672).
[I] Average over 1 runs is 4.68685 ms (host walltime is 4.9509 ms, 99% percentile time is 4.68685).
&&&& PASSED TensorRT.trtexec # TensorRT-#####/targets/x86_64-linux-gnu/bin/trtexec --uff=/mnt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
root@d64645ab85f9:/mnt#
Please stay tuned for the next release of TRT, which should have this fix.