TRT5 has an issue when converting a model in half mode

Hello everyone,
Recently I updated to TRT5 and tried to convert the same model, but I got the issue below. Has anybody hit the same issue?

UFFParser: Parser error: res_aspp_g/decoder/resnet/bn_conv1/batch_normalization/moving_variance: Weight 78177.664062 at index 4 is outside of [-65504.000000, 65504.000000]. Please try running the parser in a higher precision mode and setting the builder to fp16 mode instead.
Failed to parse UFF

Hello,

Can you provide details on what type of model you are trying to convert (TensorFlow, Caffe, etc.)? Can you share the model to help us debug?

Hello,

I am facing the same error. I am using a U-Net network for segmentation, with CUDA 10 and TensorRT 5 on the NVIDIA DRIVE development platform (Xavier). The model is written in Keras. The file with the representation of our model can be found here:

The command I am executing:

./trtexec --uff=/home/ubuntu/model.uff --uffInput=input_1,3,256,512 --output=conv2d_9/BiasAdd --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1

Error:

UFFParser: Parser error: batch_normalization_11/moving_variance: Weight 70463.468750 at index 3 is outside of [-65504.000000, 65504.000000]. Please try running the parser in a higher precision mode and setting the builder to fp16 mode instead.

Hello,

Per engineering, please make sure you are not running the parser in FP16 mode: the parser should stay in FP32, and only the builder should be set to FP16. To run the builder in half mode you can use builder->setFp16Mode(true);
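
For reference, here is a minimal sketch of that combination using the TensorRT 5 C++ API. The file name, input/output names, and dimensions are placeholders taken from this thread, so treat it as an outline rather than a drop-in implementation:

#include "NvInfer.h"
#include "NvUffParser.h"

using namespace nvinfer1;
using namespace nvuffparser;

ICudaEngine* buildFp16Engine(IBuilder* builder)
{
    INetworkDefinition* network = builder->createNetwork();
    IUffParser* parser = createUffParser();

    parser->registerInput("input_1", DimsCHW(3, 256, 512), UffInputOrder::kNCHW);
    parser->registerOutput("conv2d_9/BiasAdd");

    // Parse the weights in FP32 so values above 65504 still fit.
    if (!parser->parse("model.uff", *network, DataType::kFLOAT))
        return nullptr;

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 28);
    builder->setFp16Mode(true); // only the builder runs in half mode

    return builder->buildCudaEngine(*network);
}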

regards,
NVIDIA Enterprise Support

Hello,

I have checked that. I am using the trtexec binary built for the platform, which I got by installing TensorRT. When the --fp16 arg is turned on, there is a line in the configureBuilder function which does exactly that:

builder->setFp16Mode(gParams.fp16);

And gParams.fp16 is set to true when the --fp16 arg is used; parseArgs handles this by calling the parseBool function:

if (parseBool(argv[j], "fp16", gParams.fp16)
|| parseBool(argv[j], "int8", gParams.int8)
|| parseBool(argv[j], "verbose", gParams.verbose)
|| parseBool(argv[j], "allowGPUFallback", gParams.allowGPUFallback))
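
For anyone reading along, parseBool is just a flag matcher that sets the referenced boolean as a side effect. A rough reconstruction of what such a helper does (the implementation actually shipped with trtexec may differ):

#include <cstring>

// Hypothetical sketch: returns true when arg is exactly "--<name>",
// and sets the referenced flag as a side effect.
bool parseBool(const char* arg, const char* name, bool& value)
{
    size_t n = std::strlen(name);
    bool match = arg[0] == '-' && arg[1] == '-'
        && std::strncmp(arg + 2, name, n) == 0 && arg[n + 2] == '\0';
    if (match)
        value = true;
    return match;
}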

Best regards,
Filip Baba

Hello everyone,

as stated in the error message, you need to run the parser in a higher precision mode (FP32) and the builder in FP16. The parse fails because some weights fall outside FP16's finite range of [-65504, 65504], so the parser itself must keep them in FP32.
As the builder is already set to FP16, you only need to change the precision of the parser on line 283 in trtexec from:

if (!parser->parse(gParams.uffFile.c_str(), *network, gParams.fp16 ? DataType::kHALF : DataType::kFLOAT))
        return nullptr;

to:

if (!parser->parse(gParams.uffFile.c_str(), *network, DataType::kFLOAT))
        return nullptr;
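
For context, the reason the FP32 parse works: the largest finite value in IEEE 754 binary16 is (2 - 2^-10) * 2^15 = 65504, which is exactly the bound in the error message, so a moving variance like 70463.47 has no half-precision representation at all. A quick standalone illustration:

#include <cmath>
#include <cstdio>

// Largest finite FP16 (IEEE 754 binary16) value: (2 - 2^-10) * 2^15 = 65504.
constexpr float kFp16Max = 65504.0f;

bool fitsInFp16(float w)
{
    return std::isfinite(w) && std::fabs(w) <= kFp16Max;
}

int main()
{
    std::printf("%d\n", fitsInFp16(70463.468750f)); // 0: overflows half precision
    std::printf("%d\n", fitsInFp16(1.5f));          // 1: representable
}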

Best Regards,
Frank

This is the error I am getting after that change:

uff: /home/ubuntu/model.uff
uffInput: input_1,3,256,512
output: conv2d_9/BiasAdd
avgRuns: 1
iterations: 10
batch: 1
fp16
allowGPUFallback
useDLACore: 1
verbose

Layers running on DLA:
.
.
.

Layers running on GPU:
.
.
.

Original: 311 layers
After dead-layer removal: 111 layers
Segmentation fault (core dumped)

Best regards,
Filip

@flipa.baba, our engineers have a fix. Can you share the /home/ubuntu/model.uff so we can confirm the fix? You can DM me if you’d like.

I sent it via DM.

Thanks,
Filip Baba

Hello, with standard TRT 5.0.2 and your UFF file, I’m seeing:

root@ff618c3382c2:/home/scratch.zhenyih_sw/tensorrt# trtexec --uff=/home/scratch.zhenyih_sw/tensorrt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
uff: /home/scratch.zhenyih_sw/tensorrt/model34_sub.uff
uffInput: input_1,3,256,512
output: conv2d_9/sub
avgRuns: 1
iterations: 10
batch: 1
fp16
allowGPUFallback
useDLACore: 1
Parameter check failed at: ../builder/builder.cpp::setDefaultDeviceType::228, condition: mHwContext.hasDLA() && mHwContext.getNbDLAEngines() > 0
name=input_1, bindingIndex=0, buffers.size()=2
name=conv2d_9/sub, bindingIndex=1, buffers.size()=2
Average over 1 runs is 8.64989 ms (host walltime is 8.95034 ms, 99% percentile time is 8.64989).
Average over 1 runs is 8.59574 ms (host walltime is 8.79849 ms, 99% percentile time is 8.59574).
Average over 1 runs is 8.63802 ms (host walltime is 8.81568 ms, 99% percentile time is 8.63802).
Average over 1 runs is 8.63338 ms (host walltime is 8.90014 ms, 99% percentile time is 8.63338).
Average over 1 runs is 8.63222 ms (host walltime is 8.80926 ms, 99% percentile time is 8.63222).
Average over 1 runs is 8.65533 ms (host walltime is 8.87169 ms, 99% percentile time is 8.65533).
Average over 1 runs is 8.62499 ms (host walltime is 8.83119 ms, 99% percentile time is 8.62499).
Average over 1 runs is 8.63654 ms (host walltime is 8.85553 ms, 99% percentile time is 8.63654).
Average over 1 runs is 8.62941 ms (host walltime is 8.81509 ms, 99% percentile time is 8.62941).
Average over 1 runs is 8.62736 ms (host walltime is 8.8101 ms, 99% percentile time is 8.62736).
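
Note the "Parameter check failed ... hasDLA()" line above is expected on this machine, since it has no DLA cores. Code that must run on both DLA and non-DLA platforms can guard the DLA selection; a sketch, assuming the TRT 5 DLA builder API:

// Fall back to GPU when the platform has no DLA cores.
if (builder->getNbDLACores() > 0)
{
    builder->setDefaultDeviceType(DeviceType::kDLA);
    builder->setDLACore(1);          // core index, as in --useDLACore=1
    builder->allowGPUFallback(true); // let unsupported layers run on the GPU
}
else
{
    builder->setDefaultDeviceType(DeviceType::kGPU);
}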

With our fix candidate, things look much cleaner:

root@d64645ab85f9:/mnt# TensorRT-#####/targets/x86_64-linux-gnu/bin/trtexec --uff=/mnt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
&&&& RUNNING TensorRT.trtexec # TensorRT-#####/targets/x86_64-linux-gnu/bin/trtexec --uff=/mnt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
[I] uff: /mnt/model34_sub.uff
[I] uffInput: input_1,3,256,512
[I] output: conv2d_9/sub
[I] avgRuns: 1
[I] iterations: 10
[I] batch: 1
[I] fp16
[I] allowGPUFallback
[I] useDLACore: 1

Trying to use DLA core 1 on a platform that doesn't have any DLA cores
[I] [TRT] Detected 1 input and 1 output network tensors.
[I] Average over 1 runs is 4.73395 ms (host walltime is 4.96019 ms, 99% percentile time is 4.73395).
[I] Average over 1 runs is 4.69914 ms (host walltime is 4.95625 ms, 99% percentile time is 4.69914).
[I] Average over 1 runs is 4.6848 ms (host walltime is 4.94229 ms, 99% percentile time is 4.6848).
[I] Average over 1 runs is 4.69504 ms (host walltime is 4.92772 ms, 99% percentile time is 4.69504).
[I] Average over 1 runs is 4.6848 ms (host walltime is 4.92634 ms, 99% percentile time is 4.6848).
[I] Average over 1 runs is 4.70118 ms (host walltime is 4.93648 ms, 99% percentile time is 4.70118).
[I] Average over 1 runs is 4.69094 ms (host walltime is 4.92564 ms, 99% percentile time is 4.69094).
[I] Average over 1 runs is 4.70528 ms (host walltime is 4.97176 ms, 99% percentile time is 4.70528).
[I] Average over 1 runs is 4.76672 ms (host walltime is 5.03453 ms, 99% percentile time is 4.76672).
[I] Average over 1 runs is 4.68685 ms (host walltime is 4.9509 ms, 99% percentile time is 4.68685).
&&&& PASSED TensorRT.trtexec # TensorRT-#####/targets/x86_64-linux-gnu/bin/trtexec --uff=/mnt/model34_sub.uff --uffInput=input_1,3,256,512 --output=conv2d_9/sub --avgRuns=1 --iterations=10 --batch=1 --fp16 --allowGPUFallback --useDLACore=1
root@d64645ab85f9:/mnt#

Please stay tuned for the next release of TRT, which should have this fix.

Thank you so much!

So I will be able to run it after the update?
And I have one more question: which platform are you using?

Best regards,
Filip Baba

Yes, the updated trtexec should be available in the next TRT release. I’m testing on a DGX1-V server.

Nice to hear that!

Thanks!

Hello All,

I am using TensorRT 5.0 bundled with JetPack 4.1 on Jetson AGX Xavier.

I am also facing the same error.

Is the update available now? How can I update my system for this purpose?

Thanks,

We’re still seeing this issue on DRIVE OS, 1.5 years later. Any insight into when it’s finally going to be fixed?