TensorRT model with FP16 gives me float16 outputs > 1.0 from a sigmoid activation

Hello,

I am following up on the post here, which told me my problem seems to be a Jetson Nano issue:

I created a TensorRT model with the --best option using trtexec on a Jetson Nano, starting from an ONNX model derived from a Keras/TensorFlow model.
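For reference, the build command was along these lines (file names here are placeholders, not my exact paths):

trtexec --onnx=model.onnx --best --saveEngine=model_best.engine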

When using it with float32 input, it works great. But it does not work with float16 input.

The output of my last layer is a sigmoid activation, so I should normally not be able to get values > 1.0, yet I get values like 5e2, …
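(Sigmoid is 1 / (1 + exp(-x)), so its output should always lie strictly between 0 and 1.)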

Do you have an idea of what is causing this problem and how to solve it?

Thanks a lot for your help.

Regards

Hi,

Could you share which JetPack version you use?
If you are not using JetPack 4.6, it's recommended to give it a try first.

Thanks.

Hello, here is the information from my board about JetPack and CUDA.

I cast my float32 tensor into float16 using the TensorFlow cast method (TensorFlow 2.5.0).
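Concretely, the cast is something like this (tensor names are placeholders):

import tensorflow as tf

# cast the float32 input batch to float16 before feeding it to the engine
input_fp16 = tf.cast(input_fp32, tf.float16)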

Thank you for your help.

Package: nvidia-jetpack
Version: 4.6-b199
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-cuda (= 4.6-b199), nvidia-opencv (= 4.6-b199), nvidia-cudnn8 (= 4.6-b199), nvidia-tensorrt (= 4.6-b199), nvidia-visionworks (= 4.6-b199), nvidia-container (= 4.6-b199), nvidia-vpi (= 4.6-b199), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_4.6-b199_arm64.deb
Size: 29368
SHA256: 69df11e22e2c8406fe281fe6fc27c7d40a13ed668e508a592a6785d40ea71669
SHA1: 5c678b8762acc54f85b4334f92d9bb084858907a
MD5sum: 1b96cd72f2a434e887f98912061d8cfb
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Package: nvidia-jetpack
Version: 4.6-b197
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-cuda (= 4.6-b197), nvidia-opencv (= 4.6-b197), nvidia-cudnn8 (= 4.6-b197), nvidia-tensorrt (= 4.6-b197), nvidia-visionworks (= 4.6-b197), nvidia-container (= 4.6-b197), nvidia-vpi (= 4.6-b197), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_4.6-b197_arm64.deb
Size: 29356
SHA256: 104cd0c1efefe5865753ec9b0b148a534ffdcc9bae525637c7532b309ed44aa0
SHA1: 8cca8b9ebb21feafbbd20c2984bd9b329a202624
MD5sum: 463d4303429f163b97207827965e8fe0
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

CUDA information:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0

Is there a need to adapt the batch normalization layers for FP16?

Because here: GitHub - kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT: Benchmark inference speed of CNNs with various quantization methods in Pytorch+TensorRT with Jetson Nano/Xavier
there is an adaptation of the batch normalization layers…

I still have the problem: I cannot get floating-point values < 1.0 in FP16 mode…

Is it normal that get_binding_dtype(binding) gives me float32 and not float16?
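For context, I check the binding types with something like this (engine is my deserialized ICudaEngine; these are the standard TensorRT Python calls):

for i in range(engine.num_bindings):
    # reports e.g. DataType.FLOAT for the input binding, even though the engine was built with --best
    print(engine.get_binding_name(i), engine.get_binding_dtype(i))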

OK, I found the solution: the samples on GitHub are wrong. I must use float32 as input and not cast the input to float16.
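In other words, the host input buffer has to stay in the dtype reported by the binding (float32 here); the engine handles the FP16 conversion internally. A minimal sketch of what now works for me (variable names are placeholders):

import numpy as np

# keep the host input in float32 to match get_binding_dtype, even for an engine built with --best
host_input = np.ascontiguousarray(batch, dtype=np.float32)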