Why is FP16 inference slower than FP32 on Jetson Nano?

When I run inference on the Jetson Nano with FP16, I found it is slower than FP32. I also found that the Jetson Nano supports FP16, so why is it slower?

Hi,

How do you apply the inference on Nano?

Do you use TensorRT?
If yes, it can automatically choose the optimal precision (either FP32 or FP16) for each layer when the best mode is used:

$ /usr/src/tensorrt/bin/trtexec --best ...
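
For example, assuming the ONNX file is named model.onnx (the file names here are placeholders):

$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16 --saveEngine=model_fp16.trt
$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --best --saveEngine=model_best.trt

--fp16 allows FP16 kernels in addition to FP32, while --best enables all supported precisions and lets TensorRT pick the fastest implementation per layer.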

Thanks.

@AastaLLL Yes, I use TensorRT. You mean TensorRT can automatically choose the optimal precision between FP32 and FP16?
I have model.onnx (FP32) and I converted the ONNX model to .trt successfully, but the FP16 engine is still slower than the FP32 one.

Hi,

Could you share the command and corresponding benchmark data with us first?
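
For example, the saved engines can be benchmarked directly with trtexec (the engine file names here are placeholders):

$ /usr/src/tensorrt/bin/trtexec --loadEngine=model_fp32.trt
$ /usr/src/tensorrt/bin/trtexec --loadEngine=model_fp16.trt

trtexec reports the average latency and throughput at the end of each run, which makes the FP32/FP16 comparison straightforward.
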
Thanks.

F_Let_fp16.trt (156.1 KB)
F-Let.trt (267.5 KB)
F_Let.pth (276.3 KB)
convert_to_onnx.py (505 Bytes)
inference-Let-10.py (5.7 KB)
Thank you very much.

@AastaLLL

Hi,

We tried to generate the ONNX format from the .pth file with convert_to_onnx.py.
However, it shows the dependency error below:

$ python3 convert_to_onnx.py
1.8.0
Traceback (most recent call last):
  File "convert_to_onnx.py", line 11, in <module>
    model = torch.load('F_Let.pth', map_location="cuda:0")
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/serialization.py", line 772, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'lenet'
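
torch.load() on a model saved with torch.save(model, ...) unpickles the whole object, so Python must be able to import the module that defined the network class (here, 'lenet'). A minimal sketch of the conversion step, assuming the class is LeNet in lenet.py (module name, class name, and input shape are assumptions):

import torch
from lenet import LeNet  # assumed module/class; must match the training code

# Loading a fully pickled model re-imports its class, hence the
# ModuleNotFoundError when 'lenet' is not on the Python path.
model = torch.load('F_Let.pth', map_location='cuda:0')
model.eval()

# 1x1x32x32 is the classic LeNet input shape; adjust to the real model.
dummy = torch.randn(1, 1, 32, 32, device='cuda:0')
torch.onnx.export(model, dummy, 'F_Let.onnx', opset_version=11,
                  input_names=['input'], output_names=['output'])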

Could you share the required LeNet class with us?
Or attach the converted ONNX file?

Thanks.

F-LeNet-0.pth (258.4 KB)

Hi,

Thanks for sharing the new .pth with us.
Unfortunately, it still requires custom code when converting the .pth into ONNX format.

$ python3 convert_to_onnx.py
1.8.0
Traceback (most recent call last):
  File "convert_to_onnx.py", line 11, in <module>
    model = torch.load('F_Let.pth', map_location="cuda:0")
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/serialization.py", line 592, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/serialization.py", line 851, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'models'

Would you mind sharing the model definition with us?
Or could you attach a converted ONNX file for us to check?
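
As a side note, saving only the weights avoids this class of error, because no class gets pickled into the checkpoint. A sketch of that pattern (module, class, and file names are assumptions):

import torch
from models import LeNet  # assumed; whatever module defines the network

# Training side: save only the weights instead of the pickled object.
# torch.save(model.state_dict(), 'F_Let_state.pth')

# Loading side: rebuild the module first, then load the weights.
model = LeNet()
model.load_state_dict(torch.load('F_Let_state.pth', map_location='cpu'))
model.eval()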

Thanks.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.