How can we know we have converted the ONNX model to an INT8 TRT engine rather than Float32?

530869411 · May 20, 2021, 2:28am

Description

A clear and concise description of the bug or issue.

Environment

TensorRT Version: 7.1.3.0
GPU Type:
Nvidia Driver Version: jetson nano
CUDA Version: 10.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.6
Baremetal or Container (if container which image + tag):

NVES · May 20, 2021, 2:37am

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

1) Validate your model with the snippet below.

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

2) Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

530869411 · May 20, 2021, 2:52am

But I don't have Caffe, I only have ONNX. I have converted the model to TRT, but when I use the following code to check whether it is INT8:

import tensorrt as trt

TRT_LOGGER = trt.Logger()

def get_engine1(engine_path):
    # If a serialized engine exists, use it instead of building an engine.
    print("Reading engine from file {}".format(engine_path))
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

if __name__ == "__main__":
    engine_file_path = "F-Let.trt"
    # You can use Netron to view the number and sizes of the ONNX model's outputs.
    engines = get_engine1(engine_file_path)
    # Iterating over the engine yields the binding names.
    for binding in engines:
        size = trt.volume(engines.get_binding_shape(binding)) * 1
        dims = engines.get_binding_shape(binding)
        print("size=", size)
        print("dims=", dims)
        print("binding=", binding)
        print("input =", engines.binding_is_input(binding))
        dtype = trt.nptype(engines.get_binding_dtype(binding))
        print("dtype =", dtype)

I got this:

spolisetty · May 24, 2021, 10:05am

Hi @530869411,

The input and output bindings remain FP32 even for an INT8 engine unless reformat-free I/O is used, so the code above cannot tell you whether the engine is INT8.
Also, the Jetson Nano has no INT8 support; it is a Maxwell chip.
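
One hedged way to see which reduced precisions a device can accelerate, before building an engine, is to query the builder's platform flags. The sketch below assumes the `platform_has_fast_fp16` and `platform_has_fast_int8` properties of `tensorrt.Builder`. The helper name `supported_precisions` and the stand-in object are hypothetical, and the stand-in's values only mirror the statement above that the Nano lacks INT8 (its FP16 value is an assumption).

```python
from types import SimpleNamespace


def supported_precisions(builder):
    """List the precisions the platform reports as natively fast.

    `builder` is any object exposing `platform_has_fast_fp16` and
    `platform_has_fast_int8`, as tensorrt.Builder is assumed to do.
    """
    precisions = ["FP32"]  # FP32 is always available
    if getattr(builder, "platform_has_fast_fp16", False):
        precisions.append("FP16")
    if getattr(builder, "platform_has_fast_int8", False):
        precisions.append("INT8")
    return precisions


# Hypothetical stand-in so the sketch runs without TensorRT installed.
# The flag values are assumptions, not measurements from a Jetson Nano.
fake_builder = SimpleNamespace(
    platform_has_fast_fp16=True,
    platform_has_fast_int8=False,
)
print(supported_precisions(fake_builder))
```

On a real device one would pass `trt.Builder(trt.Logger())` instead of the stand-in. This only reports what the platform supports; it does not inspect an already-built engine.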

Thank you.

530869411 · May 24, 2021, 11:46am

@spolisetty I now use the following code to create an FP16 model:
onnx_to_tensorrt.py (2.8 KB)
and this inference code:
inference-10.py (5.5 KB)
But I found that FP16 is slower than FP32. Can you help me?

spolisetty · May 24, 2021, 11:52am

Hi @530869411,

Please share the ONNX models and the scripts/steps that reproduce the issue, so that we can try them on our end and assist you better.

Meanwhile, you can try running your model with the trtexec command and share the --verbose logs with us.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thank you.

530869411 · May 24, 2021, 12:09pm

@spolisetty
convert_to_onnx.py (505 Bytes)
F_TorrentNet.pth (2.0 MB)
F_TorrentNet.onnx (2.0 MB)
verbose.txt (45.1 KB)

530869411 · May 25, 2021, 6:33am

@spolisetty @NVES

spolisetty · May 25, 2021, 4:52pm

Hi @530869411,

It looks like you've shared a single ONNX file. Please share the other model as well, so that we can compare the performance of both models.

Thank you.

530869411 · May 25, 2021, 11:51pm

I only have the ONNX model (FP32), and I want to convert it to an FP16 TRT engine through code. The conversion succeeds, but I found the result is slower than the FP32 TRT engine.

530869411 · May 26, 2021, 12:44am

F_Let.pth (276.3 KB)
F_Let_fp16.trt (156.1 KB)

spolisetty · May 26, 2021, 5:53pm

Hi @530869411,

We are unable to reproduce this issue; in our run FP16 is faster than FP32. Please share the trtexec --verbose logs of both builds for better debugging. FP16 should never be slower than FP32: in the worst case (i.e. all FP16 kernels are slower than the FP32 kernels), TRT would just fall back to FP32.

For your reference, the mean host latency is 0.160975 ms for FP32 and 0.136759 ms for FP16, which shows that FP16 is faster.

fp32

[05/26/2021-17:30:49] [I] Host Latency
[05/26/2021-17:30:49] [I] min: 0.152832 ms (end to end 0.163513 ms)
[05/26/2021-17:30:49] [I] max: 4.63269 ms (end to end 4.64197 ms)
[05/26/2021-17:30:49] [I] mean: 0.160975 ms (end to end 0.169979 ms)
[05/26/2021-17:30:49] [I] median: 0.158691 ms (end to end 0.16748 ms)
[05/26/2021-17:30:49] [I] percentile: 0.174606 ms at 99% (end to end 0.184784 ms at 99%)
[05/26/2021-17:30:49] [I] throughput: 5181.99 qps
[05/26/2021-17:30:49] [I] walltime: 3.00039 s
[05/26/2021-17:30:49] [I] Enqueue Time
[05/26/2021-17:30:49] [I] min: 0.143646 ms
[05/26/2021-17:30:49] [I] max: 4.62085 ms
[05/26/2021-17:30:49] [I] median: 0.147705 ms
[05/26/2021-17:30:49] [I] GPU Compute
[05/26/2021-17:30:49] [I] min: 0.139648 ms
[05/26/2021-17:30:49] [I] max: 4.61816 ms
[05/26/2021-17:30:49] [I] mean: 0.147407 ms
[05/26/2021-17:30:49] [I] median: 0.145508 ms
[05/26/2021-17:30:49] [I] percentile: 0.160156 ms at 99%
[05/26/2021-17:30:49] [I] total compute time: 2.29189 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=F_TorrentNet.onnx --verbose

fp16

[05/26/2021-17:28:57] [I] Host Latency
[05/26/2021-17:28:57] [I] min: 0.130615 ms (end to end 0.143433 ms)
[05/26/2021-17:28:57] [I] max: 3.4176 ms (end to end 3.42505 ms)
[05/26/2021-17:28:57] [I] mean: 0.136759 ms (end to end 0.151333 ms)
[05/26/2021-17:28:57] [I] median: 0.136841 ms (end to end 0.150391 ms)
[05/26/2021-17:28:57] [I] percentile: 0.149658 ms at 99% (end to end 0.162842 ms at 99%)
[05/26/2021-17:28:57] [I] throughput: 6425.03 qps
[05/26/2021-17:28:57] [I] walltime: 3.0003 s
[05/26/2021-17:28:57] [I] Enqueue Time
[05/26/2021-17:28:57] [I] min: 0.107452 ms
[05/26/2021-17:28:57] [I] max: 3.40491 ms
[05/26/2021-17:28:57] [I] median: 0.111328 ms
[05/26/2021-17:28:57] [I] GPU Compute
[05/26/2021-17:28:57] [I] min: 0.116943 ms
[05/26/2021-17:28:57] [I] max: 3.40259 ms
[05/26/2021-17:28:57] [I] mean: 0.122423 ms
[05/26/2021-17:28:57] [I] median: 0.123291 ms
[05/26/2021-17:28:57] [I] percentile: 0.132812 ms at 99%
[05/26/2021-17:28:57] [I] total compute time: 2.35996 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=F_TorrentNet.onnx --fp16 --verbose
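
The comparison of the two logs above can be sketched in code. This is a minimal example, assuming the trtexec summary format shown in this thread (a section line such as `Host Latency` followed by a `mean: ... ms` line). The helper name `mean_latencies` is hypothetical, and the log strings are excerpts copied from the output above.

```python
import re

# Section headers and mean lines as they appear in the logs above.
SECTION_RE = re.compile(r"\[I\]\s+(Host Latency|Enqueue Time|GPU Compute)\s*$")
MEAN_RE = re.compile(r"\[I\]\s+mean:\s+([0-9.]+)\s+ms")


def mean_latencies(log_text):
    """Return {section name: mean latency in ms} from a trtexec summary."""
    means = {}
    section = None
    for line in log_text.splitlines():
        header = SECTION_RE.search(line)
        if header:
            section = header.group(1)
            continue
        mean = MEAN_RE.search(line)
        if mean and section is not None:
            means[section] = float(mean.group(1))
    return means


FP32_LOG = """\
[05/26/2021-17:30:49] [I] Host Latency
[05/26/2021-17:30:49] [I] mean: 0.160975 ms (end to end 0.169979 ms)
[05/26/2021-17:30:49] [I] GPU Compute
[05/26/2021-17:30:49] [I] mean: 0.147407 ms
"""

FP16_LOG = """\
[05/26/2021-17:28:57] [I] Host Latency
[05/26/2021-17:28:57] [I] mean: 0.136759 ms (end to end 0.151333 ms)
[05/26/2021-17:28:57] [I] GPU Compute
[05/26/2021-17:28:57] [I] mean: 0.122423 ms
"""

fp32 = mean_latencies(FP32_LOG)
fp16 = mean_latencies(FP16_LOG)
for section in ("Host Latency", "GPU Compute"):
    # A ratio above 1 means the FP16 run was faster for that metric.
    speedup = fp32[section] / fp16[section]
    print(f"{section}: FP32 {fp32[section]} ms, FP16 {fp16[section]} ms, "
          f"speedup {speedup:.3f}x")
```

With these numbers the host-latency ratio is roughly 1.18, consistent with the throughput figures in the logs (5181.99 qps versus 6425.03 qps).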

Thank you.

530869411 · May 27, 2021, 1:42am

But when I run inference with the .trt engines, the FP16 engine is slower than the FP32 one for some models.

spolisetty · May 30, 2021, 2:20pm

Hi @530869411,

Could you please share an inference script (including the time calculation) that reproduces the issue, so that we can try it on our end?

Thank you.

530869411 · May 31, 2021, 2:30am

@spolisetty Hi, the inference-10.py above is my inference code. You can download F_Let.pth, F_Let_fp16.trt and inference-10.py to test. What's more, I found that FP16 inference is faster on LeNet but slower on the MLP.
F_MLP.pth (940.4 KB)
F_MLP_fp16.trt (484.2 KB)

spolisetty · June 3, 2021, 6:19pm

Hi @530869411,

We tried running convert_to_onnx.py but ran into some errors. Please share only the ONNX model, so that we can generate the FP16 and FP32 engines and verify the performance to reproduce the issue.

For your information, the conversion must be executed on the machine on which inference will run. TensorRT optimizes the graph for the available GPU, so the generated engine cannot be used on a different platform.

Thank you.

530869411 · June 4, 2021, 12:47am

thank you very much
F_MLP.onnx (932.7 KB)
F_XMLP.onnx (4.1 MB)

530869411 · June 4, 2021, 12:48am

@spolisetty

spolisetty · June 9, 2021, 7:49pm

Hi @530869411,

Sorry for the delayed response. We tried converting the model to TRT and running the inference script to test the performance, but ran into many errors. It also looks like you are timing the loading of the PyTorch model and the TRT engine together with inference. We recommend writing separate code that measures only the inference time on the TRT engine to get a correct measurement.

When we run with trtexec, we cannot reproduce the issue (FP16 being slower than FP32).
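
The recommendation above to time only the inference call could look roughly like the sketch below. The helper name `time_inference` and the `dummy_infer` stand-in are hypothetical; in a real script `infer` would wrap the TRT execution call and would need to wait for the GPU to finish (for example by synchronizing the CUDA stream) before returning, otherwise only the launch overhead is measured.

```python
import statistics
import time


def time_inference(infer, warmup=10, runs=100):
    """Time only `infer()`; any model or engine loading must happen before."""
    # Warm-up iterations are executed but not measured, so one-off
    # initialization costs do not distort the statistics.
    for _ in range(warmup):
        infer()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "min_ms": min(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
    }


def dummy_infer():
    # Stand-in workload so the sketch runs without a GPU or TensorRT.
    sum(i * i for i in range(1000))


stats = time_inference(dummy_infer, warmup=5, runs=50)
print(stats)
```

Running the same harness once with the FP32 engine and once with the FP16 engine, with loading done beforehand, gives numbers that can be compared with the trtexec results.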

Thank you.
