TensorRT: issue running an ONNX model with INT8

I tried my ONNX model in TensorRT, following the link below:
https://elinux.org/TensorRT/YoloV3

Command:

trtexec --onnx=my_model.onnx --output=idx:195_convolutional --output=idx:205_convolutional --output=idx:215_convolutional --int8 --batch=1 --device=0

But it fails with the error below.
Device: Jetson Xavier
SW: TensorRT 5.1.6, CUDA 10.0, cuDNN 7.5.1

Input filename: elan_qu2.onnx
ONNX IR version: 0.0.5
Opset version: 9
Producer name: ELAN-AIRD
Producer version:
Domain:
Model version: 0
Doc string:

WARNING: ONNX model has a newer ir_version (0.0.5) than this parser was built against (0.0.3).
[W] [TRT] Tensor idx:191_sub is uniformly zero; network calibration failed.
[W] [TRT] Tensor idx:191_sub copy is uniformly zero; network calibration failed.
[E] [TRT] …/builder/cudnnBuilder2.cpp (1791) - Misc Error in createRegionScalesFromTensorScales: -1 (Could not find scales for tensor idx:117_convolutional_batch_normalize_activation copy.)
[E] [TRT] …/builder/cudnnBuilder2.cpp (1791) - Misc Error in createRegionScalesFromTensorScales: -1 (Could not find scales for tensor idx:117_convolutional_batch_normalize_activation copy.)
[E] could not build engine
[E] Engine could not be created
[E] Engine could not be created
&&&& FAILED TensorRT.trtexec # trtexec --onnx=my_model.onnx --input=input_data --output=idx:195_convolutional --output=idx:205_convolutional --output=idx:215_convolutional --int8 --batch=1 --device=0

It runs successfully with --fp16; please help with --int8.
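
For reference, the "network calibration failed" and "could not find scales" messages suggest the INT8 builder never obtained a valid scale for those tensors. One common way to supply scales is to run calibration with real input data and then pass the resulting cache to trtexec via --calib=<file>. A minimal calibrator sketch, assuming the TensorRT Python API and pycuda are installed; the data source, batch size, and file names are illustrative, not taken from this thread:

import os
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    # Feeds preprocessed NCHW float32 batches to the INT8 builder and caches the scales it computes.
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = list(batches)        # e.g. a list of arrays shaped (1, C, H, W)
        self.cache_file = cache_file
        self.index = 0
        self.device_input = cuda.mem_alloc(self.batches[0].nbytes)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                     # no more data: calibration is finished
        batch = np.ascontiguousarray(self.batches[self.index], dtype=np.float32)
        cuda.memcpy_htod(self.device_input, batch)
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

The calibrator is attached when building an engine from Python (builder.int8_mode = True; builder.int8_calibrator = EntropyCalibrator(batches)); the calib.cache it writes can afterwards be reused with trtexec --int8 --calib=calib.cache. Whether that resolves the missing scale on the slice output here is not confirmed.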

Hi,

Can you please share the model file to reproduce the issue?

Thanks

How do I share the file?

The file size is over 10 MB; please download from the links below:

https://drive.google.com/open?id=1pAv5mGIhbvFdhM3gOgUawX5NJJDD5_FJ
https://drive.google.com/open?id=1j7MqVvtbt_Cyrxe9Zggljprt-ylNiaXz

There are two models: "my_model173.onnx" succeeds, but "my_model174.onnx" does not. I think the problem lies in the slice layer "idx:174_slice", which slices from "idx:117_convolutional_batch_normalize_activation", but I am not sure why its scale cannot be found.
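
If it helps, that suspicion can be double-checked by inspecting the ONNX graph directly. A small sketch, assuming the onnx Python package and the file name my_model174.onnx from the links above:

import onnx

model = onnx.load("my_model174.onnx")
onnx.checker.check_model(model)

# List every Slice node with its input/output tensor names, to confirm that
# idx:174_slice really reads from idx:117_convolutional_batch_normalize_activation.
for node in model.graph.node:
    if node.op_type == "Slice":
        print(node.name or "<unnamed>", "inputs:", list(node.input), "outputs:", list(node.output))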

Hi,

The model seems to be working on TRT 6.
Could you please try to upgrade to JetPack 4.3 and try on TRT 6?
https://docs.nvidia.com/jetson/jetpack/release-notes/#jetpack-version

Thanks

Hi SunilJB,

I upgraded to JetPack 4.3 and tried TRT 6 with "my_model174.onnx", but it reports an error about an unknown option.

Command:

nvidia@nvidia:~/Downloads$ trtexec --onnx=my_model.onnx --output=idx:174_activation --int8 --batch=1 --device=0

Information:

&&&& RUNNING TensorRT.trtexec # trtexec --onnx=my_model.onnx --output=idx:174_activation --int8 --batch=1 --device=0
[11/20/2019-15:57:41] [E] Unknown option: --output idx:174_activation
=== Model Options ===
  --uff=<file>                UFF model
  --onnx=<file>               ONNX model
  --model=<file>              Caffe model (default = no model, random weights used)
  --deploy=<file>             Caffe prototxt file
  --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
  --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
  --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)

=== Build Options ===
  --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
  --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
  --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
                              Note: if any of min/max/opt is missing, the profile will be completed using the shapes 
                                    provided and assuming that opt will be equal to max unless they are both specified;
                                    partially specified shapes are applied starting from the batch size;
                                    dynamic shapes imply explicit batch
                              Input shapes spec ::= Ishp[","spec]
                                           Ishp ::= name":"shape
                                          shape ::= N[["x"N]*"*"]
  --inputIOFormats=spec       Type and formats of the input tensors (default = all inputs in fp32:chw)
  --outputIOFormats=spec      Type and formats of the output tensors (default = all outputs in fp32:chw)
                              IO Formats: spec  ::= IOfmt[","spec]
                                          IOfmt ::= type:fmt
                                          type  ::= "fp32"|"fp16"|"int32"|"int8"
                                          fmt   ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32")["+"fmt]
  --workspace=N               Set workspace size in megabytes (default = 16)
  --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
  --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
  --fp16                      Enable fp16 mode (default = disabled)
  --int8                      Run in int8 mode (default = disabled)
  --calib=<file>              Read INT8 calibration cache file
  --safe                      Only test the functionality available in safety restricted flows
  --saveEngine=<file>         Save the serialized engine
  --loadEngine=<file>         Load a serialized engine

=== Inference Options ===
  --batch=N                   Set batch size for implicit batch engines (default = 1)
  --shapes=spec               Set input shapes for explicit batch and dynamic shapes inputs
                              Input shapes spec ::= Ishp[","spec]
                                           Ishp ::= name":"shape
                                          shape ::= N[["x"N]*"*"]
  --iterations=N              Run at least N inference iterations (default = 10)
  --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
  --duration=N                Run performance measurements for at least N seconds wallclock time (default = 10)
  --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
  --streams=N                 Instantiate N engines to use concurrently (default = 1)
  --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = false)
  --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
  --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = false)
  --buildOnly                 Skip inference perf measurement (default = disabled)

=== Build and Inference Batch Options ===
                              When using implicit batch, the max batch size of the engine, if not given, 
                              is set to the inference batch size;
                              when using explicit batch, if shapes are specified only for inference, they 
                              will be used also as min/opt/max in the build profile; if shapes are 
                              specified only for the build, the opt shapes will be used also for inference;
                              if both are specified, they must be compatible; and if explicit batch is 
                              enabled but neither is specified, the model must provide complete static
                              dimensions, including batch size, for all inputs

=== Reporting Options ===
  --verbose                   Use verbose logging (default = false)
  --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
  --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf, and 100 representing min perf; (default = 99%)
  --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
  --dumpProfile               Print profile information per layer (default = disabled)
  --exportTimes=<file>        Write the timing results in a json file (default = disabled)
  --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)

=== System Options ===
  --device=N                  Select cuda device N (default = 0)
  --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
  --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
  --plugins                   Plugin library (.so) to load (can be specified multiple times)

=== Help ===
  --help                      Print this message
Note: the following options are not fully supported in trtexec: dynamic shapes, multistream/threads, cuda graphs, json logs, and actual data IO
&&&& FAILED TensorRT.trtexec # trtexec --onnx=my_model.onnx --output=idx:174_activation --int8 --batch=1 --device=0

If --output is not given, it prompts a "Network must have at least one output" error.
Did the options change in TRT 6? How do I use --output?

Hi,

The "--output" parameter is mandatory only for UFF and Caffe models.
Check trtexec --help:
Mandatory params for UFF:
--uffInput=<name>,C,H,W     Input blob name and its dimensions for UFF parser (can be specified multiple times)
--output=<name>             Output blob name (can be specified multiple times)

Mandatory params for Caffe:
--output=<name>             Output blob name (can be specified multiple times)

Can you try without the "--output" option, in "--verbose" mode?

trtexec --onnx=my_model_174.onnx --int8 --batch=1 --device=0 --verbose
trtexec --onnx=my_model_174.onnx --int8 --batch=1 --device=0 --saveEngine=<file> --verbose

Thanks

Sorry I'm late.

Thank you for the reply. I can now run the ONNX model with trtexec on TRT 6.0.

I will verify the quantized inference performance (a rough measurement sketch follows below).

Thank you.
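
On verifying INT8 performance: trtexec itself reports latency, and --avgRuns, --dumpProfile, and --exportProfile from the help output above give more detail. For a standalone check, a rough sketch with the TensorRT Python API and pycuda, assuming an implicit-batch engine was saved as my_model_174_int8.engine via --saveEngine (the file name is illustrative):

import time
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("my_model_174_int8.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer per binding, sized from the engine.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    size = trt.volume(engine.get_binding_shape(i)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Random input data is enough for a latency check.
for i in range(engine.num_bindings):
    if engine.binding_is_input(i):
        host_bufs[i][:] = np.random.random(size=host_bufs[i].shape).astype(host_bufs[i].dtype)
        cuda.memcpy_htod(dev_bufs[i], host_bufs[i])

runs = 100
start = time.time()
for _ in range(runs):
    context.execute(1, bindings)            # implicit-batch execution, batch size 1
print("average latency: %.3f ms" % ((time.time() - start) / runs * 1000.0))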

Hi,
Another problem: we tried a single ONNX convolution layer with ([in_ch, out_ch, w, h] = 128 * 128 * 3 * 3), and running ./trtexec succeeds on DLA; but with a single convolution layer of ([in_ch, out_ch, w, h] = 1024 * 1024 * 3 * 3), ./trtexec does not succeed on DLA and has to use the GPU.

So, does DLA not support a convolution kernel channel count of 1024?

What is the limit on the convolution kernel channel count?
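
One way to narrow this down is to export a single convolution to ONNX and run it through trtexec with --useDLACore=0 --allowGPUFallback --verbose; the verbose log then shows whether the layer stays on DLA or falls back to the GPU. A sketch assuming PyTorch is available (file and tensor names are illustrative):

import torch
import torch.nn as nn

# A single 3x3 convolution with 1024 input and 1024 output channels,
# i.e. the case reported to fall back from DLA to the GPU.
conv = nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=3, padding=1, bias=False)
dummy = torch.randn(1, 1024, 16, 16)    # the spatial size is arbitrary for this test

torch.onnx.export(conv, dummy, "conv_1024x1024x3x3.onnx", opset_version=9,
                  input_names=["input_data"], output_names=["conv_out"])

Then: trtexec --onnx=conv_1024x1024x3x3.onnx --useDLACore=0 --allowGPUFallback --verbose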