Issues running ONNX classifier model in DeepStream

DS version: 5.1
CUDA: 11.2

ONNX IR version:  0.0.6
Opset version:    9
Producer name:    pytorch
Producer version: 1.8
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ../nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
INFO: ../nvdsinfer/nvdsinfer_func_utils.cpp:39 [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: ../nvdsinfer/nvdsinfer_func_utils.cpp:39 [TRT]: Detected 1 inputs and 1 output network tensors.
WARNING: ../nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
0:00:31.737852157 329671 0x55ea2891af30 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 4]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1749> [UID = 4]: serialize cuda engine to file: /home/ubuntu/models/Onnx/resnet18_224x224.onnx_b64_gpu0_fp16.engine successfully
WARNING: ../nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:33 [TRT]: Flatten_47: reshaping failed for tensor: 189
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:33 [TRT]: shapeMachine.cpp (160) - Shape Error in executeReshape: reshape would change volume
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:33 [TRT]: Instruction: RESHAPE{64 512 1 1} {1 512}
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:685 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT input.1         3x224x224       min: 1x3x224x224     opt: 64x3x224x224    Max: 64x3x224x224    
1   OUTPUT kFLOAT output          4               min: 0               opt: 0               Max: 0               

iris-app: nvdsinfer_context_impl.cpp:1302: NvDsInferStatus nvdsinfer::NvDsInferContextImpl::allocateBuffers(): Assertion `bindingDims.numElements > 0' failed.
Aborted (core dumped)

I could not understand precisely what the error means.
The engine does seem to be generated from the model (the logs say so, and the .engine file is present locally), but the app crashes immediately afterwards. As for the original ONNX model, it is a plain ResNet-18 trained in PyTorch, with the final fc layer modified to output 4 classes.
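For reference, the model and export flow were roughly the following (a minimal sketch, not the exact training script; checkpoint loading is omitted, and the dynamic_axes remark in the comments is an observation rather than a confirmed diagnosis):

import torch
import torchvision

# Plain ResNet-18 with the final fc layer replaced for 4 classes.
model = torchvision.models.resnet18(pretrained=False)
model.fc = torch.nn.Linear(model.fc.in_features, 4)
# (trained weights are loaded from a checkpoint here in the real script)
model.eval()

# Export at opset 9 with a 1x3x224x224 dummy input.
# Note: without dynamic_axes, the traced batch size (1) gets baked into
# the Flatten/Reshape, which would explain the RESHAPE{64 512 1 1} {1 512}
# failure above when DeepStream runs the engine at batch 64.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "resnet18_224x224.onnx",
    opset_version=9,
    input_names=["input.1"],
    output_names=["output"],
    dynamic_axes={"input.1": {0: "batch"}, "output": {0: "batch"}},
)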

Can you try with trtexec to see if the issue exists?
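For example (assuming the model file is in the current directory):

/usr/src/tensorrt/bin/trtexec --onnx=resnet18_224x224.onnx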

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=resnet18_224x224.onnx /home/ubuntu/Dataset/PoseClassifier/CombinedDataset/TwoClass/train/sleeping/Coffee_room_01_video_1_0157.png
[06/01/2021-15:14:50] [E] Unknown option: /home/ajay/Dataset/PoseClassifier/CombinedDataset/TwoClass/train/sleeping/Coffee_room_01_video_1_0157.png 
[trtexec usage/help output omitted]
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=resnet18_224x224.onnx /home/ajay/Dataset/PoseClassifier/CombinedDataset/TwoClass/train/sleeping/Coffee_room_01_video_1_0157.png

It seems to fail, though from the log this run aborted on the unknown option (the trailing image path) before the model itself was parsed.

@Amycao any update?

Sorry for the late reply.
./trtexec --onnx=path --explicitBatch
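If the model has a dynamic batch dimension, you can also build with an explicit shape profile and save the engine in one go (a sketch; the input name input.1 and the 1/64 batch bounds are taken from the logs above, and the engine file name is just an example):

/usr/src/tensorrt/bin/trtexec --onnx=resnet18_224x224.onnx --explicitBatch --fp16 \
    --minShapes=input.1:1x3x224x224 \
    --optShapes=input.1:64x3x224x224 \
    --maxShapes=input.1:64x3x224x224 \
    --saveEngine=resnet18_224x224_b64_fp16.engine

If that builds and runs cleanly, point the nvinfer config at the saved engine (or let DeepStream rebuild it) with a matching batch-size.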