What is the meaning of code at cudnnBuilderBlockChooser.cpp:127?


I used the caffe-onnx project to generate an R100 ONNX model. I then got an error when I used trtexec to build an engine from this R100 ONNX model.


TensorRT Version: 7.0
Nvidia Driver Version: 440.44
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version: Deepstream 5.0 container
Python Version (if applicable): 3.6
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/deepstream:5.0.1-20.09-triton

Relevant Files

Steps To Reproduce

When caffe-onnx generated R100.onnx, it had a fixed batch size of 1.
I wanted a dynamic-batch ONNX model, so I used the following code to make the batch dimension dynamic:

    onnxmodel.graph.input[0].type.tensor_type.shape.dim[0].dim_param = -1
    onnxmodel.graph.output[0].type.tensor_type.shape.dim[0].dim_param = -1

Then I used the following command to generate an engine and got this error:
[TensorRT] INTERNAL ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size …/builder/cudnnBuilderBlockChooser.cpp:127

./trtexec --explicitBatch --workspace=4096 --onnx=./r100.onnx --minShapes='data_input':1x3x112x112 --maxShapes='data_input':8x3x112x112 --optShapes='data_input':8x3x112x112 --shapes='data_input':8x3x112x112 --int8 --calib=./r100_int8_caliborator.cache --saveEngine=r100_int8.engine

Does anyone know how to fix this bug? Any help would be appreciated.
And what is the meaning of the code at cudnnBuilderBlockChooser.cpp:127?

I also used polygraphy to inspect the ONNX model:

./polygraphy inspect model ./r100_gs.onnx
[I] ==== ONNX Model ====
Name: r100_dynamic | Opset: 11

---- 1 Graph Inputs ----
{data_input [dtype=float32, shape=(-1, 3, 112, 112)]}

---- 1 Graph Outputs ----
{fc1_Y [dtype=float32, shape=(-1, 256)]}

---- 772 Initializers ----
(Use --mode to display)

---- 358 Nodes ----
(Use --mode to display)

Hi @Amy_21,

Could you please share an ONNX model that reproduces the issue, so we can try it on our end for better assistance?

Thank you.

Hi, @spolisetty ,

The ONNX file is too big to upload.
So I tried the ResNet-50 model from the git repo (caffe-onnx/caffemodel/resnet-50) to reproduce the bug.

To get a dynamic batch-size input and output, I used onnx_graphsurgeon to modify the ONNX model.
My caffe-onnx/convert2onnx.py is uploaded; you can check the function do_change_batch(). convert2onnx.py (2.5 KB)

Also, when I used the following command to build a TensorRT engine, the same error occurred again: [F] [TRT] Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size

./trtexec --explicitBatch --workspace=4096 --onnx=/home/zj/11_TensorRT/caffe-onnx/onnxmodel/r50.onnx --minShapes=input:1x3x224x224 --maxShapes=input:8x3x224x224 --optShapes=input:8x3x224x224 --shapes=input:8x3x224x224 --saveEngine=r50.engine

Download the ResNet-50 .caffemodel file from BaiduDisk and put resnet-50-model.caffemodel into ./caffemodel/resnet-50/

Convert the ResNet-50 Caffe model to an ONNX model, but use my uploaded convert2onnx.py:

$ python3 convert2onnx.py
resnet50 onnxmodel

Hi @Amy_21,

For some reason I’m unable to download the shared files. It would be really helpful if you could share the above files via Google Drive or DM.

Thank you.