TensorRT parse onnx failed with AveragePool layer

Details

I exported the LA-Transformer model to ONNX format.
I got an error when trying to convert it to a TensorRT engine with trtexec.

Environment

TensorRT Version: 8.0.1.6
GPU Type: V100
Nvidia Driver Version: 470.57.02
CUDA Version: 11.3
CUDNN Version: 8.2.1
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.8.1
Baremetal or Container (if container which image + tag):

Relevant Files

Model graph:

ONNX Model: Google Drive link (sign-in required)

In that folder there is a model with the AveragePool layer (la_transformer_opset_12_origin.onnx)
and one without it (la_transformer_opset_12_wo_avg_pool.onnx).

Steps To Reproduce

Using trtexec with the following params:
$: ./trtexec --onnx=la_transformer_opset_12_orig.onnx --batch=64 --verbose --buildOnly

Tracing output:

[08/05/2021-18:54:28] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[08/05/2021-18:54:28] [V] [TRT] Constant_963 [Constant] outputs: [1216 -> (1)[INT32]], 
[08/05/2021-18:54:28] [V] [TRT] Parsing node: Constant_964 [Constant]
[08/05/2021-18:54:28] [V] [TRT] Constant_964 [Constant] inputs: 
[08/05/2021-18:54:28] [V] [TRT] Constant_964 [Constant] outputs: [1217 -> (1)[INT32]], 
[08/05/2021-18:54:28] [V] [TRT] Parsing node: Slice_965 [Slice]
[08/05/2021-18:54:28] [V] [TRT] Searching for input: 1213
[08/05/2021-18:54:28] [V] [TRT] Searching for input: 1215
[08/05/2021-18:54:28] [V] [TRT] Searching for input: 1216
[08/05/2021-18:54:28] [V] [TRT] Searching for input: 1214
[08/05/2021-18:54:28] [V] [TRT] Searching for input: 1217
[08/05/2021-18:54:28] [V] [TRT] Slice_965 [Slice] inputs: [1213 -> (1, 197, 768)[FLOAT]], [1215 -> (1)[INT32]], [1216 -> (1)[INT32]], [1214 -> (1)[INT32]], [1217 -> (1)[INT32]], 
[08/05/2021-18:54:28] [V] [TRT] Registering layer: Slice_965 for ONNX node: Slice_965
[08/05/2021-18:54:28] [V] [TRT] Registering tensor: 1218 for ONNX tensor: 1218
[08/05/2021-18:54:28] [V] [TRT] Slice_965 [Slice] outputs: [1218 -> (1, 196, 768)[FLOAT]], 
[08/05/2021-18:54:28] [V] [TRT] Parsing node: AveragePool_966 [AveragePool]
[08/05/2021-18:54:28] [V] [TRT] Searching for input: 1218
[08/05/2021-18:54:28] [V] [TRT] AveragePool_966 [AveragePool] inputs: [1218 -> (1, 196, 768)[FLOAT]], 
[08/05/2021-18:54:28] [V] [TRT] Original shape: (1, 196, 768), unsqueezing to: (1, 196, 768, 1)
[08/05/2021-18:54:28] [V] [TRT] Registering layer: AveragePool_966 for ONNX node: AveragePool_966
[08/05/2021-18:54:28] [E] Error[3]: AveragePool_966: at least 5 dimensions are required for input.
[08/05/2021-18:54:28] [E] Error[3]: AveragePool_966: at least 5 dimensions are required for input.
[08/05/2021-18:54:28] [E] [TRT] ModelImporter.cpp:720: While parsing node number 966 [AveragePool -> "1219"]:
[08/05/2021-18:54:28] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[08/05/2021-18:54:28] [E] [TRT] ModelImporter.cpp:722: input: "1218"
output: "1219"
name: "AveragePool_966"
op_type: "AveragePool"
attribute {
  name: "ceil_mode"
  i: 0
  type: INT
}
attribute {
  name: "kernel_shape"
  ints: 14
  ints: 1
  type: INTS
}
attribute {
  name: "pads"
  ints: 0
  ints: 0
  ints: 0
  ints: 0
  type: INTS
}
attribute {
  name: "strides"
  ints: 14
  ints: 1
  type: INTS
}

[08/05/2021-18:54:28] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[08/05/2021-18:54:28] [E] [TRT] ModelImporter.cpp:726: ERROR: ModelImporter.cpp:162 In function parseGraph:

Also I tried to load the model and manually add the pool layer like:

void addPoolingLayer(nvinfer1::INetworkDefinition &network) {
    using namespace nvinfer1;
    const int n_layers = network.getNbLayers();
    auto pooling = network.addPoolingNd(*network.getLayer(n_layers - 1)->getOutput(0), nvinfer1::PoolingType::kAVERAGE, nvinfer1::DimsHW{14, 1});
    pooling->getOutput(0)->setName("result");
    pooling->setName("AvgPool2d");
    network.markOutput(*pooling->getOutput(0));
}

bool buildEngine(const std::filesystem::path &model_path, const int MAX_BATCH_SIZE = 1) {
    using namespace common;
    Logger gLogger_(Severity::kVERBOSE);
    //Logger gLogger_;
    auto builder = TensorRTUPtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(gLogger_.getTRTLogger()));
    if (!builder)
    {
        return false;
    }
    builder->setMaxBatchSize(MAX_BATCH_SIZE);

    const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = TensorRTUPtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
    if (!network)
    {
        return false;
    }

    auto config = TensorRTUPtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    if (!config)
    {
        return false;
    }

    auto parser = TensorRTUPtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, gLogger_.getTRTLogger()));
    if (!parser)
    {
        return false;
    }

    auto parsed = parser->parseFromFile(model_path.string().c_str(), static_cast<int>(gLogger_.getReportableSeverity()));
    if (!parsed)
    {
        return false;
    }

    addPoolingLayer(*network);

    auto engine = std::shared_ptr<nvinfer1::ICudaEngine>(builder->buildEngineWithConfig(*network, *config), InferDeleter());
    if (!engine)
    {
        return false;
    }
    auto hostMemory = engine->serialize();
    const std::string out_name = model_path.stem().string() + ".engine";
    auto output = model_path.parent_path() / out_name;
    std::ofstream ofs(output.string(), std::ios::binary);
    if (!ofs) {
        std::cerr << "could not open plan output file" << std::endl;
        return false;
    }
    ofs.write(reinterpret_cast<const char*>(hostMemory->data()), hostMemory->size());
    return true;
}

But it gives a similar error:
[08/05/2021-18:31:58] [E] [TRT] AvgPool2d: at least 4 dimensions are required for input.

Could you please give a hint on how to do this?

Hi,
We request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, we request you to share the trtexec --verbose log for further debugging.
Thanks!

Tested, and the model is fine. ONNX version: 1.10.1.

The model is shared via the Google Drive link; please have a look at the Relevant Files section above.

The trtexec logs are also provided above (see Steps To Reproduce).

Anyway, thank you for the quick response.

Any updates?

Hi @v.yastrebov90,

We are looking into this issue. Please allow us some time to get back to you on this.

Thank you.

Hi @v.yastrebov90,

What are the dimensions of the pooling window here? TensorRT requires the input rank to be at least the rank of the pooling window + 2.
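
As an illustration of that rank rule, the pooling-window attributes can be read directly from the ONNX graph. A small sketch (the node name AveragePool_966 is taken from the verbose log above):

import onnx

model = onnx.load("la_transformer_opset_12_origin.onnx")
node = next(n for n in model.graph.node if n.name == "AveragePool_966")
for attr in node.attribute:
    if attr.name in ("kernel_shape", "strides", "pads"):
        print(attr.name, list(attr.ints))
# Per the node dump above: kernel_shape [14, 1] and strides [14, 1] are rank 2,
# so TensorRT expects an input of rank >= 4, while tensor 1218 feeding this
# node is rank 3: (1, 196, 768).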

Thank you.

Let me explain:

  1. Input tensor shape = (1, 196, 768)
  2. I need to get (1, 14, 768).
    That means the window size is (14, 1).
    The first dimension is the batch size.

For now, I have implemented a workaround that applies the average pooling manually with cuDNN, passing the device pointer as input.
The model was exported from PyTorch, and as far as I understand, it applies the average pooling over the lowest dimensions; I think it's just a matter of memory layout.
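
For reference, the intended pooling could also be expressed on a 4D tensor at export time, which keeps the ONNX AveragePool within spec. A minimal PyTorch sketch (an assumption about how the export could be adjusted, not the original LA-Transformer code):

import torch
import torch.nn.functional as F

x = torch.randn(1, 196, 768)  # (B, 196, 768), as in the model
# Insert an explicit channel dim so the 2D pool sees a 4D (N, C, H, W) tensor.
y = F.avg_pool2d(x.unsqueeze(1), kernel_size=(14, 1), stride=(14, 1))
y = y.squeeze(1)              # back to (B, 14, 768)
print(y.shape)                # torch.Size([1, 14, 768])

Exported this way, the AveragePool node gets a 4D input and a 2D kernel, which the TensorRT ONNX parser should accept.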

My solution may help someone:

//Header
struct  TensorInfo {
            int max_batch, channels, height, width;
        };

        struct WindowInfo {
            int batch, channels, height, width;
        };

        struct AveragePooling {
            AveragePooling(TensorInfo &&tensorInfo, WindowInfo &&windowInfo);
            ~AveragePooling();
            const TensorInfo &getOutInfo() const;

            void apply(const float *deviceInput, float *deviceOutput);
        private:
            TensorInfo tensor_;
            WindowInfo window_;

            cudnnHandle_t cudnn_;
            cudnnPoolingDescriptor_t pooling_desc;
            cudnnTensorDescriptor_t in_desc;
            cudnnTensorDescriptor_t out_desc;
            TensorInfo out_tensor_info;
        };

//Source
AveragePooling::AveragePooling(TensorInfo &&tensorInfo, WindowInfo &&windowInfo)
            :tensor_(std::move(tensorInfo))
            , window_(std::move(windowInfo))
        {
            auto status = cudnnCreate(&cudnn_);
            if(status != CUDNN_STATUS_SUCCESS) {
                throw std::runtime_error("Can't create cudnn");
            }
            status = cudnnCreatePoolingDescriptor(&pooling_desc);
            if(status != CUDNN_STATUS_SUCCESS) {
                throw std::runtime_error("Can't create pooling descriptor");
            }
            status = cudnnSetPooling2dDescriptor(pooling_desc,
                                        CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING,
                                        CUDNN_NOT_PROPAGATE_NAN,
                                        window_.height,
                                        window_.width,
                                        0,
                                        0,
                                        window_.height,
                                        window_.width);
            if(status != CUDNN_STATUS_SUCCESS) {
                std::string error(cudnnGetErrorString(status));
                throw std::runtime_error("Can't set Pooling 2d descriptor");
            }
            status = cudnnCreateTensorDescriptor(&in_desc);
            if(status != CUDNN_STATUS_SUCCESS) {
                throw std::runtime_error("Can't create input pooling descriptor");
            }
            status = cudnnSetTensor4dDescriptor(in_desc,
                                       CUDNN_TENSOR_NCHW,
                                       //CUDNN_TENSOR_NCHW_VECT_C,
                                       CUDNN_DATA_FLOAT,
                                       tensor_.max_batch,
                                       tensor_.channels,
                                       tensor_.height,
                                       tensor_.width);
            if(status != CUDNN_STATUS_SUCCESS) {
                throw std::runtime_error("Can't set input tensor descriptor");
            }
            //INFO: might need verification
            out_tensor_info = TensorInfo {
                    tensor_.max_batch,
                    tensor_.channels,
                    tensor_.height / window_.height,
                    tensor_.width / window_.width
            };
            status = cudnnCreateTensorDescriptor(&out_desc);
            if(status != CUDNN_STATUS_SUCCESS) {
                throw std::runtime_error("Can't create output tensor descriptor");
            }
            status = cudnnSetTensor4dDescriptor(out_desc,
                                       CUDNN_TENSOR_NCHW,
                                       CUDNN_DATA_FLOAT,
                                       out_tensor_info.max_batch,
                                       out_tensor_info.channels,
                                       out_tensor_info.height,
                                       out_tensor_info.width);
            if(status != CUDNN_STATUS_SUCCESS) {
                throw std::runtime_error("Can't set output tensor descriptor");
            }
        }

        AveragePooling::~AveragePooling() {
            cudnnDestroyTensorDescriptor(out_desc);
            cudnnDestroyTensorDescriptor(in_desc);
            cudnnDestroyPoolingDescriptor(pooling_desc);
            cudnnDestroy(cudnn_);
        }

        void AveragePooling::apply(const float *deviceInput, float *deviceOutput) {
            float alpha = 1.0f;
            float beta = 0.0f;
            auto status = cudnnPoolingForward(cudnn_,
                                pooling_desc,
                                &alpha,
                                in_desc,
                                deviceInput,
                                &beta,
                                out_desc,
                                deviceOutput);
            if(status != CUDNN_STATUS_SUCCESS) {
                throw std::runtime_error("cudnnPoolingForward failed");
            }
        }

Usage:

AveragePooling pooling(TensorInfo {batch_size_, 1, 196, 768}, WindowInfo {batch_size_, 1, 14, 1});

It would be good to have this supported out of the box, but from my point of view this issue can be closed for now.

I also tested it as follows:

    constexpr const int BATCH = 4;
    constexpr const int CHANNELS = 1;
    constexpr const int HEIGHT = 10;
    constexpr const int WIDTH = 10;
    float arr[BATCH][CHANNELS][HEIGHT][WIDTH] = {
        {
            {
                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},
                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}, //90 / 20 => 4.5

                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},
                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},

                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},
                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},

                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},
                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},

                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f},
                {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}
            }
        },
        {
            {
                {10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f},
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, //140 / 20 => 7.0

                {10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f},
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, //140 / 20 => 7.0

                {10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f},
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, //140 / 20 => 7.0

                {10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f},
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, //140 / 20 => 7.0

                {10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f, 10.0f},
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f} //140 / 20 => 7.0
            }
        },
        {
            {
                {0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f}, // 5
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, // 40+ 5 / 20 => 2.25

                {0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f}, // 5
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, // 40+ 5 / 20 => 2.25

                {0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f}, // 5
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, // 40+ 5 / 20 => 2.25

                {0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f}, // 5
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, // 40+ 5 / 20 => 2.25

                {0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f}, // 5
                {4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f, 4.0f}, // 40 + 5 / 20 => 2.25
            }
        },
        {
            {
                {-1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f}, // -15
                {3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f}, // 35 - 15 => 20 / 20 => 1.0f

                {-1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f}, // -15
                {3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f}, // 35 - 15 => 20 / 20 => 1.0f

                {-1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f}, // -15
                {3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f}, // 35 - 15 => 20 / 20 => 1.0f

                {-1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f}, // -15
                {3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f}, // 35 - 15 => 20 / 20 => 1.0f

                {-1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f, -1.5f}, // -15
                {3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f, 3.5f}, // 35 - 15 => 20 / 20 => 1.0f
            }
        }
    };

    constexpr const std::size_t IN_SIZE = BATCH * CHANNELS * HEIGHT * WIDTH * sizeof(float);
    void *deviceInput;
    auto status = cudaMalloc(&deviceInput, IN_SIZE);

    constexpr const int COMPUTE_BATCH = 2;
    constexpr const int WINDOW_HEIGHT = 2;
    constexpr const int WINDOW_WIDTH = 10;

    constexpr const int OUT_HEIGHT = HEIGHT / WINDOW_HEIGHT;
    constexpr const int OUT_WIDTH = WIDTH / WINDOW_WIDTH;

    constexpr const std::size_t OUTPUT_SIZE = COMPUTE_BATCH * CHANNELS * OUT_HEIGHT * OUT_WIDTH * sizeof(float);
    //INFO: window = H=2, W=10 => output must be 5x1
    void *deviceOutput;
    status = cudaMalloc(&deviceOutput, OUTPUT_SIZE);

    status = cudaMemcpy(deviceInput, arr, IN_SIZE, cudaMemcpyHostToDevice);
    AveragePooling avgPool(TensorInfo {COMPUTE_BATCH, CHANNELS, HEIGHT, WIDTH}, WindowInfo {COMPUTE_BATCH, CHANNELS, WINDOW_HEIGHT, WINDOW_WIDTH});
    avgPool.apply(static_cast<const float*>(deviceInput), static_cast<float*>(deviceOutput));

    float out_array[COMPUTE_BATCH][CHANNELS][OUT_HEIGHT][OUT_WIDTH] = { 0 };

    status = cudaMemcpy(out_array, deviceOutput, OUTPUT_SIZE, cudaMemcpyDeviceToHost);
    for(int i =0 ; i < COMPUTE_BATCH; ++i) {
        std::cout << "{" << std::endl;
        for(int j = 0; j < OUT_HEIGHT; ++j) {
            std::cout << out_array[i][0][j][0] << std::endl;
        }
        std::cout << "}" << std::endl;
    }
    cudaFree(deviceOutput);
    cudaFree(deviceInput);
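
To cross-check the hand-computed averages in the comments above, the same numbers can be reproduced with a few lines of NumPy (a sketch only; it rebuilds the same input pattern rather than reusing the C++ array):

import numpy as np

arr = np.zeros((4, 1, 10, 10), dtype=np.float32)
arr[0, 0] = np.arange(10, dtype=np.float32)   # every row is 0..9       -> window mean 4.5
arr[1, 0, ::2], arr[1, 0, 1::2] = 10.0, 4.0   # alternating 10s and 4s  -> window mean 7.0
arr[2, 0, ::2], arr[2, 0, 1::2] = 0.5, 4.0    # alternating 0.5 and 4   -> window mean 2.25
arr[3, 0, ::2], arr[3, 0, 1::2] = -1.5, 3.5   # alternating -1.5 and 3.5 -> window mean 1.0

# Non-overlapping 2x10 windows: flatten each pair of rows and take the mean.
pooled = arr.reshape(4, 1, 5, 2 * 10).mean(axis=-1)
print(pooled[:, 0, :])  # 4.5, 7.0, 2.25 and 1.0 per row, matching the comments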

Hi @v.yastrebov90,

Thanks for bringing this to our attention. Please allow us some time to work on this.

Hi @v.yastrebov90,

Could you update the permissions of the Google Drive link? Our team is unable to download the model to work on this issue.

Thank you.

Hello. Yeah, sure.
This one should work: la_transform_model - Google Drive
Please check. If it needs to be shared with someone else, please send me the request.

Thank you.

Hi @v.yastrebov90,

After investigating the model, our team found that the AveragePool node is invalid (attaching a screenshot of the model).

The input to the AveragePool is a 3D tensor of shape 1x196x768. According to the ONNX spec, the corresponding pooling window, strides, and pads should be 1D, but they are provided as 2D values here.

Using ONNX GraphSurgeon (ONNX-GS), we edited this node to follow the proper ONNX spec for AveragePool. The network can now be run fully in TensorRT. Please find the model here:
https://drive.google.com/file/d/1mC46F84UYLv_-JJfA2bTEhYFQ89MQjFs/view?usp=sharing
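
For anyone who needs to reproduce this kind of fix themselves, the node can be edited with ONNX GraphSurgeon. The exact edit applied above may differ; the sketch below takes one spec-compliant route, making the pool's input 4D so the 2D kernel_shape [14, 1] stays valid (node name AveragePool_966 taken from the verbose log; opset 12, so axes is an attribute):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("la_transformer_opset_12_origin.onnx"))

for node in [n for n in graph.nodes if n.op == "AveragePool" and n.name == "AveragePool_966"]:
    pool_in, pool_out = node.inputs[0], node.outputs[0]   # (1, 196, 768) -> (1, 14, 768)

    # Unsqueeze the input to (1, 1, 196, 768) so the 2D pooling attributes are legal.
    unsq_out = gs.Variable(name=pool_in.name + "_4d", dtype=pool_in.dtype)
    unsq = gs.Node(op="Unsqueeze", attrs={"axes": [1]}, inputs=[pool_in], outputs=[unsq_out])

    # Squeeze the (1, 1, 14, 768) result back to (1, 14, 768).
    sq_in = gs.Variable(name=pool_out.name + "_4d", dtype=pool_out.dtype)
    sq = gs.Node(op="Squeeze", attrs={"axes": [1]}, inputs=[sq_in], outputs=[pool_out])

    node.inputs = [unsq_out]
    node.outputs = [sq_in]
    graph.nodes.extend([unsq, sq])

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "la_transformer_opset_12_fixed.onnx")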

Thank you.

Oh, I see. Thank you very much for your support.