Description
Converting my ONNX model to a TRT engine succeeds with static shapes but fails with dynamic shapes. The failure is related to Conv2d layers whose "groups" parameter is not 1, e.g. nn.Conv2d(640, 512, kernel_size=3, stride=1, padding=1, groups=2). The PyTorch model definition is identical in both cases (static vs. dynamic); the torch-to-ONNX export script is almost the same except for the dynamic-shape-related lines.
Error:
[11/11/2024-03:55:34] [E] [TRT] ModelImporter.cpp:951: --- End node ---
[11/11/2024-03:55:34] [E] [TRT] ModelImporter.cpp:954: ERROR: ModelImporter.cpp:181 In function parseNode:
[6] Invalid Node - /layers.10/Conv
Error Code: 3: /layers.10/Conv:kernel weights has count 1474560 but 737280 was expected
ITensor::getDimensions: Error Code 4: API Usage Error (/layers.10/Conv: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280)
Model definition:
class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.group = [1, 2, 4, 8, 1]
        self.layers = nn.ModuleList([
            nn.Conv2d(5, 64, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1, groups=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(640, 512, kernel_size=3, stride=1, padding=1, groups=2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(768, 384, kernel_size=3, stride=1, padding=1, groups=4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(640, 256, kernel_size=3, stride=1, padding=1, groups=8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 128, kernel_size=3, stride=1, padding=1, groups=1),
            nn.LeakyReLU(0.2, inplace=True)
        ])

    def forward(self, x):
        bt, c, _, _ = x.size()
        # h, w = h//4, w//4
        out = x
        for i, layer in enumerate(self.layers):
            if i == 8:
                # save the 256-channel feature before layer 8 and remember its spatial size
                x0 = out
                _, _, h, w = x0.size()
            if i > 8 and i % 2 == 0:
                # group-wise concatenation of the saved feature x0 with the current feature,
                # e.g. layers.10 receives 640 = 256 + 384 input channels split into g groups
                g = self.group[(i - 8) // 2]
                x = x0.view(bt, g, -1, h, w)
                o = out.view(bt, g, -1, h, w)
                out = torch.cat([x, o], 2).view(bt, -1, h, w)
            out = layer(out)
        return out
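For reference (my own check, not part of the original model code), the weight layout of the grouped layer in isolation already matches the count of 1474560 reported by TRT:

import torch.nn as nn

# Sketch: PyTorch stores grouped-conv weights as (out_ch, in_ch // groups, kH, kW),
# so layers.10 carries 512 * (640 // 2) * 3 * 3 = 1474560 weight values.
conv = nn.Conv2d(640, 512, kernel_size=3, stride=1, padding=1, groups=2)
print(conv.weight.shape)    # torch.Size([512, 320, 3, 3])
print(conv.weight.numel())  # 1474560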
torch2onnx code:
torch.onnx.export(model,
                  input,
                  onnx_file_name,
                  # verbose=True,
                  export_params=True,
                  opset_version=18,
                  do_constant_folding=True,
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={
                      # 'input': {0: 'time_domain1'},
                      'input': {0: 'time_domain1', 2: 'height1', 3: 'width1'},
                      # 'output': {0: 'time_domain1'},
                      'output': {0: 'time_domain1', 2: 'height2', 3: 'width2'},
                  },
                  )
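The variables used in the export call above could be set up along these lines (a sketch only; the exact dummy-input shape in my original script is not shown here, so the one below is an assumption chosen to match the optShapes profile used with trtexec):

import torch

# Sketch of the surrounding setup, not the original script.
model = Encoder().eval()
# Assumed dummy input matching the optShapes profile (17x5x360x360).
input = torch.randn(17, 5, 360, 360)
onnx_file_name = "my_models/encode_dynamic_shape.onnx"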
trtexec command:
trtexec --onnx=my_models/encode_dynamic_shape.onnx --saveEngine=my_models/encode_sim.trt --minShapes=input:13x5x240x240 --optShapes=input:17x5x360x360 --maxShapes=input:18x5x432x432
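For anyone reproducing this, the exported ONNX can first be sanity-checked at a couple of the profile shapes with onnxruntime before involving trtexec (a sketch; assumes onnxruntime is installed):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("my_models/encode_dynamic_shape.onnx",
                            providers=["CPUExecutionProvider"])
for shape in [(13, 5, 240, 240), (17, 5, 360, 360)]:
    out = sess.run(None, {"input": np.random.randn(*shape).astype(np.float32)})
    print(shape, "->", out[0].shape)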
What I have tried:
I observed that the Conv2d at layers.10 has 640 input channels and 512 output channels. After export to ONNX it looks like this:
![image|474x499](upload://hssArxmPc0lcUYNC6kVcJW4aJo2.png)
The input channel count shown is 320 instead of 640, because the "groups" parameter is 2. When the ONNX model is then converted to TRT, the groups parameter seems to be applied a second time, producing the error above.
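The numbers in the error message line up with that reading; here is my own working with the channel counts from the error:

# Weights actually exported for layers.10 (groups=2, 640 -> 512 channels):
actual = 512 * (640 // 2) * 3 * 3         # 1474560
# What TRT expects once it believes the layer has only 320 input channels
# and divides by groups a second time:
expected_by_trt = 320 * 3 * 3 * 512 // 2  # 737280
print(actual, expected_by_trt)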
When I edit the ONNX file and set the "groups" attribute to 1, converting the ONNX to TRT also fails:
Error:
[11/11/2024-06:31:11] [E] Error[3]: Error Code: 3: /layers.10/Conv:kernel weights has count 1474560 but 2949120 was expected
[11/11/2024-06:31:11] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (IConvolutionLayer /layers.10/Conv: /layers.10/Conv: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 640 input channels, 512 output channels and 1 groups were specified. Expected Weights count is 640 * 3*3 * 512 / 1 = 2949120)
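The arithmetic for this attempt is consistent with the same weight tensor (a quick check of the counts from the error):

# The file still holds the weights exported for groups=2,
# but TRT now expects a full 640-channel kernel for groups=1.
actual = 512 * 320 * 3 * 3    # 1474560
expected = 640 * 3 * 3 * 512  # 2949120
print(actual, expected)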
What should I do to fix this? Is this a bug in TensorRT?
Environment
TensorRT Version: 10.4
GPU Type: RTX 4070
Nvidia Driver Version: 550
CUDA Version: 12.4
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.5.1
Baremetal or Container (if container which image + tag):
Relevant Files
The model definition, ONNX export code, and trtexec command needed to reproduce the issue are included above.
Steps To Reproduce
- Export the Encoder model above to ONNX with the torch.onnx.export call shown (opset 18, dynamic batch/height/width axes).
- Build the engine with the trtexec command above.
- The full error output from both the original build and the groups=1 attempt is included above.