TRT conversion error: ONNX-to-TRT conversion fails for Conv2d when the "groups" parameter is not 1

Description

Converting the ONNX model to a TRT engine succeeds with static shapes but fails with dynamic shapes. The failure is related to Conv2d layers whose "groups" parameter is not 1, for example nn.Conv2d(640, 512, kernel_size=3, stride=1, padding=1, groups=2). The PyTorch model definition is the same in both cases (static and dynamic), and the torch2onnx script is almost identical, except for the lines related to dynamic shapes.
Error:
[11/11/2024-03:55:34] [E] [TRT] ModelImporter.cpp:951: --- End node ---
[11/11/2024-03:55:34] [E] [TRT] ModelImporter.cpp:954: ERROR: ModelImporter.cpp:181 In function parseNode:
[6] Invalid Node - /layers.10/Conv
Error Code: 3: /layers.10/Conv:kernel weights has count 1474560 but 737280 was expected
ITensor::getDimensions: Error Code 4: API Usage Error (/layers.10/Conv: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3 * 3 * 512 / 2 = 737280)
Model definition:
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.group = [1, 2, 4, 8, 1]
        self.layers = nn.ModuleList([
            nn.Conv2d(5, 64, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1, groups=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(640, 512, kernel_size=3, stride=1, padding=1, groups=2),  # /layers.10/Conv
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(768, 384, kernel_size=3, stride=1, padding=1, groups=4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(640, 256, kernel_size=3, stride=1, padding=1, groups=8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 128, kernel_size=3, stride=1, padding=1, groups=1),
            nn.LeakyReLU(0.2, inplace=True)
        ])

    # forward() is not included in this report
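For reference, PyTorch stores a Conv2d weight with shape (out_channels, in_channels // groups, kH, kW), and the exported ONNX node names such as /layers.10/Conv follow the module path, i.e. index 10 of self.layers. The plain-Python sketch below copies the (in, out, groups) triples from the Encoder above (the helper names conv_weight_shape and describe_layers are hypothetical) and lists each convolution's node name and weight shape. It shows that /layers.10/Conv is the 640 -> 512, groups=2 layer and that its weight holds 1474560 values, the count reported in the error.

```python
# (in_channels, out_channels, groups) for each Conv2d in Encoder.layers,
# copied from the model definition; every conv uses kernel_size=3.
CONVS = [
    (5, 64, 1),
    (64, 64, 1),
    (64, 128, 1),
    (128, 256, 1),
    (256, 384, 1),
    (640, 512, 2),
    (768, 384, 4),
    (640, 256, 8),
    (512, 128, 1),
]
KERNEL = 3


def conv_weight_shape(c_in, c_out, groups, k=KERNEL):
    """Shape of a PyTorch Conv2d weight: (out, in // groups, kH, kW)."""
    if c_in % groups or c_out % groups:
        raise ValueError(f"channels {c_in}->{c_out} not divisible by groups={groups}")
    return (c_out, c_in // groups, k, k)


def describe_layers(convs):
    """Return (onnx_node_name, groups, weight_shape, weight_count) per conv."""
    rows = []
    for i, (c_in, c_out, groups) in enumerate(convs):
        # Each Conv2d is followed by a LeakyReLU, so conv i sits at index 2 * i.
        module_index = 2 * i
        shape = conv_weight_shape(c_in, c_out, groups)
        count = shape[0] * shape[1] * shape[2] * shape[3]
        rows.append((f"/layers.{module_index}/Conv", groups, shape, count))
    return rows


if __name__ == "__main__":
    for name, groups, shape, count in describe_layers(CONVS):
        print(f"{name:18s} groups={groups} weight={shape} count={count}")
```

Because the channel counts do not chain (384 -> 640, 512 -> 768, ...), forward() presumably combines tensors before these grouped convolutions; that code is not part of the report, so the sketch only covers the weight layout.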
torch2onnx code:
torch.onnx.export(model,
                  input,
                  onnx_file_name,
                  # verbose=True,
                  export_params=True,
                  opset_version=18,
                  do_constant_folding=True,
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={
                      # 'input': {0: 'time_domain1'},
                      'input': {0: 'time_domain1', 2: 'height1', 3: 'width1'},
                      # 'output': {0: 'time_domain1'},
                      'output': {0: 'time_domain1', 2: 'height2', 3: 'width2'},
                  },
                  )
trtexec command:
trtexec --onnx=my_models/encode_dynamic_shape.onnx --saveEngine=my_models/encode_sim.trt --minShapes=input:13x5x240x240 --optShapes=input:17x5x360x360 --maxShapes=input:18x5x432x432
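An optimization profile is only valid if every dimension satisfies min <= opt <= max. The plain-Python sketch below parses the three shape arguments from the command above (parse_shape and check_profile are hypothetical helpers, not part of trtexec) and reports which axes vary. For this command the varying axes are 0, 2 and 3, matching the dynamic_axes used at export, while the channel axis stays fixed at 5 for the first Conv2d.

```python
def parse_shape(arg):
    """Split a trtexec-style 'name:AxBxC' shape spec into (name, dims)."""
    name, dims = arg.rsplit(":", 1)
    return name, tuple(int(d) for d in dims.split("x"))


def check_profile(min_arg, opt_arg, max_arg):
    """Return the indices of axes that vary across min/opt/max.

    Raises ValueError if the specs name different inputs, have different
    ranks, or violate min <= opt <= max on any axis.
    """
    (n0, lo), (n1, opt), (n2, hi) = (parse_shape(a) for a in (min_arg, opt_arg, max_arg))
    if not (n0 == n1 == n2):
        raise ValueError("profile shapes refer to different inputs")
    if not (len(lo) == len(opt) == len(hi)):
        raise ValueError("profile shapes have different ranks")
    for axis, (a, b, c) in enumerate(zip(lo, opt, hi)):
        if not (a <= b <= c):
            raise ValueError(f"axis {axis}: expected min <= opt <= max, got {a}, {b}, {c}")
    return [axis for axis, (a, c) in enumerate(zip(lo, hi)) if a != c]


if __name__ == "__main__":
    varying = check_profile("input:13x5x240x240", "input:17x5x360x360", "input:18x5x432x432")
    print("varying axes:", varying)
```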
What I tried:
I observed that this Conv2d has 640 input channels and 512 output channels. After export to ONNX, the channel count shown for this Conv node becomes 320 (640 -> 320) because "groups" is 2:

[Screenshot of the exported ONNX Conv node (not preserved)]

When the ONNX model is then converted to TRT, TRT seems to apply the "groups" division a second time, which produces the error above.
When I edit the ONNX file to set the "groups" attribute to 1 and convert it to TRT again, I also get an error:
"[11/11/2024-06:31:11] [E] Error[3]: Error Code: 3: /layers.10/Conv:kernel weights has count 1474560 but 2949120 was expected
[11/11/2024-06:31:11] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (IConvolutionLayer /layers.10/Conv: /layers.10/Conv: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 640 input channels, 512 output channels and 1 groups were specified. Expected Weights count is 640 * 3 * 3 * 512 / 1 = 2949120)".
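The three weight counts in these messages can be reproduced with the formula TRT prints itself, in_channels * kH * kW * out_channels / groups. The plain-Python sketch below (the function name expected_kernel_weights is hypothetical) compares each case against the 1474560 weights stored for /layers.10/Conv. Only the original configuration (640 input channels, groups=2) matches; the first error computes its expectation from 320 input channels and then divides by groups=2 again, and the groups=1 edit drops the division entirely.

```python
def expected_kernel_weights(in_channels, out_channels, kernel_hw, groups):
    """Kernel weight count TRT reports as expected: in * kH * kW * out / groups."""
    kh, kw = kernel_hw
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return in_channels * kh * kw * out_channels // groups


# Weights actually stored for /layers.10/Conv: PyTorch shape (512, 640 // 2, 3, 3).
STORED = 512 * (640 // 2) * 3 * 3

if __name__ == "__main__":
    cases = {
        "original model: 640 in, groups=2": (640, 512, (3, 3), 2),
        "first TRT error: 320 in, groups=2": (320, 512, (3, 3), 2),
        "ONNX edited to groups=1: 640 in": (640, 512, (3, 3), 1),
    }
    for label, args in cases.items():
        expected = expected_kernel_weights(*args)
        status = "OK" if expected == STORED else "mismatch"
        print(f"{label}: expected {expected}, stored {STORED} -> {status}")
```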
What should I do to fix this? Is this a bug in TRT?

Environment

TensorRT Version: 10.4
GPU Type: RTX 4070
Nvidia Driver Version: 550
CUDA Version: 12.4
CUDNN Version: 9.1.0 (90100)
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.5.1
Baremetal or Container (if container which image + tag):

Hi @2190585204,
Could you please share your ONNX model?
Thanks