trtexec fails to export engine with a Cuda Runtime (out of memory) error

Description

I'm using trtexec to build an engine for XoFTR. When exporting the engine from the ONNX model, the build fails with a Cuda Runtime (out of memory) error:

[05/08/2025-09:07:34] [W] [TRT] /fine_process/self_attn_m/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_m/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_m/mlp_1/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/self_attn_f/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_f/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_f/mlp_1/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:35] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/08/2025-09:07:35] [I] [TRT] Compiler backend is used during engine build.
[05/08/2025-09:08:19] [I] [TRT] Detected 2 inputs and 3 output network tensors.
[05/08/2025-09:08:21] [E] Error[1]: [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[05/08/2025-09:08:21] [W] [TRT] Requested amount of GPU memory (1283457024000 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2025-09:08:22] [E] Error[1]: IBuilder::buildSerializedNetwork: Error Code 1: Myelin ([tunable_graph.cpp:create:117] autotuning: User allocator error allocating 1283457024000-byte buffer)
[05/08/2025-09:08:22] [E] Engine could not be created from network
[05/08/2025-09:08:22] [E] Building engine failed
[05/08/2025-09:08:22] [E] Failed to create engine from model or file.
[05/08/2025-09:08:22] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100500] [b18] # /usr/src/tensorrt/bin/trtexec --onnx=/mnt/d/wsl/XoFTR-main/weights/xoftr0507.onnx --fp16 --saveEngine=/mnt/d/wsl/XoFTR-main/weights/xoftr_fp16.engine --memPoolSize=workspace:2048 --optShapes=image0:1x1x360x640,image1:1x1x512x640
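The requested allocation (1283457024000 bytes, roughly 1.28 TB) is far beyond anything the GPU or the 2048 MiB workspace limit could satisfy, so I suspect the unresolved Reshape dimensions from the warnings above are the real cause rather than the workspace size itself. One thing I have not tried yet is pinning the dynamic input ranges explicitly with --minShapes/--maxShapes in addition to --optShapes; if that matters, the command would presumably look like the sketch below (here min/opt/max are all set to the same resolutions I actually use, effectively making the inputs static; these values are my choice, not from the original run):

/usr/src/tensorrt/bin/trtexec \
    --onnx=/mnt/d/wsl/XoFTR-main/weights/xoftr0507.onnx \
    --fp16 \
    --saveEngine=/mnt/d/wsl/XoFTR-main/weights/xoftr_fp16.engine \
    --memPoolSize=workspace:2048 \
    --minShapes=image0:1x1x360x640,image1:1x1x512x640 \
    --optShapes=image0:1x1x360x640,image1:1x1x512x640 \
    --maxShapes=image0:1x1x360x640,image1:1x1x512x640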

Environment

TensorRT Version: 10.5.0
GPU Type: NVIDIA GeForce RTX 4070 SUPER
CUDA Version: 10.7
CUDNN Version: 8.7.0
Operating System + Version: Ubuntu 20.04 (WSL2)
Python Version (if applicable): 3.8.20
PyTorch Version (if applicable): 2.0.1

Relevant Files

Code referenced by the warnings (the Mlp module used in the self_attn/cross_attn blocks):

import torch.nn as nn

class Mlp(nn.Module):
    """Multi-Layer Perceptron (MLP)"""

    def __init__(self,
                 in_dim,
                 hidden_dim=None,
                 out_dim=None,
                 act_layer=nn.GELU):
        """
        Args:
            in_dim: input features dimension
            hidden_dim: hidden features dimension
            out_dim: output features dimension
            act_layer: activation function
        """
        super().__init__()
        out_dim = out_dim or in_dim
        hidden_dim = hidden_dim or in_dim
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        self.out_dim = out_dim

    def forward(self, x):
        # Flatten every leading dimension, run the linears, then restore
        # the original shape; these two view() calls are what export the
        # Reshape nodes named in the warnings above.
        x_size = x.size()
        x = x.view(-1, x_size[-1])
        x = self.fc1(x)
        x = self.act(x)
        x = self.fc2(x)
        x = x.view(*x_size[:-1], self.out_dim)
        return x
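
If those view(-1, ...) calls are what produce the zeroIsPlaceHolder Reshape nodes, one workaround I am considering (untested, and the class name below is just for illustration) is to drop the reshapes entirely, since nn.Linear already operates on the last dimension of an N-D tensor:

import torch.nn as nn

class MlpNoReshape(nn.Module):
    """Same MLP, but without the flatten/restore views around the linear layers."""

    def __init__(self, in_dim, hidden_dim=None, out_dim=None, act_layer=nn.GELU):
        super().__init__()
        out_dim = out_dim or in_dim
        hidden_dim = hidden_dim or in_dim
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        # nn.Linear broadcasts over all leading dimensions and is applied
        # to the last one, so no explicit view(-1, C) / view(*leading, out)
        # round trip is needed; this should keep the Reshape nodes out of
        # the exported ONNX graph.
        return self.fc2(self.act(self.fc1(x)))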

Can anybody help? Thanks.