trtexec engine build fails with Cuda Runtime (out of memory)

Description

I'm using trtexec to build a TensorRT engine for XoFTR. The build from the ONNX model fails with a Cuda Runtime (out of memory) error.

[05/08/2025-09:07:34] [W] [TRT] /fine_process/self_attn_m/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_m/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_m/mlp_1/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/self_attn_f/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_f/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_f/mlp_1/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:35] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/08/2025-09:07:35] [I] [TRT] Compiler backend is used during engine build.
[05/08/2025-09:08:19] [I] [TRT] Detected 2 inputs and 3 output network tensors.
[05/08/2025-09:08:21] [E] Error[1]: [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[05/08/2025-09:08:21] [W] [TRT] Requested amount of GPU memory (1283457024000 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2025-09:08:22] [E] Error[1]: IBuilder::buildSerializedNetwork: Error Code 1: Myelin ([tunable_graph.cpp:create:117] autotuning: User allocator error allocating 1283457024000-byte buffer)
[05/08/2025-09:08:22] [E] Engine could not be created from network
[05/08/2025-09:08:22] [E] Building engine failed
[05/08/2025-09:08:22] [E] Failed to create engine from model or file.
[05/08/2025-09:08:22] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100500] [b18] # /usr/src/tensorrt/bin/trtexec --onnx=/mnt/d/wsl/XoFTR-main/weights/xoftr0507.onnx --fp16 --saveEngine=/mnt/d/wsl/XoFTR-main/weights/xoftr_fp16.engine --memPoolSize=workspace:2048 --optShapes=image0:1x1x360x640,image1:1x1x512x640

Environment

TensorRT Version: 10.5.0
GPU Type: NVIDIA GeForce RTX 4070 SUPER
CUDA Version: 10.7
CUDNN Version: 8.7.0
Operating System + Version: WSL2 Ubuntu 20.04
Python Version (if applicable): 3.8.20
PyTorch Version (if applicable): 2.0.1

Relevant Files

Code the warnings refer to:

import torch.nn as nn

class Mlp(nn.Module):
    """Multi-Layer Perceptron (MLP)"""

    def __init__(self,
                 in_dim,
                 hidden_dim=None,
                 out_dim=None,
                 act_layer=nn.GELU):
        """
        Args:
            in_dim: input features dimension
            hidden_dim: hidden features dimension
            out_dim: output features dimension
            act_layer: activation function
        """
        super().__init__()
        out_dim = out_dim or in_dim
        hidden_dim = hidden_dim or in_dim
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        self.out_dim = out_dim

    def forward(self, x):
        x_size = x.size()
        # Flattening to 2-D here and restoring the shape below exports as
        # ONNX Reshape nodes with placeholder dimensions -- the source of
        # the zeroIsPlaceHolder warnings above.
        x = x.view(-1, x_size[-1])
        x = self.fc1(x)
        x = self.act(x)
        x = self.fc2(x)
        x = x.view(*x_size[:-1], self.out_dim)
        return x
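For context, the zeroIsPlaceHolder warnings trace back to the two view calls in forward. Since nn.Linear already operates on the last dimension of an N-D tensor, the flatten/unflatten pair can be dropped entirely. A minimal sketch of a reshape-free variant (the class name MlpNoReshape is illustrative, not from the XoFTR code):

import torch
import torch.nn as nn

class MlpNoReshape(nn.Module):
    """MLP without the flatten/unflatten views, so no Reshape nodes
    (and no zeroIsPlaceHolder warnings) appear in the exported ONNX graph."""

    def __init__(self, in_dim, hidden_dim=None, out_dim=None, act_layer=nn.GELU):
        super().__init__()
        hidden_dim = hidden_dim or in_dim
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_dim, out_dim or in_dim)

    def forward(self, x):
        # nn.Linear applies to the last dim of any N-D input directly.
        return self.fc2(self.act(self.fc1(x)))

# Quick sanity check with hypothetical shapes:
# x = torch.randn(2, 4800, 256); MlpNoReshape(256)(x).shape -> (2, 4800, 256)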

Can anybody help? Thanks

Hi,
Could you please try the following suggestions:

  1. Add explicit optimization profiles covering your input shapes (min, opt, max shapes for each input), e.g.:

/usr/src/tensorrt/bin/trtexec --onnx=/mnt/d/wsl/XoFTR-main/weights/xoftr0507.onnx \
--fp16 --saveEngine=/mnt/d/wsl/XoFTR-main/weights/xoftr_fp16.engine \
--memPoolSize=workspace:2048 \
--minShapes=image0:1x1x360x640,image1:1x1x512x640 \
--optShapes=image0:1x1x360x640,image1:1x1x512x640 \
--maxShapes=image0:1x1x360x640,image1:1x1x512x640

  2. Investigate the reshape layers in your ONNX model that trigger the zeroIsPlaceHolder warnings (see the reshape-free Mlp sketch above). Fix those in your model definition or export if possible.
  3. Simplify the ONNX model using onnx-simplifier:
    python3 -m onnxsim xoftr0507.onnx xoftr0507_simplified.onnx
    Then try building the engine from the simplified model.
  4. Monitor GPU memory usage with nvidia-smi during the build to confirm how much memory is actually being consumed, e.g. with the command below.
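For point 4, one way to sample GPU memory once per second while trtexec runs, using standard nvidia-smi query flags:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1

Note that the failing allocation in your log (1283457024000 bytes, roughly 1.2 TB) far exceeds any workspace setting, so watching the actual usage should help confirm whether the problem is the reshape handling rather than the workspace size.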