Description
I'm using trtexec to build a TensorRT engine for XoFTR. The build fails with a CUDA runtime out-of-memory error while converting the ONNX model to an engine:
[05/08/2025-09:07:34] [W] [TRT] /fine_process/self_attn_m/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_m/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_m/mlp_1/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/self_attn_f/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_f/mlp/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:34] [W] [TRT] /fine_process/cross_attn_f/mlp_1/Reshape_1: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[05/08/2025-09:07:35] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/08/2025-09:07:35] [I] [TRT] Compiler backend is used during engine build.
[05/08/2025-09:08:19] [I] [TRT] Detected 2 inputs and 3 output network tensors.
[05/08/2025-09:08:21] [E] Error[1]: [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[05/08/2025-09:08:21] [W] [TRT] Requested amount of GPU memory (1283457024000 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2025-09:08:22] [E] Error[1]: IBuilder::buildSerializedNetwork: Error Code 1: Myelin ([tunable_graph.cpp:create:117] autotuning: User allocator error allocating 1283457024000-byte buffer)
[05/08/2025-09:08:22] [E] Engine could not be created from network
[05/08/2025-09:08:22] [E] Building engine failed
[05/08/2025-09:08:22] [E] Failed to create engine from model or file.
[05/08/2025-09:08:22] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100500] [b18] # /usr/src/tensorrt/bin/trtexec --onnx=/mnt/d/wsl/XoFTR-main/weights/xoftr0507.onnx --fp16 --saveEngine=/mnt/d/wsl/XoFTR-main/weights/xoftr_fp16.engine --memPoolSize=workspace:2048 --optShapes=image0:1x1x360x640,image1:1x1x512x640
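For what it's worth, the requested allocation (1,283,457,024,000 bytes, roughly 1.2 TB) is far beyond the 2 GiB workspace set via --memPoolSize, so this looks like a dynamic dimension blowing up during Myelin autotuning rather than an undersized pool. As far as I understand, giving only --optShapes already makes trtexec pin min = opt = max to those shapes. A rerun with --verbose (a standard trtexec flag) should show which layer requests the allocation:

/usr/src/tensorrt/bin/trtexec --onnx=/mnt/d/wsl/XoFTR-main/weights/xoftr0507.onnx --fp16 --saveEngine=/mnt/d/wsl/XoFTR-main/weights/xoftr_fp16.engine --memPoolSize=workspace:2048 --optShapes=image0:1x1x360x640,image1:1x1x512x640 --verbose

To double-check which ONNX nodes the warnings refer to, a small script like this could list them (a sketch using the onnx Python package; the model path and node names are taken from the log above):

import onnx

model = onnx.load("/mnt/d/wsl/XoFTR-main/weights/xoftr0507.onnx")
for node in model.graph.node:
    # List every Reshape node whose name matches the warned layers
    if node.op_type == "Reshape" and node.name.endswith("Reshape_1"):
        print(node.name, list(node.input))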
Environment
TensorRT Version: 10.5.0
GPU Type: NVIDIA GeForce RTX 4070 SUPER
CUDA Version: 10.7
CUDNN Version: 8.7.0
Operating System + Version: WSL2, Ubuntu 20.04
Python Version (if applicable): 3.8.20
PyTorch Version (if applicable): 2.0.1
Relevant Files
The Mlp module that the warned Reshape_1 nodes come from:
import torch.nn as nn


class Mlp(nn.Module):
    """Multi-Layer Perceptron (MLP)"""

    def __init__(self,
                 in_dim,
                 hidden_dim=None,
                 out_dim=None,
                 act_layer=nn.GELU):
        """
        Args:
            in_dim: input features dimension
            hidden_dim: hidden features dimension
            out_dim: output features dimension
            act_layer: activation function
        """
        super().__init__()
        out_dim = out_dim or in_dim
        hidden_dim = hidden_dim or in_dim
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        self.out_dim = out_dim

    def forward(self, x):
        x_size = x.size()
        x = x.view(-1, x_size[-1])  # flatten leading dims; exported as a Reshape
        x = self.fc1(x)
        x = self.act(x)
        x = self.fc2(x)
        x = x.view(*x_size[:-1], self.out_dim)  # exported as Reshape_1, the warned node
        return x
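I believe the two view calls above are what export to the warned Reshape nodes: with dynamic input shapes the flattened leading dimension becomes a runtime value, so TensorRT cannot prove that position 0 is nonzero. Since nn.Linear already operates on the last dimension of an input with arbitrary leading dimensions, a forward without any reshape should be numerically equivalent (a sketch I have not benchmarked):

def forward(self, x):
    # nn.Linear broadcasts over all leading dimensions, so the
    # flatten/unflatten round-trip is unnecessary; dropping it
    # removes both Reshape nodes from the exported ONNX graph.
    return self.fc2(self.act(self.fc1(x)))

Re-exporting the ONNX with this change should remove the IShuffleLayer warnings, and possibly the runaway allocation as well.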
Can anybody help? Thanks.