TensorRT Build Error: Tensor Volume Exceeds 2^31 Limit for Large Fixed Shapes (Super-Resolution/Restoration Models)

Environment:

  • TensorRT Version: 10.9.0.34
  • GPU Type: NVIDIA GeForce RTX 3090 (24GB VRAM)
  • Nvidia Driver Version: 572.83
  • CUDA Version: 12.8.1
  • CUDNN Version: 9.8.0
  • Operating System: Windows 11 24H2
  • Python Version (if used for ONNX export): 3.10
  • PyTorch Version (if used for ONNX export): 2.6.0
  • ONNX Opset Version: 20
  • trtexec command (example causing failure for 4x SR):

trtexec --onnx=esc_real_x4_GAN.onnx --saveEngine=ESC_REAL_GAN_tile_2560.engine --shapes=input:1x3x2560x2560 --memPoolSize=workspace:16384M --fp16 --inputIOFormats=fp16:chw --skipInference --verbose

  • trtexec command (example causing failure for 1x Restoration):

trtexec --onnx=scunet_color_real_gan.onnx --saveEngine=scunet_color_real_gan_tile_8192.engine --shapes=input:1x3x8192x8192 --memPoolSize=workspace:16384M --fp16 --inputIOFormats=fp16:chw --verbose

Description:

When attempting to build TensorRT engines with large, fixed input shapes (--shapes) for various ONNX models (primarily super-resolution models such as ESC Real-GAN, RealESRGAN, and BSRGAN, and restoration models such as SCUNet), the trtexec build fails with an "API Usage Error" stating that a tensor volume exceeds the 32-bit signed integer limit (2^31 = 2,147,483,648 elements).

This limitation prevents the creation of engines optimized for larger tile sizes during inference, even when sufficient VRAM is available on the target GPU (such as an RTX 3090/4090). It forces the use of smaller tiles, which increases the overhead of the tiling/stitching logic and can degrade quality at tile boundaries for high-resolution images (e.g., 4K, 8K).

Observed Behavior:

The build fails with errors similar to these:

  • For 4x Super-Resolution (e.g., ESC Real-GAN with input 1x3x2560x2560):

[E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (IResizeLayer /to_img/to_img.3/Resize: /to_img/to_img.3/Resize_output_0: tensor volume exceeds 2147483648, dimensions are [1,64,10240,10240])

Analysis: The intermediate tensor after 4x upscaling (2560*4=10240) with 64 internal channels exceeds the limit (1 * 64 * 10240 * 10240 > 2^31). The largest working fixed shape found was 1x3x1448x1448.
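
A quick sanity check in plain Python (an illustrative calculation, assuming the widest intermediate tensor is the reported 64-channel map at 4x the input resolution) reproduces both the overflow and the 1448 limit:

# Rough check of the 2^31-element limit for the 4x SR case (illustrative only)
LIMIT = 2**31        # 2,147,483,648 elements
CHANNELS = 64        # internal channel count reported in the error
SCALE = 4            # super-resolution factor

def volume(tile):
    return 1 * CHANNELS * (tile * SCALE) ** 2

print(volume(2560))                 # 6,710,886,400 -> well above the limit
max_tile = int((LIMIT / CHANNELS) ** 0.5) // SCALE
print(max_tile, volume(max_tile))   # 1448  2,147,024,896 -> just under 2^31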

  • For 1x Restoration (e.g., SCUNet with input 1x3x8192x8192):

[E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (IConvolutionLayer /m_head/m_head.0/Conv: /m_head/m_head.0/Conv_output_0: tensor volume exceeds 2147483648, dimensions are [1,64,8192,8192])

Analysis: The intermediate tensor after the initial convolution (maintaining input spatial dimensions) with 64 internal channels exceeds the limit (1 * 64 * 8192 * 8192 > 2^31).
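
The same arithmetic (again illustrative, assuming a 64-channel head output at the input resolution) gives the shape that was tried next:

# Rough check for the 1x restoration case (illustrative only)
LIMIT = 2**31
CHANNELS = 64

print(1 * CHANNELS * 8192 * 8192)         # 4,294,967,296 -> exactly twice the limit
max_side = int((LIMIT / CHANNELS) ** 0.5)
print(max_side)                           # 5792, the shape attempted below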

  • Internal Padding Issue (SCUNet): When building with the shape calculated from the previous error (1x3x5792x5792), the build still failed, revealing internal padding of the spatial dimensions:

[E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (IConvolutionLayer /m_head/m_head.0/Conv: /m_head/m_head.0/Conv_output_0: tensor volume exceeds 2147483648, dimensions are [1,64,5824,5824])

Analysis: It appears that TensorRT (or the model structure) pads the input dimension 5792 up to the next multiple of 64 (5824) before the convolution, so the volume limit is exceeded again. The largest working fixed shape that accounts for this padding was 1x3x5760x5760.
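
Accounting for that padding, a small illustrative calculation (assuming the spatial size is rounded up to a multiple of 64 before the 64-channel head convolution) lands on the same 5760 figure:

# Largest tile that survives padding to a multiple of 64 (illustrative only)
LIMIT = 2**31
CHANNELS = 64
PAD = 64                                   # observed padding granularity (5792 -> 5824)

padded = -(-5792 // PAD) * PAD             # round 5792 up to the next multiple of 64
print(padded, CHANNELS * padded ** 2)      # 5824  2,170,814,464 -> over the limit

max_side = int((LIMIT / CHANNELS) ** 0.5) // PAD * PAD
print(max_side, CHANNELS * max_side ** 2)  # 5760  2,123,366,400 -> fits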

Expected Behavior:

Ideally, the TensorRT engine build process for fixed shapes should succeed as long as the theoretical VRAM requirements for the build process itself (weights, workspace, activations for optimization) are met. The limitation should primarily be the available VRAM at runtime for the chosen inference shape, not a hardcoded 32-bit volume limit for intermediate tensors during the build phase.

Using dynamic shapes (--minShapes, --optShapes, --maxShapes) does not bypass this issue either: the builder still validates intermediate tensor volumes against --maxShapes at build time and fails if the maximum shape would produce an intermediate tensor above the volume limit (see the sketch below).
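
For completeness, this is roughly how the dynamic-shape attempt looks through the TensorRT Python API (a minimal sketch, not the exact script used; the ONNX file name and the input tensor name "input" are taken from the trtexec commands above). The builder still rejects the build when the max profile shape implies an oversized intermediate tensor:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)
parser.parse_from_file("esc_real_x4_GAN.onnx")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 16 << 30)  # 16 GiB

# Dynamic profile: the build is validated against the max shape, so a max of
# 1x3x2560x2560 hits the same 2^31 volume error as the fixed-shape build.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 64, 64), (1, 3, 1448, 1448), (1, 3, 2560, 2560))
config.add_optimization_profile(profile)

engine = builder.build_serialized_network(network, config)  # returns None on failure
print("build succeeded" if engine else "build failed (Error Code 4 logged above)")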

Question:

  1. Is this 2^31 tensor volume limit a fundamental and unavoidable limitation in the current TensorRT architecture for specific layers (like Resize, Convolution) during the build phase for fixed or max dynamic shapes?
  2. Are there any known workarounds or builder flags (beyond the standard ones) that could potentially mitigate this issue and allow building engines for larger fixed shapes, assuming sufficient VRAM is available? (Other than the obvious workaround of using smaller fixed shapes and tiling during inference).
  3. Are there any plans to address this limitation in future TensorRT versions, perhaps by introducing 64-bit indexing for critical operations or tensors?

This limitation significantly impacts the practical application of TensorRT for high-resolution image processing tasks where larger, optimized inference tiles are desirable. Any insights or potential solutions would be greatly appreciated.

Thank you!

Can anyone comment on this? It would be really helpful to be able to define larger input dimensions when building TensorRT engines.