[E] [TRT] C:\source\rtSafe\safeRuntime.cpp (25) - Cuda Error in nvinfer1::internal::DefaultAllocator::allocate: 2 (out of memory)

Description

I hit an out-of-memory (OOM) error when using TensorRT 7.1.3 to generate an engine file. The network is a VNet model converted from PyTorch to ONNX. When I build the engine from the ONNX model with an input size of [1x1x96x176x176], the engine plan file is produced successfully (with config->setMaxWorkspaceSize(3_GiB)). But when I increase the depth dimension (the third dimension) of the input to [1x1x112x176x176], the out-of-memory error occurs. I tried increasing the workspace size with config->setMaxWorkspaceSize() to 5_GiB, 8_GiB, 10_GiB and 20_GiB, but the error still occurs, as shown below. It seems that setMaxWorkspaceSize() has no effect once the workspace is set larger than 3 GiB. I don't understand why changing the input size from [1x1x96x176x176] to [1x1x112x176x176] causes an OOM error. Does TensorRT restrict the input size for 3D convolutional neural networks?
Here is the error message:

&&&& RUNNING TensorRT.sample_mnist_api # C:\NVIDIA\TensorRT\TensorRT-7.1.3.4\samples\sampleMNISTAPI_Vnet\x64\Debug\sample_mnist_api_vnet3d.exe
[12/08/2020-09:54:09] [I] Building and running a GPU inference engine for MNIST API
[12/08/2020-09:54:10] [W] [TRT] Setting layouts of network and plugin input/output tensors to linear, as 3D operators are found and 3D non-linear IO formats are not supported, yet.
[12/08/2020-09:54:15] [E] [TRT] C:\source\rtSafe\safeRuntime.cpp (25) - Cuda Error in nvinfer1::internal::DefaultAllocator::allocate: 2 (out of memory)
[12/08/2020-09:54:15] [W] [TRT] GPU memory allocation error during getBestTactic: (Unnamed Layer* 115) [Convolution] + (Unnamed Layer* 118) [ElementWise] + (Unnamed Layer* 119) [Activation]
[12/08/2020-09:54:15] [E] [TRT] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[12/08/2020-09:54:15] [E] [TRT] C:\source\builder\tacticOptimizer.cpp (1715) - TRTInternal Error in nvinfer1::builder::`anonymous-namespace'::LeafCNode::computeCosts: 0 (Could not find any implementation for node (Unnamed Layer* 115) [Convolution] + (Unnamed Layer* 118) [ElementWise] + (Unnamed Layer* 119) [Activation].)
[12/08/2020-09:54:15] [E] [TRT] C:\source\builder\tacticOptimizer.cpp (1715) - TRTInternal Error in nvinfer1::builder::`anonymous-namespace'::LeafCNode::computeCosts: 0 (Could not find any implementation for node (Unnamed Layer* 115) [Convolution] + (Unnamed Layer* 118) [ElementWise] + (Unnamed Layer* 119) [Activation].)
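
For scale: in FP32, the [1x1x96x176x176] input occupies about 11.3 MiB and the [1x1x112x176x176] input about 13.2 MiB, an increase of 112/96 ≈ 1.17x, which is tiny next to the 24 GiB on the card. The log places the failure inside getBestTactic for a fused convolution layer, so the pressure more plausibly comes from intermediate 3D activations and per-tactic scratch memory, which also grow with depth. A minimal C++ sketch of the arithmetic, assuming FP32 tensors in NCDHW layout; the helper names and the 16-channel feature map are hypothetical illustrations, not values read from this VNet model:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Tensor shape in NCDHW order: batch, channels, depth, height, width.
using Dims5 = std::array<std::uint64_t, 5>;

// Bytes per element for FP32 tensors.
constexpr std::uint64_t kFp32Bytes = 4;

// Hypothetical channel count chosen only for illustration; it is NOT read
// from the VNet model discussed in this thread.
constexpr std::uint64_t kHypotheticalChannels = 16;

// Number of elements in an NCDHW tensor.
std::uint64_t elementCount(const Dims5& dims)
{
    std::uint64_t n = 1;
    for (std::uint64_t d : dims) {
        assert(d > 0 && "every dimension must be positive");
        n *= d;
    }
    return n;
}

// Bytes needed to store the tensor in FP32.
std::uint64_t fp32Bytes(const Dims5& dims)
{
    return elementCount(dims) * kFp32Bytes;
}

// Convert a byte count to MiB for display.
double toMiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

For example, toMiB(fp32Bytes({1, 1, 96, 176, 176})) is 11.34375 and toMiB(fp32Bytes({1, 1, 112, 176, 176})) is 13.234375, while one hypothetical full-resolution activation with kHypotheticalChannels channels grows from 181.5 MiB to 211.75 MiB. Every such buffer scales linearly with depth; the sketch only sizes tensors and says nothing about how much scratch memory a particular convolution tactic requests.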

Any reply is appreciated. Thanks.

Environment

TensorRT Version: 7.1.3
GPU Type: RTX 6000 (24GiB device memory)
Nvidia Driver Version: 451.48
CUDA Version: 11.0
CUDNN Version: 8.0.2
Operating System + Version: Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi @xiaojianli,
Currently TRT has limited 3D support.
Also, you are currently using a very old driver version, so we recommend trying the latest one.
If the issue persists, kindly share your model and script.
Thanks!

Hi @AakankshaS, sorry for the misspelling. The driver on my device is 451.48. I tried updating the driver to 456.81, and the same error still occurs. I also tested models with different input sizes: when the input is too large, the error occurs for the 3D convolution network, so I suspect TensorRT restricts the input size for 3D convolution.

Also, how should I decide the workspace size to pass to config->setMaxWorkspaceSize() when building an engine plan file? I increased the workspace size because TensorRT suggests doing so for this OOM error at the engine build stage, but changing it made no difference.
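
How to size the workspace is not fully settled in this thread. One point that may help: the workspace is scratch memory that layer tactics may use, and during the build it is drawn from the same device memory as the network's activations, so a value larger than the memory actually free cannot be fully used, which would be consistent with nothing changing beyond 3 GiB. Below is a minimal C++ sketch of a hypothetical capping heuristic; the function name, its parameters, and the fraction are illustrative assumptions, not an NVIDIA-recommended method:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical heuristic, not NVIDIA guidance: cap the requested builder
// workspace at a fraction of the device memory that is currently free.
// freeDeviceBytes could come from cudaMemGetInfo() before the build starts;
// the remaining memory is left for activations, weights and the CUDA context.
// The result is the byte count that would be passed to
// config->setMaxWorkspaceSize().
std::uint64_t chooseWorkspace(std::uint64_t requestedBytes,
                              std::uint64_t freeDeviceBytes,
                              double maxFractionOfFree)
{
    assert(maxFractionOfFree > 0.0 && maxFractionOfFree <= 1.0);
    const auto cap = static_cast<std::uint64_t>(
        static_cast<double>(freeDeviceBytes) * maxFractionOfFree);
    return std::min(requestedBytes, cap);
}
```

For instance, with 20 GiB free and a fraction of 0.5, a 3 GiB request is returned unchanged while a 20 GiB request is reduced to 10 GiB. Nothing in this thread establishes whether such a cap avoids the error for the [1x1x112x176x176] model.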

Hi @xiaojianli ,
Are you still facing the issue?

For setting the workspace size, please check the link below:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#build_engine_c
Alternatively, you can use the trtexec command with your model and specify the workspace size there.
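
When moving between the C++ API and trtexec, note that the units differ: IBuilderConfig::setMaxWorkspaceSize() takes bytes, while trtexec in TensorRT 7.x reads --workspace=N in MiB (confirm with trtexec --help on your install). A minimal sketch of the conversion, assuming that MiB convention; the _GiB/_MiB literals are standalone stand-ins for the ones the post appears to use from the TensorRT samples, and trtexecWorkspaceFlag is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standalone stand-ins for the _GiB / _MiB literals used in the post; the
// TensorRT samples ship similar helpers, but these are not copied from them.
constexpr std::uint64_t operator""_GiB(unsigned long long val) { return val << 30; }
constexpr std::uint64_t operator""_MiB(unsigned long long val) { return val << 20; }

// Assumption: trtexec in TensorRT 7.x reads --workspace=N with N in MiB,
// whereas IBuilderConfig::setMaxWorkspaceSize() expects a byte count.
// trtexecWorkspaceFlag is a hypothetical helper, not part of TensorRT.
std::string trtexecWorkspaceFlag(std::uint64_t workspaceBytes)
{
    // trtexec takes a whole number of MiB, so reject sizes that would be truncated.
    assert(workspaceBytes % 1_MiB == 0 && "workspace must be a whole number of MiB");
    return "--workspace=" + std::to_string(workspaceBytes / 1_MiB);
}

static_assert(3_GiB == 3221225472ULL, "3 GiB expressed in bytes");
```

So the 3_GiB setting that worked in C++ corresponds to --workspace=3072 on the trtexec command line (passed together with --onnx=<your model file>), and 20_GiB corresponds to --workspace=20480.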

Thanks!