How can I solve an OutOfMemory error other than switching to a GPU with more memory?

Description

I have a model that I want to optimize using trtexec. With batch size 2 it builds normally, but with batch size 16 an out-of-memory error occurs.

Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[02/15/2023-06:20:02] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[02/15/2023-06:20:02] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[02/15/2023-06:20:02] [W] [TRT] Requested amount of GPU memory (2147483648 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[02/15/2023-06:20:02] [W] [TRT] Skipping tactic 2 due to insufficient memory on requested size of 2147483648 detected for tactic 0x0000000000000000.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[02/15/2023-06:20:08] [E] Error[1]: [convolutionRunner.cpp::executeConv::465] Error Code 1: Cudnn (CUDNN_STATUS_ALLOC_FAILED)
[02/15/2023-06:20:08] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[02/15/2023-06:20:08] [E] Engine could not be created from network
[02/15/2023-06:20:08] [E] Building engine failed
[02/15/2023-06:20:08] [E] Failed to create engine from model or file.
[02/15/2023-06:20:08] [E] Engine set up failed
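The "Try decreasing the workspace size" hint above refers to the builder workspace pool, which can also be capped from the trtexec command line. A minimal sketch, assuming a model file named model.onnx (the paths and the 4096 MiB value are placeholders, not from the original thread):

```shell
# Cap the builder workspace so TensorRT only considers tactics whose
# scratch memory fits within the limit (value is in MiB for --workspace).
trtexec --onnx=model.onnx \
        --saveEngine=model.plan \
        --workspace=4096
```

Note that a smaller workspace avoids the allocation failure but may force TensorRT to skip faster tactics, so the resulting engine can be slower.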

Other than changing to a GPU with larger memory, what can I do to deal with this condition?

Environment

nvidia docker container 22.12

Could you please share the trtexec command used and the complete verbose logs for better debugging?

Here is the terminal log output:

https://cloud.tsinghua.edu.cn/f/891f5f2b1d9d47a481d7/?dl=1

And some info shown in terminal:

[02/15/2023-10:12:35] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: autotuning: CUDA error 2 allocating 2148533757-byte buffer: out of memory
[02/15/2023-10:12:36] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_15569 + (Unnamed Layer* 5865) [Shuffle].../output_blocks.5/output_blocks.5.1/transformer_blocks.0/Reshape_6 + /output_blocks.5/output_blocks.5.1/transformer_blocks.0/Transpose_3 + /output_blocks.5/output_blocks.5.1/transformer_blocks.0/Reshape_7]}.)
[02/15/2023-10:12:36] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[02/15/2023-10:12:36] [E] Engine could not be created from network
[02/15/2023-10:12:36] [E] Building engine failed
[02/15/2023-10:12:36] [E] Failed to create engine from model or file.
[02/15/2023-10:12:36] [E] Engine set up failed

Hope it helps.

Please ensure that GPU memory is available and that no other applications are consuming the GPU memory.

I can confirm that GPU memory is available and empty before running trtexec, but the error is still raised. I am running on a 24 GB RTX 3090.
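A quick way to verify this claim just before launching the build is to query the card with nvidia-smi (these query fields are standard):

```shell
# Print per-GPU used and free memory; run immediately before trtexec
# to confirm nothing else is holding memory on the card.
nvidia-smi --query-gpu=index,name,memory.used,memory.free --format=csv
```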

Please share with us the ONNX model for better debugging.

Thank you.

Here is the onnx file: https://cloud.tsinghua.edu.cn/f/4f0a921584564e45be6d/?dl=1

Sorry for the delay, the network is too slow and the model is large.

Hi,

We are unable to reproduce the error, and the model looks fine. But the model is huge.
As the warning above indicates, for some reason TensorRT is unable to allocate the required memory. Please make sure enough GPU memory is available (and make sure the GPUs are visible inside the container).

Also, please try allowing more workspace memory using the trtexec --workspace option.
You can also try decreasing the batch size.

Thank you.
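If the ONNX model was exported with a dynamic batch dimension, the batch-size reduction suggested above can be done at build time through an optimization profile on the trtexec command line. A sketch, where "input" and the dimensions are placeholders for the model's actual input name and shape:

```shell
# Build with a smaller maximum batch so the largest tactics the builder
# must support (and their scratch buffers) shrink accordingly.
trtexec --onnx=model.onnx \
        --minShapes=input:1x4x64x64 \
        --optShapes=input:2x4x64x64 \
        --maxShapes=input:2x4x64x64 \
        --saveEngine=model.plan
```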

Do you mean trtexec can run with multiple GPUs? I have made only one GPU visible to trtexec. Is this the reason?

Yes, TensorRT can run on multiple GPUs.
Please make more GPU memory available for the TensorRT container and try again.
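One way to do this, assuming Docker 19.03+ with the NVIDIA runtime (the container tag matches the 22.12 release mentioned earlier in the thread; the model path is a placeholder):

```shell
# Expose all GPUs to the TensorRT container, then pick the build device
# inside it with trtexec's --device flag (default is device 0).
docker run --gpus all -it nvcr.io/nvidia/tensorrt:22.12-py3
trtexec --onnx=model.onnx --device=0
```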

I tried, but trtexec only uses one GPU even when multiple GPUs are available.