Build engine with 1GB GPU Cudnn_status_alloc_failed

Description

Hello, I am trying to build a YOLOv7 engine on a GPU with 1 GB of memory available. I know for a fact that building the engine consumes at most 600 MB of GPU memory, and once the engine is created the algorithm consumes 890 MB.

Building the engine on the same GPU model, but with 2 GB of memory instead of only 1 GB, works. (Unfortunately, the engine created on the machine with the 2 GB GPU does not deserialize on the machine with the 1 GB GPU; it throws a cuDNN initialization error.)

This is the error I get when I try to build the engine on an A16 GPU with 1 GB of memory:

root@ai-1gb:/home/ubuntu/TRT/yolov7/build# sudo ./yolov7 -s yolov7.wts yolov7.engine v7
Loading weights: yolov7.wts
Building engine, please wait for a while…
[07/26/2023-20:12:52] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[07/26/2023-20:12:52] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[07/26/2023-20:12:52] [E] [TRT] 1: [convolutionRunner.cpp::executeConv::458] Error Code 1: Cudnn (CUDNN_STATUS_ALLOC_FAILED)
[07/26/2023-20:12:52] [E] [TRT] 2: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Build engine successfully!
yolov7: /home/ubuntu/tensorrtx/yolov7/main.cpp:38: void serialize_engine(unsigned int, std::string&, std::string&, std::string&): Assertion `serialized_engine != nullptr' failed.
Aborted

Environment

TensorRT Version: 8.0.1.6
GPU Type: A16
Nvidia Driver Version: 525.85.05
CUDA Version: 11.3
CUDNN Version: 8.8.0
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.8.0
TensorFlow Version (if applicable):
PyTorch Version (if applicable):

Related Files

I am including the CMakeLists.txt and the main.cpp file.

CMakeLists.txt (1.4 KB)

main.cpp (7.6 KB)

Steps To Reproduce

Git clone the repository and follow the steps in this tutorial to reproduce the error.

The error happens when the command sudo ./yolov7 -s is executed.

Why can’t I initialize cuDNN on a machine with a 1 GB GPU?

Hi,

The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version), so they must be rebuilt for the target GPU if you want to run them on a different one.

In the most recent TensorRT release, 8.6.1, this can be achieved by following certain rules. For further information, please refer to the accompanying documentation.
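A minimal sketch of what opting in to that looks like with the 8.6 C++ API, assuming all target GPUs are Ampere or newer (the A16 is). kAMPERE_PLUS is the hardware-compatibility level defined for this in 8.6; double-check the enum against your headers:

```cpp
#include <NvInfer.h>

// Sketch (TensorRT 8.6+): request an engine that can run on any Ampere or
// newer GPU, at some cost in performance and engine size.
void make_portable(nvinfer1::IBuilderConfig* config) {
    config->setHardwareCompatibilityLevel(
        nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
}
```

Note that this only addresses portability across GPU models; an engine built this way still needs enough free device memory to deserialize and run on the 1 GB machine.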

Thank you.