Yolov5 Engine Inference error



TensorRT Version: 8.0.3
GPU Type: dGPU
Nvidia Driver Version: 470
CUDA Version: 11.4
CUDNN Version: 8.2.1
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:21.09-py3

Steps To Reproduce

  1. Generated a TensorRT engine in Python using BlueMirrors/Yolov5-TensorRT. Inference with the TensorRT Python API gives correct results.
  2. Deserialized that engine in C++. Deserialization succeeds without errors.
  3. Running inference on the deserialized engine in C++ throws the error shown under "Error when running inference".

C++ code to deserialize the Python-generated TensorRT engine:
CMakeLists.txt (1003 Bytes)
inference.cpp (9.2 KB)
logging.h (16.4 KB)
utils.hpp (4.8 KB)

Error when running inference:

[10/10/2021-10:52:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +328, GPU +0, now: CPU 363, GPU 204 (MiB)
[10/10/2021-10:52:04] [I] [TRT] Loaded engine size: 24 MB
[10/10/2021-10:52:04] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 363 MiB, GPU 204 MiB
[10/10/2021-10:52:05] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +498, GPU +214, now: CPU 869, GPU 436 (MiB)
[10/10/2021-10:52:05] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +169, GPU +204, now: CPU 1038, GPU 640 (MiB)
[10/10/2021-10:52:05] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 855, GPU 422 (MiB)
[10/10/2021-10:52:05] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 855 MiB, GPU 422 MiB
deserialized engine successfully.
[10/10/2021-10:52:05] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 830 MiB, GPU 422 MiB
[10/10/2021-10:52:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +184, GPU +210, now: CPU 1014, GPU 632 (MiB)
[10/10/2021-10:52:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1014, GPU 640 (MiB)
[10/10/2021-10:52:06] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1014 MiB, GPU 668 MiB
[10/10/2021-10:52:06] [E] [TRT] 1: [reformat.cpp::executeCutensor::384] Error Code 1: CuTensor (Internal cuTensor permutate execute failed)
Cuda failure: 700
Aborted (core dumped)
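
For context, CUDA error 700 is cudaErrorIllegalAddress, meaning a kernel touched device memory it should not have. One possible cause (an assumption, not something the log above confirms) is a device buffer that is smaller than the binding the engine expects. The sketch below shows a size check that could be run before inference. It is plain C++ with no TensorRT dependency; the helper names bindingBytes and bufferLargeEnough are hypothetical and not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: number of bytes a binding with the given dimensions needs.
// Throws if a dimension is negative, since TensorRT reports an unresolved
// dynamic dimension as -1 and its size is not known yet.
std::size_t bindingBytes(const std::vector<std::int64_t>& dims, std::size_t elemSize) {
    std::size_t count = 1;
    for (std::int64_t d : dims) {
        if (d < 0) {
            throw std::invalid_argument("dynamic dimension must be resolved first");
        }
        count *= static_cast<std::size_t>(d);
    }
    return count * elemSize;
}

// Hypothetical helper: true if an allocation of allocatedBytes can hold the binding.
bool bufferLargeEnough(std::size_t allocatedBytes,
                       const std::vector<std::int64_t>& dims,
                       std::size_t elemSize) {
    return allocatedBytes >= bindingBytes(dims, elemSize);
}
```

As an illustration, a float32 input of shape 1x3x640x640 (an assumed shape, not read from the shared engine) needs 4,915,200 bytes. In real code the dimensions would come from the engine itself, for example via ICudaEngine::getBindingDimensions in TensorRT 8.0, rather than being hard-coded.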

Could you try running your model with the trtexec command and share the --verbose log if the issue persists?

You can refer to the link below for the list of supported operators. If any operator is not supported, you will need to create a custom plugin for that operation.

Also, please share your model and script, if you have not already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:


I am trying to run an engine generated with the Python API in C++. It works fine in Python but throws the above error in C++.
Sharing the model. The C++ code is already shared above. Please have a look if you can help.
yolov5s.engine (23.9 MB)

Any update on this topic?