Cuda Error in nvinfer1::cudnn::findFastestTactic: 700

Description

Hi, thank you in advance for your time! I am facing errors when trying to convert my ONNX model to a TensorRT engine.

Full logs are the following:

error | C:\source\builder\cudnnBuilderUtils.cpp (421) - Cuda Error in nvinfer1::cudnn::findFastestTactic: 700 (cudaEventElapsedTime)
error | C:\source\rtSafe\safeRuntime.cpp (32) - Cuda Error in nvinfer1::internal::DefaultAllocator::free: 700 (an illegal memory access was encountered)

It works perfectly fine with the previous version of the model (both files are attached), which uses a different architecture with the original ResNet50 as the feature extractor. The new model adds some modifications based on the TResNet paper ([2003.13630] TResNet: High Performance GPU-Dedicated Architecture), but it contains no custom layers, only native PyTorch operations.

Environment

TensorRT Version: TensorRT-7.2.1.6
GPU Type: RTX3060Ti
Nvidia Driver Version: 510.10
CUDA Version: 11.6
Operating System + Version: Windows 11 Pro Insider Preview

Relevant Files

ONNX file of the old model which is converted fine

ONNX file of the new model which cannot be converted

Unfortunately, I cannot share the exact build commands and steps to reproduce, as they are part of a company library. The error happens inside the buildEngineWithConfig() method.

Hi,
Could you share the ONNX model and the script, if not already shared, so that we can assist you better?
In the meantime, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

filename = sys.argv[1]  # pass the path to your ONNX model as an argument
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command:

https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

[10/11/2021-17:46:47] [V] [TRT] After vertical fusions: 108 layers
[10/11/2021-17:46:47] [V] [TRT] After dupe layer removal: 108 layers
[10/11/2021-17:46:47] [V] [TRT] After final dead-layer removal: 108 layers
[10/11/2021-17:46:47] [V] [TRT] After tensor merging: 108 layers
[10/11/2021-17:46:47] [V] [TRT] Eliminating concatenation Concat_182
[10/11/2021-17:46:47] [V] [TRT] Generating copy for 517 to 543
[10/11/2021-17:46:47] [V] [TRT] Generating copy for 542 to 543
[10/11/2021-17:46:47] [V] [TRT] After concat removal: 109 layers
[10/11/2021-17:46:47] [V] [TRT] Graph construction and optimization completed in 0.132946 seconds.
[10/11/2021-17:46:49] [V] [TRT] Constructing optimization profile number 0 [1/1].
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1002 time 7.33837
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 7.13882
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 7.13882
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1002 time 11.6412
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.251904
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.251904
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1002 time 1.27768
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 7.81786
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 1002 Time: 1.27768
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(1,128,49152,147456) -> Float(1,32,3072,9216,36864,147456) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.209536
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216
[10/11/2021-17:46:49] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.209536
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(3,384,1,147456) -> Float(3,96,1,9216,36864,147456) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.517632
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.517632
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(1,128,1:4,49152) -> Float(1,32,1:4,3072,12288,49152) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.207608
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.207608
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(1,128,49152:32,49152) -> Float(1,32,3072:32,3072,12288,49152) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [E] [TRT] C:\source\builder\cudnnBuilderUtils.cpp (419) - Cuda Error in nvinfer1::cudnn::findFastestTactic: 700 (cudaEventElapsedTime)
[10/11/2021-17:46:49] [E] [TRT] C:\source\rtSafe\safeRuntime.cpp (32) - Cuda Error in nvinfer1::internal::DefaultAllocator::free: 700 (an illegal memory access was encountered)

Here is a part of the log I got when running trtexec on the ONNX file I've provided. The full log is in the txt file log.txt (163.8 KB)

Hi @voeykovroman,

We are looking into this issue. In the meantime, we recommend trying the latest TensorRT version 8.2 EA and letting us know if you still face the issue.

Thank you.

Thanks! The error message in 8.2 EA is more informative:

 Error Code 4: Miscellaneous (IShuffleLayer Reshape_1: reshape changes volume. Reshaping [30,3,384,128] to [1,3,96,4,32,4].)
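The error is easy to verify by hand: the two shapes in the message do not describe the same number of elements. A quick check in plain Python (math.prod requires Python 3.8+):

```python
from math import prod

# Shapes taken from the TensorRT error message above
src = [30, 3, 384, 128]     # tensor being reshaped
dst = [1, 3, 96, 4, 32, 4]  # requested target shape

src_volume = prod(src)  # 4423680
dst_volume = prod(dst)  # 147456
print(src_volume, dst_volume, src_volume // dst_volume)
```

The volumes differ by exactly a factor of 30, the batch size, which suggests the exported Reshape baked in a batch dimension of 1.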

It turned out that there was a bug during ONNX conversion (which, however, was not detectable with onnx.checker.check_model(model)). For some reason, the problem was fixed when I changed ceil_mode to False in nn.AvgPool2d.
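For context on why ceil_mode matters: per the PyTorch docs, nn.AvgPool2d computes its output extent with floor by default and with ceil when ceil_mode=True (ignoring the edge-case clamp for windows starting in padding), so with input extents that do not divide evenly the two modes yield different output shapes, and any shape constant baked into the exported graph can go stale. A minimal sketch of the formula (the sizes here are hypothetical, not taken from the model):

```python
import math

def avgpool2d_out(size, k, s, pad=0, ceil_mode=False):
    # Output-extent formula used by nn.AvgPool2d (per the PyTorch docs)
    f = math.ceil if ceil_mode else math.floor
    return f((size + 2 * pad - k) / s) + 1

# With an odd input extent, ceil_mode changes the output size
print(avgpool2d_out(97, k=2, s=2, ceil_mode=True))   # 49
print(avgpool2d_out(97, k=2, s=2, ceil_mode=False))  # 48
```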

Thank you for your help!
