Cuda Error in nvinfer1::cudnn::findFastestTactic: 700

Description

Hi, thank you in advance for your time! I get errors when I try to convert my ONNX model to a TensorRT engine.

Full logs are the following:

error | C:\source\builder\cudnnBuilderUtils.cpp (421) - Cuda Error in nvinfer1::cudnn::findFastestTactic: 700 (cudaEventElapsedTime)
error | C:\source\rtSafe\safeRuntime.cpp (32) - Cuda Error in nvinfer1::internal::DefaultAllocator::free: 700 (an illegal memory access was encountered)

Conversion works perfectly fine with the previous version of the model (both files are attached), which uses the original ResNet50 as a feature extractor. The new model has some modifications based on the TResNet paper ([2003.13630] TResNet: High Performance GPU-Dedicated Architecture), but it has no custom layers, only native PyTorch operations.

Environment

TensorRT Version: TensorRT-7.2.1.6
GPU Type: RTX3060Ti
Nvidia Driver Version: 510.10
CUDA Version: 11.6
Operating System + Version: Windows 11 Pro Insider Preview

Relevant Files

ONNX file of the old model which is converted fine

ONNX file of the new model which cannot be converted

Unfortunately, I cannot share the exact commands to build the repo or the steps to reproduce the issue, as they are part of a company’s library. However, the error happens inside the buildEngineWithConfig() method.

Hi,
Please share the ONNX model and the script, if you have not done so already, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

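import sys
import onnx

# Path to the ONNX model, passed as the first command-line argument
filename = sys.argv[1]
model = onnx.load(filename)
# Raises an exception if the model is structurally invalid
onnx.checker.check_model(model)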

  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

[10/11/2021-17:46:47] [V] [TRT] After vertical fusions: 108 layers
[10/11/2021-17:46:47] [V] [TRT] After dupe layer removal: 108 layers
[10/11/2021-17:46:47] [V] [TRT] After final dead-layer removal: 108 layers
[10/11/2021-17:46:47] [V] [TRT] After tensor merging: 108 layers
[10/11/2021-17:46:47] [V] [TRT] Eliminating concatenation Concat_182
[10/11/2021-17:46:47] [V] [TRT] Generating copy for 517 to 543
[10/11/2021-17:46:47] [V] [TRT] Generating copy for 542 to 543
[10/11/2021-17:46:47] [V] [TRT] After concat removal: 109 layers
[10/11/2021-17:46:47] [V] [TRT] Graph construction and optimization completed in 0.132946 seconds.
[10/11/2021-17:46:49] [V] [TRT] Constructing optimization profile number 0 [1/1].
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1002 time 7.33837
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 7.13882
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 7.13882
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1002 time 11.6412
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.251904
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.251904
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1002 time 1.27768
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 7.81786
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 1002 Time: 1.27768
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(1,128,49152,147456) -> Float(1,32,3072,9216,36864,147456) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.209536
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216
[10/11/2021-17:46:49] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.209536
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(3,384,1,147456) -> Float(3,96,1,9216,36864,147456) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.517632
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.517632
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(1,128,1:4,49152) -> Float(1,32,1:4,3072,12288,49152) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [V] [TRT] Tactic: 0 time 0.207608
[10/11/2021-17:46:49] [V] [TRT] Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216
[10/11/2021-17:46:49] [V] [TRT] Fastest Tactic: 0 Time: 0.207608
[10/11/2021-17:46:49] [V] [TRT] *************** Autotuning format combination: Float(1,128,49152:32,49152) -> Float(1,32,3072:32,3072,12288,49152) ***************
[10/11/2021-17:46:49] [V] [TRT] --------------- Timing Runner: Reshape_1 + Transpose_2 (Shuffle)
[10/11/2021-17:46:49] [E] [TRT] C:\source\builder\cudnnBuilderUtils.cpp (419) - Cuda Error in nvinfer1::cudnn::findFastestTactic: 700 (cudaEventElapsedTime)
[10/11/2021-17:46:49] [E] [TRT] C:\source\rtSafe\safeRuntime.cpp (32) - Cuda Error in nvinfer1::internal::DefaultAllocator::free: 700 (an illegal memory access was encountered)

Here is part of the log I got when running trtexec on the ONNX file I provided. The full log is in the attached text file, log.txt (163.8 KB).
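
As a side note on the "Tactic: 1 skipped. Scratch requested: ..." lines in the log above: they only report that a tactic needed more workspace than was available, and they are probably unrelated to error 700. A minimal sketch for pulling those numbers out of a saved verbose log is below; the function name `max_scratch_request` is hypothetical and the regular expression is based only on the log lines shown in this thread.

```python
import re

# Matches lines such as:
#   "Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216"
SKIP_RE = re.compile(r"Scratch requested: (\d+), available: (\d+)")


def max_scratch_request(log_text):
    """Return (largest requested bytes, available bytes), or None if no tactic was skipped."""
    matches = [(int(req), int(avail)) for req, avail in SKIP_RE.findall(log_text)]
    if not matches:
        return None
    # Tuples compare by their first element first, so this picks the largest request.
    return max(matches)


sample = "[V] [TRT] Tactic: 1 skipped. Scratch requested: 17694720, available: 16777216"
requested, available = max_scratch_request(sample)
print(f"requested {requested / 2**20:.3f} MiB, available {available / 2**20:.3f} MiB")
```

For the line quoted in the log, the request is slightly above the 16 MiB that was available, which is why TensorRT prints the "Increasing workspace size may increase performance" hint.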

Hi @voeykovroman,

We are looking into this issue. Meanwhile, we recommend that you try the latest TensorRT version, 8.2 EA, and let us know if you still face this issue.

Thank you.

Thanks! The error message in 8.2 EA is more informative:

 Error Code 4: Miscellaneous (IShuffleLayer Reshape_1: reshape changes volume. Reshaping [30,3,384,128] to [1,3,96,4,32,4].)

It turned out that there is a bug during ONNX conversion (however, it wasn’t detectable with onnx.checker.check_model(model)). For some reason, the problem was fixed when I changed ceil_mode to False in nn.AvgPool2d.
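
The "reshape changes volume" message can be checked by hand: a reshape is only valid when the source and target shapes contain the same number of elements. The sketch below does that arithmetic for the two shapes quoted in the error; the helper name `volume` is my own. That the mismatch factor equals the leading dimension of the input is only an observation, and it would be consistent with a batch dimension that was fixed to 1 in the exported reshape, but that is an assumption rather than a confirmed diagnosis.

```python
from math import prod


def volume(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


src = [30, 3, 384, 128]      # input shape reported by TensorRT
dst = [1, 3, 96, 4, 32, 4]   # target shape of Reshape_1

print(volume(src))                 # 4423680
print(volume(dst))                 # 147456
print(volume(src) // volume(dst))  # 30, the same as src[0]
```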

Thank you for your help!


Hello voeykovroman,

I got the same problem as you on my model, using TRT 8.2 GA.

Error Code 4: Miscellaneous IShuffleLayer

I have no nn.AvgPool2d in my model, so the problem may be somewhere else.

My question is: how did you make the link between your error and the nn.AvgPool2d parameter being the source of the problem? How did you deduce that there was a bug in the ONNX conversion? This could help me find my problem.

Thanks,

Hi!
Unfortunately, I didn’t get to the bottom of the problem back then. The error message just indicated that there was a problem with the batch axis. After I checked my code and didn’t notice anything specific, I started to google which layers might cause it and found reports of some undefined behavior with nn.AvgPool2d and its ceil_mode (unrelated to this particular problem, but related to ONNX). I tried disabling it just in case, and it turned out to be the cause (it was the only change I made).
Sorry if this is not much help for your problem.
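
For anyone wondering how ceil_mode can affect shapes at all: it changes how the pooled output size is rounded, so a later reshape that expects the floor-rounded size can end up with a different element count. Below is a rough sketch of the standard pooling output-size arithmetic as I understand it from the PyTorch documentation. The function name and the example sizes are hypothetical and are not taken from the model in this thread.

```python
import math


def avg_pool_out_size(in_size, kernel, stride=None, padding=0, ceil_mode=False):
    """Output length along one spatial axis for an average-pooling layer."""
    stride = kernel if stride is None else stride  # stride defaults to the kernel size
    span = in_size + 2 * padding - kernel
    if not ceil_mode:
        return span // stride + 1
    out = math.ceil(span / stride) + 1
    # With ceil_mode, the last window must still start inside the input or the left padding.
    if (out - 1) * stride >= in_size + padding:
        out -= 1
    return out


# Hypothetical sizes: an odd input length is where the two modes differ.
print(avg_pool_out_size(7, kernel=2, stride=2, ceil_mode=False))  # 3
print(avg_pool_out_size(7, kernel=2, stride=2, ceil_mode=True))   # 4
print(avg_pool_out_size(8, kernel=2, stride=2, ceil_mode=True))   # 4
```

When the input length is divisible by the stride, both modes give the same size, so a mismatch of this kind would only show up for certain input resolutions.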

Hello,

Thank you for your help. I will keep looking for a solution, and if I find something I will come back here :)
