Hi, thank you in advance for your time! I am getting errors when trying to convert my ONNX model to a TensorRT engine.
The full logs are the following:
error | C:\source\builder\cudnnBuilderUtils.cpp (421) - Cuda Error in nvinfer1::cudnn::findFastestTactic: 700 (cudaEventElapsedTime)
error | C:\source\rtSafe\safeRuntime.cpp (32) - Cuda Error in nvinfer1::internal::DefaultAllocator::free: 700 (an illegal memory access was encountered)
It works perfectly fine with the previous version of the model (both files are attached), which has a different architecture where the original ResNet50 is used as a feature extractor. The new one contains some modifications based on the TResNet paper ([2003.13630] TResNet: High Performance GPU-Dedicated Architecture), but without any custom layers, only native PyTorch operations.
Environment
TensorRT Version: TensorRT-7.2.1.6
GPU Type: RTX 3060 Ti
Nvidia Driver Version: 510.10
CUDA Version: 11.6
Operating System + Version: Windows 11 Pro Insider Preview
Relevant Files
ONNX file of the old model which is converted fine
ONNX file of the new model which cannot be converted
Unfortunately, I cannot share the exact build commands or the steps to reproduce, as they are part of a company’s library. But the error happens inside the buildEngineWithConfig() method.
Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the below snippet
check_model.py
import onnx

filename = "your_model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command, e.g. trtexec --onnx=model.onnx --verbose. https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
Thanks!
It turned out that there was a bug during the ONNX export (it wasn’t detectable with onnx.checker.check_model(model), though). For some reason, the problem was fixed when I changed ceil_mode to False in nn.AvgPool2d.
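For context, ceil_mode changes the pooling output-size formula from floor to ceil, which can add an extra window that runs past the input edge; backends can disagree on how to handle that window. A minimal sketch of the formula (the parameter names mirror nn.AvgPool2d, the function itself is illustrative):

```python
import math

def pool_out_size(n, kernel, stride, padding=0, ceil_mode=False):
    """Output length along one dimension of a pooling op,
    per the formula in the PyTorch AvgPool2d documentation."""
    frac = (n + 2 * padding - kernel) / stride
    return (math.ceil(frac) if ceil_mode else math.floor(frac)) + 1

# With a 6-pixel input, a 3-wide kernel and stride 2, ceil_mode=True
# adds one extra window that starts inside the input but extends past it:
print(pool_out_size(6, kernel=3, stride=2, ceil_mode=False))  # 2
print(pool_out_size(6, kernel=3, stride=2, ceil_mode=True))   # 3
```

So flipping ceil_mode can silently change tensor shapes downstream, which may explain why the exported graph confused the TensorRT builder.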
I got the same problem as you on my model, using TRT 8.2 GA.
Error Code 4: Miscellaneous IShuffleLayer
I have no nn.AvgPool2d in my model, so the problem may be somewhere else.
My question is: how did you make the link between your error and the fact that the parameter in nn.AvgPool2d was the source of the problem? How did you deduce that there was a bug in the ONNX conversion? This could help me find my problem.
Hi!
Unfortunately, I didn’t get to the bottom of the problem back then. The error message only indicated that there was a problem with the batch axis. After checking my code and not noticing anything specific, I started googling which layers might cause it, and found reports of undefined behavior with nn.AvgPool2d and its ceil_mode during ONNX export (unrelated to this particular problem, but ONNX-related). So I tried disabling it just in case, and it turned out to be the reason, because it was the only change I had made.
Sorry if that is not much help for your problem.
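One generic way to turn that trial-and-error into a procedure is to bisect over the list of changes made since the last working model. A hypothetical sketch, where builds_ok stands in for "export the model with these changes applied and check that the TensorRT build succeeds" (names and the toy predicate are made up for illustration):

```python
def find_breaking_change(changes, builds_ok):
    """Bisect an ordered list of model modifications to find the first
    one that makes the build fail.

    Assumes the build succeeds with no changes applied, fails with all
    of them applied, and that a single change is responsible.
    """
    lo, hi = 0, len(changes)  # invariant: changes[:lo] builds, changes[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if builds_ok(changes[:mid]):
            lo = mid
        else:
            hi = mid
    return changes[hi - 1]

# Toy usage: pretend the build fails once "avgpool_ceil_mode" is applied.
changes = ["swap_activation", "avgpool_ceil_mode", "new_stem"]
culprit = find_breaking_change(
    changes, lambda applied: "avgpool_ceil_mode" not in applied
)
print(culprit)  # avgpool_ceil_mode
```

Each probe costs one export-and-build, so the offending change is found in O(log n) builds instead of one per change.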