../rtSafe/cuda/cudaElementWiseRunner.cpp (149) - Cuda Error in execute: 9 (invalid configuration argument)

I get the following error when I run inference on an engine I generated after parsing an ONNX model:

../rtSafe/cuda/cudaElementWiseRunner.cpp (149) - Cuda Error in execute: 9 (invalid configuration argument)

According to CUDA’s documentation, this error occurs when:

cudaErrorInvalidConfiguration = 9
This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.

It’s probably because my layer is requesting too many resources, likely due to a bad computation of some size. As the source code for this part is not public, could you provide help on what it is trying to do so that I can better understand the root cause?
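
For reference, here is a minimal CUDA sketch of my own (an illustration, not TensorRT’s code) showing how an oversized launch configuration produces this exact error code:

// Illustration only: requesting more threads per block than the device
// supports makes the launch fail with cudaErrorInvalidConfiguration (9),
// the same code seen in the log above.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);

    // Deliberately exceed the per-block thread limit (typically 1024).
    noop<<<1, prop.maxThreadsPerBlock + 1>>>();
    cudaError_t err = cudaGetLastError();
    printf("launch status: %d (%s)\n", err, cudaGetErrorString(err));
    return 0;
}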

Hi,

Can you provide the following information so we can better help?
Provide details on the platform you are using (see the example commands after this list):
o Linux distro and version
o GPU type
o NVIDIA driver version
o CUDA version
o CUDNN version
o Python version [if using python]
o TensorFlow and PyTorch versions [if applicable]
o TensorRT version
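
Most of these can be gathered with standard commands, for example (paths and module names may differ on your setup):

nvidia-smi                                                 # GPU type and driver version
nvcc --version                                             # CUDA toolkit version
grep CUDNN_MAJOR -A 2 /usr/include/cudnn.h                 # cuDNN version (header path may vary)
python -c "import tensorrt; print(tensorrt.__version__)"   # TensorRT Python version
python -c "import torch; print(torch.__version__)"         # PyTorch version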

Also, if possible please share the script & model file to reproduce the issue.

Meanwhile, could you please try the “trtexec” command to test the model? “trtexec” is useful for benchmarking networks and makes it faster and easier to debug the issue. For example (assuming your ONNX file is named model.onnx):
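trtexec --onnx=model.onnx --verbose

(model.onnx is a placeholder for your model file; --verbose prints detailed per-layer logs, which often narrow down the failing layer.)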

https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#trtexec
https://github.com/NVIDIA/TensorRT/blob/release/6.0/samples/opensource/trtexec/README.md

Thanks