Description
When running inference with the TRT model in C++, I got an error like this:
…/rtSafe/cuda/cudaConvolutionRunner.cpp (483) - Cudnn Error in executeConv: 3 (CUDNN_STATUS_BAD_PARAM)
FAILED_EXECUTION: std::exception
Moreover, the error only happens when I use fp32 mode; int8 mode works fine.
I am also sure the input is fp32, so the data type is correct.
I converted the model from pb to ONNX and then converted the ONNX model to a TRT engine.
Can you suggest any solutions? Thanks.
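(For reference, a minimal Python sketch like the one below can dump what the engine actually expects for each binding; `model.trt` is only a placeholder for the real engine file.)
```
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)

# Deserialize the engine (placeholder path) and print each binding's name,
# data type, and shape so the C++ inference code can be checked against
# what the engine actually expects.
with open('model.trt', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = 'input ' if engine.binding_is_input(i) else 'output'
    print(kind,
          engine.get_binding_name(i),
          engine.get_binding_dtype(i),   # DataType.FLOAT for an FP32 binding
          engine.get_binding_shape(i))   # -1 marks a dynamic dimension
```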
Environment
TensorRT Version : TensorRT-7.2.2.3
GPU Type : 1050 Ti
Nvidia Driver Version : 440.82
CUDA Version : 10.2
CUDNN Version : 8
Operating System + Version : Ubuntu 18.04
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
Exact steps/commands to build your repro
Exact steps/commands to run your repro
Full traceback of errors encountered
What's more, the TensorRT version can't be changed because of project requirements, and the model can't be shared because it is confidential.
detailed_logs.zip (104.4 KB)
Here are the detailed logs.
Hi,
Based on the logs, it looks like you were able to generate the TRT engine successfully. Please make sure you're handling the data types correctly in your inference script.
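For example (just a sketch, not taken from your script; `allocate_buffers` is a hypothetical helper), the host and device buffers can be sized from the dtype the engine actually reports instead of hard-coding `np.float32`:
```
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initializes a default CUDA context

def allocate_buffers(engine, max_batch_size=1):
    # Size every buffer from what the engine reports (dtype and per-sample
    # shape) rather than assuming FP32; `engine` is a deserialized ICudaEngine.
    host_bufs, dev_bufs, bindings = [], [], []
    for i in range(engine.num_bindings):
        dims = tuple(engine.get_binding_shape(i))[1:]    # drop the (dynamic) batch dim
        dtype = trt.nptype(engine.get_binding_dtype(i))  # engine's dtype, not a guess
        size = int(np.prod(dims)) * max_batch_size
        host_mem = cuda.pagelocked_empty(size, dtype)
        dev_mem = cuda.mem_alloc(host_mem.nbytes)
        host_bufs.append(host_mem)
        dev_bufs.append(dev_mem)
        bindings.append(int(dev_mem))
    return host_bufs, dev_bufs, bindings
```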
(Embedded GitHub issue: opened 31 May 2021, closed 25 Jan 2022; labels: Release: 7.x, triaged, Runtime: Error)
## Description
I'm using a dynamic-shape TRT engine generated from trtexec to do serving, but I met this error:
```
[TensorRT] ERROR: ../rtSafe/cuda/cudaConvolutionRunner.cpp (483) - Cudnn Error in executeConv: 3 (CUDNN_STATUS_BAD_PARAM)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
```
## Environment
**TensorRT Version**: 7.2.3.4
**NVIDIA GPU**: 2080 ti
**NVIDIA Driver Version**: 460.73.01
**CUDA Version**: 11.2
**CUDNN Version**: 8
**Operating System**: Ubuntu 18.04
**Python Version (if applicable)**: 3.6.9
**Tensorflow Version (if applicable)**: 2.4.0
**PyTorch Version (if applicable)**:
**Baremetal or Container (if so, version)**:
## Relevant Files
```
import tensorrt as trt
import pycuda.driver as cuda
import threading, numpy as np
import functools, operator

INITED = False
LOCK = threading.Lock()


def init():
    global TRT_LOGGER, runtime, cfx, INITED
    LOCK.acquire()
    if not INITED:
        TRT_LOGGER = trt.Logger(trt.Logger.INFO)
        trt.init_libnvinfer_plugins(TRT_LOGGER, '')
        runtime = trt.Runtime(TRT_LOGGER)
        cfx = cuda.Device(0).make_context()
        INITED = True
    LOCK.release()


class TRTInference:
    def __init__(self, trt_engine_path, max_batch_size=1):
        init()
        stream = cuda.Stream()
        # deserialize engine
        with open(trt_engine_path, 'rb') as f:
            buf = f.read()
            engine = runtime.deserialize_cuda_engine(buf)
        context = engine.create_execution_context()
        # prepare buffer
        host_inputs = []
        cuda_inputs = []
        host_outputs = []
        cuda_outputs = []
        bindings = []
        self.max_batch_size = max_batch_size
        for binding in engine:
            dim = engine.get_binding_shape(binding)[1:]
            size = functools.reduce(operator.mul, dim) * self.max_batch_size
            host_mem = cuda.pagelocked_empty(size, np.float32)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)
            bindings.append(int(cuda_mem))
            if engine.binding_is_input(binding):
                self.input_dim = dim
                self.input_unit_size = functools.reduce(operator.mul, dim)
                host_inputs.append(host_mem)
                cuda_inputs.append(cuda_mem)
            else:
                self.output_dim = dim
                self.output_unit_size = functools.reduce(operator.mul, dim)
                host_outputs.append(host_mem)
                cuda_outputs.append(cuda_mem)
        # store
        self.stream = stream
        self.context = context
        self.engine = engine
        self.host_inputs = host_inputs
        self.cuda_inputs = cuda_inputs
        self.host_outputs = host_outputs
        self.cuda_outputs = cuda_outputs
        self.bindings = bindings

    def infer(self, data):
        threading.Thread.__init__(self)
        batch_size = data.shape[0]
        data = np.ravel(data)
        cfx.push()
        # restore
        stream = self.stream
        context = self.context
        context.set_binding_shape(0, (batch_size, *self.input_dim))
        host_inputs = self.host_inputs
        cuda_inputs = self.cuda_inputs
        host_outputs = self.host_outputs
        cuda_outputs = self.cuda_outputs
        bindings = self.bindings
        # read image
        np.copyto(host_inputs[0][:batch_size * self.input_unit_size], data)
        # inference
        cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
        context.execute_async(bindings=bindings, stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
        stream.synchronize()
        cfx.pop()
        return np.reshape(host_outputs[0][:batch_size * self.output_unit_size],
                          (batch_size, *self.output_dim))

    def destroy(self):
        cfx.pop()
```
Engine model: https://drive.google.com/file/d/1es2HktMGl4_murFvmRjE6OG1ki_Tm_dl/view?usp=sharing
## Steps To Reproduce
1. Use the model I provided
2. Use the `TRTInference` class above
3. Run the following:
```
for i in range(1, 65):
model.infer(np.ones((i, 260, 260, 15), dtype=np.float32))
```
You will see that some batch sizes work well but others do not, which is very weird. For the engine I generated, the min, opt, and max batch sizes are 1, 4, and 64 respectively. I also tried different combinations of batch sizes, but the behavior seems random: there is always some batch size that fails. Very strange!
I also tried some other model structures, but got the same results.
However, if you use `trtexec --loadEngine` and set the shape to a fixed one, all batch sizes work.
One side note: this model is from tensorflow -> onnx -> trt
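One thing worth double-checking (an assumption on my part, not something confirmed above): the `infer` method calls `execute_async`, but a dynamic-shape, explicit-batch engine is driven through `execute_async_v2` once every input binding shape has been set. A minimal sketch of that call pattern, reusing buffers prepared as in `TRTInference` (`infer_dynamic` is just an illustrative name, and the output is assumed to be binding 1):
```
import numpy as np
import pycuda.driver as cuda

def infer_dynamic(context, bindings, host_in, dev_in, host_out, dev_out,
                  stream, data):
    # Resolve the dynamic batch dimension before launching the kernels.
    batch = data.shape[0]
    context.set_binding_shape(0, (batch, *data.shape[1:]))  # input is binding 0
    assert context.all_binding_shapes_specified

    np.copyto(host_in[:data.size], data.ravel())
    cuda.memcpy_htod_async(dev_in, host_in, stream)
    # execute_async_v2 is the explicit-batch call; plain execute_async takes a
    # batch_size argument that only applies to implicit-batch engines.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, dev_out, stream)
    stream.synchronize()

    out_shape = tuple(context.get_binding_shape(1))  # shape resolved for this batch
    return np.reshape(host_out[:int(np.prod(out_shape))], out_shape)
```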
For your reference, please refer to the following samples.
Thank you.