Onnxruntime error

Hi,
I trained a Faster R-CNN ResNet-50 object detection model on a custom dataset with PyTorch and converted it to an ONNX model.
When I ran onnxruntime against the ONNX model, it loaded the model successfully. But when I ran a prediction on an image with session.run(), it returned the following error:

[E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=harshatejas ; expr=cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size);
2021-07-02 16:59:04.808417944 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Conv node. Name:‘Conv_441’ Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size)
terminate called after throwing an instance of ‘onnxruntime::OnnxRuntimeException’
what(): /home/onnxruntime/onnxruntime-py36/onnxruntime/core/providers/cuda/cuda_call.cc:121 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /home/onnxruntime/onnxruntime-py36/onnxruntime/core/providers/cuda/cuda_call.cc:115 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 702: the launch timed out and was terminated ; GPU=0 ; hostname=harshatejas ; expr=cudaEventDestroy(read_event_);

Aborted (core dumped)

I changed the runtime to run on the CPU with session.set_providers(['CPUExecutionProvider']).
BOOM!! The code ran successfully and predicted perfectly on the image, but the prediction took a very long time to run. I want the prediction to run on the GPU so that it takes less time.
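For reference, the provider switch described above can be sketched roughly as follows. This is only an illustration, not the code from predict_onnx.py: `pick_providers` is a hypothetical helper, and the `model.onnx` path is assumed. It orders the execution providers so that CUDA is tried first and the CPU remains as a fallback.

```python
def pick_providers(available, prefer_gpu=True):
    """Hypothetical helper: CUDA first (if wanted and available), CPU as fallback."""
    providers = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    # The CPU provider is always kept last so the session can still run.
    providers.append("CPUExecutionProvider")
    return providers


try:
    import onnxruntime as ort
except ImportError:
    ort = None  # onnxruntime is not installed in this environment

if ort is not None:
    available = ort.get_available_providers()
    print("available providers:", available)
    print("GPU first:", pick_providers(available))
    print("CPU only: ", pick_providers(available, prefer_gpu=False))
    # Assumed model path; uncomment once model.onnx is present:
    # session = ort.InferenceSession("model.onnx", providers=pick_providers(available))
    # print(session.get_providers())  # shows which providers the session actually uses
```

Passing `prefer_gpu=False` gives the same CPU-only setup as the session.set_providers(['CPUExecutionProvider']) call above.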

Thank You.

Hi,

How did you install the onnxruntime package?
Did you use the prebuilt package from the link below?
https://www.elinux.org/Jetson_Zoo#ONNX_Runtime

Also, could you validate the model in a desktop onnxruntime environment?
This would help verify whether the issue is specific to the Jetson device.

Thanks.

Hey,

Thanks for the reply. Yes, I installed the onnxruntime package from the Jetson Zoo (eLinux.org).
The onnxruntime version is 1.8.0
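When comparing the Jetson and desktop environments, a small sketch like the one below can report the installed onnxruntime version, the device the build targets, and the available execution providers. `describe_runtime` is a hypothetical helper name, and it returns empty values if onnxruntime is not installed.

```python
def describe_runtime():
    """Collect the onnxruntime version, build device, and available providers."""
    try:
        import onnxruntime as ort
    except ImportError:
        # onnxruntime is missing: report that instead of raising
        return {"installed": False, "version": None, "device": None, "providers": []}
    return {
        "installed": True,
        "version": ort.__version__,
        "device": ort.get_device(),  # "GPU" for CUDA-enabled builds, otherwise "CPU"
        "providers": list(ort.get_available_providers()),
    }


info = describe_runtime()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running the same script on both machines makes it easy to see whether the two setups differ in version or in the providers they expose.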

The model ran successfully in a Colab onnxruntime environment with a GPU. Here's a screenshot of the result.

Hi,

We want to reproduce this error in our environment.
Could you share the model (model.onnx) with us?

Thanks.

Hi,

Here's a folder that contains model.onnx, create_onnx.py, predict_onnx.py, and the image:
https://drive.google.com/drive/folders/103hJwjMvpcT_F87jOw1F5gt3tMuZ4uny?usp=sharing

And this is my GitHub repo, where you can find the data, train.py, and the PyTorch model: harshatejas/pytorch_custom_object_detection (Training PyTorch Faster-RCNN on custom dataset)

Thank You.

Hi,

Thanks for sharing the source.
Confirmed that we can also reproduce the CUDA 702 error in our environment.

We are checking this internally.
Will get back to you later.

Thanks.

Hi,

The package is built by the ONNX team.
Could you file an issue on their GitHub for help?

Thanks.

Hi,

Is it possible to run a Faster R-CNN ONNX model on a Jetson Nano?

Thank you.