I have some PyTorch neural networks that I convert to TensorRT .plan files. I successfully deployed a model to Triton Inference Server, then wrote a C++ program that runs inference via an HTTP request.
The problem is that I can successfully convert and run some models such as YOLOv5, but when I try the same thing with a very simple model, I get an error on the Triton Server side:
E0218 09:59:10.719600 1 logging.cc:43] …/rtSafe/cuda/cudaConvolutionRunner.cpp (483) - Cudnn Error in executeConv: 3 (CUDNN_STATUS_BAD_PARAM)
E0218 09:59:10.724232 1 logging.cc:43] FAILED_EXECUTION: std:exception
I converted the model multiple times and found that the error does not occur every time: it occurs most of the time, but inference sometimes succeeds.
The model visualization, the model ONNX file, and the full Triton Server log are attached below.
TensorRT Version: 220.127.116.11
GPU Type: NVIDIA Corporation Device 1f82 (rev a1), TU117 [GeForce GTX 1650]
Nvidia Driver Version: 470.57.02
CUDA Version: 11.1 (nvidia-smi reports 11.4)
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): not used
PyTorch Version (if applicable): 1.8.0+cu111
Python ONNX library version: 1.6.0
Python onnxruntime-gpu library version: 1.4.0
Triton Server: nvcr.io/nvidia/tritonserver:20.10-py3
test.onnx (804 Bytes)
full triton server log:
full_log.txt (23.4 KB)
- Define a model in PyTorch like the one above, in the environment above. I did not train it; I just used the initial weights and exported it to an ONNX model.
- Alternatively, you can directly use the ONNX model I attached here.
- Use the official trtexec binary in TensorRT-18.104.22.168, or write Python code to convert the ONNX model to a TensorRT plan.
- Deploy the model on Triton Server 20.10.
- Write C++ code that sends an HTTP inference request to the Triton server.
Can anyone help me solve this problem? Thank you.