Description
I am deploying a model. Everything works fine with TensorRT 8.0.1, CUDA 11.3, and an ONNX model exported with opset=11. Recently I upgraded to TensorRT 8.5.1.7 and CUDA 11.8 and re-exported the ONNX model with opset=16; inference efficiency improves a lot, but the TRT engine always gives wrong results.
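For reference, the re-export follows the standard torch.onnx.export path. Below is a minimal sketch of how the opset is selected; the toy model, input shape, and tensor names are placeholders, not the real ones from my network.

import torch

# Toy stand-in for the real network -- substitute the actual model and dummy input.
model = torch.nn.Linear(8, 4).eval()
dummy_input = torch.randn(1, 8)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=16,           # opset used for the TensorRT 8.5.1.7 deployment
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
)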
So I compared the outputs layer by layer with the Polygraphy tool, and the report shows that the outputs of the ScatterND ops do not match the outputs from ONNX Runtime.
The weird thing is that if I only compare the final output by running the command below, the comparison does not pass:
POLYGRAPHY_AUTOINSTALL_DEPS=1 \
polygraphy run model.onnx \
--onnxrt --trt \
--save-engine=model.plan \
--atol 1e-4 --rtol 1e-4 \
--verbose \
> result-run-FP32.txt 2>&1
But if I mark all layers as outputs and compare every layer's output, the comparison does pass on the final output, with only a few intermediate layers failing:
polygraphy run model.onnx \
--onnxrt --trt \
--save-engine=model_markall.plan \
--atol 1e-4 --rtol 1e-4 \
--verbose \
--onnx-outputs mark all \
--trt-outputs mark all \
> result-run-FP32-markall.txt 2>&1
To avoid marking every layer as an output, I then marked only some key layers, and finally narrowed down the layers that produce wrong outputs; they all happen to be ScatterND ops.
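As a side note, the ScatterND output tensor names used in the command below can be collected from the graph with the onnx Python package; a minimal sketch, assuming the model file is the one shared above:

import onnx

# Walk the exported graph and print every ScatterND node together with its
# output tensor names -- these are the names passed to --onnx-outputs / --trt-outputs.
model = onnx.load("model.onnx")
for node in model.graph.node:
    if node.op_type == "ScatterND":
        print(node.name, "->", list(node.output))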
Here's my command:
polygraphy run model.sim.onnx \
--onnxrt --trt \
--save-engine=model_sim_markmiddle.plan \
--atol 1e-3 --rtol 1e-3 \
--verbose \
--onnx-outputs \
'/Reshape_12_output_0' \
'/reg_x_layer.0/layer/layer.0/Gemm_output_0' \
'/reg_x_layer.0/layer/layer.1/Clip_output_0' \
'/reg_x_layer.0/layer/layer.2/Gemm_output_0' \
'/reg_x_layer.0/layer/layer.3/Clip_output_0' \
'/reg_x_layer.0/layer/layer.4/Gemm_output_0' \
'/reg_z_layer.0/layer/layer.0/Gemm_output_0' \
'/reg_z_layer.0/layer/layer.1/Clip_output_0' \
'/reg_z_layer.0/layer/layer.2/Gemm_output_0' \
'/reg_z_layer.0/layer/layer.3/Clip_output_0' \
'/reg_z_layer.0/layer/layer.4/Gemm_output_0' \
'/reg_vis_layer.0/layer/layer.0/Gemm_output_0' \
'/reg_vis_layer.0/layer/layer.1/Clip_output_0' \
'/reg_vis_layer.0/layer/layer.2/Gemm_output_0' \
'/reg_vis_layer.0/layer/layer.3/Clip_output_0' \
'/reg_vis_layer.0/layer/layer.4/Gemm_output_0' \
'/cls_layer.0/layer/layer.0/Gemm_output_0' \
'/cls_layer.0/layer/layer.1/Clip_output_0' \
'/cls_layer.0/layer/layer.2/Gemm_output_0' \
'/cls_layer.0/layer/layer.3/Clip_output_0' \
'/cls_layer.0/layer/layer.4/Gemm_output_0' \
'/Reshape_13_output_0' \
'/Reshape_16_output_0' \
'/Add_24_output_0' \
'/ScatterND_2_output_0' \
'/ScatterND_3_output_0' \
'/ScatterND_4_output_0' \
'/ScatterND_5_output_0' \
'/ScatterND_6_output_0' \
'2442' \
--trt-outputs \
'/Reshape_12_output_0' \
'/reg_x_layer.0/layer/layer.0/Gemm_output_0' \
'/reg_x_layer.0/layer/layer.1/Clip_output_0' \
'/reg_x_layer.0/layer/layer.2/Gemm_output_0' \
'/reg_x_layer.0/layer/layer.3/Clip_output_0' \
'/reg_x_layer.0/layer/layer.4/Gemm_output_0' \
'/reg_z_layer.0/layer/layer.0/Gemm_output_0' \
'/reg_z_layer.0/layer/layer.1/Clip_output_0' \
'/reg_z_layer.0/layer/layer.2/Gemm_output_0' \
'/reg_z_layer.0/layer/layer.3/Clip_output_0' \
'/reg_z_layer.0/layer/layer.4/Gemm_output_0' \
'/reg_vis_layer.0/layer/layer.0/Gemm_output_0' \
'/reg_vis_layer.0/layer/layer.1/Clip_output_0' \
'/reg_vis_layer.0/layer/layer.2/Gemm_output_0' \
'/reg_vis_layer.0/layer/layer.3/Clip_output_0' \
'/reg_vis_layer.0/layer/layer.4/Gemm_output_0' \
'/cls_layer.0/layer/layer.0/Gemm_output_0' \
'/cls_layer.0/layer/layer.1/Clip_output_0' \
'/cls_layer.0/layer/layer.2/Gemm_output_0' \
'/cls_layer.0/layer/layer.3/Clip_output_0' \
'/cls_layer.0/layer/layer.4/Gemm_output_0' \
'/Reshape_13_output_0' \
'/Reshape_16_output_0' \
'/Add_24_output_0' \
'/ScatterND_2_output_0' \
'/ScatterND_3_output_0' \
'/ScatterND_4_output_0' \
'/ScatterND_5_output_0' \
'/ScatterND_6_output_0' \
'2442' \
--plugins='/usr/local/TensorRT/lib/libnvinfer_plugin.so' \
> result-run-FP32-markmiddle.txt 2>&1
The model graph looks like the attached screenshot, and Polygraphy generates the report attached here:
result-run-FP32-markmiddle_op_xzviscls_r1316_add24_scatterND23456_p.txt (141.1 KB)
I also tried upgrading TensorRT to 8.6.1.6 and re-exporting the ONNX model with opset=17, but the TRT model still gives wrong results.
How can I fix this issue and get the right output?
Environment
TensorRT Version: 8.5.1.7
GPU Type: NVIDIA GeForce GTX 1050 Ti
Nvidia Driver Version: 535.54.03
CUDA Version: 11.8
CUDNN Version: 8.9.0
Operating System + Version: Host: Ubuntu 16.04.7 LTS, Docker Container: Ubuntu 20.04.6 LTS
Python Version (if applicable): python3.8
PyTorch Version (if applicable): 2.1.0a0+fe05266
Container: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
Relevant Files
ONNX Model: model.onnx - Google Drive