TensorRT is not producing the expected output, in contrast to torch and onnxruntime

Description

A simple operation does not produce the expected result on TRT (in contrast to the same operation run through torch/onnxruntime):

offsets = offsetsB[:, :, keypoints[:, :, 0], keypoints[:, :, 1], :]
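For clarity, here is a minimal, self-contained sketch of what that line computes (the shapes below are illustrative only; the real model uses dynamic shapes):

    import torch

    # Illustrative shapes only; the real model uses dynamic H, W and keypoint count.
    offsetsB = torch.randn(1, 1, 320, 256, 2)                    # (N, 1, H, W, C)
    keypoints = torch.stack([torch.randint(0, 320, (1, 1173)),   # index into H
                             torch.randint(0, 256, (1, 1173))],  # index into W
                            dim=-1)                              # (N, K, 2)

    # For every keypoint (h, w), pick offsetsB[:, :, h, w, :]
    offsets = offsetsB[:, :, keypoints[:, :, 0], keypoints[:, :, 1], :]
    print(offsets.shape)                                         # torch.Size([1, 1, 1, 1173, 2])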

Environment

TensorRT Version: 8.4.2.4
GPU Type: Quadro T2000
Nvidia Driver Version: R471.68 (r471_59-5) / 30.0.14.7168 (8-5-2021)
CUDA Version: 11.4
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

model.py (547 Bytes)
test.py (2.2 KB)
model_folded.onnx (1.6 KB)
model.onnx (5.9 KB)

offsets_ort.bin (9.2 KB)
offsets_trt.bin (9.2 KB)
cpu_input.bin (640 KB)
keypointsExtract.bin (9.2 KB)
cpu_output.bin (9.2 KB)

Steps To Reproduce

Put the files in the same directory and run test.py.
The script creates the ONNX model (“model_gather_test”) and saves the random input (cpu_input.bin) and the corresponding output (cpu_output.bin).

In any case, I have already uploaded the ONNX model, the folded ONNX model (after running polygraphy), and the inputs/outputs that correspond to the same run.
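For reference, the export is along these lines (a simplified sketch on my side; the class name and the concrete shapes are placeholders, not the exact code from the attachments):

    import torch
    from model import Model  # placeholder name; whatever module model.py actually defines

    net = Model().eval()
    keypoints = torch.randint(0, 256, (1, 1173, 2))      # int64 indices
    rand_input = torch.randn(1, 1, 320, 256, 2)
    torch.onnx.export(
        net, (keypoints, rand_input), "model.onnx",
        opset_version=13,
        input_names=["keypoints", "rand_input"],
        output_names=["offsets"],
        dynamic_axes={"keypoints": {1: "num_keypoints"},
                      "rand_input": {2: "height", 3: "width"}},
    )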

After that I built the TRT engine using:

trtexec --onnx=C:\projects\playground/model_folded.onnx  --saveEngine=C:\projects\playground/model_folded.engine --minShapes="keypoints:1x1x2,rand_input:1x1x320x256x2" --optShapes="keypoints:1x2565x2,rand_input:1x1x320x256x2" --maxShapes="keypoints:1x8000x2,rand_input:1x1x760x690x2"

Then I ran the engine on the inputs.

I have also run the model with torch and with onnxruntime.
The torch and onnxruntime outputs are identical (or nearly identical), but the output from the TRT engine is significantly different.
I have uploaded the inputs and outputs so you can see the difference.
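To quantify the difference, the uploaded dumps can be compared directly, e.g. (assuming both files are raw float32 buffers of the same length; adjust the dtype if that assumption is off):

    import numpy as np

    # Assumption: the .bin dumps are raw float32 buffers of equal size.
    ort = np.fromfile("offsets_ort.bin", dtype=np.float32)
    trt = np.fromfile("offsets_trt.bin", dtype=np.float32)
    print("max abs diff:", np.abs(ort - trt).max())
    print("allclose:", np.allclose(ort, trt, rtol=1e-3, atol=1e-5))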

Hi,
Please refer to the links below related to custom plugin implementation and samples:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.

Thanks!

But I don’t understand why this is a problem.
Which operator in my model is problematic?
I don’t receive any indication of a problem with my model.

Hi,

How are you comparing the TensorRT engine output with the onnxruntime and PyTorch outputs?
Could you please share that comparison/repro script as well.
Also, are you facing the same issue before running the polygraphy tool?

Thank you.

I can see the non-deterministic behavior in the outputs I have uploaded.
The operator I’m trying to implement is “gather_nd”, which is not implemented in torch. It is not a computational operator, so I see no reason for it to behave differently from onnxruntime on the same model.

I face the same problem before running polygraphy as well.
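To make the intended semantics explicit, by gather_nd I mean the TF/ONNX-style operation; a naive (slow) PyTorch emulation for batch_dims=0 would look roughly like this (illustrative only, not the model code):

    import torch

    def gather_nd_ref(params, indices):
        # Naive reference for TF/ONNX-style gather_nd (batch_dims=0):
        # each length-R row of `indices` indexes the first R dimensions of `params`.
        r = indices.shape[-1]
        out_shape = indices.shape[:-1] + params.shape[r:]
        rows = [params[tuple(idx.tolist())] for idx in indices.reshape(-1, r)]
        return torch.stack(rows).reshape(out_shape)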

Could you please share with us the output logs from running test.py?

This is the log-

C:\Users\E030852\AppData\Local\Programs\Python\Python36\python.exe C:/projects/playground/test.py
C:\Users\E030852\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\onnx\symbolic_opset9.py:2819: UserWarning: Exporting aten::index operator of advanced indexing in opset 13 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  "If indices include negative values, the exported graph will produce incorrect results.")
graph(%keypoints : Long(*, *, *, strides=[2346, 2, 1], requires_grad=0, device=cpu),
      %rand_input : Float(*, *, *, *, *, strides=[163840, 163840, 512, 2, 1], requires_grad=0, device=cpu)):
  %2 : Long(requires_grad=0, device=cpu) = onnx::Constant[value={0}]()
  %3 : Long(*, *, strides=[2346, 2], requires_grad=0, device=cpu) = onnx::Gather[axis=2](%keypoints, %2) # C:\projects\playground\model.py:55:0
  %4 : Long(requires_grad=0, device=cpu) = onnx::Constant[value={1}]()
  %5 : Long(*, *, strides=[2346, 2], requires_grad=0, device=cpu) = onnx::Gather[axis=2](%keypoints, %4) # C:\projects\playground\model.py:55:0
  %6 : Long(5, strides=[1], device=cpu) = onnx::Shape(%rand_input)
  %7 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={0}]()
  %8 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0](%6, %7)
  %9 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={1}]()
  %10 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0](%6, %9)
  %11 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={3}]()
  %12 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0](%6, %11)
  %13 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={4}]()
  %14 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0](%6, %13)
  %15 : Float(*, *, *, *, *, device=cpu) = onnx::Transpose[perm=[2, 3, 0, 1, 4]](%rand_input)
  %16 : Float(*, *, device=cpu) = onnx::Flatten[axis=2](%15)
  %17 : Long(*, *, device=cpu) = onnx::Mul(%3, %12)
  %18 : Long(*, *, device=cpu) = onnx::Add(%5, %17)
  %19 : Float(*, *, *, device=cpu) = onnx::Gather[axis=0](%16, %18)
  %20 : Long(2, strides=[1], device=cpu) = onnx::Shape(%18)
  %21 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={-1}]()
  %22 : Long(4, strides=[1], device=cpu) = onnx::Concat[axis=0](%21, %8, %10, %14)
  %23 : Float(*, *, *, *, device=cpu) = onnx::Reshape(%19, %22)
  %24 : Float(*, *, *, *, device=cpu) = onnx::Transpose[perm=[1, 2, 0, 3]](%23)
  %25 : Long(5, strides=[1], device=cpu) = onnx::Concat[axis=0](%8, %10, %20, %14)
  %offsets : Float(*, *, *, *, *, strides=[2346, 2346, 2346, 2, 1], requires_grad=0, device=cpu) = onnx::Reshape(%24, %25) # C:\projects\playground\model.py:55:0
  return (%offsets)


Process finished with exit code 0

I have already checked the warning in the log, but it is probably not the problem. I managed to rewrite the code in an alternative way; the warning disappears, but the problem does not.
The alternative code I tried is:

        gather_x = keypoints.select(2, 0) # tensor.select(2, index) is equivalent to tensor[:,:,index] according to torch doc.
        gather_x = gather_x.reshape(gather_x.shape[1])

        gather_y = keypoints.select(2, 1)
        gather_y = gather_y.reshape(gather_y.shape[1])

        keypoints = gather_x * offsetsB.shape[3] + gather_y  # flat spatial index: first coord * W + second coord
        # keypoints = gather_x * offsetsB.shape[2] + gather_y
        offsetsB = offsetsB.reshape(1, 1, 1, offsetsB.shape[2] * offsetsB.shape[3], offsetsB.shape[4])  # flatten the H*W grid so a single index selects a location

        offsets = torch.index_select(offsetsB, 3, keypoints)

Maybe this will help to pinpoint the source of the problem.
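As a quick sanity check in PyTorch, the two formulations can be compared directly; a minimal sketch with made-up shapes (not the attached data):

    import torch

    H, W, K, C = 320, 256, 1173, 2                      # made-up shapes
    offsetsB = torch.randn(1, 1, H, W, C)
    keypoints = torch.stack([torch.randint(0, H, (1, K)),
                             torch.randint(0, W, (1, K))], dim=-1)

    # Original advanced-indexing formulation
    ref = offsetsB[:, :, keypoints[:, :, 0], keypoints[:, :, 1], :]

    # Flattened-index formulation, as in the snippet above
    gx = keypoints.select(2, 0).reshape(-1)
    gy = keypoints.select(2, 1).reshape(-1)
    flat = offsetsB.reshape(1, 1, 1, H * W, C)
    alt = torch.index_select(flat, 3, gx * W + gy)

    print(torch.equal(ref, alt))                        # expected: True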

Thank you for your support

Hi,

When I execute the script, I observe the following output. Is this the expected one (correct result) you’re looking for?
I didn’t get “Process finished with exit code 0” as you did.

Exported graph: graph(%keypoints : Long(*, *, *, strides=[2346, 2, 1], requires_grad=0, device=cpu),
      %rand_input : Float(*, *, *, *, *, strides=[163840, 163840, 512, 2, 1], requires_grad=0, device=cpu)):
  %onnx::Gather_2 : Long(device=cpu) = onnx::Constant[value={0}, onnx_name="Constant_0"]()
  %onnx::Mul_3 : Long(*, *, strides=[2346, 2], requires_grad=0, device=cpu) = onnx::Gather[axis=2, onnx_name="Gather_1"](%keypoints, %onnx::Gather_2) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_4 : Long(device=cpu) = onnx::Constant[value={1}, onnx_name="Constant_2"]()
  %onnx::Add_5 : Long(*, *, strides=[2346, 2], requires_grad=0, device=cpu) = onnx::Gather[axis=2, onnx_name="Gather_3"](%keypoints, %onnx::Gather_4) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_6 : Long(5, strides=[1], device=cpu) = onnx::Shape[onnx_name="Shape_4"](%rand_input) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_7 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={0}, onnx_name="Constant_5"]() # /my_data/files_share/223407/model.py:17:0
  %onnx::Concat_8 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0, onnx_name="Gather_6"](%onnx::Gather_6, %onnx::Gather_7) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_9 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={1}, onnx_name="Constant_7"]() # /my_data/files_share/223407/model.py:17:0
  %onnx::Concat_10 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0, onnx_name="Gather_8"](%onnx::Gather_6, %onnx::Gather_9) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_11 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={3}, onnx_name="Constant_9"]() # /my_data/files_share/223407/model.py:17:0
  %onnx::Mul_12 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0, onnx_name="Gather_10"](%onnx::Gather_6, %onnx::Gather_11) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_13 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={4}, onnx_name="Constant_11"]() # /my_data/files_share/223407/model.py:17:0
  %onnx::Concat_14 : Long(1, strides=[1], device=cpu) = onnx::Gather[axis=0, onnx_name="Gather_12"](%onnx::Gather_6, %onnx::Gather_13) # /my_data/files_share/223407/model.py:17:0
  %onnx::Flatten_15 : Float(*, *, *, *, *, device=cpu) = onnx::Transpose[perm=[2, 3, 0, 1, 4], onnx_name="Transpose_13"](%rand_input) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_16 : Float(*, *, device=cpu) = onnx::Flatten[axis=2, onnx_name="Flatten_14"](%onnx::Flatten_15) # /my_data/files_share/223407/model.py:17:0
  %onnx::Add_17 : Long(*, *, device=cpu) = onnx::Mul[onnx_name="Mul_15"](%onnx::Mul_3, %onnx::Mul_12) # /my_data/files_share/223407/model.py:17:0
  %onnx::Gather_18 : Long(*, *, device=cpu) = onnx::Add[onnx_name="Add_16"](%onnx::Add_5, %onnx::Add_17) # /my_data/files_share/223407/model.py:17:0
  %onnx::Reshape_19 : Float(*, *, *, device=cpu) = onnx::Gather[axis=0, onnx_name="Gather_17"](%onnx::Gather_16, %onnx::Gather_18) # /my_data/files_share/223407/model.py:17:0
  %onnx::Concat_20 : Long(2, strides=[1], device=cpu) = onnx::Shape[onnx_name="Shape_18"](%onnx::Gather_18) # /my_data/files_share/223407/model.py:17:0
  %onnx::Concat_21 : Long(1, strides=[1], requires_grad=0, device=cpu) = onnx::Constant[value={-1}, onnx_name="Constant_19"]() # /my_data/files_share/223407/model.py:17:0
  %onnx::Reshape_22 : Long(4, strides=[1], device=cpu) = onnx::Concat[axis=0, onnx_name="Concat_20"](%onnx::Concat_21, %onnx::Concat_8, %onnx::Concat_10, %onnx::Concat_14) # /my_data/files_share/223407/model.py:17:0
  %onnx::Transpose_23 : Float(*, *, *, *, device=cpu) = onnx::Reshape[onnx_name="Reshape_21"](%onnx::Reshape_19, %onnx::Reshape_22) # /my_data/files_share/223407/model.py:17:0
  %onnx::Reshape_24 : Float(*, *, *, *, device=cpu) = onnx::Transpose[perm=[1, 2, 0, 3], onnx_name="Transpose_22"](%onnx::Transpose_23) # /my_data/files_share/223407/model.py:17:0
  %onnx::Reshape_25 : Long(5, strides=[1], device=cpu) = onnx::Concat[axis=0, onnx_name="Concat_23"](%onnx::Concat_8, %onnx::Concat_10, %onnx::Concat_20, %onnx::Concat_14) # /my_data/files_share/223407/model.py:17:0
  %offsets : Float(*, *, *, *, *, strides=[2346, 2346, 2346, 2, 1], requires_grad=0, device=cpu) = onnx::Reshape[onnx_name="Reshape_24"](%onnx::Reshape_24, %onnx::Reshape_25) # /my_data/files_share/223407/model.py:17:0
  return (%offsets)

/usr/local/lib/python3.8/dist-packages/torch/onnx/symbolic_opset9.py:4189: UserWarning: Exporting aten::index operator of advanced indexing in opset 13 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(

I ran the command python test.py.

Thank you.

It looks the same as mine.
Could you run it as a TRT engine and compare the output against torch on a single input?
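For instance, if it helps, I believe Polygraphy can run the ONNX-Runtime vs. TensorRT comparison in a single command (I may have the exact flags wrong):

polygraphy run model_folded.onnx --onnxrt --trt --trt-min-shapes keypoints:[1,1,2] rand_input:[1,1,320,256,2] --trt-opt-shapes keypoints:[1,2565,2] rand_input:[1,1,320,256,2] --trt-max-shapes keypoints:[1,8000,2] rand_input:[1,1,760,690,2] --input-shapes keypoints:[1,2565,2] rand_input:[1,1,320,256,2]

(Real keypoint indices would of course need to be fed instead of random data for the comparison to be meaningful.)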

Please let us know the command or steps you’re following for the above.