P2PNet converted to ONNX returns bad output when used on Triton server

Description

I trained a model with PyTorch from GitHub - TencentYoutuResearch/CrowdCounting-P2PNet, the official code for the ICCV 2021 oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework". Inference with the trained model looks OK.
Training and inference used PyTorch 1.14.0a0+410ce96, in a Docker container based on nvcr.io/nvidia/pytorch:22.12-py3.
I converted the model to ONNX (onnxruntime version 1.15.1). The resulting ONNX model gives good results in the above environment with onnxruntime 1.15.1.
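For context, the export and the onnxruntime sanity check looked roughly like this (a minimal sketch: the input size, opset, and tensor names are assumptions, and the tiny stand-in module below takes the place of the real trained P2PNet):

import torch
import onnxruntime as ort

class P2PStandIn(torch.nn.Module):
    # Stand-in so the sketch runs end to end; in practice this is the
    # trained P2PNet loaded as in the repo above.
    def forward(self, x):
        b = x.shape[0]
        return x.new_zeros(b, 49152, 2), x.new_zeros(b, 49152, 2)

model = P2PStandIn()
dummy = torch.randn(1, 3, 768, 1024)  # illustrative input size
torch.onnx.export(model, dummy, "model.onnx", opset_version=13,
                  input_names=["input"],
                  output_names=["pred_logits", "pred_points"])

# Sanity check: onnxruntime should return plain float32 arrays.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
logits, points = sess.run(None, {"input": dummy.numpy()})
print(logits.dtype, logits.shape)  # float32 (1, 49152, 2)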
I configured the model for Triton Inference Server, and the server (based on nvcr.io/nvidia/tritonserver:23.08-py3) loaded it successfully.
When I run inference on that model I do get a result in the callback, but the result is not a floating-point vector as expected; it is byte arrays. A sketch of the failing call is included below.
I uploaded the ONNX model, the config.pbtxt (a sketch of its layout follows the attachment list), and the Python code I use to test the Triton Inference Server.
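The failing call is essentially of this form (a minimal sketch rather than the attached test; the model name "p2pnet", the tensor names, and the input shape are assumptions, the real code is in p2p_unit_test.py below):

import time
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

image = np.random.rand(1, 3, 768, 1024).astype(np.float32)  # placeholder input
inputs = [grpcclient.InferInput("input", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [grpcclient.InferRequestedOutput("pred_logits"),
           grpcclient.InferRequestedOutput("pred_points")]

def callback(result, error):
    if error is not None:
        raise error
    # Expected: float32 arrays of shape (1, 49152, 2).
    # Observed: short byte strings instead.
    print(result.as_numpy("pred_logits"))
    print(result.as_numpy("pred_points"))

client.async_infer("p2pnet", inputs, callback=callback, outputs=outputs)
time.sleep(2)  # give the async callback time to fire in this sketch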
model.onnx (69.0 MB)
config.pbtxt (560 Bytes)
p2p_unit_test.py (11.0 KB)
Dockerfile (247 Bytes)
launch.sh (2.4 KB)
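For reference, the attached config.pbtxt follows this general layout (a sketch only: the model name, tensor names, and dims here are placeholders, the authoritative file is attached above):

name: "p2pnet"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 768, 1024 ]
  }
]
output [
  {
    name: "pred_logits"
    data_type: TYPE_FP32
    dims: [ 1, 49152, 2 ]
  },
  {
    name: "pred_points"
    data_type: TYPE_FP32
    dims: [ 1, 49152, 2 ]
  }
]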

Environment

TensorRT Version:
GPU Type: RTX 3080 (laptop)
NVIDIA Driver Version: 525.125.06
CUDA Version: 12.0
cuDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

The files needed to reproduce the issue are attached above and in the IP.zip under Steps To Reproduce below.

Steps To Reproduce

IP.zip (94.7 MB)

1. Unzip IP.zip on Ubuntu 20.04.
2. cd to IP/NVIDIA_ENGINE-SEETRAIN/docker/triton.
3. Build the Triton Docker image with ./launch.sh -b pc.
4. Run the Triton container with ./launch.sh -r pc.
5. If Triton is up, three models should be loaded and ready for inference (see the readiness-check snippet after this list).
6. cd to IP/NVIDIA_ENGINE-SEETRAIN/triton/tests/.
7. Open p2p_unit_test.py in VS Code or another Python IDE.
8. Change line 262 to open an image of your choice.
9. Set a breakpoint at line 158.
10. Debug p2p_unit_test.py.
11. At line 158, inspect the results: logits_tensor and points_tensor should each be FP32 with shape [1, 49152, 2]. Instead, I get a short byte string.
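For step 5, a quick way to confirm the server and the model are ready (a minimal sketch; the model name "p2pnet" and the gRPC port are assumptions):

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
print(client.is_server_ready())         # True once Triton is up
print(client.is_model_ready("p2pnet"))  # True once the model is loaded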


Hi,

We recommend you raise this query in the Triton Inference Server GitHub repository's issues section.

Thanks!

Following NVIDIA's recommendation, I did the following successfully:
1. I validated the ONNX model with the ONNX checker:

import onnx

filename = "model.onnx"  # the model attached above
model = onnx.load(filename)
onnx.checker.check_model(model)

2. https://github.com/NVIDIA/TensorRT/tree/master/samples/trtexec
I compiled the TensorRT package and successfully ran trtexec on my model.
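For reference, the basic invocation was of this form (the path is illustrative):

trtexec --onnx=model.onnx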