P2PNet converted to ONNX returns bad output when served on Triton server


I trained a model in PyTorch using the code from GitHub - TencentYoutuResearch/CrowdCounting-P2PNet: The official codes for the ICCV2021 Oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework".
Inference with the trained PyTorch model looks OK.
Training and inference used PyTorch version 1.14.0a0+410ce96, both run in a Docker container based on nvcr.io/nvidia/pytorch:22.12-py3
I converted the model to ONNX. onnxruntime version is 1.15.1.
The resulting ONNX model gives good results in the above environment using onnxruntime version 1.15.1.
I configured the model on Triton server and the server loaded it successfully. The Triton server is based on nvcr.io/nvidia/tritonserver:23.08-py3
When I run inference on that model I get a result in the callback, but the result is byte arrays rather than the floating-point vectors I expect.
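One way to make the mismatch concrete is to compare what Triton returns against the ONNX Runtime output for the same input. The sketch below is only an illustration: `compare_outputs` is a hypothetical helper that is not part of p2p_unit_test.py, and the tolerance is an assumed value. With the Triton Python client, each output would normally be fetched with `result.as_numpy(...)` before being passed in as `candidate`.

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-4):
    """Return a list of problems; an empty list means the outputs agree.

    reference: array produced by ONNX Runtime (the known-good path).
    candidate: whatever the Triton client returned for the same output.
    atol is an assumed tolerance, not a value from the original post.
    """
    problems = []
    # Raw bytes (or anything that is not an ndarray) is reported immediately.
    if not isinstance(candidate, np.ndarray):
        problems.append(f"expected numpy.ndarray, got {type(candidate).__name__}")
        return problems
    if candidate.dtype != reference.dtype:
        problems.append(f"dtype {candidate.dtype} != {reference.dtype}")
    if candidate.shape != reference.shape:
        problems.append(f"shape {candidate.shape} != {reference.shape}")
        return problems
    # Only compare values when the candidate holds floating-point data.
    if candidate.dtype.kind == "f" and not np.allclose(reference, candidate, atol=atol):
        problems.append("values differ beyond tolerance")
    return problems
```

An object-dtype array of byte strings would show up here as both a dtype and a shape problem, which is a more precise symptom to report than "bad output".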
I uploaded the ONNX model, the config.pbtxt, and the Python code I use to test the Triton inference server.
model.onnx (69.0 MB)
config.pbtxt (560 Bytes)
p2p_unit_test.py (11.0 KB)
Dockerfile (247 Bytes)
launch.sh (2.4 KB)


TensorRT Version:
GPU Type: RTX3080 (laptop)
Nvidia Driver Version: 525.125.06
CUDA Version: 12.0
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

IP.zip (94.7 MB)

Download IP.zip and unzip it on Ubuntu 20.04.
cd to IP/NVIDIA_ENGINE-SEETRAIN/docker/triton
Build the Triton Docker image with ./launch.sh -b pc
Run the Triton container with ./launch.sh -r pc
If Triton comes up correctly, 3 models should be loaded and ready for inference.
cd to IP/NVIDIA_ENGINE-SEETRAIN/triton/tests/
Open VS Code or another Python IDE.
Open p2p_unit_test.py
Change line 262 to open an image.
Put a breakpoint at line 158.
Debug p2p_unit_test.py
At line 158, look at the result: logits_tensor and points_tensor should each be FP32 with shape [1, 49152, 2].
Instead, I get a short byte string.
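To check whether the byte string is simply an undecoded FP32 tensor, it can be reinterpreted with NumPy. This is a minimal sketch under that assumption; `decode_fp32` is a hypothetical helper, and the shape is the one quoted in the steps above.

```python
import numpy as np

EXPECTED_SHAPE = (1, 49152, 2)  # shape reported in the post for both outputs


def decode_fp32(raw, shape=EXPECTED_SHAPE):
    """Interpret raw bytes as little-endian float32 and reshape them.

    Assumes `raw` is an undecoded FP32 tensor. Raises ValueError when the
    byte count does not match the expected shape.
    """
    expected_bytes = int(np.prod(shape)) * 4  # 4 bytes per float32 element
    if len(raw) != expected_bytes:
        raise ValueError(f"got {len(raw)} bytes, expected {expected_bytes}")
    return np.frombuffer(raw, dtype="<f4").reshape(shape)
```

For the shape above the buffer would have to be 393216 bytes long. A `ValueError` on a short byte string suggests the bytes are not the tensor itself, in which case one thing worth checking (an assumption, not a confirmed cause) is the data type declared for the outputs in config.pbtxt.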

We recommend raising this query in the Issues section of the Triton Inference Server GitHub repository.


Following the recommendation from NVIDIA, I ran the following successfully:
import onnx
filename = "yourONNXmodel"  # placeholder: path to your ONNX model file
model = onnx.load(filename)  # load the model to confirm the file can be parsed

I built the TensorRT package and successfully ran trtexec on my model.