I have fine-tuned a BodyPose2D model with 17 keypoints and tested inference with the .hdf5 model; the results were good and all keypoints were detected correctly.
However, after converting the model to a deployable ONNX model, integrating it into the sample pipeline, and running inference in DeepStream, the results are noticeably worse:
- Upper body keypoints are detected correctly.
- Lower body keypoints are not detected at all.
Debugging Steps Taken:
- Tested different TensorRT engine files generated from the ONNX model.
- Tried various DeepStream parameters to match the pre-conversion inference results.
- Experimented with different input resolutions (320x448, 288x384, and trained model resolution 640x640) for BPNet conversion models.
- Verified keypoint mapping in pre-processing and post-processing steps.
Despite these efforts, the overall detection results remain incorrect and inconsistent with the original model inference.
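One thing worth checking along these lines: BPNet-style post-processing decodes each keypoint by taking the argmax of its heatmap channel and scaling by the network stride, so if the decoder still assumes the trained 640x640 grid while the engine runs at 320x448, peaks in the lower part of the map (knees, ankles) are the first to fall outside the expected range. A minimal decoding sketch, assuming a stride of 8 (an assumption on my part, not taken from the configs below):

```python
import numpy as np

STRIDE = 8  # assumed BPNet heatmap stride relative to the network input


def decode_heatmap(hm):
    """Argmax-decode one keypoint channel back to network-input coordinates.

    hm: (H, W) heatmap for a single keypoint.
    """
    y, x = np.unravel_index(np.argmax(hm), hm.shape)
    return x * STRIDE, y * STRIDE


# 320x448 input with stride 8 gives a 40x56 heatmap grid.
hm = np.zeros((40, 56))
hm[35, 10] = 1.0  # a peak near the bottom of the map, e.g. an ankle
print(decode_heatmap(hm))  # (80, 280) in network-input coordinates
```

If the decoder instead indexed this peak against a 640x640 (80x80 grid) assumption, a lower-body peak at row 35 would be interpreted as mid-frame, which is consistent with lower keypoints disappearing while upper ones survive.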
Questions:
- What could be causing this discrepancy between the .hdf5 model inference and the ONNX model inference in DeepStream?
- Are there any specific DeepStream configuration settings or ONNX conversion parameters that might be affecting the keypoint detection?
Sample Outputs:
Sample ONNX model output (DeepStream inference output)
Sample hdf5 model output
Configurations
bodypose2d_pgie_config.yml
property:
gpu-id: 0
model-engine-file: /home/nvidia/deepstream_tao_apps/Iteration4-updated/bpnet_model.deploy-320-448.onnx_b1_gpu0_fp16.engine
tlt-encoded-model: ../../models/bodypose2d/model.etlt
onnx-file: /home/nvidia/deepstream_tao_apps/Iteration4-updated/bpnet_model.deploy-320-448.onnx
tlt-model-key: nvidia_tlt
#int8-calib-file: /home/nvidia/deepstream_tao_apps/apps/Models1/calibration.640.640.deploy.bin
network-input-order: 1
infer-dims: 3;320;448
#dynamic batch size
batch-size: 1
#0: FP32, 1: INT8, 2: FP16 mode
network-mode: 2
num-detected-classes: 1
gie-unique-id: 1
output-blob-names: conv2d_transpose_1/BiasAdd:0;heatmap_out/BiasAdd:0
#0: Detection 1: Classifier 2: Segmentation 100: other
network-type: 100
#Enable tensor metadata output
output-tensor-meta: 1
#1-Primary 2-Secondary
process-mode: 1
net-scale-factor: 0.00390625
offsets: 128.0;128.0;128.0
#0: RGB 1: BGR 2: GRAY
model-color-format: 0
maintain-aspect-ratio: 1
symmetric-padding: 1
scaling-filter: 1
class-attrs-all:
threshold: 0.8
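For reference, nvinfer normalizes each pixel as y = net-scale-factor * (x - offset). With the values above (0.00390625 = 1/256, offsets 128) pixels are mapped to roughly [-0.5, 0.496], and the .hdf5 inference script must apply the same formula or the activations will differ between the two runs. A quick sanity check of that mapping:

```python
# nvinfer per-pixel normalization: y = net-scale-factor * (x - offset)
NET_SCALE_FACTOR = 0.00390625  # 1/256, from the pgie config above
OFFSET = 128.0                 # per-channel offset, from the pgie config


def normalize(pixel: float) -> float:
    """Apply the same per-channel scaling nvinfer performs."""
    return NET_SCALE_FACTOR * (pixel - OFFSET)


print(normalize(0))    # -0.5
print(normalize(128))  # 0.0
print(normalize(255))  # 0.49609375
```

If the .hdf5 pipeline normalized to [0, 1] or [-1, 1] instead, the ONNX/TensorRT outputs would diverge even with a lossless conversion.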
deepstream-bodypose2d-app/bodypose2d_app_config.yml
source-list:
list: file:///home/nvidia/deepstream_tao_apps/apps/tao_others/deepstream-bodypose2d-app/trimmed_videonew.ts
output:
#1: file output 2: fake output 3: eglsink output 4: RTSP output
type: 1
#0: H264 encoder 1: H265 encoder
codec: 0
#encoder type 0=Hardware 1=Software
enc-type: 0
bitrate: 2000000
udpport: 2345
rtspport: 8554
##The file name without suffix
filename: test
streammux:
width: 640
height: 480
batched-push-timeout: 40000
primary-gie:
#0:nvinfer, 1:nvinfeserver
plugin-type: 0
config-file-path: /home/nvidia/deepstream_tao_apps/configs/bodypose2d_tao/bodypose2d_pgie_config.yml
#config-file-path: ../../../configs/triton/bodypose2d_tao/bodypose2d_pgie_config.yml
#config-file-path: ../../../configs/triton-grpc/bodypose2d_tao/bodypose2d_pgie_config.yml
unique-id: 1
model-config:
config-file-path: /home/nvidia/deepstream_tao_apps/configs/bodypose2d_tao/sample_bodypose2d_model_config.yml
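One more interaction between these two configs worth verifying: with maintain-aspect-ratio: 1 and symmetric-padding: 1, nvinfer letterboxes the 640x480 streammux frame into the 448x320 network input, and post-processing must undo the same scale and padding when mapping heatmap peaks back to frame coordinates; a mismatch here shifts or drops keypoints. A sketch of the expected mapping (my own helper names, not a DeepStream API):

```python
def letterbox_params(src_w, src_h, net_w, net_h):
    """Scale factor and symmetric padding for aspect-ratio-preserving
    resize into the network input (sketch of nvinfer's behavior with
    maintain-aspect-ratio=1 and symmetric-padding=1)."""
    scale = min(net_w / src_w, net_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (net_w - new_w) / 2.0
    pad_y = (net_h - new_h) / 2.0
    return scale, pad_x, pad_y


def net_to_frame(x, y, scale, pad_x, pad_y):
    """Map a network-space keypoint back to source-frame coordinates."""
    return (x - pad_x) / scale, (y - pad_y) / scale


# 640x480 streammux frame into the 448x320 network input:
scale, px, py = letterbox_params(640, 480, 448, 320)
print(scale, px, py)  # height-limited: scale 2/3, horizontal padding only
```

If post-processing ignores the horizontal padding (or assumes padding on the wrong axis), every keypoint shifts by a constant offset; if it assumes the trained 640x640 shape, the scale itself is wrong.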