Hardware Platform: GPU
DeepStream Version: deepstream-app version 7.0.0
DeepStreamSDK 7.0.0
JetPack Version: N/A (GPU Platform)
TensorRT Version: 8.6.1 (Python), Unknown (Binary)
NVIDIA GPU Driver Version: 565.77
Issue Type: Bug - TensorRT engine produces identical outputs for all keypoints
Problem Description
I have a Vision Transformer (ViT) based keypoint detection model that works correctly in PyTorch and ONNX Runtime, but after conversion to a TensorRT engine, all 7 keypoints produce identical coordinate values instead of the expected diverse keypoint locations.
Model Details
- Architecture: DualHeadViTPose (ViT backbone with dual heads for coordinates and visibility)
- Input: 1x3x512x512 (FP32/FP16)
- Output: 1x7x2 (coordinates only, expecting 7 different keypoint locations)
- ONNX Opset: 14
- PyTorch Version: 2.4.1
Hardware Setup
- GPU: NVIDIA GeForce RTX 3080
- GPU Memory: 10240 MB
- Driver: 565.77
- CUDA: 12.1 (PyTorch)
- OS: Ubuntu 22.04.5 LTS
TensorRT Build Command Used
trtexec --onnx=keypoint_coords_only.onnx \
--saveEngine=keypoint_model.engine \
--fp16 \
--workspace=4096 \
--hardwareCompatibilityLevel=ampere+ \
--minShapes=input:1x3x512x512 \
--optShapes=input:1x3x512x512 \
--maxShapes=input:4x3x512x512 \
--inputIOFormats=fp16:chw \
--outputIOFormats=fp16:chw \
--builderOptimizationLevel=5
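One thing worth double-checking with this build: `--inputIOFormats=fp16:chw` makes the engine expect raw FP16 CHW bytes at the binding, so feeding an FP32 buffer (or HWC data) produces well-shaped but wrong outputs. A minimal sketch of the preprocessing this build requires, assuming a 512x512 RGB float image (variable names are illustrative):

```python
import numpy as np

# Hypothetical preprocessed frame: HWC, float32, values in [0, 1].
img = np.random.rand(512, 512, 3).astype(np.float32)

chw = np.transpose(img, (2, 0, 1))                        # HWC -> CHW
inp = np.ascontiguousarray(chw[None], dtype=np.float16)   # 1x3x512x512, FP16

print(inp.shape, inp.dtype)  # (1, 3, 512, 512) float16
```

If the DeepStream preprocessing still hands the engine FP32 data, the binding reinterprets the bytes, which can surface exactly as constant outputs.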
Build Warnings Observed
- 108 weights affected by subnormal FP16 values
- 48 weights below FP16 minimum subnormal value
- “Running layernorm after self-attention in FP16 may cause overflow”
- External tactic sources disabled due to hardware compatibility mode
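The first three warnings all point at FP16 numerics. A small NumPy sketch of what the builder is warning about (the 300.0 activation value is a made-up illustration, not from the model):

```python
import numpy as np

# FP16 range: max normal 65504, smallest subnormal 2**-24 ~= 5.96e-8.
# 1) Weights below the subnormal threshold flush to zero when cast,
#    matching the "weights below FP16 minimum subnormal value" warning.
w = np.array([1e-3, 1e-9], dtype=np.float32)
print(w.astype(np.float16))  # second weight becomes 0.0

# 2) LayerNorm's variance squares activations; anything above
#    sqrt(65504) ~= 256 overflows to inf when squared in FP16,
#    matching the layernorm-after-self-attention overflow warning.
x = np.full(768, 300.0, dtype=np.float16)      # hypothetical hidden row
print(np.square(x)[:3])                        # 300^2 = 90000 -> inf
print(np.square(x.astype(np.float32)).mean())  # finite in FP32: 90000.0
```

If an overflow like this saturates the attention or layernorm outputs, every keypoint query can end up with the same value downstream, so rebuilding without `--fp16` (or with layernorm kept in FP32 via layer precision constraints) is a useful isolation test.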
Build Results
- Build Time: 276.42 seconds
- Engine Size: 171 MiB
- Performance: 4.045ms mean latency, 252 QPS throughput
- Status: Build completes successfully with warnings
Issue Details
Expected Behavior (PyTorch/ONNX):
# 7 different keypoint coordinates
[[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5], [x6, y6], [x7, y7]]
# Example: [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], ...]
Actual Behavior (TensorRT):
# All keypoints have identical coordinates
[[x, y], [x, y], [x, y], [x, y], [x, y], [x, y], [x, y]]
# Example: [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5], ...]
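The symptom above can be checked programmatically. A small helper (my own sketch, not part of the pipeline) that returns True when the output exhibits the buggy collapsed behavior:

```python
import numpy as np

def keypoints_collapsed(output, atol=1e-6):
    """True when every keypoint row equals the first one
    (the buggy TensorRT behavior described above)."""
    kpts = output[0]                       # (N, 2) keypoint array
    return np.allclose(kpts, kpts[0], atol=atol)

healthy = np.array([[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]])
broken = np.full((1, 7, 2), 0.5)
print(keypoints_collapsed(healthy))  # False
print(keypoints_collapsed(broken))   # True
```

Running this on both the ONNX Runtime output and the TensorRT output for the same input frame makes the comparison unambiguous.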
Code for Reproduction
# TensorRT inference code (DeepStream probe; the two helpers below are
# our own wrappers around NvDsInferTensorMeta, included for context)
output_layer = get_tensor_meta_layer(tensor_meta, "output")  # find the "output" layer
output_array = tensor_meta_layer_to_numpy(output_layer)      # copy layer buffer to NumPy
print(output_array.shape)  # (1, 7, 2) - correct shape
print(output_array[0])     # all 7 keypoints hold identical [x, y] values
Detailed Investigation
- Shape Verification: Output tensor has correct shape (1, 7, 2)
- Value Analysis: All 7 keypoints output identical coordinate pairs
- Input Validation: Same input produces diverse keypoints in PyTorch/ONNX
- Model Architecture: Uses learnable keypoint queries and cross-attention
- ONNX Verification: ONNX model produces expected diverse keypoint outputs