BlazePose integration problem in DeepStream 7.0

• Hardware Platform (Jetson / GPU) NVIDIA GeForce RTX 4070
• DeepStream Version 7.0
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.6.1
• NVIDIA GPU Driver Version (valid for GPU only) 535.161.08
• Issue Type (questions, new requirements, bugs) BlazePose model integration problem.

I integrated the BlazePose pose estimation model into DeepStream and ran it. For this I wrote the following nvinfer config:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=…/models/pose_landmark_full.onnx
model-engine-file=…/models/pose_landmark_full.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=…/models/labels.txt
batch-size=1
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=3
cluster-mode=4
maintain-aspect-ratio=1
symmetric-padding=1
#workspace-size=2000
parse-bbox-instance-mask-func-name=NvDsInferParseBlazePose
custom-lib-path=…/nvdsinfer_custom_impl_Blaze_pose/libnvdsinfer_custom_impl_Blaze_pose.so
output-instance-mask=1
input-tensor-meta=1
infer-dims=3;256;256
debug-level=3
output-tensor-meta=1
output-blob-names=Identity
layer-name=Identity
output-order=1
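
For reference, the Gst-nvinfer documentation describes pixel normalization as y = net-scale-factor * (x - mean). The following is a minimal sketch, assuming no mean offsets are applied (none are set in the config above), that checks how closely the configured factor tracks a plain division by 255. The helper name `nvinfer_normalize` is hypothetical and is not part of DeepStream.

```python
NET_SCALE_FACTOR = 0.0039215697906911373  # value copied from the config above


def nvinfer_normalize(pixel, scale=NET_SCALE_FACTOR, mean=0.0):
    """Apply y = scale * (x - mean) to a single 8-bit pixel value."""
    return scale * (pixel - mean)


# Largest deviation from x / 255 over every possible 8-bit value.
max_err = max(abs(nvinfer_normalize(p) - p / 255.0) for p in range(256))
print(f"max abs deviation from x/255: {max_err:.3e}")
```

If the deviation is negligible, the scaling itself is unlikely to explain a large difference in the outputs, and the remaining preprocessing settings (resize geometry, color order) are the next things to compare.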

I also implemented the custom parser plugin, where I added debug output to inspect the model result after inference:

static bool NvDsInferParseCustomBlazePose(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferInstanceMaskInfo>& objectList) {

    // Locate the landmark tensor by its layer name.
    const NvDsInferLayerInfo* output = nullptr;
    for (const auto& layer : outputLayersInfo) {
        if (strcmp(layer.layerName, "Identity") == 0) {
            output = &layer;
            break;
        }
    }

    if (!output) {
        std::cerr << "ERROR: Could not find the 'Identity' layer in the output" << std::endl;
        return false;
    }

    if (output->dataType != FLOAT) {
        std::cerr << "ERROR: Unexpected data type. Expected float, but got: " << output->dataType << std::endl;
        return false;
    }

    const _Float32* outputData = static_cast<const _Float32*>(output->buffer);
    // Total number of float values in the tensor (195 in the log below).
    const uint channelsSize = output->inferDims.numElements;
    const uint netW = networkInfo.width;
    const uint netH = networkInfo.height;

    // Debug dump of the raw tensor values.
    std::cout << "Model Output Info:" << std::endl;
    std::cout << "Name: " << output->layerName << std::endl;
    std::cout << "Shape: (1, " << channelsSize << ")" << std::endl;
    std::cout << "Data Type: " << output->dataType << std::endl;
    std::cout << "Number of Keypoints: " << channelsSize << std::endl;
    std::cout << "Keypoints Data: [[";
    for (uint i = 0; i < channelsSize; ++i) {
        std::cout << outputData[i] << " ";
    }
    std::cout << "]]" << std::endl;

    std::vector<NvDsInferInstanceMaskInfo> keypointsInfo = decodeBlazePoseOutput(outputData, channelsSize, netW, netH);

    objectList.clear();
    objectList = keypointsInfo;

    return true;
}

This is the result of the inference:

Model Output Info:
Name: Identity
Shape: (1, 195)
Data Type: 0
Number of Keypoints: 195
Keypoints Data: [[133.892 126.812 -126.83 4.87129 6.3456 135.983 123.48 -131.546 4.3541 5.94556 137.22 122.911 -131.498 4.10678 5.7117 138.453 122.39 -131.495 4.08368 5.43323 132.481 124.268 -131.179 4.58841 6.57842 131.341 124.224 -131.197 4.48914 6.65509 130.211 124.183 -131.215 4.80924 6.74207 140.796 118.361 -125.995 3.55969 5.27982 128.281 120.765 -123.668 4.5182 6.89889 135.879 125.66 -121.86 4.50667 5.89907 132.033 126.377 -121.316 4.6821 6.48752 148.604 113.945 -86.4735 4.88566 4.97485 117.902 121.542 -94.9949 5.9602 6.21012 149.473 121.899 -40.7146 -0.246334 5.05674 120.425 138.051 -57.0485 2.22312 6.5247 139.225 126.301 -19.6482 0.12634 5.08594 132.019 134.044 -28.5732 2.74862 8.15707 136.733 128.301 -19.8782 0.153291 5.06046 135.332 134.34 -28.7915 2.339 8.04276 135.16 125.418 -22.7351 0.271172 5.1737 135.974 130.125 -28.4864 2.44404 8.14153 135.554 123.81 -20.2336 0.312834 5.08776 135.61 129.095 -27.1212 2.41778 8.12186 140.181 116.584 2.59204 8.3082 10.0026 123.215 118.374 -2.65591 8.13905 10.0542 145.723 148.643 -0.735217 2.34421 8.44022 120.983 149.365 4.88721 2.19843 8.41207 147.851 176.049 29.1621 3.11148 6.81073 119.852 177.64 37.1354 2.5789 7.27463 146.23 178.335 30.7904 1.83683 6.55026 120.77 179.511 39.4435 1.36546 6.91496 150.221 190.782 -0.928259 2.98934 5.07984 121.377 194.861 10.6385 2.48772 5.62019 131.816 118.067 0.106942 -21.7882 21.0895 133.618 27.7204 0.263672 -21.8095 20.9721 135.753 126.366 0.0233847 -21.7362 4.76915 139.26 126.275 0.167522 -21.7498 4.84723 135.756 131.48 0.0680409 -21.7051 7.30306 131.973 134.014 -0.026127 -21.6361 7.48631 ]]

This result is not correct: it does not match the output of the reference pipeline, which works perfectly.
This is the correct result after inference in the reference pipeline:
Model Output Info:
Name: Identity
Shape: (1, 195)
Data Type: float32
Data: [[ 1.34127411e+02 8.51423035e+01 -7.72913742e+01 6.73996162e+00
6.12476158e+00 1.35687820e+02 8.20482788e+01 -7.05673141e+01
5.90085316e+00 6.11340332e+00 1.36884155e+02 8.19154358e+01
-7.05997162e+01 5.95664215e+00 6.15648460e+00 1.38069260e+02
8.18166504e+01 -7.06117554e+01 5.91953468e+00 6.19343090e+00
1.31883911e+02 8.22762680e+01 -7.08883972e+01 5.41670895e+00
5.95233154e+00 1.30659119e+02 8.23308563e+01 -7.08818970e+01
5.39485931e+00 5.81268215e+00 1.29453003e+02 8.23827820e+01
-7.08954697e+01 5.34367371e+00 5.71174240e+00 1.39727203e+02
8.28634186e+01 -2.53535576e+01 5.41604328e+00 6.27084827e+00
1.27449120e+02 8.36991653e+01 -2.58820782e+01 4.58715439e+00
5.48289108e+00 1.36352356e+02 8.82415848e+01 -5.97117538e+01
6.54122543e+00 6.59466648e+00 1.32080963e+02 8.85622940e+01
-5.99212036e+01 6.11606789e+00 6.23729324e+00 1.49781281e+02
1.00469231e+02 -3.86938047e+00 8.03562832e+00 7.07540989e+00
1.18148499e+02 1.01457352e+02 -1.44072628e+00 6.75756645e+00
5.90803051e+00 1.48102356e+02 1.22925110e+02 -2.01236191e+01
1.22893143e+00 5.59673119e+00 1.20533363e+02 1.26047218e+02
-8.98852444e+00 2.19238091e+00 4.95798111e+00 1.40159943e+02
1.37836594e+02 -8.92863007e+01 9.60869789e-01 6.51170492e+00
1.25583588e+02 1.41596588e+02 -6.00603294e+01 1.46546888e+00
5.68430853e+00 1.39452118e+02 1.43402100e+02 -1.08348984e+02
6.98071480e-01 6.35530806e+00 1.25737984e+02 1.47090820e+02
-7.66798401e+01 1.16211796e+00 5.82995224e+00 1.38041962e+02
1.42287186e+02 -1.17590317e+02 7.52276421e-01 6.42307949e+00
1.26497627e+02 1.46387238e+02 -8.50387878e+01 1.18911743e+00
5.77919245e+00 1.37357330e+02 1.40137482e+02 -9.51008835e+01
5.64127922e-01 6.50269699e+00 1.27454880e+02 1.44153549e+02
-6.49084854e+01 9.84773159e-01 5.81506538e+00 1.43007278e+02
1.51067993e+02 -3.25088501e+00 6.99400473e+00 6.90005779e+00
1.23827789e+02 1.50483246e+02 3.33819008e+00 6.56668234e+00
6.50400543e+00 1.45736176e+02 1.87921982e+02 -2.45862961e+01
4.38423443e+00 7.95130539e+00 1.20298485e+02 1.87304306e+02
-7.27436352e+00 4.73125172e+00 7.09363842e+00 1.46965393e+02
2.22381470e+02 4.40943909e+01 4.00260973e+00 7.13036728e+00
1.17406479e+02 2.21243652e+02 5.33527069e+01 4.27931213e+00
6.89513397e+00 1.45334076e+02 2.26461731e+02 4.82308006e+01
1.76353836e+00 6.79423189e+00 1.18358162e+02 2.25502304e+02
5.68017273e+01 1.63529110e+00 6.69629335e+00 1.47987625e+02
2.36739410e+02 -3.61620903e+00 3.60444260e+00 5.72853947e+00
1.17011459e+02 2.35260330e+02 5.88278961e+00 3.67541838e+00
5.58915186e+00 1.33429810e+02 1.50955917e+02 1.10433521e-02
-2.07624626e+01 2.00330315e+01 1.34158005e+02 5.72637711e+01
1.95749372e-01 -2.07550564e+01 2.00392818e+01 1.38469467e+02
1.42656693e+02 -3.06649655e-02 -2.06797619e+01 5.75013256e+00
1.40261719e+02 1.37813446e+02 1.39857873e-01 -2.06311188e+01
5.87664318e+00 1.26335205e+02 1.46619583e+02 7.93662369e-02
-2.06242943e+01 5.33506107e+00 1.25588226e+02 1.41638489e+02
3.51254493e-02 -2.06772423e+01 5.28287077e+00]]
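
To see which fields actually diverge, the two flat arrays can be compared per field. The sketch below assumes the 195 values are 39 landmarks of 5 values each, ordered (x, y, z, visibility, presence), as in MediaPipe's pose landmark model; that layout is an assumption and is not confirmed in this thread. Only the first two landmarks of each dump above are used as sample data, and the helper names are hypothetical.

```python
FIELDS = ("x", "y", "z", "visibility", "presence")  # assumed per-landmark layout


def split_landmarks(flat, fields=FIELDS):
    """Group a flat output array into one dict per landmark."""
    n = len(fields)
    if len(flat) % n != 0:
        raise ValueError(f"length {len(flat)} is not a multiple of {n}")
    return [dict(zip(fields, flat[i:i + n])) for i in range(0, len(flat), n)]


def mean_abs_diff(a, b, fields=FIELDS):
    """Mean absolute difference per field between two flat outputs."""
    la, lb = split_landmarks(a, fields), split_landmarks(b, fields)
    if len(la) != len(lb):
        raise ValueError("outputs have different landmark counts")
    return {f: sum(abs(p[f] - q[f]) for p, q in zip(la, lb)) / len(la)
            for f in fields}


# First two landmarks from the DeepStream dump and from the reference dump.
deepstream = [133.892, 126.812, -126.83, 4.87129, 6.3456,
              135.983, 123.48, -131.546, 4.3541, 5.94556]
reference = [134.127411, 85.1423035, -77.2913742, 6.73996162, 6.12476158,
             135.687820, 82.0482788, -70.5673141, 5.90085316, 6.11340332]

diff = mean_abs_diff(deepstream, reference)
for field, value in diff.items():
    print(f"{field:>10}: {value:.3f}")
```

On these two landmarks x agrees to well under one pixel while y is off by roughly 41 to 42. Two landmarks are not conclusive, but running the same comparison over all 39 shows whether the error is concentrated in one axis, which would point at the resize geometry rather than at normalization.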

I got this result by taking the TensorRT engine that DeepStream generated and running the model on that engine in a Python pipeline I implemented, without DeepStream.
So the problem isn't with TensorRT, because the same model on the same engine works perfectly without DeepStream.

This is how the model is linked into the DeepStream pipeline:

GstElement *pgie = gst_element_factory_make("nvinfer", "nvinfer-blaze");
if (!pgie) {
    g_printerr("ERROR: Failed to create nvinfer\n");
    return -1;
}

g_object_set(G_OBJECT(pgie), "config-file-path", CONFIG_INFER_POSE, "qos", 0, NULL);

gst_bin_add_many(GST_BIN(pipeline), pgie, tracker, converter, osd, sink, NULL);
if (!gst_element_link_many(streammux, pgie, tracker, converter, osd, sink, NULL)) {
    g_printerr("ERROR: Pipeline elements could not be linked\n");
    return -1;
}

So I think the problem is with the nvinfer settings, or with something under the hood of DeepStream that leads to incorrect data preprocessing.

This is the preprocessing used in the correct Python pipeline:

def preprocess_frame(frame, input_size):
    frame_resized = cv2.resize(frame, input_size)
    frame_normalized = frame_resized.astype(np.float32) / 255.0
    frame_transposed = np.transpose(frame_normalized, [2, 0, 1])
    return np.expand_dims(frame_transposed, axis=0)
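
One visible difference between this function and the nvinfer config is the resize geometry: `cv2.resize` stretches the frame to the input size, while the config sets maintain-aspect-ratio=1 and symmetric-padding=1, which preserves the aspect ratio and pads the remainder. The sketch below only computes the resulting scale and padding for both schemes; it approximates the geometry and does not reproduce nvinfer's exact rounding. The function names and the 1920x1080 source size are hypothetical. Separately, if the frame comes from `cv2.imread` or `cv2.VideoCapture` it is in BGR order, whereas model-color-format=0 selects RGB, so the channel order is also worth checking.

```python
def stretch_geometry(src_w, src_h, net_w, net_h):
    """Plain resize: the frame fills the input, each axis scaled independently."""
    return {"scale_x": net_w / src_w, "scale_y": net_h / src_h,
            "pad_x": 0.0, "pad_y": 0.0}


def letterbox_geometry(src_w, src_h, net_w, net_h, symmetric=True):
    """Aspect-preserving resize: one scale for both axes, the rest is padding."""
    scale = min(net_w / src_w, net_h / src_h)
    new_w, new_h = src_w * scale, src_h * scale
    # Symmetric padding centers the image; otherwise it sits at the top-left.
    pad_x = (net_w - new_w) / 2 if symmetric else 0.0
    pad_y = (net_h - new_h) / 2 if symmetric else 0.0
    return {"scale_x": scale, "scale_y": scale, "pad_x": pad_x, "pad_y": pad_y}


# Hypothetical 16:9 source frame fed into the 256x256 network input.
stretched = stretch_geometry(1920, 1080, 256, 256)
letterboxed = letterbox_geometry(1920, 1080, 256, 256)
print("stretch:  ", stretched)
print("letterbox:", letterboxed)
```

For a non-square source the two schemes place the person at different vertical positions and scales inside the input tensor, so the model sees a different image even when normalization is identical. Making the two pipelines agree means either letterboxing in the Python reference or setting maintain-aspect-ratio=0 in the config.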

Maybe there is a problem with the nvinfer settings? Or do I need to implement the preprocessing myself?
Could you help me find where the data distortion occurs: at the input of the model or at the output? And how can I fix it?

Only these two parameters can be confirmed to be aligned with

You need to confirm the other settings yourself, since they are all model-related rather than DeepStream-related.

Why do you enable this parameter?

Thank you for your answer. I enabled it because I am debugging the model's metadata.

We know nothing about the model. You may refer to the DeepStream SDK FAQ on the NVIDIA Developer Forums (Intelligent Video Analytics / DeepStream SDK) to check the nvinfer configurations.

Thank you! Do I need to change the parameter network-type=3 to 100 for BlazePose? Right now it works like a YOLO model, and I am emulating the bbox processing in the plugin. Maybe that is not a good approach and I need to treat BlazePose as a custom model?

It seems you want to output bboxes and object masks, so setting “network-type=3” is correct. You can refer to the instance segmentation custom postprocessing sample “NvDsInferParseCustomMrcnnTLT” in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp. The configuration can be found in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-mrcnn-test/dsmrcnn_pgie_config.txt

Not really. The BlazePose model returns only the pose key points, without bboxes, so I need only the key points array. How do I get it correctly in DeepStream, and is that possible at all? I also tried running the model with network-type=100, without processing bboxes, and read the inference output values from the metadata, and I got the same array of incorrect values.

You need to assign a bbox to the object. If you don’t need the bbox, just ignore it in the downstream plugins.
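
One common way to supply a bbox for a keypoint-only model is to take the extent of the keypoints themselves. The sketch below is a minimal illustration of that idea, not the poster's `decodeBlazePoseOutput`; the function name and the returned field names are hypothetical, and coordinates are assumed to be in network-input pixels.

```python
def bbox_from_keypoints(points, net_w, net_h):
    """Return the axis-aligned box enclosing (x, y) keypoints, clamped to the input."""
    if not points:
        raise ValueError("at least one keypoint is required")
    # Clamp each coordinate so the box never leaves the network input area.
    xs = [min(max(x, 0.0), float(net_w)) for x, _ in points]
    ys = [min(max(y, 0.0), float(net_h)) for _, y in points]
    left, top = min(xs), min(ys)
    return {"left": left, "top": top,
            "width": max(xs) - left, "height": max(ys) - top}


# Three (x, y) pairs taken from the DeepStream dump earlier in the thread.
sample = [(133.892, 126.812), (135.983, 123.48), (150.221, 190.782)]
box = bbox_from_keypoints(sample, 256, 256)
print(box)
```

The same logic ports directly to the C++ parser, where the four values would fill the box fields of the object before it is appended to the output list.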

This is model-related: you need to make sure the preprocessing and postprocessing are the same as in your reference pipeline.