Understanding and Improving DeepStream PoseClassification Preprocessing and Model Integration


• Hardware Platform (Jetson / GPU) - GPU
• DeepStream Version - 6.3
• JetPack Version (valid for Jetson only) - 5.1.1
• TensorRT Version - 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs) - Questions

Hi, I have a few questions about how the preprocess function of DeepStream pose classification works.

Before that, let me briefly explain how I prepared the training data and the corresponding model for 3 classes.

Here are the steps that I followed.

  1. I had training videos for each class (the dataset was balanced). Each video consisted of at most 30 frames; generally, all the videos were in the range of 20~30 frames. Using the DeepStream BodyPose3D app, I generated the corresponding JSON files for each video:
    deepstream_reference_apps/deepstream-bodypose-3d/README.md at master · NVIDIA-AI-IOT/deepstream_reference_apps · GitHub

  2. Once the JSON files were created for each video, I refined them: if any frame was missing, it took the data of the previous frame, and if an object was detected but its key points were not, it simply copied the key points of the previous frame.
    This was done to ensure that there are no zero-valued key points between the minimum and maximum frame numbers.
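The refinement in step 2 can be sketched as follows. This is an illustrative Python sketch, not the actual code used; the data layout (a mapping from frame number to a list of (x, y, z) key-point triples) is an assumption for the example, not the BodyPose3D JSON schema:

```python
# Hypothetical sketch of the step-2 refinement: fill missing frames, and
# frames whose key points are all zero, with the previous frame's key points.
def forward_fill(frames, num_joints=34):
    """frames: dict mapping frame number -> list of num_joints [x, y, z] lists."""
    lo, hi = min(frames), max(frames)
    zero = [[0.0, 0.0, 0.0]] * num_joints
    filled, prev = {}, None
    for f in range(lo, hi + 1):
        kpts = frames.get(f)
        # Missing frame, or detected object with no key points -> copy previous.
        if kpts is None or all(p == [0.0, 0.0, 0.0] for p in kpts):
            kpts = prev if prev is not None else zero
        filled[f] = kpts
        prev = kpts
    return filled
```

This guarantees every frame between the minimum and maximum frame numbers carries non-zero key points as long as at least one earlier frame had a valid detection.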

  3. Once the JSON files were refined, I converted them into .npy files using the TAO pose classification dataset_convert function. These were the configuration settings that I used:

dataset_convert:
  pose_type: "25dbp"
  num_joints: 34
  input_width: 1280
  input_height: 720
  focal_length: 800.79041
  sequence_length_max: 300
  sequence_length_min: 10
  sequence_length: 30
  sequence_overlap: 0.9

These settings were chosen so that there would not be any overlap between the frames in each video.
The shape of each .npy array was (1, 3, 34, 300, 1).
Although the array was sized for the max sequence length of 300, every .npy file had non-zero values only up to the 30th frame; after that, the array was all zeros.
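The zero-padded layout described above can be illustrated with a small sketch (plain Python for clarity; dataset_convert itself works on NumPy arrays, and the (1, 3, 34, 300, 1) axis order simply follows the shape quoted above):

```python
# Illustrative sketch of how a 30-frame clip fills a (1, 3, num_joints,
# max_len, 1) array: real values up to frame T, zeros for the remainder.
def pad_sequence(frames, num_joints=34, max_len=300):
    """frames: list of T items, each a list of num_joints (x, y, z) triples."""
    arr = [[[[[0.0] for _ in range(max_len)]
             for _ in range(num_joints)] for _ in range(3)]]
    for t, kpts in enumerate(frames[:max_len]):
        for v, (x, y, z) in enumerate(kpts):
            for c, val in enumerate((x, y, z)):
                arr[0][c][v][t][0] = val
    return arr
```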

  1. Once the .npy files were generated, they were concatenated into a single array and split into train, val, and test sets. The final shape was (9000, 3, 34, 300, 1), and the corresponding label .pkl was created in this format: [[0, 1, 2], [abc.json, cde.json, efg.json]].
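The concatenate-and-split step can be sketched like this; the function name, the (name, label, array) tuple layout, and the [[labels], [names]] ordering are assumptions taken from the post, not the verified TAO label-pkl specification:

```python
import random

# Hypothetical sketch: shuffle the per-clip samples, split them into
# train/val/test, and build a per-split label structure in the
# [[labels], [names]] layout quoted above (ordering assumed).
def split_samples(samples, train=0.8, val=0.1, seed=0):
    """samples: list of (name, label, array) tuples."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_tr = int(len(samples) * train)
    n_va = int(len(samples) * val)
    splits = {"train": samples[:n_tr],
              "val": samples[n_tr:n_tr + n_va],
              "test": samples[n_tr + n_va:]}
    labels = {k: [[lab for _, lab, _ in v], [name for name, _, _ in v]]
              for k, v in splits.items()}
    return splits, labels
```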

  2. Using TAO PoseClassificationNet, I trained the model and converted the .tlt into ONNX format.

Now I have integrated the model in deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub and made the necessary changes in deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/infer_pose_classification_parser/infer_pose_classification_parser.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

  1. I set frame-sequence-length=30 in deepstream_tao_apps/configs/nvinfer/bodypose_classification_tao/config_preprocess_bodypose_classification.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

After following the above steps, when I tested the pipeline I did not get accurate results, even on the training videos that were used during training.

A few things I have observed:

  1. Even though I set frame-sequence-length to 30, I started getting predictions from the 3rd frame onwards, continuously until the end of the video.

  2. If I start getting predictions before the set frame-sequence-length is reached, how is the input array generated? We need to feed an array of shape (1, 3, 300, 34, 1) to the pose classification model. Is the array being padded with zeros to a sequence length of 300 for the frames where the pipeline generated predictions?

I suspect I have to make some custom modifications in deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/nvdspreprocess_lib/nvdspreprocess_lib.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

I am not able to understand the current flow of the preprocess script, or what changes I should make, given the process described above, to make the predictions more accurate, at least on the training videos the model was trained on.

Looking forward to your suggestions, and please help me understand how the preprocess function works.

Thanks
Shabbir

If there are no historical objects available, the previous key points are all zero coordinates.

The code is open source.

Hi @Fiona.Chen

Thanks for the response, but why are predictions happening on almost every frame instead of only after the defined frame-sequence-length?

I have tried to print the 34 pose coordinates (units, object id, object meta id, from the CustomTensorPreparation function in nvdspreprocess_lib.cpp), the frame_num, the activity name, and its corresponding probability.

As you can see, even though historical data was available only up to the 7th frame, the prediction still happened for the 8th frame as well.

  1. How come the prediction is happening for every frame?
  2. How is the final array of shape (1, 3, 300, 34, 1) generated?

When a person is detected, there will be output.

Please refer to the source code.

Let me know if my understanding is correct.

Whenever a person is detected, a prediction happens. To generate the (1, 3, 300, 34, 1) array, historical data is used for the previous frames, if available, and the frames after it are zero.
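That understanding can be summarized with a simplified sketch. The actual implementation is C++ in nvdspreprocess_lib.cpp, so the names and data layout below are illustrative only:

```python
from collections import defaultdict, deque

SEQ_LEN, NUM_JOINTS = 300, 34

# Each tracked object keeps a rolling key-point history (illustrative sketch
# of the behavior described above, not the actual C++ implementation).
histories = defaultdict(lambda: deque(maxlen=SEQ_LEN))

def build_tensor(object_id, keypoints):
    """keypoints: NUM_JOINTS (x, y, z) triples for the current frame.
    Returns nested lists shaped (1, 3, SEQ_LEN, NUM_JOINTS, 1)."""
    hist = histories[object_id]
    hist.append(keypoints)
    tensor = [[[[[0.0] for _ in range(NUM_JOINTS)]
                for _ in range(SEQ_LEN)] for _ in range(3)]]
    # Copy whatever history exists; time steps beyond len(hist) stay zero,
    # which is why a prediction can be produced on every frame.
    for t, kpts in enumerate(hist):
        for v, (x, y, z) in enumerate(kpts):
            for c, val in enumerate((x, y, z)):
                tensor[0][c][t][v][0] = val
    return tensor
```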

Is my understanding correct?

Could you please provide a link to the part of the code where the array of shape (1, 3, 300, 34, 1) is created?

Thanks

Yes.

The code is here: deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/nvdspreprocess_lib/nvdspreprocess_lib.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)

There has been no update from you for a period, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.