Mispredictions with custom trained PoseClassificationNet using deepstream-tao-apps

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) - Jetson
• DeepStream Version - 6.2
• JetPack Version (valid for Jetson only) - 5.1.1
• TensorRT Version - 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs) - questions
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or sample application, and the function description.)

I am training a custom pose classification model using the TAO Toolkit PoseClassificationNet for 4 classes (A, B, C, D). We achieved an overall accuracy of about 85% on the test data, and per-class accuracy is above 80% for each class. (We generated the JSON data using the DeepStream bodypose-3d app, converted it to NumPy files with tao pose_classification dataset_convert using the 25D pose type, and finally merged all the NumPy files into a single file.)

After that we converted the .tlt model to ONNX and deployed it in the DeepStream pose classification pipeline available in deepstream_tao_apps. After deploying it, keeping all parameters the same as in the default config files, we end up with mispredictions for all test cases (the videos were the same ones used for the test dataset in TAO).

Q - What is going wrong in the overall process that causes the model to mispredict completely?

Q - In tao pose_classification dataset_convert, do we need to convert to 3D or 25D when generating the NumPy files?

Q - In the deepstream_tao_apps pose classification pipeline, the nvinferserver config config_infer_secondary_bodypose3dnet.yml lists the outputs:

outputs: [
  {name: "pose2d"},
  {name: "pose2d_org_img"},
  {name: "pose25d"},
  {name: "pose3d"}
]

How do we know which one is used by the next model for pose classification?

Do you use bodypose-3d output to train the Pose Classification model?

Please consult the TAO Toolkit forum: Latest Intelligent Video Analytics/TAO Toolkit topics - NVIDIA Developer Forums

It depends on how you trained the model.

Yes, we used the bodypose-3d output to train the pose classification model. Our model achieves greater than 90% accuracy on the test dataset. But when the same test/train videos are used to validate the model integrated into the DeepStream pose classification pipeline on an edge device, the model gives completely wrong results.

Q1. Could you tell us if we are missing anything while integrating the model?
Q2. Could you tell us what parameters need to be configured to achieve the same performance we get when testing the PoseClassification model standalone?

Looking forward to your reply.

Thanks

Which bodypose3d outputs are you using to train the model? There are 4 outputs from bodypose3d. Pose Classification | NVIDIA NGC needs 3D points as the input. How did you get the 3D-point data?

Hi @Fiona.Chen. Here are the steps we followed to train the model.

  1. Given the raw videos for each activity, I ran bodypose3dNet on each video and generated a JSON file for each.

The JSON contains metadata for both the pose25d and pose3d poses.
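As a rough illustration of pulling the 3D points out of such a JSON, here is a minimal sketch; the field names (frames, objects, pose3d) and the 34-joint, 4-values-per-joint layout are assumptions for illustration, so check them against the actual bodypose-3d output:

```python
import json

# Hypothetical bodypose-3d style JSON (field names and per-joint layout
# are assumptions for illustration, not the exact app schema).
raw = json.dumps({
    "frames": [
        {"objects": [
            {"object_id": 0,
             # assume 34 joints, 4 values per joint (x, y, z, confidence)
             "pose3d": [0.0] * (34 * 4)}
        ]}
    ]
})

doc = json.loads(raw)

# Collect per-frame (x, y, z) joints for one tracked person.
frames_3d = []
for frame in doc["frames"]:
    for obj in frame["objects"]:
        if obj["object_id"] == 0:
            flat = obj["pose3d"]
            # keep only x, y, z and drop the assumed confidence value
            joints = [flat[i:i + 3] for i in range(0, len(flat), 4)]
            frames_3d.append(joints)

print(len(frames_3d), len(frames_3d[0]))  # 1 frame, 34 joints
```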

  2. Once the JSON file was generated for each video, we converted each JSON file to a .npy array using tao pose_classification dataset_convert:

> tao pose_classification dataset_convert
>
> dataset_convert:
>   results_dir: "${results_dir}/dataset_convert"
>   data: "???"
>   pose_type: "3dbp"
>   num_joints: 34
>   input_width: 1920
>   input_height: 1080
>   focal_length: 1200.0
>   sequence_length_max: 300
>   sequence_length_min: 10
>   sequence_length: 100
>   sequence_overlap: 0.5
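For intuition, the sequence_* parameters above describe an overlapping sliding window over each pose track (sequence_length=100 with sequence_overlap=0.5 gives a stride of 50 frames, and tracks shorter than sequence_length_min are dropped). A minimal sketch of that windowing idea, my own illustration rather than the actual TAO implementation:

```python
# Sketch of overlapping sequence windowing under the config above.
# Returns (start, end) frame ranges; not the actual TAO code.
def window_indices(num_frames, seq_len=100, overlap=0.5, min_len=10):
    stride = int(seq_len * (1 - overlap))  # 50 frames for overlap 0.5
    windows = []
    start = 0
    while num_frames - start >= min_len:
        windows.append((start, min(start + seq_len, num_frames)))
        if start + seq_len >= num_frames:
            break  # last window reaches the end of the track
        start += stride
    return windows

print(window_indices(230))  # [(0, 100), (50, 150), (100, 200), (150, 230)]
```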

  3. Once the .npy file was generated for each JSON/raw video, we combined all the .npy files into a single array, which was further split into train, val, and test sets based on the split ratio.

The final shape of the array is [total_sequences, channels, max_sequence_length, num_keypoints (34), num_persons].
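The merge in this step can be sketched with NumPy; the shapes below are example assumptions (3 channels for 3D pose, 300-frame max sequence length, 34 joints, 1 person), not values taken from the actual dataset:

```python
import numpy as np

# Hypothetical per-video .npy arrays as produced by dataset_convert,
# each shaped (sequences, channels, max_seq_len, joints, persons).
per_video = [
    np.zeros((5, 3, 300, 34, 1), dtype=np.float32),
    np.zeros((8, 3, 300, 34, 1), dtype=np.float32),
]

# Concatenate along the sequence axis so the merged array keeps the
# [total_sequences, channels, max_seq_len, joints, persons] layout.
merged = np.concatenate(per_video, axis=0)
print(merged.shape)  # (13, 3, 300, 34, 1)
```

In practice each entry of per_video would come from np.load on one of the generated .npy files.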

  4. To generate the label .pkl file, we created two lists: one containing the file names and the other containing the label ids.

e.g.:
[["xl6vmD0XBS0.json", "OkLnSMGCWSw.json", "IBopZFDKfYk.json", "HpoFylcrYT4.json", "mlAtn_zi0bY.json", …], [4, 3, 2, 1, 0, …]]
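Writing that label file is a one-liner with pickle; the file names and class ids below are placeholders for illustration, not the real dataset entries:

```python
import os
import pickle
import tempfile

# Hypothetical sample names and class ids (e.g. 0..3 for classes A-D).
sample_names = ["clip_A_01.json", "clip_B_01.json", "clip_C_01.json"]
label_ids = [0, 1, 2]

# The label file is a pickled pair of parallel lists, [names, label_ids],
# matching the [[...names...], [...ids...]] layout shown above.
path = os.path.join(tempfile.mkdtemp(), "train_label.pkl")
with open(path, "wb") as f:
    pickle.dump([sample_names, label_ids], f)

# Round-trip check: reload and unpack the two lists.
with open(path, "rb") as f:
    names, labels = pickle.load(f)
```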

After all these steps, we trained the pose classification model by defining the hyper-parameters and generated a .tlt model.

Q - We are able to achieve good accuracy numbers for each class, but when we integrate the model into the DeepStream pose classification pipeline, it becomes biased toward one class, in a similar way to how the pre-trained pose classification model gets biased toward Walking.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks

If the pose classification model is trained with such data, please change the code here (deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/deepstream_pose_classification_app.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps on github.com) to:

    for (auto i = 0; i < p3dLifted.size(); i++) {
      p3dLifted[i].x *= scale;
      p3dLifted[i].y *= scale;
      p3dLifted[i].z *= scale;
    }
