Sourcing and running a model with both Bounding Boxes and Keypoints in DeepStream

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)

GPU

• DeepStream Version

Any / Latest

• JetPack Version (valid for Jetson only)
• TensorRT Version

Any / Latest

• NVIDIA GPU Driver Version (valid for GPU only)

Any / Latest

• Issue Type( questions, new requirements, bugs)

Questions

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I would like to run a model that outputs both bounding boxes and keypoints (2 keypoints, not human) in DeepStream. My application has hundreds of cameras, so I need DeepStream for its decode/infer throughput.

PyTorch has an official model called Keypoint R-CNN here: Link

I have tried and failed to use this with trtexec after exporting an ONNX model: Link

I have also tried and failed to export this using torch2trt: Link

Questions:

  • Ideally I would like to use the PyTorch Keypoint R-CNN model, or at least try it first, since I have already trained it on a small amount of data. Are there any other paths, aside from the failed ONNX and torch2trt routes, to getting the model running in DeepStream?

  • If the PyTorch Keypoint R-CNN model cannot be used in DeepStream, what is the recommended model with both keypoints and bounding boxes for DeepStream? I see the TF2 Model Zoo has some models with both, but is there any other “official” (easy!) way supported by NVIDIA?

Thanks!

Hi @brian0b6iu ,

You could use nvinferserver (Triton) with PyTorch as the backend.
nvinferserver - Gst-nvinferserver — DeepStream 6.1 Release documentation
You can refer to the Python DeepStream sample (deepstream_python_apps/apps/deepstream-ssd-parser at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub) or the C++ DeepStream sample at /opt/nvidia/deepstream/deepstream-6.1/sources/apps/sample_apps/deepstream-infer-tensor-meta-test.
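With either sample, the custom part is the post-processing that turns the raw output tensors into boxes and keypoints. A rough sketch of that step in plain Python, assuming the outputs follow the torchvision Keypoint R-CNN layout (boxes as [x1, y1, x2, y2], one score per box, keypoints as [x, y, visibility]). The names Detection and parse_keypoint_rcnn are hypothetical and not part of any DeepStream or Triton API:

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    box: Tuple[float, ...]                 # (x1, y1, x2, y2) in pixels
    score: float
    keypoints: List[Tuple[float, float]]   # one (x, y) per keypoint


def parse_keypoint_rcnn(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    keypoints: Sequence[Sequence[Sequence[float]]],
    score_threshold: float = 0.5,
) -> List[Detection]:
    """Keep detections above the threshold and drop the visibility flag.

    Assumed layout (torchvision Keypoint R-CNN):
      boxes     -> N x 4      [x1, y1, x2, y2]
      scores    -> N
      keypoints -> N x K x 3  [x, y, visibility]
    """
    detections = []
    for box, score, kps in zip(boxes, scores, keypoints):
        if score < score_threshold:
            continue  # skip low-confidence detections
        detections.append(
            Detection(
                box=tuple(float(v) for v in box),
                score=float(score),
                keypoints=[(float(kp[0]), float(kp[1])) for kp in kps],
            )
        )
    return detections


# Toy data: two candidate objects with 2 keypoints each.
example_boxes = [[10, 20, 110, 220], [0, 0, 5, 5]]
example_scores = [0.9, 0.2]
example_keypoints = [
    [[30, 40, 1], [90, 200, 1]],
    [[1, 1, 1], [2, 2, 1]],
]
example_result = parse_keypoint_rcnn(example_boxes, example_scores, example_keypoints)
```

In a real pipeline the three inputs would be read from the tensor output metadata attached by the inference plugin, and the results would be written back as object metadata. That part is omitted here.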

How about TAO peoplenet + bodypose2d?
TAO peoplenet: Overview — TAO Toolkit 3.22.05 documentation , deepstream_reference_apps/README.md at master · NVIDIA-AI-IOT/deepstream_reference_apps · GitHub

bodypose2d:

or

Thanks for the response.

Do you mean the bodypose as mentioned here? Training and Optimizing a 2D Pose Estimation Model with NVIDIA TAO Toolkit, Part 1 | NVIDIA Technical Blog

So I could use, e.g., YOLOv3 to get the box, then run bodypose as a secondary inference engine (SGIE) on it?

I think the deepstream_bodypose2d_app.cpp you linked has the bodypose as a PGIE, is this correct?
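To check my understanding of the SGIE route: the keypoint model would see only the cropped box, so its keypoints would need to be mapped back to frame coordinates. A minimal sketch of that arithmetic, assuming the crop is stretched to the network input size with no letterboxing or padding (the function name and parameters are hypothetical):

```python
from typing import List, Sequence, Tuple


def crop_to_frame(
    keypoints: Sequence[Tuple[float, float]],
    box: Tuple[float, float, float, float],
    net_w: int,
    net_h: int,
) -> List[Tuple[float, float]]:
    """Map keypoints from SGIE network-input coordinates to frame coordinates.

    box is the PGIE detection (x1, y1, x2, y2) in frame pixels.
    Assumes the crop was resized to net_w x net_h without preserving
    aspect ratio, so x and y are scaled independently.
    """
    x1, y1, x2, y2 = box
    scale_x = (x2 - x1) / net_w
    scale_y = (y2 - y1) / net_h
    return [(x1 + kx * scale_x, y1 + ky * scale_y) for kx, ky in keypoints]


# A 200x200 box at (100, 50) fed to a 100x100 network input:
# the centre of the network input maps to the centre of the box.
mapped = crop_to_frame([(50, 50), (0, 0)], (100, 50, 300, 250), 100, 100)
# mapped == [(200.0, 150.0), (100.0, 50.0)]
```

If the SGIE preprocessing keeps the aspect ratio and pads, the padding offsets would have to be subtracted first.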

If my assumptions are correct, is it possible to retrain the bodypose for custom keypoints? My target object is not a human.

Thanks!

Hi @brian0b6iu ,
I think peoplenet is lighter than YOLOv3, with high detection accuracy.

"is it possible to retrain the bodypose for custom keypoints? My target object is not a human."
Not sure, but you could give it a try.