How to run the PyTorch Keypoint R-CNN model on DeepStream + Triton

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)

GPU- RTX A5000

• DeepStream Version

6.1 via

• JetPack Version (valid for Jetson only)
• TensorRT Version

version in

• NVIDIA GPU Driver Version (valid for GPU only)

| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA RTX A5000    Off  | 00000000:0A:00.0  On |                  Off |
| 30%   41C    P8    21W / 230W |   2117MiB / 24564MiB |      8%      Default |
|                               |                      |                  N/A |

• Issue Type( questions, new requirements, bugs)


• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I am trying to run the pytorch model Keypoint R-CNN from here: keypointrcnn_resnet50_fpn — Torchvision 0.12 documentation

I have tried and failed to use this with trtexec after exporting an ONNX model: Link

I have also tried and failed to export this using torch2trt: Link

I was then advised to try running it in DeepStream with the Triton backend.

I am now attempting this but I have come across the first stumbling block, which is generating the config.pbtxt file.

The model is described as:

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detected instances:

  • boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.
  • labels (Int64Tensor[N]): the predicted labels for each instance
  • scores (Tensor[N]): the scores of each instance
  • keypoints (FloatTensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format.

I can run the following:

import torch
import torchvision

# Instance of the model
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)

# Switch the model to eval mode
model.eval()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(2, 3, 800, 800)

result = model(example)


and get the output:

[{'boxes': tensor([], size=(0, 4), grad_fn=<StackBackward0>), 'labels': tensor([], dtype=torch.int64), 'scores': tensor([], grad_fn=<IndexBackward0>), 'keypoints': tensor([], size=(0, 17, 3)), 'keypoints_scores': tensor([], size=(0, 17))}, {'boxes': tensor([], size=(0, 4), grad_fn=<StackBackward0>), 'labels': tensor([], dtype=torch.int64), 'scores': tensor([], grad_fn=<IndexBackward0>), 'keypoints': tensor([], size=(0, 17, 3)), 'keypoints_scores': tensor([], size=(0, 17))}]

However I do not know how to translate this to a config.pbtxt file.

Trying to follow the docs I have come up with the following:

platform: "pytorch_libtorch"
max_batch_size: 2
input {
    name: "INPUT__0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 800, 800 ]
}
output [
  {
    name: "boxes"
    data_type: TYPE_FP32
    dims: 4
  },
  {
    name: "labels"
    data_type: TYPE_INT64
    dims: [ -1 ]
    label_filename: "resnet50_labels.txt"
  },
  {
    name: "scores"
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "keypoints"
    data_type: TYPE_FP32
    dims: [ 17 ]
    dims: [ 3 ]
  },
  {
    name: "keypoints_scores"
    data_type: TYPE_FP32
    dims: [ 17 ]
  }
]

However, I am not sure it is correct. In the Python output some tensors, such as labels and scores, are printed without a size, so I have put -1 for their dims, and I am not sure whether that is right. I also found label_filename in another example, but I am not sure whether it applies here either.
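For comparison, here is a minimal sketch of how such a config could be generated so the braces always balance. It is not a verified config for this model. It assumes the model has been wrapped to take a single [3, 800, 800] image and return a fixed-order tuple of tensors, that batching is disabled with max_batch_size: 0 because the number of detections N differs per image, and that outputs follow the PyTorch backend's OUTPUT__<index> naming convention. The helper names tensor_block and render_config are made up for illustration.

```python
# Hypothetical output layout: assumes a wrapper that returns, for ONE image,
# a tuple (boxes, labels, scores, keypoints, keypoints_scores).
# -1 marks the variable number of detections N.
OUTPUTS = [
    ("OUTPUT__0", "TYPE_FP32", [-1, 4]),      # boxes
    ("OUTPUT__1", "TYPE_INT64", [-1]),        # labels
    ("OUTPUT__2", "TYPE_FP32", [-1]),         # scores
    ("OUTPUT__3", "TYPE_FP32", [-1, 17, 3]),  # keypoints
    ("OUTPUT__4", "TYPE_FP32", [-1, 17]),     # keypoints_scores
]

# Assumed input: a single CHW image, no batch dimension.
INPUTS = [("INPUT__0", "TYPE_FP32", [3, 800, 800])]


def tensor_block(name, data_type, dims):
    """Render one tensor entry of a config.pbtxt input/output list."""
    dims_text = ", ".join(str(d) for d in dims)
    return (
        "  {\n"
        f'    name: "{name}"\n'
        f"    data_type: {data_type}\n"
        f"    dims: [ {dims_text} ]\n"
        "  }"
    )


def render_config(inputs=INPUTS, outputs=OUTPUTS):
    """Render the whole config as text; every brace opened is closed."""
    lines = [
        'platform: "pytorch_libtorch"',
        "max_batch_size: 0",  # assumption: batching disabled
        "input [",
        ",\n".join(tensor_block(*t) for t in inputs),
        "]",
        "output [",
        ",\n".join(tensor_block(*t) for t in outputs),
        "]",
    ]
    return "\n".join(lines) + "\n"


config_text = render_config()
print(config_text)
```

The sketch leaves out label_filename, which in Triton configs is meant for classification outputs, and format, which is optional.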

My questions:

  • Is it possible for someone to check my pbtxt and give feedback?
  • Are there any good pytorch examples for what I am trying to do?


I have done some background reading. It seems that this model outputs a dictionary.

This person here seems to have had some success wrapping a model that outputs a dictionary, but I am not sure what happens to the data after that. For example, I want to display bounding boxes and keypoints in DeepStream: link
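To make the wrapping idea concrete, here is a rough sketch of the flattening step only: turning one per-image prediction dict into a fixed-order tuple, so that position i can be matched to an output named OUTPUT__i. Plain Python lists stand in for tensors, and OUTPUT_KEYS, flatten_prediction and triton_output_names are hypothetical names. In a real wrapper the same logic would sit inside the forward() of a torch.nn.Module around the model, which is not shown here.

```python
# Fixed key order. Index i is ASSUMED to correspond to an output
# named "OUTPUT__i" in config.pbtxt.
OUTPUT_KEYS = ("boxes", "labels", "scores", "keypoints", "keypoints_scores")


def flatten_prediction(prediction):
    """Turn one per-image prediction dict into a tuple in OUTPUT_KEYS order."""
    missing = [key for key in OUTPUT_KEYS if key not in prediction]
    if missing:
        raise KeyError(f"prediction is missing keys: {missing}")
    return tuple(prediction[key] for key in OUTPUT_KEYS)


def triton_output_names():
    """Map each dict key to its assumed positional output name."""
    return {key: f"OUTPUT__{i}" for i, key in enumerate(OUTPUT_KEYS)}


# Plain lists stand in for tensors; one detection with 17 keypoints.
example_prediction = {
    "scores": [0.98],
    "boxes": [[10.0, 20.0, 110.0, 220.0]],
    "labels": [1],
    "keypoints": [[[50.0, 60.0, 1.0]] * 17],
    "keypoints_scores": [[0.9] * 17],
}

flat = flatten_prediction(example_prediction)
names = triton_output_names()
print(names)
```

Fixing the order in one place means the wrapper and the config cannot silently disagree about which output is which.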

I have also found this example where it looks like the result from an ssd model is post-processed in python after inference:

Are either of these methods valid for what I am trying to achieve? I am trying to reduce the amount of effort I am putting into dead-ends.


Hi @brian0b6iu
NVIDIA TAO provides sample apps at GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream. It includes multiple models, such as faster-rcnn and mask-rcnn. You could check whether you can use one of those models directly, or re-train with TAO and use it with the corresponding DeepStream version.

If you continue to use your own model, can you run Triton/nvinferserver inference now? With your own model and nvinferserver, each inference outputs the raw inference data, and you need to customize the post-processing, as shown in deepstream_python_apps/apps/deepstream-ssd-parser at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
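As an illustration of what such custom post-processing might look like for this model, here is a minimal sketch in plain Python. It assumes the raw outputs have already been copied out of the tensor metadata into flat lists of floats, laid out row-major as boxes [N, 4], scores [N] and keypoints [N, 17, 3]. The function name parse_detections and the 0.5 threshold are made up for the example, and attaching the results to DeepStream metadata for display is not shown.

```python
NUM_KEYPOINTS = 17  # keypoints per detection in the model output shown above


def parse_detections(boxes_flat, scores, keypoints_flat, score_threshold=0.5):
    """Rebuild per-detection records from flat, row-major output buffers.

    boxes_flat:     N * 4 floats, [x1, y1, x2, y2] per detection
    scores:         N floats
    keypoints_flat: N * 17 * 3 floats, [x, y, v] per keypoint
    """
    n = len(scores)
    if len(boxes_flat) != n * 4:
        raise ValueError("boxes buffer does not match number of scores")
    if len(keypoints_flat) != n * NUM_KEYPOINTS * 3:
        raise ValueError("keypoints buffer does not match number of scores")

    detections = []
    for i in range(n):
        if scores[i] < score_threshold:
            continue  # drop low-confidence detections
        x1, y1, x2, y2 = boxes_flat[4 * i : 4 * i + 4]
        start = i * NUM_KEYPOINTS * 3
        keypoints = [
            tuple(keypoints_flat[start + 3 * k : start + 3 * k + 3])
            for k in range(NUM_KEYPOINTS)
        ]
        detections.append(
            {
                # left/top/width/height is convenient for drawing rectangles
                "left": x1,
                "top": y1,
                "width": x2 - x1,
                "height": y2 - y1,
                "score": scores[i],
                "keypoints": keypoints,
            }
        )
    return detections


# Two fake detections; only the first passes the threshold.
example_boxes = [10.0, 20.0, 110.0, 220.0, 0.0, 0.0, 5.0, 5.0]
example_scores = [0.9, 0.2]
example_keypoints = [float(j) for j in range(2 * NUM_KEYPOINTS * 3)]

results = parse_detections(example_boxes, example_scores, example_keypoints)
print(len(results), results[0]["left"], results[0]["width"])
```

The conversion from [x1, y1, x2, y2] to left/top/width/height is done here because on-screen rectangle parameters are usually expressed that way.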

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.