Trying to convert Detectron2's keypoint_rcnn_R_50_fpn_3x to TensorRT


Hi, so following TensorRT's GitHub page, I was able to successfully generate TensorRT engines for mask_rcnn and faster_rcnn.

But when it comes to keypoint_rcnn, while the engine does get created, the output I get is not the actual keypoints but the keypoint heatmaps.

Now, I do know that the Detectron2-to-TensorRT conversion officially supports only mask_rcnn, but since I am not very skilled at this, I was hoping to get some ideas.

Thank You.


TensorRT Version: 8.6
GPU Type: NVIDIA GeForce RTX 3070 Ti
CUDA Version: 12.0
CUDNN Version: 8.8
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 2.1

Relevant Files

Keypoint ONNX:

Keypoint Converted ONNX:

Steps To Reproduce

Just follow this link


We couldn't successfully build the TensorRT engine using the ONNX file shared.
Please share more details of the problem: logs, a model that reproduces the issue, and sample data.

Thank you.

Please request the converted keypoint onnx file one more time.
Thank You.


Sorry for the delayed response. Could you please give more details on your query?
Please refer to the following (mentioned in the sample readme) to learn more about Detectron2's output.

The outputs of the graph are the same as the outputs of the EfficientNMS_TRT plugin, plus the segmentation head output; the name of the last node is detection_masks, its shape is [batch_size, max_proposals, mask_height, mask_width], and its dtype is float32.

Please check the sample output results below.

Thank you.

Hi, I generated an ONNX file for keypoint_rcnn and then generated the converted ONNX for the TensorRT engine.
The thing is that in the output, instead of keypoint (x, y) points, I get the keypoint heatmap values.

I will attach the onnx and the converted onnx.


Converted Onnx:

If you look at the ONNX file, at node ConstantOfShape_2057 I can see the xy_preds.

But in the converted ONNX, at the last node (Gather), the output dimension is 100x17x56x56, which I think are heatmap values, since I believe xy_preds should be 100x17x3x3.

I modified the converted-ONNX Python code along these lines (taking reference from TensorRT's GitHub):

        mask_pooler_output = self.ROIAlign(nms_outputs[1], p2, p3, p4, p5, self.second_ROIAlign_pooled_size, \
                                           self.second_ROIAlign_sampling_ratio, self.second_ROIAlign_type, self.second_NMS_max_proposals, 'keypoint_pooler')

        # Reshape mask pooler output.
        mask_pooler_shape = np.asarray([self.second_NMS_max_proposals*self.batch_size, self.fpn_out_channels, self.second_ROIAlign_pooled_size, self.second_ROIAlign_pooled_size], dtype=np.int64)
        mask_pooler_reshape_node = self.graph.op_with_const("Reshape", "keypoint_pooler/reshape", mask_pooler_output, mask_pooler_shape)

        # Get first Conv op in mask head and connect ROIAlign's squeezed output to it.
        mask_head_conv = self.graph.find_node_by_op_name("Conv", "/roi_heads/keypoint_head/conv_fcn1/Conv")
        mask_head_conv.inputs[0] = mask_pooler_reshape_node[0]

        # Reshape node that is preparing 2nd NMS class outputs for Add node that comes next.
        classes_reshape_shape = np.asarray([self.second_NMS_max_proposals * self.batch_size], dtype=np.int64)
        classes_reshape_node = self.graph.op_with_const("Reshape", "box_outputs/reshape_classes", nms_outputs[3], classes_reshape_shape)

        # This loop will generate an array used in Add node, which eventually will help Gather node to pick the single
        # class of interest per bounding box, instead of creating 80 masks for every single bounding box.
        add_array = []
        for i in range(self.second_NMS_max_proposals * self.batch_size):
            if i == 0:
                start_pos = 0
            else:
                start_pos = i * self.num_classes
            add_array.append(start_pos)

        # This Add node is one of the Gather node inputs. Gather performs a gather on the 0th axis of the data tensor
        # and requires indices that keep the selection within bounds; this Add node provides those offsets for Gather.
        add_array = np.asarray(add_array, dtype=np.int32)
        classes_add_node = self.graph.op_with_const("Add", "box_outputs/add", classes_reshape_node[0], add_array)

        # Get the last Conv op in mask head and reshape it to correctly gather class of interest's masks.
        last_resize = self.graph.find_node_by_op_name("Resize", "/roi_heads/keypoint_head/Resize")

        # Gather node that selects only masks belonging to detected class, 79 other masks are discarded.
        final_gather = self.graph.gather("/keypoint_head/gathering", last_resize.outputs[0], classes_add_node[0])
        final_gather[0].dtype = np.float32

        return nms_outputs, final_gather[0]
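
As a pure-numpy illustration of what the Add + Gather pair above is doing (the shapes here are made up for the example; the real graph uses 100 proposals and 80 classes):

```python
import numpy as np

num_proposals, num_classes, h, w = 4, 3, 2, 2

# Per-class mask logits flattened to [num_proposals * num_classes, h, w],
# the layout the Reshape in the script produces.
masks = np.arange(num_proposals * num_classes * h * w, dtype=np.float32)
masks = masks.reshape(num_proposals * num_classes, h, w)

# Detected class per proposal (what the 2nd NMS outputs).
classes = np.array([2, 0, 1, 2], dtype=np.int32)

# The Add node offsets each class index into the flattened 0th axis...
offsets = np.arange(num_proposals, dtype=np.int32) * num_classes
indices = classes + offsets

# ...so Gather picks exactly one mask per proposal and discards the rest.
selected = masks[indices]  # shape: [num_proposals, h, w]
```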

Could you please help me with the converted ONNX? Maybe I am doing something wrong?

From what I understand, the problem could be that the xy_preds computation happens after the Resize node, but in the code, after getting the last Resize, I am just gathering its output.

Somehow I want to insert the node ConstantOfShape_2057, but whenever I try to insert it in the code, it throws an error.

Hi guys, any updates?

Sorry for asking this again, but is there any update?
I have been trying to create an ONNX that outputs keypoints instead of heatmaps, but I am still not well versed in ONNX.

I’d appreciate any help, thanks.

@rajupadhyay59 I know it's been a year, but if you are still on the problem, I can offer you a tip: it's quite easy to convert the 100x17x56x56 heatmaps to keypoints using simple numpy (it also seems to be a very fast operation that doesn't affect performance).

detectron2/detectron2/structures/ at main · facebookresearch/detectron2 · GitHub ← rewriting heatmaps_to_keypoints() should be enough.

Then you also need to map the keypoints to your bounding boxes' width, height, and position; this can also be done relatively easily in post-processing.
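
A minimal numpy sketch of that idea (the argmax-based peak picking and the box mapping below are my own simplification, not Detectron2's exact heatmaps_to_keypoints, which interpolates the heatmap before taking the peak):

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps, boxes):
    """Convert ROI keypoint heatmaps to (x, y, score) keypoints.

    heatmaps: [N, K, H, W] float array (e.g. 100 x 17 x 56 x 56)
    boxes:    [N, 4] float array of (x1, y1, x2, y2) in image coordinates
    Returns:  [N, K, 3] array of (x, y, score) per keypoint.
    """
    n, k, h, w = heatmaps.shape
    keypoints = np.zeros((n, k, 3), dtype=np.float32)
    for i in range(n):
        x1, y1, x2, y2 = boxes[i]
        box_w = max(x2 - x1, 1e-6)
        box_h = max(y2 - y1, 1e-6)
        for j in range(k):
            hm = heatmaps[i, j]
            # Take the heatmap peak as the keypoint location.
            py, px = np.unravel_index(np.argmax(hm), (h, w))
            # Map the peak cell's centre back into image coordinates.
            keypoints[i, j, 0] = x1 + (px + 0.5) * box_w / w
            keypoints[i, j, 1] = y1 + (py + 0.5) * box_h / h
            keypoints[i, j, 2] = hm[py, px]
    return keypoints

# Hypothetical example: one box, one keypoint whose heatmap peaks at (row 28, col 14).
hm = np.zeros((1, 1, 56, 56), dtype=np.float32)
hm[0, 0, 28, 14] = 1.0
boxes = np.array([[0.0, 0.0, 112.0, 112.0]], dtype=np.float32)
kp = heatmaps_to_keypoints(hm, boxes)  # kp[0, 0] is approximately (29.0, 57.0, 1.0)
```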


Though I am no longer working on this, I can see how that would work.
Thanks for the response!