TX2 SCRFD model TensorRT conversion faulty

Hello!

I am trying to run the SCRFD 500m model from https://github.com/deepinsight/insightface/blob/master/detection/scrfd on a TX2. I managed to convert the model into a TensorRT engine; however, the outputs are wildly different from the version I have deployed on my laptop, which runs the ONNX version of the model as explained in the model repo.

I am aware that the TensorRT conversion might not include some of the preprocessing and postprocessing steps, but there are no minimal examples for SCRFD showing how to handle these inputs and outputs. For example, can I feed the TRT engine the exact same input as the ONNX model, or do I need to apply some preprocessing first? How can I tell which steps, if any, are not included in the TensorRT version?

Can I do something to make the TRT engine run exactly the same as the ONNX version? For example,

from insightface.model_zoo import SCRFD
scrfd_model_path = "face/saved_models/det_500m.onnx"
...
scrfd_detector = SCRFD(scrfd_model_path, scrfd_session)
scrfd_bboxes_orig, _ = scrfd_detector.detect(rgb_orig)  # rgb_orig is image as an array

Yields me the outputs I want, but

input_data = preprocess_image(image_path, input_shape)  # just shuffles dimensions and channels
...
np.copyto(inputs[0]["host"], input_data.ravel())  # stage input in page-locked host memory
cuda.memcpy_htod_async(inputs[0]["device"], inputs[0]["host"], stream)  # host -> device
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)  # run inference
cuda.memcpy_dtoh_async(outputs[0]["host"], outputs[0]["device"], stream)  # device -> host
stream.synchronize()
output_data = outputs[0]["host"].reshape(5, -1)

gives a very different output. Even if I assume the outputs are unfiltered bboxes, they are not meaningful when overlaid on the input image (many are smaller than 10x10 pixels). This suggests the input needs some preprocessing, but I don't know what is required. If possible, I want to see exactly what TensorRT included in the engine.
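For comparison, the insightface ONNX wrapper appears to normalize the frame before inference (mean 127.5, scale 1/128, BGR to RGB, HWC to NCHW), and a TRT engine built from the bare ONNX graph would expect the same normalized tensor. A minimal sketch of that normalization, assuming a 640x640 engine input; `preprocess_for_scrfd` is a hypothetical helper, and the exact constants should be checked against your insightface version:

```python
import numpy as np

def preprocess_for_scrfd(bgr_img: np.ndarray) -> np.ndarray:
    """Replicate the normalization the insightface ONNX wrapper applies.
    Resizing/letterboxing to the engine's input size must happen before this."""
    rgb = bgr_img[..., ::-1].astype(np.float32)  # BGR -> RGB
    rgb = (rgb - 127.5) / 128.0                  # zero-center and scale
    chw = np.transpose(rgb, (2, 0, 1))           # HWC -> CHW
    return chw[np.newaxis, ...]                  # add batch dim -> NCHW

# e.g. a dummy 640x640 frame
blob = preprocess_for_scrfd(np.zeros((640, 640, 3), dtype=np.uint8))
print(blob.shape)  # (1, 3, 640, 640)
```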

Hi,

When deploying the model on the desktop, do you get the correct output?
If so, which TensorRT version do you use on the desktop?

Thanks.

Hello!

It turns out the TensorRT conversion changed the output order, so the postprocessing was working with mixed-up values. For future reference, the TensorRT engine outputs for SCRFD are fine and identical to the ONNX version, just in a different order, as follows:

SCRFD outputs scores, bboxes and keypoints for each stride, so there are 9 output lists. Given the strides

self._feat_stride_fpn = [8, 16, 32]
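As a sanity check, the row count of each of the nine outputs depends on the stride; a quick sketch assuming a 640x640 input and SCRFD's two anchors per feature-map cell (verify against your engine's actual binding shapes):

```python
input_size = 640
num_anchors = 2  # SCRFD places 2 anchors per feature-map cell
rows_per_stride = {}
for stride in (8, 16, 32):
    cells = (input_size // stride) ** 2      # feature-map positions at this stride
    rows_per_stride[stride] = cells * num_anchors
print(rows_per_stride)  # {8: 12800, 16: 3200, 32: 800}
# shapes per stride: scores (n, 1), bboxes (n, 4), keypoints (n, 10)
```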

ONNX outputs were ordered as follows:

stride1_scores
stride2_scores
stride3_scores
stride1_bboxes
stride2_bboxes
stride3_bboxes
stride1_keypoints
stride2_keypoints
stride3_keypoints

However, the TRT engine yields outputs in the following order:

stride1_scores
stride1_bboxes
stride1_keypoints
stride2_scores
stride2_bboxes
stride2_keypoints
stride3_scores
stride3_bboxes
stride3_keypoints
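An alternative to editing the loop is to permute the TRT output list back into ONNX order once, right after inference, and leave the original postprocessing untouched. A minimal sketch with placeholder names, assuming three strides with three outputs each:

```python
# TRT order:  grouped per stride  -> [s1_scores, s1_bboxes, s1_kps, s2_scores, ...]
# ONNX order: grouped per kind    -> [s1_scores, s2_scores, s3_scores, s1_bboxes, ...]
trt_outputs = [f"stride{s}_{kind}" for s in (1, 2, 3)
               for kind in ("scores", "bboxes", "keypoints")]
perm = [stride * 3 + kind for kind in range(3) for stride in range(3)]
onnx_order = [trt_outputs[i] for i in perm]
print(onnx_order)  # ['stride1_scores', 'stride2_scores', 'stride3_scores', ...]
```

In practice it may be more robust to match outputs by their tensor/binding names in the engine rather than by position, since the conversion preserves the ONNX output names.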

So after inference, change this section of the prediction code:

fmc = self.fmc
for idx, stride in enumerate(self._feat_stride_fpn):
  if self.batched:
    scores = net_outs[idx][0]
    bbox_preds = net_outs[idx + fmc][0]
    bbox_preds = bbox_preds * stride
    if self.use_kps:
      kps_preds = net_outs[idx + fmc * 2][0] * stride
  else:
    scores = net_outs[idx]
    bbox_preds = net_outs[idx + fmc]
    bbox_preds = bbox_preds * stride
    if self.use_kps:
      kps_preds = net_outs[idx + fmc * 2] * stride

To something like

fmc = self.fmc
indx = 0
for stride in self._feat_stride_fpn:
  if self.batched:
    scores = net_outs[indx][0]
    bbox_preds = net_outs[indx + 1][0]
    bbox_preds = bbox_preds * stride
    if self.use_kps:
      kps_preds = net_outs[indx + 2][0] * stride
  else:
    scores = net_outs[indx]
    bbox_preds = net_outs[indx + 1]
    bbox_preds = bbox_preds * stride
    if self.use_kps:
      kps_preds = net_outs[indx + 2] * stride
  indx += 3  # each stride contributes 3 consecutive outputs (scores, bboxes, keypoints)

To obtain the correct output.