Accuracy decreases when using the DeepStream pipeline

• Hardware Platform (Jetson / GPU) : T4/3080
• DeepStream Version: 5.0.1
• TensorRT Version: 7.2
• NVIDIA GPU Driver Version (valid for GPU only): 460.32.03
• Issue Type( questions, new requirements, bugs): questions

Hello, our team found that accuracy decreases when we integrate our model (an object detector such as YOLO) into a DeepStream pipeline. We have been digging into this for some time and suspect it is caused by image resizing in the nvstreammux plugin (or by other pre-processing operations in the pipeline).

Below are the details of the question.

  1. We train our YOLO model using OpenCV for preprocessing: the original images (1280*720) are resized to the network’s input_shape (320*320 or 416*416) with the default interpolation method (linear), without maintaining the aspect ratio.
  2. We test the raw model on our test dataset and get reasonably good performance.
  3. We convert the model to a TensorRT engine (INT8 precision) and test it on the same dataset. The F1-Score decreases by 2%~3% and mAP@small decreases by about 2% compared with step 2, which is acceptable.
  4. We integrate the TensorRT engine into the DeepStream pipeline, set the nvstreammux width and height to 1280*720 (the same as the original test images), run the pipeline on our test dataset, and evaluate its outputs. The F1-Score and mAP@small are almost unchanged compared with step 3, which is also acceptable.
  5. Based on step 4, we change the nvstreammux width and height to 740*416 (for other reasons) and expected the same result as in step 4, but that is not what we got. The F1-Score decreased by about 16% and mAP@small decreased by about 30%~35% compared with step 4 (shown in red in the table below).

We searched a lot and found that it may be caused by pre-processing in DeepStream, where there are at least two resize operations, one in nvstreammux and another in nvinfer:

  1. nvstreammux resizes the original images to the resolution set by its width and height properties, such as 1280*720 or 740*416
  2. nvinfer resizes the image to the network’s input_shape, such as 320*320 or 416*416
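
To make the two stages concrete, here is a minimal sketch that tracks how many pixels an object covers after each resize. The 24x24 object size is a hypothetical value chosen for illustration, and the helper names are not part of any DeepStream API. It assumes both stages stretch the frame without keeping the aspect ratio, as described above.

```python
def scale_factors(src_wh, dst_wh):
    """Return (sx, sy) for a stretch resize (aspect ratio not kept)."""
    return dst_wh[0] / src_wh[0], dst_wh[1] / src_wh[1]


def object_footprint(obj_wh, stages):
    """Track an object's size in pixels through a chain of resolutions.

    stages is a list of (width, height) tuples, starting with the source.
    """
    w, h = obj_wh
    sizes = [(w, h)]
    for src, dst in zip(stages, stages[1:]):
        sx, sy = scale_factors(src, dst)
        w, h = w * sx, h * sy
        sizes.append((w, h))
    return sizes


# Hypothetical 24x24 px object in a 1280x720 frame, 416x416 network input.
# Step 4: streammux keeps 1280x720, then nvinfer resizes to 416x416.
step4 = object_footprint((24, 24), [(1280, 720), (1280, 720), (416, 416)])
# Step 5: streammux resizes to 740x416, then nvinfer resizes to 416x416.
step5 = object_footprint((24, 24), [(1280, 720), (740, 416), (416, 416)])

print("step 4 sizes:", step4)
print("step 5 sizes:", step5)
```

Under these assumptions the object ends up with the same footprint at the network input in both configurations, which suggests that any difference comes from how the pixels are resampled along the way rather than from the final object size.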

According to here and here, nvstreammux uses linear interpolation to resample the image and nvinfer uses nearest-neighbor interpolation.

So, will different resize operations between training and inference affect the performance?

Training:
resize once, with linear interpolation, using OpenCV on the CPU

Inference:
resize twice, first with linear interpolation and then with nearest-neighbor interpolation, using the CUDA API (if we set the nvstreammux width and height to 1280*720, maybe there is only one resize, in nvinfer)
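
One way to get a feel for how far these paths diverge is to compare the resulting network inputs directly. The following is a rough sketch in NumPy only: `resize_nearest` and `resize_bilinear` are simple illustrative implementations written for this post, not the OpenCV or DeepStream/CUDA code, and they are not guaranteed to match either bit for bit. The random frame is a synthetic stand-in; real test images should be substituted for a meaningful comparison.

```python
import numpy as np


def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour stretch resize of a 2-D array (illustrative only)."""
    in_h, in_w = img.shape
    # Map each output pixel centre back to a source index.
    ys = np.minimum(((np.arange(out_h) + 0.5) * in_h / out_h).astype(int), in_h - 1)
    xs = np.minimum(((np.arange(out_w) + 0.5) * in_w / out_w).astype(int), in_w - 1)
    return img[ys][:, xs]


def resize_bilinear(img, out_w, out_h):
    """Bilinear stretch resize of a 2-D array using half-pixel centres."""
    in_h, in_w = img.shape
    y = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    x = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (y - y0)[:, None]  # vertical blend weights
    wx = (x - x0)[None, :]  # horizontal blend weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


rng = np.random.default_rng(0)
frame = rng.random((720, 1280))  # synthetic stand-in for a 1280x720 grey frame

# Training: one bilinear resize straight to the network input.
train_input = resize_bilinear(frame, 416, 416)
# Step 4: streammux at source resolution, then nearest in nvinfer.
step4_input = resize_nearest(frame, 416, 416)
# Step 5: bilinear to 740x416 in streammux, then nearest in nvinfer.
step5_input = resize_nearest(resize_bilinear(frame, 740, 416), 416, 416)

for name, x in [("step 4", step4_input), ("step 5", step5_input)]:
    print(name, "mean abs diff vs training input:", np.abs(x - train_input).mean())
```

Running the same comparison on crops around small objects from the real test set would show whether the step 5 path distorts them more than the step 4 path.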

More tests
We trained the model using two resize operations (to match DeepStream’s preprocessing), linear first and then nearest, but it had no effect :(

Are the linear interpolation algorithms in OpenCV (CPU) and the CUDA API the same? If not, do we need to use the CUDA API for preprocessing (resize) in our model training stage?

Does anybody know?

I found that several users in this forum complain that accuracy decreased when using DeepStream, but nobody has confirmed what the reason is.

Can you refer to the topic “About the resize method in nvvideoconvert/nvstreammux - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums”?

@Fiona.Chen
Yes, I have read that topic.

My questions are:

  1. Should we resize the image twice in the training stage to match DeepStream’s pre-processing?
  2. Which interpolation method should we use? Linear, then nearest? Are the interpolation algorithms in OpenCV and DeepStream the same?

Thanks for your reply!

It may help.

nvstreammux uses the bilinear algorithm and nvinfer uses the nearest-neighbor interpolation algorithm.

In Step 4, you set the streammux resolution to 1280x720, while in Step 5 you set it to 740x416. One thing to keep in mind is that the bbox coordinates from NvInfer are based on the streammux resolution (not the original resolution). Thus, you need to scale the bbox coordinates from NvInfer back if you set the streammux resolution differently from the original source resolution.

So, in Step 5, you need to scale x and width by 1280/740, y and height by 720/416, respectively.
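
As a minimal sketch of that scaling, assuming the bbox is given as (left, top, width, height) and that streammux stretched the frame without padding: the function name and the sample box below are hypothetical, chosen only for illustration.

```python
def scale_bbox_to_source(bbox, mux_wh, src_wh):
    """Map a (left, top, width, height) box from streammux to source resolution."""
    left, top, width, height = bbox
    sx = src_wh[0] / mux_wh[0]  # e.g. 1280 / 740 for x and width
    sy = src_wh[1] / mux_wh[1]  # e.g. 720 / 416 for y and height
    return left * sx, top * sy, width * sx, height * sy


# Hypothetical detection reported at the 740x416 streammux resolution.
scaled = scale_bbox_to_source((370, 208, 74, 52), (740, 416), (1280, 720))
print(scaled)
```

If padding is enabled on streammux, the padded offset would also have to be removed before scaling, which this sketch does not handle.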

@pshin
Thanks for your reply. We do scale the values back to the original size; otherwise the mAP and F1-Score would be totally wrong.

The fact is that mAP@small decreased noticeably, but the other metrics did not.