• Hardware Platform (Jetson / GPU) : T4/3080
• DeepStream Version: 5.0.1
• TensorRT Version: 7.2
• NVIDIA GPU Driver Version (valid for GPU only): 460.32.03
• Issue Type( questions, new requirements, bugs): questions
Hello, our team found that accuracy decreases when we integrate our model (an object detector such as YOLO) into a DeepStream pipeline. We have been digging into this for some time and suspect it is caused by image resizing in the nvstreammux plugin (or by other pre-processing operations in the pipeline).
The details of the question are below.
- We train our YOLO model using OpenCV for preprocessing: the original images (1280*720) are resized to the network’s input_shape (320*320 or 416*416) with the default interpolation method (linear), without maintaining the aspect ratio.
- We test the raw model on our test dataset and get reasonably good performance.
- We convert the model to a TensorRT engine (INT8 precision) and test it on the same dataset. The F1-Score decreases by 2%~3% and mAP@small by about 2% compared with step 2, which is acceptable.
- We integrate the TensorRT engine into a DeepStream pipeline, set the nvstreammux width and height to 1280*720 (the same as the original test images), run the pipeline on our test dataset and evaluate its outputs. The F1-Score and mAP@small are almost unchanged compared with step 3, which is acceptable too.
- Based on step 4, we change the nvstreammux width and height to 740*416 (for some purpose), expecting the same result as in step 4, but that is not what we get. The F1-Score decreases by about 16% and mAP@small by about 30%~35% compared with step 4 (shown in red in the table below).
We googled a lot and found that this may be caused by pre-processing in DeepStream. There are at least two resize operations, one in nvstreammux and another in nvinfer:
- nvstreammux resizes the original images to the size we want, such as 1280*720 or 740*416, which is set by its width and height properties
- nvinfer resizes the image to the network’s input_shape, such as 320*320 or 416*416
nvstreammux uses linear interpolation to resample the image and nvinfer uses nearest-neighbor interpolation, according to here and here.
So, will the different resize operations between training and inference affect performance?
Training:
resize once, with linear interpolation, using OpenCV on the CPU
Inference:
resize twice, first with linear interpolation and then with nearest-neighbor interpolation, using the CUDA API (if we set the nvstreammux width and height to 1280*720, maybe it resizes only once, in nvinfer)
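To make the comparison between the two paths concrete, here is a rough sketch in NumPy. It is only an illustration under assumptions: `resize_linear` and `resize_nearest` are hypothetical helpers written for this post (grayscale images only), the frame is synthetic random noise (close to a worst case), and neither function is guaranteed to match OpenCV or DeepStream bit-for-bit.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image (hypothetical helper)."""
    in_h, in_w = img.shape
    # Pick the source pixel lying under each output pixel centre.
    ys = np.minimum(((np.arange(out_h) + 0.5) * in_h / out_h).astype(int), in_h - 1)
    xs = np.minimum(((np.arange(out_w) + 0.5) * in_w / out_w).astype(int), in_w - 1)
    return img[ys[:, None], xs[None, :]]

def resize_linear(img, out_h, out_w):
    """Bilinear resize of a 2-D image, half-pixel centres, no anti-aliasing."""
    in_h, in_w = img.shape
    img = img.astype(np.float64)
    # Map output pixel centres back to source coordinates and clamp.
    y = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    x = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (y - y0)[:, None]
    wx = (x - x0)[None, :]
    # Blend the four neighbours: first along x, then along y.
    top = img[y0[:, None], x0[None, :]] * (1 - wx) + img[y0[:, None], x1[None, :]] * wx
    bot = img[y1[:, None], x0[None, :]] * (1 - wx) + img[y1[:, None], x1[None, :]] * wx
    return top * (1 - wy) + bot * wy

# Synthetic 1280*720 grayscale frame (assumption: random noise, not real data).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(720, 1280)).astype(np.float64)

# Training-like path: a single linear resize to 416*416.
one_stage = resize_linear(frame, 416, 416)

# Inference-like path: linear to 740*416, then nearest to 416*416.
two_stage = resize_nearest(resize_linear(frame, 416, 740), 416, 416)

mean_abs_diff = float(np.abs(one_stage - two_stage).mean())
print(f"mean absolute difference between the two paths: {mean_abs_diff:.2f}")
```

Running the same comparison on real test images (instead of noise) would give a better idea of how far apart the two inputs to the network actually are.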
More tests
We trained the model using two resize operations (to match DeepStream’s preprocessing), linear first and then nearest, but it had no effect :(
Are the linear interpolation algorithms in OpenCV (CPU) and the CUDA API the same? If not, do we need to use the CUDA API for preprocessing (resize) in our model training stage?
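As a small illustration of why two "linear" implementations are not necessarily identical, the sketch below resamples one row of pixels with two different coordinate conventions (half-pixel centres versus corner-aligned). `linear_resize_1d` is a hypothetical helper written for this post. We do not know which convention the CUDA path in DeepStream uses, so this only shows that such a difference is possible, not that it exists.

```python
def linear_resize_1d(signal, out_len, half_pixel):
    """Linearly resample a 1-D signal to out_len samples (hypothetical helper)."""
    n = len(signal)
    scale = n / out_len
    out = []
    for i in range(out_len):
        # Map the output index back to a source coordinate.
        pos = (i + 0.5) * scale - 0.5 if half_pixel else i * scale
        pos = min(max(pos, 0.0), n - 1)  # clamp to the valid range
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        # Blend the two neighbouring samples.
        out.append(signal[lo] * (1 - w) + signal[hi] * w)
    return out

row = [0, 10, 20, 30, 40, 50, 60, 70]
half_pixel_out = linear_resize_1d(row, 4, half_pixel=True)
corner_out = linear_resize_1d(row, 4, half_pixel=False)
print(half_pixel_out)  # [5.0, 25.0, 45.0, 65.0]
print(corner_out)      # [0.0, 20.0, 40.0, 60.0]
```

Both outputs come from linear interpolation of the same row, yet every sample differs, so comparing the actual resized frames from the two libraries pixel by pixel seems like the most direct way to check.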