Are you training with BGR or RGB?
Your offsets in nvinfer are, I think, in BGR order, but your normalisation in pytorch seems to be in RGB.
And for your Resize in pytorch, does it maintain the aspect ratio? If yes, what kind of padding does it add?
Nvinfer by default does not maintain the aspect ratio, but if it does, it will either do bottom-right zero padding or symmetric padding (if symmetric-padding=1).
Lastly, did you check your Resize interpolation method? If I remember correctly, the default in pytorch is different from the default in nvinfer.
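If it helps, here is a rough sketch in plain OpenCV/numpy of what nvinfer does to each frame, so you can run the same steps offline and compare against your pytorch pipeline. The 224x224 size, offsets and scale factor below are only example values, and the interpolation and colour handling reflect my understanding of the defaults, not your actual config:

```python
import cv2
import numpy as np

# Rough offline replica of nvinfer's default preprocessing:
#   y = net-scale-factor * (x - offsets), applied per channel after resize.
def nvinfer_like_preprocess(bgr_frame,
                            net_w=224, net_h=224,
                            offsets=(123.675, 116.28, 103.53),
                            net_scale_factor=1.0 / 57.63,
                            rgb_input=True):
    # maintain-aspect-ratio=0: plain stretch to the network resolution;
    # the interpolation here should match whatever scaling-filter selects.
    resized = cv2.resize(bgr_frame, (net_w, net_h), interpolation=cv2.INTER_LINEAR)
    if rgb_input:  # model-color-format=0 (RGB); skip the swap for BGR models
        resized = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    x = resized.astype(np.float32) - np.array(offsets, dtype=np.float32)
    x *= net_scale_factor
    return x.transpose(2, 0, 1)  # HWC -> CHW, as the engine input expects
```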
We know that different libraries give different resizing results even when using the same interpolation method. Because of that, it is crucial for us to know how the nvinfer interpolation method works and how it can be replicated in our offline training, so that preprocessing is the same in the production and offline environments.
We hope that someone has solved a similar issue and can share information with us.
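As a small illustration of this point, even a plain bilinear resize is not pixel-identical between OpenCV and Pillow:

```python
import cv2
import numpy as np
from PIL import Image

# Even "the same" bilinear resize differs across libraries, which is why the
# training-time resize has to mirror whatever nvinfer does.
img = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)

cv_out = cv2.resize(img, (224, 224), interpolation=cv2.INTER_LINEAR)
pil_out = np.array(Image.fromarray(img).resize((224, 224), Image.BILINEAR))

print("max per-pixel difference:", np.abs(cv_out.astype(int) - pil_out.astype(int)).max())
```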
I had been struggling with a similar problem, so I ended up using scaling-filter=1 or scaling-filter=2 in the SGIE configuration file. Note that I had not changed the resize method for Streammux, as I did not need to resize there.
To confirm my results, I exported the crops from deepstream's detector and ran inference on them in pytorch, where the accuracy was much higher than what I was getting in deepstream.
So I played around with the scaling filter, and just by changing to scaling-filter=1 the accuracy went up significantly (~15-20% on different videos). scaling-filter=2 had similar results, but it kept crashing on dGPU, so I stuck to scaling-filter=1.
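As far as I understand, scaling-filter=1 selects bilinear interpolation in NvBufSurfTransform (this is my reading of the enum values, so please verify), which would make cv2.INTER_LINEAR the closest match on the training side, e.g.:

```python
import albumentations as A
import cv2

# If scaling-filter=1 is indeed bilinear, this is the closest training-side resize.
transform = A.Compose([
    A.Resize(height=224, width=224, interpolation=cv2.INTER_LINEAR),
])
# usage: resized = transform(image=crop_bgr)["image"]
```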
Thank you for your suggestions.
Changing the offsets to 123.675;116.28;103.53 improved results the most, and scaling-filter=1 helped get better results as well. But there are still mismatches between the models' predictions: now only ~20% of predictions mismatch, compared to ~75% previously.
I have also tried every suggestion @mchi referred to, but none of them improved the results.
Lastly, I can think of the precision. How about trying to run with fp32 (network-mode=0) and checking?
And maybe also explicitly set maintain-aspect-ratio=0, because your pytorch resize does not maintain the aspect ratio.
Right, I forgot to mention that I had also changed to network-mode=0 previously. I'm not using A.Resize(height=224, width=224, interpolation=cv2.INTER_NEAREST) anymore, but went back to A.Resize(height=224, width=224), because results got worse in most cases while changing the settings that @mchi referred to.
I have tried what you mentioned by explicitly adding maintain-aspect-ratio=0, but it had no impact on the results.
Could you explain in more detail how I should change my offsets and net-scale-factor? Currently I have them calculated according to this comment, where: np.array([0.485, 0.456, 0.406])*255 = array([123.675, 116.28, 103.53])
And np.array([0.229, 0.224, 0.225]).mean()*255 = 57.63
Therefore net-scale-factor is going to be 1/57.63 = 0.01735207357279195.
Yes, so I take mean=(0.485, 0.456, 0.406) and multiply every value by 255, which gives 123.675;116.28;103.53, e.g. 0.485 * 255 = 123.675.
And for std=(0.229, 0.224, 0.225) I calculate the mean of the std values, which is (0.229 + 0.224 + 0.225) / 3 = 0.226, and then calculate net-scale-factor as 1 / (0.226 * 255) = 0.017352074.
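In code, the whole calculation is just:

```python
import numpy as np

# nvinfer computes y = net-scale-factor * (x - offsets) on 0-255 pixels, so the
# ImageNet mean/std (defined on 0-1 images) are rescaled by 255, and the three
# stds are averaged because net-scale-factor is a single scalar.
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

offsets = mean * 255                         # [123.675, 116.28, 103.53]
net_scale_factor = 1.0 / (std.mean() * 255)  # ~0.017352074

print(";".join(f"{v:g}" for v in offsets), net_scale_factor)
```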
I have gone through the options you have listed in the DeepStream SDK FAQ and confirmed that I'm using the same ones for training as I am for DS.
Thank you for all the help so far.
I have tried what you suggested by adding position='top_left' to the PadIfNeeded function, however predictions from the pth model and from deepstream still differ. Maybe you are right that I'm doing something wrong in my PGIE.
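For clarity, my understanding of that suggestion is something like the sketch below; the LongestMaxSize step is my assumption (only PadIfNeeded with position='top_left' was explicitly suggested), and in my actual pipeline I use a plain A.Resize, which does not keep the aspect ratio at all:

```python
import albumentations as A
import cv2

# Aspect-ratio-preserving resize with the image in the top-left corner and
# zero padding towards the bottom-right, which is what nvinfer does with
# maintain-aspect-ratio=1 and symmetric-padding unset.
transform = A.Compose([
    A.LongestMaxSize(max_size=224, interpolation=cv2.INTER_LINEAR),
    A.PadIfNeeded(min_height=224, min_width=224,
                  position="top_left",
                  border_mode=cv2.BORDER_CONSTANT, value=0),
])
# With maintain-aspect-ratio=0 on the nvinfer side this block would not apply,
# and a plain A.Resize(224, 224) is the closer match.
```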
I'm using a darknet YOLOv4 model in the PGIE, its config:
When you visualize your video, which of these situations occurs?

1. Bbox is there, but the classification is wrong.
In case this happens, do you have a tracker between your PGIE and SGIE? (NOTE: the tracker may alter the bounding boxes, which may cause some difference in the classification result. So it would be good to move the tracker after the SGIE to verify.)

2. Bbox is missing (the object is not detected at all).
In case this happens, very likely the detector is the problem. From the nvinfer code it seems the default minimum object width and height is 16. Are your objects smaller than that? Maybe they are being discarded? And I see you have disabled clustering, so I think it would be worth setting pre-cluster-threshold=0.0 just to ensure each box is rendered.
When I visualize my video I can see that the bbox is there, but the classifications are wrong. Unfortunately, I do not have a tracker between PGIE and SGIE.
Another thing I have tested is writing a benchmark script which takes the engine model file and classifies images locally. This benchmark of the engine model gives me a 98% match when comparing its predictions with the pth model locally. However, this engine file was generated on a different PC than the one I'm running Deepstream 5.1 on, and a different TensorRT version was used: instead of the version that ships with Deepstream 5.1, I used a newer TensorRT release for the local engine benchmark, because only newer versions are supported by the tensorrt python package.
So my question is: can an older version of TensorRT decrease precision so drastically, or is there still something wrong with my deepstream config files?
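For reference, the benchmark script is essentially of this shape (heavily simplified; the engine path, the 1x3x224x224 input, the class count and the random input are placeholders, not my actual values):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

# Load a serialized engine and classify one preprocessed crop.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
NUM_CLASSES = 10  # placeholder

with open("classifier.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Assumes a single FP32 input binding and a single FP32 output binding.
input_host = np.random.rand(1, 3, 224, 224).astype(np.float32)  # preprocessed crop goes here
output_host = np.empty((1, NUM_CLASSES), dtype=np.float32)

d_input = cuda.mem_alloc(input_host.nbytes)
d_output = cuda.mem_alloc(output_host.nbytes)

cuda.memcpy_htod(d_input, np.ascontiguousarray(input_host))
context.execute_v2(bindings=[int(d_input), int(d_output)])
cuda.memcpy_dtoh(output_host, d_output)

print("predicted class:", int(output_host.argmax()))
```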
Thank you for your answer @mchi
Unfortunately, it is currently not possible to upgrade the system to DeepStream 6.2.
As far as I can tell it is the same, yes. But maybe I'm missing something crucial. I have an RTSP stream, which goes to Streammux → PGIE (YOLO model, 1 class) → SGIE (classification model which runs only on the YOLO detections) → I save the images that went through Streammux, the OD coordinates and the classification class in Redis.
From Redis I save the images with the coordinates and the classification class in their file names.
Locally I crop the saved images according to the OD coordinates, pass them through the pth model, and compare the pth-predicted class with the classification ID in the image's name.
I’m using FP32 inference precision for both - OD and classification.
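The comparison script itself is roughly the following; the file-name pattern, the preprocessing and loading the whole model with torch.load() are simplified placeholders rather than the exact project code:

```python
import glob
import os

import cv2
import numpy as np
import torch

# Compare DeepStream's per-crop class (encoded in the file name) against the pth model.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(crop_bgr):
    # must mirror the training transform; this is a simplified stand-in
    rgb = cv2.cvtColor(cv2.resize(crop_bgr, (224, 224)), cv2.COLOR_BGR2RGB)
    x = (rgb.astype(np.float32) / 255.0 - MEAN) / STD
    return torch.from_numpy(x.transpose(2, 0, 1))

model = torch.load("classifier.pth", map_location="cuda").eval()

matches = total = 0
for path in glob.glob("saved_frames/*.jpg"):
    # assumed pattern: <frame_id>_<left>_<top>_<width>_<height>_<ds_class>.jpg
    parts = os.path.splitext(os.path.basename(path))[0].split("_")
    left, top, w, h, ds_class = map(int, parts[1:6])

    crop = cv2.imread(path)[top:top + h, left:left + w]
    with torch.no_grad():
        pth_class = int(model(preprocess(crop).unsqueeze(0).cuda()).argmax(dim=1))

    matches += int(pth_class == ds_class)
    total += 1

print(f"matching predictions: {matches}/{total} ({100 * matches / max(total, 1):.1f}%)")
```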