Please provide complete information as applicable to your setup.
**• Hardware Platform (Jetson / GPU)** Jetson
**• DeepStream Version** 6.0
Hello NVIDIA community, I trained a YOLOv5s model on a custom dataset (VisDrone) and achieved excellent results in PyTorch. However, after converting the model to ONNX for deployment on my Jetson Nano with DeepStream 6.0, detection quality on the same image is significantly worse. I used the parameters `--simplify --batch 1` during conversion. Attached are the images for reference.
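For context, the flags mentioned above correspond to an export invocation along these lines (based on YOLOv5's `export.py`; the weights path and image size below are illustrative placeholders, not values from this thread):

```
python export.py --weights yolov5s.pt --include onnx --simplify --batch 1 --imgsz 640
```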
Seeking insights on the performance gap and any optimization suggestions. Appreciate your assistance.
So the image above is the result of DeepStream, and the image below is the result of PyTorch?
Could you refer to the FAQ to tune the pipeline and check the result?
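Beyond the FAQ, the pre-processing-related keys in the `nvinfer` config are worth double-checking. The values below are illustrative assumptions for a YOLOv5 model trained on 0–1 normalized RGB input, not a verified config from this thread:

```
[property]
# y = net-scale-factor * (x - mean); 1/255 scales pixels to [0, 1]
net-scale-factor=0.0039215697906911373
offsets=0;0;0
# 0 = RGB, 1 = BGR; must match the channel order used in training
model-color-format=0
# letterbox-style resize, as YOLOv5 uses during training
maintain-aspect-ratio=1
```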
I have also seen accuracy drop with DeepStream when converting a PyTorch model to TensorRT.
I already checked the FAQ, but I think there may be one more potential issue: the tensor input pre-processing on the GPU in DeepStream is not behaving as expected.
I say that because I tried the TensorRT engine that nvinfer builds by default, and wrote a C++ inference program that applies the same pre-processing as DeepStream, but with OpenCV (processed on the CPU, then copied to the GPU).
The net-scale-factor and mean/std followed the same formula as implemented in DeepStream.
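A minimal sketch of that formula in Python (NumPy only): nvinfer normalizes each pixel as y = net-scale-factor × (x − mean) before converting to the CHW layout. The scale and offset values here are assumptions matching a common YOLOv5 config, not values from this thread:

```python
import numpy as np

# Assumed values for a typical YOLOv5 config (pixels scaled to [0, 1],
# no mean subtraction); NOT taken from the thread.
NET_SCALE_FACTOR = 1.0 / 255.0
OFFSETS = np.array([0.0, 0.0, 0.0], dtype=np.float32)  # per-channel mean

def nvinfer_preprocess(frame_hwc: np.ndarray) -> np.ndarray:
    """Replicate nvinfer's normalization, y = net-scale-factor * (x - mean),
    then reorder HWC uint8 -> CHW float32 as the TensorRT engine expects."""
    x = frame_hwc.astype(np.float32)
    y = NET_SCALE_FACTOR * (x - OFFSETS)   # per-pixel normalization
    return np.transpose(y, (2, 0, 1))      # HWC -> CHW

# Dummy 2x2 frame: a white pixel should map to ~1.0 and a black one to 0.0.
frame = np.array([[[255, 255, 255], [0, 0, 0]],
                  [[128, 128, 128], [64, 64, 64]]], dtype=np.uint8)
tensor = nvinfer_preprocess(frame)
print(tensor.shape)  # (3, 2, 2)
```

Running the same arithmetic on the CPU with OpenCV and comparing the resulting tensor element-by-element against DeepStream's input tensor is one way to isolate whether the gap comes from pre-processing or from the engine itself.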
And the result is better than the DeepStream output, and it also matches the PyTorch output.
I am still not sure what the main issue is, or what differs between:
- pre-processing on the CPU and copying the data to the GPU, versus
- letting DeepStream pre-process all the data on the GPU.
I hope this topic can be dug into more deeply to improve the output of the DeepStream pipeline.
@mjemv There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.