Why do inference results differ between dGPU and Jetson given the same model and input source?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson TX2/Xavier, RTX2060

• DeepStream Version
DS5.1

• JetPack Version (valid for Jetson only)
JP4.5.1

• TensorRT Version
Jetson : 7.1.3-1+cuda10.2
dGPU : 7.2.1-1+cuda11.1

• NVIDIA GPU Driver Version (valid for GPU only)
455.32.00

• Issue Type (questions, new requirements, bugs)
Questions

• Detail

Background :
I was working on a unit test that verifies inference results in an automated test system for both Jetson and dGPU. I expected both platforms to produce exactly the same result, but that assumption seems to be wrong.

Questions :

  1. Given the same input video and model (e.g. YoloV3 or a .uff-based model), can we expect exactly the same result from both Jetson and dGPU in a DeepStream pipeline?

  2. I ran trtexec with sample binary data and a single .uff model file on the two platforms and got exactly the same values. However, when the same .uff model is integrated into a DeepStream pipeline and run on a video stream, I observe different inference results. Is this expected behaviour?

  3. After looking at the DeepStream code, I noticed that Jetson uses the RGBA color format while dGPU uses RGB. There are also some parts where Jetson has to use NvBufSurfaceMapEglImage, which is not required on dGPU. Could this be a contributing factor to the difference in inference results between the two platforms?

  4. I tested on a 1080ti and an RTX2060 and the result is the same on both of these dGPUs, so it seems that the inference result should be the same across dGPUs. Is this correct?

I hope to gain some insight, or confirmation that the behavior I am observing is normal.

Thanks!

Bit-exact results are not guaranteed, because floating-point calculation is flow dependent, i.e. the result depends on the order in which the operations are executed.
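To illustrate the point about flow-dependent float calculation, here is a minimal Python sketch (not DeepStream or TensorRT code, just plain IEEE-754 double arithmetic). It adds the same three numbers in two different groupings and shows that the results are close but not bit-identical. The variable names are only for illustration.

```python
import math

values = [0.1, 0.2, 0.3]

# Same three operands, two different evaluation orders.
left_first = (values[0] + values[1]) + values[2]
right_first = values[0] + (values[1] + values[2])

# The two sums are not bit-identical because each intermediate
# addition is rounded to the nearest representable double.
print(left_first == right_first)                # False
print(math.isclose(left_first, right_first))    # True
```

A GPU kernel that reduces thousands of values in a different order on a different architecture can show the same kind of effect at a larger scale.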

I don’t think this is expected.

One question: what do you mean by “same”? The same bboxes (exactly the same number of BBOXes and the same coordinates), or bit-exact inference output? Can you elaborate on the meaning of “same”?

hi @mchi, thanks for your reply.

By “same” I mean the number of inference results (e.g. the number of people detected by my Yolo model). I am not referring to identical BBOX coordinates or to bit-exact inference output.
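As a side note on how a detection count could change even when the raw outputs are almost identical, here is a small Python sketch. It is only one possible mechanism, not a diagnosis of this pipeline: the scores, the 0.5 threshold, and the `count_detections` helper are all hypothetical and are not taken from DeepStream.

```python
def count_detections(scores, threshold=0.5):
    """Count candidate detections whose confidence is at or above the threshold."""
    return sum(1 for score in scores if score >= threshold)

# Hypothetical confidence scores for the same frame on two platforms.
# The second score differs by only 2e-7 but sits on opposite sides of the threshold.
scores_platform_a = [0.91, 0.5000001, 0.12]
scores_platform_b = [0.91, 0.4999999, 0.12]

count_a = count_detections(scores_platform_a)
count_b = count_detections(scores_platform_b)
print(count_a, count_b)  # 2 1
```

If the counts differ by much more than borderline cases like this would explain, the cause is more likely to be in the input or preprocessing path than in float rounding.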

Does that mean different dGPUs are also not guaranteed to give bit-exact results, because each dGPU has a different compute capability and optimization flow (e.g. RTX2060 vs GTX1060 vs 1080ti)?

Would trtexec produce deterministic values when running a .uff model on the same GPU architecture (e.g. several runs on the same RTX2060 without deleting the generated .plan file)? I am observing differences of about 1e-7 to 1e-15 between my 1st and 3rd runs.
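For a unit test that has to tolerate differences of this size, one option is to compare outputs with a tolerance instead of exact equality. Below is a minimal Python sketch under assumptions: the outputs are already loaded as flat lists of floats, the helper names `max_abs_diff` and `outputs_match` are made up for this post, and the tolerances are placeholders that would need tuning for a real model.

```python
import math

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)

def outputs_match(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """True if every element pair agrees within the given tolerances (assumed values)."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )

# Hypothetical outputs from two runs, differing by roughly 1e-7 and 1e-15.
run_1 = [0.25, 0.731, 0.0042]
run_3 = [0.25 + 1e-7, 0.731, 0.0042 - 1e-15]

print(run_1 == run_3)               # False: not bit-exact
print(outputs_match(run_1, run_3))  # True: equal within tolerance
```

Logging `max_abs_diff(run_1, run_3)` alongside the pass/fail result makes it easier to notice if the drift between runs or platforms grows over time.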