Need help explaining inference results on different GPUs with DS6.3

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version DS 6.3
• JetPack Version (valid for Jetson only) N/a
• TensorRT Version 8.5.3
• NVIDIA GPU Driver Version (valid for GPU only) 535.129.03
• Issue Type (questions, new requirements, bugs) bugs
• How to reproduce the issue? (This is for bugs. Including which sample app is used, the configuration files content, the command line used and other details for reproducing) Run the same Docker container on multiple GPUs (RTX 4090, RTX 4080, RTX 3090) and compare detection results
• Requirement details (This is for new requirement. Including the module name - for which plugin or for which sample application, the function description) Same results across GPUs

We ran a model with FP16 precision on an RTX 4090, an RTX 4080, and an RTX 3090. The detection results differed widely between the GPUs. The model was identical in every case, as were the video, drivers, CUDA version, and software inside the Docker container (compiled on the target machine). The results are as follows:

  • RTX 4080: 1456 detections
  • RTX 3090: 1010 detections

We understand that different GPUs and GPU generations handle reduced precision differently; however, we do not expect that difference to approach a 50% discrepancy in detection count. Could you please explain why this may have happened? This is paramount to our ability to benchmark our models for production readiness.

This is not the first time that DeepStream/TensorRT has appeared to be non-deterministic for us. Any additional information you could provide would be helpful.
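
For reference, the precision and the post-processing thresholds involved are set through the standard nvinfer keys in our pgie config file; the relevant sections look roughly like this (illustrative values, not our exact file, but the same file is used on all three machines):

```
[property]
# 0=FP32, 1=INT8, 2=FP16 -- we run FP16 (2) on every GPU
network-mode=2
# 2 = NMS clustering
cluster-mode=2

[class-attrs-all]
# detection counts are very sensitive to these thresholds for boxes that
# sit near the confidence boundary
pre-cluster-threshold=0.25
nms-iou-threshold=0.45
```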

We will investigate the differences.


Can you share the model and the test pipeline you are using so that we can check the differences?

I can provide the repo that the model was pulled from, along with our create_pipeline() method.
Let me know if I can provide anything else.

Model : https://github.com/ultralytics/yolov5

Pipeline method:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

# get_mem_type(), create_source_bin(), is_aarch64() and bus_call() are helpers
# defined elsewhere in our code base.

def create_pipeline(self):
    mem_type = get_mem_type()
    Gst.init(None)
    pipeline = Gst.Pipeline()

    # Batch the file source(s) into a single stream for inference.
    streammux = Gst.ElementFactory.make('nvstreammux', 'Stream-Muxer')
    streammux.set_property('width', self.STREAMMUX_WIDTH)
    streammux.set_property('height', self.STREAMMUX_HEIGHT)
    streammux.set_property('batch-size', self.BATCH_SIZE)
    streammux.set_property('batched-push-timeout', 4000000)
    pipeline.add(streammux)
    for i in range(self.BATCH_SIZE):
        source_bin = create_source_bin(i, f'file:///{self.VIDEO_PATH}')
        pipeline.add(source_bin)
        sinkpad = streammux.get_request_pad(f'sink_{i}')
        srcpad = source_bin.get_static_pad('src')
        srcpad.link(sinkpad)

    # Primary inference; the YOLOv5 engine is configured via the nvinfer
    # config file referenced here.
    pgie = Gst.ElementFactory.make('nvinfer', 'primary-inference')
    pgie.set_property('config-file-path', self.INFERENCE_MODEL.deepstream_data.pgie_file)
    pipeline.add(pgie)

    # Unbounded queues before and after inference.
    pre_inference_queue = Gst.ElementFactory.make('queue', 'pre-inference-queue')
    pre_inference_queue.set_property('max-size-buffers', 0)
    pre_inference_queue.set_property('max-size-time', 0)
    pipeline.add(pre_inference_queue)
    post_inference_queue = Gst.ElementFactory.make('queue', 'post-inference-queue')
    post_inference_queue.set_property('max-size-buffers', 0)
    post_inference_queue.set_property('max-size-time', 0)
    pipeline.add(post_inference_queue)

    # Conversion, on-screen display and a second conversion to RGBA.
    nvvidconv1 = Gst.ElementFactory.make('nvvideoconvert', 'convertor')
    pipeline.add(nvvidconv1)
    nvosd = Gst.ElementFactory.make('nvdsosd', 'onscreendisplay')
    nvosd.set_property('process-mode', 1)
    pipeline.add(nvosd)
    nvcaps = Gst.ElementFactory.make('capsfilter', 'caps-1')
    nvcaps.set_property('caps', Gst.Caps.from_string('video/x-raw(memory:NVMM),format=RGBA'))
    pipeline.add(nvcaps)
    nvvidconv2 = Gst.ElementFactory.make('nvvideoconvert', 'converter2')
    pipeline.add(nvvidconv2)

    # On dGPU, select the NVMM memory type explicitly.
    if not is_aarch64():
        for p in [streammux, nvvidconv1, nvvidconv2]:
            p.set_property('nvbuf-memory-type', mem_type)
            # p.set_property('gpu-id', self.GPU_ID)

    streammux.link(pre_inference_queue)
    pre_inference_queue.link(pgie)
    pgie.link(post_inference_queue)
    post_inference_queue.link(nvvidconv1)
    nvvidconv1.link(nvosd)
    nvosd.link(nvcaps)
    nvcaps.link(nvvidconv2)

    # Discard the rendered output; we only care about the metadata.
    sink = Gst.ElementFactory.make('fakesink', 'fakesink')
    sink.set_property('sync', 0)
    sink.set_property('qos', 0)
    sink.set_property('enable-last-sample', 0)
    pipeline.add(sink)
    nvvidconv2.link(sink)

    loop = GLib.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect('message', bus_call, loop)

    # Count detections on the buffers flowing into the final converter.
    buffer_probe_pad = nvvidconv2.get_static_pad('sink')
    buffer_probe_pad.add_probe(Gst.PadProbeType.BUFFER, self.detection_probe, 0)
    return pipeline, loop
```
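
For completeness, the detection_probe() attached above just walks the DeepStream batch metadata and counts the objects nvinfer emitted. A trimmed-down sketch of such a probe (not our exact implementation; the self.total_detections counter is illustrative):

```python
import pyds
from gi.repository import Gst

def detection_probe(self, pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Walk every frame in the batch and every object nvinfer attached to it.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # obj_meta.confidence / obj_meta.class_id could be dumped here for a
            # per-detection diff between GPUs; we only keep a running total.
            self.total_detections += 1  # illustrative counter
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```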

Additionally, this is the repo we used to convert our .pt files to .onnx format:

https://github.com/marcoslucianops/DeepStream-Yolo
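
The export itself follows that repo's documented procedure, roughly the commands below (script name and flags quoted from memory; they may differ between repo versions):

```
cd yolov5
cp /path/to/DeepStream-Yolo/utils/export_yoloV5.py .
python3 export_yoloV5.py -w yolov5l.pt --dynamic
```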

Which YOLO model are you using?

YOLOv5L