Need help explaining inference results on different GPUs with DS6.3

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version DS 6.3
• JetPack Version (valid for Jetson only) N/a
• TensorRT Version 8.5.3
• NVIDIA GPU Driver Version (valid for GPU only) 535.129.03
• Issue Type (questions, new requirements, bugs) bugs
• How to reproduce the issue? (This is for bugs. Including which sample app is used, the configuration files content, the command line used and other details for reproducing) Run the same Docker container on multiple GPUs (RTX 4090, RTX 4080, RTX 3090) and compare detection results
• Requirement details (This is for new requirement. Including the module name - for which plugin or for which sample application, the function description) Same results across GPUs

We ran a model with FP16 precision on an RTX 4090, an RTX 4080, and an RTX 3090. The detection results differed widely between the GPUs. The model was identical in every case, as were the video, drivers, CUDA version, and software inside the Docker container (compiled on the target machine). The results are as follows:

  • RTX 4080: 1456 detections
  • RTX 3090: 1010 detections

We understand that different GPUs and GPU generations handle reduced precision differently; however, we do not expect that difference to approach a 50% discrepancy in detection count. Could you please explain why this may have happened? This is paramount to our ability to benchmark our models for production readiness.

This is not the first time that DeepStream/TensorRT has appeared to be non-deterministic for us. Any additional information you could provide would be helpful.
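
For reference, the precision and the post-processing thresholds involved are set through the standard nvinfer keys in our pgie config file; the relevant sections look roughly like this (illustrative values, not our exact file, but the same file is used on all three machines):

```
[property]
# 0=FP32, 1=INT8, 2=FP16 -- we run FP16 (2) on every GPU
network-mode=2
# 2 = NMS clustering
cluster-mode=2

[class-attrs-all]
# detection counts are very sensitive to these thresholds for boxes that
# sit near the confidence boundary
pre-cluster-threshold=0.25
nms-iou-threshold=0.45
```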

We will investigate the differences.


Can you share the model and the test pipeline you are using so that we can check the differences?

I can provide the repo that the model was pulled from, along with our create_pipeline() method.
Let me know if I can provide anything else.

Model : https://github.com/ultralytics/yolov5

Pipeline method:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

# get_mem_type(), create_source_bin(), is_aarch64() and bus_call() are helpers
# defined elsewhere in our code base.

def create_pipeline(self):
    mem_type = get_mem_type()
    Gst.init(None)
    pipeline = Gst.Pipeline()

    # Batch the file source(s) into a single stream for inference.
    streammux = Gst.ElementFactory.make('nvstreammux', 'Stream-Muxer')
    streammux.set_property('width', self.STREAMMUX_WIDTH)
    streammux.set_property('height', self.STREAMMUX_HEIGHT)
    streammux.set_property('batch-size', self.BATCH_SIZE)
    streammux.set_property('batched-push-timeout', 4000000)
    pipeline.add(streammux)
    for i in range(self.BATCH_SIZE):
        source_bin = create_source_bin(i, f'file:///{self.VIDEO_PATH}')
        pipeline.add(source_bin)
        sinkpad = streammux.get_request_pad(f'sink_{i}')
        srcpad = source_bin.get_static_pad('src')
        srcpad.link(sinkpad)

    # Primary inference; the YOLOv5 engine is configured via the nvinfer
    # config file referenced here.
    pgie = Gst.ElementFactory.make('nvinfer', 'primary-inference')
    pgie.set_property('config-file-path', self.INFERENCE_MODEL.deepstream_data.pgie_file)
    pipeline.add(pgie)

    # Unbounded queues before and after inference.
    pre_inference_queue = Gst.ElementFactory.make('queue', 'pre-inference-queue')
    pre_inference_queue.set_property('max-size-buffers', 0)
    pre_inference_queue.set_property('max-size-time', 0)
    pipeline.add(pre_inference_queue)
    post_inference_queue = Gst.ElementFactory.make('queue', 'post-inference-queue')
    post_inference_queue.set_property('max-size-buffers', 0)
    post_inference_queue.set_property('max-size-time', 0)
    pipeline.add(post_inference_queue)

    # Conversion, on-screen display and a second conversion to RGBA.
    nvvidconv1 = Gst.ElementFactory.make('nvvideoconvert', 'convertor')
    pipeline.add(nvvidconv1)
    nvosd = Gst.ElementFactory.make('nvdsosd', 'onscreendisplay')
    nvosd.set_property('process-mode', 1)
    pipeline.add(nvosd)
    nvcaps = Gst.ElementFactory.make('capsfilter', 'caps-1')
    nvcaps.set_property('caps', Gst.Caps.from_string('video/x-raw(memory:NVMM),format=RGBA'))
    pipeline.add(nvcaps)
    nvvidconv2 = Gst.ElementFactory.make('nvvideoconvert', 'converter2')
    pipeline.add(nvvidconv2)

    # On dGPU, select the NVMM memory type explicitly.
    if not is_aarch64():
        for p in [streammux, nvvidconv1, nvvidconv2]:
            p.set_property('nvbuf-memory-type', mem_type)
            # p.set_property('gpu-id', self.GPU_ID)

    streammux.link(pre_inference_queue)
    pre_inference_queue.link(pgie)
    pgie.link(post_inference_queue)
    post_inference_queue.link(nvvidconv1)
    nvvidconv1.link(nvosd)
    nvosd.link(nvcaps)
    nvcaps.link(nvvidconv2)

    # Discard the rendered output; we only care about the metadata.
    sink = Gst.ElementFactory.make('fakesink', 'fakesink')
    sink.set_property('sync', 0)
    sink.set_property('qos', 0)
    sink.set_property('enable-last-sample', 0)
    pipeline.add(sink)
    nvvidconv2.link(sink)

    loop = GLib.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect('message', bus_call, loop)

    # Count detections on the buffers flowing into the final converter.
    buffer_probe_pad = nvvidconv2.get_static_pad('sink')
    buffer_probe_pad.add_probe(Gst.PadProbeType.BUFFER, self.detection_probe, 0)
    return pipeline, loop
```
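
For completeness, the detection_probe() attached above just walks the DeepStream batch metadata and counts the objects nvinfer emitted. A trimmed-down sketch of such a probe (not our exact implementation; the self.total_detections counter is illustrative):

```python
import pyds
from gi.repository import Gst

def detection_probe(self, pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Walk every frame in the batch and every object nvinfer attached to it.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # obj_meta.confidence / obj_meta.class_id could be dumped here for a
            # per-detection diff between GPUs; we only keep a running total.
            self.total_detections += 1  # illustrative counter
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```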

Additionally, this is the repo we used to convert our .pt files to .onnx format:

https://github.com/marcoslucianops/DeepStream-Yolo
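
The export itself follows that repo's documented procedure, roughly the commands below (script name and flags quoted from memory; they may differ between repo versions):

```
cd yolov5
cp /path/to/DeepStream-Yolo/utils/export_yoloV5.py .
python3 export_yoloV5.py -w yolov5l.pt --dynamic
```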

Which YOLO model are you using?

YOLOv5L