Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: DS 6.3
• JetPack Version (valid for Jetson only): N/A
• TensorRT Version: 8.5.3
• NVIDIA GPU Driver Version (valid for GPU only): 535.129.03
**• Issue Type (questions, new requirements, bugs):** bugs
**• How to reproduce the issue? (This is for bugs. Including which sample app is used, the configuration files content, the command line used and other details for reproducing):** Docker container on multiple GPUs; compare detection results (RTX 4090, RTX 4080, RTX 3090)
**• Requirement details (This is for new requirement. Including the module name, for which plugin or for which sample application, the function description):** same results across GPUs
We ran a model using FP16 precision on an RTX 4090, an RTX 4080, and an RTX 3090. The detection results differed widely between the GPUs. The model was identical in every case, as were the video, the drivers, the CUDA version, and the software inside the Docker container (compiled on the target machine). The results are as follows:
RTX 4080: 1456 detections
RTX 3090: 1010 detections
We understand that different GPUs and GPU generations handle reduced precision differently; however, we do not expect that difference to approach a 50% discrepancy in detection count. Could you please explain why this may have happened? This is paramount to our ability to benchmark our models for production readiness.
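For reference, here is a synthetic NumPy sketch (random scores, not our model's actual outputs) of the only mechanism we would expect from FP16 differences: scores that land within rounding error of the confidence threshold can flip between GPUs, but only those borderline scores should flip, not a large fraction of all detections.

```python
import numpy as np

# Hypothetical illustration: confidence scores clustered near the
# detection threshold, perturbed by FP16-scale rounding differences.
rng = np.random.default_rng(0)
threshold = 0.5

# Simulated confidence scores for one GPU.
scores_gpu_a = rng.normal(loc=0.5, scale=0.05, size=2000)
# Per-score perturbation on the order of FP16 resolution near 1.0 (~1e-3),
# standing in for cross-GPU numerical differences.
scores_gpu_b = scores_gpu_a + rng.normal(scale=1e-3, size=scores_gpu_a.shape)

kept_a = scores_gpu_a > threshold
kept_b = scores_gpu_b > threshold
# Only scores within ~1e-3 of the threshold can change outcome.
flips = int((kept_a != kept_b).sum())
print(f"GPU A: {int(kept_a.sum())} detections, "
      f"GPU B: {int(kept_b.sum())} detections, flipped: {flips}")
```

In a sketch like this, the counts differ only by the handful of borderline scores that straddle the threshold, which is nowhere near the 40%+ gap we are seeing.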
This is not the first time that DeepStream/TensorRT has proved to be non-deterministic. Any additional information you could provide would be helpful.