System Configuration:
I have a workstation with two NVIDIA RTX 2080 Ti GPUs, and I am running two instances of the same application, each assigned to a separate GPU using the CUDA_VISIBLE_DEVICES environment variable to ensure proper isolation.
Application Details:
First Instance (GPU 0)
- Decodes video from one RTSP camera using NVDEC.
- Runs face detection & recognition on 640×480 resolution frames using TensorRT.
- GPU utilization is ~3%.
Second Instance (GPU 1)
- Decodes video from 11 RTSP cameras using NVDEC.
- Runs the same face detection & recognition models as the first instance.
- All videos are 4K resolution and processed in concurrent threads.
- 10 GB of the 11 GB of VRAM is in use, but GPU utilization only reaches 15–20%.
Expected vs. Actual Behavior:
Given that the second instance processes significantly more data (11 streams instead of one, at 4K instead of 640×480), I expected GPU utilization to be around 80–90%, but it remains quite low (~15–20%).
Questions:
- What could be causing such low GPU utilization, despite high VRAM usage?
- Are NVDEC workloads inherently low in GPU utilization, or could there be a bottleneck in data transfer or processing?
- Would increasing batch sizes, adjusting NVDEC settings, or improving TensorRT pipeline efficiency help improve GPU utilization?
- Are there specific profiling tools (Nsight Systems, Nsight Compute, etc.) that I should use to diagnose the issue further?
- Any insights on optimizing NVDEC-based workloads for higher GPU utilization would be greatly appreciated. Thanks in advance for your help!
Hi @sms.holding.3,
Potential Causes of Low GPU Utilization:
- Slow or poorly overlapped data transfer between CPU and GPU can starve the GPU. With 11 concurrent 4K streams, copying decoded frames back to the host for preprocessing and then to the device again adds latency and leaves the compute units idle between inferences.
- Inefficient processing on the GPU side can also keep utilization low, for example running inference one frame at a time from many independent threads, or serializing all work through a single CUDA stream so that little of it overlaps.
- If your batch size is too small, each inference call finishes quickly and the GPU spends most of its time waiting for the next one. Experimenting with different batch sizes may reveal a setting that keeps the GPU busier.
- Suboptimal NVDEC settings for your input streams (for example decoding full 4K surfaces when the models only consume a much smaller input) can shift the cost into decode and copies rather than compute.
- Without profiling it is hard to tell which of these applies to your pipeline; a quick first check with NVML is sketched right after this list.
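One thing worth checking before anything else: the GPU utilization figure reported by nvidia-smi reflects SM (compute) activity only, while NVDEC is a separate hardware engine with its own, separately reported utilization counter, so heavy decoding does not show up in that number. Below is a minimal sketch, assuming the pynvml bindings are installed (e.g. `pip install nvidia-ml-py`), that samples both counters side by side; `nvidia-smi dmon -s u` gives a similar per-engine view from the command line.

```python
# Minimal sketch: sample SM ("GPU") utilization and NVDEC (decoder) utilization
# side by side for every visible GPU. Assumes the pynvml bindings are installed.
import time
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    for _ in range(10):                                   # sample for ~10 seconds
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)            # SM / memory
            dec_util, _ = pynvml.nvmlDeviceGetDecoderUtilization(handle)   # NVDEC engine
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            print(f"GPU {i}: SM {util.gpu}%  NVDEC {dec_util}%  "
                  f"VRAM {mem.used / 1e9:.1f}/{mem.total / 1e9:.1f} GB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

If the NVDEC column is high while the SM column stays at 15–20%, the decoder is doing its job and the low figure mostly reflects how little compute the inference side is issuing.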
Optimization Strategies:
- Test larger batch sizes so that each TensorRT invocation does more work per launch; a minimal batching sketch follows after this list.
- Fine-tune decoder settings, for example scaling the decoded 4K frames down toward the resolution the detector actually consumes, so you are not transferring and preprocessing far more pixels than the models need.
- Use profiling tools such as NVIDIA Nsight Systems (timeline and bottleneck analysis) and Nsight Compute (kernel-level analysis) to see where the time actually goes; the NVTX sketch below makes the Nsight Systems timeline easier to read.
- Leverage multi-threading, asynchronous execution, and CUDA streams so that decoding, host/device copies, and inference overlap instead of running back to back.
- Improve transfer efficiency by minimizing unnecessary host/device copies and by using pinned (page-locked) host memory for the copies that remain, as shown in the same batching sketch.
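As a concrete starting point for the batching, pinned-memory, and CUDA-stream suggestions above, here is a minimal sketch (not a drop-in implementation) using the TensorRT Python API together with PyCUDA. It assumes a TensorRT 7/8-style API, an engine built with a dynamic batch dimension, and placeholder names and shapes (`face_det.engine`, bindings 0 and 1, a 3×480×640 input); adjust these to match your actual engine.

```python
# Minimal sketch: run the detection engine on a batch of frames using pinned host
# buffers and an asynchronous CUDA stream. Names, shapes, and the engine file are
# placeholders; adapt them to your own engine and preprocessing.
import numpy as np
import tensorrt as trt
import pycuda.autoinit          # creates a CUDA context on the visible device
import pycuda.driver as cuda

BATCH = 8                        # experiment with this value
INPUT_SHAPE = (BATCH, 3, 480, 640)

logger = trt.Logger(trt.Logger.WARNING)
with open("face_det.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
context.set_binding_shape(0, INPUT_SHAPE)            # dynamic batch dimension

# Pinned (page-locked) host buffers allow truly asynchronous H2D/D2H copies.
h_input = cuda.pagelocked_empty(trt.volume(INPUT_SHAPE), dtype=np.float32)
out_shape = tuple(context.get_binding_shape(1))
h_output = cuda.pagelocked_empty(trt.volume(out_shape), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()

def infer(batch_frames: np.ndarray) -> np.ndarray:
    """Copy a (BATCH, 3, 480, 640) float32 batch in, run inference, copy results out."""
    np.copyto(h_input, batch_frames.ravel())
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()
    return h_output.reshape(out_shape)
```

With one such execution context and stream per worker (a single engine can create several contexts), the copies and inference for one batch can overlap with decoding and preprocessing of the next, instead of everything serializing behind a single synchronous call.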
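For the profiling step, NVTX ranges make a Nsight Systems timeline much easier to interpret: each pipeline stage shows up as a named span next to the NVDEC, memcpy, and kernel rows, so you can see directly whether the GPU is waiting on decode, host-side preprocessing, or transfers. A small sketch, assuming the `nvtx` Python package (`pip install nvtx`) and with trivial stand-ins for your own stage functions:

```python
# Sketch: wrap each pipeline stage in an NVTX range so it appears as a named span
# on the Nsight Systems timeline. The stage functions below are trivial stand-ins
# for your actual decode / preprocess / inference code.
import time
import nvtx

def decode_frame(packet):       # stand-in for the NVDEC decode call
    time.sleep(0.001)
    return packet

def preprocess(frame):          # stand-in for resize / normalize
    time.sleep(0.001)
    return frame

def run_inference(tensor):      # stand-in for the TensorRT engine
    time.sleep(0.002)
    return tensor

def process_one_frame(camera_id, packet):
    with nvtx.annotate(f"decode cam{camera_id}", color="blue"):
        frame = decode_frame(packet)
    with nvtx.annotate(f"preprocess cam{camera_id}", color="green"):
        tensor = preprocess(frame)
    with nvtx.annotate(f"inference cam{camera_id}", color="red"):
        return run_inference(tensor)

if __name__ == "__main__":
    for i in range(100):
        process_one_frame(camera_id=0, packet=i)
```

Run the application under Nsight Systems (for example `nsys profile -o report python your_app.py`, where `your_app.py` stands in for your entry point) and open the report in the GUI to see these ranges alongside the GPU activity rows.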
Please let me know if this helps.
Thanks