Hi, we have moved our decoding product from T1000 graphics cards to A1000 graphics cards for decoding high-density H.264/H.265 video. The new A1000 has two decoder engines but the same 8 GB of memory. We have observed a strange issue: when decoding 27 streams of 1080p@30 FPS (9 per output across 3 outputs), GPU utilization is 36% and decode is 54%, see attached image. After about 10 seconds of everything being OK, the decode engine ramps to 96% and the GPU to 100%… We saw similar behaviour on the T1000 and discovered that selecting ‘Performance’ over ‘Quality’ in the 3D settings fixed the issue; this is no longer the case. We have also observed a second Windows image where the problem does not exist (same hardware). Could an app on the second image be changing some performance settings? Any tests or tools you could point us at would be appreciated. Driver version 31.0.15.5222, April 2024.
Hi @BBVDev, could you run the following command in a terminal, in parallel with your application:
nvidia-smi dmon -s puct
Generally, it gives a better report of GPU utilization than the Task Manager.
Please consider using NVIDIA Nsight Systems to profile your application and find possible bottlenecks.
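If it is useful to log the same counters from inside the application, a minimal NVML polling sketch could look roughly like this (it assumes GPU index 0 and a one-second poll interval; adapt the index and loop for a multi-GPU system):
```
// Minimal NVML polling sketch: prints GPU/memory/decoder utilization plus the
// graphics and memory clocks once per second. Assumes GPU index 0.
#include <nvml.h>
#include <cstdio>
#include <chrono>
#include <thread>

int main() {
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex_v2(0, &dev) != NVML_SUCCESS) { nvmlShutdown(); return 1; }

    for (int i = 0; i < 60; ++i) {                  // ~1 minute of samples
        nvmlUtilization_t util{};                   // .gpu and .memory, in percent
        unsigned int dec = 0, samplingUs = 0;
        unsigned int gclk = 0, mclk = 0;

        nvmlDeviceGetUtilizationRates(dev, &util);
        nvmlDeviceGetDecoderUtilization(dev, &dec, &samplingUs);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_GRAPHICS, &gclk);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_MEM, &mclk);

        std::printf("gpu %3u%%  mem %3u%%  dec %3u%%  gclk %4u MHz  mclk %4u MHz\n",
                    util.gpu, util.memory, dec, gclk, mclk);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

    nvmlShutdown();
    return 0;
}
```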
Best regards,
Diego
Hi Diego,
Thank you for your support.
Attached is the output from `nvidia-smi dmon -s puct`.
We have 3 cards in the system; two show the issue (GPU 0 and GPU 2), while GPU 1 is working as expected.
What is pclk?
It seems to be higher on the working GPU 1.
We already set NVML_CLOCK_SM and NVML_CLOCK_MEM.
Can we set ‘pclk’ programmatically with one of the following clock types?
```
typedef enum nvmlClockType_enum
{
    NVML_CLOCK_GRAPHICS = 0,  //!< Graphics clock domain
    NVML_CLOCK_SM       = 1,  //!< SM clock domain
    NVML_CLOCK_MEM      = 2,  //!< Memory clock domain
    NVML_CLOCK_VIDEO    = 3,  //!< Video encoder/decoder clock domain

    // Keep this last
    NVML_CLOCK_COUNT          //!< Count of clock types
} nvmlClockType_t;
```
This is only seen on the A1000, NOT the T1000…
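As a way to compare the cards, a small sketch along these lines (assuming an existing, valid nvmlDevice_t handle `dev`) could dump the current and maximum clock for each NVML clock domain, which should show which domain the dmon `pclk` column is tracking:
```
// Sketch: dump current vs. maximum clocks for each NVML clock domain on one GPU.
// Assumes NVML is already initialised and `dev` is a valid nvmlDevice_t handle.
#include <nvml.h>
#include <cstdio>

void dumpClocks(nvmlDevice_t dev) {
    static const struct { nvmlClockType_t type; const char *name; } domains[] = {
        { NVML_CLOCK_GRAPHICS, "graphics"      },
        { NVML_CLOCK_SM,       "SM"            },
        { NVML_CLOCK_MEM,      "memory"        },
        { NVML_CLOCK_VIDEO,    "video enc/dec" },
    };

    for (const auto &d : domains) {
        unsigned int cur = 0, max = 0;
        if (nvmlDeviceGetClockInfo(dev, d.type, &cur) == NVML_SUCCESS &&
            nvmlDeviceGetMaxClockInfo(dev, d.type, &max) == NVML_SUCCESS) {
            std::printf("%-14s %4u / %4u MHz\n", d.name, cur, max);
        }
    }
}
```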
Thanks.
Update,
Using
```
nvidia-smi -i 0 -lgc 1852,1852
nvidia-smi -i 1 -lgc 1852,1852
nvidia-smi -i 2 -lgc 1852,1852
```
fixed the issue.
Q: How do we do this programmatically? Is there an equivalent NVML call? Query the max and set it?
Thanks.
OK, as we are pushed for release, I’ve tested the following; however, I’m unsure if this is the best route:
```
result = nvmlDeviceGetMaxClockInfo(nvmlDeviceId, NVML_CLOCK_GRAPHICS, &maxGPUclock);
if (NVML_SUCCESS == result)
    result = nvmlDeviceSetGpuLockedClocks(nvmlDeviceId, maxGPUclock, maxGPUclock);
```
However, the clocks now remain at this max pair 100% of the time with our application, even for low-load use cases where passing a min-to-max range to the API call would make more sense. That range does not work, though; both values need to be set to the max for decoding to behave as required.
Let me know any pointers, cheers.
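One possibility we could explore (untested here) is to hold the lock only while the heavy decode session is active and release it afterwards with nvmlDeviceResetGpuLockedClocks; a minimal sketch, assuming a valid nvmlDevice_t handle `dev`:
```
// Sketch: lock the graphics clock to its maximum only for the heavy decode
// session, then release the lock so the driver manages clocks again.
// Assumes NVML is initialised and `dev` is a valid nvmlDevice_t handle;
// the set/reset calls may require elevated privileges.
#include <nvml.h>

nvmlReturn_t lockClocksForDecode(nvmlDevice_t dev) {
    unsigned int maxGpuClock = 0;
    nvmlReturn_t result = nvmlDeviceGetMaxClockInfo(dev, NVML_CLOCK_GRAPHICS, &maxGpuClock);
    if (result != NVML_SUCCESS) return result;
    // Pin min == max, as found above; a wider min/max range did not hold the clock up.
    return nvmlDeviceSetGpuLockedClocks(dev, maxGpuClock, maxGpuClock);
}

nvmlReturn_t unlockClocksAfterDecode(nvmlDevice_t dev) {
    // Return clock management to the driver once the heavy decode load is gone.
    return nvmlDeviceResetGpuLockedClocks(dev);
}
```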
Hi @BBVDev,
Could you clarify exactly what the issue is? If the GPUs are running at 100% utilization at a lower clock rate, does this actually affect your application’s decoding throughput? If not, I would recommend not changing the clock frequencies; running at the lower clock is preferable, and holding the GPU at its maximum clock frequency for long periods is not recommended.
If you must change the clocks for some reason, please consider using nvmlDeviceSetApplicationsClocks.
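In case it helps, a rough sketch of selecting a supported memory/graphics clock pair for nvmlDeviceSetApplicationsClocks could look like this (only an illustration, assuming `dev` is a valid nvmlDevice_t handle and that the fixed-size arrays are large enough for the supported-clock lists):
```
// Sketch: pick the highest supported memory/graphics clock pair and apply it with
// nvmlDeviceSetApplicationsClocks (memory clock first, then graphics clock).
// Assumes `dev` is a valid nvmlDevice_t handle; setting clocks may need admin rights.
#include <nvml.h>
#include <algorithm>

nvmlReturn_t setMaxApplicationClocks(nvmlDevice_t dev)
{
    // Supported memory clocks for this board.
    unsigned int memClocks[128];
    unsigned int memCount = 128;
    nvmlReturn_t r = nvmlDeviceGetSupportedMemoryClocks(dev, &memCount, memClocks);
    if (r != NVML_SUCCESS || memCount == 0) return r;
    unsigned int maxMem = *std::max_element(memClocks, memClocks + memCount);

    // Graphics clocks that are valid in combination with that memory clock.
    unsigned int gfxClocks[256];
    unsigned int gfxCount = 256;
    r = nvmlDeviceGetSupportedGraphicsClocks(dev, maxMem, &gfxCount, gfxClocks);
    if (r != NVML_SUCCESS || gfxCount == 0) return r;
    unsigned int maxGfx = *std::max_element(gfxClocks, gfxClocks + gfxCount);

    return nvmlDeviceSetApplicationsClocks(dev, maxMem, maxGfx);
}
```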
Best regards,
Diego
Hi,
We already call nvmlDeviceSetApplicationsClocks and set both NVML_CLOCK_SM and NVML_CLOCK_MEM to their maximums; this was sufficient for the previous-generation T1000 when decoding.
The issue re-appeared with the A1000 graphics card, and setting these clocks was not enough for the decode performance to match the previous-generation T1000, which has only a single decode engine.
Through this support thread we managed to achieve the T1000 performance on the A1000 card using the additional calls nvmlDeviceGetMaxClockInfo and nvmlDeviceSetGpuLockedClocks, as detailed above.
The problem is then solved.
My question is whether you have any recommendations or advice on whether these two additional API calls are the correct solution?
Thank you in advance.