Question about GPU Utilization Fluctuations

Hello,

I’ve encountered a puzzling issue while running a YOLOv5 training task on my Jetson Xavier NX. Using the command cat /sys/devices/gpu.0/load, I’ve observed significant fluctuations in GPU utilization: usage frequently drops from nearly 100% to below 10%, which is perplexing. Notably, when I run the exact same training task on a GeForce RTX 3060, GPU utilization stays consistently around 90% without such drastic swings.

Below, in Figure 1, are the results of my sampling on the Xavier NX, where GPU utilization was recorded every 0.5 seconds over a total duration of 30 seconds. Figure 2 shows the GPU utilization reported by ‘jtop’.


Figure 1: GPU utilization on the Xavier NX, sampled every 0.5 s over 30 s

Figure 2: GPU utilization as reported by jtop
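
For reference, the samples in Figure 1 were collected roughly like this (a small sketch; on Jetson, the load node reports values from 0 to 1000, i.e. tenths of a percent, so the reading is divided by 10):

# Rough sketch of the sampling above: read the GPU load node every 0.5 s for 30 s.
import time

LOAD_NODE = "/sys/devices/gpu.0/load"
INTERVAL_S = 0.5
DURATION_S = 30

for _ in range(int(DURATION_S / INTERVAL_S)):
    with open(LOAD_NODE) as f:
        load_pct = int(f.read().strip()) / 10.0  # 0-1000 -> percent
    print(f"GPU load: {load_pct:.1f}%")
    time.sleep(INTERVAL_S)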

I’m seeking insights into why these fluctuations occur on the Jetson Xavier NX, and whether there is a way to maintain a consistent GPU utilization of 90% or higher. Given that the YOLOv5 training task uses a batch size of 8, the workload should already be substantial. Resolving this is crucial for my project.

Any guidance or suggestions would be highly appreciated.

Thank you!

Hi,

Have you maximized the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Also, latency is usually crucial for inference, but not for training.
Could you share how the training performance impacts your project?
Do you need this for runtime training?

Thanks.

Thanks for your reply.

This is part of our project, which involves some resource management. In fact, I am not concerned about task latency. I need the Jetson to maintain stable, high GPU utilization when it is assigned a computationally intensive task, rather than fluctuating as shown above.

Additionally, is it possible that the fluctuation is due to the slow read speed of the SD card? If data is not read fast enough, the GPU may sit idle waiting for training data. Would using an SSD resolve this issue?
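
One way I could check this is to time how long each iteration waits on the DataLoader compared with the GPU step itself, roughly as in the sketch below (train_loader, model, and optimizer stand in for my actual YOLOv5 objects, and the loss call is only a placeholder):

# Sketch: measure time spent waiting for data vs. time spent in the GPU step.
import time
import torch

def profile_epoch(train_loader, model, optimizer, device="cuda"):
    data_time = 0.0   # seconds spent waiting on the DataLoader
    step_time = 0.0   # seconds spent in forward/backward/optimizer step
    end = time.time()
    for images, targets in train_loader:
        start = time.time()
        data_time += start - end              # how long the next batch took to arrive
        images = images.to(device, non_blocking=True)
        loss = model(images, targets)         # placeholder for the real loss computation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()              # wait for the GPU so the timing is accurate
        step_time += time.time() - start
        end = time.time()
    print(f"data wait: {data_time:.1f} s, GPU step: {step_time:.1f} s")

If data_time dominates, the storage or CPU-side loading is the bottleneck rather than the GPU itself.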

Thanks again.

Hi,

You can give an SSD a try.

It looks like the GPU sometimes has to wait for data before it can continue processing.

In general, you can improve this by feeding more data to the GPU so its queue doesn’t go empty.
Alternatively, you can submit jobs to the same CUDA stream to force the GPU to execute them in sequence.
Although this might slow down overall performance, it ensures the GPU always has something to process.
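
For example, if you are loading data with a standard PyTorch DataLoader, settings along these lines usually help keep batches queued up (just an illustrative sketch; the worker and prefetch values are examples to tune for the Xavier NX, and train_dataset stands in for your existing dataset):

# Illustrative DataLoader settings to keep the GPU fed; values are examples only.
from torch.utils.data import DataLoader

train_loader = DataLoader(
    train_dataset,            # stand-in for your existing YOLOv5 dataset object
    batch_size=8,
    shuffle=True,
    num_workers=4,            # more CPU workers load/augment images in parallel
    pin_memory=True,          # page-locked memory speeds up host-to-device copies
    prefetch_factor=2,        # each worker keeps 2 batches prepared ahead of time
    persistent_workers=True,  # keep workers alive between epochs
)

With more workers and prefetching, the GPU is less likely to go idle between batches, although the CPU cores on the Xavier NX may then become the limit.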

Thanks.
