Hi,
I captured an Nsight trace of our project’s game build. It’s a UE5 project, and I captured the trace on an RTX 4080.
There’s a big gap between command list executions.
During the gap, GPU Active and all of the SM Warp Occupancy metrics are at 0. The GPU is not completely idle, though: PCIe bandwidth utilization is still at 7.7%.
My GPU memory usage was 12GB out of 16GB, so I assume that there’s no significant memory swapping with the system.
I’m not sure what caused this stall. Any ideas for continuing the investigation would be very helpful. Thank you!
I found the reason by capturing a frame with Nsight Systems.
Basically, the GPU was not being fed work fast enough. The submission thread was waiting for jobs because of a bottleneck on the render thread, and it also processed too many resource barriers before submitting each new command list.
I’ll focus on optimizing the render thread and clarifying our use of resource barriers.
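For anyone who runs into the same thing, here is a rough sketch of the barrier side of that idea: queue state transitions, merge repeated transitions on the same resource, and issue what is left as one batch per flush. This is not UE5 or D3D12 code. `BarrierBatcher`, `Transition`, and `ResourceState` are hypothetical names made up for illustration. If the project runs on D3D12, the flushed batch would map onto a single `ID3D12GraphicsCommandList::ResourceBarrier` call, which accepts an array of barriers. Merging is only valid when no GPU work recorded between the two transitions needs the intermediate state, so treat it as an assumption to verify per pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; these are not D3D12 or UE5 types.
enum class ResourceState : std::uint8_t { Common, RenderTarget, ShaderResource, CopyDest };

struct Transition {
    std::uint32_t resourceId;
    ResourceState before;
    ResourceState after;
};

// Collects transitions and hands them back as one batch per flush.
class BarrierBatcher {
public:
    // Queue a transition instead of issuing it immediately.
    void Queue(std::uint32_t resourceId, ResourceState before, ResourceState after) {
        auto it = indexOf_.find(resourceId);
        if (it != indexOf_.end()) {
            Transition& pending = pending_[it->second];
            // The new transition must start where the pending one ends.
            assert(pending.after == before);
            // Merge: keep the original 'before', take the newest 'after'.
            pending.after = after;
            return;
        }
        indexOf_.emplace(resourceId, pending_.size());
        pending_.push_back(Transition{resourceId, before, after});
    }

    // Return the batch to issue in a single call, dropping transitions
    // that merged into a no-op (before == after). Queue order is kept.
    std::vector<Transition> Flush() {
        std::vector<Transition> batch;
        for (const Transition& t : pending_) {
            if (t.before != t.after) {
                batch.push_back(t);
            }
        }
        pending_.clear();
        indexOf_.clear();
        return batch;
    }

private:
    std::vector<Transition> pending_;
    std::unordered_map<std::uint32_t, std::size_t> indexOf_;
};
```

The caller would invoke `Flush()` once right before recording the work that depends on the new states, then pass the returned vector to the graphics API in one call instead of issuing each transition separately.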
Thank you @guanning79 for updating here!
And welcome to the NVIDIA developer forums.
Sorry you didn’t receive an earlier answer.
Best of luck with your game!