GPU Trace reports unexpected CTA, warp and thread launch metrics for a work graph

Hello,

I am spawning a broadcast node with a thread group size of 8x8 in a work graph (the node does not spawn any other nodes). Inspecting the dispatch graph in GPU Trace, I see unexpected values for the number of CTAs launched (40,516), warps spawned (72,937) and threads (2,082,822), which don't make sense for a 1920x1080 image. I have a compute shader implementation of the same shader, and the counts GPU Trace reports for it are as expected.
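
For reference, the counts I would expect can be worked out with a small calculation. This is only a sketch under my own assumptions: one thread per pixel, a warp size of 32, and a dispatch grid obtained by rounding the image size up to a multiple of the 8x8 group. The function name `expected_launch_counts` is made up for illustration and is not part of any Nsight or D3D12 API.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return (a + b - 1) // b


def expected_launch_counts(width, height, group_x=8, group_y=8, warp_size=32):
    """Return (CTAs, warps, threads) for a one-thread-per-pixel dispatch.

    Assumes every thread group is launched in full, even when it only
    partially overlaps the image, and that a warp holds `warp_size` threads.
    """
    ctas = ceil_div(width, group_x) * ceil_div(height, group_y)
    threads_per_cta = group_x * group_y
    warps_per_cta = ceil_div(threads_per_cta, warp_size)
    return ctas, ctas * warps_per_cta, ctas * threads_per_cta


ctas, warps, threads = expected_launch_counts(1920, 1080)
print(ctas, warps, threads)
```

Under those assumptions the result is 32,400 CTAs, 64,800 warps and 2,073,600 threads, so the values GPU Trace shows for the work graph are higher by 8,116 CTAs, 8,137 warps and 9,222 threads.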

Do work graphs follow different CTA launch rules from compute shaders, or am I misinterpreting the GPU Trace numbers?

Using Nsight Graphics 2024.2 on a 3080 laptop GPU, driver version 32.0.15.6094.

Thanks,

Kostas

Hi Kostas,

Could you please provide a simple example that would allow us to reproduce the issue? This will help us investigate and resolve the problem more efficiently.

Thanks
An