What is the meaning of the word kernel in the memory workload analysis

I mean the left-most "Kernel" box in the image — what does it mean? Can I see it as the register file?

This chart is documented here: 2. Kernel Profiling Guide — Nsight Compute 12.4 documentation
"Kernel" does not represent the register file, but rather the kernel executing on the GPU, which issues the instructions.


Thank you! I really need to read the manual more carefully.

I’m sorry but I still have a question.

The red line from the L2 cache to shared memory: I know it's a new feature in Ampere, which enables cuda::memcpy_async. I wonder whether the data goes through the L1 cache.

As I understand it, the L1 cache and Shared MEM are physically the same hardware, just partitioned differently, right?
So does the data go directly from L2 to SMEM, or does it pass through L1?

@felix_dt could you please take some time to answer my question, thank you!


As I understand it, the L1 cache and Shared MEM are physically the same hardware, just partitioned differently, right?

That is correct, they are physically the same unit, partitioned into the two. However, if you load from global memory into shared memory, the data will still be implicitly cached in L1.

So does the data go directly from L2 to SMEM, or does it pass through L1?

It still passes through L1, but it is no longer staged into registers before reaching shared memory. This is implied by the line passing through L1 rather than around it. This blog post provides more background: https://developer.nvidia.com/blog/controlling-data-movement-to-boost-performance-on-ampere-architecture/
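To make that concrete, here is a minimal sketch of the Ampere async-copy path using the cooperative groups `memcpy_async` API. The kernel name and tile layout are illustrative assumptions, not from the thread; the point is that the copy from global memory into shared memory is issued without staging the data through registers.

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>

namespace cg = cooperative_groups;

// Hypothetical kernel: each block copies one tile of blockDim.x floats
// from global memory into shared memory via the async-copy path.
__global__ void scale_tile(const float* __restrict__ in,
                           float* __restrict__ out) {
    extern __shared__ float smem[];
    cg::thread_block block = cg::this_thread_block();

    const size_t tile_offset = static_cast<size_t>(blockIdx.x) * blockDim.x;

    // On Ampere and newer, this copy moves data L2 -> L1 -> shared memory
    // directly, without a round trip through each thread's registers.
    cg::memcpy_async(block, smem, in + tile_offset,
                     sizeof(float) * blockDim.x);
    cg::wait(block);  // block until the asynchronous copy has completed

    out[tile_offset + threadIdx.x] = smem[threadIdx.x] * 2.0f;
}

// Example launch (shared memory sized to one tile):
//   scale_tile<<<grid, block, block.x * sizeof(float)>>>(d_in, d_out);
```

On pre-Ampere hardware the same API still works, but the copy is emulated with ordinary register-staged loads and stores, so the red L2-to-SMEM line in the chart would not appear.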