Nsight Compute Kernel mapping information

m_ali102 · October 22, 2022, 1:42am

Apologies, I am new to GPU profiling.

I am using Nsight Compute to profile matrix multiplications on an Ampere A100 GPU.

I want to know if it’s possible to get the kernel mapping inside the GPU memory hierarchy. Specifically, I want to get the actual nested loops of the GEMM kernel running on the GPU hardware, showing how the data is mapped to the different memory levels.

Is that possible using Nsight Compute? If not, is there any tool that can do that?

jmarusarz · October 24, 2022, 8:06pm

Can you clarify what you mean by mapping of data on different memory levels? A piece of data can move through the memory levels; for example, it could sometimes reside in L1, then be evicted to L2, and finally be evicted all the way to device memory.

I’m not sure I understand what you mean by “the kernel mapping inside the GPU memory hierarchy”.

m_ali102 · October 26, 2022, 2:42am

Thanks a lot for your response! Sorry for the late response.

I think my question wasn’t clear. Here is what I was asking:
I would like to get the tiling details in L2, shared memory (SMEM), and the register file (RF). I would also like the loop ordering, which determines which matrix/dimension (m, n, k) of a GEMM stays resident longer in L2, SMEM, and RF, and which dimensions are spatially unrolled across the memory hierarchy.

Can Nsight Compute provide such analysis?
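
For readers unfamiliar with the terms, here is a minimal sketch of the kind of tiled loop nest the question refers to. It is plain Python, not GPU code, and everything specific in it is an assumption made for illustration: the function name `tiled_gemm`, the tile sizes, and the comments assigning loops to thread blocks, SMEM, and RF. It does not describe what cuBLAS or any real A100 kernel does.

```python
# Illustrative only: tile sizes and level labels are assumptions,
# not the mapping used by any actual GEMM kernel on an A100.

def tiled_gemm(A, B, M, N, K, block=(4, 4, 2), thread=(2, 2)):
    """Compute C = A @ B with a two-level tiling shaped like a GPU GEMM."""
    BM, BN, BK = block   # tile handled by one thread block (staged in "SMEM")
    TM, TN = thread      # sub-tile handled by one thread (accumulated in "RF")
    C = [[0.0] * N for _ in range(M)]

    # m and n tiles: on a GPU these would be spread spatially over thread blocks.
    for m0 in range(0, M, BM):
        for n0 in range(0, N, BN):
            # k is a temporal loop: each step stages one A tile and one B tile.
            for k0 in range(0, K, BK):
                a_tile = [[A[m][k] for k in range(k0, min(k0 + BK, K))]
                          for m in range(m0, min(m0 + BM, M))]
                b_tile = [[B[k][n] for n in range(n0, min(n0 + BN, N))]
                          for k in range(k0, min(k0 + BK, K))]
                rows, cols, depth = len(a_tile), len(b_tile[0]), len(b_tile)

                # Per-thread sub-tiles: spatially spread over threads on a GPU.
                for tm in range(0, rows, TM):
                    for tn in range(0, cols, TN):
                        for kk in range(depth):
                            for i in range(tm, min(tm + TM, rows)):
                                for j in range(tn, min(tn + TN, cols)):
                                    C[m0 + i][n0 + j] += a_tile[i][kk] * b_tile[kk][j]
    return C
```

In this sketch the m and n dimensions are the ones that would be "spatially unrolled", while k is iterated in time; a real kernel may choose a different ordering, and that choice is exactly what the question asks a tool to report.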

jmarusarz · October 27, 2022, 8:29pm

Nsight Compute doesn’t have information about where specific pieces of data are mapped/tiled or how they are laid out in memory. It only observes the effects of the mapping on performance. There are some additional details, such as bank conflicts for shared memory, and I recommend this GTC talk to learn about that in detail: How to Understand and Optimize Shared Memory Accesses using Nsight Compute | NVIDIA On-Demand
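
To make the bank-conflict point concrete, here is a small Python sketch of a simplified shared memory model: 32 banks, each 4 bytes wide, with one 32-thread warp issuing 4-byte reads. The helper names `conflict_degree` and `column_read` are hypothetical, and the model ignores wider accesses and other hardware details, so treat it as a rough way to reason about an access pattern rather than a replacement for the profiler's measurements.

```python
NUM_BANKS = 32    # assumed number of shared memory banks
BANK_WIDTH = 4    # assumed bank width in bytes

def conflict_degree(byte_addresses):
    """Largest number of distinct 4-byte words that fall in the same bank.

    Threads reading the same word are served by a broadcast, so only
    distinct words in one bank count toward the conflict degree.
    """
    words_per_bank = {}
    for addr in byte_addresses:
        word = addr // BANK_WIDTH
        words_per_bank.setdefault(word % NUM_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())

def column_read(row_stride_floats, column=0, warp_size=32):
    """Byte addresses when thread t reads tile[t][column] of a float tile."""
    return [(t * row_stride_floats + column) * BANK_WIDTH for t in range(warp_size)]

print(conflict_degree(column_read(32)))  # unpadded 32-float rows
print(conflict_degree(column_read(33)))  # rows padded to 33 floats
```

Under this model, reading a column of a tile with 32-float rows puts every thread in the same bank, while padding each row to 33 floats spreads the threads across all banks. This is the kind of effect Nsight Compute reports, even though it cannot show the layout itself.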

m_ali102 · October 28, 2022, 12:33am

Thanks for the answer and suggestions!

Is there any other tool that can give that information?

jmarusarz · October 28, 2022, 3:06pm

I don’t know of a tool that can give that information.
