Problems with lts__t_requests_srcunit_tex_aperture_peer

2944419175 · September 15, 2022, 1:02pm

Hi,

I’m testing P2P memory access and want to know how many sectors are transferred when the local GPU’s data cache misses. So I ran a kernel that reads peer GPU memory and collected the lts__t_requests_srcunit_tex_aperture_peer metric, but I got a value of zero.

Did I misunderstand something? And how can I correctly measure peer memory access with ncu?

Thanks!

jmarusarz · September 19, 2022, 8:36pm

The metric you’re looking at should give some information if there are peer reads. A request can consist of multiple sector reads, so in our memory calculations we use the metric “lts__t_sectors_srcunit_tex_aperture_peer_lookup_miss.sum” for peer accesses. You could try collecting that. Please share a screenshot of the L2 Cache table from the Memory Workload Analysis section. That will help us verify whether peer accesses are actually occurring.
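To get a rough expected value to compare against these metrics, you can estimate how many 32-byte sectors a single warp-level request touches from its addresses. The following is a minimal Python sketch. It assumes 32-byte sectors, and `sectors_touched` is a hypothetical helper, not part of Nsight Compute:

```python
SECTOR_BYTES = 32  # assumed L1/L2 sector size, as on recent NVIDIA GPUs


def sectors_touched(addresses, access_bytes):
    """Count the distinct 32-byte sectors covered by one request.

    addresses: byte address accessed by each active thread (hypothetical input)
    access_bytes: size of each thread's access, e.g. 4 for a float
    """
    sectors = set()
    for addr in addresses:
        first = addr // SECTOR_BYTES
        last = (addr + access_bytes - 1) // SECTOR_BYTES
        sectors.update(range(first, last + 1))
    return len(sectors)


# One warp (32 threads) reading 4-byte floats.
aligned = [i * 4 for i in range(32)]          # base address 0
misaligned = [4 + i * 4 for i in range(32)]   # base shifted by 4 bytes
strided = [i * 32 for i in range(32)]         # one float per 32-byte sector

print(sectors_touched(aligned, 4))     # 4
print(sectors_touched(misaligned, 4))  # 5
print(sectors_touched(strided, 4))     # 32
```

A fully coalesced, aligned request covers 4 sectors. Shifting the base by 4 bytes makes the same request straddle 5 sectors, and a 32-byte stride turns one request into 32 sectors. This is why request counts and sector counts can differ so much.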

2944419175 · September 21, 2022, 12:00pm

L2 Cache table:

I tried the lts__t_sectors_srcunit_tex_aperture_peer_lookup_miss.sum metric and also got zero.

The NVLink received and transmitted bytes also show that data transfer actually happened.

Moreover, if I collect l1tex__m_xbar2l1tex_read_sectors, I get a value that matches the NVLink user received data.
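To check that kind of match numerically, multiply the sector count by the sector size and compare the result with the NVLink received user bytes. The following is a minimal Python sketch. It assumes 32-byte sectors and a kernel that reads only peer memory. l1tex__m_xbar2l1tex_read_sectors also counts sectors served from the local L2, so it would overcount for a kernel that mixes local and peer reads. The function names and the 5% tolerance are hypothetical choices:

```python
SECTOR_BYTES = 32  # assumed sector size in bytes


def sectors_to_bytes(sectors):
    """Convert a sector count such as l1tex__m_xbar2l1tex_read_sectors to bytes."""
    return sectors * SECTOR_BYTES


def compare_with_nvlink(read_sectors, nvlink_rx_bytes, tolerance=0.05):
    """Compare sector-derived bytes with NVLink received user bytes.

    Returns (expected_bytes, excess_bytes, within_tolerance). A large positive
    excess means the link carried more data than the kernel's reads explain,
    for example traffic from another application, since NVLink counters are
    device-wide. The 5% tolerance is an arbitrary choice.
    """
    expected = sectors_to_bytes(read_sectors)
    excess = nvlink_rx_bytes - expected
    within = abs(excess) <= tolerance * expected
    return expected, excess, within


# Hypothetical values, not measurements from this thread.
print(compare_with_nvlink(1_000_000, 32_000_000))  # (32000000, 0, True)
print(compare_with_nvlink(1_000_000, 40_000_000))  # (32000000, 8000000, False)
```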

jmarusarz · September 28, 2022, 8:34pm

I didn’t realize you were using NVLink rather than PCIe. The peer metrics are for PCIe-connected GPUs. The NVLink section is what shows the communication between GPUs in your system. The only thing to be aware of is that its metrics are device-wide, so make sure that no other applications are generating traffic at the same time.
