I’m testing P2P memory access, and I want to know how many sectors are transferred when the local GPU’s data cache misses. I ran a kernel that reads peer GPU memory and collected the lts__t_requests_srcunit_tex_aperture_peer metric, but I got a zero value.
Did I misunderstand something? How can I measure peer memory accesses correctly with ncu?
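For reference, this is roughly the test I’m running (a minimal sketch: the device IDs, kernel name, and buffer size are just placeholders), profiled with something like `ncu --metrics lts__t_requests_srcunit_tex_aperture_peer.sum ./peer_test`:

```cpp
// Minimal peer-read sketch: device 0 reads a buffer that lives on device 1.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void readPeer(const float* peerData, float* out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = peerData[i];   // these loads should be served from the peer GPU
}

int main() {
    const size_t n = 1 << 24;
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, /*device=*/0, /*peerDevice=*/1);
    if (!canAccess) { printf("P2P not supported between devices 0 and 1\n"); return 1; }

    // Allocate the source buffer on device 1.
    float* peerBuf = nullptr;
    cudaSetDevice(1);
    cudaMalloc(&peerBuf, n * sizeof(float));

    // Enable access from device 0 to device 1 and launch the reader on device 0.
    cudaSetDevice(0);
    cudaError_t err = cudaDeviceEnablePeerAccess(1, 0);
    if (err != cudaSuccess) printf("EnablePeerAccess: %s\n", cudaGetErrorString(err));

    float* out = nullptr;
    cudaMalloc(&out, n * sizeof(float));
    readPeer<<<(n + 255) / 256, 256>>>(peerBuf, out, n);
    cudaDeviceSynchronize();

    cudaFree(out);
    cudaSetDevice(1);
    cudaFree(peerBuf);
    return 0;
}
```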
The metric you’re looking at should give some information if there are peer reads. A request can be made up of multiple sector reads, so in our memory calculations we use the metric “lts__t_sectors_srcunit_tex_aperture_peer_lookup_miss.sum” for peer accesses. You could try collecting that. Please also share a screenshot of the L2 Cache table from the Memory Workload Analysis section like the one below; that will help us verify whether peer accesses are actually occurring.
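To make the request/sector distinction concrete: a fully coalesced warp load of 4 bytes per thread touches 128 contiguous bytes, which typically counts as one L2 request but four 32-byte sectors. As a sketch (metric availability can vary by GPU architecture and Nsight Compute version), both counters can be collected from the command line with something like `ncu --metrics lts__t_requests_srcunit_tex_aperture_peer.sum,lts__t_sectors_srcunit_tex_aperture_peer_lookup_miss.sum <your app>`.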
I didn’t realize you were using NVLink rather than PCIe. The peer metrics are for PCIe-connected GPUs; the NVLink section is what shows the communication between the GPUs in your system. The only thing to be aware of is that those metrics are device-wide, so make sure no other applications are generating traffic at the same time.
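As a rough sketch (the section identifiers and metric names here are assumptions and may differ across Nsight Compute versions), the NVLink data can be collected explicitly with something like `ncu --section Nvlink_Tables <your app>`, or via raw per-device link counters such as `nvlrx__bytes.sum` and `nvltx__bytes.sum`. Because these counters are device-wide, profile on an otherwise idle pair of GPUs so the numbers reflect only your kernel’s traffic.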
Hi, I have the same problem. GPU0 and GPU1 are connected with NVLink. I checked the Sector Misses to Peer column, and the values are all 0. I also can’t see an NVLink section; I only see the NVLink Topology and NVLink Tables sections.
(I used the cudaDeviceEnablePeerAccess API.)
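In case it helps, this is a minimal check I use to confirm peer access is actually available and enabled between the two devices (illustrative only; error handling trimmed):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Can device 0 directly address memory that lives on device 1?
    int canAccess = 0, accessSupported = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    cudaDeviceGetP2PAttribute(&accessSupported, cudaDevP2PAttrAccessSupported, 0, 1);
    printf("canAccessPeer=%d accessSupported=%d\n", canAccess, accessSupported);

    // Enable the mapping; cudaErrorPeerAccessAlreadyEnabled just means it was already on.
    cudaSetDevice(0);
    cudaError_t err = cudaDeviceEnablePeerAccess(1, 0);
    printf("cudaDeviceEnablePeerAccess: %s\n", cudaGetErrorString(err));
    return 0;
}
```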