When using peer memory, how can I measure L1 or L2 cache traffic on the GPU performing the computation?

I ran a program that reads data from GPU1 and computes on GPU0 using peer memory.
In nsys (Nsight Systems), I confirmed that data moves from GPU1 to GPU0 over NVLink.
However, in ncu-ui (Nsight Compute GUI), I could not observe any data moving from peer memory.

It seems that ncu measures peer-memory data movement via the L2 cache miss counters, so I also checked the L1 cache miss counters.
Both the L2 and L1 cache values were measured as 0 on GPU0.
Since the computation results are correct and the data movement is visible in nsys, I believe the data is indeed being transferred from GPU1’s memory (peer memory).
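For context, here is a minimal sketch of the kind of setup I described (kernel name, sizes, and device numbering are illustrative, not my actual program):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Kernel on GPU0 that reads from a buffer residing in GPU1's memory.
__global__ void sumPeer(const float* src, float* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] += src[i];  // src reads travel from GPU1 (peer memory)
}

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, /*device=*/0, /*peerDevice=*/1);
    if (!canAccess) { printf("P2P not supported\n"); return 1; }

    const int n = 1 << 20;
    float *src, *dst;

    // Allocate src on GPU1 and dst on GPU0.
    cudaSetDevice(1);
    cudaMalloc(&src, n * sizeof(float));
    cudaSetDevice(0);
    cudaMalloc(&dst, n * sizeof(float));

    // Let GPU0 map GPU1's memory, then launch the kernel on GPU0.
    cudaDeviceEnablePeerAccess(/*peerDevice=*/1, 0);
    sumPeer<<<(n + 255) / 256, 256>>>(src, dst, n);
    cudaDeviceSynchronize();
    return 0;
}
```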

I have two questions.

  1. How can I measure data movement from/to peer memory through ncu?
  2. How can I enable caching of data read from peer memory?

I also tried these metrics with ncu’s CLI version; all of the metric values were 0:
lts__t_requests_srcunit_l1_aperture_peer
lts__t_requests_srcunit_l1_aperture_peer_evict_first
lts__t_requests_srcunit_l1_aperture_peer_evict_first_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_first_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_evict_last
lts__t_requests_srcunit_l1_aperture_peer_evict_last_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_last_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_evict_normal
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_demote
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_demote_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_demote_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_atom
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_alu
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_alu_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_cas
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_cas_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_atom_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_atom_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_membar
lts__t_requests_srcunit_l1_aperture_peer_op_membar_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_membar_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_read
lts__t_requests_srcunit_l1_aperture_peer_op_read_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_read_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_red
lts__t_requests_srcunit_l1_aperture_peer_op_red_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_red_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_write
lts__t_requests_srcunit_l1_aperture_peer_op_write_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_write_lookup_miss
lts__t_sectors_srcunit_l1_aperture_peer
lts__t_sectors_srcunit_l1_aperture_peer_evict_first
lts__t_sectors_srcunit_l1_aperture_peer_evict_first_lookup_hit
lts__t_sectors_srcunit_l1_aperture_peer_evict_first_lookup_miss
lts__t_sectors_srcunit_l1_aperture_peer_evict_last
lts__t_sectors_srcunit_l1_aperture_peer_evict_last_lookup_hit
lts__t_sectors_srcunit_l1_aperture_peer_evict_last_lookup_miss
lts__t_sectors_srcunit_l1_aperture_peer_evict_normal
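For reference, the command I used looked roughly like this (the application name is a placeholder; pass any of the metrics above as a comma-separated list to --metrics):

```shell
# Query a subset of the peer-aperture L2 (LTS) metrics, e.g. on GPU0's kernel.
ncu --metrics lts__t_requests_srcunit_l1_aperture_peer,lts__t_sectors_srcunit_l1_aperture_peer ./my_app
```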

It seems you already found the answer to why no peer traffic is shown for the NVLink data here. The peer metrics count traffic for PCIe-connected GPUs; they do not count NVLink traffic, which is shown in the NVLink section instead.

To collect the nvlink section, use --set nvlink or --set full --section Nvlink.
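For example (assuming the profiled binary is ./my_app; the exact section name may differ between Nsight Compute versions):

```shell
# Collect the NVLink section, which reports NVLink traffic
# instead of the peer (PCIe) aperture counters.
ncu --set nvlink ./my_app

# Or collect the full set plus the NVLink section explicitly.
ncu --set full --section Nvlink ./my_app
```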


Thank you for explaining how to measure it.
I would like to ask again about my second question above.
I can now take NVLink measurements and calculate the data movement, but I wonder whether peer memory is cached by default.
If it is not, how can I check that, and how can I enable it?
