I ran a program in which a kernel on GPU0 reads data that resides on GPU1 through peer memory access and computes on it.
In nsys (Nsight Systems), I confirmed that data is moving from GPU1 to GPU0 over NVLink.
However, in ncu-ui (the Nsight Compute GUI), I could not observe any data moving from peer memory.
It seems that ncu reports peer memory data movement through the L2 cache miss counters.
So I also checked the L1 cache miss counters.
Both the L2 cache value and the L1 cache value were measured as 0 on GPU0.
Since the computed results are correct and the data movement is visible in nsys, I believe the data really is being transferred from GPU1’s memory (peer memory).
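For reference, here is a minimal sketch of this kind of setup. It is not the original program: the kernel name `scalePeer`, the buffer size, and the zero-filled input are placeholders, and it assumes a machine with at least two GPUs where GPU0 can be granted peer access to GPU1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report the failing call and exit from main with a non-zero status.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: launched on GPU0, loads from a buffer that lives
// in GPU1's memory and stores the result in GPU0's local memory.
__global__ void scalePeer(const float* peerIn, float* localOut, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        localOut[i] = peerIn[i] * 2.0f;
    }
}

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    // Can GPU0 directly address memory that belongs to GPU1?
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 0, 1));
    if (!canAccess) {
        printf("GPU0 cannot access GPU1 as a peer.\n");
        return 0;
    }

    const size_t n = 1 << 20;  // placeholder element count
    float* src = nullptr;      // allocated on GPU1 (the peer)
    float* dst = nullptr;      // allocated on GPU0 (local)

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&src, n * sizeof(float)));
    CHECK(cudaMemset(src, 0, n * sizeof(float)));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));  // flags must be 0
    CHECK(cudaMalloc(&dst, n * sizeof(float)));

    // The kernel runs on GPU0, so every load of src[] is a peer read.
    const unsigned int threads = 256;
    const unsigned int blocks =
        static_cast<unsigned int>((n + threads - 1) / threads);
    scalePeer<<<blocks, threads>>>(src, dst, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(src));
    return 0;
}
```

A kernel like this is what gets profiled, for example with `ncu --metrics lts__t_sectors_srcunit_l1_aperture_peer_op_read ./peer_read`, where the binary name `peer_read` is hypothetical.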
I have two questions.
- How can I measure data movement from/to peer memory with ncu itself?
- How can I enable caching of data read from peer memory?
I also collected the following metrics with the CLI version of ncu.
All of the metric values were 0.
lts__t_requests_srcunit_l1_aperture_peer
lts__t_requests_srcunit_l1_aperture_peer_evict_first
lts__t_requests_srcunit_l1_aperture_peer_evict_first_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_first_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_evict_last
lts__t_requests_srcunit_l1_aperture_peer_evict_last_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_last_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_evict_normal
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_demote
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_demote_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_demote_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_evict_normal_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_atom
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_alu
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_alu_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_cas
lts__t_requests_srcunit_l1_aperture_peer_op_atom_dot_cas_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_atom_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_atom_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_membar
lts__t_requests_srcunit_l1_aperture_peer_op_membar_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_membar_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_read
lts__t_requests_srcunit_l1_aperture_peer_op_read_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_read_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_red
lts__t_requests_srcunit_l1_aperture_peer_op_red_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_red_lookup_miss
lts__t_requests_srcunit_l1_aperture_peer_op_write
lts__t_requests_srcunit_l1_aperture_peer_op_write_lookup_hit
lts__t_requests_srcunit_l1_aperture_peer_op_write_lookup_miss
lts__t_sectors_srcunit_l1_aperture_peer
lts__t_sectors_srcunit_l1_aperture_peer_evict_first
lts__t_sectors_srcunit_l1_aperture_peer_evict_first_lookup_hit
lts__t_sectors_srcunit_l1_aperture_peer_evict_first_lookup_miss
lts__t_sectors_srcunit_l1_aperture_peer_evict_last
lts__t_sectors_srcunit_l1_aperture_peer_evict_last_lookup_hit
lts__t_sectors_srcunit_l1_aperture_peer_evict_last_lookup_miss
lts__t_sectors_srcunit_l1_aperture_peer_evict_normal