When using peer memory, how can I measure L1 or L2 cache behavior on the GPU that performs the access?

I posted a question in the Nsight Compute category and received an answer about how to measure NVLink traffic.
However, my main question is about whether data read from peer memory is cached.
I think that question fits this category better, so I am asking it again here.

Q. Is there a way to see whether the data is being cached?
(I also want to check the hit ratio for the data that was accessed from the peer GPU.)

Generally it should not be difficult to see whether the data is being cached locally or not. Devise a test that repeatedly accesses the data (perhaps by copying it). If each repeated access runs at the same slow rate, the data is not being cached; that slower rate will be consistent with whatever the peer bus speed is, PCIe or NVLink. If the measured performance is consistent with a higher rate, then the data would appear to be cached locally.
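As one rough illustration of such a test, here is a minimal sketch (not verified code, and only an assumption of how it could be set up): it assumes two peer-capable GPUs with device 0 reading a buffer resident on device 1, omits error checking, and uses a buffer small enough to fit in the local L2 so that a repeated pass could hit in cache if peer reads were being cached.

```cpp
// Sketch: time repeated reads of a buffer that lives on a peer GPU.
// If every pass runs at roughly the peer-link rate, the data is not
// being cached locally; a much faster later pass would suggest caching.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void readAll(const float *src, float *out, size_t n)
{
    float acc = 0.0f;
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x)
        acc += src[i];            // stride over the whole buffer
    atomicAdd(out, acc);          // keep the loads from being optimized away
}

int main()
{
    const size_t n = 1 << 20;     // 4 MB of floats: small enough for most L2 caches
    const int local = 0, remote = 1;

    cudaSetDevice(local);
    cudaDeviceEnablePeerAccess(remote, 0);   // map device 1's memory into device 0

    float *src = nullptr, *out = nullptr;
    cudaSetDevice(remote);
    cudaMalloc(&src, n * sizeof(float));     // source buffer lives on the remote GPU
    cudaMemset(src, 0, n * sizeof(float));
    cudaSetDevice(local);
    cudaMalloc(&out, sizeof(float));
    cudaMemset(out, 0, sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int pass = 0; pass < 4; ++pass) {   // read the same remote data repeatedly
        cudaEventRecord(start);
        readAll<<<256, 256>>>(src, out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("pass %d: %.3f ms  (~%.1f GB/s)\n",
               pass, ms, n * sizeof(float) / ms / 1.0e6);
    }
    return 0;
}
```

Compare the per-pass rate against the local GPU's memory bandwidth on one hand and the PCIe or NVLink bandwidth on the other, and see which one the numbers match.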

Regarding the profiler: it has metrics for cache hit rates. It should be possible to devise a test similar to the one described above, run it under the profiler, and observe the hit-rate metrics.
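For example, a test like the sketch above could be run under Nsight Compute with the L1 and L2 hit-rate metrics. One possible invocation (the binary name is just a placeholder, and metric names can differ between architectures, so confirm them with `ncu --query-metrics`) would be `ncu --metrics l1tex__t_sector_hit_rate.pct,lts__t_sector_hit_rate.pct ./peer_cache_test`, which reports the L1 and L2 sector hit rates for the kernel that reads the peer buffer.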


I tried the approach you described.
There appears to be no caching at all.
(The total amount of data in the remote GPU's memory is 256 MB, and the Received User Bytes value is about 268 MB.)

Peer memory is enabled.
In this state, when the local GPU reads data from the remote GPU's memory, I expected the data to be cached by default in the local GPU's L2 cache.
However, that is not what the experiment appears to show.
The data seems to be read directly from the peer each time, as a one-time read, without being cached.