Using Visual Profiler

Vectorizer · July 29, 2013, 1:32am

Hello,

How can I use the Visual Profiler to see whether my kernel is performing coalesced global loads/stores
and whether it has shared memory bank conflicts?
It seems that, starting with CC 2.0, warp_serialize, gld_incoherent, gld_coherent, and gst_coherent are gone.
How can you tell if there are shared memory conflicts without warp_serialize?
How can you tell if your global memory loads/stores are coalesced without gld/gst_coherent/incoherent?
I guess one can look at global_store_transaction, which counts the number of store transactions, and infer from a decrease that some coalescing is taking place, but that's about it.

Thanks

Greg · July 30, 2013, 5:11am

The Fermi and Kepler architectures support counters for assessing the efficiency of your memory accesses to L1 and for detecting bank conflicts in shared memory. The nvprof metrics are:

shared_replay_overhead: Average number of replays due to shared memory conflicts for each instruction executed

local_replay_overhead: Average number of replays due to local memory cache misses for each instruction executed

global_cache_replay_overhead: Average number of replays due to global memory cache misses for each instruction executed

local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load

local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store

shared_load_transactions_per_request: Average number of shared memory load transactions performed for each shared memory load

shared_store_transactions_per_request: Average number of shared memory store transactions performed for each shared memory store

gld_transactions_per_request: Average number of global memory load transactions performed for each global memory load

gst_transactions_per_request: Average number of global memory store transactions performed for each global memory store

local_load_transactions: Number of local memory load transactions

local_store_transactions: Number of local memory store transactions

shared_load_transactions: Number of shared memory load transactions

shared_store_transactions: Number of shared memory store transactions

gld_transactions: Number of global memory load transactions

gst_transactions: Number of global memory store transactions
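
To give a feel for what gld_transactions_per_request reports, here is a rough host-side sketch in Python. It is not how the hardware counter is implemented. It assumes a 32-thread warp and a 128-byte transaction granularity (the L1 cache line size for cached loads on Fermi), and simply counts how many distinct 128-byte lines one warp-wide load touches. The function name and parameters are made up for illustration.

```python
WARP_SIZE = 32      # threads per warp
LINE_BYTES = 128    # assumed transaction granularity (L1 cache line)

def transactions_per_request(stride_elems, elem_bytes=4, base=0):
    """Count the distinct LINE_BYTES-sized lines touched by one warp-wide load.

    Thread `tid` is assumed to read `elem_bytes` bytes starting at
    base + tid * stride_elems * elem_bytes.
    """
    lines = set()
    for tid in range(WARP_SIZE):
        first = base + tid * stride_elems * elem_bytes
        last = first + elem_bytes - 1
        # An element may straddle a line boundary, so add every line it covers.
        lines.update(range(first // LINE_BYTES, last // LINE_BYTES + 1))
    return len(lines)

if __name__ == "__main__":
    for stride in (1, 2, 32):
        print("stride", stride, "->", transactions_per_request(stride))
    # Unit stride, but the warp starts 4 bytes past a line boundary.
    print("misaligned ->", transactions_per_request(1, base=4))
```

Under these assumptions a fully coalesced, aligned 4-byte load gives 1 transaction per request, and a stride of 32 elements gives 32. The measured values come from something like `nvprof --metrics gld_transactions_per_request,gst_transactions_per_request ./my_app`, where `./my_app` is a placeholder for your own executable.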

If you run the Visual Profiler memory analysis and any of the transactions-per-request values are high, the analysis will provide a link to the source line responsible for the memory operation.
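
As a rough guide to what "high" means for the shared memory metrics, the following Python sketch models bank conflicts for one warp. It is a simplification under stated assumptions: 32 banks that are each 4 bytes wide, with threads that read the same word served by a broadcast. Kepler's optional 8-byte bank mode is not modeled, and the function name is hypothetical.

```python
from collections import defaultdict

WARP_SIZE = 32   # threads per warp
NUM_BANKS = 32   # assumed: 32 banks, each 4 bytes wide

def shared_transactions_per_request(stride_words):
    """Estimate transactions for one warp-wide 4-byte shared memory access.

    Thread `tid` is assumed to access word index tid * stride_words.
    Distinct words that map to the same bank are serialized; identical
    words in a bank count once (broadcast).
    """
    words_per_bank = defaultdict(set)
    for tid in range(WARP_SIZE):
        word = tid * stride_words
        words_per_bank[word % NUM_BANKS].add(word)
    # The busiest bank determines how many passes the access needs.
    return max(len(words) for words in words_per_bank.values())

if __name__ == "__main__":
    for stride in (0, 1, 2, 32, 33):
        print("stride", stride, "->", shared_transactions_per_request(stride))
```

In this model a conflict-free access costs 1 transaction per request, a stride of 2 words costs 2, and a stride of 32 words costs 32. Padding the stride to 33 words brings it back to 1, which is the usual reason for padding shared arrays by one element.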

tejashree170893 · December 11, 2019, 10:38am

Hi,

Is this also applicable to the Pascal architecture (on the Jetson TX2)?