Using Visual Profiler

Hello,

How can I use the Visual Profiler to see whether my kernel is performing coalesced global loads/stores
and whether it has shared memory bank conflicts?
It seems that, starting with CC 2.0, the warp_serialize, gld_incoherent, gld_coherent, and gst_coherent counters are gone.
How can you tell if there are shared memory bank conflicts without warp_serialize?
How can you tell if your global memory loads/stores are coalesced without gld/gst_coherent/incoherent?
I guess one can look at global_store_transaction, which counts the number of store transactions, and infer from a decrease that some coalescing is taking place, but that's about it.

Thanks

The Fermi and Kepler architectures support counters for assessing the efficiency of your memory accesses to L1 and for detecting shared memory bank conflicts. The relevant nvprof metrics are:

shared_replay_overhead: Average number of replays due to shared memory conflicts for each instruction executed

local_replay_overhead: Average number of replays due to local memory cache misses for each instruction executed

global_cache_replay_overhead: Average number of replays due to global memory cache misses for each instruction executed

local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load

local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store

shared_load_transactions_per_request: Average number of shared memory load transactions performed for each shared memory load

shared_store_transactions_per_request: Average number of shared memory store transactions performed for each shared memory store

gld_transactions_per_request: Average number of global memory load transactions performed for each global memory load

gst_transactions_per_request: Average number of global memory store transactions performed for each global memory store

local_load_transactions: Number of local memory load transactions

local_store_transactions: Number of local memory store transactions

shared_load_transactions: Number of shared memory load transactions

shared_store_transactions: Number of shared memory store transactions

gld_transactions: Number of global memory load transactions

gst_transactions: Number of global memory store transactions
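
To get a feel for what gld_transactions_per_request and gst_transactions_per_request report, here is a rough host-side C++ sketch (not profiler code, and not how the hardware counts internally). It assumes one transaction per distinct aligned segment touched by a warp, with 128 bytes standing in for an L1-cached load on Fermi; uncached loads and other architectures use different segment sizes. The helper names transactions_for_warp and strided_warp_addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: counts the distinct aligned segments touched by one
// warp-wide access. Assumption: each distinct segment costs one transaction.
std::size_t transactions_for_warp(const std::vector<std::uint64_t>& byte_addresses,
                                  std::uint64_t segment_bytes = 128) {
    assert(segment_bytes > 0);
    std::set<std::uint64_t> segments;
    for (std::uint64_t addr : byte_addresses) {
        segments.insert(addr / segment_bytes);  // index of the aligned segment
    }
    return segments.size();
}

// Hypothetical helper: byte addresses generated by 32 threads that each read
// one 4-byte element, with consecutive threads stride_elems elements apart.
std::vector<std::uint64_t> strided_warp_addresses(std::uint64_t base_byte,
                                                  std::uint64_t stride_elems) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t lane = 0; lane < 32; ++lane) {
        addrs.push_back(base_byte + lane * stride_elems * 4);
    }
    return addrs;
}
```

Under this model, a stride of 1 from a 128-byte-aligned base gives 1 transaction for the request (fully coalesced), a stride of 2 gives 2, and a stride of 32 gives 32. A measured transactions-per-request value well above the ideal for your element size suggests uncoalesced or misaligned access. The real values can be collected with nvprof's --metrics option, passing the metric names listed above.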

If you run the Visual Profiler memory analysis and any of the transactions-per-request values are high, the analysis will provide a link to the source line responsible for the memory operation.
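
For the shared memory side, a similar host-side C++ sketch can show why shared_load_transactions_per_request goes above 1. It assumes a simplified bank model: 32 banks, each 4 bytes wide, where threads reading the same 4-byte word are served by one broadcast and distinct words in the same bank are serialized. Kepler's optional 8-byte bank mode is not modeled, and the helper names shared_passes_for_warp and column_offsets are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical helper: number of serialized passes one warp-wide shared
// memory access needs under the simplified bank model described above.
std::size_t shared_passes_for_warp(const std::vector<std::uint32_t>& byte_offsets,
                                   std::uint32_t num_banks = 32,
                                   std::uint32_t bank_width_bytes = 4) {
    assert(num_banks > 0 && bank_width_bytes > 0);
    // For each bank, collect the distinct words requested from it.
    std::map<std::uint32_t, std::set<std::uint32_t>> words_per_bank;
    for (std::uint32_t offset : byte_offsets) {
        std::uint32_t word = offset / bank_width_bytes;
        words_per_bank[word % num_banks].insert(word);
    }
    // The most heavily loaded bank determines how many passes are needed.
    std::size_t passes = 0;
    for (const auto& entry : words_per_bank) {
        passes = std::max(passes, entry.second.size());
    }
    return passes;
}

// Hypothetical helper: byte offsets when 32 threads each read one float from
// column col of a row-major float tile, with thread i reading row i.
std::vector<std::uint32_t> column_offsets(std::uint32_t row_width_floats,
                                          std::uint32_t col) {
    std::vector<std::uint32_t> offsets;
    for (std::uint32_t lane = 0; lane < 32; ++lane) {
        offsets.push_back((lane * row_width_floats + col) * 4);
    }
    return offsets;
}
```

With a row width of 32 floats, a column read puts all 32 threads in the same bank and the model returns 32 passes; padding the row to 33 floats spreads the threads across all banks and the model returns 1. In this simplified picture, 1 pass corresponds to a conflict-free request.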

Hi,

Is this also applicable to the Pascal architecture (on Jetson TX2)?