Are there any hardware metrics available that show how the GPU performs with certain setups?


I wrote a small test program for a CUDA device, for which I can specify the configuration of the grid within some limits (number of threads, block size, grid dimensions, etc.).

If possible, I would like to look at GPU metrics other than the resulting runtime of the test, for example the number of context switches, cache hits/misses, or similar values.

… Does anybody know where to get information about such things?

cheers & thx in advance :)

You can get some information (number of coalesced/non-coalesced loads/stores, branches) using the CUDA Visual Profiler.

Getting information about context switches or cache hits… I would also love to know :)

As you implement your cache usage yourself, there can be no hardware counters for that. The usual problem with context switches is the invalidation of caches and the saving of caches and registers to some slower memory. However, this does not occur here, as all concurrent threads/blocks have their own separate registers (which can of course limit your occupancy).
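
To put a rough number on that occupancy limit, here is a small host-side sketch in plain C++ (no CUDA toolkit needed). The names `SmLimits` and `estimateOccupancy` are made up for illustration, and the default limits are my assumption of the per-multiprocessor values for compute capability 1.0/1.1 devices, so check them against your own card. It also ignores shared memory and the register allocation granularity of real hardware, so treat the result as an estimate only.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits. The defaults are ASSUMED values for compute
// capability 1.0/1.1 devices; replace them with the numbers for your GPU.
struct SmLimits {
    int registers  = 8192;  // 32-bit registers per multiprocessor
    int maxThreads = 768;   // resident threads per multiprocessor
    int maxBlocks  = 8;     // resident blocks per multiprocessor
    int warpSize   = 32;    // threads per warp
};

// Rough occupancy estimate: resident warps divided by the maximum number of
// resident warps. Shared memory usage and register allocation granularity
// are deliberately ignored in this sketch.
double estimateOccupancy(int threadsPerBlock, int regsPerThread,
                         const SmLimits& sm = SmLimits()) {
    assert(sm.warpSize > 0 && sm.maxThreads >= sm.warpSize);
    if (threadsPerBlock <= 0 || regsPerThread <= 0) return 0.0;

    // A block occupies whole warps, so round its size up to a warp multiple.
    int warpsPerBlock    = (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;
    int threadsAllocated = warpsPerBlock * sm.warpSize;

    // How many blocks fit, as limited by each resource.
    int byRegisters = sm.registers / (regsPerThread * threadsAllocated);
    int byThreads   = sm.maxThreads / threadsAllocated;
    int blocks      = std::min({byRegisters, byThreads, sm.maxBlocks});

    int maxWarps = sm.maxThreads / sm.warpSize;
    return double(blocks * warpsPerBlock) / maxWarps;
}
```

With these assumed limits, `estimateOccupancy(256, 10)` gives 1.0 (three blocks of 256 threads fit), while `estimateOccupancy(256, 16)` drops to 2/3 because the registers only allow two resident blocks.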

Context switches in the sense of switching between warps would be nice to know about. However, I have several benchmarks whose performance is the same whether I run them with over a million threads or just a few thousand. So warp switching is pretty fast and shouldn’t be an issue as long as there are enough warps.