I have analysed a CUDA program with the Visual Profiler, and I have seen people show the global memory bandwidth and shared memory bandwidth of each kernel. But I cannot find these two items in the profiler.
In addition, what do Reads and Writes under global memory bandwidth, and Loads and Stores under shared memory bandwidth, mean?
I do not understand them well.
Use the guided analysis tool to “Perform Additional Analysis”.
When that is complete, click on one of the tabs to the right of the analysis tab to bring up the metric details.
I did find some metric details there, but they only show efficiencies such as global load efficiency,
global store efficiency, shared efficiency and so on. Can I compute the bandwidth from these efficiencies?
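
On the last point: an efficiency is a ratio, so on its own it probably cannot give a bandwidth. You would also need either a throughput metric (for example `gld_throughput` or `shared_load_throughput`, if your profiler version reports them) or a byte count together with the kernel duration. Below is a rough Python sketch of the arithmetic. It assumes that efficiency is defined as requested throughput divided by actual throughput, expressed as a percentage. The function names and all numbers are hypothetical and are not taken from any profiler output.

```python
# Minimal sketch of the arithmetic only; not a profiler API.
# Assumption: efficiency (%) = requested throughput / actual throughput * 100.

def throughput_gb_per_s(bytes_moved, duration_s):
    """Bandwidth in GB/s from bytes transferred and kernel duration in seconds."""
    return bytes_moved / duration_s / 1e9

def requested_throughput(actual_throughput, efficiency_percent):
    """Throughput the kernel actually asked for, given the actual
    (hardware-level) throughput and the efficiency in percent."""
    return actual_throughput * efficiency_percent / 100.0

if __name__ == "__main__":
    # Hypothetical kernel: 2e9 bytes moved through global memory in 10 ms.
    actual = throughput_gb_per_s(2_000_000_000, 0.010)
    # Hypothetical 50% global load efficiency.
    useful = requested_throughput(actual, 50.0)
    print(f"actual: {actual:.1f} GB/s, requested: {useful:.1f} GB/s")
```

So the efficiency only tells you what fraction of the moved bytes the kernel really requested. The bandwidth itself still has to come from a throughput metric or from bytes divided by time.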