I am trying to optimize a CUDA application by running the CUDA Visual Profiler. However, I am not sure what the reported numbers mean exactly or how they are estimated, for example global load throughput, DRAM write throughput, and so on. Is there any documentation for the Visual Profiler? Also, can you suggest a paper that uses the Visual Profiler to analyze and optimize code? Thank you.