Nsight->unguided application->kernel memory meaning?

There are Unified Cache and Device Memory sections under Kernel Memory, but I cannot tell the difference between global loads and device memory reads. Also, the global loads figure is sometimes greater than the memory bandwidth?

I don’t understand what you are asking in the first sentence. Regarding the second sentence, yes, global loads can be greater than device memory bandwidth if some of the global loads are hitting in one of the caches.
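To make the cache point concrete, here is a toy model in Python. It is only a sketch: the function name, the hit rates, and the 100 GiB figure are made up for illustration, and real hardware counts transactions in sectors rather than applying a single hit rate. It just shows why the traffic requested by global loads can be larger than what device memory actually delivers.

```python
def device_memory_bytes(global_load_bytes, l1_hit_rate, l2_hit_rate):
    """Bytes that still have to come from device memory after both caches.

    Hypothetical model: every load first tries L1, misses go to L2,
    and only L2 misses reach device memory (DRAM).
    """
    after_l1 = global_load_bytes * (1.0 - l1_hit_rate)  # traffic that misses L1
    after_l2 = after_l1 * (1.0 - l2_hit_rate)           # traffic that misses L2
    return after_l2


# Assumed numbers, not measurements: 100 GiB requested by global loads,
# with half of the requests hitting at each cache level.
requested = 100 * 2**30
dram = device_memory_bytes(requested, l1_hit_rate=0.5, l2_hit_rate=0.5)

print(f"global loads:   {requested / 2**30:.1f} GiB")
print(f"device memory:  {dram / 2**30:.1f} GiB")
```

With these assumed hit rates only a quarter of the requested bytes reach device memory, so a global load throughput above the device memory bandwidth is not a contradiction.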

There are several memory spaces: shared, local (stack), and global. The first counter measures reads from the global memory space. Both the global and local memory spaces are mapped to device memory and are cached in the L1 and L2 caches. So the second measure counts physical reads, whether from L1, L2, or device memory.

See for example http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#device-memory-spaces or any CUDA textbook

Hi BulatZiganshin, thanks for your reply.

From what you wrote, I take it that the Unified Cache counter (Nsight -> “Analysis” console -> “kernel Memory” tag -> “Results” -> Unified Cache) measures reads/loads from both device memory and cache memory?

Thanks for your reply txbob.

Do you mean that global loads are counted from both device memory and the caches?
The Unified Cache and Device Memory I mentioned in my first sentence can be found here:
Nsight -> “Analysis” console -> “kernel Memory” tag -> “Results” -> Unified Cache
Unified Cache:
Local Loads
Local Stores
Global Loads
Global Stores
Texture Reads
Unified Total

Nsight -> “Analysis” console -> “kernel Memory” tag -> “Results” -> Device Memory
Device Memory:

I do not know the meaning of Global Loads and Device Memory.