Nsight->unguided application->kernel memory meaning?

sam_zhong · September 10, 2016, 10:23am

Hello,
The kernel memory view shows both Unified Cache and Device Memory, but I cannot tell the difference between global loads and device memory reads. Also, why are the global loads sometimes greater than the memory bandwidth?

Robert_Crovella · September 10, 2016, 3:09pm

I don’t understand what you are asking in the first sentence. Regarding the second sentence, yes, global loads can be greater than device memory bandwidth if some of the global loads are hitting in one of the caches.
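To make the cache-hit point concrete, here is a minimal sketch of a toy LRU cache that counts load requests separately from the reads that actually reach device memory. It is only an illustration: the function name `simulate`, the 128-byte line size, and the 4-line capacity are assumptions made for this example. It does not reproduce how Nsight collects its counters or how a real GPU cache behaves.

```python
from collections import OrderedDict


def simulate(addresses, cache_lines=4, line_bytes=128):
    """Toy LRU cache model (hypothetical, not Nsight's method).

    Returns (load_requests, device_memory_reads).
    """
    cache = OrderedDict()  # keys are cache-line indices, ordered by recency
    loads = 0
    device_reads = 0
    for addr in addresses:
        loads += 1  # every request counts as a load
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)  # hit: served from cache, no device read
        else:
            device_reads += 1  # miss: the line is fetched from device memory
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return loads, device_reads


if __name__ == "__main__":
    # 32 four-byte loads that all fall in one 128-byte line, issued twice.
    reused = [4 * i for i in range(32)] * 2
    print(simulate(reused))  # loads far exceed device memory reads

    # One load per distinct line: every request misses.
    streaming = [128 * i for i in range(10)]
    print(simulate(streaming))
```

With heavy reuse the load count is much larger than the device read count, while a purely streaming pattern makes the two equal. That is the same effect that lets a load-side throughput figure exceed the device memory bandwidth.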

BulatZiganshin · September 10, 2016, 5:38pm

There are several memory spaces: shared, local (stack), and global. The first counter measures reads from the Global memory space. Both the Global and Local memory spaces are mapped to device memory and cached in the L1 and L2 caches. The second counter measures physical reads, whether from L1$, L2$, or device memory.

See for example http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#device-memory-spaces or any CUDA textbook

sam_zhong · September 12, 2016, 2:50am

Hi BulatZiganshin, thanks for your reply.

Based on your explanation, I think the Unified Cache counter (Nsight → “Analysis” console → “kernel Memory” tag → “Results” → Unified Cache) measures reads/loads from both device memory and cache memory. Is that right?

sam_zhong · September 12, 2016, 2:57am

Thanks for your reply txbob.

Do you mean that the global loads count reads from both device memory and the caches?
The Unified Cache and Device Memory I mentioned in my first sentence can be found here:
Nsight → “Analysis” console → “kernel Memory” tag → “Results” → Unified Cache
Unified Cache:
Local Loads
Local Stores
Global Loads
Global Stores
Texture Reads
Unified Total

Nsight → “Analysis” console → “kernel Memory” tag → “Results” → Device Memory
Device Memory:
Reads
Writes
Total

I do not know the meaning of Global Loads and Device Memory.
