Device memory in NVIDIA Visual Profiler

Nesta · October 10, 2015, 3:09pm

Hi,
I’m a bit confused about what the NVIDIA Visual Profiler counts as device memory in the kernel memory section of its analysis. The programming guide says that device memory includes “global, local, shared, constant, or texture memory”. If so, why do I get more transactions in Global L1 Cache than reads of device memory? Are the transactions in the device memory section only the uncached ones?

I assume that transactions are performed per warp and can be 32, 64, 96 or 128 bytes. So if every thread reads an 8-bit type, there is one 32-byte transaction.

Thanks for help!

edit: 32 bytes, not bits

Robert_Crovella · October 10, 2015, 3:47pm

Yes, device memory (DRAM) transactions are only the ones that don’t hit in L1 or L2 (or one of the other caches).

Transactions to device memory (DRAM) occur on an L2 cache basis. The entity in the L2 cache is called a “line” and the entity in DRAM is called a “segment”. Only segments can be read from or written to DRAM. A segment is 32 bytes, not 32 bits.

As long as the individual requirements of threads in a warp that are contributing to a transaction can be “coalesced” into a set of lines/segments, only those lines/segments will be requested.
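
As a rough illustration of the coalescing point above, here is a minimal Python sketch that counts how many distinct 32-byte segments a single warp-wide load would touch. It is only a toy model: the function name `segments_touched`, its parameters, and the idea of counting segments purely from byte addresses are assumptions made for illustration. It ignores L1/L2 hits entirely, and the counters the profiler actually reports depend on the GPU architecture.

```python
SEGMENT_BYTES = 32  # segment size stated in the answer above
WARP_SIZE = 32      # threads per warp


def segments_touched(base_addr, elem_bytes, stride_elems=1, warp_size=WARP_SIZE):
    """Count the distinct 32-byte segments touched by one warp-wide load.

    Hypothetical helper: each lane reads `elem_bytes` bytes starting at
    base_addr + lane * stride_elems * elem_bytes.
    """
    segments = set()
    for lane in range(warp_size):
        first = base_addr + lane * stride_elems * elem_bytes
        last = first + elem_bytes - 1
        # An element may straddle a segment boundary, so add every segment it covers.
        for seg in range(first // SEGMENT_BYTES, last // SEGMENT_BYTES + 1):
            segments.add(seg)
    return len(segments)


if __name__ == "__main__":
    print(segments_touched(0, 1))                   # contiguous 1-byte reads, aligned
    print(segments_touched(0, 4))                   # contiguous 4-byte reads, aligned
    print(segments_touched(0, 4, stride_elems=8))   # 4-byte reads, 32-byte stride
    print(segments_touched(16, 1))                  # contiguous 1-byte reads, misaligned
```

In this model, the aligned 1-byte case from the question touches a single segment, contiguous 4-byte reads touch four, and a 32-byte stride makes every thread land in its own segment, which is the uncoalesced worst case.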
