Effective bandwidth between using shared memory and global memory

0031611 · August 2, 2020, 9:00am

Hi all,

I hope you can help me clarify this issue.
I am trying to compute the effective bandwidth using the formula from
https://developer.nvidia.com/blog/how-implement-performance-metrics-cuda-cc/

Bw_Effective = (Rb + Wb) / (t * 10^9)
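
For reference, the formula can be written as a small helper. This is only a sketch in Python: the function name `effective_bandwidth_gbs` is made up for illustration, and it assumes Rb and Wb are given in bytes and t in seconds, as in the linked NVIDIA post.

```python
def effective_bandwidth_gbs(read_bytes, write_bytes, seconds):
    """Effective bandwidth in GB/s: (Rb + Wb) / (t * 10^9)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (read_bytes + write_bytes) / (seconds * 1e9)


# Example: 1 GB read plus 1 GB written in half a second.
print(effective_bandwidth_gbs(1e9, 1e9, 0.5))  # 4.0
```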

My device is a Kepler K40c with a theoretical bandwidth of 288.38 GB/s, running CUDA SDK 9.0.
N = number of elements to calculate
A = number of atlases
K = size of the neighbourhood
P = size of the patch
Blocks = number of blocks in the grid. In my case (222)/10 * (222)/10 * (112)/10
Sh = size of the shared memory needed for each block: 22, 20, or 18, depending on K
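
A quick check of the block count, as a sketch. It assumes each factor of the Blocks expression is rounded up (the post does not say so explicitly), which reproduces the value of 6348 blocks used in the example further down. The helper name `grid_blocks` is hypothetical.

```python
import math


def grid_blocks(dims, block_edge):
    """Blocks needed to cover a volume, rounding each dimension up (assumed)."""
    # (d + block_edge - 1) // block_edge is integer ceiling division.
    return math.prod((d + block_edge - 1) // block_edge for d in dims)


print(grid_blocks((222, 222, 112), 10))  # 23 * 23 * 12 = 6348
```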

For A=18, K=11, P=3, Blocks=6348, Sh=22 and N=5038523,
the calculation for the global memory version reads:

CUDA-GM

Wb = 4 * N
Rb = 4 * N * A * (K^3 + 2 * K^3 * P^3)

Wb = 20154092
Rb = 2.65568E+13

We have Bw_Effective = (20154092 + 2.65568E+13) / (330.6 * 10^9) = 80.30 GB/s
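
The global memory numbers can be re-derived with a few lines of Python. This sketch assumes 4-byte elements, takes N as the total element count (so the byte counts scale with N, not N cubed), and treats the time 330.6 as seconds. Under those assumptions it reproduces the two byte counts quoted above, 20154092 and about 2.65568E+13.

```python
N, A, K, P = 5038523, 18, 11, 3
BYTES_PER_ELEMENT = 4  # assumption: single-precision values

write_bytes = BYTES_PER_ELEMENT * N
read_bytes = BYTES_PER_ELEMENT * N * A * (K**3 + 2 * K**3 * P**3)

t_seconds = 330.6  # time from the post, assumed to be in seconds
bw_gm = (read_bytes + write_bytes) / (t_seconds * 1e9)

print(write_bytes)            # 20154092
print(f"{read_bytes:.5e}")    # 2.65568e+13
print(f"{bw_gm:.2f} GB/s")    # 80.33 GB/s
```

The result is about 80.33 GB/s, close to the 80.30 GB/s quoted; the small gap is presumably rounding in the reported time.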

CUDA-SM
We write the same number of bytes, but the reads are reduced because each block loads the elements it needs only once and then reads them from shared memory for the calculations.
Wb = 4 * N
Rb = 4 * N * A * K^3 + 4 * A * Blocks * Sh^3 + 4 * N * P^3

Wb = 20154092
Rb = 4.83396E+11

We have Bw_Effective = (20154092 + 4.83396E+11) / (139.58 * 10^9) = 3.46 GB/s
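
The same check for the shared memory version, again as a sketch with 4-byte elements, N as the total element count, and the time in seconds. The variable names are illustrative. Evaluating all three terms of the read formula gives about 4.88E+11 bytes; the figure of 4.83396E+11 quoted above matches the sum of the first and third terms only. Either way the result stays near 3.5 GB/s.

```python
N, A, K, P = 5038523, 18, 11, 3
BLOCKS, SH = 6348, 22
BYTES_PER_ELEMENT = 4  # assumption: single-precision values

write_bytes = BYTES_PER_ELEMENT * N
term_a = BYTES_PER_ELEMENT * N * A * K**3        # 4 * N * A * K^3
term_b = BYTES_PER_ELEMENT * A * BLOCKS * SH**3  # 4 * A * Blocks * Sh^3
term_c = BYTES_PER_ELEMENT * N * P**3            # 4 * N * P^3

read_all_terms = term_a + term_b + term_c
read_without_b = term_a + term_c  # matches the 4.83396E+11 in the post

t_seconds = 139.58  # time from the post, assumed to be in seconds
bw_all = (read_all_terms + write_bytes) / (t_seconds * 1e9)
bw_post = (read_without_b + write_bytes) / (t_seconds * 1e9)

print(f"{read_all_terms:.5e}  {bw_all:.2f} GB/s")   # 4.88263e+11  3.50 GB/s
print(f"{read_without_b:.5e}  {bw_post:.2f} GB/s")  # 4.83396e+11  3.46 GB/s
```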

I am confused because the run time is shorter but the effective bandwidth is smaller. This is not what I had expected, although it follows from the formulas, since we reduce the reads from global memory. However, does this mean that even with shared memory I am not using the full capability of my device, and that it could be even faster? If so, how?
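
To put the two figures side by side, here is a short sketch that expresses each effective bandwidth as a fraction of the theoretical peak, along with the speedup between the two times. All inputs are the numbers quoted in this post. The formula only counts global memory traffic, so reads served from shared memory do not show up in it.

```python
PEAK_GBS = 288.38              # theoretical bandwidth quoted for the K40c
gm_bw, sm_bw = 80.30, 3.46     # effective bandwidths from the post, GB/s
gm_t, sm_t = 330.6, 139.58     # times from the post

gm_fraction = gm_bw / PEAK_GBS
sm_fraction = sm_bw / PEAK_GBS
speedup = gm_t / sm_t

print(f"global memory version: {gm_fraction:.1%} of peak")  # 27.8% of peak
print(f"shared memory version: {sm_fraction:.1%} of peak")  # 1.2% of peak
print(f"speedup: {speedup:.2f}x")                           # 2.37x
```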

Let me know if you need more information; I will be more than happy to provide it.

Thanks in advance for any suggestion you might have
