I’ve been measuring the peak memory bandwidth I can obtain in some linear algebra kernels I’ve written. I want to make sure I am calculating my percentage utilization correctly (or, really, to check that Nvidia is reporting their bandwidth correctly). The most I seem to be able to get out of a GTX 480 is 150 GiB/s = 161 GB/s. Nvidia reports the peak memory bandwidth of the 480 as 177.4 GB/s. Since they state GB/s and not GiB/s, I presume they are counting memory bandwidth in base 10 (Gigabyte = 10^9 bytes), not base 2 (Gibibyte = 2^30 bytes)?
If Nvidia uses base 10, then I can achieve about 90% of peak bandwidth, but if they’re using base 2, then my utilization is only 85%. So which is the correct measure?
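For concreteness, here is the arithmetic I’m working from (a minimal host-side C sketch; 150 GiB/s is my measured figure, and 177.4 GB/s is the number from the spec sheet):

#include <stdio.h>

int main(void)
{
    const double measured = 150.0 * 1073741824.0;  /* 150 GiB/s in bytes/s (2^30) */
    const double peak_b10 = 177.4e9;               /* spec figure, if GB = 10^9 bytes */
    const double peak_b2  = 177.4 * 1073741824.0;  /* spec figure, if GB = 2^30 bytes */
    printf ("measured:              %.1f GB/s\n", measured / 1e9);            /* ~161.1 */
    printf ("utilization (base 10): %.1f%%\n", 100.0 * measured / peak_b10);  /* ~90.8% */
    printf ("utilization (base 2):  %.1f%%\n", 100.0 * measured / peak_b2);   /* ~84.6% */
    return 0;
}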
So that is in base 2 (or base 1024, if you wish). I would assume the same holds for the other numbers. 85% of peak bandwidth sounds like a very nice number to me.
Thanks for the reply. So it looks like I can achieve up to 90% of potential bandwidth, then, since Nvidia uses the correct base-10 Giga definition. Hooray for SI standards!
Another question, though unrelated: if I use the fmaf function in my code, how will it be interpreted on GT200? Will it issue a MAD instruction, or will it perform a software-emulated, IEEE-compliant FMA? What I need is to be able to write code that is guaranteed to use the fast FMA on Fermi and the fast MAD on GT200. I’ve found I can’t rely on the compiler to always issue an FMA/MAD instruction from “a += b*c” type code.
fmaf() is a math library function defined in C99, and it requires fused multiply-add functionality. It therefore maps to a single-precision FMA instruction on sm_2x, and to a software emulation on sm_1x, where there is no matching hardware instruction.
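To see why FMA is not simply interchangeable with a multiply followed by an add: FMA rounds only once, after adding the unrounded double-width product, whereas separate multiply and add operations round twice (and GT200’s FMAD additionally truncates the intermediate product). A small host-side C99 sketch of a case where the two disagree (this assumes the compiler does not itself contract a * a + c into an FMA, e.g., it is built with -ffp-contract=off):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* a*a = 1 + 2^-11 + 2^-24 exactly; rounding the product to single
       precision drops the 2^-24 bit, while the fused operation keeps it */
    float a = 1.0f + 0x1.0p-12f;
    float c = -(1.0f + 0x1.0p-11f);
    printf ("a * a + c     = %a\n", a * a + c);      /* 0x0p+0:  two roundings */
    printf ("fmaf(a, a, c) = %a\n", fmaf (a, a, c)); /* 0x1p-24: one rounding */
    return 0;
}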
To generate either an FMA on sm_2x or an FMAD on sm_1x, the CUDA math library uses an inlined function along the following lines:
__device__ __forceinline__ float my_fmadf (float a, float b, float c)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
    /* sm_2x: single-precision fused multiply-add, round to nearest even */
    return __fma_rn (a, b, c);
#else
    /* sm_1x: let the compiler contract this into an FMAD instruction */
    return a * b + c;
#endif
}
While this does not guarantee the generation of an FMAD on sm_1x, one can get quite close in practice, especially when one breaks up longer computations such that each call to my_fmadf() becomes a separate expression whose result is assigned to a temporary variable. See the math library sources (i.e., the header file math_functions.h) for numerous worked examples.
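For instance, following that recipe, a cubic polynomial in Horner form might be written as follows (my_poly3f and the coefficients c0 through c3 are made up for illustration; each step is a separate statement, so the compiler sees one FMA/FMAD candidate at a time):

__device__ float my_poly3f (float x, float c0, float c1, float c2, float c3)
{
    float t;
    t = my_fmadf (c3, x, c2);   /* c3*x + c2 */
    t = my_fmadf (t,  x, c1);   /* (c3*x + c2)*x + c1 */
    t = my_fmadf (t,  x, c0);   /* ((c3*x + c2)*x + c1)*x + c0 */
    return t;
}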