Is my bandwidth calculation right?

The Best Practices Guide says:

Effective bandwidth = (( Br + Bw ) / 10^9) / time

Theoretical Bandwidth = ( clockRate * 10^6 * (bitwidth/8) * 2 ) / 10^9

so the theoretical bandwidth of my GTX 260 (216 SP) is

( 1175 * 10^6 * (448/8) * 2) / 10^9 = 131.6 GB/s

In practice, my effective bandwidth is only 1.832 GB/s. Is that too low?

[codebox]#include <cstdio>
#include <cuda_runtime.h>

#define z_uint8 unsigned char
#define z_float32 float
#define z_int32 int

#define N 1024
#define R 1.23456789

// global buffers
z_uint8 G_Input[N*N];
z_uint8 G_Output[int(N*R)*int(N*R)];
z_float32 G_Input2[N*N];
z_float32 G_Output2[int(N*R)*int(N*R)];

int main()
{
    cudaEvent_t start, stop;
    float time;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord( start, 0 );

    z_int32 size1 = N*N, size2 = int(N*R)*int(N*R), size3 = int(N*R);

    z_float32* d_dest;
    z_float32* d_src;
    cudaMalloc( (void**)&d_src,  size1 * sizeof(z_float32) );
    cudaMalloc( (void**)&d_dest, size2 * sizeof(z_float32) );

    // convert 8-bit input to float on the host
    for(z_int32 i = 0; i < size1; i++) G_Input2[i] = z_float32(G_Input[i]);

    z_int32 iter = 10;
    for(int i = 0; i < iter; i++)
    {
        cudaMemcpy( d_src, G_Input2, size1 * sizeof(z_float32), cudaMemcpyHostToDevice);
        cudaMemcpy( G_Output2, d_dest, size2 * sizeof(z_float32), cudaMemcpyDeviceToHost);
    }

    // convert float output back to 8-bit on the host
    for(z_int32 i = 0; i < size2; i++) G_Output[i] = z_uint8(G_Output2[i]);

    cudaFree( d_dest );
    cudaFree( d_src );

    cudaEventRecord( stop, 0 );
    cudaEventSynchronize( stop );
    cudaEventElapsedTime( &time, start, stop );
    cudaEventDestroy( start );
    cudaEventDestroy( stop );

    printf("GPU time: %f milliseconds.\n", time );
    // (size1+size2) floats move per iteration; time is in ms, so this is GB/s
    printf("GPU bandwidth: %f GB/s.\n", (size1+size2) * sizeof(z_float32) / 1e6 / time * iter );

    return 0;
}[/codebox]

I know that the first (theoretical) bandwidth is between device memory and the GPU, and the second (effective) bandwidth is between host memory and device memory.

so my peak rate is 250 MB/s * 16 = 4 GB/s (PCIe 1.0 x16).

Now another question: does the GTX 260 use PCIe 1.0 or 2.0?

No. Theoretical Bandwidth = ( clockRate * 10^6 * (bitwidth/8) * 2 ) / 10^9

is the transfer rate between the GPU and its device memory.

However, your code does not measure device-memory bandwidth. What you actually measure is:

  1. Data transfer within host memory:
for(z_int32 i = 0; i < size1; i++) G_Input2[i] = z_float32(G_Input[i]);

...

for(z_int32 i = 0; i < size2; i++) G_Output[i] = z_uint8(G_Output2[i]);

This depends on the FSB and on how many cores you use. You use only one core, so

the bandwidth is about 2 GB/s.

  2. Transfer between host memory and device memory:
cudaMemcpy( d_src, G_Input2, size1 * sizeof(z_float32), cudaMemcpyHostToDevice);

cudaMemcpy( G_Output2, d_dest, size2 * sizeof(z_float32), cudaMemcpyDeviceToHost);

This depends on PCI Express; roughly speaking, the bandwidth is 1.7~2.5 GB/s for non-pinned memory

on my machine (ASUS P5Q PRO).
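As a rough sketch (my own code, not from this thread; the 64 MB transfer size is an arbitrary choice), pinned host memory allocated with cudaMallocHost usually gives noticeably higher PCIe throughput than pageable malloc memory:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Sketch: time one pageable and one pinned host-to-device copy
// and report each as GB/s. Numbers will vary with chipset and GPU.
int main()
{
    const size_t bytes = 64 << 20;             // 64 MB per transfer (arbitrary)
    float *h_pageable = (float*)malloc(bytes);
    float *h_pinned   = NULL;
    cudaMallocHost((void**)&h_pinned, bytes);  // page-locked allocation
    float *d_buf      = NULL;
    cudaMalloc((void**)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms;

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_pageable, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("pageable H2D: %f GB/s\n", bytes / 1e6 / ms);  // bytes/(ms*1e6) = GB/s

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("pinned   H2D: %f GB/s\n", bytes / 1e6 / ms);

    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    free(h_pageable);
    return 0;
}
```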

To measure device-memory bandwidth, you must write a kernel, for example a data-copy kernel.
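A minimal sketch of such a copy kernel (the names, sizes, and launch configuration are my own assumptions, not code from this thread):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Copy kernel: each thread moves one float, so one launch reads and
// writes n*sizeof(float) bytes of device memory.
__global__ void copyKernel(const float *src, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

int main()
{
    const int n = 1 << 24;                   // 16M floats (arbitrary size)
    const size_t bytes = n * sizeof(float);
    float *d_src, *d_dst;
    cudaMalloc((void**)&d_src, bytes);
    cudaMalloc((void**)&d_dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256, blocks = (n + threads - 1) / threads;
    copyKernel<<<blocks, threads>>>(d_src, d_dst, n);  // warm-up launch

    const int iter = 10;
    cudaEventRecord(start, 0);
    for (int i = 0; i < iter; i++)
        copyKernel<<<blocks, threads>>>(d_src, d_dst, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    // One read + one write per element => 2*bytes per iteration.
    printf("device-memory copy: %f GB/s\n", 2.0 * bytes * iter / 1e6 / ms);

    cudaFree(d_src);
    cudaFree(d_dst);
    return 0;
}
```

On a GTX 260 a copy test like this should come out near the theoretical figure computed above, not near the PCIe numbers.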

Thank you very much!!

In fact, I realized it myself later. :)