Bandwidth calculation Newbie question...

oreo1 · July 30, 2008, 10:25am

Hi there,

I want to calculate the bandwidth for a simple kernel:

__global__ void kernel(float* a, float* b, float* c, float* d)
{
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    a[index] = b[index] + c[index] * d[index];
}

Since I have 1 write and 3 reads per element, I should calculate this:

bandwidth = 4 * sizeof(float) * arrayLength / executionTime

Is this the right way?

I tried this and I get 120 GB/s, although I have an FX4600, which has a 46 GB/s local memory bandwidth. I do not understand.

Thanks for your help.

yk_cadcg · July 30, 2008, 11:35am

Maybe the execution time is wrong? Try both the syncthreads() method and the event method.

oreo1 · July 30, 2008, 1:19pm

I don’t think the time is wrong. I have tried the profiler time and also a simple timer, and I get the same results.

In my program, this code is called 1000 times, with almost the same execution time each time.

Around 1.11 ms for a 4194304-float array.

tmurray · July 30, 2008, 6:37pm

In my experience, usually when this happens you’ll be reading the wrong number of elements or something like that.

Just a guess.
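
One way to check the element count is to compare the number of threads launched with the array length, since the kernel in the first post has no `if (index < n)` guard and so touches one element per launched thread. A rough host-side sketch, assuming a 1D launch; the helper names and the block size of 256 in the note below are hypothetical, not taken from the original program:

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed to cover n elements with a 1D launch (rounds up).
inline std::size_t blocksFor(std::size_t n, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Elements an unguarded kernel touches: one per launched thread.
inline std::size_t elementsTouched(std::size_t blocks, std::size_t threadsPerBlock) {
    return blocks * threadsPerBlock;
}
```

With 4194304 elements and 256 threads per block this gives 16384 blocks and exactly 4194304 elements touched. If the two counts differ, the bandwidth formula should use the number of elements actually touched, not the nominal array length.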

oreo1 · July 31, 2008, 11:33am

Mmm, do you mean I’m not mapping the blocks the right way?

My launch parameters seem OK, and the results are the same as the CPU version :S

I’m really perplexed. Is my formula correct?

Thanks again.

frea · July 31, 2008, 11:45am

Just guessing, but maybe you can read and write at the same time, so you should actually be multiplying by 3, not 4. Although that would still give you 90 GB/s (which I thought is more or less what CUDA cards generally achieve).

oreo1 · July 31, 2008, 12:03pm

I’ve tried the Bandwidth test (in the CUDA SDK projects), and I get 46 GB/s device-to-device bandwidth. That’s what I compare to. By the way, this is quite far from the theoretical 67.2 GB/s… :S
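
For reference, a theoretical figure like 67.2 GB/s is normally derived from the memory bus width and memory clock. A minimal sketch, assuming a 384-bit bus and a 700 MHz double-data-rate memory clock for the FX4600; those two values are an assumption based on the card's published specs, not something stated in this thread, and the function name is hypothetical:

```cpp
#include <cassert>

// Theoretical peak = bytes per transfer * transfers per second, in decimal GB/s.
// busWidthBits: memory interface width in bits (assumed 384 for the FX4600).
// memClockHz: memory clock in Hz (assumed 700e6 for the FX4600).
// transfersPerClock: 2 for double-data-rate memory such as GDDR3.
inline double theoreticalGBps(double busWidthBits, double memClockHz, double transfersPerClock) {
    assert(busWidthBits > 0.0 && memClockHz > 0.0 && transfersPerClock > 0.0);
    return (busWidthBits / 8.0) * memClockHz * transfersPerClock / 1e9;
}
```

Under those assumptions, `theoreticalGBps(384, 700e6, 2)` comes out at about 67.2, and the measured 46 GB/s is roughly 68% of that peak.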

senorbum · July 31, 2008, 3:03pm

Yeah, reaching theoretical bandwidth is rarely, if ever, possible. From what I’ve gathered, 2/3 theoretical is about normal. I get around 48 GB/s. These differences can come from MoBo and driver issues, and other hardware issues. I presume there are other reasons as well, but these are common.

MisterAnderson42 · July 31, 2008, 4:03pm

Maybe just an honest error in calculating the bandwidth?

I get:

4*4194304 floats * 4 bytes/float / 1.11e-3 seconds / 1024^3 bytes/GiB = 56.3063063 GiB/s

That is still higher than your 46 theoretical though, which is odd…
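
The arithmetic above can be sketched as a small host-side helper that reports the result in both decimal GB/s and binary GiB/s, since mixing the two units is an easy way to get confused. The function names are hypothetical; the inputs are the ones quoted in this thread (4 accesses per element, 4194304 floats, 1.11 ms):

```cpp
#include <cassert>
#include <cstddef>

// Bytes moved by the kernel: (reads + writes) per element * element size * element count.
inline double bytesMoved(std::size_t accessesPerElement, std::size_t elementSize,
                         std::size_t numElements) {
    return static_cast<double>(accessesPerElement) * elementSize * numElements;
}

// Effective bandwidth in decimal GB/s (1 GB = 1e9 bytes).
inline double bandwidthGB(double bytes, double seconds) {
    assert(seconds > 0.0);
    return bytes / seconds / 1e9;
}

// Effective bandwidth in binary GiB/s (1 GiB = 1024^3 bytes).
inline double bandwidthGiB(double bytes, double seconds) {
    assert(seconds > 0.0);
    return bytes / seconds / (1024.0 * 1024.0 * 1024.0);
}
```

With `bytesMoved(4, sizeof(float), 4194304)` and 1.11e-3 seconds, this gives about 60.5 GB/s, or 56.3 GiB/s. Vendor peak figures are normally quoted in decimal units, so the GB/s value is the one to compare against a spec-sheet number.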

oreo1 · August 1, 2008, 9:31am

I have tried on a friend’s computer. He’s got an 8800 GTX and gets 80% of its theoretical bandwidth on the SDK bandwidth example. Anyway, as you say, it may be coming from somewhere else…

You are right, I made an error…

The theoretical bandwidth is 67.2 GB/s. Getting 56.3 is quite realistic!

The last thing I don’t get is why I get a higher result with this kernel than with the SDK bandwidth test?

Anyway… Sorry for my mistake and thanks a lot for your help guys =)

oYo

MisterAnderson42 · August 1, 2008, 11:38am

Yep, the original 8800 GTX consistently gets 70 GiB/s bandwidth in kernels like this and the peak is 86 GiB/s.

The SDK bandwidth test benchmarks a device-to-device cudaMemcpy, which is a little different from running a kernel, so the two numbers don’t have to be the same.
