Hi there,

I want to calculate the bandwidth of a simple kernel:

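```cuda
__global__
void kernel(float* a, const float* b, const float* c, const float* d, unsigned int arrayLength)
{
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard so the last block does not run past the end of the arrays
    if (index < arrayLength)
        a[index] = b[index] + c[index] * d[index];  // 3 reads, 1 write per element
}
```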

Since each element involves 1 write and 3 reads, I think I should compute the bandwidth like this:

`bandwidth = 4 * sizeof(float) * arrayLength / executionTime`
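A minimal sketch of that formula as a host function (plain host code, so it can sit in the same .cu file), assuming `executionTime` is a GPU-side time in milliseconds and that 1 GB means 10^9 bytes. The name `effective_bandwidth_gbs` is hypothetical:

```cuda
#include <cstddef>

// Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes).
// Each element costs 3 float reads (b, c, d) plus 1 float write (a),
// so 4 * sizeof(float) bytes move per element.
// elapsed_ms is assumed to be GPU time in milliseconds
// (for example, the value reported by cudaEventElapsedTime).
double effective_bandwidth_gbs(std::size_t n, float elapsed_ms)
{
    const double bytes   = 4.0 * sizeof(float) * static_cast<double>(n);
    const double seconds = elapsed_ms / 1000.0;
    return bytes / seconds / 1e9;
}
```

For example, `effective_bandwidth_gbs(1u << 22, 4.0f)` returns about 16.78. The same numbers give a quick sanity check: 2^22 floats per array means 4 × 4 × 4,194,304 = 67,108,864 bytes, which at 46 GB/s needs at least about 1.46 ms, so a measured time well below that bound points at the measurement rather than the card.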

Is this the right way?

I tried this and got 120 GB/s, even though my FX4600 has a local memory bandwidth of 46 GB/s. I do not understand how the measured value can exceed the hardware peak.
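The post does not show how `executionTime` was measured, so this is only a guess at the cause: a kernel launch returns to the host immediately, so a host timer stopped right after the launch measures little more than launch overhead and inflates the bandwidth. A minimal sketch that times the kernel with CUDA events instead, assuming a block size of 256 (an arbitrary choice) and a hypothetical helper name `time_kernel_ms`:

```cuda
#include <cuda_runtime.h>

// The kernel from the post, with a bounds check on n.
__global__ void kernel(float* a, const float* b, const float* c, const float* d, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n)
        a[index] = b[index] + c[index] * d[index];
}

// Hypothetical helper: times one launch on n elements with CUDA events and
// returns the GPU time in milliseconds, or -1 if an allocation fails.
float time_kernel_ms(int n)
{
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr, *d = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess || cudaMalloc(&b, bytes) != cudaSuccess ||
        cudaMalloc(&c, bytes) != cudaSuccess || cudaMalloc(&d, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(d);  // freeing nullptr is a no-op
        return -1.0f;
    }
    // Input contents do not matter for a bandwidth measurement
    cudaMemset(b, 0, bytes);
    cudaMemset(c, 0, bytes);
    cudaMemset(d, 0, bytes);

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;

    // Warm-up launch so one-time startup costs are not timed
    kernel<<<blocks, threads>>>(a, b, c, d, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    kernel<<<blocks, threads>>>(a, b, c, d, n);
    cudaEventRecord(stop);
    // The launch is asynchronous: wait until the stop event has completed
    // before reading the elapsed time
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(d);
    return ms;
}
```

The returned milliseconds can go straight into the bandwidth formula. On compute capability 1.x cards such as the FX4600, `gridDim.x` is limited to 65535, so with 256 threads per block `n` has to stay at or below 65535 × 256 = 16,776,960.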

Thanks for your help.