blocks and threads

I have this simple kernel, but since I'm new to CUDA there are a few things I'm unsure about. I'm wondering whether each block runs simultaneously, or each thread of each block runs simultaneously?

// handled by the GPU
__global__ void incrementArrayOnDevice(double *a, int N)
// blockIdx  = block index within the grid
// blockDim  = number of threads in each block
// threadIdx = thread index within the block

In main(), you can see each block has 256 threads, and there are 8589934592 blocks, resulting in 2199023255552 threads, which is the same as the number of array elements we are working with. So by taking advantage of the blockIdx, blockDim, and threadIdx variables, you're able to compute a unique index for each element of the array:
int idx = blockIdx.x*blockDim.x + threadIdx.x;

if (idx<N) a[idx] += 1;



Both: threads within a block run simultaneously (in groups of 32 called warps), and as many blocks as fit on the multiprocessors run simultaneously too … read the documentation ;)

You won’t be able to launch that many blocks in one dimension: gridDim.x is capped at 65535 (note it’s the grid dimension that’s limited here, not blockDim, which caps threads per block). You’ll need a 2D grid, using gridDim.x and gridDim.y, and even then 65535 × 65535 ≈ 4.3 billion blocks still falls short of 8589934592, so you’d have to split the work across multiple launches or have each thread handle more than one element.

And what GPU are you running on that has enough memory for that many array elements?! The biggest I’ve heard of is 4 GiB on the Tesla 10-series.

They don’t call him bigjoe for nothing.