I have this simple kernel, but since I'm new to CUDA there are a few things I'm unsure about. I'm wondering whether the blocks run simultaneously, or whether every thread of every block runs simultaneously?
// runs on the GPU
__global__ void incrementArrayOnDevice(double *a, double N)
{
//blockIdx = block index within a grid
//blockDim = number of threads in each block
//threadIdx = thread index within the block
/*
In main(), each block has 256 threads and there are 8589934592
blocks, giving 2199023255552 threads in total, which matches the
number of array elements we are working with. Using blockIdx,
blockDim, and threadIdx, each thread computes the index of the
one array element it is responsible for.
*/
// 64-bit index: a 32-bit int would overflow with this many threads
size_t idx = (size_t)blockIdx.x*blockDim.x + threadIdx.x;
if (idx<N) a[idx] += 1;
}
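For context, here is a minimal sketch of how a kernel with this indexing scheme might be launched and checked on a small array. It is not the actual main() from this post: the names incrementSmallArray and countIncremented, the array size of 1000, and the use of size_t for the element count are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Same indexing scheme as the kernel above (hypothetical name, size_t count).
__global__ void incrementSmallArray(double *a, size_t n)
{
    size_t idx = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) a[idx] += 1.0;  // guard: the last block may have spare threads
}

// Hypothetical helper: zero-fills n doubles, runs the kernel once, and
// returns how many elements ended up equal to 1.0 (0 on any CUDA error).
size_t countIncremented(size_t n, unsigned int threadsPerBlock)
{
    std::vector<double> host(n, 0.0);
    double *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(double)) != cudaSuccess) return 0;
    cudaMemcpy(dev, host.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    // Round up so that blocks * threadsPerBlock >= n.
    unsigned int blocks =
        (unsigned int)((n + threadsPerBlock - 1) / threadsPerBlock);
    incrementSmallArray<<<blocks, threadsPerBlock>>>(dev, n);

    // Wait for the kernel to finish and pick up any launch or runtime error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, n * sizeof(double),
                         cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return 0;

    size_t ok = 0;
    for (size_t i = 0; i < n; ++i)
        if (host[i] == 1.0) ++ok;
    return ok;
}

int main()
{
    const size_t n = 1000;  // small illustrative size, not the size in the post
    size_t ok = countIncremented(n, 256);
    printf("%zu of %zu elements incremented\n", ok, n);
    return ok == n ? 0 : 1;
}
```

With 1000 elements and 256 threads per block, the rounding gives 4 blocks (1024 threads), so the idx < n guard is what keeps the last 24 threads from writing past the end of the array.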
Thanks.