Blocks/Threads/Indexing in CUDA C

Hello,

I have basic experience in C/C++ and started to learn CUDA C some time ago.

Using the book CUDA by Example, I have been able to play with its code examples in order to understand the capabilities of CUDA on older as well as newer GPUs.

There is just one point that is not yet clear to me: indexing.

Could someone please answer my following two questions?

Please refer to the kernel definition example below. The thread ID is initialized outside the while loop.
-What is the meaning of the line marked with //???????? ? Is it a definition for an offset? In some examples this line is missing, but without it my program crashes.
-If it really is an offset: why is it necessary to make such a large jump from the current thread index to the next?

#define N (1024*1024)           // number of elements to process
#define THREADS_PER_BLOCK 256

__global__ void addKernel(const int *a, const int *b, int *c)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    while (tid < N)
    {
        c[tid] = a[tid] + b[tid];
        tid = tid + blockDim.x * gridDim.x; //????????
    }
}

Thank you very much in advance!

It’s a grid-stride loop: blockDim.x * gridDim.x is the total number of threads in the grid, so each thread advances by one whole grid width and picks up another element until its tid runs past N. Without that line tid never changes, the while loop never exits, and the kernel hangs, which is why your program appears to crash. Read this:

https://devblogs.nvidia.com/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/
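
For illustration, here is a minimal, self-contained sketch of the same pattern (my own example, not taken from the book; it uses managed memory and a hard-coded launch of 128 blocks to keep the host code short, and error checking is omitted). The grid launches far fewer threads than there are elements, and the grid-stride loop lets each thread process several elements:

#include <cstdio>
#include <cuda_runtime.h>

#define N (1024 * 1024)           // number of elements, not the number of blocks
#define THREADS_PER_BLOCK 256

__global__ void addKernel(const int *a, const int *b, int *c)
{
    // Global index of this thread within the whole grid.
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    // Grid-stride loop: the grid may contain fewer threads than N,
    // so each thread handles elements tid, tid + stride, tid + 2*stride, ...
    // where stride = blockDim.x * gridDim.x = total threads in the grid.
    while (tid < N)
    {
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;   // jump past every other thread in the grid
    }
}

int main()
{
    int *a, *b, *c;
    size_t bytes = N * sizeof(int);

    // Managed (unified) memory keeps the host code short.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);

    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    // Launch far fewer threads than N; the grid-stride loop covers the rest.
    // 128 blocks * 256 threads = 32768 threads for 1048576 elements.
    addKernel<<<128, THREADS_PER_BLOCK>>>(a, b, c);
    cudaDeviceSynchronize();

    printf("c[0] = %d, c[N-1] = %d\n", c[0], c[N - 1]);  // expect 0 and 3*(N-1)

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

If you replace the while with an if and drop the stride line, the launch configuration must cover all N elements by itself; with the stride line in place, almost any reasonable <<<blocks, threads>>> choice still produces the full result.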