Scalar Product: problems understanding the example code

Hi there,

Just to make one thing clear right at the very beginning: I am new to CUDA programming, and my question probably sounds rather stupid… but maybe someone wants to help me anyway :)

Here we go:

Among the example codes there is a project called “scalarProd”, and I am trying to figure out the “”

More specifically: the for loops that start at line 73

for(int vec = blockIdx.x; vec < vectorN; vec += gridDim.x){

The kernel is launched with

scalarProdGPU<<<128, 256>>>(d_C, d_A, d_B, VECTOR_N, ELEMENT_N);

So I can assume that 128 (the number of blocks launched) is actually (128,1,1), since it should be of type dim3 and any component left unspecified is initialized to 1, and the same should hold for 256 (threads per block): (256,1,1).

ok, so far, so good

now comes the tricky part (at least for me):

In the for loop mentioned above, “vec” is incremented by gridDim.x, which is equal to 128, isn’t it?

If that is the case, the whole loop doesn’t make any sense to me, because I then run into several problems with the indices in line 87.

The index should go from 0 to ELEMENT_N (which is equal to 4096), but if I assume gridDim.x to be 128, I get some sort of “illogical computation”.

My explanation, as far as I understand the whole GPGPU thing, is that the kernel itself is loaded several times SIMULTANEOUSLY, so that gridDim.x starts at 0 and then increments by 1 with every kernel call.

I hope you understood my problem :/