Hi all,

It seems that my code does nothing, and I don’t know what’s wrong with it…

```
    int Base = threadIdx.x;
    int End = Base + elementN;
    float sum = 5.0f;
    for (int i = Base; i < End; i += blockDim.x) {
        sum += d_data_A[i] * d_data_B[i];
    }
    //__syncthreads();
    d_data_C[0] = sum;
}
```

When this kernel returns, the result is 5.0, so the variable sum seems to keep its initial value.

Can anyone tell me what I am doing wrong, please?

In case it helps, this is the call:

```
dim3 grid(1);
dim3 threads(elementN);
CUDA_SAFE_CALL( cudaThreadSynchronize() );
scalarProdGPU<<<grid, threads>>>(d_data_C, d_data_A, d_data_B, elementN);
CUDA_SAFE_CALL( cudaThreadSynchronize() );
```
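
For comparison, this is a minimal sketch of how a single-block dot product is commonly written. It is not a confirmed fix for the kernel above: the names `dotKernel`, `dotOnGpu`, and `BLOCK` are hypothetical, the data is assumed to be `float`, and the block size of 256 is an assumption. It differs from the kernel above in three ways: the accumulator starts at zero, the per-thread partial sums are combined in shared memory, and only thread 0 writes the result, so the threads do not race on element 0 of the output. It also checks `cudaGetLastError()` after the launch, because a launch with an invalid configuration (for example, more threads per block than the device supports) does not run the kernel at all.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256  // assumed block size: a power of two within the per-block thread limit

// Hypothetical kernel: a single block computes the dot product of a and b.
__global__ void dotKernel(float *c, const float *a, const float *b, int n)
{
    __shared__ float partial[BLOCK];

    // Each thread accumulates a strided slice of the products.
    float sum = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        sum += a[i] * b[i];
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction in shared memory (relies on blockDim.x being a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // Only one thread writes the result, so there is no write race.
    if (threadIdx.x == 0)
        c[0] = partial[0];
}

// Hypothetical host helper: copies the inputs, launches, checks for errors,
// and returns the result. Allocation errors are not checked, to keep it short.
float dotOnGpu(const float *h_a, const float *h_b, int n)
{
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    float h_c = 0.0f;
    size_t bytes = (size_t)n * sizeof(float);

    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, sizeof(float));
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    dotKernel<<<1, BLOCK>>>(d_c, d_a, d_b, n);

    // Reports launch failures such as an invalid block size.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_c, d_c, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return h_c;
}
```

With `a = {1, 2, 3, 4}` and `b = {2, 2, 2, 2}`, `dotOnGpu(a, b, 4)` should return 20. Because the loop strides by `blockDim.x`, the block size stays fixed while `n` can be larger or smaller than it.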

Thanks in advance