I am having a problem with code that calculates the dot product of two vectors, r1 and r2. This should be simple, but for some reason I am unable to get it to work.

The vectors are one-dimensional.

```
dim3 block (BlockSize, 1);
dim3 grid(vec_size/block.x, 1);
dot2 <<<grid, block>>>(r1, r2, result);
.........................................
__global__ void dot2(float* r1, float* r2, float* result){
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    float sum = r1[tid] * r2[tid];
    __syncthreads();
    *result += sum;
}
```

The problem is with the kernel code. It looks fine to me, and I can't find the error. By "problem" I mean that the kernel's result does not match the answers from the CPU and CUBLAS. So clearly I am missing something.

Any help will be appreciated.
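
One thing that may be worth checking is the `*result += sum;` line. Every thread performs an unsynchronized read-modify-write on the same address, and `__syncthreads()` is only a barrier for the threads of one block, so it does not make that update atomic and some additions can be lost. Below is a minimal sketch of one possible workaround, not a definitive fix. It assumes `result` is zeroed before the launch and that a float `atomicAdd` is acceptable. The names `dot2_atomic` and `dot_on_gpu`, the extra length parameter `n`, and the block size of 256 are hypothetical and do not appear in the code above.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread multiplies one pair of elements and adds the product to *result
// with atomicAdd, so concurrent updates from different threads are not lost.
__global__ void dot2_atomic(const float* r1, const float* r2, float* result, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {  // guard: the rounded-up grid may contain more threads than elements
        atomicAdd(result, r1[tid] * r2[tid]);
    }
}

// Hypothetical host wrapper (an assumption, not part of the original post).
// Error checking of the CUDA calls is omitted to keep the sketch short.
float dot_on_gpu(const std::vector<float>& a, const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    if (n == 0) return 0.0f;

    size_t bytes = n * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_result = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_result, sizeof(float));

    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(float));  // the accumulator must start at zero

    const int blockSize = 256;                        // assumed block size
    int gridSize = (n + blockSize - 1) / blockSize;   // round up so no element is skipped
    dot2_atomic<<<gridSize, blockSize>>>(d_a, d_b, d_result, n);

    // This blocking copy also waits for the kernel to finish.
    float h_result = 0.0f;
    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_result);
    return h_result;
}
```

With this sketch, `dot_on_gpu({1, 2, 3}, {4, 5, 6})` should give 32. Because the order of the atomic additions is not fixed, results for long vectors may still differ from the CPU or CUBLAS value in the last few digits. Funnelling every thread through a single atomic is also slow, so a per-block shared-memory reduction is the more common approach when performance matters.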