What I want to achieve:
- A single variable that is updated from different blocks.
So when I launch with an execution configuration like <<<N,1>>> (N blocks of 1 thread each), every block that runs the kernel function should increment that variable.
For example, I have code like this:
#include <iostream>
#include <stdio.h>
using namespace std;

__global__ void test(int *a, int *b)
{
    int i;
    i = blockIdx.x;
    if (i % 2)
        a++;
    else
        b++;
    printf("%d -- %d\n", a, b);
}

int main()
{
    int *da, *db;
    int a = 0, b = 0;
    cudaMalloc((void**)&da, sizeof(int));
    cudaMalloc((void**)&db, sizeof(int));
    cudaMemcpy(&da, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(&db, &b, sizeof(int), cudaMemcpyHostToDevice);
    test<<<30,1>>>(da, db);
    cudaThreadSynchronize();
    getchar();
    return 0;
}
So at the end of the above code, shouldn’t a and b both contain 15? They are both in global memory, so the change made by each block running the kernel function should be reflected in them.
Instead, I get seemingly random values in a and b each time (sometimes 5 and 7, sometimes 4 and 2, and so on).
Is there a problem with the code? If not, what might be the explanation?
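For comparison, here is a minimal sketch of what the intended behaviour might look like. It rests on three observations about the code above: `a++` and `b++` in the kernel advance the pointers rather than the integers they point to, the `cudaMemcpy` calls pass `&da` and `&db` (host addresses of the pointers) instead of `da` and `db`, and nothing copies the results back to the host. The sketch also assumes that `atomicAdd` is the right tool for letting many blocks update one address safely. The names `countBlocks` and `runCount` are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block bumps one of two counters in global memory.
// atomicAdd performs the read-modify-write as one indivisible step,
// so concurrent blocks cannot overwrite each other's updates.
__global__ void countBlocks(int *odd, int *even)
{
    if (blockIdx.x % 2)
        atomicAdd(odd, 1);   // odd-numbered blocks
    else
        atomicAdd(even, 1);  // even-numbered blocks
}

// Hypothetical helper: launches `blocks` blocks of one thread each and
// copies both counters back to the host. Returns false if a CUDA call fails.
bool runCount(int blocks, int *oddOut, int *evenOut)
{
    int *dOdd = nullptr, *dEven = nullptr;
    if (cudaMalloc((void**)&dOdd, sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMalloc((void**)&dEven, sizeof(int)) != cudaSuccess) {
        cudaFree(dOdd);
        return false;
    }

    // Zero the counters on the device (pass the device pointers themselves).
    cudaMemset(dOdd, 0, sizeof(int));
    cudaMemset(dEven, 0, sizeof(int));

    countBlocks<<<blocks, 1>>>(dOdd, dEven);

    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t e1 = cudaMemcpy(oddOut, dOdd, sizeof(int), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(evenOut, dEven, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dOdd);
    cudaFree(dEven);
    return e1 == cudaSuccess && e2 == cudaSuccess;
}

int main()
{
    int odd = 0, even = 0;
    if (!runCount(30, &odd, &even)) {
        fprintf(stderr, "A CUDA call failed\n");
        return 1;
    }
    printf("%d -- %d\n", odd, even);
    return 0;
}
```

With 30 blocks, indices 0 to 29 split into 15 odd and 15 even, so on a working device this should print `15 -- 15`. Without the atomic operation, a plain `(*a)++` from several blocks would be a data race, and the final count could come out lower than expected.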