Hello all,
I have posted about this before. At the time I thought I had the answer, but I now realise I am still getting incorrect results.
To explain my problem: I have a kernel function in which threads concurrently write into several vectors. I want each thread to write to the end of a particular vector. The write must be exclusive, so I use atomicAdd() to give each thread its own write index.
Each “case” in the “switch” block represents my write operation on the vector. Each thread calculates an index for itself using atomicAdd and later writes to that index.
The problem I am facing is that after this kernel has executed, not all n threads have written to the arrays.
If I launch 3000 threads, only about 2700 of them write to the arrays. I found this by adding up the counts of each vector (the variable dc_counts is used for this purpose).
The code for my kernel function is given below:
template <typename T>
__global__ void ClusterFormation(point *d_p, int *d_cnum, int k,
                                 T *d_c1, T *d_c2, T *d_c3, int *dc_counts)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;

    // count1, count2 and count3 are __device__ int variables
    count1 = count2 = count3 = 0;
    dc_counts[0] = dc_counts[1] = dc_counts[2] = 0;
    int *c1_val = &count1, *c2_val = &count2, *c3_val = &count3;

    int num = d_cnum[id];
    __syncthreads();

    switch (num)
    {
    case 1: {
        int i1 = atomicAdd(c1_val, 1);   // reserve the next index in d_c1
        d_c1[i1] = d_p[id];
        // Thread with the highest count writes into the dc_counts variable
        if (*c1_val > dc_counts[0])
            dc_counts[0] = *c1_val;
        break;
    }
    case 2: {
        int i2 = atomicAdd(c2_val, 1);   // reserve the next index in d_c2
        d_c2[i2] = d_p[id];
        if (*c2_val > dc_counts[1])
            dc_counts[1] = *c2_val;
        break;
    }
    case 3: {
        int i3 = atomicAdd(c3_val, 1);   // reserve the next index in d_c3
        d_c3[i3] = d_p[id];
        if (*c3_val > dc_counts[2])
            dc_counts[2] = *c3_val;
        break;
    }
    }
}
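One thing worth ruling out in the kernel above: every thread of every block resets the counters, and __syncthreads() only synchronizes the threads of a single block, so a block that starts later can zero the counters after earlier blocks have already incremented them. The following is only a minimal sketch of a variant that avoids this, not a confirmed fix. It assumes a hypothetical two-float `point` struct (the real definition is not shown here), drops the template parameter for brevity, replaces count1..count3 with a single `counts` array that the host zeroes before the launch, and uses made-up names `ClusterFormationSketch` and `run_sketch`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the `point` type.
struct point { float x, y; };

// counts[c] is the number of elements appended to cluster c so far.
// It is zeroed by the host before the launch, never inside the kernel.
__global__ void ClusterFormationSketch(const point *d_p, const int *d_cnum, int n,
                                       point *d_c1, point *d_c2, point *d_c3,
                                       int *counts)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;                    // surplus threads do nothing

    point *out[3] = { d_c1, d_c2, d_c3 };
    int c = d_cnum[id] - 1;                 // labels are assumed to be 1..3
    if (c < 0 || c > 2) return;

    // atomicAdd returns the value before the increment: a unique slot.
    int slot = atomicAdd(&counts[c], 1);
    out[c][slot] = d_p[id];
}

// Hypothetical helper: runs the sketch on synthetic labels and returns
// the total number of points appended across the three clusters.
int run_sketch(int n, int tnum)
{
    std::vector<point> h_p(n);
    std::vector<int> h_cnum(n);
    for (int i = 0; i < n; ++i) {
        h_p[i].x = float(i);
        h_p[i].y = float(i);
        h_cnum[i] = i % 3 + 1;              // synthetic labels 1, 2, 3
    }

    point *d_p, *d_c1, *d_c2, *d_c3;
    int *d_cnum, *d_counts;
    cudaMalloc(&d_p,  n * sizeof(point));
    cudaMalloc(&d_c1, n * sizeof(point));
    cudaMalloc(&d_c2, n * sizeof(point));
    cudaMalloc(&d_c3, n * sizeof(point));
    cudaMalloc(&d_cnum, n * sizeof(int));
    cudaMalloc(&d_counts, 3 * sizeof(int));

    cudaMemcpy(d_p, h_p.data(), n * sizeof(point), cudaMemcpyHostToDevice);
    cudaMemcpy(d_cnum, h_cnum.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_counts, 0, 3 * sizeof(int));   // reset once, on the host side

    int blocks = (n + tnum - 1) / tnum;         // enough blocks to cover n
    ClusterFormationSketch<<<blocks, tnum>>>(d_p, d_cnum, n,
                                             d_c1, d_c2, d_c3, d_counts);

    // This blocking copy waits for the kernel to finish before reading.
    int h_counts[3] = { 0, 0, 0 };
    cudaMemcpy(h_counts, d_counts, sizeof(h_counts), cudaMemcpyDeviceToHost);

    cudaFree(d_p); cudaFree(d_c1); cudaFree(d_c2); cudaFree(d_c3);
    cudaFree(d_cnum); cudaFree(d_counts);
    return h_counts[0] + h_counts[1] + h_counts[2];
}

int main()
{
    printf("total written: %d\n", run_sketch(3200, 400));
    return 0;
}
```

Because the final counter values are read back on the host after the kernel has finished, the sketch does not need the non-atomic compare-and-store into dc_counts at all.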
The code for my kernel launch is given below:
n = 3200;
tnum = 400;
ClusterFormation<<<n / tnum, tnum>>>(d_p, d_cnum, k, d_c1, d_c2, d_c3, dc_counts);
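Two details of a launch like this are easy to miss: n / tnum is integer division, so it launches fewer than n threads whenever n is not a multiple of tnum (3000 / 400 gives 7 blocks, i.e. 2800 threads), and a failed launch is silent unless the error state is checked. Here is a rough sketch of a checked launch. The kernel `touch` and the helper `launch_checked` are hypothetical placeholders, not part of my code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for ClusterFormation.
__global__ void touch(int *d_out, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) d_out[id] = id;       // bounds check for the rounded-up grid
}

// Hypothetical helper: returns true if neither the launch nor the
// execution of the kernel reported an error.
bool launch_checked(int n, int tnum)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    int blocks = (n + tnum - 1) / tnum;   // round up instead of truncating
    touch<<<blocks, tnum>>>(d_out, n);

    cudaError_t launch_err = cudaGetLastError();       // e.g. invalid configuration
    cudaError_t exec_err   = cudaDeviceSynchronize();  // errors raised while running

    if (launch_err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(launch_err));
    if (exec_err != cudaSuccess)
        printf("execution failed: %s\n", cudaGetErrorString(exec_err));

    cudaFree(d_out);
    return launch_err == cudaSuccess && exec_err == cudaSuccess;
}

int main()
{
    printf("launch ok: %d\n", launch_checked(3000, 400) ? 1 : 0);
    return 0;
}
```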
Kindly help me, as I get correct results for up to 3000 elements, but beyond that the code simply does not work as expected.
Also, please let me know whether the graphics card I am using could be the cause of my problem. I am using a GeForce 820M.
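To help answer the hardware question, the limits of the card can be printed with cudaGetDeviceProperties. This is just a small sketch, assuming the card is device 0. The helper name `max_threads_per_block` is made up for illustration. For reference, atomicAdd() on a 32-bit int in global memory requires compute capability 1.1 or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a few properties of the given device and
// returns its maximum threads per block, or -1 if the query fails.
int max_threads_per_block(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        printf("query failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    printf("%s: compute capability %d.%d, max threads per block %d\n",
           prop.name, prop.major, prop.minor, prop.maxThreadsPerBlock);
    return prop.maxThreadsPerBlock;
}

int main()
{
    max_threads_per_block(0);   // device 0 is an assumption
    return 0;
}
```

A block size of 400 should be within the printed limit on any card that reports 512 or more threads per block.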