Reduction Code for arrays with random indexes

Hi friends,

We have the following CUDA kernel:

__global__ void runUP(double *r, int *n1, int *n2, int *n3, int *n4)
{
    // Indices into r, read from the four index arrays
    int n_1, n_2, n_3, n_4;

    int tid = threadIdx.x + blockDim.x * gridDim.x;

    while (tid < 1000)
    {
        n_1 = n1[tid];
        n_2 = n2[tid];
        n_3 = n3[tid];
        n_4 = n4[tid];

        r[n_1] += (some code like a1+a2*a3… etc) // line 1
        r[n_2] += (some code like a1+a2*a3… etc) // line 2
        r[n_3] += (some code like a1+a2*a3… etc) // line 3
        r[n_4] += (some code like a1+a2*a3… etc) // line 4

        tid += 1;
    }

    __syncthreads();
}

The arrays n1 to n4 contain random int values.
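Because the indices in n1 to n4 are random, two threads can update the same element of r at the same time, and `r[i] += x` is a non-atomic read-modify-write, so updates can be lost. On GPUs of compute capability 6.0 and newer, `atomicAdd(double*, double)` handles this directly. For older GPUs, the CUDA C++ Programming Guide describes a compare-and-swap loop; a minimal sketch of it is below. The names `atomicAddDouble`, `hammer` and `countAdds` are hypothetical, chosen only for this illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Atomic double add built from a 64-bit atomicCAS loop. Only needed on GPUs
// without a native atomicAdd(double*, double) (compute capability < 6.0).
__device__ double atomicAddDouble(double *address, double val)
{
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Store assumed + val only if *address still holds assumed;
        // otherwise another thread changed it first, so retry.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Worst case for the race: every thread adds 1.0 to the same element.
__global__ void hammer(double *cell)
{
    atomicAddDouble(cell, 1.0);
}

// Hypothetical host helper: launches blocks * threads adders on one cell
// and returns its final value.
double countAdds(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);
    double h_cell = 0.0;
    double *d_cell;
    CHECK(cudaMalloc(&d_cell, sizeof(double)));
    CHECK(cudaMemcpy(d_cell, &h_cell, sizeof(double), cudaMemcpyHostToDevice));
    hammer<<<blocks, threads>>>(d_cell);
    CHECK(cudaGetLastError());
    // cudaMemcpy waits for the kernel to finish before copying back.
    CHECK(cudaMemcpy(&h_cell, d_cell, sizeof(double), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_cell));
    return h_cell;
}
```

With the atomic add, `countAdds(8, 256)` should return 2048.0; with a plain `*cell += 1.0` in `hammer`, the result would typically be far smaller because concurrent updates overwrite each other.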

The kernel is called from the main program as: runUP<<<1,1>>>(d_r, d_n1, d_n2, d_n3, d_n4);
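With `<<<1,1>>>` the whole loop runs on a single thread. A common way to cover N elements with one thread each is to fix a block size and round the block count up by ceiling division. A minimal host-side sketch; `blocksFor` is a hypothetical helper name, and 256 threads per block is only a common starting value, not a requirement.

```cuda
#include <cassert>

// Smallest number of blocks such that blocks * threadsPerBlock >= n
// (integer ceiling division).
int blocksFor(int n, int threadsPerBlock)
{
    assert(n > 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

For the 1000 elements here, `blocksFor(1000, 256)` gives 4 blocks (1024 threads), so the 24 surplus threads must be filtered out inside the kernel by a bounds check such as `tid < 1000`.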


The runUP kernel runs fine with 1 block and 1 thread, but when we increase the number of blocks and threads, it does not run.
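Two things in the posted kernel look likely to cause this. First, `tid = threadIdx.x + blockDim.x * gridDim.x` does not use `blockIdx.x`, so every block starts at the same positions (with `<<<1,1>>>` it even starts at 1, skipping element 0), and `tid += 1` then makes each thread walk through all remaining elements, so many threads repeat the same updates. Second, because the indices are random, different threads can hit the same element of `r`, and plain `+=` updates race. A minimal sketch of a grid-stride version with atomic updates follows, assuming a GPU of compute capability 6.0 or newer for `atomicAdd` on `double` (e.g. compiled with `-arch=sm_60` or higher). The kernel name `runUPGridStride`, the extra `n` parameter, the constant `1.0` standing in for the post's `a1+a2*a3…` expressions, and the host helper `runOnGpu` are all hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

__global__ void runUPGridStride(double *r, const int *n1, const int *n2,
                                const int *n3, const int *n4, int n)
{
    // Unique index of this thread in the whole grid.
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    // Total number of threads in the grid; each thread steps by this amount.
    int stride = blockDim.x * gridDim.x;

    for (int i = tid; i < n; i += stride) {
        // 1.0 is a placeholder for the real per-term expressions.
        // atomicAdd keeps concurrent updates to the same r[] element correct.
        atomicAdd(&r[n1[i]], 1.0);
        atomicAdd(&r[n2[i]], 1.0);
        atomicAdd(&r[n3[i]], 1.0);
        atomicAdd(&r[n4[i]], 1.0);
    }
}

// Hypothetical host helper: idx holds n1 | n2 | n3 | n4 back to back
// (n entries each), every value in [0, rSize). Returns r after one launch.
std::vector<double> runOnGpu(const std::vector<int> &idx, int rSize,
                             int blocks, int threads)
{
    assert(!idx.empty() && idx.size() % 4 == 0 && blocks > 0 && threads > 0);
    int n = static_cast<int>(idx.size() / 4);
    double *d_r;
    int *d_idx;
    CHECK(cudaMalloc(&d_r, rSize * sizeof(double)));
    CHECK(cudaMemset(d_r, 0, rSize * sizeof(double)));  // all-zero bits == 0.0
    CHECK(cudaMalloc(&d_idx, idx.size() * sizeof(int)));
    CHECK(cudaMemcpy(d_idx, idx.data(), idx.size() * sizeof(int),
                     cudaMemcpyHostToDevice));

    runUPGridStride<<<blocks, threads>>>(d_r, d_idx, d_idx + n, d_idx + 2 * n,
                                         d_idx + 3 * n, n);
    CHECK(cudaGetLastError());

    std::vector<double> r(rSize);
    CHECK(cudaMemcpy(r.data(), d_r, rSize * sizeof(double),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_r));
    CHECK(cudaFree(d_idx));
    return r;
}
```

Because each thread strides by the total thread count, any grid size covers every element exactly once, even with fewer threads than elements (e.g. 2 blocks of 64 threads for 1000 elements). With real floating-point terms the order of the atomic additions is not fixed, so results may differ in the last bits between runs.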

Please help, and share any suggestions you have.

Regards,
Jashwantpreet Singh

Please help us. Any suggestions are welcome.

I think it's more like this:

if (tid < 1000)
{
    n_1 = n1[tid];
    n_2 = n2[tid];
    n_3 = n3[tid];
    n_4 = n4[tid];

    r[n_1] += (some code like a1+a2*a3… etc) // line 1
    r[n_2] += (some code like a1+a2*a3… etc) // line 2
    r[n_3] += (some code like a1+a2*a3… etc) // line 3
    r[n_4] += (some code like a1+a2*a3… etc) // line 4
}

__syncthreads();
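The `if (tid < 1000)` form works when every element gets its own thread, so the launch needs at least 1000 threads in total and `tid` must be the global index `threadIdx.x + blockIdx.x * blockDim.x`. Note that `__syncthreads()` only synchronizes threads within one block, so it does not protect `r` from updates made by other blocks; the updates still need to be atomic. A minimal sketch, again assuming compute capability 6.0 or newer for `atomicAdd` on `double`; the kernel name `runUPOnePerElement`, the `n` parameter, the `1.0` placeholder terms and the host helper `runOnePerElement` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// One thread per element; the bounds check stops the surplus threads
// of the last block.
__global__ void runUPOnePerElement(double *r, const int *n1, const int *n2,
                                   const int *n3, const int *n4, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < n) {
        // 1.0 is a placeholder for the real per-term expressions.
        atomicAdd(&r[n1[tid]], 1.0);
        atomicAdd(&r[n2[tid]], 1.0);
        atomicAdd(&r[n3[tid]], 1.0);
        atomicAdd(&r[n4[tid]], 1.0);
    }
}

// Hypothetical host helper: idx holds n1 | n2 | n3 | n4 back to back
// (n entries each), every value in [0, rSize).
std::vector<double> runOnePerElement(const std::vector<int> &idx, int rSize,
                                     int threadsPerBlock)
{
    assert(!idx.empty() && idx.size() % 4 == 0 && threadsPerBlock > 0);
    int n = static_cast<int>(idx.size() / 4);
    double *d_r;
    int *d_idx;
    CHECK(cudaMalloc(&d_r, rSize * sizeof(double)));
    CHECK(cudaMemset(d_r, 0, rSize * sizeof(double)));
    CHECK(cudaMalloc(&d_idx, idx.size() * sizeof(int)));
    CHECK(cudaMemcpy(d_idx, idx.data(), idx.size() * sizeof(int),
                     cudaMemcpyHostToDevice));

    // Enough blocks to give every element its own thread (ceiling division).
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    runUPOnePerElement<<<blocks, threadsPerBlock>>>(
        d_r, d_idx, d_idx + n, d_idx + 2 * n, d_idx + 3 * n, n);
    CHECK(cudaGetLastError());

    std::vector<double> r(rSize);
    CHECK(cudaMemcpy(r.data(), d_r, rSize * sizeof(double),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_r));
    CHECK(cudaFree(d_idx));
    return r;
}
```

Compared with a grid-stride loop, this form ties the grid size to the data size; if too few threads are launched, the remaining elements are silently skipped.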