Hello Everyone,
Hope you all are doing well.
I have written some __device__ code to parallelise my algorithm. Inside my kernel (device code) I have several nested for loops, and inside those loops I access multidimensional arrays indexed by the loop variables.
The relevant snippet is below:
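// ll, b_mm, mm, m_1, lmax, q, result, temp1, temp2 and Lmmnn are declared outside this snippet
const int natoms = 147;
int h  = blockIdx.y * blockDim.y + threadIdx.y;
int ju = blockIdx.x * blockDim.x + threadIdx.x;   // does not depend on ll or b_mm, so compute it once
if (h < natoms && ju < natoms) {                   // same bounds checks as before, done once
    int idx = h * natoms + ju;                     // flattened index into the 147 x 147 arrays
    for (ll = 0; ll < lmax; ll++) {
        for (b_mm = 0; b_mm < 19; b_mm++) {
            // NOTE: the body reads mm, not the loop variable b_mm, and mm is not updated inside these loops as shown
            m_1 = abs(mm);
            if (mm < 0) {
                temp1 = cuCmul(q, make_cuDoubleComplex(-m_1, 0));
                ex1[idx] = cuCmul(temp1, make_cuDoubleComplex(fophi1[idx], 0));
                comp_1_w[idx] = my_complex_exp(ex1[idx]);
            } else {
                temp2 = cuCmul(q, make_cuDoubleComplex(m_1, 0));
                ex2[idx] = cuCmul(temp2, make_cuDoubleComplex(fophi1[idx], 0));
                ex3[idx] = my_complex_exp(ex2[idx]);
                comp_1_w[idx] = cuCmul(ex3[idx], make_cuDoubleComplex(result, 0));
                //printf("comp_1_w values : %f %f\n", comp_1_w[idx].x, comp_1_w[idx].y);
            }
            // NOTE: b_mm runs up to 18 but the stride is 10, so (ll*10)+b_mm overlaps the entries of ll+1
            mult1[idx] = cuCmul(comp_1_w[idx], make_cuDoubleComplex(cons1[(ll * 10) + b_mm], 0));
            Lmmnn = Lmn_all[ju][ll][m_1];          // overwritten on every iteration; only the last value survives
        }
    }
}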
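To make the thread-to-element mapping explicit, here is a minimal host-side emulation of the 2D launch the snippet assumes: h comes from the y dimension, ju from the x dimension, and the same bounds guard is applied. It is plain C++ so it runs without a GPU; the `Dim` struct, the `coverage` helper and the block sizes are hypothetical. It only checks that, with the grid rounded up, every element h*147+ju is handled by exactly one thread.

```cpp
#include <vector>

// Hypothetical launch configuration: threads per block in x and y.
struct Dim { int x, y; };

// Host-side emulation of the kernel's 2D index mapping. Returns, for every
// element h * natoms + ju, how many emulated threads would touch it.
std::vector<int> coverage(int natoms, Dim block) {
    // Round the grid up so the last partial block still covers the edge.
    Dim grid{(natoms + block.x - 1) / block.x,
             (natoms + block.y - 1) / block.y};
    std::vector<int> hits(natoms * natoms, 0);
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx) {
                    int h  = by * block.y + ty;  // blockIdx.y*blockDim.y+threadIdx.y
                    int ju = bx * block.x + tx;  // blockIdx.x*blockDim.x+threadIdx.x
                    if (h < natoms && ju < natoms)  // same guard as the kernel
                        ++hits[h * natoms + ju];
                }
    return hits;
}
```

For example, coverage(147, Dim{16, 16}) uses a 10 x 10 grid; the threads that fall past index 146 are skipped by the guard.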
As the code shows, the index into cons1, (ll*10)+b_mm, depends on the loop variables ll and b_mm.
Similarly, in the statement
Lmmnn = Lmn_all[ju][ll][m_1];
the multidimensional array Lmn_all is indexed by loop variables.
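One common way to handle such arrays in device code is to store each one as a single flat buffer and compute the row-major offset yourself. The helpers below are a sketch under assumptions: cons1 is taken to be a logical [lmax][19] array, and mmax is a hypothetical size for the last dimension of Lmn_all. The macro guard lets the same helpers compile with nvcc as __host__ __device__ functions and with an ordinary C++ compiler for testing.

```cpp
// Lets the same helpers compile with nvcc (usable in device code)
// and with a plain C++ compiler (where the qualifiers expand to nothing).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Row-major offset for a logical 2D array cons1[lmax][nm]
// (nm = 19 is an assumption based on the b_mm loop bound).
__host__ __device__ inline int cons1_index(int ll, int b_mm, int nm) {
    return ll * nm + b_mm;
}

// Row-major offset for Lmn_all[ju][ll][m] stored as one flat buffer of
// natoms * lmax * mmax doubles (mmax is a hypothetical size).
__host__ __device__ inline int lmn_index(int ju, int ll, int m, int lmax, int mmax) {
    return (ju * lmax + ll) * mmax + m;
}
```

With these, the two reads in the kernel would become cons1[cons1_index(ll, b_mm, 19)] and Lmn_all[lmn_index(ju, ll, m_1, lmax, mmax)], where Lmn_all is now a plain double* pointing to device memory. With the original stride of 10, (ll=0, b_mm=10) and (ll=1, b_mm=0) read the same element.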
I am not getting the correct values of Lmmnn in the last line. Could anyone explain how to handle multidimensional arrays indexed by loop variables inside a parallelized kernel?
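If Lmn_all is built on the host as a nested array (for example double*** or nested std::vector), its inner pointers refer to host memory, which a kernel generally cannot read; that is one plausible cause of wrong values, although the declaration isn't shown, so this is an assumption. A minimal sketch of flattening it on the host, assuming the array is rectangular (the `flatten3` name is hypothetical):

```cpp
#include <vector>

// Copies a nested host array a[ju][ll][m] into one contiguous row-major buffer.
// Assumes a rectangular array: every plane has the same number of rows and
// every row has the same length, so element [ju][ll][m] lands at
// (ju * lmax + ll) * mmax + m.
std::vector<double> flatten3(const std::vector<std::vector<std::vector<double>>>& a) {
    std::vector<double> flat;
    for (const auto& plane : a)           // ju
        for (const auto& row : plane)     // ll
            flat.insert(flat.end(), row.begin(), row.end());  // m
    return flat;
}
```

The flat buffer can then be copied with a single transfer, e.g. cudaMalloc(&d_Lmn, flat.size() * sizeof(double)) followed by cudaMemcpy(d_Lmn, flat.data(), flat.size() * sizeof(double), cudaMemcpyHostToDevice), and read in the kernel as d_Lmn[(ju * lmax + ll) * mmax + m_1]. Here d_Lmn (a double*) and mmax are hypothetical names.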
Thank you.