Hello everyone,
I have 1000 arrays (let's call them G1, G2, G3, …), each of length 256, stored back to back in memory as a single 1D array, and I need to multiply each of the "G" arrays element-wise by an array F.
I currently have this kernel:
[codebox]
__global__ void MulArray(cufftComplex *A_d, cufftComplex *B_d, cufftComplex *C_d)
{
    int idx = blockIdx.x+threadIdx.x; // index of element along array B_d (G1, G2, G3... are all in memory back to back)
    int idx2 = threadIdx.x; // index of element in array A_d (array F)
    C_d[idx].x=(A_d[idx2].y*(B_d[idx].y))+(A_d[idx2].x*B_d[idx].x);
    C_d[idx].y=(A_d[idx2].y*B_d[idx].x)+(A_d[idx2].x*(-1*B_d[idx].y));
}
[/codebox]
Now the problem is that I only get results for the first 256 elements…
I think this might be because multiple threads try to read the same element of F at the same time…
What could I do to fix this? Would I have to make enough copies of F to match up with every G?
Any pointers appreciated.
Thanks
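
A note on the likely cause, with a sketch: concurrent reads of the same element by many threads are safe in CUDA, so copies of F should not be needed. The more probable culprit is the index `blockIdx.x+threadIdx.x`, which makes neighbouring blocks overlap (block 0 covers elements 0..255, block 1 covers 1..256, and so on) instead of each block covering its own G array. The usual form is `blockIdx.x*blockDim.x+threadIdx.x`. The program below is a minimal sketch of that fix, not a definitive implementation. It assumes a launch of one 256-thread block per G array (the launch configuration is not shown in the post), and the macros `N_F` and `N_G`, the test data, and the host-side check are all made up for illustration.

[codebox]
#include <cstdio>
#include <cmath>
#include <vector>
#include <cufft.h> // used only for the cufftComplex type (.x = real, .y = imaginary)

#define N_F 256  // length of F and of each G (from the post)
#define N_G 1000 // number of G arrays (from the post)

// Same arithmetic as the original kernel, i.e. C = A * conj(B).
// Assumption: launched as <<<N_G, N_F>>>, so blockDim.x == N_F.
__global__ void MulArray(const cufftComplex *A_d, const cufftComplex *B_d, cufftComplex *C_d)
{
    int idx  = blockIdx.x * blockDim.x + threadIdx.x; // global element index into B_d and C_d
    int idx2 = threadIdx.x;                           // element index into A_d (array F)
    cufftComplex a = A_d[idx2];
    cufftComplex b = B_d[idx];
    C_d[idx].x = a.y * b.y + a.x * b.x;
    C_d[idx].y = a.y * b.x - a.x * b.y;
}

int main()
{
    const int total = N_F * N_G;
    std::vector<cufftComplex> F(N_F), G(total), C(total);

    // Arbitrary test data, purely illustrative.
    for (int i = 0; i < N_F; ++i)   { F[i].x = 0.5f * i;       F[i].y = 1.0f; }
    for (int i = 0; i < total; ++i) { G[i].x = (float)(i % 7); G[i].y = (float)(i % 3); }

    cufftComplex *A_d, *B_d, *C_d;
    cudaMalloc((void **)&A_d, N_F * sizeof(cufftComplex));
    cudaMalloc((void **)&B_d, total * sizeof(cufftComplex));
    cudaMalloc((void **)&C_d, total * sizeof(cufftComplex));
    cudaMemcpy(A_d, F.data(), N_F * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(B_d, G.data(), total * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    // One block per G array, one thread per element of that array.
    MulArray<<<N_G, N_F>>>(A_d, B_d, C_d);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(C.data(), C_d, total * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    // Compare every element against the same formula evaluated on the host.
    int bad = 0;
    for (int i = 0; i < total; ++i) {
        cufftComplex a = F[i % N_F], b = G[i];
        float re = a.y * b.y + a.x * b.x;
        float im = a.y * b.x - a.x * b.y;
        if (fabsf(C[i].x - re) > 1e-4f * (1.0f + fabsf(re)) ||
            fabsf(C[i].y - im) > 1e-4f * (1.0f + fabsf(im)))
            ++bad;
    }
    printf("mismatches: %d of %d\n", bad, total);

    cudaFree(A_d); cudaFree(B_d); cudaFree(C_d);
    return bad != 0;
}
[/codebox]

If the launch used a different block size, `threadIdx.x` would no longer line up with an index into F; in that case `idx % N_F` would be the safer choice for `idx2`.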