I have 1000 arrays (let's call them G1, G2, G3…), each of length 256, stored back to back in memory as a single 1D array, and I need to multiply each of the "G" arrays element by element by array F.
I currently have this kernel:
__global__ void MulArray(cufftComplex *A_d, cufftComplex *B_d, cufftComplex *C_d)
{
    int idx = blockIdx.x+threadIdx.x; // index of element along array B_d (G1, G2, G3... are all in memory back to back)
    int idx2 = threadIdx.x;           // index of element in array A_d (array F)
    C_d[idx].x=(A_d[idx2].y*(B_d[idx].y))+(A_d[idx2].x*B_d[idx].
Now the problem is that I only get results for the first 256 elements.
I think this might be because multiple threads try to read the same element of F at the same time.
What could I do to fix this? Would I have to make enough copies of F to match up with every G?
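For comparison, here is a minimal sketch of one indexing scheme that reuses a single copy of F. It assumes a launch with one block of 256 threads per G array, so the conventional global index `blockIdx.x * blockDim.x + threadIdx.x` walks the packed G buffer, whereas `blockIdx.x + threadIdx.x` makes neighbouring blocks overlap. Many threads reading the same global memory element is allowed in CUDA, so copies of F should not be needed for correctness. The names `MulArraySketch`, `F_LEN` and `countMismatches` are made up for illustration, `cuFloatComplex` is the type that `cufftComplex` is a typedef of, and the plain complex product via `cuCmulf` is only a placeholder because the expression in the kernel above is cut off.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>

#define F_LEN 256  // length of F and of each G array (taken from the post)

// Assumed launch layout: one block per G array, one thread per element.
__global__ void MulArraySketch(const cuFloatComplex *F_d,
                               const cuFloatComplex *G_d,
                               cuFloatComplex *C_d,
                               int total)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x; // element in the packed G buffer
    if (idx >= total) return;                        // guard against extra threads
    int idx2 = idx % F_LEN;                          // matching element of F
    // Placeholder product; substitute the expression actually needed.
    C_d[idx] = cuCmulf(F_d[idx2], G_d[idx]);
}

// Hypothetical host helper: runs the kernel on numArrays arrays and returns
// the number of elements that differ from a CPU reference (-1 on a CUDA error).
int countMismatches(int numArrays)
{
    const int total = numArrays * F_LEN;
    std::vector<cuFloatComplex> F(F_LEN), G(total), C(total);
    // Small integer values keep every product exact in single precision.
    for (int i = 0; i < F_LEN; ++i) F[i] = make_cuFloatComplex((float)(i % 7), 1.0f);
    for (int i = 0; i < total; ++i) G[i] = make_cuFloatComplex((float)(i % 5), (float)(i % 3));

    cuFloatComplex *F_d = nullptr, *G_d = nullptr, *C_d = nullptr;
    const size_t fBytes = F_LEN * sizeof(cuFloatComplex);
    const size_t gBytes = (size_t)total * sizeof(cuFloatComplex);
    bool ok = cudaMalloc((void **)&F_d, fBytes) == cudaSuccess
           && cudaMalloc((void **)&G_d, gBytes) == cudaSuccess
           && cudaMalloc((void **)&C_d, gBytes) == cudaSuccess
           && cudaMemcpy(F_d, F.data(), fBytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(G_d, G.data(), gBytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        MulArraySketch<<<numArrays, F_LEN>>>(F_d, G_d, C_d, total);
        // The blocking copy back also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(C.data(), C_d, gBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(F_d);
    cudaFree(G_d);
    cudaFree(C_d);
    if (!ok) return -1;

    int mismatches = 0;
    for (int i = 0; i < total; ++i) {
        cuFloatComplex ref = cuCmulf(F[i % F_LEN], G[i]); // CPU reference
        if (ref.x != C[i].x || ref.y != C[i].y) ++mismatches;
    }
    return mismatches;
}
```

Calling `countMismatches(1000)` from `main` mirrors the 1000 x 256 case described above. A return value of 0 would mean every G array was multiplied by F, not just the first one.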
Any pointers appreciated.