I need to cross multiply two vector arrays, and I am using one thread for each element of the resulting array.
At the moment I'm just reading the elements directly from global memory, like so:
__global__ void CrossMulArray(cufftComplex *A_d, cufftComplex *B_d, cufftComplex *C_d, int BATCH)
{
    int idx = (blockIdx.y * 65535 * 256) + (blockIdx.x * 256) + threadIdx.x;
    int idx2 = threadIdx.x;
    C_d[idx].x = -1 * ((A_d[idx2].y * (-1 * B_d[idx].y)) - (A_d[idx2].x * B_d[idx].x));
    C_d[idx].y = -1 * ((A_d[idx2].y * B_d[idx].x) + (A_d[idx2].x * (-1 * B_d[idx].y)));
}
But since shared memory is much faster than global memory, I was going to load the A_d and B_d elements into shared memory for each block that I launch.
How would I go about that?
I tried declaring:
__shared__ float ax = A_d[threadIdx.x].x;
__shared__ float ay = A_d[threadIdx.x].y;
But that gives me an error.
Any suggestions? What am I doing wrong?
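For reference, here is a minimal sketch of what a shared-memory version might look like. The likely cause of the error is that a `__shared__` variable cannot be initialized in its declaration, so the sketch declares a per-block array first and fills it in a separate statement. Several things here are assumptions and not part of the original code: the kernel name `CrossMulArrayShared`, the bounds parameter `n` (used in place of `BATCH`), the `THREADS_PER_BLOCK` constant, the use of `gridDim.x` and `blockDim.x` instead of the hard-coded 65535 and 256, and the host helper `crossMulOnDevice`, which exists only to show a launch.

```cuda
#include <cufft.h>          // cufftComplex: .x is the real part, .y the imaginary part
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256   // assumed block size, matching the 256 in the original index math

// Computes C = conj(A) * B, the same arithmetic as the original kernel after
// simplifying the -1 factors, with A staged through shared memory.
__global__ void CrossMulArrayShared(const cufftComplex *A_d,
                                    const cufftComplex *B_d,
                                    cufftComplex *C_d,
                                    int n)   // hypothetical: number of elements in B_d and C_d
{
    // Declared without an initializer; one array per block, one slot per thread.
    __shared__ cufftComplex a_s[THREADS_PER_BLOCK];

    // Same layout as the original index, generalized to the actual launch dimensions.
    int idx = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;

    // Each thread copies one element of A from global to shared memory.
    a_s[threadIdx.x] = A_d[threadIdx.x];
    // Barrier so the whole tile is visible to every thread in the block.
    // It only matters if a thread reads a slot written by another thread.
    __syncthreads();

    if (idx < n) {
        cufftComplex a = a_s[threadIdx.x];
        cufftComplex b = B_d[idx];   // each B element is read once, so it stays in a register
        C_d[idx].x = a.x * b.x + a.y * b.y;
        C_d[idx].y = a.x * b.y - a.y * b.x;
    }
}

// Hypothetical host helper: A_h holds THREADS_PER_BLOCK elements, B_h and C_h hold n (n > 0).
cudaError_t crossMulOnDevice(const cufftComplex *A_h, const cufftComplex *B_h,
                             cufftComplex *C_h, int n)
{
    cufftComplex *A_d, *B_d, *C_d;
    size_t aBytes = THREADS_PER_BLOCK * sizeof(cufftComplex);
    size_t bBytes = (size_t)n * sizeof(cufftComplex);
    cudaMalloc(&A_d, aBytes);
    cudaMalloc(&B_d, bBytes);
    cudaMalloc(&C_d, bBytes);
    cudaMemcpy(A_d, A_h, aBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(B_d, B_h, bBytes, cudaMemcpyHostToDevice);

    // Split the blocks over grid.x and grid.y, at most 65535 blocks per row.
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    dim3 grid(blocks < 65535 ? blocks : 65535, (blocks + 65534) / 65535);
    CrossMulArrayShared<<<grid, THREADS_PER_BLOCK>>>(A_d, B_d, C_d, n);

    // The blocking copy also reports errors from the kernel launch.
    cudaError_t err = cudaMemcpy(C_h, C_d, bBytes, cudaMemcpyDeviceToHost);
    cudaFree(A_d);
    cudaFree(B_d);
    cudaFree(C_d);
    return err;
}
```

One caveat on the design: in this kernel each thread only ever reads its own element of A_d and B_d, so copying them into local variables (registers) would probably do just as well. Shared memory tends to pay off when several threads in a block need to read the same elements.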