This is my first time posting a question here. I need to calculate the mean values of N signals using reduction. The input is a 1D array of size M*N, where M is the length of each signal. Originally I used additional shared memory to first copy the data and then do the reduction on each signal. However, the original data gets corrupted. So I was wondering how I can use registers to do a reduction sum on N signals. (I know how to do it with sequential addition, using multiple threads and 1 register each.) The reason is that I want to keep the shared memory I declare to a minimum, for later use.
Could anyone give me some hints? Thanks for your help!
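For anyone reading along, here is a minimal sketch of one way to do the sum in registers only, using warp shuffles instead of a shared-memory scratch buffer. It assumes one warp (32 threads) per signal, a block size that is a multiple of 32, CUDA 9 or later for `__shfl_down_sync`, and input that lives in global memory. The names `warp_reduce_sum`, `signal_means_kernel` and `signal_means` are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Sum `v` across the 32 lanes of a warp using registers only.
// After the loop, lane 0 holds the full sum.
__inline__ __device__ float warp_reduce_sum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

// One warp per signal; in[s*M .. s*M+M-1] is signal s.
// Assumes blockDim.x is a multiple of 32.
__global__ void signal_means_kernel(const float* in, float* means, int N, int M) {
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;  // signal index
    int lane = threadIdx.x % 32;
    if (warp >= N) return;  // all 32 lanes of a warp exit together

    // Each lane accumulates a strided slice of the signal in a register.
    float partial = 0.0f;
    for (int i = lane; i < M; i += 32)
        partial += in[warp * M + i];

    float total = warp_reduce_sum(partial);
    if (lane == 0) means[warp] = total / M;
}

// Hypothetical host wrapper: copies data over, launches, copies means back.
std::vector<float> signal_means(const std::vector<float>& h_in, int N, int M) {
    assert(h_in.size() == static_cast<size_t>(N) * M);
    float* d_in = nullptr;
    float* d_means = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_means, N * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    const int threads = 128;  // 4 warps per block
    const int blocks = (N * 32 + threads - 1) / threads;
    signal_means_kernel<<<blocks, threads>>>(d_in, d_means, N, M);

    std::vector<float> h_means(N);
    cudaMemcpy(h_means.data(), d_means, N * sizeof(float),
               cudaMemcpyDeviceToHost);  // blocking copy, waits for the kernel
    cudaFree(d_in);
    cudaFree(d_means);
    return h_means;
}
```

Calling `signal_means(data, N, M)` on a host vector of N*M floats returns one mean per signal. The partial sums never touch shared memory, so whatever shared memory the kernel declares stays free for other uses.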
Uh, I should explain this more clearly. I have N threads and a shared memory array (float) s_m[N*M], where elements 0…M-1 are the first signal, and so on. Do I need N registers (or just one) to store the mean values of the N different signals? The next step is to subtract from every input value the mean of its corresponding signal. That's why I don't want additional shared memory. Hope that explains it a little better. Thanks
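To make the layout above concrete, here is a rough sketch of one reading of it: a single block of N threads, with thread t owning signal t. Registers are private to each thread, so one local `float` per thread is enough, and the N means end up in N separate registers without any extra shared memory. The names `subtract_means_kernel` and `subtract_means` are hypothetical, and the sketch assumes N is at most 1024 and that N*M floats fit in the block's shared memory.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Single block of N threads; dynamic shared memory holds all N*M samples.
__global__ void subtract_means_kernel(float* data, int N, int M) {
    extern __shared__ float s_m[];  // signal t is s_m[t*M .. t*M+M-1]
    int t = threadIdx.x;

    // Cooperative copy from global to shared memory.
    for (int i = t; i < N * M; i += blockDim.x) s_m[i] = data[i];
    __syncthreads();

    if (t < N) {
        float mean = 0.0f;  // lives in a register, private to thread t
        for (int i = 0; i < M; ++i) mean += s_m[t * M + i];
        mean /= M;
        // Subtract in place; the mean is read from the register, so the
        // signal can be overwritten without a second buffer.
        for (int i = 0; i < M; ++i) s_m[t * M + i] -= mean;
    }
    __syncthreads();

    // Copy the mean-subtracted signals back to global memory.
    for (int i = t; i < N * M; i += blockDim.x) data[i] = s_m[i];
}

// Hypothetical host wrapper; error checking omitted for brevity.
std::vector<float> subtract_means(std::vector<float> h, int N, int M) {
    assert(h.size() == static_cast<size_t>(N) * M);
    size_t bytes = h.size() * sizeof(float);
    float* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    subtract_means_kernel<<<1, N, bytes>>>(d, N, M);  // bytes = shared mem size
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

Each thread walks its signal sequentially here, which is the simple case. Reading `s_m` with a stride of M can cause shared-memory bank conflicts for some values of M, so a warp-per-signal layout may be worth trying if this turns out to be slow.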
I think I understand what you want to do, but some pseudocode or real code showing what you are trying to do would make it crystal clear. Then I or others can help you better.