This is my first time posting a question here. I need to calculate the mean values of N signals using a reduction. The input is a 1D array of size M*N, where M is the length of each signal. Originally I declared additional shared memory, copied the data into it first, and then did the reduction on each signal. However, the original data ended up corrupted. So I was wondering how I can use registers to do a sum reduction over the N signals. (I know how to do it with sequential addition, using multi-threaded programming and one register.) The reason is that I want to keep the shared memory I declare to a minimum so that it is available for later use.
Could anyone give me some hints? Thanks for your help!
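To make the question concrete, here is a minimal sketch of the kind of register-only reduction I have in mind, based on warp shuffles. It is not a definitive implementation and rests on several assumptions that are not stated above: the data is `float`, signal `s` occupies elements `s*M` to `s*M + M - 1` of the input, one warp of 32 threads is launched per signal, and the GPU and toolkit support `__shfl_down_sync` (CUDA 9 or later). The names `signalMeans` and `computeSignalMeans` are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one block of 32 threads (one warp) per signal.
// Assumes signal s is stored contiguously at in[s*M .. s*M + M - 1].
// No shared memory is used and the input array is only read, never written.
__global__ void signalMeans(const float* in, float* means, int M)
{
    const int s    = blockIdx.x;   // which signal this warp handles
    const int lane = threadIdx.x;  // 0..31 within the warp
    const float* sig = in + static_cast<size_t>(s) * M;

    // Each thread accumulates a strided partial sum in a register.
    float sum = 0.0f;
    for (int i = lane; i < M; i += warpSize)
        sum += sig[i];

    // Tree reduction across the warp using register shuffles.
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, offset);

    // After the loop, lane 0 holds the total for this signal.
    if (lane == 0)
        means[s] = sum / M;
}

// Hypothetical host wrapper: copies the data over, launches N warps, returns N means.
std::vector<float> computeSignalMeans(const std::vector<float>& h_in, int N, int M)
{
    float* d_in = nullptr;
    float* d_means = nullptr;
    cudaMalloc(&d_in, sizeof(float) * N * M);
    cudaMalloc(&d_means, sizeof(float) * N);
    cudaMemcpy(d_in, h_in.data(), sizeof(float) * N * M, cudaMemcpyHostToDevice);

    signalMeans<<<N, 32>>>(d_in, d_means, M);

    std::vector<float> h_means(N);
    cudaMemcpy(h_means.data(), d_means, sizeof(float) * N, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_means);
    return h_means;
}

int main()
{
    const int N = 4, M = 100;
    std::vector<float> h_in(static_cast<size_t>(N) * M);
    // Signal s is filled with the constant s + 1, so its mean should be s + 1.
    for (int s = 0; s < N; ++s)
        for (int i = 0; i < M; ++i)
            h_in[static_cast<size_t>(s) * M + i] = static_cast<float>(s + 1);

    std::vector<float> means = computeSignalMeans(h_in, N, M);
    for (int s = 0; s < N; ++s) {
        assert(means[s] == static_cast<float>(s + 1));
        std::printf("signal %d mean = %f\n", s, means[s]);
    }
    return 0;
}
```

With one warp per signal, the partial sums never leave registers, so no shared memory is declared at all. If M is large enough that several warps per signal are wanted, I assume the per-warp results would still have to be combined somehow, for example with one shared float per warp or an `atomicAdd` into global memory.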