Hi,
I seem to remember that the latest CUDA versions introduced built-in macros/functions to do reduction/min-max over an array in memory. Am I wrong? Should I still implement this myself, or is there something CUDA supplies?
I’m mainly interested in a built-in function to compute the min/max value of an array in shared memory (smem)…
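As far as I know there is no single built-in that reduces a whole shared-memory array for you (recent architectures do add warp-level primitives such as `__reduce_max_sync` for 32-bit integers within one warp), so the usual approach is still a tree reduction in shared memory. A minimal sketch, assuming `blockDim.x` is a power of two (names like `blockMax` are just illustrative):

```cuda
#include <cfloat>  // FLT_MAX

// Per-block max of an array: each block loads a tile into smem,
// then halves the number of active threads each step.
__global__ void blockMax(const float *in, float *out, int n)
{
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : -FLT_MAX;  // pad with the identity for max
    __syncthreads();

    // Tree reduction in shared memory.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];  // one partial max per block
}
```

You then reduce the per-block partials with a second (tiny) kernel launch, or on the host.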
Mmm, I am very interested in reduction.
I’m working on “many but small” reductions.
The SDK example sums 16M elements, but in my case I have to sum N arrays of M complex elements each, where M is small (max 8192, usually 512).
This is the case when you have to find the mean value of each row of a matrix with few columns ;) it’s a common problem in signal filtering.
How about running N threads, each thread adds M elements? If N is large and you use a friendly memory layout (e.g. column-major), this should be as fast as you can get.
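A sketch of that idea, assuming the N-by-M matrix is stored column-major (`data[col * N + row]`) so that at each loop step the threads of a warp read consecutive addresses (coalesced); the kernel and parameter names are just illustrative:

```cuda
#include <cuComplex.h>

// N independent sums of M complex elements: one thread per row.
__global__ void rowSums(const cuFloatComplex *data,
                        cuFloatComplex *sums, int N, int M)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N) return;

    cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
    for (int col = 0; col < M; ++col)
        acc = cuCaddf(acc, data[col * N + row]);  // column-major access

    sums[row] = acc;  // divide by M afterwards to get the row mean
}
```

With N = thousands of rows this keeps the GPU busy with no inter-thread communication at all, which is why it tends to beat running one tree reduction per small row.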
Why not just cut and paste from the reduction SDK example? You can try atomicMax, but my guess is that it will not be faster than the SDK example code.
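One place atomics can still help is combining the per-block partial results without a second kernel launch. Note that `atomicMax` has no float overload; a common workaround (sketched here under the assumption that all values are non-negative, in which case float bit patterns order the same way as the values) is to reinterpret the bits:

```cuda
// Per-block tree reduction as in the SDK example, but the final
// partial max is folded into a single global value with atomicMax.
__device__ unsigned int d_max_bits;  // initialize to 0 before launch

__global__ void blockMaxAtomic(const float *in, int n)
{
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : 0.0f;  // assumes non-negative input
    __syncthreads();

    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    if (tid == 0)
        atomicMax(&d_max_bits, __float_as_uint(sdata[0]));
}
```

Whether this beats a second reduction pass depends on how many blocks contend for the atomic; with only a few hundred partials the contention is usually negligible.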