Reduction algorithm and matrices: How to apply the SDK reduction algorithm to a half-matrix

mahnaz · May 15, 2009, 12:34am

Hello. I have a relatively large matrix (7000x7000 floats) with only the upper half occupied. I need to find the minimum element and its index for each column, and then the minimum of all those minimums (I need the index as well as the value). I’ve looked at the SDK scan and reduction routines, but I’m not experienced enough to know how to apply them efficiently.

My initial approach is to create 7000 blocks (one per column) and find the minimum for each, then use a second kernel to find the minimum of those results. To avoid idle threads in the blocks whose columns are mostly empty, I thought of letting the threads in those blocks “help” the fuller blocks by working on some of their elements, and then merging their results with the results of the local threads. But it’s messy and complicated, and I’m not sure it would be efficient.

Can anyone please give me some guidance on how to approach this? For my data size, what grid and block sizes should I use? How should I organize the data efficiently, and how should I handle the size not being a power of two? I have searched the forum for related discussions and read many of the posts, but still haven’t found anything that answers my questions. Thanks very much for your help.
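For reference, the two-kernel approach described in the post could be sketched roughly as follows. This is only a minimal sketch under several assumptions that the post does not state: the matrix is stored densely in column-major order (element (r, c) at `a[c * n + r]`), the occupied half includes the diagonal (rows 0..c of column c), and ties go to the lower index. The names `MinLoc`, `columnArgmin`, `globalArgmin`, `upperTriangleArgmin` and `kThreads` are hypothetical, and error checking is left out for brevity. The non-power-of-two size is handled by keeping the block size a power of two and letting each thread stride over the column, so threads with no element simply carry a sentinel into the reduction.

```cuda
#include <cfloat>
#include <climits>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // power of two, as the tree reduction requires

struct MinLoc { float value; int row; int col; };  // hypothetical result type

// Shared-memory tree reduction over (value, index) pairs; lower index wins ties.
__device__ void blockArgmin(float* val, int* idx) {
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            float v = val[threadIdx.x + s];
            int i = idx[threadIdx.x + s];
            if (v < val[threadIdx.x] || (v == val[threadIdx.x] && i < idx[threadIdx.x])) {
                val[threadIdx.x] = v;
                idx[threadIdx.x] = i;
            }
        }
        __syncthreads();
    }
}

// One block per column. Assumes column-major storage and that only rows
// 0..col of each column are occupied (upper triangle including the diagonal).
__global__ void columnArgmin(const float* a, int n, float* colMin, int* colRow) {
    __shared__ float val[kThreads];
    __shared__ int idx[kThreads];
    int col = blockIdx.x;
    float best = FLT_MAX;   // sentinel for threads that see no element
    int bestRow = INT_MAX;
    for (int r = threadIdx.x; r <= col; r += blockDim.x) {
        float v = a[(size_t)col * n + r];
        if (bestRow == INT_MAX || v < best) { best = v; bestRow = r; }
    }
    val[threadIdx.x] = best;
    idx[threadIdx.x] = bestRow;
    __syncthreads();
    blockArgmin(val, idx);
    if (threadIdx.x == 0) { colMin[col] = val[0]; colRow[col] = idx[0]; }
}

// A single block reduces the n per-column minima to the overall minimum.
__global__ void globalArgmin(const float* colMin, const int* colRow, int n, MinLoc* out) {
    __shared__ float val[kThreads];
    __shared__ int idx[kThreads];
    float best = FLT_MAX;
    int bestCol = INT_MAX;
    for (int c = threadIdx.x; c < n; c += blockDim.x) {
        float v = colMin[c];
        if (bestCol == INT_MAX || v < best) { best = v; bestCol = c; }
    }
    val[threadIdx.x] = best;
    idx[threadIdx.x] = bestCol;
    __syncthreads();
    blockArgmin(val, idx);
    if (threadIdx.x == 0) {
        out->value = val[0];
        out->col = idx[0];
        out->row = colRow[idx[0]];
    }
}

// Host wrapper (hypothetical). Requires n >= 1; CUDA error checks omitted.
MinLoc upperTriangleArgmin(const std::vector<float>& a, int n,
                           std::vector<float>& colMin, std::vector<int>& colRow) {
    float *dA, *dMin;
    int* dRow;
    MinLoc* dOut;
    MinLoc out{};
    cudaMalloc(&dA, a.size() * sizeof(float));
    cudaMalloc(&dMin, n * sizeof(float));
    cudaMalloc(&dRow, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(MinLoc));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
    columnArgmin<<<n, kThreads>>>(dA, n, dMin, dRow);   // one block per column
    globalArgmin<<<1, kThreads>>>(dMin, dRow, n, dOut); // minimum of the minima
    colMin.resize(n);
    colRow.resize(n);
    cudaMemcpy(colMin.data(), dMin, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(colRow.data(), dRow, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&out, dOut, sizeof(MinLoc), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dMin); cudaFree(dRow); cudaFree(dOut);
    return out;
}
```

Calling `upperTriangleArgmin(a, 7000, colMin, colRow)` would fill `colMin` and `colRow` with the per-column results and return the overall minimum with its row and column. In this sketch the short columns are not given extra work: their blocks finish early and the hardware scheduler moves on to other blocks, which may make the “helping” scheme unnecessary, though that would need to be measured. If the matrix is actually stored row-major, the per-column reads above are strided, and assigning one thread per column that walks down the rows is likely the better layout for coalescing.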
