[font=“Courier New”][font=“Courier New”]Hello. I have a relatively large matrix (7000x7000 floats) with only the upper half (the upper triangle) occupied. I need to find the minimum element and its index for each column, and then the minimum of all those minimums (I need the index as well as the value). I've looked at the SDK scan and reduction routines, but I'm not experienced enough to know how to apply them efficiently.
My initial approach is to create 7000 blocks (one for each column) and find the minimum for each. A second kernel would then find the minimum of those results. To avoid idle threads in the blocks whose columns are mostly empty, I thought of letting the threads in those blocks “help” the blocks with fuller columns by working on some of their elements, and then merging their results with the results of the local threads. But it's messy and complicated, and I'm not sure whether it would be efficient.
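For concreteness, here is a minimal sketch of the plain two-kernel version I have in mind, without the "helping" scheme. It rests on several assumptions of mine: the matrix is stored column-major, "upper half" means rows 0..c of column c, and the names `colArgMin`, `finalArgMin`, `blockArgMin`, `argminUpperTriangle` and `THREADS` are just placeholders I made up. Each thread strides over its column, so the column length does not have to be a power of two; only the block size does. Error checking is left out to keep it short.

```cuda
#include <cassert>
#include <cfloat>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define THREADS 256  // block size; must be a power of two for the tree reduction

struct MinLoc { float value; int row; int col; };

// Tree reduction in shared memory that keeps each value paired with its index.
__device__ void blockArgMin(float* sVal, int* sIdx)
{
    __syncthreads();  // make every thread's partial result visible
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s && sVal[threadIdx.x + s] < sVal[threadIdx.x]) {
            sVal[threadIdx.x] = sVal[threadIdx.x + s];
            sIdx[threadIdx.x] = sIdx[threadIdx.x + s];
        }
        __syncthreads();
    }
}

// Kernel 1: one block per column. Column c holds rows 0..c (assumed layout).
__global__ void colArgMin(const float* a, int n, float* colVal, int* colRow)
{
    __shared__ float sVal[THREADS];
    __shared__ int   sIdx[THREADS];
    int c = blockIdx.x;
    float best = FLT_MAX;
    int bestRow = -1;
    // Strided loop: consecutive threads read consecutive addresses.
    for (int r = threadIdx.x; r <= c; r += blockDim.x) {
        float v = a[(size_t)c * n + r];
        if (v < best) { best = v; bestRow = r; }
    }
    sVal[threadIdx.x] = best;
    sIdx[threadIdx.x] = bestRow;
    blockArgMin(sVal, sIdx);
    if (threadIdx.x == 0) { colVal[c] = sVal[0]; colRow[c] = sIdx[0]; }
}

// Kernel 2: a single block reduces the n column minima to one column index.
__global__ void finalArgMin(const float* colVal, int n, int* bestCol)
{
    __shared__ float sVal[THREADS];
    __shared__ int   sIdx[THREADS];
    float best = FLT_MAX;
    int bc = -1;
    for (int c = threadIdx.x; c < n; c += blockDim.x) {
        if (colVal[c] < best) { best = colVal[c]; bc = c; }
    }
    sVal[threadIdx.x] = best;
    sIdx[threadIdx.x] = bc;
    blockArgMin(sVal, sIdx);
    if (threadIdx.x == 0) *bestCol = sIdx[0];
}

// Host wrapper: a is an n*n column-major matrix; the lower triangle is ignored.
MinLoc argminUpperTriangle(const std::vector<float>& a, int n)
{
    float *dA, *dVal;
    int *dRow, *dCol;
    cudaMalloc(&dA, a.size() * sizeof(float));
    cudaMalloc(&dVal, n * sizeof(float));
    cudaMalloc(&dRow, n * sizeof(int));
    cudaMalloc(&dCol, sizeof(int));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);

    colArgMin<<<n, THREADS>>>(dA, n, dVal, dRow);
    finalArgMin<<<1, THREADS>>>(dVal, n, dCol);

    MinLoc out;
    cudaMemcpy(&out.col, dCol, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&out.value, dVal + out.col, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&out.row, dRow + out.col, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dVal); cudaFree(dRow); cudaFree(dCol);
    return out;
}

int main()
{
    const int n = 1000;                        // small stand-in for 7000
    std::vector<float> a((size_t)n * n, 1.0f);
    a[(size_t)700 * n + 3]  = -2.0f;           // row 3, col 700: upper triangle
    a[(size_t)10 * n + 500] = -9.0f;           // row 500, col 10: lower triangle, ignored
    MinLoc m = argminUpperTriangle(a, n);
    assert(m.row == 3 && m.col == 700);
    printf("min %f at row %d, col %d\n", m.value, m.row, m.col);
    return 0;
}
```

With this layout the grid is simply one block per column (7000 blocks for my matrix) and the block size is fixed at 256 threads, so the blocks for short columns finish almost immediately rather than needing to "help" anyone. I don't know whether that is actually good enough, which is part of my question.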
Can anyone please give me some guidance on how to approach this? For my data size, what grid and block dimensions should I use? How should I organize the data efficiently, and how should I handle the fact that the problem size is not a power of two? I have searched the forum for related discussions and read many of the posts, but still haven't found anything that answers my questions. Thanks very much for your help.
[/font][/font]