Thread theory? Splitting up a lower triangle matrix into threads

Hi everyone,

Just a quick one really… I’m thinking about threads and the best way to split up my problem. I’m working on the lower triangle of a matrix. Ideally I want to use a 2-dimensional thread grid to map the matrix elements onto threads; however, as only the lower triangle of the matrix is being used, this approach will leave threads idle. Obviously I don’t want idle cores during execution, but I’m not sure how the scheduler works or how tightly threads and cores are linked. Does anyone know whether having idle threads means I will have idle cores? If so, does anyone have any suggestions for splitting the lower triangle of a matrix into threads?

Cheers,
crip_crop

Hey Dittoaway,

you seem to have edited away your post… was that intentional?

Yes. I realized I had written something incredibly stupid. :(

Probably the simplest way to deal with something like a triangular matrix is to forget about trying to map a grid onto the matrix and just launch a thread for every non-zero element, if you are doing element-wise operations. For dot products and the like, it would probably be more effective to have one warp or block per row or column, and just live with the fact that some warps or blocks will do less net work than others. That would certainly still be preferable to having half a grid’s worth of threads doing no work.

Depends on what you’re doing to the triangular matrix…

If the computation on each matrix element can be broken up, you could have thread (i, j) do part of the work and thread (j, i) do the rest. Of course the diagonal threads would have to do both.

I’ve written about my approach to this. It was for an upper triangular matrix, but I’m sure you can adjust it to your needs:

http://forums.nvidia.com/index.php?showtopic=179181&st=0&p=1111039&#entry1111039

It involves launching a 1D group of threads, with a quick algorithmic mapping to find each thread’s 2D coordinate.

Thanks for the responses, they’re really useful.

The code that I’m working on calculates the forces for individual atom pairs. As the calculations for each atom pair are independent, I’m going to try putting the atom-pair combinations into a 1D array and then decomposing that array into threads, so each thread works on one atom pair. This way I won’t suffer from load imbalance. I’m not too sure what the best strategy for computing the atom-pair array is, though. I have two options: compute the atom-pair array once on the host and then transfer it across to the device, where it will be decomposed into threads; or compute the atom-pair array on the device before decomposing it into threads… this, however, would mean every thread computes the array.

I’m not too sure which would be the better strategy, performance-wise. Any suggestions or comments would be much appreciated.

Crip_crop

Hi Antagonistic,

I’ve been playing around with your algorithm, trying to adjust it for the lower triangle (including the diagonal), but I’m really struggling. I guess my maths isn’t up to scratch. Could you give me any tips on how to do this?

Cheers,

Crip_crop