Hello,

I have an algorithm that only needs to process the upper triangular part of a matrix.

What I'm doing now is mapping the whole N x N matrix to a 2D grid and then using an if-else to filter which blocks actually get processed. I'm using shared memory, by the way.

I was wondering: would I get a good speedup if I managed to launch blocks only for the upper triangular part, instead of launching the full grid and skipping the rest?
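To make the question concrete, here is a minimal sketch of the kind of mapping I have in mind. It is only an illustration, not code I'm running: the names `BlockCoord`, `upperTriBlockCount` and `linearToUpperTri` are made up, `nb` is assumed to be the number of blocks along one side of the matrix, and the diagonal blocks are assumed to be part of the work. It's written as plain C++ so it can be checked on the host; in a kernel the function would presumably be marked `__host__ __device__` and called with `blockIdx.x` from a 1D grid of `upperTriBlockCount(nb)` blocks.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: block coordinates inside the nb x nb block grid.
struct BlockCoord {
    int row;
    int col;
};

// Number of blocks in the upper triangle, diagonal included.
inline std::int64_t upperTriBlockCount(int nb) {
    return static_cast<std::int64_t>(nb) * (nb + 1) / 2;
}

// Map a linear block index k (row-major over the upper triangle)
// to (row, col) with col >= row.
inline BlockCoord linearToUpperTri(std::int64_t k, int nb) {
    assert(k >= 0 && k < upperTriBlockCount(nb));

    // Mirror the index so the problem becomes a lower-triangle lookup,
    // where row i starts at offset i * (i + 1) / 2.
    const std::int64_t kk = upperTriBlockCount(nb) - 1 - k;

    // Closed-form guess for the lower-triangle row, then fix up any
    // floating-point rounding error with integer checks.
    std::int64_t i = static_cast<std::int64_t>((std::sqrt(8.0 * kk + 1.0) - 1.0) / 2.0);
    while (i * (i + 1) / 2 > kk) --i;
    while ((i + 1) * (i + 2) / 2 <= kk) ++i;

    const std::int64_t j = kk - i * (i + 1) / 2;

    // Mirror back to upper-triangle coordinates.
    return BlockCoord{static_cast<int>(nb - 1 - i), static_cast<int>(nb - 1 - j)};
}
```

With this, a 4 x 4 block grid would need 10 blocks instead of 16, and the if-else filter would go away, at the cost of a square root and a few integer operations per block.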