SDK Transpose revisited ... yet again!


I’ve been working on the SDK’s Transpose for quite some time now and have run into problems with matrices whose dimensions are not a multiple of BLOCK_DIM.

I have read all the forum posts on this issue, and there have been quite a few. One suggested solution was zero padding. However, it adds so much overhead that just removing the padding at the end of the computation takes longer than doing the entire transpose on the CPU.

My code, which includes the zero padding, is here.

Other suggested solutions were also either computationally cumbersome or simply failed to work.

So what I’m looking for is an efficient solution for transposing very large yet irregularly shaped 2D matrices.

I’d appreciate any help the community can lend here…



Did you try branching instead of zero padding?

Round the number of blocks up so that the grid covers the matrix size rounded up to the next multiple of 16.
In each thread, check its index: if it falls outside the irregular size of your matrix (index >= width or height), just do nothing.
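A minimal sketch of that bounds-check approach, modelled on the shared-memory tiled transpose idea in the SDK sample but not copied from it. The host helper `run_transpose_check` and the test sizes are hypothetical, added only to make the sketch runnable. One pitfall worth noting: the out-of-range threads must still reach `__syncthreads()`, so the checks wrap the load and the store instead of returning early.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK_DIM 16

// Tiled transpose of a height x width row-major matrix (idata) into a
// width x height row-major matrix (odata). Works for any width/height:
// threads that fall outside the matrix simply skip their load/store.
__global__ void transpose(float *odata, const float *idata, int width, int height)
{
    // The +1 column pads the shared-memory tile (not the matrix) to avoid
    // bank conflicts on the column-wise read below.
    __shared__ float tile[BLOCK_DIM][BLOCK_DIM + 1];

    int x = blockIdx.x * BLOCK_DIM + threadIdx.x;  // input column
    int y = blockIdx.y * BLOCK_DIM + threadIdx.y;  // input row
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = idata[y * width + x];

    // Every thread in the block must reach this barrier, which is why the
    // bounds checks guard the memory accesses rather than returning early.
    __syncthreads();

    x = blockIdx.y * BLOCK_DIM + threadIdx.x;  // output column (= input row)
    y = blockIdx.x * BLOCK_DIM + threadIdx.y;  // output row (= input column)
    if (x < height && y < width)
        odata[y * height + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical helper: transposes a width x height test matrix on the GPU and
// returns the number of elements that differ from a plain CPU transpose
// (or -1 if a CUDA call failed).
int run_transpose_check(int width, int height)
{
    size_t n = (size_t)width * height;
    size_t bytes = n * sizeof(float);
    float *h_in = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = NULL, *d_out = NULL;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Round the grid up so it covers the next multiple of BLOCK_DIM.
    dim3 threads(BLOCK_DIM, BLOCK_DIM);
    dim3 grid((width + BLOCK_DIM - 1) / BLOCK_DIM,
              (height + BLOCK_DIM - 1) / BLOCK_DIM);
    transpose<<<grid, threads>>>(d_out, d_in, width, height);
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    int errors = -1;
    if (err == cudaSuccess) {
        errors = 0;
        for (int r = 0; r < height; ++r)
            for (int c = 0; c < width; ++c)
                if (h_out[(size_t)c * height + r] != h_in[(size_t)r * width + c])
                    ++errors;
    }

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return errors;
}

int main()
{
    // 1000 x 777 is deliberately not a multiple of BLOCK_DIM in either dimension.
    int errors = run_transpose_check(1000, 777);
    printf("%s (%d mismatches)\n", errors == 0 ? "ok" : "FAILED", errors);
    return errors != 0;
}
```

Compared with zero padding, the only extra cost here is a comparison per thread plus some idle threads in the edge blocks; no padded copy of the matrix is ever built or stripped afterwards.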

I’m not sure whether anybody has already suggested this; it’s just a wild guess.

This works. I’m not 100% sure it’s the fastest, but it’s fine, and it works. Confirmed.

I am glad I could help! :-) :thumbup: