I’ve been working on the SDK’s Transpose for quite some time now and have run into problems with matrices whose dimensions are not a multiple of BLOCK_DIM.
I have read all the past forum posts on this issue, and there were quite a few. One suggested solution was zero padding; however, it adds so much overhead that stripping the padding at the end of the computation takes longer than performing the entire transpose on the CPU.
My code, which includes the zero padding, is here.
The other suggested solutions were either computationally cumbersome or simply did not work.
So what I’m looking for is an efficient solution for transposing very large yet irregularly shaped 2D matrices.
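For reference, one way to avoid padding entirely is to keep the shared-memory tiled kernel, round the grid size up, and guard every global load and store with a bounds check. Threads that fall outside the matrix in the edge tiles then simply do nothing. Below is a minimal sketch of that idea under my own assumptions. It is not the SDK's code, `transposeBounded` and `transposeHost` are hypothetical names, BLOCK_DIM = 16 is assumed, the input is a row-major height x width array, and error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_DIM 16  // assumed tile size

// Sketch: tiled transpose with bounds checks, so width/height need not be
// multiples of BLOCK_DIM. in is height x width (row-major); out is width x height.
__global__ void transposeBounded(float* out, const float* in, int width, int height)
{
    // The +1 column avoids shared-memory bank conflicts on the transposed read
    __shared__ float tile[BLOCK_DIM][BLOCK_DIM + 1];

    int xIndex = blockIdx.x * BLOCK_DIM + threadIdx.x;  // column in `in`
    int yIndex = blockIdx.y * BLOCK_DIM + threadIdx.y;  // row in `in`
    if (xIndex < width && yIndex < height)
        tile[threadIdx.y][threadIdx.x] = in[yIndex * width + xIndex];

    // Every thread in the block must reach this, so it sits outside both guards
    __syncthreads();

    // Swap the block indices for the write so stores to `out` stay coalesced
    xIndex = blockIdx.y * BLOCK_DIM + threadIdx.x;  // column in `out` (< height)
    yIndex = blockIdx.x * BLOCK_DIM + threadIdx.y;  // row in `out` (< width)
    if (xIndex < height && yIndex < width)
        out[yIndex * height + xIndex] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: copies `in` to the GPU, transposes, copies back.
void transposeHost(const std::vector<float>& in, std::vector<float>& out,
                   int width, int height)
{
    assert(in.size() == static_cast<size_t>(width) * height);
    out.resize(in.size());
    size_t bytes = in.size() * sizeof(float);

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    // Round the grid UP so partial tiles at the right/bottom edges get a block
    dim3 threads(BLOCK_DIM, BLOCK_DIM);
    dim3 grid((width + BLOCK_DIM - 1) / BLOCK_DIM,
              (height + BLOCK_DIM - 1) / BLOCK_DIM);
    transposeBounded<<<grid, threads>>>(dOut, dIn, width, height);

    // cudaMemcpy on the default stream waits for the kernel to finish
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}
```

The load guard and the store guard cover exactly the same set of elements (a tile entry is written out only if it was read in), so no padded copy of the matrix is ever allocated or stripped afterwards. The extra cost is one comparison per thread per access, which only does real work in the edge blocks.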
I’d appreciate any help the community can lend here…