Efficient matrix transpose: off-the-shelf solution?

I have looked at the transpose implementations that ship with the SDK, but is there an off-the-shelf solution somewhere that works for matrices of arbitrary dimensions?
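For reference, the usual idea behind a size-agnostic tiled transpose can be sketched as follows. This is a minimal CPU-side sketch in C, not the SDK's code. It assumes row-major `float` storage and an out-of-place transpose, and `TILE`, `min_sz` and `transpose_tiled` are hypothetical names. The key point is that the last tile in each dimension is clipped to the matrix bounds, so `rows` and `cols` do not have to be multiples of the tile size. A GPU kernel would typically guard each thread's load and store with the same kind of bounds check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge length; 32 is only an illustrative choice. */
#define TILE 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Out-of-place transpose of the row-major rows x cols matrix `in`
 * into the row-major cols x rows matrix `out`.
 * The outer loops walk TILE x TILE blocks; min_sz clips the last block
 * in each dimension, so rows and cols may be any size. */
void transpose_tiled(const float *in, float *out, size_t rows, size_t cols)
{
    assert(in != out); /* in-place transpose is not handled by this sketch */
    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        size_t r1 = min_sz(r0 + TILE, rows);      /* clipped row bound */
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            size_t c1 = min_sz(c0 + TILE, cols);  /* clipped column bound */
            for (size_t r = r0; r < r1; ++r)
                for (size_t c = c0; c < c1; ++c)
                    out[c * rows + r] = in[r * cols + c];
        }
    }
}
```

Calling it on, say, a 37 x 45 matrix exercises partial tiles in both dimensions; the caller must supply an output buffer of `rows * cols` elements.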