Multiple matrix transpositions

Hi all,

I have a sequence of two-dimensional arrays, stacked something like a cube. I would like to apply a high-level transposition function, something like cuBLAS's geam, to each one of them. What options do I have? I would like to avoid concurrent launches (there would be 512 of them). Ideally, I'd like to use one high-level function for this operation. Thank you.
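
P.S. To make the operation concrete, here is a minimal CPU reference of the result I'm after. It is only a sketch of the semantics, not a GPU implementation. It assumes the matrices are row-major, all the same size, and stored back to back in one buffer; the name transpose_batch is just mine for illustration and is not a cuBLAS function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference only (hypothetical helper, not part of cuBLAS).
// Transposes `batch` row-major matrices of size rows x cols that are
// stored contiguously, one after another, in `in`.
// Each output matrix is cols x rows, stored at the same offset as its input.
std::vector<float> transpose_batch(const std::vector<float>& in,
                                   std::size_t batch,
                                   std::size_t rows,
                                   std::size_t cols) {
    assert(in.size() == batch * rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t b = 0; b < batch; ++b) {
        const std::size_t base = b * rows * cols;  // start of matrix b
        for (std::size_t r = 0; r < rows; ++r) {
            for (std::size_t c = 0; c < cols; ++c) {
                // element (r, c) of the input becomes element (c, r) of the output
                out[base + c * rows + r] = in[base + r * cols + c];
            }
        }
    }
    return out;
}
```

In my case batch would be 512, and I'd like the GPU equivalent of this as a single call rather than one geam call per matrix.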