I’m looking for a fast matrix transpose that works for any matrix size M×N.
The data format is very simple:
__global__ void transpose(byte* output, byte* input, int w, int h)
It turns an H×W matrix into a W×H matrix; input and output are both arrays of w*h elements.
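As a baseline, here is a minimal sketch of a kernel with that signature. It assumes `byte` is `unsigned char` and that both matrices are stored row-major, so element (r, c) of the H×W input is `input[r * w + c]`. It launches one thread per element. Reads are coalesced, but consecutive threads write addresses `h` bytes apart, which is why a naive version like this usually runs well below peak memory bandwidth. The host helper `transposeOnGpu` is hypothetical and omits CUDA error checking for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

typedef unsigned char byte;  // assumption: byte is an 8-bit unsigned type

// Naive transpose: input is h rows x w columns (row-major),
// output is w rows x h columns (row-major). One thread per element.
__global__ void transpose(byte* output, byte* input, int w, int h)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // input column
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // input row
    if (col < w && row < h) {
        // input (row, col) goes to output (col, row)
        output[col * h + row] = input[row * w + col];
    }
}

// Hypothetical host helper: copy in, run the kernel, copy back.
// Error checking of CUDA calls is omitted to keep the sketch short.
std::vector<byte> transposeOnGpu(const std::vector<byte>& in, int w, int h)
{
    assert(w > 0 && h > 0 && in.size() == static_cast<size_t>(w) * h);
    size_t bytes = in.size();
    byte *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 8);  // 256 threads per block
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    transpose<<<grid, block>>>(dOut, dIn, w, h);

    std::vector<byte> out(bytes);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

For example, the 2×3 input `{1, 2, 3, 4, 5, 6}` (w = 3, h = 2) becomes the 3×2 output `{1, 4, 2, 5, 3, 6}`.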
Which library should I use? Speed is my only concern.
Thank you.
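For reference when comparing library options, the usual speed-oriented technique (described, for example, in NVIDIA's "An Efficient Matrix Transpose in CUDA C/C++" blog post) stages a square tile in shared memory so that both the global reads and the global writes are coalesced. The sketch below adapts that idea to 8-bit elements under the same assumptions as before (`byte` is `unsigned char`, row-major storage). The tile sizes `TILE` and `ROWS`, the kernel name `transposeTiled`, and the helper `transposeTiledOnGpu` are hypothetical choices. The +1 padding column comes from the float version of the technique; with 1-byte elements its effect on shared-memory bank conflicts is different, so it should be measured rather than assumed optimal. Production code for bytes often also moves 4 or more bytes per thread, which this sketch does not do.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

typedef unsigned char byte;  // assumption: byte is an 8-bit unsigned type

constexpr int TILE = 32;  // tile edge (assumed; tune per GPU)
constexpr int ROWS = 8;   // block is TILE x ROWS; each thread copies TILE/ROWS elements

// Tiled transpose: input is h x w (row-major), output is w x h (row-major).
// Launch with block(TILE, ROWS) and grid(ceil(w/TILE), ceil(h/TILE)).
__global__ void transposeTiled(byte* output, const byte* input, int w, int h)
{
    __shared__ byte tile[TILE][TILE + 1];  // +1 padding column

    int x = blockIdx.x * TILE + threadIdx.x;  // input column
    int y = blockIdx.y * TILE + threadIdx.y;  // input row
    // Coalesced read: consecutive threads read consecutive input columns.
    for (int j = 0; j < TILE; j += ROWS) {
        if (x < w && y + j < h)
            tile[threadIdx.y + j][threadIdx.x] = input[(y + j) * w + x];
    }
    __syncthreads();

    // Swap the block offsets so consecutive threads write consecutive
    // output columns; the transposition happens in the shared-memory read.
    x = blockIdx.y * TILE + threadIdx.x;  // output column (= input row)
    y = blockIdx.x * TILE + threadIdx.y;  // output row (= input column)
    for (int j = 0; j < TILE; j += ROWS) {
        if (x < h && y + j < w)
            output[(y + j) * h + x] = tile[threadIdx.x][threadIdx.y + j];
    }
}

// Hypothetical host helper; CUDA error checking omitted for brevity.
std::vector<byte> transposeTiledOnGpu(const std::vector<byte>& in, int w, int h)
{
    assert(w > 0 && h > 0 && in.size() == static_cast<size_t>(w) * h);
    byte *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, in.size());
    cudaMalloc(&dOut, in.size());
    cudaMemcpy(dIn, in.data(), in.size(), cudaMemcpyHostToDevice);

    dim3 block(TILE, ROWS);
    dim3 grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(dOut, dIn, w, h);

    std::vector<byte> out(in.size());
    cudaMemcpy(out.data(), dOut, in.size(), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

The bounds checks in both loops make the kernel correct for sizes that are not multiples of `TILE`: a tile entry is only read in the second loop if the same element passed the bounds check in the first. Whether this beats a library routine for a given GPU and matrix shape is something to benchmark, not assume.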