Can CuBLAS do a simple transpose?

I have a 2x3 matrix stored in row-major order (from C). I want to use CuBLAS to transpose it to 3x2. I tried:

float alpha = 1.0f;
float beta = 0.0f;
// transpose(da) -> dat, C=2, T=3
cublasSgeam(handle,
                CUBLAS_OP_T, CUBLAS_OP_T,
                C, T,
                &alpha, da, T,
                &beta, da, T,
                dat, C);

But the result I get is a weird strided output:

Original matrix:
[0.849739, 0.989397, 0.288401;
0.46367, 0.471273, 0.158544]
Transposed matrix:
[0.849739, 0.288401;
0.471273, 0.989397;
0.46367, 0.158544]

Pretty sure that the column-major input for CuBLAS is causing this but I can’t pinpoint what’s happening. Any help would be appreciated!

On a higher level, I want to use CuBLAS to do matrix transpose but now I’m unsure if that’s even possible/intended.

  • CUBLAS expects column-major data storage.
  • the geam function is the one usually suggested for a plain transpose
  • it's often recommended to avoid transposing or moving data unnecessarily
  • if you want to handle row-major input in CUBLAS, it can be done in some cases, but it requires special manipulation of parameters. Here is a recent related question. Note the link in the comments to the previous question with the excerpted treatment by Mr. Wittek.

If I am not mistaken, this arrangement seems to work for your test case:

# cat
#include <cstring>
#include <iostream>
#include <cublas_v2.h>

int main(){

  float *dat, *da;
  const int R = 2;
  const int C = 3;
  int T = R;
  cudaMallocManaged(&dat, R*C*sizeof(dat[0]));
  cudaMallocManaged(&da,  R*C*sizeof(dat[0]));
  float di[R*C] = {0.849739, 0.989397, 0.288401, 0.46367, 0.471273, 0.158544};
  float alpha = 1.0f;
  float beta = 0.0f;
  cublasHandle_t handle;
  cublasCreate(&handle);
  memcpy(da, di, R*C*sizeof(da[0]));
  // transpose(da) -> dat
  cublasSgeam(handle,
                CUBLAS_OP_T, CUBLAS_OP_T,
                T, C,
                &alpha, da, C,
                &beta, da, C,
                dat, T);
  // wait for the GPU to finish before reading managed memory on the host
  cudaDeviceSynchronize();
  for (int i = 0; i < R*C; i++) std::cout << dat[i] << " ";
  std::cout << std::endl;
  cublasDestroy(handle);
}
# nvcc -o t303 -lcublas
# compute-sanitizer ./t303
0.849739 0.46367 0.989397 0.471273 0.288401 0.158544
========= ERROR SUMMARY: 0 errors

FWIW, my arguments are exactly the same as yours (taking C=2, T=3 as stated in the comment before your call), so whatever problem you are having is not evident from what you have posted. The geam call you have shown does not produce the transposed matrix you reported. Anyway, I think it works. (You would get the output you reported if you set C=3, T=2, which swaps the dimensions and leading dimensions, but that is contrary to your comment.)

