Transposing a row-major rectangular matrix with the cublasDgeam routine

Hey all,

I am learning C++ and CUDA and am having difficulty picking the right parameters for the cublasDgeam routine.

I have a row-major matrix of size M*N where M > N. I would like to transpose it at the beginning of the code so that I can work with the cuSOLVER and cuBLAS libraries, which expect column-major matrices. But even the cuBLAS routine for transposition works with column-major matrices, and I am somehow passing the wrong parameters to it, which causes an invalid value error. I could not figure out the right values and order of the parameters.
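To make the layout I am after concrete, here is a small CPU-only sketch of the conversion I want to do on the GPU. The helper name rowMajorToColMajor is made up for illustration and is not part of cuBLAS; it only shows the index mapping between the two storage orders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-only illustration (hypothetical helper, not part of cuBLAS).
// Input:  a holds an M x N matrix in row-major order,
//         element (i, j) is at a[i * N + j].
// Output: the same M x N matrix in column-major order with leading
//         dimension M, element (i, j) is at out[i + j * M].
std::vector<double> rowMajorToColMajor(const std::vector<double>& a, int M, int N)
{
    assert(M >= 0 && N >= 0);
    assert(a.size() == static_cast<std::size_t>(M) * static_cast<std::size_t>(N));

    std::vector<double> out(a.size());
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            out[i + static_cast<std::size_t>(j) * M] =
                a[static_cast<std::size_t>(i) * N + j];
    return out;
}
```

For example, with M = 3 and N = 2 the row-major buffer {1, 2, 3, 4, 5, 6} becomes {1, 3, 5, 2, 4, 6}. Seen from a column-major library, the original buffer is an N x M matrix with leading dimension N, so this conversion is the same as transposing that N x M matrix into an M x N one with leading dimension M.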

This is the output on the screen:

GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

Cuda environment is starting...
factoring for k=0:
cublasSafeCall() failed at ../src/transpose.cu:40 : CUBLAS_STATUS_INVALID_VALUE

This is the code:

#include "transpose.h"

#include "cublas_v2.h"
#include <cuda_runtime.h>

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#include "errorChkcublas.h"

void trans(double * V,
           int M,
           int N)
{
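    int lda = N;  // leading dimension passed for A
    int ldb = M;  // leading dimension passed for B
    int ldc = N;  // leading dimension passed for C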
    cublasHandle_t handle;
    cublasStatus_t status;

    //double * clone;
    //clone = V;

    //cudaMalloc((void **)&clone , M * N * sizeof(double));

    status = cublasCreate(&handle);

    if (status != CUBLAS_STATUS_SUCCESS)
    {
        printf("cublasCreate returned error code %d, line(%d)\n", status, __LINE__);
        exit(EXIT_FAILURE);
    }
    const double alf = 1.0;
    const double bet = 0.0;
    const double *alpha = &alf;
    const double *beta = &bet;

    // This is the call that fails with CUBLAS_STATUS_INVALID_VALUE.
    CublasSafeCall(cublasDgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N, M, N, alpha, V, lda, beta, V, ldb, V, ldc));
    cudaDeviceSynchronize();

    cublasDestroy(handle);
}

Instead of creating an auxiliary matrix and copying V into it, which is costly, I passed V as the 10th parameter of the routine (B) and also as the output C. Would that be a problem?
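To narrow this down, I wrote a small host-side check of the argument rules as I understand them from the cuBLAS documentation for geam. It is only a sketch of my reading of the docs: geamArgsLookValid is a hypothetical helper, not a cuBLAS function, and the booleans stand in for CUBLAS_OP_T (true) and CUBLAS_OP_N (false).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical host-side helper (not a cuBLAS function). It mirrors the
// leading-dimension and aliasing rules for cublas<t>geam as I read them in
// the documentation; it is a sketch, not the library's own validation.
// transA / transB: true for CUBLAS_OP_T, false for CUBLAS_OP_N.
// cAliasesA / cAliasesB: true when the C pointer equals A or B.
bool geamArgsLookValid(bool transA, bool transB, int m, int n,
                       int lda, int ldb, int ldc,
                       bool cAliasesA, bool cAliasesB)
{
    if (m < 0 || n < 0) return false;

    // A is stored as m x n when not transposed and as n x m when transposed,
    // so lda must cover the row count of the stored matrix. Same for B.
    if (lda < std::max(1, transA ? n : m)) return false;
    if (ldb < std::max(1, transB ? n : m)) return false;

    // C is always m x n, so ldc must cover m rows.
    if (ldc < std::max(1, m)) return false;

    // In-place use is only described for a non-transposed operand whose
    // leading dimension equals ldc.
    if (cAliasesA && (transA || ldc != lda)) return false;
    if (cAliasesB && (transB || ldc != ldb)) return false;

    return true;
}
```

With M = 5 and N = 3, my call corresponds to geamArgsLookValid(true, false, 5, 3, 3, 5, 3, true, true), which this check rejects because ldc = N is smaller than m = M, and it would still be rejected with ldc = M because C aliases the transposed operand A. Under the same reading, an out-of-place call with a separate device buffer of M*N doubles (call it Vt, a name I made up) would look like cublasDgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N, M, N, alpha, V, N, beta, Vt, M, Vt, M), but I would appreciate confirmation.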

Thanks in advance for any help!