I have to work with matrices stored in row-major order, and I want to use CUBLAS and CULA (and possibly cuSOLVER). The original row-major matrices are stored in host memory. Since CUBLAS and the others work only with column-major storage, I need to copy my data to the GPU in column-major order. But the function
cublasStatus_t cublasSetMatrix(int rows, int cols, int elemSize, const void *A, int lda, void *B, int ldb)
works only with matrices in column-major order.
Is there any way to copy a row-major matrix on the host into a column-major matrix on the device? It could probably be copied row by row (or column by column) using
cublasStatus_t cublasSetVector(int n, int elemSize, const void *x, int incx, void *y, int incy)
But I’m not sure how to set up the copy. Has anyone had the same problem?
My question also covers copying the results back from device to host, in this case from column-major order to row-major order.
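To make the row-by-row idea concrete, here is a minimal host-only sketch of the stride arithmetic I have in mind. `strided_copy` is a hypothetical stand-in that I wrote to mirror the argument order of cublasSetVector and cublasGetVector (with strides counted in elements); it is not part of CUBLAS. The float element type and the helper names are also my own assumptions, and I have only checked this on the host, not on a GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only stand-in (NOT a CUBLAS function). It copies n
 * elements of elemSize bytes from x to y, stepping incx elements in the
 * source and incy elements in the destination, i.e. the same argument
 * order as cublasSetVector / cublasGetVector. */
static void strided_copy(int n, int elemSize, const void *x, int incx,
                         void *y, int incy)
{
    const char *src = (const char *)x;
    char *dst = (char *)y;
    assert(n >= 0 && elemSize > 0);
    for (int k = 0; k < n; ++k) {
        memcpy(dst + (size_t)k * incy * elemSize,
               src + (size_t)k * incx * elemSize,
               (size_t)elemSize);
    }
}

/* Row-major A (rows x cols) -> column-major B with leading dimension rows.
 * Row i of A is contiguous (stride 1); in B its elements are rows apart,
 * starting at offset i. */
static void row_major_to_col_major(int rows, int cols,
                                   const float *A, float *B)
{
    for (int i = 0; i < rows; ++i) {
        strided_copy(cols, (int)sizeof(float),
                     A + (size_t)i * cols, 1, B + i, rows);
    }
}

/* Reverse direction: column-major B (leading dimension rows) -> row-major A.
 * The two strides simply swap roles. */
static void col_major_to_row_major(int rows, int cols,
                                   const float *B, float *A)
{
    for (int i = 0; i < rows; ++i) {
        strided_copy(cols, (int)sizeof(float),
                     B + i, rows, A + (size_t)i * cols, 1);
    }
}
```

If this is right, then on the GPU I would expect the loop body to become cublasSetVector(cols, sizeof(float), A + i * cols, 1, dB + i, rows) for the upload and cublasGetVector(cols, sizeof(float), dB + i, rows, A + i * cols, 1) for the download, where dB is a device pointer to a rows x cols column-major buffer. That means one call per row, which may be slow for large matrices.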
Maybe the generic CUDA cudaMemcpy* functions could be used, but I’m a bit confused about their parameters. Does anyone have an example?