Extracting a submatrix using cublas

Hi everyone,

I hoped to find a topic about this but didn’t. Here is my issue/question:

Basically, I have an n x m matrix and I would like to extract a sub-matrix from it.

Extracting the first row or the first column seems pretty easy using a simple scopy, but when it comes to extracting the bottom-right elements I can’t figure out how to do that using cublas.

I noticed that I can perform operations on a submatrix using the leading dimension, but here I just want to “extract” a part of a matrix.

I wrote a kernel myself that does the job correctly and is pretty simple, but it is really not good in terms of performance… :/ I’m posting it here for reference. It extracts the bottom-right n-1 rows and m-1 columns from the origin matrix.

If the origin is
3 4 12 5
2 5 11 3
1 3 8 2

the destination should be
5 11 3
3 8 2

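template <typename T>
__global__ void extractBottomRightMatrix(T *dest, const T *orig, int nbRows, int size)
{
     // One thread per destination element; size = (nbRows - 1) * (nbCols - 1).
     int i = blockDim.x * blockIdx.x + threadIdx.x;

     // Column-major storage: skip the first column (nbRows) and the first row (+1),
     // then one more element for each destination column already passed.
     if (i < size)
          dest[i] = orig[i + nbRows + 1 + i/(nbRows - 1)];
}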

First of all, is it possible using cublas? Or just cuda? If so, can anyone give me some pointers?

Thanks! :)

In cuda this can be done using cudaMemcpy2D

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g3a58270f6775efe56c65ac47843e7cee

In cublas this can be done using cublasGetMatrix/cublasSetMatrix

http://docs.nvidia.com/cuda/cublas/index.html#cublassetmatrix

“This function copies a tile of rows x cols elements from a matrix A …”

Thank you for this answer.

Unfortunately, it implies copies from/to the device from/to the host, which I would like to avoid.

I was hoping to do that without memory transfers. ;)

cudaMemcpy2D can do it as a device->device copy (cudaMemcpyDeviceToDevice), but a well-written copy kernel should be just as fast.
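
For reference, here is a minimal sketch of the device->device cudaMemcpy2D approach applied to the 3x4 example from the question. It assumes the matrix is stored in column-major order with float elements, so each matrix column is one "row" in cudaMemcpy2D terms and the pitch is the leading dimension in bytes. The names extractTile and runExample are made up for illustration and are not part of CUDA or cublas; error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (not a CUDA API): copies the rows x cols tile that starts
// at (row0, col0) of a column-major device matrix d_src with leading dimension
// ldSrc into the dense column-major device buffer d_dst.
cudaError_t extractTile(float *d_dst, const float *d_src, int ldSrc,
                        int row0, int col0, int rows, int cols)
{
    // First element of the tile inside the source matrix.
    const float *tile = d_src + (size_t)col0 * ldSrc + row0;

    // Width is given in bytes and covers one column of the tile;
    // height is the number of columns to copy.
    return cudaMemcpy2D(d_dst, rows * sizeof(float),   // destination, pitch
                        tile,  ldSrc * sizeof(float),  // source, pitch
                        rows * sizeof(float), cols,    // width (bytes), height
                        cudaMemcpyDeviceToDevice);
}

// Runs the example from the question and returns the extracted 2x3 tile
// in column-major order.
std::vector<float> runExample()
{
    // 3x4 origin matrix, stored column by column.
    const float h_src[12] = {3, 2, 1,   4, 5, 3,   12, 11, 8,   5, 3, 2};
    std::vector<float> h_dst(6, 0.0f);

    float *d_src = nullptr, *d_dst = nullptr;
    cudaMalloc(&d_src, sizeof(h_src));
    cudaMalloc(&d_dst, h_dst.size() * sizeof(float));
    cudaMemcpy(d_src, h_src, sizeof(h_src), cudaMemcpyHostToDevice);

    // Drop the first row and the first column: 2 rows and 3 columns remain.
    extractTile(d_dst, d_src, 3, 1, 1, 2, 3);

    // Copy back to the host only to inspect the result.
    cudaMemcpy(h_dst.data(), d_dst, h_dst.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
    return h_dst;
}
```

The extraction itself never touches the host; the host copies in runExample are only there to set up and check the data. The returned buffer holds the destination matrix column by column, i.e. 5 3 | 11 8 | 3 2.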