nvblas

I have a large matrix and need to compute the dot product of it with its transpose. The matrix has shape (900000, 3); I transpose it and take the dot product of the two.

I have tried NVBLAS with Python and it works, but I fear I am not getting the appropriate coverage via the block parameter. What should NVBLAS_TILE_DIM be set to for such a large matrix?

Any reason why you cannot use cuBLAS?

http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-dot

Which function are you using to compute the dot product?

I don’t want to calculate the dot product; instead, I would like to calculate the matrix product. So I would like to use:

http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemmbatched

My matrix has shape (900000, 3). What should m, n, k, lda, and ldb be set to? And will this function work for a 2D matrix? It seems to work fine with a 3D matrix.

batched gemm is not one of the functions accelerated by nvblas:

http://docs.nvidia.com/cuda/nvblas/index.html#routines

So your question about NVBLAS_TILE_DIM probably isn’t relevant.

What exactly do you mean by matrix product?

If you simply want to multiply a (900000,3) matrix by a (3,900000) matrix, you should just use gemm, not gemm batched:

http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemm
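To make the question about "matrix product" concrete, here is a rough NumPy sketch of the two possible readings. It assumes float32 data and uses an array of ones as a stand-in for the real matrix; none of this is taken from the original poster's code. The order of the operands changes the result size dramatically, which is worth checking before picking gemm parameters.

```python
import numpy as np

rows, cols = 900_000, 3
a = np.ones((rows, cols), dtype=np.float32)  # stand-in data, about 10.8 MB

# Reading 1: (3, 900000) times (900000, 3) gives a small 3 x 3 result.
small = a.T @ a

# Reading 2: (900000, 3) times (3, 900000) gives a 900000 x 900000 result.
# Only the size is computed here; the product itself is not formed.
full_shape = (rows, rows)
full_bytes = rows * rows * a.itemsize  # bytes needed to store the result

print(small.shape)
print(full_shape, full_bytes / 1e12, "TB")
```

If the second reading is the intended one, the output alone needs on the order of terabytes in single precision, so it would have to be computed in pieces rather than with a single gemm call.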

Correct, my question is not related to NVBLAS anymore. It’s more of a cuBLAS question.

I am having trouble setting the proper m, n, k, lda, and ldb values.

For which cuBLAS function, gemm or gemm batched? Gemm batched has a fairly involved setup compared to just using gemm.
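For the plain gemm case, the parameters can be worked out from the stored shapes. The sketch below is only an illustration: `gemm_dims` is a hypothetical helper, not part of cuBLAS, and it assumes column-major storage (the cuBLAS convention) with no padding, so each leading dimension is simply the number of rows of the matrix as stored. In gemm, op(A) is m x k, op(B) is k x n, and C is m x n.

```python
def gemm_dims(a_shape, b_shape, transa=False, transb=False):
    """Hypothetical helper: sizes for C = op(A) * op(B) in column-major gemm.

    a_shape and b_shape are the (rows, cols) of A and B as stored,
    before any transpose is applied.
    """
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    # op(A) is m x k and op(B) is k x n.
    m, k = (a_cols, a_rows) if transa else (a_rows, a_cols)
    k_b, n = (b_cols, b_rows) if transb else (b_rows, b_cols)
    if k != k_b:
        raise ValueError("inner dimensions do not match")
    # Minimal leading dimensions: the row count of each stored matrix.
    return {"m": m, "n": n, "k": k,
            "lda": max(1, a_rows), "ldb": max(1, b_rows), "ldc": max(1, m)}


stored = (900000, 3)  # the matrix from this thread, stored column-major

# A * A^T: reuse the same buffer for B and transpose it in the call.
outer = gemm_dims(stored, stored, transb=True)

# A^T * A: transpose the first operand instead.
gram = gemm_dims(stored, stored, transa=True)

print(outer)
print(gram)
```

One caveat when coming from Python: NumPy arrays are row-major by default, so a C-ordered (900000, 3) array looks like a 3 x 900000 column-major matrix with leading dimension 3 to a column-major BLAS. The transpose flags and leading dimensions need to be adjusted accordingly.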