cublasXt auto chunking

One of the listed features of cublasXt is that matrix sizes are limited only by host memory, not GPU memory. I’m curious whether this also applies to the free version running on a single GPU. Say I have a single 6 GB Tesla and I try to multiply two 10 GB matrices (and the host has 64 GB of RAM): can I just throw them in, or do I need to pre-chunk because it’s the free version and/or a single GPU?

(I’d try it myself, but I don’t have access to that configuration with CUDA 6 at the moment; I’m trying to plan ahead and convince the cluster admins to upgrade from 5 sooner rather than later.)

Yes, you can just throw them in.
To get better perf, you should pin the matrices in host memory (using malloc + cudaHostRegister, or cudaHostAlloc).
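
A minimal sketch of that, assuming a single device 0, square n-by-n single-precision matrices, and with error checking omitted for brevity (the filename and sizes are made up; cublasXt tiles the operands itself, so n is bounded by host RAM, not GPU memory):

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublasXt.h>

int main(void)
{
    // Modest size for the demo; grow n until the HOST runs out of memory.
    size_t n = 4096;
    size_t bytes = n * n * sizeof(float);

    float *A = (float*)malloc(bytes);
    float *B = (float*)malloc(bytes);
    float *C = (float*)malloc(bytes);
    for (size_t i = 0; i < n * n; i++) { A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f; }

    // Pin the malloc'd buffers so PCIe transfers can run asynchronously.
    cudaHostRegister(A, bytes, cudaHostRegisterDefault);
    cudaHostRegister(B, bytes, cudaHostRegisterDefault);
    cudaHostRegister(C, bytes, cudaHostRegisterDefault);

    cublasXtHandle_t handle;
    cublasXtCreate(&handle);

    int devices[1] = { 0 };                    // the one GPU in the box
    cublasXtDeviceSelect(handle, 1, devices);

    // Blocking call; cublasXt streams tiles of A and B through the GPU
    // and writes C back to host memory before returning.
    float alpha = 1.0f, beta = 0.0f;
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                  n, n, n, &alpha, A, n, B, n, &beta, C, n);

    printf("C[0] = %f (expected %f)\n", C[0], 2.0f * n);

    cublasXtDestroy(handle);
    cudaHostUnregister(A); cudaHostUnregister(B); cudaHostUnregister(C);
    free(A); free(B); free(C);
    return 0;
}
```

Compile with something like nvcc -o xt_gemm xt_gemm.cu -lcublas (cublasXt ships inside the cuBLAS library).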

If you have a PCIe Gen3 system and a Kepler K20 or K40, a block dimension of 2K is enough to overlap computation with PCIe transfers. If you have PCIe Gen2, make the tile bigger.
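
The tile size is set through cublasXtSetBlockDim, after cublasXtCreate and before the gemm call. Continuing the sketch above (the 2K value is the Gen3 + K20/K40 suggestion; the larger Gen2 value is a starting point you’d want to tune):

```c
// Tile edge length in elements. Larger tiles do more work per PCIe
// transfer, which helps hide a slower Gen2 link, at the cost of more
// GPU memory held per in-flight tile.
cublasXtSetBlockDim(handle, 2048);      // "2K" tiles for PCIe Gen3 + K20/K40
// cublasXtSetBlockDim(handle, 4096);   // try larger tiles on PCIe Gen2
```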