large matrix multiply

mcoffey · September 18, 2011, 7:57pm

I'm trying to multiply large matrices on a CUDA device and want to know if my supposition is correct.
I assume that if a device has limited RAM, the calling routines need to break the matrices into blocks that are processed on the device. I'm currently trying to multiply two matrices of around 1.75 GB each and am writing a wrapper to do it in blocks. However, are there routines already out there that do this? Has anyone already done this? The idea is to produce a wrapper that automatically breaks a matrix into the number of blocks appropriate to the number of devices available and their capacity.
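A minimal sketch of the blocking idea described above, assuming row-major double-precision matrices held in host memory and a per-device memory budget supplied by the caller. `choose_tile`, `tile_gemm` and `blocked_matmul` are hypothetical names, and the per-tile product is a plain host loop standing in for the device GEMM (for example a cuBLAS `cublasDgemm` call on each tile), so the sketch runs without a GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest square tile edge t such that one t x t tile of
// each of A, B and C (all double precision) fits in budget_bytes of device memory.
std::size_t choose_tile(std::size_t budget_bytes) {
    assert(budget_bytes >= 3 * sizeof(double));  // room for at least a 1 x 1 tile of each
    std::size_t t = 1;
    while (3 * (t + 1) * (t + 1) * sizeof(double) <= budget_bytes) ++t;
    return t;
}

// Stand-in for the per-tile device GEMM. In a real wrapper this would copy the
// A and B tiles to the GPU, run a BLAS GEMM there, and copy the C tile back.
// Computes C[i0:i1, j0:j1] += A[i0:i1, k0:k1] * B[k0:k1, j0:j1], all row-major,
// where A has k columns and B and C have m columns.
static void tile_gemm(const std::vector<double>& A, const std::vector<double>& B,
                      std::vector<double>& C, std::size_t k, std::size_t m,
                      std::size_t i0, std::size_t i1, std::size_t j0, std::size_t j1,
                      std::size_t k0, std::size_t k1) {
    for (std::size_t i = i0; i < i1; ++i)
        for (std::size_t p = k0; p < k1; ++p) {
            const double a = A[i * k + p];
            for (std::size_t j = j0; j < j1; ++j)
                C[i * m + j] += a * B[p * m + j];
        }
}

// Blocked C = A * B with A (n x k) and B (k x m), row-major. Only tile-sized
// pieces of A, B and C are worked on at once; edge tiles are clipped to the
// matrix bounds.
std::vector<double> blocked_matmul(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   std::size_t n, std::size_t k, std::size_t m,
                                   std::size_t tile) {
    assert(tile > 0 && A.size() == n * k && B.size() == k * m);
    std::vector<double> C(n * m, 0.0);
    for (std::size_t i0 = 0; i0 < n; i0 += tile)
        for (std::size_t j0 = 0; j0 < m; j0 += tile)
            // Accumulate over the shared dimension one tile at a time.
            for (std::size_t k0 = 0; k0 < k; k0 += tile)
                tile_gemm(A, B, C, k, m,
                          i0, std::min(i0 + tile, n),
                          j0, std::min(j0 + tile, m),
                          k0, std::min(k0 + tile, k));
    return C;
}
```

The "three tiles must fit" rule is a simplification: a practical wrapper would also reserve space for library workspace and would usually overlap transfers with computation. Because each C tile (i0, j0) is computed independently, a multi-GPU version could hand different C tiles to different devices.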

I'm trying to do a matrix multiply and then invert a 30,000 × 50,000 matrix in full precision.
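For a sense of scale, assuming "full precision" means IEEE double (8 bytes per element), a 30,000 × 50,000 matrix alone needs 30,000 × 50,000 × 8 = 12,000,000,000 bytes (about 12 GB). The hypothetical helpers below compute that footprint and how many tiles of a given edge are needed along each dimension.

```cpp
#include <cstdint>

// Bytes needed for a rows x cols matrix of doubles (assumes 8-byte double).
constexpr std::uint64_t matrix_bytes(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols * sizeof(double);
}

// Number of tiles of edge `tile` needed to cover `extent`: ceil(extent / tile).
constexpr std::uint64_t tiles_along(std::uint64_t extent, std::uint64_t tile) {
    return (extent + tile - 1) / tile;
}
```

For example, with a hypothetical 4096 × 4096 tile (128 MiB per tile in double precision), the 30,000 × 50,000 matrix splits into 8 × 13 tiles.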

Thanks for any guidance
