Can anyone tell me how to multiply a 200000 x 200000 matrix by another 200000 x 200000 matrix using shared memory and tiling? The examples given in the programming guide and CUDA by Example do not support matrices larger than 1024 x 1024. Or is it not possible to use shared memory with tiling for such large matrices? Is it necessary to launch the grid in the shape of the resultant matrix? Thanks in advance :)
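For what it's worth, the standard tiled kernel is not limited to 1024 x 1024; the guide's examples just omit the bounds checks. A minimal sketch (assuming row-major float matrices, with a hypothetical `TILE` of 16) that handles arbitrary `M x K` times `K x N` sizes, with the grid sized to cover the result matrix:

```cuda
// Hypothetical tiled SGEMM sketch: C = A * B, row-major.
// Sizes need not be multiples of TILE; out-of-range loads read as 0.
#define TILE 16

__global__ void matmul_tiled(const float *A, const float *B, float *C,
                             int M, int N, int K)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Guarded loads into shared memory
        As[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}
```

The launch configuration answers the grid question: yes, the grid tiles the result, e.g. `dim3 block(TILE, TILE); dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);`. The real obstacle at 200000 x 200000 is memory, as the answer below the question notes.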

The matrix product you are asking about requires about 480 GB of memory in single precision (three 200000 x 200000 matrices at 4 bytes per element), or 960 GB in double precision. I would be much more worried about how to do this on a device with a maximum of 6 GB of RAM than about any of the intricacies of the CUDA implementation.

Matrix multiplication is a blocked algorithm, is it not? So you can use streaming, although you have to stream chunks from the hard drive to main memory and then to the card.
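To illustrate the idea: the host loops over panels of the result and accumulates `C[i][j] += A[i][k] * B[k][j]` one chunk at a time. This is only a sketch; `load_chunk`/`store_chunk` are hypothetical helpers that move a `chunk x chunk` panel between disk and host memory, error checking is omitted, and `n` is assumed to be a multiple of `chunk`. The on-device product uses cuBLAS:

```cuda
// Out-of-core blocked GEMM sketch (hypothetical helpers, no error checks).
#include <cublas_v2.h>

void ooc_gemm(int n, int chunk, float *hA, float *hB, float *hC,
              float *dA, float *dB, float *dC, cublasHandle_t handle)
{
    const float one = 1.0f;
    size_t bytes = (size_t)chunk * chunk * sizeof(float);

    for (int i = 0; i < n / chunk; ++i)
        for (int j = 0; j < n / chunk; ++j) {
            cudaMemset(dC, 0, bytes);           // zero the output panel
            for (int k = 0; k < n / chunk; ++k) {
                load_chunk(hA, i, k);           // disk -> host (hypothetical)
                load_chunk(hB, k, j);
                cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
                cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
                // dC += dA * dB (cuBLAS assumes column-major storage)
                cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                            chunk, chunk, chunk,
                            &one, dA, chunk, dB, chunk, &one, dC, chunk);
            }
            cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
            store_chunk(hC, i, j);              // host -> disk (hypothetical)
        }
}
```

In practice you would use pinned host buffers and overlap the transfers with compute via streams, but the loop structure above is the core of the approach.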

Of course, but the mechanics of that sort of out-of-core GEMM implementation completely dwarf the minutiae of what goes on in the GPU. At that size, it would be folly to use anything other than CUBLAS or MagmaBLAS for the GPU GEMM kernel.