Is there a way to get an idea of how big the shared memory of a certain GPU is by expressing it in terms of "real problems"? For example, let's say "you can solve a linear system of N equations in N variables", where N is unknown and limited by the shared memory. I want to know N for, e.g., a GTX 460, so that I know that for a greater N the solution will be computed (at least partially) in global memory, which is slower…
There isn't much mystery about the shared memory size: it is either 16 KB or 48 KB per multiprocessor, depending on which hardware you are using and how it has been configured.
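To make the question about N concrete anyway: if the whole N×N coefficient matrix had to fit into one multiprocessor's 48 KB shared memory as single-precision floats, a back-of-the-envelope calculation (a sketch only; real solvers don't work this way, as explained below) gives:

```python
import math

# Shared memory per multiprocessor on Fermi (e.g. GTX 460), configured to 48 KB
shared_mem_bytes = 48 * 1024
bytes_per_float = 4  # single precision

# An N x N coefficient matrix needs N*N floats; solve N*N*4 <= 48 KB for N
n_max = int(math.sqrt(shared_mem_bytes / bytes_per_float))
print(n_max)  # -> 110
```

So a naive "everything in shared memory" solver would top out around N ≈ 110, which is exactly why nobody writes solvers that way.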
Beyond that I don't really understand the rest of the question. Shared memory is per-multiprocessor scratch memory which can be used for sharing and reusing data between threads within a block. It is almost universal that an entire input data set won't fit into shared memory, and for this reason it is also almost universal that algorithms are implemented "tile wise", "sub-domain wise", or "block wise". The shared memory size dictates the tile/sub-domain/block size, not the maximum admissible problem size.
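The tiling idea can be illustrated on the CPU with NumPy (a sketch only; `TILE` and `tiled_matmul` are illustrative names, not anything from the SDK). Each output tile is accumulated from pairs of input tiles, mimicking how a CUDA kernel stages tiles through shared memory. Note that only `TILE` is bounded by the shared memory size; the matrix dimension `n` can be as large as global memory allows:

```python
import numpy as np

TILE = 16  # tile edge, sized to fit in shared memory, independent of problem size

def tiled_matmul(A, B):
    """Block-wise matrix multiply of square matrices whose dimension is a
    multiple of TILE. On a GPU, each (TILE x TILE) slice of A and B would be
    loaded into shared memory before the inner accumulation."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, TILE):
        for j in range(0, n, TILE):
            for k in range(0, n, TILE):
                # One tile-pair contribution to output tile (i, j)
                C[i:i+TILE, j:j+TILE] += A[i:i+TILE, k:k+TILE] @ B[k:k+TILE, j:j+TILE]
    return C

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)
assert np.allclose(tiled_matmul(A, B), A @ B)
```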
If you are having difficulty understanding how shared memory can be used in this way, I highly recommend several of the examples in the SDK: transpose, matrixMul, reduction, and FDTD3d. The first three include very useful papers which describe the algorithms and the thinking behind the GPU code design. Between them you get to see four very different uses for shared memory, all of which allow data reuse and intra-block communication in this sort of "tile wise" algorithm.
What I haven't found yet is a matrix inversion algorithm… Is there an example of matrix inversion somewhere?
I wonder whether matrix inversion can be done in shared memory by splitting it into tiles, because there is much more data dependency than in something like vector reduction or matrix addition…?
You can find good CUDA versions of the three most common matrix factorization routines here (there are many others floating around too). The basic structure follows "look ahead" versions of the blocked factorization algorithms found in LAPACK. They can be built out of level 3 BLAS functions like gemm and syrk. So the idea isn't to do the whole factorization in a single kernel, but in multiple, overlapping operations that factorize the matrix a block at a time. CUBLAS includes triangular solvers for solving systems with the result of one of these factorizations.
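To show the structure being described, here is a NumPy sketch of a right-looking blocked LU factorization (a sketch only: no pivoting and no look-ahead, both of which real codes have; `NB` and `block_lu` are illustrative names). The trailing-submatrix update is a plain matrix multiply, which is where cuBLAS gemm would do the heavy lifting on the GPU:

```python
import numpy as np

NB = 4  # block size, analogous to the panel width in a LAPACK-style code

def block_lu(A):
    """Right-looking blocked LU without pivoting. Returns L and U packed
    into one matrix (unit diagonal of L implied)."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, NB):
        e = min(k + NB, n)
        # 1. Factor the diagonal block in place (unblocked elimination)
        for j in range(k, e):
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        # 2. Triangular solves for the off-diagonal panels (trsm-like)
        L_kk = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        U_kk = np.triu(A[k:e, k:e])
        A[k:e, e:] = np.linalg.solve(L_kk, A[k:e, e:])          # U panel
        A[e:, k:e] = np.linalg.solve(U_kk.T, A[e:, k:e].T).T    # L panel
        # 3. Trailing submatrix update (this is a gemm)
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A

# Check: L @ U reconstructs M (diagonally dominant, so pivoting is not needed)
M = np.random.rand(16, 16) + 16 * np.eye(16)
LU = block_lu(M)
L = np.tril(LU, -1) + np.eye(16)
U = np.triu(LU)
assert np.allclose(L @ U, M)
```

The point is that each pass only ever "owns" one NB-wide panel, so the working set per step is small even though the whole matrix lives in (global) memory.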
Okay, thanks so far; is my assumption right that all these operations are referring to shared memory?
No. Shared memory only has a lifetime of a single block in a single kernel launch. Internally, some of the BLAS calls are undoubtedly using shared memory, but the matrix being factorized is in global memory. There is no other way to do it in CUDA.
Okay, thanks again for your answer; that's because the complete data set is dependent on the rest of the data, right?