Solving larger problems from Fortran: RAM is larger than GPU memory

olewsaa · March 12, 2008, 9:02am

RAM in a compute node is larger than GPU memory. CUBLAS called from Fortran works wonderfully, but the problem sizes that can be solved are limited by the amount of memory on the GPU card. I would love to have a library that splits a large matrix (RAM-sized, maybe 16 GB or more) into smaller pieces that fit in GPU memory. The technique is well known, but the code should live in the library, perhaps in the Fortran.c code in the Fortran wrapper package or somewhere else. The point is that, seen from Fortran, one should only have to call s or c with a matrix or vector of any size. I am also still waiting for double-precision data types.
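A minimal sketch of the splitting described above, written in C since that is the language of the Fortran wrapper glue. It assumes column-major (Fortran-order) storage, single precision, no transposes, and square tiles; `choose_tile`, `sgemm_big` and `tile_sgemm` are hypothetical names, not CUBLAS functions. `tile_sgemm` is a plain CPU stand-in for the per-tile GPU step; a real version would, for example, copy the tiles to device memory with `cublasSetMatrix`, call `cublasSgemm`, and copy the C tile back with `cublasGetMatrix`.

```c
#include <assert.h>
#include <stddef.h>

/* Integer square root (floor) by Newton's method; avoids linking libm. */
static size_t isqrt(size_t x)
{
    if (x < 2) return x;
    size_t r = x, y = x / 2 + (x & 1);
    while (y < r) { r = y; y = (r + x / r) / 2; }
    return r;
}

/* Largest tile edge b such that three b-by-b float tiles (pieces of A, B
   and C) fit in the device budget: 3 * b * b * sizeof(float) <= budget. */
size_t choose_tile(size_t budget_bytes)
{
    return isqrt(budget_bytes / (3 * sizeof(float)));
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* CPU stand-in for the device step: C = alpha*A*B + beta*C on one tile.
   beta == 0 overwrites C, as in BLAS. */
static void tile_sgemm(size_t m, size_t n, size_t k, float alpha,
                       const float *A, size_t lda, const float *B, size_t ldb,
                       float beta, float *C, size_t ldc)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            float c = (beta == 0.0f) ? 0.0f : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * acc + c;
        }
}

/* C = alpha*A*B + beta*C for matrices of any size held in host RAM,
   processed in tile-by-tile pieces. Assumes k > 0 in this sketch. */
void sgemm_big(size_t m, size_t n, size_t k, float alpha,
               const float *A, size_t lda, const float *B, size_t ldb,
               float beta, float *C, size_t ldc, size_t tile)
{
    assert(tile > 0 && k > 0);
    for (size_t j0 = 0; j0 < n; j0 += tile) {
        size_t nb = min_sz(tile, n - j0);
        for (size_t i0 = 0; i0 < m; i0 += tile) {
            size_t mb = min_sz(tile, m - i0);
            float *Cij = C + i0 + j0 * ldc;
            /* Accumulate over k; beta is applied only on the first panel so
               C is scaled once. On a GPU the C tile could stay resident on
               the device for this whole loop. */
            for (size_t p0 = 0; p0 < k; p0 += tile) {
                size_t kb = min_sz(tile, k - p0);
                tile_sgemm(mb, nb, kb, alpha,
                           A + i0 + p0 * lda, lda,
                           B + p0 + j0 * ldb, ldb,
                           p0 == 0 ? beta : 1.0f, Cij, ldc);
            }
        }
    }
}
```

As an illustration with an assumed device budget of 512 MiB, `choose_tile` returns 6688. A 65536 × 65536 single-precision matrix is 16 GiB, so its C would be covered by 10 × 10 tiles, each accumulated over 10 panels of k.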

Any comments from the developers?

The Fortran wrappers and the library that enable calling CUBLAS from Fortran really make the GPU a useful tool. The speedup is just great!

Ole

DenisR · March 12, 2008, 9:16am

You can write your own s/c_big routines if the technique is well known, or extend the library. The CUBLAS sources are available for download, so nothing is keeping you from implementing it. :D
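One way such an s/c_big routine might start, sketched in C: a Fortran-callable matrix-vector product y = alpha*A*x + beta*y that walks over A in row panels small enough for the card. Assumptions: the Fortran compiler appends one trailing underscore to external names and passes every argument by reference (the g77/gfortran default, but compiler-dependent); only the no-transpose case is handled; `sgemv_big_` and `panel_sgemv` are hypothetical names, and `panel_sgemv` is a CPU stand-in for copying the panel to the GPU and calling `cublasSgemv`.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for the device step: y = alpha*A*x + beta*y for an r-by-n
   panel of a column-major matrix with leading dimension lda. */
static void panel_sgemv(size_t r, size_t n, float alpha, const float *A,
                        size_t lda, const float *x, float beta, float *y)
{
    for (size_t i = 0; i < r; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; ++j)
            acc += A[i + j * lda] * x[j];
        y[i] = alpha * acc + (beta == 0.0f ? 0.0f : beta * y[i]);
    }
}

/* Fortran:  CALL SGEMV_BIG(M, N, ALPHA, A, LDA, X, BETA, Y, MAXBYTES)
   All arguments arrive by reference. MAXBYTES (INTEGER) is the device
   memory the routine may use for one row panel of A, x and a slice of y. */
void sgemv_big_(const int *m, const int *n, const float *alpha,
                const float *A, const int *lda, const float *x,
                const float *beta, float *y, const int *maxbytes)
{
    size_t M = (size_t)*m, N = (size_t)*n, LDA = (size_t)*lda;
    size_t budget = (size_t)*maxbytes / sizeof(float);   /* in floats */
    /* A panel of r rows needs r*N + N + r floats: panel, x, slice of y. */
    assert(budget > N);
    size_t rows = (budget - N) / (N + 1);
    assert(rows > 0);
    for (size_t i0 = 0; i0 < M; i0 += rows) {
        size_t r = (M - i0 < rows) ? M - i0 : rows;
        /* Rows i0..i0+r-1 of A start at A + i0 and keep stride LDA. */
        panel_sgemv(r, N, *alpha, A + i0, LDA, x, *beta, y + i0);
    }
}
```

Row panels are independent of each other, so each piece of y is finished after a single pass, and the same loop could hand panels to different devices.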

AdrianCG · March 20, 2008, 2:47pm

You could split the matrix into appropriately sized blocks, but depending on the algorithm you use, this can get tricky… But at that point there is also room for more performance, since the jobs for each block could run on different GPUs (or CPUs).
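For matrix multiplication the "different GPUs/CPUs" idea is straightforward, because every tile of C can be computed independently (a factorization is trickier, since later blocks depend on earlier ones). A minimal sketch, assuming POSIX threads as stand-ins for one host thread per GPU, a round-robin assignment of C tiles to workers, and a naive CPU loop in place of the per-device CUBLAS call; `job`, `worker` and `parallel_sgemm` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One C = A * B job (column-major, no transposes), split into tiles. */
typedef struct {
    size_t m, n, k, tile;
    const float *A, *B;
    float *C;
    size_t lda, ldb, ldc;
    size_t nworkers;
} job;

typedef struct {
    const job *jb;
    size_t id;   /* worker index; in a GPU version, the device number */
} worker_arg;

/* Worker `id` computes every tile t with t % nworkers == id. Tiles of C
   are disjoint, so the workers never write the same element. */
static void *worker(void *p)
{
    const worker_arg *w = p;
    const job *jb = w->jb;
    size_t tm = (jb->m + jb->tile - 1) / jb->tile;   /* tiles down */
    size_t tn = (jb->n + jb->tile - 1) / jb->tile;   /* tiles across */
    for (size_t t = w->id; t < tm * tn; t += jb->nworkers) {
        size_t i0 = (t % tm) * jb->tile, j0 = (t / tm) * jb->tile;
        size_t i1 = (i0 + jb->tile < jb->m) ? i0 + jb->tile : jb->m;
        size_t j1 = (j0 + jb->tile < jb->n) ? j0 + jb->tile : jb->n;
        for (size_t j = j0; j < j1; ++j)
            for (size_t i = i0; i < i1; ++i) {
                float acc = 0.0f;
                for (size_t q = 0; q < jb->k; ++q)
                    acc += jb->A[i + q * jb->lda] * jb->B[q + j * jb->ldb];
                jb->C[i + j * jb->ldc] = acc;
            }
    }
    return NULL;
}

/* Starts nworkers threads (at most 16 in this sketch) and waits for all. */
void parallel_sgemm(const job *jb)
{
    pthread_t th[16];
    worker_arg args[16];
    assert(jb->nworkers > 0 && jb->nworkers <= 16 && jb->tile > 0);
    for (size_t w = 0; w < jb->nworkers; ++w) {
        args[w].jb = jb;
        args[w].id = w;
        int rc = pthread_create(&th[w], NULL, worker, &args[w]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t w = 0; w < jb->nworkers; ++w)
        pthread_join(th[w], NULL);
}
```

Compile with `-pthread`. In CUDA versions of that period each GPU was typically driven from its own host thread, so in a GPU version each worker could select its device with `cudaSetDevice(id)` before doing its tiles.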

So let me know when you have a CUBLAS_big library at hand.
