Solving larger problems from Fortran: RAM is larger than GPU memory

RAM in a compute node is larger than GPU memory. CUBLAS called from Fortran works wonderfully, but the problem sizes that can be solved are limited by the amount of memory on the GPU card. I would love to have a library that splits a large matrix (RAM-sized, maybe 16 GB or more) into smaller pieces that fit within GPU memory. The technique is well known, but the code should live in the library, perhaps in the Fortran.c code in the Fortran wrapper package, or anywhere else. The point is that, seen from Fortran, one should only have to call s or c with a matrix or vector of any size. Also, I am still waiting for double precision data types.

Any comments from the developers?

The Fortran wrappers and the library that enable calling from Fortran really make the GPU a useful tool. The speedup is just great!


If the technique is well known, you can write your own s/c_big or extend the library. You can download the CUBLAS sources, so nothing is stopping you from implementing it. :D

You could split the matrix into appropriately sized blocks, but depending on the algorithm you use, this can get tricky… At that point there is also room for more performance, since the jobs for each block could run on different GPUs (or CPUs).
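To make the blocking idea concrete, here is a rough sketch in C of how a matrix multiply could be cut into tiles. Everything in it is an assumption for illustration: the names big_matmul and tile_matmul are made up, matrix multiply is just one example operation, and tile_matmul is a plain CPU loop standing in for the place where a tile would be copied to the card, multiplied with CUBLAS and copied back. Storage is column-major, as in Fortran.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-tile device work: C_tile += A_tile * B_tile.
   A real version would upload the three tiles, call CUBLAS, and download C_tile.
   All matrices are column-major with leading dimensions lda, ldb, ldc. */
static void tile_matmul(int m, int n, int k,
                        const float *A, int lda,
                        const float *B, int ldb,
                        float *C, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p) {
            float b = B[p + (size_t)j * ldb];
            for (int i = 0; i < m; ++i)
                C[i + (size_t)j * ldc] += A[i + (size_t)p * lda] * b;
        }
}

static int imin(int a, int b) { return a < b ? a : b; }

/* C = A * B for A (m x k), B (k x n), C (m x n), processed in tiles of at
   most bs x bs elements, so only three small tiles are needed at any time.
   bs would be chosen so that three tiles fit in GPU memory. */
void big_matmul(int m, int n, int k,
                const float *A, int lda,
                const float *B, int ldb,
                float *C, int ldc, int bs)
{
    assert(bs > 0);

    /* Start from C = 0, column by column. */
    for (int j = 0; j < n; ++j)
        memset(C + (size_t)j * ldc, 0, (size_t)m * sizeof(float));

    for (int j0 = 0; j0 < n; j0 += bs) {
        int nb = imin(bs, n - j0);
        for (int i0 = 0; i0 < m; i0 += bs) {
            int mb = imin(bs, m - i0);
            /* Accumulate the contributions of all tiles along k. */
            for (int p0 = 0; p0 < k; p0 += bs) {
                int kb = imin(bs, k - p0);
                tile_matmul(mb, nb, kb,
                            A + i0 + (size_t)p0 * lda, lda,
                            B + p0 + (size_t)j0 * ldb, ldb,
                            C + i0 + (size_t)j0 * ldc, ldc);
            }
        }
    }
}
```

The (i0, j0) tiles of C are independent of each other, which is where the jobs could be handed to different devices. The tricky part mentioned above shows up for algorithms where blocks depend on each other (factorizations, for example), which this sketch does not cover.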

So let me know when you have a CUBLAS_big library at hand :thumbup: