RAM in a compute node is larger than GPU memory. CUBLAS called from Fortran works wonderfully, but the problem sizes that can be solved are limited by the amount of memory on the GPU card. I would love to have a library that splits a large matrix (RAM-sized, maybe 16 GB or more) into smaller pieces that fit within the GPU memory. The technique is well known, but the code should live in the library itself, perhaps in the fortran.c code of the Fortran wrapper package, or anywhere else. The point is that, seen from Fortran, one should simply be able to call the s- or c-precision routines with a matrix or vector of any size. Also, I am still waiting for double-precision data types.
Any comments from the developers?
The Fortran wrappers and the library that enables calling from Fortran really make the GPU a useful tool. The speedup is just great!